WorldWideScience

Sample records for genomic re-organization based

  1. Discovery of global genomic re-organization based on comparison of two newly sequenced rice mitochondrial genomes with cytoplasmic male sterility-related genes

    Directory of Open Access Journals (Sweden)

    Yamada Mari

    2010-03-01

    Full Text Available Abstract Background Plant mitochondrial genomes are known for their complexity, and there is abundant evidence demonstrating that this organelle is important for plant sexual reproduction. Cytoplasmic male sterility (CMS is a phenomenon caused by incompatibility between the nucleus and mitochondria that has been discovered in various plant species. As the exact sequence of steps leading to CMS has not yet been revealed, efforts should be made to elucidate the factors underlying the mechanism of this important trait for crop breeding. Results Two CMS mitochondrial genomes, LD-CMS, derived from Oryza sativa L. ssp. indica (434,735 bp, and CW-CMS, derived from Oryza rufipogon Griff. (559,045 bp, were newly sequenced in this study. Compared to the previously sequenced Nipponbare (Oryza sativa L. ssp. japonica mitochondrial genome, the presence of 54 out of 56 protein-encoding genes (including pseudo-genes, 22 tRNA genes (including pseudo-tRNAs, and three rRNA genes was conserved. Two other genes were not present in the CW-CMS mitochondrial genome, and one of them was present as part of the newly identified chimeric ORF, CW-orf307. At least 12 genomic recombination events were predicted between the LD-CMS mitochondrial genome and Nipponbare, and 15 between the CW-CMS genome and Nipponbare, and novel genetic structures were formed by these genomic rearrangements in the two CMS lines. At least one of the genomic rearrangements was completely unique to each CMS line and not present in 69 rice cultivars or 9 accessions of O. rufipogon. Conclusion Our results demonstrate novel mitochondrial genomic rearrangements that are unique in CMS cytoplasm, and one of the genes that is unique in the CW mitochondrial genome, CW-orf307, appeared to be the candidate most likely responsible for the CW-CMS event. Genomic rearrangements were dynamic in the CMS lines in comparison with those of rice cultivars, suggesting that 'death' and possible 'birth' processes of the

  2. Value-based genomics.

    Science.gov (United States)

    Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi

    2018-03-20

    Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical-decision making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics.

  3. WormBase: Annotating many nematode genomes.

    Science.gov (United States)

    Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

    2012-01-01

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

  4. Estrogens induce rapid cytoskeleton re-organization in human dermal fibroblasts via the non-classical receptor GPR30.

    Directory of Open Access Journals (Sweden)

    Julie Carnesecchi

    Full Text Available The post-menopausal decrease in estrogen circulating levels results in rapid skin deterioration pointing out to a protective effect exerted by these hormones. The identity of the skin cell type responding to estrogens is unclear as are the cellular and molecular processes they elicit. Here, we reported that lack of estrogens induces rapid re-organization of the human dermal fibroblast cytoskeleton resulting in striking cell shape change. This morphological change was accompanied by a spatial re-organization of focal adhesion and a substantial reduction of their number as evidenced by vinculin and actin co-staining. Cell morphology and cytoskeleton organization was fully restored upon 17β-estradiol (E2 addition. Treatment with specific ER antagonists and cycloheximide respectively showed that the E2 acts independently of the classical Estrogen Receptors and that cell shape change is mediated by non-genomic mechanisms. E2 treatment resulted in a rapid and transient activation of ERK1/2 but not Src or PI3K. We show that human fibroblasts express the non-classical E2 receptor GPR30 and that its agonist G-1 phenocopies the effect of E2. Inhibiting GPR30 through treatment with the G-15 antagonist or specific shRNA impaired E2 effects. Altogether, our data reveal a novel mechanism by which estrogens act on skin fibroblast by regulating cell shape through the non-classical G protein-coupled receptor GPR30 and ERK1/2 activation.

  5. Estrogens induce rapid cytoskeleton re-organization in human dermal fibroblasts via the non-classical receptor GPR30.

    Science.gov (United States)

    Carnesecchi, Julie; Malbouyres, Marilyne; de Mets, Richard; Balland, Martial; Beauchef, Gallic; Vié, Katell; Chamot, Christophe; Lionnet, Claire; Ruggiero, Florence; Vanacker, Jean-Marc

    2015-01-01

    The post-menopausal decrease in estrogen circulating levels results in rapid skin deterioration pointing out to a protective effect exerted by these hormones. The identity of the skin cell type responding to estrogens is unclear as are the cellular and molecular processes they elicit. Here, we reported that lack of estrogens induces rapid re-organization of the human dermal fibroblast cytoskeleton resulting in striking cell shape change. This morphological change was accompanied by a spatial re-organization of focal adhesion and a substantial reduction of their number as evidenced by vinculin and actin co-staining. Cell morphology and cytoskeleton organization was fully restored upon 17β-estradiol (E2) addition. Treatment with specific ER antagonists and cycloheximide respectively showed that the E2 acts independently of the classical Estrogen Receptors and that cell shape change is mediated by non-genomic mechanisms. E2 treatment resulted in a rapid and transient activation of ERK1/2 but not Src or PI3K. We show that human fibroblasts express the non-classical E2 receptor GPR30 and that its agonist G-1 phenocopies the effect of E2. Inhibiting GPR30 through treatment with the G-15 antagonist or specific shRNA impaired E2 effects. Altogether, our data reveal a novel mechanism by which estrogens act on skin fibroblast by regulating cell shape through the non-classical G protein-coupled receptor GPR30 and ERK1/2 activation.

  6. Genome chaos: survival strategy during crisis.

    Science.gov (United States)

    Liu, Guo; Stevens, Joshua B; Horne, Steven D; Abdallah, Batoul Y; Ye, Karen J; Bremer, Steven W; Ye, Christine J; Chen, David J; Heng, Henry H

    2014-01-01

    Genome chaos, a process of complex, rapid genome re-organization, results in the formation of chaotic genomes, which is followed by the potential to establish stable genomes. It was initially detected through cytogenetic analyses, and recently confirmed by whole-genome sequencing efforts which identified multiple subtypes including "chromothripsis", "chromoplexy", "chromoanasynthesis", and "chromoanagenesis". Although genome chaos occurs commonly in tumors, both the mechanism and detailed aspects of the process are unknown due to the inability of observing its evolution over time in clinical samples. Here, an experimental system to monitor the evolutionary process of genome chaos was developed to elucidate its mechanisms. Genome chaos occurs following exposure to chemotherapeutics with different mechanisms, which act collectively as stressors. Characterization of the karyotype and its dynamic changes prior to, during, and after induction of genome chaos demonstrates that chromosome fragmentation (C-Frag) occurs just prior to chaotic genome formation. Chaotic genomes seem to form by random rejoining of chromosomal fragments, in part through non-homologous end joining (NHEJ). Stress induced genome chaos results in increased karyotypic heterogeneity. Such increased evolutionary potential is demonstrated by the identification of increased transcriptome dynamics associated with high levels of karyotypic variance. In contrast to impacting on a limited number of cancer genes, re-organized genomes lead to new system dynamics essential for cancer evolution. Genome chaos acts as a mechanism of rapid, adaptive, genome-based evolution that plays an essential role in promoting rapid macroevolution of new genome-defined systems during crisis, which may explain some unwanted consequences of cancer treatment.

  7. Genomics-based plant germplasm research (GPGR)

    Institute of Scientific and Technical Information of China (English)

    Jizeng Jia; Hongjie Li; Xueyong Zhang; Zichao Li; Lijuan Qiu

    2017-01-01

    Plant germplasm underpins much of crop genetic improvement. Millions of germplasm accessions have been collected and conserved ex situ and/or in situ, and the major challenge is now how to exploit and utilize this abundant resource. Genomics-based plant germplasm research (GPGR) or "Genoplasmics" is a novel cross-disciplinary research field that seeks to apply the principles and techniques of genomics to germplasm research. We describe in this paper the concept, strategy, and approach behind GPGR, and summarize current progress in the areas of the definition and construction of core collections, enhancement of germplasm with core collections, and gene discovery from core collections. GPGR is opening a new era in germplasm research. The contribution, progress and achievements of GPGR in the future are predicted.

  8. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  9. A Trichosporonales genome tree based on 27 haploid and three evolutionarily conserved 'natural' hybrid genomes.

    Science.gov (United States)

    Takashima, Masako; Sriswasdi, Sira; Manabe, Ri-Ichiroh; Ohkuma, Moriya; Sugita, Takashi; Iwasaki, Wataru

    2018-01-01

    To construct a backbone tree consisting of basidiomycetous yeasts, draft genome sequences from 25 species of Trichosporonales (Tremellomycetes, Basidiomycota) were generated. In addition to the hybrid genomes of Trichosporon coremiiforme and Trichosporon ovoides that we described previously, we identified an interspecies hybrid genome in Cutaneotrichosporon mucoides (formerly Trichosporon mucoides). This hybrid genome had a gene retention rate of ~55%, and its closest haploid relative was Cutaneotrichosporon dermatis. After constructing the C. mucoides subgenomes, we generated a phylogenetic tree using genome data from the 27 haploid species and the subgenome data from the three hybrid genome species. It was a high-quality tree with 100% bootstrap support for all of the branches. The genome-based tree provided superior resolution compared with previous multi-gene analyses. Although our backbone tree does not include all Trichosporonales genera (e.g. Cryptotrichosporon), it will be valuable for future analyses of genome data. Interest in interspecies hybrid fungal genomes has recently increased because they may provide a basis for new technologies. The three Trichosporonales hybrid genomes described in this study are different from well-characterized hybrid genomes (e.g. those of Saccharomyces pastorianus and Saccharomyces bayanus) because these hybridization events probably occurred in the distant evolutionary past. Hence, they will be useful for studying genome stability following hybridization and speciation events. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  10. CyanoBase: the cyanobacteria genome database update 2010.

    Science.gov (United States)

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.

  11. CyanoBase: the cyanobacteria genome database update 2010

    OpenAIRE

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2009-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in var...

  12. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  13. Connectome-harmonic decomposition of human brain activity reveals dynamical repertoire re-organization under LSD.

    Science.gov (United States)

    Atasoy, Selen; Roseman, Leor; Kaelen, Mendel; Kringelbach, Morten L; Deco, Gustavo; Carhart-Harris, Robin L

    2017-12-15

    Recent studies have started to elucidate the effects of lysergic acid diethylamide (LSD) on the human brain but the underlying dynamics are not yet fully understood. Here we used 'connectome-harmonic decomposition', a novel method to investigate the dynamical changes in brain states. We found that LSD alters the energy and the power of individual harmonic brain states in a frequency-selective manner. Remarkably, this leads to an expansion of the repertoire of active brain states, suggestive of a general re-organization of brain dynamics given the non-random increase in co-activation across frequencies. Interestingly, the frequency distribution of the active repertoire of brain states under LSD closely follows power-laws indicating a re-organization of the dynamics at the edge of criticality. Beyond the present findings, these methods open up for a better understanding of the complex brain dynamics in health and disease.

  14. Microstructure of transcallosal motor fibers reflects type of cortical (re-)organization in congenital hemiparesis.

    Science.gov (United States)

    Juenger, Hendrik; Koerte, Inga K; Muehlmann, Marc; Mayinger, Michael; Mall, Volker; Krägeloh-Mann, Ingeborg; Shenton, Martha E; Berweck, Steffen; Staudt, Martin; Heinen, Florian

    2014-11-01

    Early unilateral brain lesions can lead to different types of corticospinal (re-)organization of motor networks. In one group of patients, the contralesional hemisphere exerts motor control not only over the contralateral non-paretic hand but also over the (ipsilateral) paretic hand, as the primary motor cortex is (re-)organized in the contralesional hemisphere. Another group of patients with early unilateral lesions shows "normal" contralateral motor projections starting in the lesioned hemisphere. We investigated how these different patterns of cortical (re-)organization affect interhemispheric transcallosal connectivity in patients with congenital hemiparesis. Eight patients with ipsilateral motor projections (group IPSI) versus 7 patients with contralateral motor projections (group CONTRA) underwent magnetic resonance diffusion tensor imaging (DTI). The corpus callosum (CC) was subdivided in 5 areas (I-V) in the mid-sagittal slice and volumetric information. The following diffusion parameters were calculated: fractional anisotropy (FA), trace, radial diffusivity (RD), and axial diffusivity (AD). DTI revealed significantly lower FA, increased trace and RD for group IPSI compared to group CONTRA in area III of the corpus callosum, where transcallosal motor fibers cross the CC. In the directly neighboring area IV, where transcallosal somatosensory fibers cross the CC, no differences were found for these DTI parameters between IPSI and CONTRA. Volume of callosal subsections showed significant differences for area II (connecting premotor cortices) and III, where group IPSI had lower volume. The results of this study demonstrate that the callosal microstructure in patients with congenital hemiparesis reflects the type of cortical (re-)organization. Early lesions disrupting corticospinal motor projections to the paretic hand consecutively affect the development or maintenance of transcallosal motor fibers. Copyright © 2014 European Paediatric Neurology Society

  15. Experimental evidence of dynamic re-organization of evolving landscapes under changing climatic forcing

    Science.gov (United States)

    Singh, Arvind; Tejedor, Alejandro; Zaliapin, Ilya; Reinhardt, Liam; Foufoula-Georgiou, Efi

    2015-04-01

    The aim of this study is to better understand the dynamic re-organization of an evolving landscape under a scenario of changing climatic forcing for improving our knowledge of geomorphic transport laws under transient conditions and developing predictive models of landscape response to external perturbations. Real landscape observations for long-term analysis are limited and to this end a high resolution controlled laboratory experiment was conducted at the St. Anthony Falls laboratory at the University of Minnesota. Elevation data were collected at temporal resolution of 5 mins and spatial resolution of 0.5 mm as the landscape approached steady state (constant uplift and precipitation rate) and in the transient state (under the same uplift and 5x precipitation). The results reveal rapid topographic re-organization under a five-fold precipitation increase with the fluvial regime expanding into the previously debris dominated regime, accelerated erosion happening at hillslope scales, and rivers shifting from an erosion-limited to a transport-limited regime. From a connectivity and clustering analysis of the erosional and depositional events, we demonstrate the strikingly different spatial patterns of landscape evolution under steady-state (SS) and transient-state (TS), even when the time under SS is "stretched" compared to that under TS such as to match the total volume and PDF of erosional and depositional amounts. We quantify the spatial coupling of hillslopes and channels and demonstrate that hillslopes lead and channels follow in re-organizing the whole landscape under such an amplified precipitation regime.

  16. Transcriptional regulation by histone modifications: towards a theory of chromatin re-organization during stem cell differentiation

    International Nuclear Information System (INIS)

    Binder, Hans; Steiner, Lydia; Przybilla, Jens; Rohlf, Thimo; Prohaska, Sonja; Galle, Jörg

    2013-01-01

    Chromatin-related mechanisms, as e.g. histone modifications, are known to be involved in regulatory switches within the transcriptome. Only recently, mathematical models of these mechanisms have been established. So far they have not been applied to genome-wide data. We here introduce a mathematical model of transcriptional regulation by histone modifications and apply it to data of trimethylation of histone 3 at lysine 4 (H3K4me3) and 27 (H3K27me3) in mouse pluripotent and lineage-committed cells. The model describes binding of protein complexes to chromatin which are capable of reading and writing histone marks. Molecular interactions of the complexes with DNA and modified histones create a regulatory switch of transcriptional activity. The regulatory states of the switch depend on the activity of histone (de-) methylases, the strength of complex-DNA-binding and the number of nucleosomes capable of cooperatively contributing to complex-binding. Our model explains experimentally measured length distributions of modified chromatin regions. It suggests (i) that high CpG-density facilitates recruitment of the modifying complexes in embryonic stem cells and (ii) that re-organization of extended chromatin regions during lineage specification into neuronal progenitor cells requires targeted de-modification. Our approach represents a basic step towards multi-scale models of transcriptional control during development and lineage specification. (paper)

  17. Barcode server: a visualization-based genome analysis system.

    Directory of Open Access Journals (Sweden)

    Fenglou Mao

    Full Text Available We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a identification of horizontally transferred genes, (b identification of genomic islands with special properties and (c binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a calculation of the k-mer based barcode image for a provided DNA sequence; (b detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c clustering of provided DNA sequences into groups having similar barcodes; and (d homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.

  18. CRISPR/Cas9 based genome editing of Penicillium chrysogenum

    NARCIS (Netherlands)

    Pohl, Carsten; Kiel, Jan A K W; Driessen, Arnold J M; Bovenberg, Roel A L; Nygård, Yvonne

    2016-01-01

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially

  19. New Genome Similarity Measures based on Conserved Gene Adjacencies.

    Science.gov (United States)

    Doerr, Daniel; Kowada, Luis Antonio B; Araujo, Eloi; Deshpande, Shachi; Dantas, Simone; Moret, Bernard M E; Stoye, Jens

    2017-06-01

    Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful-but also most complex-models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases, the computational costs to the general family-free case are the same, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that have different expression powers. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.

  20. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 byanalyzing differences in allele distribution.

  1. A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes

    Directory of Open Access Journals (Sweden)

    Michael Strong

    2009-12-01

    Full Text Available As the number of completely sequenced microbial genomes continues to rise at an impressive rate, it is important to prepare students with the skills necessary to investigate microorganisms at the genomic level. As a part of the core curriculum for first-year graduate students in the biological sciences, we have implemented a web-based tutorial to introduce students to the fields of comparative and functional genomics. The tutorial focuses on recent computational methods for identifying functionally linked genes and proteins on a genome-wide scale and was used to introduce students to the Rosetta Stone, Phylogenetic Profile, conserved Gene Neighbor, and Operon computational methods. Students learned to use a number of publicly available web servers and databases to identify functionally linked genes in the Escherichia coli genome, with emphasis on genome organization and operon structure. The overall effectiveness of the tutorial was assessed based on student evaluations and homework assignments. The tutorial is available to other educators at http://www.doe-mbi.ucla.edu/~strong/m253.php.

  2. Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

    Science.gov (United States)

    Jeong, Young-Min; Kim, Namshin; Ahn, Byung Ohg; Oh, Mijin; Chung, Won-Hyong; Chung, Hee; Jeong, Seongmun; Lim, Ki-Byung; Hwang, Yoon-Jung; Kim, Goon-Bo; Baek, Seunghoon; Choi, Sang-Bong; Hyung, Dae-Jin; Lee, Seung-Won; Sohn, Seong-Han; Kwon, Soo-Jin; Jin, Mina; Seol, Young-Joo; Chae, Won Byoung; Choi, Keun Jin; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan

    2016-07-01

    This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidences on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop and its origin and phylogenetic position in the tribe Brassiceae is controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98 % of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36 % of the genome was repetitive sequences and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidences that the current form of nine chromosomes in radish was rearranged from the chromosomes of hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into evolution of the mesohexaploid genomes in the tribe Brassiceae.

  3. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

    Science.gov (United States)

    Hiscock, D; Upton, C

    2000-05-01

    The Viral Genome DataBase (VGDB) contains detailed information of the genes and predicted protein sequences from 15 completely sequenced genomes of large (&100 kb) viruses (2847 genes). The data that is stored includes DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a mySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .

  4. Genome-Based Comparison of Clostridioides difficile: Average Amino Acid Identity Analysis of Core Genomes.

    Science.gov (United States)

    Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W

    2018-02-14

    Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a lot of information but that needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 public available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, either based on partial or full gene-length, results in biased estimates of genetic differences and does not capture the true degree of similarity or differences of complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same

  5. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

    Directory of Open Access Journals (Sweden)

    Surovcik Katharina

    2006-03-01

    Full Text Available Abstract Background Horizontal gene transfer (HGT is considered a strong evolutionary force shaping the content of microbial genomes in a substantial manner. It is the difference in speed enabling the rapid adaptation to changing environmental demands that distinguishes HGT from gene genesis, duplications or mutations. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs or more specifically pathogenicity or symbiotic islands. Results We have implemented the program SIGI-HMM that predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of codon usage (CU of each individual gene of a genome under study. CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in JAVA and is publicly available. It accepts as input any file created according to the EMBL-format. It generates output in the common GFF format readable for genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were both consistent with annotated GIs and with predictions generated by different methods. Conclusion SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows to interactively analyze genomes in detail and to generate or to test hypotheses about the origin of acquired

  6. Genomic profiling of oral squamous cell carcinoma by array-based comparative genomic hybridization.

    Directory of Open Access Journals (Sweden)

    Shunichi Yoshioka

    Full Text Available We designed a study to investigate genetic relationships between primary tumors of oral squamous cell carcinoma (OSCC and their lymph node metastases, and to identify genomic copy number aberrations (CNAs related to lymph node metastasis. For this purpose, we collected a total of 42 tumor samples from 25 patients and analyzed their genomic profiles by array-based comparative genomic hybridization. We then compared the genetic profiles of metastatic primary tumors (MPTs with their paired lymph node metastases (LNMs, and also those of LNMs with non-metastatic primary tumors (NMPTs. Firstly, we found that although there were some distinctive differences in the patterns of genomic profiles between MPTs and their paired LNMs, the paired samples shared similar genomic aberration patterns in each case. Unsupervised hierarchical clustering analysis grouped together 12 of the 15 MPT-LNM pairs. Furthermore, similarity scores between paired samples were significantly higher than those between non-paired samples. These results suggested that MPTs and their paired LNMs are composed predominantly of genetically clonal tumor cells, while minor populations with different CNAs may also exist in metastatic OSCCs. Secondly, to identify CNAs related to lymph node metastasis, we compared CNAs between grouped samples of MPTs and LNMs, but were unable to find any CNAs that were more common in LNMs. Finally, we hypothesized that subpopulations carrying metastasis-related CNAs might be present in both the MPT and LNM. Accordingly, we compared CNAs between NMPTs and LNMs, and found that gains of 7p, 8q and 17q were more common in the latter than in the former, suggesting that these CNAs may be involved in lymph node metastasis of OSCC. In conclusion, our data suggest that in OSCCs showing metastasis, the primary and metastatic tumors share similar genomic profiles, and that cells in the primary tumor may tend to metastasize after acquiring metastasis-associated CNAs.

  7. Changing Histopathological Diagnostics by Genome-Based Tumor Classification

    Directory of Open Access Journals (Sweden)

    Michael Kloth

    2014-05-01

    Full Text Available Traditionally, tumors are classified by histopathological criteria, i.e., based on their specific morphological appearances. Consequently, current therapeutic decisions in oncology are strongly influenced by histology rather than underlying molecular or genomic aberrations. The increase of information on molecular changes however, enabled by the Human Genome Project and the International Cancer Genome Consortium as well as the manifold advances in molecular biology and high-throughput sequencing techniques, inaugurated the integration of genomic information into disease classification. Furthermore, in some cases it became evident that former classifications needed major revision and adaption. Such adaptations are often required by understanding the pathogenesis of a disease from a specific molecular alteration, using this molecular driver for targeted and highly effective therapies. Altogether, reclassifications should lead to higher information content of the underlying diagnoses, reflecting their molecular pathogenesis and resulting in optimized and individual therapeutic decisions. The objective of this article is to summarize some particularly important examples of genome-based classification approaches and associated therapeutic concepts. In addition to reviewing disease specific markers, we focus on potentially therapeutic or predictive markers and the relevance of molecular diagnostics in disease monitoring.

  8. WormBase 2016: expanding to enable helminth genomic research.

    Science.gov (United States)

    Howe, Kevin L; Bolt, Bruce J; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Davis, Paul; Done, James; Down, Thomas; Gao, Sibyl; Grove, Christian; Harris, Todd W; Kishore, Ranjana; Lee, Raymond; Lomax, Jane; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Stanley, Eleanor; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W

    2016-01-04

    WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.

  10. Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome

    Science.gov (United States)

    CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of human, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific t...

  11. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  12. Ascaris phylogeny based on multiple whole mtDNA genomes

    DEFF Research Database (Denmark)

    Nejsum, Peter; Hawash, Mohamed B F; Betson, Martha

    2016-01-01

    and C) of human and pig Ascaris based on partial cox1 sequences. In the present study, we selected major haplotypes from these different clusters to characterize their whole mitochondrial genomes for phylogenetic analysis. We also undertook coalescent simulations to investigate the evolutionary history...

  13. Integrated genome-based studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Tiedje, James M. [Michigan State Univ., East Lansing, MI (United States); Konstantinidis, Kostas [Michigan State Univ., East Lansing, MI (United States); Worden, Mark [Michigan State Univ., East Lansing, MI (United States)

    2014-01-08

    The aim of the work reported is to study Shewanella population genomics, and to understand the evolution, ecophysiology, and speciation of Shewanella. The tasks supporting this aim are: to study genetic and ecophysiological bases defining the core and diversification of Shewanella species; to determine gene content patterns along redox gradients; and to Investigate the evolutionary processes, patterns and mechanisms of Shewanella.

  14. Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.

    Science.gov (United States)

    Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris

    2004-07-14

    With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

  15. Quantitative metagenomic analyses based on average genome size normalization

    DEFF Research Database (Denmark)

    Frank, Jeremy Alexander; Sørensen, Søren Johannes

    2011-01-01

    provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data...... marine sources using both conventional small-subunit (SSU) rRNA gene analyses and our quantitative method to calculate the proportion of genomes in each sample that are capable of a particular metabolic trait. With both environments, to determine what proportion of each community they make up and how......). These analyses demonstrate how genome proportionality compares to SSU rRNA gene relative abundance and how factors such as average genome size and SSU rRNA gene copy number affect sampling probability and therefore both types of community analysis....

  16. Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods

    DEFF Research Database (Denmark)

    Bohlin, Jon; Snipen, Lars; Cloeckaert, Axel

    2010-01-01

    BACKGROUND: Classification of bacteria within the genus Brucella has been difficult due in part to considerable genomic homogeneity between the different species and biovars, in spite of clear differences in phenotypes. Therefore, many different methods have been used to assess Brucella taxonomy....... In the current work, we examine 32 sequenced genomes from genus Brucella representing the six classical species, as well as more recently described species, using bioinformatical methods. Comparisons were made at the level of genomic DNA using oligonucleotide based methods (Markov chain based genomic signatures...... between the oligonucleotide based methods used. Whilst the Markov chain based genomic signatures grouped the different species in genus Brucella according to host preference, the codon and amino acid frequencies based methods reflected small differences between the Brucella species. Only minor differences...

  17. Tracing common origins of Genomic Islands in prokaryotes based on genome signature analyses.

    Science.gov (United States)

    van Passel, Mark Wj

    2011-09-01

    Horizontal gene transfer constitutes a powerful and innovative force in evolution, but often little is known about the actual origins of transferred genes. Sequence alignments are generally of limited use in tracking the original donor, since still only a small fraction of the total genetic diversity is thought to be uncovered. Alternatively, approaches based on similarities in the genome specific relative oligonucleotide frequencies do not require alignments. Even though the exact origins of horizontally transferred genes may still not be established using these compositional analyses, it does suggest that compositionally very similar regions are likely to have had a common origin. These analyses have shown that up to a third of large acquired gene clusters that reside in the same genome are compositionally very similar, indicative of a shared origin. This brings us closer to uncovering the original donors of horizontally transferred genes, and could help in elucidating possible regulatory interactions between previously unlinked sequences.

  18. Genome-Based Microbial Taxonomy Coming of Age.

    Science.gov (United States)

    Hugenholtz, Philip; Skarshewski, Adam; Parks, Donovan H

    2016-06-01

    Reconstructing the complete evolutionary history of extant life on our planet will be one of the most fundamental accomplishments of scientific endeavor, akin to the completion of the periodic table, which revolutionized chemistry. The road to this goal is via comparative genomics because genomes are our most comprehensive and objective evolutionary documents. The genomes of plant and animal species have been systematically targeted over the past decade to provide coverage of the tree of life. However, multicellular organisms only emerged in the last 550 million years of more than three billion years of biological evolution and thus comprise a small fraction of total biological diversity. The bulk of biodiversity, both past and present, is microbial. We have only scratched the surface in our understanding of the microbial world, as most microorganisms cannot be readily grown in the laboratory and remain unknown to science. Ground-breaking, culture-independent molecular techniques developed over the past 30 years have opened the door to this so-called microbial dark matter with an accelerating momentum driven by exponential increases in sequencing capacity. We are on the verge of obtaining representative genomes across all life for the first time. However, historical use of morphology, biochemical properties, behavioral traits, and single-marker genes to infer organismal relationships mean that the existing highly incomplete tree is riddled with taxonomic errors. Concerted efforts are now needed to synthesize and integrate the burgeoning genomic data resources into a coherent universal tree of life and genome-based taxonomy. Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved.

  19. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. Association method based on family-based breeding systems was developed, genomic heritabilities, genomic prediction accurancies and effects of some key factors wer...... explored. Results show that low sequencing depth caused underestimation of allele substitution effects in GWAS and overestimation of genomic heritability in prediction studies. Other factors susch as SNP marker density, population structure and size of training population influenced accuracy of genomic...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  20. The effect of genealogy-based haplotypes on genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Fernando, Rohan L.; Su, Guosheng

    2013-01-01

    on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using...... local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (pi) of the haplotype covariates had zero effect......, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some...

  1. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  2. Genomic-based-breeding tools for tropical maize improvement.

    Science.gov (United States)

    Chakradhar, Thammineni; Hindu, Vemuri; Reddy, Palakolanu Sudhakar

    2017-12-01

    Maize has traditionally been the main staple diet in the Southern Asia and Sub-Saharan Africa and widely grown by millions of resource poor small scale farmers. Approximately, 35.4 million hectares are sown to tropical maize, constituting around 59% of the developing worlds. Tropical maize encounters tremendous challenges besides poor agro-climatic situations with average yields recorded <3 tones/hectare that is far less than the average of developed countries. On the contrary to poor yields, the demand for maize as food, feed, and fuel is continuously increasing in these regions. Heterosis breeding introduced in early 90 s improved maize yields significantly, but genetic gains is still a mirage, particularly for crop growing under marginal environments. Application of molecular markers has accelerated the pace of maize breeding to some extent. The availability of array of sequencing and genotyping technologies offers unrivalled service to improve precision in maize-breeding programs through modern approaches such as genomic selection, genome-wide association studies, bulk segregant analysis-based sequencing approaches, etc. Superior alleles underlying complex traits can easily be identified and introgressed efficiently using these sequence-based approaches. Integration of genomic tools and techniques with advanced genetic resources such as nested association mapping and backcross nested association mapping could certainly address the genetic issues in maize improvement programs in developing countries. Huge diversity in tropical maize and its inherent capacity for doubled haploid technology offers advantage to apply the next generation genomic tools for accelerating production in marginal environments of tropical and subtropical world. Precision in phenotyping is the key for success of any molecular-breeding approach. This article reviews genomic technologies and their application to improve agronomic traits in tropical maize breeding has been reviewed in

  3. CRISPR/Cas9 Based Genome Editing of Penicillium chrysogenum.

    Science.gov (United States)

    Pohl, C; Kiel, J A K W; Driessen, A J M; Bovenberg, R A L; Nygård, Y

    2016-07-15

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially relevant cell factory. The developed CRISPR/Cas9 toolbox is highly flexible and allows editing of new targets with minimal cloning efforts. The Cas9 protein and the sgRNA can be either delivered during transformation, as preassembled CRISPR-Cas9 ribonucleoproteins (RNPs) or expressed from an AMA1 based plasmid within the cell. The direct delivery of the Cas9 protein with in vitro synthesized sgRNA to the cells allows for a transient method for genome engineering that may rapidly be applicable for other filamentous fungi. The expression of Cas9 from an AMA1 based vector was shown to be highly efficient for marker-free gene deletions.

  4. Mechanisms of Base Substitution Mutagenesis in Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Albino Bacolla

    2014-03-01

    Full Text Available Cancer genome sequence data provide an invaluable resource for inferring the key mechanisms by which mutations arise in cancer cells, favoring their survival, proliferation and invasiveness. Here we examine recent advances in understanding the molecular mechanisms responsible for the predominant type of genetic alteration found in cancer cells, somatic single base substitutions (SBSs. Cytosine methylation, demethylation and deamination, charge transfer reactions in DNA, DNA replication timing, chromatin status and altered DNA proofreading activities are all now known to contribute to the mechanisms leading to base substitution mutagenesis. We review current hypotheses as to the major processes that give rise to SBSs and evaluate their relative relevance in the light of knowledge acquired from cancer genome sequencing projects and the study of base modifications, DNA repair and lesion bypass. Although gene expression data on APOBEC3B enzymes provide support for a role in cancer mutagenesis through U:G mismatch intermediates, the enzyme preference for single-stranded DNA may limit its activity genome-wide. For SBSs at both CG:CG and YC:GR sites, we outline evidence for a prominent role of damage by charge transfer reactions that follow interactions of the DNA with reactive oxygen species (ROS and other endogenous or exogenous electron-abstracting molecules.

  5. Mechanisms of base substitution mutagenesis in cancer genomes.

    Science.gov (United States)

    Bacolla, Albino; Cooper, David N; Vasquez, Karen M

    2014-03-05

    Cancer genome sequence data provide an invaluable resource for inferring the key mechanisms by which mutations arise in cancer cells, favoring their survival, proliferation and invasiveness. Here we examine recent advances in understanding the molecular mechanisms responsible for the predominant type of genetic alteration found in cancer cells, somatic single base substitutions (SBSs). Cytosine methylation, demethylation and deamination, charge transfer reactions in DNA, DNA replication timing, chromatin status and altered DNA proofreading activities are all now known to contribute to the mechanisms leading to base substitution mutagenesis. We review current hypotheses as to the major processes that give rise to SBSs and evaluate their relative relevance in the light of knowledge acquired from cancer genome sequencing projects and the study of base modifications, DNA repair and lesion bypass. Although gene expression data on APOBEC3B enzymes provide support for a role in cancer mutagenesis through U:G mismatch intermediates, the enzyme preference for single-stranded DNA may limit its activity genome-wide. For SBSs at both CG:CG and YC:GR sites, we outline evidence for a prominent role of damage by charge transfer reactions that follow interactions of the DNA with reactive oxygen species (ROS) and other endogenous or exogenous electron-abstracting molecules.

  6. Integrated Genome-Based Studies of Shewanella Echophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Margrethe H. Serres

    2012-06-29

    Shewanella oneidensis MR-1 is a motile, facultative {gamma}-Proteobacterium with remarkable respiratory versatility; it can utilize a range of organic and inorganic compounds as terminal electronacceptors for anaerobic metabolism. The ability to effectively reduce nitrate, S0, polyvalent metals andradionuclides has established MR-1 as an important model dissimilatory metal-reducing microorganism for genome-based investigations of biogeochemical transformation of metals and radionuclides that are of concern to the U.S. Department of Energy (DOE) sites nationwide. Metal-reducing bacteria such as Shewanella also have a highly developed capacity for extracellular transfer of respiratory electrons to solid phase Fe and Mn oxides as well as directly to anode surfaces in microbial fuel cells. More broadly, Shewanellae are recognized free-living microorganisms and members of microbial communities involved in the decomposition of organic matter and the cycling of elements in aquatic and sedimentary systems. To function and compete in environments that are subject to spatial and temporal environmental change, Shewanella must be able to sense and respond to such changes and therefore require relatively robust sensing and regulation systems. The overall goal of this project is to apply the tools of genomics, leveraging the availability of genome sequence for 18 additional strains of Shewanella, to better understand the ecophysiology and speciation of respiratory-versatile members of this important genus. To understand these systems we propose to use genome-based approaches to investigate Shewanella as a system of integrated networks; first describing key cellular subsystems - those involved in signal transduction, regulation, and metabolism - then building towards understanding the function of whole cells and, eventually, cells within populations. As a general approach, this project will employ complimentary "top-down" - bioinformatics-based genome functional predictions, high

  7. Marker-based estimation of genetic parameters in genomics.

    Directory of Open Access Journals (Sweden)

    Zhiqiu Hu

    Full Text Available Linear mixed model (LMM analysis has been recently used extensively for estimating additive genetic variances and narrow-sense heritability in many genomic studies. While the LMM analysis is computationally less intensive than the Bayesian algorithms, it remains infeasible for large-scale genomic data sets. In this paper, we advocate the use of a statistical procedure known as symmetric differences squared (SDS as it may serve as a viable alternative when the LMM methods have difficulty or fail to work with large datasets. The SDS procedure is a general and computationally simple method based only on the least squares regression analysis. We carry out computer simulations and empirical analyses to compare the SDS procedure with two commonly used LMM-based procedures. Our results show that the SDS method is not as good as the LMM methods for small data sets, but it becomes progressively better and can match well with the precision of estimation by the LMM methods for data sets with large sample sizes. Its major advantage is that with larger and larger samples, it continues to work with the increasing precision of estimation while the commonly used LMM methods are no longer able to work under our current typical computing capacity. Thus, these results suggest that the SDS method can serve as a viable alternative particularly when analyzing 'big' genomic data sets.

  8. Base-By-Base: Single nucleotide-level analysis of whole viral genome alignments

    Directory of Open Access Journals (Sweden)

    Tcherepanov Vasily

    2004-07-01

    Full Text Available Abstract Background With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes is not feasible without new bioinformatics tools. Results A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1 rapidly identify and correct alignment errors in large, multiple genome alignments; and 2 generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs to retrieve detailed annotation information about the aligned genomes or use information from text files. Conclusion Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

  9. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Andrei L. Osterman, Ph.D.

    2012-12-17

    Integration of bioinformatics and experimental techniques was applied to mapping and characterization of the key components (pathways, enzymes, transporters, regulators) of the core metabolic machinery in Shewanella oneidensis and related species with main focus was on metabolic and regulatory pathways involved in utilization of various carbon and energy sources. Among the main accomplishments reflected in ten joint publications with other participants of Shewanella Federation are: (i) A systems-level reconstruction of carbohydrate utilization pathways in the genus of Shewanella (19 species). This analysis yielded reconstruction of 18 sugar utilization pathways including 10 novel pathway variants and prediction of > 60 novel protein families of enzymes, transporters and regulators involved in these pathways. Selected functional predictions were verified by focused biochemical and genetic experiments. Observed growth phenotypes were consistent with bioinformatic predictions providing strong validation of the technology and (ii) Global genomic reconstruction of transcriptional regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors, 8 riboswitches and 6 translational attenuators. Of those, 45 regulons were inferred directly from the genome context analysis, whereas others were propagated from previously characterized regulons in other species. Selected regulatory predictions were experimentally tested. Integration of this analysis with microarray data revealed overall consistency and provided additional layer of interactions between regulons. All the results were captured in the new database RegPrecise, which is a joint development with the LBNL team. A more detailed analysis of the individual subsystems, pathways and regulons in Shewanella spp included bioinfiormatics-based prediction and experimental characterization of: (i) N-Acetylglucosamine catabolic pathway; (ii)Lactate utilization machinery; (iii) Novel Nrt

  10. KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

    Science.gov (United States)

    Wang, Dapeng; Xu, Jiayue; Yu, Jun

    2015-09-16

    The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string upon a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationship within and among groups of species in a tree of life based on genomic data.

  11. A critique of race-based and genomic medicine.

    Science.gov (United States)

    Meier, Robert J

    2012-03-01

    Now that a composite human genome has been sequenced (HGP), research has accelerated to discover precise genetic bases of several chronic health issues, particularly in the realms of cancer and cardiovascular disease. It is anticipated that in the future it will be possible and cost effective to regularly sequence individual genomes, and thereby produce a DNA profile that potentially can be used to assess the health risks for each person with respect to certain genetically predisposed conditions. Coupled with that enormous diagnostic power, it will then depend upon equally rapid research efforts to develop personalized courses of treatment, including that of pharmaceutical therapy. Initial treatment attempts have been made to match drug efficacy and safety to individuals of assigned or self-identified groups according to their genetic ancestry or presumed race. A prime example is that of BiDil, which was the first drug approved by the US FDA for the explicit treatment of heart patients of African American ancestry. This race-based approach to medicine has been met with justifiable criticism, notably on ethical grounds that have long plagued historical applications and misuses of human race classification, and also on questionable science. This paper will assess race-based medical research and practice in light of a more thorough understanding of human genetic variability. Additional concerns will be expressed with regard to the rapidly developing area of pharmacogenomics, promoted to be the future of personalized medicine. Genomic epidemiology will be discussed with several examples of on-going research that hopefully will provide a solid scientific grounding for personalized medicine to build upon.

  12. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...

  13. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...

  14. CrusView: a Java-based visualization platform for comparative genomics analyses in Brassicaceae species.

    Science.gov (United States)

    Chen, Hao; Wang, Xiangfeng

    2013-09-01

    In plants and animals, chromosomal breakage and fusion events based on conserved syntenic genomic blocks lead to conserved patterns of karyotype evolution among species of the same family. However, karyotype information has not been well utilized in genomic comparison studies. We present CrusView, a Java-based bioinformatic application utilizing Standard Widget Toolkit/Swing graphics libraries and a SQLite database for performing visualized analyses of comparative genomics data in Brassicaceae (crucifer) plants. Compared with similar software and databases, one of the unique features of CrusView is its integration of karyotype information when comparing two genomes. This feature allows users to perform karyotype-based genome assembly and karyotype-assisted genome synteny analyses with preset karyotype patterns of the Brassicaceae genomes. Additionally, CrusView is a local program, which gives its users high flexibility when analyzing unpublished genomes and allows users to upload self-defined genomic information so that they can visually study the associations between genome structural variations and genetic elements, including chromosomal rearrangements, genomic macrosynteny, gene families, high-frequency recombination sites, and tandem and segmental duplications between related species. This tool will greatly facilitate karyotype, chromosome, and genome evolution studies using visualized comparative genomics approaches in Brassicaceae species. CrusView is freely available at http://www.cmbb.arizona.edu/CrusView/.

  15. Solution-based targeted genomic enrichment for precious DNA samples

    Directory of Open Access Journals (Sweden)

    Shearer Aiden

    2012-05-01

    Full Text Available Abstract Background Solution-based targeted genomic enrichment (TGE protocols permit selective sequencing of genomic regions of interest on a massively parallel scale. These protocols could be improved by: 1 modifying or eliminating time consuming steps; 2 increasing yield to reduce input DNA and excessive PCR cycling; and 3 enhancing reproducible. Results We developed a solution-based TGE method for downstream Illumina sequencing in a non-automated workflow, adding standard Illumina barcode indexes during the post-hybridization amplification to allow for sample pooling prior to sequencing. The method utilizes Agilent SureSelect baits, primers and hybridization reagents for the capture, off-the-shelf reagents for the library preparation steps, and adaptor oligonucleotides for Illumina paired-end sequencing purchased directly from an oligonucleotide manufacturing company. Conclusions This solution-based TGE method for Illumina sequencing is optimized for small- or medium-sized laboratories and addresses the weaknesses of standard protocols by reducing the amount of input DNA required, increasing capture yield, optimizing efficiency, and improving reproducibility.

  16. Kernel Based Nonlinear Dimensionality Reduction and Classification for Genomic Microarray

    Directory of Open Access Journals (Sweden)

    Lan Shu

    2008-07-01

    Full Text Available Genomic microarrays are powerful research tools in bioinformatics and modern medicinal research because they enable massively-parallel assays and simultaneous monitoring of thousands of gene expression of biological samples. However, a simple microarray experiment often leads to very high-dimensional data and a huge amount of information, the vast amount of data challenges researchers into extracting the important features and reducing the high dimensionality. In this paper, a nonlinear dimensionality reduction kernel method based locally linear embedding(LLE is proposed, and fuzzy K-nearest neighbors algorithm which denoises datasets will be introduced as a replacement to the classical LLE’s KNN algorithm. In addition, kernel method based support vector machine (SVM will be used to classify genomic microarray data sets in this paper. We demonstrate the application of the techniques to two published DNA microarray data sets. The experimental results confirm the superiority and high success rates of the presented method.

  17. A Genomics-Based Classification of Human Lung Tumors

    NARCIS (Netherlands)

    Seidel, Danila; Zander, Thomas; Heukamp, Lukas C.; Peifer, Martin; Bos, Marc; Fernandez-Cuesta, Lynnette; Leenders, Frauke; Lu, Xin; Ansen, Sascha; Gardizi, Masyar; Nguyen, Chau; Berg, Johannes; Russell, Prudence; Wainer, Zoe; Schildhaus, Hans-Ulrich; Rogers, Toni-Maree; Solomon, Benjamin; Pao, William; Carter, Scott L.; Getz, Gad; Hayes, D. Neil; Wilkerson, Matthew D.; Thunnissen, Erik; Travis, William D.; Perner, Sven; Wright, Gavin; Brambilla, Elisabeth; Buettner, Reinhard; Wolf, Juergen; Thomas, Roman; Gabler, Franziska; Wilkening, Ines; Mueller, Christian; Dahmen, Ilona; Menon, Roopika; Koenig, Katharina; Albus, Kerstin; Merkelbach-Bruse, Sabine; Fassunke, Jana; Schmitz, Katja; Kuenstlinger, Helen; Kleine, Michaela; Binot, Elke; Querings, Silvia; Altmueller, Janine; Boessmann, Ingelore; Nuemberg, Peter; Schneider, Peter; Groen, Harry; Timens, Wim

    2013-01-01

    We characterized genome alterations in 1255 clinically annotated lung tumors of all histological subgroups to identify genetically defined and clinically relevant subtypes. More than 55% of all cases had at least one oncogenic genome alteration potentially amenable to specific therapeutic

  18. CRISPR-Cas9 Based Engineering of Actinomycetal Genomes

    DEFF Research Database (Denmark)

    Tong, Yaojun; Charusanti, Pep; Zhang, Lixin

    2015-01-01

    . To facilitate the genetic manipulation of actinomycetes, we developed a highly efficient CRISPR-Cas9 system to delete gene(s) or gene cluster(s), implement precise gene replacements, and reversibly control gene expression in actinomycetes. We demonstrate our system by targeting two genes, actIORF1 (SCO5087......) and actVB (SCO5092), from the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3(2). Our CRISPR-Cas9 system successfully inactivated the targeted genes. When no templates for homology-directed repair (HDR) were present, the site-specific DNA double-strand breaks (DSBs) introduced by Cas9....... Moreover, we developed a system to efficiently and reversibly control expression of target genes, deemed CRISPRi, based on a catalytically dead variant of Cas9 (dCas9). The CRISPR-Cas9 based system described here comprises a powerful and broadly applicable set of tools to manipulate actinomycetal genomes....

  19. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  20. An efficient approach to BAC based assembly of complex genomes.

    Science.gov (United States)

    Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

    2016-01-01

    There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.

  1. Disk-based compression of data from genome sequencing.

    Science.gov (United States)

    Grabowski, Szymon; Deorowicz, Sebastian; Roguski, Łukasz

    2015-05-01

    High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk based, where the better of these two, from Cox et al. (2012), is based on the Burrows-Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome sequencing collection with almost 45-fold coverage. We propose overlapping reads compression with minimizers, a compression algorithm dedicated to sequencing reads (DNA only). Our method makes use of a conceptually simple and easily parallelizable idea of minimizers, to obtain 0.317 bits per base as the compression ratio, allowing to fit the 134.0 Gbp dataset into only 5.31 GB of space. http://sun.aei.polsl.pl/orcom under a free license. sebastian.deorowicz@polsl.pl Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    Science.gov (United States)

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study,next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs.

  3. Congruent Deep Relationships in the Grape Family (Vitaceae Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    Directory of Open Access Journals (Sweden)

    Ning Zhang

    Full Text Available Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera. The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study,next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs.

  4. Web Apollo: a web-based genomic annotation editing platform.

    Science.gov (United States)

    Lee, Eduardo; Helt, Gregg A; Reese, Justin T; Munoz-Torres, Monica C; Childers, Chris P; Buels, Robert M; Stein, Lincoln; Holmes, Ian H; Elsik, Christine G; Lewis, Suzanna E

    2013-08-30

    Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.

  5. GeNemo: a search engine for web-based functional genomic data.

    Science.gov (United States)

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  7. [A major game in the re-organization of the Professional Nursing School].

    Science.gov (United States)

    de Amorin, Wellington Mendonça; Barreira, Ieda de Alencar

    2007-01-01

    This is a historical-social description study supported on the thought of Pierre Bourdieu based on documental analysis. It describes the sanitarists and psychiatrists' actions from the reformulation of Education and Public Health Ministry into Education and Health Ministry in the beginning of New State and analyse the fight's strategies of the main agents to take advantage on their proposals of Professional Nursing School's reorganization. The fight's strategies that psychiatrists, sanitarists and certificated nurses had used to stake their projects, characterized a difficult battle inserted in a hard major game. The analyse of the ten course's months of the main document shows the conflict between those agents to impose a new rule to the school.

  8. HpBase: A genome database of a sea urchin, Hemicentrotus pulcherrimus.

    Science.gov (United States)

    Kinjo, Sonoko; Kiyomoto, Masato; Yamamoto, Takashi; Ikeo, Kazuho; Yaguchi, Shunsuke

    2018-04-01

    To understand the mystery of life, it is important to accumulate genomic information for various organisms because the whole genome encodes the commands for all the genes. Since the genome of Strongylocentrotus purpratus was sequenced in 2006 as the first sequenced genome in echinoderms, the genomic resources of other North American sea urchins have gradually been accumulated, but no sea urchin genomes are available in other areas, where many scientists have used the local species and reported important results. In this manuscript, we report a draft genome of the sea urchin Hemincentrotus pulcherrimus because this species has a long history as the target of developmental and cell biology in East Asia. The genome of H. pulcherrimus was assembled into 16,251 scaffold sequences with an N50 length of 143 kbp, and approximately 25,000 genes were identified in the genome. The size of the genome and the sequencing coverage were estimated to be approximately 800 Mbp and 100×, respectively. To provide these data and information of annotation, we constructed a database, HpBase (http://cell-innovation.nig.ac.jp/Hpul/). In HpBase, gene searches, genome browsing, and blast searches are available. In addition, HpBase includes the "recipes" for experiments from each lab using H. pulcherrimus. These recipes will continue to be updated according to the circumstances of individual scientists and can be powerful tools for experimental biologists and for the community. HpBase is a suitable dataset for evolutionary, developmental, and cell biologists to compare H. pulcherrimus genomic information with that of other species and to isolate gene information. © 2018 Japanese Society of Developmental Biologists.

  9. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    Josephine Erhiakporeh

    2016-07-06

    Jul 6, 2016 ... candidate genes for drought tolerance in sesame. (Sesamum ... Our results provided genomic resources for further functional analysis and genetic engineering .... reverse transcribed using the Reverse Transcription System.

  10. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Jizhong [Univ. of Oklahoma, Norman, OK (United States); He, Zhili [Univ. of Oklahoma, Norman, OK (United States)

    2014-04-08

    As a part of the Shewanella Federation project, we have used integrated genomic, proteomic and computational technologies to study various aspects of energy metabolism of two Shewanella strains from a systems-level perspective.

  11. GenomeVx: simple web-based creation of editable circular chromosome maps.

    Science.gov (United States)

    Conant, Gavin C; Wolfe, Kenneth H

    2008-03-15

    We describe GenomeVx, a web-based tool for making editable, publication-quality, maps of mitochondrial and chloroplast genomes and of large plasmids. These maps show the location of genes and chromosomal features as well as a position scale. The program takes as input either raw feature positions or GenBank records. In the latter case, features are automatically extracted and colored, an example of which is given. Output is in the Adobe Portable Document Format (PDF) and can be edited by programs such as Adobe Illustrator. GenomeVx is available at http://wolfe.gen.tcd.ie/GenomeVx

  12. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.

    Science.gov (United States)

    Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M

    2016-03-31

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.

  13. Genome-based prediction of common diseases: Methodological considerations for future research

    NARCIS (Netherlands)

    A.C.J.W. Janssens (Cécile); P. Tikka-Kleemola (Päivi)

    2009-01-01

    textabstractThe translation of emerging genomic knowledge into public health and clinical care is one of the major challenges for the coming decades. At the moment, genome-based prediction of common diseases, such as type 2 diabetes, coronary heart disease and cancer, is still not informative. Our

  14. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

    NARCIS (Netherlands)

    Speth, D.R.; Zandt, M.H. in 't; Guerrero Cruz, S.; Dutilh, B.E.; Jetten, M.S.M.

    2016-01-01

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is

  15. Genome analysis and DNA marker-based characterisation of pathogenic trypanosomes

    NARCIS (Netherlands)

    Agbo, Edwin Chukwura

    2003-01-01

    The advances in genomics technologies and genome analysis methods that offer new leads for accelerating discovery of putative targets for developing overall control tools are reviewed in Chapter 1. In Chapter 2, a PCR typing method based on restriction fragment length polymorphism analysis of the

  16. SIGI: score-based identification of genomic islands

    Directory of Open Access Journals (Sweden)

    Merkl Rainer

    2004-03-01

    Full Text Available Abstract Background Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands. Results A scoring scheme on codon frequencies Score_G1G2(cdn = log(f_G2(cdn / f_G1(cdn was utilized. To analyse genes of a species G1 and to test their relatedness to species G2, scores were determined by applying the formula to log-odds derived from mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables comprising microbial species was derived; its members were used alternatively at position G2. Genes having at least one score value above a species-specific and dynamically determined cut-off value were analysed further. By means of cluster analysis, genes were identified that comprise clusters of statistically significant size. These clusters were predicted as genomic islands. Finally and individually for each of these genes, the taxonomical relation among those species responsible for significant scores was interpreted. The validity of the approach and its limitations were made plausible by an extensive analysis of natural genes and synthetic ones aimed at modelling the process of gene amelioration. Conclusions The method reliably allows to identify genomic island and the likely origin of alien genes.

  17. Genomes-based phylogeny of the genus Xanthomonas

    Directory of Open Access Journals (Sweden)

    Rodriguez-R Luis M

    2012-03-01

    Full Text Available Abstract Background The genus Xanthomonas comprises several plant pathogenic bacteria affecting a wide range of hosts. Despite the economic, industrial and biological importance of Xanthomonas, the classification and phylogenetic relationships within the genus are still under active debate. Some of the relationships between pathovars and species have not been thoroughly clarified, with old pathovars becoming new species. A change in the genus name has been recently suggested for Xanthomonas albilineans, an early branching species currently located in this genus, but a thorough phylogenomic reconstruction would aid in solving these and other discrepancies in this genus. Results Here we report the results of the genome-wide analysis of DNA sequences from 989 orthologous groups from 17 Xanthomonas spp. genomes available to date, representing all major lineages within the genus. The phylogenetic and computational analyses used in this study have been automated in a Perl package designated Unus, which provides a framework for phylogenomic analyses which can be applied to other datasets at the genomic level. Unus can also be easily incorporated into other phylogenomic pipelines. Conclusions Our phylogeny agrees with previous phylogenetic topologies on the genus, but revealed that the genomes of Xanthomonas citri and Xanthomonas fuscans belong to the same species, and that of Xanthomonas albilineans is basal to the joint clade of Xanthomonas and Xylella fastidiosa. Genome reduction was identified in the species Xanthomonas vasicola in addition to the previously identified reduction in Xanthomonas albilineans. Lateral gene transfer was also observed in two gene clusters.

  18. A web-based multi-genome synteny viewer for customized data

    Directory of Open Access Journals (Sweden)

    Revanna Kashi V

    2012-08-01

    Full Text Available Abstract Background Web-based synteny visualization tools are important for sharing data and revealing patterns of complicated genome conservation and rearrangements. Such tools should allow biologists to upload genomic data for their own analysis. This requirement is critical because individual biologists are generating large amounts of genomic sequences that quickly overwhelm any centralized web resources to collect and display all those data. Recently, we published a web-based synteny viewer, GSV, which was designed to satisfy the above requirement. However, GSV can only compare two genomes at a given time. Extending the functionality of GSV to visualize multiple genomes is important to meet the increasing demand of the research community. Results We have developed a multi-Genome Synteny Viewer (mGSV. Similar to GSV, mGSV is a web-based tool that allows users to upload their own genomic data files for visualization. Multiple genomes can be presented in a single integrated view with an enhanced user interface. Users can navigate through all the selected genomes in either pairwise or multiple viewing mode to examine conserved genomic regions as well as the accompanying genome annotations. Besides serving users who manually interact with the web server, mGSV also provides Web Services for machine-to-machine communication to accept data sent by other remote resources. The entire mGSV package can also be downloaded for easy local installation. Conclusions mGSV significantly enhances the original functionalities of GSV. A web server hosting mGSV is provided at http://cas-bioinfo.cas.unt.edu/mgsv.

  19. INTEGRATED GENOME-BASED STUDIES OF SHEWANELLA ECOPHYSIOLOGY

    Energy Technology Data Exchange (ETDEWEB)

    NEALSON, KENNETH H.

    2013-10-15

    laboratories. Applications: 1. Corrosion: Electron flow is often part of the corrosive process, and several studies were done in concert with this proposal with regard to the ability of EET-capable bacteria to enhance, inhibit, or detect corrosion. These included using EET-capable bacteria to detect corrosion in its earliest stages [5], to use corrosion-causing bacteria for the study of the microbe/mineral interface during corrosion [1], and to study the groups of microbes involved with corrosion of natural systems [19]. 2. Bioenergy and microbial fuel cells: The production of electricity by Shewanella was shown early in this program (several years ago) to be dependent on the genes for extracellular electron transport (EET), and applied work involved the testing of various strains and conditions for the optimization of current production by the shewanellae [11,14,16]. 3. Identification of shewanellae strains: Based on similarities seen in genomic comparisons, a rapid method was employed for distinguishing between shewanellae strains [17]. Interactions with other laboratories: This grant was an extension of a grant involving the so-called ?Shewanella Federation?, and as such, a number of our publications were joint with other members of this group. The groups included: 1. Pacific Northwest Laboratories ? 2. Oak Ridge National Labs 3. Michigan State University 4. University of Oklahoma 5. Naval Research Laboratory, Washington DC 6. Burnham Medical Research Institute, San Diego 7. J. Craig Venter Institute, San Diego Education: Graduate Students: Michael Waters, Ph.D. ? at NIST, Washington D.C. Lewis Hsu, Ph.D. ? at NRL, San Diego Howard Harris, Ph.D. ? Postdoc at University, France Everett Salas, Ph.D. ? Scientist at Chevron McLean, Jeffrey, Ph.D. ? Scientist at J. Craig Venter Institute McCrow, John, Ph.D. ? Scientist at J. Craig Venter Institute Postdocs: Mohamed El-Naggar ? Professor of Physics, USC Jinjun Kan ? Senior Researcher at Undergraduatges: During this year, we had

  20. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Full Text Available Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.

  1. Ocean biogeochemistry modeled with emergent trait-based genomics

    Science.gov (United States)

    Coles, V. J.; Stukel, M. R.; Brooks, M. T.; Burd, A.; Crump, B. C.; Moran, M. A.; Paul, J. H.; Satinsky, B. M.; Yager, P. L.; Zielinski, B. L.; Hood, R. R.

    2017-12-01

    Marine ecosystem models have advanced to incorporate metabolic pathways discovered with genomic sequencing, but direct comparisons between models and “omics” data are lacking. We developed a model that directly simulates metagenomes and metatranscriptomes for comparison with observations. Model microbes were randomly assigned genes for specialized functions, and communities of 68 species were simulated in the Atlantic Ocean. Unfit organisms were replaced, and the model self-organized to develop community genomes and transcriptomes. Emergent communities from simulations that were initialized with different cohorts of randomly generated microbes all produced realistic vertical and horizontal ocean nutrient, genome, and transcriptome gradients. Thus, the library of gene functions available to the community, rather than the distribution of functions among specific organisms, drove community assembly and biogeochemical gradients in the model ocean.

  2. Evidence of codon usage in the nearest neighbor spacing distribution of bases in bacterial genomes

    Science.gov (United States)

    Higareda, M. F.; Geiger, O.; Mendoza, L.; Méndez-Sánchez, R. A.

    2012-02-01

    Statistical analysis of whole genomic sequences usually assumes a homogeneous nucleotide density throughout the genome, an assumption that has been proved incorrect for several organisms since the nucleotide density is only locally homogeneous. To avoid giving a single numerical value to this variable property, we propose the use of spectral statistics, which characterizes the density of nucleotides as a function of its position in the genome. We show that the cumulative density of bases in bacterial genomes can be separated into an average (or secular) plus a fluctuating part. Bacterial genomes can be divided into two groups according to the qualitative description of their secular part: linear and piecewise linear. These two groups of genomes show different properties when their nucleotide spacing distribution is studied. In order to analyze genomes having a variable nucleotide density, statistically, the use of unfolding is necessary, i.e., to get a separation between the secular part and the fluctuations. The unfolding allows an adequate comparison with the statistical properties of other genomes. With this methodology, four genomes were analyzed Burkholderia, Bacillus, Clostridium and Corynebacterium. Interestingly, the nearest neighbor spacing distributions or detrended distance distributions are very similar for species within the same genus but they are very different for species from different genera. This difference can be attributed to the difference in the codon usage.

  3. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.

    Science.gov (United States)

    Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

    2015-01-16

    Yersinia is a Gram-negative bacteria that includes serious pathogens such as the Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly-growing Yersinia genomic data and to provide analysis tools particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate the ongoing and future research of Yersinia, especially those generally considered non-pathogenic species, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed the YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include JavaScript-based genome browser (JBrowse) and Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic tree of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the

  4. Microarray-based ultra-high resolution discovery of genomic deletion mutations

    Science.gov (United States)

    2014-01-01

    Background Oligonucleotide microarray-based comparative genomic hybridization (CGH) offers an attractive possible route for the rapid and cost-effective genome-wide discovery of deletion mutations. CGH typically involves comparison of the hybridization intensities of genomic DNA samples with microarray chip representations of entire genomes, and has widespread potential application in experimental research and medical diagnostics. However, the power to detect small deletions is low. Results Here we use a graduated series of Arabidopsis thaliana genomic deletion mutations (of sizes ranging from 4 bp to ~5 kb) to optimize CGH-based genomic deletion detection. We show that the power to detect smaller deletions (4, 28 and 104 bp) depends upon oligonucleotide density (essentially the number of genome-representative oligonucleotides on the microarray chip), and determine the oligonucleotide spacings necessary to guarantee detection of deletions of specified size. Conclusions Our findings will enhance a wide range of research and clinical applications, and in particular will aid in the discovery of genomic deletions in the absence of a priori knowledge of their existence. PMID:24655320

  5. The ethical introduction of genome-based information and technologies into public health.

    Science.gov (United States)

    Howard, H C; Swinnen, E; Douw, K; Vondeling, H; Cassiman, J-J; Cambon-Thomsen, A; Borry, P

    2013-01-01

    With the human genome project running from 1989 until its completion in 2003, and the incredible advances in sequencing technology and in bioinformatics during the last decade, there has been a shift towards an increase focus on studying common complex disorders which develop due to the interplay of many different genes as well as environmental factors. Although some susceptibility genes have been identified in some populations for disorders such as cancer, diabetes and cardiovascular diseases, the integration of this information into the health care system has proven to be much more problematic than for single gene disorders. Furthermore, with the 1000$ genome supposedly just around the corner, and whole genome sequencing gradually being integrated into research protocols as well as in the clinical context, there is a strong push for the uptake of additional genomic testing. Indeed, the advent of public health genomics, wherein genomics would be integrated in all aspects of health care and public health, should be taken seriously. Although laudable, these advances also bring with them a slew of ethical and social issues that challenge the normative frameworks used in clinical genetics until now. With this in mind, we highlight herein 5 principles that are used as a primer to discuss the ethical introduction of genome-based information and genome-based technologies into public health. Copyright © 2013 S. Karger AG, Basel.

  6. An evaluation of multiple annealing and looping based genome amplification using a synthetic bacterial community

    KAUST Repository

    Wang, Yong; Gao, Zhaoming; Xu, Ying; Li, Guangyu; He, Lisheng; Qian, Peiyuan

    2016-01-01

    -generation-sequencing technology. Using a synthetic bacterial community, the amplification efficiency of the Multiple Annealing and Looping Based Amplification Cycles (MALBAC) kit that is originally developed to amplify the single-cell genomic DNA of mammalian organisms

  7. The Glyphosate-Based Herbicide Roundup Does not Elevate Genome-Wide Mutagenesis of Escherichia coli.

    Science.gov (United States)

    Tincher, Clayton; Long, Hongan; Behringer, Megan; Walker, Noah; Lynch, Michael

    2017-10-05

    Mutations induced by pollutants may promote pathogen evolution, for example by accelerating mutations conferring antibiotic resistance. Generally, evaluating the genome-wide mutagenic effects of long-term sublethal pollutant exposure at single-nucleotide resolution is extremely difficult. To overcome this technical barrier, we use the mutation accumulation/whole-genome sequencing (MA/WGS) method as a mutagenicity test, to quantitatively evaluate genome-wide mutagenesis of Escherichia coli after long-term exposure to a wide gradient of the glyphosate-based herbicide (GBH) Roundup Concentrate Plus. The genome-wide mutation rate decreases as GBH concentration increases, suggesting that even long-term GBH exposure does not compromise the genome stability of bacteria. Copyright © 2017 Tincher et al.

  8. Genome-based comparative analyses of Antarctic and temperate species of Paenibacillus.

    Directory of Open Access Journals (Sweden)

    Melissa Dsouza

    Full Text Available Antarctic soils represent a unique environment characterised by extremes of temperature, salinity, elevated UV radiation, low nutrient and low water content. Despite the harshness of this environment, members of 15 bacterial phyla have been identified in soils of the Ross Sea Region (RSR. However, the survival mechanisms and ecological roles of these phyla are largely unknown. The aim of this study was to investigate whether strains of Paenibacillus darwinianus owe their resilience to substantial genomic changes. For this, genome-based comparative analyses were performed on three P. darwinianus strains, isolated from gamma-irradiated RSR soils, together with nine temperate, soil-dwelling Paenibacillus spp. The genome of each strain was sequenced to over 1,000-fold coverage, then assembled into contigs totalling approximately 3 Mbp per genome. Based on the occurrence of essential, single-copy genes, genome completeness was estimated at approximately 88%. Genome analysis revealed between 3,043-3,091 protein-coding sequences (CDSs, primarily associated with two-component systems, sigma factors, transporters, sporulation and genes induced by cold-shock, oxidative and osmotic stresses. These comparative analyses provide an insight into the metabolic potential of P. darwinianus, revealing potential adaptive mechanisms for survival in Antarctic soils. However, a large proportion of these mechanisms were also identified in temperate Paenibacillus spp., suggesting that these mechanisms are beneficial for growth and survival in a range of soil environments. These analyses have also revealed that the P. darwinianus genomes contain significantly fewer CDSs and have a lower paralogous content. Notwithstanding the incompleteness of the assemblies, the large differences in genome sizes, determined by the number of genes in paralogous clusters and the CDS content, are indicative of genome content scaling. Finally, these sequences are a resource for further

  9. A contig-based strategy for the genome-wide discovery of microRNAs without complete genome resources.

    Directory of Open Access Journals (Sweden)

    Jun-Zhi Wen

    Full Text Available MicroRNAs (miRNAs are important regulators of many cellular processes and exist in a wide range of eukaryotes. High-throughput sequencing is a mainstream method of miRNA identification through which it is possible to obtain the complete small RNA profile of an organism. Currently, most approaches to miRNA identification rely on a reference genome for the prediction of hairpin structures. However, many species of economic and phylogenetic importance are non-model organisms without complete genome sequences, and this limits miRNA discovery. Here, to overcome this limitation, we have developed a contig-based miRNA identification strategy. We applied this method to a triploid species of edible banana (GCTCV-119, Musa spp. AAA group and identified 180 pre-miRNAs and 314 mature miRNAs, which is three times more than those were predicted by the available dataset-based methods (represented by EST+GSS. Based on the recently published miRNA data set of Musa acuminate, the recall rate and precision of our strategy are estimated to be 70.6% and 92.2%, respectively, significantly better than those of EST+GSS-based strategy (10.2% and 50.0%, respectively. Our novel, efficient and cost-effective strategy facilitates the study of the functional and evolutionary role of miRNAs, as well as miRNA-based molecular breeding, in non-model species of economic or evolutionary interest.

  10. Multilevel Genomics-Based Taxonomy of Renal Cell Carcinoma

    Directory of Open Access Journals (Sweden)

    Fengju Chen

    2016-03-01

    Full Text Available On the basis of multidimensional and comprehensive molecular characterization (including DNA methalylation and copy number, RNA, and protein expression, we classified 894 renal cell carcinomas (RCCs of various histologic types into nine major genomic subtypes. Site of origin within the nephron was one major determinant in the classification, reflecting differences among clear cell, chromophobe, and papillary RCC. Widespread molecular changes associated with TFE3 gene fusion or chromatin modifier genes were present within a specific subtype and spanned multiple subtypes. Differences in patient survival and in alteration of specific pathways (including hypoxia, metabolism, MAP kinase, NRF2-ARE, Hippo, immune checkpoint, and PI3K/AKT/mTOR could further distinguish the subtypes. Immune checkpoint markers and molecular signatures of T cell infiltrates were both highest in the subtype associated with aggressive clear cell RCC. Differences between the genomic subtypes suggest that therapeutic strategies could be tailored to each RCC disease subset.

  11. Phylogenetic tree based on complete genomes using fractal and correlation analyses without sequence alignment

    Directory of Open Access Journals (Sweden)

    Zu-Guo Yu

    2006-06-01

    Full Text Available The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped resolve the evolution of this organelle in photosynthetic eukaryotes. In this review, we describe two algorithms to construct phylogenetic trees based on the theories of fractals and dynamic language using complete genomes. These algorithms were developed by our research group in the past few years. Our distance-based phylogenetic tree of 109 prokaryotes and eukaryotes agrees with the biologists' "tree of life" based on the 16S-like rRNA genes in a majority of basic branchings and most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding on chloroplast evolution.

  12. Event-based text mining for biology and functional genomics

    Science.gov (United States)

    Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

    2015-01-01

    The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365

  13. Generation of a BAC-based physical map of the melon genome

    Directory of Open Access Journals (Sweden)

    Puigdomènech Pere

    2010-05-01

    Full Text Available Abstract Background Cucumis melo (melon belongs to the Cucurbitaceae family, whose economic importance among horticulture crops is second only to Solanaceae. Melon has high intra-specific genetic variation, morphologic diversity and a small genome size (450 Mb, which make this species suitable for a great variety of molecular and genetic studies that can lead to the development of tools for breeding varieties of the species. A number of genetic and genomic resources have already been developed, such as several genetic maps and BAC genomic libraries. These tools are essential for the construction of a physical map, a valuable resource for map-based cloning, comparative genomics and assembly of whole genome sequencing data. However, no physical map of any Cucurbitaceae has yet been developed. A project has recently been started to sequence the complete melon genome following a whole-genome shotgun strategy, which makes use of massive sequencing data. A BAC-based melon physical map will be a useful tool to help assemble and refine the draft genome data that is being produced. Results A melon physical map was constructed using a 5.7 × BAC library and a genetic map previously developed in our laboratories. High-information-content fingerprinting (HICF was carried out on 23,040 BAC clones, digesting with five restriction enzymes and SNaPshot labeling, followed by contig assembly with FPC software. The physical map has 1,355 contigs and 441 singletons, with an estimated physical length of 407 Mb (0.9 × coverage of the genome and the longest contig being 3.2 Mb. The anchoring of 845 BAC clones to 178 genetic markers (100 RFLPs, 76 SNPs and 2 SSRs also allowed the genetic positioning of 183 physical map contigs/singletons, representing 55 Mb (12% of the melon genome, to individual chromosomal loci. The melon FPC database is available for download at http://melonomics.upv.es/static/files/public/physical_map/. Conclusions Here we report the construction

  14. A BAC-based physical map of the Drosophila buzzatii genome

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez, Josefa; Nefedov, Michael; Bosdet, Ian; Casals, Ferran; Calvete, Oriol; Delprat, Alejandra; Shin, Heesun; Chiu, Readman; Mathewson, Carrie; Wye, Natasja; Hoskins, Roger A.; Schein, JacquelineE.; de Jong, Pieter; Ruiz, Alfredo

    2005-03-18

    Large-insert genomic libraries facilitate cloning of large genomic regions, allow the construction of clone-based physical maps and provide useful resources for sequencing entire genomes. Drosophilabuzzatii is a representative species of the repleta group in the Drosophila subgenus, which is being widely used as a model in studies of genome evolution, ecological adaptation and speciation. We constructed a Bacterial Artificial Chromosome (BAC) genomic library of D. buzzatii using the shuttle vector pTARBAC2.1. The library comprises 18,353 clones with an average insert size of 152 kb and a {approx}18X expected representation of the D. buzzatii euchromatic genome. We screened the entire library with six euchromatic gene probes and estimated the actual genome representation to be {approx}23X. In addition, we fingerprinted by restriction digestion and agarose gel electrophoresis a sample of 9,555 clones, and assembled them using Finger Printed Contigs (FPC) software and manual editing into 345 contigs (mean of 26 clones per contig) and 670singletons. Finally, we anchored 181 large contigs (containing 7,788clones) to the D. buzzatii salivary gland polytene chromosomes by in situ hybridization of 427 representative clones. The BAC library and a database with all the information regarding the high coverage BAC-based physical map described in this paper are available to the research community.

  15. Phytoplasma infection in tomato is associated with re-organization of plasma membrane, ER stacks, and actin filaments in sieve elements

    OpenAIRE

    Buxa, Stefanie V; Degola, Francesca; Polizzotto, Rachele; de Marco, Federica; Loschi, Alberto; Kogel, Karl-Heinz; di Toppi, Luigi Sanità; van Bel, Aart J. E.; Musetti, Rita

    2015-01-01

    Phytoplasmas, biotrophic wall-less prokaryotes, only reside in sieve elements of their host plants. The essentials of the intimate interaction between phytoplasmas and their hosts are poorly understood, which calls for research on potential ultrastructural modifications. We investigated modifications of the sieve-element ultrastructure induced in tomato plants by ‘Candidatus Phytoplasma solani,’ the pathogen associated with the stolbur disease. Phytoplasma infection induces a drastic re-organ...

  16. Integrated genome-based studies of Shewanella ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Segre Daniel; Beg Qasim

    2012-02-14

    This project was a component of the Shewanella Federation and, as such, contributed to the overall goal of applying the genomic tools to better understand eco-physiology and speciation of respiratory-versatile members of Shewanella genus. Our role at Boston University was to perform bioreactor and high throughput gene expression microarrays, and combine dynamic flux balance modeling with experimentally obtained transcriptional and gene expression datasets from different growth conditions. In the first part of project, we designed the S. oneidensis microarray probes for Affymetrix Inc. (based in California), then we identified the pathways of carbon utilization in the metal-reducing marine bacterium Shewanella oneidensis MR-1, using our newly designed high-density oligonucleotide Affymetrix microarray on Shewanella cells grown with various carbon sources. Next, using a combination of experimental and computational approaches, we built algorithm and methods to integrate the transcriptional and metabolic regulatory networks of S. oneidensis. Specifically, we combined mRNA microarray and metabolite measurements with statistical inference and dynamic flux balance analysis (dFBA) to study the transcriptional response of S. oneidensis MR-1 as it passes through exponential, stationary, and transition phases. By measuring time-dependent mRNA expression levels during batch growth of S. oneidensis MR-1 under two radically different nutrient compositions (minimal lactate and nutritionally rich LB medium), we obtain detailed snapshots of the regulatory strategies used by this bacterium to cope with gradually changing nutrient availability. In addition to traditional clustering, which provides a first indication of major regulatory trends and transcription factors activities, we developed and implemented a new computational approach for Dynamic Detection of Transcriptional Triggers (D2T2). This new method allows us to infer a putative topology of transcriptional dependencies

  17. CrusView: A Java-Based Visualization Platform for Comparative Genomics Analyses in Brassicaceae Species[OPEN

    Science.gov (United States)

    Chen, Hao; Wang, Xiangfeng

    2013-01-01

    In plants and animals, chromosomal breakage and fusion events based on conserved syntenic genomic blocks lead to conserved patterns of karyotype evolution among species of the same family. However, karyotype information has not been well utilized in genomic comparison studies. We present CrusView, a Java-based bioinformatic application utilizing Standard Widget Toolkit/Swing graphics libraries and a SQLite database for performing visualized analyses of comparative genomics data in Brassicaceae (crucifer) plants. Compared with similar software and databases, one of the unique features of CrusView is its integration of karyotype information when comparing two genomes. This feature allows users to perform karyotype-based genome assembly and karyotype-assisted genome synteny analyses with preset karyotype patterns of the Brassicaceae genomes. Additionally, CrusView is a local program, which gives its users high flexibility when analyzing unpublished genomes and allows users to upload self-defined genomic information so that they can visually study the associations between genome structural variations and genetic elements, including chromosomal rearrangements, genomic macrosynteny, gene families, high-frequency recombination sites, and tandem and segmental duplications between related species. This tool will greatly facilitate karyotype, chromosome, and genome evolution studies using visualized comparative genomics approaches in Brassicaceae species. CrusView is freely available at http://www.cmbb.arizona.edu/CrusView/. PMID:23898041

  18. Sequence based polymorphic (SBP marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome

    Directory of Open Access Journals (Sweden)

    Sahu Binod B

    2012-01-01

    Full Text Available Abstract Background Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs for any genomic regions. Here a sequence based polymorphic (SBP marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype, Niederzenz (Nd-0 was obtained by applying Illumina's sequencing by synthesis (Solexa technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0 genome sequence identified putative single nucleotide polymorphisms (SNPs throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNPs containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers for the marker poor regions of the Arabidopsis map representing polymorphisms between Col-0 and Nd-0 ecotypes were generated. Conclusions The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species, whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for

  19. Recombination analysis based on the complete genome of bocavirus

    Directory of Open Access Journals (Sweden)

    Chen Shengxia

    2011-04-01

    Full Text Available Abstract Bocavirus include bovine parvovirus, minute virus of canine, porcine bocavirus, gorilla bocavirus, and Human bocaviruses 1-4 (HBoVs. Although recent reports showed that recombination happened in bocavirus, no systematical study investigated the recombination of bocavirus. The present study performed the phylogenetic and recombination analysis of bocavirus over the complete genomes available in GenBank. Results confirmed that recombination existed among bocavirus, including the likely inter-genotype recombination between HBoV1 and HBoV4, and intra-genotype recombination among HBoV2 variants. Moreover, it is the first report revealing the recombination that occurred between minute viruses of canine.

  20. Assessment of a Competency-Based Undergraduate Course on Genetic and Genomics.

    Science.gov (United States)

    Kronk, Rebecca; Colbert, Alison; Lengetti, Evelyn

    2017-08-24

    In response to new demands in the nursing profession, an innovative undergraduate genetics course was designed based on the Essential Nursing Competencies and Curricula Guidelines for Genetics and Genomics. Reflective journaling and storytelling were used as major pedagogies, alongside more traditional approaches. Thematic content analysis of student reflections revealed transformational learning as the major theme emerging from genomic and genetic knowledge acquisition. Quantitative analyses of precourse/postcourse student self-assessments of competencies revealed significant findings.

  1. Ethical issues and best practice in clinically based genomic research: Exeter Stakeholders Meeting Report

    OpenAIRE

    Carrieri, D; Bewshea, C; Walker, G; Ahmad, T; Bowen, W; Hall, A; Kelly, S

    2016-01-01

    Current guidelines on consenting individuals to participate in genomic research are diverse. This creates problems for participants and also for researchers, particularly for clinicians who provide both clinical care and research to their patients. A group of 14 stakeholders met on 7 October 2015 in Exeter to discuss the ethical issues and the best practice arising in clinically based genomic research, with particular emphasis on the issue of returning results to study participants/patients i...

  2. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

    OpenAIRE

    Speth, D.R.; Zandt, M.H. in 't; Guerrero Cruz, S.; Dutilh, B.E.; Jetten, M.S.M.

    2016-01-01

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete d...

  3. MODEST: a web-based design tool for oligonucleotide-mediated genome engineering and recombineering

    DEFF Research Database (Denmark)

    Bonde, Mads; Klausen, Michael Schantz; Anderson, Mads Valdemar

    2014-01-01

    Recombineering and multiplex automated genome engineering (MAGE) offer the possibility to rapidly modify multiple genomic or plasmid sites at high efficiencies. This enables efficient creation of genetic variants including both single mutants with specifically targeted modifications as well......, which confers the corresponding genetic change, is performed manually. To address these challenges, we have developed the MAGE Oligo Design Tool (MODEST). This web-based tool allows designing of MAGE oligos for (i) tuning translation rates by modifying the ribosomal binding site, (ii) generating...

  4. Actin re-organization induced by Chlamydia trachomatis serovar D--evidence for a critical role of the effector protein CT166 targeting Rac.

    Directory of Open Access Journals (Sweden)

    Jessica Thalmann

    Full Text Available The intracellular bacterium Chlamydia trachomatis causes infections of urogenital tract, eyes or lungs. Alignment reveals homology of CT166, a putative effector protein of urogenital C. trachomatis serovars, with the N-terminal glucosyltransferase domain of clostridial glucosylating toxins (CGTs. CGTs contain an essential DXD-motif and mono-glucosylate GTP-binding proteins of the Rho/Ras families, the master regulators of the actin cytoskeleton. CT166 is preformed in elementary bodies of C. trachomatis D and is detected in the host-cell shortly after infection. Infection with high MOI of C. trachomatis serovar D containing the CT166 ORF induces actin re-organization resulting in cell rounding and a decreased cell diameter. A comparable phenotype was observed in HeLa cells treated with the Rho-GTPase-glucosylating Toxin B from Clostridium difficile (TcdB or HeLa cells ectopically expressing CT166. CT166 with a mutated DXD-motif (CT166-mut exhibited almost unchanged actin dynamics, suggesting that CT166-induced actin re-organization depends on the glucosyltransferase motif of CT166. The cytotoxic necrotizing factor 1 (CNF1 from E. coli deamidates and thereby activates Rho-GTPases and transiently protects them against TcdB-induced glucosylation. CNF1-treated cells were found to be protected from TcdB- and CT166-induced actin re-organization. CNF1 treatment as well as ectopic expression of non-glucosylable Rac1-G12V, but not RhoA-G14A, reverted CT166-induced actin re-organization, suggesting that CT166-induced actin re-organization depends on the glucosylation of Rac1. In accordance, over-expression of CT166-mut diminished TcdB induced cell rounding, suggesting shared substrates. Cell rounding induced by high MOI infection with C. trachomatis D was reduced in cells expressing CT166-mut or Rac1-G12V, and in CNF1 treated cells. These observations indicate that the cytopathic effect of C. trachomatis D is mediated by CT166 induced Rac1 glucosylation

  5. Toward Genomics-Based Breeding in C3 Cool-Season Perennial Grasses

    Science.gov (United States)

    Talukder, Shyamal K.; Saha, Malay C.

    2017-01-01

    Most important food and feed crops in the world belong to the C3 grass family. The future of food security is highly reliant on achieving genetic gains of those grasses. Conventional breeding methods have already reached a plateau for improving major crops. Genomics tools and resources have opened an avenue to explore genome-wide variability and make use of the variation for enhancing genetic gains in breeding programs. Major C3 annual cereal breeding programs are well equipped with genomic tools; however, genomic research of C3 cool-season perennial grasses is lagging behind. In this review, we discuss the currently available genomics tools and approaches useful for C3 cool-season perennial grass breeding. Along with a general review, we emphasize the discussion focusing on forage grasses that were considered orphan and have little or no genetic information available. Transcriptome sequencing and genotype-by-sequencing technology for genome-wide marker detection using next-generation sequencing (NGS) are very promising as genomics tools. Most C3 cool-season perennial grass members have no prior genetic information; thus NGS technology will enhance collinear study with other C3 model grasses like Brachypodium and rice. Transcriptomics data can be used for identification of functional genes and molecular markers, i.e., polymorphism markers and simple sequence repeats (SSRs). Genome-wide association study with NGS-based markers will facilitate marker identification for marker-assisted selection. With limited genetic information, genomic selection holds great promise to breeders for attaining maximum genetic gain of the cool-season C3 perennial grasses. Application of all these tools can ensure better genetic gains, reduce length of selection cycles, and facilitate cultivar development to meet the future demand for food and fodder. PMID:28798766

  6. Toward Genomics-Based Breeding in C3 Cool-Season Perennial Grasses

    Directory of Open Access Journals (Sweden)

    Shyamal K. Talukder

    2017-07-01

    Full Text Available Most important food and feed crops in the world belong to the C3 grass family. The future of food security is highly reliant on achieving genetic gains of those grasses. Conventional breeding methods have already reached a plateau for improving major crops. Genomics tools and resources have opened an avenue to explore genome-wide variability and make use of the variation for enhancing genetic gains in breeding programs. Major C3 annual cereal breeding programs are well equipped with genomic tools; however, genomic research of C3 cool-season perennial grasses is lagging behind. In this review, we discuss the currently available genomics tools and approaches useful for C3 cool-season perennial grass breeding. Along with a general review, we emphasize the discussion focusing on forage grasses that were considered orphan and have little or no genetic information available. Transcriptome sequencing and genotype-by-sequencing technology for genome-wide marker detection using next-generation sequencing (NGS are very promising as genomics tools. Most C3 cool-season perennial grass members have no prior genetic information; thus NGS technology will enhance collinear study with other C3 model grasses like Brachypodium and rice. Transcriptomics data can be used for identification of functional genes and molecular markers, i.e., polymorphism markers and simple sequence repeats (SSRs. Genome-wide association study with NGS-based markers will facilitate marker identification for marker-assisted selection. With limited genetic information, genomic selection holds great promise to breeders for attaining maximum genetic gain of the cool-season C3 perennial grasses. Application of all these tools can ensure better genetic gains, reduce length of selection cycles, and facilitate cultivar development to meet the future demand for food and fodder.

  7. DFAST and DAGA: web-based integrated genome annotation tools and resources.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kaminuma, Eli; Nakamura, Yasukazu; Arita, Masanori

    2016-01-01

    Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus , obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii , whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.

  8. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes

    DEFF Research Database (Denmark)

    Bohlin, J; Skjerve, E; Ussery, David

    2008-01-01

    with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-sized oligonucleotides with expected values, which can be determined by genomic nucleotide content, smaller oligonucleotide frequencies......, or be based on specific statistical distributions. Advantages with these statistical methods include measurements of phylogenetic relationship with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore...... the reliability and best suited applications for some popular methods, which include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zero'th order Markov methods (ZOM) and 2.order Markov chain Method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection...

  9. Application of Microarray-Based Comparative Genomic Hybridization in Prenatal and Postnatal Settings: Three Case Reports

    Directory of Open Access Journals (Sweden)

    Jing Liu

    2011-01-01

    Full Text Available Microarray-based comparative genomic hybridization (array CGH is a newly emerged molecular cytogenetic technique for rapid evaluation of the entire genome with sub-megabase resolution. It allows for the comprehensive investigation of thousands and millions of genomic loci at once and therefore enables the efficient detection of DNA copy number variations (a.k.a, cryptic genomic imbalances. The development and the clinical application of array CGH have revolutionized the diagnostic process in patients and has provided a clue to many unidentified or unexplained diseases which are suspected to have a genetic cause. In this paper, we present three clinical cases in both prenatal and postnatal settings. Among all, array CGH played a major discovery role to reveal the cryptic and/or complex nature of chromosome arrangements. By identifying the genetic causes responsible for the clinical observation in patients, array CGH has provided accurate diagnosis and appropriate clinical management in a timely and efficient manner.

  10. Kernel-based whole-genome prediction of complex traits: a review.

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  11. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota eMorota

    2014-10-01

    Full Text Available Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways, thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  12. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

    Directory of Open Access Journals (Sweden)

    Wenning Zheng

    2016-03-01

    Full Text Available Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this present study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from National Center for Biotechnology Information (NCBI and annotated using the RAST server which were then stored into the MySQL database. The protein-coding genes were further analyzed to obtain information such as calculation of GC content (%, predicted hydrophobicity and molecular weight (Da using in-house Perl scripts. The web application was developed following the secure four-tier web application architecture: (1 client workstation, (2 web server, (3 application server, and (4 database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC framework. The in-house developed bioinformatics tools implemented in NeisseraBase were developed using Python, Perl, BioPerl and R languages. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs, 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence

  13. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform.

    Science.gov (United States)

    Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh

    2016-01-01

    Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this present study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from National Center for Biotechnology Information (NCBI) and annotated using the RAST server which were then stored into the MySQL database. The protein-coding genes were further analyzed to obtain information such as calculation of GC content (%), predicted hydrophobicity and molecular weight (Da) using in-house Perl scripts. The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house developed bioinformatics tools implemented in NeisseraBase were developed using Python, Perl, BioPerl and R languages. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence Factor

  14. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  15. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available Few studies investigated the donkey (Equus asinus at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca. The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing and Ion Torrent (RRL runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  16. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    Science.gov (United States)

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  17. The effect of using genealogy-based haplotypes for genomic prediction.

    Science.gov (United States)

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  18. Genome-wide screen for universal individual identification SNPs based on the HapMap and 1000 Genomes databases.

    Science.gov (United States)

    Huang, Erwen; Liu, Changhui; Zheng, Jingjing; Han, Xiaolong; Du, Weian; Huang, Yuanjian; Li, Chengshi; Wang, Xiaoguang; Tong, Dayue; Ou, Xueling; Sun, Hongyu; Zeng, Zhaoshu; Liu, Chao

    2018-04-03

    Differences among SNP panels for individual identification in SNP-selecting and populations led to few common SNPs, compromising their universal applicability. To screen all universal SNPs, we performed a genome-wide SNP mining in multiple populations based on HapMap and 1000Genomes databases. SNPs with high minor allele frequencies (MAF) in 37 populations were selected. With MAF from ≥0.35 to ≥0.43, the number of selected SNPs decreased from 2769 to 0. A total of 117 SNPs with MAF ≥0.39 have no linkage disequilibrium with each other in every population. For 116 of the 117 SNPs, cumulative match probability (CMP) ranged from 2.01 × 10-48 to 1.93 × 10-50 and cumulative exclusion probability (CEP) ranged from 0.9999999996653 to 0.9999999999945. In 134 tested Han samples, 110 of the 117 SNPs remained within high MAF and conformed to Hardy-Weinberg equilibrium, with CMP = 4.70 × 10-47 and CEP = 0.999999999862. By analyzing the same number of autosomal SNPs as in the HID-Ion AmpliSeq Identity Panel, i.e. 90 randomized out of the 110 SNPs, our panel yielded preferable CMP and CEP. Taken together, the 110-SNPs panel is advantageous for forensic test, and this study provided plenty of highly informative SNPs for compiling final universal panels.

  19. The Vigna Genome Server, 'VigGS': A Genomic Knowledge Base of the Genus Vigna Based on High-Quality, Annotated Genome Sequence of the Azuki Bean, Vigna angularis (Willd.) Ohwi & Ohashi.

    Science.gov (United States)

    Sakai, Hiroaki; Naito, Ken; Takahashi, Yu; Sato, Toshiyuki; Yamamoto, Toshiya; Muto, Isamu; Itoh, Takeshi; Tomooka, Norihiko

    2016-01-01

    The genus Vigna includes legume crops such as cowpea, mungbean and azuki bean, as well as >100 wild species. A number of the wild species are highly tolerant to severe environmental conditions including high-salinity, acid or alkaline soil; drought; flooding; and pests and diseases. These features of the genus Vigna make it a good target for investigation of genetic diversity in adaptation to stressful environments; however, a lack of genomic information has hindered such research in this genus. Here, we present a genome database of the genus Vigna, Vigna Genome Server ('VigGS', http://viggs.dna.affrc.go.jp), based on the recently sequenced azuki bean genome, which incorporates annotated exon-intron structures, along with evidence for transcripts and proteins, visualized in GBrowse. VigGS also facilitates user construction of multiple alignments between azuki bean genes and those of six related dicot species. In addition, the database displays sequence polymorphisms between azuki bean and its wild relatives and enables users to design primer sequences targeting any variant site. VigGS offers a simple keyword search in addition to sequence similarity searches using BLAST and BLAT. To incorporate up to date genomic information, VigGS automatically receives newly deposited mRNA sequences of pre-set species from the public database once a week. Users can refer to not only gene structures mapped on the azuki bean genome on GBrowse but also relevant literature of the genes. VigGS will contribute to genomic research into plant biotic and abiotic stresses and to the future development of new stress-tolerant crops. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  20. Improvement of the Threespine Stickleback Genome Using a Hi-C-Based Proximity-Guided Assembly.

    Science.gov (United States)

    Peichel, Catherine L; Sullivan, Shawn T; Liachko, Ivan; White, Michael A

    2017-09-01

    Scaffolding genomes into complete chromosome assemblies remains challenging even with the rapidly increasing sequence coverage generated by current next-generation sequence technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C based proximity-guided assembly (PGA) to perform a de novo genome assembly from relatively short contigs. Using Hi-C based PGA, we generated complete chromosome assemblies from a distribution of short contigs (20-100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups (LGs), with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the 2 assemblies or errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to LGs. Together, our results highlight the potential of the Hi-C based PGA method to be used in combination with short read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain high molecular weight DNA required for other scaffolding methods. © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. GenomeRNAi: a database for cell-based RNAi phenotypes.

    Science.gov (United States)

    Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

    2007-01-01

    RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasing large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.

  2. Cloud-based interactive analytics for terabytes of genomic variants data.

    Science.gov (United States)

    Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S

    2017-12-01

    Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and are in the public domain in the US.

  3. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Full Text Available Genomic Islands (GIs are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me.

  4. A transcriptome-snp-derived linkage map of Apios americana (potato bean) provides insights about genome re-organization and synteny conservation in the phaseolid legumes

    Science.gov (United States)

    Apios (Apios americana; “apios”), a tuberous perennial legume in the Phaseoleae tribe, was widely used as a food by Native Americans. Work in the last 40 years has led to several improved breeding lines. Aspects of the pollination biology (complex floral structure and tripping mechanism) have made c...

  5. PCR-Based Seamless Genome Editing with High Efficiency and Fidelity in Escherichia coli

    DEFF Research Database (Denmark)

    Liu, Yilan; Yang, Maohua; Yan, Daojiang

    2016-01-01

    Efficiency and fidelity are the key obstacles for genome editing toolboxes. In the present study, a PCR-based tandem repeat assisted genome editing (TRAGE) method with high efficiency and fidelity was developed. The design of TRAGE is based on the mechanism of repair of spontaneous double...... for seamlessly deleting, substituting and inserting targeted genes using PCR products. The effects of different manipulations including sucrose addition time, subculture times in LB with sucrose and stages of inoculation on the efficiency were investigated. With our recommended procedure, seamless excision...... of cat-sacB cassette can be realized in 48 h efficiently. We believe that the developed method has great potential for seamless genome editing in E. coli....

  6. Progress of CRISPR-Cas Based Genome Editing in Photosynthetic Microbes.

    Science.gov (United States)

    Naduthodi, Mihris Ibnu Saleem; Barbosa, Maria J; van der Oost, John

    2018-02-03

    The carbon footprint caused by unsustainable development and its environmental and economic impact has become a major concern in the past few decades. Photosynthetic microbes such as microalgae and cyanobacteria are capable of accumulating value-added compounds from carbon dioxide, and have been regarded as environmentally friendly alternatives to reduce the usage of fossil fuels, thereby contributing to reducing the carbon footprint. This light-driven generation of green chemicals and biofuels has triggered the research for metabolic engineering of these photosynthetic microbes. CRISPR-Cas systems are successfully implemented across a wide range of prokaryotic and eukaryotic species for efficient genome editing. However, the inception of this genome editing tool in microalgal and cyanobacterial species took off rather slowly due to various complications. In this review, we elaborate on the established CRISPR-Cas based genome editing in various microalgal and cyanobacterial species. The complications associated with CRISPR-Cas based genome editing in these species are addressed along with possible strategies to overcome these issues. It is anticipated that in the near future this will result in improving and expanding the microalgal and cyanobacterial genome engineering toolbox. © 2018 The Authors. Biotechnology Journal Published by Wiley-VCH Verlag GmbH & Co. KGaA.

  7. CRISPR-Cas9-Based Genome Editing of Human Induced Pluripotent Stem Cells.

    Science.gov (United States)

    Giacalone, Joseph C; Sharma, Tasneem P; Burnight, Erin R; Fingert, John F; Mullins, Robert F; Stone, Edwin M; Tucker, Budd A

    2018-02-28

    Human induced pluripotent stem cells (hiPSCs) are the ideal cell source for autologous cell replacement. However, for patients with Mendelian diseases, genetic correction of the original disease-causing mutation is likely required prior to cellular differentiation and transplantation. The emergence of the CRISPR-Cas9 system has revolutionized the field of genome editing. By introducing inexpensive reagents that are relatively straightforward to design and validate, it is now possible to correct genetic variants or insert desired sequences at any location within the genome. CRISPR-based genome editing of patient-specific iPSCs shows great promise for future autologous cell replacement therapies. One caveat, however, is that hiPSCs are notoriously difficult to transfect, and optimized experimental design considerations are often necessary. This unit describes design strategies and methods for efficient CRISPR-based genome editing of patient- specific iPSCs. Additionally, it details a flexible approach that utilizes positive selection to generate clones with a desired genomic modification, Cre-lox recombination to remove the integrated selection cassette, and negative selection to eliminate residual hiPSCs with intact selection cassettes. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.

  8. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  9. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it is based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most of GP models used linear methods that ignore effects of interaction among genes and effects of higher order nonlinearities. Deep belief network (DBN), one of the architectural in deep learning methods, is able to model data in high level of abstraction that involves nonlinearities effects of the data. This study implemented DBN for developing a GP model utilizing whole-genome Single Nucleotide Polymorphisms (SNPs) as data for training and testing. The case study was a set of traits in maize. The maize dataset was acquisitioned from CIMMYT’s (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN is outperformed than other methods, kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), best linear unbiased predictor (BLUP), in case allegedly non-additive traits. DBN achieves correlation of 0.579 within -1 to 1 range.

  10. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are of great importance and challenge due to the extent of evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  11. StreptoBase: An Oral Streptococcus mitis Group Genomic Resource and Analysis Platform.

    Directory of Open Access Journals (Sweden)

    Wenning Zheng

    Full Text Available The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC tool and Pathogenomic Profiling Tool (PathoProT, which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.

  12. StreptoBase: An Oral Streptococcus mitis Group Genomic Resource and Analysis Platform.

    Science.gov (United States)

    Zheng, Wenning; Tan, Tze King; Paterson, Ian C; Mutha, Naresh V R; Siow, Cheuk Chuen; Tan, Shi Yang; Old, Lesley A; Jakubovics, Nicholas S; Choo, Siew Woh

    2016-01-01

    The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC) tool and Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.

  13. Re-exploration of U's Triangle Brassica Species Based on Chloroplast Genomes and 45S nrDNA Sequences.

    Science.gov (United States)

    Kim, Chang-Kug; Seol, Young-Joo; Perumal, Sampath; Lee, Jonghoon; Waminal, Nomar Espinosa; Jayakodi, Murukarthick; Lee, Sang-Choon; Jin, Seungwoo; Choi, Beom-Soon; Yu, Yeisoo; Ko, Ho-Cheol; Choi, Ji-Weon; Ryu, Kyoung-Yul; Sohn, Seong-Han; Parkin, Isobel; Yang, Tae-Jin

    2018-05-09

    The concept of U's triangle, which revealed the importance of polyploidization in plant genome evolution, described natural allopolyploidization events in Brassica using three diploids [B. rapa (A genome), B. nigra (B), and B. oleracea (C)] and derived allotetraploids [B. juncea (AB genome), B. napus (AC), and B. carinata (BC)]. However, comprehensive understanding of Brassica genome evolution has not been fully achieved. Here, we performed low-coverage (2-6×) whole-genome sequencing of 28 accessions of Brassica as well as of Raphanus sativus [R genome] to explore the evolution of six Brassica species based on chloroplast genome and ribosomal DNA variations. Our phylogenomic analyses led to two main conclusions. (1) Intra-species-level chloroplast genome variations are low in the three allotetraploids (2~7 SNPs), but rich and variable in each diploid species (7~193 SNPs). (2) Three allotetraploids maintain two 45SnrDNA types derived from both ancestral species with maternal dominance. Furthermore, this study sheds light on the maternal origin of the AC chloroplast genome. Overall, this study clarifies the genetic relationships of U's triangle species based on a comprehensive genomics approach and provides important genomic resources for correlative and evolutionary studies.

  14. The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows.

    Science.gov (United States)

    O'Connor, Brian D; Yuen, Denis; Chung, Vincent; Duncan, Andrew G; Liu, Xiang Kun; Patricia, Janice; Paten, Benedict; Stein, Lincoln; Ferretti, Vincent

    2017-01-01

    As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore ( https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).

  15. A biological-based model that links genomic instability, bystander effects, and adaptive response

    International Nuclear Information System (INIS)

    Scott, B.R.

    2004-01-01

    This paper links genomic instability, bystander effects, and adaptive response in mammalian cell communities via a novel biological-based, dose-response model called NEOTRANS 3 . The model is an extension of the NEOTRANS 2 model that addressed stochastic effects (genomic instability, mutations, and neoplastic transformation) associated with brief exposure to low radiation doses. With both models, ionizing radiation produces DNA damage in cells that can be associated with varying degrees of genomic instability. Cells with persistent problematic instability (PPI) are mutants that arise via misrepair of DNA damage. Progeny of PPI cells also have PPI and can undergo spontaneous neoplastic transformation. Unlike NEOTRANS 2 , with NEOTRANS 3 newly induced mutant PPI cells and their neoplastically transformed progeny can be suppressed via our previously introduced protective apoptosis-mediated (PAM) process, which can be activated by low linear energy transfer (LET) radiation. However, with NEOTRANS 3 (which like NEOTRANS 2 involves cross-talk between nongenomically compromised [e.g., nontransformed, nonmutants] and genomically compromised [e.g., mutants, transformants, etc.] cells), it is assumed that PAM is only activated over a relatively narrow, dose-rate-dependent interval (D PAM ,D off ); where D PAM is a small stochastic activation threshold, and D off is the stochastic dose above which PAM does not occur. PAM cooperates with activated normal DNA repair and with activated normal apoptosis in guarding against genomic instability. Normal repair involves both error-free repair and misrepair components. Normal apoptosis and the error-free component of normal repair protect mammals by preventing the occurrence of mutant cells. PAM selectively removes mutant cells arising via the misrepair component of normal repair, selectively removes existing neoplastically transformed cells, and probably selectively removes other genomically compromised cells when it is activated

  16. An HMM-based comparative genomic framework for detecting introgression in eukaryotes.

    Directory of Open Access Journals (Sweden)

    Kevin J Liu

    2014-06-01

    Full Text Available One outcome of interspecific hybridization and subsequent effects of evolutionary forces is introgression, which is the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have been already established. In this work, we report on PhyloNet-HMM-a new comparative genomic framework for detecting introgression in genomes. PhyloNet-HMM combines phylogenetic networks with hidden Markov models (HMMs to simultaneously capture the (potentially reticulate evolutionary history of the genomes and dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci. Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus genome detected a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgressed genomic regions. Based on our analysis, it is estimated that about 9% of all sites within chromosome 7 are of introgressive origin (these cover about 13 Mbp of chromosome 7, and over 300 genes. Further, our model detected no introgression in a negative control data set. We also found that our model accurately detected introgression and other evolutionary processes from synthetic data sets simulated under the coalescent model with recombination, isolation, and migration. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.

  17. Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.

    Science.gov (United States)

    Yang, Bin; Peng, Yu; Leung, Henry Chi-Ming; Yiu, Siu-Ming; Chen, Jing-Chi; Chin, Francis Yuk-Lun

    2010-04-16

    With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step, which is still a major bottleneck, of metagenomics is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred as "binning". Existing binning methods are based on supervised or semi-supervised approaches which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases. In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). From our experiments, we show that our method can accurately bin DNA fragments with various lengths and relative species abundance ratios without using any reference and training datasets. Another feature of our method is its error robustness. The binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%. We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or markers information of any known reference genomes (species). The source code of our software tool, the reference genomes of the species for generating the test datasets and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/.

  18. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web.

    Science.gov (United States)

    Miller, Chase A; Anthony, Jon; Meyer, Michelle M; Marth, Gabor

    2013-02-01

    High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available. Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported.

  19. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    Science.gov (United States)

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  20. Challenges and opportunities for whole-genome sequencing–based surveillance of antibiotic resistance

    NARCIS (Netherlands)

    Schürch, Anita C.; van Schaik, Willem

    2017-01-01

    Infections caused by drug-resistant bacteria are increasingly reported across the planet, and drug-resistant bacteria are recognized to be a major threat to public health and modern medicine. In this review, we discuss how whole-genome sequencing (WGS)–based approaches can contribute to the

  1. Systematic differences in the response of genetic variation to pedigree and genome-based selection methods

    NARCIS (Netherlands)

    Heidaritabar, M.; Vereijken, A.; Muir, W.M.; Meuwissen, T.H.E.; Cheng, H.; Megens, H.J.W.C.; Groenen, M.; Bastiaansen, J.W.M.

    2014-01-01

    Genomic selection (GS) is a DNA-based method of selecting for quantitative traits in animal and plant breeding, and offers a potentially superior alternative to traditional breeding methods that rely on pedigree and phenotype information. Using a 60¿K SNP chip with markers spaced throughout the

  2. Genomics-Based Identifcation of Microorganisms in Human Ocular Body Fluid

    DEFF Research Database (Denmark)

    Kirstahler, Philipp; Solborg Bjerrum, Søren; Friis-Møller, Alice

    2018-01-01

    genomes and (iii) the environment. Our metagenomic read classification revealed in nearly all cases the same microorganism that was determined in cultivation- and mass spectrometry-based analyses. For some patients, we identified the sequence type of the microorganism and antibiotic resistance genes...

  3. From evidence-based medicine to genomic medicine

    OpenAIRE

    Kumar, Dhavendra

    2007-01-01

    The concept of ‘evidence-based medicine’ dates back to mid-19th century or even earlier. It remains pivotal in planning, funding and in delivering the health care. Clinicians, public health practitioners, health commissioners/purchasers, health planners, politicians and public seek formal ‘evidence’ in approving any form of health care provision. Essentially ‘evidence-based medicine’ aims at the conscientious, explicit and judicious use of the current best evidence in making decisions about t...

  4. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for illumina genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Angelova, Angelina [University of Arizona; Park, Sang-Hycuk [University of Arizona; Kyndt, John [Bellevue University; Fitzsimmons, Kevin [University of Arizona; Brown, Judith K [University of Arizona

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs) as storage compounds that can be converted into renewable fuel utilizing an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint to chloroplast transformation and for studying gene expression in TAGs pathways. In this study, the intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 Kb in size (8.57 Kb) in size, with a GC content of 30.8 %. This is the first report of an optimized protocol that uses a sonication step, followed by sucrose gradient centrifugation, to release and enrich intact chloroplasts from a microalga (C. prototheocoides) of sufficient quality to permit chloroplast genome sequencing with high coverage, while minimizing nuclear genome contamination. The approach is expected to guide chloroplast isolation from other oleaginous algal species for a variety of uses that benefit from enrichment of chloroplasts, ranging from biochemical analysis to genomics studies.

  5. CrisprGE: a central hub of CRISPR/Cas-based genome editing.

    Science.gov (United States)

    Kaur, Karambir; Tandon, Himani; Gupta, Amit Kumar; Kumar, Manoj

    2015-01-01

    CRISPR system is a powerful defense mechanism in bacteria and archaea to provide immunity against viruses. Recently, this process found a new application in intended targeting of the genomes. CRISPR-mediated genome editing is performed by two main components namely single guide RNA and Cas9 protein. Despite the enormous data generated in this area, there is a dearth of high throughput resource. Therefore, we have developed CrisprGE, a central hub of CRISPR/Cas-based genome editing. Presently, this database holds a total of 4680 entries of 223 unique genes from 32 model and other organisms. It encompasses information about the organism, gene, target gene sequences, genetic modification, modifications length, genome editing efficiency, cell line, assay, etc. This depository is developed using the open source LAMP (Linux Apache MYSQL PHP) server. User-friendly browsing, searching facility is integrated for easy data retrieval. It also includes useful tools like BLAST CrisprGE, BLAST NTdb and CRISPR Mapper. Considering potential utilities of CRISPR in the vast area of biology and therapeutics, we foresee this platform as an assistance to accelerate research in the burgeoning field of genome engineering. © The Author(s) 2015. Published by Oxford University Press.

  6. An evaluation of multiple annealing and looping based genome amplification using a synthetic bacterial community

    KAUST Repository

    Wang, Yong

    2016-02-23

    The low biomass in environmental samples is a major challenge for microbial metagenomic studies. The amplification of a genomic DNA was frequently applied to meeting the minimum requirement of the DNA for a high-throughput next-generation-sequencing technology. Using a synthetic bacterial community, the amplification efficiency of the Multiple Annealing and Looping Based Amplification Cycles (MALBAC) kit that is originally developed to amplify the single-cell genomic DNA of mammalian organisms is examined. The DNA template of 10 pg in each reaction of the MALBAC amplification may generate enough DNA for Illumina sequencing. Using 10 pg and 100 pg templates for each reaction set, the MALBAC kit shows a stable and homogeneous amplification as indicated by the highly consistent coverage of the reads from the two amplified samples on the contigs assembled by the original unamplified sample. Although GenomePlex whole genome amplification kit allows one to generate enough DNA using 100 pg of template in each reaction, the minority of the mixed bacterial species is not linearly amplified. For both of the kits, the GC-rich regions of the genomic DNA are not efficiently amplified as suggested by the low coverage of the contigs with the high GC content. The high efficiency of the MALBAC kit is supported for the amplification of environmental microbial DNA samples, and the concerns on its application are also raised to bacterial species with the high GC content.

  7. Integrating genome-based informatics to modernize global disease monitoring, information sharing, and response

    DEFF Research Database (Denmark)

    Aarestrup, Frank Møller; Brown, Eric W; Detter, Chris

    2012-01-01

    The rapid advancement of genome technologies holds great promise for improving the quality and speed of clinical and public health laboratory investigations and for decreasing their cost. The latest generation of genome DNA sequencers can provide highly detailed and robust information on disease...... typing methods to provide point-of-care clinical diagnosis and other essential information for quicker and better treatment of patients. Provided there is free-sharing of information by all clinical and public health laboratories, these genomic tools could spawn a global system of linked databases......-causing microbes, and in the near future these technologies will be suitable for routine use in national, regional, and global public health laboratories. With additional improvements in instrumentation, these next- or third-generation sequencers are likely to replace conventional culture-based and molecular...

  8. Ethical issues and best practice in clinically based genomic research: Exeter Stakeholders Meeting Report.

    Science.gov (United States)

    Carrieri, D; Bewshea, C; Walker, G; Ahmad, T; Bowen, W; Hall, A; Kelly, S

    2016-09-27

    Current guidelines on consenting individuals to participate in genomic research are diverse. This creates problems for participants and also for researchers, particularly for clinicians who provide both clinical care and research to their patients. A group of 14 stakeholders met on 7 October 2015 in Exeter to discuss the ethical issues and the best practice arising in clinically based genomic research, with particular emphasis on the issue of returning results to study participants/patients in light of research findings affecting research and clinical practices. The group was deliberately multidisciplinary to ensure that a diversity of views was represented. This report outlines the main ethical issues, areas of best practice and principles underlying ethical clinically based genomic research discussed during the meeting. The main point emerging from the discussion is that ethical principles, rather than being formulaic, should guide researchers/clinicians to identify who the main stakeholders are to consult with for a specific project and to incorporate their voices/views strategically throughout the lifecycle of each project. We believe that the mix of principles and practical guidelines outlined in this report can contribute to current debates on how to conduct ethical clinically based genomic research. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  9. CRISPR/CAS9 based engineering of actinomycetal genomes

    DEFF Research Database (Denmark)

    2016-01-01

    The present invention relates to CRISPR/Cas-based methods for generating random-sized deletions around at least one target nucleic acid sequence, or for generating precise indels around at least one target nucleic acid sequence, or for modulating transcription of at least one target nucleic acid...

  10. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

    Directory of Open Access Journals (Sweden)

    Stajich Jason E

    2006-11-01

    relatives. We could not confidently resolve whether Candida glabrata or Saccharomyces castellii lies at the base of the WGD clade. Conclusion We have constructed robust phylogenies for fungi based on whole genome analysis. Overall, our phylogenies provide strong support for the classification of phyla, sub-phyla, classes and orders. We have resolved the relationship of the classes Leotiomyctes and Sordariomycetes, and have identified two classes within the CTG clade of the Saccharomycotina that may correlate with sexual status.

  11. Prediction of malting quality traits in barley based on genome-wide marker data to assess the potential of genomic selection.

    Science.gov (United States)

    Schmidt, Malthe; Kollers, Sonja; Maasberg-Prelle, Anja; Großer, Jörg; Schinkel, Burkhard; Tomerius, Alexandra; Graner, Andreas; Korzun, Viktor

    2016-02-01

    Genomic prediction of malting quality traits in barley shows the potential of applying genomic selection to improve selection for malting quality and speed up the breeding process. Genomic selection has been applied to various plant species, mostly for yield or yield-related traits such as grain dry matter yield or thousand kernel weight, and improvement of resistances against diseases. Quality traits have not been the main scope of analysis for genomic selection, but have rather been addressed by marker-assisted selection. In this study, the potential to apply genomic selection to twelve malting quality traits in two commercial breeding programs of spring and winter barley (Hordeum vulgare L.) was assessed. Phenotypic means were calculated combining multilocational field trial data from 3 or 4 years, depending on the trait investigated. Three to five locations were available in each of these years. Heritabilities for malting traits ranged between 0.50 and 0.98. Predictive abilities (PA), as derived from cross validation, ranged between 0.14 to 0.58 for spring barley and 0.40-0.80 for winter barley. Small training sets were shown to be sufficient to obtain useful PAs, possibly due to the narrow genetic base in this breeding material. Deployment of genomic selection in malting barley breeding clearly has the potential to reduce cost intensive phenotyping for quality traits, increase selection intensity and to shorten breeding cycles.

  12. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    Science.gov (United States)

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  13. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

    Science.gov (United States)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

    2009-07-01

    The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/

  14. Genome-based Modeling and Design of Metabolic Interactions in Microbial Communities.

    Science.gov (United States)

    Mahadevan, Radhakrishnan; Henson, Michael A

    2012-01-01

    Biotechnology research is traditionally focused on individual microbial strains that are perceived to have the necessary metabolic functions, or the capability to have these functions introduced, to achieve a particular task. For many important applications, the development of such omnipotent microbes is an extremely challenging if not impossible task. By contrast, nature employs a radically different strategy based on synergistic combinations of different microbial species that collectively achieve the desired task. These natural communities have evolved to exploit the native metabolic capabilities of each species and are highly adaptive to changes in their environments. However, microbial communities have proven difficult to study due to a lack of suitable experimental and computational tools. With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.

  15. A genomic background based method for association analysis in related individuals.

    Directory of Open Access Journals (Sweden)

    Najaf Amin

    Full Text Available BACKGROUND: Feasibility of genotyping of hundreds and thousands of single nucleotide polymorphisms (SNPs in thousands of study subjects have triggered the need for fast, powerful, and reliable methods for genome-wide association analysis. Here we consider a situation when study participants are genetically related (e.g. due to systematic sampling of families or because a study was performed in a genetically isolated population. Of the available methods that account for relatedness, the Measured Genotype (MG approach is considered the 'gold standard'. However, MG is not efficient with respect to time taken for the analysis of genome-wide data. In this context we proposed a fast two-step method called Genome-wide Association using Mixed Model and Regression (GRAMMAR for the analysis of pedigree-based quantitative traits. This method certainly overcomes the drawback of time limitation of the measured genotype (MG approach, but pays in power. One of the major drawbacks of both MG and GRAMMAR, is that they crucially depend on the availability of complete and correct pedigree data, which is rarely available. METHODOLOGY: In this study we first explore type 1 error and relative power of MG, GRAMMAR, and Genomic Control (GC approaches for genetic association analysis. Secondly, we propose an extension to GRAMMAR i.e. GRAMMAR-GC. Finally, we propose application of GRAMMAR-GC using the kinship matrix estimated through genomic marker data, instead of (possibly missing and/or incorrect genealogy. CONCLUSION: Through simulations we show that MG approach maintains high power across a range of heritabilities and possible pedigree structures, and always outperforms other contemporary methods. We also show that the power of our proposed GRAMMAR-GC approaches to that of the 'gold standard' MG for all models and pedigrees studied. We show that this method is both feasible and powerful and has correct type 1 error in the context of genome-wide association analysis

  16. Collinearity analysis of Brassica A and C genomes based on an updated inferred unigene order

    Directory of Open Access Journals (Sweden)

    Ian Bancroft

    2015-06-01

    Full Text Available This data article includes SNP scoring across lines of the Brassica napus TNDH population based on Illumina sequencing of mRNA, expanded to 75 lines. The 21, 323 mapped markers defined 887 recombination bins, representing an updated genetic linkage map for the species. Based on this new map, 5 genome sequence scaffolds were split and the order and orientation of scaffolds updated to establish a new pseudomolecule specification. The order of unigenes and SNP array probes within these pseudomolecules was determined. Unigenes were assessed for sequence similarity to the A and C genomes. The 57, 246 that mapped to both enabled the collinearity of the A and C genomes to be illustrated graphically. Although the great majority was in collinear positions, some were not. Analyses of 60 such instances are presented, suggesting that the breakdown in collinearity was largely due to either the absence of the homoeologue on one genome (resulting in sequence match to a paralogue or multiple similar sequences being present. The mRNAseq datasets for the TNDH lines are available from the SRA repository (ERA283648; the remaining datasets are supplied with this article.

  17. FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption.

    Science.gov (United States)

    Zhang, Yuchen; Dai, Wenrui; Jiang, Xiaoqian; Xiong, Hongkai; Wang, Shuang

    2015-01-01

    The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy has been widely concerned when sharing the sensitive information in a cloud environment. We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides significant performance gain, but without missing any significance SNPs in the chi-square association test using the aforementioned datasets. Unlike many existing HME based studies, in which final results need to be computed by the data owner due to the lack of the secure division operation, the proposed FORESEE framework support complete outsourcing to the cloud and output the final encrypted chi-square statistics.

  18. Tree decomposition based fast search of RNA structures including pseudoknots in genomes.

    Science.gov (United States)

    Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming

    2005-01-01

    Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of fast and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformation graphs (e.g., t = 2 for stem loops and only a slight increase for pseudo-knots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k(t)N(2)), where k is a small parameter; and N is the size of the projiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular; very accurate searches of tmRNAs in bacteria genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site h t t p ://w.uga.edu/RNA-informatics/software/index.php.

  19. EchoBASE: an integrated post-genomic database for Escherichia coli.

    Science.gov (United States)

    Misra, Raju V; Horler, Richard S P; Reindl, Wolfgang; Goryanin, Igor I; Thomas, Gavin H

    2005-01-01

    EchoBASE (http://www.ecoli-york.org) is a relational database designed to contain and manipulate information from post-genomic experiments using the model bacterium Escherichia coli K-12. Its aim is to collate information from a wide range of sources to provide clues to the functions of the approximately 1500 gene products that have no confirmed cellular function. The database is built on an enhanced annotation of the updated genome sequence of strain MG1655 and the association of experimental data with the E.coli genes and their products. Experiments that can be held within EchoBASE include proteomics studies, microarray data, protein-protein interaction data, structural data and bioinformatics studies. EchoBASE also contains annotated information on 'orphan' enzyme activities from this microbe to aid characterization of the proteins that catalyse these elusive biochemical reactions.

  20. Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol.

    Directory of Open Access Journals (Sweden)

    Fei Lu

    Full Text Available Switchgrass (Panicum virgatum L. is a perennial grass that has been designated as an herbaceous model biofuel crop for the United States of America. To facilitate accelerated breeding programs of switchgrass, we developed both an association panel and linkage populations for genome-wide association study (GWAS and genomic selection (GS. All of the 840 individuals were then genotyped using genotyping by sequencing (GBS, generating 350 GB of sequence in total. As a highly heterozygous polyploid (tetraploid and octoploid species lacking a reference genome, switchgrass is highly intractable with earlier methodologies of single nucleotide polymorphism (SNP discovery. To access the genetic diversity of species like switchgrass, we developed a SNP discovery pipeline based on a network approach called the Universal Network-Enabled Analysis Kit (UNEAK. Complexities that hinder single nucleotide polymorphism discovery, such as repeats, paralogs, and sequencing errors, are easily resolved with UNEAK. Here, 1.2 million putative SNPs were discovered in a diverse collection of primarily upland, northern-adapted switchgrass populations. Further analysis of this data set revealed the fundamentally diploid nature of tetraploid switchgrass. Taking advantage of the high conservation of genome structure between switchgrass and foxtail millet (Setaria italica (L. P. Beauv., two parent-specific, synteny-based, ultra high-density linkage maps containing a total of 88,217 SNPs were constructed. Also, our results showed clear patterns of isolation-by-distance and isolation-by-ploidy in natural populations of switchgrass. Phylogenetic analysis supported a general south-to-north migration path of switchgrass. In addition, this analysis suggested that upland tetraploid arose from upland octoploid. All together, this study provides unparalleled insights into the diversity, genomic complexity, population structure, phylogeny, phylogeography, ploidy, and evolutionary dynamics

  1. Genomic-based tools for the risk assessment, management, and prevention of type 2 diabetes

    Directory of Open Access Journals (Sweden)

    Johansen Taber KA

    2015-01-01

    Full Text Available Katherine A Johansen Taber, Barry D DickinsonDepartment of Science and Biotechnology, American Medical Association, Chicago, IL, USAAbstract: Type 2 diabetes (T2D is a common and serious disorder and is a significant risk factor for the development of cardiovascular disease, neuropathy, nephropathy, retinopathy, periodontal disease, and foot ulcers and amputations. The burden of disease associated with T2D has led to an emphasis on early identification of the millions of individuals at high risk so that management and intervention strategies can be effectively implemented before disease progression begins. With increasing knowledge about the genetic basis of T2D, several genomic-based strategies have been tested for their ability to improve risk assessment, management and prevention. Genetic risk scores have been developed with the intent to more accurately identify those at risk for T2D and to potentially improve motivation and adherence to lifestyle modification programs. In addition, evidence is building that oral antihyperglycemic medications are subject to pharmacogenomic variation in a substantial number of patients, suggesting genomics may soon play a role in determining the most effective therapies. T2D is a complex disease that affects individuals differently, and risk prediction and treatment may be challenging for health care providers. Genomic approaches hold promise for their potential to improve risk prediction and tailor management for individual patients and to contribute to better health outcomes for those with T2D.Keywords: diabetes, genomic, risk prediction, management

  2. Enhancement of single guide RNA transcription for efficient CRISPR/Cas-based genomic engineering.

    Science.gov (United States)

    Ui-Tei, Kumiko; Maruyama, Shohei; Nakano, Yuko

    2017-06-01

    Genomic engineering using clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) protein is a promising approach for targeting the genomic DNA of virtually any organism in a sequence-specific manner. Recent remarkable advances in CRISPR/Cas technology have made it a feasible system for use in therapeutic applications and biotechnology. In the CRISPR/Cas system, a guide RNA (gRNA), interacting with the Cas protein, recognizes a genomic region with sequence complementarity, and the double-stranded DNA at the target site is cleaved by the Cas protein. A widely used gRNA is an RNA polymerase III (pol III)-driven single gRNA (sgRNA), which is produced by artificial fusion of CRISPR RNA (crRNA) and trans-activation crRNA (tracrRNA). However, we identified a TTTT stretch, known as a termination signal of RNA pol III, in the scaffold region of the sgRNA. Here, we revealed that sgRNA carrying a TTTT stretch reduces the efficiency of sgRNA transcription due to premature transcriptional termination, and decreases the efficiency of genome editing. Unexpectedly, it was also shown that the premature terminated sgRNA may have an adverse effect of inducing RNA interference. Such disadvantageous effects were avoided by substituting one base in the TTTT stretch.

  3. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    Science.gov (United States)

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  4. Was it worth it? Patients' perspectives on the perceived value of genomic-based individualized medicine.

    Science.gov (United States)

    Halverson, Colin Me; Clift, Kristin E; McCormick, Jennifer B

    2016-04-01

    The value of genomic sequencing is often understood in terms of its ability to affect diagnosis or treatment. In these terms, successes occur only in a minority of cases. This paper presents views from patients who had exome sequencing done clinically to explore how they perceive the utility of genomic medicine. The authors used semi-structured, qualitative interviews in order to study patients' attitudes toward genomic sequencing in oncology and rare-disease settings. Participants from 37 cases were interviewed. In terms of the testing's key values-regardless of having received what clinicians described as meaningful results-participants expressed four qualities that are separate from traditional views of clinical utility: Participants felt they had been empowered over their own health. They felt they had contributed altruistically to the progress of genomic technology in medicine. They felt their suffering had been legitimated. They also felt a sense of closure, having done everything they could. Patients expressed overwhelmingly positive attitudes toward sequencing. Their rationale was not solely based on the results' clinical utility. It is important for clinicians to understand this non-medical reasoning as it pertains to patient decision-making and informed consent.

  5. Identification and characterization of short tandem repeats in the Tibetan macaque genome based on resequencing data.

    Science.gov (United States)

    Liu, San-Xu; Hou, Wei; Zhang, Xue-Yan; Peng, Chang-Jun; Yue, Bi-Song; Fan, Zhen-Xin; Li, Jing

    2018-07-18

    The Tibetan macaque, which is endemic to China, is currently listed as a Near Endangered primate species by the International Union for Conservation of Nature (IUCN). Short tandem repeats (STRs) refer to repetitive elements of genome sequence that range in length from 1-6 bp. They are found in many organisms and are widely applied in population genetic studies. To clarify the distribution characteristics of genome-wide STRs and understand their variation among Tibetan macaques, we conducted a genome-wide survey of STRs with next-generation sequencing of five macaque samples. A total of 1 077 790 perfect STRs were mined from our assembly, with an N50 of 4 966 bp. Mono-nucleotide repeats were the most abundant, followed by tetra- and di-nucleotide repeats. Analysis of GC content and repeats showed consistent results with other macaques. Furthermore, using STR analysis software (lobSTR), we found that the proportion of base pair deletions in the STRs was greater than that of insertions in the five Tibetan macaque individuals (Pgenome showed good amplification efficiency and could be used to study population genetics in Tibetan macaques. The neighbor-joining tree classified the five macaques into two different branches according to their geographical origin, indicating high genetic differentiation between the Huangshan and Sichuan populations. We elucidated the distribution characteristics of STRs in the Tibetan macaque genome and provided an effective method for screening polymorphic STRs. Our results also lay a foundation for future genetic variation studies of macaques.

  6. Towards precision medicine-based therapies for glioblastoma: interrogating human disease genomics and mouse phenotypes.

    Science.gov (United States)

    Chen, Yang; Gao, Zhen; Wang, Bingcheng; Xu, Rong

    2016-08-22

    Glioblastoma (GBM) is the most common and aggressive brain tumors. It has poor prognosis even with optimal radio- and chemo-therapies. Since GBM is highly heterogeneous, drugs that target on specific molecular profiles of individual tumors may achieve maximized efficacy. Currently, the Cancer Genome Atlas (TCGA) projects have identified hundreds of GBM-associated genes. We develop a drug repositioning approach combining disease genomics and mouse phenotype data towards predicting targeted therapies for GBM. We first identified disease specific mouse phenotypes using the most recently discovered GBM genes. Then we systematically searched all FDA-approved drugs for candidates that share similar mouse phenotype profiles with GBM. We evaluated the ranks for approved and novel GBM drugs, and compared with an existing approach, which also use the mouse phenotype data but not the disease genomics data. We achieved significantly higher ranks for the approved and novel GBM drugs than the earlier approach. For all positive examples of GBM drugs, we achieved a median rank of 9.2 45.6 of the top predictions have been demonstrated effective in inhibiting the growth of human GBM cells. We developed a computational drug repositioning approach based on both genomic and phenotypic data. Our approach prioritized existing GBM drugs and outperformed a recent approach. Overall, our approach shows potential in discovering new targeted therapies for GBM.

  7. Developing genomic knowledge bases and databases to support clinical management: current perspectives.

    Science.gov (United States)

    Huser, Vojtech; Sincan, Murat; Cimino, James J

    2014-01-01

    Personalized medicine, the ability to tailor diagnostic and treatment decisions for individual patients, is seen as the evolution of modern medicine. We characterize here the informatics resources available today or envisioned in the near future that can support clinical interpretation of genomic test results. We assume a clinical sequencing scenario (germline whole-exome sequencing) in which a clinical specialist, such as an endocrinologist, needs to tailor patient management decisions within his or her specialty (targeted findings) but relies on a genetic counselor to interpret off-target incidental findings. We characterize the genomic input data and list various types of knowledge bases that provide genomic knowledge for generating clinical decision support. We highlight the need for patient-level databases with detailed lifelong phenotype content in addition to genotype data and provide a list of recommendations for personalized medicine knowledge bases and databases. We conclude that no single knowledge base can currently support all aspects of personalized recommendations and that consolidation of several current resources into larger, more dynamic and collaborative knowledge bases may offer a future path forward.

  8. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L. methylation filtered genomic genespace sequences

    Directory of Open Access Journals (Sweden)

    Spraggins Thomas A

    2007-04-01

    Full Text Available Abstract Background Cowpea [Vigna unguiculata (L. Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI, funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniprotKB-PIR (Protein Information Resource, and UniProtKB-TrEMBL. Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene predication program and the

  9. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    Science.gov (United States)

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  10. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids.

    KAUST Repository

    Mithani, Aziz

    2013-09-24

    The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies.We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): \\'HSP base Assignment using NGS data through Diploid Similarity\\' (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment.We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics.

  11. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids.

    KAUST Repository

    Mithani, Aziz; Belfield, Eric J; Brown, Carly; Jiang, Caifu; Leach, Lindsey J; Harberd, Nicholas P

    2013-01-01

    The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies.We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): 'HSP base Assignment using NGS data through Diploid Similarity' (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment.We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics.

  12. The role of duplications in the evolution of genomes highlights the need for evolutionary-based approaches in comparative genomics

    Directory of Open Access Journals (Sweden)

    Levasseur Anthony

    2011-02-01

    Full Text Available Abstract Understanding the evolutionary plasticity of the genome requires a global, comparative approach in which genetic events are considered both in a phylogenetic framework and with regard to population genetics and environmental variables. In the mechanisms that generate adaptive and non-adaptive changes in genomes, segmental duplications (duplication of individual genes or genomic regions and polyploidization (whole genome duplications are well-known driving forces. The probability of fixation and maintenance of duplicates depends on many variables, including population sizes and selection regimes experienced by the corresponding genes: a combination of stochastic and adaptive mechanisms has shaped all genomes. A survey of experimental work shows that the distinction made between fixation and maintenance of duplicates still needs to be conceptualized and mathematically modeled. Here we review the mechanisms that increase or decrease the probability of fixation or maintenance of duplicated genes, and examine the outcome of these events on the adaptation of the organisms. Reviewers This article was reviewed by Dr. Etienne Joly, Dr. Lutz Walter and Dr. W. Ford Doolittle.

  13. A Genomics-Based Model for Prediction of Severe Bioprosthetic Mitral Valve Calcification.

    Science.gov (United States)

    Ponasenko, Anastasia V; Khutornaya, Maria V; Kutikhin, Anton G; Rutkovskaya, Natalia V; Tsepokina, Anna V; Kondyukova, Natalia V; Yuzhalin, Arseniy E; Barbarash, Leonid S

    2016-08-31

    Severe bioprosthetic mitral valve calcification is a significant problem in cardiovascular surgery. Unfortunately, clinical markers did not demonstrate efficacy in prediction of severe bioprosthetic mitral valve calcification. Here, we examined whether a genomics-based approach is efficient in predicting the risk of severe bioprosthetic mitral valve calcification. A total of 124 consecutive Russian patients who underwent mitral valve replacement surgery were recruited. We investigated the associations of the inherited variation in innate immunity, lipid metabolism and calcium metabolism genes with severe bioprosthetic mitral valve calcification. Genotyping was conducted utilizing the TaqMan assay. Eight gene polymorphisms were significantly associated with severe bioprosthetic mitral valve calcification and were therefore included into stepwise logistic regression which identified male gender, the T/T genotype of the rs3775073 polymorphism within the TLR6 gene, the C/T genotype of the rs2229238 polymorphism within the IL6R gene, and the A/A genotype of the rs10455872 polymorphism within the LPA gene as independent predictors of severe bioprosthetic mitral valve calcification. The developed genomics-based model had fair predictive value with area under the receiver operating characteristic (ROC) curve of 0.73. In conclusion, our genomics-based approach is efficient for the prediction of severe bioprosthetic mitral valve calcification.

  14. In silico pattern-based analysis of the human cytomegalovirus genome.

    Science.gov (United States)

    Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas

    2003-04-01

    More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).

  15. Life based on phosphite: a genome-guided analysis of Desulfotignum phosphitoxidans.

    Science.gov (United States)

    Poehlein, Anja; Daniel, Rolf; Schink, Bernhard; Simeonova, Diliana D

    2013-11-02

    The Delta-Proteobacterium Desulfotignum phosphitoxidans is a type strain of the genus Desulfotignum, which comprises to date only three species together with D. balticum and D. toluenicum. D. phosphitoxidans oxidizes phosphite to phosphate as its only source of electrons, with either sulfate or CO2 as electron acceptor to gain its metabolic energy, which is of exclusive interest. Sequencing of the genome of this bacterium was undertaken to elucidate the genomic basis of this so far unique type of energy metabolism. The genome contains 4,998,761 base pairs and 4646 genes of which 3609 were assigned to a function, and 1037 are without function prediction. Metabolic reconstruction revealed that most biosynthetic pathways of Gram negative, autotrophic sulfate reducers were present. Autotrophic CO2 assimilation proceeds through the Wood-Ljungdahl pathway. Additionally, we have found and confirmed the ability of the strain to couple phosphite oxidation to dissimilatory nitrate reduction to ammonia, which in itself is a new type of energy metabolism. Surprisingly, only two pathways for uptake, assimilation and utilization of inorganic and organic phosphonates were found in the genome. The unique for D. phosphitoxidans Ptx-Ptd cluster is involved in inorganic phosphite oxidation and an atypical C-P lyase-coding cluster (Phn) is involved in utilization of organophosphonates. We present the whole genome sequence of the first bacterium able to gain metabolic energy via phosphite oxidation. The data obtained provide initial information on the composition and architecture of the phosphite-utilizing and energy-transducing systems needed to live with phosphite as an unusual electron donor.

  16. Genome-Based Studies of Marine Microorganisms to Maximize the Diversity of Natural Products Discovery for Medical Treatments

    Directory of Open Access Journals (Sweden)

    Xin-Qing Zhao

    2011-01-01

    Full Text Available Marine microorganisms are rich source for natural products which play important roles in pharmaceutical industry. Over the past decade, genome-based studies of marine microorganisms have unveiled the tremendous diversity of the producers of natural products and also contributed to the efficiency of harness the strain diversity and chemical diversity, as well as the genetic diversity of marine microorganisms for the rapid discovery and generation of new natural products. In the meantime, genomic information retrieved from marine symbiotic microorganisms can also be employed for the discovery of new medical molecules from yet-unculturable microorganisms. In this paper, the recent progress in the genomic research of marine microorganisms is reviewed; new tools of genome mining as well as the advance in the activation of orphan pathways and metagenomic studies are summarized. Genome-based research of marine microorganisms will maximize the biodiscovery process and solve the problems of supply and sustainability of drug molecules for medical treatments.

  17. Visualization for genomics: the Microbial Genome Viewer.

    NARCIS (Netherlands)

    Kerkhoven, R.; Enckevort, F.H.J. van; Boekhorst, J.; Molenaar, D; Siezen, R.J.

    2004-01-01

    SUMMARY: A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a

  18. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing.

    Directory of Open Access Journals (Sweden)

    Alexander William Eastman

    2015-01-01

    Full Text Available Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms has necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, unanimously located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the required effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provided a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing

  19. Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat.

    Science.gov (United States)

    Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Huerta-Espino, Julio; Lan, Caixia; Bhavani, Sridhar; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

    2017-07-01

    Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using few markers as fixed effects in a least-squares approach and pedigree-based prediction. The unceasing plant-pathogen arms race and ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection' which utilizes dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare three genomic prediction models including genomic best linear unbiased prediction (GBLUP), GBLUP A that was GBLUP with selected loci as fixed effects and reproducing kernel Hilbert spaces-markers (RKHS-M) with least-squares (LS) approach, RKHS-pedigree (RKHS-P), and RKHS markers and pedigree (RKHS-MP) to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31-0.74 for LR seedling, 0.12-0.56 for LR APR, 0.31-0.65 for SR APR, 0.70-0.78 for YR seedling, and 0.34-0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average of 42% increase in accuracy over LS. We conclude that GS is a promising approach for improvement of quantitative rust resistance and can be implemented in the breeding pipeline.

  20. Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

    Energy Technology Data Exchange (ETDEWEB)

    Seaver, Samuel M. D.; Bradbury, Louis M. T.; Frelin, Océane; Zarecki, Raphy; Ruppin, Eytan; Hanson, Andrew D.; Henry, Christopher S.

    2015-03-10

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low expression genes to enable activity of a maximum number of reactions associated with high expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analysis transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.

  1. A RAD-based linkage map and comparative genomics in the gudgeons (genus Gnathopogon, Cyprinidae

    Directory of Open Access Journals (Sweden)

    Kakioka Ryo

    2013-01-01

    Full Text Available Abstract Background The construction of linkage maps is a first step in exploring the genetic basis for adaptive phenotypic divergence in closely related species by quantitative trait locus (QTL analysis. Linkage maps are also useful for comparative genomics in non-model organisms. Advances in genomics technologies make it more feasible than ever to study the genetics of adaptation in natural populations. Restriction-site associated DNA (RAD sequencing in next-generation sequencers facilitates the development of many genetic markers and genotyping. We aimed to construct a linkage map of the gudgeons of the genus Gnathopogon (Cyprinidae for comparative genomics with the zebrafish Danio rerio (a member of the same family as gudgeons and for the future QTL analysis of the genetic architecture underlying adaptive phenotypic evolution of Gnathopogon. Results We constructed the first genetic linkage map of Gnathopogon using a 198 F2 interspecific cross between two closely related species in Japan: river-dwelling Gnathopogon elongatus and lake-dwelling Gnathopogon caerulescens. Based on 1,622 RAD-tag markers, a linkage map spanning 1,390.9 cM with 25 linkage groups and an average marker interval of 0.87 cM was constructed. We also identified a region involving female-specific transmission ratio distortion (TRD. Synteny and collinearity were extensively conserved between Gnathopogon and zebrafish. Conclusions The dense SNP-based linkage map presented here provides a basis for future QTL analysis. It will also be useful for transferring genomic information from a “traditional” model fish species, zebrafish, to screen candidate genes underlying ecologically important traits of the gudgeons.

  2. A LDA-based approach to promoting ranking diversity for genomics information retrieval.

    Science.gov (United States)

    Chen, Yan; Yin, Xiaoshi; Li, Zhoujun; Hu, Xiaohua; Huang, Jimmy Xiangji

    2012-06-11

    In the biomedical domain, there are immense data and tremendous increase of genomics and biomedical relevant publications. The wealth of information has led to an increasing amount of interest in and need for applying information retrieval techniques to access the scientific literature in genomics and related biomedical disciplines. In many cases, the desired information of a query asked by biologists is a list of a certain type of entities covering different aspects that are related to the question, such as cells, genes, diseases, proteins, mutations, etc. Hence, it is important of a biomedical IR system to be able to provide relevant and diverse answers to fulfill biologists' information needs. However traditional IR model only concerns with the relevance between retrieved documents and user query, but does not take redundancy between retrieved documents into account. This will lead to high redundancy and low diversity in the retrieval ranked lists. In this paper, we propose an approach which employs a topic generative model called Latent Dirichlet Allocation (LDA) to promoting ranking diversity for biomedical information retrieval. Different from other approaches or models which consider aspects on word level, our approach assumes that aspects should be identified by the topics of retrieved documents. We present LDA model to discover topic distribution of retrieval passages and word distribution of each topic dimension, and then re-rank retrieval results with topic distribution similarity between passages based on N-size slide window. We perform our approach on TREC 2007 Genomics collection and two distinctive IR baseline runs, which can achieve 8% improvement over the highest Aspect MAP reported in TREC 2007 Genomics track. The proposed method is the first study of adopting topic model to genomics information retrieval, and demonstrates its effectiveness in promoting ranking diversity as well as in improving relevance of ranked lists of genomics search

  3. IVAG: An Integrative Visualization Application for Various Types of Genomic Data Based on R-Shiny and the Docker Platform.

    Science.gov (United States)

    Lee, Tae-Rim; Ahn, Jin Mo; Kim, Gyuhee; Kim, Sangsoo

    2017-12-01

    Next-generation sequencing (NGS) technology has become a trend in the genomics research area. There are many software programs and automated pipelines to analyze NGS data, which can ease the pain for traditional scientists who are not familiar with computer programming. However, downstream analyses, such as finding differentially expressed genes or visualizing linkage disequilibrium maps and genome-wide association study (GWAS) data, still remain a challenge. Here, we introduce a dockerized web application written in R using the Shiny platform to visualize pre-analyzed RNA sequencing and GWAS data. In addition, we have integrated a genome browser based on the JBrowse platform and an automated intermediate parsing process required for custom track construction, so that users can easily build and navigate their personal genome tracks with in-house datasets. This application will help scientists perform series of downstream analyses and obtain a more integrative understanding about various types of genomic data by interactively visualizing them with customizable options.

  4. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Full Text Available Secondary metabolites (SMs produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic.

  5. Demography-adjusted tests of neutrality based on genome-wide SNP data

    KAUST Repository

    Rafajlović, Marina

    2014-08-01

    Tests of the neutral evolution hypothesis are usually built on the standard model which assumes that mutations are neutral and the population size remains constant over time. However, it is unclear how such tests are affected if the last assumption is dropped. Here, we extend the unifying framework for tests based on the site frequency spectrum, introduced by Achaz and Ferretti, to populations of varying size. Key ingredients are the first two moments of the site frequency spectrum. We show how these moments can be computed analytically if a population has experienced two instantaneous size changes in the past. We apply our method to data from ten human populations gathered in the 1000 genomes project, estimate their demographies and define demography-adjusted versions of Tajima\\'s D, Fay & Wu\\'s H, and Zeng\\'s E. Our results show that demography-adjusted test statistics facilitate the direct comparison between populations and that most of the differences among populations seen in the original unadjusted tests can be explained by their underlying demographies. Upon carrying out whole-genome screens for deviations from neutrality, we identify candidate regions of recent positive selection. We provide track files with values of the adjusted and unadjusted tests for upload to the UCSC genome browser. © 2014 Elsevier Inc.

  6. SeMPI: a genome-based secondary metabolite prediction and identification web server.

    Science.gov (United States)

    Zierep, Paul F; Padilla, Natàlia; Yonchev, Dimitar G; Telukunta, Kiran K; Klementz, Dennis; Günther, Stefan

    2017-07-03

    The secondary metabolism of bacteria, fungi and plants yields a vast number of bioactive substances. The constantly increasing amount of published genomic data provides the opportunity for an efficient identification of gene clusters by genome mining. Conversely, for many natural products with resolved structures, the encoding gene clusters have not been identified yet. Even though genome mining tools have become significantly more efficient in the identification of biosynthetic gene clusters, structural elucidation of the actual secondary metabolite is still challenging, especially due to as yet unpredictable post-modifications. Here, we introduce SeMPI, a web server providing a prediction and identification pipeline for natural products synthesized by polyketide synthases of type I modular. In order to limit the possible structures of PKS products and to include putative tailoring reactions, a structural comparison with annotated natural products was introduced. Furthermore, a benchmark was designed based on 40 gene clusters with annotated PKS products. The web server of the pipeline (SeMPI) is freely available at: http://www.pharmaceutical-bioinformatics.de/sempi. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. UniPrimer: A Web-Based Primer Design Tool for Comparative Analyses of Primate Genomes

    Directory of Open Access Journals (Sweden)

    Nomin Batnyam

    2012-01-01

    Full Text Available Whole genome sequences of various primates have been released due to advanced DNA-sequencing technology. A combination of computational data mining and the polymerase chain reaction (PCR assay to validate the data is an excellent method for conducting comparative genomics. Thus, designing primers for PCR is an essential procedure for a comparative analysis of primate genomes. Here, we developed and introduced UniPrimer for use in those studies. UniPrimer is a web-based tool that designs PCR- and DNA-sequencing primers. It compares the sequences from six different primates (human, chimpanzee, gorilla, orangutan, gibbon, and rhesus macaque and designs primers on the conserved region across species. UniPrimer is linked to RepeatMasker, Primer3Plus, and OligoCalc softwares to produce primers with high accuracy and UCSC In-Silico PCR to confirm whether the designed primers work. To test the performance of UniPrimer, we designed primers on sample sequences using UniPrimer and manually designed primers for the same sequences. The comparison of the two processes showed that UniPrimer was more effective than manual work in terms of saving time and reducing errors.

  8. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    Science.gov (United States)

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic. PMID:25706180

  9. Identification of somatic mutations in cancer through Bayesian-based analysis of sequenced genome pairs.

    Science.gov (United States)

    Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W

    2013-05-04

    The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations--changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.

  10. Systematic differences in the response of genetic variation to pedigree and genome-based selection methods.

    Science.gov (United States)

    Heidaritabar, M; Vereijken, A; Muir, W M; Meuwissen, T; Cheng, H; Megens, H-J; Groenen, M A M; Bastiaansen, J W M

    2014-12-01

    Genomic selection (GS) is a DNA-based method of selecting for quantitative traits in animal and plant breeding, and offers a potentially superior alternative to traditional breeding methods that rely on pedigree and phenotype information. Using a 60 K SNP chip with markers spaced throughout the entire chicken genome, we compared the impact of GS and traditional BLUP (best linear unbiased prediction) selection methods applied side-by-side in three different lines of egg-laying chickens. Differences were demonstrated between methods, both at the level and genomic distribution of allele frequency changes. In all three lines, the average allele frequency changes were larger with GS, 0.056 0.064 and 0.066, compared with BLUP, 0.044, 0.045 and 0.036 for lines B1, B2 and W1, respectively. With BLUP, 35 selected regions (empirical P selected regions were identified. Empirical thresholds for local allele frequency changes were determined from gene dropping, and differed considerably between GS (0.167-0.198) and BLUP (0.105-0.126). Between lines, the genomic regions with large changes in allele frequencies showed limited overlap. Our results show that GS applies selection pressure much more locally than BLUP, resulting in larger allele frequency changes. With these results, novel insights into the nature of selection on quantitative traits have been gained and important questions regarding the long-term impact of GS are raised. The rapid changes to a part of the genetic architecture, while another part may not be selected, at least in the short term, require careful consideration, especially when selection occurs before phenotypes are observed.

  11. Molecular Characterization of Five Potyviruses Infecting Korean Sweet Potatoes Based on Analyses of Complete Genome Sequences

    Directory of Open Access Journals (Sweden)

    Hae-Ryun Kwak

    2015-12-01

    Full Text Available Sweet potatoes (Ipomea batatas L. are grown extensively, in tropical and temperate regions, and are important food crops worldwide. In Korea, potyviruses, including Sweet potato feathery mottle virus (SPFMV, Sweet potato virus C (SPVC, Sweet potato virus G (SPVG, Sweet potato virus 2 (SPV2, and Sweet potato latent virus (SPLV, have been detected in sweet potato fields at a high (~95% incidence. In the present work, complete genome sequences of 18 isolates, representing the five potyviruses mentioned above, were compared with previously reported genome sequences. The complete genomes consisted of 10,081 to 10,830 nucleotides, excluding the poly-A tails. Their genomic organizations were typical of the Potyvirus genus, including one target open reading frame coding for a putative polyprotein. Based on phylogenetic analyses and sequence comparisons, the Korean SPFMV isolates belonged to the strains RC and O with >98% nucleotide sequence identity. Korean SPVC isolates had 99% identity to the Japanese isolate SPVC-Bungo and 70% identity to the SPFMV isolates. The Korean SPVG isolates showed 99% identity to the three previously reported SPVG isolates. Korean SPV2 isolates had 97% identity to the SPV2 GWB-2 isolate from the USA. Korean SPLV isolates had a relatively low (88% nucleotide sequence identity with the Taiwanese SPLV-TW isolates, and they were phylogenetically distantly related to SPFMV isolates. Recombination analysis revealed that possible recombination events occurred in the P1, HC-Pro and NIa-NIb regions of SPFMV and SPLV isolates and these regions were identified as hotspots for recombination in the sweet potato potyviruses.

  12. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

    Science.gov (United States)

    Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

    2016-03-01

    Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when it is using RNA-Seq as sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/ katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Prognostic Impact of Array-based Genomic Profiles in Esophageal Squamous Cell Cancer

    International Nuclear Information System (INIS)

    Carneiro, Ana; Isinger, Anna; Karlsson, Anna; Johansson, Jan; Jönsson, Göran; Bendahl, Pär-Ola; Falkenback, Dan; Halvarsson, Britta; Nilbert, Mef

    2008-01-01

    Esophageal squamous cell carcinoma (ESCC) is a genetically complex tumor type and a major cause of cancer related mortality. Although distinct genetic alterations have been linked to ESCC development and prognosis, the genetic alterations have not gained clinical applicability. We applied array-based comparative genomic hybridization (aCGH) to obtain a whole genome copy number profile relevant for identifying deranged pathways and clinically applicable markers. A 32 k aCGH platform was used for high resolution mapping of copy number changes in 30 stage I-IV ESCC. Potential interdependent alterations and deranged pathways were identified and copy number changes were correlated to stage, differentiation and survival. Copy number alterations affected median 19% of the genome and included recurrent gains of chromosome regions 5p, 7p, 7q, 8q, 10q, 11q, 12p, 14q, 16p, 17p, 19p, 19q, and 20q and losses of 3p, 5q, 8p, 9p and 11q. High-level amplifications were observed in 30 regions and recurrently involved 7p11 (EGFR), 11q13 (MYEOV, CCND1, FGF4, FGF3, PPFIA, FAD, TMEM16A, CTTS and SHANK2) and 11q22 (PDFG). Gain of 7p22.3 predicted nodal metastases and gains of 1p36.32 and 19p13.3 independently predicted poor survival in multivariate analysis. aCGH profiling verified genetic complexity in ESCC and herein identified imbalances of multiple central tumorigenic pathways. Distinct gains correlate with clinicopathological variables and independently predict survival, suggesting clinical applicability of genomic profiling in ESCC

  14. Genome-wide conserved non-coding microsatellite (CNMS) marker-based integrative genetical genomics for quantitative dissection of seed weight in chickpea.

    Science.gov (United States)

    Bajaj, Deepak; Saxena, Maneesha S; Kujur, Alice; Das, Shouvik; Badoni, Saurabh; Tripathi, Shailesh; Upadhyaya, Hari D; Gowda, C L L; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K; Parida, Swarup K

    2015-03-01

    Phylogenetic footprinting identified 666 genome-wide paralogous and orthologous CNMS (conserved non-coding microsatellite) markers from 5'-untranslated and regulatory regions (URRs) of 603 protein-coding chickpea genes. The (CT)n and (GA)n CNMS carrying CTRMCAMV35S and GAGA8BKN3 regulatory elements, respectively, are abundant in the chickpea genome. The mapped genic CNMS markers with robust amplification efficiencies (94.7%) detected higher intraspecific polymorphic potential (37.6%) among genotypes, implying their immense utility in chickpea breeding and genetic analyses. Seventeen differentially expressed CNMS marker-associated genes showing strong preferential and seed tissue/developmental stage-specific expression in contrasting genotypes were selected to narrow down the gene targets underlying seed weight quantitative trait loci (QTLs)/eQTLs (expression QTLs) through integrative genetical genomics. The integration of transcript profiling with seed weight QTL/eQTL mapping, molecular haplotyping, and association analyses identified potential molecular tags (GAGA8BKN3 and RAV1AAT regulatory elements and alleles/haplotypes) in the LOB-domain-containing protein- and KANADI protein-encoding transcription factor genes controlling the cis-regulated expression for seed weight in the chickpea. This emphasizes the potential of CNMS marker-based integrative genetical genomics for the quantitative genetic dissection of complex seed weight in chickpea. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  15. Molecular analysis of single oocyst of Eimeria by whole genome amplification (WGA) based nested PCR.

    Science.gov (United States)

    Wang, Yunzhou; Tao, Geru; Cui, Yujuan; Lv, Qiyao; Xie, Li; Li, Yuan; Suo, Xun; Qin, Yinghe; Xiao, Lihua; Liu, Xianyong

    2014-09-01

    PCR-based molecular tools are widely used for the identification and characterization of protozoa. Here we report the molecular analysis of Eimeria species using combined methods of whole genome amplification (WGA) and nested PCR. Single oocyst of Eimeria stiedai or Eimeriamedia was directly used for random amplification of the genomic DNA with either primer extension preamplification (PEP) or multiple displacement amplification (MDA), and then the WGA product was used as template in nested PCR with species-specific primers for ITS-1, 18S rDNA and 23S rDNA of E. stiedai and E. media. WGA-based PCR was successful for the amplification of these genes from single oocyst. For the species identification of single oocyst isolated from mixed E. stiedai or E. media, the results from WGA-based PCR were exactly in accordance with those from morphological identification, suggesting the availability of this method in molecular analysis of eimerian parasites at the single oocyst level. WGA-based PCR method can also be applied for the identification and genetic characterization of other protists. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    Science.gov (United States)

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813

  17. Development of versatile non-homologous end joining-based knock-in module for genome editing.

    Science.gov (United States)

    Sawatsubashi, Shun; Joko, Yudai; Fukumoto, Seiji; Matsumoto, Toshio; Sugano, Shigeo S

    2018-01-12

    CRISPR/Cas9-based genome editing has dramatically accelerated genome engineering. An important aspect of genome engineering is efficient knock-in technology. For improved knock-in efficiency, the non-homologous end joining (NHEJ) repair pathway has been used over the homology-dependent repair pathway, but there remains a need to reduce the complexity of the preparation of donor vectors. We developed the versatile NHEJ-based knock-in module for genome editing (VIKING). Using the consensus sequence of the time-honored pUC vector to cut donor vectors, any vector with a pUC backbone could be used as the donor vector without customization. Conditions required to minimize random integration rates of the donor vector were also investigated. We attempted to isolate null lines of the VDR gene in human HaCaT keratinocytes using knock-in/knock-out with a selection marker cassette, and found 75% of clones isolated were successfully knocked-in. Although HaCaT cells have hypotetraploid genome composition, the results suggest multiple clones have VDR null phenotypes. VIKING modules enabled highly efficient knock-in of any vectors harboring pUC vectors. Users now can insert various existing vectors into an arbitrary locus in the genome. VIKING will contribute to low-cost genome engineering.

  18. BPhyOG: An interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes

    Directory of Open Access Journals (Sweden)

    Lin Kui

    2007-07-01

    Full Text Available Abstract Background Overlapping genes (OGs in bacterial genomes are pairs of adjacent genes of which the coding sequences overlap partly or entirely. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified. Indeed, these might prove a consistent feature across all microbial genomes. Our previous work suggests that OGs can be considered as robust markers at the whole genome level for the construction of phylogenies. An online, interactive web server for inferring phylogenies is needed for biologists to analyze phylogenetic relationships among a set of bacterial genomes of interest. Description BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ and Unweighted Pair-Group Method using Arithmetic averages (UPGMA. Users can apply the desired method to generate phylogenetic trees, which are based on an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs that were used to infer the phylogenetic relationships. It provides detailed annotation for each OG pair and the features of the component genes through hyperlinks. Users can also retrieve each of the homologous OG pairs that have been determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint. Conclusion BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, the annotation and homologous OG pairs of which are integrated comprehensively. The reliability of phylogenies complemented by

  19. Rapid extraction of genomic DNA from saliva for HLA typing on microarray based on magnetic nanobeads

    Energy Technology Data Exchange (ETDEWEB)

    Xie Xin; Zhang Xu E-mail: shinezhang@hotmail.com; Yu Bingbin; Gao Huafang; Zhang Huan; Fei Weiyang

    2004-09-01

    A series of simplified protocols are developed for extracting genomic DNA from saliva by using the magnetic nanobeads as absorbents. In these protocols, both the enrichment of the target cells and the adsorption of DNA can be achieved simultaneously by our functionally modified magnetic beads in one step, and the DNA-nanobeads complex can be used as PCR templates. HLA typing based on an oligonucleotide array was conducted by hybridization with the PCR products. The result shows that the protocols are robust and sensitive.

  20. Integration of Genome Scale Metabolic Networks and Gene Regulation of Metabolic Enzymes With Physiologically Based Pharmacokinetics.

    Science.gov (United States)

    Maldonado, Elaina M; Leoncikas, Vytautas; Fisher, Ciarán P; Moore, J Bernadette; Plant, Nick J; Kierzek, Andrzej M

    2017-11-01

    The scope of physiologically based pharmacokinetic (PBPK) modeling can be expanded by assimilation of the mechanistic models of intracellular processes from systems biology field. The genome scale metabolic networks (GSMNs) represent a whole set of metabolic enzymes expressed in human tissues. Dynamic models of the gene regulation of key drug metabolism enzymes are available. Here, we introduce GSMNs and review ongoing work on integration of PBPK, GSMNs, and metabolic gene regulation. We demonstrate example models. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  1. Low Base-Substitution Mutation Rate in the Germline Genome of the Ciliate Tetrahymena thermophila

    Science.gov (United States)

    2016-09-15

    Tetrahymena thermophila, a model eukaryote. PLoS Biol. 4:e286. Farlow A, et al. 2015. The spontaneous mutation rate in the fission yeast Schizosaccharomyces...spontane- ous mutations in yeast . Proc Natl Acad Sci U S A. 105:9272–9277. Lynn DH, Doerder FP. 2012. The life and times of Tetrahymena. Methods Cell...Low Base-Substitution Mutation Rate in the Germline Genome of the Ciliate Tetrahymena thermophila Hongan Long1,2,y, David J. Winter3,*,y, Allan Y.-C

  2. Phytoplasma infection in tomato is associated with re-organization of plasma membrane, ER stacks, and actin filaments in sieve elements.

    Science.gov (United States)

    Buxa, Stefanie V; Degola, Francesca; Polizzotto, Rachele; De Marco, Federica; Loschi, Alberto; Kogel, Karl-Heinz; di Toppi, Luigi Sanità; van Bel, Aart J E; Musetti, Rita

    2015-01-01

    Phytoplasmas, biotrophic wall-less prokaryotes, only reside in sieve elements of their host plants. The essentials of the intimate interaction between phytoplasmas and their hosts are poorly understood, which calls for research on potential ultrastructural modifications. We investigated modifications of the sieve-element ultrastructure induced in tomato plants by 'Candidatus Phytoplasma solani,' the pathogen associated with the stolbur disease. Phytoplasma infection induces a drastic re-organization of sieve-element substructures including changes in plasma membrane surface and distortion of the sieve-element reticulum. Observations of healthy and stolbur-diseased plants provided evidence for the emergence of structural links between sieve-element plasma membrane and phytoplasmas. One-sided actin aggregates on the phytoplasma surface also inferred a connection between phytoplasma and sieve-element cytoskeleton. Actin filaments displaced from the sieve-element mictoplasm to the surface of the phytoplasmas in infected sieve elements. Western blot analysis revealed a decrease of actin and an increase of ER-resident chaperone luminal binding protein (BiP) in midribs of phytoplasma-infected plants. Collectively, the studies provided novel insights into ultrastructural responses of host sieve elements to phloem-restricted prokaryotes.

  3. Genomic prediction based on data from three layer lines using non-linear regression models.

    Science.gov (United States)

    Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

    2014-11-06

    Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional

  4. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  5. Investigating Drought Tolerance in Chickpea Using Genome-Wide Association Mapping and Genomic Selection Based on Whole-Genome Resequencing Data.

    Science.gov (United States)

    Li, Yongle; Ruperao, Pradeep; Batley, Jacqueline; Edwards, David; Khan, Tanveer; Colmer, Timothy D; Pang, Jiayin; Siddique, Kadambot H M; Sutton, Tim

    2018-01-01

    Drought tolerance is a complex trait that involves numerous genes. Identifying key causal genes or linked molecular markers can facilitate the fast development of drought tolerant varieties. Using a whole-genome resequencing approach, we sequenced 132 chickpea varieties and advanced breeding lines and found more than 144,000 single nucleotide polymorphisms (SNPs). We measured 13 yield and yield-related traits in three drought-prone environments of Western Australia. The genotypic effects were significant for all traits, and many traits showed highly significant correlations, ranging from 0.83 between grain yield and biomass to -0.67 between seed weight and seed emergence rate. To identify candidate genes, the SNP and trait data were incorporated into the SUPER genome-wide association study (GWAS) model, a modified version of the linear mixed model. We found that several SNPs from auxin-related genes, including auxin efflux carrier protein (PIN3), p-glycoprotein, and nodulin MtN21/EamA-like transporter, were significantly associated with yield and yield-related traits under drought-prone environments. We identified four genetic regions containing SNPs significantly associated with several different traits, which was an indication of pleiotropic effects. We also investigated the possibility of incorporating the GWAS results into a genomic selection (GS) model, which is another approach to deal with complex traits. Compared to using all SNPs, application of the GS model using subsets of SNPs significantly associated with the traits under investigation increased the prediction accuracies of three yield and yield-related traits by more than twofold. This has important implication for implementing GS in plant breeding programs.

  6. Investigating Drought Tolerance in Chickpea Using Genome-Wide Association Mapping and Genomic Selection Based on Whole-Genome Resequencing Data

    Directory of Open Access Journals (Sweden)

    Yongle Li

    2018-02-01

    Full Text Available Drought tolerance is a complex trait that involves numerous genes. Identifying key causal genes or linked molecular markers can facilitate the fast development of drought tolerant varieties. Using a whole-genome resequencing approach, we sequenced 132 chickpea varieties and advanced breeding lines and found more than 144,000 single nucleotide polymorphisms (SNPs. We measured 13 yield and yield-related traits in three drought-prone environments of Western Australia. The genotypic effects were significant for all traits, and many traits showed highly significant correlations, ranging from 0.83 between grain yield and biomass to -0.67 between seed weight and seed emergence rate. To identify candidate genes, the SNP and trait data were incorporated into the SUPER genome-wide association study (GWAS model, a modified version of the linear mixed model. We found that several SNPs from auxin-related genes, including auxin efflux carrier protein (PIN3, p-glycoprotein, and nodulin MtN21/EamA-like transporter, were significantly associated with yield and yield-related traits under drought-prone environments. We identified four genetic regions containing SNPs significantly associated with several different traits, which was an indication of pleiotropic effects. We also investigated the possibility of incorporating the GWAS results into a genomic selection (GS model, which is another approach to deal with complex traits. Compared to using all SNPs, application of the GS model using subsets of SNPs significantly associated with the traits under investigation increased the prediction accuracies of three yield and yield-related traits by more than twofold. This has important implication for implementing GS in plant breeding programs.

  7. Analysis and Comparison of Information Theory-based Distances for Genomic Strings

    Science.gov (United States)

    Balzano, Walter; Cicalese, Ferdinando; Del Sorbo, Maria Rosaria; Vaccaro, Ugo

    2008-07-01

    Genomic string comparison via alignment are widely applied for mining and retrieval of information in biological databases. In some situation, the effectiveness of such alignment based comparison is still unclear, e.g., for sequences with non-uniform length and with significant shuffling of identical substrings. An alternative approach is the one based on information theory distances. Biological data information content is stored in very long strings of only four characters. In last ten years, several entropic measures have been proposed for genomic string analysis. Notwithstanding their individual merit and experimental validation, to the nest of our knowledge, there is no direct comparison of these different metrics. We shall present four of the most representative alignment-free distance measures, based on mutual information. Each one has a different origin and expression. Our comparison involves a sort of arrangement, to reduce different concepts to a unique formalism, so as it has been possible to construct a phylogenetic tree for each of them. The trees produced via these metrics are compared to the ones widely accepted as biologically validated. In general the results provided more evidence of the reliability of the alignment-free distance models. Also, we observe that one of the metrics appeared to be more robust than the other three. We believe that this result can be object of further researches and observations. Many of the results of experimentation, the graphics and the table are available at the following URL: http://people.na.infn.it/˜wbalzano/BIO

  8. The complete mitochondrial genome of the enigmatic bigheadedturtle (Platysternon): description of unusual genomic features and thereconciliation of phylogenetic hypotheses based on mitochondrial andnuclear DNA

    Energy Technology Data Exchange (ETDEWEB)

    Parham, James F.; Feldman, Chris R.; Boore, Jeffrey L.

    2005-12-28

    The big-headed turtle (Platysternon megacephalum) from east Asia is the sole living representative of a poorly-studied turtle lineage (Platysternidae). It has no close living relatives, and its phylogenetic position within turtles is one of the outstanding controversies in turtle systematics. Platysternon was traditionally considered to be close to snapping turtles (Chelydridae) based on some studies of its morphology and mitochondrial (mt) DNA, however, other studies of morphology and nuclear (nu) DNA do not support that hypothesis. We sequenced the complete mt genome of Platysternon and the nearly complete mt genomes of two other relevant turtles and compared them to turtle mt genomes from the literature to form the largest molecular dataset used to date to address this issue. The resulting phylogeny robustly rejects the placement of Platysternon with Chelydridae, but instead shows that it is a member of the Testudinoidea, a diverse, nearly globally-distributed group that includes pond turtles and tortoises. We also discovered that Platysternon mtDNA has large-scale gene rearrangements and possesses two, nearly identical, control regions, features that distinguish it from all other studied turtles. Our study robustly determines the phylogenetic placement of Platysternon and provides a well-resolved outline of major turtle lineages, while demonstrating the significantly greater resolving power of comparing large amounts of mt sequence over that of short fragments. Earlier phylogenies placing Platysternon with chelydrids required a temporal gap in the fossil record that is now unnecessary. The duplicated control regions and gene rearrangements of the Platysternon mt DNA probably resulted from the duplication of part of the genome and then the subsequent loss of redundant genes. Although it is possible that having two control regions may provide some advantage, explaining why the control regions would be maintained while some of the duplicated genes were eroded

  9. Using community-based participatory research principles to develop more understandable recruitment and informed consent documents in genomic research.

    Directory of Open Access Journals (Sweden)

    Harlyn G Skinner

    Full Text Available Heart Healthy Lenoir is a transdisciplinary project aimed at creating long-term, sustainable approaches to reduce cardiovascular disease risk disparities in Lenoir County, North Carolina using a design spanning genomic analysis and clinical intervention. We hypothesized that residents of Lenoir County would be unfamiliar and mistrustful of genomic research, and therefore reluctant to participate; additionally, these feelings would be higher in African-Americans.To test our hypothesis, we conducted qualitative research using community-based participatory research principles to ensure our genomic research strategies addressed the needs, priorities, and concerns of the community. African-American (n = 19 and White (n = 16 adults in Lenoir County participated in four focus groups exploring perceptions about genomics and cardiovascular disease. Demographic surveys were administered and a semi-structured interview guide was used to facilitate discussions. The discussions were digitally recorded, transcribed verbatim, and analyzed in ATLAS.ti.From our analysis, key themes emerged: transparent communication, privacy, participation incentives and barriers, knowledge, and the impact of knowing. African-Americans were more concerned about privacy and community impact compared to Whites, however, African-Americans were still eager to participate in our genomic research project. The results from our formative study were used to improve the informed consent and recruitment processes by: 1 reducing misconceptions of genomic studies; and 2 helping to foster participant understanding and trust with the researchers. Our study demonstrates how community-based participatory research principles can be used to gain deeper insight into the community and increase participation in genomic research studies. Due in part to these efforts 80.3% of eligible African-American participants and 86.9% of eligible White participants enrolled in the Heart Healthy Lenoir Genomics

  10. Complete genome sequence of Klebsiella pneumoniae J1, a protein-based microbial flocculant-producing bacterium.

    Science.gov (United States)

    Pang, Changlong; Li, Ang; Cui, Di; Yang, Jixian; Ma, Fang; Guo, Haijuan

    2016-02-20

    Klebsiella pneumoniae J1 is a Gram-negative strain, which belongs to a protein-based microbial flocculant-producing bacterium. However, little genetic information is known about this species. Here we carried out a whole-genome sequence analysis of this strain and report the complete genome sequence of this organism and its genetic basis for carbohydrate metabolism, capsule biosynthesis and transport system. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Complete genome sequence of cyanobacterium Nostoc sp. NIES-3756, a potentially useful strain for phytochrome-based bioengineering.

    Science.gov (United States)

    Hirose, Yuu; Fujisawa, Takatomo; Ohtsubo, Yoshiyuki; Katayama, Mitsunori; Misawa, Naomi; Wakazuki, Sachiko; Shimura, Yohei; Nakamura, Yasukazu; Kawachi, Masanobu; Yoshikawa, Hirofumi; Eki, Toshihiko; Kanesaki, Yu

    2016-01-20

    To explore the diverse photoreceptors of cyanobacteria, we isolated Nostoc sp. strain NIES-3756 from soil at Mimomi-Park, Chiba, Japan, and determined its complete genome sequence. The Genome consists of one chromosome and two plasmids (total 6,987,571 bp containing no gaps). The NIES-3756 strain carries 7 phytochrome and 12 cyanobacteriochrome genes, which will facilitate the studies of phytochrome-based bioengineering. Copyright © 2015. Published by Elsevier B.V.

  12. Whole-genome-based Mycobacterium tuberculosis surveillance: a standardized, portable, and expandable approach.

    Science.gov (United States)

    Kohl, Thomas A; Diel, Roland; Harmsen, Dag; Rothgänger, Jörg; Walter, Karen Meywald; Merker, Matthias; Weniger, Thomas; Niemann, Stefan

    2014-07-01

    Whole-genome sequencing (WGS) allows for effective tracing of Mycobacterium tuberculosis complex (MTBC) (tuberculosis pathogens) transmission. However, it is difficult to standardize and, therefore, is not yet employed for interlaboratory prospective surveillance. To allow its widespread application, solutions for data standardization and storage in an easily expandable database are urgently needed. To address this question, we developed a core genome multilocus sequence typing (cgMLST) scheme for clinical MTBC isolates using the Ridom SeqSphere(+) software, which transfers the genome-wide single nucleotide polymorphism (SNP) diversity into an allele numbering system that is standardized, portable, and not computationally intensive. To test its performance, we performed WGS analysis of 26 isolates with identical IS6110 DNA fingerprints and spoligotyping patterns from a longitudinal outbreak in the federal state of Hamburg, Germany (notified between 2001 and 2010). The cgMLST approach (3,041 genes) discriminated the 26 strains with a resolution comparable to that of SNP-based WGS typing (one major cluster of 22 identical or closely related and four outlier isolates with at least 97 distinct SNPs or 63 allelic variants). Resulting tree topologies are highly congruent and grouped the isolates in both cases analogously. Our data show that SNP- and cgMLST-based WGS analyses facilitate high-resolution discrimination of longitudinal MTBC outbreaks. cgMLST allows for a meaningful epidemiological interpretation of the WGS genotyping data. It enables standardized WGS genotyping for epidemiological investigations, e.g., on the regional public health office level, and the creation of web-accessible databases for global TB surveillance with an integrated early warning system. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  13. The Genome-Based Metabolic Systems Engineering to Boost Levan Production in a Halophilic Bacterial Model.

    Science.gov (United States)

    Aydin, Busra; Ozer, Tugba; Oner, Ebru Toksoy; Arga, Kazim Yalcin

    2018-03-01

    Metabolic systems engineering is being used to redirect microbial metabolism for the overproduction of chemicals of interest with the aim of transforming microbial hosts into cellular factories. In this study, a genome-based metabolic systems engineering approach was designed and performed to improve biopolymer biosynthesis capability of a moderately halophilic bacterium Halomonas smyrnensis AAD6 T producing levan, which is a fructose homopolymer with many potential uses in various industries and medicine. For this purpose, the genome-scale metabolic model for AAD6 T was used to characterize the metabolic resource allocation, specifically to design metabolic engineering strategies for engineered bacteria with enhanced levan production capability. Simulations were performed in silico to determine optimal gene knockout strategies to develop new strains with enhanced levan production capability. The majority of the gene knockout strategies emphasized the vital role of the fructose uptake mechanism, and pointed out the fructose-specific phosphotransferase system (PTS fru ) as the most promising target for further metabolic engineering studies. Therefore, the PTS fru of AAD6 T was restructured with insertional mutagenesis and triparental mating techniques to construct a novel, engineered H. smyrnensis strain, BMA14. Fermentation experiments were carried out to demonstrate the high efficiency of the mutant strain BMA14 in terms of final levan concentration, sucrose consumption rate, and sucrose conversion efficiency, when compared to the AAD6 T . The genome-based metabolic systems engineering approach presented in this study might be considered an efficient framework to redirect microbial metabolism for the overproduction of chemicals of interest, and the novel strain BMA14 might be considered a potential microbial cell factory for further studies aimed to design levan production processes with lower production costs.

  14. Context based computational analysis and characterization of ARS consensus sequences (ACS of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Full Text Available Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS requires an essential consensus sequence (ACS for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC denoted as ORC-ACS and non-replicating ACS sequences (nrACS, that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  15. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    Science.gov (United States)

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  16. Non-viral delivery systems for CRISPR/Cas9-based genome editing: Challenges and opportunities.

    Science.gov (United States)

    Li, Ling; Hu, Shuo; Chen, Xiaoyuan

    2018-07-01

    In recent years, CRISPR (clustered regularly interspaced short palindromic repeat)/Cas (CRISPR-associated) genome editing systems have become one of the most robust platforms in basic biomedical research and therapeutic applications. To date, efficient in vivo delivery of the CRISPR/Cas9 system to the targeted cells remains a challenge. Although viral vectors have been widely used in the delivery of the CRISPR/Cas9 system in vitro and in vivo, their fundamental shortcomings, such as the risk of carcinogenesis, limited insertion size, immune responses and difficulty in large-scale production, severely limit their further applications. Alternative non-viral delivery systems for CRISPR/Cas9 are urgently needed. With the rapid development of non-viral vectors, lipid- or polymer-based nanocarriers have shown great potential for CRISPR/Cas9 delivery. In this review, we analyze the pros and cons of delivering CRISPR/Cas9 systems in the form of plasmid, mRNA, or protein and then discuss the limitations and challenges of CRISPR/Cas9-based genome editing. Furthermore, current non-viral vectors that have been applied for CRISPR/Cas9 delivery in vitro and in vivo are outlined in details. Finally, critical obstacles for non-viral delivery of CRISPR/Cas9 system are highlighted and promising strategies to overcome these barriers are proposed. Published by Elsevier Ltd.

  17. A multi-objective constraint-based approach for modeling genome-scale microbial ecosystems.

    Science.gov (United States)

    Budinich, Marko; Bourdon, Jérémie; Larhlimi, Abdelhalim; Eveillard, Damien

    2017-01-01

    Interplay within microbial communities impacts ecosystems on several scales, and elucidation of the consequent effects is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs) for simulating single-strain metabolic networks, we sought to study microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. Following models will provide insights about behaviors (including diversity) that take place at the ecosystem scale.

  18. A multi-objective constraint-based approach for modeling genome-scale microbial ecosystems.

    Directory of Open Access Journals (Sweden)

    Marko Budinich

    Full Text Available Interplay within microbial communities impacts ecosystems on several scales, and elucidation of the consequent effects is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs for simulating single-strain metabolic networks, we sought to study microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA and multi-objective flux variability analysis (MO-FVA. Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. Following models will provide insights about behaviors (including diversity that take place at the ecosystem scale.

  19. Genome profiling (GP method based classification of insects: congruence with that of classical phenotype-based one.

    Directory of Open Access Journals (Sweden)

    Shamim Ahmed

    Full Text Available Ribosomal RNAs have been widely used for identification and classification of species, and have produced data giving new insights into phylogenetic relationships. Recently, multilocus genotyping and even whole genome sequencing-based technologies have been adopted in ambitious comparative biology studies. However, such technologies are still far from routine-use in species classification studies due to their high costs in terms of labor, equipment and consumables.Here, we describe a simple and powerful approach for species classification called genome profiling (GP. The GP method composed of random PCR, temperature gradient gel electrophoresis (TGGE and computer-aided gel image processing is highly informative and less laborious. For demonstration, we classified 26 species of insects using GP and 18S rDNA-sequencing approaches. The GP method was found to give a better correspondence to the classical phenotype-based approach than did 18S rDNA sequencing employing a congruence value. To our surprise, use of a single probe in GP was sufficient to identify the relationships between the insect species, making this approach more straightforward.The data gathered here, together with those of previous studies show that GP is a simple and powerful method that can be applied for actually universally identifying and classifying species. The current success supported our previous proposal that GP-based web database can be constructible and effective for the global identification/classification of species.

  20. The SGC beyond structural genomics: redefining the role of 3D structures by coupling genomic stratification with fragment-based discovery.

    Science.gov (United States)

    Bradley, Anthony R; Echalier, Aude; Fairhead, Michael; Strain-Damerell, Claire; Brennan, Paul; Bullock, Alex N; Burgess-Brown, Nicola A; Carpenter, Elisabeth P; Gileadi, Opher; Marsden, Brian D; Lee, Wen Hwa; Yue, Wyatt; Bountra, Chas; von Delft, Frank

    2017-11-08

    The ongoing explosion in genomics data has long since outpaced the capacity of conventional biochemical methodology to verify the large number of hypotheses that emerge from the analysis of such data. In contrast, it is still a gold-standard for early phenotypic validation towards small-molecule drug discovery to use probe molecules (or tool compounds), notwithstanding the difficulty and cost of generating them. Rational structure-based approaches to ligand discovery have long promised the efficiencies needed to close this divergence; in practice, however, this promise remains largely unfulfilled, for a host of well-rehearsed reasons and despite the huge technical advances spearheaded by the structural genomics initiatives of the noughties. Therefore the current, fourth funding phase of the Structural Genomics Consortium (SGC), building on its extensive experience in structural biology of novel targets and design of protein inhibitors, seeks to redefine what it means to do structural biology for drug discovery. We developed the concept of a Target Enabling Package (TEP) that provides, through reagents, assays and data, the missing link between genetic disease linkage and the development of usefully potent compounds. There are multiple prongs to the ambition: rigorously assessing targets' genetic disease linkages through crowdsourcing to a network of collaborating experts; establishing a systematic approach to generate the protocols and data that comprise each target's TEP; developing new, X-ray-based fragment technologies for generating high quality chemical matter quickly and cheaply; and exploiting a stringently open access model to build multidisciplinary partnerships throughout academia and industry. By learning how to scale these approaches, the SGC aims to make structures finally serve genomics, as originally intended, and demonstrate how 3D structures systematically allow new modes of druggability to be discovered for whole classes of targets. © 2017 The

  1. A plant-based chemical genomics screen for the identification of flowering inducers.

    Science.gov (United States)

    Fiers, Martijn; Hoogenboom, Jorin; Brunazzi, Alice; Wennekes, Tom; Angenent, Gerco C; Immink, Richard G H

    2017-01-01

    Floral timing is a carefully regulated process, in which the plant determines the optimal moment to switch from the vegetative to reproductive phase. While there are numerous genes known that control flowering time, little information is available on chemical compounds that are able to influence this process. We aimed to discover novel compounds that are able to induce flowering in the model plant Arabidopsis. For this purpose we developed a plant-based screening platform that can be used in a chemical genomics study. Here we describe the set-up of the screening platform and various issues and pitfalls that need to be addressed in order to perform a chemical genomics screening on Arabidopsis plantlets. We describe the choice for a molecular marker, in combination with a sensitive reporter that's active in plants and is sufficiently sensitive for detection. In this particular screen, the firefly Luciferase marker was used, fused to the regulatory sequences of the floral meristem identity gene APETALA1 (AP1) , which is an early marker for flowering. Using this screening platform almost 9000 compounds were screened, in triplicate, in 96-well plates at a concentration of 25 µM. One of the identified potential flowering inducing compounds was studied in more detail and named Flowering1 (F1). F1 turned out to be an analogue of the plant hormone Salicylic acid (SA) and appeared to be more potent than SA in the induction of flowering. The effect could be confirmed by watering Arabidopsis plants with SA or F1, in which F1 gave a significant reduction in time to flowering in comparison to SA treatment or the control. In this study a chemical genomics screening platform was developed to discover compounds that can induce flowering in Arabidopsis. This platform was used successfully, to identify a compound that can speed-up flowering in Arabidopsis.

  2. Ethnobotany genomics - discovery and innovation in a new era of exploratory research

    Directory of Open Access Journals (Sweden)

    Ragupathy Subramanyam

    2010-01-01

    Full Text Available Abstract We present here the first use of DNA barcoding in a new approach to ethnobotany we coined "ethnobotany genomics". This new approach is founded on the concept of 'assemblage' of biodiversity knowledge, which includes a coming together of different ways of knowing and valorizing species variation in a novel approach seeking to add value to both traditional knowledge (TK and scientific knowledge (SK. We employed contemporary genomic technology, DNA barcoding, as an important tool for identifying cryptic species, which were already recognized ethnotaxa using the TK classification systems of local cultures in the Velliangiri Hills of India. This research is based on several case studies in our lab, which define an approach to that is poised to evolve quickly with the advent of new ideas and technology. Our results show that DNA barcoding validated several new cryptic plant species to science that were previously recognized by TK classifications of the Irulas and Malasars, and were lumped using SK classification. The contribution of the local aboriginal knowledge concerning plant diversity and utility in India is considerable; our study presents new ethnomedicine to science. Ethnobotany genomics can also be used to determine the distribution of rare species and their ecological requirements, including traditional ecological knowledge so that conservation strategies can be implemented. This is aligned with the Convention on Biological Diversity that was signed by over 150 nations, and thus the world's complex array of human-natural-technological relationships has effectively been re-organized.

  3. Ethnobotany genomics - discovery and innovation in a new era of exploratory research.

    Science.gov (United States)

    Newmaster, Steven G; Ragupathy, Subramanyam

    2010-01-26

    We present here the first use of DNA barcoding in a new approach to ethnobotany we coined "ethnobotany genomics". This new approach is founded on the concept of 'assemblage' of biodiversity knowledge, which includes a coming together of different ways of knowing and valorizing species variation in a novel approach seeking to add value to both traditional knowledge (TK) and scientific knowledge (SK). We employed contemporary genomic technology, DNA barcoding, as an important tool for identifying cryptic species, which were already recognized ethnotaxa using the TK classification systems of local cultures in the Velliangiri Hills of India. This research is based on several case studies in our lab, which define an approach to that is poised to evolve quickly with the advent of new ideas and technology. Our results show that DNA barcoding validated several new cryptic plant species to science that were previously recognized by TK classifications of the Irulas and Malasars, and were lumped using SK classification. The contribution of the local aboriginal knowledge concerning plant diversity and utility in India is considerable; our study presents new ethnomedicine to science. Ethnobotany genomics can also be used to determine the distribution of rare species and their ecological requirements, including traditional ecological knowledge so that conservation strategies can be implemented. This is aligned with the Convention on Biological Diversity that was signed by over 150 nations, and thus the world's complex array of human-natural-technological relationships has effectively been re-organized.

  4. Microbial Genome Analysis and Comparisons: Web-based Protocols and Resources

    Science.gov (United States)

    Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...

  5. Genomic prediction based on data from three layer lines: a comparison between linear methods

    NARCIS (Netherlands)

    Calus, M.P.L.; Huang, H.; Vereijken, J.; Visscher, J.; Napel, ten J.; Windig, J.J.

    2014-01-01

    Background The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction. Methods Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we

  6. CGI: Java software for mapping and visualizing data from array-based comparative genomic hybridization and expression profiling.

    Science.gov (United States)

    Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H; Lau, Ching C; Behl, Sanjiv; Man, Tsz-Kwong

    2007-10-06

    With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.

  7. CGI: Java Software for Mapping and Visualizing Data from Array-based Comparative Genomic Hybridization and Expression Profiling

    Directory of Open Access Journals (Sweden)

    Joyce Xiuweu-Xu Gu

    2007-01-01

    Full Text Available With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator that matches the BAC clones from array-based comparative genomic hybridization (aCGH to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specifi c BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.

  8. Integrate genome-based assessment of safety for probiotic strains: Bacillus coagulans GBI-30, 6086 as a case study.

    Science.gov (United States)

    Salvetti, Elisa; Orrù, Luigi; Capozzi, Vittorio; Martina, Alessia; Lamontanara, Antonella; Keller, David; Cash, Howard; Felis, Giovanna E; Cattivelli, Luigi; Torriani, Sandra; Spano, Giuseppe

    2016-05-01

    Probiotics are microorganisms that confer beneficial effects on the host; nevertheless, before being allowed for human consumption, their safety must be verified with accurate protocols. In the genomic era, such procedures should take into account the genomic-based approaches. This study aims at assessing the safety traits of Bacillus coagulans GBI-30, 6086 integrating the most updated genomics-based procedures and conventional phenotypic assays. Special attention was paid to putative virulence factors (VF), antibiotic resistance (AR) genes and genes encoding enzymes responsible for harmful metabolites (i.e. biogenic amines, BAs). This probiotic strain was phenotypically resistant to streptomycin and kanamycin, although the genome analysis suggested that the AR-related genes were not easily transferrable to other bacteria, and no other genes with potential safety risks, such as those related to VF or BA production, were retrieved. Furthermore, no unstable elements that could potentially lead to genomic rearrangements were detected. Moreover, a workflow is proposed to allow the proper taxonomic identification of a microbial strain and the accurate evaluation of risk-related gene traits, combining whole genome sequencing analysis with updated bioinformatics tools and standard phenotypic assays. The workflow presented can be generalized as a guideline for the safety investigation of novel probiotic strains to help stakeholders (from scientists to manufacturers and consumers) to meet regulatory requirements and avoid misleading information.

  9. A web accessible resource for investigating cassava phenomics and genomics information: BIOGEN BASE.

    Science.gov (United States)

    Jayakodi, Murukarthick; Selvan, Sreedevi Ghokhilamani; Natesan, Senthil; Muthurajan, Raveendran; Duraisamy, Raghu; Ramineni, Jana Jeevan; Rathinasamy, Sakthi Ambothi; Karuppusamy, Nageswari; Lakshmanan, Pugalenthi; Chokkappan, Mohan

    2011-01-01

    The goal of our research is to establish a unique portal to bring out the potential outcome of the research in the Casssava crop. The Biogen base for cassava clearly brings out the variations of different traits of the germplasms, maintained at the Tapioca and Castor Research Station, Tamil Nadu Agricultural University. Phenotypic and genotypic variations of the accessions are clearly depicted, for the users to browse and interpret the variations using the microsatellite markers. Database (BIOGEN BASE - CASSAVA) is designed using PHP and MySQL and is equipped with extensive search options. It is more user-friendly and made publicly available, to improve the research and development of cassava by making a wealth of genetics and genomics data available through open, common, and worldwide forum for all individuals interested in the field. The database is available for free at http://www.tnaugenomics.com/biogenbase/casava.php.

  10. Genomic Characterization of Flavobacterium psychrophilum Serotypes and Development of a Multiplex PCR-Based Serotyping Scheme

    Directory of Open Access Journals (Sweden)

    Tatiana Rochat

    2017-09-01

    Full Text Available Flavobacterium psychrophilum is a devastating bacterial pathogen of salmonids reared in freshwater worldwide. So far, serological diversity between isolates has been described but the underlying molecular factors remain unknown. By combining complete genome sequence analysis and the serotyping method proposed by Lorenzen and Olesen (1997 for a set of 34 strains, we identified key molecular determinants of the serotypes. This knowledge allowed us to develop a robust multiplex PCR-based serotyping scheme, which was applied to 244 bacterial isolates. The results revealed a striking association between PCR-serotype and fish host species and illustrate the use of this approach as a simple and cost-effective method for the determination of F. psychrophilum serogroups. PCR-based serotyping could be a useful tool in a range of applications such as disease surveillance, selection of salmonids for bacterial coldwater disease resistance and future vaccine formulation.

  11. NGS-based approach to determine the presence of HPV and their sites of integration in human cancer genome.

    Science.gov (United States)

    Chandrani, P; Kulkarni, V; Iyer, P; Upadhyay, P; Chaubal, R; Das, P; Mulherkar, R; Singh, R; Dutt, A

    2015-06-09

    Human papilloma virus (HPV) accounts for the most common cause of all virus-associated human cancers. Here, we describe the first graphic user interface (GUI)-based automated tool 'HPVDetector', for non-computational biologists, exclusively for detection and annotation of the HPV genome based on next-generation sequencing data sets. We developed a custom-made reference genome that comprises of human chromosomes along with annotated genome of 143 HPV types as pseudochromosomes. The tool runs on a dual mode as defined by the user: a 'quick mode' to identify presence of HPV types and an 'integration mode' to determine genomic location for the site of integration. The input data can be a paired-end whole-exome, whole-genome or whole-transcriptome data set. The HPVDetector is available in public domain for download: http://www.actrec.gov.in/pi-webpages/AmitDutt/HPVdetector/HPVDetector.html. On the basis of our evaluation of 116 whole-exome, 23 whole-transcriptome and 2 whole-genome data, we were able to identify presence of HPV in 20 exomes and 4 transcriptomes of cervical and head and neck cancer tumour samples. Using the inbuilt annotation module of HPVDetector, we found predominant integration of viral gene E7, a known oncogene, at known 17q21, 3q27, 7q35, Xq28 and novel sites of integration in the human genome. Furthermore, co-infection with high-risk HPVs such as 16 and 31 were found to be mutually exclusive compared with low-risk HPV71. HPVDetector is a simple yet precise and robust tool for detecting HPV from tumour samples using variety of next-generation sequencing platforms including whole genome, whole exome and transcriptome. Two different modes (quick detection and integration mode) along with a GUI widen the usability of HPVDetector for biologists and clinicians with minimal computational knowledge.

  12. The cost-benefit of genomic testing of heifers and using sexed semen in pasture-based dairy herds.

    Science.gov (United States)

    Newton, J E; Hayes, B J; Pryce, J E

    2018-04-25

    Recent improvements in dairy cow fertility and female reproductive technologies offer an opportunity to apply greater selection pressure to females. This means there may be greater incentive to obtain genomic breeding values for females. We modeled the impact of changes to key parameters on the net benefit from genomic testing of heifer calves with and without usage of sexed semen. This paper builds on earlier cost-benefit studies but uses parameters relevant to pasture-based systems. A deterministic model was used to evaluate the effect on net benefit due to changes in (1) reproduction rate, (2) genomic test costs, (3) availability of parent-derived breeding values (EBV PA ), and (4) replacement rate. When the use of sexed semen was included, we also considered (1) the proportion of heifers and cows mated to sexed semen, (2) decreases in conception rate in inseminations with sexed semen, and (3) the marginal return for surplus heifers. Scenarios with lower replacement rates and no availability of EBV PA had the largest net benefits. Under current Australian parameters, the net benefit of genomic testing realized over the lifetime of genotyped heifers is expected to range from A$204 to A$1,124 per 100 cows for a herd with median reproductive performance. The cost of a genomic test, a perceived barrier to many farmers, had only a small effect on net benefit. Genomic testing alone was always more profitable than using sexed semen and genomic testing together if the only benefit considered was increased genetic gain in heifer replacements. When other benefits (i.e., the higher sale price of a surplus heifer compared with a male calf) were considered, there were combinations of parameters where net benefit from using sexed semen and genomic testing was higher than the equivalent scenario with genomic testing only. Using sexed semen alongside genomic testing is most likely to be profitable when (1) used in heifers, (2) the marginal return for selling surplus heifers

  13. A first generation BAC-based physical map of the rainbow trout genome

    Directory of Open Access Journals (Sweden)

    Thorgaard Gary H

    2009-10-01

    Full Text Available Abstract Background Rainbow trout (Oncorhynchus mykiss are the most-widely cultivated cold freshwater fish in the world and an important model species for many research areas. Coupling great interest in this species as a research model with the need for genetic improvement of aquaculture production efficiency traits justifies the continued development of genomics research resources. Many quantitative trait loci (QTL have been identified for production and life-history traits in rainbow trout. A bacterial artificial chromosome (BAC physical map is needed to facilitate fine mapping of QTL and the selection of positional candidate genes for incorporation in marker-assisted selection (MAS for improving rainbow trout aquaculture production. This resource will also facilitate efforts to obtain and assemble a whole-genome reference sequence for this species. Results The physical map was constructed from DNA fingerprinting of 192,096 BAC clones using the 4-color high-information content fingerprinting (HICF method. The clones were assembled into physical map contigs using the finger-printing contig (FPC program. The map is composed of 4,173 contigs and 9,379 singletons. The total number of unique fingerprinting fragments (consensus bands in contigs is 1,185,157, which corresponds to an estimated physical length of 2.0 Gb. The map assembly was validated by 1 comparison with probe hybridization results and agarose gel fingerprinting contigs; and 2 anchoring large contigs to the microsatellite-based genetic linkage map. Conclusion The production and validation of the first BAC physical map of the rainbow trout genome is described in this paper. We are currently integrating this map with the NCCCWA genetic map using more than 200 microsatellites isolated from BAC end sequences and by identifying BACs that harbor more than 300 previously mapped markers. The availability of an integrated physical and genetic map will enable detailed comparative genome

  14. Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.

    Science.gov (United States)

    Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul

    2016-07-01

    Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipase enzymes from a variety of microorganisms with particular catalytic properties to be used for extensive applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of the versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. The perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanism and to discover novel lipid-degrading enzymes of microorganisms are discussed.

  15. Novel extraction strategy of ribosomal RNA and genomic DNA from cheese for PCR-based investigations.

    Science.gov (United States)

    Bonaïti, Catherine; Parayre, Sandrine; Irlinger, Françoise

    2006-03-15

    Cheese microorganisms, such as bacteria and fungi, constitute a complex ecosystem that plays a central role in cheeses ripening. The molecular study of cheese microbial diversity and activity is essential but the extraction of high quality nucleic acid may be problematic: the cheese samples are characterised by a strong buffering capacity which negatively influenced the yield of the extracted rRNA. The objective of this study is to develop an effective method for the direct and simultaneous isolation of yeast and bacterial ribosomal RNA and genomic DNA from the same cheese samples. DNA isolation was based on a protocol used for nucleic acids isolation from anaerobic digestor, without preliminary washing step with the combined use of the action of chaotropic agent (acid guanidinium thiocyanate), detergents (SDS, N-lauroylsarcosine), chelating agent (EDTA) and a mechanical method (bead beating system). The DNA purification was carried out by two washing steps of phenol-chloroform. RNA was isolated successfully after the second acid extraction step by recovering it from the phenolic phase of the first acid extraction. The novel method yielded pure preparation of undegraded RNA accessible for reverse transcription-PCR. The extraction protocol of genomic DNA and rRNA was applicable to complex ecosystem of different cheese matrices.

  16. Lessons Learned From A Study Of Genomics-Based Carrier Screening For Reproductive Decision Making.

    Science.gov (United States)

    Wilfond, Benjamin S; Kauffman, Tia L; Jarvik, Gail P; Reiss, Jacob A; Richards, C Sue; McMullen, Carmit; Gilmore, Marian; Himes, Patricia; Kraft, Stephanie A; Porter, Kathryn M; Schneider, Jennifer L; Punj, Sumit; Leo, Michael C; Dickerson, John F; Lynch, Frances L; Clarke, Elizabeth; Rope, Alan F; Lutz, Kevin; Goddard, Katrina A B

    2018-05-01

    Genomics-based carrier screening is one of many opportunities to use genomic information to inform medical decision making, but clinicians, health care delivery systems, and payers need to determine whether to offer screening and how to do so in an efficient, ethical way. To shed light on this issue, we conducted a study in the period 2014-17 to inform the design of clinical screening programs and guide further health services research. Many of our results have been published elsewhere; this article summarizes the lessons we learned from that study and offers policy insights. Our experience can inform understanding of the potential impact of expanded carrier screening services on health system workflows and workforces-impacts that depend on the details of the screening approach. We found limited patient or health system harms from expanded screening. We also found that some patients valued the information they learned from the process. Future policy discussions should consider the value of offering such expanded carrier screening in health delivery systems with limited resources.

  17. RNAi-Based Functional Genomics Identifies New Virulence Determinants in Mucormycosis.

    Directory of Open Access Journals (Sweden)

    Trung Anh Trieu

    2017-01-01

    Full Text Available Mucorales are an emerging group of human pathogens that are responsible for the lethal disease mucormycosis. Unfortunately, functional studies on the genetic factors behind the virulence of these organisms are hampered by their limited genetic tractability, since they are reluctant to classical genetic tools like transposable elements or gene mapping. Here, we describe an RNAi-based functional genomic platform that allows the identification of new virulence factors through a forward genetic approach firstly described in Mucorales. This platform contains a whole-genome collection of Mucor circinelloides silenced transformants that presented a broad assortment of phenotypes related to the main physiological processes in fungi, including virulence, hyphae morphology, mycelial and yeast growth, carotenogenesis and asexual sporulation. Selection of transformants with reduced virulence allowed the identification of mcplD, which encodes a Phospholipase D, and mcmyo5, encoding a probably essential cargo transporter of the Myosin V family, as required for a fully virulent phenotype of M. circinelloides. Knock-out mutants for those genes showed reduced virulence in both Galleria mellonella and Mus musculus models, probably due to a delayed germination and polarized growth within macrophages. This study provides a robust approach to study virulence in Mucorales and as a proof of concept identified new virulence determinants in M. circinelloides that could represent promising targets for future antifungal therapies.

  18. Population-based statistical inference for temporal sequence of somatic mutations in cancer genomes.

    Science.gov (United States)

    Rhee, Je-Keun; Kim, Tae-Min

    2018-04-20

    It is well recognized that accumulation of somatic mutations in cancer genomes plays a role in carcinogenesis; however, the temporal sequence and evolutionary relationship of somatic mutations remain largely unknown. In this study, we built a population-based statistical framework to infer the temporal sequence of acquisition of somatic mutations. Using the model, we analyzed the mutation profiles of 1954 tumor specimens across eight tumor types. As a result, we identified tumor type-specific directed networks composed of 2-15 cancer-related genes (nodes) and their mutational orders (edges). The most common ancestors identified in pairwise comparison of somatic mutations were TP53 mutations in breast, head/neck, and lung cancers. The known relationship of KRAS to TP53 mutations in colorectal cancers was identified, as well as potential ancestors of TP53 mutation such as NOTCH1, EGFR, and PTEN mutations in head/neck, lung and endometrial cancers, respectively. We also identified apoptosis-related genes enriched with ancestor mutations in lung cancers and a relationship between APC hotspot mutations and TP53 mutations in colorectal cancers. While evolutionary analysis of cancers has focused on clonal versus subclonal mutations identified in individual genomes, our analysis aims to further discriminate ancestor versus descendant mutations in population-scale mutation profiles that may help select cancer drivers with clinical relevance.

  19. A preliminary survey of M. hyopneumoniae virulence factors based on comparative genomic analysis

    Directory of Open Access Journals (Sweden)

    Henrique Bunselmeyer Ferreira

    2007-01-01

    Full Text Available Mycoplasma hyopneumoniae is the etiological agent of porcine enzootic pneumonia (PEP, a major problem for the pig industry. The mechanisms of M. hyopneumoniae pathogenicity allow to predict the existence of several classes of virulence factors, whose study has been essentially restricted to the characterization of adhesion-related and major antigenic proteins. The now available complete sequences of the genomes of two pathogenic and one non-pathogenic strain of M. hyopneumoniae allowed to use a comparative genomics approach to putatively identify virulence genes. In this preliminary survey, we were able to identify 118 CDSs encoding putative virulence factors, based on specific criteria ranging from predicted cell surface location or variation between strains to previous functional studies showing antigenicity or involvement in host-pathogen interaction. This survey is expected to serve as a first step towards the functional characterization of new virulence genes/proteins that will be important not only for a better comprehension of M. hyopneumoniae biology, but also for the development of new and improved protocols for PEP vaccination, diagnosis and treatment.

  20. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  1. Public health and valorization of genome-based technologies: a new model.

    Science.gov (United States)

    Lal, Jonathan A; Schulte In den Bäumen, Tobias; Morré, Servaas A; Brand, Angela

    2011-12-05

    The success rate of timely translation of genome-based technologies to commercially feasible products/services with applicability in health care systems is significantly low. We identified both industry and scientists neglect health policy aspects when commercializing their technology, more specifically, Public Health Assessment Tools (PHAT) and early on involvement of decision makers through which market authorization and reimbursements are dependent. While Technology Transfer (TT) aims to facilitate translation of ideas into products, Health Technology Assessment, one component of PHAT, for example, facilitates translation of products/processes into healthcare services and eventually comes up with recommendations for decision makers. We aim to propose a new model of valorization to optimize integration of genome-based technologies into the healthcare system. The method used to develop our model is an adapted version of the Fish Trap Model and the Basic Design Cycle. We found although different, similarities exist between TT and PHAT. Realizing the potential of being mutually beneficial justified our proposal of their relative parallel initiation. We observed that the Public Health Genomics Wheel should be included in this relative parallel activity to ensure all societal/policy aspects are dealt with preemptively by both stakeholders. On further analysis, we found out this whole process is dependent on the Value of Information. As a result, we present our LAL (Learning Adapting Leveling) model which proposes, based on market demand; TT and PHAT by consultation/bi-lateral communication should advocate for relevant technologies. This can be achieved by public-private partnerships (PPPs). These widely defined PPPs create the innovation network which is a developing, consultative/collaborative-networking platform between TT and PHAT. This network has iterations and requires learning, assimilating and using knowledge developed and is called absorption capacity. We

  2. Comprehensive cytological characterization of the Gossypium hirsutum genome based on the development of a set of chromosome cytological markers

    Directory of Open Access Journals (Sweden)

    Wenbo Shan

    2016-08-01

    Full Text Available Cotton is the world's most important natural fiber crop. It is also a model system for studying polyploidization, genomic organization, and genome-size variation. Integrating the cytological characterization of cotton with its genetic map will be essential for understanding its genome structure and evolution, as well as for performing further genetic-map based mapping and cloning. In this study, we isolated a complete set of bacterial artificial chromosome clones anchored to each of the 52 chromosome arms of the tetraploid cotton Gossypium hirsutum. Combining these with telomere and centromere markers, we constructed a standard karyotype for the G. hirsutum inbred line TM-1. We dissected the chromosome arm localizations of the 45S and 5S rDNA and suggest a centromere repositioning event in the homoeologous chromosomes AT09 and DT09. By integrating a systematic karyotype analysis with the genetic linkage map, we observed different genome sizes and chromosomal structures between the subgenomes of the tetraploid cotton and those of its diploid ancestors. Using evidence of conserved coding sequences, we suggest that the different evolutionary paths of non-coding retrotransposons account for most of the variation in size between the subgenomes of tetraploid cotton and its diploid ancestors. These results provide insights into the cotton genome and will facilitate further genome studies in G. hirsutum.

  3. Comprehensive cytological characterization of the Gossypium hirsutum genome based on the development of a set of chromosome cytological markers

    Institute of Scientific and Technical Information of China (English)

    Wenbo; Shan; Yanqin; Jiang; Jinlei; Han; Kai; Wang

    2016-01-01

    Cotton is the world’s most important natural fiber crop. It is also a model system for studying polyploidization, genomic organization, and genome-size variation. Integrating the cytological characterization of cotton with its genetic map will be essential for understanding its genome structure and evolution, as well as for performing further genetic-map based mapping and cloning. In this study, we isolated a complete set of bacterial artificial chromosome clones anchored to each of the 52 chromosome arms of the tetraploid cotton Gossypium hirsutum. Combining these with telomere and centromere markers, we constructed a standard karyotype for the G. hirsutum inbred line TM-1. We dissected the chromosome arm localizations of the 45 S and 5S r DNA and suggest a centromere repositioning event in the homoeologous chromosomes AT09 and DT09. By integrating a systematic karyotype analysis with the genetic linkage map, we observed different genome sizes and chromosomal structures between the subgenomes of the tetraploid cotton and those of its diploid ancestors. Using evidence of conserved coding sequences, we suggest that the different evolutionary paths of non-coding retrotransposons account for most of the variation in size between the subgenomes of tetraploid cotton and its diploid ancestors. These results provide insights into the cotton genome and will facilitate further genome studies in G. hirsutum.

  4. Genome-wide siRNA-based functional genomics of pigmentation identifies novel genes and pathways that impact melanogenesis in human cells.

    Directory of Open Access Journals (Sweden)

    Anand K Ganesan

    2008-12-01

    Full Text Available Melanin protects the skin and eyes from the harmful effects of UV irradiation, protects neural cells from toxic insults, and is required for sound conduction in the inner ear. Aberrant regulation of melanogenesis underlies skin disorders (melasma and vitiligo, neurologic disorders (Parkinson's disease, auditory disorders (Waardenburg's syndrome, and opthalmologic disorders (age related macular degeneration. Much of the core synthetic machinery driving melanin production has been identified; however, the spectrum of gene products participating in melanogenesis in different physiological niches is poorly understood. Functional genomics based on RNA-mediated interference (RNAi provides the opportunity to derive unbiased comprehensive collections of pharmaceutically tractable single gene targets supporting melanin production. In this study, we have combined a high-throughput, cell-based, one-well/one-gene screening platform with a genome-wide arrayed synthetic library of chemically synthesized, small interfering RNAs to identify novel biological pathways that govern melanin biogenesis in human melanocytes. Ninety-two novel genes that support pigment production were identified with a low false discovery rate. Secondary validation and preliminary mechanistic studies identified a large panel of targets that converge on tyrosinase expression and stability. Small molecule inhibition of a family of gene products in this class was sufficient to impair chronic tyrosinase expression in pigmented melanoma cells and UV-induced tyrosinase expression in primary melanocytes. Isolation of molecular machinery known to support autophagosome biosynthesis from this screen, together with in vitro and in vivo validation, exposed a close functional relationship between melanogenesis and autophagy. In summary, these studies illustrate the power of RNAi-based functional genomics to identify novel genes, pathways, and pharmacologic agents that impact a biological phenotype

  5. Herbarium genomics

    DEFF Research Database (Denmark)

    Bakker, Freek T.; Lei, Di; Yu, Jiaying

    2016-01-01

    Herbarium genomics is proving promising as next-generation sequencing approaches are well suited to deal with the usually fragmented nature of archival DNA. We show that routine assembly of partial plastome sequences from herbarium specimens is feasible, from total DNA extracts and with specimens...... up to 146 years old. We use genome skimming and an automated assembly pipeline, Iterative Organelle Genome Assembly, that assembles paired-end reads into a series of candidate assemblies, the best one of which is selected based on likelihood estimation. We used 93 specimens from 12 different...... correlation between plastome coverage and nuclear genome size (C value) in our samples, but the range of C values included is limited. Finally, we conclude that routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome...

  6. Using FlyBase, a Database of Drosophila Genes and Genomes.

    Science.gov (United States)

    Marygold, Steven J; Crosby, Madeline A; Goodman, Joshua L

    2016-01-01

    For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources.FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback.This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.

  7. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA.

    Science.gov (United States)

    Gibbons, Brian; Datta, Parikkhit; Wu, Ying; Chan, Alan; Al Armour, John

    2006-06-30

    Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH) we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A). Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  8. Origin of the Y genome in Elymus and its relationship to other genomes in Triticeae based on evidence from elongation factor G (EF-G) gene sequences.

    Science.gov (United States)

    Sun, Genlou; Komatsuda, Takao

    2010-08-01

    It is well known that Elymus arose through hybridization between representatives of different genera. Cytogenetic analyses show that all its members include the St genome in combination with one or more of four other genomes, the H, Y, P, and W genomes. The origins of the H, P, and W genomes are known, but not for the Y genome. We analyzed the single copy nuclear gene coding for elongation factor G (EF-G) from 28 accessions of polyploid Elymus species and 45 accessions of diploid Triticeae species in order to investigate origin of the Y genome and its relationship to other genomes in the tribe Triticeae. Sequence comparisons among the St, H, Y, P, W, and E genomes detected genome-specific polymorphisms at 66 nucleotide positions. The St and Y genomes are relatively dissimilar. The phylogeny of the Y genome sequences was investigated for the first time. They were most similar to the W genome sequences. The Y genome sequences were placed in two different groups. These two groups were included in an unresolved clade that included the W and E sequences as well as sequences from many annual species. The H genomes sequences were in a clade with the F, P, and Ns genome sequences as sister groups. These two clades were more closely related to each other and to the L and Xp genomes than they were to the St genome sequences. These data support the hypothesis that the Y genome evolved in a diploid species and has a different origin from the St genome. Copyright 2010 Elsevier Inc. All rights reserved.

  9. Calibration and analysis of genome-based models for microbial ecology.

    Science.gov (United States)

    Louca, Stilianos; Doebeli, Michael

    2015-10-16

    Microbial ecosystem modeling is complicated by the large number of unknown parameters and the lack of appropriate calibration tools. Here we present a novel computational framework for modeling microbial ecosystems, which combines genome-based model construction with statistical analysis and calibration to experimental data. Using this framework, we examined the dynamics of a community of Escherichia coli strains that emerged in laboratory evolution experiments, during which an ancestral strain diversified into two coexisting ecotypes. We constructed a microbial community model comprising the ancestral and the evolved strains, which we calibrated using separate monoculture experiments. Simulations reproduced the successional dynamics in the evolution experiments, and pathway activation patterns observed in microarray transcript profiles. Our approach yielded detailed insights into the metabolic processes that drove bacterial diversification, involving acetate cross-feeding and competition for organic carbon and oxygen. Our framework provides a missing link towards a data-driven mechanistic microbial ecology.

  10. Bayesian prediction of bacterial growth temperature range based on genome sequences

    DEFF Research Database (Denmark)

    Jensen, Dan Børge; Vesth, Tammi Camilla; Hallin, Peter Fischer

    2012-01-01

    Background: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this based...... on a genomic sequence, would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles...... that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naive Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic, mesophilic...

  11. Coalescent-based genome analyses resolve the early branches of the euarchontoglires.

    Directory of Open Access Journals (Sweden)

    Vikas Kumar

    Full Text Available Despite numerous large-scale phylogenomic studies, certain parts of the mammalian tree are extraordinarily difficult to resolve. We used the coding regions from 19 completely sequenced genomes to study the relationships within the super-clade Euarchontoglires (Primates, Rodentia, Lagomorpha, Dermoptera and Scandentia because the placement of Scandentia within this clade is controversial. The difficulty in resolving this issue is due to the short time spans between the early divergences of Euarchontoglires, which may cause incongruent gene trees. The conflict in the data can be depicted by network analyses and the contentious relationships are best reconstructed by coalescent-based analyses. This method is expected to be superior to analyses of concatenated data in reconstructing a species tree from numerous gene trees. The total concatenated dataset used to study the relationships in this group comprises 5,875 protein-coding genes (9,799,170 nucleotides from all orders except Dermoptera (flying lemurs. Reconstruction of the species tree from 1,006 gene trees using coalescent models placed Scandentia as sister group to the primates, which is in agreement with maximum likelihood analyses of concatenated nucleotide sequence data. Additionally, both analytical approaches favoured the Tarsier to be sister taxon to Anthropoidea, thus belonging to the Haplorrhine clade. When divergence times are short such as in radiations over periods of a few million years, even genome scale analyses struggle to resolve phylogenetic relationships. On these short branches processes such as incomplete lineage sorting and possibly hybridization occur and make it preferable to base phylogenomic analyses on coalescent methods.

  12. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment not mentioning the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA.

  13. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment not mentioning the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA. PMID:29511354

  14. GENOME-BASED MODELING AND DESIGN OF METABOLIC INTERACTIONS IN MICROBIAL COMMUNITIES

    Directory of Open Access Journals (Sweden)

    Radhakrishnan Mahadevan

    2012-10-01

    With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.

  15. Mitochondrial genome of the stonefly Kamimuria wangi (Plecoptera: Perlidae) and phylogenetic position of plecoptera based on mitogenomes.

    Science.gov (United States)

    Yu-Han, Qian; Hai-Yan, Wu; Xiao-Yu, Ji; Wei-Wei, Yu; Yu-Zhou, Du

    2014-01-01

    This study determined the mitochondrial genome sequence of the stonefly, Kamimuria wangi. In order to investigate the relatedness of stonefly to other members of Neoptera, a phylogenetic analysis was undertaken based on 13 protein-coding genes of mitochondrial genomes in 13 representative insects. The mitochondrial genome of the stonefly is a circular molecule consisting of 16,179 nucleotides and contains the 37 genes typically found in other insects. A 10-bp poly-T stretch was observed in the A+T-rich region of the K. wangi mitochondrial genome. Downstream of the poly-T stretch, two regions were located with potential ability to form stem-loop structures; these were designated stem-loop 1 (positions 15848-15651) and stem-loop 2 (15965-15998). The arrangement of genes and nucleotide composition of the K. wangi mitogenome are similar to those in Pteronarcys princeps, suggesting a conserved genome evolution within the Plecoptera. Phylogenetic analysis using maximum likelihood and Bayesian inference of 13 protein-coding genes supported a novel relationship between the Plecoptera and Ephemeroptera. The results contradict the existence of a monophyletic Plectoptera and Plecoptera as sister taxa to Embiidina, and thus requires further analyses with additional mitogenome sampling at the base of the Neoptera.

  16. Mitochondrial genome of the stonefly Kamimuria wangi (Plecoptera: Perlidae and phylogenetic position of plecoptera based on mitogenomes.

    Directory of Open Access Journals (Sweden)

    Qian Yu-Han

    Full Text Available This study determined the mitochondrial genome sequence of the stonefly, Kamimuria wangi. In order to investigate the relatedness of stonefly to other members of Neoptera, a phylogenetic analysis was undertaken based on 13 protein-coding genes of mitochondrial genomes in 13 representative insects. The mitochondrial genome of the stonefly is a circular molecule consisting of 16,179 nucleotides and contains the 37 genes typically found in other insects. A 10-bp poly-T stretch was observed in the A+T-rich region of the K. wangi mitochondrial genome. Downstream of the poly-T stretch, two regions were located with potential ability to form stem-loop structures; these were designated stem-loop 1 (positions 15848-15651 and stem-loop 2 (15965-15998. The arrangement of genes and nucleotide composition of the K. wangi mitogenome are similar to those in Pteronarcys princeps, suggesting a conserved genome evolution within the Plecoptera. Phylogenetic analysis using maximum likelihood and Bayesian inference of 13 protein-coding genes supported a novel relationship between the Plecoptera and Ephemeroptera. The results contradict the existence of a monophyletic Plectoptera and Plecoptera as sister taxa to Embiidina, and thus requires further analyses with additional mitogenome sampling at the base of the Neoptera.

  17. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  18. Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory.

    Science.gov (United States)

    Crisan, Anamaria; McKee, Geoffrey; Munzner, Tamara; Gardy, Jennifer L

    2018-01-01

    Microbial genome sequencing is now being routinely used in many clinical and public health laboratories. Understanding how to report complex genomic test results to stakeholders who may have varying familiarity with genomics-including clinicians, laboratorians, epidemiologists, and researchers-is critical to the successful and sustainable implementation of this new technology; however, there are no evidence-based guidelines for designing such a report in the pathogen genomics domain. Here, we describe an iterative, human-centered approach to creating a report template for communicating tuberculosis (TB) genomic test results. We used Design Study Methodology-a human centered approach drawn from the information visualization domain-to redesign an existing clinical report. We used expert consults and an online questionnaire to discover various stakeholders' needs around the types of data and tasks related to TB that they encounter in their daily workflow. We also evaluated their perceptions of and familiarity with genomic data, as well as its utility at various clinical decision points. These data shaped the design of multiple prototype reports that were compared against the existing report through a second online survey, with the resulting qualitative and quantitative data informing the final, redesigned, report. We recruited 78 participants, 65 of whom were clinicians, nurses, laboratorians, researchers, and epidemiologists involved in TB diagnosis, treatment, and/or surveillance. Our first survey indicated that participants were largely enthusiastic about genomic data, with the majority agreeing on its utility for certain TB diagnosis and treatment tasks and many reporting some confidence in their ability to interpret this type of data (between 58.8% and 94.1%, depending on the specific data type). When we compared our four prototype reports against the existing design, we found that for the majority (86.7%) of design comparisons, participants preferred the

  19. Explorative visual analytics on interval-based genomic data and their metadata.

    Science.gov (United States)

    Jalili, Vahid; Matteucci, Matteo; Masseroli, Marco; Ceri, Stefano

    2017-12-04

    With the wide-spreading of public repositories of NGS processed data, the availability of user-friendly and effective tools for data exploration, analysis and visualization is becoming very relevant. These tools enable interactive analytics, an exploratory approach for the seamless "sense-making" of data through on-the-fly integration of analysis and visualization phases, suggested not only for evaluating processing results, but also for designing and adapting NGS data analysis pipelines. This paper presents abstractions for supporting the early analysis of NGS processed data and their implementation in an associated tool, named GenoMetric Space Explorer (GeMSE). This tool serves the needs of the GenoMetric Query Language, an innovative cloud-based system for computing complex queries over heterogeneous processed data. It can also be used starting from any text files in standard BED, BroadPeak, NarrowPeak, GTF, or general tab-delimited format, containing numerical features of genomic regions; metadata can be provided as text files in tab-delimited attribute-value format. GeMSE allows interactive analytics, consisting of on-the-fly cycling among steps of data exploration, analysis and visualization that help biologists and bioinformaticians in making sense of heterogeneous genomic datasets. By means of an explorative interaction support, users can trace past activities and quickly recover their results, seamlessly going backward and forward in the analysis steps and comparative visualizations of heatmaps. GeMSE effective application and practical usefulness is demonstrated through significant use cases of biological interest. GeMSE is available at http://www.bioinformatics.deib.polimi.it/GeMSE/ , and its source code is available at https://github.com/Genometric/GeMSE under GPLv3 open-source license.

  20. Polymorphic Microsatellite Markers for the Tetrapolar Anther-Smut Fungus Microbotryum saponariae Based on Genome Sequencing

    Science.gov (United States)

    Fortuna, Taiadjana M.; Snirc, Alodie; Badouin, Hélène; Gouzy, Jérome; Siguenza, Sophie; Esquerre, Diane; Le Prieur, Stéphanie; Shykoff, Jacqui A.; Giraud, Tatiana

    2016-01-01

    Background Anther-smut fungi belonging to the genus Microbotryum sterilize their host plants by aborting ovaries and replacing pollen by fungal spores. Sibling Microbotryum species are highly specialized on their host plants and they have been widely used as models for studies of ecology and evolution of plant pathogenic fungi. However, most studies have focused, so far, on M. lychnidis-dioicae that parasitizes the white campion Silene latifolia. Microbotryum saponariae, parasitizing mainly Saponaria officinalis, is an interesting anther-smut fungus, since it belongs to a tetrapolar lineage (i.e., with two independently segregating mating-type loci), while most of the anther-smut Microbotryum fungi are bipolar (i.e., with a single mating-type locus). Saponaria officinalis is a widespread long-lived perennial plant species with multiple flowering stems, which makes its anther-smut pathogen a good model for studying phylogeography and within-host multiple infections. Principal Findings Here, based on a generated genome sequence of M. saponariae we developed 6 multiplexes with a total of 22 polymorphic microsatellite markers using an inexpensive and efficient method. We scored these markers in fungal individuals collected from 97 populations across Europe, and found that the number of their alleles ranged from 2 to 11, and their expected heterozygosity from 0.01 to 0.58. Cross-species amplification was examined using nine other Microbotryum species parasitizing hosts belonging to Silene, Dianthus and Knautia genera. All loci were successfully amplified in at least two other Microbotryum species. Significance These newly developed markers will provide insights into the population genetic structure and the occurrence of within-host multiple infections of M. saponariae. In addition, the draft genome of M. saponariae, as well as one of the described markers will be useful resources for studying the evolution of the breeding systems in the genus Microbotryum and the

  1. Population genomic analyses based on 1 million SNPs in commercial egg layers.

    Directory of Open Access Journals (Sweden)

    Mahmood Gholami

    Full Text Available Identifying signatures of selection can provide valuable insight about the genes or genomic regions that are or have been under selective pressure, which can lead to a better understanding of genotype-phenotype relationships. A common strategy for selection signature detection is to compare samples from several populations and search for genomic regions with outstanding genetic differentiation. Wright's fixation index, FST, is a useful index for evaluation of genetic differentiation between populations. The aim of this study was to detect selective signatures between different chicken groups based on SNP-wise FST calculation. A total of 96 individuals of three commercial layer breeds and 14 non-commercial fancy breeds were genotyped with three different 600K SNP-chips. After filtering a total of 1 million SNPs were available for FST calculation. Averages of FST values were calculated for overlapping windows. Comparisons of these were then conducted between commercial egg layers and non-commercial fancy breeds, as well as between white egg layers and brown egg layers. Comparing non-commercial and commercial breeds resulted in the detection of 630 selective signatures, while 656 selective signatures were detected in the comparison between the commercial egg-layer breeds. Annotation of selection signature regions revealed various genes corresponding to productions traits, for which layer breeds were selected. Among them were NCOA1, SREBF2 and RALGAPA1 associated with reproductive traits, broodiness and egg production. Furthermore, several of the detected genes were associated with growth and carcass traits, including POMC, PRKAB2, SPP1, IGF2, CAPN1, TGFb2 and IGFBP2. Our approach demonstrates that including different populations with a specific breeding history can provide a unique opportunity for a better understanding of farm animal selection.

  2. Common minor histocompatibility antigen discovery based upon patient clinical outcomes and genomic data.

    Directory of Open Access Journals (Sweden)

    Paul M Armistead

    Full Text Available Minor histocompatibility antigens (mHA mediate much of the graft vs. leukemia (GvL effect and graft vs. host disease (GvHD in patients who undergo allogeneic stem cell transplantation (SCT. Therapeutic decision making and treatments based upon mHAs will require the evaluation of multiple candidate mHAs and the selection of those with the potential to have the greatest impact on clinical outcomes. We hypothesized that common, immunodominant mHAs, which are presented by HLA-A, B, and C molecules, can mediate clinically significant GvL and/or GvHD, and that these mHAs can be identified through association of genomic data with clinical outcomes.Because most mHAs result from donor/recipient cSNP disparities, we genotyped 57 myeloid leukemia patients and their donors at 13,917 cSNPs. We correlated the frequency of genetically predicted mHA disparities with clinical evidence of an immune response and then computationally screened all peptides mapping to the highly associated cSNPs for their ability to bind to HLA molecules. As proof-of-concept, we analyzed one predicted antigen, T4A, whose mHA mismatch trended towards improved overall and disease free survival in our cohort. T4A mHA mismatches occurred at the maximum theoretical frequency for any given SCT. T4A-specific CD8+ T lymphocytes (CTLs were detected in 3 of 4 evaluable post-transplant patients predicted to have a T4A mismatch.Our method is the first to combine clinical outcomes data with genomics and bioinformatics methods to predict and confirm a mHA. Refinement of this method should enable the discovery of clinically relevant mHAs in the majority of transplant patients and possibly lead to novel immunotherapeutics.

  3. Genomic DNA-based absolute quantification of gene expression in Vitis.

    Science.gov (United States)

    Gambetta, Gregory A; McElrone, Andrew J; Matthews, Mark A

    2013-07-01

    Many studies in which gene expression is quantified by polymerase chain reaction represent the expression of a gene of interest (GOI) relative to that of a reference gene (RG). Relative expression is founded on the assumptions that RG expression is stable across samples, treatments, organs, etc., and that reaction efficiencies of the GOI and RG are equal; assumptions which are often faulty. The true variability in RG expression and actual reaction efficiencies are seldom determined experimentally. Here we present a rapid and robust method for absolute quantification of expression in Vitis where varying concentrations of genomic DNA were used to construct GOI standard curves. This methodology was utilized to absolutely quantify and determine the variability of the previously validated RG ubiquitin (VvUbi) across three test studies in three different tissues (roots, leaves and berries). In addition, in each study a GOI was absolutely quantified. Data sets resulting from relative and absolute methods of quantification were compared and the differences were striking. VvUbi expression was significantly different in magnitude between test studies and variable among individual samples. Absolute quantification consistently reduced the coefficients of variation of the GOIs by more than half, often resulting in differences in statistical significance and in some cases even changing the fundamental nature of the result. Utilizing genomic DNA-based absolute quantification is fast and efficient. Through eliminating error introduced by assuming RG stability and equal reaction efficiencies between the RG and GOI this methodology produces less variation, increased accuracy and greater statistical power. © 2012 Scandinavian Plant Physiology Society.

  4. Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies

    Science.gov (United States)

    Manitz, Juliane; Burger, Patricia; Amos, Christopher I.; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike

    2017-01-01

    The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility. PMID:28785300

  5. Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies.

    Science.gov (United States)

    Friedrichs, Stefanie; Manitz, Juliane; Burger, Patricia; Amos, Christopher I; Risch, Angela; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike; Hofner, Benjamin

    2017-01-01

    The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility.

  6. Use of Comparative Genomics-Based Markers for Discrimination of Host Specificity in Fusarium oxysporum.

    Science.gov (United States)

    van Dam, Peter; de Sain, Mara; Ter Horst, Anneliek; van der Gragt, Michelle; Rep, Martijn

    2018-01-01

    The polyphyletic nature of many formae speciales of Fusarium oxysporum prevents molecular identification of newly encountered strains based on conserved, vertically inherited genes. Alternative molecular detection methods that could replace labor- and time-intensive disease assays are therefore highly desired. Effectors are functional elements in the pathogen-host interaction and have been found to show very limited sequence diversity between strains of the same forma specialis , which makes them potential markers for host-specific pathogenicity. We therefore compared candidate effector genes extracted from 60 existing and 22 newly generated genome assemblies, specifically targeting strains affecting cucurbit plant species. Based on these candidate effector genes, a total of 18 PCR primer pairs were designed to discriminate between each of the seven Cucurbitaceae-affecting formae speciales When tested on a collection of strains encompassing different clonal lineages of these formae speciales , nonpathogenic strains, and strains of other formae speciales , they allowed clear recognition of the host range of each evaluated strain. Within Fusarium oxysporum f. sp. melonis more genetic variability exists than anticipated, resulting in three F. oxysporum f. sp. melonis marker patterns that partially overlapped with the cucurbit-infecting Fusarium oxysporum f. sp. cucumerinum , Fusarium oxysporum f. sp. niveum , Fusarium oxysporum f. sp. momordicae , and/or Fusarium oxysporum f. sp. lagenariae For F. oxysporum f. sp. niveum , a multiplex TaqMan assay was evaluated and was shown to allow quantitative and specific detection of template DNA quantities as low as 2.5 pg. These results provide ready-to-use marker sequences for the mentioned F. oxysporum pathogens. Additionally, the method can be applied to find markers distinguishing other host-specific forms of F. oxysporum IMPORTANCE Pathogenic strains of Fusarium oxysporum are differentiated into formae speciales based on

  7. BigQ: a NoSQL based framework to handle genomic variants in i2b2.

    Science.gov (United States)

    Gabetta, Matteo; Limongelli, Ivan; Rizzo, Ettore; Riva, Alberto; Segagni, Daniele; Bellazzi, Riccardo

    2015-12-29

    Precision medicine requires the tight integration of clinical and molecular data. To this end, it is mandatory to define proper technological solutions able to manage the overwhelming amount of high throughput genomic data needed to test associations between genomic signatures and human phenotypes. The i2b2 Center (Informatics for Integrating Biology and the Bedside) has developed a widely internationally adopted framework to use existing clinical data for discovery research that can help the definition of precision medicine interventions when coupled with genetic data. i2b2 can be significantly advanced by designing efficient management solutions of Next Generation Sequencing data. We developed BigQ, an extension of the i2b2 framework, which integrates patient clinical phenotypes with genomic variant profiles generated by Next Generation Sequencing. A visual programming i2b2 plugin allows retrieving variants belonging to the patients in a cohort by applying filters on genomic variant annotations. We report an evaluation of the query performance of our system on more than 11 million variants, showing that the implemented solution scales linearly in terms of query time and disk space with the number of variants. In this paper we describe a new i2b2 web service composed of an efficient and scalable document-based database that manages annotations of genomic variants and of a visual programming plug-in designed to dynamically perform queries on clinical and genetic data. The system therefore allows managing the fast growing volume of genomic variants and can be used to integrate heterogeneous genomic annotations.

  8. Spontaneous germline excision of Tol1, a DNA-based transposable element naturally occurring in the medaka fish genome.

    Science.gov (United States)

    Watanabe, Kohei; Koga, Hajime; Nakamura, Kodai; Fujita, Akiko; Hattori, Akimasa; Matsuda, Masaru; Koga, Akihiko

    2014-04-01

    DNA-based transposable elements are ubiquitous constituents of eukaryotic genomes. Vertebrates are, however, exceptional in that most of their DNA-based elements appear to be inactivated. The Tol1 element of the medaka fish, Oryzias latipes, is one of the few elements for which copies containing an undamaged gene have been found. Spontaneous transposition of this element in somatic cells has previously been demonstrated, but there is only indirect evidence for its germline transposition. Here, we show direct evidence of spontaneous excision in the germline. Tyrosinase is the key enzyme in melanin biosynthesis. In an albino laboratory strain of medaka fish, which is homozygous for a mutant tyrosinase gene in which a Tol1 copy is inserted, we identified de novo reversion mutations related to melanin pigmentation. The gamete-based reversion rate was as high as 0.4%. The revertant fish carried the tyrosinase gene from which the Tol1 copy had been excised. We previously reported the germline transposition of Tol2, another DNA-based element that is thought to be a recent invader of the medaka fish genome. Tol1 is an ancient resident of the genome. Our results indicate that even an old element can contribute to genetic variation in the host genome as a natural mutator.

  9. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from total DNA Sequences.

    NARCIS (Netherlands)

    Izan, Shairul; Esselink, G.; Visser, R.G.F.; Smulders, M.J.M.; Borm, T.J.A.

    2017-01-01

    Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This

  10. Array-based genomic screening at diagnosis and during follow-up in chronic lymphocytic leukemia

    DEFF Research Database (Denmark)

    Gunnarsson, Rebeqa; Mansouri, Larry; Isaksson, Anders

    2011-01-01

    High-resolution genomic microarrays enable simultaneous detection of copy-number aberrations such as the known recurrent aberrations in chronic lymphocytic leukemia [del(11q), del(13q), del(17p) and trisomy 12], and copy-number neutral loss of heterozygosity. Moreover, comparison of genomic...... profiles from sequential patients' samples allows detection of clonal evolution....

  11. Array-based genomic screening at diagnosis and during follow-up in chronic lymphocytic leukemia

    DEFF Research Database (Denmark)

    Gunnarsson, Rebeqa; Mansouri, Larry; Isaksson, Anders

    2011-01-01

    High-resolution genomic microarrays enable simultaneous detection of copy-number aberrations such as the known recurrent aberrations in chronic lymphocytic leukemia [del(11q), del(13q), del(17p) and trisomy 12], and copy-number neutral loss of heterozygosity. Moreover, comparison of genomic...

  12. A proposed genus boundary for the prokaryotes based on genomic insights.

    Science.gov (United States)

    Qin, Qi-Long; Xie, Bin-Bin; Zhang, Xi-Ying; Chen, Xiu-Lan; Zhou, Bai-Cheng; Zhou, Jizhong; Oren, Aharon; Zhang, Yu-Zhong

    2014-06-01

    Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  13. Epigenomic annotation-based interpretation of genomic data: from enrichment analysis to machine learning.

    Science.gov (United States)

    Dozmorov, Mikhail G

    2017-10-15

    One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. mikhail.dozmorov@vcuhealth.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  14. Segmenting the human genome based on states of neutral genetic divergence.

    Science.gov (United States)

    Kuruppumullage Don, Prabhani; Ananda, Guruprasad; Chiaromonte, Francesca; Makova, Kateryna D

    2013-09-03

    Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states--each uniquely characterized by specific combinations of divergence levels. We then parsed the mutagenic contributions of various biochemical processes associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants--including those associated with cancer and other diseases--and to improve computational predictions of noncoding functional elements.

  15. Efficient CRISPR/Cas9-Based Genome Engineering in Human Pluripotent Stem Cells.

    Science.gov (United States)

    Kime, Cody; Mandegar, Mohammad A; Srivastava, Deepak; Yamanaka, Shinya; Conklin, Bruce R; Rand, Tim A

    2016-01-01

    Human pluripotent stem cells (hPS cells) are rapidly emerging as a powerful tool for biomedical discovery. The advent of human induced pluripotent stem cells (hiPS cells) with human embryonic stem (hES)-cell-like properties has led to hPS cells with disease-specific genetic backgrounds for in vitro disease modeling and drug discovery as well as mechanistic and developmental studies. To fully realize this potential, it will be necessary to modify the genome of hPS cells with precision and flexibility. Pioneering experiments utilizing site-specific double-strand break (DSB)-mediated genome engineering tools, including zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), have paved the way to genome engineering in previously recalcitrant systems such as hPS cells. However, these methods are technically cumbersome and require significant expertise, which has limited adoption. A major recent advance involving the clustered regularly interspaced short palindromic repeats (CRISPR) endonuclease has dramatically simplified the effort required for genome engineering and will likely be adopted widely as the most rapid and flexible system for genome editing in hPS cells. In this unit, we describe commonly practiced methods for CRISPR endonuclease genomic editing of hPS cells into cell lines containing genomes altered by insertion/deletion (indel) mutagenesis or insertion of recombinant genomic DNA. Copyright © 2016 John Wiley & Sons, Inc.

  16. An extended anchored linkage map and virtual mapping for the american mink genome based on homology to human and dog

    DEFF Research Database (Denmark)

    Anistoroaei, Razvan Marian; Ansari, S.; Farid, A.

    2009-01-01

    hybridization (FISH) and/or by means of human/dog/mink comparative homology. The average interval between markers is 8.5 cM and the linkage groups collectively span 1340 cM. In addition, 217 and 275 mink microsatellites have been placed on human and dog genomes, respectively. In conjunction with the existing...... comparative human/dog/mink data, these assignments represent useful virtual maps for the American mink genome. Comparison of the current human/dog assembled sequential map with the existing Zoo-FISH-based human/dog/mink maps helped to refine the human/dog/mink comparative map. Furthermore, comparison...... of the human and dog genome assemblies revealed a number of large synteny blocks, some of which are corroborated by data from the mink linkage map....

  17. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA

    Directory of Open Access Journals (Sweden)

    Chan Alan

    2006-06-01

    Full Text Available Abstract Background Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. Results In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A. Conclusion Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  18. Genome-enabled Modeling of Microbial Biogeochemistry using a Trait-based Approach. Does Increasing Metabolic Complexity Increase Predictive Capabilities?

    Science.gov (United States)

    King, E.; Karaoz, U.; Molins, S.; Bouskill, N.; Anantharaman, K.; Beller, H. R.; Banfield, J. F.; Steefel, C. I.; Brodie, E.

    2015-12-01

    The biogeochemical functioning of ecosystems is shaped in part by genomic information stored in the subsurface microbiome. Cultivation-independent approaches allow us to extract this information through reconstruction of thousands of genomes from a microbial community. Analysis of these genomes, in turn, gives an indication of the organisms present and their functional roles. However, metagenomic analyses can currently deliver thousands of different genomes that range in abundance/importance, requiring the identification and assimilation of key physiologies and metabolisms to be represented as traits for successful simulation of subsurface processes. Here we focus on incorporating -omics information into BioCrunch, a genome-informed trait-based model that represents the diversity of microbial functional processes within a reactive transport framework. This approach models the rate of nutrient uptake and the thermodynamics of coupled electron donors and acceptors for a range of microbial metabolisms including heterotrophs and chemolithotrophs. Metabolism of exogenous substrates fuels catabolic and anabolic processes, with the proportion of energy used for cellular maintenance, respiration, biomass development, and enzyme production based upon dynamic intracellular and environmental conditions. This internal resource partitioning represents a trade-off against biomass formation and results in microbial community emergence across a fitness landscape. Biocrunch was used here in simulations that included organisms and metabolic pathways derived from a dataset of ~1200 non-redundant genomes reflecting a microbial community in a floodplain aquifer. Metagenomic data was directly used to parameterize trait values related to growth and to identify trait linkages associated with respiration, fermentation, and key enzymatic functions such as plant polymer degradation. Simulations spanned a range of metabolic complexities and highlight benefits originating from simulations

  19. Genome-Based Identification of Active Prophage Regions by Next Generation Sequencing in Bacillus licheniformis DSM13

    Science.gov (United States)

    Hertel, Robert; Rodríguez, David Pintor; Hollensteiner, Jacqueline; Dietrich, Sascha; Leimbach, Andreas; Hoppert, Michael; Liesegang, Heiko; Volland, Sonja

    2015-01-01

    Prophages are viruses, which have integrated their genomes into the genome of a bacterial host. The status of the prophage genome can vary from fully intact with the potential to form infective particles to a remnant state where only a few phage genes persist. Prophages have impact on the properties of their host and are therefore of great interest for genomic research and strain design. Here we present a genome- and next generation sequencing (NGS)-based approach for identification and activity evaluation of prophage regions. Seven prophage or prophage-like regions were identified in the genome of Bacillus licheniformis DSM13. Six of these regions show similarity to members of the Siphoviridae phage family. The remaining region encodes the B. licheniformis orthologue of the PBSX prophage from Bacillus subtilis. Analysis of isolated phage particles (induced by mitomycin C) from the wild-type strain and prophage deletion mutant strains revealed activity of the prophage regions BLi_Pp2 (PBSX-like), BLi_Pp3 and BLi_Pp6. In contrast to BLi_Pp2 and BLi_Pp3, neither phage DNA nor phage particles of BLi_Pp6 could be visualized. However, the ability of prophage BLi_Pp6 to generate particles could be confirmed by sequencing of particle-protected DNA mapping to prophage locus BLi_Pp6. The introduced NGS-based approach allows the investigation of prophage regions and their ability to form particles. Our results show that this approach increases the sensitivity of prophage activity analysis and can complement more conventional approaches such as transmission electron microscopy (TEM). PMID:25811873

  20. Genome based analyses of six hexacorallian species reject the “naked coral” hypothesis

    KAUST Repository

    Wang, Xin

    2017-09-23

    Scleractinian corals are the foundation species of the coral-reef ecosystem. Their calcium carbonate skeletons form extensive structures that are home to millions of species, making coral reefs one of the most diverse ecosystems of our planet. However, our understanding of how reef-building corals have evolved the ability to calcify and become the ecosystem builders they are today is hampered by uncertain relationships within their subclass Hexacorallia. Corallimorpharians have been proposed to originate from a complex scleractinian ancestor that lost the ability to calcify in response to increasing ocean acidification, suggesting the possibility for corals to lose and gain the ability to calcify in response to increasing ocean acidification. Here we employed a phylogenomic approach using whole-genome data from six hexacorallian species to resolve the evolutionary relationship between reef-building corals and their non-calcifying relatives. Phylogenetic analysis based on 1,421 single-copy orthologs, as well as gene presence/absence and synteny information, converged on the same topologies, showing strong support for scleractinian monophyly and a corallimorpharian sister clade. Our broad phylogenomic approach using sequence-based and sequence-independent analyses provides unambiguous evidence for the monophyly of scleractinian corals and the rejection of corallimorpharians as descendants of a complex coral ancestor.

  1. Genome based analyses of six hexacorallian species reject the “naked coral” hypothesis

    KAUST Repository

    Wang, Xin; Drillon, Gué nola; Ryu, Taewoo; Voolstra, Christian R.; Aranda, Manuel

    2017-01-01

    Scleractinian corals are the foundation species of the coral-reef ecosystem. Their calcium carbonate skeletons form extensive structures that are home to millions of species, making coral reefs one of the most diverse ecosystems of our planet. However, our understanding of how reef-building corals have evolved the ability to calcify and become the ecosystem builders they are today is hampered by uncertain relationships within their subclass Hexacorallia. Corallimorpharians have been proposed to originate from a complex scleractinian ancestor that lost the ability to calcify in response to increasing ocean acidification, suggesting the possibility for corals to lose and gain the ability to calcify in response to increasing ocean acidification. Here we employed a phylogenomic approach using whole-genome data from six hexacorallian species to resolve the evolutionary relationship between reef-building corals and their non-calcifying relatives. Phylogenetic analysis based on 1,421 single-copy orthologs, as well as gene presence/absence and synteny information, converged on the same topologies, showing strong support for scleractinian monophyly and a corallimorpharian sister clade. Our broad phylogenomic approach using sequence-based and sequence-independent analyses provides unambiguous evidence for the monophyly of scleractinian corals and the rejection of corallimorpharians as descendants of a complex coral ancestor.

  2. PLACNETw: a web-based tool for plasmid reconstruction from bacterial genomes.

    Science.gov (United States)

    Vielva, Luis; de Toro, María; Lanza, Val F; de la Cruz, Fernando

    2017-12-01

    PLACNET is a graph-based tool for reconstruction of plasmids from next generation sequence pair-end datasets. PLACNET graphs contain two types of nodes (assembled contigs and reference genomes) and two types of edges (scaffold links and homology to references). Manual pruning of the graphs is a necessary requirement in PLACNET, but this is difficult for users without solid bioinformatic background. PLACNETw, a webtool based on PLACNET, provides an interactive graphic interface, automates BLAST searches, and extracts the relevant information for decision making. It allows a user with domain expertise to visualize the scaffold graphs and related information of contigs as well as reference sequences, so that the pruning operations can be done interactively from a personal computer without the need for additional tools. After successful pruning, each plasmid becomes a separate connected component subgraph. The resulting data are automatically downloaded by the user. PLACNETw is freely available at https://castillo.dicom.unican.es/upload/. delacruz@unican.es. A tutorial video and several solved examples are available at https://castillo.dicom.unican.es/placnetw_video/ and https://castillo.dicom.unican.es/examples/. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  3. Correcting for cryptic relatedness by a regression-based genomic control method

    Directory of Open Access Journals (Sweden)

    Yang Yaning

    2009-12-01

    Full Text Available Abstract Background Genomic control (GC method is a useful tool to correct for the cryptic relatedness in population-based association studies. It was originally proposed for correcting for the variance inflation of Cochran-Armitage's additive trend test by using information from unlinked null markers, and was later generalized to be applicable to other tests with the additional requirement that the null markers are matched with the candidate marker in allele frequencies. However, matching allele frequencies limits the number of available null markers and thus limits the applicability of the GC method. On the other hand, errors in genotype/allele frequencies may cause further bias and variance inflation and thereby aggravate the effect of GC correction. Results In this paper, we propose a regression-based GC method using null markers that are not necessarily matched in allele frequencies with the candidate marker. Variation of allele frequencies of the null markers is adjusted by a regression method. Conclusion The proposed method can be readily applied to the Cochran-Armitage's trend tests other than the additive trend test, the Pearson's chi-square test and other robust efficiency tests. Simulation results show that the proposed method is effective in controlling type I error in the presence of population substructure.

  4. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  5. Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory

    Science.gov (United States)

    Crisan, Anamaria; McKee, Geoffrey; Munzner, Tamara

    2018-01-01

    Background Microbial genome sequencing is now being routinely used in many clinical and public health laboratories. Understanding how to report complex genomic test results to stakeholders who may have varying familiarity with genomics—including clinicians, laboratorians, epidemiologists, and researchers—is critical to the successful and sustainable implementation of this new technology; however, there are no evidence-based guidelines for designing such a report in the pathogen genomics domain. Here, we describe an iterative, human-centered approach to creating a report template for communicating tuberculosis (TB) genomic test results. Methods We used Design Study Methodology—a human centered approach drawn from the information visualization domain—to redesign an existing clinical report. We used expert consults and an online questionnaire to discover various stakeholders’ needs around the types of data and tasks related to TB that they encounter in their daily workflow. We also evaluated their perceptions of and familiarity with genomic data, as well as its utility at various clinical decision points. These data shaped the design of multiple prototype reports that were compared against the existing report through a second online survey, with the resulting qualitative and quantitative data informing the final, redesigned, report. Results We recruited 78 participants, 65 of whom were clinicians, nurses, laboratorians, researchers, and epidemiologists involved in TB diagnosis, treatment, and/or surveillance. Our first survey indicated that participants were largely enthusiastic about genomic data, with the majority agreeing on its utility for certain TB diagnosis and treatment tasks and many reporting some confidence in their ability to interpret this type of data (between 58.8% and 94.1%, depending on the specific data type). When we compared our four prototype reports against the existing design, we found that for the majority (86.7%) of design

  6. Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory

    Directory of Open Access Journals (Sweden)

    Anamaria Crisan

    2018-01-01

    Full Text Available Background Microbial genome sequencing is now being routinely used in many clinical and public health laboratories. Understanding how to report complex genomic test results to stakeholders who may have varying familiarity with genomics—including clinicians, laboratorians, epidemiologists, and researchers—is critical to the successful and sustainable implementation of this new technology; however, there are no evidence-based guidelines for designing such a report in the pathogen genomics domain. Here, we describe an iterative, human-centered approach to creating a report template for communicating tuberculosis (TB genomic test results. Methods We used Design Study Methodology—a human centered approach drawn from the information visualization domain—to redesign an existing clinical report. We used expert consults and an online questionnaire to discover various stakeholders’ needs around the types of data and tasks related to TB that they encounter in their daily workflow. We also evaluated their perceptions of and familiarity with genomic data, as well as its utility at various clinical decision points. These data shaped the design of multiple prototype reports that were compared against the existing report through a second online survey, with the resulting qualitative and quantitative data informing the final, redesigned, report. Results We recruited 78 participants, 65 of whom were clinicians, nurses, laboratorians, researchers, and epidemiologists involved in TB diagnosis, treatment, and/or surveillance. Our first survey indicated that participants were largely enthusiastic about genomic data, with the majority agreeing on its utility for certain TB diagnosis and treatment tasks and many reporting some confidence in their ability to interpret this type of data (between 58.8% and 94.1%, depending on the specific data type. When we compared our four prototype reports against the existing design, we found that for the majority (86.7% of

  7. GenExp: an interactive web-based genomic DAS client with client-side data rendering.

    Directory of Open Access Journals (Sweden)

    Bernat Gel Moreno

    Full Text Available BACKGROUND: The Distributed Annotation System (DAS offers a standard protocol for sharing and integrating annotations on biological sequences. There are more than 1000 DAS sources available and the number is steadily increasing. Clients are an essential part of the DAS system and integrate data from several independent sources in order to create a useful representation to the user. While web-based DAS clients exist, most of them do not have direct interaction capabilities such as dragging and zooming with the mouse. RESULTS: Here we present GenExp, a web based and fully interactive visual DAS client. GenExp is a genome oriented DAS client capable of creating informative representations of genomic data zooming out from base level to complete chromosomes. It proposes a novel approach to genomic data rendering and uses the latest HTML5 web technologies to create the data representation inside the client browser. Thanks to client-side rendering most position changes do not need a network request to the server and so responses to zooming and panning are almost immediate. In GenExp it is possible to explore the genome intuitively moving it with the mouse just like geographical map applications. Additionally, in GenExp it is possible to have more than one data viewer at the same time and to save the current state of the application to revisit it later on. CONCLUSIONS: GenExp is a new interactive web-based client for DAS and addresses some of the short-comings of the existing clients. It uses client-side data rendering techniques resulting in easier genome browsing and exploration. GenExp is open source under the GPL license and it is freely available at http://gralggen.lsi.upc.edu/recerca/genexp.

  8. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes.

    Science.gov (United States)

    Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Minuchehr, Zarrin; Nassiri, Mohammad Reza

    2015-03-01

    During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.

  9. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes

    Directory of Open Access Journals (Sweden)

    Saber Jelokhani-Niaraki

    2015-03-01

    Full Text Available During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.

  10. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes

    Science.gov (United States)

    Jelokhani-Niaraki, Saber; Minuchehr, Zarrin; Nassiri, Mohammad Reza

    2015-01-01

    During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data. PMID:25873847

  11. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary...... determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than...

  12. Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology

    Directory of Open Access Journals (Sweden)

    Chao Shiaoman

    2011-01-01

    Full Text Available Abstract Background Genetic markers are pivotal to modern genomics research; however, discovery and genotyping of molecular markers in oat has been hindered by the size and complexity of the genome, and by a scarcity of sequence data. The purpose of this study was to generate oat expressed sequence tag (EST information, develop a bioinformatics pipeline for SNP discovery, and establish a method for rapid, cost-effective, and straightforward genotyping of SNP markers in complex polyploid genomes such as oat. Results Based on cDNA libraries of four cultivated oat genotypes, approximately 127,000 contigs were assembled from approximately one million Roche 454 sequence reads. Contigs were filtered through a novel bioinformatics pipeline to eliminate ambiguous polymorphism caused by subgenome homology, and 96 in silico SNPs were selected from 9,448 candidate loci for validation using high-resolution melting (HRM analysis. Of these, 52 (54% were polymorphic between parents of the Ogle1040 × TAM O-301 (OT mapping population, with 48 segregating as single Mendelian loci, and 44 being placed on the existing OT linkage map. Ogle and TAM amplicons from 12 primers were sequenced for SNP validation, revealing complex polymorphism in seven amplicons but general sequence conservation within SNP loci. Whole-amplicon interrogation with HRM revealed insertions, deletions, and heterozygotes in secondary oat germplasm pools, generating multiple alleles at some primer targets. To validate marker utility, 36 SNP assays were used to evaluate the genetic diversity of 34 diverse oat genotypes. Dendrogram clusters corresponded generally to known genome composition and genetic ancestry. Conclusions The high-throughput SNP discovery pipeline presented here is a rapid and effective method for identification of polymorphic SNP alleles in the oat genome. The current-generation HRM system is a simple and highly-informative platform for SNP genotyping. These techniques provide

  13. Genome-based polymorphic microsatellite development and validation in the mosquito Aedes aegypti and application to population genetics in Haiti

    Directory of Open Access Journals (Sweden)

    Streit Thomas G

    2009-12-01

    Full Text Available Abstract Background Microsatellite markers have proven useful in genetic studies in many organisms, yet microsatellite-based studies of the dengue and yellow fever vector mosquito Aedes aegypti have been limited by the number of assayable and polymorphic loci available, despite multiple independent efforts to identify them. Here we present strategies for efficient identification and development of useful microsatellites with broad coverage across the Aedes aegypti genome, development of multiplex-ready PCR groups of microsatellite loci, and validation of their utility for population analysis with field collections from Haiti. Results From 79 putative microsatellite loci representing 31 motifs identified in 42 whole genome sequence supercontig assemblies in the Aedes aegypti genome, 33 microsatellites providing genome-wide coverage amplified as single copy sequences in four lab strains, with a range of 2-6 alleles per locus. The tri-nucleotide motifs represented the majority (51% of the polymorphic single copy loci, and none of these was located within a putative open reading frame. Seven groups of 4-5 microsatellite loci each were developed for multiplex-ready PCR. Four multiplex-ready groups were used to investigate population genetics of Aedes aegypti populations sampled in Haiti. Of the 23 loci represented in these groups, 20 were polymorphic with a range of 3-24 alleles per locus (mean = 8.75. Allelic polymorphic information content varied from 0.171 to 0.867 (mean = 0.545. Most loci met Hardy-Weinberg expectations across populations and pairwise FST comparisons identified significant genetic differentiation between some populations. No evidence for genetic isolation by distance was observed. Conclusion Despite limited success in previous reports, we demonstrate that the Aedes aegypti genome is well-populated with single copy, polymorphic microsatellite loci that can be uncovered using the strategy developed here for rapid and efficient

  14. Comparative BAC-based mapping in the white-throated sparrow, a novel behavioral genomics model, using interspecies overgo hybridization

    Directory of Open Access Journals (Sweden)

    Gonser Rusty A

    2011-06-01

    Full Text Available Abstract Background The genomics era has produced an arsenal of resources from sequenced organisms allowing researchers to target species that do not have comparable mapping and sequence information. These new "non-model" organisms offer unique opportunities to examine environmental effects on genomic patterns and processes. Here we use comparative mapping as a first step in characterizing the genome organization of a novel animal model, the white-throated sparrow (Zonotrichia albicollis, which occurs as white or tan morphs that exhibit alternative behaviors and physiology. Morph is determined by the presence or absence of a complex chromosomal rearrangement. This species is an ideal model for behavioral genomics because the association between genotype and phenotype is absolute, making it possible to identify the genomic bases of phenotypic variation. Findings We initiated a genomic study in this species by characterizing the white-throated sparrow BAC library via filter hybridization with overgo probes designed for the chicken, turkey, and zebra finch. Cross-species hybridization resulted in 640 positive sparrow BACs assigned to 77 chicken loci across almost all macro-and microchromosomes, with a focus on the chromosomes associated with morph. Out of 216 overgos, 36% of the probes hybridized successfully, with an average number of 3.0 positive sparrow BACs per overgo. Conclusions These data will be utilized for determining chromosomal architecture and for fine-scale mapping of candidate genes associated with phenotypic differences. Our research confirms the utility of interspecies hybridization for developing comparative maps in other non-model organisms.

  15. An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage

    Directory of Open Access Journals (Sweden)

    Stuart Gary W

    2004-12-01

    Full Text Available Abstract Background Eukaryotic whole genome sequences are accumulating at an impressive rate. Effective methods for comparing multiple whole eukaryotic genomes on a large scale are needed. Most attempted solutions involve the production of large scale alignments, and many of these require a high stringency pre-screen for putative orthologs in order to reduce the effective size of the dataset and provide a reasonably high but unknown fraction of correctly aligned homologous sites for comparison. As an alternative, highly efficient methods that do not require the pre-alignment of operationally defined orthologs are also being explored. Results A non-alignment method based on the Singular Value Decomposition (SVD was used to compare the predicted protein complement of nine whole eukaryotic genomes ranging from yeast to man. This analysis resulted in the simultaneous identification and definition of a large number of well conserved motifs and gene families, and produced a species tree supporting one of two conflicting hypotheses of metazoan relationships. Conclusions Our SVD-based analysis of the entire protein complement of nine whole eukaryotic genomes suggests that highly conserved motifs and gene families can be identified and effectively compared in a single coherent definition space for the easy extraction of gene and species trees. While this occurs without the explicit definition of orthologs or homologous sites, the analysis can provide a basis for these definitions.

  16. Practical Calling Approach for Exome Array-Based Genome-Wide Association Studies in Korean Population

    Directory of Open Access Journals (Sweden)

    Tae-Joon Park

    2015-01-01

    Full Text Available Exome-based genotyping arrays are cost-effective and have recently been used as alternative platforms to whole-exome sequencing. However, the automated clustering algorithm in an exome array has a genotype calling problem in accuracy for identifying rare and low-frequency variants. To address these shortcomings, we present a practical approach for accurate genotype calling using the Illumina Infinium HumanExome BeadChip. We present comparison results and a statistical summary of our genotype data sets. Our data set comprises 14,647 Korean samples. To solve the limitation of automated clustering, we performed manual genotype clustering for the targeted identification of 46,076 variants that were identified using GenomeStudio software. To evaluate the effects of applying custom cluster files, we tested cluster files using 804 independent Korean samples and the same platform. Our study firstly suggests practical guidelines for exome chip quality control in Asian populations and provides valuable insight into an association study using exome chip.

  17. DNA-based identification of spices: DNA isolation, whole genome amplification, and polymerase chain reaction.

    Science.gov (United States)

    Focke, Felix; Haase, Ilka; Fischer, Markus

    2011-01-26

    Usually spices are identified morphologically using simple methods like magnifying glasses or microscopic instruments. On the other hand, molecular biological methods like the polymerase chain reaction (PCR) enable an accurate and specific detection also in complex matrices. Generally, the origins of spices are plants with diverse genetic backgrounds and relationships. The processing methods used for the production of spices are complex and individual. Consequently, the development of a reliable DNA-based method for spice analysis is a challenging intention. However, once established, this method will be easily adapted to less difficult food matrices. In the current study, several alternative methods for the isolation of DNA from spices have been developed and evaluated in detail with regard to (i) its purity (photometric), (ii) yield (fluorimetric methods), and (iii) its amplifiability (PCR). Whole genome amplification methods were used to preamplify isolates to improve the ratio between amplifiable DNA and inhibiting substances. Specific primer sets were designed, and the PCR conditions were optimized to detect 18 spices selectively. Assays of self-made spice mixtures were performed to proof the applicability of the developed methods.

  18. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming

    2012-06-28

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.

  19. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming; Wong, Ka-Chun; Ryu, Tae Woo; Ravasi, Timothy; Gao, Xin

    2012-01-01

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.

  20. Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life.

    Science.gov (United States)

    Brown, Christopher T; Sharon, Itai; Thomas, Brian C; Castelle, Cindy J; Morowitz, Michael J; Banfield, Jillian F

    2013-12-17

    The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community.We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation.During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. Genome comparisons within the family Actinomycetaceae reveal important differences

  1. Ebolavirus comparative genomics

    DEFF Research Database (Denmark)

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms...

  2. The Micronutrient Genomics Project: A community-driven knowledge base for micronutrient research

    NARCIS (Netherlands)

    Ommen, B. van; El-Sohemy, A.; Hesketh, J.; Kaput, J.; Fenech, M.; Evelo, C.T.; McArdle, H.J.; Bouwman, J.; Lietz, G.; Mathers, J.C.; Fairweather-Tait, S.; Kranen, H. van; Elliott, R.; Wopereis, S.; Ferguson, L.R.; Méplan, C.; Perozzi, G.; Allen, L.; Rivero, D.

    2010-01-01

    Micronutrients influence multiple metabolic pathways including oxidative and inflammatory processes. Optimum micronutrient supply is important for the maintenance of homeostasis in metabolism and, ultimately, for maintaining good health. With advances in systems biology and genomics technologies, it

  3. An international collaborative family-based whole genome quantitative trait linkage scan for myopic refractive error

    DEFF Research Database (Denmark)

    Abbott, Diana; Li, Yi-Ju; Guggenheim, Jeremy A

    2012-01-01

    To investigate quantitative trait loci linked to refractive error, we performed a genome-wide quantitative trait linkage analysis using single nucleotide polymorphism markers and family data from five international sites....

  4. Excised radicle tips as a source of genomic DNA for PCR-based ...

    Indian Academy of Sciences (India)

    2012-12-13

    Dec 13, 2012 ... Cotton; cry1Ac; genomic DNA isolation; high-resolution melting curve analysis; radicle tip; seed purity testing .... cooled to 40°C. Fluorescence data for melting curves were ... greatly increased by introducing automation.

  5. MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement

    Directory of Open Access Journals (Sweden)

    Zhang Liqing

    2010-01-01

    Full Text Available Abstract Background Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model. However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model. Results In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs, using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene will be deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pairs, and MSOAR is invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 is able to achieve a better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments

  6. Universal and rapid salt-extraction of high quality genomic DNA for PCR-based techniques.

    OpenAIRE

    Aljanabi, S M; Martinez, I

    1997-01-01

    A very simple, fast, universally applicable and reproducible method to extract high quality megabase genomic DNA from different organisms is described. We applied the same method to extract high quality complex genomic DNA from different tissues (wheat, barley, potato, beans, pear and almond leaves as well as fungi, insects and shrimps' fresh tissue) without any modification. The method does not require expensive and environmentally hazardous reagents and equipment. It can be performed even i...

  7. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L..

    Directory of Open Access Journals (Sweden)

    Bin Han

    Full Text Available Microsatellites or simple sequence repeats (SSRs are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat have been announced, the researches about systemic analysis of SSRs for wheat still have not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR; 70,564 (23.9% were found to be monomorphic and 224,703 (76.1% were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3% amplified one locus, 8 (17.8% amplified multiple identical loci, and 13 (28.9% did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising

  8. Revisiting the classification of curtoviruses based on genome-wide pairwise identity

    KAUST Repository

    Varsani, Arvind

    2014-01-25

    Members of the genus Curtovirus (family Geminiviridae) are important pathogens of many wild and cultivated plant species. Until recently, relatively few full curtovirus genomes have been characterised. However, with the 19 full genome sequences now available in public databases, we revisit the proposed curtovirus species and strain classification criteria. Using pairwise identities coupled with phylogenetic evidence, revised species and strain demarcation guidelines have been instituted. Specifically, we have established 77% genome-wide pairwise identity as a species demarcation threshold and 94% genome-wide pairwise identity as a strain demarcation threshold. Hence, whereas curtovirus sequences with >77% genome-wide pairwise identity would be classified as belonging to the same species, those sharing >94% identity would be classified as belonging to the same strain. We provide step-by-step guidelines to facilitate the classification of newly discovered curtovirus full genome sequences and a set of defined criteria for naming new species and strains. The revision yields three curtovirus species: Beet curly top virus (BCTV), Spinach severe surly top virus (SpSCTV) and Horseradish curly top virus (HrCTV). © 2014 Springer-Verlag Wien.

  9. Revisiting the classification of curtoviruses based on genome-wide pairwise identity

    KAUST Repository

    Varsani, Arvind; Martin, Darren Patrick; Navas-Castillo, Jesú s; Moriones, Enrique; Herná ndez-Zepeda, Cecilia; Idris, Ali; Murilo Zerbini, F.; Brown, Judith K.

    2014-01-01

    Members of the genus Curtovirus (family Geminiviridae) are important pathogens of many wild and cultivated plant species. Until recently, relatively few full curtovirus genomes have been characterised. However, with the 19 full genome sequences now available in public databases, we revisit the proposed curtovirus species and strain classification criteria. Using pairwise identities coupled with phylogenetic evidence, revised species and strain demarcation guidelines have been instituted. Specifically, we have established 77% genome-wide pairwise identity as a species demarcation threshold and 94% genome-wide pairwise identity as a strain demarcation threshold. Hence, whereas curtovirus sequences with >77% genome-wide pairwise identity would be classified as belonging to the same species, those sharing >94% identity would be classified as belonging to the same strain. We provide step-by-step guidelines to facilitate the classification of newly discovered curtovirus full genome sequences and a set of defined criteria for naming new species and strains. The revision yields three curtovirus species: Beet curly top virus (BCTV), Spinach severe surly top virus (SpSCTV) and Horseradish curly top virus (HrCTV). © 2014 Springer-Verlag Wien.

  10. Highly sensitive polymerase chain reaction-free quantum dot-based quantification of forensic genomic DNA

    International Nuclear Information System (INIS)

    Tak, Yu Kyung; Kim, Won Young; Kim, Min Jung; Han, Eunyoung; Han, Myun Soo; Kim, Jong Jin; Kim, Wook; Lee, Jong Eun; Song, Joon Myong

    2012-01-01

    Highlights: ► Genomic DNA quantification were performed using a quantum dot-labeled Alu sequence. ► This probe provided PCR-free determination of human genomic DNA. ► Qdot-labeled Alu probe-hybridized genomic DNAs had a 2.5-femtogram detection limit. ► Qdot-labeled Alu sequence was used to assess DNA samples for human identification. - Abstract: Forensic DNA samples can degrade easily due to exposure to light and moisture at the crime scene. In addition, the amount of DNA acquired at a criminal site is inherently limited. This limited amount of human DNA has to be quantified accurately after the process of DNA extraction. The accurately quantified extracted genomic DNA is then used as a DNA template in polymerase chain reaction (PCR) amplification for short tandem repeat (STR) human identification. Accordingly, highly sensitive and human-specific quantification of forensic DNA samples is an essential issue in forensic study. In this work, a quantum dot (Qdot)-labeled Alu sequence was developed as a probe to simultaneously satisfy both the high sensitivity and human genome selectivity for quantification of forensic DNA samples. This probe provided PCR-free determination of human genomic DNA and had a 2.5-femtogram detection limit due to the strong emission and photostability of the Qdot. The Qdot-labeled Alu sequence has been used successfully to assess 18 different forensic DNA samples for STR human identification.

  11. Isotope-based medical research in the post genome era: Gene-orchestrated life functions in medicine seen and affected by isotopes. Workshop report

    International Nuclear Information System (INIS)

    Feinendegen, L.E.

    1997-01-01

    The US Department of Energy (DOE) and the National Institutes of Health (NIH) conducted a workshop on Isotope-Based Medical Research in the Post Genome Era at NIH, Bethesda, Maryland, November 12--14, 1997. The workshop aimed at identifying the role of stable and radioisotopes for advanced diagnosis and therapy of a wide range of illnesses using the new information that comes from the human genome program. In this sense, the agenda addressed the challenge of functional genomics in humans. The workshop addressed: functional genomics in clinical medicine; new diagnostic potentials; new therapy potentials; challenge to tracer- and effector-pharmaceutical chemistry; and project plans for joint ventures

  12. Isotope-based medical research in the post genome era: Gene-orchestrated life functions in medicine seen and affected by isotopes. Workshop report

    Energy Technology Data Exchange (ETDEWEB)

    Feinendegen, L.E.

    1997-12-31

    The US Department of Energy (DOE) and the National Institutes of Health (NIH) conducted a workshop on Isotope-Based Medical Research in the Post Genome Era at NIH, Bethesda, Maryland, November 12--14, 1997. The workshop aimed at identifying the role of stable and radioisotopes for advanced diagnosis and therapy of a wide range of illnesses using the new information that comes from the human genome program. In this sense, the agenda addressed the challenge of functional genomics in humans. The workshop addressed: functional genomics in clinical medicine; new diagnostic potentials; new therapy potentials; challenge to tracer- and effector-pharmaceutical chemistry; and project plans for joint ventures.

  13. Simultaneous genomic identification and profiling of a single cell using semiconductor-based next generation sequencing

    Directory of Open Access Journals (Sweden)

    Manabu Watanabe

    2014-09-01

    Full Text Available Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (adenocarcinomic human alveolar basal epithelial cell line were used as a model. Single-cell capture was performed using laser capture microdissection (LCM with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈106 cells were subjected to whole genome amplification (WGA. For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 1031–35. For cancer gene profiling, we used mutation profiling that was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based bench top sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures or random allele drop out probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.

  14. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    Directory of Open Access Journals (Sweden)

    Bonten Marc JM

    2010-04-01

    Full Text Available Abstract Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromized patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including the vanA transposon that confers resistance to vancomycin in two strains. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI, which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be aquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come.

  15. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC 3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC 3 -rich genes (GC 3  ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC 3 -rich and intronless), as well as those associated with important functions, such as FA

  16. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    Directory of Open Access Journals (Sweden)

    Toub Omid

    2010-10-01

    Full Text Available Abstract Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS were predicted by in silico analysis of the grapevine (Vitis vinifera genome assembly 1. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the up-dated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information

  17. Nonviral Genome Editing Based on a Polymer-Derivatized CRISPR Nanocomplex for Targeting Bacterial Pathogens and Antibiotic Resistance.

    Science.gov (United States)

    Kang, Yoo Kyung; Kwon, Kyu; Ryu, Jea Sung; Lee, Ha Neul; Park, Chankyu; Chung, Hyun Jung

    2017-04-19

    The overuse of antibiotics plays a major role in the emergence and spread of multidrug-resistant bacteria. A molecularly targeted, specific treatment method for bacterial pathogens can prevent this problem by reducing the selective pressure during microbial growth. Herein, we introduce a nonviral treatment strategy delivering genome editing material for targeting antibacterial resistance. We apply the CRISPR-Cas9 system, which has been recognized as an innovative tool for highly specific and efficient genome engineering in different organisms, as the delivery cargo. We utilize polymer-derivatized Cas9, by direct covalent modification of the protein with cationic polymer, for subsequent complexation with single-guide RNA targeting antibiotic resistance. We show that nanosized CRISPR complexes (= Cr-Nanocomplex) were successfully formed, while maintaining the functional activity of Cas9 endonuclease to induce double-strand DNA cleavage. We also demonstrate that the Cr-Nanocomplex designed to target mecA-the major gene involved in methicillin resistance-can be efficiently delivered into Methicillin-resistant Staphylococcus aureus (MRSA), and allow the editing of the bacterial genome with much higher efficiency compared to using native Cas9 complexes or conventional lipid-based formulations. The present study shows for the first time that a covalently modified CRISPR system allows nonviral, therapeutic genome editing, and can be potentially applied as a target specific antimicrobial.

  18. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy

    Directory of Open Access Journals (Sweden)

    Guanghong Zuo

    2015-10-01

    Full Text Available A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.

  19. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy.

    Science.gov (United States)

    Zuo, Guanghong; Hao, Bailin

    2015-10-01

    A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  20. MiSNPDb: a web-based genomic resources of tropical ecology fruit mango (Mangifera indica L.) for phylogeography and varietal differentiation.

    Science.gov (United States)

    Iquebal, M A; Jaiswal, Sarika; Mahato, Ajay Kumar; Jayaswal, Pawan K; Angadi, U B; Kumar, Neeraj; Sharma, Nimisha; Singh, Anand K; Srivastav, Manish; Prakash, Jai; Singh, S K; Khan, Kasim; Mishra, Rupesh K; Rajan, Shailendra; Bajpai, Anju; Sandhya, B S; Nischita, Puttaraju; Ravishankar, K V; Dinesh, M R; Rai, Anil; Kumar, Dinesh; Sharma, Tilak R; Singh, Nagendra K

    2017-11-02

    Mango is one of the most important fruits of tropical ecological region of the world, well known for its nutritive value, aroma and taste. Its world production is >45MT worth >200 billion US dollars. Genomic resources are required for improvement in productivity and management of mango germplasm. There is no web-based genomic resources available for mango. Hence rapid and cost-effective high throughput putative marker discovery is required to develop such resources. RAD-based marker discovery can cater this urgent need till whole genome sequence of mango becomes available. Using a panel of 84 mango varieties, a total of 28.6 Gb data was generated by ddRAD-Seq approach on Illumina HiSeq 2000 platform. A total of 1.25 million SNPs were discovered. Phylogenetic tree using 749 common SNPs across these varieties revealed three major lineages which was compared with geographical locations. A web genomic resources MiSNPDb, available at http://webtom.cabgrid.res.in/mangosnps/ is based on 3-tier architecture, developed using PHP, MySQL and Javascript. This web genomic resources can be of immense use in the development of high density linkage map, QTL discovery, varietal differentiation, traceability, genome finishing and SNP chip development for future GWAS in genomic selection program. We report here world's first web-based genomic resources for genetic improvement and germplasm management of mango.

  1. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project.

    Science.gov (United States)

    Konkel, Miriam K; Walker, Jerilyn A; Hotard, Ashley B; Ranck, Megan C; Fontenot, Catherine C; Storer, Jessica; Stewart, Chip; Marth, Gabor T; Batzer, Mark A

    2015-08-29

    The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Inverse PCR-based method for isolating novel SINEs from genome.

    Science.gov (United States)

    Han, Yawei; Chen, Liping; Guan, Lihong; He, Shunping

    2014-04-01

    Short interspersed elements (SINEs) are moderately repetitive DNA sequences in eukaryotic genomes. Although eukaryotic genomes contain numerous SINEs copy, it is very difficult and laborious to isolate and identify them by the reported methods. In this study, the inverse PCR was successfully applied to isolate SINEs from Opsariichthys bidens genome in Eastern Asian Cyprinid. A group of SINEs derived from tRNA(Ala) molecular had been identified, which were named Opsar according to Opsariichthys. SINEs characteristics were exhibited in Opsar, which contained a tRNA(Ala)-derived region at the 5' end, a tRNA-unrelated region, and AT-rich region at the 3' end. The tRNA-derived region of Opsar shared 76 % sequence similarity with tRNA(Ala) gene. This result indicated that Opsar could derive from the inactive or pseudogene of tRNA(Ala). The reliability of method was tested by obtaining C-SINE, Ct-SINE, and M-SINEs from Ctenopharyngodon idellus, Megalobrama amblycephala, and Cyprinus carpio genomes. This method is simpler than the previously reported, which successfully omitted many steps, such as preparation of probes, construction of genomic libraries, and hybridization.

  3. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) by Ion Torrent PGM™ platform and to study the differences of mtDNA sequence in different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costicartilage, nail, skeletal muscle and oral epithelium. Amplification of whole genome sequence of mtDNA was performed by 4 pairs of primer. Libraries were constructed with Ion Shear™ Plus Reagents kit and Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed using Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions on HVⅠ region. The whole genome sequence of mtDNA from all samples were amplified successfully. Six unrelated individuals belonged to 6 different haplotypes. Different tissues in one individual had heteroplasmy difference. The heteroplasmy positions and the mutation positions on HVⅠ region were verified by Sanger sequencing. After a consistency check by the Kappa method, it was found that the results of mtDNA sequence had a high consistency in different tissues. The testing method used in present study for sequencing the whole genome sequence of human mtDNA can detect the heteroplasmy difference in different tissues, which have good consistency. The results provide guidance for the further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  4. A network-based approach to prioritize results from genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Nirmala Akula

    Full Text Available Genome-wide association studies (GWAS are a valuable approach to understanding the genetic basis of complex traits. One of the challenges of GWAS is the translation of genetic association results into biological hypotheses suitable for further investigation in the laboratory. To address this challenge, we introduce Network Interface Miner for Multigenic Interactions (NIMMI, a network-based method that combines GWAS data with human protein-protein interaction data (PPI. NIMMI builds biological networks weighted by connectivity, which is estimated by use of a modification of the Google PageRank algorithm. These weights are then combined with genetic association p-values derived from GWAS, producing what we call 'trait prioritized sub-networks.' As a proof of principle, NIMMI was tested on three GWAS datasets previously analyzed for height, a classical polygenic trait. Despite differences in sample size and ancestry, NIMMI captured 95% of the known height associated genes within the top 20% of ranked sub-networks, far better than what could be achieved by a single-locus approach. The top 2% of NIMMI height-prioritized sub-networks were significantly enriched for genes involved in transcription, signal transduction, transport, and gene expression, as well as nucleic acid, phosphate, protein, and zinc metabolism. All of these sub-networks were ranked near the top across all three height GWAS datasets we tested. We also tested NIMMI on a categorical phenotype, Crohn's disease. NIMMI prioritized sub-networks involved in B- and T-cell receptor, chemokine, interleukin, and other pathways consistent with the known autoimmune nature of Crohn's disease. NIMMI is a simple, user-friendly, open-source software tool that efficiently combines genetic association data with biological networks, translating GWAS findings into biological hypotheses.

  5. A Network-Based Approach to Prioritize Results from Genome-Wide Association Studies

    Science.gov (United States)

    Akula, Nirmala; Baranova, Ancha; Seto, Donald; Solka, Jeffrey; Nalls, Michael A.; Singleton, Andrew; Ferrucci, Luigi; Tanaka, Toshiko; Bandinelli, Stefania; Cho, Yoon Shin; Kim, Young Jin; Lee, Jong-Young; Han, Bok-Ghee; McMahon, Francis J.

    2011-01-01

    Genome-wide association studies (GWAS) are a valuable approach to understanding the genetic basis of complex traits. One of the challenges of GWAS is the translation of genetic association results into biological hypotheses suitable for further investigation in the laboratory. To address this challenge, we introduce Network Interface Miner for Multigenic Interactions (NIMMI), a network-based method that combines GWAS data with human protein-protein interaction data (PPI). NIMMI builds biological networks weighted by connectivity, which is estimated by use of a modification of the Google PageRank algorithm. These weights are then combined with genetic association p-values derived from GWAS, producing what we call ‘trait prioritized sub-networks.’ As a proof of principle, NIMMI was tested on three GWAS datasets previously analyzed for height, a classical polygenic trait. Despite differences in sample size and ancestry, NIMMI captured 95% of the known height associated genes within the top 20% of ranked sub-networks, far better than what could be achieved by a single-locus approach. The top 2% of NIMMI height-prioritized sub-networks were significantly enriched for genes involved in transcription, signal transduction, transport, and gene expression, as well as nucleic acid, phosphate, protein, and zinc metabolism. All of these sub-networks were ranked near the top across all three height GWAS datasets we tested. We also tested NIMMI on a categorical phenotype, Crohn’s disease. NIMMI prioritized sub-networks involved in B- and T-cell receptor, chemokine, interleukin, and other pathways consistent with the known autoimmune nature of Crohn’s disease. NIMMI is a simple, user-friendly, open-source software tool that efficiently combines genetic association data with biological networks, translating GWAS findings into biological hypotheses. PMID:21915301

  6. Evidence-based annotation of the malaria parasite's genome using comparative expression profiling.

    Directory of Open Access Journals (Sweden)

    Yingyao Zhou

    2008-02-01

    Full Text Available A fundamental problem in systems biology and whole genome sequence analysis is how to infer functions for the many uncharacterized proteins that are identified, whether they are conserved across organisms of different phyla or are phylum-specific. This problem is especially acute in pathogens, such as malaria parasites, where genetic and biochemical investigations are likely to be more difficult. Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function. These analyses, together with protein-protein interaction data, provide probabilistic models that predict the function of 926 uncharacterized malaria genes and also suggest that malaria parasites may provide a simple model system for the study of some human processes. These data also provide a foundation for further studies of transcriptional regulation in malaria parasites.

  7. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs.

    Directory of Open Access Journals (Sweden)

    Adam H Freedman

    2016-03-01

    Full Text Available Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in the domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, that is supported by additional metabolism related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower-rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.

  8. Next-generation Sequencing-based genomic profiling: Fostering innovation in cancer care?

    Directory of Open Access Journals (Sweden)

    Gustavo S. Fernandes

    Full Text Available OBJECTIVES: With the development of next-generation sequencing (NGS technologies, DNA sequencing has been increasingly utilized in clinical practice. Our goal was to investigate the impact of genomic evaluation on treatment decisions for heavily pretreated patients with metastatic cancer. METHODS: We analyzed metastatic cancer patients from a single institution whose cancers had progressed after all available standard-of-care therapies and whose tumors underwent next-generation sequencing analysis. We determined the percentage of patients who received any therapy directed by the test, and its efficacy. RESULTS: From July 2013 to December 2015, 185 consecutive patients were tested using a commercially available next-generation sequencing-based test, and 157 patients were eligible. Sixty-six patients (42.0% were female, and 91 (58.0% were male. The mean age at diagnosis was 52.2 years, and the mean number of pre-test lines of systemic treatment was 2.7. One hundred and seventy-seven patients (95.6% had at least one identified gene alteration. Twenty-four patients (15.2% underwent systemic treatment directed by the test result. Of these, one patient had a complete response, four (16.7% had partial responses, two (8.3% had stable disease, and 17 (70.8% had disease progression as the best result. The median progression-free survival time with matched therapy was 1.6 months, and the median overall survival was 10 months. CONCLUSION: We identified a high prevalence of gene alterations using an next-generation sequencing test. Although some benefit was associated with the matched therapy, most of the patients had disease progression as the best response, indicating the limited biological potential and unclear clinical relevance of this practice.

  9. Development of Highly Informative Genome-Wide Single Sequence Repeat Markers for Breeding Applications in Sesame and Construction of a Web Resource: SisatBase

    Directory of Open Access Journals (Sweden)

    Komivi Dossa

    2017-08-01

    Full Text Available The sequencing of the full nuclear genome of sesame (Sesamum indicum L. provides the platform for functional analyses of genome components and their application in breeding programs. Although the importance of microsatellites markers or simple sequence repeats (SSR in crop genotyping, genetics, and breeding applications is well established, only a little information exist concerning SSRs at the whole genome level in sesame. In addition, SSRs represent a suitable marker type for sesame molecular breeding in developing countries where it is mainly grown. In this study, we identified 138,194 genome-wide SSRs of which 76.5% were physically mapped onto the 13 pseudo-chromosomes. Among these SSRs, up to three primers pairs were supplied for 101,930 SSRs and used to in silico amplify the reference genome together with two newly sequenced sesame accessions. A total of 79,957 SSRs (78% were polymorphic between the three genomes thereby suggesting their promising use in different genomics-assisted breeding applications. From these polymorphic SSRs, 23 were selected and validated to have high polymorphic potential in 48 sesame accessions from different growing areas of Africa. Furthermore, we have developed an online user-friendly database, SisatBase (http://www.sesame-bioinfo.org/SisatBase/, which provides free access to SSRs data as well as an integrated platform for functional analyses. Altogether, the reference SSR and SisatBase would serve as useful resources for genetic assessment, genomic studies, and breeding advancement in sesame, especially in developing countries.

  10. Harnessing the sorghum genome sequence:development of a genome-wide microsattelite (SSR) resource for swift genetic mapping and map based cloning in sorghum

    Science.gov (United States)

    Sorghum is the second cereal crop to have a full genome completely sequenced (Nature (2009), 457:551). This achievement is widely recognized as a scientific milestone for grass genetics and genomics in general. However, the true worth of genetic information lies in translating the sequence informa...

  11. Comparison of Burrows-Wheeler transform-based mapping algorithms used in high-throughput whole-genome sequencing: application to Illumina data for livestock genomes

    Science.gov (United States)

    Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Eff...

  12. CMS: a web-based system for visualization and analysis of genome-wide methylation data of human cancers.

    Science.gov (United States)

    Gu, Fei; Doderer, Mark S; Huang, Yi-Wen; Roa, Juan C; Goodfellow, Paul J; Kizer, E Lynette; Huang, Tim H M; Chen, Yidong

    2013-01-01

    DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. Coupled with high-throughput sequencing technologies, it can now efficiently determine genome-wide methylation profiles in cancer cells. Also, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters. Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. 191 patient samples (169 tumor and 22 normal specimen) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. This provides comprehensive and genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework. CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible

  13. GLIDERS - A web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs

    Directory of Open Access Journals (Sweden)

    Broxholme John

    2009-10-01

    Full Text Available Abstract Background A number of tools for the examination of linkage disequilibrium (LD patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb. We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine that enables the retrieval of pairwise associations with r2 ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers. Description GLIDERS is an easy to use web tool that only requires the user to enter rs numbers of SNPs they want to retrieve genome-wide LD for (both nearby and long-range. The intuitive web interface handles both manual entry of SNP IDs as well as allowing users to upload files of SNP IDs. The user can limit the resulting inter SNP associations with easy to use menu options. These include MAF limit (5-45%, distance limits between SNPs (minimum and maximum, r2 (0.3 to 1, HapMap population sample (CEU, YRI and JPT+CHB combined and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab delimited text file. Conclusion GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.

  14. Genome-scale comparison and constraint-based metabolic reconstruction of the facultative anaerobic Fe(III-reducer Rhodoferax ferrireducens

    Directory of Open Access Journals (Sweden)

    Daugherty Sean

    2009-09-01

    Full Text Available Abstract Background Rhodoferax ferrireducens is a metabolically versatile, Fe(III-reducing, subsurface microorganism that is likely to play an important role in the carbon and metal cycles in the subsurface. It also has the unique ability to convert sugars to electricity, oxidizing the sugars to carbon dioxide with quantitative electron transfer to graphite electrodes in microbial fuel cells. In order to expand our limited knowledge about R. ferrireducens, the complete genome sequence of this organism was further annotated and then the physiology of R. ferrireducens was investigated with a constraint-based, genome-scale in silico metabolic model and laboratory studies. Results The iterative modeling and experimental approach unveiled exciting, previously unknown physiological features, including an expanded range of substrates that support growth, such as cellobiose and citrate, and provided additional insights into important features such as the stoichiometry of the electron transport chain and the ability to grow via fumarate dismutation. Further analysis explained why R. ferrireducens is unable to grow via photosynthesis or fermentation of sugars like other members of this genus and uncovered novel genes for benzoate metabolism. The genome also revealed that R. ferrireducens is well-adapted for growth in the subsurface because it appears to be capable of dealing with a number of environmental insults, including heavy metals, aromatic compounds, nutrient limitation and oxidative stress. Conclusion This study demonstrates that combining genome-scale modeling with the annotation of a new genome sequence can guide experimental studies and accelerate the understanding of the physiology of under-studied yet environmentally relevant microorganisms.

  15. Use of genome editing tools in human stem cell-based disease modeling and precision medicine.

    Science.gov (United States)

    Wei, Yu-da; Li, Shuang; Liu, Gai-gai; Zhang, Yong-xian; Ding, Qiu-rong

    2015-10-01

    Precision medicine emerges as a new approach that takes into account individual variability. The successful conduct of precision medicine requires the use of precise disease models. Human pluripotent stem cells (hPSCs), as well as adult stem cells, can be differentiated into a variety of human somatic cell types that can be used for research and drug screening. The development of genome editing technology over the past few years, especially the CRISPR/Cas system, has made it feasible to precisely and efficiently edit the genetic background. Therefore, disease modeling by using a combination of human stem cells and genome editing technology has offered a new platform to generate " personalized " disease models, which allow the study of the contribution of individual genetic variabilities to disease progression and the development of precise treatments. In this review, recent advances in the use of genome editing in human stem cells and the generation of stem cell models for rare diseases and cancers are discussed.

  16. Phylogenetic Relationships of the Fern Cyrtomium falcatum (Dryopteridaceae from Dokdo Island Based on Chloroplast Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Gurusamy Raman

    2016-12-01

    Full Text Available Cyrtomium falcatum is a popular ornamental fern cultivated worldwide. Native to the Korean Peninsula, Japan, and Dokdo Island in the Sea of Japan, it is the only fern present on Dokdo Island. We isolated and characterized the chloroplast (cp genome of C. falcatum, and compared it with those of closely related species. The genes trnV-GAC and trnV-GAU were found to be present within the cp genome of C. falcatum, whereas trnP-GGG and rpl21 were lacking. Moreover, cp genomes of Cyrtomium devexiscapulae and Adiantum capillus-veneris lack trnP-GGG and rpl21, suggesting these are not conserved among angiosperm cp genomes. The deletion of trnR-UCG, trnR-CCG, and trnSeC in the cp genomes of C. falcatum and other eupolypod ferns indicates these genes are restricted to tree ferns, non-core leptosporangiates, and basal ferns. The C. falcatum cp genome also encoded ndhF and rps7, with GUG start codons that were only conserved in polypod ferns, and it shares two significant inversions with other ferns, including a minor inversion of the trnD-GUC region and an approximate 3 kb inversion of the trnG-trnT region. Phylogenetic analyses showed that Equisetum was found to be a sister clade to Psilotales-Ophioglossales with a 100% bootstrap (BS value. The sister relationship between Pteridaceae and eupolypods was also strongly supported by a 100% BS, but Bayesian molecular clock analyses suggested that C. falcatum diversified in the mid-Paleogene period (45.15 ± 4.93 million years ago and might have moved from Eurasia to Dokdo Island.

  17. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Science.gov (United States)

    Ober, Ulrike; Huang, Wen; Magwire, Michael; Schlather, Martin; Simianer, Henner; Mackay, Trudy F C

    2015-01-01

    The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  18. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17% of the genetic variance among lines in females (males, the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  19. [Genomics innovative teaching pattern based upon amalgamation between modern educational technology and constructivism studying theory].

    Science.gov (United States)

    Liang, Xu-Fang; Peng, Jing; Zhou, Tian-Hong

    2007-04-01

    In order to overcome various malpractices in the traditional teaching methods, and also as part of the Guangdong province molecular biology perfect course project, some reforms were carried out to the teaching pattern of genomics. The reforms include using the foreign original teaching materials, bilingual teaching, as well as taking the constructivism-directed discussion teaching method and the multimedia computer-assisted instruction. To improve the scoring way and the laboratory course of the subject, we carried on a multiplex inspection systems and a self-designing experiments. Through the teaching reform on Genomics, we have gradually consummated the construction of molecular biology curriculum system.

  20. A Chado case study: an ontology-based modular schema for representing genome-associated biological information.

    Science.gov (United States)

    Mungall, Christopher J; Emmert, David B

    2007-07-01

    A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. The Chado schema is freely distributed under the terms of the Artistic License (http

  1. Brief Guide to Genomics: DNA, Genes and Genomes

    Science.gov (United States)

    ... clinic. Most new drugs based on genome-based research are estimated to be at least 10 to 15 years away, though recent genome-driven efforts in lipid-lowering therapy have considerably shortened that interval. According ...

  2. Genome Maps, a new generation genome browser.

    Science.gov (United States)

    Medina, Ignacio; Salavert, Francisco; Sanchez, Rubén; de Maria, Alejandro; Alonso, Roberto; Escobar, Pablo; Bleda, Marta; Dopazo, Joaquín

    2013-07-01

    Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads at both server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need of any Java Applet, Flash or other plug-in technology installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated in any web application by including only a few lines of code. Genome Maps is an open source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org.

  3. Single-Cell-Based Platform for Copy Number Variation Profiling through Digital Counting of Amplified Genomic DNA Fragments.

    Science.gov (United States)

    Li, Chunmei; Yu, Zhilong; Fu, Yusi; Pang, Yuhong; Huang, Yanyi

    2017-04-26

    We develop a novel single-cell-based platform through digital counting of amplified genomic DNA fragments, named multifraction amplification (mfA), to detect the copy number variations (CNVs) in a single cell. Amplification is required to acquire genomic information from a single cell, while introducing unavoidable bias. Unlike prevalent methods that directly infer CNV profiles from the pattern of sequencing depth, our mfA platform denatures and separates the DNA molecules from a single cell into multiple fractions of a reaction mix before amplification. By examining the sequencing result of each fraction for a specific fragment and applying a segment-merge maximum likelihood algorithm to the calculation of copy number, we digitize the sequencing-depth-based CNV identification and thus provide a method that is less sensitive to the amplification bias. In this paper, we demonstrate a mfA platform through multiple displacement amplification (MDA) chemistry. When performing the mfA platform, the noise of MDA is reduced; therefore, the resolution of single-cell CNV identification can be improved to 100 kb. We can also determine the genomic region free of allelic drop-out with mfA platform, which is impossible for conventional single-cell amplification methods.

  4. Ethical issues in human genome epidemiology: a case study based on the Japanese American Family Study in Seattle, Washington.

    Science.gov (United States)

    Austin, Melissa A

    2002-04-01

    Recent completion of the draft sequence of the human genome has been greeted with both excitement and skepticism, and the potential of this accomplishment for advancing public health has been tempered by ethical concerns about the protection of human subjects. This commentary explores ethical issues arising in human genome epidemiology by using a case study approach based on the ongoing Japanese American Family Study at the University of Washington in Seattle (1994-2003). Ethical issues encountered in designing the study, collecting the data, and reporting the study results are considered. When developing studies, investigators must consider whether to restrict the study to specific racial or ethnic groups and whether community involvement is appropriate. Once the study design is in place, further ethical issues emerge, including obtaining informed consent for DNA banking and protecting the privacy and confidentiality of family members. Finally, investigators must carefully consider whether to report genotype results to study participants and whether pedigrees illustrating the results of the study will be published. Overall, the promise of genomics for improving public health must be pursued based on the fundamental ethical principles of respect for persons, beneficence, and justice.

  5. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    Science.gov (United States)

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species.Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press.

  6. Genomic Epidemiology of Salmonella enterica Serotype Enteritidis based on Population Structure of Prevalent Lineages

    DEFF Research Database (Denmark)

    Deng, Xiangyu; Desai, Prerak T.; den Bakker, Henk C.

    2014-01-01

    serotype Nitra strains. Single-nucleotide polymorphisms were filtered to identify 4,887 reliable loci that distinguished all isolates from each other. Our whole-genome single-nucleotide polymorphism typing approach was robust for S. enterica Enteritidis subtyping with combined data for different strains...

  7. A DNA minor groove electronegative potential genome map based on photo-chemical probing

    DEFF Research Database (Denmark)

    Lindemose, Søren; Nielsen, Peter Eigil; Hansen, Morten

    2011-01-01

    The double-stranded DNA of the genome contains both sequence information directly relating to the protein and RNA coding as well as functional and structural information relating to protein recognition. Only recently is the importance of DNA shape in this recognition process being fully appreciat...

  8. Genome-based exploration of the specialized metabolic capacities of the genus Rhodococcus

    NARCIS (Netherlands)

    Ceniceros, Ana; Dijkhuizen, Lubbert; Petrusma, Mirjan; Medema, M.H.

    2017-01-01

    Background Bacteria of the genus Rhodococcus are well known for their ability to degrade a large range of organic compounds. Some rhodococci are free-living, saprophytic bacteria; others are animal and plant pathogens. Recently, several studies have shown that their genomes encode putative pathways

  9. Genome-based exploration of the specialized metabolic capacities of the genus Rhodococcus

    NARCIS (Netherlands)

    Ceniceros, Ana; Dijkhuizen, Lubbert; Petrusma, Mirjan; Medema, Marnix H.

    2017-01-01

    Background: Bacteria of the genus Rhodococcus are well known for their ability to degrade a large range of organic compounds. Some rhodococci are free-living, saprophytic bacteria; others are animal and plant pathogens. Recently, several studies have shown that their genomes encode putative pathways

  10. Genome-based exploration of the specialized metabolic capacities of the genus Rhodococcus

    NARCIS (Netherlands)

    Ceniceros, Ana; Dijkhuizen, Lubbert; Petrusma, Mirjan; Medema, Marnix H

    2017-01-01

    BACKGROUND: Bacteria of the genus Rhodococcus are well known for their ability to degrade a large range of organic compounds. Some rhodococci are free-living, saprophytic bacteria; others are animal and plant pathogens. Recently, several studies have shown that their genomes encode putative pathways

  11. Metabolic model for the filamentous ‘Candidatus Microthrix parvicella’ based on genomic and metagenomic analyses

    DEFF Research Database (Denmark)

    McIlroy, Simon Jon; Kristiansen, Rikke; Albertsen, Mads

    2013-01-01

    acids as triacylglycerols. Utilisation of trehalose and/or polyphosphate stores or partial oxidation of long-chain fatty acids may supply the energy required for anaerobic lipid uptake and storage. Comparing the genome sequence of this isolate with metagenomes from two full-scale wastewater treatment...

  12. Sandwich corrected standard errors in family-based genome-wide association studies

    NARCIS (Netherlands)

    Minica, C.C.; Dolan, C.V.; Kampert, M.M.D.; Boomsma, D.I.; Vink, J.M.

    2015-01-01

    Given the availability of genotype and phenotype data collected in family members, the question arises which estimator ensures the most optimal use of such data in genome-wide scans. Using simulations, we compared the Unweighted Least Squares (ULS) and Maximum Likelihood (ML) procedures. The former

  13. Genome-wide SNP association-based localization of a dwarfism gene in Friesian dwarf horses

    NARCIS (Netherlands)

    Orr, J.L.; Back, W.; Gu, J.; Leegwater, P.H.; Govindarajan, P.; Conroy, J.; Ducro, B.J.; Arendonk, van J.A.M.

    2010-01-01

    The recent completion of the horse genome and commercial availability of an equine SNP genotyping array has facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of

  14. MOVE : A Multi-Level Ontology-Based Visualization and Exploration Framework for Genomic Networks

    NARCIS (Netherlands)

    Bosman, D.W.; Blom, E.J.; Ogao, P.J.; Kuipers, O.P.; Roerdink, J.B.T.M.

    2007-01-01

    Among the various research areas that comprise bioinformatics, systems biology is gaining increasing attention. An important goal of systems biology is the unraveling of dynamic interactions between components of living cells (e.g., proteins, genes). These interactions exist among others on genomic,

  15. New traits in crops produced by genome editing techniques based on deletions

    NARCIS (Netherlands)

    Wiel, van de C.C.M.; Schaart, J.G.; Lotz, L.A.P.; Smulders, M.J.M.

    2017-01-01

    One of the most promising New Plant Breeding Techniques is genome editing (also called gene editing) with the help of a programmable site-directed nuclease (SDN). In this review, we focus on SDN-1, which is the generation of small deletions or insertions (indels) at a precisely defined location in

  16. Genomic prediction based on data from three layer lines using non-linear regression models

    NARCIS (Netherlands)

    Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

    2014-01-01

    Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate

  17. Genomic characterization of Flavobacterium psychrophilum serotypes and development of a multiplex PCR-based serotyping scheme

    DEFF Research Database (Denmark)

    Rochat, Tatiana; Fujiwara-Nagata, Erina; Calvez, Ségolène

    2017-01-01

    Flavobacterium psychrophilum is a devastating bacterial pathogen of salmonids reared in freshwater worldwide. So far, serological diversity between isolates has been described but the underlying molecular factors remain unknown. By combining complete genome sequence analysis and the serotyping me...... for bacterial coldwater disease resistance and future vaccine formulation....

  18. Construction of chromosomal recombination maps of three genomes of lilies (Lilium) based on GISH analysis.

    NARCIS (Netherlands)

    Nadeem Khan, M.; Shujun Zhou,; Barba Gonzalez, R.; Ramanna, M.S.; Visser, R.G.F.; Tuyl, van J.M.

    2009-01-01

    Chromosomal recombination maps were constructed for three genomes of lily (Lilium) using GISH analyses. For this purpose, the backcross (BC) progenies of two diploid (2n = 2x = 24) interspecific hybrids of lily, viz. Longiflorum × Asiatic (LA) and Oriental × Asiatic (OA), were used. Mostly the BC

  19. Plant-based raw material: Improved food quality for better nutrition via plant genomics

    NARCIS (Netherlands)

    Meer, van der I.M.; Bovy, A.G.; Bosch, H.J.

    2001-01-01

    Plants form the basis of the human food chain. Characteristics of plants are therefore crucial to the quantity and quality of human food. In this review, it is discussed how technological developments in the area of plant genomics and plant genetics help to mobilise the potential of plants to

  20. Meiotic homoeologous recombination-based alien gene introgression in the genomics era of wheat

    Science.gov (United States)

    Wheat (Triticum spp.) has a narrow genetic basis due to its allopolyploid origin. However, wheat has numerous wild relatives usable for expanding genetic variability of its genome through meiotic homoeologous recombination. Traditionally, laborious cytological analyses have been employed to detect h...

  1. A revised timescale for human evolution based on ancient mitochondrial genomes

    Czech Academy of Sciences Publication Activity Database

    Fu, Q.; Mittnik, A.; Johnson, P. L. F.; Bos, K.; Lari, M.; Bollongino, R.; Sun, Ch.; Giemsch, L.; Schmitz, R.; Burger, J.; Ronchitelli, A. M.; Martini, F.; Cremonesi, R. G.; Svoboda, Jiří; Bauer, P.; Caramelli, D.; Castellano, S.; Reich, D.; Pääbo, S.; Krause, J.

    2013-01-01

    Roč. 23, April 8 (2013), s. 553-559 ISSN 0960-9822 Institutional support: RVO:68081758 Keywords : mitochondrial genome * human evolution * calibration Subject RIV: AC - Archeology, Anthropology, Ethnology OBOR OECD: Archaeology Impact factor: 9.916, year: 2013

  2. Insights into phylogeny, sex function and age of Fragaria based on whole chloroplast genome sequencing

    Science.gov (United States)

    Wambui Njunguna; Aaron Liston; Richard Cronn; Tia-Lynn Ashman; Nahla Bassil

    2013-01-01

    The cultivated strawberry is one of the youngest domesticated plants, developed in France in the 1700s from chance hybridization between two western hemisphere octoploid species. However, little is known about the evolution of the species that gave rise to this important fruit crop. Phylogenetic analysis of chloroplast genome sequences of 21 Fragaria...

  3. Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease.

    Directory of Open Access Journals (Sweden)

    Chuong B Do

    2011-06-01

    Full Text Available Although the causes of Parkinson's disease (PD are thought to be primarily environmental, recent studies suggest that a number of genes influence susceptibility. Using targeted case recruitment and online survey instruments, we conducted the largest case-control genome-wide association study (GWAS of PD based on a single collection of individuals to date (3,426 cases and 29,624 controls. We discovered two novel, genome-wide significant associations with PD-rs6812193 near SCARB2 (p = 7.6 × 10(-10, OR = 0.84 and rs11868035 near SREBF1/RAI1 (p = 5.6 × 10(-8, OR = 0.85-both replicated in an independent cohort. We also replicated 20 previously discovered genetic associations (including LRRK2, GBA, SNCA, MAPT, GAK, and the HLA region, providing support for our novel study design. Relying on a recently proposed method based on genome-wide sharing estimates between distantly related individuals, we estimated the heritability of PD to be at least 0.27. Finally, using sparse regression techniques, we constructed predictive models that account for 6%-7% of the total variance in liability and that suggest the presence of true associations just beyond genome-wide significance, as confirmed through both internal and external cross-validation. These results indicate a substantial, but by no means total, contribution of genetics underlying susceptibility to both early-onset and late-onset PD, suggesting that, despite the novel associations discovered here and elsewhere, the majority of the genetic component for Parkinson's disease remains to be discovered.

  4. Building a semantic web-based metadata repository for facilitating detailed clinical modeling in cancer genome studies.

    Science.gov (United States)

    Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian

    2017-06-05

    Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of the study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both ISO11179 metadata standard and Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate the detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.

  5. Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections.

    Science.gov (United States)

    Murray, Lee; Mobegi, Victor A; Duffy, Craig W; Assefa, Samuel A; Kwiatkowski, Dominic P; Laman, Eugene; Loua, Kovana M; Conway, David J

    2016-05-12

    In regions where malaria is endemic, individuals are often infected with multiple distinct parasite genotypes, a situation that may impact on evolution of parasite virulence and drug resistance. Most approaches to studying genotypic diversity have involved analysis of a modest number of polymorphic loci, although whole genome sequencing enables a broader characterisation of samples. PCR-based microsatellite typing of a panel of ten loci was performed on Plasmodium falciparum in 95 clinical isolates from a highly endemic area in the Republic of Guinea, to characterize within-isolate genetic diversity. Separately, single nucleotide polymorphism (SNP) data from genome-wide short-read sequences of the same samples were used to derive within-isolate fixation indices (F ws), an inverse measure of diversity within each isolate compared to overall local genetic diversity. The latter indices were compared with the microsatellite results, and also with indices derived by randomly sampling modest numbers of SNPs. As expected, the number of microsatellite loci with more than one allele in each isolate was highly significantly inversely correlated with the genome-wide F ws fixation index (r = -0.88, P 10 % had high correlation (r > 0.90) with the index derived using all SNPs. Different types of data give highly correlated indices of within-infection diversity, although PCR-based analysis detects low-level minority genotypes not apparent in bulk sequence analysis. When whole-genome data are not obtainable, quantitative assay of ten or more SNPs can yield a reasonably accurate estimate of the within-infection fixation index (F ws).

  6. Genomic Characterization of DArT Markers Based on High-Density Linkage Analysis and Physical Mapping to the Eucalyptus Genome

    Science.gov (United States)

    Petroli, César D.; Sansaloni, Carolina P.; Carling, Jason; Steane, Dorothy A.; Vaillancourt, René E.; Myburg, Alexander A.; da Silva, Orzenil Bonfim; Pappas, Georgios Joannis; Kilian, Andrzej; Grattapaglia, Dario

    2012-01-01

    Diversity Arrays Technology (DArT) provides a robust, high throughput, cost-effective method to query thousands of sequence polymorphisms in a single assay. Despite the extensive use of this genotyping platform for numerous plant species, little is known regarding the sequence attributes and genome-wide distribution of DArT markers. We investigated the genomic properties of the 7,680 DArT marker probes of a Eucalyptus array, by sequencing them, constructing a high density linkage map and carrying out detailed physical mapping analyses to the Eucalyptus grandis reference genome. A consensus linkage map with 2,274 DArT markers anchored to 210 microsatellites and a framework map, with improved support for ordering, displayed extensive collinearity with the genome sequence. Only 1.4 Mbp of the 75 Mbp of still unplaced scaffold sequence was captured by 45 linkage mapped but physically unaligned markers to the 11 main Eucalyptus pseudochromosomes, providing compelling evidence for the quality and completeness of the current Eucalyptus genome assembly. A highly significant correspondence was found between the locations of DArT markers and predicted gene models, while most of the 89 DArT probes unaligned to the genome correspond to sequences likely absent in E. grandis, consistent with the pan-genomic feature of this multi-Eucalyptus species DArT array. These comprehensive linkage-to-physical mapping analyses provide novel data regarding the genomic attributes of DArT markers in plant genomes in general and for Eucalyptus in particular. DArT markers preferentially target the gene space and display a largely homogeneous distribution across the genome, thereby providing superb coverage for mapping and genome-wide applications in breeding and diversity studies. Data reported on these ubiquitous properties of DArT markers will be particularly valuable to researchers working on less-studied crop species who already count on DArT genotyping arrays but for which no reference

  7. Genomic characterization of DArT markers based on high-density linkage analysis and physical mapping to the Eucalyptus genome.

    Directory of Open Access Journals (Sweden)

    César D Petroli

    Full Text Available Diversity Arrays Technology (DArT provides a robust, high throughput, cost-effective method to query thousands of sequence polymorphisms in a single assay. Despite the extensive use of this genotyping platform for numerous plant species, little is known regarding the sequence attributes and genome-wide distribution of DArT markers. We investigated the genomic properties of the 7,680 DArT marker probes of a Eucalyptus array, by sequencing them, constructing a high density linkage map and carrying out detailed physical mapping analyses to the Eucalyptus grandis reference genome. A consensus linkage map with 2,274 DArT markers anchored to 210 microsatellites and a framework map, with improved support for ordering, displayed extensive collinearity with the genome sequence. Only 1.4 Mbp of the 75 Mbp of still unplaced scaffold sequence was captured by 45 linkage mapped but physically unaligned markers to the 11 main Eucalyptus pseudochromosomes, providing compelling evidence for the quality and completeness of the current Eucalyptus genome assembly. A highly significant correspondence was found between the locations of DArT markers and predicted gene models, while most of the 89 DArT probes unaligned to the genome correspond to sequences likely absent in E. grandis, consistent with the pan-genomic feature of this multi-Eucalyptus species DArT array. These comprehensive linkage-to-physical mapping analyses provide novel data regarding the genomic attributes of DArT markers in plant genomes in general and for Eucalyptus in particular. DArT markers preferentially target the gene space and display a largely homogeneous distribution across the genome, thereby providing superb coverage for mapping and genome-wide applications in breeding and diversity studies. Data reported on these ubiquitous properties of DArT markers will be particularly valuable to researchers working on less-studied crop species who already count on DArT genotyping arrays but for

  8. Providing guidance for genomics-based cancer treatment decisions: insights from stakeholder engagement for post-prostatectomy radiation therapy.

    Science.gov (United States)

    Abe, James; Lobo, Jennifer M; Trifiletti, Daniel M; Showalter, Timothy N

    2017-08-24

    Despite the emergence of genomics-based risk prediction tools in oncology, there is not yet an established framework for communication of test results to cancer patients to support shared decision-making. We report findings from a stakeholder engagement program that aimed to develop a framework for using Markov models with individualized model inputs, including genomics-based estimates of cancer recurrence probability, to generate personalized decision aids for prostate cancer patients faced with radiation therapy treatment decisions after prostatectomy. We engaged a total of 22 stakeholders, including: prostate cancer patients, urological surgeons, radiation oncologists, genomic testing industry representatives, and biomedical informatics faculty. Slides were at each meeting to provide background information regarding the analytical framework. Participants were invited to provide feedback during the meeting, including revising the overall project aims. Stakeholder meeting content was reviewed and summarized by stakeholder group and by theme. The majority of stakeholder suggestions focused on aspects of decision aid design and formatting. Stakeholders were enthusiastic about the potential value of using decision analysis modeling with personalized model inputs for cancer recurrence risk, as well as competing risks from age and comorbidities, to generate a patient-centered tool to assist decision-making. Stakeholders did not view privacy considerations as a major barrier to the proposed decision aid program. A common theme was that decision aids should be portable across multiple platforms (electronic and paper), should allow for interaction by the user to adjust model inputs iteratively, and available to patients both before and during consult appointments. Emphasis was placed on the challenge of explaining the model's composite result of quality-adjusted life years. A range of stakeholders provided valuable insights regarding the design of a personalized decision

  9. Personal genomics services: whose genomes?

    Science.gov (United States)

    Gurwitz, David; Bregman-Eschet, Yael

    2009-07-01

    New companies offering personal whole-genome information services over the internet are dynamic and highly visible players in the personal genomics field. For fees currently ranging from US$399 to US$2500 and a vial of saliva, individuals can now purchase online access to their individual genetic information regarding susceptibility to a range of chronic diseases and phenotypic traits based on a genome-wide SNP scan. Most of the companies offering such services are based in the United States, but their clients may come from nearly anywhere in the world. Although the scientific validity, clinical utility and potential future implications of such services are being hotly debated, several ethical and regulatory questions related to direct-to-consumer (DTC) marketing strategies of genetic tests have not yet received sufficient attention. For example, how can we minimize the risk of unauthorized third parties from submitting other people's DNA for testing? Another pressing question concerns the ownership of (genotypic and phenotypic) information, as well as the unclear legal status of customers regarding their own personal information. Current legislation in the US and Europe falls short of providing clear answers to these questions. Until the regulation of personal genomics services catches up with the technology, we call upon commercial providers to self-regulate and coordinate their activities to minimize potential risks to individual privacy. We also point out some specific steps, along the trustee model, that providers of DTC personal genomics services as well as regulators and policy makers could consider for addressing some of the concerns raised below.

  10. Putative Microsatellite DNA Marker-Based Wheat Genomic Resource for Varietal Improvement and Management

    Directory of Open Access Journals (Sweden)

    Sarika Jaiswal

    2017-11-01

    Full Text Available Wheat fulfills 20% of global caloric requirement. World needs 60% more wheat for 9 billion population by 2050 but climate change with increasing temperature is projected to affect wheat productivity adversely. Trait improvement and management of wheat germplasm requires genomic resource. Simple Sequence Repeats (SSRs being highly polymorphic and ubiquitously distributed in the genome, can be a marker of choice but there is no structured marker database with options to generate primer pairs for genotyping on desired chromosome/physical location. Previously associated markers with different wheat trait are also not available in any database. Limitations of in vitro SSR discovery can be overcome by genome-wide in silico mining of SSR. Triticum aestivum SSR database (TaSSRDb is an integrated online database with three-tier architecture, developed using PHP and MySQL and accessible at http://webtom.cabgrid.res.in/wheatssr/. For genotyping, Primer3 standalone code computes primers on user request. Chromosome-wise SSR calling for all the three sub genomes along with choice of motif types is provided in addition to the primer generation for desired marker. We report here a database of highest number of SSRs (476,169 from complex, hexaploid wheat genome (~17 GB along with previously reported 268 SSR markers associated with 11 traits. Highest (116.93 SSRs/Mb and lowest (74.57 SSRs/Mb SSR densities were found on 2D and 3A chromosome, respectively. To obtain homozygous locus, e-PCR was done. Such 30 loci were randomly selected for PCR validation in panel of 18 wheat Advance Varietal Trial (AVT lines. TaSSRDb can be a valuable genomic resource tool for linkage mapping, gene/QTL (Quantitative trait locus discovery, diversity analysis, traceability and variety identification. Varietal specific profiling and differentiation can supplement DUS (Distinctiveness, Uniformity, and Stability testing, EDV (Essentially Derived Variety/IV (Initial Variety disputes, seed

  11. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

    Science.gov (United States)

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by artichoke accessions, as templates. PMID:27648830

  12. SNP-based pathway enrichment analysis for genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Potkin Steven G

    2011-04-01

    Full Text Available Abstract Background Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs, have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can only explain a small fraction of the genetic risks associated with the complex diseases. New strategies and statistical approaches are needed to address this lack of explanation. One such approach is the pathway analysis, which considers the genetic variations underlying a biological pathway, rather than separately as in the traditional GWAS studies. A critical challenge in the pathway analysis is how to combine evidences of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as a representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs. Results We describe a SNP-based pathway enrichment method for GWAS studies. The method consists of the following two main steps: 1 for a given pathway, using an adaptive truncated product statistic to identify all representative (potentially more than one SNPs of each gene, calculating the average number of representative SNPs for the genes, then re-selecting the representative SNPs of genes in the pathway based on this number; and 2 ranking all selected SNPs by the significance of their statistical association with a trait of interest, and testing if the set of SNPs from a particular pathway is significantly enriched with high ranks using a weighted Kolmogorov-Smirnov test. We applied our method to two large genetically distinct GWAS data sets of schizophrenia, one

  13. Ricebase: a breeding and genetics platform for rice, integrating individual molecular markers, pedigrees and whole-genome-based data.

    Science.gov (United States)

    Edwards, J D; Baldo, A M; Mueller, L A

    2016-01-01

    Ricebase (http://ricebase.org) is an integrative genomic database for rice (Oryza sativa) with an emphasis on combining datasets in a way that maintains the key links between past and current genetic studies. Ricebase includes DNA sequence data, gene annotations, nucleotide variation data and molecular marker fragment size data. Rice research has benefited from early adoption and extensive use of simple sequence repeat (SSR) markers; however, the majority of rice SSR markers were developed prior to the latest rice pseudomolecule assembly. Interpretation of new research using SNPs in the context of literature citing SSRs requires a common coordinate system. A new pipeline, using a stepwise relaxation of stringency, was used to map SSR primers onto the latest rice pseudomolecule assembly. The SSR markers and experimentally assayed amplicon sizes are presented in a relational database with a web-based front end, and are available as a track loaded in a genome browser with links connecting the browser and database. The combined capabilities of Ricebase link genetic markers, genome context, allele states across rice germplasm and potentially user curated phenotypic interpretations as a community resource for genetic discovery and breeding in rice. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.

  14. The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods.

    Directory of Open Access Journals (Sweden)

    Rui Martiniano

    2017-07-01

    Full Text Available We analyse new genomic data (0.05-2.95x from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200-3500 BC to the Middle Bronze Age (1740-1430 BC and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third Millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.

  15. Efficient anchoring of alien chromosome segments introgressed into bread wheat by new Leymus racemosus genome-based markers.

    Science.gov (United States)

    Edet, Offiong Ukpong; Kim, June-Sik; Okamoto, Masanori; Hanada, Kousuke; Takeda, Tomoyuki; Kishii, Masahiro; Gorafi, Yasir Serag Alnor; Tsujimoto, Hisashi

    2018-03-27

    The tertiary gene pool of bread wheat, to which Leymus racemosus belongs, has remained underutilized due to the current limited genomic resources of the species that constitute it. Continuous enrichment of public databases with useful information regarding these species is, therefore, needed to provide insights on their genome structures and aid successful utilization of their genes to develop improved wheat cultivars for effective management of environmental stresses. We generated de novo DNA and mRNA sequence information of L. racemosus and developed 110 polymorphic PCR-based markers from the data, and to complement the PCR markers, DArT-seq genotyping was applied to develop additional 9990 SNP markers. Approximately 52% of all the markers enabled us to clearly genotype 22 wheat-L. racemosus chromosome introgression lines, and L. racemosus chromosome-specific markers were highly efficient in detailed characterization of the translocation and recombination lines analyzed. A further analysis revealed remarkable transferability of the PCR markers to three other important Triticeae perennial species: L. mollis, Psathyrostachys huashanica and Elymus ciliaris, indicating their suitability for characterizing wheat-alien chromosome introgressions carrying chromosomes of these genomes. The efficiency of the markers in characterizing wheat-L. racemosus chromosome introgression lines proves their reliability, and their high transferability further broadens their scope of application. This is the first report on sequencing and development of markers from L. racemosus genome and the application of DArT-seq to develop markers from a perennial wild relative of wheat, marking a paradigm shift from the seeming concentration of the technology on cultivated species. Integration of these markers with appropriate cytogenetic methods would accelerate development and characterization of wheat-alien chromosome introgression lines.

  16. Analysis of codon usage patterns in Morus notabilis based on genome and transcriptome data.

    Science.gov (United States)

    Wen, Yan; Zou, Ziliang; Li, Hongshun; Xiang, Zhonghuai; He, Ningjia

    2017-06-01

    Codons play important roles in regulating gene expression levels and mRNA half-lives. However, codon usage and related studies in multicellular organisms still lag far behind those in unicellular organisms. In this study, we describe for the first time genome-wide patterns of codon bias in Morus notabilis (mulberry tree), and analyze genome-wide codon usage in 12 other species within the order Rosales. The codon usage of M. notabilis was affected by nucleotide composition, mutation pressure, nature selection, and gene expression level. Translational selection optimal codons were identified and highly expressed genes of M. notabilis tended to use the optimal codons. Genes with higher expression levels have shorter coding region and lower amino acid complexity. Housekeeping genes showed stronger translational selection, which, notably, was not caused by the large differences between the expression level of housekeeping genes and other genes.

  17. Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths.

    Science.gov (United States)

    Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T

    How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter and the sizes of these factions are randomly different from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to lengths of their genes, and then find association between the genome rank and variousproperties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring of overall sortedness of the data. We have demonstrated that all considered methods give consistent results and Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method.

  18. Genomic prediction based on next generation sequencing of 1000 F2-families in Lolium perenne L

    DEFF Research Database (Denmark)

    Fé, Dario; Ashraf, Bilal; Greve-Pedersen, Morten

    2014-01-01

    and abiotic stresses. The study is performed on 995 F2 families originated from the DLF breeding program. All families were genotyped by reduced representation sequencing. A total of 1,020,065 SNPs were detected and used for genomic prediction. First analyses, used for model testing, have been carried out...... on salt stress tolerance. Ryegrass families where sown in rockwool blocks (in four replicates) in greenhouse, and allowed to establish over 60 days using standard fertilization and watering. Three consecutive treatments, with increasing salt (NaCl) concentrations, were applied. Ten days after initiation...... of each treatment, the percentage of green matter was evaluated by visual scoring and by digital imaging. Preliminary analysis using GBLUP have identified a significant amount of genetic variance (individual heritabilities ranging between 0.20 and 0.40 and family heritabilities up to about 0.15). Genomic...

  19. Illumina based whole mitochondrial genome of Junonia iphita reveals minor intraspecific variation

    Directory of Open Access Journals (Sweden)

    Catherine Vanlalruati

    2015-12-01

    Full Text Available In the present study, the near complete mitochondrial genome (mitogenome of Junonia iphita (Lepidoptera: Nymphalidae: Nymphalinae was determined to be 14,892 bp. The gene order and orientation are identical to those in other butterfly species. The phylogenetic tree constructed from the whole mitogenomes using the 13 protein coding genes (PCGs defines the genetic relatedness of the two J. iphita species collected from two different regions. All the Junonia species clustered together, and were further subdivided into clade one consisting of J. almana and J. orithya and clade two comprising of the two J. iphita which were collected from Indo and Indochinese subregions separated by river barrier. Comparison between the two J. iphita sequences revealed minor variations and Single Nucleotide Polymorphisms were identified at 51 sites amounting to 0.4% of the entire mitochondrial genome.

  20. Genome-based exploration of the specialized metabolic capacities of the genus Rhodococcus.

    Science.gov (United States)

    Ceniceros, Ana; Dijkhuizen, Lubbert; Petrusma, Mirjan; Medema, Marnix H

    2017-08-09

    Bacteria of the genus Rhodococcus are well known for their ability to degrade a large range of organic compounds. Some rhodococci are free-living, saprophytic bacteria; others are animal and plant pathogens. Recently, several studies have shown that their genomes encode putative pathways for the synthesis of a large number of specialized metabolites that are likely to be involved in microbe-microbe and host-microbe interactions. To systematically explore the specialized metabolic potential of this genus, we here performed a comprehensive analysis of the biosynthetic coding capacity across publicly available rhododoccal genomes, and compared these with those of several Mycobacterium strains as well as that of their mutual close relative Amycolicicoccus subflavus. Comparative genomic analysis shows that most predicted biosynthetic gene cluster families in these strains are clade-specific and lack any homology with gene clusters encoding the production of known natural products. Interestingly, many of these clusters appear to encode the biosynthesis of lipopeptides, which may play key roles in the diverse environments were rhodococci thrive, by acting as biosurfactants, pathogenicity factors or antimicrobials. We also identified several gene cluster families that are universally shared among all three genera, which therefore may have a more 'primary' role in their physiology. Inactivation of these clusters by mutagenesis might help to generate weaker strains that can be used as live vaccines. The genus Rhodococcus thus provides an interesting target for natural product discovery, in view of its large and mostly uncharacterized biosynthetic repertoire, its relatively fast growth and the availability of effective genetic tools for its genomic modification.

  1. Rapid and highly efficient construction of TALE-based transcriptional regulators and nucleases for genome modification

    KAUST Repository

    Li, Lixin

    2012-01-22

    Transcription activator-like effectors (TALEs) can be used as DNA-targeting modules by engineering their repeat domains to dictate user-selected sequence specificity. TALEs have been shown to function as site-specific transcriptional activators in a variety of cell types and organisms. TALE nucleases (TALENs), generated by fusing the FokI cleavage domain to TALE, have been used to create genomic double-strand breaks. The identity of the TALE repeat variable di-residues, their number, and their order dictate the DNA sequence specificity. Because TALE repeats are nearly identical, their assembly by cloning or even by synthesis is challenging and time consuming. Here, we report the development and use of a rapid and straightforward approach for the construction of designer TALE (dTALE) activators and nucleases with user-selected DNA target specificity. Using our plasmid set of 100 repeat modules, researchers can assemble repeat domains for any 14-nucleotide target sequence in one sequential restriction-ligation cloning step and in only 24 h. We generated several custom dTALEs and dTALENs with new target sequence specificities and validated their function by transient expression in tobacco leaves and in vitro DNA cleavage assays, respectively. Moreover, we developed a web tool, called idTALE, to facilitate the design of dTALENs and the identification of their genomic targets and potential off-targets in the genomes of several model species. Our dTALE repeat assembly approach along with the web tool idTALE will expedite genome-engineering applications in a variety of cell types and organisms including plants. © 2012 Springer Science+Business Media B.V.

  2. Rapid and highly efficient construction of TALE-based transcriptional regulators and nucleases for genome modification.

    Science.gov (United States)

    Li, Lixin; Piatek, Marek J; Atef, Ahmed; Piatek, Agnieszka; Wibowo, Anjar; Fang, Xiaoyun; Sabir, J S M; Zhu, Jian-Kang; Mahfouz, Magdy M

    2012-03-01

    Transcription activator-like effectors (TALEs) can be used as DNA-targeting modules by engineering their repeat domains to dictate user-selected sequence specificity. TALEs have been shown to function as site-specific transcriptional activators in a variety of cell types and organisms. TALE nucleases (TALENs), generated by fusing the FokI cleavage domain to TALE, have been used to create genomic double-strand breaks. The identity of the TALE repeat variable di-residues, their number, and their order dictate the DNA sequence specificity. Because TALE repeats are nearly identical, their assembly by cloning or even by synthesis is challenging and time consuming. Here, we report the development and use of a rapid and straightforward approach for the construction of designer TALE (dTALE) activators and nucleases with user-selected DNA target specificity. Using our plasmid set of 100 repeat modules, researchers can assemble repeat domains for any 14-nucleotide target sequence in one sequential restriction-ligation cloning step and in only 24 h. We generated several custom dTALEs and dTALENs with new target sequence specificities and validated their function by transient expression in tobacco leaves and in vitro DNA cleavage assays, respectively. Moreover, we developed a web tool, called idTALE, to facilitate the design of dTALENs and the identification of their genomic targets and potential off-targets in the genomes of several model species. Our dTALE repeat assembly approach along with the web tool idTALE will expedite genome-engineering applications in a variety of cell types and organisms including plants.

  3. Microarray-based genomic surveying of gene polymorphisms in Chlamydia trachomatis

    OpenAIRE

    Brunelle, Brian W; Nicholson, Tracy L; Stephens, Richard S

    2004-01-01

    By comparing two fully sequenced genomes of Chlamydia trachomatis using competitive hybridization on DNA microarrays, a logarithmic correlation was demonstrated between the signal ratio of the arrays and the 75-99% range of nucleotide identities of the genes. Variable genes within 14 uncharacterized strains of C. trachomatis were identified by array analysis and verified by DNA sequencing. These genes may be crucial for understanding chlamydial virulence and pathogenesis.

  4. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

    OpenAIRE

    Wenning Zheng; Naresh V.R. Mutha; Hamed Heydari; Avirup Dutta; Cheuk Chuen Siow; Nicholas S. Jakubovics; Wei Yee Wee; Shi Yang Tan; Mia Yang Ang; Guat Jah Wong; Siew Woh Choo

    2016-01-01

    Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genome...

  5. Genetic basis of kidney cancer: Role of genomics for the development of disease-based therapeutics

    Science.gov (United States)

    Linehan, W. Marston

    2012-01-01

    Kidney cancer is not a single disease; it is made up of a number of different types of cancer, including clear cell, type 1 papillary, type 2 papillary, chromophobe, TFE3, TFEB, and oncocytoma. Sporadic, nonfamilial kidney cancer includes clear cell kidney cancer (75%), type 1 papillary kidney cancer (10%), papillary type 2 kidney cancer (including collecting duct and medullary RCC) (5%), the microphalmia-associated transcription (MiT) family translocation kidney cancers (TFE3, TFEB, and MITF), chromophobe kidney cancer (5%), and oncocytoma (5%). Each has a distinct histology, a different clinical course, responds differently to therapy, and is caused by mutation in a different gene. Genomic studies identifying the genes for kidney cancer, including the VHL, MET, FLCN, fumarate hydratase, succinate dehydrogenase, TSC1, TSC2, and TFE3 genes, have significantly altered the ways in which patients with kidney cancer are managed. While seven FDA-approved agents that target the VHL pathway have been approved for the treatment of patients with advanced kidney cancer, further genomic studies, such as whole genome sequencing, gene expression patterns, and gene copy number, will be required to gain a complete understanding of the genetic basis of kidney cancer and of the kidney cancer gene pathways and, most importantly, to provide the foundation for the development of effective forms of therapy for patients with this disease. PMID:23038766

  6. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences

    Directory of Open Access Journals (Sweden)

    Shairul Izan

    2017-08-01

    Full Text Available Whole Genome Shotgun (WGS sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb, Aegilops tauschii (4 Gb and Paphiopedilum henryanum (25 Gb. We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.

  7. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    Science.gov (United States)

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health as well as more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing do not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  8. Genome-wide and gene-based association studies of anxiety disorders in European and African American samples.

    Directory of Open Access Journals (Sweden)

    Takeshi Otowa

    Full Text Available Anxiety disorders (ADs are common mental disorders caused by a combination of genetic and environmental factors. Since ADs are highly comorbid with each other, partially due to shared genetic basis, studying AD phenotypes in a coordinated manner may be a powerful strategy for identifying potential genetic loci for ADs. To detect these loci, we performed genome-wide association studies (GWAS of ADs. In addition, as a complementary approach to single-locus analysis, we also conducted gene- and pathway-based analyses. GWAS data were derived from the control sample of the Molecular Genetics of Schizophrenia (MGS project (2,540 European American and 849 African American subjects genotyped on the Affymetrix GeneChip 6.0 array. We applied two phenotypic approaches: (1 categorical case-control comparisons (CC based upon psychiatric diagnoses, and (2 quantitative phenotypic factor scores (FS derived from a multivariate analysis combining information across the clinical phenotypes. Linear and logistic models were used to analyse the association with ADs using FS and CC traits, respectively. At the single locus level, no genome-wide significant association was found. A trans-population gene-based meta-analysis across both ethnic subsamples using FS identified three genes (MFAP3L on 4q32.3, NDUFAB1 and PALB2 on 16p12 with genome-wide significance (false discovery rate (FDR] <5%. At the pathway level, several terms such as transcription regulation, cytokine binding, and developmental process were significantly enriched in ADs (FDR <5%. Our approaches studying ADs as quantitative traits and utilizing the full GWAS data may be useful in identifying susceptibility genes and pathways for ADs.

  9. Long-PCR based next generation sequencing of the whole mitochondrial genome of the peacock skate Pavoraja nitida (Elasmobranchii: Arhynchobatidae).

    Science.gov (United States)

    Yang, Lei; Naylor, Gavin J P

    2016-01-01

    We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.

  10. mEBT: multiple-matching Evidence-based Translator of Murine Genomic Responses for Human Immunity Studies.

    Science.gov (United States)

    Tae, Donghyun; Seok, Junhee

    2018-05-29

    In this paper, we introduce multiple-matching Evidence-based Translator (mEBT) to discover genomic responses from murine expression data for human immune studies, which are significant in the given condition of mice and likely have similar responses in the corresponding condition of human. mEBT is evaluated over multiple data sets and shows improved inter-species agreement. mEBT is expected to be useful for research groups who use murine models to study human immunity. http://cdal.korea.ac.kr/mebt/. jseok14@korea.ac.kr. Supplementary data are available at Bioinformatics online.

  11. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

    Science.gov (United States)

    Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark

    2015-01-01

    Single-nucleotide polymorphisms (SNPs) selection and identification are the most important tasks in Genome-wide association data analysis. The problem is difficult because genome-wide association data is very high dimensional and a large portion of SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in Genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite of performing well in terms of prediction accuracy in some data sets with moderate size, RF still suffers from working in GWAS for selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs in two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weak informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node at a tree. This approach enables one to generate more accurate trees with a lower prediction error, meanwhile possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of Genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed

  12. Graph-based semi-supervised learning with genomic data integration using condition-responsive genes applied to phenotype classification.

    Science.gov (United States)

    Doostparast Torshizi, Abolfazl; Petzold, Linda R

    2018-01-01

    Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  13. Phylogenetic relationships among amphisbaenian reptiles based on complete mitochondrial genomic sequences.

    Science.gov (United States)

    Macey, J Robert; Papenfuss, Theodore J; Kuehl, Jennifer V; Fourcade, H Mathew; Boore, Jeffrey L

    2004-10-01

    Complete mitochondrial genomic sequences are reported from 12 members in the four families of the reptile group Amphisbaenia. Analysis of 11,946 aligned nucleotide positions (5797 informative) produces a robust phylogenetic hypothesis. The family Rhineuridae is basal and Bipedidae is the sister taxon to the Amphisbaenidae plus Trogonophidae. Amphisbaenian reptiles are surprisingly old, predating the breakup of Pangaea 200 million years before present, because successive basal taxa (Rhineuridae and Bipedidae) are situated in tectonic regions of Laurasia and nested taxa (Amphisbaenidae and Trogonophidae) are found in Gondwanan regions. Thorough sampling within the Bipedidae shows that it is not tectonic movement of Baja California away from the Mexican mainland that is primary in isolating Bipes species, but rather that primary vicariance occurred between northern and southern groups. Amphisbaenian families show parallel reduction in number of limbs and Bipes species exhibit parallel reduction in number of digits. A measure is developed for comparing the phylogenetic information content of various genes. A synapomorphic trait defining the Bipedidae is a shift from the typical vertebrate mitochondrial gene arrangement to the derived state of trnE and nad6. In addition, a tandem duplication of trnT and trnP is observed in Bipes biporus with a pattern of pseudogene formation that varies among populations. The first case of convergent rearrangement of the mitochondrial genome among animals demonstrated by complete genomic sequences is reported. Relative to most vertebrates, the Rhineuridae has the block nad6, trnE switched in order with the block cob, trnT, trnP, as they are in birds.

  14. Optimization of multi-environment trials for genomic selection based on crop models.

    Science.gov (United States)

    Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J

    2017-08-01

    We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS) that refers to the use of genome-wide information for predicting breeding values of selection candidates need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling thanks to crop growth models (CGM) incorporating genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing this MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined to this aim, and was evaluated on simulated and real data, with the example of wheat phenology. The MET defined with OptiMET allowed estimating the genetic parameters with lower error, leading to higher QTL detection power and higher prediction accuracies. MET defined with OptiMET was on average more efficient than random MET composed of twice as many environments, in terms of quality of the parameter estimates. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit MET and the phenotyping tools that are currently developed.

  15. Phylogenetic relationships among amphisbaenian reptiles based on complete mitochondrial genomic sequences

    Energy Technology Data Exchange (ETDEWEB)

    Macey, J. Robert; Papenfuss, Theodore J.; Kuehl, Jennifer V.; Fourcade, H. Matthew; Boore, Jeffrey L.

    2004-05-19

    Complete mitochondrial genomic sequences are reported from 12 members in the four families of the reptile group Amphisbaenia. Analysis of 11,946 aligned nucleotide positions (5,797 informative) produces a robust phylogenetic hypothesis. The family Rhineuridae is basal and Bipedidae is the sister taxon to the Amphisbaenidae plus Trogonophidae. Amphisbaenian reptiles are surprisingly old, predating the breakup of Pangaea 200 million years before present, because successive basal taxa (Rhineuridae and Bipedidae) are situated in tectonic regions of Laurasia and nested taxa (Amphisbaenidae and Trogonophidae) are found in Gondwanan regions. Thorough sampling within the Bipedidae shows that it is not tectonic movement of Baja California away from the Mexican mainland that is primary in isolating Bipes species, but rather that primary vicariance occurred between northern and southern groups. Amphisbaenian families show parallel reduction in number of limbs and Bipes species exhibit parallel reduction in number of digits. A measure is developed for comparing the phylogenetic information content of various genes. A synapomorphic trait defining the Bipedidae is a shift from the typical vertebrate mitochondrial gene arrangement to the derived state of trnE and nad6. In addition, a tandem duplication of trnT and trnP is observed in B. biporus with a pattern of pseudogene formation that varies among populations. The first case of convergent rearrangement of the mitochondrial genome among animals demonstrated by complete genomic sequences is reported. Relative to most vertebrates, the Rhineuridae has the block nad6, trnE switched in order with cob, trnT, trnP, as they are in birds.

  16. Protein Interaction-Based Genome-Wide Analysis of Incident Coronary Heart Disease

    DEFF Research Database (Denmark)

    Jensen, Majken Karoline; Pers, Tune Hannes; Dworzynski, Piotr

    2011-01-01

    in genes associated with risk of coronary heart disease (CHD). Methods and Results-Genome-wide association analyses of approximately approximate to 700 000 single-nucleotide polymorphisms in 899 incident CHD cases and 1823 age-and sex-matched controls within the Nurses' Health and the Health Professionals...... complex. Conclusions-The integration of a GWA study with PPI data successfully identifies a set of candidate susceptibility genes for incident CHD that would have been missed in single-marker GWA analysis. (Circ Cardiovasc Genet. 2011; 4:549-556.)...

  17. Weighting sequence variants based on their annotation increases power of whole-genome association studies

    DEFF Research Database (Denmark)

    Sveinbjornsson, Gardar; Albrechtsen, Anders; Zink, Florian

    2016-01-01

    The consensus approach to genome-wide association studies (GWAS) has been to assign equal prior probability of association to all sequence variants tested. However, some sequence variants, such as loss-of-function and missense variants, are more likely than others to affect protein function...... for the family-wise error rate (FWER), using as weights the enrichment of sequence annotations among association signals. We show that this weighted adjustment increases the power to detect association over the standard Bonferroni correction. We use the enrichment of associations by sequence annotation we have...

  18. Bias of genetic trend of genomic predictions based on both real dairy cattle and simulated data

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Nielsen, Ulrik Sander

    This study investigated the phenomenon of bias in the trend of genomic predictions and attempted to find the reason and solution for this bias. The data used in this study include Danish Jersey data and simulation data. In Jersey data, the bias was reduced when cows were included in the reference...... population. In simulated data, there was no bias when the test animals were unselected cows. When the G matrix was derived from genotypes of causal genes, the bias was reduced. The results suggest that the main reasons for causing the bias of the prediction trends are the selection of bulls and bull dams...

  19. Visualization for genomics: the Microbial Genome Viewer.

    Science.gov (United States)

    Kerkhoven, Robert; van Enckevort, Frank H J; Boekhorst, Jos; Molenaar, Douwe; Siezen, Roland J

    2004-07-22

    A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a MySQL database. The generated images are in scalable vector graphics (SVG) format, which is suitable for creating high-quality scalable images and dynamic Web representations. Gene-related data such as transcriptome and time-course microarray experiments can be superimposed on the maps for visual inspection. The Microbial Genome Viewer 1.0 is freely available at http://www.cmbi.kun.nl/MGV

  20. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

    Science.gov (United States)

    2013-01-01

    Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the model with the features selected from each feature selection methods are tested on the proposed classifiers. Four types of classifiers are chosen; these are namely, ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate the potential of becoming as significant prognostic signature in the oral cancer studies. PMID:23725313

  1. Coalescent-Based Analyses of Genomic Sequence Data Provide a Robust Resolution of Phylogenetic Relationships among Major Groups of Gibbons

    Science.gov (United States)

    Shi, Cheng-Min; Yang, Ziheng

    2018-01-01

    Abstract The phylogenetic relationships among extant gibbon species remain unresolved despite numerous efforts using morphological, behavorial, and genetic data and the sequencing of whole genomes. A major challenge in reconstructing the gibbon phylogeny is the radiative speciation process, which resulted in extremely short internal branches in the species phylogeny and extensive incomplete lineage sorting with extensive gene-tree heterogeneity across the genome. Here, we analyze two genomic-scale data sets, with ∼10,000 putative noncoding and exonic loci, respectively, to estimate the species tree for the major groups of gibbons. We used the Bayesian full-likelihood method bpp under the multispecies coalescent model, which naturally accommodates incomplete lineage sorting and uncertainties in the gene trees. For comparison, we included three heuristic coalescent-based methods (mp-est, SVDQuartets, and astral) as well as concatenation. From both data sets, we infer the phylogeny for the four extant gibbon genera to be (Hylobates, (Nomascus, (Hoolock, Symphalangus))). We used simulation guided by the real data to evaluate the accuracy of the methods used. Astral, while not as efficient as bpp, performed well in estimation of the species tree even in presence of excessive incomplete lineage sorting. Concatenation, mp-est and SVDQuartets were unreliable when the species tree contains very short internal branches. Likelihood ratio test of gene flow suggests a small amount of migration from Hylobates moloch to H. pileatus, while cross-genera migration is absent or rare. Our results highlight the utility of coalescent-based methods in addressing challenging species tree problems characterized by short internal branches and rampant gene tree-species tree discordance. PMID:29087487

  2. Brute-Force Approach for Mass Spectrometry-Based Variant Peptide Identification in Proteogenomics without Personalized Genomic Data

    Science.gov (United States)

    Ivanov, Mark V.; Lobas, Anna A.; Levitsky, Lev I.; Moshkovskii, Sergei A.; Gorshkov, Mikhail V.

    2018-02-01

    In a proteogenomic approach based on tandem mass spectrometry analysis of proteolytic peptide mixtures, customized exome or RNA-seq databases are employed for identifying protein sequence variants. However, the problem of variant peptide identification without personalized genomic data is important for a variety of applications. Following the recent proposal by Chick et al. (Nat. Biotechnol. 33, 743-749, 2015) on the feasibility of such variant peptide search, we evaluated two available approaches based on the previously suggested "open" search and the "brute-force" strategy. To improve the efficiency of these approaches, we propose an algorithm for exclusion of false variant identifications from the search results involving analysis of modifications mimicking single amino acid substitutions. Also, we propose a de novo based scoring scheme for assessment of identified point mutations. In the scheme, the search engine analyzes y-type fragment ions in MS/MS spectra to confirm the location of the mutation in the variant peptide sequence.

  3. The first genetic map of a synthesized allohexaploid Brassica with A, B and C genomes based on simple sequence repeat markers.

    Science.gov (United States)

    Yang, S; Chen, S; Geng, X X; Yan, G; Li, Z Y; Meng, J L; Cowling, W A; Zhou, W J

    2016-04-01

    We present the first genetic map of an allohexaploid Brassica species, based on segregating microsatellite markers in a doubled haploid mapping population generated from a hybrid between two hexaploid parents. This study reports the first genetic map of trigenomic Brassica. A doubled haploid mapping population consisting of 189 lines was obtained via microspore culture from a hybrid H16-1 derived from a cross between two allohexaploid Brassica lines (7H170-1 and Y54-2). Simple sequence repeat primer pairs specific to the A genome (107), B genome (44) and C genome (109) were used to construct a genetic linkage map of the population. Twenty-seven linkage groups were resolved from 274 polymorphic loci on the A genome (109), B genome (49) and C genome (116) covering a total genetic distance of 3178.8 cM with an average distance between markers of 11.60 cM. This is the first genetic framework map for the artificially synthesized Brassica allohexaploids. The linkage groups represent the expected complement of chromosomes in the A, B and C genomes from the original diploid and tetraploid parents. This framework linkage map will be valuable for QTL analysis and future genetic improvement of a new allohexaploid Brassica species, and in improving our understanding of the genetic control of meiosis in new polyploids.

  4. Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

    Science.gov (United States)

    Raghav, Sunil Kumar; Deplancke, Bart

    2012-01-01

    Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.

  5. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    Science.gov (United States)

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20 year-period from eight countries. MLST resolved 17 sequence types (Simpsons index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found a low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies.

    Science.gov (United States)

    Yung, Ling Sing; Yang, Can; Wan, Xiang; Yu, Weichuan

    2011-05-01

    Collecting millions of genetic variations is feasible with the advanced genotyping technology. With a huge amount of genetic variations data in hand, developing efficient algorithms to carry out the gene-gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene-gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphic processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene-gene interactions. We implement the BOOST method based on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with Nvidia GeForce GTX 285 display card. GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.

  7. [Improvement of thermal adaptability and fermentation of industrial ethanologenic yeast by genomic DNA mutagenesis-based genetic recombination].

    Science.gov (United States)

    Liu, Xiuying; He, Xiuping; Lu, Ying; Zhang, Borun

    2011-07-01

    Ethanol is an attractive alternative to fossil fuels. Saccharomyces cerevisiae is the most important ethanol producer. However, in the process of industrial production of ethanol, both cell growth and fermentation of ethanologenic S. cerevisiae are dramatically affected by environmental stresses, such as thermal stress. In this study, we improved both the thermotolerance and fermentation performance of industrial ethanologenic S. cerevisiae by combined usage of chemical mutagenesis and genomic DNA mutagenesis-based genetic recombination method. The recombinant S. cerevisiae strain T44-2 could grow at 44 degrees C, 3 degrees C higher than that of the original strain CE6. The survival rate of T44-2 was 1.84 and 1.87-fold of that of CE6 when heat shock at 48 degrees C and 52 degrees C for 1 h respectively. At temperature higher than 37 degrees C, recombinant strain T44-2 always gave higher cell growth and ethanol production than those of strain CE6. Meanwhile, from 30 degrees C to 40 degrees C, recombinant strain T44-2 produces 91.2-83.8 g/L of ethanol from 200 g/L of glucose, which indicated that the recombinant strain T44-2 had both thermotolerance and broad thermal adaptability. The work offers a novel method, called genomic DNA mutagenesis-based genetic recombination, to improve the physiological functions of S. cerevisiae.

  8. Genome-based nutrition: An intervention strategy for the prevention and treatment of obesity and nonalcoholic steatohepatitis

    Science.gov (United States)

    Roman, Sonia; Ojeda-Granados, Claudia; Ramos-Lopez, Omar; Panduro, Arturo

    2015-01-01

    Obesity and nonalcoholic steatohepatitis are increasing in westernized countries, regardless of their geographic location. In Latin America, most countries, including Mexico, have a heterogeneous admixture genome with Amerindian, European and African ancestries. However, certain high allelic frequencies of several nutrient-related polymorphisms may have been achieved by past gene-nutrient interactions. Such interactions may have promoted the positive selection of variants adapted to regional food sources. At present, the unbalanced diet composition of the Mexicans has led the country to a 70% prevalence rate of overweightness and obesity due to substantial changes in food habits, among other factors. International guidelines and intervention strategies may not be adequate for all populations worldwide because they do not consider disparities in genetic and environmental factors, and thus there is a need for differential prevention and management strategies. Here, we provide the rationale for an intervention strategy for the prevention and management of obesity-related diseases such as non-alcoholic steatohepatitis based on a regionalized genome-based diet. The components required to design such a diet should focus on the specific ancestry of each population around the world and the convenience of consuming traditional ethnic food. PMID:25834309

  9. A gene-based linkage map for Bicyclus anynana butterflies allows for a comprehensive analysis of synteny with the lepidopteran reference genome.

    Directory of Open Access Journals (Sweden)

    Patrícia Beldade

    2009-02-01

    Full Text Available Lepidopterans (butterflies and moths are a rich and diverse order of insects, which, despite their economic impact and unusual biological properties, are relatively underrepresented in terms of genomic resources. The genome of the silkworm Bombyx mori has been fully sequenced, but comparative lepidopteran genomics has been hampered by the scarcity of information for other species. This is especially striking for butterflies, even though they have diverse and derived phenotypes (such as color vision and wing color patterns and are considered prime models for the evolutionary and developmental analysis of ecologically relevant, complex traits. We focus on Bicyclus anynana butterflies, a laboratory system for studying the diversification of novelties and serially repeated traits. With a panel of 12 small families and a biphasic mapping approach, we first assigned 508 expressed genes to segregation groups and then ordered 297 of them within individual linkage groups. We also coarsely mapped seven color pattern loci. This is the richest gene-based map available for any butterfly species and allowed for a broad-coverage analysis of synteny with the lepidopteran reference genome. Based on 462 pairs of mapped orthologous markers in Bi. anynana and Bo. mori, we observed strong conservation of gene assignment to chromosomes, but also evidence for numerous large- and small-scale chromosomal rearrangements. With gene collections growing for a variety of target organisms, the ability to place those genes in their proper genomic context is paramount. Methods to map expressed genes and to compare maps with relevant model systems are crucial to extend genomic-level analysis outside classical model species. Maps with gene-based markers are useful for comparative genomics and to resolve mapped genomic regions to a tractable number of candidate genes, especially if there is synteny with related model species. This is discussed in relation to the identification of

  10. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

    Science.gov (United States)

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik; Pedersen, Anders Gorm; Aarestrup, Frank Møller; Lund, Ole

    2017-01-05

    Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different online available methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leafs, even though some of them are in fact ancestral. We therefore devised a method for post processing the inferred trees by collapsing short branches (thus relocating some leafs to internal nodes), and also present two new measures of tree similarity that takes into account the identity of both internal and leaf nodes. Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the

  11. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Science.gov (United States)

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  12. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Energy Technology Data Exchange (ETDEWEB)

    Reddy, Tatiparthi B. K. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Thomas, Alex D. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Stamatis, Dimitri [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Bertsch, Jon [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Isbandi, Michelle [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Jansson, Jakob [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Mallajosyula, Jyothi [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Pagani, Ioanna [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lobos, Elizabeth A. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); King Abdulaziz Univ., Jeddah (Saudi Arabia)

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  13. Cloud Based Resource for Data Hosting, Visualization and Analysis Using UCSC Cancer Genomics Browser | Informatics Technology for Cancer Research (ITCR)

    Science.gov (United States)

    The Cancer Analysis Virtual Machine (CAVM) project will leverage cloud technology, the UCSC Cancer Genomics Browser, and the Galaxy analysis workflow system to provide investigators with a flexible, scalable platform for hosting, visualizing and analyzing their own genomic data.

  14. Marine genomics

    DEFF Research Database (Denmark)

    Oliveira Ribeiro, Ângela Maria; Foote, Andrew David; Kupczok, Anne

    2017-01-01

    Marine ecosystems occupy 71% of the surface of our planet, yet we know little about their diversity. Although the inventory of species is continually increasing, as registered by the Census of Marine Life program, only about 10% of the estimated two million marine species are known. This lag......-throughput sequencing approaches have been helping to improve our knowledge of marine biodiversity, from the rich microbial biota that forms the base of the tree of life to a wealth of plant and animal species. In this review, we present an overview of the applications of genomics to the study of marine life, from...

  15. Single-Nucleotide Variations in Cardiac Arrhythmias: Prospects for Genomics and Proteomics Based Biomarker Discovery and Diagnostics

    Directory of Open Access Journals (Sweden)

    Ayman Abunimer

    2014-03-01

    Full Text Available Cardiovascular diseases are a large contributor to causes of early death in developed countries. Some of these conditions, such as sudden cardiac death and atrial fibrillation, stem from arrhythmias—a spectrum of conditions with abnormal electrical activity in the heart. Genome-wide association studies can identify single nucleotide variations (SNVs that may predispose individuals to developing acquired forms of arrhythmias. Through manual curation of published genome-wide association studies, we have collected a comprehensive list of 75 SNVs associated with cardiac arrhythmias. Ten of the SNVs result in amino acid changes and can be used in proteomic-based detection methods. In an effort to identify additional non-synonymous mutations that affect the proteome, we analyzed the post-translational modification S-nitrosylation, which is known to affect cardiac arrhythmias. We identified loss of seven known S-nitrosylation sites due to non-synonymous single nucleotide variations (nsSNVs. For predicted nitrosylation sites we found 1429 proteins where the sites are modified due to nsSNV. Analysis of the predicted S-nitrosylation dataset for over- or under-representation (compared to the complete human proteome of pathways and functional elements shows significant statistical over-representation of the blood coagulation pathway. Gene Ontology (GO analysis displays statistically over-represented terms related to muscle contraction, receptor activity, motor activity, cystoskeleton components, and microtubule activity. Through the genomic and proteomic context of SNVs and S-nitrosylation sites presented in this study, researchers can look for variation that can predispose individuals to cardiac arrhythmias. Such attempts to elucidate mechanisms of arrhythmia thereby add yet another useful parameter in predicting susceptibility for cardiac diseases.

  16. Microarray-based comparative genomic profiling of reference strains and selected Canadian field isolates of Actinobacillus pleuropneumoniae

    Directory of Open Access Journals (Sweden)

    MacInnes Janet I

    2009-02-01

    Full Text Available Abstract Background Actinobacillus pleuropneumoniae, the causative agent of porcine pleuropneumonia, is a highly contagious respiratory pathogen that causes severe losses to the swine industry worldwide. Current commercially-available vaccines are of limited value because they do not induce cross-serovar immunity and do not prevent development of the carrier state. Microarray-based comparative genomic hybridizations (M-CGH were used to estimate whole genomic diversity of representative Actinobacillus pleuropneumoniae strains. Our goal was to identify conserved genes, especially those predicted to encode outer membrane proteins and lipoproteins because of their potential for the development of more effective vaccines. Results Using hierarchical clustering, our M-CGH results showed that the majority of the genes in the genome of the serovar 5 A. pleuropneumoniae L20 strain were conserved in the reference strains of all 15 serovars and in representative field isolates. Fifty-eight conserved genes predicted to encode for outer membrane proteins or lipoproteins were identified. As well, there were several clusters of diverged or absent genes including those associated with capsule biosynthesis, toxin production as well as genes typically associated with mobile elements. Conclusion Although A. pleuropneumoniae strains are essentially clonal, M-CGH analysis of the reference strains of the fifteen serovars and representative field isolates revealed several classes of genes that were divergent or absent. Not surprisingly, these included genes associated with capsule biosynthesis as the capsule is associated with sero-specificity. Several of the conserved genes were identified as candidates for vaccine development, and we conclude that M-CGH is a valuable tool for reverse vaccinology.

  17. Cyclone: java-based querying and computing with Pathway/Genome databases.

    Science.gov (United States)

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  18. Elephantid Genomes Reveal the Molecular Bases of Woolly Mammoth Adaptations to the Arctic

    Directory of Open Access Journals (Sweden)

    Vincent J. Lynch

    2015-07-01

    Full Text Available Woolly mammoths and living elephants are characterized by major phenotypic differences that have allowed them to live in very different environments. To identify the genetic changes that underlie the suite of woolly mammoth adaptations to extreme cold, we sequenced the nuclear genome from three Asian elephants and two woolly mammoths, and we identified and functionally annotated genetic changes unique to woolly mammoths. We found that genes with mammoth-specific amino acid changes are enriched in functions related to circadian biology, skin and hair development and physiology, lipid metabolism, adipose development and physiology, and temperature sensation. Finally, we resurrected and functionally tested the mammoth and ancestral elephant TRPV3 gene, which encodes a temperature-sensitive transient receptor potential (thermoTRP channel involved in thermal sensation and hair growth, and we show that a single mammoth-specific amino acid substitution in an otherwise highly conserved region of the TRPV3 channel strongly affects its temperature sensitivity.

  19. Prognostic impact of array-based genomic profiles in esophageal squamous cell cancer

    DEFF Research Database (Denmark)

    Carneiro, Ana; Isinger, Anna; Karlsson, Anna

    2008-01-01

    BACKGROUND: Esophageal squamous cell carcinoma (ESCC) is a genetically complex tumor type and a major cause of cancer related mortality. Although distinct genetic alterations have been linked to ESCC development and prognosis, the genetic alterations have not gained clinical applicability. We...... interdependent alterations and deranged pathways were identified and copy number changes were correlated to stage, differentiation and survival. RESULTS: Copy number alterations affected median 19% of the genome and included recurrent gains of chromosome regions 5p, 7p, 7q, 8q, 10q, 11q, 12p, 14q, 16p, 17p, 19p......p13.3 independently predicted poor survival in multivariate analysis. CONCLUSION: aCGH profiling verified genetic complexity in ESCC and herein identified imbalances of multiple central tumorigenic pathways. Distinct gains correlate with clinicopathological variables and independently predict...

  20. Genome-wide Association Study Implicates PARD3B-based AIDS Restriction

    Science.gov (United States)

    Nelson, George W.; Lautenberger, James A.; Chinn, Leslie; McIntosh, Carl; Johnson, Randall C.; Sezgin, Efe; Kessing, Bailey; Malasky, Michael; Hendrickson, Sher L.; Pontius, Joan; Tang, Minzhong; An, Ping; Winkler, Cheryl A.; Limou, Sophie; Le Clerc, Sigrid; Delaneau, Olivier; Zagury, Jean-François; Schuitemaker, Hanneke; van Manen, Daniëlle; Bream, Jay H.; Gomperts, Edward D.; Buchbinder, Susan; Goedert, James J.; Kirk, Gregory D.; O'Brien, Stephen J.

    2011-01-01

    Background. Host genetic variation influences human immunodeficiency virus (HIV) infection and progression to AIDS. Here we used clinically well-characterized subjects from 5 pretreatment HIV/AIDS cohorts for a genome-wide association study to identify gene associations with rate of AIDS progression. Methods.  European American HIV seroconverters (n = 755) were interrogated for single-nucleotide polymorphisms (SNPs) (n = 700,022) associated with progression to AIDS 1987 (Cox proportional hazards regression analysis, co-dominant model). Results.  Association with slower progression was observed for SNPs in the gene PARD3B. One of these, rs11884476, reached genome-wide significance (relative hazard = 0.3; P =3. 370 × 10−9) after statistical correction for 700,022 SNPs and contributes 4.52% of the overall variance in AIDS progression in this study. Nine of the top-ranked SNPs define a PARD3B haplotype that also displays significant association with progression to AIDS (hazard ratio, 0.3; P = 3.220 × 10−8). One of these SNPs, rs10185378, is a predicted exonic splicing enhancer; significant alteration in the expression profile of PARD3B splicing transcripts was observed in B cell lines with alternate rs10185378 genotypes. This SNP was typed in European cohorts of rapid progressors and was found to be protective for AIDS 1993 definition (odds ratio, 0.43, P = .025). Conclusions. These observations suggest a potential unsuspected pathway of host genetic influence on the dynamics of AIDS progression. PMID:21502085

  1. Genome-based identification of spliceosomal proteins in the silk moth Bombyx mori.

    Science.gov (United States)

    Somarelli, Jason A; Mesa, Annia; Fuller, Myron E; Torres, Jacqueline O; Rodriguez, Carol E; Ferrer, Christina M; Herrera, Rene J

    2010-12-01

    Pre-messenger RNA splicing is a highly conserved eukaryotic cellular function that takes place by way of a large, RNA-protein assembly known as the spliceosome. In the mammalian system, nearly 300 proteins associate with uridine-rich small nuclear (sn)RNAs to form this complex. Some of these splicing factors are ubiquitously present in the spliceosome, whereas others are involved only in the processing of specific transcripts. Several proteomics analyses have delineated the proteins of the spliceosome in several species. In this study, we mine multiple sequence data sets of the silk moth Bombyx mori in an attempt to identify the entire set of known spliceosomal proteins. Five data sets were utilized, including the 3X, 6X, and Build 2.0 genomic contigs as well as the expressed sequence tag and protein libraries. While homologs for 88% of vertebrate splicing factors were delineated in the Bombyx mori genome, there appear to be several spliceosomal polypeptides absent in Bombyx mori and seven additional insect species. This apparent increase in spliceosomal complexity in vertebrates may reflect the tissue-specific and developmental stage-specific alternative pre-mRNA splicing requirements in vertebrates. Phylogenetic analyses of 15 eukaryotic taxa using the core splicing factors suggest that the essential functional units of the pre-mRNA processing machinery have remained highly conserved from yeast to humans. The Sm and LSm proteins are the most conserved, whereas proteins of the U1 small nuclear ribonucleoprotein particle are the most divergent. These data highlight both the differential conservation and relative phylogenetic signals of the essential spliceosomal components throughout evolution. © 2010 Wiley Periodicals, Inc.

  2. Human genome I

    International Nuclear Information System (INIS)

    Anon.

    1989-01-01

    An international conference, Human Genome I, was held Oct. 2-4, 1989 in San Diego, Calif. Selected speakers discussed: Current Status of the Genome Project; Technique Innovations; Interesting regions; Applications; and Organization - Different Views of Current and Future Science and Procedures. Posters, consisting of 119 presentations, were displayed during the sessions. 119 were indexed for inclusion to the Energy Data Base

  3. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data.

    Directory of Open Access Journals (Sweden)

    Chihyun Park

    Full Text Available BACKGROUND: It is difficult to identify copy number variations (CNV in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs; CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR. CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.

  4. On the analysis of genome-wide association studies in family-based designs: a universal, robust analysis approach and an application to four genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Sungho Won

    2009-11-01

    Full Text Available For genome-wide association studies in family-based designs, we propose a new, universally applicable approach. The new test statistic exploits all available information about the association, while, by virtue of its design, it maintains the same robustness against population admixture as traditional family-based approaches that are based exclusively on the within-family information. The approach is suitable for the analysis of almost any trait type, e.g. binary, continuous, time-to-onset, multivariate, etc., and combinations of those. We use simulation studies to verify all theoretically derived properties of the approach, estimate its power, and compare it with other standard approaches. We illustrate the practical implications of the new analysis method by an application to a lung-function phenotype, forced expiratory volume in one second (FEV1 in 4 genome-wide association studies.

  5. Strategies for implementing genomic selection in family-based aquaculture breeding schemes: double haploid sib test populations

    Directory of Open Access Journals (Sweden)

    Nirea Kahsay G

    2012-10-01

    Full Text Available Abstract Background Simulation studies have shown that accuracy and genetic gain are increased in genomic selection schemes compared to traditional aquaculture sib-based schemes. In genomic selection, accuracy of selection can be maximized by increasing the precision of the estimation of SNP effects and by maximizing the relationships between test sibs and candidate sibs. Another means of increasing the accuracy of the estimation of SNP effects is to create individuals in the test population with extreme genotypes. The latter approach was studied here with creation of double haploids and use of non-random mating designs. Methods Six alternative breeding schemes were simulated in which the design of the test population was varied: test sibs inherited maternal (Mat, paternal (Pat or a mixture of maternal and paternal (MatPat double haploid genomes or test sibs were obtained by maximum coancestry mating (MaxC, minimum coancestry mating (MinC, or random (RAND mating. Three thousand test sibs and 3000 candidate sibs were genotyped. The test sibs were recorded for a trait that could not be measured on the candidates and were used to estimate SNP effects. Selection was done by truncation on genome-wide estimated breeding values and 100 individuals were selected as parents each generation, equally divided between both sexes. Results Results showed a 7 to 19% increase in selection accuracy and a 6 to 22% increase in genetic gain in the MatPat scheme compared to the RAND scheme. These increases were greater with lower heritabilities. Among all other scenarios, i.e. Mat, Pat, MaxC, and MinC, no substantial differences in selection accuracy and genetic gain were observed. Conclusions In conclusion, a test population designed with a mixture of paternal and maternal double haploids, i.e. the MatPat scheme, increases substantially the accuracy of selection and genetic gain. This will be particularly interesting for traits that cannot be recorded on the

  6. Systems-based analysis of the Sarcocystis neurona genome identifies pathways that contribute to a heteroxenous life cycle.

    Science.gov (United States)

    Blazejewski, Tomasz; Nursimulu, Nirvana; Pszenny, Viviana; Dangoudoubiyam, Sriveny; Namasivayam, Sivaranjani; Chiasson, Melissa A; Chessman, Kyle; Tonkin, Michelle; Swapna, Lakshmipuram S; Hung, Stacy S; Bridgers, Joshua; Ricklefs, Stacy M; Boulanger, Martin J; Dubey, Jitender P; Porcella, Stephen F; Kissinger, Jessica C; Howe, Daniel K; Grigg, Michael E; Parkinson, John

    2015-02-10

    Sarcocystis neurona is a member of the coccidia, a clade of single-celled parasites of medical and veterinary importance including Eimeria, Sarcocystis, Neospora, and Toxoplasma. Unlike Eimeria, a single-host enteric pathogen, Sarcocystis, Neospora, and Toxoplasma are two-host parasites that infect and produce infectious tissue cysts in a wide range of intermediate hosts. As a genus, Sarcocystis is one of the most successful protozoan parasites; all vertebrates, including birds, reptiles, fish, and mammals are hosts to at least one Sarcocystis species. Here we sequenced Sarcocystis neurona, the causal agent of fatal equine protozoal myeloencephalitis. The S. neurona genome is 127 Mbp, more than twice the size of other sequenced coccidian genomes. Comparative analyses identified conservation of the invasion machinery among the coccidia. However, many dense-granule and rhoptry kinase genes, responsible for altering host effector pathways in Toxoplasma and Neospora, are absent from S. neurona. Further, S. neurona has a divergent repertoire of SRS proteins, previously implicated in tissue cyst formation in Toxoplasma. Systems-based analyses identified a series of metabolic innovations, including the ability to exploit alternative sources of energy. Finally, we present an S. neurona model detailing conserved molecular innovations that promote the transition from a purely enteric lifestyle (Eimeria) to a heteroxenous parasite capable of infecting a wide range of intermediate hosts. Sarcocystis neurona is a member of the coccidia, a clade of single-celled apicomplexan parasites responsible for major economic and health care burdens worldwide. A cousin of Plasmodium, Cryptosporidium, Theileria, and Eimeria, Sarcocystis is one of the most successful parasite genera; it is capable of infecting all vertebrates (fish, reptiles, birds, and mammals-including humans). The past decade has witnessed an increasing number of human outbreaks of clinical significance associated with

  7. Furfural-tolerant Zymomonas mobilis derived from error-prone PCR-based whole genome shuffling and their tolerant mechanism.

    Science.gov (United States)

    Huang, Suzhen; Xue, Tingli; Wang, Zhiquan; Ma, Yuanyuan; He, Xueting; Hong, Jiefang; Zou, Shaolan; Song, Hao; Zhang, Minhua

    2018-04-01

    Furfural-tolerant strain is essential for the fermentative production of biofuels or chemicals from lignocellulosic biomass. In this study, Zymomonas mobilis CP4 was for the first time subjected to error-prone PCR-based whole genome shuffling, and the resulting mutants F211 and F27 that could tolerate 3 g/L furfural were obtained. The mutant F211 under various furfural stress conditions could rapidly grow when the furfural concentration reduced to 1 g/L. Meanwhile, the two mutants also showed higher tolerance to high concentration of glucose than the control strain CP4. Genome resequencing revealed that the F211 and F27 had 12 and 13 single-nucleotide polymorphisms. The activity assay demonstrated that the activity of NADH-dependent furfural reductase in mutant F211 and CP4 was all increased under furfural stress, and the activity peaked earlier in mutant than in control. Also, furfural level in the culture of F211 was also more rapidly decreased. These indicate that the increase in furfural tolerance of the mutants may be resulted from the enhanced NADH-dependent furfural reductase activity during early log phase, which could lead to an accelerated furfural detoxification process in mutants. In all, we obtained Z. mobilis mutants with enhanced furfural and high concentration of glucose tolerance, and provided valuable clues for the mechanism of furfural tolerance and strain development.

  8. A Comparative Genomic Study in Schizophrenic and in Bipolar Disorder Patients, Based on Microarray Expression Profiling Meta-Analysis

    Directory of Open Access Journals (Sweden)

    Marianthi Logotheti

    2013-01-01

    Full Text Available Schizophrenia affecting almost 1% and bipolar disorder affecting almost 3%–5% of the global population constitute two severe mental disorders. The catecholaminergic and the serotonergic pathways have been proved to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets.

  9. Early experience with formalin-fixed paraffin-embedded (FFPE) based commercial clinical genomic profiling of gliomas-robust and informative with caveats.

    Science.gov (United States)

    Movassaghi, Masoud; Shabihkhani, Maryam; Hojat, Seyed A; Williams, Ryan R; Chung, Lawrance K; Im, Kyuseok; Lucey, Gregory M; Wei, Bowen; Mareninov, Sergey; Wang, Michael W; Ng, Denise W; Tashjian, Randy S; Magaki, Shino; Perez-Rosendahl, Mari; Yang, Isaac; Khanlou, Negar; Vinters, Harry V; Liau, Linda M; Nghiemphu, Phioanh L; Lai, Albert; Cloughesy, Timothy F; Yong, William H

    2017-08-01

    Commercial targeted genomic profiling with next generation sequencing using formalin-fixed paraffin embedded (FFPE) tissue has recently entered into clinical use for diagnosis and for the guiding of therapy. However, there is limited independent data regarding the accuracy or robustness of commercial genomic profiling in gliomas. As part of patient care, FFPE samples of gliomas from 71 patients were submitted for targeted genomic profiling to one commonly used commercial vendor, Foundation Medicine. Genomic alterations were determined for the following grades or groups of gliomas; Grade I/II, Grade III, primary glioblastomas (GBMs), recurrent primary GBMs, and secondary GBMs. In addition, FFPE samples from the same patients were independently assessed with conventional methods such as immunohistochemistry (IHC), Quantitative real-time PCR (qRT-PCR), or Fluorescence in situ hybridization (FISH) for three genetic alterations: IDH1 mutations, EGFR amplification, and EGFRvIII expression. A total of 100 altered genes were detected by the aforementioned targeted genomic profiling assay. The number of different genomic alterations was significantly different between the five groups of gliomas and consistent with the literature. CDKN2A/B, TP53, and TERT were the most common genomic alterations seen in primary GBMs, whereas IDH1, TP53, and PIK3CA were the most common in secondary GBMs. Targeted genomic profiling demonstrated 92.3%-100% concordance with conventional methods. The targeted genomic profiling report provided an average of 5.5 drugs, and listed an average of 8.4 clinical trials for the 71 glioma patients studied but only a third of the trials were appropriate for glioma patients. In this limited comparison study, this commercial next generation sequencing based-targeted genomic profiling showed a high concordance rate with conventional methods for the 3 genetic alterations and identified mutations expected for the type of glioma. While it may not be feasible to

  10. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations

    Science.gov (United States)

    2012-01-01

    Background Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. Results We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension – UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent

  11. Acorn: A grid computing system for constraint based modeling and visualization of the genome scale metabolic reaction networks via a web interface

    Directory of Open Access Journals (Sweden)

    Bushell Michael E

    2011-05-01

    Full Text Available Abstract Background Constraint-based approaches facilitate the prediction of cellular metabolic capabilities, based, in turn on predictions of the repertoire of enzymes encoded in the genome. Recently, genome annotations have been used to reconstruct genome scale metabolic reaction networks for numerous species, including Homo sapiens, which allow simulations that provide valuable insights into topics, including predictions of gene essentiality of pathogens, interpretation of genetic polymorphism in metabolic disease syndromes and suggestions for novel approaches to microbial metabolic engineering. These constraint-based simulations are being integrated with the functional genomics portals, an activity that requires efficient implementation of the constraint-based simulations in the web-based environment. Results Here, we present Acorn, an open source (GNU GPL grid computing system for constraint-based simulations of genome scale metabolic reaction networks within an interactive web environment. The grid-based architecture allows efficient execution of computationally intensive, iterative protocols such as Flux Variability Analysis, which can be readily scaled up as the numbers of models (and users increase. The web interface uses AJAX, which facilitates efficient model browsing and other search functions, and intuitive implementation of appropriate simulation conditions. Research groups can install Acorn locally and create user accounts. Users can also import models in the familiar SBML format and link reaction formulas to major functional genomics portals of choice. Selected models and simulation results can be shared between different users and made publically available. Users can construct pathway map layouts and import them into the server using a desktop editor integrated within the system. Pathway maps are then used to visualise numerical results within the web environment. To illustrate these features we have deployed Acorn and created a

  12. Unveiling health attitudes and creating good-for-you foods: the genomics metaphor, consumer innovative web-based technologies.

    Science.gov (United States)

    Moskowitz, H R; German, J B; Saguy, I S

    2005-01-01

    This article presents an integrated analysis of three emerging knowledge bases in the nutrition and consumer products industries, and how they may effect the food industry. These knowledge bases produce new vistas for corporate product development, especially with respect to those foods that are positioned as 'good for you.' Couched within the current thinking of state-of-the-art knowledge and information, this article highlights how today's thinking about accelerated product development can be introduced into the food and health industries to complement these three research areas. The 3 knowledge bases are: the genomics revolution, which has opened new insights into understanding the interactions of personal needs of individual consumers with nutritionally relevant components of the foods; the investigation of food choice by scientific studies; the development of large scale databases (mega-studies) about the consumer mind. These knowledge bases, combined with new methods to understand the consumer through research, make possible a more focused development. The confluence of trends outlined in this article provides the corporation with the beginnings of a new path to a knowledge-based, principles-grounded product-development system. The approaches hold the potential to create foods based upon people's nutritional requirements combined with their individual preferences. Integrating these emerging knowledge areas with new consumer research techniques may well reshape how the food industry develops new products to satisfy consumer needs and wants.

  13. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  14. Extreme genomes

    OpenAIRE

    DeLong, Edward F

    2000-01-01

    The complete genome sequence of Thermoplasma acidophilum, an acid- and heat-loving archaeon, has recently been reported. Comparative genomic analysis of this 'extremophile' is providing new insights into the metabolic machinery, ecology and evolution of thermophilic archaea.

  15. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    Science.gov (United States)

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could

  16. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi

    2018-01-01

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  17. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  18. Grass genomes

    OpenAIRE

    Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya

    1998-01-01

    For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...

  19. [Preface for genome editing special issue].

    Science.gov (United States)

    Gu, Feng; Gao, Caixia

    2017-10-25

    Genome editing technology, as an innovative biotechnology, has been widely used for editing the genome from model organisms, animals, plants and microbes. CRISPR/Cas9-based genome editing technology shows its great value and potential in the dissection of functional genomics, improved breeding and genetic disease treatment. In the present special issue, the principle and application of genome editing techniques has been summarized. The advantages and disadvantages of the current genome editing technology and future prospects would also be highlighted.

  20. New sequence-based data on the relative DNA contents of chromosomes in the normal male and female human diploid genomes for radiation molecular cytogenetics

    Directory of Open Access Journals (Sweden)

    Repin Mikhail V

    2009-06-01

    Full Text Available Abstract Background The objective of this work is to obtain the correct relative DNA contents of chromosomes in the normal male and female human diploid genomes for the use at FISH analysis of radiation-induced chromosome aberrations. Results The relative DNA contents of chromosomes in the male and female human diploid genomes have been calculated from the publicly available international Human Genome Project data. New sequence-based data on the relative DNA contents of human chromosomes were compared with the data recommended by the International Atomic Energy Agency in 2001. The differences in the values of the relative DNA contents of chromosomes obtained by using different approaches for 15 human chromosomes, mainly for large chromosomes, were below 2%. For the chromosomes 13, 17, 20 and 22 the differences were above 5%. Conclusion New sequence-based data on the relative DNA contents of chromosomes in the normal male and female human diploid genomes were obtained. This approach, based on the genome sequence, can be recommended for the use in radiation molecular cytogenetics.

  1. Genome-wide SNP association-based localization of a dwarfism gene in Friesian dwarf horses.

    Science.gov (United States)

    Orr, N; Back, W; Gu, J; Leegwater, P; Govindarajan, P; Conroy, J; Ducro, B; Van Arendonk, J A M; MacHugh, D E; Ennis, S; Hill, E W; Brama, P A J

    2010-12-01

    The recent completion of the horse genome and commercial availability of an equine SNP genotyping array has facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of inheritance, to a 2-MB region of chromosome 14 using just 10 affected animals and 10 controls. We successfully genotyped 34,429 SNPs that were tested for association with dwarfism using chi-square tests. The most significant SNP in our study, BIEC2-239376 (P(2df)=4.54 × 10(-5), P(rec)=7.74 × 10(-6)), is located close to a gene implicated in human dwarfism. Fine-mapping and resequencing analyses did not aid in further localization of the causative variant, and replication of our findings in independent sample sets will be necessary to confirm these results. © 2010 The Authors, Journal compilation © 2010 Stichting International Foundation for Animal Genetics.

  2. An att site-based recombination reporter system for genome engineering and synthetic DNA assembly.

    Science.gov (United States)

    Bland, Michael J; Ducos-Galand, Magaly; Val, Marie-Eve; Mazel, Didier

    2017-07-14

    Direct manipulation of the genome is a widespread technique for genetic studies and synthetic biology applications. The tyrosine and serine site-specific recombination systems of bacteriophages HK022 and ΦC31 are widely used for stable directional exchange and relocation of DNA sequences, making them valuable tools in these contexts. We have developed site-specific recombination tools that allow the direct selection of recombination events by embedding the attB site from each system within the β-lactamase resistance coding sequence (bla). The HK and ΦC31 tools were developed by placing the attB sites from each system into the signal peptide cleavage site coding sequence of bla. All possible open reading frames (ORFs) were inserted and tested for recombination efficiency and bla activity. Efficient recombination was observed for all tested ORFs (3 for HK, 6 for ΦC31) as shown through a cointegrate formation assay. The bla gene with the embedded attB site was functional for eight of the nine constructs tested. The HK/ΦC31 att-bla system offers a simple way to directly select recombination events, thus enhancing the use of site-specific recombination systems for carrying out precise, large-scale DNA manipulation, and adding useful tools to the genetics toolbox. We further show the power and flexibility of bla to be used as a reporter for recombination.

  3. A new method to cluster genomes based on cumulative Fourier power spectrum.

    Science.gov (United States)

    Dong, Rui; Zhu, Ziyue; Yin, Changchuan; He, Rong L; Yau, Stephen S-T

    2018-06-20

    Analyzing phylogenetic relationships using mathematical methods has always been of importance in bioinformatics. Quantitative research may interpret the raw biological data in a precise way. Multiple Sequence Alignment (MSA) is used frequently to analyze biological evolutions, but is very time-consuming. When the scale of data is large, alignment methods cannot finish calculation in reasonable time. Therefore, we present a new method using moments of cumulative Fourier power spectrum in clustering the DNA sequences. Each sequence is translated into a vector in Euclidean space. Distances between the vectors can reflect the relationships between sequences. The mapping between the spectra and moment vector is one-to-one, which means that no information is lost in the power spectra during the calculation. We cluster and classify several datasets including Influenza A, primates, and human rhinovirus (HRV) datasets to build up the phylogenetic trees. Results show that the new proposed cumulative Fourier power spectrum is much faster and more accurately than MSA and another alignment-free method known as k-mer. The research provides us new insights in the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs of the cumulative Fourier power spectrum are available at GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum). Copyright © 2018. Published by Elsevier B.V.

  4. Population diversity of Diaphorina citri (Hemiptera: Liviidae) in China based on whole mitochondrial genome sequences.

    Science.gov (United States)

    Wu, Fengnian; Jiang, Hongyan; Beattie, G Andrew C; Holford, Paul; Chen, Jianchi; Wallis, Christopher M; Zheng, Zheng; Deng, Xiaoling; Cen, Yijing

    2018-04-24

    Diaphorina citri (Asian citrus psyllid; ACP) transmits 'Candidatus Liberibacter asiaticus' associated with citrus Huanglongbing (HLB). ACP has been reported in 11 provinces/regions in China, yet its population diversity remains unclear. In this study, we evaluated ACP population diversity in China using representative whole mitochondrial genome (mitogenome) sequences. Additional mitogenome sequences outside China were also acquired and evaluated. The sizes of the 27 ACP mitogenome sequences ranged from 14 986 to 15 030 bp. Along with three previously published mitogenome sequences, the 30 sequences formed three major mitochondrial groups (MGs): MG1, present in southwestern China and occurring at elevations above 1000 m; MG2, present in southeastern China and Southeast Asia (Cambodia, Indonesia, Malaysia, and Vietnam) and occurring at elevations below 180 m; and MG3, present in the USA and Pakistan. Single nucleotide polymorphisms in five genes (cox2, atp8, nad3, nad1 and rrnL) contributed mostly in the ACP diversity. Among these genes, rrnL had the most variation. Mitogenome sequences analyses revealed two major phylogenetic groups of ACP present in China as well as a possible unique group present currently in Pakistan and the USA. The information could have significant implications for current ACP control and HLB management. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.

  5. Cancer genomics

    DEFF Research Database (Denmark)

    Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner

    2007-01-01

    Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...

  6. Loss of BRCA1 or BRCA2 markedly increases the rate of base substitution mutagenesis and has distinct effects on genomic deletions

    DEFF Research Database (Denmark)

    Zamborszky, J.; Szikriszt, B.; Gervai, J. Z.

    2017-01-01

    -genome sequencing of multiple isogenic chicken DT40 cell clones to precisely determine the consequences of BRCA1/2 loss on all types of genomic mutagenesis. Spontaneous base substitution mutation rates increased sevenfold upon the disruption of either BRCA1 or BRCA2, and the arising mutation spectra showed strong...... of stalled replication forks as the cause of increased mutagenesis. The high rate of base substitution mutagenesis demonstrated by our experiments is likely to significantly contribute to the oncogenic effect of the inactivation of BRCA1 or BRCA2....

  7. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome

    NARCIS (Netherlands)

    Sharp, Andrew J.; Hansen, Sierra; Selzer, Rebecca R.; Cheng, Ze; Regan, Regina; Hurst, Jane A.; Stewart, Helen; Price, Sue M.; Blair, Edward; Hennekam, Raoul C.; Fitzpatrick, Carrie A.; Segraves, Rick; Richmond, Todd A.; Guiver, Cheryl; Albertson, Donna G.; Pinkel, Daniel; Eis, Peggy S.; Schwartz, Stuart; Knight, Samantha J. L.; Eichler, Evan E.

    2006-01-01

    Genomic disorders are characterized by the presence of flanking segmental duplications that predispose these regions to recurrent rearrangement. Based on the duplication architecture of the genome, we investigated 130 regions that we hypothesized as candidates for previously undescribed genomic

  8. Accelerating Genome Editing in CHO Cells Using CRISPR Cas9 and CRISPy, a Web-Based Target Finding Tool

    DEFF Research Database (Denmark)

    Ronda, Carlotta; Pedersen, Lasse Ebdrup; Hansen, Henning Gram

    2014-01-01

    of the CRISPR Cas9 technology in CHO cells by generating site-specific gene disruptions in COSMC and FUT8, both of which encode proteins involved in glycosylation. The tested single guide RNAs (sgRNAs) created an indel frequency up to 47.3% in COSMC, while an indel frequency up to 99.7% in FUT8 was achieved...... mutations at the target sites, with a strong preference for single base indels. Finally, we have developed a user-friendly bioinformatics tool, named “CRISPy” for rapid identification of sgRNA target sequences in the CHO-K1 genome. The CRISPy tool identified 1,970,449 CRISPR targets divided into 27...

  9. phiGENOME: an integrative navigation throughout bacteriophage genomes.

    Science.gov (United States)

    Stano, Matej; Klucar, Lubos

    2011-11-01

    phiGENOME is a web-based genome browser generating dynamic and interactive graphical representation of phage genomes stored in the phiSITE, database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and it was optimised for visualisation of phage genomes with the emphasis on the gene regulatory elements. phiGENOME consists of three components: (i) genome map viewer built using Adobe Flash technology, providing dynamic and interactive graphical display of phage genomes; (ii) sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features on the sequence level and (iii) regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulations. Bringing 542 complete genome sequences accompanied with their rich annotations and references, makes phiGENOME a unique information resource in the field of phage genomics. Copyright © 2011 Elsevier Inc. All rights reserved.

  10. Genomic sequencing in clinical trials

    OpenAIRE

    Mestan, Karen K; Ilkhanoff, Leonard; Mouli, Samdeep; Lin, Simon

    2011-01-01

    Abstract Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to fin...

  11. Genome-Scale, Constraint-Based Modeling of Nitrogen Oxide Fluxes during Coculture of Nitrosomonas europaea and Nitrobacter winogradskyi

    Science.gov (United States)

    Giguere, Andrew T.; Murthy, Ganti S.; Bottomley, Peter J.; Sayavedra-Soto, Luis A.

    2018-01-01

    ABSTRACT Nitrification, the aerobic oxidation of ammonia to nitrate via nitrite, emits nitrogen (N) oxide gases (NO, NO2, and N2O), which are potentially hazardous compounds that contribute to global warming. To better understand the dynamics of nitrification-derived N oxide production, we conducted culturing experiments and used an integrative genome-scale, constraint-based approach to model N oxide gas sources and sinks during complete nitrification in an aerobic coculture of two model nitrifying bacteria, the ammonia-oxidizing bacterium Nitrosomonas europaea and the nitrite-oxidizing bacterium Nitrobacter winogradskyi. The model includes biotic genome-scale metabolic models (iFC578 and iFC579) for each nitrifier and abiotic N oxide reactions. Modeling suggested both biotic and abiotic reactions are important sources and sinks of N oxides, particularly under microaerobic conditions predicted to occur in coculture. In particular, integrative modeling suggested that previous models might have underestimated gross NO production during nitrification due to not taking into account its rapid oxidation in both aqueous and gas phases. The integrative model may be found at https://github.com/chaplenf/microBiome-v2.1. IMPORTANCE Modern agriculture is sustained by application of inorganic nitrogen (N) fertilizer in the form of ammonium (NH4+). Up to 60% of NH4+-based fertilizer can be lost through leaching of nitrifier-derived nitrate (NO3−), and through the emission of N oxide gases (i.e., nitric oxide [NO], N dioxide [NO2], and nitrous oxide [N2O] gases), the latter being a potent greenhouse gas. Our approach to modeling of nitrification suggests that both biotic and abiotic mechanisms function as important sources and sinks of N oxides during microaerobic conditions and that previous models might have underestimated gross NO production during nitrification. PMID:29577088

  12. Genome-Scale, Constraint-Based Modeling of Nitrogen Oxide Fluxes during Coculture of Nitrosomonas europaea and Nitrobacter winogradskyi.

    Science.gov (United States)

    Mellbye, Brett L; Giguere, Andrew T; Murthy, Ganti S; Bottomley, Peter J; Sayavedra-Soto, Luis A; Chaplen, Frank W R

    2018-01-01

    Nitrification, the aerobic oxidation of ammonia to nitrate via nitrite, emits nitrogen (N) oxide gases (NO, NO 2 , and N 2 O), which are potentially hazardous compounds that contribute to global warming. To better understand the dynamics of nitrification-derived N oxide production, we conducted culturing experiments and used an integrative genome-scale, constraint-based approach to model N oxide gas sources and sinks during complete nitrification in an aerobic coculture of two model nitrifying bacteria, the ammonia-oxidizing bacterium Nitrosomonas europaea and the nitrite-oxidizing bacterium Nitrobacter winogradskyi . The model includes biotic genome-scale metabolic models (iFC578 and iFC579) for each nitrifier and abiotic N oxide reactions. Modeling suggested both biotic and abiotic reactions are important sources and sinks of N oxides, particularly under microaerobic conditions predicted to occur in coculture. In particular, integrative modeling suggested that previous models might have underestimated gross NO production during nitrification due to not taking into account its rapid oxidation in both aqueous and gas phases. The integrative model may be found at https://github.com/chaplenf/microBiome-v2.1. IMPORTANCE Modern agriculture is sustained by application of inorganic nitrogen (N) fertilizer in the form of ammonium (NH 4 + ). Up to 60% of NH 4 + -based fertilizer can be lost through leaching of nitrifier-derived nitrate (NO 3 - ), and through the emission of N oxide gases (i.e., nitric oxide [NO], N dioxide [NO 2 ], and nitrous oxide [N 2 O] gases), the latter being a potent greenhouse gas. Our approach to modeling of nitrification suggests that both biotic and abiotic mechanisms function as important sources and sinks of N oxides during microaerobic conditions and that previous models might have underestimated gross NO production during nitrification.

  13. Genome-based analysis of Carbapenemase-producing Klebsiella pneumoniae isolates from German hospital patients, 2008-2014.

    Science.gov (United States)

    Becker, Laura; Kaase, Martin; Pfeifer, Yvonne; Fuchs, Stephan; Reuss, Annicka; von Laer, Anja; Sin, Muna Abu; Korte-Berwanger, Miriam; Gatermann, Sören; Werner, Guido

    2018-01-01

    By using whole genome sequence data we aimed at describing a population snapshot of carbapenemase-producing K. pneumoniae isolated from hospitalized patients in Germany between 2008 and 2014. We selected a representative subset of 107 carbapenemase-producing K. pneumoniae clinical isolates possessing the four most prevalent carbapenemase types in Germany (KPC-2, KPC-3, OXA-48, NDM-1). Isolates were processed via illumina NGS. Data were analysed using different SNP-based mapping and de-novo assembly approaches. Relevant information was extracted from NGS data (antibiotic resistance determinants, wzi gene/ cps type, virulence genes). NGS data from the present study were also compared with 238 genome data from two previous international studies on K. pneumoniae. NGS-based analyses revealed a preferred prevalence of KPC-2-producing ST258 and KPC-3-producing ST512 isolates. OXA-48, being the most prevalent carbapenemase type in Germany, was associated with various K. pneumoniae strain types; most of them possessing IncL/M plasmid replicons suggesting a preferred dissemination of bla OXA-48 via this well-known plasmid type. Clusters ST15, ST147, ST258, and ST512 demonstrated an intermingled subset structure consisting of German and other European K. pneumoniae isolates. ST23 being the most frequent MLST type in Asia was found only once in Germany. This latter isolate contained an almost complete set of virulence genes and a K1 capsule suggesting occurrence of a hypervirulent ST23 strain producing OXA-48 in Germany. Our study results suggest prevalence of "classical" K. pneumonaie strain types associated with widely distributed carbapenemase genes such as ST258/KPC-2 or ST512/KPC-3 also in Germany. The finding of a supposed hypervirulent and OXA-48-producing ST23 K. pneumoniae isolates outside Asia is highly worrisome and requires intense molecular surveillance.

  14. Pathway-based analysis of a melanoma genome-wide association study: analysis of genes related to tumour-immunosuppression.

    Directory of Open Access Journals (Sweden)

    Nils Schoof

    Full Text Available Systemic immunosuppression is a risk factor for melanoma, and sunburn-induced immunosuppression is thought to be causal. Genes in immunosuppression pathways are therefore candidate melanoma-susceptibility genes. If variants within these genes individually have a small effect on disease risk, the association may be undetected in genome-wide association (GWA studies due to low power to reach a high significance level. Pathway-based approaches have been suggested as a method of incorporating a priori knowledge into the analysis of GWA studies. In this study, the association of 1113 single nucleotide polymorphisms (SNPs in 43 genes (39 genomic regions related to immunosuppression have been analysed using a gene-set approach in 1539 melanoma cases and 3917 controls from the GenoMEL consortium GWA study. The association between melanoma susceptibility and the whole set of tumour-immunosuppression genes, and also predefined functional subgroups of genes, was considered. The analysis was based on a measure formed by summing the evidence from the most significant SNP in each gene, and significance was evaluated empirically by case-control label permutation. An association was found between melanoma and the complete set of genes (p(emp=0.002, as well as the subgroups related to the generation of tolerogenic dendritic cells (p(emp=0.006 and secretion of suppressive factors (p(emp=0.0004, thus providing preliminary evidence of involvement of tumour-immunosuppression gene polymorphisms in melanoma susceptibility. The analysis was repeated on a second phase of the GenoMEL study, which showed no evidence of an association. As one of the first attempts to replicate a pathway-level association, our results suggest that low power and heterogeneity may present challenges.

  15. Systematic Identification and Assessment of Therapeutic Targets for Breast Cancer Based on Genome-Wide RNA Interference Transcriptomes

    Directory of Open Access Journals (Sweden)

    Yang Liu

    2017-02-01

    Full Text Available With accumulating public omics data, great efforts have been made to characterize the genetic heterogeneity of breast cancer. However, identifying novel targets and selecting the best from the sizeable lists of candidate targets is still a key challenge for targeted therapy, largely owing to the lack of economical, efficient and systematic discovery and assessment to prioritize potential therapeutic targets. Here, we describe an approach that combines the computational evaluation and objective, multifaceted assessment to systematically identify and prioritize targets for biological validation and therapeutic exploration. We first establish the reference gene expression profiles from breast cancer cell line MCF7 upon genome-wide RNA interference (RNAi of a total of 3689 genes, and the breast cancer query signatures using RNA-seq data generated from tissue samples of clinical breast cancer patients in the Cancer Genome Atlas (TCGA. Based on gene set enrichment analysis, we identified a set of 510 genes that when knocked down could significantly reverse the transcriptome of breast cancer state. We then perform multifaceted assessment to analyze the gene set to prioritize potential targets for gene therapy. We also propose drug repurposing opportunities and identify potentially druggable proteins that have been poorly explored with regard to the discovery of small-molecule modulators. Finally, we obtained a small list of candidate therapeutic targets for four major breast cancer subtypes, i.e., luminal A, luminal B, HER2+ and triple negative breast cancer. This RNAi transcriptome-based approach can be a helpful paradigm for relevant researches to identify and prioritize candidate targets for experimental validation.

  16. A SNP Based Linkage Map of the Arctic Charr (Salvelinus alpinus Genome Provides Insights into the Diploidization Process After Whole Genome Duplication

    Directory of Open Access Journals (Sweden)

    Cameron M. Nugent

    2017-02-01

    Full Text Available Diploidization, which follows whole genome duplication events, does not occur evenly across the genome. In salmonid fishes, certain pairs of homeologous chromosomes preserve tetraploid loci in higher frequencies toward the telomeres due to residual tetrasomic inheritance. Research suggests this occurs only in homeologous pairs where one chromosome arm has undergone a fusion event. We present a linkage map for Arctic charr (Salvelinus alpinus, a salmonid species with relatively fewer chromosome fusions. Genotype by sequencing identified 19,418 SNPs, and a linkage map consisting of 4508 markers was constructed from a subset of high quality SNPs and microsatellite markers that were used to anchor the new map to previous versions. Both male- and female-specific linkage maps contained the expected number of 39 linkage groups. The chromosome type associated with each linkage group was determined, and 10 stable metacentric chromosomes were identified, along with a chromosome polymorphism involving the sex chromosome AC04. Two instances of a weak form of pseudolinkage were detected in the telomeric regions of homeologous chromosome arms in both female and male linkage maps. Chromosome arm homologies within the Atlantic salmon (Salmo salar and rainbow trout (Oncorhynchus mykiss genomes were determined. Paralogous sequence variants (PSVs were identified, and their comparative BLASTn hit locations showed that duplicate markers exist in higher numbers on seven pairs of homeologous arms, previously identified as preserving tetrasomy in salmonid species. Homeologous arm pairs where neither arm has been part of a fusion event in Arctic charr had fewer PSVs, suggesting faster diploidization rates in these regions.

  17. Potential evolution of neurosurgical treatment paradigms for craniopharyngioma based on genomic and transcriptomic characteristics.

    Science.gov (United States)

    Robinson, Leslie C; Santagata, Sandro; Hankinson, Todd C

    2016-12-01

    The recent genomic and transcriptomic characterization of human craniopharyngiomas has provided important insights into the pathogenesis of these tumors and supports that these tumor types are distinct entities. Critically, the insights provided by these data offer the potential for the introduction of novel therapies and surgical treatment paradigms for these tumors, which are associated with high morbidity rates and morbid conditions. Mutations in the CTNNB1 gene are primary drivers of adamantinomatous craniopharyngioma (ACP) and lead to the accumulation of β-catenin protein in a subset of the nuclei within the neoplastic epithelium of these tumors. Dysregulation of epidermal growth factor receptor (EGFR) and of sonic hedgehog (SHH) signaling in ACP suggest that paracrine oncogenic mechanisms may underlie ACP growth and implicate these signaling pathways as potential targets for therapeutic intervention using directed therapies. Recent work shows that ACP cells have primary cilia, further supporting the potential importance of SHH signaling in the pathogenesis of these tumors. While further preclinical data are needed, directed therapies could defer, or replace, the need for radiation therapy and/or allow for less aggressive surgical interventions. Furthermore, the prospect for reliable control of cystic disease without the need for surgery now exists. Studies of papillary craniopharyngioma (PCP) are more clinically advanced than those for ACP. The vast majority of PCPs harbor the BRAF v600e mutation. There are now 2 reports of patients with PCP that had dramatic therapeutic responses to targeted agents. Ongoing clinical and research studies promise to not only advance our understanding of these challenging tumors but to offer new approaches for patient management.

  18. Phylogeny of caecilian amphibians (Gymnophiona) based on complete mitochondrial genomes and nuclear RAG1.

    Science.gov (United States)

    San Mauro, Diego; Gower, David J; Oommen, Oommen V; Wilkinson, Mark; Zardoya, Rafael

    2004-11-01

    We determined the complete nucleotide sequence of the mitochondrial (mt) genome of five individual caecilians (Amphibia: Gymnophiona) representing five of the six recognized families: Rhinatrema bivittatum (Rhinatrematidae), Ichthyophis glutinosus (Ichthyophiidae), Uraeotyphlus cf. oxyurus (Uraeotyphlidae), Scolecomorphus vittatus (Scolecomorphidae), and Gegeneophis ramaswamii (Caeciliidae). The organization and size of these newly determined mitogenomes are similar to those previously reported for the caecilian Typhlonectes natans (Typhlonectidae), and for other vertebrates. Nucleotide sequences of the nuclear RAG1 gene were also determined for these six species of caecilians, and the salamander Mertensiella luschani atifi. RAG1 (both at the amino acid and nucleotide level) shows slower rates of evolution than almost all mt protein-coding genes (at the amino acid level). The new mt and nuclear sequences were compared with data for other amphibians and subjected to separate and combined phylogenetic analyses (Maximum Parsimony, Minimum Evolution, Maximum Likelihood, and Bayesian Inference). All analyses strongly support the monophyly of the three amphibian Orders. The Batrachia hypothesis (Gymnophiona, (Anura, Caudata) receives moderate or good support depending on the method of analysis. Within Gymnophiona, the optimal tree (Rhinatrema, (Ichthyophis, Uraeotyphlus), (Scolecomorphus, (Gegeneophis Typhlonectes) agrees with the most recent morphological and molecular studies. The sister group relationship between Rhinatrematidae and all other caecilians, that between Ichthyophiidae and Uraeotyphlidae, and the monophyly of the higher caecilians Scolecomorphidae+Caeciliidae+Typhlonectidae, are strongly supported, whereas the relationships among the higher caecilians are less unambiguously resolved. Analysis of RAG1 is affected by a spurious local rooting problem and associated low support that is ameliorated when outgroups are excluded. Comparisons of trees using the

  19. Resource base influences genome-wide DNA methylation levels in wild baboons (Papio cynocephalus)

    Science.gov (United States)

    Lea, Amanda J.; Altmann, Jeanne; Alberts, Susan C.; Tung, Jenny

    2015-01-01

    Variation in resource availability commonly exerts strong effects on fitness-related traits in wild animals. However, we know little about the molecular mechanisms that mediate these effects, or about their persistence over time. To address these questions, we profiled genome-wide whole blood DNA methylation levels in two sets of wild baboons: (i) ‘wild-feeding’ baboons that foraged naturally in a savanna environment and (ii) ‘Lodge’ baboons that had ready access to spatially concentrated human food scraps, resulting in high feeding efficiency and low daily travel distances. We identified 1,014 sites (0.20% of sites tested) that were differentially methylated between wild-feeding and Lodge baboons, providing the first evidence that resource availability shapes the epigenome in a wild mammal. Differentially methylated sites tended to occur in contiguous stretches (i.e., in differentially methylated regions or DMRs), in promoters and enhancers, and near metabolism-related genes, supporting their functional importance in gene regulation. In agreement, reporter assay experiments confirmed that methylation at the largest identified DMR, located in the promoter of a key glycolysis-related gene, was sufficient to causally drive changes in gene expression. Intriguingly, all dispersing males carried a consistent epigenetic signature of their membership in a wild-feeding group, regardless of whether males dispersed into or out of this group as adults. Together, our findings support a role for DNA methylation in mediating ecological effects on phenotypic traits in the wild, and emphasize the dynamic environmental sensitivity of DNA methylation levels across the life course. PMID:26508127

  20. Genome-based identification and analysis of ionotropic receptors in Spodoptera litura

    Science.gov (United States)

    Zhu, Jia-Ying; Xu, Zhi-Wen; Zhang, Xin-Min; Liu, Nai-Yong

    2018-06-01

    The ability to sense and recognize various classes of compounds is of particular importance for survival and reproduction of insects. Ionotropic receptor (IR), a sub-family of the ionotropic glutamate receptor family, has been identified as one of crucial chemoreceptor super-families, which mediates the sensing of odors and/or tastants, and serves as non-chemosensory functions. Yet, little is known about IR characteristics, evolution, and functions in Lepidoptera. Here, we identify the IR gene repertoire from a destructive polyphagous pest, Spodoptera litura. The exhaustive analyses with genome and transcriptome data lead to the identification of 45 IR genes, comprising 17 antennal IRs (A-IRs), 8 Lepidoptera-specific IRs (LS-IRs), and 20 divergent IRs (D-IRs). Phylogenetic analysis reveals that S. litura A-IRs generally retain a strict single copy within each orthologous group, and two lineage expansions are observed in the D-IR sub-family including IR100d-h and 100i-o, likely attributed to gene duplications. Results of gene structure analysis classify the SlitIRs into four types: I (intronless), II (1-3 introns), III (5-9 introns), and IV (10-18 introns). Extensive expression profiles demonstrate that the majority of SlitIRs (28/43) are enriched in adult antennae, and some are detected in gustatory-associated tissues like proboscises and legs as well as non-chemosensory organs like abdomens and reproductive tissues of both sexes. These results indicate that SlitIRs have diverse functional roles in olfaction, taste, and reproduction. Together, our study has complemented the information on chemoreceptor genes in S. litura, and meanwhile allows for target experiments to identify potential IR candidates for the control of this pest.

  1. Genome-wide screening for genes whose deletions confer sensitivity to mutagenic purine base analogs in yeast

    Directory of Open Access Journals (Sweden)

    Kozmin Stanislav G

    2005-06-01

    Full Text Available Abstract Background N-hydroxylated base analogs, such as 6-hydroxylaminopurine (HAP and 2-amino-6-hydroxylaminopurine (AHA, are strong mutagens in various organisms due to their ambiguous base-pairing properties. The systems protecting cells from HAP and related noncanonical purines in Escherichia coli include specialized deoxyribonucleoside triphosphatase RdgB, DNA repair endonuclease V, and a molybdenum cofactor-dependent system. Fewer HAP-detoxification systems have been identified in yeast Saccharomyces cerevisiae and other eukaryotes. Cellular systems protecting from AHA are unknown. In the present study, we performed a genome-wide search for genes whose deletions confer sensitivity to HAP and AHA in yeast. Results We screened the library of yeast deletion mutants for sensitivity to the toxic and mutagenic action of HAP and AHA. We identified novel genes involved in the genetic control of base analogs sensitivity, including genes controlling purine metabolism, cytoskeleton organization, and amino acid metabolism. Conclusion We developed a method for screening the yeast deletion library for sensitivity to the mutagenic and toxic action of base analogs and identified 16 novel genes controlling pathways of protection from HAP. Three of them also protect from AHA.

  2. Traditional medicine and genomics

    Directory of Open Access Journals (Sweden)

    Kalpana Joshi

    2010-01-01

    Full Text Available ′Omics′ developments in the form of genomics, proteomics and metabolomics have increased the impetus of traditional medicine research. Studies exploring the genomic, proteomic and metabolomic basis of human constitutional types based on Ayurveda and other systems of oriental medicine are becoming popular. Such studies remain important to developing better understanding of human variations and individual differences. Countries like India, Korea, China and Japan are investing in research on evidence-based traditional medicines and scientific validation of fundamental principles. This review provides an account of studies addressing relationships between traditional medicine and genomics.

  3. Traditional medicine and genomics.

    Science.gov (United States)

    Joshi, Kalpana; Ghodke, Yogita; Shintre, Pooja

    2010-01-01

    'Omics' developments in the form of genomics, proteomics and metabolomics have increased the impetus of traditional medicine research. Studies exploring the genomic, proteomic and metabolomic basis of human constitutional types based on Ayurveda and other systems of oriental medicine are becoming popular. Such studies remain important to developing better understanding of human variations and individual differences. Countries like India, Korea, China and Japan are investing in research on evidence-based traditional medicines and scientific validation of fundamental principles. This review provides an account of studies addressing relationships between traditional medicine and genomics.

  4. Genomic signal processing