WorldWideScience

Sample records for genomic re-organization based

  1. Polymer models of chromosome (re)organization

    Science.gov (United States)

    Mirny, Leonid

    The chromosome conformation capture technique Hi-C provides comprehensive information about the frequencies of spatial interactions between genomic loci. Inferring the 3D organization of chromosomes from these data is a challenging biophysical problem. We develop a top-down approach to the biophysical modeling of chromosomes. Starting with a minimal set of biologically motivated interactions, we build ensembles of polymer conformations that can reproduce the major features observed in Hi-C experiments. I will present our work on modeling the organization of human metaphase and interphase chromosomes. Our work suggests that active processes of loop extrusion can be a universal mechanism responsible for the formation of domains in interphase and for chromosome compaction in metaphase.
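    The abstract names loop extrusion as the proposed mechanism. As a rough illustration only, and not the authors' polymer simulation, the one-dimensional sketch below assumes a single extruder loaded at one site whose two legs step outward, one site per step, until each reaches a chromosome end or a boundary site. The barrier positions, the step cap and the function name `extrude` are all hypothetical.

```python
def extrude(n_sites, load_site, barriers, max_steps):
    """Toy 1D loop extrusion (hypothetical sketch, not the published model).

    A single extruder is loaded at load_site; its left and right legs move
    outward one site per step. A leg stops at a chromosome end or at an
    assumed barrier site. max_steps caps the extruder's lifetime.
    Returns the (left, right) loop anchors, i.e. the two sites brought into contact.
    """
    left = right = load_site
    for _ in range(max_steps):
        moved = False
        if left > 0 and left not in barriers:
            left -= 1
            moved = True
        if right < n_sites - 1 and right not in barriers:
            right += 1
            moved = True
        if not moved:  # both legs blocked: the loop has stopped growing
            break
    return left, right


if __name__ == "__main__":
    # Loop grows until both legs hit the assumed barriers at 40 and 70.
    print(extrude(100, 50, {40, 70}, 1000))
```

    With barriers, loops accumulate between the same pair of sites regardless of exactly where the extruder loads, which is one intuitive way extrusion can produce domain-like contact patterns.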

  2. Reference Based Genome Compression

    CERN Document Server

    Chern, Bobbie; Manolakos, Alexandros; No, Albert; Venkat, Kartik; Weissman, Tsachy

    2012-01-01

    DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target genome, and then compresses this mapping with an entropy coder. As an illustration of the performance: applying our algorithm to James Watson's genome with hg18 as a reference, we are able to reduce the 2991 megabyte (MB) genome down to 6.99 MB, while Gzip compresses it to 834.8 MB.
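    The two-stage idea described above (map the target onto the reference, then compress the mapping) can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes Python's `difflib.SequenceMatcher` as the aligner, encodes the target as hypothetical `copy`/`insert` operations, and uses `zlib` as a stand-in for a dedicated entropy coder. `SequenceMatcher` scales poorly on long inputs, so this only suits short example strings.

```python
import difflib
import zlib


def map_to_reference(reference: str, target: str):
    """Encode target as ('copy', ref_start, length) and ('insert', bases) operations."""
    matcher = difflib.SequenceMatcher(None, reference, target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": reference bases absent from the target need no operation
    return ops


def reconstruct(reference: str, ops) -> str:
    """Rebuild the target from the reference and the operation list."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


def compressed_mapping_size(ops) -> int:
    """Serialize the operations and compress them (zlib stands in for an entropy coder)."""
    text = ";".join(",".join(map(str, op)) for op in ops)
    return len(zlib.compress(text.encode("ascii"), 9))


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA" * 20
    tgt = ref[:100] + "T" + ref[101:250] + "AAT" + ref[250:]
    ops = map_to_reference(ref, tgt)
    assert reconstruct(ref, ops) == tgt
    print(len(ops), "operations,", compressed_mapping_size(ops), "bytes compressed")
```

    Deletions are implied by which reference ranges are copied, so only matched ranges and novel bases need to be stored; the closer the reference, the smaller the mapping.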

  3. Reference Based Genome Compression

    OpenAIRE

    Chern, Bobbie; Ochoa, Idoia; Manolakos, Alexandros; No, Albert; Venkat, Kartik; Weissman, Tsachy

    2012-01-01

    DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target gen...

  4. BrucellaBase: Genome information resource.

    Science.gov (United States)

    Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2016-09-01

    Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform providing the features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. These tools will enable researchers interested in Brucella to obtain meaningful information from Brucella genome sequences. BrucellaBase will be updated regularly with new genome sequences and new features, along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html.

  5. Re-organizing Universities for the Information Age

    Directory of Open Access Journals (Sweden)

    David Annand

    2007-11-01

    University education is still generally conducted within pre-Industrial Age organizational structures. Because universities have been unable to evolve the predominant cohort-based classroom structure to meet, more cost-effectively, the aspirations of burgeoning worldwide populations for higher education, they may see substantial organizational changes imposed on them by external forces over the next decades. Emergent forms of university organizational structure that may affect this needed transformation are examined.

  6. Cortical Development, Plasticity and Re-Organization in Children with Cochlear Implants

    Science.gov (United States)

    Sharma, Anu; Nash, Amy A.; Dorman, Michael

    2009-01-01

    A basic tenet of developmental neurobiology is that certain areas of the cortex will re-organize, if appropriate stimulation is withheld for long periods. Stimulation must be delivered to a sensory system within a narrow window of time (a sensitive period) if that system is to develop normally. In this article, we will describe age cut-offs for a…

  7. Genome chaos: survival strategy during crisis.

    Science.gov (United States)

    Liu, Guo; Stevens, Joshua B; Horne, Steven D; Abdallah, Batoul Y; Ye, Karen J; Bremer, Steven W; Ye, Christine J; Chen, David J; Heng, Henry H

    2014-01-01

    Genome chaos, a process of complex, rapid genome re-organization, results in the formation of chaotic genomes, which is followed by the potential establishment of stable genomes. It was initially detected through cytogenetic analyses and was recently confirmed by whole-genome sequencing efforts that identified multiple subtypes, including "chromothripsis", "chromoplexy", "chromoanasynthesis", and "chromoanagenesis". Although genome chaos occurs commonly in tumors, both its mechanism and the detailed aspects of the process are unknown because its evolution cannot be observed over time in clinical samples. Here, an experimental system to monitor the evolutionary process of genome chaos was developed to elucidate its mechanisms. Genome chaos occurs following exposure to chemotherapeutics with different mechanisms, which act collectively as stressors. Characterization of the karyotype and its dynamic changes prior to, during, and after induction of genome chaos demonstrates that chromosome fragmentation (C-Frag) occurs just prior to chaotic genome formation. Chaotic genomes seem to form by random rejoining of chromosomal fragments, in part through non-homologous end joining (NHEJ). Stress-induced genome chaos results in increased karyotypic heterogeneity. This increased evolutionary potential is demonstrated by the identification of increased transcriptome dynamics associated with high levels of karyotypic variance. Rather than affecting a limited number of cancer genes, re-organized genomes lead to new system dynamics essential for cancer evolution. Genome chaos acts as a mechanism of rapid, adaptive, genome-based evolution that plays an essential role in promoting rapid macroevolution of new genome-defined systems during crisis, which may explain some unwanted consequences of cancer treatment.

  8. Genome-based Taxonomic Classification of Bacteroidetes

    Directory of Open Access Journals (Sweden)

    Richard L. Hahnke

    2016-12-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, lifestyles and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla to assess their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and lacks significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed to be heterogeneous and is dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values calculated directly from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected, whole-genome phylogenies are much better resolved.
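    Computing G+C content directly from a genome sequence, as the abstract recommends over values quoted in species descriptions, amounts to counting bases. A minimal sketch, assuming that ambiguous symbols such as `N` are excluded from the denominator and that a multi-contig assembly is pooled rather than averaged per contig; the function names are illustrative.

```python
def gc_content(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def genome_gc_content(contigs) -> float:
    """Pooled G+C content of an assembly: every base counts once, whatever its contig."""
    return gc_content("".join(contigs))


if __name__ == "__main__":
    print(round(gc_content("ATGCNNGC"), 2))
    print(round(genome_gc_content(["GGGG", "AT"]), 2))
```

    Pooling counts weights every base equally; averaging per-contig percentages instead would let short contigs skew the genome-level value.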

  9. Genome-Based Taxonomic Classification of Bacteroidetes.

    Science.gov (United States)

    Hahnke, Richard L; Meier-Kolthoff, Jan P; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N; Woyke, Tanja; Kyrpides, Nikos C; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, lifestyles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla to assess their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and lacks significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed to be heterogeneous and is dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values calculated directly from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected, whole-genome phylogenies are much better resolved.

  10. Visual Cross-Modal Re-Organization in Children with Cochlear Implants.

    Directory of Open Access Journals (Sweden)

    Julia Campbell

    Visual cross-modal re-organization is a neurophysiological process that occurs in deafness. The intact sensory modality of vision recruits cortical areas from the deprived sensory modality of audition. Such compensatory plasticity is documented in deaf adults and animals, and is related to deficits in speech perception performance in cochlear-implanted adults. However, it is unclear whether visual cross-modal re-organization takes place in cochlear-implanted children and whether it may be a source of variability contributing to speech and language outcomes. Thus, the aim of this study was to determine if visual cross-modal re-organization occurs in cochlear-implanted children, and whether it is related to deficits in speech perception performance. Visual evoked potentials (VEPs) were recorded via high-density EEG in 41 normal hearing children and 14 cochlear-implanted children, aged 5-15 years, in response to apparent motion and form change. Comparisons of VEP amplitude and latency, as well as source localization results, were conducted between the groups in order to view evidence of visual cross-modal re-organization. Finally, speech perception in background noise performance was correlated to the visual response in the implanted children. Distinct VEP morphological patterns were observed in both the normal hearing and cochlear-implanted children. However, the cochlear-implanted children demonstrated larger VEP amplitudes and earlier latency, concurrent with activation of right temporal cortex including auditory regions, suggestive of visual cross-modal re-organization. The VEP N1 latency was negatively related to speech perception in background noise for children with cochlear implants. Our results are among the first to describe cross modal re-organization of auditory cortex by the visual modality in deaf children fitted with cochlear implants. Our findings suggest that, as a group, children with cochlear implants show evidence of visual cross

  11. GO4genome: A Prokaryotic Phylogeny Based on Genome Organization

    OpenAIRE

    Merkl, Rainer; Wiezer, Arnim

    2009-01-01

    Determining the phylogeny of closely related prokaryotes may fail in an analysis of rRNA or a small set of sequences. Whole-genome phylogeny utilizes the maximally available sample space. For a precise determination of genome similarity, two aspects have to be considered when developing an algorithm of whole-genome phylogeny: (1) gene order conservation is a more precise signal than gene content; and (2) when using sequence similarity, failures in identifying orthologues or the in situ replac...
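    Point (1) above contrasts gene order with gene content. One simple way to quantify gene-order conservation, shown here as an illustrative measure rather than the GO4genome scoring scheme, is the fraction of neighboring gene pairs in one genome that are also neighbors in another. The sketch assumes orthologues have already been assigned shared identifiers, ignores gene orientation, and treats genomes as linear; all names are hypothetical.

```python
def adjacent_pairs(gene_order):
    """Unordered pairs of neighboring genes (orientation ignored, genome treated as linear)."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def gene_order_conservation(order_a, order_b) -> float:
    """Fraction of adjacencies in order_a that also occur in order_b."""
    pairs_a = adjacent_pairs(order_a)
    if not pairs_a:
        return 0.0
    return len(pairs_a & adjacent_pairs(order_b)) / len(pairs_a)


def gene_content_similarity(genes_a, genes_b) -> float:
    """Jaccard index of the two gene sets, for comparison with the order measure."""
    a, b = set(genes_a), set(genes_b)
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    a = ["g1", "g2", "g3", "g4", "g5"]
    b = ["g1", "g2", "g4", "g3", "g5"]  # g3 and g4 have swapped places
    print(gene_content_similarity(a, b), gene_order_conservation(a, b))
```

    In the usage case both genomes share the same genes, so content similarity is maximal (1.0), while the order measure drops to 0.5 after two adjacent genes swap places; this is the kind of extra signal that order-based comparison can capture.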

  12. Microstructure of transcallosal motor fibers reflects type of cortical (re-)organization in congenital hemiparesis.

    Science.gov (United States)

    Juenger, Hendrik; Koerte, Inga K; Muehlmann, Marc; Mayinger, Michael; Mall, Volker; Krägeloh-Mann, Ingeborg; Shenton, Martha E; Berweck, Steffen; Staudt, Martin; Heinen, Florian

    2014-11-01

    Early unilateral brain lesions can lead to different types of corticospinal (re-)organization of motor networks. In one group of patients, the contralesional hemisphere exerts motor control not only over the contralateral non-paretic hand but also over the (ipsilateral) paretic hand, as the primary motor cortex is (re-)organized in the contralesional hemisphere. Another group of patients with early unilateral lesions shows "normal" contralateral motor projections starting in the lesioned hemisphere. We investigated how these different patterns of cortical (re-)organization affect interhemispheric transcallosal connectivity in patients with congenital hemiparesis. Eight patients with ipsilateral motor projections (group IPSI) and 7 patients with contralateral motor projections (group CONTRA) underwent magnetic resonance diffusion tensor imaging (DTI). The corpus callosum (CC) was subdivided into 5 areas (I-V) in the mid-sagittal slice, and volumetric information was obtained for each area. The following diffusion parameters were calculated: fractional anisotropy (FA), trace, radial diffusivity (RD), and axial diffusivity (AD). DTI revealed significantly lower FA and increased trace and RD in group IPSI compared to group CONTRA in area III of the corpus callosum, where transcallosal motor fibers cross the CC. In the directly neighboring area IV, where transcallosal somatosensory fibers cross the CC, no differences in these DTI parameters were found between IPSI and CONTRA. Callosal subsection volumes differed significantly in area II (connecting premotor cortices) and area III, where group IPSI had lower volume. The results of this study demonstrate that the callosal microstructure in patients with congenital hemiparesis reflects the type of cortical (re-)organization. Early lesions disrupting corticospinal motor projections to the paretic hand consequently affect the development or maintenance of transcallosal motor fibers. Copyright © 2014 European Paediatric Neurology Society
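    The diffusion parameters listed above are standard functions of the three eigenvalues of the diffusion tensor. A minimal sketch using the usual definitions (FA from the dispersion of the eigenvalues, axial diffusivity as the largest eigenvalue, radial diffusivity as the mean of the other two, trace as their sum); it assumes the eigenvalues are already sorted in descending order and is not the processing pipeline used in the study.

```python
import math


def dti_metrics(l1: float, l2: float, l3: float) -> dict:
    """Scalar DTI measures from tensor eigenvalues, assuming l1 >= l2 >= l3."""
    trace = l1 + l2 + l3
    mean = trace / 3.0
    # FA = sqrt(3/2) * sqrt(sum((li - mean)^2)) / sqrt(sum(li^2))
    dispersion = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    magnitude = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(1.5 * dispersion / magnitude) if magnitude > 0 else 0.0
    return {
        "FA": fa,
        "trace": trace,
        "AD": l1,              # axial diffusivity: along the principal axis
        "RD": (l2 + l3) / 2.0,  # radial diffusivity: perpendicular to it
    }


if __name__ == "__main__":
    print(dti_metrics(1.0, 1.0, 1.0))  # isotropic diffusion
    print(dti_metrics(1.0, 0.0, 0.0))  # diffusion along a single axis
```

    FA runs from 0 for isotropic diffusion to 1 when diffusion is confined to one axis, so a drop in FA together with a rise in RD corresponds to relatively more diffusion perpendicular to the principal fiber direction.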

  13. Re-organization Impact on the Telekom Malaysia's International Division Productivity

    OpenAIRE

    Ahasanul Haque; Ali Khatibi; Khaizura Karim

    2005-01-01

    The International Division's productivity was perceived to be trending downward, contrary to the objective of Telekom Malaysia's 1996 re-organization. This study aims to analyze the root causes of this setback and recommend solutions to improve the company's productivity. The root causes were diagnosed through surveys and interviews. Data were collected through a questionnaire consisting of 85 questions. A total of 171 respondents from the internati...

  14. Experimental evidence of dynamic re-organization of evolving landscapes under changing climatic forcing

    Science.gov (United States)

    Singh, Arvind; Tejedor, Alejandro; Zaliapin, Ilya; Reinhardt, Liam; Foufoula-Georgiou, Efi

    2015-04-01

    The aim of this study is to better understand the dynamic re-organization of an evolving landscape under a scenario of changing climatic forcing, in order to improve our knowledge of geomorphic transport laws under transient conditions and to develop predictive models of landscape response to external perturbations. Because real landscape observations for long-term analysis are limited, a high-resolution controlled laboratory experiment was conducted at the St. Anthony Falls Laboratory at the University of Minnesota. Elevation data were collected at a temporal resolution of 5 min and a spatial resolution of 0.5 mm as the landscape approached steady state (constant uplift and precipitation rate) and in the transient state (under the same uplift and 5x precipitation). The results reveal rapid topographic re-organization under a five-fold precipitation increase, with the fluvial regime expanding into the previously debris-dominated regime, accelerated erosion at hillslope scales, and rivers shifting from an erosion-limited to a transport-limited regime. From a connectivity and clustering analysis of the erosional and depositional events, we demonstrate strikingly different spatial patterns of landscape evolution under steady state (SS) and transient state (TS), even when the time under SS is "stretched" relative to that under TS so as to match the total volume and PDF of erosional and depositional amounts. We quantify the spatial coupling of hillslopes and channels and demonstrate that hillslopes lead and channels follow in re-organizing the whole landscape under such an amplified precipitation regime.

  15. StellaBase: The Nematostella vectensis Genomics Database

    OpenAIRE

    Sullivan, James C; Ryan, Joseph F; Watson, James A.; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2005-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions...

  16. Genomics-based plant germplasm research (GPGR)

    Directory of Open Access Journals (Sweden)

    Jizeng Jia

    2017-04-01

    Plant germplasm underpins much of crop genetic improvement. Millions of germplasm accessions have been collected and conserved ex situ and/or in situ, and the major challenge now is how to exploit and utilize this abundant resource. Genomics-based plant germplasm research (GPGR or “Genoplasmics”) is a novel cross-disciplinary research field that seeks to apply the principles and techniques of genomics to germplasm research. In this paper we describe the concept, strategy, and approach behind GPGR, and summarize current progress in the definition and construction of core collections, enhancement of germplasm with core collections, and gene discovery from core collections. GPGR is opening a new era in germplasm research. The future contributions, progress and achievements of GPGR are also predicted.

  17. Transcriptional regulation by histone modifications: towards a theory of chromatin re-organization during stem cell differentiation.

    Science.gov (United States)

    Binder, Hans; Steiner, Lydia; Przybilla, Jens; Rohlf, Thimo; Prohaska, Sonja; Galle, Jörg

    2013-04-01

    Chromatin-related mechanisms, such as histone modifications, are known to be involved in regulatory switches within the transcriptome. Only recently have mathematical models of these mechanisms been established, and so far they have not been applied to genome-wide data. We here introduce a mathematical model of transcriptional regulation by histone modifications and apply it to data on trimethylation of histone 3 at lysine 4 (H3K4me3) and lysine 27 (H3K27me3) in mouse pluripotent and lineage-committed cells. The model describes the binding to chromatin of protein complexes that are capable of reading and writing histone marks. Molecular interactions of the complexes with DNA and modified histones create a regulatory switch of transcriptional activity. The regulatory states of the switch depend on the activity of histone (de-)methylases, the strength of complex-DNA binding and the number of nucleosomes capable of cooperatively contributing to complex binding. Our model explains experimentally measured length distributions of modified chromatin regions. It suggests (i) that high CpG density facilitates recruitment of the modifying complexes in embryonic stem cells and (ii) that re-organization of extended chromatin regions during lineage specification into neuronal progenitor cells requires targeted de-modification. Our approach represents a basic step towards multi-scale models of transcriptional control during development and lineage specification.

  18. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Bansal Manju

    2011-07-01

    Background: As more and more genomes are sequenced, obtaining an overview of their genomic features and annotating their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of the DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of genome GC-content, this relative-stability-based promoter prediction method has already proven robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase by specifying either the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and that of the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome, and various DNA sequence-dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database that contains predicted promoter regions and a detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics studies and help experimentalists rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible to academic and non-academic users via the World Wide Web at http://nucleix.mbu.iisc.ernet.in/prombase/.
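    The reliability classes rest on comparing the average free energy of a candidate region with that of the region downstream. The sketch below illustrates such a relative-stability calculation under explicit assumptions: it uses the unified nearest-neighbor ΔG°37 values of SantaLucia (1998), ignores initiation terms, accepts only A/C/G/T, and its window and region sizes are hypothetical parameters rather than PromBase's published settings.

```python
# Assumed parameters: unified nearest-neighbor ΔG°37 (kcal/mol), keyed by the
# 5'->3' dinucleotide on one strand (e.g. "CA" and "TG" are the same stack).
NN_DG = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def window_free_energy(seq: str, start: int, length: int) -> float:
    """Sum of nearest-neighbor ΔG over seq[start:start+length] (A/C/G/T only)."""
    window = seq[start:start + length].upper()
    return sum(NN_DG[window[i:i + 2]] for i in range(len(window) - 1))


def free_energy_profile(seq: str, window: int = 15):
    """ΔG of every sliding window of `window` bp, one value per start position."""
    return [window_free_energy(seq, i, window) for i in range(len(seq) - window + 1)]


def stability_difference(profile, region_start, region_len, downstream_len) -> float:
    """Mean ΔG of a candidate region minus mean ΔG of the region just downstream.

    Positive values mean the candidate is less stable (less negative ΔG).
    Both slices must be non-empty, i.e. lie within the profile.
    """
    region = profile[region_start:region_start + region_len]
    downstream = profile[region_start + region_len:region_start + region_len + downstream_len]
    return sum(region) / len(region) - sum(downstream) / len(downstream)


if __name__ == "__main__":
    seq = "AT" * 50 + "GC" * 100  # AT-rich stretch followed by a GC-rich stretch
    profile = free_energy_profile(seq)
    print(round(stability_difference(profile, 0, 50, 100), 2))
```

    A positive difference marks a candidate region that is less stable than its downstream flank, which is the property a relative-stability method looks for; how such differences map onto PromBase's five reliability classes is not reproduced here.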

  19. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview of genomes and their annotations. Less widespread is the possibility of comparing related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  20. Cross-modal re-organization in adults with early stage hearing loss.

    Science.gov (United States)

    Campbell, Julia; Sharma, Anu

    2014-01-01

    Cortical cross-modal re-organization, or recruitment of auditory cortical areas for visual processing, has been well-documented in deafness. However, the degree of sensory deprivation necessary to induce such cortical plasticity remains unclear. We recorded visual evoked potentials (VEP) using high-density electroencephalography in nine persons with adult-onset mild-moderate hearing loss and eight normal hearing control subjects. Behavioral auditory performance was quantified using a clinical measure of speech perception-in-noise. Relative to normal hearing controls, adults with hearing loss showed significantly larger P1, N1, and P2 VEP amplitudes, decreased N1 latency, and a novel positive component (P2') following the P2 VEP. Current source density reconstruction of VEPs revealed a shift toward ventral stream processing including activation of auditory temporal cortex in hearing-impaired adults. The hearing loss group showed worse than normal speech perception performance in noise, which was strongly correlated with a decrease in the N1 VEP latency. Overall, our findings provide the first evidence that visual cross-modal re-organization not only begins in the early stages of hearing impairment, but may also be an important factor in determining behavioral outcomes for listeners with hearing loss, a finding which demands further investigation.

  1. Cross-modal re-organization in adults with early stage hearing loss.

    Directory of Open Access Journals (Sweden)

    Julia Campbell

    Cortical cross-modal re-organization, or recruitment of auditory cortical areas for visual processing, has been well-documented in deafness. However, the degree of sensory deprivation necessary to induce such cortical plasticity remains unclear. We recorded visual evoked potentials (VEP) using high-density electroencephalography in nine persons with adult-onset mild-moderate hearing loss and eight normal hearing control subjects. Behavioral auditory performance was quantified using a clinical measure of speech perception-in-noise. Relative to normal hearing controls, adults with hearing loss showed significantly larger P1, N1, and P2 VEP amplitudes, decreased N1 latency, and a novel positive component (P2') following the P2 VEP. Current source density reconstruction of VEPs revealed a shift toward ventral stream processing including activation of auditory temporal cortex in hearing-impaired adults. The hearing loss group showed worse than normal speech perception performance in noise, which was strongly correlated with a decrease in the N1 VEP latency. Overall, our findings provide the first evidence that visual cross-modal re-organization not only begins in the early stages of hearing impairment, but may also be an important factor in determining behavioral outcomes for listeners with hearing loss, a finding which demands further investigation.

  2. Genomics and public health: development of Web-based training tools for increasing genomic awareness.

    Science.gov (United States)

    Bodzin, Jennifer; Kardia, Sharon L R; Goldenberg, Aaron; Raup, Sarah F; Bach, Janice V; Citrin, Toby

    2005-04-01

    In 2001, the Centers for Disease Control and Prevention funded three Centers for Genomics and Public Health to develop training tools for increasing genomic awareness. Over the past three years, the centers, working together with the Centers for Disease Control and Prevention's Office of Genomics and Disease Prevention, have developed tools to increase awareness of the impact genomics will have on public health practice, to provide a foundation for understanding basic genomic advances, and to translate the relevance of that information to public health practitioners' own work. These training tools serve to communicate genomic advances and their potential for integration into public health practice. This paper highlights two of these training tools: 1) Genomics for Public Health Practitioners: The Practical Application of Genomics in Public Health Practice, a Web-based introduction to genomics, and 2) Six Weeks to Genomic Awareness, an in-depth training module on public health genomics. This paper focuses on the processes and collaborative efforts by which these live presentations were developed and delivered as Web-based training sessions.

  3. CyanoBase: the cyanobacteria genome database update 2010.

    Science.gov (United States)

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data seamlessly in various formats for use with other tools.

  4. Necessity of Re-Organization of Turkish Agricultural Higher Education System

    Directory of Open Access Journals (Sweden)

    P. Ulger

    2006-01-01

    Education in the Faculties of Agriculture in Turkey has changed continually from their beginnings to the present day. However, agricultural education has always covered all areas of agriculture, and after four years all students are awarded the same diploma, “Agricultural Engineer”. As science and technology have developed, agricultural practices have also changed, increasing the demand for agricultural engineers with both broad and deep knowledge of agriculture. This study aimed to clarify the history of agricultural higher education in Turkey from its beginnings to the present and to describe the agricultural higher education systems of some developed countries. The necessity of re-organizing the Turkish agricultural higher education system is also discussed, and recommendations on this theme are given.

  5. Cross-Modal Re-Organization in Clinical Populations with Hearing Loss

    Directory of Open Access Journals (Sweden)

    Anu Sharma

    2016-01-01

    We review evidence for cross-modal cortical re-organization in clinical populations with hearing loss. Cross-modal plasticity refers to the ability of an intact sensory modality (e.g., vision or somatosensation) to recruit cortical brain regions from a deprived sensory modality (e.g., audition) to carry out sensory processing. We describe evidence for cross-modal changes in hearing loss across the age spectrum and across different degrees of hearing impairment, including children with profound, bilateral deafness with cochlear implants, single-sided deafness before and after cochlear implantation, and adults with early-stage, mild-moderate, age-related hearing loss. Understanding cross-modal plasticity in the context of auditory deprivation, and the potential for reversal of these changes following intervention, may be vital in directing intervention and rehabilitation options for clinical populations with hearing loss.

  6. Coordinates and intervals in graph-based reference genomes.

    Science.gov (United States)

    Rand, Knut D; Grytten, Ivar; Nederbragt, Alexander J; Storvik, Geir O; Glad, Ingrid K; Sandve, Geir K

    2017-05-18

    It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .
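    To make the idea of offset-based coordinates concrete, here is a minimal Python sketch under simplifying assumptions: each node (region) of the graph has a fixed length, a position is a (node, offset) pair, and an interval is a start position, an end position (exclusive) and the ordered path of nodes between them. The class and method names are hypothetical and do not reproduce the API of the offsetbasedgraph package.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Position:
    """A graph coordinate: a node (region) id plus a 0-based offset into that node."""
    node: str
    offset: int


@dataclass
class GraphInterval:
    """Start position, end position (offset exclusive) and the ordered path of node ids.

    Hypothetical, simplified model for illustration only.
    """
    start: Position
    end: Position
    path: List[str]

    def is_valid(self, edges: Set[Tuple[str, str]]) -> bool:
        # The path must begin/end at the start/end nodes and follow graph edges.
        if self.path[0] != self.start.node or self.path[-1] != self.end.node:
            return False
        return all((a, b) in edges for a, b in zip(self.path, self.path[1:]))

    def length(self, node_lengths: Dict[str, int]) -> int:
        if len(self.path) == 1:
            return self.end.offset - self.start.offset
        total = node_lengths[self.start.node] - self.start.offset  # tail of first node
        total += sum(node_lengths[n] for n in self.path[1:-1])       # whole inner nodes
        return total + self.end.offset                              # head of last node

    def contains(self, pos: Position) -> bool:
        # Assumes each node occurs at most once on the path.
        if pos.node not in self.path:
            return False
        i = self.path.index(pos.node)
        lo = self.start.offset if i == 0 else 0
        hi = self.end.offset if i == len(self.path) - 1 else None
        return pos.offset >= lo and (hi is None or pos.offset < hi)


# Toy graph: shared region A, two alternative regions (REF, ALT), then shared region B.
node_lengths = {"A": 100, "REF": 50, "ALT": 40, "B": 80}
edges = {("A", "REF"), ("A", "ALT"), ("REF", "B"), ("ALT", "B")}
via_alt = GraphInterval(Position("A", 90), Position("B", 10), ["A", "ALT", "B"])
print(via_alt.is_valid(edges), via_alt.length(node_lengths))  # True 60
```

    Because the interval stores its path explicitly, the same start and end positions describe different intervals on the two branches (60 bp via ALT here, 70 bp via REF), which is the kind of ambiguity a linear coordinate cannot express.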

  7. Ontology-Based Search of Genomic Metadata.

    Science.gov (United States)

    Fernandez, Javier D; Lenzerini, Maurizio; Masseroli, Marco; Venco, Francesco; Ceri, Stefano

    2016-01-01

    The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; new biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet the search for relevant datasets for knowledge discovery is poorly supported: the metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata searching approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This correctly finds more datasets than a purely syntactic search, as supported by the other available systems. We empirically show the relevance of the datasets found to the biologists' queries.
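    The core of such semantic search is query expansion: a query term is mapped to ontology concepts, and their synonyms and more specific (descendant) concepts are added before the metadata are matched. The Python sketch below illustrates that idea with a hypothetical four-concept ontology and naive substring matching; it is not the S.O.S. GeM implementation and does not use UMLS.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology (NOT UMLS): concept id -> labels (name + synonyms) and child concepts.
ONTOLOGY: Dict[str, dict] = {
    "C1": {"labels": ["leukocyte", "white blood cell"], "children": ["C2", "C3"]},
    "C2": {"labels": ["lymphocyte"], "children": ["C4"]},
    "C3": {"labels": ["monocyte"], "children": []},
    "C4": {"labels": ["T cell", "T lymphocyte"], "children": []},
}


def matching_concepts(term: str) -> Set[str]:
    """Concepts whose name or synonym equals the query term (case-insensitive)."""
    t = term.lower()
    return {cid for cid, c in ONTOLOGY.items() if t in (label.lower() for label in c["labels"])}


def descendants(cid: str) -> Set[str]:
    """The concept itself plus all concepts reachable through child links."""
    seen, stack = {cid}, [cid]
    while stack:
        for child in ONTOLOGY[stack.pop()]["children"]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def expand_query(term: str) -> Set[str]:
    """The query term plus the labels of every matching concept and its descendants."""
    terms = {term.lower()}
    for cid in matching_concepts(term):
        for d in descendants(cid):
            terms.update(label.lower() for label in ONTOLOGY[d]["labels"])
    return terms


def search(term: str, metadata: Dict[str, str], semantic: bool = True) -> List[str]:
    """Ids of datasets whose metadata text contains any search term (naive substring match)."""
    terms = expand_query(term) if semantic else {term.lower()}
    return sorted(ds for ds, text in metadata.items() if any(t in text.lower() for t in terms))


metadata = {  # hypothetical dataset descriptions
    "ds1": "ChIP-seq on white blood cell sample",
    "ds2": "RNA-seq of CD4+ T cell",
    "ds3": "DNase-seq in monocyte",
    "ds4": "RNA-seq in liver",
}
print(search("leukocyte", metadata, semantic=False))  # []
print(search("leukocyte", metadata))                  # ['ds1', 'ds2', 'ds3']
```

    The purely syntactic search for "leukocyte" finds nothing, while the expanded query also matches datasets annotated only with a synonym or a more specific cell type.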

  8. CRISPR/Cas9 based genome editing of Penicillium chrysogenum

    NARCIS (Netherlands)

    Pohl, Carsten; Kiel, Jan A K W; Driessen, Arnold J M; Bovenberg, Roel A L; Nygård, Yvonne

    2016-01-01

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially rel

  9. Hellbender genome sequences shed light on genomic expansion at the base of crown salamanders.

    Science.gov (United States)

    Sun, Cheng; Mueller, Rachel Lockridge

    2014-07-01

    Among animals, genome sizes range from 20 Mb to 130 Gb, with 380-fold variation across vertebrates. Most of the largest vertebrate genomes are found in salamanders, an amphibian clade of 660 species. Thus, salamanders are an important system for studying causes and consequences of genomic gigantism. Previously, we showed that plethodontid salamander genomes accumulate higher levels of long terminal repeat (LTR) retrotransposons than do other vertebrates, although the evolutionary origins of such sequences remained unexplored. We also showed that some salamanders in the family Plethodontidae have relatively slow rates of DNA loss through small insertions and deletions. Here, we present new data from Cryptobranchus alleganiensis, the hellbender. Cryptobranchus and Plethodontidae span the basal phylogenetic split within salamanders; thus, analyses incorporating these taxa can shed light on the genome of the ancestral crown salamander lineage, which underwent expansion. We show that high levels of LTR retrotransposons likely characterize all crown salamanders, suggesting that disproportionate expansion of this transposable element (TE) class contributed to genomic expansion. Phylogenetic and age distribution analyses of salamander LTR retrotransposons indicate that salamanders' high TE levels reflect persistence and diversification of ancestral TEs rather than horizontal transfer events. Finally, we show that relatively slow DNA loss rates through small indels likely characterize all crown salamanders, suggesting that a decreased DNA loss rate contributed to genomic expansion at the clade's base. Our identification of shared genomic features across phylogenetically distant salamanders is a first step toward identifying the evolutionary processes underlying accumulation and persistence of high levels of repetitive sequence in salamander genomes.

  10. StellaBase: the Nematostella vectensis Genomics Database.

    Science.gov (United States)

    Sullivan, James C; Ryan, Joseph F; Watson, James A; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2006-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions. Data provided by these searches will elucidate gene family evolution in early animals. Unique research tools, including a Nematostella genetic stock library, a primer library, a literature repository and a gene expression library will provide support to the burgeoning Nematostella research community. The development of StellaBase accompanies significant upgrades to CnidBase, the Cnidarian Evolutionary Genomics Database. With the completion of the first sequenced cnidarian genome, genome comparison tools have been added to CnidBase. In addition, StellaBase provides a framework for the integration of additional species-specific databases into CnidBase. StellaBase is available at http://www.stellabase.org.

  11. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan;

    2014-01-01

    than 1 kb. Excluding the 59 SVs (54 insertions/deletions, 5 inversions) that overlap with N-base gaps in the reference assembly hg19, 666 non-gap SVs remained, and 396 of them (60%) were verified by paired-end data from whole-genome sequencing-based re-sequencing or de novo assembly sequence from...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  12. Re-organization of Labor Division and Collaboration Management Process of Family Doctor Teams Based on the Theory of Business Process Reengineering

    Institute of Scientific and Technical Information of China (English)

    王红伟; 杨文秀; 骆达

    2015-01-01

    Based on the theory of business process reengineering (BPR), this article analyzed the original labor division and collaboration management process of the family doctor teams at one community health service center in District H of Tianjin. The analysis showed that the responsibilities of the sample team members were unclear, the specialization of labor was insufficient, collaboration among team members was weak, and service efficiency was low. Family doctor teams should therefore actively change their approach and adopt a new process-based management model. At the same time, the internal division of labor should be adjusted, service items that require cooperation should be divided into more detailed tasks, the collaboration network among team members should be strengthened, and communication and cooperation between teams and among team members should be enhanced.

  13. wFleaBase: the Daphnia genome database

    Directory of Open Access Journals (Sweden)

    Singan Vasanth R

    2005-03-01

    Background: wFleaBase is a database with the necessary infrastructure to curate, archive and share genetic, molecular and functional genomic data and protocols for an emerging model organism, the microcrustacean Daphnia. Commonly known as the water-flea, Daphnia's ecological merit is unequaled among metazoans, largely because of its sentinel role within freshwater ecosystems and over 200 years of biological investigations. Consequently, the Daphnia Genomics Consortium (DGC) has launched an interdisciplinary research program to create the resources needed to study genes that affect ecological and evolutionary success in natural environments. Discussion: These tools include the genome database wFleaBase, which currently contains functions to search and extract information from expressed sequence tags, genome survey sequences and full genome sequencing projects. This new database is built primarily from core components of the Generic Model Organism Database project and related bioinformatics tools. Summary: Over the coming year, preliminary genetic maps and the nearly complete genomic sequence of Daphnia pulex will be integrated into wFleaBase, including gene predictions and ortholog assignments based on sequence similarities with eukaryote genes of known function. wFleaBase aims to serve a large ecological and evolutionary research community. Our challenge is to rapidly expand its content and ultimately to integrate genetic and functional genomic information with population-level responses to environmental challenges. URL: http://wfleabase.org/.

  14. A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes

    Directory of Open Access Journals (Sweden)

    Michael Strong

    2009-12-01

    As the number of completely sequenced microbial genomes continues to rise at an impressive rate, it is important to prepare students with the skills necessary to investigate microorganisms at the genomic level. As a part of the core curriculum for first-year graduate students in the biological sciences, we have implemented a web-based tutorial to introduce students to the fields of comparative and functional genomics. The tutorial focuses on recent computational methods for identifying functionally linked genes and proteins on a genome-wide scale and was used to introduce students to the Rosetta Stone, Phylogenetic Profile, conserved Gene Neighbor, and Operon computational methods. Students learned to use a number of publicly available web servers and databases to identify functionally linked genes in the Escherichia coli genome, with emphasis on genome organization and operon structure. The overall effectiveness of the tutorial was assessed based on student evaluations and homework assignments. The tutorial is available to other educators at http://www.doe-mbi.ucla.edu/~strong/m253.php.
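    Of the methods listed, the phylogenetic profile method is the easiest to show in a few lines: each gene is encoded as a presence/absence vector across a set of reference genomes, and genes with identical or nearly identical vectors are proposed as functionally linked. A minimal Python sketch, assuming the profiles have already been computed (the gene names and profiles below are invented) and using Hamming distance as the similarity measure:

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # 1 = homolog present in a reference genome, 0 = absent


def hamming(a: Profile, b: Profile) -> int:
    """Number of genomes in which the two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles: Dict[str, Profile], max_distance: int = 0) -> List[Tuple[str, str, int]]:
    """Gene pairs whose profiles differ in at most max_distance genomes.

    Profiles present in all or in no genomes carry no co-inheritance signal and are skipped.
    """
    informative = {g: p for g, p in profiles.items() if 0 < sum(p) < len(p)}
    pairs = []
    for g1, g2 in combinations(sorted(informative), 2):
        d = hamming(informative[g1], informative[g2])
        if d <= max_distance:
            pairs.append((g1, g2, d))
    return pairs


# Hypothetical profiles across six reference genomes.
profiles = {
    "gene1": (1, 1, 0, 1, 0, 1),
    "gene2": (1, 1, 0, 1, 0, 1),
    "gene3": (0, 1, 1, 0, 1, 0),
    "gene4": (0, 1, 1, 0, 1, 1),
    "gene5": (1, 1, 1, 1, 1, 1),  # present everywhere: uninformative
}
print(linked_pairs(profiles))                  # [('gene1', 'gene2', 0)]
print(linked_pairs(profiles, max_distance=1))  # [('gene1', 'gene2', 0), ('gene3', 'gene4', 1)]
```

    Excluding profiles present in every genome is a design choice of this sketch: a universally conserved gene co-occurs with every other universal gene without implying any functional link.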

  15. AgBase: a functional genomics resource for agriculture

    Directory of Open Access Journals (Sweden)

    Hill David P

    2006-09-01

    Background: Many agricultural species and their pathogens have sequenced genomes and more are in progress. Agricultural species provide food, fiber, xenotransplant tissues, biopharmaceuticals and biomedical models. Moreover, many agricultural microorganisms are human zoonoses. However, systems biology from functional genomics data is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation, and agricultural research communities are smaller, with limited funding compared to many model organism communities. Description: To facilitate systems biology in these traditionally agricultural species we have established "AgBase", a curated, web-accessible, public resource http://www.agbase.msstate.edu for structural and functional annotation of agricultural genomes. The AgBase database includes a suite of computational tools to use GO annotations. We use standardized nomenclature following the Human Genome Organization Gene Nomenclature guidelines and are currently functionally annotating chicken, cow and sheep gene products using the Gene Ontology (GO). The computational tools we have developed accept and batch process data derived from different public databases (with different accession codes), return all existing GO annotations, provide a list of products without GO annotation, identify potential orthologs, model functional genomics data using GO and assist proteomics analysis of ESTs and EST assemblies. Our journal database helps prevent redundant manual GO curation. We encourage and publicly acknowledge GO annotations from researchers and provide a service for researchers interested in GO and analysis of functional genomics data. Conclusion: The AgBase database is the first database dedicated to functional genomics and systems biology analysis for agriculturally important species and their pathogens. We use experimental data to improve structural annotation of genomes and to

  16. Network Based Prediction Model for Genomics Data Analysis*

    OpenAIRE

    Huang, Ying; Wang, Pei

    2012-01-01

    Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high-dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease-susceptibility genes. ...

  17. Array-based comparative genomic hybridization for genome-wide screening of DNA copy number in bladder tumors.

    NARCIS (Netherlands)

    Veltman, J.A.; Fridlyand, J.; Pejavar, S.; Olshen, A.B.; Korkola, J.E.; Vries, S. de; Carroll, P.; Kuo, W.L.; Pinkel, D.; Albertson, D.; Cordon-Cardo, C.; Jain, A.N.; Waldman, F.M.

    2003-01-01

    Genome-wide copy number profiles were characterized in 41 primary bladder tumors using array-based comparative genomic hybridization (array CGH). In addition to previously identified alterations in large chromosomal regions, alterations were identified in many small genomic regions, some with high-l

  18. Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods

    DEFF Research Database (Denmark)

    Bohlin, Jon; Snipen, Lars; Cloeckaert, Axel

    2010-01-01

    , genomic codon and amino acid frequencies based comparisons) and proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses). RESULTS: We found that the oligonucleotide based methods gave different results compared to that of the proteome based methods. Differences were also found...... than proteome comparisons between species in genus Brucella and genus Ochrobactrum. Pan-genomic analyses indicated that uptake of DNA from outside genus Brucella appears to be limited. CONCLUSIONS: While both the proteome based methods and the Markov chain based genomic signatures were able to reflect...

  19. Bacterial Recombineering: Genome Engineering via Phage-Based Homologous Recombination.

    Science.gov (United States)

    Pines, Gur; Freed, Emily F; Winkler, James D; Gill, Ryan T

    2015-11-20

    The ability to specifically modify bacterial genomes in a precise and efficient manner is highly desired in various fields, ranging from molecular genetics to metabolic engineering and synthetic biology. Much has changed from the initial realization that phage-derived genes may be employed for such tasks to today, where recombineering enables complex genetic edits within a genome or a population. Here, we review the major developments leading to recombineering becoming the method of choice for in situ bacterial genome editing while highlighting the various applications of recombineering in pushing the boundaries of synthetic biology. We also present the current understanding of the mechanism of recombineering. Finally, we discuss in detail issues surrounding recombineering efficiency and future directions for recombineering-based genome editing.

  20. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    BACKGROUND: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost...... mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as deciphering viral integration into the host genome....

  1. Global genomic diversity of human papillomavirus 6 based on 724 isolates and 190 complete genome sequences.

    Science.gov (United States)

    Jelen, Mateja M; Chen, Zigui; Kocjan, Boštjan J; Burt, Felicity J; Chan, Paul K S; Chouhy, Diego; Combrinck, Catharina E; Coutlée, François; Estrade, Christine; Ferenczy, Alex; Fiander, Alison; Franco, Eduardo L; Garland, Suzanne M; Giri, Adriana A; González, Joaquín Víctor; Gröning, Arndt; Heidrich, Kerstin; Hibbitts, Sam; Hošnjak, Lea; Luk, Tommy N M; Marinic, Karina; Matsukura, Toshihiko; Neumann, Anna; Oštrbenk, Anja; Picconi, Maria Alejandra; Richardson, Harriet; Sagadin, Martin; Sahli, Roland; Seedat, Riaz Y; Seme, Katja; Severini, Alberto; Sinchi, Jessica L; Smahelova, Jana; Tabrizi, Sepehr N; Tachezy, Ruth; Tohme, Sarah; Uloza, Virgilijus; Vitkauskiene, Astra; Wong, Yong Wee; Zidovec Lepej, Snježana; Burk, Robert D; Poljak, Mario

    2014-07-01

    Human papillomavirus type 6 (HPV6) is the major etiological agent of anogenital warts and laryngeal papillomas and has been included in both the quadrivalent and nonavalent prophylactic HPV vaccines. This study investigated the global genomic diversity of HPV6, using 724 isolates and 190 complete genomes from six continents, and the association of HPV6 genomic variants with geographical location, anatomical site of infection/disease, and gender. Initially, a 2,800-bp E5a-E5b-L1-LCR fragment was sequenced from 492/530 (92.8%) HPV6-positive samples collected for this study. Among them, 130 exhibited at least one single nucleotide polymorphism (SNP), indel, or amino acid change in the E5a-E5b-L1-LCR fragment and were sequenced in full. A global alignment and maximum likelihood tree of 190 complete HPV6 genomes (130 fully sequenced in this study and 60 obtained from sequence repositories) revealed two variant lineages, A and B, and five B sublineages: B1, B2, B3, B4, and B5. HPV6 (sub)lineage-specific SNPs and a 960-bp representative region for whole-genome-based phylogenetic clustering within the L2 open reading frame were identified. Multivariate logistic regression analysis revealed that lineage B predominated globally. Sublineage B3 was more common in Africa and North and South America, and lineage A was more common in Asia. Sublineages B1 and B3 were associated with anogenital infections, indicating a potential lesion-specific predilection of some HPV6 sublineages. Females had higher odds for infection with sublineage B3 than males. In conclusion, a global HPV6 phylogenetic analysis revealed the existence of two variant lineages and five sublineages, showing some degree of ethnogeographic, gender, and/or disease predilection in their distribution. This study established the largest database of globally circulating HPV6 genomic variants and contributed a total of 130 new, complete HPV6 genome sequences to available sequence repositories. 
Two HPV6 variant lineages

  2. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

    Directory of Open Access Journals (Sweden)

    Surovcik Katharina

    2006-03-01

    Background: Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. What distinguishes HGT from gene genesis, duplications or mutations is its speed, which enables rapid adaptation to changing environmental demands. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs), or more specifically pathogenicity or symbiotic islands. Results: We have implemented the program SIGI-HMM that predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of the codon usage (CU) of each individual gene of a genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file created according to the EMBL format. It generates output in the common GFF format, readable by genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were consistent both with annotated GIs and with predictions generated by different methods. Conclusion: SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows users to interactively analyze genomes in detail and to generate or to test hypotheses about the origin of acquired
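    SIGI-HMM labels genes by decoding a hidden Markov model over the gene sequence. The Python sketch below shows only that decoding idea, with a deliberately simplified, homogeneous two-state model (native versus alien) and Viterbi decoding. It assumes a hypothetical per-gene boolean flag produced by some upstream codon-usage test and illustrative probabilities; the real program uses an inhomogeneous model whose states and emission probabilities come from codon-usage tests against donor tables, which is not reproduced here.

```python
import math
from typing import List, Sequence

STATES = ("native", "alien")


def viterbi(flags: Sequence[bool], p_stay: float = 0.9,
            p_flag_native: float = 0.1, p_flag_alien: float = 0.8,
            p_start_alien: float = 0.1) -> List[str]:
    """Most probable native/alien labelling of consecutive genes.

    flags[i] is True if gene i's codon usage was flagged as atypical by a
    hypothetical upstream test. All probabilities are illustrative assumptions.
    """
    def log_emit(state: str, flagged: bool) -> float:
        p = p_flag_alien if state == "alien" else p_flag_native
        return math.log(p if flagged else 1.0 - p)

    def log_trans(a: str, b: str) -> float:
        return math.log(p_stay if a == b else 1.0 - p_stay)

    start = {"native": 1.0 - p_start_alien, "alien": p_start_alien}
    # score[s]: best log-probability of any labelling of genes 0..i that ends in state s
    score = {s: math.log(start[s]) + log_emit(s, flags[0]) for s in STATES}
    backpointers = []
    for flagged in flags[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda q: score[q] + log_trans(q, s))
            prev_best[s] = p
            new_score[s] = score[p] + log_trans(p, s) + log_emit(s, flagged)
        backpointers.append(prev_best)
        score = new_score
    # Trace the best path backwards from the highest-scoring final state.
    path = [max(STATES, key=lambda s: score[s])]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    return path[::-1]


flags = [False] * 3 + [True] * 4 + [False] * 3
print(viterbi(flags))  # 3 native, then 4 alien, then 3 native
```

    The switching penalty is what makes the prediction island-level rather than gene-by-gene: a single flagged gene surrounded by typical genes stays native, while a run of flagged genes is labelled as one contiguous island.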

  3. Genomic profiling of oral squamous cell carcinoma by array-based comparative genomic hybridization.

    Directory of Open Access Journals (Sweden)

    Shunichi Yoshioka

    We designed a study to investigate genetic relationships between primary tumors of oral squamous cell carcinoma (OSCC) and their lymph node metastases, and to identify genomic copy number aberrations (CNAs) related to lymph node metastasis. For this purpose, we collected a total of 42 tumor samples from 25 patients and analyzed their genomic profiles by array-based comparative genomic hybridization. We then compared the genetic profiles of metastatic primary tumors (MPTs) with their paired lymph node metastases (LNMs), and also those of LNMs with non-metastatic primary tumors (NMPTs). Firstly, we found that although there were some distinctive differences in the patterns of genomic profiles between MPTs and their paired LNMs, the paired samples shared similar genomic aberration patterns in each case. Unsupervised hierarchical clustering analysis grouped together 12 of the 15 MPT-LNM pairs. Furthermore, similarity scores between paired samples were significantly higher than those between non-paired samples. These results suggested that MPTs and their paired LNMs are composed predominantly of genetically clonal tumor cells, while minor populations with different CNAs may also exist in metastatic OSCCs. Secondly, to identify CNAs related to lymph node metastasis, we compared CNAs between grouped samples of MPTs and LNMs, but were unable to find any CNAs that were more common in LNMs. Finally, we hypothesized that subpopulations carrying metastasis-related CNAs might be present in both the MPT and LNM. Accordingly, we compared CNAs between NMPTs and LNMs, and found that gains of 7p, 8q and 17q were more common in the latter than in the former, suggesting that these CNAs may be involved in lymph node metastasis of OSCC. In conclusion, our data suggest that in OSCCs showing metastasis, the primary and metastatic tumors share similar genomic profiles, and that cells in the primary tumor may tend to metastasize after acquiring metastasis-associated CNAs.
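    The paired-versus-non-paired comparison can be illustrated with a small Python sketch. It assumes each tumor is summarized as a vector of log2 copy-number ratios over the same genomic segments and uses Pearson correlation as the similarity score; the abstract does not say which similarity score was used, so this choice and the data below are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two copy-number profiles measured on the same segments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def paired_vs_unpaired(cases: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> Tuple[float, float]:
    """Mean similarity of each primary tumor to its own metastasis vs. to other patients' metastases."""
    paired = [pearson(primary, met) for primary, met in cases.values()]
    unpaired = [pearson(cases[a][0], cases[b][1]) for a in cases for b in cases if a != b]
    return mean(paired), mean(unpaired)


# Hypothetical log2 ratios on six genomic segments: (primary tumor, lymph node metastasis).
cases = {
    "case1": ((0.5, 0.4, 0.0, -0.6, 0.0, 0.3), (0.6, 0.5, 0.1, -0.5, 0.0, 0.2)),
    "case2": ((-0.4, 0.0, 0.7, 0.0, 0.5, -0.3), (-0.5, 0.1, 0.6, 0.0, 0.6, -0.2)),
    "case3": ((0.0, -0.5, 0.0, 0.4, -0.3, 0.6), (0.1, -0.6, 0.0, 0.5, -0.2, 0.5)),
}
paired, unpaired = paired_vs_unpaired(cases)
print(f"paired {paired:.2f} vs unpaired {unpaired:.2f}")
```

    A real analysis would also need a significance test for the difference between the two groups of scores, which this sketch omits.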

  4. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  5. WormBase 2016: expanding to enable helminth genomic research

    Science.gov (United States)

    Howe, Kevin L.; Bolt, Bruce J.; Cain, Scott; Chan, Juancarlos; Chen, Wen J.; Davis, Paul; Done, James; Down, Thomas; Gao, Sibyl; Grove, Christian; Harris, Todd W.; Kishore, Ranjana; Lee, Raymond; Lomax, Jane; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Stanley, Eleanor; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

    2016-01-01

    WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research. PMID:26578572

  6. Changing Histopathological Diagnostics by Genome-Based Tumor Classification

    Directory of Open Access Journals (Sweden)

    Michael Kloth

    2014-05-01

    Traditionally, tumors are classified by histopathological criteria, i.e., based on their specific morphological appearances. Consequently, current therapeutic decisions in oncology are strongly influenced by histology rather than by underlying molecular or genomic aberrations. However, the growing information on molecular changes, enabled by the Human Genome Project and the International Cancer Genome Consortium as well as by manifold advances in molecular biology and high-throughput sequencing techniques, has inaugurated the integration of genomic information into disease classification. Furthermore, in some cases it became evident that former classifications needed major revision and adaptation. Such adaptations are often prompted by understanding the pathogenesis of a disease in terms of a specific molecular alteration and by using this molecular driver for targeted and highly effective therapies. Altogether, reclassifications should lead to higher information content of the underlying diagnoses, reflecting their molecular pathogenesis and resulting in optimized and individualized therapeutic decisions. The objective of this article is to summarize some particularly important examples of genome-based classification approaches and associated therapeutic concepts. In addition to reviewing disease-specific markers, we focus on potentially therapeutic or predictive markers and the relevance of molecular diagnostics in disease monitoring.

  7. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes, so they cannot easily be applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods that detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieved higher recall than current methods without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
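    GI-SVM scores genome windows by their k-mer composition with a one-class SVM. As a dependency-free illustration of the underlying signal (composition bias), the Python sketch below swaps in a much simpler outlier score: the Euclidean distance between each window's k-mer frequency profile and the genome-wide profile. This is not GI-SVM; the window size, k and the synthetic sequence are assumptions chosen for the example.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Tuple


def kmer_profile(seq: str, k: int) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers (k-mers with other symbols are ignored)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def window_scores(genome: str, window: int, k: int = 2) -> List[Tuple[int, float]]:
    """(start, distance) for each non-overlapping window: the Euclidean distance between the
    window's k-mer profile and the genome-wide profile. Larger means more atypical composition."""
    genome = genome.upper()
    background = kmer_profile(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, window):
        profile = kmer_profile(genome[start:start + window], k)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(profile, background)))
        scores.append((start, dist))
    return scores


# Synthetic genome: a repetitive "host" background with a GC-rich 100-bp insert at position 400.
host = "ACGTTGCAAGCTTCGA" * 25            # 400 bp
genome = host + "GC" * 50 + host          # 900 bp
scores = window_scores(genome, window=100, k=2)
print(max(scores, key=lambda s: s[1])[0])  # 400
```

    Replacing the distance with a one-class SVM fitted on all window profiles (for example scikit-learn's `OneClassSVM`) would move the sketch closer to the published method.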

  8. Accurate genome relative abundance estimation based on shotgun metagenomic reads.

    Directory of Open Access Journals (Sweden)

    Li C Xia

    Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or their variants, and often yield biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance of Bacteroides species in human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
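The mixture-model idea behind GRAMMy can be illustrated with a minimal expectation-maximization (EM) sketch: ambiguous reads are split among candidate genomes in proportion to the current abundance estimates, and the estimated read shares are then divided by genome length. The per-read likelihoods are assumed to be precomputed (e.g. from a mapping step), and the function name and parameters are hypothetical; GRAMMy's actual model and implementation may differ.

```python
def em_relative_abundance(read_likelihoods, genome_lengths, max_iter=500, tol=1e-10):
    """Estimate genome relative abundances from ambiguous read assignments.

    read_likelihoods[i][j] is an assumed, precomputed probability of read i
    given genome j; 0 means the read is incompatible with that genome.
    """
    m = len(genome_lengths)
    pi = [1.0 / m] * m  # mixing proportions: fraction of reads from each genome
    for _ in range(max_iter):
        # E-step: split each read among genomes in proportion to pi_j * P(read | j)
        expected = [0.0] * m
        for lik in read_likelihoods:
            weights = [p * l for p, l in zip(pi, lik)]
            z = sum(weights)
            if z == 0:
                continue  # read matches no genome under the current estimate
            for j in range(m):
                expected[j] += weights[j] / z
        n = sum(expected)
        if n == 0:
            raise ValueError("no read is compatible with any genome")
        # M-step: new mixing proportions are the expected read shares
        new_pi = [e / n for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    # Larger genomes shed more reads per cell, so divide by length and renormalize
    per_length = [p / length for p, length in zip(pi, genome_lengths)]
    s = sum(per_length)
    return [x / s for x in per_length]
```

With three reads unique to genome 0, one unique to genome 1 and genome 1 three times longer, `em_relative_abundance([[1, 0]] * 3 + [[0, 1]], [1000, 3000])` gives approximately [0.9, 0.1]: genome 1 supplies a quarter of the reads partly because it is longer.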

  9. INTEGRATED GENOME-BASED STUDIES OF SHEWANELLA ECOPHYSIOLOGY

    Energy Technology Data Exchange (ETDEWEB)

    TIEDJE, JAMES M; KONSTANTINIDIS, KOSTAS; WORDEN, MARK

    2014-01-08

    The aim of the work reported is to study Shewanella population genomics, and to understand the evolution, ecophysiology, and speciation of Shewanella. The tasks supporting this aim are: to study genetic and ecophysiological bases defining the core and diversification of Shewanella species; to determine gene content patterns along redox gradients; and to investigate the evolutionary processes, patterns and mechanisms of Shewanella.

  10. Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome

    Science.gov (United States)

    CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of human, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific t...

  11. Impulsive Neural Networks Algorithm Based on the Artificial Genome Model

    Directory of Open Access Journals (Sweden)

    Yuan Gao

    2014-05-01

    To describe gene regulatory networks, this article adopts the framework of the artificial genome model and proposes an impulsive neural network algorithm based on it. First, gene expression and the cell division tree are used to generate spiking neurons with specific attributes, the neural network structure, the connection weights and the specific learning rule of each neuron. Next, a gene segment duplication and divergence model is used to design the evolutionary algorithm for impulsive neural networks at the level of the artificial genome. The dynamic changes of the developmental gene regulatory networks are controlled throughout the evolutionary process. Finally, the nerve-driven food-collecting behavior of an autonomous intelligent agent is simulated. Experimental results demonstrate that the algorithm in this article has evolutionary capability for large-scale impulsive neural networks.

  12. The effect of genealogy-based haplotypes on genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Fernando, Rohan L.; Su, Guosheng

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression...... on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using...... local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (pi) of the haplotype covariates had zero effect...

  13. CFGP: a web-based, comparative fungal genomics platform.

    Science.gov (United States)

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  14. Genomic-based-breeding tools for tropical maize improvement.

    Science.gov (United States)

    Chakradhar, Thammineni; Hindu, Vemuri; Reddy, Palakolanu Sudhakar

    2017-09-05

    Maize has traditionally been the main staple diet in Southern Asia and Sub-Saharan Africa and is widely grown by millions of resource-poor, small-scale farmers. Approximately 35.4 million hectares are sown to tropical maize, constituting around 59% of the developing world's maize area. Besides poor agro-climatic conditions, tropical maize faces tremendous challenges, with average recorded yields below 3 tonnes/hectare, far less than the average of developed countries. In contrast to these poor yields, the demand for maize as food, feed and fuel is continuously increasing in these regions. Heterosis breeding, introduced in the early 1990s, improved maize yields significantly, but genetic gains are still a mirage, particularly for crops grown in marginal environments. The application of molecular markers has accelerated the pace of maize breeding to some extent. The availability of an array of sequencing and genotyping technologies offers unrivalled opportunities to improve precision in maize-breeding programs through modern approaches such as genomic selection, genome-wide association studies and bulk segregant analysis-based sequencing approaches. Superior alleles underlying complex traits can easily be identified and introgressed efficiently using these sequence-based approaches. Integrating genomic tools and techniques with advanced genetic resources such as nested association mapping and backcross nested association mapping could certainly address the genetic issues in maize improvement programs in developing countries. The huge diversity in tropical maize and its inherent capacity for doubled haploid technology offer an advantage for applying next-generation genomic tools to accelerate production in marginal environments of the tropical and subtropical world. Precision in phenotyping is the key to the success of any molecular-breeding approach. This article reviews genomic technologies and their application to improving agronomic traits in tropical maize breeding.

  15. CRISPR/Cas9 Based Genome Editing of Penicillium chrysogenum.

    Science.gov (United States)

    Pohl, C; Kiel, J A K W; Driessen, A J M; Bovenberg, R A L; Nygård, Y

    2016-07-15

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially relevant cell factory. The developed CRISPR/Cas9 toolbox is highly flexible and allows editing of new targets with minimal cloning efforts. The Cas9 protein and the sgRNA can be either delivered during transformation, as preassembled CRISPR-Cas9 ribonucleoproteins (RNPs) or expressed from an AMA1 based plasmid within the cell. The direct delivery of the Cas9 protein with in vitro synthesized sgRNA to the cells allows for a transient method for genome engineering that may rapidly be applicable for other filamentous fungi. The expression of Cas9 from an AMA1 based vector was shown to be highly efficient for marker-free gene deletions.

  16. Genomics and Public Health: Development of Web-based Training Tools for Increasing Genomic Awareness

    OpenAIRE

    Kardia, Sharon LR; Bodzin, Jennifer; Goldenberg, Aaron; Citrin, Toby; Raup, Sarah F; Bach, Janice V

    2005-01-01

    In 2001, the Centers for Disease Control and Prevention funded three Centers for Genomics and Public Health to develop training tools for increasing genomic awareness. Over the past three years, the centers, working together with the Centers for Disease Control and Prevention's Office of Genomics and Disease Prevention, have developed tools to increase awareness of the impact genomics will have on public health practice, to provide a foundation for understanding basic genomic advances, and to...

  17. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Margrethe H. Serres

    2012-06-29

    Shewanella oneidensis MR-1 is a motile, facultative γ-proteobacterium with remarkable respiratory versatility; it can utilize a range of organic and inorganic compounds as terminal electron acceptors for anaerobic metabolism. The ability to effectively reduce nitrate, S⁰, polyvalent metals and radionuclides has established MR-1 as an important model dissimilatory metal-reducing microorganism for genome-based investigations of the biogeochemical transformation of metals and radionuclides that are of concern at U.S. Department of Energy (DOE) sites nationwide. Metal-reducing bacteria such as Shewanella also have a highly developed capacity for extracellular transfer of respiratory electrons to solid-phase Fe and Mn oxides as well as directly to anode surfaces in microbial fuel cells. More broadly, Shewanellae are recognized free-living microorganisms and members of microbial communities involved in the decomposition of organic matter and the cycling of elements in aquatic and sedimentary systems. To function and compete in environments that are subject to spatial and temporal environmental change, Shewanella must be able to sense and respond to such changes and therefore requires relatively robust sensing and regulation systems. The overall goal of this project is to apply the tools of genomics, leveraging the availability of genome sequences for 18 additional strains of Shewanella, to better understand the ecophysiology and speciation of respiratory-versatile members of this important genus. To understand these systems, we propose to use genome-based approaches to investigate Shewanella as a system of integrated networks: first describing key cellular subsystems (those involved in signal transduction, regulation, and metabolism), then building towards understanding the function of whole cells and, eventually, cells within populations. As a general approach, this project will employ complementary "top-down" - bioinformatics-based genome functional predictions, high

  18. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is currently becoming the method of choice for the characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regard to accuracy, resolution and analysis speed in comparison to several other methods, including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA) and multi-virulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major foodborne diseases in humans. Backward compatibility of WGS with former methods can be maintained by extracting the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes, capable of differentiating 12 serovars, and national reference laboratories still perform serotyping and PCR-based serogrouping as a first-level classification method for Listeria monocytogenes surveillance. Whole genome sequence-based core genome MLST analysis of an L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and could be grouped into the IIa, IIc, IVb or IIb clusters generated by minimum spanning tree (MST) and neighbor-joining (NJ) tree analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
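Core genome MLST compares isolates by the number of loci at which their allele calls differ, and a minimum spanning tree links each isolate to its closest neighbour. A minimal standard-library sketch of that distance plus Prim's algorithm is given below; the function names are hypothetical, representing missing loci as `None` and skipping them is an assumption, and the software actually used in the study is not named in the abstract.

```python
def allelic_distance(p1, p2):
    """Number of loci whose allele calls differ; loci missing (None) in either profile are skipped."""
    return sum(1 for a, b in zip(p1, p2) if a is not None and b is not None and a != b)


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of pairwise allelic distances.

    Returns a list of (parent_index, child_index, distance) edges.
    """
    n = len(profiles)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each isolate to the tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # pick the closest isolate not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # relax distances from the newly added isolate
        for v in range(n):
            if not in_tree[v]:
                d = allelic_distance(profiles[u], profiles[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

Clusters such as the serogroups reported above would then appear as groups of isolates joined by short edges, separated from each other by a few long ones.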

  19. Genomic and epigenetic insights into the molecular bases of heterosis.

    Science.gov (United States)

    Chen, Z Jeffrey

    2013-07-01

    Heterosis, also known as hybrid vigour, is widespread in plants and animals, but the molecular bases for this phenomenon remain elusive. Recent studies in hybrids and allopolyploids using transcriptomic, proteomic, metabolomic, epigenomic and systems biology approaches have provided new insights. Emerging genomic and epigenetic perspectives suggest that heterosis arises from allelic interactions between parental genomes, leading to altered programming of genes that promote the growth, stress tolerance and fitness of hybrids. For example, epigenetic modifications of key regulatory genes in hybrids and allopolyploids can alter complex regulatory networks of physiology and metabolism, thus modulating biomass and leading to heterosis. The conceptual advances could help to improve plant and animal productivity through the manipulation of heterosis.

  20. Genome-based bioprospecting of microbes for new therapeutics.

    Science.gov (United States)

    Zotchev, Sergey B; Sekurova, Olga N; Katz, Leonard

    2012-12-01

    Bioprospecting of natural sources for new medicines has a long and successful history, exemplified by the fact that over 50% of all drugs currently on the market are either derived from or inspired by natural products. However, development of new natural product-based therapeutics has been on the decline over the past 20 years, mainly owing to frequent re-discovery of already known compounds coupled with high costs for screening, characterization and development. With the onset of the genomic era allowing rapid sequencing and analysis of bacterial and fungal genomes, it became evident that these organisms possess 'hidden treasures' in the form of gene clusters potentially governing biosynthesis of novel biologically active compounds. This review highlights current progress in mining for and expression of these gene clusters, which may revolutionize the drug discovery pipelines in the near future.

  1. RAPD-based screening of genomic libraries for positional cloning.

    Science.gov (United States)

    Dioh, W; Tharreau, D; Lebrun, M H

    1997-12-15

    RAPD markers are frequently used for positional cloning. However, RAPD markers often contain repeated sequences which prevent genomic library screening by hybridisation. We have developed a simple RAPD analysis of genomic libraries based on the identification of cosmid pools and clones amplifying the RAPD marker of interest. Our method does not require the cloning or characterisation of the RAPD marker as it relies on the analysis of cosmid pools or clones using a simple RAPD protocol. We applied this strategy using four RAPD markers composed of single copy or repeated sequences linked to avirulence genes of the rice blast fungus Magnaporthe grisea. Cosmids containing these RAPD markers were easily and rapidly identified allowing the construction of physical contigs at these loci.

  2. Base-By-Base: Single nucleotide-level analysis of whole viral genome alignments

    Directory of Open Access Journals (Sweden)

    Tcherepanov Vasily

    2004-07-01

    Background With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content, because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. Results A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to (1) rapidly identify and correct alignment errors in large, multiple genome alignments; and (2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions, and show whether these changes result in amino acid substitutions. Base-By-Base can connect to our MySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Conclusion Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.
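To illustrate the kind of nucleotide-level difference listing described above, here is a minimal Python sketch that walks two aligned sequences, reports each differing column, and classifies coding changes as synonymous or amino-acid-changing using the standard genetic code. It assumes (hypothetically) an ungapped reference and forward-strand genes given as 0-based half-open intervals, and it does not distinguish promoter regions from other non-coding sequence; it is not Base-By-Base's code, which is a Java application.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def list_differences(ref, query, genes):
    """List alignment columns where query differs from ref.

    Assumptions (hypothetical, for illustration): ref contains no gaps, so the
    column index equals the reference coordinate; genes maps name -> (start, end),
    0-based, half-open, forward strand, with length a multiple of 3.
    Returns (position, ref_base, query_base, gene_or_None, effect) tuples.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    for pos, (r, q) in enumerate(zip(ref, query)):
        if r == q:
            continue
        gene, effect = None, "non-coding"
        for name, (start, end) in genes.items():
            if start <= pos < end:
                gene = name
                codon_start = start + (pos - start) // 3 * 3  # first base of the codon
                ref_aa = CODON_TABLE.get(ref[codon_start:codon_start + 3].upper())
                qry_aa = CODON_TABLE.get(query[codon_start:codon_start + 3].upper())
                if ref_aa is None or qry_aa is None:
                    effect = "gap/ambiguous codon"
                elif ref_aa == qry_aa:
                    effect = "synonymous"
                else:
                    effect = f"{ref_aa}->{qry_aa}"
                break
        diffs.append((pos, r, q, gene, effect))
    return diffs
```

For example, `list_differences("ATGAAATTTCC", "ATGAAGTTACG", {"g1": (0, 9)})` reports a synonymous change at position 5 (AAA to AAG, both K), an F->L substitution at position 8, and a non-coding difference at position 10.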

  3. Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods

    Science.gov (United States)

    2010-01-01

    Background Classification of bacteria within the genus Brucella has been difficult due in part to considerable genomic homogeneity between the different species and biovars, in spite of clear differences in phenotypes. Therefore, many different methods have been used to assess Brucella taxonomy. In the current work, we examine 32 sequenced genomes from genus Brucella representing the six classical species, as well as more recently described species, using bioinformatical methods. Comparisons were made at the level of genomic DNA using oligonucleotide based methods (Markov chain based genomic signatures, genomic codon and amino acid frequencies based comparisons) and proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses). Results We found that the oligonucleotide based methods gave different results compared to that of the proteome based methods. Differences were also found between the oligonucleotide based methods used. Whilst the Markov chain based genomic signatures grouped the different species in genus Brucella according to host preference, the codon and amino acid frequencies based methods reflected small differences between the Brucella species. Only minor differences could be detected between all genera included in this study using the codon and amino acid frequencies based methods. Proteome comparisons were found to be in strong accordance with current Brucella taxonomy indicating a remarkable association between gene gain or loss on one hand and mutations in marker genes on the other. The proteome based methods found greater similarity between Brucella species and Ochrobactrum species than between species within genus Agrobacterium compared to each other. In other words, proteome comparisons of species within genus Agrobacterium were found to be more diverse than proteome comparisons between species in genus Brucella and genus Ochrobactrum. 
Pan-genomic analyses indicated that uptake of DNA from outside genus Brucella appears to be
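The abstract does not specify the Markov chain order or the distance measure used for the genomic signatures. As one common formulation, swapped in here for illustration, Karlin's dinucleotide relative abundance compares each observed dinucleotide frequency with the value expected if bases were independent (a zero-order Markov model), and two genomes are compared by the mean absolute difference of these 16 values. The sketch below uses a single strand only, whereas Karlin's signature is usually symmetrized over both strands; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_signature(seq):
    """Karlin-style dinucleotide relative abundances rho_XY = f_XY / (f_X * f_Y).

    f_X are mononucleotide frequencies and f_XY dinucleotide frequencies on a
    single strand; rho_XY compares observed dinucleotides with the expectation
    under a zero-order Markov (independent-base) model.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    n_mono = sum(mono.values())
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in "ACGT" and seq[i + 1] in "ACGT")
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("need at least one A/C/G/T dinucleotide")
    sig = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        # bases absent from the sequence get a relative abundance of 0
        sig[x + y] = (di[x + y] / n_di) / (fx * fy) if fx and fy else 0.0
    return sig


def signature_distance(a, b):
    """Mean absolute difference between two signatures (Karlin's delta-style distance)."""
    return sum(abs(a[k] - b[k]) for k in a) / len(a)
```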

  4. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Directory of Open Access Journals (Sweden)

    Luo Ming-Cheng

    2011-01-01

    Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA

  5. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Andrei L. Osterman, Ph.D.

    2012-12-17

    Integration of bioinformatics and experimental techniques was applied to mapping and characterization of the key components (pathways, enzymes, transporters, regulators) of the core metabolic machinery in Shewanella oneidensis and related species, with the main focus on metabolic and regulatory pathways involved in the utilization of various carbon and energy sources. Among the main accomplishments reflected in ten joint publications with other participants of the Shewanella Federation are: (i) a systems-level reconstruction of carbohydrate utilization pathways in the genus Shewanella (19 species). This analysis yielded reconstruction of 18 sugar utilization pathways, including 10 novel pathway variants, and prediction of > 60 novel protein families of enzymes, transporters and regulators involved in these pathways. Selected functional predictions were verified by focused biochemical and genetic experiments. Observed growth phenotypes were consistent with bioinformatic predictions, providing strong validation of the technology; and (ii) global genomic reconstruction of transcriptional regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors, 8 riboswitches and 6 translational attenuators. Of those, 45 regulons were inferred directly from the genome context analysis, whereas others were propagated from previously characterized regulons in other species. Selected regulatory predictions were experimentally tested. Integration of this analysis with microarray data revealed overall consistency and provided an additional layer of interactions between regulons. All the results were captured in the new database RegPrecise, which is a joint development with the LBNL team. A more detailed analysis of the individual subsystems, pathways and regulons in Shewanella spp. included bioinformatics-based prediction and experimental characterization of: (i) the N-acetylglucosamine catabolic pathway; (ii) lactate utilization machinery; (iii) Novel Nrt

  6. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc;

    in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... prove useful for less heritable traits such as diseases and fertility...

  7. Heteropolymeric triplex-based genomic assay to detect pathogens or single-nucleotide polymorphisms in human genomic samples.

    Directory of Open Access Journals (Sweden)

    Jasmine I Daksis

    Human genomic samples are complex and are considered difficult to assay directly without denaturation or PCR amplification. We report the use of a base-specific heteropolymeric triplex, formed by a native duplex genomic target and an oligonucleotide third-strand probe, to assay for low-copy pathogen genomes present in a sample that also contains human genomic duplex DNA, or to assay human genomic duplex DNA for single nucleotide polymorphisms (SNPs), without PCR amplification. Wild-type and mutant probes are used to identify triplexes containing FVL G1691A, MTHFR C677T and CFTR mutations. The specific triplex structure forms rapidly at room temperature in solution and may be detected without a separation step. YOYO-1, a fluorescent bis-intercalator, promotes and signals the formation of the specific triplex. Genomic duplexes may be assayed homogeneously with single base pair resolution. The specific triple-stranded structures of the assay may approximate homologous recombination intermediates, which various models suggest may form in either the major or minor groove of the duplex. The bases of the stable duplex target are rendered specifically reactive to the bases of the probe because of the activity of intercalated YOYO-1, which is known to locally decondense duplex DNA 1.3-fold. This may approximate the local decondensation effected by recombination proteins such as RecA in vivo. Our assay, while involving triplex formation, is sui generis, as it is not homopurine sequence-dependent, as "canonical triplexes" are. Rather, the base pair-specific heteropolymeric triplex of the assay is conformation-dependent. The highly sensitive diagnostic assay we present allows the direct detection of base sequence in genomic duplex samples, including those containing human genomic duplex DNA, thereby bypassing the inherent problems and cost associated with conventional PCR-based diagnostic assays.

  8. Colibri: a functional data base for the Escherichia coli genome.

    Science.gov (United States)

    Médigue, C; Viari, A; Hénaut, A; Danchin, A

    1993-09-01

    Several data libraries have been created to organize all the data obtained worldwide about the Escherichia coli genome. Because the known data now amount to more than 40% of the whole genome sequence, it has become necessary to organize the data in such a way that appropriate procedures can associate knowledge produced by experiments about each gene to its position on the chromosome and its relation to other relevant genes, for example. In addition, global properties of genes, affected by the introduction of new entries, should be present as appropriate description fields. A data base, implemented on Macintosh by using the data base management system 4th Dimension, is described. It is constructed around a core constituted by known contigs of E. coli sequences and links data collected in general libraries (unmodified) to data associated with evolving knowledge (with modifiable fields). Biologically significant results obtained through the coupling of appropriate procedures (learning or statistical data analysis) are presented. The data base is available through a 4th Dimension runtime and through FTP on Internet. It has been regularly updated and will be systematically linked to other E. coli data bases (M. Kroger, R. Wahl, G. Schachtel, and P. Rice, Nucleic Acids Res. 20(Suppl.):2119-2144, 1992; K. E. Rudd, W. Miller, C. Werner, J. Ostell, C. Tolstoshev, and S. G. Satterfield, Nucleic Acids Res. 19:637-647, 1991) in the near future.

  9. Genome size evolution in pufferfish: an insight from BAC clone-based Diodon holocanthus genome sequencing

    Directory of Open Access Journals (Sweden)

    Gan Xiaoni

    2010-06-01

    Background Variations in genome size within and between species have been observed since the 1950s in diverse taxonomic groups. Serving as model organisms, smooth pufferfish possess the smallest vertebrate genomes. Interestingly, spiny pufferfish from their sister family have genomes twice as large as those of smooth pufferfish. Therefore, comparative genomic analysis between smooth pufferfish and spiny pufferfish is useful for understanding genome size evolution in pufferfish. Results Ten BAC clones of a spiny pufferfish, Diodon holocanthus, were randomly selected and shotgun sequenced. In total, 776 kb of gap-free, non-redundant sequence representing 0.1% of the D. holocanthus genome was identified, and 77 distinct genes were predicted. In the sequenced D. holocanthus genome, 364 kb is homologous with 265 kb of the Takifugu rubripes genome, and 223 kb is homologous with 148 kb of the Tetraodon nigroviridis genome. Repetitive DNA accounts for 8% of the sequenced D. holocanthus genome, which is higher than in the T. rubripes genome (6.89%) and the Te. nigroviridis genome (4.66%). Of the repetitive DNA, 76% is retroelements, which account for 6% of the sequenced D. holocanthus genome and belong to known families of transposable elements. More than half of the retroelements were distributed within genes. In the non-homologous regions, the repeat element proportion in the D. holocanthus genome increased to 10.6% compared with T. rubripes and to 9.19% compared with Te. nigroviridis. A comparison of 10 well-defined orthologous genes showed that the average intron size (566 bp) in the D. holocanthus genome is significantly longer than that in the smooth pufferfish genome (435 bp). Conclusion Compared with the smooth pufferfish, D. holocanthus has a genome with low gene density that is rich in repeat elements. Genome size variation between D. holocanthus and the smooth pufferfish exhibits as length variation between homologous region and different

  10. Genomics and Public Health: Development of Web-based Training

    OpenAIRE

    Janice V. Bach, MS; Aaron Goldenberg, MA, MPH; Toby Citrin, JD; Sarah F. Raup, MPH; Jennifer Bodzin, MPH; Sharon L.R. Kardia, PhD

    2005-01-01

    In 2001, the Centers for Disease Control and Prevention funded three Centers for Genomics and Public Health to develop training tools for increasing genomic awareness. Over the past three years, the centers, working together with the Centers for Disease Control and Prevention's Office of Genomics and Disease Prevention, have developed tools to increase awareness of the impact genomics will have on public health practice, to provide a foundation for understanding basic genomic advances, and to ...

  11. Post-Fragmentation Whole Genome Amplification-Based Method

    Science.gov (United States)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme based on random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands (fragments) have defined 3′ and 5′ termini. Specific primers to these termini are then used to isothermally amplify this library into potentially unlimited quantities that can be used immediately for multiple downstream applications, including gel electrophoresis, quantitative polymerase chain reaction (qPCR), comparative genomic hybridization microarrays, SNP analysis, and sequencing. The standard reaction requires minimal hands-on time and can produce amplified DNA in as little as three hours. Post-fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of samples (spacecraft samples) by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. With further optimization, this technology could be feasible for sample preservation in potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities. Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have inherent sample loss issues at
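To put "femtogram levels of starting material into microgram yields" in perspective, the arithmetic below computes the implied fold increase. Because the reaction is isothermal, the "doubling equivalents" figure is only a way of expressing scale, not a claim about amplification cycles; the 1 fg and 1 µg endpoints are assumptions chosen for illustration.

```python
import math

FEMTOGRAM = 1e-15  # grams
MICROGRAM = 1e-6   # grams


def fold_amplification(input_g: float, output_g: float) -> float:
    """Ratio of amplified product mass to starting template mass."""
    return output_g / input_g


def doubling_equivalents(fold: float) -> float:
    """Ideal twofold rounds that would give the same overall fold increase."""
    return math.log2(fold)


# Illustrative endpoints: 1 fg of template amplified to 1 ug of product.
fold = fold_amplification(1 * FEMTOGRAM, 1 * MICROGRAM)
print(f"~{fold:.0e}-fold, about {doubling_equivalents(fold):.1f} doubling equivalents")
```

This gives a fold increase on the order of 10^9, equivalent to roughly 30 perfect doublings.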

  12. [A major game in the re-organization of the Professional Nursing School].

    Science.gov (United States)

    de Amorin, Wellington Mendonça; Barreira, Ieda de Alencar

    2007-01-01

    This is a socio-historical descriptive study grounded in the thought of Pierre Bourdieu and based on documentary analysis. It describes the actions of sanitarists and psychiatrists following the reorganization of the Ministry of Education and Public Health into the Ministry of Education and Health at the beginning of the New State, and analyzes the strategies the main agents used in their struggle to advance their proposals for reorganizing the Professional Nursing School. The strategies that psychiatrists, sanitarists and certified nurses used to stake their projects characterized a difficult battle within a hard "major game". The analysis of the main document across the ten months of the course shows the conflict among these agents over imposing a new rule on the school.

  13. An Integrative Pathway-based Clinical-genomic Model for Cancer Survival Prediction.

    Science.gov (United States)

    Chen, Xi; Wang, Lily; Ishwaran, Hemant

    2010-09-01

    Prediction models that use gene expression levels are now being proposed for personalized treatment of cancer, but building accurate models that are easy to interpret remains a challenge. In this paper, we describe an integrative clinical-genomic approach that combines both genomic pathway and clinical information. First, we summarize information from genes in each pathway using Supervised Principal Components (SPCA) to obtain pathway-based genomic predictors. Next, we build a prediction model based on clinical variables and pathway-based genomic predictors using Random Survival Forests (RSF). Our rationale for this two-stage procedure is that the underlying disease process may be influenced by environmental exposure (measured by clinical variables) and perturbations in different pathways (measured by pathway-based genomic variables), as well as their interactions. Using two cancer microarray datasets, we show that the pathway-based clinical-genomic model outperforms gene-based clinical-genomic models, with improved prediction accuracy and interpretability.
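The first stage, summarizing each pathway with supervised principal components, can be sketched in plain Python. Supervised principal components screen genes by their univariate association with the outcome and keep the leading principal component of the genes that pass. For brevity this sketch assumes a continuous outcome and uses absolute Pearson correlation as the screening score, whereas a survival analysis would use a survival-model score (such as a univariate Cox score); the threshold, function names and toy data are hypothetical, and the Random Survival Forests stage is not shown.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def first_pc_scores(rows: List[List[float]], iters: int = 200) -> List[float]:
    """Scores of each sample (row) on the leading principal axis, via power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]  # column-centred data
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0:
            break  # no variance left in the retained genes
        v = [t / norm for t in w]
    return [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]


def supervised_pc(expr: List[List[float]], outcome: List[float], threshold: float) -> List[float]:
    """Pathway-level predictor: keep genes with |corr(gene, outcome)| >= threshold,
    then summarise them by their first principal component.

    expr[i][g] is the expression of pathway gene g in sample i.
    """
    n_genes = len(expr[0])
    keep = [g for g in range(n_genes)
            if abs(pearson([row[g] for row in expr], outcome)) >= threshold]
    if not keep:
        raise ValueError("no gene in this pathway passes the screening threshold")
    return first_pc_scores([[row[g] for g in keep] for row in expr])
```

The sign of a principal component is arbitrary, so only the strength of the resulting score's association with outcome is meaningful; in the two-stage design these pathway scores would then be passed, together with clinical variables, to the survival model.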

  14. Solution-based targeted genomic enrichment for precious DNA samples

    Directory of Open Access Journals (Sweden)

    Shearer Aiden

    2012-05-01

    Background: Solution-based targeted genomic enrichment (TGE) protocols permit selective sequencing of genomic regions of interest on a massively parallel scale. These protocols could be improved by (1) modifying or eliminating time-consuming steps; (2) increasing yield to reduce input DNA and excessive PCR cycling; and (3) enhancing reproducibility. Results: We developed a solution-based TGE method for downstream Illumina sequencing in a non-automated workflow, adding standard Illumina barcode indexes during the post-hybridization amplification to allow for sample pooling prior to sequencing. The method uses Agilent SureSelect baits, primers and hybridization reagents for the capture, off-the-shelf reagents for the library preparation steps, and adaptor oligonucleotides for Illumina paired-end sequencing purchased directly from an oligonucleotide manufacturing company. Conclusions: This solution-based TGE method for Illumina sequencing is optimized for small- or medium-sized laboratories and addresses the weaknesses of standard protocols by reducing the amount of input DNA required, increasing capture yield, optimizing efficiency, and improving reproducibility.
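Pooling works because every read carries its sample's barcode index, so reads can be assigned back to samples after sequencing. A minimal sketch of that assignment step, assuming hypothetical 6-nt index sequences and a tolerance of one mismatch (both assumptions, not values from the study, and not Illumina's own demultiplexing software):

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_index: str, sample_indexes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index is uniquely closest to read_index within
    max_mismatches; return None for unassignable or ambiguous reads."""
    hits = sorted((hamming(read_index, idx), name) for name, idx in sample_indexes.items())
    best_dist, best_name = hits[0]
    if best_dist > max_mismatches:
        return None
    if len(hits) > 1 and hits[1][0] == best_dist:
        return None  # two samples are equally close: do not guess
    return best_name


# Hypothetical index sequences for two pooled libraries.
indexes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
print(assign_sample("ACGTAA", indexes))  # one mismatch from sample_A's index
```

Allowing a mismatch recovers reads with a sequencing error in the index, but it only stays unambiguous if the indexes in a pool differ from each other at several positions.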

  15. Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome.

    Science.gov (United States)

    Wang, Yi; Liu, Xianju; Ren, Chong; Zhong, Gan-Yuan; Yang, Long; Li, Shaohua; Liang, Zhenchang

    2016-04-21

    CRISPR/Cas9 has recently been demonstrated to be an effective and popular genome editing tool for modifying genomes of humans, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific target sites for CRISPR/Cas9 have been computationally identified for several annual model and crop species, but such sites have not been reported for perennial, woody fruit species. In this study, we identified and characterized five types of CRISPR/Cas9 target sites in the widely cultivated grape species Vitis vinifera and developed a user-friendly database for editing grape genomes in the future. A total of 35,767,960 potential CRISPR/Cas9 target sites were identified from grape genomes in this study. Among them, 22,597,817 target sites were mapped to specific genomic locations and 7,269,788 were found to be highly specific. Protospacers and PAMs were found to be distributed uniformly and abundantly in the grape genomes. They were present in all the structural elements of genes, with the coding region having the highest abundance. Five PAM types, TGG, AGG, GGG, CGG and NGG, were observed. With the exception of the NGG type, they were abundantly present in the grape genomes. Synteny analysis of similar genes revealed that the synteny of protospacers matched the synteny of homologous genes. A user-friendly database containing protospacers and detailed information on the sites was developed and is available for public use at the Grape-CRISPR website ( http://biodb.sdau.edu.cn/gc/index.html ). Grape genomes harbour millions of potential CRISPR/Cas9 target sites. These sites are widely distributed among and within chromosomes, with predominant abundance in the coding regions of genes. We developed a publicly accessible Grape-CRISPR database to facilitate the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. Among ...
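SpCas9 recognizes an NGG PAM immediately 3′ of a 20-nt protospacer, so raw candidate sites can be enumerated by scanning both strands of a sequence. The sketch below does only that; it does not assess specificity (the "highly specific" subset above requires genome-wide off-target comparison), and it is not the pipeline behind the Grape-CRISPR database. The function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str, protospacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """List (strand, start, protospacer, PAM) for every NGG PAM preceded by a full protospacer.

    start is the 0-based forward-strand coordinate of the protospacer's leftmost base.
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(protospacer_len, n - 2):
            pam = s[i:i + 3]
            if pam[1:] != "GG":
                continue
            protospacer = s[i - protospacer_len:i]
            # On the reverse strand, index j corresponds to forward index n - 1 - j,
            # so the protospacer s[i - L:i] starts at forward coordinate n - i.
            start = i - protospacer_len if strand == "+" else n - i
            sites.append((strand, start, protospacer, pam))
    return sites
```

A scan like this produces only raw candidates; counts of mapped and highly specific sites such as those reported above reflect further filtering.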

  16. Analysis of chimpanzee history based on genome sequence alignments.

    Directory of Open Access Journals (Sweden)

    Jennifer L Caswell

    2008-04-01

    Population geneticists often study small numbers of carefully chosen loci, but it has become possible to obtain orders of magnitude more data from overlaps of genome sequences. Here, we generate tens of millions of base pairs of multiple sequence alignments from combinations of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human, an orangutan, and a macaque. Analysis provides a more precise understanding of demographic history than was previously available. We show that bonobos and common chimpanzees separated approximately 1,290,000 years ago, western and other common chimpanzees approximately 510,000 years ago, and eastern and central chimpanzees at least 50,000 years ago. We infer that the central chimpanzee population size increased by at least a factor of 4 since its separation from western chimpanzees, while the western chimpanzee effective population size decreased. Surprisingly, in about one percent of the genome, the genetic relationships between humans, chimpanzees, and bonobos appear to be different from the species relationships. We used PCR-based resequencing to confirm 11 regions where chimpanzees and bonobos are not most closely related. Study of such loci should provide information about the period 5-7 million years ago when the ancestors of humans separated from those of the chimpanzees.

  17. Kernel Based Nonlinear Dimensionality Reduction and Classification for Genomic Microarray

    Directory of Open Access Journals (Sweden)

    Lan Shu

    2008-07-01

    Genomic microarrays are powerful research tools in bioinformatics and modern medical research because they enable massively parallel assays and simultaneous monitoring of the expression of thousands of genes in biological samples. However, even a simple microarray experiment often yields very high-dimensional data and a huge amount of information, which makes it challenging for researchers to extract the important features and reduce the dimensionality. In this paper, a kernel-based nonlinear dimensionality reduction method built on locally linear embedding (LLE) is proposed, and a fuzzy K-nearest-neighbors algorithm, which denoises datasets, is introduced as a replacement for the K-nearest-neighbors (KNN) step of classical LLE. In addition, a kernel-based support vector machine (SVM) is used to classify the genomic microarray datasets. We demonstrate the application of these techniques to two published DNA microarray datasets. The experimental results confirm the superiority and high success rates of the presented method.
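The LLE step that the neighbor search feeds into computes, for each sample, weights that best reconstruct it from its neighbors under the constraint that the weights sum to one. The sketch below shows that standard (non-kernel) step in plain Python, with a small regularization term added to the local Gram matrix, an assumption commonly needed when the neighborhood is larger than the data dimension. The paper's fuzzy neighbor selection and kernel extension are not shown, and the function names are hypothetical.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lle_weights(x: Sequence[float], neighbors: List[Sequence[float]],
                reg: float = 1e-3) -> List[float]:
    """Weights (summing to 1) that best reconstruct x as a combination of its neighbors."""
    k = len(neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    # Local Gram matrix G[j][l] = (x - n_j) . (x - n_l)
    G = [[sum(a * b for a, b in zip(diffs[j], diffs[l])) for l in range(k)] for j in range(k)]
    trace = sum(G[j][j] for j in range(k))
    for j in range(k):
        # Regularize the diagonal; G is singular when k exceeds the data dimension.
        G[j][j] += (reg * trace) if trace > 0 else reg
    w = solve(G, [1.0] * k)  # solve G w = 1, then rescale to enforce sum(w) = 1
    total = sum(w)
    return [wi / total for wi in w]
```

For a point halfway between two neighbors, the weights come out at 0.5 each; the low-dimensional embedding is then found by preserving these weights.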

  18. CRISPR-Cas9 Based Engineering of Actinomycetal Genomes

    DEFF Research Database (Denmark)

    Tong, Yaojun; Charusanti, Pep; Zhang, Lixin

    2015-01-01

    To facilitate the genetic manipulation of actinomycetes, we developed a highly efficient CRISPR-Cas9 system to delete gene(s) or gene cluster(s), implement precise gene replacements, and reversibly control gene expression in actinomycetes. We demonstrate our system by targeting two genes, actIORF1 (SCO5087......) and actVB (SCO5092), from the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3(2). Our CRISPR-Cas9 system successfully inactivated the targeted genes. When no templates for homology-directed repair (HDR) were present, the site-specific DNA double-strand breaks (DSBs) introduced by Cas9....... Moreover, we developed a system, termed CRISPRi, to efficiently and reversibly control expression of target genes, based on a catalytically dead variant of Cas9 (dCas9). The CRISPR-Cas9 based system described here comprises a powerful and broadly applicable set of tools to manipulate actinomycetal genomes....

  19. AgBase: a functional genomics resource for agriculture

    OpenAIRE

    2006-01-01

    Abstract Background Many agricultural species and their pathogens have sequenced genomes and more are in progress. Agricultural species provide food, fiber, xenotransplant tissues, biopharmaceuticals and biomedical models. Moreover, many agricultural microorganisms are human zoonoses. However, systems biology from functional genomics data is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation and agricultural researc...

  20. Genome-scale constraint-based modeling of Geobacter metallireducens

    Directory of Open Access Journals (Sweden)

    Famili Iman

    2009-01-01

    Background: Geobacter metallireducens was the first organism available in pure culture that could completely oxidize organic compounds with Fe(III) oxide serving as the electron acceptor. Geobacter species, including G. sulfurreducens and G. metallireducens, are used for bioremediation and for electricity generation from waste organic matter and renewable biomass. The constraint-based modeling approach enables the development of genome-scale in silico models that can predict the behavior of complex biological systems and their responses to the environment. Such a modeling approach was applied to provide physiological and ecological insights into the metabolism of G. metallireducens. Results: The genome-scale metabolic model of G. metallireducens was constructed to include 747 genes and 697 reactions. Compared with the G. sulfurreducens model, the G. metallireducens metabolic model contains 118 unique reactions that reflect many of G. metallireducens' specific metabolic capabilities. Detailed examination of the G. metallireducens model suggests that its central metabolism contains several energy-inefficient reactions that are not present in the G. sulfurreducens model. The experimental biomass yield of G. metallireducens growing on pyruvate was lower than the predicted optimal biomass yield. Microarray data of G. metallireducens growing with benzoate and acetate indicated that genes encoding these energy-inefficient reactions were up-regulated by benzoate. These results suggested that the energy-inefficient reactions were likely turned off during G. metallireducens growth with acetate for optimal biomass yield, but were up-regulated during growth with complex electron donors such as benzoate for rapid energy generation. Furthermore, several computational modeling approaches were applied to accelerate G. metallireducens research. For example, growth of G. metallireducens with different electron donors and electron acceptors was studied using the genome
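Constraint-based models restrict the flux vector v by steady-state mass balance (S·v = 0, where S is the stoichiometric matrix with one row per metabolite and one column per reaction) and by lower and upper bounds on each flux; flux balance analysis then optimizes an objective such as biomass production with a linear-programming solver. The sketch below checks only the constraints, on a hypothetical three-reaction network rather than the 697-reaction G. metallireducens model; the optimization step is not shown.

```python
from typing import List, Sequence, Tuple


def satisfies_constraints(S: List[List[float]], v: Sequence[float],
                          bounds: Sequence[Tuple[float, float]], tol: float = 1e-9) -> bool:
    """True if flux vector v is at steady state (S v = 0) and within its bounds."""
    steady_state = all(abs(sum(s * f for s, f in zip(row, v))) <= tol for row in S)
    within_bounds = all(lb - tol <= f <= ub + tol for f, (lb, ub) in zip(v, bounds))
    return steady_state and within_bounds


# Hypothetical toy network (not from the G. metallireducens model):
#   R1: uptake -> A,   R2: A -> B,   R3: B -> biomass
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = [
    [1, -1, 0],   # A is produced by R1 and consumed by R2
    [0, 1, -1],   # B is produced by R2 and consumed by R3
]
bounds = [(0, 10), (0, 1000), (0, 1000)]  # uptake capped at 10 flux units

print(satisfies_constraints(S, [10, 10, 10], bounds))  # balanced and within bounds
print(satisfies_constraints(S, [10, 5, 5], bounds))    # A would accumulate
```

A linear-programming solver would then search this feasible set for the flux vector that maximizes the objective (here the flux through R3); for this simple chain the optimum is just the uptake cap of 10.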

  1. Phylogeny and evolution of Cervidae based on complete mitochondrial genomes.

    Science.gov (United States)

    Zhang, W-Q; Zhang, M-H

    2012-03-14

    Mitochondrial DNA sequences can be used to estimate phylogenetic relationships among animal taxa and to analyze molecular evolution. With the development of sequencing technology, more and more mitochondrial sequences, including whole mitochondrial DNA sequences, have been made available in public databases. These data have been used for phylogenetic analysis of animal species and for studies of evolutionary processes. We performed phylogenetic analyses of 19 species of Cervidae, with Bos taurus as the outgroup, applying neighbor joining, maximum likelihood, maximum parsimony, and Bayesian inference methods to whole mitochondrial genome sequences. The consensus phylogenetic trees supported monophyly of the family Cervidae; it was divided into two subfamilies, Plesiometacarpalia and Telemetacarpalia, and four tribes, Cervinae, Muntiacinae, Hydropotinae, and Odocoileinae. Divergence times within these groups were estimated by Bayesian analysis with a relaxed molecular clock; the results were consistent with those of previous studies. We concluded that the evolutionary structure of the family Cervidae can be reconstructed by phylogenetic analysis based on whole mitochondrial genomes; this method could be used broadly in phylogenetic evolutionary analysis of animal taxa.
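Of the four methods named, neighbor joining is distance based: it starts from a matrix of pairwise evolutionary distances. A minimal sketch of building such a matrix from aligned sequences, assuming the simple Jukes-Cantor (JC69) substitution model for illustration; the abstract does not say which model the study used, and the sequences below are toy fragments labeled with example taxon names, not cervid data.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap ('-') in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """JC69-corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise JC69 distances for every pair of taxa in an aligned set of sequences."""
    return {(a, b): jukes_cantor(p_distance(alignment[a], alignment[b]))
            for a, b in combinations(sorted(alignment), 2)}


# Toy aligned fragments; real input would be whole aligned mitogenomes.
toy = {"Bos_taurus": "ACGTACGTAC", "Cervus_elaphus": "ACGTACGTAA", "Muntiacus": "ACGAACGTAA"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 4))
```

A 10% raw difference becomes a corrected distance of about 0.107, because the correction accounts for multiple substitutions at the same site; the resulting matrix is what a neighbor-joining implementation would take as input.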

  2. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 million years ago. Paralogs retained from WGD, also called 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and to be frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understanding the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome, or within the human genome itself. These approaches are thus limited by lineage-specific rearrangements in individual genomes. In this study, we report the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared with earlier approaches, by overcoming lineage-specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined significance criteria at http://ohnologs.curie.fr/. Given the importance of WGD in the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases.

  3. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Directory of Open Access Journals (Sweden)

    Param Priya Singh

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 million years ago. Paralogs retained from WGD, also called 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and to be frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understanding the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome, or within the human genome itself. These approaches are thus limited by lineage-specific rearrangements in individual genomes. In this study, we report the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared with earlier approaches, by overcoming lineage-specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined significance criteria at http://ohnologs.curie.fr/. Given the importance of WGD in the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases.

  4. An acquisition account of genomic islands based on genome signature comparisons

    Directory of Open Access Journals (Sweden)

    Luyf ACM

    2005-11-01

    Background: Recent analyses of prokaryotic genome sequences have demonstrated that horizontal gene transfer is an important force in genome evolution. Horizontally acquired sequences are detectable by, among other features, the dissimilarity of their dinucleotide composition (genome signature) from that of the host genome. Genomic islands (GIs) comprise important and interesting horizontally transferred sequences, but information about acquisition events or relatedness between GIs is scarce. In Vibrio vulnificus CMCP6, 10 and 11 GIs have previously been identified in the sequenced chromosomes I and II, respectively. We assessed the compositional similarity and putative acquisition account of these GIs using the genome signature. For this analysis we developed a new algorithm, available as a web application. Results: Of the 21 GIs, VvI-1 and VvI-10 of chromosome I have similar genome signatures; although artificially divided by the linear annotation, they are adjacent on the circular chromosome and therefore constitute one GI. Similarly, GIs VvI-3 and VvI-4 of chromosome I, together with the region between these two islands, are compositionally similar, suggesting that they form one GI (making a total of 19 GIs in chromosomes I and II). Cluster analysis assigned the 19 GIs to 11 different branches above our conservative threshold. This suggests a limited number of compositionally similar donors or intragenomic dispersion of ancestral acquisitions. Furthermore, 2 GIs of chromosome II cluster with chromosome I, while none of the 19 GIs group with chromosome II, suggesting a unidirectional dispersal of large anomalous gene clusters from chromosome I to chromosome II. Conclusion: From the results, we infer 10 compositionally dissimilar donors for the 19 GIs in the V. vulnificus CMCP6 genome, including chromosome I donating to chromosome II. This suggests multiple transfer events from individual donor types or from donors with similar genome signatures. Applied to ...
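The genome signature, as defined by Karlin and colleagues, compares dinucleotide relative abundances ρ*_XY = f*_XY / (f*_X f*_Y), computed on a sequence together with its reverse complement, and summarizes the difference between two sequences as δ*, the average absolute difference over the 16 dinucleotides. The sketch below implements those standard definitions in plain Python; it is not the authors' new algorithm or web application, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def genome_signature(seq: str) -> Dict[str, float]:
    """Symmetrized dinucleotide relative abundances rho*_XY = f*_XY / (f*_X * f*_Y),
    counted over the sequence and its reverse complement; non-ACGT symbols are skipped."""
    strands = [seq.upper(), reverse_complement(seq)]
    mono = {b: 0 for b in "ACGT"}
    di = {d: 0 for d in DINUCLEOTIDES}
    for s in strands:
        for base in s:
            if base in mono:
                mono[base] += 1
        for i in range(len(s) - 1):
            if s[i:i + 2] in di:
                di[s[i:i + 2]] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for d in DINUCLEOTIDES:
        fx, fy = mono[d[0]] / n_mono, mono[d[1]] / n_mono
        rho[d] = (di[d] / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def delta_star(seq_f: str, seq_g: str) -> float:
    """Genome signature difference: mean |rho*_XY(f) - rho*_XY(g)| over the 16 dinucleotides."""
    rf, rg = genome_signature(seq_f), genome_signature(seq_g)
    return sum(abs(rf[d] - rg[d]) for d in DINUCLEOTIDES) / 16
```

Because both strands are counted, a sequence and its reverse complement have identical signatures, which makes the measure independent of annotation orientation; a genomic island whose δ* to the host chromosome is large is a candidate horizontal acquisition.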

  5. Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values.

    Science.gov (United States)

    Comin, Matteo; Schimd, Michele

    2016-08-12

    Sequencing technologies are generating enormous amounts of read data; however, assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics known as alignment-free measures, which require neither reference genomes nor assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads but with error rates around 15%. In this context it will be essential to exploit quality-value information within the framework of alignment-free measures. In this paper we present a family of alignment-free measures, called d(q)-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationships of genomes can be reconstructed from the direct comparison of their read sets. The use of quality values on average improves the classification accuracy, and its contribution increases when the reads are noisier. The comparison of metagenomic microbial communities can also be performed efficiently. Similar metagenomes are quickly detected just by processing their read data, without the need for costly alignments.
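The abstract does not spell out the d(q)-type formulas, so the sketch below only illustrates the general idea under a stated assumption: each k-mer occurrence is weighted by the probability, derived from its Phred scores, that all of its bases were called correctly, and two read sets are compared with the classic D2 statistic (the inner product of their k-mer count vectors). This weighting is one plausible choice, not necessarily the authors' definition, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def prob_correct(phred: int) -> float:
    """Probability that a base call with the given Phred score is correct."""
    return 1.0 - 10 ** (-phred / 10)


def quality_weighted_kmers(reads: List[Tuple[str, List[int]]], k: int) -> Dict[str, float]:
    """Count k-mers, weighting each occurrence by the probability that all k bases are correct."""
    counts: Dict[str, float] = defaultdict(float)
    for seq, quals in reads:
        probs = [prob_correct(q) for q in quals]
        for i in range(len(seq) - k + 1):
            weight = 1.0
            for p in probs[i:i + k]:
                weight *= p
            counts[seq[i:i + k]] += weight
    return dict(counts)


def d2(x: Dict[str, float], y: Dict[str, float]) -> float:
    """D2 statistic: inner product of two k-mer count vectors."""
    return sum(v * y.get(kmer, 0.0) for kmer, v in x.items())
```

With this weighting, k-mers from low-quality stretches of a read contribute less to the comparison, which matches the intuition that quality values matter more as reads get noisier.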

  6. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    ... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and the Pig QTL database were used as the source of genomic annotation for the 60K chip. Genomic prediction was performed using the Bayes Cπ method and applied to 1,272 Duroc pigs with both genotypic and phenotypic records, including residual feed intake (RFI), daily feed intake (DFI), average daily gain (ADG) and back fat (BF). Records were split into a training dataset (968 pigs) and a validation dataset (304 pigs). SNPs were annotated by 14 different...... groups. Genomic prediction had accuracy comparable to that of an animal's own phenotype, and use of genomic prediction can be cost-effective by replacing feed intake measurement. Use of genomic annotation of SNPs and QTL information had no largely significant impact on predictive accuracy for the current traits but may......

  7. ArraySearch: A Web-Based Genomic Search Engine.

    Science.gov (United States)

    Wilson, Tyler J; Ge, Steven X

    2012-01-01

    Recent advances in microarray technologies have resulted in a flood of genomics data. This large body of accumulated data could be used as a knowledge base to help researchers interpret new experimental data. ArraySearch finds statistical correlations between newly observed gene expression profiles and the large collection of well-characterized expression signatures deposited in the public domain. A search query consisting of a list of genes returns experiments in which the genes are significantly up- or downregulated collectively. Searches can also be conducted using gene expression signatures from new experiments. This resource will empower biological researchers with a statistical method to explore expression data from their own research by comparing it with expression signatures from a large public archive.
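The abstract does not give ArraySearch's exact statistics. As a minimal sketch of signature-based searching, assuming Pearson correlation of log fold-change profiles over the genes a query shares with each stored signature, a ranking could look like this; the function names, the minimum-overlap rule and the data are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def rank_signatures(query: Dict[str, float],
                    signatures: Dict[str, Dict[str, float]],
                    min_shared: int = 3) -> List[Tuple[str, float]]:
    """Rank stored signatures by Pearson correlation with the query over shared genes."""
    results = []
    for name, sig in signatures.items():
        genes = sorted(set(query) & set(sig))
        if len(genes) < min_shared:
            continue  # too few shared genes for a meaningful correlation
        r = pearson([query[g] for g in genes], [sig[g] for g in genes])
        results.append((name, r))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical log fold-changes keyed by placeholder gene IDs.
query = {"g1": 2.0, "g2": -1.0, "g3": 0.5, "g4": 0.0}
archive = {
    "exp_similar": {"g1": 4.0, "g2": -2.0, "g3": 1.0, "g4": 0.0},
    "exp_opposite": {"g1": -2.0, "g2": 1.0, "g3": -0.5, "g4": 0.0},
}
print(rank_signatures(query, archive))
```

Strong negative correlations can be as informative as positive ones (for example, a treatment that reverses a disease signature), so a real search would likely report both ends of the ranking.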

  8. Ascaris phylogeny based on multiple whole mtDNA genomes

    DEFF Research Database (Denmark)

    Nejsum, Peter; Hawash, Mohamed B F; Betson, Martha

    2016-01-01

    Ascaris lumbricoides and A. suum are two parasitic nematodes infecting humans and pigs, respectively. There has been considerable debate as to whether Ascaris in the two hosts should be considered a single or two separate species. Previous studies identified at least three major clusters (A, B...... and C) of human and pig Ascaris based on partial cox1 sequences. In the present study, we selected major haplotypes from these different clusters to characterize their whole mitochondrial genomes for phylogenetic analysis. We also undertook coalescent simulations to investigate the evolutionary history...... events: the first occurring early in the Neolithic period, which resulted in a differentiated population of Ascaris in pigs (cluster C), and the second occurring more recently (~900 generations ago), resulting in clusters A and B, which might have been spread worldwide by human activities....

  9. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. Measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10, based also...... on Tabasco) enabled us to detect a high-resolution map of segmental duplications in the pig genome. Comparing these segments with four other Duroc animals sequenced at our institute provided the resources needed to describe the first genome-wide and systematic analysis of segmental duplications...
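The principle behind depth-of-coverage detection is that sequence present in more copies in the animal than in the assembly attracts proportionally more aligned reads, so duplicated segments show up as runs of elevated depth. A minimal sketch over precomputed per-window depths; the window size, 1.5-fold threshold and minimum run length are hypothetical parameters, not those of the study, and a real analysis would also need to account for factors such as GC bias.

```python
import statistics
from typing import List, Tuple


def elevated_depth_segments(depths: List[float], window_bp: int, fold: float = 1.5,
                            min_windows: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) bp coordinates of runs of consecutive windows whose mean
    read depth exceeds fold x the genome-wide median depth.

    depths[i] is the mean depth of the i-th non-overlapping window of window_bp bases.
    """
    threshold = fold * statistics.median(depths)
    segments: List[Tuple[int, int]] = []
    i = 0
    while i < len(depths):
        if depths[i] > threshold:
            j = i
            while j < len(depths) and depths[j] > threshold:
                j += 1  # extend the run of elevated windows
            if j - i >= min_windows:
                segments.append((i * window_bp, j * window_bp))
            i = j
        else:
            i += 1
    return segments
```

Requiring several consecutive elevated windows suppresses isolated spikes, at the cost of missing duplications shorter than min_windows × window_bp.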

  10. GeNemo: a search engine for web-based functional genomic data

    OpenAIRE

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-01-01

    A set of new data types has emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of E...

  11. Technology-Driven and Evidence-Based Genomic Analysis for Integrated Pediatric and Prenatal Genetics Evaluation

    Institute of Scientific and Technical Information of China (English)

    Yuan Wei; Fang Xu; Peining Li

    2013-01-01

    The first decade since the completion of the Human Genome Project has been marked by rapid development of genomic technologies and their immediate clinical applications. Genomic analysis using oligonucleotide array comparative genomic hybridization (aCGH) or single nucleotide polymorphism (SNP) chips has been applied to pediatric patients with developmental and intellectual disabilities (DD/ID), multiple congenital anomalies (MCA) and autistic spectrum disorders (ASD). Evaluation of the analytical and clinical validity of aCGH showed > 99% sensitivity and specificity, and increased analytical resolution with higher-density probe coverage. Reviews of case series, multicenter comparisons and large patient-control studies demonstrated a diagnostic yield of 12%-20%; approximately 60% of these abnormalities were recurrent genomic disorders. This pediatric experience has been extended to prenatal diagnosis. A series of reports indicated that approximately 10% of pregnancies with ultrasound-detected structural anomalies and normal cytogenetic findings had genomic abnormalities, and 30% of these abnormalities were syndromic genomic disorders. Evidence-based practice guidelines and standards for implementing genomic analysis, and web-delivered knowledge resources for interpreting genomic findings, have been established. The progress from this technology-driven and evidence-based genomic analysis provides not only opportunities to dissect disease-causing mechanisms and develop rational therapeutic interventions but also important lessons for integrating genomic sequencing into pediatric and prenatal genetic evaluation.
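Because the reported percentages are nested (a fraction of a fraction), multiplying them gives the share of the whole tested group. The arithmetic below uses only the approximate figures quoted above and assumes each percentage applies to the preceding group as worded; the helper name is hypothetical.

```python
def overall_fraction(*conditional_proportions: float) -> float:
    """Multiply successive conditional proportions to get a fraction of the whole cohort."""
    result = 1.0
    for p in conditional_proportions:
        result *= p
    return result


# Prenatal: ~10% of pregnancies with ultrasound-detected anomalies and normal cytogenetics
# had genomic abnormalities; ~30% of those were syndromic genomic disorders.
prenatal_syndromic = overall_fraction(0.10, 0.30)

# Pediatric: diagnostic yield of 12%-20%; ~60% of the abnormalities were recurrent disorders.
pediatric_recurrent = (overall_fraction(0.12, 0.60), overall_fraction(0.20, 0.60))

print(f"prenatal, syndromic: {prenatal_syndromic:.1%}")
print(f"pediatric, recurrent: {pediatric_recurrent[0]:.1%} to {pediatric_recurrent[1]:.1%}")
```

That is roughly 3% of such pregnancies with a syndromic genomic disorder, and roughly 7% to 12% of tested children with a recurrent genomic disorder.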

  12. GeNemo: a search engine for web-based functional genomic data.

    Science.gov (United States)

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types has emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundreds to hundreds of thousands of bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org.
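GeNemo's own matching uses an MCMC-based maximization that is beyond a short example. As a much simpler point of comparison, a common way to score how similar two peak (BED) files are is the base-pair Jaccard index, the statistic reported by `bedtools jaccard`. The sketch below computes it in plain Python; it is not GeNemo's algorithm, and the helper names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED files


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index |A ∩ B| / |A ∪ B| between two region sets."""
    a, b = merge(a), merge(b)
    inter, i, j = 0, 0, 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca == cb:
            inter += max(0, min(ea, eb) - max(sa, sb))
        # Advance whichever interval finishes first in (chromosome, end) order.
        if (ca, ea) < (cb, eb):
            i += 1
        else:
            j += 1
    total = sum(e - s for _, s, e in a) + sum(e - s for _, s, e in b)
    union = total - inter
    return inter / union if union else 0.0
```

A single global overlap score like this ignores where and how strongly regions match, which is precisely the kind of local, pattern-level similarity GeNemo is designed to find.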

  13. Genome-based versus gene-based theory of cancer: Possible implications for clinical practice

    Indian Academy of Sciences (India)

    Nataša Todorović-Raković

    2011-09-01

    The current state of oncology research indicates that attempts to explain a process as complex as carcinogenesis by one or a few genetic mutations have not been sufficiently successful. On the other hand, chromosomal/genomic instability was often disregarded and considered nonessential to clinical application. It is an almost universal feature of malignant tumours, and it influences the global pattern of gene expression and, subsequently, many oncogenic pathways. However, the newly emerging field of systems biology now reveals the full complexity of cancer and provides the final explanation of the oncogenic pathways, based on stochastic (onco)genomic variation rather than on (onco)genic concepts. This field includes ‘new forms’ of genome diversity such as copy number variations (CNV), as well as high-throughput oncogene mutation profiling.

  14. The FlyBase database of the Drosophila genome projects and community literature

    Energy Technology Data Exchange (ETDEWEB)

    Gelbart, William; Bayraktaroglu, Leyla; Bettencourt, Brian; Campbell, Kathy; Crosby, Madeline; Emmert, David; Hradecky, Pavel; Huang, Yanmei; Letovsky, Stan; Matthews, Beverly; Russo, Susan; Schroeder, Andrew; Smutniak, Frank; Zhou, Pinglei; Zytkovicz, Mark; Ashburner, Michael; Drysdale, Rachel; de Grey, Aubrey; Foulger, Rebecca; Millburn, Gillian; Yamada, Chihiro; Kaufman, Thomas; Matthews, Kathy; Gilbert, Don; Grumbling, Gary; Strelets, Victor; Shemen, C.; Rubin, Gerald; Berman, Brian; Frise, Erwin; Gibson, Mark; Harris, Nomi; Kaminker, Josh; Lewis, Suzanna; Marshall, Brad; Misra, Sima; Mungall, Christopher; Prochnik, Simon; Richter, John; Smith, Christopher; Shu, ShengQiang; Tupy, Jonathan; Wiel, Colin

    2002-09-16

    FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. The annotations of the now-finished euchromatic genomic sequence have been completely revised. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.

  15. INTEGRATED GENOME-BASED STUDIES OF SHEWANELLA ECOPHYSIOLOGY

    Energy Technology Data Exchange (ETDEWEB)

    NEALSON, KENNETH H.

    2013-10-15

    laboratories. Applications: 1. Corrosion: Electron flow is often part of the corrosive process. In concert with this proposal, several studies examined the ability of EET-capable bacteria to enhance, inhibit, or detect corrosion. These included using EET-capable bacteria to detect corrosion in its earliest stages [5], using corrosion-causing bacteria to study the microbe/mineral interface during corrosion [1], and studying the groups of microbes involved in corrosion of natural systems [19]. 2. Bioenergy and microbial fuel cells: Early in this program (several years ago), electricity production by Shewanella was shown to depend on the genes for extracellular electron transport (EET). Applied work involved testing various strains and conditions to optimize current production by the shewanellae [11,14,16]. 3. Identification of shewanellae strains: Based on similarities seen in genomic comparisons, a rapid method was employed for distinguishing between shewanellae strains [17]. Interactions with other laboratories: This grant was an extension of a grant involving the so-called "Shewanella Federation", and as such, a number of our publications were joint with other members of this group. The groups included: 1. Pacific Northwest Laboratories 2. Oak Ridge National Labs 3. Michigan State University 4. University of Oklahoma 5. Naval Research Laboratory, Washington DC 6. Burnham Medical Research Institute, San Diego 7. J. Craig Venter Institute, San Diego. Education: Graduate Students: Michael Waters, Ph.D., at NIST, Washington D.C.; Lewis Hsu, Ph.D., at NRL, San Diego; Howard Harris, Ph.D., Postdoc at University, France; Everett Salas, Ph.D., Scientist at Chevron; McLean, Jeffrey, Ph.D., Scientist at J. Craig Venter Institute; McCrow, John, Ph.D., Scientist at J. Craig Venter Institute. Postdocs: Mohamed El-Naggar, Professor of Physics, USC; Jinjun Kan, Senior Researcher at ... Undergraduates: During this year, we had

  16. A SNP based linkage map of the turkey genome reveals multiple intrachromosomal rearrangements between the Turkey and Chicken genomes

    Directory of Open Access Journals (Sweden)

    Vereijken Addie

    2010-11-01

    Background: The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world's poultry meat production. The genomic resources of turkey provide breeders with the tools needed for the genetic improvement of commercial turkey breeds for economically important traits. A linkage map of turkey is essential not only for the mapping of quantitative trait loci, but also as a framework to enable the assignment of sequence contigs to specific chromosomes. Comparative genomics with chicken provides insight into mechanisms of genome evolution and helps in identifying rare genomic events such as genomic rearrangements and duplications/deletions. Results: Eighteen full-sib families, comprising 1008 birds (35 F1 and 973 F2), were genotyped for 775 single nucleotide polymorphisms (SNPs). Of the 775 SNPs, 570 were informative and used to construct a linkage map in turkey. The final map contains 531 markers in 28 linkage groups. The total genetic distance covered by these linkage groups is 2,324 centimorgans (cM), with the largest linkage group (81 loci) measuring 326 cM. The average marker interval across the 28 linkage groups is 4.6 cM. Comparative mapping of turkey and chicken revealed two inter- and 57 intrachromosomal rearrangements between these two species. Conclusion: Our turkey genetic map of 531 markers reveals a genome length of 2,324 cM. Our linkage map improves on previously published maps for two reasons: its markers are more evenly distributed, and it is based entirely on SNP markers. SNPs enable easier and faster genotyping assays than the microsatellite markers used in previous linkage maps. Turkey and chicken are shown to have a highly conserved genomic structure with a relatively low number of inter- and intrachromosomal rearrangements.
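
    The average marker interval quoted above can be checked from the map totals. This assumes the 2,324 cM is the summed length of all 28 groups. A linkage group with n markers has n - 1 intervals, so 531 markers in 28 groups give 503 intervals.

```python
# Summary figures taken from the turkey linkage-map abstract above.
total_length_cm = 2324   # summed length of all linkage groups (assumed)
n_markers = 531          # mapped markers
n_groups = 28            # linkage groups

# Each linkage group with n markers contains n - 1 inter-marker intervals,
# so the whole map has n_markers - n_groups intervals.
n_intervals = n_markers - n_groups
average_interval_cm = total_length_cm / n_intervals

print(n_intervals, round(average_interval_cm, 1))  # 503 4.6
```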

  17. Genomic analysis of plant chromosomes based on meiotic pairing

    Directory of Open Access Journals (Sweden)

    Lisete Chamma Davide

    2007-12-01

    This review presents the principles and applications of classical genomic analysis, with emphasis on plant breeding. It presents and discusses the main mathematical models used to estimate preferential chromosome pairing in diploid or polyploid, interspecific or intergeneric hybrids. Special reference is given to their applications in defining genome relationships among species of the Poaceae family.

  18. A Genomics-Based Classification of Human Lung Tumors

    NARCIS (Netherlands)

    Seidel, Danila; Zander, Thomas; Heukamp, Lukas C.; Peifer, Martin; Bos, Marc; Fernandez-Cuesta, Lynnette; Leenders, Frauke; Lu, Xin; Ansen, Sascha; Gardizi, Masyar; Nguyen, Chau; Berg, Johannes; Russell, Prudence; Wainer, Zoe; Schildhaus, Hans-Ulrich; Rogers, Toni-Maree; Solomon, Benjamin; Pao, William; Carter, Scott L.; Getz, Gad; Hayes, D. Neil; Wilkerson, Matthew D.; Thunnissen, Erik; Travis, William D.; Perner, Sven; Wright, Gavin; Brambilla, Elisabeth; Buettner, Reinhard; Wolf, Juergen; Thomas, Roman; Gabler, Franziska; Wilkening, Ines; Mueller, Christian; Dahmen, Ilona; Menon, Roopika; Koenig, Katharina; Albus, Kerstin; Merkelbach-Bruse, Sabine; Fassunke, Jana; Schmitz, Katja; Kuenstlinger, Helen; Kleine, Michaela; Binot, Elke; Querings, Silvia; Altmueller, Janine; Boessmann, Ingelore; Nuernberg, Peter; Schneider, Peter; Bogus, Magdalena; Buettner, Reinhard; Perner, Sven; Russell, Prudence; Thunnissen, Erik; Travis, William D.; Brambilla, Elisabeth; Soltermann, Alex; Moch, Holger; Brustugun, Odd Terje; Solberg, Steinar; Lund-Iversen, Marius; Helland, Aslaug; Muley, Thomas; Hoffmann, Hans; Schnabel, Philipp A.; Chen, Yuan; Groen, Herman; Timens, Wim; Sietsma, Hannie; Clement, Joachim H.; Weder, Walter; Saenger, Joerg; Stoelben, Erich; Ludwig, Corinna; Engel-Riedel, Walburga; Smit, Egbert; Heideman, Danielle A. M.; Snijders, Peter J. F.; Nogova, Lucia; Sos, Martin L.; Mattonet, Christian; Toepelt, Karin; Scheffler, Matthias; Goekkurt, Eray; Kappes, Rainer; Krueger, Stefan; Kambartel, Kato; Behringer, Dirk; Schulte, Wolfgang; Galetke, Wolfgang; Randerath, Winfried; Heldwein, Matthias; Schlesinger, Andreas; Serke, Monika; Hekmat, Khosro; Frank, Konrad F.; Schnell, Roland; Reiser, Marcel; Huenerlituerkoglu, Ali-Nuri; Schmitz, Stephan; Meffert, Lisa; Ko, Yon-Dschun; Litt-Lampe, Markus; Gerigk, Ulrich; Fricke, Rainer; Besse, Benjamin; Brambilla, Christian; Lantuejoul, Sylvie; Lorimier, Philippe; Moro-Sibilot, Denis; Cappuzzo, Federico; Ligorio, Claudia; Damiani, Stefania; Field, John K.; Hyde, Russell; Validire, Pierre; Girard, Philippe; Muscarella, Lucia A.; Fazio, Vito M.; Hallek, Michael; Soria, Jean-Charles; Carter, Scott L.; Getz, Gad; Hayes, D. Neil; Wilkerson, Matthew D.; Achter, Viktor; Lang, Ulrich; Seidel, Danila; Zander, Thomas; Heukamp, Lukas C.; Peifer, Martin; Bos, Marc; Pao, William; Travis, William D.; Brambilla, Elisabeth; Buettner, Reinhard; Wolf, Juergen; Thomas, Roman K.

    2013-01-01

    We characterized genome alterations in 1255 clinically annotated lung tumors of all histological subgroups to identify genetically defined and clinically relevant subtypes. More than 55% of all cases had at least one oncogenic genome alteration potentially amenable to specific therapeutic intervention...

  19. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  20. CRISPR-Cas9 Based Engineering of Actinomycetal Genomes.

    Science.gov (United States)

    Tong, Yaojun; Charusanti, Pep; Zhang, Lixin; Weber, Tilmann; Lee, Sang Yup

    2015-09-18

    Bacteria of the order Actinomycetales are one of the most important sources of pharmacologically active and industrially relevant secondary metabolites. Unfortunately, many of them are still recalcitrant to genetic manipulation, which is a bottleneck for systematic metabolic engineering. To facilitate the genetic manipulation of actinomycetes, we developed a highly efficient CRISPR-Cas9 system to delete gene(s) or gene cluster(s), implement precise gene replacements, and reversibly control gene expression in actinomycetes. We demonstrate our system by targeting two genes, actIORF1 (SCO5087) and actVB (SCO5092), from the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3(2). Our CRISPR-Cas9 system successfully inactivated the targeted genes. When no templates for homology-directed repair (HDR) were present, the site-specific DNA double-strand breaks (DSBs) introduced by Cas9 were repaired through the error-prone nonhomologous end joining (NHEJ) pathway, resulting in a library of deletions of variable sizes around the targeted sequence. When templates for HDR were provided at the same time, precise deletions of the targeted genes were observed at nearly 100% frequency. Moreover, we developed a system, termed CRISPRi, to efficiently and reversibly control expression of target genes, based on a catalytically dead variant of Cas9 (dCas9). The CRISPR-Cas9-based system described here comprises a powerful and broadly applicable set of tools to manipulate actinomycetal genomes.

  1. Recent advances in genome-based polyketide discovery.

    Science.gov (United States)

    Helfrich, Eric J N; Reiter, Silke; Piel, Jörn

    2014-10-01

    Polyketides are extraordinarily diverse secondary metabolites of great pharmacological value and with interesting ecological functions. The post-genomics era has led to fundamental changes in natural product research by inverting the workflow of secondary metabolite discovery. As opposed to traditional bioactivity-guided screenings, genome mining is an in silico method to screen and analyze sequenced genomes for natural product biosynthetic gene clusters. Since genes for known compounds can be recognized at an early computational stage, genome mining presents an opportunity for dereplication. This review highlights recent progress in bioinformatics, pathway engineering and chemical analytics aimed at extracting the biosynthetic secrets hidden in the genomes of both well-known natural product sources and previously neglected bacteria.

  2. [Transcription activator-like effectors (TALEs)-based genome engineering].

    Science.gov (United States)

    Zhao, Mei-Wei; Duan, Cheng-Li; Liu, Jiang

    2013-10-01

    Systematic reverse-engineering of functional genome architecture requires precise modifications of gene sequences and transcription levels. The development and application of transcription activator-like effectors (TALEs) has created a wealth of genome engineering possibilities. TALEs are a class of naturally occurring DNA-binding proteins found in plant-pathogenic Xanthomonas species. The DNA-binding domain of each TALE typically consists of tandem 34-amino-acid repeat modules. These modules can be rearranged according to a simple cipher to target new DNA sequences. Customized TALEs can be used for a wide variety of genome engineering applications, including transcriptional modulation and genome editing. Such "genome engineering" has now been established in human cells and a number of model organisms, opening the door to better understanding gene function in model organisms, improving traits in crop plants and treating human genetic disorders.
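
    The "simple cipher" above refers to the repeat-variable diresidue (RVD). The RVD sits at residues 12 and 13 of each repeat and specifies one DNA base. This is a minimal sketch that assumes the commonly used RVDs NI (A), HD (C), NN (G) and NG (T). Practical designs also consider two things this sketch ignores: the 5' T that usually precedes TALE targets, and alternative RVDs for G such as NH or NK.

```python
# Commonly used RVD assignments (an assumption of this sketch):
# one 34-residue repeat per target base, in 5'->3' order.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}

def tale_rvds(target):
    """Return the ordered RVDs of a TALE repeat array that would read `target`."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError("non-ACGT bases in target: %s" % sorted(unknown))
    return [RVD_FOR_BASE[base] for base in target]

print("-".join(tale_rvds("TCCAGG")))  # NG-HD-HD-NI-NN-NN
```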

  3. WormBase: network access to the genome and biology of Caenorhabditis elegans.

    Science.gov (United States)

    Stein, L; Sternberg, P; Durbin, R; Thierry-Mieg, J; Spieth, J

    2001-01-01

    WormBase (http://www.wormbase.org) is a web-based resource for the Caenorhabditis elegans genome and its biology. It builds upon the existing ACeDB database of the C. elegans genome by providing data curation services, a significantly expanded range of subject areas and a user-friendly front end.

  4. [CRISPR/Cas9-based genome editing systems and the analysis of targeted genome mutations in plants].

    Science.gov (United States)

    Xingliang, Ma; Yaoguang, Liu

    2016-02-01

    Targeted genome editing technologies use programmable DNA nucleases to cleave genomic target sites, thus inducing targeted mutations in the genome. The now-prevalent CRISPR/Cas9 system (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9) consists of the Cas9 nuclease and a single guide RNA (sgRNA). It is simpler and more efficient than other programmable DNA nuclease systems such as zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). A number of applications of CRISPR/Cas9 genome editing technology in plants have been reported. In this review, we summarize:
    - strategies for preparing the Cas9 and sgRNA expression constructs;
    - transformation methods for obtaining targeted mutations;
    - the efficiency and features of the resulting mutations;
    - methods for detecting and genotyping the mutation sites.
    We also discuss the existing problems and perspectives of CRISPR/Cas9-based genome editing in plants.

  5. Research progress of plant population genomics based on high-throughput sequencing.

    Science.gov (United States)

    Yunsheng, Wang

    2016-08-01

    Population genomics is a new paradigm for population genetics. It combines the concepts and techniques of genomics with the theoretical framework of population genetics. By genotyping polymorphic sites across the genome, it identifies both site-specific and genome-wide effects, improving our understanding of microevolution. Next-generation high-throughput sequencing technology has appeared and improved, and the number of plant species with complete genome sequences has increased rapidly as a result. Large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics. They have deepened our understanding of the following in the relevant plant populations at the genomic level:
    - genetic diversity;
    - level of linkage disequilibrium;
    - selection effects;
    - demographic history;
    - molecular mechanisms of complex traits.
    In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as existing problems of plant population genomics in order to provide references for related studies.
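
    Linkage disequilibrium (LD) between two biallelic sites is one of the quantities listed above, and it is often summarized as r². The sketch below computes D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). It assumes phased haplotypes coded 0/1 at each site and is only meant to show the formula. Population-scale tools estimate LD from genotype data across many site pairs.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites from phased haplotypes.

    Each haplotype is a (site1_allele, site2_allele) pair coded 0/1;
    frequencies below refer to allele 1 at each site.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n     # freq. of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n     # freq. of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom

# Complete association versus independence between the two sites.
print(ld_r2([(1, 1), (0, 0)] * 5))               # 1.0
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))   # 0.0
```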

  6. Analysis of the genomic homologous recombination in Theilovirus based on complete genomes

    Directory of Open Access Journals (Sweden)

    Yi Maoli

    2011-09-01

    At present, Theilovirus is considered to comprise four distinct serotypes: Theiler's murine encephalomyelitis virus, Vilyuisk human encephalomyelitis virus, Thera virus, and Saffold virus. So far, no study has systematically investigated genomic recombination in Theilovirus. The present study performed phylogenetic and recombination analyses of Theilovirus over the complete genomes. Seven potentially significant recombination events were identified. However, four of these events may not have occurred naturally. This conclusion is based on information about the strains and on the references related to the recombinants and their parental strains. These results will provide valuable hints for future research on the evolution and antigenic variability of Theilovirus.

  7. Analysis of the genomic homologous recombination in Theilovirus based on complete genomes.

    Science.gov (United States)

    Sun, Guangming; Zhang, Xiaodan; Yi, Maoli; Shao, Shihe; Zhang, Wen

    2011-09-17

    At present, Theilovirus is considered to comprise four distinct serotypes: Theiler's murine encephalomyelitis virus, Vilyuisk human encephalomyelitis virus, Thera virus, and Saffold virus. So far, no study has systematically investigated genomic recombination in Theilovirus. The present study performed phylogenetic and recombination analyses of Theilovirus over the complete genomes. Seven potentially significant recombination events were identified. However, four of these events may not have occurred naturally. This conclusion is based on information about the strains and on the references related to the recombinants and their parental strains. These results will provide valuable hints for future research on the evolution and antigenic variability of Theilovirus.

  8. Compatibility of pedigree-based and marker-based relationships for single-step genomic prediction

    DEFF Research Database (Denmark)

    Christensen, Ole Fredslund

    2012-01-01

    Single-step methods for genomic prediction have recently become popular because they are conceptually simple, and in practice such a method can completely replace a pedigree-based method for routine genetic evaluation. An issue with single-step methods is compatibility between the marker-based relationship matrix and the pedigree-based relationship matrix. The compatibility issue involves which allele frequencies to use in the marker-based relationship matrix. It also involves the adjustments of this matrix to the pedigree-based relationship matrix that are needed. In addition, it has been overlooked ... in the base population. Here, two ideas are explored. The first idea is to instead adjust the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix, whereas the second idea is to include the likelihood for the observed markers. A single-step method is used where...
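
    The sketch below makes the allele-frequency issue concrete. It builds one widely used marker-based relationship matrix, VanRaden's G = ZZ' / (2 Σ_j p_j(1 - p_j)), where Z holds 0/1/2 allele counts centred by 2p_j. It illustrates the marker-based side only and is not the adjustment proposed in this paper. By default it estimates p_j from the genotyped sample. Passing other frequencies (for example, base-population estimates) changes G, and that is the compatibility question the abstract raises.

```python
def genomic_relationship_matrix(genotypes, allele_freqs=None):
    """VanRaden-style G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes: one list per individual of allele counts (0, 1 or 2) per marker.
    allele_freqs: optional frequency p_j of the counted allele per marker;
    if omitted, it is estimated from the genotypes themselves.
    """
    n_ind = len(genotypes)
    n_mark = len(genotypes[0])
    if allele_freqs is None:
        allele_freqs = [sum(row[j] for row in genotypes) / (2 * n_ind)
                        for j in range(n_mark)]
    scale = 2 * sum(p * (1 - p) for p in allele_freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each genotype on its expected value 2 * p_j.
    z = [[row[j] - 2 * allele_freqs[j] for j in range(n_mark)] for row in genotypes]
    # G[i][k] = sum over markers of z_ij * z_kj, divided by the scale factor.
    return [[sum(zi[j] * zk[j] for j in range(n_mark)) / scale for zk in z]
            for zi in z]

g = genomic_relationship_matrix([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
print([[round(x, 3) for x in row] for row in g])
```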

  9. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Jizhong; He, Zhili

    2014-04-08

    As a part of the Shewanella Federation project, we have used integrated genomic, proteomic and computational technologies to study various aspects of energy metabolism of two Shewanella strains from a systems-level perspective.

  10. Actin re-organization induced by Chlamydia trachomatis serovar D--evidence for a critical role of the effector protein CT166 targeting Rac.

    Directory of Open Access Journals (Sweden)

    Jessica Thalmann

    The intracellular bacterium Chlamydia trachomatis causes infections of the urogenital tract, eyes or lungs. Alignment reveals homology of CT166, a putative effector protein of urogenital C. trachomatis serovars, with the N-terminal glucosyltransferase domain of clostridial glucosylating toxins (CGTs). CGTs contain an essential DXD motif and mono-glucosylate GTP-binding proteins of the Rho/Ras families, the master regulators of the actin cytoskeleton. CT166 is preformed in elementary bodies of C. trachomatis D and is detected in the host cell shortly after infection. Infection at a high MOI with C. trachomatis serovar D containing the CT166 ORF induces actin re-organization, resulting in cell rounding and a decreased cell diameter. A comparable phenotype was observed in HeLa cells treated with the Rho-GTPase-glucosylating Toxin B from Clostridium difficile (TcdB) or in HeLa cells ectopically expressing CT166. CT166 with a mutated DXD motif (CT166-mut) left actin dynamics almost unchanged, suggesting that CT166-induced actin re-organization depends on the glucosyltransferase motif of CT166. The cytotoxic necrotizing factor 1 (CNF1) from E. coli deamidates and thereby activates Rho-GTPases and transiently protects them against TcdB-induced glucosylation. CNF1-treated cells were protected from TcdB- and CT166-induced actin re-organization. CNF1 treatment reverted CT166-induced actin re-organization, as did ectopic expression of non-glucosylable Rac1-G12V but not RhoA-G14A. This suggests that CT166-induced actin re-organization depends on the glucosylation of Rac1. Accordingly, over-expression of CT166-mut diminished TcdB-induced cell rounding, suggesting shared substrates. Cell rounding induced by high-MOI infection with C. trachomatis D was reduced in cells expressing CT166-mut or Rac1-G12V, and in CNF1-treated cells. These observations indicate that the cytopathic effect of C. trachomatis D is mediated by CT166-induced Rac1 glucosylation.

  11. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.

    Science.gov (United States)

    Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M

    2016-03-31

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.

  12. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary ... heuristics. RESULTS: We present Pairagon, a pair hidden Markov model-based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high and low identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator, and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners...
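
    The benchmark design described above has two steps: splice exons into a 'perfect' cDNA, then mutate the genome it came from. It can be sketched as follows. This is a toy illustration under stated assumptions. The exon coordinates and the uniform substitution-only mutator are hypothetical and much simpler than the realistic mutation simulator the authors used. No alignment is performed.

```python
import random

BASES = "ACGT"

def splice(genome, exons):
    """Concatenate exon sequences given as half-open (start, end) coordinates."""
    return "".join(genome[start:end] for start, end in sorted(exons))

def mutate(genome, rate, seed=0):
    """Substitute each base with probability `rate` (toy model: no indels)."""
    rng = random.Random(seed)
    out = []
    for base in genome:
        if rng.random() < rate:
            # Replace with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

genome = "AAACCCGGGTTTACGT"
cdna = splice(genome, [(0, 3), (6, 9)])   # the 'perfect' cDNA: "AAAGGG"
mutated_genome = mutate(genome, rate=0.1)
print(cdna, mutated_genome)
```

    An aligner under test would then map `cdna` back onto `mutated_genome`. Accuracy is scored by comparing the predicted exon boundaries with the known coordinates.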

  13. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

    NARCIS (Netherlands)

    Speth, D.; Zandt, M.H. In 't; Guerrero-Cruz, S.; Dutilh, B.E.; Jetten, M.S.M

    2016-01-01

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used...

  15. Comprehensive genome characterization of solitary fibrous tumors using high-resolution array-based comparative genomic hybridization.

    Science.gov (United States)

    Bertucci, François; Bouvier-Labit, Corinne; Finetti, Pascal; Adélaïde, José; Metellus, Philippe; Mokhtari, Karima; Decouvelaere, Anne-Valérie; Miquel, Catherine; Jouvet, Anne; Figarella-Branger, Dominique; Pedeutour, Florence; Chaffanet, Max; Birnbaum, Daniel

    2013-02-01

    Solitary fibrous tumors (SFTs) are rare spindle cell tumors with limited therapeutic options. Their molecular basis is poorly understood, and no consistent cytogenetic abnormality has been reported. We used high-resolution whole-genome array-based comparative genomic hybridization (Agilent 244K oligonucleotide chips) to profile 47 samples, meningeal in >75% of cases. Few copy number aberrations (CNAs) were observed. Sixty-eight percent of samples did not show any gene CNA after exclusion of probes located in regions with referenced copy number variation (CNV). Only low-level CNAs were observed. The genomic profiles were very homogeneous among samples. Clustering of DNA copy numbers revealed no molecular class. All cases displayed a "simplex" profile. No recurrent CNA was identified. Imbalances occurring in >20% of samples, such as the gain of the 8p11.23-11.22 region, contained known CNVs. The 13q14.11-13q31.1 region (lost in 4% of cases) was the largest altered region and contained the lowest percentage of genes with referenced CNVs. A total of 425 genes without CNV showed a copy number transition in at least one sample, but only 1 did so in at least 10% of samples. The genomic profiles of meningeal and extra-meningeal cases did not differ.
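
    The probe-exclusion step described above can be sketched as a filter followed by a threshold call. This is illustration only. The ±0.3 log2-ratio cut-off and the probe/CNV tuple layouts are hypothetical choices. The study's segmentation and calling procedure is not specified in the abstract, and real pipelines segment probes before calling.

```python
def in_region(chrom, position, region):
    """True if a probe lies inside a half-open (chrom, start, end) region."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= position < end

def call_cnas(probes, cnv_regions, threshold=0.3):
    """Call gain/loss per probe after dropping probes in known CNV regions.

    probes: (chrom, position, log2_ratio) tuples; `threshold` is hypothetical.
    """
    calls = []
    for chrom, position, log2_ratio in probes:
        # Skip probes in regions with referenced copy number variation.
        if any(in_region(chrom, position, r) for r in cnv_regions):
            continue
        if log2_ratio >= threshold:
            calls.append((chrom, position, "gain"))
        elif log2_ratio <= -threshold:
            calls.append((chrom, position, "loss"))
    return calls

probes = [("chr8", 100, 0.45), ("chr8", 200, 0.5), ("chr13", 300, -0.4), ("chr1", 50, 0.05)]
known_cnvs = [("chr8", 150, 250)]
print(call_cnas(probes, known_cnvs))
```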

  16. Genomes-based phylogeny of the genus Xanthomonas

    Directory of Open Access Journals (Sweden)

    Rodriguez-R Luis M

    2012-03-01

    Background: The genus Xanthomonas comprises several plant pathogenic bacteria affecting a wide range of hosts. Despite the economic, industrial and biological importance of Xanthomonas, the classification and phylogenetic relationships within the genus are still under active debate. Some of the relationships between pathovars and species have not been thoroughly clarified, with old pathovars becoming new species. A change in the genus name has recently been suggested for Xanthomonas albilineans, an early branching species currently placed in this genus. A thorough phylogenomic reconstruction would help resolve these and other discrepancies in the genus. Results: Here we report the results of a genome-wide analysis of DNA sequences from 989 orthologous groups from the 17 Xanthomonas spp. genomes available to date, representing all major lineages within the genus. The phylogenetic and computational analyses used in this study have been automated in a Perl package designated Unus, which provides a framework for phylogenomic analyses that can be applied to other datasets at the genomic level. Unus can also be easily incorporated into other phylogenomic pipelines. Conclusions: Our phylogeny agrees with previous phylogenetic topologies of the genus. However, it revealed that the genomes of Xanthomonas citri and Xanthomonas fuscans belong to the same species. It also showed that Xanthomonas albilineans is basal to the joint clade of Xanthomonas and Xylella fastidiosa. Genome reduction was identified in the species Xanthomonas vasicola, in addition to the previously identified reduction in Xanthomonas albilineans. Lateral gene transfer was also observed in two gene clusters.

  17. Cas9-based genome editing in Arabidopsis and tobacco.

    Science.gov (United States)

    Li, Jian-Feng; Zhang, Dandan; Sheen, Jen

    2014-01-01

    Targeted modification of plant genomes is key to elucidating and manipulating gene functions in plant research and biotechnology. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) technology is emerging as a powerful genome-editing method in diverse plants that traditionally lacked facile and versatile tools for targeted genetic engineering. This technology uses easily reprogrammable guide RNAs (sgRNAs) to direct the Streptococcus pyogenes Cas9 endonuclease to generate DNA double-strand breaks in targeted genome sequences. These breaks facilitate efficient mutagenesis by error-prone nonhomologous end-joining (NHEJ) or sequence replacement by homology-directed repair (HDR). In this chapter, we describe the procedure to design and evaluate dual sgRNAs for plant codon-optimized Cas9-mediated genome editing. Mesophyll protoplasts of Arabidopsis thaliana and Nicotiana benthamiana serve as the model cell systems. We also discuss future directions in sgRNA/Cas9 applications for generating targeted genome modifications and gene regulation in plants.
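
    Designing an sgRNA for S. pyogenes Cas9 starts from finding 20-nt protospacers immediately followed by an NGG PAM. This is a minimal sketch. It assumes an ACGT-only input sequence and reports hits on both strands, with reverse-strand positions given in reverse-complement coordinates. It does not score specificity or off-target risk, which the sgRNA evaluation step described above would require.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_spcas9_sites(seq, protospacer_len=20):
    """Return (strand, start, protospacer, pam) for every protospacer + NGG site.

    Starts on the '-' strand are positions in the reverse-complement sequence.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites

print(find_spcas9_sites("GACGTTAGCATCGGATCCTAGCTGGAT"))
```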

  18. INTEGRATED GENOME-BASED STUDIES OF SHEWANELLA ECOPHYSIOLOGY

    Energy Technology Data Exchange (ETDEWEB)

    NEALSON, KENNETH H.

    2013-10-15

    laboratories. Applications: 1. Corrosion: Electron flow is often part of the corrosive process, and several studies were done in concert with this proposal on the ability of EET-capable bacteria to enhance, inhibit, or detect corrosion. These included using EET-capable bacteria to detect corrosion in its earliest stages [5], using corrosion-causing bacteria to study the microbe/mineral interface during corrosion [1], and studying the groups of microbes involved in the corrosion of natural systems [19]. 2. Bioenergy and microbial fuel cells: Early in this program (several years ago), the production of electricity by Shewanella was shown to depend on the genes for extracellular electron transport (EET), and applied work involved testing various strains and conditions to optimize current production by the shewanellae [11,14,16]. 3. Identification of shewanellae strains: Based on similarities seen in genomic comparisons, a rapid method was employed for distinguishing between shewanellae strains [17].

    Interactions with other laboratories: This grant was an extension of a grant involving the so-called "Shewanella Federation", and as such, a number of our publications were joint with other members of this group. The groups included: 1. Pacific Northwest Laboratories; 2. Oak Ridge National Labs; 3. Michigan State University; 4. University of Oklahoma; 5. Naval Research Laboratory, Washington DC; 6. Burnham Medical Research Institute, San Diego; 7. J. Craig Venter Institute, San Diego.

    Education: Graduate students: Michael Waters, Ph.D., at NIST, Washington D.C.; Lewis Hsu, Ph.D., at NRL, San Diego; Howard Harris, Ph.D., Postdoc at University, France; Everett Salas, Ph.D., Scientist at Chevron; Jeffrey McLean, Ph.D., Scientist at J. Craig Venter Institute; John McCrow, Ph.D., Scientist at J. Craig Venter Institute. Postdocs: Mohamed El-Naggar, Professor of Physics, USC; Jinjun Kan, Senior Researcher at

    Undergraduates: During this year, we had

  19. A web-based multi-genome synteny viewer for customized data

    Directory of Open Access Journals (Sweden)

    Revanna Kashi V

    2012-08-01

    Full Text Available Abstract Background Web-based synteny visualization tools are important for sharing data and revealing patterns of complicated genome conservation and rearrangements. Such tools should allow biologists to upload genomic data for their own analysis. This requirement is critical because individual biologists are generating large amounts of genomic sequence that quickly overwhelm any centralized web resource attempting to collect and display all of those data. Recently, we published a web-based synteny viewer, GSV, which was designed to satisfy the above requirement. However, GSV can only compare two genomes at a time. Extending the functionality of GSV to visualize multiple genomes is important to meet the increasing demand of the research community. Results We have developed a multi-Genome Synteny Viewer (mGSV). Similar to GSV, mGSV is a web-based tool that allows users to upload their own genomic data files for visualization. Multiple genomes can be presented in a single integrated view with an enhanced user interface. Users can navigate through all the selected genomes in either pairwise or multiple viewing mode to examine conserved genomic regions as well as the accompanying genome annotations. Besides serving users who manually interact with the web server, mGSV also provides Web Services for machine-to-machine communication to accept data sent by other remote resources. The entire mGSV package can also be downloaded for easy local installation. Conclusions mGSV significantly enhances the original functionalities of GSV. A web server hosting mGSV is provided at http://cas-bioinfo.cas.unt.edu/mgsv.

  20. Quantitative metagenomic analyses based on average genome size normalization

    DEFF Research Database (Denmark)

    Frank, Jeremy Alexander; Sørensen, Søren Johannes

    2011-01-01

    Over the past quarter-century, microbiologists have used DNA sequence information to aid in the characterization of microbial communities. During the last decade, this has expanded from single genes to microbial community genomics, or metagenomics, in which the gene content of an environment can...... by estimating average genome sizes. This normalization can relieve comparative biases introduced by differences in community structure, number of sequencing reads, and sequencing read lengths between different metagenomes. We demonstrate the utility of this approach by comparing metagenomes from two different...... marine sources using both conventional small-subunit (SSU) rRNA gene analyses and our quantitative method to calculate the proportion of genomes in each sample that are capable of a particular metabolic trait. With both environments, to determine what proportion of each community they make up and how...
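    Because the abstract is truncated, the paper's exact formula is not visible here. The following is a simplified sketch of the general idea, under stated assumptions. A metagenome contains (total sequenced bases / average genome size) genome equivalents. Read hits to a single-copy marker gene are converted to gene-length coverage, and that coverage is divided by the number of genome equivalents to give the fraction of genomes carrying the gene. Edge effects and multi-copy genes are ignored, and the function names are hypothetical.

```python
def genome_equivalents(total_bases, avg_genome_size):
    """Number of 'average genomes' worth of sequence in a metagenome."""
    return total_bases / avg_genome_size

def fraction_with_gene(hits, read_len, gene_len, total_bases, avg_genome_size):
    """Estimate the fraction of genomes carrying a single-copy gene.

    hits * read_len / gene_len approximates how many full-length copies of the
    gene were sampled; dividing by genome equivalents normalizes for depth,
    read length and community genome size. Simplifying assumptions only.
    """
    gene_coverage = hits * read_len / gene_len
    return gene_coverage / genome_equivalents(total_bases, avg_genome_size)


# 1 Gbp sequenced, 2.5 Mbp average genome, 200 reads of 400 bp on a 1 kbp gene
print(fraction_with_gene(200, 400, 1000, 1e9, 2.5e6))
```

    Because the denominator scales with average genome size, two metagenomes with different community structure or sequencing effort become comparable on a per-genome basis.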

  1. VibrioBase: A Model for Next-Generation Genome and Annotation Database Development

    Directory of Open Access Journals (Sweden)

    Siew Woh Choo

    2014-01-01

    Full Text Available To facilitate ongoing research on Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and to facilitate its analysis. We present VibrioBase, a resource platform that provides all the basic features of a sequence database together with unique analysis tools that could be valuable to the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes, presented in a user-friendly manner to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and the JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool and the pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in next-generation database development.

  2. Manifold Based Optimization for Single-Cell 3D Genome Reconstruction.

    Directory of Open Access Journals (Sweden)

    Jonas Paulsen

    2015-08-01

    Full Text Available The three-dimensional (3D) structure of the genome is important for the orchestration of gene expression and cell differentiation. While mapping genomes in 3D has long been elusive, recent adaptations of high-throughput sequencing to chromosome conformation capture (3C) techniques allow genome-wide structural characterization for the first time. However, reconstruction of "consensus" 3D genomes from 3C-based data is a challenging problem, since the data are aggregated over millions of cells. Recent single-cell adaptations of the 3C technique allow non-aggregated assessment of genome structure, but the data suffer from sparse and noisy interaction sampling. We present a manifold based optimization (MBO) approach for the reconstruction of 3D genome structure from chromosomal contact data. We show that MBO is able to reconstruct 3D structures based on the chromosomal contacts, imposing fewer structural violations than comparable methods. Additionally, MBO is suitable for efficient high-throughput reconstruction of large systems, such as entire genomes, allowing for comparative studies of genomic structure across cell lines and different species.
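    The MBO optimisation itself is not reproduced here. This sketch covers only two pieces of bookkeeping that typically surround such reconstructions. The first maps contact counts to target distances with a power law (the exponent `alpha` and `scale` are free, assumed parameters). The second counts "violations", defined here as observed contacts whose beads lie farther apart than a cutoff. That definition is our assumption and may differ from the paper's.

```python
import math

def contacts_to_distances(contacts, alpha=1.0, scale=1.0):
    """Map contact counts to target distances d = scale * c**(-alpha).

    contacts: dict mapping (i, j) bead pairs to positive contact counts.
    More frequent contacts give shorter target distances.
    """
    return {pair: scale * count ** (-alpha) for pair, count in contacts.items()}

def count_violations(coords, contacts, max_contact_dist):
    """Count observed contacts whose beads are farther apart than the cutoff."""
    return sum(
        1 for (i, j) in contacts
        if math.dist(coords[i], coords[j]) > max_contact_dist
    )


coords = [(0, 0, 0), (1, 0, 0), (5, 0, 0)]
contacts = {(0, 1): 3, (0, 2): 1}
print(count_violations(coords, contacts, 1.5))  # bead 2 is too far from bead 0
```

    A lower violation count for the same contact set is one simple way to compare candidate structures produced by different reconstruction methods.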

  3. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Full Text Available Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.
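    The subfamilies named above are defined by the spacing of the first N-terminal cysteines. The study itself relied on phylogeny, synteny and sequence homology, not on the heuristic below. This is a hypothetical sketch that assigns CC, CXC, CX3C or XC from a user-supplied N-terminal segment. Choosing that segment correctly (for example, after signal-peptide removal) is left to the caller.

```python
def classify_chemokine_motif(n_terminal_segment):
    """Classify a chemokine by the spacing of its first two N-terminal cysteines.

    Returns 'CC', 'CXC', 'CX3C', 'XC' (a single cysteine in the segment),
    or None if no recognised pattern is present. Heuristic only.
    """
    cys = [i for i, aa in enumerate(n_terminal_segment.upper()) if aa == "C"]
    if not cys:
        return None
    if len(cys) == 1:
        return "XC"  # XC chemokines retain only one of the first two cysteines
    gap = cys[1] - cys[0] - 1  # residues between the first two cysteines
    return {0: "CC", 1: "CXC", 3: "CX3C"}.get(gap)


for segment in ("ASLAACCLS", "ELRCQCIK", "QHHGVTKCNITCSKM", "VGSEVSDKRTCVSLT"):
    print(segment, classify_chemokine_motif(segment))
```

    The example segments are invented for illustration and do not correspond to particular chicken chemokines.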

  4. Whole-genome sequence-based analysis of thyroid function

    OpenAIRE

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 1...

  5. [Current perspectives on genome-based diagnostic tests in Pediatrics].

    Science.gov (United States)

    Lay-Son, R Guillermo; León, P Luis

    2015-01-01

    Etiological diagnosis is essential in the clinical management of individual patients. Some children with complex medical conditions undergo numerous tests, a process known as the "diagnostic odyssey", which often gives no conclusive results. In recent years, a revolution in genomic medicine has been underway with the use of technologies that promise to increase the ability to make a diagnosis and to reduce the time involved. The main advantages and limitations of genomic diagnosis, as opposed to the usual methodologies, are reviewed with an emphasis on Pediatrics. Copyright © 2015. Published by Elsevier España, S.L.U.

  6. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Science.gov (United States)

    Satoh, Soichirou; Mimuro, Mamoru; Tanaka, Ayumi

    2013-01-01

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.
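    The abstract does not say which tree-building algorithm was applied to the similarity values. As a hedged illustration only, this sketch converts average similarity to a distance as 1 − similarity (an assumption) and uses UPGMA as a stand-in clustering method to produce a rooted Newick tree. All names are hypothetical.

```python
def similarity_to_distance(similarity):
    """Turn pairwise average similarities (0..1) into distances d = 1 - s."""
    return {pair: 1.0 - s for pair, s in similarity.items()}

def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as Newick (no branch lengths).

    names: taxon labels; dist: dict mapping frozenset({a, b}) -> distance.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in sizes for y in sizes if x < y),
            key=lambda p: d[frozenset(p)],
        )
        merged = f"({a},{b})"
        sa, sb = sizes.pop(a), sizes.pop(b)
        # Size-weighted average distance from the merged cluster to the rest.
        for c in sizes:
            d[frozenset((merged, c))] = (
                d[frozenset((a, c))] * sa + d[frozenset((b, c))] * sb
            ) / (sa + sb)
        sizes[merged] = sa + sb
    return next(iter(sizes)) + ";"


sim = {frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.5, frozenset(("B", "C")): 0.4}
print(upgma(["A", "B", "C"], similarity_to_distance(sim)))
```

    UPGMA assumes a roughly constant rate of change. With lateral gene transfer, as the abstract notes, a method such as neighbour joining may be a better fit.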

  7. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.

    Science.gov (United States)

    Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

    2015-01-16

    Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis (the cause of plague), Yersinia pseudotuberculosis and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using mechanisms distinct from those of the more pathogenic species. With advances in sequencing technologies, many Yersinia genomes have been sequenced. However, there is currently no specialized platform to hold the rapidly growing Yersinia genomic data and to provide analysis tools, particularly for the comparative analyses required to provide improved insights into their biology, evolution and pathogenicity. To facilitate ongoing and future research on Yersinia, especially the species generally considered non-pathogenic, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed YersiniaBase, a robust and user-friendly resource and analysis platform for Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, the majority of which are Yersinia pestis. To streamline searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include the JavaScript-based genome browser (JBrowse) and the Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) the Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) the Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; and (3) YersiniaTree for constructing phylogenetic trees of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the

  8. Genomic tools development for Aquilegia: construction of a BAC-based physical map

    Directory of Open Access Journals (Sweden)

    Hodges Scott A

    2010-11-01

    Full Text Available Abstract Background The genus Aquilegia, consisting of approximately 70 taxa, is a member of the basal eudicot lineage, Ranunculales, which is evolutionarily intermediate between monocots and core eudicots, and represents a relatively unstudied clade in the angiosperm phylogenetic tree that bridges the gap between these two major plant groups. Aquilegia species are closely related and their distribution covers highly diverse habitats. These provide rich resources to better understand the genetic basis of adaptation to different pollinators and habitats that in turn leads to rapid speciation. To gain insights into the genome structure and facilitate gene identification, comparative genomics and whole-genome shotgun sequencing assembly, BAC-based genomics resources are of crucial importance. Results BAC-based genomic resources, including two BAC libraries, a physical map with anchored markers and BAC end sequences, were established from A. formosa. The physical map was composed of a total of 50,155 BAC clones in 832 contigs and 3939 singletons, covering 21X genome equivalents. These contigs spanned a physical length of 689.8 Mb (~2.3X of the genome), suggesting the complex heterozygosity of the genome. A set of 197 markers was developed from ESTs induced by drought stress, or involved in anthocyanin biosynthesis or floral development, and was integrated into the physical map. Among these were 87 genetically mapped markers that anchored 54 contigs, spanning 76.4 Mb (25.5%) across the genome. Analysis of a selection of 12,086 BAC end sequences (BESs) from the minimal tiling path (MTP) allowed a preview of the Aquilegia genome organization, including identification of transposable elements, simple sequence repeats and gene content. Common repetitive elements previously reported in both monocots and core eudicots were identified in Aquilegia, suggesting the value of this genome in connecting the two major plant clades. Comparison with sequenced plant genomes
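    The map statistics quoted above can be cross-checked. Both the contig span (689.8 Mb at ~2.3X) and the anchored span (76.4 Mb at 25.5%) imply a genome of roughly 300 Mb. This is our arithmetic on the abstract's numbers, not a figure reported by the authors. The insert-size estimate also assumes that the 21X coverage is counted against that ~300 Mb.

```python
def implied_genome_size_mb(span_mb, genome_fraction):
    """Genome size implied when span_mb corresponds to genome_fraction genomes."""
    return span_mb / genome_fraction

def mean_insert_kb(n_clones, coverage_x, genome_mb):
    """Average clone insert implied by n_clones giving coverage_x of genome_mb."""
    return coverage_x * genome_mb * 1000 / n_clones


size_from_contigs = implied_genome_size_mb(689.8, 2.3)     # contigs span ~2.3X
size_from_anchors = implied_genome_size_mb(76.4, 0.255)    # anchored span is 25.5%
print(round(size_from_contigs), round(size_from_anchors))  # 300 300
print(round(mean_insert_kb(50_155, 21, 300)))              # 126
```

    The agreement between the two independent estimates suggests the quoted percentages and fold-coverage values are internally consistent.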

  9. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    Josephine Erhiakporeh

    2016-07-06

    Jul 6, 2016 ... identification of a set of 75 candidate genes (42, 22 and 11 from Arabidopsis, potato and tomato, ... understanding on the genetic basis of drought tolerance by using the .... Comparative genomics and genes expression assay ... Primer code ... physiological and molecular responses to drought stress.

  10. The ethical introduction of genome-based information and technologies into public health.

    Science.gov (United States)

    Howard, H C; Swinnen, E; Douw, K; Vondeling, H; Cassiman, J-J; Cambon-Thomsen, A; Borry, P

    2013-01-01

    With the human genome project running from 1989 until its completion in 2003, and the incredible advances in sequencing technology and in bioinformatics during the last decade, there has been a shift towards an increased focus on studying common complex disorders, which develop through the interplay of many different genes as well as environmental factors. Although some susceptibility genes have been identified in some populations for disorders such as cancer, diabetes and cardiovascular diseases, the integration of this information into the health care system has proven to be much more problematic than for single gene disorders. Furthermore, with the $1000 genome supposedly just around the corner, and whole genome sequencing gradually being integrated into research protocols as well as into the clinical context, there is a strong push for the uptake of additional genomic testing. Indeed, the advent of public health genomics, wherein genomics would be integrated into all aspects of health care and public health, should be taken seriously. Although laudable, these advances also bring with them a slew of ethical and social issues that challenge the normative frameworks used in clinical genetics until now. With this in mind, we highlight herein five principles that serve as a primer for discussing the ethical introduction of genome-based information and genome-based technologies into public health.

  11. A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease

    Science.gov (United States)

    Kyriakou, Theodosios; Nelson, Christopher P; Hopewell, Jemma C; Webb, Thomas R; Zeng, Lingyao; Dehghan, Abbas; Alver, Maris; Armasu, Sebastian M; Auro, Kirsi; Bjonnes, Andrew; Chasman, Daniel I; Chen, Shufeng; Ford, Ian; Franceschini, Nora; Gieger, Christian; Grace, Christopher; Gustafsson, Stefan; Huang, Jie; Hwang, Shih-Jen; Kim, Yun Kyoung; Kleber, Marcus E; Lau, King Wai; Lu, Xiangfeng; Lu, Yingchang; Lyytikäinen, Leo-Pekka; Mihailov, Evelin; Morrison, Alanna C; Pervjakova, Natalia; Qu, Liming; Rose, Lynda M; Salfati, Elias; Saxena, Richa; Scholz, Markus; Smith, Albert V; Tikkanen, Emmi; Uitterlinden, Andre; Yang, Xueli; Zhang, Weihua; Zhao, Wei; de Andrade, Mariza; de Vries, Paul S; van Zuydam, Natalie R; Anand, Sonia S; Bertram, Lars; Beutner, Frank; Dedoussis, George; Frossard, Philippe; Gauguier, Dominique; Goodall, Alison H; Gottesman, Omri; Haber, Marc; Han, Bok-Ghee; Huang, Jianfeng; Jalilzadeh, Shapour; Kessler, Thorsten; König, Inke R; Lannfelt, Lars; Lieb, Wolfgang; Lind, Lars; Lindgren, Cecilia M; Lokki, Marja-Liisa; Magnusson, Patrik K; Mallick, Nadeem H; Mehra, Narinder; Meitinger, Thomas; Memon, Fazal-ur-Rehman; Morris, Andrew P; Nieminen, Markku S; Pedersen, Nancy L; Peters, Annette; Rallidis, Loukianos S; Rasheed, Asif; Samuel, Maria; Shah, Svati H; Sinisalo, Juha; Stirrups, Kathleen E; Trompet, Stella; Wang, Laiyuan; Zaman, Khan S; Ardissino, Diego; Boerwinkle, Eric; Borecki, Ingrid B; Bottinger, Erwin P; Buring, Julie E; Chambers, John C; Collins, Rory; Cupples, L Adrienne; Danesh, John; Demuth, Ilja; Elosua, Roberto; Epstein, Stephen E; Esko, Tõnu; Feitosa, Mary F; Franco, Oscar H; Franzosi, Maria Grazia; Granger, Christopher B; Gu, Dongfeng; Gudnason, Vilmundur; Hall, Alistair S; Hamsten, Anders; Harris, Tamara B; Hazen, Stanley L; Hengstenberg, Christian; Hofman, Albert; Ingelsson, Erik; Iribarren, Carlos; Jukema, J Wouter; Karhunen, Pekka J; Kim, Bong-Jo; Kooner, Jaspal S; Kullo, Iftikhar J; Lehtimäki, Terho; Loos, Ruth J F; Melander, Olle; Metspalu, Andres; März, Winfried; Palmer, Colin N; Perola, Markus; Quertermous, Thomas; Rader, Daniel J; Ridker, Paul M; Ripatti, Samuli; Roberts, Robert; Salomaa, Veikko; Sanghera, Dharambir K; Schwartz, Stephen M; Seedorf, Udo; Stewart, Alexandre F; Stott, David J; Thiery, Joachim; Zalloua, Pierre A; O’Donnell, Christopher J; Reilly, Muredach P; Assimes, Themistocles L; Thompson, John R; Erdmann, Jeanette; Clarke, Robert; Watkins, Hugh; Kathiresan, Sekar; McPherson, Ruth; Deloukas, Panos; Schunkert, Heribert; Samani, Nilesh J; Farrall, Martin

    2015-01-01

    Existing knowledge of genetic variants affecting risk of coronary artery disease (CAD) is largely based on genome-wide association studies (GWAS) analysis of common SNPs. Leveraging phased haplotypes from the 1000 Genomes Project, we report a GWAS meta-analysis of 185 thousand CAD cases and controls, interrogating 6.7 million common (MAF>0.05) as well as 2.7 million low frequency (0.005

  12. A contig-based strategy for the genome-wide discovery of microRNAs without complete genome resources.

    Directory of Open Access Journals (Sweden)

    Jun-Zhi Wen

    Full Text Available MicroRNAs (miRNAs) are important regulators of many cellular processes and exist in a wide range of eukaryotes. High-throughput sequencing is a mainstream method of miRNA identification through which it is possible to obtain the complete small RNA profile of an organism. Currently, most approaches to miRNA identification rely on a reference genome for the prediction of hairpin structures. However, many species of economic and phylogenetic importance are non-model organisms without complete genome sequences, and this limits miRNA discovery. Here, to overcome this limitation, we have developed a contig-based miRNA identification strategy. We applied this method to a triploid species of edible banana (GCTCV-119, Musa spp. AAA group) and identified 180 pre-miRNAs and 314 mature miRNAs, three times more than were predicted by the available dataset-based methods (represented by EST+GSS). Based on the recently published miRNA data set of Musa acuminata, the recall rate and precision of our strategy are estimated to be 70.6% and 92.2%, respectively, significantly better than those of the EST+GSS-based strategy (10.2% and 50.0%, respectively). Our novel, efficient and cost-effective strategy facilitates the study of the functional and evolutionary role of miRNAs, as well as miRNA-based molecular breeding, in non-model species of economic or evolutionary interest.
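    The recall and precision figures quoted above are ratios computed against a reference set. Here, precision = true positives / all predictions and recall = true positives / all reference miRNAs. The minimal sketch below assumes that predicted and reference miRNAs can be matched by identical identifiers. How the authors actually matched sequences is not stated in the abstract.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical identifiers: 2 of 4 predictions are correct; 2 of 3 references are found
print(precision_recall(["mir-a", "mir-b", "mir-c", "mir-d"], ["mir-a", "mir-b", "mir-e"]))
```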

  13. Similarity-based disease risk assessment for personal genomes: proof of concept.

    Science.gov (United States)

    Woo, Jung Hoon; Lai, Albert M; Chung, Wendy K; Weng, Chunhua

    2011-01-01

    The increasing availability of personal genome data has led to a growing need among consumers to understand the implications of their gene sequences. At present, poorly integrated genetic knowledge has not met these needs. This proof-of-concept study proposes a similarity-based approach to assess the disease risk predisposition for personal genomes. We hypothesize that the semantic similarity between a personal genome and a disease can indicate the disease risks in the person. We developed a knowledge network that integrates existing knowledge of genes, diseases, and symptoms from six sources using the Semantic Web standard, Resource Description Framework (RDF). We then used latent relationships between genes and diseases derived from our knowledge network to measure the semantic similarity between a personal genome and a genetic disease. As a demonstration, we showed the feasibility of assessing the disease risks in one personal genome and discussed related methodology issues.
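    The study's measure is derived from latent relationships in an RDF knowledge network, and it is not reproduced here. As a much simpler stand-in that shows the general shape of a similarity-based ranking, this sketch scores each disease by the Jaccard similarity between a person's variant-bearing genes and the disease's associated genes. The gene and disease labels are placeholders, not real associations.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two collections (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_diseases(personal_genes, disease_genes):
    """Rank diseases by similarity to a person's gene set, highest first."""
    scores = {d: jaccard(personal_genes, genes) for d, genes in disease_genes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


person = {"G1", "G2"}
diseases = {"disease_x": {"G1", "G3"}, "disease_y": {"G2"}}
print(rank_diseases(person, diseases))
```

    A knowledge-network approach can score diseases that share no genes with the person at all, which plain set overlap cannot do.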

  14. Multilevel Genomics-Based Taxonomy of Renal Cell Carcinoma

    Directory of Open Access Journals (Sweden)

    Fengju Chen

    2016-03-01

    Full Text Available On the basis of multidimensional and comprehensive molecular characterization (including DNA methylation and copy number, RNA, and protein expression), we classified 894 renal cell carcinomas (RCCs) of various histologic types into nine major genomic subtypes. Site of origin within the nephron was one major determinant in the classification, reflecting differences among clear cell, chromophobe, and papillary RCC. Widespread molecular changes associated with TFE3 gene fusion or chromatin modifier genes were present within a specific subtype and spanned multiple subtypes. Differences in patient survival and in alteration of specific pathways (including hypoxia, metabolism, MAP kinase, NRF2-ARE, Hippo, immune checkpoint, and PI3K/AKT/mTOR) could further distinguish the subtypes. Immune checkpoint markers and molecular signatures of T cell infiltrates were both highest in the subtype associated with aggressive clear cell RCC. Differences between the genomic subtypes suggest that therapeutic strategies could be tailored to each RCC disease subset.

  15. Genome-based comparative analyses of Antarctic and temperate species of Paenibacillus.

    Directory of Open Access Journals (Sweden)

    Melissa Dsouza

    Full Text Available Antarctic soils represent a unique environment characterised by extremes of temperature, salinity, elevated UV radiation, low nutrient and low water content. Despite the harshness of this environment, members of 15 bacterial phyla have been identified in soils of the Ross Sea Region (RSR). However, the survival mechanisms and ecological roles of these phyla are largely unknown. The aim of this study was to investigate whether strains of Paenibacillus darwinianus owe their resilience to substantial genomic changes. For this, genome-based comparative analyses were performed on three P. darwinianus strains, isolated from gamma-irradiated RSR soils, together with nine temperate, soil-dwelling Paenibacillus spp. The genome of each strain was sequenced to over 1,000-fold coverage, then assembled into contigs totalling approximately 3 Mbp per genome. Based on the occurrence of essential, single-copy genes, genome completeness was estimated at approximately 88%. Genome analysis revealed between 3,043 and 3,091 protein-coding sequences (CDSs), primarily associated with two-component systems, sigma factors, transporters, sporulation and genes induced by cold-shock, oxidative and osmotic stresses. These comparative analyses provide an insight into the metabolic potential of P. darwinianus, revealing potential adaptive mechanisms for survival in Antarctic soils. However, a large proportion of these mechanisms were also identified in temperate Paenibacillus spp., suggesting that these mechanisms are beneficial for growth and survival in a range of soil environments. These analyses have also revealed that the P. darwinianus genomes contain significantly fewer CDSs and have a lower paralogous content. Notwithstanding the incompleteness of the assemblies, the large differences in genome sizes, determined by the number of genes in paralogous clusters and the CDS content, are indicative of genome content scaling. Finally, these sequences are a resource for further
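    Completeness estimates based on essential single-copy genes are usually the percentage of an expected marker set found in the assembly. Markers seen more than once can also be flagged as possible redundancy. The abstract does not name the marker set or tool used. This sketch assumes only a set of marker identifiers and a list of marker hits, and all names are hypothetical.

```python
from collections import Counter

def marker_completeness(found_markers, marker_set):
    """Percent of expected single-copy markers detected at least once,
    plus the number of markers detected more than once."""
    counts = Counter(m for m in found_markers if m in marker_set)
    completeness = 100.0 * len(counts) / len(marker_set)
    duplicated = sum(1 for c in counts.values() if c > 1)
    return completeness, duplicated


markers = {f"m{i}" for i in range(8)}  # hypothetical 8-gene marker set
hits = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m6", "unrelated"]
print(marker_completeness(hits, markers))
```

    In this toy example, 7 of 8 markers are present (87.5%) and one marker is duplicated. Hits outside the marker set are ignored.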

  16. Evolutionary insights from suffix array-based genome sequence analysis

    Indian Academy of Sciences (India)

    Anindya Poddar; Nagasuma Chandra; Madhavi Ganapathiraju; K Sekar; Judith Klein-Seetharaman; Raj Reddy; N Balakrishnan

    2007-08-01

    Gene and protein sequence analyses, central components of studies in modern biology, are easily amenable to string matching and pattern recognition algorithms. The growing need for analysing whole genome sequences more efficiently and thoroughly has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures that are well known in many other areas and are also highly suited to sequence analysis. Here we report an improvement to the design of suffix array construction. The enhanced versatility and scalability enabled by this approach are demonstrated through real-life examples. The scalability of the algorithm to whole genomes renders it suitable for addressing many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bigrams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for their coverage of the possible peptide space, which indicates that as much as a quarter of the total space at the tetrapeptide level is left unsampled in prokaryotic organisms, although almost all tripeptides can be seen in one protein or another in a proteome. In addition, distinct patterns begin to emerge for the counts of particular tetrapeptides and higher peptides, indicative of a ‘meaning’ for tetra- and higher n-grams. The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of Mycobacterium tuberculosis H37Rv, have been found to contain a repeating sequence of 300 amino acids.
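    The n-gram analysis described above relies on a useful property of suffix arrays: all suffixes sharing a prefix are adjacent, so n-grams can be counted in one pass. This sketch uses a naive sort-based construction, not the authors' improved algorithm. It builds temporary suffix slices, so it is only suitable for small inputs. The function names are hypothetical.

```python
from itertools import groupby

def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def ngram_counts(text, n, sa=None):
    """Count n-grams by grouping adjacent suffixes with the same n-character prefix.

    Suffixes shorter than n are skipped. Equal prefixes are contiguous in a
    suffix array, so a single linear pass with groupby suffices.
    """
    sa = suffix_array(text) if sa is None else sa
    prefixes = (text[i:i + n] for i in sa if len(text) - i >= n)
    return {gram: sum(1 for _ in grp) for gram, grp in groupby(prefixes)}


print(suffix_array("banana"))     # [5, 3, 1, 0, 4, 2]
print(ngram_counts("banana", 2))  # {'an': 2, 'ba': 1, 'na': 2}
```

    Counting peptides works the same way on a protein string. A tetrapeptide that never appears as a 4-character prefix in the array is part of the unsampled space.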

  17. PBrowse: a web-based platform for real-time collaborative exploration of genomic data.

    Science.gov (United States)

    Szot, Peter S; Yang, Andrian; Wang, Xin; Parsania, Chirag; Röhm, Uwe; Wong, Koon Ho; Ho, Joshua W K

    2017-05-19

    Genome browsers are widely used for individually exploring various types of genomic data. A handful of genome browsers offer limited tools for collaboration among multiple users. Here, we describe PBrowse, an integrated real-time collaborative genome browser that enables multiple users to simultaneously view and access genomic data, thereby harnessing the wisdom of the crowd. PBrowse is based on the Dalliance genome browser and has a re-designed user and data management system with novel collaborative functionalities, including real-time collaborative view, track comment and an integrated group chat feature. Through the Distributed Annotation Server protocol, PBrowse can easily access a wide range of publicly available genomic data, such as the ENCODE data sets. We argue that PBrowse represents a paradigm shift from using a genome browser as a static data visualization tool to a platform that enables real-time human-human interaction and knowledge exchange in a collaborative setting. PBrowse is available at http://pbrowse.victorchang.edu.au, and its source code is available via an open source BSD 3 license at http://github.com/VCCRI/PBrowse. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Genome-wide prediction, display and refinement of binding sites with information theory-based models

    Directory of Open Access Journals (Sweden)

    Leeder J Steven

    2003-09-01

    Background: We present Delila-genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined) weight matrices. Matrices are refined by adding experimentally defined binding sites to published binding sites. Delila-Genome was used to examine the accuracy of individual information contents of binding sites detected with refined matrices as a measure of the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices. Results: Parameters for genome scans are entered using a Java-based GUI and backend scripts in Perl. Multi-processor CPU load-sharing minimized the average response time for scans of different chromosomes. Scans of human genome assemblies required 4–6 hours for transcription factor binding sites and 10–19 hours for splice sites, respectively, on 24- and 3-node Mosix and Beowulf clusters. Individual binding sites are displayed either as high-resolution sequence walkers or in low-resolution custom tracks in the UCSC genome browser. For large datasets, we applied a data reduction strategy that limited displays of binding sites exceeding a threshold information content to specific chromosomal regions within or adjacent to genes. An HTML document is produced listing binding sites ranked by binding site strength or chromosomal location, hyperlinked to the UCSC custom track, other annotation databases and binding site sequences. Post-genome scan tools parse binding site annotations of selected chromosome intervals and compare the results of genome scans using different weight matrices. Comparisons of multiple genome scans can display binding sites that are unique to each scan and identify sites with significantly altered binding strengths
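    The scan-and-score step can be sketched as follows, assuming Python. Each weight-matrix entry is 2 + log2 f(b, l), the bits contributed by base b at position l of aligned sites, and a candidate site's individual information is the sum over its positions. Two simplifications are assumptions of this sketch rather than Delila's method: a pseudocount stands in for the small-sample correction, and only the forward strand is scanned. Function names and the threshold are illustrative.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=0.5):
    """Per-position weights 2 + log2 f(b, l) from aligned, equal-length uppercase sites."""
    matrix = []
    for l in range(len(sites[0])):
        column = [site[l] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({b: 2 + math.log2((column.count(b) + pseudocount) / total) for b in BASES})
    return matrix


def individual_information(matrix, site):
    """Individual information (bits) of one candidate site: sum of its per-position weights."""
    return sum(column[base] for column, base in zip(matrix, site))


def scan(sequence, matrix, threshold):
    """(position, bits) for each forward-strand window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(b not in BASES for b in window):  # skip windows containing N or other symbols
            continue
        bits = individual_information(matrix, window)
        if bits >= threshold:
            hits.append((i, bits))
    return hits
```

    Refinement, in this sketch, amounts to rebuilding the matrix from the published sites plus newly validated ones and calling `scan` again.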

  19. Event-based text mining for biology and functional genomics

    Science.gov (United States)

    Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

    2015-01-01

    The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365

  20. Event-based text mining for biology and functional genomics.

    Science.gov (United States)

    Ananiadou, Sophia; Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B

    2015-05-01

    The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of 'events', i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research.

  1. Generation of a BAC-based physical map of the melon genome

    Directory of Open Access Journals (Sweden)

    Puigdomènech Pere

    2010-05-01

    Background: Cucumis melo (melon) belongs to the Cucurbitaceae family, whose economic importance among horticultural crops is second only to that of the Solanaceae. Melon has high intra-specific genetic variation, morphologic diversity and a small genome size (450 Mb), which make this species suitable for a great variety of molecular and genetic studies that can lead to the development of tools for breeding varieties of the species. A number of genetic and genomic resources have already been developed, such as several genetic maps and BAC genomic libraries. These tools are essential for the construction of a physical map, a valuable resource for map-based cloning, comparative genomics and assembly of whole-genome sequencing data. However, no physical map of any Cucurbitaceae has yet been developed. A project has recently been started to sequence the complete melon genome following a whole-genome shotgun strategy, which makes use of massive sequencing data. A BAC-based melon physical map will be a useful tool to help assemble and refine the draft genome data being produced. Results: A melon physical map was constructed using a 5.7× BAC library and a genetic map previously developed in our laboratories. High-information-content fingerprinting (HICF) was carried out on 23,040 BAC clones, digesting with five restriction enzymes and SNaPshot labeling, followed by contig assembly with FPC software. The physical map has 1,355 contigs and 441 singletons, with an estimated physical length of 407 Mb (0.9× coverage of the genome), the longest contig being 3.2 Mb. The anchoring of 845 BAC clones to 178 genetic markers (100 RFLPs, 76 SNPs and 2 SSRs) also allowed the genetic positioning of 183 physical map contigs/singletons, representing 55 Mb (12% of the melon genome), to individual chromosomal loci. The melon FPC database is available for download at http://melonomics.upv.es/static/files/public/physical_map/. Conclusions: Here we report the construction

  2. Applications of Genome-based Science in Shaping Citrus Industries of the World (JGI Seventh Annual User Meeting, 2012: Genomics of Energy and Environment)

    Energy Technology Data Exchange (ETDEWEB)

    Gmitter Jr, Fred [University of Florida]

    2012-03-21

    Fred Gmitter from the University of Florida on "Applications of Genome-based Science in Shaping the Future of the World's Citrus Industries" at the 7th Annual Genomics of Energy & Environment Meeting on March 21, 2012 in Walnut Creek, California.

  3. A BAC-based physical map of the Drosophila buzzatii genome

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez, Josefa; Nefedov, Michael; Bosdet, Ian; Casals, Ferran; Calvete, Oriol; Delprat, Alejandra; Shin, Heesun; Chiu, Readman; Mathewson, Carrie; Wye, Natasja; Hoskins, Roger A.; Schein, Jacqueline E.; de Jong, Pieter; Ruiz, Alfredo

    2005-03-18

    Large-insert genomic libraries facilitate cloning of large genomic regions, allow the construction of clone-based physical maps and provide useful resources for sequencing entire genomes. Drosophila buzzatii is a representative species of the repleta group in the Drosophila subgenus, which is widely used as a model in studies of genome evolution, ecological adaptation and speciation. We constructed a Bacterial Artificial Chromosome (BAC) genomic library of D. buzzatii using the shuttle vector pTARBAC2.1. The library comprises 18,353 clones with an average insert size of 152 kb and a ~18X expected representation of the D. buzzatii euchromatic genome. We screened the entire library with six euchromatic gene probes and estimated the actual genome representation to be ~23X. In addition, we fingerprinted a sample of 9,555 clones by restriction digestion and agarose gel electrophoresis, and assembled them, using Finger Printed Contigs (FPC) software and manual editing, into 345 contigs (mean of 26 clones per contig) and 670 singletons. Finally, we anchored 181 large contigs (containing 7,788 clones) to the D. buzzatii salivary gland polytene chromosomes by in situ hybridization of 427 representative clones. The BAC library and a database with all the information regarding the high-coverage BAC-based physical map described in this paper are available to the research community.
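    The library statistics follow from standard coverage arithmetic, sketched here in Python: expected representation is clones × mean insert size / genome size, the Poisson approximation gives the chance that a locus is absent from every clone, and the Clarke-Carbon formula gives the number of clones needed to include a locus with a chosen probability. These are textbook formulas, not necessarily the exact calculation the authors used; the function names are illustrative.

```python
import math


def expected_coverage(n_clones, insert_kb, genome_kb):
    """Genome equivalents in a library: clones x mean insert size / genome size."""
    return n_clones * insert_kb / genome_kb


def prob_locus_missing(coverage):
    """Poisson approximation: chance that a given locus lies in no clone."""
    return math.exp(-coverage)


def clones_needed(prob, insert_kb, genome_kb):
    """Clarke-Carbon estimate of clones needed to contain a given locus with probability prob."""
    return math.ceil(math.log(1 - prob) / math.log(1 - insert_kb / genome_kb))
```

    The abstract does not state the euchromatic genome size, but its own numbers imply one: 18,353 clones × 152 kb ≈ 2.79 Gb of cloned DNA, which at the stated ~18X corresponds to roughly 155 Mb. At 18X, `prob_locus_missing(18)` is about 1.5 × 10⁻⁸.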

  4. Integrated genome-based studies of Shewanella ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Segre, Daniel; Beg, Qasim

    2012-02-14

    This project was a component of the Shewanella Federation and, as such, contributed to the overall goal of applying genomic tools to better understand the ecophysiology and speciation of respiratory-versatile members of the genus Shewanella. Our role at Boston University was to perform bioreactor experiments and high-throughput gene expression microarrays, and to combine dynamic flux balance modeling with experimentally obtained transcriptional and gene expression datasets from different growth conditions. In the first part of the project, we designed the S. oneidensis microarray probes for Affymetrix Inc. (based in California); we then identified the pathways of carbon utilization in the metal-reducing marine bacterium Shewanella oneidensis MR-1, using our newly designed high-density oligonucleotide Affymetrix microarray on Shewanella cells grown with various carbon sources. Next, using a combination of experimental and computational approaches, we built algorithms and methods to integrate the transcriptional and metabolic regulatory networks of S. oneidensis. Specifically, we combined mRNA microarray and metabolite measurements with statistical inference and dynamic flux balance analysis (dFBA) to study the transcriptional response of S. oneidensis MR-1 as it passes through exponential, stationary, and transition phases. By measuring time-dependent mRNA expression levels during batch growth of S. oneidensis MR-1 under two radically different nutrient compositions (minimal lactate and nutritionally rich LB medium), we obtained detailed snapshots of the regulatory strategies this bacterium uses to cope with gradually changing nutrient availability. In addition to traditional clustering, which provides a first indication of major regulatory trends and transcription factor activities, we developed and implemented a new computational approach for Dynamic Detection of Transcriptional Triggers (D2T2). This new method allows us to infer a putative topology of transcriptional dependencies

  5. Intimate evolution of proteins. Proteome atomic content correlates with genome base composition.

    Science.gov (United States)

    Baudouin-Cornu, Peggy; Schuerer, Katja; Marlière, Philippe; Thomas, Dominique

    2004-02-13

    Discerning the significant relations that exist within and among genome sequences is a major step toward modeling biopolymer evolution. Here we report a systematic analysis of the atomic composition of proteins encoded by organisms representative of each kingdom. Protein atomic contents are shown to vary widely among species, with the largest variations observed for the main architectural component of proteins, the carbon atom. These variations apply to bulk proteins as well as to subsets of orthologous proteins. A pronounced correlation between proteome carbon content and genome base composition is further demonstrated, with high genomic G+C content associated with low protein carbon content. The generation of random proteomes and the examination of the canonical genetic code provide arguments for the hypothesis that natural selection might have driven genome base composition.
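    The two quantities being correlated can be computed as in this Python sketch: mean carbon atoms per residue of a proteome, and G+C fraction of a genome. The table counts every carbon in each amino-acid residue (peptide bond formation removes only H2O, so the count matches the free amino acid); whether the paper counts whole residues or side chains only is not stated in the abstract, so a side-chain option (total minus the two backbone carbons) is included as an assumption.

```python
# Carbon atoms per amino-acid residue (whole residue; the backbone contributes 2 of these).
CARBONS = {
    "A": 3, "R": 6, "N": 4, "D": 4, "C": 3, "Q": 5, "E": 5, "G": 2, "H": 6, "I": 6,
    "L": 6, "K": 6, "M": 5, "F": 9, "P": 5, "S": 3, "T": 4, "W": 11, "Y": 9, "V": 5,
}


def carbon_per_residue(proteins, side_chain_only=False):
    """Mean carbon atoms per standard residue across all protein sequences."""
    total_carbon, n_residues = 0, 0
    for seq in proteins:
        for aa in seq:
            if aa in CARBONS:  # ignore non-standard symbols such as X or *
                total_carbon += CARBONS[aa] - (2 if side_chain_only else 0)
                n_residues += 1
    return total_carbon / n_residues


def gc_content(genome):
    """Fraction of G or C among unambiguous A/C/G/T bases."""
    acgt = [b for b in genome.upper() if b in "ACGT"]
    return sum(b in "GC" for b in acgt) / len(acgt)
```

    Computing both values for each organism and correlating them across species is one way to examine the reported relation between high G+C content and low protein carbon content.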

  6. cDNA-AFLP-based genetical genomics in cotton fibers.

    Science.gov (United States)

    Claverie, Michel; Souquet, Marlène; Jean, Janine; Forestier-Chiron, Nelly; Lepitre, Vincent; Pré, Martial; Jacobs, John; Llewellyn, Danny; Lacape, Jean-Marc

    2012-03-01

    Genetical genomics, or genetic analysis applied to gene expression data, has not been widely used in plants. We used quantitative cDNA-AFLP to monitor the variation in the expression level of cotton fiber transcripts among a population of inter-specific Gossypium hirsutum × G. barbadense recombinant inbred lines (RILs). Two key fiber developmental stages, elongation (10 days post anthesis, dpa), and secondary cell wall thickening (22 dpa), were studied. Normalized intensity ratios of 3,263 and 1,201 transcript-derived fragments (TDFs) segregating over 88 RILs were analyzed for quantitative trait loci (QTL) mapping for the 10 and 22 dpa fibers, respectively. Two-thirds of all TDFs mapped between 1 and 6 eQTLs (LOD > 3.5). Chromosome 21 had a higher density of eQTLs than other chromosomes in both data sets and, within chromosomes, hotspots of presumably trans-acting eQTLs were identified. The eQTL hotspots were compared to the location of phenotypic QTLs for fiber characteristics among the RILs, and several cases of co-localization were detected. Quantitative RT-PCR for 15 sequenced TDFs showed that 3 TDFs had at least one eQTL at a similar location to those identified by cDNA-AFLP, while 3 other TDFs mapped an eQTL at a similar location but with opposite additive effect. In conclusion, cDNA-AFLP proved to be a cost-effective and highly transferable platform for genome-wide and population-wide gene expression profiling. Because TDFs are anonymous, further validation and interpretation (in silico analysis, qPCR gene profiling) of the eQTL and eQTL hotspots will be facilitated by the increasing availability of cDNA and genomic sequence resources in cotton.
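    The abstract does not say which QTL mapping method was used, so the following Python sketch assumes the simplest one, single-marker regression. For one TDF, normalized intensity ratios across RILs are compared between genotype classes at a marker, and LOD = (n/2) log10(RSS0/RSS1), where RSS0 and RSS1 are residual sums of squares without and with a genotype effect. The LOD > 3.5 threshold is the one quoted above; function names are illustrative.

```python
import math


def rss(values):
    """Residual sum of squares around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def lod_single_marker(expression, genotypes):
    """LOD = (n/2) log10(RSS0 / RSS1); assumes non-zero variance within genotype classes."""
    groups = {}
    for value, genotype in zip(expression, genotypes):
        groups.setdefault(genotype, []).append(value)
    rss0 = rss(expression)
    rss1 = sum(rss(vals) for vals in groups.values())
    return len(expression) / 2 * math.log10(rss0 / rss1)


def scan_markers(expression, marker_genotypes, threshold=3.5):
    """(marker index, LOD) for each marker where one TDF's LOD exceeds threshold."""
    hits = []
    for idx, genotypes in enumerate(marker_genotypes):
        lod = lod_single_marker(expression, genotypes)
        if lod > threshold:
            hits.append((idx, lod))
    return hits
```

    Interval or composite interval mapping would additionally test positions between markers and control for other QTLs.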

  7. Prokaryotic Phylogeny Based on Complete Genomes Without Sequence Alignment

    Science.gov (United States)

    Hao, Bailin; Qi, Ji; Wang, Bin

    2003-04-01

    This is a brief review of an ongoing series of studies on bacterial phylogeny. We have proposed a new method to infer the relatedness of prokaryotes from their complete genome data without using sequence alignment. It has led to results comparable with bacteriologists' systematics as reflected in the latest (2001) edition of Bergey's Manual of Systematic Bacteriology. In what follows we touch only on the mathematical aspects of the method. The biological implications of our results will be published elsewhere.
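    The abstract omits the mathematics, so the following is an assumption rather than the paper's text: a Python sketch of the composition-vector approach commonly associated with this group (CVTree). Each K-mer frequency p is compared with the value p0 predicted from its (K−1)- and (K−2)-mers by a Markov model, the relative deviations (p − p0)/p0 form a composition vector, and two genomes are separated by D = (1 − C)/2, where C is the cosine correlation of their vectors. Published applications typically use concatenated proteomes with K around 5 or 6; the short strings in the test are only for illustration.

```python
import math
from collections import Counter, defaultdict


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers in seq."""
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def composition_vector(seq, k):
    """Relative deviation (p - p0) / p0 of each k-mer from its Markov prediction p0."""
    if k < 3:
        raise ValueError("k must be at least 3")
    pk, pk1, pk2 = kmer_freqs(seq, k), kmer_freqs(seq, k - 1), kmer_freqs(seq, k - 2)
    by_prefix = defaultdict(list)  # (k-1)-mers grouped by their first k-2 letters
    for v in pk1:
        by_prefix[v[:-1]].append(v)
    vector = {}
    for u in pk1:  # any k-mer with p0 > 0 is u extended by the last letter of an overlapping v
        for v in by_prefix[u[1:]]:
            w = u + v[-1]
            p0 = pk1[u] * pk1[v] / pk2[u[1:]]
            vector[w] = (pk.get(w, 0.0) - p0) / p0
    return vector


def cv_distance(seq_a, seq_b, k):
    """D = (1 - C) / 2, with C the cosine correlation of the two composition vectors."""
    a, b = composition_vector(seq_a, k), composition_vector(seq_b, k)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values()) * sum(x * x for x in b.values()))
    correlation = dot / norm if norm else 0.0
    return (1 - correlation) / 2
```

    Subtracting the Markov prediction is the design choice that distinguishes this from plain k-mer counting: it suppresses the background composition shaped by random mutation so that the remaining signal is more informative about relatedness.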

  8. Recombination analysis based on the complete genome of bocavirus

    Directory of Open Access Journals (Sweden)

    Chen Shengxia

    2011-04-01

    Bocaviruses include bovine parvovirus, minute virus of canines, porcine bocavirus, gorilla bocavirus, and human bocaviruses 1-4 (HBoVs). Although recent reports showed that recombination occurs in bocaviruses, no study had systematically investigated bocavirus recombination. The present study performed phylogenetic and recombination analyses of bocaviruses using the complete genomes available in GenBank. The results confirmed that recombination exists among bocaviruses, including likely inter-genotype recombination between HBoV1 and HBoV4 and intra-genotype recombination among HBoV2 variants. Moreover, this is the first report revealing recombination between minute viruses of canines.
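    The abstract does not name the recombination-detection methods used. One common exploratory step, in the spirit of similarity plots, can be sketched in Python under that assumption: slide a window along an alignment, compute the query's identity to each candidate parent, and look for positions where the best-matching parent switches, which hints at a breakpoint. Sequence names are hypothetical, and formal statistical tests would be needed to confirm any signal.

```python
def window_identity(query, subject, window, step):
    """(window midpoint, identity) along two aligned sequences; gap columns are ignored."""
    if len(query) != len(subject):
        raise ValueError("sequences must come from the same alignment")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [(a, b) for a, b in zip(query[start:start + window], subject[start:start + window])
                 if a != "-" and b != "-"]
        identity = sum(a == b for a, b in pairs) / len(pairs) if pairs else float("nan")
        profile.append((start + window // 2, identity))
    return profile


def best_parent_by_window(query, parents, window, step):
    """For each window, the candidate parent with the highest identity to query."""
    profiles = {name: window_identity(query, seq, window, step) for name, seq in parents.items()}
    names = list(parents)
    calls = []
    for i, (midpoint, _) in enumerate(profiles[names[0]]):
        best = max(names, key=lambda name: profiles[name][i][1])
        calls.append((midpoint, best))
    return calls
```

    A run of windows assigned to one parent followed by a run assigned to another, as in the test strings, is the pattern such a scan is meant to surface.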

  9. Sequence based polymorphic (SBP) marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome

    Directory of Open Access Journals (Sweden)

    Sahu Binod B

    2012-01-01

    Background: Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advances in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs) for any genomic region. Here a sequence based polymorphic (SBP) marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results: A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype Niederzenz (Nd-0) was obtained by applying Illumina's sequencing-by-synthesis (Solexa) technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0) genome sequence identified putative SNPs throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNP-containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers representing polymorphisms between the Col-0 and Nd-0 ecotypes were generated for the marker-poor regions of the Arabidopsis map. Conclusions: The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species whose genomic sequence has been assembled. This technology will particularly facilitate the development of high-density molecular marker maps, essential for
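    The restriction-site screening step can be sketched in Python, assuming the Col-0 and Nd-0 amplicon sequences are already aligned and differ only by SNPs: an enzyme is informative when a SNP creates or destroys its recognition site, so the two digested amplicons give different band patterns on a gel. The four-enzyme table and the sequences are illustrative, not taken from the study.

```python
# A few well-known recognition sites; all four are palindromic, so scanning one strand finds
# sites on both strands. A real pipeline would use a full enzyme database.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}


def site_positions(seq, motif):
    """Start positions of every (possibly overlapping) occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def polymorphic_enzymes(allele_a, allele_b, enzymes=ENZYMES):
    """Enzymes whose cut-site pattern differs between two equal-length (SNP-only) amplicons."""
    if len(allele_a) != len(allele_b):
        raise ValueError("expected equal-length alleles differing only by SNPs")
    return sorted(name for name, motif in enzymes.items()
                  if site_positions(allele_a, motif) != site_positions(allele_b, motif))
```

    Because the marker is scored from band patterns of both alleles, each candidate enzyme yields a co-dominant marker in the sense the abstract describes.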

  10. Regulatory hurdles for genome editing: process- vs. product-based approaches in different regulatory contexts

    OpenAIRE

    Sprink, Thorben; Eriksson, Dennis; Schiemann, Joachim; Hartung, Frank

    2016-01-01

    Novel plant genome editing techniques call for updated legislation regulating the use of plants produced by genetic engineering or genome editing, especially in the European Union. Established more than 25 years ago and based on a clear distinction between transgenic and conventionally bred plants, the current EU Directives fail to accommodate the new continuum between genetic engineering and conventional breeding. Despite the fact that the Directive 2001/18/EC contains both process- and p...

  11. Implementing Genomic Clinical Decision Support for Drug‐Based Precision Medicine

    Science.gov (United States)

    Formea, CM; Hoffman, JM; Matey, E; Peterson, JF; Boyce, RD

    2017-01-01

    The explosive growth of patient‐specific genomic information relevant to drug therapy will continue to be a defining characteristic of biomedical research. To implement drug‐based personalized medicine (PM) for patients, clinicians need actionable information incorporated into electronic health records (EHRs). New clinical decision support (CDS) methods and informatics infrastructure are required in order to comprehensively integrate, interpret, deliver, and apply the full range of genomic data for each patient. PMID:28109071

  12. A versatile genome-scale PCR-based pipeline for high-definition DNA FISH.

    Science.gov (United States)

    Bienko, Magda; Crosetto, Nicola; Teytelman, Leonid; Klemm, Sandy; Itzkovitz, Shalev; van Oudenaarden, Alexander

    2013-02-01

    We developed a cost-effective genome-scale PCR-based method for high-definition DNA FISH (HD-FISH). We visualized gene loci with diffraction-limited resolution, chromosomes as spot clusters and single genes together with transcripts by combining HD-FISH with single-molecule RNA FISH. We provide a database of over 4.3 million primer pairs targeting the human and mouse genomes that is readily usable for rapid and flexible generation of probes.

  13. BioViews: Java-based tools for genomic data visualization.

    Science.gov (United States)

    Helt, G A; Lewis, S; Loraine, A E; Rubin, G M

    1998-03-01

    Visualization tools for bioinformatics ideally should provide universal access to the most current data in an interactive and intuitive graphical user interface. Since the introduction of Java, a language designed for distributed programming over the Web, the technology now exists to build a genomic data visualization tool that meets these requirements. Using Java we have developed a prototype genome browser applet (BioViews) that incorporates a three-level graphical view of genomic data: a physical map, an annotated sequence map, and a DNA sequence display. Annotated biological features are displayed on the physical and sequence-based maps, and the different views are interconnected. The applet is linked to several databases and can retrieve features and display hyperlinked textual data on selected features. In addition to browsing genomic data, different types of analyses can be performed interactively and the results of these analyses visualized alongside prior annotations. Our genome browser is built on top of extensible, reusable graphic components specifically designed for bioinformatics. Other groups can (and do) reuse this work in various ways. Genome centers can reuse large parts of the genome browser with minor modifications, bioinformatics groups working on sequence analysis can reuse components to build front ends for analysis programs, and biology laboratories can reuse components to publish results as dynamic Web documents.

  14. Toward Genomics-Based Breeding in C3 Cool-Season Perennial Grasses.

    Science.gov (United States)

    Talukder, Shyamal K; Saha, Malay C

    2017-01-01

    Most of the world's important food and feed crops belong to the C3 grass family. The future of food security relies heavily on achieving genetic gains in those grasses. Conventional breeding methods have already reached a plateau for improving major crops. Genomics tools and resources have opened an avenue to explore genome-wide variability and to use that variation to enhance genetic gains in breeding programs. Major C3 annual cereal breeding programs are well equipped with genomic tools; however, genomic research on C3 cool-season perennial grasses is lagging behind. In this review, we discuss the currently available genomics tools and approaches useful for C3 cool-season perennial grass breeding. Along with a general review, we focus the discussion on forage grasses that have been considered orphan crops, with little or no genetic information available. Transcriptome sequencing and genotyping-by-sequencing technology for genome-wide marker detection using next-generation sequencing (NGS) are very promising genomics tools. Most C3 cool-season perennial grasses have no prior genetic information; thus, NGS technology will enhance collinearity studies with other C3 model grasses such as Brachypodium and rice. Transcriptomics data can be used to identify functional genes and molecular markers, i.e., polymorphism markers and simple sequence repeats (SSRs). Genome-wide association studies with NGS-based markers will facilitate marker identification for marker-assisted selection. With limited genetic information available, genomic selection holds great promise for breeders seeking maximum genetic gain in cool-season C3 perennial grasses. Applying all these tools can ensure better genetic gains, shorten selection cycles, and facilitate cultivar development to meet the future demand for food and fodder.

  15. Toward Genomics-Based Breeding in C3 Cool-Season Perennial Grasses

    Directory of Open Access Journals (Sweden)

    Shyamal K. Talukder

    2017-07-01

    Most of the world's important food and feed crops belong to the C3 grass family. The future of food security relies heavily on achieving genetic gains in those grasses. Conventional breeding methods have already reached a plateau for improving major crops. Genomics tools and resources have opened an avenue to explore genome-wide variability and to use that variation to enhance genetic gains in breeding programs. Major C3 annual cereal breeding programs are well equipped with genomic tools; however, genomic research on C3 cool-season perennial grasses is lagging behind. In this review, we discuss the currently available genomics tools and approaches useful for C3 cool-season perennial grass breeding. Along with a general review, we focus the discussion on forage grasses that have been considered orphan crops, with little or no genetic information available. Transcriptome sequencing and genotyping-by-sequencing technology for genome-wide marker detection using next-generation sequencing (NGS) are very promising genomics tools. Most C3 cool-season perennial grasses have no prior genetic information; thus, NGS technology will enhance collinearity studies with other C3 model grasses such as Brachypodium and rice. Transcriptomics data can be used to identify functional genes and molecular markers, i.e., polymorphism markers and simple sequence repeats (SSRs). Genome-wide association studies with NGS-based markers will facilitate marker identification for marker-assisted selection. With limited genetic information available, genomic selection holds great promise for breeders seeking maximum genetic gain in cool-season C3 perennial grasses. Applying all these tools can ensure better genetic gains, shorten selection cycles, and facilitate cultivar development to meet the future demand for food and fodder.

  16. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes

    DEFF Research Database (Denmark)

    Bohlin, J; Skjerve, E; Ussery, David

    2008-01-01

    BACKGROUND: The increasing number of sequenced prokaryotic genomes contains a wealth of genomic data that needs to be effectively analysed. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned......, or be based on specific statistical distributions. Advantages with these statistical methods include measurements of phylogenetic relationship with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore...... measure was a good measure to detect horizontally transferred regions, and when used to compare the phylogenetic relationships between plasmids and hosts, significant correlation (R2 = 0.4) was found with genomic GC content and intra-chromosomal homogeneity. CONCLUSION: The statistical methods examined...
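    The abstract is elided where the specific statistics are listed, so the following is an assumption rather than a summary of the paper: a Python sketch of one widely used oligonucleotide-frequency statistic, Karlin's relative dinucleotide abundance ρ_XY = f_XY / (f_X f_Y), together with δ, the mean absolute difference between two sequences' ρ profiles. For brevity it counts one strand only; Karlin's symmetrized version (ρ*) also counts the reverse complement.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def dinucleotide_relative_abundance(seq):
    """rho_XY = f_XY / (f_X * f_Y) for each dinucleotide, counted on one strand only."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in NUCLEOTIDES)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES)
    n1, n2 = sum(mono.values()), sum(di.values())
    rho = {}
    for x, y in product(NUCLEOTIDES, repeat=2):
        if mono[x] and mono[y] and n2:  # skip dinucleotides whose bases never occur
            rho[x + y] = (di[x + y] / n2) / ((mono[x] / n1) * (mono[y] / n1))
    return rho


def genomic_signature_difference(seq_a, seq_b):
    """delta = mean |rho_XY(a) - rho_XY(b)| over dinucleotides defined in both sequences."""
    ra, rb = dinucleotide_relative_abundance(seq_a), dinucleotide_relative_abundance(seq_b)
    shared = ra.keys() & rb.keys()
    return sum(abs(ra[k] - rb[k]) for k in shared) / len(shared)
```

    Computed for windows along a chromosome, a δ that is unusually large relative to the whole-genome profile is one way to flag candidate horizontally transferred regions, which matches the kind of use the abstract reports for such measures.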

  17. Application of Microarray-Based Comparative Genomic Hybridization in Prenatal and Postnatal Settings: Three Case Reports

    Directory of Open Access Journals (Sweden)

    Jing Liu

    2011-01-01

    Microarray-based comparative genomic hybridization (array CGH) is a newly emerged molecular cytogenetic technique for rapid evaluation of the entire genome at sub-megabase resolution. It allows comprehensive investigation of thousands to millions of genomic loci at once and therefore enables efficient detection of DNA copy number variations (a.k.a. cryptic genomic imbalances). The development and clinical application of array CGH have revolutionized the diagnostic process and have provided clues to many unidentified or unexplained diseases suspected to have a genetic cause. In this paper, we present three clinical cases in prenatal and postnatal settings. In all three, array CGH played a major role in revealing the cryptic and/or complex nature of the chromosome arrangements. By identifying the genetic causes responsible for the clinical observations in these patients, array CGH provided accurate diagnosis and enabled appropriate clinical management in a timely and efficient manner.
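    The basic copy-number readout can be sketched in Python: each probe's test/reference signal becomes a log2 ratio (in a diploid genome a single-copy gain gives about log2(3/2) ≈ 0.58 and a single-copy loss log2(1/2) = −1), and runs of consecutive probes beyond a threshold are reported as candidate gains or losses. The ±0.3 thresholds and three-probe minimum are illustrative assumptions; clinical pipelines use dedicated segmentation algorithms (for example circular binary segmentation) and validated cut-offs.

```python
import math


def log2_ratios(test, reference):
    """Per-probe log2(test / reference) signal ratios."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def call_segments(ratios, gain=0.3, loss=-0.3, min_probes=3):
    """(first_probe, last_probe, 'gain' or 'loss') for runs of >= min_probes beyond a threshold."""
    calls = ["gain" if r >= gain else "loss" if r <= loss else "normal" for r in ratios]
    segments = []
    start = 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[start]:  # the current run ends at probe i - 1
            if calls[start] != "normal" and i - start >= min_probes:
                segments.append((start, i - 1, calls[start]))
            start = i
    return segments
```

    Requiring several consecutive probes is a simple guard against single-probe noise; it also sets the smallest imbalance the sketch can report, which in practice depends on probe spacing.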

  18. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2014-10-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
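    A minimal Python sketch of one kernel approach of the kind this review covers, RKHS (kernel ridge) regression with a Gaussian kernel on marker genotypes: solve (K + λI)α = y − ȳ and predict ŷ(x) = ȳ + Σᵢ K(x, xᵢ)αᵢ. Coding markers as 0/1/2, the bandwidth θ and the regularization λ are assumptions of this sketch; in practice they are tuned (for example by cross-validation) or estimated in a mixed-model framework. Replacing the Gaussian kernel with a linear one on the markers gives a model closely related to GBLUP.

```python
import math


def gaussian_kernel(x, z, theta):
    """K(x, z) = exp(-theta * squared Euclidean distance between marker vectors)."""
    return math.exp(-theta * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_kernel_ridge(markers, phenotypes, theta=0.1, lam=1.0):
    """Return a predictor of genetic value from training marker codes and phenotypes."""
    mean = sum(phenotypes) / len(phenotypes)
    K = [[gaussian_kernel(xi, xj, theta) for xj in markers] for xi in markers]
    for i in range(len(K)):
        K[i][i] += lam  # ridge penalty on the kernel matrix diagonal
    alpha = solve(K, [y - mean for y in phenotypes])

    def predict(x):
        return mean + sum(a * gaussian_kernel(x, xi, theta) for a, xi in zip(alpha, markers))

    return predict
```

    With a very small λ the fit interpolates the training phenotypes, and with a very large λ every prediction shrinks toward the mean, which is why λ and θ must be chosen on held-out data.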

  19. VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics.

    Science.gov (United States)

    Megy, Karine; Emrich, Scott J; Lawson, Daniel; Campbell, David; Dialynas, Emmanuel; Hughes, Daniel S T; Koscielny, Gautier; Louis, Christos; Maccallum, Robert M; Redmond, Seth N; Sheehan, Andrew; Topalis, Pantelis; Wilson, Derek

    2012-01-01

    VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.

  20. WormBase ParaSite - a comprehensive resource for helminth genomics.

    Science.gov (United States)

    Howe, Kevin L; Bolt, Bruce J; Shafie, Myriam; Kersey, Paul; Berriman, Matthew

    2017-07-01

    The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  1. Kernel-based whole-genome prediction of complex traits: a review

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145
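    The kernel (RKHS) regression idea reviewed above can be sketched as Gaussian-kernel ridge regression on a marker matrix. This is a minimal illustration under stated assumptions, not the authors' implementation: the 0/1/2 marker coding, the Gaussian kernel scaled by the number of markers, the bandwidth `theta`, the penalty `lam`, and the use of the phenotypic mean as a fixed intercept are all assumptions. The coefficients solve (K + λI)α = y − μ, and a new genotype is predicted as μ plus a kernel-weighted sum of the coefficients.

```python
import math

def gaussian_kernel(x, z, theta=1.0):
    """K(x, z) = exp(-theta * squared Euclidean distance / number of markers)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-theta * d2 / len(x))

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def fit(X, y, theta=1.0, lam=1.0):
    """Return (mu, alpha) with mu = mean(y) and (K + lam*I) alpha = y - mu."""
    n = len(X)
    mu = sum(y) / n
    A = [[gaussian_kernel(X[i], X[j], theta) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(A, [yi - mu for yi in y])
    return mu, alpha

def predict(model, X_train, x_new, theta=1.0):
    """Predicted genetic value: mu + sum_i alpha_i * K(x_i, x_new)."""
    mu, alpha = model
    return mu + sum(a * gaussian_kernel(xi, x_new, theta)
                    for a, xi in zip(alpha, X_train))

# Toy data (not from the review): 4 individuals x 3 markers coded 0/1/2.
X = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 0, 2]]
y = [1.0, 2.0, 1.5, 0.5]
model = fit(X, y, theta=1.0, lam=0.5)
print(round(predict(model, X, [1, 0, 2]), 3))
```

    A very small `lam` makes the fit interpolate the training phenotypes, while a very large `lam` shrinks every prediction toward the mean; in practice both `theta` and `lam` would be tuned, for example by cross-validation.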

  2. Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector.

    Science.gov (United States)

    Kabadi, Ami M; Ousterout, David G; Hilton, Isaac B; Gersbach, Charles A

    2014-10-29

    Engineered DNA-binding proteins that manipulate the human genome and transcriptome have enabled rapid advances in biomedical research. In particular, the RNA-guided CRISPR/Cas9 system has recently been engineered to create site-specific double-strand breaks for genome editing or to direct targeted transcriptional regulation. A unique capability of the CRISPR/Cas9 system is multiplex genome engineering by delivering a single Cas9 enzyme and two or more single guide RNAs (sgRNAs) targeted to distinct genomic sites. This approach can be used to simultaneously create multiple DNA breaks or to target multiple transcriptional activators to a single promoter for synergistic enhancement of gene induction. To address the need for uniform and sustained delivery of multiplex CRISPR/Cas9-based genome engineering tools, we developed a single lentiviral system to express a Cas9 variant, a reporter gene and up to four sgRNAs from independent RNA polymerase III promoters that are incorporated into the vector by a convenient Golden Gate cloning method. Each sgRNA is efficiently expressed and can mediate multiplex gene editing and sustained transcriptional activation in immortalized and primary human cells. This delivery system will be significant for realizing the potential of CRISPR/Cas9-based multiplex genome engineering in diverse cell types. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform.

    Science.gov (United States)

    Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh

    2016-01-01

    Background. The gram-negative genus Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation, mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from the National Center for Biotechnology Information (NCBI), annotated using the RAST server, and then stored in the MySQL database. The protein-coding genes were further analyzed with in-house Perl scripts to calculate GC content (%), predicted hydrophobicity and molecular weight (Da). The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house bioinformatics tools implemented in NeisseriaBase were developed using Python, Perl, BioPerl and R. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence Factor

  4. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

    Directory of Open Access Journals (Sweden)

    Wenning Zheng

    2016-03-01

    Background. The gram-negative genus Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation, mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from the National Center for Biotechnology Information (NCBI), annotated using the RAST server, and then stored in the MySQL database. The protein-coding genes were further analyzed with in-house Perl scripts to calculate GC content (%), predicted hydrophobicity and molecular weight (Da). The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house bioinformatics tools implemented in NeisseriaBase were developed using Python, Perl, BioPerl and R. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence

  5. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genomes of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). More next generation sequencing reads aligned to the EquCab2.0 horse genome than to the draft donkey genome, owing to the larger contig and scaffold N50 of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomes and across the whole X chromosome. These regions might be evolutionarily important in equids. By comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies have reported a high frequency of copy number variants. The SNPs we identified constitute a first resource for describing variability at the population genomic level in E. asinus and for establishing monitoring systems for the conservation of donkey genetic resources.

  6. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genomes of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). More next generation sequencing reads aligned to the EquCab2.0 horse genome than to the draft donkey genome, owing to the larger contig and scaffold N50 of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomes and across the whole X chromosome. These regions might be evolutionarily important in equids. By comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies have reported a high frequency of copy number variants. The SNPs we identified constitute a first resource for describing variability at the population genomic level in E. asinus and for establishing monitoring systems for the conservation of donkey genetic resources.
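    A divergence estimate like the one above is, at its simplest, the proportion of differing sites among aligned positions. The sketch below assumes a plain p-distance computed only over positions where both sequences carry A, C, G or T; the abstract does not state the exact site filtering or whether a substitution-model correction was used, so the Jukes-Cantor value is shown only for comparison. The two sequences are toy strings, not real equid data.

```python
import math

VALID = set("ACGT")

def p_distance(seq_a, seq_b):
    """Proportion of differing sites among aligned positions where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID and b in VALID:  # skip gaps and ambiguous calls such as N
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def jukes_cantor(p):
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3), defined for p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy aligned sequences (hypothetical, for illustration only).
horse = "ACGTACGTACGTNACGT"
donkey = "ACGTACCTACGTAACGT"
p = p_distance(horse, donkey)
print(f"p-distance: {100 * p:.2f}%  Jukes-Cantor: {100 * jukes_cantor(p):.2f}%")
```

    At divergences as low as ~0.5%, the corrected and uncorrected distances differ only slightly, which is why a simple proportion is often reported for closely related genomes.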

  7. RadishBase: a database for genomics and genetics of radish.

    Science.gov (United States)

    Shen, Di; Sun, Honghe; Huang, Mingyun; Zheng, Yi; Li, Xixiang; Fei, Zhangjun

    2013-02-01

    Radish is an economically important vegetable crop. During the past several years, large-scale genomics and genetics resources have been accumulated for this species. To store, query, analyze and integrate these radish resources efficiently, we have developed RadishBase (http://bioinfo.bti.cornell.edu/radish), a genomics and genetics database of radish. Currently the database contains radish mitochondrial genome sequences, expressed sequence tag (EST) and unigene sequences and annotations, biochemical pathways, EST-derived single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers, and genetic maps. RadishBase is designed to enable users to easily retrieve and visualize biologically important information through a set of efficient query interfaces and analysis tools, including the BLAST search and unigene annotation query interfaces, and tools to classify unigenes functionally, to identify enriched gene ontology (GO) terms and to visualize genetic maps. A database containing radish pathways predicted from unigene sequences is also included in RadishBase. The tools and interfaces in RadishBase allow efficient mining of recently released and continually expanding large-scale radish genomics and genetics data sets, including the radish genome sequences and RNA-seq data sets.

  8. Microarray-based genomic profiling as a diagnostic tool in acute lymphoblastic leukemia.

    Science.gov (United States)

    Simons, Annet; Stevens-Kroef, Marian; El Idrissi-Zaynoun, Najat; van Gessel, Sabine; Weghuis, Daniel Olde; van den Berg, Eva; Waanders, Esmé; Hoogerbrugge, Peter; Kuiper, Roland; van Kessel, Ad Geurts

    2011-12-01

    In acute lymphoblastic leukemia (ALL), specific genomic abnormalities provide important clinical information. In most routine clinical diagnostic laboratories, conventional karyotyping, in conjunction with targeted screens using e.g., fluorescence in situ hybridization (FISH), is currently considered the gold standard to detect such aberrations. Conventional karyotyping, however, is limited in its resolution and yield, thus hampering the genetic diagnosis of ALL. We explored whether microarray-based genomic profiling would be feasible as an alternative strategy in a routine clinical diagnostic setting. To this end, we compared conventional karyotypes with microarray-deduced copy number aberration (CNA) karyotypes in 60 ALL cases. Microarray-based genomic profiling resulted in a CNA detection rate of 90%, whereas for conventional karyotyping this was 61%. In addition, many small (< 5 Mb) genetic lesions were encountered, frequently harboring clinically relevant ALL-related genes such as CDKN2A/B, ETV6, PAX5, and IKZF1. From our data we conclude that microarray-based genomic profiling serves as a robust tool in the genetic diagnosis of ALL, outperforming conventional karyotyping in CNA detection in terms of both sensitivity and specificity. We also propose a practical workflow for a comprehensive and objective interpretation of CNAs obtained through microarray-based genomic profiling, thereby facilitating its application in a routine clinical diagnostic setting.

  9. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    Science.gov (United States)

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    Genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing the association of genomic variants, including common, low frequency, and rare variants. Current strategies for association studies are well developed for identifying associations of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests, which compare the collective frequency of multiple variants between cases and controls, have shifted the variant-by-variant analysis paradigm used for GWAS of common variants toward collective tests of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing the association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we used large-scale simulations based on whole genome low coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T² statistic, the collapsing method, the multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we applied the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
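    For contrast with the proposed statistic, the collapsing approach listed among the comparators can be sketched as follows: each individual is scored as a carrier if it has at least one minor allele at any variant below a frequency threshold, and carrier counts in cases and controls are compared with a 1-degree-of-freedom Pearson χ² test. This is a generic illustration only; the 1% threshold, the helper names, and the toy genotypes are hypothetical, and it is neither the genome-information content-based statistic nor the exact CMC procedure. The χ² approximation also assumes expected cell counts far larger than in the toy data.

```python
import math

def collapse(genotypes, mafs, threshold=0.01):
    """1 if an individual carries >= 1 minor allele at any variant with MAF < threshold."""
    rare = [j for j, f in enumerate(mafs) if f < threshold]
    return [1 if any(g[j] > 0 for j in rare) else 0 for g in genotypes]

def chi2_2x2(case_carriers, n_cases, ctrl_carriers, n_ctrls):
    """Pearson chi-square (1 df) for carrier status vs case/control status."""
    table = [[case_carriers, n_cases - case_carriers],
             [ctrl_carriers, n_ctrls - ctrl_carriers]]
    n = n_cases + n_ctrls
    rows = [sum(r) for r in table]
    cols = [table[0][c] + table[1][c] for c in range(2)]
    if 0 in rows or 0 in cols:
        raise ValueError("degenerate table: a margin is zero")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Toy minor-allele counts (rows = individuals, columns = variants).
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 2]]
controls = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]
mafs = [0.004, 0.20, 0.008]
case_status = collapse(cases, mafs)
ctrl_status = collapse(controls, mafs)
stat, p = chi2_2x2(sum(case_status), len(cases), sum(ctrl_status), len(controls))
print(case_status, ctrl_status, round(stat, 3), round(p, 4))
```

    Because every rare variant contributes equally to carrier status, this kind of test ignores differences in effect among variant positions, which is the limitation of group tests that the abstract points out.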

  10. Genome reannotation of the lizard Anolis carolinensis based on 14 adult and embryonic deep transcriptomes

    Directory of Open Access Journals (Sweden)

    Eckalbar Walter L

    2013-01-01

    Background. The green anole lizard, Anolis carolinensis, is a key species for both laboratory and field-based studies of evolutionary genetics, development, neurobiology, physiology, behavior, and ecology. As the first non-avian reptilian genome sequenced, A. carolinensis is also a prime reptilian model for comparison with other vertebrate genomes. The public databases of Ensembl and NCBI have provided a first generation gene annotation of the anole genome that relies primarily on sequence conservation with related species. A second generation annotation based on tissue-specific transcriptomes would provide a valuable resource for molecular studies. Results. Here we provide an annotation of the A. carolinensis genome based on de novo assembly of deep transcriptomes of 14 adult and embryonic tissues. This revised annotation describes 59,373 transcripts, compared to 16,533 and 18,939 currently for Ensembl and NCBI, and 22,962 predicted protein-coding genes. A key improvement in this revised annotation is coverage of untranslated region (UTR) sequences, with 79% and 59% of transcripts containing 5’ and 3’ UTRs, respectively. Gaps in genome sequence from the current A. carolinensis build (Anocar2.0) are highlighted by our identification of 16,542 unmapped transcripts, representing 6,695 orthologues, with less than 70% genomic coverage. Conclusions. Incorporation of tissue-specific transcriptome sequence into the A. carolinensis genome annotation has markedly improved its utility for comparative and functional studies. Increased UTR coverage allows for more accurate predicted protein sequence and regulatory analysis. This revised annotation also provides an atlas of gene expression specific to adult and embryonic tissues.

  11. Single gene-based distinction of individual microbial genomes from a mixed population of microbial cells

    Directory of Open Access Journals (Sweden)

    Manu Valtteri Tamminen

    2015-03-01

    Recent progress in environmental microbiology has revealed vast populations of microbes in any given habitat that cannot be detected by conventional culturing strategies. The use of sensitive genetic detection methods such as CARD-FISH and in situ PCR has been limited by the requirement for cell wall permeabilization, which cannot be performed uniformly on all cell types without lysing some cells and leaving others unpermeabilized. Furthermore, the detection of low copy targets, such as genes present in single copies in microbial genomes, has remained problematic. We describe an emulsion-based procedure to trap individual microbial cells in picoliter-volume polyacrylamide droplets that provide a rigid support for genetic material and therefore allow complete degradation of cellular material to expose the individual genomes. The polyacrylamide droplets are subsequently converted into picoliter-scale reactors for genome amplification. The amplified genomes are labelled based on the presence of a target gene and differentiated from those that do not contain the gene by flow cytometry. Using the Escherichia coli strains XL1 and MC1061, which differ with respect to the presence (XL1) or absence (MC1061) of a single copy of a tetracycline resistance gene per genome, we demonstrate that XL1 genomes present at 0.1% of MC1061 genomes can be differentiated using this method. Using a spiked sediment microbial sample, we demonstrate that the method is applicable to highly complex environmental microbial communities as a target gene-based screen for individual microbes. The method provides a novel tool for enumerating functional cell populations in complex microbial communities. We envision that the method could be optimized for fluorescence-activated cell sorting to enrich genetic material of interest from complex environmental samples.

  12. GENOMIC VARIABILITY AMONG CATTLE POPULATIONS BASED ON RUNS OF HOMOZYGOSITY

    Directory of Open Access Journals (Sweden)

    Veronika Šidlová

    2015-09-01

    In this work, the distribution of runs of homozygosity (ROH) of different lengths in six cattle breeds was described. A total of 122 animals from six cattle breeds (Holstein, Simmental, Austrian Pinzgau, Ayrshire, MRI-Meuse Rhine Issel and Slovak Pinzgau) were analysed. The ROH approach was used to distinguish the Slovak Pinzgau population from the other investigated breeds as well as to differentiate between ancient and recent inbreeding. The average number of ROH per animal ranged from 17.06 in Holstein to 159.22 in Ayrshire. The highest number of short ROH (ancient inbreeding) was found in Simmental, followed by Ayrshire. Ayrshire and MRI had a higher proportion of longer ROH distributed across the whole genome, revealing recent inbreeding. ROH were identified and used to estimate molecular inbreeding coefficients (FROH). Among the investigated breeds, the highest level of inbreeding was found in Ayrshire, with the same tendency for all length categories, whereas Slovak Pinzgau showed higher ancient inbreeding. Ancient inbreeding was only observed in the Holstein population. A similar trend is becoming apparent for Slovak Pinzgau, which showed the second-lowest recent inbreeding. Therefore, it is necessary to preserve this population in its original phenotype and to prevent a further increase of inbreeding, especially in endangered breeds.
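    The ROH-based inbreeding coefficient mentioned above is commonly defined as F_ROH = (total length of ROH) / (length of the genome covered by markers). The sketch below is a simplified illustration: it treats a run as consecutive homozygous SNP calls (coded 0 or 2) and applies hypothetical minimum thresholds of 3 SNPs and 1 Mb, whereas dedicated tools typically also tolerate occasional heterozygous or missing calls and apply SNP-density criteria. Restricting `min_len_bp` in `f_roh` to longer runs approximates the recent-inbreeding component, since long runs reflect common ancestors in recent generations.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=1_000_000):
    """Return (start_bp, end_bp) for runs of consecutive homozygous calls (0 or 2)."""
    runs = []
    run_start = None  # index of the first SNP in the current homozygous run

    def close_run(end_idx):
        if run_start is None:
            return
        n_snps = end_idx - run_start + 1
        length = positions[end_idx] - positions[run_start]
        if n_snps >= min_snps and length >= min_length_bp:
            runs.append((positions[run_start], positions[end_idx]))

    for i, g in enumerate(genotypes):
        if g in (0, 2):  # a homozygous call extends (or starts) a run
            if run_start is None:
                run_start = i
        else:  # a heterozygous (1) or missing call ends the run
            close_run(i - 1)
            run_start = None
    close_run(len(genotypes) - 1)
    return runs

def f_roh(runs, covered_genome_bp, min_len_bp=0):
    """F_ROH = total length of ROH >= min_len_bp / genome length covered by markers."""
    total = sum(end - start for start, end in runs if end - start >= min_len_bp)
    return total / covered_genome_bp

# Toy chromosome: 7 SNPs spaced 500 kb apart, one heterozygous call at index 3.
positions = [0, 500_000, 1_000_000, 1_500_000, 2_000_000, 2_500_000, 3_000_000]
genotypes = [0, 2, 0, 1, 2, 2, 0]
runs = find_roh(positions, genotypes)
print(runs, f_roh(runs, 4_000_000))
```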

  13. Whole genome data for omics-based research on the self-fertilizing fish Kryptolebias marmoratus.

    Science.gov (United States)

    Rhee, Jae-Sung; Lee, Jae-Seong

    2014-08-30

    Genome resources have advantages for understanding diverse areas such as biological patterns and functioning of organisms. Omics platforms are useful approaches for the study of organs and organisms. These approaches can be powerful screening tools for whole genome, proteome, and metabolome profiling, and can be used to understand molecular changes in response to internal and external stimuli. This methodology has been applied successfully in freshwater model fish such as the zebrafish Danio rerio and the Japanese medaka Oryzias latipes in research areas such as basic physiology, developmental biology, genetics, and environmental biology. However, information is still scarce about model fish that inhabit brackish water or seawater. To develop the self-fertilizing killifish Kryptolebias marmoratus as a potential model species with unique characteristics and research merits, we obtained genomic information about K. marmoratus. We address ways to use these data for genome-based molecular mechanistic studies. We review the current state of genome information on K. marmoratus to initiate omics approaches. We evaluate the potential applications of integrated omics platforms for future studies in environmental science, developmental biology, and biomedical research. We conclude that information about the K. marmoratus genome will provide a better understanding of the molecular functions of genes, proteins, and metabolites that are involved in the biological functions of this species. Omics platforms, particularly combined technologies that make effective use of bioinformatics, will provide powerful tools for hypothesis-driven investigations and discovery-driven discussions on diverse aspects of this species and on fish and vertebrates in general.

  14. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm

    Science.gov (United States)

    de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.

    2016-01-01

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms through horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. These adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic island prediction have been proposed. However, none of them can precisely predict the complete repertoire of GIs in a genome. The difficulty arises because the performance of different algorithms varies with the diversity of nucleotide composition across species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also novel islands not previously predicted. A detailed investigation of the features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
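    A rough sketch of the clustering step, assuming each genome window is summarized by its GC content and grouped with a one-dimensional Gaussian-kernel mean shift. MSGIP's actual window features and its bandwidth heuristic are not described in this abstract, so the fixed `bandwidth`, the `merge_tol` used to group converged points, and the GC-only feature are illustrative assumptions. As in mean shift generally, the number of clusters is not specified in advance; windows landing in a small cluster away from the genome's main compositional mode would be candidate islands.

```python
import math

def gc_content(seq):
    """Fraction of G and C bases in a window."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def mean_shift_1d(values, bandwidth, tol=1e-7, max_iter=1000):
    """Move every point uphill on a Gaussian kernel density estimate until it converges."""
    modes = []
    for x in values:
        for _ in range(max_iter):
            w = [math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values]
            new_x = sum(wi * v for wi, v in zip(w, values)) / sum(w)
            converged = abs(new_x - x) < tol
            x = new_x
            if converged:
                break
        modes.append(x)
    return modes

def group_modes(modes, merge_tol):
    """Points whose modes lie within merge_tol of an existing center share a cluster."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) < merge_tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Toy GC fractions for six consecutive windows (not real genome data).
gc = [0.50, 0.51, 0.49, 0.50, 0.35, 0.36]
centers, labels = group_modes(mean_shift_1d(gc, bandwidth=0.02), merge_tol=0.01)
print(labels)
```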

  15. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms through horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. These adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic island prediction have been proposed. However, none of them can precisely predict the complete repertoire of GIs in a genome. The difficulty arises because the performance of different algorithms varies with the diversity of nucleotide composition across species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also novel islands not previously predicted. A detailed investigation of the features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me.

  16. Integrated and sequence-ordered BAC- and YAC-based physical maps for the rat genome.

    Science.gov (United States)

    Krzywinski, Martin; Wallis, John; Gösele, Claudia; Bosdet, Ian; Chiu, Readman; Graves, Tina; Hummel, Oliver; Layman, Dan; Mathewson, Carrie; Wye, Natasja; Zhu, Baoli; Albracht, Derek; Asano, Jennifer; Barber, Sarah; Brown-John, Mabel; Chan, Susanna; Chand, Steve; Cloutier, Alison; Davito, Jonathon; Fjell, Chris; Gaige, Tony; Ganten, Detlev; Girn, Noreen; Guggenheimer, Kurtis; Himmelbauer, Heinz; Kreitler, Thomas; Leach, Stephen; Lee, Darlene; Lehrach, Hans; Mayo, Michael; Mead, Kelly; Olson, Teika; Pandoh, Pawan; Prabhu, Anna-Liisa; Shin, Heesun; Tänzer, Simone; Thompson, Jason; Tsai, Miranda; Walker, Jason; Yang, George; Sekhon, Mandeep; Hillier, LaDeana; Zimdahl, Heike; Marziali, Andre; Osoegawa, Kazutoyo; Zhao, Shaying; Siddiqui, Asim; de Jong, Pieter J; Warren, Wes; Mardis, Elaine; McPherson, John D; Wilson, Richard; Hübner, Norbert; Jones, Steven; Marra, Marco; Schein, Jacqueline

    2004-04-01

    As part of the effort to sequence the genome of Rattus norvegicus, we constructed a physical map composed of fingerprinted bacterial artificial chromosome (BAC) clones from the CHORI-230 BAC library. These BAC clones provide approximately 13-fold redundant coverage of the genome and have been assembled into 376 fingerprint contigs. A yeast artificial chromosome (YAC) map was also constructed and aligned with the BAC map via fingerprinted BAC and P1 artificial chromosome clones (PACs) sharing interspersed repetitive sequence markers with the YAC-based physical map. We have annotated 95% of the fingerprint map clones in contigs with coordinates on the version 3.1 rat genome sequence assembly, using BAC-end sequences and in silico mapping methods. These coordinates have allowed anchoring 358 of the 376 fingerprint map contigs onto the sequence assembly. Of these, 324 contigs are anchored to rat genome sequences localized to chromosomes, and 34 contigs are anchored to unlocalized portions of the rat sequence assembly. The remaining 18 contigs, containing 54 clones, still require placement. The fingerprint map is a high-resolution integrative data resource that provides genome-ordered associations among BAC, YAC, and PAC clones and the assembled sequence of the rat genome.

  17. CRISPR-based screening of genomic island excision events in bacteria.

    Science.gov (United States)

    Selle, Kurt; Klaenhammer, Todd R; Barrangou, Rodolphe

    2015-06-30

    Genomic analysis of Streptococcus thermophilus revealed that mobile genetic elements (MGEs) likely contributed to gene acquisition and loss during evolutionary adaptation to milk. Clustered regularly interspaced short palindromic repeats and CRISPR-associated genes (CRISPR-Cas), the adaptive immune system in bacteria, limit genetic diversity by targeting MGEs, including bacteriophages, transposons, and plasmids. CRISPR-Cas systems are widespread in streptococci, suggesting that the interplay between CRISPR-Cas systems and MGEs is one of the driving forces governing genome homeostasis in this genus. To investigate the genetic outcomes of CRISPR-Cas targeting of integrated MGEs, we used in silico prediction to identify four genomic islands that lack essential genes, ranging in length from 8 to 102 kbp and together totaling 7% of the genome. In this study, the endogenous CRISPR3 type II system was programmed to target each of the four islands independently through plasmid-based expression of engineered CRISPR arrays. Targeting lacZ within the largest (102-kbp) genomic island was lethal to wild-type cells and reduced the surviving population by up to 2.5 logs. Genotyping of Lac(-) survivors revealed variable deletion events between the flanking insertion-sequence elements, all of which eliminated the Lac-encoding island. Chimeric insertion sequence footprints were observed at the deletion junctions after targeting each of the four genomic islands, suggesting a common deletion mechanism via recombination between flanking insertion sequences. These results establish that self-targeting CRISPR-Cas systems may direct significant evolution of bacterial genomes at the population level, influencing genome homeostasis and remodeling.
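
    The "up to 2.5-log" reduction above is on a base-10 logarithmic scale. The short Python sketch below shows that arithmetic; the helper names are hypothetical and any colony-forming-unit (CFU) counts passed to them are illustrative, not data from the study.

```python
import math


def log10_reduction(cfu_control: float, cfu_treated: float) -> float:
    """Log10 reduction in viable cells between a control and a treated population.

    Hypothetical helper: inputs are CFU counts, not values reported in the study.
    """
    if cfu_control <= 0 or cfu_treated <= 0:
        raise ValueError("CFU counts must be positive")
    return math.log10(cfu_control / cfu_treated)


def fold_reduction(log_reduction: float) -> float:
    """Convert a log10 reduction back into a fold decrease."""
    return 10 ** log_reduction


# A 2.5-log reduction corresponds to roughly a 316-fold drop in survivors.
print(round(fold_reduction(2.5)))  # 316
```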

  18. PCR-Based Seamless Genome Editing with High Efficiency and Fidelity in Escherichia coli

    DEFF Research Database (Denmark)

    Liu, Yilan; Yang, Maohua; Yan, Daojiang

    2016-01-01

    Efficiency and fidelity are the key obstacles for genome editing toolboxes. In the present study, a PCR-based tandem repeat assisted genome editing (TRAGE) method with high efficiency and fidelity was developed. The design of TRAGE is based on the mechanism of repair of spontaneous double...... for seamlessly deleting, substituting and inserting targeted genes using PCR products. The effects of different manipulations including sucrose addition time, subculture times in LB with sucrose and stages of inoculation on the efficiency were investigated. With our recommended procedure, seamless excision...... of cat-sacB cassette can be realized in 48 h efficiently. We believe that the developed method has great potential for seamless genome editing in E. coli....

  19. A computer-based education intervention to enhance surrogates' informed consent for genomics research.

    Science.gov (United States)

    Shelton, Ann K; Freeman, Bradley D; Fish, Anne F; Bachman, Jean A; Richardson, Lloyd I

    2015-03-01

    Many research studies conducted today in critical care have a genomics component. Patients' surrogates who are asked to authorize participation in genomics research for a loved one in the intensive care unit may not be prepared to make informed decisions about the patient's participation. The objective was to examine the effectiveness of a new computer-based education module in improving surrogates' understanding of the process of informed consent for genomics research. A pilot study was conducted with visitors in the waiting rooms of 2 intensive care units in a Midwestern tertiary care medical center. Visitors were randomly assigned to the experimental group (education module plus a sample genomics consent form; n = 65) or the control group (sample genomics consent form only; n = 69). Participants later completed a test on informed genomics consent. Understanding of the process of informed consent was greater in the experimental group than in the control group (P = .001). Specifically, compared with the control group, the experimental group had a greater understanding of 8 of 13 elements of informed consent: intended benefits of research (P = .02), definition of surrogate consenter (P = .001), withdrawal from the study (P = .001), explanation of risk (P = .002), purpose of the institutional review board (P = .001), definition of substituted judgment (P = .03), compensation for harm (P = .001), and alternative treatments (P = .004). Computer-based education modules may be an important addition to conventional approaches for obtaining informed consent in the intensive care unit. Preparing patients' family members who may consider serving as surrogate consenters is critical to facilitating genomics research in critical care. ©2015 American Association of Critical-Care Nurses.

  20. Mitotic-chromosome-based physical mapping of the Culex quinquefasciatus genome.

    Science.gov (United States)

    Naumenko, Anastasia N; Timoshevskiy, Vladimir A; Kinney, Nicholas A; Kokhanenko, Alina A; deBruyn, Becky S; Lovin, Diane D; Stegniy, Vladimir N; Severson, David W; Sharakhov, Igor V; Sharakhova, Maria V

    2015-01-01

    The genome assembly of the southern house mosquito Cx. quinquefasciatus consists of a large number of supercontigs with no order or orientation on the chromosomes. Although cytogenetic maps for the polytene chromosomes of this mosquito have been developed, using them for genome mapping remains difficult because chromosome preparations yield few high-quality spreads. Therefore, a simple and robust mitotic-chromosome-based approach for genome mapping of Cx. quinquefasciatus still needs to be developed. In this study, we performed physical mapping of 37 genomic supercontigs using fluorescent in situ hybridization on mitotic chromosomes from imaginal discs of 4th instar larvae. The genetic linkage map nomenclature was adopted for chromosome numbering, based on the direct positioning of 58 markers that had previously been genetically mapped. The smallest, largest, and intermediate chromosomes were numbered 1, 2, and 3, respectively. For idiogram development, we analyzed and described in detail the morphology and proportions of the mitotic chromosomes. Chromosomes were subdivided into 19 divisions and 72 bands of four different intensities. These idiograms were used to map the genomic supercontigs and genetic markers. We also detected a length polymorphism in the q arm of the sex-determining chromosome 1 in Cx. quinquefasciatus, related to the size of the ribosomal locus. Together, our physical mapping and previous genetic linkage mapping assigned 13% of the total genome assembly to chromosome bands. We provide the first detailed description, nomenclature, and idiograms for the mitotic chromosomes of Cx. quinquefasciatus. Further application of this approach will help to improve the quality of the southern house mosquito genome.

  1. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are both important and challenging because of the extent of microbial evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time-consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  2. Isolation of a new antibacterial peptide actinokineosin from Actinokineospora spheciospongiae based on genome mining.

    Science.gov (United States)

    Takasaka, N; Kaweewan, I; Ohnishi-Kameyama, M; Kodani, S

    2017-02-01

    Using genome mining, we isolated a new antibacterial peptide, named actinokineosin, from the rare actinomycete Actinokineospora spheciospongiae. The amino acid sequence of the C-terminus of actinokineosin was established by TOF-MS/MS experiments. The amino acid sequence in the macrolactam ring was determined by TOF-MS/MS analyses after cleavage with BNPS-skatole followed by trypsin treatment. In a paper disk antibacterial assay, actinokineosin showed activity against Micrococcus luteus at a dosage of 50 μg per disk. The biosynthetic gene cluster of actinokineosin was identified in the genome sequence data of A. spheciospongiae and appears to consist of 10 genes. Among them, aknA encodes the precursor of actinokineosin, and genes including aknC, aknB1 and aknB2 were proposed to encode the modification enzymes that produce mature actinokineosin. Genome mining is a powerful tool for finding new bioactive compounds in genome databases. In this report, we isolated a new antibacterial peptide, actinokineosin, by genome mining and determined its structure. © 2016 The Society for Applied Microbiology.

  3. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach to plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interaction effects among genes and higher-order nonlinear effects. A deep belief network (DBN), one of the architectures used in deep learning, can model data at a high level of abstraction that captures nonlinear effects. This study implemented a DBN to develop a GP model that uses whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of maize traits. The maize dataset was obtained from the Global Maize Program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, the DBN outperformed the other methods, namely reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for allegedly non-additive traits. The DBN achieved a correlation of 0.579 (on a scale from -1 to 1).
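
    The model comparison above ranks methods by the Pearson correlation between predicted and observed phenotypes. A minimal, dependency-free Python sketch of that metric follows; the phenotype values are invented for illustration and do not come from the CIMMYT data, and the abstract does not describe how the study computed its correlations.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observed phenotypes and model predictions for five test lines.
observed = [2.1, 3.4, 1.8, 4.0, 2.9]
predicted = [2.4, 3.1, 2.0, 3.7, 3.2]
print(round(pearson_r(observed, predicted), 3))
```

    A coefficient of 1 means predictions rise and fall in exact linear step with observations, 0 means no linear association, and a value such as the reported 0.579 indicates a moderate positive association.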

  4. Safety assessment of Staphylococcus phages of the family Myoviridae based on complete genome sequences

    Science.gov (United States)

    Cui, Zelin; Guo, Xiaokui; Dong, Ke; Zhang, Yan; Li, Qingtian; Zhu, Yongzhang; Zeng, Lingbing; Tang, Rong; Li, Li

    2017-01-01

    Staphylococcus phages of the Myoviridae family have a wide host range and potential applications in phage therapy. In this report, safety assessments of these phages were conducted based on their complete genome sequences. The complete genomes of Staphylococcus phages of the Myoviridae family were analyzed, and the open reading frames (ORFs) were compared with a pool of virulence and antibiotic resistance genes using the BLAST algorithm. In addition, the lifestyle of the phages (virulent or temperate) was confirmed using PHACTS. The results showed that all phages were lytic and, based on bioinformatic analyses, did not contain resistance or virulence genes, excluding the possibility that they could be vectors for the dissemination of these undesirable genes. These findings suggest that the phages are safe at the genome level. The SceD-like transglycosylase, which is a biomarker for vancomycin-intermediate strains, was widely distributed in the phage genomes. Approximately 70% of the ORFs encoded in the phage genomes have unknown functions; therefore, their roles in the antibiotic resistance and virulence of Staphylococcus aureus are still unknown and require consideration before use in phage therapy. PMID:28117392
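
    For readers unfamiliar with ORFs, a minimal Python sketch of forward-strand ORF detection is given below. It is only an illustration: the abstract does not name the ORF caller used, real gene callers also scan the reverse complement and consider alternative start codons, and the subsequent BLAST comparison against virulence and resistance genes is not shown. The function name and the `min_codons` default are assumptions.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) of ATG-initiated ORFs on the forward strand only.

    Hypothetical helper. Coordinates are 0-based and half-open; `end` includes
    the stop codon, and the codon count compared with `min_codons` includes the
    stop codon. Only the first ATG upstream of each stop is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


print(find_orfs("CCATGTTTTGA", min_codons=3))  # [(2, 11)]
```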

  5. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for constructing genome trees from whole genome sequences have been proposed. However, a reasonable method for constructing phylogenetic trees from complete genome sequences has not yet been established. We have developed a method for constructing phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.
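
    The abstract does not specify which tree-building algorithm was applied to the average similarities. As one possible illustration, the Python sketch below assumes UPGMA clustering on distances defined as 1 - similarity. The function name and the four-genome similarity matrix are hypothetical, no branch lengths are output, and no bootstrap resampling is performed.

```python
from typing import Dict, List, Tuple


def _key(i: int, j: int) -> Tuple[int, int]:
    """Order a pair of cluster ids so each distance is stored once."""
    return (i, j) if i < j else (j, i)


def upgma_from_similarity(names: List[str], sim: List[List[float]]) -> str:
    """Build a UPGMA topology (Newick, no branch lengths) from a symmetric similarity matrix.

    Assumption: similarities lie in [0, 1] and are turned into distances as 1 - similarity.
    """
    clusters: Dict[int, Tuple[str, int]] = {i: (name, 1) for i, name in enumerate(names)}
    dist: Dict[Tuple[int, int], float] = {
        (i, j): 1.0 - sim[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        (a, b), _ = min(dist.items(), key=lambda item: item[1])
        newick_a, size_a = clusters.pop(a)
        newick_b, size_b = clusters.pop(b)
        del dist[(a, b)]
        merged = next_id
        next_id += 1
        # UPGMA update: size-weighted average of the distances from a and b.
        for c in clusters:
            d_ac = dist.pop(_key(a, c))
            d_bc = dist.pop(_key(b, c))
            dist[_key(merged, c)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[merged] = (f"({newick_a},{newick_b})", size_a + size_b)
    return next(iter(clusters.values()))[0] + ";"


names = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.9, 0.3, 0.3],
    [0.9, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.8],
    [0.3, 0.3, 0.8, 1.0],
]
print(upgma_from_similarity(names, similarity))  # ((A,B),(C,D));
```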

  6. Evaluation of somatic genomic imbalances in thyroid carcinomas of follicular origin by CGH-based approaches.

    Science.gov (United States)

    Baldan, Federica; Mio, Catia; Allegri, Lorenzo; Passon, Nadia; Lepore, Saverio M; Russo, Diego; Damante, Giuseppe

    2017-09-07

    Applying distinct technologies for cancer genome analysis has provided important information for the molecular characterization of several human neoplasms, including follicular cell-derived thyroid carcinoma. Among them, comparative genomic hybridization (CGH)-based procedures have been extensively applied to evaluate the genomic imbalances present in these tumours, yielding data that have increased understanding of their complexity and diversity. In this review, after a brief overview of the most commonly used CGH-based techniques, we describe the major results of the most influential studies in the literature that used this approach to investigate the genomic aberrations of thyroid cancer cells. Most studies analyzed a small number of patients. Deletions and duplications at different chromosomal regions were detected in all investigated cohorts. More genomic imbalances were detected in anaplastic or poorly differentiated thyroid carcinomas than in well-differentiated ones. Limitations in the interpretation of the results, as well as the potential impact on clinical practice, are discussed. Although the results available so far present a rather heterogeneous picture, array CGH, combined with other methodologies and accurate clinical management, may offer novel opportunities for better stratification of thyroid cancer patients.
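
    Array CGH results are commonly summarized as per-probe log2 ratios of test to reference signal. The Python sketch below classifies probes as gains or losses from those ratios; the function name, the ±0.3 thresholds and the intensities are hypothetical, and real pipelines also normalize signals and segment neighbouring probes before calling imbalances.

```python
import math
from typing import List


def call_copy_number(test: List[float], reference: List[float],
                     gain_threshold: float = 0.3,
                     loss_threshold: float = -0.3) -> List[str]:
    """Label each probe 'gain', 'loss' or 'neutral' from its log2(test/reference) ratio.

    The +/-0.3 thresholds are hypothetical defaults, not values from the reviewed studies.
    """
    if len(test) != len(reference):
        raise ValueError("test and reference must have one intensity per probe")
    calls = []
    for t, r in zip(test, reference):
        ratio = math.log2(t / r)
        if ratio >= gain_threshold:
            calls.append("gain")
        elif ratio <= loss_threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# A one-copy gain or loss in a diploid genome gives log2(3/2) ~ 0.58 or log2(1/2) = -1.
print(call_copy_number([150.0, 50.0, 100.0], [100.0, 100.0, 100.0]))  # ['gain', 'loss', 'neutral']
```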

  7. Genotype-Specific Genomic Markers Associated with Primary Hepatomas, Based on Complete Genomic Sequencing of Hepatitis B Virus▿

    OpenAIRE

    Sung, Joseph J. Y.; Tsui, Stephen K. W.; Tse, Chi-Hang; Ng, Eddie Y. T.; Leung, Kwong-Sak; Lee, Kin-Hong; Mok, Tony S. K.; Bartholomeusz, Angeline; Au, Thomas C. C.; Tsoi, Kelvin K. F.; Locarnini, Stephen; Chan, Henry L. Y.

    2008-01-01

    We aimed to identify genomic markers in hepatitis B virus (HBV) that are associated with hepatocellular carcinoma (HCC) development by comparing the complete genomic sequences of HBVs among patients with HCC and those without. One hundred patients with HBV-related HCC and 100 age-matched HBV-infected non-HCC patients (controls) were studied. HBV DNA from serum was directly sequenced to study the whole viral genome. Data mining and rule learning were employed to develop diagnostic algorithms. ...

  8. An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform

    Directory of Open Access Journals (Sweden)

    Zhang Tongwu

    2011-11-01

    Motivation: Complete organellar genome sequences (chloroplasts and mitochondria) provide valuable resources and information for studying plant molecular ecology and evolution. As high-throughput sequencing technology advances, it has become the norm to obtain complete genome sequences with a shotgun approach. Therefore, assembling organellar sequences from whole-genome shotgun reads is unavoidable. However, the associated techniques are often cumbersome, time-consuming, and difficult, because true organellar DNA is difficult to separate efficiently from nuclear copies, which have been transferred to the nucleus over the course of evolution. Results: We report a new, rapid procedure for plant chloroplast and mitochondrial genome sequencing and assembly using the Roche/454 GS FLX platform. Plant cells can contain multiple copies of the organellar genomes, and there is a significant correlation between the depth of sequence reads in contigs and the number of copies of the genome. Without isolating organellar DNA from the mixture of nuclear and organellar DNA for sequencing, we retrospectively extracted assembled contigs of either chloroplast or mitochondrial sequences from the whole genome shotgun data. Moreover, the contig connection graph property of Newbler (a platform-specific sequence assembler) ensures an efficient final assembly. Using this procedure, we assembled both chloroplast and mitochondrial genomes of a resurrection plant, Boea hygrometrica, with high fidelity. We also present information and a minimal sequence dataset as a reference for the assembly of other plant organellar genomes.
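
    The correlation between contig read depth and genome copy number suggests a simple screen: organellar contigs should have much higher depth than typical nuclear contigs. The Python sketch below assumes that the median contig depth approximates single-copy nuclear coverage and uses an arbitrary 10-fold cut-off. It is not the authors' procedure, which also relied on Newbler's contig connection graph, and the function name, contig names and depths are invented.

```python
from statistics import median
from typing import Dict, List


def high_copy_contigs(depths: Dict[str, float], fold: float = 10.0) -> List[str]:
    """Return contigs whose mean read depth is at least `fold` times the median contig depth.

    Assumption: most contigs are single-copy nuclear sequence, so the median approximates
    nuclear coverage. High-copy nuclear repeats would also pass and need separate screening.
    """
    baseline = median(depths.values())
    return sorted(name for name, depth in depths.items() if depth >= fold * baseline)


contig_depths = {"contig_1": 20.0, "contig_2": 22.0, "contig_3": 18.0,
                 "contig_4": 600.0, "contig_5": 25.0}
print(high_copy_contigs(contig_depths))  # ['contig_4']
```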

  9. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web.

    Science.gov (United States)

    Miller, Chase A; Anthony, Jon; Meyer, Michelle M; Marth, Gabor

    2013-02-01

    High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. However, many current terabyte-size datasets generated by large public consortia projects can already be stored feasibly only at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping costs enable researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular applications. Such specialized, yet lightweight, applications would empower scientists to answer specific biological questions better than is possible with the general-purpose genome browsers currently available. Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported.

  10. Genomic and pedigree-based genetic parameters for scarcely recorded traits when some animals are genotyped.

    Science.gov (United States)

    Veerkamp, R F; Mulder, H A; Thompson, R; Calus, M P L

    2011-08-01

    Genetic parameters were estimated using relationships between animals that were based on pedigree, on 43,011 single nucleotide polymorphisms, or on a combination of these, considering both genotyped and non-genotyped animals. The standard errors of the estimates and a parametric bootstrapping procedure were used to investigate the sampling properties of the estimated variance components. The data set contained milk yield, dry matter intake and body weight for 517 first-lactation heifers with genotypes and phenotypes, and another 112 heifers with phenotypes only. Multivariate models were fitted using the different relationships in ASReml software. Estimates of genetic variance were lower based on genomic relationships than based on pedigree relationships. Genetic variances from genomic and pedigree relationships were, however, not directly comparable because they apply to different base populations. Standard errors indicated that using the genomic relationships gave more accurate estimates of heritability but equally accurate estimates of genetic correlation. However, the estimates of standard errors were affected by the differences in scale between the 2 relationship matrices, causing differences in values of the genetic parameters. The bootstrapping results (with genetic parameters at the same level) confirmed that both heritability and genetic correlations were estimated more accurately with genomic relationships than with pedigree relationships. Animals without genotypes were included in the analysis by merging genomic and pedigree relationships. This allowed all phenotypes to be used, including those from non-genotyped animals. This combination of genomic and pedigree relationships gave the most accurate estimates of genetic variance. When only a small data set is available, it might be more advantageous for the estimation of genetic parameters to genotype existing animals rather than to collect more phenotypes. Copyright © 2011 American Dairy Science
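
    The abstract does not say how the SNP-based relationships were constructed. One widely used option, assumed here for illustration, is VanRaden's first method, G = ZZ'/(2 Σ p_j (1 - p_j)), where Z holds genotypes centred on twice the allele frequency p_j. The dependency-free Python sketch below implements that formula; the function name is hypothetical and the genotypes in the usage line are toy values.

```python
from typing import List


def genomic_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)) (VanRaden's first method).

    genotypes: one row per animal, one column per SNP, coded 0/1/2 copies of the counted allele.
    Assumption: allele frequencies p_j are estimated from the genotyped animals themselves.
    """
    n_animals, n_snps = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each genotype on its expected value 2 * p_j.
    z = [[row[j] - 2 * p[j] for j in range(n_snps)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n_animals)]
        for i in range(n_animals)
    ]


print(genomic_relationship_matrix([[0, 2], [2, 0]]))  # [[2.0, -2.0], [-2.0, 2.0]]
```

    Because G is centred on allele frequencies estimated from the genotyped animals, its implicit base population differs from that of a pedigree-based matrix, which is consistent with the abstract's caution that the two sets of variance estimates are not directly comparable.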

  11. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    Science.gov (United States)

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  12. Genome filtering using methylation-sensitive restriction enzymes with six-base pair recognition sites

    Science.gov (United States)

    The large fraction of repetitive DNA in many plant genomes has complicated all aspects of DNA sequencing and assembly, and thus techniques that enrich for genes and low-copy sequences have been employed to isolate gene space. Methylation-sensitive restriction enzymes with six-base-pair recognition sites...

  13. A versatile genome-scale PCR-based pipeline for high-definition DNA FISH

    NARCIS (Netherlands)

    Bienko, M.; Crosetto, N.; Teytelman, L.; Klemm, S.; Itzkovitz, S.; van Oudenaarden, A.

    2013-01-01

    We developed a cost-effective genome-scale PCR-based method for high-definition DNA FISH (HD-FISH). We visualized gene loci with diffraction-limited resolution, chromosomes as spot clusters and single genes together with transcripts by combining HD-FISH with single-molecule RNA FISH. We provide a da

  15. An HMM-based comparative genomic framework for detecting introgression in eukaryotes.

    Directory of Open Access Journals (Sweden)

    Kevin J Liu

    2014-06-01

    One outcome of interspecific hybridization and subsequent effects of evolutionary forces is introgression, which is the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have already been established. In this work, we report on PhyloNet-HMM, a new comparative genomic framework for detecting introgression in genomes. PhyloNet-HMM combines phylogenetic networks with hidden Markov models (HMMs) to simultaneously capture the (potentially reticulate) evolutionary history of the genomes and dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci. Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus) genome detected a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgressed genomic regions. Based on our analysis, it is estimated that about 9% of all sites within chromosome 7 are of introgressive origin (these cover about 13 Mbp of chromosome 7, and over 300 genes). Further, our model detected no introgression in a negative control data set. We also found that our model accurately detected introgression and other evolutionary processes from synthetic data sets simulated under the coalescent model with recombination, isolation, and migration. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.
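
    The full PhyloNet-HMM model is considerably richer than can be shown here. As a much-simplified, hypothetical illustration of how an HMM couples neighbouring sites, the Python sketch below runs Viterbi decoding over a two-state model ("species" vs "introgressed") with invented transition and emission probabilities. The observation alphabet ("c" for a site pattern concordant with the species tree, "d" for a discordant one) is also an assumption; this is neither the PhyloNet-HMM model nor its code.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str], states: List[str], start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Most probable hidden-state path for a sequence of observations (log-space Viterbi)."""
    # score[s]: best log-probability of any state path that ends in s at the current site.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = score[best_prev] + math.log(trans_p[best_prev][s]) + math.log(emit_p[s][o])
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Hypothetical toy model; all probabilities are invented for illustration.
states = ["species", "introgressed"]
start_p = {"species": 0.5, "introgressed": 0.5}
trans_p = {
    "species": {"species": 0.9, "introgressed": 0.1},
    "introgressed": {"species": 0.1, "introgressed": 0.9},
}
emit_p = {
    "species": {"c": 0.9, "d": 0.1},
    "introgressed": {"c": 0.2, "d": 0.8},
}
print(viterbi("cccddddccc", states, start_p, trans_p, emit_p))
```

    With these made-up parameters, the block of four discordant sites is decoded as introgressed; the high self-transition probabilities are what make the model favour contiguous segments over site-by-site switching.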

  16. The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows.

    Science.gov (United States)

    O'Connor, Brian D; Yuen, Denis; Chung, Vincent; Duncan, Andrew G; Liu, Xiang Kun; Patricia, Janice; Paten, Benedict; Stein, Lincoln; Ferretti, Vincent

    2017-01-01

    As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore ( https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).

  17. Proposal of fifteen new species of Parasynechococcus based on genomic, physiological and ecological features.

    Science.gov (United States)

    Coutinho, F H; Dutilh, B E; Thompson, C C; Thompson, F L

    2016-12-01

    Members of the recently proposed genus Parasynechococcus (Cyanobacteria) are extremely abundant throughout the global ocean and contribute significantly to global primary productivity. However, the taxonomy of these organisms remains poorly characterized. The aim of this study was to propose a new taxonomic framework for Parasynechococcus based on a genomic taxonomy approach that incorporates genomic, physiological and ecological data. Through in silico DNA-DNA hybridization, average amino acid identity, dinucleotide signatures and phylogenetic reconstruction, a total of 15 species of Parasynechococcus could be delineated. Each species was then described on the basis of its gene content, light and nutrient utilization strategies, geographical distribution patterns throughout the oceans and response to environmental parameters.
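
    Dinucleotide relative abundance, as introduced by Karlin and colleagues, compares each dinucleotide's observed frequency with the product of its mononucleotide frequencies, rho_XY = f_XY / (f_X f_Y). The abstract does not give the exact formulation used, so the Python sketch below is an assumption: it counts a single strand only (Karlin's rho* symmetrises over both strands) and compares genomes with a simple mean absolute difference. Both function names are hypothetical.

```python
from itertools import product
from typing import Dict

BASES = "ACGT"


def dinucleotide_signature(seq: str) -> Dict[str, float]:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides.

    Simplification: only the given strand is counted (no reverse-complement symmetrisation).
    """
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set(BASES):
        raise ValueError("sequence must contain at least two A/C/G/T characters only")
    mono = {b: seq.count(b) / len(seq) for b in BASES}
    n_pairs = len(seq) - 1
    pair_counts = {x + y: 0 for x, y in product(BASES, repeat=2)}
    for i in range(n_pairs):
        pair_counts[seq[i:i + 2]] += 1
    signature = {}
    for xy, count in pair_counts.items():
        expected = mono[xy[0]] * mono[xy[1]]
        # A dinucleotide whose bases never occur gets 0.0 rather than a division by zero.
        signature[xy] = (count / n_pairs) / expected if expected > 0 else 0.0
    return signature


def signature_distance(sig_a: Dict[str, float], sig_b: Dict[str, float]) -> float:
    """Mean absolute difference over the 16 dinucleotides (a delta-style distance)."""
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / len(sig_a)


sig = dinucleotide_signature("ATATATAT")
print(round(sig["AT"], 3))  # 2.286
```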

  18. MODEST: a web-based design tool for oligonucleotide-mediated genome engineering and recombineering

    DEFF Research Database (Denmark)

    Bonde, Mads; Klausen, Michael Schantz; Anderson, Mads Valdemar

    2014-01-01

    Recombineering and multiplex automated genome engineering (MAGE) make it possible to rapidly modify multiple genomic or plasmid sites at high efficiencies. This enables efficient creation of genetic variants including both single mutants with specifically targeted modifications as well......, which confers the corresponding genetic change, is performed manually. To address these challenges, we have developed the MAGE Oligo Design Tool (MODEST). This web-based tool allows users to design MAGE oligos for (i) tuning translation rates by modifying the ribosomal binding site, (ii) generating...... efficiency recombineering and MAGE. MODEST is available for free and is open to all users at http://modest.biosustain.dtu.dk....

  19. Genomic Variance Estimation Based on Genotyping-by-Sequencing with Different Coverage in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Fé, Dario; Jensen, Just

    2014-01-01

    at each SNP in family pools or polyploids. There are, however, several statistical challenges associated with this method, including low sequencing depth and missing values. Low sequencing depth results in inaccuracies in estimates of allele frequencies for each SNP. In this work we have focused...... on optimizing methods and models utilizing F2 family phenotype records and NGS information from F2 family pools in perennial ryegrass. Genomic variance was estimated using genomic relationship matrices based on different coverage depths to verify effects of coverage depth. Example traits were seed yield, rust...... score and heading date. A total of 995 F2 families were genotyped via GBS, resulting in allele frequency estimates at 1 million SNPs in each family, with coverage within families ranging from 0 to 60. Results from both real and simulated data show that genomic variance is overestimated at lower coverage...
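
    The effect of depth on allele frequency estimates can be seen from a simple binomial model: with d reads at a SNP whose pool frequency is p, the read-based estimate has sampling variance p(1 - p)/d, and zero coverage yields a missing value. The Python sketch below assumes that model, ignoring sequencing error and the sampling of individuals into each pool; the function names are hypothetical.

```python
from typing import Optional


def pool_allele_frequency(alt_reads: int, depth: int) -> Optional[float]:
    """Read-based allele frequency estimate for one SNP in one pool; None if uncovered."""
    if depth == 0:
        return None  # zero coverage: the value is missing, not 0
    return alt_reads / depth


def sampling_variance(p: float, depth: int) -> float:
    """Binomial variance p(1 - p)/depth of the read-based estimate (no sequencing error)."""
    return p * (1 - p) / depth


# Variance of the estimate for a SNP at frequency 0.5 at increasing depth.
for d in (1, 4, 60):
    print(d, sampling_variance(0.5, d))
```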

  20. MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing.

    Science.gov (United States)

    Urich, Mark A; Nery, Joseph R; Lister, Ryan; Schmitz, Robert J; Ecker, Joseph R

    2015-03-01

    Current high-throughput DNA sequencing technologies enable acquisition of billions of data points through which myriad biological processes can be interrogated, including genetic variation, chromatin structure, gene expression patterns, small RNAs and protein-DNA interactions. Here we describe the MethylC-sequencing (MethylC-seq) library preparation method, a 2-day protocol that enables the genome-wide identification of cytosine DNA methylation states at single-base resolution. The technique involves fragmentation of genomic DNA followed by adapter ligation, bisulfite conversion and limited amplification using adapter-specific PCR primers in preparation for sequencing. To date, this protocol has been successfully applied to genomic DNA isolated from primary cell culture, sorted cells and fresh tissue from over a thousand plant and animal samples.
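    The principle behind interpreting MethylC-seq reads can be sketched in a few lines. Bisulfite converts unmethylated cytosines to uracil, which reads as thymine after PCR, while methylated cytosines remain C. The sketch assumes a single top-strand read aligned to the reference without gaps and ignores genuine C-to-T SNPs. The function names are hypothetical, and this is not the authors' analysis pipeline.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """In silico top-strand conversion: unmethylated C reads as T,
    C at the 0-based positions in `methylated` stays C."""
    return "".join("T" if base == "C" and i not in methylated else base
                   for i, base in enumerate(seq))


def call_methylation(reference: str, read: str) -> Dict[int, bool]:
    """For each reference C, True if the aligned read still shows C
    (methylated), False if it shows T (unmethylated)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True
            elif read_base == "T":
                calls[i] = False
            # any other base: no call at this position
    return calls
```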

  1. Tracing the Spread of Clostridium difficile Ribotype 027 in Germany Based on Bacterial Genome Sequences.

    Directory of Open Access Journals (Sweden)

    Matthias Steglich

    We applied whole-genome sequencing to reconstruct the spatial and temporal dynamics underpinning the expansion of Clostridium difficile ribotype 027 in Germany. Based on re-sequencing of genomes from 57 clinical C. difficile isolates, which had been collected from hospitalized patients at 36 locations throughout Germany between 1990 and 2012, we demonstrate that C. difficile genomes have accumulated sequence variation sufficiently fast to document the pathogen's spread at a regional scale. We detected both previously described lineages of fluoroquinolone-resistant C. difficile ribotype 027, FQR1 and FQR2. Using Bayesian phylogeographic analyses, we show that fluoroquinolone-resistant C. difficile 027 was imported into Germany at least four times, that it had been widely disseminated across multiple federal states even before the first outbreak was noted in 2007, and that it has continued to spread since.

  2. Integrating genome-based informatics to modernize global disease monitoring, information sharing, and response

    DEFF Research Database (Denmark)

    Aarestrup, Frank Møller; Brown, Eric W; Detter, Chris

    2012-01-01

    The rapid advancement of genome technologies holds great promise for improving the quality and speed of clinical and public health laboratory investigations and for decreasing their cost. The latest generation of genome DNA sequencers can provide highly detailed and robust information on disease......-causing microbes, and in the near future these technologies will be suitable for routine use in national, regional, and global public health laboratories. With additional improvements in instrumentation, these next- or third-generation sequencers are likely to replace conventional culture-based and molecular...... typing methods to provide point-of-care clinical diagnosis and other essential information for quicker and better treatment of patients. Provided there is free-sharing of information by all clinical and public health laboratories, these genomic tools could spawn a global system of linked databases...

  3. An EST-based genome scan using 454 sequencing in the marine snail Littorina saxatilis.

    Science.gov (United States)

    Galindo, J; Grahame, J W; Butlin, R K

    2010-09-01

    Genome scans have been used in the studies of ecological speciation to find genomic regions ('outlier loci') showing reduced gene flow between divergent populations/species. High-throughput sequencing ('454') offers new opportunities in this field via transcriptome sequencing. Divergent ecotypes of the marine gastropod Littorina saxatilis represent a good example of incipient ecological speciation. We performed a 454-based genome scan between H and M ecotypes of L. saxatilis from the British Isles using cDNA of pooled individuals. Allele frequencies were calculated for 2454 single nucleotide polymorphisms (SNPs), within 572 contigs, and 7% of loci were detected as outliers. Functional annotation of the contigs containing outlier SNPs showed that they included shell matrix and muscle proteins (lithostathine, mucin, titin), proteins involved in energetic metabolism (arginine kinase, NADH dehydrogenase) and reverse transcriptases. Follow-up investigations into these proteins and unannotated outliers will be a promising route in the study of ecological speciation in L. saxatilis.
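    Outlier detection in genome scans of this kind typically starts from a per-SNP differentiation statistic computed from pooled allele frequencies. The sketch below assumes a Nei-style F_ST = (H_T - H_S)/H_T for two pools and a simple empirical quantile cutoff. The abstract does not say which statistic or outlier test the authors used, and `fst_two_pools` and `flag_outliers` are hypothetical names.

```python
from typing import List


def fst_two_pools(p1: float, p2: float) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP in two pools."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the combined sample
    if h_t == 0:
        return 0.0  # monomorphic in both pools
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pool heterozygosity
    return (h_t - h_s) / h_t


def flag_outliers(fst_values: List[float], quantile: float = 0.95) -> List[int]:
    """Indices of SNPs whose F_ST exceeds an empirical quantile (a simplification;
    formal outlier tests compare against a null model instead)."""
    ranked = sorted(fst_values)
    threshold = ranked[int(quantile * (len(ranked) - 1))]
    return [i for i, f in enumerate(fst_values) if f > threshold]
```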

  4. An evaluation of multiple annealing and looping based genome amplification using a synthetic bacterial community

    KAUST Repository

    Wang, Yong

    2016-02-23

    The low biomass of environmental samples is a major challenge for microbial metagenomic studies. Amplification of genomic DNA is frequently used to meet the minimum DNA input required by high-throughput next-generation sequencing technologies. The amplification efficiency of the Multiple Annealing and Looping Based Amplification Cycles (MALBAC) kit, which was originally developed to amplify single-cell genomic DNA of mammalian organisms, is examined here using a synthetic bacterial community. A DNA template of 10 pg per MALBAC reaction may generate enough DNA for Illumina sequencing. With 10 pg and 100 pg templates per reaction, the MALBAC kit shows stable and homogeneous amplification, as indicated by the highly consistent coverage of reads from the two amplified samples on the contigs assembled from the original unamplified sample. Although the GenomePlex whole genome amplification kit can generate enough DNA from 100 pg of template per reaction, the minority species in the mixed bacterial community are not amplified linearly. With both kits, GC-rich regions of the genomic DNA are not amplified efficiently, as suggested by the low coverage of contigs with high GC content. These results support the use of the MALBAC kit for amplifying environmental microbial DNA samples, but also raise concerns about its application to bacterial species with high GC content.
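    The GC bias described above can be checked by relating each contig's GC content to its read coverage in the amplified sample. In this sketch, the 0.6 GC threshold, the helper names, and the contig names and coverage values in the test are illustrative assumptions, not data from the study.

```python
from typing import Dict, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else float("nan")


def coverage_by_gc(contigs: Dict[str, str], coverage: Dict[str, float],
                   gc_threshold: float = 0.6) -> Tuple[float, float]:
    """Mean read coverage of (low-GC, high-GC) contigs."""
    low: List[float] = []
    high: List[float] = []
    for name, seq in contigs.items():
        (high if gc_content(seq) >= gc_threshold else low).append(coverage[name])
    return _mean(low), _mean(high)
```

A markedly lower mean for the high-GC group, as in the test data, is the pattern the abstract reports for both kits.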

  5. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for illumina genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Angelova, Angelina [University of Arizona; Park, Sang-Hycuk [University of Arizona; Kyndt, John [Bellevue University; Fitzsimmons, Kevin [University of Arizona; Brown, Judith K [University of Arizona

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs), storage compounds that can be converted into renewable fuel, through an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint on chloroplast transformation and on studying gene expression in TAG pathways. In this study, intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 bp (about 84.6 kb) in size, with a GC content of 30.8%. This is the first report of an optimized protocol that uses a sonication step, followed by sucrose gradient centrifugation, to release and enrich intact chloroplasts from a microalga (C. protothecoides) of sufficient quality to permit chloroplast genome sequencing with high coverage, while minimizing nuclear genome contamination. The approach is expected to guide chloroplast isolation from other oleaginous algal species for a variety of uses that benefit from enrichment of chloroplasts, ranging from biochemical analysis to genomics studies.
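    A fold enrichment such as the 2.36-fold figure above is often derived from qPCR threshold cycles (Ct) with the comparative 2^-ΔΔCt method. The sketch assumes that method, roughly 100% amplification efficiency, and a nuclear gene as the reference target. The abstract does not state which calculation or reference genes were used, so `fold_enrichment` and `ddct_for_fold` are hypothetical helpers.

```python
import math


def fold_enrichment(ct_cp_enriched: float, ct_nuc_enriched: float,
                    ct_cp_crude: float, ct_nuc_crude: float) -> float:
    """Comparative Ct (2^-ddCt) estimate of chloroplast enrichment.
    Each delta-Ct is chloroplast target minus nuclear reference; the
    crude (unenriched) lysate is the calibrator."""
    delta_enriched = ct_cp_enriched - ct_nuc_enriched
    delta_crude = ct_cp_crude - ct_nuc_crude
    return 2 ** -(delta_enriched - delta_crude)


def ddct_for_fold(fold: float) -> float:
    """The ddCt value that corresponds to a given fold enrichment."""
    return -math.log2(fold)
```

Under these assumptions, a 2.36-fold enrichment corresponds to a ΔΔCt of about -1.24 cycles.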

  6. An evaluation of multiple annealing and looping based genome amplification using a synthetic bacterial community

    Institute of Scientific and Technical Information of China (English)

    WANG Yong; GAO Zhaoming; XU Ying; LI Guangyu; HE Lisheng; QIAN Peiyuan

    2016-01-01

    The low biomass of environmental samples is a major challenge for microbial metagenomic studies. Amplification of genomic DNA is frequently used to meet the minimum DNA input required by high-throughput next-generation sequencing technologies. The amplification efficiency of the Multiple Annealing and Looping Based Amplification Cycles (MALBAC) kit, which was originally developed to amplify single-cell genomic DNA of mammalian organisms, is examined here using a synthetic bacterial community. A DNA template of 10 pg per MALBAC reaction may generate enough DNA for Illumina sequencing. With 10 pg and 100 pg templates per reaction, the MALBAC kit shows stable and homogeneous amplification, as indicated by the highly consistent coverage of reads from the two amplified samples on the contigs assembled from the original unamplified sample. Although the GenomePlex whole genome amplification kit can generate enough DNA from 100 pg of template per reaction, the minority species in the mixed bacterial community are not amplified linearly. With both kits, GC-rich regions of the genomic DNA are not amplified efficiently, as suggested by the low coverage of contigs with high GC content. These results support the use of the MALBAC kit for amplifying environmental microbial DNA samples, but also raise concerns about its application to bacterial species with high GC content.

  7. Ethical issues and best practice in clinically based genomic research: Exeter Stakeholders Meeting Report.

    Science.gov (United States)

    Carrieri, D; Bewshea, C; Walker, G; Ahmad, T; Bowen, W; Hall, A; Kelly, S

    2016-09-27

    Current guidelines on consenting individuals to participate in genomic research are diverse. This creates problems for participants and also for researchers, particularly for clinicians who provide both clinical care and research to their patients. A group of 14 stakeholders met on 7 October 2015 in Exeter to discuss the ethical issues and the best practice arising in clinically based genomic research, with particular emphasis on the issue of returning results to study participants/patients in light of research findings affecting research and clinical practices. The group was deliberately multidisciplinary to ensure that a diversity of views was represented. This report outlines the main ethical issues, areas of best practice and principles underlying ethical clinically based genomic research discussed during the meeting. The main point emerging from the discussion is that ethical principles, rather than being formulaic, should guide researchers/clinicians to identify who the main stakeholders are to consult with for a specific project and to incorporate their voices/views strategically throughout the lifecycle of each project. We believe that the mix of principles and practical guidelines outlined in this report can contribute to current debates on how to conduct ethical clinically based genomic research.

  8. Ethical issues and best practice in clinically based genomic research: Exeter Stakeholders Meeting Report

    Science.gov (United States)

    Carrieri, D; Bewshea, C; Walker, G; Ahmad, T; Bowen, W; Hall, A; Kelly, S

    2016-01-01

    Current guidelines on consenting individuals to participate in genomic research are diverse. This creates problems for participants and also for researchers, particularly for clinicians who provide both clinical care and research to their patients. A group of 14 stakeholders met on 7 October 2015 in Exeter to discuss the ethical issues and the best practice arising in clinically based genomic research, with particular emphasis on the issue of returning results to study participants/patients in light of research findings affecting research and clinical practices. The group was deliberately multidisciplinary to ensure that a diversity of views was represented. This report outlines the main ethical issues, areas of best practice and principles underlying ethical clinically based genomic research discussed during the meeting. The main point emerging from the discussion is that ethical principles, rather than being formulaic, should guide researchers/clinicians to identify who the main stakeholders are to consult with for a specific project and to incorporate their voices/views strategically throughout the lifecycle of each project. We believe that the mix of principles and practical guidelines outlined in this report can contribute to current debates on how to conduct ethical clinically based genomic research. PMID:27677925

  9. A first generation physical map of the medaka genome in BACs essential for positional cloning and clone-by-clone based genomic sequencing.

    Science.gov (United States)

    Khorasani, Maryam Zadeh; Hennig, Steffen; Imre, Gabriele; Asakawa, Shuichi; Palczewski, Stefanie; Berger, Anja; Hori, Hiroshi; Naruse, Kiyoshi; Mitani, Hiroshi; Shima, Akihiro; Lehrach, Hans; Wittbrodt, Jochen; Kondoh, Hisato; Shimizu, Nobuyoshi; Himmelbauer, Heinz

    2004-07-01

    In order to realize the full potential of the medaka as a model system for developmental biology and genetics, characterized genomic resources need to be established, culminating in the sequence of the medaka genome. To facilitate the map-based cloning of genes underlying induced mutations and to provide templates for clone-based genomic sequencing, we have created a first-generation physical map of the medaka genome in bacterial artificial chromosome (BAC) clones. In particular, we exploited synteny with the closely related genome of the pufferfish, Takifugu rubripes, by marker content mapping. As a first step, we clustered 103,144 public medaka EST sequences to obtain a set of 21,121 non-redundant sequence entities. Of these, 11,254 EST clusters were successfully matched against the draft sequence of the fugu genome, and, avoiding oversampling of gene-dense regions, 2363 genes were selected for the BAC map project. We designed 35-mer oligonucleotide probes from the selected genes and hybridized them against 64,500 BAC clones of strains Cab and Hd-rR, representing 14-fold coverage of the medaka genome. Our data set is further supplemented with 437 results generated from PCR-amplified inserts of medaka cDNA clones and BAC end-fragment markers. Our current, edited, first-generation medaka BAC map consists of 902 map segments that cover about 74% of the medaka genome. The map contains 2721 markers. Of these, 2534 are from expressed sequences, equivalent to a non-redundant set of 2328 loci. In total, 934 markers (724 distinct) are anchored to the medaka genetic map. Thus, genetic map assignments provide immediate access to underlying clones and contigs, simplifying molecular access to candidate gene regions and their characterization.

  10. Improved base calling for the Illumina Genome Analyzer using machine learning strategies

    OpenAIRE

    Kircher, Martin; Stenzel, Udo; Kelso, Janet

    2009-01-01

    The Illumina Genome Analyzer generates millions of short sequencing reads. We present Ibis (Improved base identification system), an accurate, fast and easy-to-use base caller that significantly reduces the error rate and increases the output of usable reads. Ibis is faster and more robust with respect to chemistry and technology than other publicly available packages. Ibis is freely available under the GPL from .
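    Base callers such as Ibis report per-base quality scores on the Phred scale, Q = -10 log10(p), where p is the estimated probability that the call is wrong. The sketch below converts between the two and decodes a FASTQ quality string. The ASCII offset is a parameter because older Illumina Genome Analyzer pipelines used an offset of 64, whereas current files typically use 33. The helper names are hypothetical.

```python
import math
from typing import List


def phred_quality(p_error: float) -> float:
    """Phred score for a base-call error probability."""
    return -10.0 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Error probability for a Phred score."""
    return 10.0 ** (-q / 10.0)


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII-encoded FASTQ quality string into Phred scores."""
    return [ord(ch) - offset for ch in qual]


def expected_errors(qualities: List[int]) -> float:
    """Expected number of erroneous bases in a read (sum of error probabilities)."""
    return sum(error_probability(q) for q in qualities)
```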

  11. Ori-Finder: A web-based system for finding oriCs in unannotated bacterial genomes

    Directory of Open Access Journals (Sweden)

    Zhang Chun-Ting

    2008-02-01

    Background: Chromosomal replication is the central event in the bacterial cell cycle. Identification of replication origins (oriCs) is necessary for almost all newly sequenced bacterial genomes. Given the increasing pace of genome sequencing, however, the currently available software for predicting oriCs still leaves much to be desired. The increasing availability of genome sequences therefore calls for improved software to identify oriCs in newly sequenced and unannotated bacterial genomes. Results: We have developed Ori-Finder, an online system for finding oriCs in bacterial genomes based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, the distribution of DnaA boxes, and the occurrence of genes frequently located close to oriCs. The program can also handle unannotated genome sequences by integrating the gene-finding program ZCURVE 1.02. The predicted results are exported to an HTML report, which presents them in both graphical and tabular formats. Conclusion: A web-based system to predict replication origins of bacterial genomes has been presented here. Using this system, oriC regions have been predicted for the bacterial genomes currently available in GenBank. It is hoped that Ori-Finder will become a useful tool for the identification and analysis of oriCs in both bacterial and archaeal genomes.
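    The base-composition part of this approach can be sketched from the Z-curve definitions. After the first n bases, x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n) and z_n = (A_n + T_n) - (G_n + C_n). The difference x_n - y_n = 2(G_n - C_n) is a cumulative GC disparity whose minimum often lies near oriC. This sketch uses only that minimum: it omits the DnaA box and gene-location evidence that Ori-Finder integrates, and the function names are hypothetical.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Z-curve points (x_n, y_n, z_n) after each base of `seq`."""
    a = c = g = t = 0
    points = []
    for base in seq.upper():
        if base == "A":
            a += 1
        elif base == "C":
            c += 1
        elif base == "G":
            g += 1
        elif base == "T":
            t += 1
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


def predicted_origin(seq: str) -> int:
    """1-based position where the GC disparity x_n - y_n is lowest
    (the first such position if there are ties)."""
    disparity = [x - y for x, y, _ in z_curve(seq)]
    return min(range(len(disparity)), key=disparity.__getitem__) + 1
```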

  12. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

    Directory of Open Access Journals (Sweden)

    Stajich Jason E

    2006-11-01

    relatives. We could not confidently resolve whether Candida glabrata or Saccharomyces castellii lies at the base of the WGD clade. Conclusion We have constructed robust phylogenies for fungi based on whole genome analysis. Overall, our phylogenies provide strong support for the classification of phyla, sub-phyla, classes and orders. We have resolved the relationship of the classes Leotiomycetes and Sordariomycetes, and have identified two classes within the CTG clade of the Saccharomycotina that may correlate with sexual status.

  13. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes.

    Science.gov (United States)

    Ruppitsch, Werner; Pietzka, Ariane; Prior, Karola; Bletz, Stefan; Fernandez, Haizpea Lasa; Allerberger, Franz; Harmsen, Dag; Mellmann, Alexander

    2015-09-01

    Whole-genome sequencing (WGS) has emerged as the ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determined the breadth of the L. monocytogenes population based on MLST data with a Bayesian approach. Based on the genome sequence data of representative isolates for the whole population, cgMLST target genes were defined and reappraised with 67 L. monocytogenes isolates from two outbreaks and serotype reference strains. The Bayesian population analysis generated five L. monocytogenes groups. Using all available NCBI RefSeq genomes (n = 36) and six additionally sequenced strains, all genetic groups were covered. Pairwise comparisons of these 42 genome sequences resulted in 1,701 cgMLST targets present in all 42 genomes with 100% overlap and ≥90% sequence similarity. Overall, ≥99.1% of the cgMLST targets were present in the 67 outbreak and serotype reference strains, underlining the representativeness of the cgMLST scheme. Moreover, cgMLST enabled clustering of outbreak isolates with ≤10 alleles difference and unambiguous separation from unrelated outgroup isolates. In conclusion, the novel cgMLST scheme not only improves outbreak investigations but also, owing to the automatically curated cgMLST nomenclature, enables interlaboratory data exchange, which is crucial especially for rapid responses during transsectorial outbreaks.
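    The target-definition criterion quoted above (present in all genomes with 100% overlap and ≥90% sequence similarity) reduces to a simple filter once per-genome alignment results are available. The sketch assumes that alignment hits have already been summarised as an (overlap fraction, identity fraction) pair per gene and genome. This data layout and `select_core_targets` are hypothetical.

```python
from typing import Dict, List, Tuple

# gene -> {genome_id: (overlap_fraction, identity_fraction)}
Hits = Dict[str, Dict[str, Tuple[float, float]]]


def select_core_targets(hits: Hits, n_genomes: int,
                        min_overlap: float = 1.0,
                        min_identity: float = 0.90) -> List[str]:
    """Genes found in every genome with full overlap and >= 90% identity."""
    core = []
    for gene, per_genome in hits.items():
        if len(per_genome) < n_genomes:
            continue  # absent from at least one genome
        if all(overlap >= min_overlap and identity >= min_identity
               for overlap, identity in per_genome.values()):
            core.append(gene)
    return sorted(core)
```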

  14. Comparison of buccal and blood-derived canine DNA, either native or whole genome amplified, for array-based genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Lawley Cynthia

    2011-06-01

    Background: The availability of array-based genotyping platforms for single nucleotide polymorphisms (SNPs) in the canine genome has expanded the opportunities to undertake genome-wide association (GWA) studies to identify the genetic basis of Mendelian and complex traits. Whole blood is undisputed as a source of high-quality DNA, but it often proves impractical for collecting the large numbers of samples needed to discover the loci underlying complex traits. Further, many countries prohibit the collection of blood from dogs unless medically necessary, thereby restricting access to critical control samples from healthy dogs. Alternative sources of DNA, typically buccal cytobrush extractions, are convenient but have been suggested to have low yield and to perform poorly in GWA studies. Yet buccal cytobrushes provide a cost-effective means of collecting DNA, are readily accepted by dog owners, and represent a large resource base in many canine genetics laboratories. Whole genome amplification (WGA) can be performed to increase DNA quantities. The present study therefore assessed the utility of buccal-derived DNA, as well as of whole genome amplification, in comparison with blood samples for use on the most recent iteration of the canine HD SNP array (Illumina). Findings: In both buccal and blood samples, whether whole genome amplified or not, 97% of the samples had SNP call rates in excess of 80%, indicating that the vast majority of the SNPs would be suitable for association studies regardless of the DNA source. Similarly, there were no significant differences in marker intensity measurements between buccal and blood samples for copy number variation (CNV) analysis. Conclusions: All DNA samples assayed, buccal or blood, native or whole genome amplified, are appropriate for use in array-based genome-wide association studies.
The concordance between subsets of dogs for which both buccal and blood samples, or those samples whole genome amplified, was
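    The per-sample call-rate check behind the 80% figure above can be sketched briefly. Representing a no-call as `None` and the helper names are assumptions. The 80% threshold is the one quoted in the abstract.

```python
from typing import Dict, Optional, Sequence


def call_rate(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of SNPs with a genotype call (None marks a no-call)."""
    if not genotypes:
        raise ValueError("no SNPs")
    return sum(g is not None for g in genotypes) / len(genotypes)


def fraction_passing(samples: Dict[str, Sequence[Optional[str]]],
                     threshold: float = 0.80) -> float:
    """Fraction of samples whose call rate exceeds the threshold."""
    passing = sum(call_rate(g) > threshold for g in samples.values())
    return passing / len(samples)
```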

  15. [Study on an inquiry-based teaching case in genomics curriculum: identifying virulence factors of Escherichia coli by using comparative genomics].

    Science.gov (United States)

    Jidong, Zhou; Yudong, Li

    2015-02-01

    Genomics is the core subject of the various "omics" and has also become a topic of increasing interest in undergraduate biological sciences curricula. However, studies on teaching methodology for genomics courses have so far been very limited. Here we report an application of inquiry-based teaching in genomics courses, using the virulence factors of Escherichia coli as an example of a comparative genomics study. Specifically, students first built a multiple-genome alignment of different E. coli strains with the Mauve tool to investigate gene conservation; putative virulence factor genes were then identified by using the BLAST tool to obtain gene annotations. The teaching process was divided into five modules: situation, resources, task, process and evaluation. Learning-assessment results revealed that students had acquired genomics knowledge and skills, and that their learning interest and self-study ability were also stimulated. Moreover, this teaching case can be applied to other related courses, such as microbiology, bioinformatics, molecular biology and food safety detection technology.

  16. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Klebsiella pneumoniae.

    Science.gov (United States)

    Zhou, Haijian; Liu, Wenbing; Qin, Tian; Liu, Chen; Ren, Hongyu

    2017-01-01

    At present, the most widely used methods for Klebsiella pneumoniae subtyping are multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE). However, the discriminatory power of MLST is insufficient to distinguish outbreak from non-outbreak isolates, and PFGE is time-consuming and labor-intensive. A core genome multilocus sequence typing (cgMLST) scheme for whole-genome sequence-based typing of K. pneumoniae was developed to overcome the disadvantages of these traditional molecular subtyping methods. First, we used the complete genome of K. pneumoniae strain HKUOPLC as the reference genome and 907 K. pneumoniae genomes downloaded from the NCBI database as the original genome dataset to determine cgMLST target genes. A total of 1,143 genes were retained as cgMLST target genes. Second, we used 26 K. pneumoniae strains from a nosocomial infection outbreak to evaluate the cgMLST scheme. cgMLST enabled clustering of outbreak strains with <10 alleles difference and unambiguous separation from unrelated outgroup strains. Moreover, cgMLST revealed that the epidemic ST11 clone may comprise several sub-clones. In conclusion, the novel cgMLST scheme not only showed higher discriminatory power than PFGE and MLST in outbreak investigations but also revealed more population structure characteristics than MLST.
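    The clustering criterion above (outbreak strains differing by fewer than 10 alleles) can be sketched as single-linkage grouping of allele profiles. The sketch makes three assumptions: profiles are lists of allele numbers with `None` for a missing locus, missing loci are ignored pairwise, and the helper names are hypothetical. The abstract does not describe the study's software or its handling of missing data.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci called in both profiles whose alleles differ."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def single_linkage_clusters(profiles: Dict[str, Profile],
                            threshold: int = 10) -> List[List[str]]:
    """Group isolates linked by chains of pairs with distance < threshold."""
    parent = {name: name for name in profiles}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) < threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```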

  17. A genomic background based method for association analysis in related individuals.

    Directory of Open Access Journals (Sweden)

    Najaf Amin

    BACKGROUND: The feasibility of genotyping hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of study subjects has triggered the need for fast, powerful, and reliable methods for genome-wide association analysis. Here we consider the situation in which study participants are genetically related (e.g. due to systematic sampling of families or because a study was performed in a genetically isolated population). Of the available methods that account for relatedness, the Measured Genotype (MG) approach is considered the 'gold standard'. However, MG is not efficient with respect to the time taken to analyse genome-wide data. In this context we proposed a fast two-step method called Genome-wide Association using Mixed Model and Regression (GRAMMAR) for the analysis of pedigree-based quantitative traits. This method overcomes the time limitation of the MG approach, but at the cost of power. A major drawback of both MG and GRAMMAR is that they crucially depend on the availability of complete and correct pedigree data, which is rarely available. METHODOLOGY: In this study we first explore the type 1 error and relative power of the MG, GRAMMAR, and Genomic Control (GC) approaches for genetic association analysis. Second, we propose an extension to GRAMMAR, GRAMMAR-GC. Finally, we propose applying GRAMMAR-GC using a kinship matrix estimated from genomic marker data instead of (possibly missing and/or incorrect) genealogy. CONCLUSION: Through simulations we show that the MG approach maintains high power across a range of heritabilities and possible pedigree structures, and always outperforms other contemporary methods. We also show that the power of our proposed GRAMMAR-GC approaches that of the 'gold standard' MG for all models and pedigrees studied.
We show that this method is both feasible and powerful and has correct type 1 error in the context of genome-wide association analysis
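    Genomic control, one of the components combined in GRAMMAR-GC, rescales association statistics by an inflation factor λ. λ is estimated as the median of the observed 1-df χ² statistics divided by the median of the χ²₁ distribution (about 0.4549). The sketch below covers only that generic step. It does not reproduce the GRAMMAR residual step or the kinship estimation, and the helper names are hypothetical.

```python
import statistics
from typing import List, Sequence

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df


def inflation_factor(chi2_stats: Sequence[float]) -> float:
    """Genomic-control lambda: observed median over expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats: Sequence[float]) -> List[float]:
    """Divide each statistic by lambda; lambda is floored at 1 here
    (a common convention, assumed), so statistics are never inflated."""
    lam = max(inflation_factor(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]
```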

  18. Phytoplasma infection in tomato is associated with re-organization of plasma membrane, ER stacks and actin filaments in sieve elements

    Directory of Open Access Journals (Sweden)

    Stefanie Vera Buxa

    2015-08-01

    Phytoplasmas, biotrophic wall-less prokaryotes, reside only in the sieve elements of their host plants. The essentials of the intimate interaction between phytoplasmas and their hosts are poorly understood, which calls for research on potential ultrastructural modifications. We investigated modifications of the sieve-element ultrastructure induced in tomato plants by ‘Candidatus Phytoplasma solani’, the pathogen associated with stolbur disease. Phytoplasma infection induces a drastic re-organization of sieve-element substructures, including changes in the plasma membrane surface and distortion of the sieve-element reticulum. Observations of healthy and stolbur-diseased plants provided evidence for the emergence of structural links between the sieve-element plasma membrane and phytoplasmas. One-sided actin aggregates on the phytoplasma surface also suggested a connection between phytoplasmas and the sieve-element cytoskeleton. In infected sieve elements, actin filaments were displaced from the sieve-element mictoplasm to the surface of the phytoplasmas. Expression analysis revealed a decrease in actin and an increase in the ER-resident chaperone luminal binding protein (BiP) in midribs of phytoplasma-infected plants. Collectively, these studies provided novel insights into ultrastructural responses of host sieve elements to phloem-restricted prokaryotes.

  19. Reliability of pedigree-based and genomic evaluations in selected populations.

    Science.gov (United States)

    Gorjanc, Gregor; Bijma, Piter; Hickey, John M

    2015-08-14

    Reliability is an important parameter in breeding. It measures the precision of estimated breeding values (EBV) and, thus, potential response to selection on those EBV. The precision of EBV is commonly measured by relating the prediction error variance (PEV) of EBV to the base population additive genetic variance (base PEV reliability), while the potential for response to selection is commonly measured by the squared correlation between the EBV and breeding values (BV) on selection candidates (reliability of selection). While these two measures are equivalent for unselected populations, they are not equivalent for selected populations. The aim of this study was to quantify the effect of selection on these two measures of reliability and to show how this affects comparison of breeding programs using pedigree-based or genomic evaluations. Two scenarios with random and best linear unbiased prediction (BLUP) selection were simulated, where the EBV of selection candidates were estimated using only pedigree, pedigree and phenotype, genome-wide marker genotypes and phenotype, or only genome-wide marker genotypes. The base PEV reliabilities of these EBV were compared to the corresponding reliabilities of selection. Realized genetic selection intensity was evaluated to quantify the potential of selection on the different types of EBV and, thus, to validate differences in reliabilities. Finally, the contribution of different underlying processes to changes in additive genetic variance and reliabilities was quantified. The simulations showed that, for selected populations, the base PEV reliability substantially overestimates the reliability of selection of EBV that are mainly based on old information from the parental generation, as is the case with pedigree-based prediction. Selection on such EBV gave very low realized genetic selection intensities, confirming the overestimation and importance of genotyping both male and female selection candidates. The two measures of
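    The two reliability measures contrasted above can be written down directly. The base PEV reliability is 1 - PEV/σ²_A. The reliability of selection is the squared correlation between EBV and true breeding values among selection candidates. In this sketch, the function names and the toy numbers in the test are illustrative, not the study's simulation.

```python
from typing import Sequence


def pev_reliability(pev: float, additive_variance: float) -> float:
    """Base PEV reliability: 1 - PEV / sigma^2_A, with sigma^2_A the
    base-population additive genetic variance."""
    return 1.0 - pev / additive_variance


def reliability_of_selection(ebv: Sequence[float], bv: Sequence[float]) -> float:
    """Squared Pearson correlation between EBV and true BV of selection candidates."""
    n = len(ebv)
    mean_e, mean_b = sum(ebv) / n, sum(bv) / n
    cov = sum((e - mean_e) * (b - mean_b) for e, b in zip(ebv, bv))
    var_e = sum((e - mean_e) ** 2 for e in ebv)
    var_b = sum((b - mean_b) ** 2 for b in bv)
    return cov * cov / (var_e * var_b)
```

In a selected population the two numbers can diverge: PEV is computed against the base-population variance, whereas the correlation reflects the variance that actually remains among the candidates.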

  20. Factors affecting reproducibility between genome-scale siRNA-based screens

    Science.gov (United States)

    Barrows, Nicholas J.; Le Sommer, Caroline; Garcia-Blanco, Mariano A.; Pearson, James L.

    2011-01-01

    RNA interference-based screening is a powerful new genomic technology which addresses gene function en masse. To evaluate factors influencing hit list composition and reproducibility, we performed two identically designed small interfering RNA (siRNA)-based, whole genome screens for host factors supporting yellow fever virus infection. These screens represent two separate experiments completed five months apart and allow the direct assessment of the reproducibility of a given siRNA technology when performed in the same environment. Candidate hit lists generated by sum rank, median absolute deviation, z-score, and strictly standardized mean difference were compared within and between whole genome screens. Application of these analysis methodologies within a single screening dataset using a fixed threshold equivalent to a p-value ≤ 0.001 resulted in hit lists ranging from 82 to 1,140 members and highlighted the tremendous impact analysis methodology has on hit list composition. Intra- and inter-screen reproducibility was significantly influenced by the analysis methodology and ranged from 32% to 99%. This study also highlighted the power of testing at least two independent siRNAs for each gene product in primary screens. To facilitate validation we conclude by suggesting methods to reduce false discovery at the primary screening stage. In this study we present the first comprehensive comparison of multiple analysis strategies, and demonstrate the impact of the analysis methodology on the composition of the “hit list”. Therefore, we propose that the entire dataset derived from functional genome-scale screens, especially if publicly funded, should be made available as is done with data derived from gene expression and genome-wide association studies. PMID:20625183
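    Of the hit-calling methods listed above, the median absolute deviation approach is the easiest to sketch. Each well's score is converted to a robust z-score, (x - median)/(1.4826 * MAD), and compared with a normal-quantile cutoff. The sketch assumes that hits are genes whose knockdown lowers the infection readout, so it applies a one-sided, lower-tail test at p ≤ 0.001 (a cutoff of about -3.09). The helper names and the toy scores in the test are hypothetical.

```python
import statistics
from statistics import NormalDist
from typing import List, Sequence


def robust_z_scores(values: Sequence[float]) -> List[float]:
    """(x - median) / (1.4826 * MAD); 1.4826 makes the MAD consistent
    with the standard deviation for normally distributed data."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / scale for v in values]


def call_hits(values: Sequence[float], p_value: float = 0.001) -> List[int]:
    """Indices whose robust z-score falls in the lower tail at `p_value`."""
    cutoff = NormalDist().inv_cdf(p_value)  # about -3.09 for p = 0.001
    return [i for i, z in enumerate(robust_z_scores(values)) if z <= cutoff]
```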

  1. MIRAGE: a functional genomics-based approach for metabolic network model reconstruction and its application to cyanobacteria networks.

    Science.gov (United States)

    Vitkin, Edward; Shlomi, Tomer

    2012-11-29

    Genome-scale metabolic network reconstructions are considered a key step in quantifying the genotype-phenotype relationship. We present a novel gap-filling approach, MetabolIc Reconstruction via functionAl GEnomics (MIRAGE), which identifies missing network reactions by integrating metabolic flux analysis and functional genomics data. MIRAGE's performance is demonstrated on the reconstruction of metabolic network models of E. coli and Synechocystis sp. and validated via existing networks for these species. Then, it is applied to reconstruct genome-scale metabolic network models for 36 sequenced cyanobacteria amenable to constraint-based modeling analysis and, specifically, to metabolic engineering. The reconstructed network models are supplied via standard SBML files.

  2. A geminivirus-based guide RNA delivery system for CRISPR/Cas9 mediated plant genome editing

    OpenAIRE

    2015-01-01

    CRISPR/Cas has emerged as a potent genome editing technology and has been successfully applied in many organisms, including several plant species. However, delivery of genome editing reagents remains a challenge in plants. Here, we report a virus-based guide RNA (gRNA) delivery system for CRISPR/Cas9 mediated plant genome editing (VIGE) that can be used to precisely target genome locations and cause mutations. VIGE is performed by using a modified Cabbage Leaf Curl virus (CaLCuV) vector to expr...

  3. Developing genomic knowledge bases and databases to support clinical management: current perspectives.

    Science.gov (United States)

    Huser, Vojtech; Sincan, Murat; Cimino, James J

    2014-01-01

    Personalized medicine, the ability to tailor diagnostic and treatment decisions for individual patients, is seen as the evolution of modern medicine. We characterize here the informatics resources available today or envisioned in the near future that can support clinical interpretation of genomic test results. We assume a clinical sequencing scenario (germline whole-exome sequencing) in which a clinical specialist, such as an endocrinologist, needs to tailor patient management decisions within his or her specialty (targeted findings) but relies on a genetic counselor to interpret off-target incidental findings. We characterize the genomic input data and list various types of knowledge bases that provide genomic knowledge for generating clinical decision support. We highlight the need for patient-level databases with detailed lifelong phenotype content in addition to genotype data and provide a list of recommendations for personalized medicine knowledge bases and databases. We conclude that no single knowledge base can currently support all aspects of personalized recommendations and that consolidation of several current resources into larger, more dynamic and collaborative knowledge bases may offer a future path forward.

  4. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L.) methylation filtered genomic genespace sequences

    Directory of Open Access Journals (Sweden)

    Spraggins Thomas A

    2007-04-01

    Background: Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace), recovered using methylation filtration technology, and the annotation and analysis of the sequence data.

    Description: CGKB, the Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: a GSS annotation and comparative genomics knowledge base, a GSS enzyme and metabolic pathway knowledge base, and a GSS simple sequence repeat (SSR) knowledge base for molecular marker discovery. A homology-based approach was applied to annotate the GSS, mainly using BLASTX against four public FASTA-formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniProtKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the

  5. The Glyphosate-Based Herbicide Roundup® Does Not Elevate Genome-Wide Mutagenesis of Escherichia coli.

    Science.gov (United States)

    Tincher, Clayton; Long, Hongan; Behringer, Megan G; Walker, Noah; Lynch, Michael

    2017-08-09

    Mutations induced by pollutants may promote pathogen evolution, for example, by accelerating mutations conferring antibiotic resistance. Generally, evaluating the genome-wide mutagenic effects of long-term sublethal pollutant exposure at single-nucleotide resolution is extremely difficult. To overcome this technical barrier, we use the mutation accumulation/whole genome sequencing (MA/WGS) method as a mutagenicity test to quantitatively evaluate genome-wide mutagenesis of Escherichia coli after long-term exposure to a wide gradient of the glyphosate-based herbicide (GBH) Roundup® Concentrate Plus. The genome-wide mutation rate decreases as GBH concentration increases, suggesting that even long-term GBH exposure does not compromise the genome stability of bacteria. Copyright © 2017, G3: Genes, Genomes, Genetics.
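
    In an MA/WGS experiment, the per-site, per-generation mutation rate is commonly estimated as μ = m / (N · T · n), where m is the number of mutations called, N the number of MA lines, T the number of generations per line, and n the number of callable sites. The Python sketch below implements that estimator with a Poisson standard error. The numbers in the example are hypothetical, not data from this study; comparing treatments along a concentration gradient amounts to repeating the calculation per treatment.

```python
import math

def mutation_rate(mutations, lines, generations, sites):
    """Per-site, per-generation mutation rate: mu = m / (N * T * n)."""
    return mutations / (lines * generations * sites)

def mutation_rate_se(mutations, lines, generations, sites):
    """Standard error assuming mutation counts are Poisson: sqrt(m) / (N * T * n)."""
    return math.sqrt(mutations) / (lines * generations * sites)

# Hypothetical treatment: 100 mutations over 50 lines x 1,000 generations x 4 Mb callable sites.
mu = mutation_rate(100, 50, 1_000, 4_000_000)
se = mutation_rate_se(100, 50, 1_000, 4_000_000)
print(f"{mu:.2e} +/- {se:.2e}")  # 5.00e-10 +/- 5.00e-11
```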

  6. Innovative molecular diagnosis of Trichinella species based on β-carbonic anhydrase genomic sequence.

    Science.gov (United States)

    Zolfaghari Emameh, Reza; Kuuslahti, Marianne; Näreaho, Anu; Sukura, Antti; Parkkila, Seppo

    2016-03-01

    Trichinellosis is a helminthic infection where different species of Trichinella nematodes are the causative agents. Several molecular assays have been designed to aid diagnostics of trichinellosis. These assays are mostly complex and expensive. The genomes of Trichinella species contain certain parasite-specific genes, which can be detected by polymerase chain reaction (PCR) methods. We selected the β-carbonic anhydrase (β-CA) gene as a target because it is present in many parasite genomes but absent in vertebrates. We developed a novel β-CA gene-based method for detection of Trichinella larvae in biological samples. We first identified a β-CA protein sequence from Trichinella spiralis by bioinformatic tools using β-CAs from Caenorhabditis elegans and Drosophila melanogaster. Thereafter, 16 sets of designed primers were tested to detect β-CA genomic sequences from three species of Trichinella, including T. spiralis, Trichinella pseudospiralis and Trichinella nativa. Among all 16 sets of designed primers, primer set No. 2 efficiently amplified β-CA genomic sequences from T. spiralis, T. pseudospiralis and T. nativa without any false-positive amplicons from other parasite samples, including Toxoplasma gondii, Toxocara cati and Parascaris equorum. This robust and straightforward method could be useful for meat inspection in slaughterhouses, quality control by food authorities and medical laboratories.

  7. Genomic-based tools for the risk assessment, management, and prevention of type 2 diabetes

    Directory of Open Access Journals (Sweden)

    Johansen Taber KA

    2015-01-01

    Katherine A Johansen Taber, Barry D Dickinson. Department of Science and Biotechnology, American Medical Association, Chicago, IL, USA.

    Abstract: Type 2 diabetes (T2D) is a common and serious disorder and is a significant risk factor for the development of cardiovascular disease, neuropathy, nephropathy, retinopathy, periodontal disease, and foot ulcers and amputations. The burden of disease associated with T2D has led to an emphasis on early identification of the millions of individuals at high risk so that management and intervention strategies can be effectively implemented before disease progression begins. With increasing knowledge about the genetic basis of T2D, several genomic-based strategies have been tested for their ability to improve risk assessment, management and prevention. Genetic risk scores have been developed with the intent to more accurately identify those at risk for T2D and to potentially improve motivation and adherence to lifestyle modification programs. In addition, evidence is building that oral antihyperglycemic medications are subject to pharmacogenomic variation in a substantial number of patients, suggesting genomics may soon play a role in determining the most effective therapies. T2D is a complex disease that affects individuals differently, and risk prediction and treatment may be challenging for health care providers. Genomic approaches hold promise for their potential to improve risk prediction and tailor management for individual patients and to contribute to better health outcomes for those with T2D.

    Keywords: diabetes, genomic, risk prediction, management
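
    Genetic risk scores are often computed as a weighted sum over risk variants, GRS = Σ ln(OR_i) · d_i, where d_i ∈ {0, 1, 2} is the number of risk alleles a person carries at variant i and OR_i is that allele's per-allele odds ratio from association studies. The Python sketch below assumes this additive model and also shows the simpler unweighted allele count. The variant names and odds ratios are placeholders, not published T2D loci.

```python
import math

def weighted_grs(dosages, odds_ratios):
    """Weighted score: sum of ln(odds ratio) * risk-allele count over all variants."""
    if dosages.keys() != odds_ratios.keys():
        raise ValueError("dosages and odds_ratios must cover the same variants")
    for variant, count in dosages.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1 or 2")
    return sum(math.log(odds_ratios[v]) * dosages[v] for v in dosages)

def unweighted_grs(dosages):
    """Simple allele count: each risk allele contributes 1."""
    return sum(dosages.values())

# Placeholder variants and odds ratios (hypothetical).
odds = {"variant_a": 1.4, "variant_b": 1.1, "variant_c": 1.2}
person = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
print(round(weighted_grs(person, odds), 3))  # 0.855
print(unweighted_grs(person))                # 3
```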

  8. CRISPR/Cas9-Based Multiplex Genome Editing in Monocot and Dicot Plants.

    Science.gov (United States)

    Ma, Xingliang; Liu, Yao-Guang

    2016-07-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated genome targeting system has been applied to a variety of organisms, including plants. Compared to other genome-targeting technologies such as zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), the CRISPR/Cas9 system is easier to use and has much higher editing efficiency. In addition, multiple "single guide RNAs" (sgRNAs) with different target sequences can be designed to direct the Cas9 protein to multiple genomic sites for simultaneous multiplex editing. Here, we present a procedure for highly efficient multiplex genome targeting in monocot and dicot plants using a versatile and robust CRISPR/Cas9 vector system, emphasizing the construction of binary constructs with multiple sgRNA expression cassettes in one round of cloning using Golden Gate ligation. We also describe the genotyping of targeted mutations in transgenic plants by direct Sanger sequencing followed by decoding of superimposed sequencing chromatograms containing biallelic or heterozygous mutations using the Web-based tool DSDecode. © 2016 by John Wiley & Sons, Inc.

  9. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species.

    Science.gov (United States)

    de Bruijn, Irene; de Kock, Maarten J D; Yang, Meng; de Waard, Pieter; van Beek, Teris A; Raaijmakers, Jos M

    2007-01-01

    Analysis of microbial genome sequences has revealed numerous genes involved in antibiotic biosynthesis. In pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these predictions, however, are untested, and the association between genome sequence and the biological function of the predicted metabolite is lacking. Here we report the genome-based identification of previously unknown CLP gene clusters in plant pathogenic Pseudomonas syringae strains B728a and DC3000 and in plant beneficial Pseudomonas fluorescens Pf0-1 and SBW25. For P. fluorescens SBW25, a model strain in studying bacterial evolution and adaptation, the structure of the CLP with a predicted 9-amino acid peptide moiety was confirmed by chemical analyses. Mutagenesis confirmed that the three identified NRPS genes are essential for CLP synthesis in strain SBW25. CLP production was shown to play a key role in motility, biofilm formation and in activity of SBW25 against zoospores of Phytophthora infestans. This is the first time that an antimicrobial metabolite has been identified from strain SBW25. The results indicate that genome mining may enable the discovery of unknown gene clusters and traits that are highly relevant in the lifestyle of plant beneficial and plant pathogenic bacteria.

  10. A systematic identification of multiple toxin-target interactions based on chemical, genomic and toxicological data.

    Science.gov (United States)

    Zhou, Wei; Huang, Chao; Li, Yan; Duan, Jinyou; Wang, Yonghua; Yang, Ling

    2013-02-01

    Although large amounts of -omics (genomic, proteomic, metabolomic, etc.) data relevant to assessing the toxicity of various agents have accumulated, obtaining toxicity information for a wide variety of molecules through experimental methods remains difficult. Here, a systems toxicology approach that integrates large and diverse chemical, genomic and toxicological information was developed to predict toxin targets and their related networks. The procedure has three steps. (1) Using two statistical methods, support vector machine (SVM) and random forest (RF), a systemic model for predicting multiple toxin-target interactions from extracted chemical and genomic features was developed, and its reliability and robustness were estimated. Targets were also classified qualitatively according to phenotypic diseases, both to uncover the biological meaning of the targets and to validate the robustness of the in silico models. (2) Based on the predicted toxin-target interactions, a genome-scale toxin-target-disease network, exemplified by cardiovascular disease, was generated. (3) A topological analysis of the network was carried out to identify the human targets most susceptible to topical agents, including the most critical toxins, and to uncover toxin-specific mechanisms and pathways. The systems toxicology methodologies presented here will make drug development and environmental toxin risk assessment more efficient, acceptable and cost-effective.

  11. Clinical Sequencing Exploratory Research Consortium: Accelerating Evidence-Based Practice of Genomic Medicine.

    Science.gov (United States)

    Green, Robert C; Goddard, Katrina A B; Jarvik, Gail P; Amendola, Laura M; Appelbaum, Paul S; Berg, Jonathan S; Bernhardt, Barbara A; Biesecker, Leslie G; Biswas, Sawona; Blout, Carrie L; Bowling, Kevin M; Brothers, Kyle B; Burke, Wylie; Caga-Anan, Charlisse F; Chinnaiyan, Arul M; Chung, Wendy K; Clayton, Ellen W; Cooper, Gregory M; East, Kelly; Evans, James P; Fullerton, Stephanie M; Garraway, Levi A; Garrett, Jeremy R; Gray, Stacy W; Henderson, Gail E; Hindorff, Lucia A; Holm, Ingrid A; Lewis, Michelle Huckaby; Hutter, Carolyn M; Janne, Pasi A; Joffe, Steven; Kaufman, David; Knoppers, Bartha M; Koenig, Barbara A; Krantz, Ian D; Manolio, Teri A; McCullough, Laurence; McEwen, Jean; McGuire, Amy; Muzny, Donna; Myers, Richard M; Nickerson, Deborah A; Ou, Jeffrey; Parsons, Donald W; Petersen, Gloria M; Plon, Sharon E; Rehm, Heidi L; Roberts, J Scott; Robinson, Dan; Salama, Joseph S; Scollon, Sarah; Sharp, Richard R; Shirts, Brian; Spinner, Nancy B; Tabor, Holly K; Tarczy-Hornoch, Peter; Veenstra, David L; Wagle, Nikhil; Weck, Karen; Wilfond, Benjamin S; Wilhelmsen, Kirk; Wolf, Susan M; Wynn, Julia; Yu, Joon-Ho

    2016-06-02

    Despite rapid technical progress and demonstrable effectiveness for some types of diagnosis and therapy, much remains to be learned about clinical genome and exome sequencing (CGES) and its role within the practice of medicine. The Clinical Sequencing Exploratory Research (CSER) consortium includes 18 extramural research projects, one National Human Genome Research Institute (NHGRI) intramural project, and a coordinating center funded by the NHGRI and National Cancer Institute. The consortium is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches; it has thus far recruited 5,577 participants across a spectrum of symptomatic and healthy children and adults by utilizing both germline and cancer sequencing. The CSER consortium is analyzing data and creating publicly available procedures and tools related to participant preferences and consent, variant classification, disclosure and management of primary and secondary findings, health outcomes, and integration with electronic health records. Future research directions will refine measures of clinical utility of CGES in both germline and somatic testing, evaluate the use of CGES for screening in healthy individuals, explore the penetrance of pathogenic variants through extensive phenotyping, reduce discordances in public databases of genes and variants, examine social and ethnic disparities in the provision of genomics services, explore regulatory issues, and estimate the value and downstream costs of sequencing. The CSER consortium has established a shared community of research sites by using diverse approaches to pursue the evidence-based development of best practices in genomic medicine.

  12. Enhancement of single guide RNA transcription for efficient CRISPR/Cas-based genomic engineering.

    Science.gov (United States)

    Ui-Tei, Kumiko; Maruyama, Shohei; Nakano, Yuko

    2017-06-01

    Genomic engineering using clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) protein is a promising approach for targeting the genomic DNA of virtually any organism in a sequence-specific manner. Recent remarkable advances in CRISPR/Cas technology have made it a feasible system for use in therapeutic applications and biotechnology. In the CRISPR/Cas system, a guide RNA (gRNA), interacting with the Cas protein, recognizes a genomic region with sequence complementarity, and the double-stranded DNA at the target site is cleaved by the Cas protein. A widely used gRNA is an RNA polymerase III (pol III)-driven single gRNA (sgRNA), which is produced by artificial fusion of CRISPR RNA (crRNA) and trans-activation crRNA (tracrRNA). However, we identified a TTTT stretch, known as a termination signal of RNA pol III, in the scaffold region of the sgRNA. Here, we revealed that sgRNA carrying a TTTT stretch reduces the efficiency of sgRNA transcription due to premature transcriptional termination, and decreases the efficiency of genome editing. Unexpectedly, it was also shown that the premature terminated sgRNA may have an adverse effect of inducing RNA interference. Such disadvantageous effects were avoided by substituting one base in the TTTT stretch.
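
    Because a run of four or more T residues in the DNA template acts as an RNA pol III termination signal, a quick in silico check is to scan an sgRNA template for such runs before cloning. The Python sketch below does that and applies one single-base substitution. Which T is changed and the replacement base (A) are illustrative assumptions, since the abstract does not specify them, and the sequence is a hypothetical fragment.

```python
import re

# Runs of four or more T in the template (coding) strand.
T_RUN = re.compile(r"T{4,}")

def find_t_runs(template):
    """Return (start, end) spans of T runs that could terminate pol III transcription."""
    return [m.span() for m in T_RUN.finditer(template.upper())]

def substitute_in_run(template, span, offset=3, new_base="A"):
    """Replace one base inside a T run (default: 4th T -> A, an illustrative choice)."""
    start, end = span
    pos = start + offset
    if not start <= pos < end:
        raise ValueError("offset falls outside the T run")
    return template[:pos] + new_base + template[pos + 1:]

fragment = "GTTTTAGAGCTAGAAATAGC"  # hypothetical scaffold-like fragment
runs = find_t_runs(fragment)
print(runs)  # [(1, 5)]
fixed = substitute_in_run(fragment, runs[0])
print(fixed, find_t_runs(fixed))  # GTTTAAGAGCTAGAAATAGC []
```

    A run longer than four T's (e.g. TTTTTTTT) can still contain a TTTT stretch after one substitution, so re-running find_t_runs on the edited template is a cheap safeguard.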

  13. 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis

    NARCIS (Netherlands)

    Greshock, J; Naylor, TL; Margolin, A; Diskin, S; Cleaver, SH; Futreal, PA; deJong, PJ; Zhao, SY; Liebman, M; Weber, BL

    2004-01-01

    Array-based comparative genomic hybridization (aCGH) is a recently developed tool for genome-wide determination of DNA copy number alterations. This technology has tremendous potential for disease-gene discovery in cancer and developmental disorders as well as numerous other applications. However, w
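
    aCGH reports copy number as the log2 ratio of test to reference hybridization signal for each arrayed clone. The Python sketch below shows that step, assuming background-corrected, normalized intensities. The ±0.3 gain/loss thresholds and the conversion to copy number (pure sample against a diploid reference) are simplifying assumptions, not values from this paper.

```python
import math

def log2_ratios(test, reference):
    """Per-clone log2(test / reference) from normalized intensities."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have one value per clone")
    return [math.log2(t / r) for t, r in zip(test, reference)]

def classify(ratios, gain=0.3, loss=-0.3):
    """Label each clone as gain, loss or neutral using fixed (assumed) thresholds."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral" for r in ratios]

def copy_number(ratio, reference_copies=2):
    """Copy number implied by a log2 ratio for a pure sample vs. a diploid reference."""
    return reference_copies * 2 ** ratio

# Hypothetical intensities for four BAC clones.
ratios = log2_ratios([200, 100, 50, 400], [100, 100, 100, 100])
print(ratios)                  # [1.0, 0.0, -1.0, 2.0]
print(classify(ratios))        # ['gain', 'neutral', 'loss', 'gain']
print(copy_number(ratios[0]))  # 4.0
```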

  14. Integrated Database And Knowledge Base For Genomic Prospective Cohort Study In Tohoku Medical Megabank Toward Personalized Prevention And Medicine.

    Science.gov (United States)

    Ogishima, Soichi; Takai, Takako; Shimokawa, Kazuro; Nagaie, Satoshi; Tanaka, Hiroshi; Nakaya, Jun

    2015-01-01

    The Tohoku Medical Megabank project is a national project for the revitalization of the Tohoku region, the area struck by the Great East Japan Earthquake, and has conducted a large-scale prospective genome-cohort study. Alongside this prospective genome-cohort study, we have developed an integrated database and knowledge base, which will be a key resource for realizing personalized prevention and medicine.

  15. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    Science.gov (United States)

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops.

  16. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids.

    KAUST Repository

    Mithani, Aziz

    2013-09-24

    The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies. We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): 'HSP base Assignment using NGS data through Diploid Similarity' (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment. We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics.

  17. Isolation of single-base genome-edited human iPS cells without antibiotic selection.

    Science.gov (United States)

    Miyaoka, Yuichiro; Chan, Amanda H; Judge, Luke M; Yoo, Jennie; Huang, Miller; Nguyen, Trieu D; Lizarraga, Paweena P; So, Po-Lin; Conklin, Bruce R

    2014-03-01

    Precise editing of human genomes in pluripotent stem cells by homology-driven repair of targeted nuclease-induced cleavage has been hindered by the difficulty of isolating rare clones. We developed an efficient method to capture rare mutational events, enabling isolation of mutant lines with single-base substitutions without antibiotic selection. This method facilitates efficient induction or reversion of mutations associated with human disease in isogenic human induced pluripotent stem cells.

  18. Array comparative genomic hybridization-based characterization of genetic alterations in pulmonary neuroendocrine tumors

    OpenAIRE

    Voortman, Johannes; Lee, Jih-Hsiang; Killian, Jonathan Keith; Suuriniemi, Miia; Wang, Yonghong; Lucchi, Marco; Smith, William I; Meltzer, Paul; Wang, Yisong; Giaccone, Giuseppe

    2010-01-01

    The goal of this study was to characterize and classify pulmonary neuroendocrine tumors based on array comparative genomic hybridization (aCGH). Using aCGH, we performed karyotype analysis of 33 small cell lung cancer (SCLC) tumors, 13 SCLC cell lines, 19 bronchial carcinoids, and 9 gastrointestinal carcinoids. In contrast to the relatively conserved karyotypes of carcinoid tumors, the karyotypes of SCLC tumors and cell lines were highly aberrant. High copy number (CN) gains were detected in ...

  19. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    OpenAIRE

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10...

  20. Genome-Based Studies of Marine Microorganisms to Maximize the Diversity of Natural Products Discovery for Medical Treatments

    Directory of Open Access Journals (Sweden)

    Xin-Qing Zhao

    2011-01-01

    Marine microorganisms are a rich source of natural products, which play important roles in the pharmaceutical industry. Over the past decade, genome-based studies of marine microorganisms have unveiled the tremendous diversity of natural-product producers and have also made it more efficient to harness the strain diversity, chemical diversity and genetic diversity of marine microorganisms for the rapid discovery and generation of new natural products. Meanwhile, genomic information retrieved from marine symbiotic microorganisms can also be used to discover new medical molecules from as-yet-unculturable microorganisms. In this paper, recent progress in the genomic research of marine microorganisms is reviewed, and new genome mining tools, as well as advances in the activation of orphan pathways and in metagenomic studies, are summarized. Genome-based research on marine microorganisms will maximize the biodiscovery process and solve the problems of supply and sustainability of drug molecules for medical treatments.

  1. CFGP 2.0: a versatile web-based platform for supporting comparative and evolutionary genomics of fungi and Oomycetes.

    Science.gov (United States)

    Choi, Jaeyoung; Cheong, Kyeongchae; Jung, Kyongyong; Jeon, Jongbum; Lee, Gir-Won; Kang, Seogchan; Kim, Sangsoo; Lee, Yin-Won; Lee, Yong-Hwan

    2013-01-01

    In 2007, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr/) was opened to the public with 65 genomes corresponding to 58 fungal and Oomycete species. The CFGP provided six bioinformatics tools, including a novel tool entitled BLASTMatrix that enables searching for genes homologous to queries in multiple species simultaneously. CFGP also introduced Favorite, a personalized virtual space for data storage and analysis with these six tools. Since 2007, CFGP has grown to archive 283 genomes corresponding to 152 fungal and Oomycete species, as well as 201 genomes that correspond to seven bacteria, 39 plants and 105 animals. In addition, the number of tools in Favorite has increased to 27. The Taxonomy Browser of CFGP 2.0 allows users to interactively navigate through a large number of genomes according to their taxonomic positions. The user interface of BLASTMatrix was also improved to facilitate subsequent analyses of retrieved data. A newly developed genome browser, the Seoul National University Genome Browser (SNUGB), was integrated into CFGP 2.0 to support graphical presentation of diverse genomic contexts. Based on the standardized genome warehouse of CFGP 2.0, several systematic platforms designed to support studies on selected gene families have been developed. Most of them are connected through Favorite to allow data sharing across the platforms.

  2. Estimation of the Whitefly Bemisia tabaci Genome Size Based on k-mer and Flow Cytometric Analyses

    Directory of Open Access Journals (Sweden)

    Wenbo Chen

    2015-07-01

    Whiteflies of the Bemisia tabaci (Hemiptera: Aleyrodidae) cryptic species complex are among the most important agricultural insect pests in the world. These phloem-feeding insects can colonize over 1000 species of plants worldwide and inflict severe economic losses to crops, mainly through the transmission of pathogenic viruses. Surprisingly, there is very little genomic information about whiteflies. As a starting point to genome sequencing, we report a new estimation of the genome size of the B. tabaci B biotype or Middle East-Asia Minor 1 (MEAM1) population. Using an isogenic whitefly colony with over 6500 haploid male individuals for genomic DNA, three paired-end genomic libraries with insert sizes of ~300 bp, 500 bp and 1 kb were constructed and sequenced on an Illumina HiSeq 2500 system. A total of ~50 billion base pairs of sequences were obtained from each library. K-mer analysis using these sequences revealed that the genome size of the whitefly was ~682.3 Mb. In addition, the flow cytometric analysis estimated the haploid genome size of the whitefly to be ~690 Mb. Considering the congruency between both estimation methods, we predict the haploid genome size of B. tabaci MEAM1 to be ~680–690 Mb. Our data provide a baseline for ongoing efforts to assemble and annotate the B. tabaci genome.
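
    K-mer-based genome size estimation rests on the relation G ≈ (total number of k-mers counted) / (depth of the main k-mer coverage peak). The Python sketch below applies it to a k-mer depth histogram such as a k-mer counting tool would produce. The cutoff that discards low-depth (error-derived) k-mers and the toy histogram are assumptions; heterozygous or highly repetitive genomes need more careful peak modeling than this.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth are treated as sequencing errors and dropped (an assumption).
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    total_kmers = sum(depth * n for depth, n in kept.items())
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    return total_kmers / peak_depth

# Toy histogram: error k-mers at depth 1-2, main peak at depth 25.
toy = {1: 5000, 2: 800, 20: 100, 25: 300, 30: 100}
print(estimate_genome_size(toy))  # 500.0 (about 500 bp for this toy genome)
```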

  3. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database.

    Science.gov (United States)

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome's content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which a genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexanucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called "Cynara cardunculus MicroSatellite DataBase" (CyMSatDB), was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates.
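
    Perfect SSRs can be found by scanning for tandem copies of 1-6 bp motifs, and density is then simply the count divided by sequence length in Mbp. The Python sketch below is a simple regex-based scanner plus that arithmetic. The minimum repeat numbers are typical values assumed here, not the study's search parameters, and imperfect or complex SSRs are not handled. The per-base coverage set is fine for a sketch but too memory-hungry for a whole genome.

```python
import re

# Minimum number of tandem copies per motif length (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs.

    Shorter motifs are scanned first; a region already claimed by a shorter motif
    is not reported again, so an AT run is reported as (AT)n, not also as (ATAT)n.
    """
    seq = seq.upper()
    found, covered = [], set()
    for k in sorted(min_repeats):
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_repeats[k] - 1},}}")
        for m in pattern.finditer(seq):
            if covered.isdisjoint(range(m.start(), m.end())):
                found.append((m.start(), m.end(), m.group(1), (m.end() - m.start()) // k))
                covered.update(range(m.start(), m.end()))
    return sorted(found)

def ssr_density_per_mbp(n_ssrs, length_bp):
    """SSRs per megabase pair."""
    return n_ssrs / (length_bp / 1_000_000)

demo = "GC" + "AT" * 7 + "GGC" + "A" * 12 + "CG"  # hypothetical sequence
print(find_perfect_ssrs(demo))              # [(2, 16, 'AT', 7), (19, 31, 'A', 12)]
print(ssr_density_per_mbp(500, 2_000_000))  # 250.0
```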

  4. CRISPR-based genome editing and expression control systems in Clostridium acetobutylicum and Clostridium beijerinckii.

    Science.gov (United States)

    Li, Qi; Chen, Jun; Minton, Nigel P; Zhang, Ying; Wen, Zhiqiang; Liu, Jinle; Yang, Haifeng; Zeng, Zhe; Ren, Xiaodan; Yang, Junjie; Gu, Yang; Jiang, Weihong; Jiang, Yu; Yang, Sheng

    2016-07-01

    Solventogenic clostridia are important industrial microorganisms that produce various chemicals and fuels. Effective genetic tools would facilitate physiological studies aimed both at improving our understanding of metabolism and optimizing solvent productivity through metabolic engineering. Here we have developed an all-in-one, CRISPR-based genome editing plasmid, pNICKclos, that can be used to achieve successive rounds of gene editing in Clostridium acetobutylicum ATCC 824 and Clostridium beijerinckii NCIMB 8052 with efficiencies varying from 6.7% to 100% and 18.8% to 100%, respectively. The plasmid specifies the requisite target-specific guide RNA, the gene encoding the Streptococcus pyogenes Cas9 nickase and the genome editing template encompassing the gene-specific homology arms. It can be used to create single target mutants within three days, with a further two days required for the curing of the pNICKclos plasmid ready for a second round of mutagenesis. A S. pyogenes dCas9-mediated gene regulation control system, pdCASclos, was also developed and used in a CRISPRi strategy to successfully repress the expression of spo0A in C. acetobutylicum and C. beijerinckii. The combined application of the established high efficiency CRISPR-Cas9 based genome editing and regulation control systems will greatly accelerate future progress in the understanding and manipulation of metabolism in solventogenic clostridia. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
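    Guide RNAs for S. pyogenes Cas9 (and its nickase and dCas9 variants) target a 20-nt protospacer that must be immediately followed by an NGG PAM. As a hedged illustration of the first step in choosing a target, the sketch below lists candidate forward-strand sites; the function name is hypothetical, the reverse strand would be scanned the same way on the reverse complement, and real designs also screen for off-target matches, which is not shown.

```python
import re

def find_spcas9_protospacers(seq, spacer_len=20):
    """Return (start, spacer, pam) for forward-strand sites followed by an NGG PAM."""
    seq = seq.upper()
    sites = []
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len
    for m in re.finditer(pattern, seq):
        sites.append((m.start(), m.group(1), m.group(2)))
    return sites
```

For instance, "A" * 20 + "TGG" + "C" yields a single site: the 20 A's at position 0 with PAM "TGG".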

  5. Carotenoid Biosynthesis in Cyanobacteria: Structural and Evolutionary Scenarios Based on Comparative Genomics

    Directory of Open Access Journals (Sweden)

    Chengwei Liang, Fangqing Zhao, Wei Wei, Zhangxiao Wen, Song Qin

    2006-01-01

    Carotenoids are widely distributed pigments in nature, and their biosynthetic pathway has been extensively studied in various organisms. The recent availability of an overwhelming amount of cyanobacterial genomic data has given rise to a novel approach called comparative genomics. The putative enzymes involved in carotenoid biosynthesis among the cyanobacteria were identified with similarity-based tools, and the biosynthetic pathway was reconstructed from these enzymes. Interestingly, nearly all the cyanobacteria share a very similar pathway to synthesize β-carotene, except for Gloeobacter violaceus PCC 7421. The enzymes involved in the upstream pathway, crtE-B-P-Qb-L, are more conserved than the subsequent ones (crtW-R). In addition, many carotenoid synthesis enzymes exhibit diversity in structure and function; examples in the families of ζ-carotene desaturase, lycopene cyclases and carotene ketolases are described in this article. When these crt genes were mapped to the cyanobacterial genomes, they showed great structural variation among species: all of them are dispersed across the whole chromosome, in contrast to the linear, adjacent arrangement of the crt gene cluster in other eubacteria. Moreover, in unicellular cyanobacteria each step of the carotenogenic pathway is usually catalyzed by one gene product, whereas multiple ketolase genes are found in filamentous cyanobacteria. Such increased numbers of crt genes and their correlation with ecological adaptation are discussed in detail.

  6. Regulatory hurdles for genome editing: process- vs. product-based approaches in different regulatory contexts.

    Science.gov (United States)

    Sprink, Thorben; Eriksson, Dennis; Schiemann, Joachim; Hartung, Frank

    2016-07-01

    Novel plant genome editing techniques call for updated legislation regulating the use of plants produced by genetic engineering or genome editing, especially in the European Union. Established more than 25 years ago and based on a clear distinction between transgenic and conventionally bred plants, the current EU Directives fail to accommodate the new continuum between genetic engineering and conventional breeding. Although Directive 2001/18/EC contains both process- and product-related terms, it is commonly interpreted as strictly process-based legislation. In view of several newly emerging techniques that are closer to conventional breeding than to common genetic engineering, we argue that it should actually be interpreted more in relation to the resulting product. Legal guidance on how to define plants produced with novel genome editing techniques under the decades-old legislation is urgently needed, as private companies and public researchers are waiting impatiently with products and projects in the pipeline. We here outline the process in the EU to develop legislation that properly matches scientific progress. As this process faces several hurdles, we also compare it with existing frameworks in other countries and discuss ideas for an alternative regulatory system.

  7. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing.

    Directory of Open Access Journals (Sweden)

    Alexander William Eastman

    2015-01-01

    Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, despite the large number of published genomes, little information is available that details the specific techniques and procedures employed during genome sequencing. The shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing

  8. DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

    CERN Document Server

    Afify, Heba; Wahed, Manal Abdel

    2011-01-01

    Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques from information theory are often perceived as being of interest for data communication and storage. In recent years, a substantial effort has been made to apply textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison of genomic databases. This paper presents a differential compression algorithm based on producing difference sequences according to an op-code table in order to optimize the compression of homologous sequences in a dataset. The stored data are therefore composed of the reference sequence, the set of differences, and the difference locations, instead of each sequence stored individually. This algorithm does not require a priori knowledge about the statistics of the sequence set. The...
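    The abstract does not define its op-code table, so the sketch below uses Python's standard difflib.SequenceMatcher opcodes as a stand-in to show the general idea: store only the non-matching operations (with their locations in the reference) and rebuild the homologous sequence from the reference. This is a substitute technique, not the authors' algorithm, and it omits the entropy coding a real compressor would apply to the stored differences.

```python
from difflib import SequenceMatcher

def encode(reference, target):
    """Keep only the edit operations that turn reference into target."""
    sm = SequenceMatcher(None, reference, target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            # (operation, reference span start, reference span end, replacement text)
            ops.append((tag, i1, i2, target[j1:j2]))
    return ops

def decode(reference, ops):
    """Rebuild the target from the reference plus the stored differences."""
    out, pos = [], 0
    for _tag, i1, i2, inserted in ops:
        out.append(reference[pos:i1])  # unchanged stretch copied from the reference
        out.append(inserted)           # empty string for a pure deletion
        pos = i2
    out.append(reference[pos:])
    return "".join(out)
```

Identical sequences encode to an empty list, and decode(ref, encode(ref, tgt)) returns tgt for any pair, since the opcodes fully describe the transformation.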

  9. Measurement of word frequencies in genomic DNA sequences based on partial alignment and fuzzy set.

    Science.gov (United States)

    Shida, Fumiya; Mizuta, Satoshi

    2014-08-01

    With the rapid increase in the amount of data registered in biological sequence databases, the need for a fast method of comparing large sequences is also increasing. In general, alignment is used for sequence comparison. However, alignment may not be appropriate for comparing large sequences such as whole genomes because of its high time complexity. In this article, we propose a semi-alignment-free method of sequence comparison based on word frequency distributions, in which we partially use alignment to measure word frequencies, together with ideas from fuzzy set theory. Experiments with ten bacterial genome sequences demonstrated that the fuzzy measurements facilitate discrimination between close and distant relatives.
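    For context, the sketch below shows the plain alignment-free baseline that word-frequency methods build on: count every overlapping word (k-mer) and compare the resulting frequency profiles. It uses exact word matching and a Euclidean distance, both of which are assumptions of this sketch; the partial-alignment and fuzzy-set weighting proposed in the article is not reproduced here.

```python
from collections import Counter
from math import sqrt

def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping word of length k in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}

def profile_distance(p, q):
    """Euclidean distance between two word-frequency profiles (0 means identical)."""
    words = set(p) | set(q)
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))
```

Two sequences sharing no words at all (e.g. "AAAA" and "CCCC" with k = 2) are at the maximal distance sqrt(2); closely related genomes should give small distances.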

  10. DNA methylation profiling using bisulfite-based epityping of pooled genomic DNA.

    Science.gov (United States)

    Docherty, Sophia J; Davis, Oliver S P; Haworth, Claire M A; Plomin, Robert; Mill, Jonathan

    2010-11-01

    DNA methylation plays a vital role in normal cellular function, with aberrant methylation signatures being implicated in a growing number of human pathologies and complex human traits. Methods based on the modification of genomic DNA with sodium bisulfite are considered the 'gold-standard' for DNA methylation profiling on genomic DNA; however they require large amounts of DNA and may be prohibitively expensive when used on the large sample sizes necessary to detect small effects. DNA pooling approaches are already widely used in large-scale studies of DNA sequence and gene expression. In this paper, we describe the application of this economical DNA pooling technique to the study of DNA methylation profiles. This method generates accurate quantitative assessments of group DNA methylation averages, reducing the time, cost and amount of DNA starting material required for large-scale epigenetic investigation of disease phenotypes.
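    Sodium bisulfite converts unmethylated cytosine to uracil (read as T), while methylated cytosine stays C, so the methylation level at a CpG can be estimated as C / (C + T). The hedged sketch below, with hypothetical read counts, illustrates why pooling works: if each individual contributes equal DNA and equal coverage, the pooled estimate equals the mean of the individual estimates. The abstract does not state which readout was used, so count-based data is an assumption here.

```python
def methylation_fraction(c_reads, t_reads):
    """Estimated methylation at one CpG: unconverted C / (C + T) after bisulfite treatment."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no informative reads at this site")
    return c_reads / total

def pooled_fraction(samples):
    """Simulate an equal-contribution DNA pool by summing C and T counts across individuals."""
    c_total = sum(c for c, _ in samples)
    t_total = sum(t for _, t in samples)
    return methylation_fraction(c_total, t_total)
```

With three hypothetical individuals at (10, 90), (50, 50) and (90, 10) reads, the pooled estimate is 0.5, the same as the average of 0.1, 0.5 and 0.9. Unequal DNA input per individual would bias the pooled value toward the over-represented samples.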

  11. Human genomic library screened with 17-base oligonucleotide probes yields a novel interferon gene.

    OpenAIRE

    Torczynski, R M; Fuke, M; Bollon, A P

    1984-01-01

    A method is presented that has permitted a human genomic library to be screened for low-copy genes using 17-base synthetic oligonucleotides as probes. Parallel screening with two different 17-base probes permitted the unambiguous identification of clones containing interferon-alpha (IFN-alpha) genes. The isolated human IFN-alpha genes were sequenced, and one appears to be IFN-alpha L; the other is one not previously described, which we have designated IFN-alpha WA. The IFN-alpha WA sequence d...

  12. Visualization for genomics: the Microbial Genome Viewer.

    NARCIS (Netherlands)

    Kerkhoven, R.; Enckevort, F.H.J. van; Boekhorst, J.; Molenaar, D.; Siezen, R.J.

    2004-01-01

    SUMMARY: A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a My

  13. Improved Evidence-Based Genome-scale Metabolic Models for Maize Leaf, Embryo, and Endosperm

    Energy Technology Data Exchange (ETDEWEB)

    Seaver, Samuel M.D.; Frelin, Oceane; Bradbury, Louis M.T.; Zarecki, Raphy; Ruppin, Eytan; Hanson, Andrew D.; Henry, Christopher S.

    2015-03-10

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possibly to unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low-expression genes to enable activity of a maximum number of reactions associated with high-expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue-specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analyzing transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.
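    The method above needs each reaction labelled as backed by high- or low-expression genes. One common convention, assumed here rather than taken from the paper, collapses a gene-protein-reaction (GPR) rule by taking the minimum over AND (subunits of a complex) and the maximum over OR (isozymes). The sketch encodes GPR rules as nested tuples; the thresholds and gene names are hypothetical, and the optimization that selects the minimal set of low-expression reactions is not shown.

```python
# GPR rules as nested tuples, e.g. ("or", "g1", ("and", "g2", "g3")).
# AND -> min, OR -> max is an assumed convention, not necessarily the authors' rule.

def gpr_score(rule, expression):
    """Collapse a GPR rule into one expression value; unmeasured genes count as 0."""
    if isinstance(rule, str):
        return expression.get(rule, 0.0)
    op, *args = rule
    scores = [gpr_score(arg, expression) for arg in args]
    if op == "and":
        return min(scores)
    if op == "or":
        return max(scores)
    raise ValueError(f"unknown GPR operator: {op!r}")

def classify_reactions(gprs, expression, high=10.0, low=1.0):
    """Label each reaction 'high', 'low' or 'medium' (thresholds are hypothetical)."""
    labels = {}
    for reaction, rule in gprs.items():
        score = gpr_score(rule, expression)
        if score >= high:
            labels[reaction] = "high"
        elif score <= low:
            labels[reaction] = "low"
        else:
            labels[reaction] = "medium"
    return labels
```

For example, with expression values g1 = 20, g2 = 5, g3 = 0.5, a reaction catalyzed by g1 or by the g2/g3 complex scores 20 ("high"), while one requiring the g2/g3 complex scores 0.5 ("low").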

  14. Improved Evidence-Based Genome-scale Metabolic Models for Maize Leaf, Embryo, and Endosperm.

    Directory of Open Access Journals (Sweden)

    Samuel Seaver

    2015-03-01

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possibly to unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low-expression genes to enable activity of a maximum number of reactions associated with high-expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue-specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analyzing transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.

  15. A RAD-based linkage map and comparative genomics in the gudgeons (genus Gnathopogon, Cyprinidae)

    Directory of Open Access Journals (Sweden)

    Kakioka Ryo

    2013-01-01

    Background The construction of linkage maps is a first step in exploring the genetic basis for adaptive phenotypic divergence in closely related species by quantitative trait locus (QTL) analysis. Linkage maps are also useful for comparative genomics in non-model organisms. Advances in genomics technologies make it more feasible than ever to study the genetics of adaptation in natural populations. Restriction-site associated DNA (RAD) sequencing on next-generation sequencers facilitates the development of many genetic markers and genotyping. We aimed to construct a linkage map of the gudgeons of the genus Gnathopogon (Cyprinidae) for comparative genomics with the zebrafish Danio rerio (a member of the same family as gudgeons) and for the future QTL analysis of the genetic architecture underlying adaptive phenotypic evolution of Gnathopogon. Results We constructed the first genetic linkage map of Gnathopogon using 198 F2 individuals from an interspecific cross between two closely related species in Japan: river-dwelling Gnathopogon elongatus and lake-dwelling Gnathopogon caerulescens. Based on 1,622 RAD-tag markers, a linkage map spanning 1,390.9 cM with 25 linkage groups and an average marker interval of 0.87 cM was constructed. We also identified a region involving female-specific transmission ratio distortion (TRD). Synteny and collinearity were extensively conserved between Gnathopogon and zebrafish. Conclusions The dense SNP-based linkage map presented here provides a basis for future QTL analysis. It will also be useful for transferring genomic information from a “traditional” model fish species, zebrafish, to screen candidate genes underlying ecologically important traits of the gudgeons.
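    As a quick arithmetic check, the reported average marker interval is consistent with dividing the total map length L by the number of inter-marker intervals. Each of the G linkage groups with m markers contributes m − 1 intervals, so M markers give M − G intervals. That the authors computed it exactly this way is an assumption; dividing by M alone would give about 0.86 cM instead.

```latex
\bar{d} \;=\; \frac{L}{M - G}
        \;=\; \frac{1390.9\ \text{cM}}{1622 - 25}
        \;=\; \frac{1390.9\ \text{cM}}{1597}
        \;\approx\; 0.87\ \text{cM}
```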

  16. A RAD-based linkage map and comparative genomics in the gudgeons (genus Gnathopogon, Cyprinidae)

    Science.gov (United States)

    2013-01-01

    Background The construction of linkage maps is a first step in exploring the genetic basis for adaptive phenotypic divergence in closely related species by quantitative trait locus (QTL) analysis. Linkage maps are also useful for comparative genomics in non-model organisms. Advances in genomics technologies make it more feasible than ever to study the genetics of adaptation in natural populations. Restriction-site associated DNA (RAD) sequencing on next-generation sequencers facilitates the development of many genetic markers and genotyping. We aimed to construct a linkage map of the gudgeons of the genus Gnathopogon (Cyprinidae) for comparative genomics with the zebrafish Danio rerio (a member of the same family as gudgeons) and for the future QTL analysis of the genetic architecture underlying adaptive phenotypic evolution of Gnathopogon. Results We constructed the first genetic linkage map of Gnathopogon using 198 F2 individuals from an interspecific cross between two closely related species in Japan: river-dwelling Gnathopogon elongatus and lake-dwelling Gnathopogon caerulescens. Based on 1,622 RAD-tag markers, a linkage map spanning 1,390.9 cM with 25 linkage groups and an average marker interval of 0.87 cM was constructed. We also identified a region involving female-specific transmission ratio distortion (TRD). Synteny and collinearity were extensively conserved between Gnathopogon and zebrafish. Conclusions The dense SNP-based linkage map presented here provides a basis for future QTL analysis. It will also be useful for transferring genomic information from a “traditional” model fish species, zebrafish, to screen candidate genes underlying ecologically important traits of the gudgeons. PMID:23324215

  17. A SNP-based linkage map of the turkey genome reveals multiple intrachromosomal rearrangements between the turkey and chicken genomes

    NARCIS (Netherlands)

    Aslam, M.L.; Bastiaansen, J.W.M.; Crooijmans, R.P.M.A.; Vereijken, A.; Groenen, M.A.M.; Megens, H.J.W.C.

    2010-01-01

    Background The turkey (Meleagris gallopavo) is an important agricultural species that is the second largest contributor to the world's poultry meat production. The genomic resources of turkey provide turkey breeders with tools needed for the genetic improvement of commercial breeds of turkey for eco

  18. Group-based and personalized care in an age of genomic and evidence-based medicine: a reappraisal.

    Science.gov (United States)

    Maglo, Koffi N

    2012-01-01

    This article addresses the philosophical and moral foundations of group-based and individualized therapy in connection with population care equality. The U.S. Food and Drug Administration (FDA) recently modified its public health policy by seeking to enhance the efficacy and equality of care through the approval of group-specific prescriptions and doses for some drugs. In the age of genomics, when individualization of care increasingly has become a major concern, investigating the relationship between population health, stratified medicine, and personalized therapy can improve our understanding of the ethical and biomedical implications of genomic medicine. I suggest that the need to optimize population health through population substructure-sensitive research and the need to individualize care through genetically targeted therapies are not necessarily incompatible. Accordingly, the article reconceptualizes a unified goal for modern scientific medicine in terms of individualized equal care.

  19. SISP: a Fast Species Identification System for Prokaryotes Based on Total Nucleotide Identity of Whole Genome Sequences

    Directory of Open Access Journals (Sweden)

    Jiapeng Chen

    2015-06-01

    In the genomic era, new techniques and criteria have been proposed to improve on the traditional phenotypic and biochemical test-based approaches to prokaryotic species definition. Among them, average nucleotide identity (ANI) mirrors DNA-DNA hybridization and is widely used by the microbial research community. However, our tests show that ANI may define distinct taxa as the same species when they share highly homologous sequences in a very short genomic region. In this study, we propose an improved algorithm named total nucleotide identity (TNI) for use in bacterial taxonomy; this algorithm provided higher accuracy for species classification than ANI. Furthermore, we developed a species identification system for prokaryotes (SISP) based on pairwise TNI of 3,073 genomes acquired from GenBank. For a submitted query genome, SISP can quickly find its most closely related genome in the established database based on the TNI calculation and infer the possible species of the query genome. Given a criterion of TNI > 70%, SISP achieved an accuracy above 90% for 3,596 prokaryotic genomes. SISP is open source and is available at https://github.com/chjp/SISProkaryotes.
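    The identification step described above (find the closest database genome by TNI, then apply the TNI > 70% criterion) can be sketched as follows. How TNI itself is computed is not specified in the abstract, so the sketch takes precomputed TNI values as input; the function name, genome IDs and species labels are hypothetical.

```python
def assign_species(tni_to_db, db_species, threshold=70.0):
    """Return (species or None, closest genome id, its TNI) for one query genome.

    tni_to_db maps each database genome id to its TNI (%) with the query;
    computing TNI itself is outside the scope of this sketch.
    """
    best = max(tni_to_db, key=tni_to_db.get)
    score = tni_to_db[best]
    species = db_species[best] if score > threshold else None
    return species, best, score
```

With hypothetical values {"genome_1": 45.0, "genome_2": 82.5, "genome_3": 71.0}, the query is assigned the species of genome_2; if no database genome exceeds 70%, the closest hit is still reported but no species is inferred.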

  20. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics-based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters that were possibly acquired by horizontal gene transfer were identified in Aspergillus for the first time, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli such as A. niger and A. oryzae. These findings should help in understanding the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic.

  1. Demography-adjusted tests of neutrality based on genome-wide SNP data

    KAUST Repository

    Rafajlović, Marina

    2014-08-01

    Tests of the neutral evolution hypothesis are usually built on the standard model, which assumes that mutations are neutral and the population size remains constant over time. However, it is unclear how such tests are affected if the last assumption is dropped. Here, we extend the unifying framework for tests based on the site frequency spectrum, introduced by Achaz and Ferretti, to populations of varying size. Key ingredients are the first two moments of the site frequency spectrum. We show how these moments can be computed analytically if a population has experienced two instantaneous size changes in the past. We apply our method to data from ten human populations gathered in the 1000 Genomes Project, estimate their demographies and define demography-adjusted versions of Tajima's D, Fay & Wu's H, and Zeng's E. Our results show that demography-adjusted test statistics facilitate the direct comparison between populations and that most of the differences among populations seen in the original unadjusted tests can be explained by their underlying demographies. Upon carrying out whole-genome screens for deviations from neutrality, we identify candidate regions of recent positive selection. We provide track files with values of the adjusted and unadjusted tests for upload to the UCSC genome browser. © 2014 Elsevier Inc.
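    For reference, the sketch below computes the standard, unadjusted Tajima's D under the constant-size neutral model that the paper generalizes: it compares the mean pairwise difference (pi) with Watterson's estimate S / a1, scaled by the standard variance constants. The demography-adjusted version would replace these constant-size expectations with moments from the fitted demographic model, which is not shown; function names are illustrative.

```python
from itertools import combinations
from math import nan, sqrt

def segregating_sites_and_pi(seqs):
    """S (polymorphic alignment columns) and pi (mean pairwise differences) for aligned haplotypes."""
    length = len(seqs[0])
    s = sum(1 for j in range(length) if len({x[j] for x in seqs}) > 1)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return s, pi

def tajimas_d(n, s, pi):
    """Standard (constant population size, unadjusted) Tajima's D for n sequences."""
    if s == 0:
        return nan  # undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

A single singleton site among four sequences gives pi = 0.5, below S / a1 = 6/11, so D is negative (excess of rare variants). A site at 2/4 frequency gives pi = 2/3 and a positive D.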

  2. Metabolic model for the filamentous ‘Candidatus Microthrix parvicella’ based on genomic and metagenomic analyses

    Science.gov (United States)

    McIlroy, Simon Jon; Kristiansen, Rikke; Albertsen, Mads; Karst, Søren Michael; Rossetti, Simona; Nielsen, Jeppe Lund; Tandoi, Valter; Seviour, Robert James; Nielsen, Per Halkjær

    2013-01-01

    ‘Candidatus Microthrix parvicella’ is a lipid-accumulating, filamentous bacterium so far found only in activated sludge wastewater treatment plants, where it is a common causative agent of sludge separation problems. Despite attracting considerable interest, its detailed physiology is still unclear. In this study, the genome of the RN1 strain was sequenced and annotated, which facilitated the construction of a theoretical metabolic model based on available in situ and axenic experimental data. This model proposes that under anaerobic conditions, this organism preferentially accumulates long-chain fatty acids as triacylglycerols. Utilisation of trehalose and/or polyphosphate stores or partial oxidation of long-chain fatty acids may supply the energy required for anaerobic lipid uptake and storage. Comparing the genome sequence of this isolate with metagenomes from two full-scale wastewater treatment plants with enhanced biological phosphorus removal reveals high similarity, with few metabolic differences between the axenic and the dominant community ‘Ca. M. parvicella’ strains. Hence, the metabolic model presented in this paper could be considered generally applicable to strains in full-scale treatment systems. The genomic information obtained here will provide the basis for future research into in situ gene expression and regulation. Such information will give substantial insight into the ecophysiology of this unusual and biotechnologically important filamentous bacterium. PMID:23446830

  3. Pantograph: A template-based method for genome-scale metabolic model reconstruction.

    Science.gov (United States)

    Loira, Nicolas; Zhukova, Anna; Sherman, David James

    2015-04-01

    Genome-scale metabolic models are a powerful tool to study the inner workings of biological systems and to guide applications. The advent of cheap sequencing has brought the opportunity to create metabolic maps of biotechnologically interesting organisms. While this drives the development of new methods and automatic tools, network reconstruction remains a time-consuming process where extensive manual curation is required. This curation introduces specific knowledge about the modeled organism, either explicitly in the form of molecular processes, or indirectly in the form of annotations of the model elements. Paradoxically, this knowledge is usually lost when reconstruction of a different organism is started. We introduce the Pantograph method for metabolic model reconstruction. This method combines a template reaction knowledge base, orthology mappings between two organisms, and experimental phenotypic evidence, to build a genome-scale metabolic model for a target organism. Our method infers implicit knowledge from annotations in the template, and rewrites these inferences to include them in the resulting model of the target organism. The generated model is well suited for manual curation. Scripts for evaluating the model with respect to experimental data are automatically generated, to aid curators in iterative improvement. We present an implementation of the Pantograph method, as a toolbox for genome-scale model reconstruction, curation and validation. This open source package can be obtained from: http://pathtastic.gforge.inria.fr.
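    One core step of template-based reconstruction is rewriting each template reaction's gene association in terms of the target organism's orthologs. The sketch below is a hypothetical simplification, not Pantograph's actual rule set: an AND rule survives only if every subunit has an ortholog, an OR rule keeps whichever alternatives survive, and reactions whose rule cannot be translated are left out of the draft model. GPR rules are encoded as nested tuples, and all gene and reaction names are made up.

```python
def translate_gpr(rule, orthologs):
    """Rewrite a template GPR rule with target genes; return None if it cannot be supported."""
    if isinstance(rule, str):
        targets = orthologs.get(rule, [])
        if not targets:
            return None
        # Several target orthologs of one template gene are treated as alternatives.
        return targets[0] if len(targets) == 1 else ("or", *targets)
    op, *args = rule
    translated = [translate_gpr(arg, orthologs) for arg in args]
    if op == "and":
        # Every subunit of a complex needs an ortholog in the target.
        return None if any(t is None for t in translated) else ("and", *translated)
    if op == "or":
        kept = [t for t in translated if t is not None]
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else ("or", *kept)
    raise ValueError(f"unknown GPR operator: {op!r}")

def draft_model(template_gprs, orthologs):
    """Keep template reactions whose gene association survives translation."""
    model = {}
    for reaction, rule in template_gprs.items():
        new_rule = translate_gpr(rule, orthologs)
        if new_rule is not None:
            model[reaction] = new_rule
    return model
```

The resulting draft would then be checked against phenotypic evidence and curated, as the abstract describes.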

  4. UniPrimer: A Web-Based Primer Design Tool for Comparative Analyses of Primate Genomes

    Directory of Open Access Journals (Sweden)

    Nomin Batnyam

    2012-01-01

    Whole genome sequences of various primates have been released owing to advances in DNA-sequencing technology. A combination of computational data mining and the polymerase chain reaction (PCR) assay to validate the data is an excellent method for conducting comparative genomics. Thus, designing primers for PCR is an essential procedure for a comparative analysis of primate genomes. Here, we developed and introduced UniPrimer for use in such studies. UniPrimer is a web-based tool that designs PCR and DNA-sequencing primers. It compares the sequences from six different primates (human, chimpanzee, gorilla, orangutan, gibbon, and rhesus macaque) and designs primers on the region conserved across species. UniPrimer is linked to the RepeatMasker, Primer3Plus, and OligoCalc software to produce primers with high accuracy, and to UCSC In-Silico PCR to confirm whether the designed primers work. To test the performance of UniPrimer, we designed primers on sample sequences using UniPrimer and manually designed primers for the same sequences. The comparison of the two processes showed that UniPrimer was more effective than manual work in terms of saving time and reducing errors.
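    Designing primers "on the conserved region across species" first requires locating stretches where all aligned sequences agree. The sketch below finds maximal runs of identical, ungapped alignment columns; it assumes the six primate sequences are already aligned to equal length, and the minimum window length is a hypothetical parameter. Primer selection itself (handled by Primer3Plus in UniPrimer) is not shown.

```python
def conserved_windows(aligned, min_len=20):
    """Return (start, end) half-open spans where every aligned sequence has the same base."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    windows, start = [], None
    for i in range(length):
        column = {s[i].upper() for s in aligned}
        # Conserved: a single base shared by all sequences, with no gaps or ambiguity codes.
        conserved = len(column) == 1 and column <= set("ACGT")
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    if start is not None and length - start >= min_len:
        windows.append((start, length))
    return windows
```

Candidate primers would then be placed inside the returned windows so that one primer pair should amplify the same locus in every species.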

  5. Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew.

    Science.gov (United States)

    Fan, Yu; Yu, Dandan; Yao, Yong-Gang

    2014-11-21

    The tree shrew (Tupaia belangeri) is a small mammal with a close relationship to primates, and it has been proposed as an alternative experimental animal to primates in biomedical research. The recent release of a high-quality Chinese tree shrew genome enables more researchers to use this species as a model animal in their studies. To make an extensively annotated genome database straightforward to access, we have created the Tree shrew Database (TreeshrewDB). This web-based platform integrates the currently available data from the tree shrew genome, including an updated gene set, with systematic functional annotation and mRNA expression patterns. In addition, to assist with automatic gene sequence analysis, we have integrated the common programs Blast, Muscle, GBrowse, GeneWise and codeml into TreeshrewDB. We have also developed a pipeline for the analysis of positive selection. The user-friendly interface of TreeshrewDB, which is available at http://www.treeshrewdb.org, should help in many areas of biological research into the tree shrew.

  6. A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes.

    Science.gov (United States)

    Wheeler, Nicole E; Barquist, Lars; Kingsley, Robert A; Gardner, Paul P

    2016-12-01

    Next-generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large-scale changes, relatively little attention has been paid to quantifying the effects of single-nucleotide polymorphisms and indels on protein function, particularly in bacterial genomics. We present a hidden Markov model-based approach, which we call delta-bitscore (DBS), for identifying orthologous proteins that have diverged at the amino acid sequence level in a way that is likely to affect biological function. We benchmark this approach with several widely used datasets and apply it to a proof-of-concept study of orthologous proteomes in an investigation of host adaptation in Salmonella enterica. We highlight the value of the method in identifying functional divergence of genes, and suggest that this tool may be a better approach than the commonly used dN/dS metric for identifying functionally significant genetic changes occurring in recently diverged organisms. A program implementing DBS for pairwise genome comparisons is freely available at: https://github.com/UCanCompBio/deltaBS. CONTACT: nicole.wheeler@pg.canterbury.ac.nz or lars.barquist@uni-wuerzburg.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
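
    The core comparison behind a delta-bitscore can be sketched as follows, assuming each protein has already been scored against the same profile HMM (for example with HMMER) so that one bit score per ortholog is available. The sketch subtracts the two scores for each ortholog pair and reports pairs whose absolute difference meets a cut-off. The 20-bit threshold, the treatment of a missing hit as 0 bits and the function name are illustrative assumptions, not the parameters used by the deltaBS program.

```python
def delta_bitscores(scores_a, scores_b, orthologs, threshold=20.0):
    """Flag ortholog pairs whose profile-HMM bit scores differ by at least `threshold` bits.

    scores_a / scores_b: dict protein ID -> bit score against the same profile HMM.
    orthologs: iterable of (protein_in_a, protein_in_b) pairs.
    Assumptions (not taken from deltaBS): a protein with no hit scores 0 bits,
    and the 20-bit default threshold is illustrative only.
    """
    flagged = []
    for prot_a, prot_b in orthologs:
        delta = scores_a.get(prot_a, 0.0) - scores_b.get(prot_b, 0.0)
        if abs(delta) >= threshold:
            flagged.append((prot_a, prot_b, delta))
    # Largest divergence first; a positive delta means the strain A protein scores higher.
    return sorted(flagged, key=lambda item: abs(item[2]), reverse=True)


if __name__ == "__main__":
    a = {"geneA1": 150.0, "geneA2": 80.0}  # toy scores, not real data
    b = {"geneB1": 148.5, "geneB2": 30.0}
    print(delta_bitscores(a, b, [("geneA1", "geneB1"), ("geneA2", "geneB2")]))
```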

  7. A challenge to the ancient origin of SIVagm based on African green monkey mitochondrial genomes.

    Directory of Open Access Journals (Sweden)

    Joel O Wertheim

    2007-07-01

    Full Text Available While the circumstances surrounding the origin and spread of HIV are becoming clearer, the particulars of the origin of simian immunodeficiency virus (SIV) are still unknown. Specifically, the age of SIV, that is, whether it is an ancient or a recent infection, has not been resolved. Although many instances of cross-species transmission of SIV have been documented, the similarity between the African green monkey (AGM) and SIVagm phylogenies has long been held as suggestive of ancient codivergence between SIVs and their primate hosts. Here, we present well-resolved phylogenies based on full-length AGM mitochondrial genomes and seven previously published SIVagm genomes; these allowed us to perform what is, to our knowledge, the first rigorous phylogenetic test of the hypothesis that SIVagm codiverged with the AGMs. Using the Shimodaira-Hasegawa test, we show that the AGM mitochondrial genomes and SIVagm did not evolve along the same topology. Furthermore, we demonstrate that the SIVagm topology can be explained by a pattern of west-to-east transmission of the virus across existing AGM geographic ranges. Using a relaxed molecular clock, we also date the most recent common ancestor of the AGMs to approximately 3 million years ago. This study substantially weakens the theory of ancient SIV infection followed by codivergence with its primate hosts.

  8. Safety assessment of Bifidobacterium longum JDM301 based on complete genome sequences

    Institute of Scientific and Technical Information of China (English)

    Yan-Xia Wei; Zhuo-Yang Zhang; Chang Liu; Xiao-Kui Guo; Pradeep K Malakar

    2012-01-01

    AIM: To assess the safety of Bifidobacterium longum (B. longum) JDM301 based on its complete genome sequence. METHODS: The complete genome sequence of JDM301 was determined using the GS 20 system. Putative virulence factors, putative antibiotic resistance genes and genes encoding enzymes responsible for harmful metabolites were identified by BLAST searches against a virulence factor database, an antibiotic resistance gene database and genes associated with harmful metabolites in previous reports. Minimum inhibitory concentrations of 16 common antimicrobial agents were determined by E-test. RESULTS: JDM301 was shown to contain 36 genes associated with antibiotic resistance, 5 enzymes related to harmful metabolites and 162 nonspecific virulence factors, mainly associated with transcriptional regulation, adhesion, and sugar and amino acid transport. B. longum JDM301 was intrinsically resistant to ciprofloxacin, amikacin, gentamicin and streptomycin and susceptible to vancomycin, amoxicillin, cephalothin, chloramphenicol, erythromycin, ampicillin, cefotaxime, rifampicin, imipenem, trimethoprim and trimethoprim-sulphamethoxazole. JDM301 was moderately resistant to bacitracin, whereas an earlier study showed that bifidobacteria were susceptible to this antibiotic. A tetracycline resistance gene with a risk of transfer was found in JDM301; this needs to be validated experimentally. CONCLUSION: The safety assessment of JDM301 using information derived from the complete bacterial genome will contribute to a wider and deeper insight into the safety of probiotic bacteria.

  9. Genome-wide copy number profiling on high-density bacterial artificial chromosomes, single-nucleotide polymorphisms, and oligonucleotide microarrays: a platform comparison based on statistical power analysis.

    NARCIS (Netherlands)

    Hehir-Kwa, J.Y.; Egmont-Peterson, M.; Janssen, I.M.; Smeets, D.F.C.M.; Geurts van Kessel, A.H.M.; Veltman, J.A.

    2007-01-01

    Recently, comparative genomic hybridization onto bacterial artificial chromosome (BAC) arrays (array-based comparative genomic hybridization) has proved to be successful for the detection of submicroscopic DNA copy-number variations in health and disease. Technological improvements to achieve a

  10. Molecular Characterization of Five Potyviruses Infecting Korean Sweet Potatoes Based on Analyses of Complete Genome Sequences

    Directory of Open Access Journals (Sweden)

    Hae-Ryun Kwak

    2015-12-01

    Full Text Available Sweet potatoes (Ipomoea batatas L.) are grown extensively in tropical and temperate regions and are important food crops worldwide. In Korea, potyviruses, including Sweet potato feathery mottle virus (SPFMV), Sweet potato virus C (SPVC), Sweet potato virus G (SPVG), Sweet potato virus 2 (SPV2), and Sweet potato latent virus (SPLV), have been detected in sweet potato fields at a high (~95%) incidence. In the present work, complete genome sequences of 18 isolates, representing the five potyviruses mentioned above, were compared with previously reported genome sequences. The complete genomes consisted of 10,081 to 10,830 nucleotides, excluding the poly(A) tails. Their genomic organizations were typical of the genus Potyvirus, including one large open reading frame coding for a putative polyprotein. Based on phylogenetic analyses and sequence comparisons, the Korean SPFMV isolates belonged to the strains RC and O, with >98% nucleotide sequence identity. Korean SPVC isolates had 99% identity to the Japanese isolate SPVC-Bungo and 70% identity to the SPFMV isolates. The Korean SPVG isolates showed 99% identity to the three previously reported SPVG isolates. Korean SPV2 isolates had 97% identity to the SPV2 GWB-2 isolate from the USA. Korean SPLV isolates had a relatively low (88%) nucleotide sequence identity with the Taiwanese SPLV-TW isolates, and they were phylogenetically distantly related to SPFMV isolates. Recombination analysis revealed that possible recombination events occurred in the P1, HC-Pro and NIa-NIb regions of SPFMV and SPLV isolates, and these regions were identified as recombination hotspots in the sweet potato potyviruses.
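
    Nucleotide identity figures such as those above depend on how they are computed. A minimal sketch, assuming two sequences already aligned to equal length with '-' for gaps, counts matches over the columns where both sequences have a base; other conventions (for example, counting gaps as mismatches) give somewhat different values, and the abstract does not state which convention was used. The function name is hypothetical.

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percent nucleotide identity over columns where both aligned sequences have a base."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue  # this convention ignores gapped columns entirely
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACGAAA"))  # toy alignment: 4 matches over 5 columns
```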

  11. Deciphering Clostridium tyrobutyricum Metabolism Based on the Whole-Genome Sequence and Proteome Analyses

    Directory of Open Access Journals (Sweden)

    Joungmin Lee

    2016-06-01

    Full Text Available Clostridium tyrobutyricum is a Gram-positive anaerobic bacterium that efficiently produces butyric acid and is considered a promising host for anaerobic production of bulk chemicals. Due to limited knowledge of the genetic and metabolic characteristics of this strain, however, little progress has been made in its metabolic engineering. Here we report the complete genome sequence of C. tyrobutyricum KCTC 5387 (ATCC 25755), which consists of a 3.07-Mbp chromosome and a 63-kbp plasmid. The results of genomic analyses suggested that C. tyrobutyricum produces butyrate from butyryl-coenzyme A (butyryl-CoA) through acetate reassimilation by CoA transferase, unlike Clostridium acetobutylicum, which uses the phosphotransbutyrylase-butyrate kinase pathway; this was validated by reverse transcription-PCR (RT-PCR) of related genes, protein expression levels, an in vitro CoA transferase assay, and fed-batch fermentation. In addition, the changes in protein expression levels during the course of batch fermentations on glucose were examined by shotgun proteomics. Unlike in C. acetobutylicum, the expression levels of proteins involved in glycolytic and fermentative pathways in C. tyrobutyricum did not decrease even at the stationary phase. Proteins related to energy conservation mechanisms, including the Rnf complex, NfnAB, and pyruvate-phosphate dikinase, which are absent in C. acetobutylicum, were identified. Such features explain why this organism can produce butyric acid to a much higher titer and better tolerate toxic metabolites. This study, presenting the complete genome sequence, global protein expression profiles, and genome-based metabolic characteristics during the batch fermentation of C. tyrobutyricum, will be valuable in designing strategies for metabolic engineering of this strain.

  12. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

    Energy Technology Data Exchange (ETDEWEB)

    van Baren, Marijke J.; Bachy, Charles; Reistetter, Emily Nahas; Purvine, Samuel O.; Grimwood, Jane; Sudek, Sebastian; Yu, Hang; Poirier, Camille; Deerinck, Thomas J.; Kuo, Alan; Grigoriev, Igor V.; Wong, Chee-Hong; Smith, Richard D.; Callister, Stephen J.; Wei, Chia-Lin; Schmutz, Jeremy; Worden, Alexandra Z.

    2016-03-31

    Prasinophytes are widespread marine green algae that are related to plants. Abundance of the genus Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these organisms are important for marine ecology and for understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb Micromonas commoda (RCC299) genome shows that they share fewer than 8,142 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percentage (26%) of GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome-sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and most plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Multiple vascular plants that share a unique bi-domain protein also have the pathway, except the Penicillin-Binding-Protein. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. Extensive differences in gene loss and architecture between related prasinophytes underscore their extensive divergence. PG biosynthesis genes from the cyanobacterial endosymbiont that became the
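
    The share of GC splice donors mentioned above is a simple proportion. A minimal sketch, assuming a list of intron sequences given 5' to 3' on the coding strand, classifies each donor by its first two intron nucleotides (canonical GT versus non-canonical GC) and reports the fraction of each class; the function name and input format are hypothetical, and the study's own intron annotation pipeline is not described here.

```python
from collections import Counter


def splice_donor_fractions(introns):
    """Fraction of introns by donor dinucleotide (the first two intron bases, e.g. 'GT' or 'GC')."""
    counts = Counter(intron[:2].upper() for intron in introns if len(intron) >= 2)
    total = sum(counts.values())
    return {donor: n / total for donor, n in counts.items()} if total else {}


if __name__ == "__main__":
    toy_introns = ["GTAAGT", "GCAAGT", "GTGAGT", "GTAAGC"]  # toy data, not real introns
    print(splice_donor_fractions(toy_introns))
```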

  13. Molecular analysis of single oocyst of Eimeria by whole genome amplification (WGA) based nested PCR.

    Science.gov (United States)

    Wang, Yunzhou; Tao, Geru; Cui, Yujuan; Lv, Qiyao; Xie, Li; Li, Yuan; Suo, Xun; Qin, Yinghe; Xiao, Lihua; Liu, Xianyong

    2014-09-01

    PCR-based molecular tools are widely used for the identification and characterization of protozoa. Here we report the molecular analysis of Eimeria species using a combination of whole genome amplification (WGA) and nested PCR. A single oocyst of Eimeria stiedai or Eimeria media was directly used for random amplification of the genomic DNA with either primer extension preamplification (PEP) or multiple displacement amplification (MDA), and the WGA product was then used as the template in nested PCR with species-specific primers for ITS-1, 18S rDNA and 23S rDNA of E. stiedai and E. media. WGA-based PCR successfully amplified these genes from single oocysts. For the species identification of single oocysts isolated from mixed E. stiedai and E. media, the results from WGA-based PCR were fully consistent with those from morphological identification, suggesting that this method is applicable to the molecular analysis of eimerian parasites at the single-oocyst level. The WGA-based PCR method can also be applied to the identification and genetic characterization of other protists.

  14. Genomic predictions based on a joint reference population for the Nordic Red cattle breeds.

    Science.gov (United States)

    Zhou, L; Heringstad, B; Su, G; Guldbrandtsen, B; Meuwissen, T H E; Svendsen, M; Grove, H; Nielsen, U S; Lund, M S

    2014-07-01

    The main aim of this study was to compare accuracies of imputation and genomic predictions based on single and joint reference populations for Norwegian Red (NRF) and a composite breed (DFS) consisting of Danish Red, Finnish Ayrshire, and Swedish Red. The single-nucleotide polymorphism (SNP) data for NRF consisted of 2 data sets: one including 25,000 markers (NRF25K) and the other including 50,000 markers (NRF50K). The NRF25K data set had 2,572 bulls, and the NRF50K data set had 1,128 bulls. Four hundred forty-two bulls were genotyped in both data sets (double-genotyped bulls). The DFS data set (DFS50K) included 50,000 markers for 13,472 individuals, of which around 4,700 were progeny-tested bulls. The NRF25K data set was imputed to 50K density using the software Beagle. The average error rate for the imputation of NRF25K decreased slightly from 0.023 to 0.021, and the correlation between observed and imputed genotypes changed from 0.935 to 0.936, when comparing imputation with the NRF50K reference against imputation with the joint NRF50K-DFS50K reference. A genomic BLUP (GBLUP) model and a Bayesian 4-component mixture model were used to predict genomic breeding values for the NRF and DFS bulls based on the single and joint NRF and DFS reference populations. In the multiple-population predictions, accuracies of genomic breeding values increased for the 3 production traits (milk, fat, and protein yields) for both NRF and DFS. Accuracies increased by 6 and 1.3 percentage points, on average, for the NRF and DFS bulls, respectively, using the GBLUP model, and by 9.3 and 1.3 percentage points, on average, using the Bayesian 4-component mixture model. However, accuracies for health or reproduction traits did not increase from the multiple-population predictions. Among the 3 DFS populations, Swedish Red gained the most in accuracy from the multiple-population predictions, presumably because Swedish Red has a closer genetic relationship with NRF than Danish Red and Finnish Ayrshire do. The Bayesian 4
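
    The two imputation metrics reported above can be illustrated with a short sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele for the same animals and markers, takes the error rate as the fraction of genotype calls where the imputed value differs from the observed one, and uses the Pearson correlation between observed and imputed codes. The abstract does not specify the exact definitions used with Beagle, so these definitions and the function names are assumptions.

```python
import math


def imputation_error_rate(observed, imputed):
    """Fraction of genotype calls (coded 0/1/2) where the imputed value differs from the observed one."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("need two non-empty genotype lists of equal length")
    return sum(o != i for o, i in zip(observed, imputed)) / len(observed)


def genotype_correlation(observed, imputed):
    """Pearson correlation between observed and imputed genotype codes."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        raise ValueError("correlation is undefined for constant genotypes")
    return cov / math.sqrt(var_o * var_i)


if __name__ == "__main__":
    obs = [0, 1, 2, 2, 1, 0]  # toy genotypes, not real data
    imp = [0, 1, 2, 1, 1, 0]
    print(imputation_error_rate(obs, imp), round(genotype_correlation(obs, imp), 3))
```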

  15. Interior flow re-organization along the great Byrd-Totten ice divide of East Antarctica: evidence from radar layer disruptions

    Science.gov (United States)

    Cavitte, M. G. P.; Blankenship, D. D.; Young, D. A.; Siegert, M. J.; Le Meur, E.; Chappellaz, J. A.

    2012-04-01

    Interior flow re-organization is an essential component in our understanding of the timing and magnitude of sea-level variations, especially in non-marine portions of the East Antarctic ice sheet where most of the "sea-level rise potential" is stored. Internal structure in these regions can be evaluated using internal layers from radar sounding, which can be traced over hundreds of kilometers using airborne surveys. The exceptional acuity of phase-coherent radar surveys gives radar layers both the vertical resolution and the horizontal continuity that make them extremely useful for constraining glaciological models and contributing to ice core site selection, as well as for understanding transient ice sheet behavior. Specifically, when well dated, these ice layers can give us high-resolution snapshots of temporal and spatial ice evolution, including tributary penetration of ice divides. Present areas of tributary flow reaching into the interior are well constrained through remote sensing techniques, while evidence of such transient behavior in previous glacial cycles has long since been buried by subsequent accumulation. In these cases, radar imaging is the only useful technique for identifying buried episodes of transient ice flow as anomalous yet depth-consistent layer disruptions. We focus on the great Byrd-Totten ice divide in the East Antarctic interior, between the Vostok and EPICA Dome C ice core sites, to identify periods of tributary encroachment. Several layers tracked between the two sites are used to correlate their chronologies and accurately date the horizons (M. Cavitte et al., in prep.). A strong advantage in using radar for the dating of these events is its negligible contribution to age uncertainties: radar uncertainties are of the order of hundreds of years, a factor of ten smaller than those of traditional ice core dating techniques. Combining the age-depth stratigraphy obtained for the area and visual identifications of tributary intrusions gives a

  16. An effective virus-based gene silencing method for functional genomics studies in common bean

    Directory of Open Access Journals (Sweden)

    Kachroo Aardra

    2011-06-01

    Full Text Available Abstract Background Common bean (Phaseolus vulgaris L.) is a crop of economic and nutritional importance in many parts of the world. The lack of genomic resources has impeded the advancement of common bean genomics and thereby crop improvement. Although concerted efforts from the "Phaseomics" consortium have resulted in the development of several genomic resources, functional studies have continued to lag due to the recalcitrance of this crop to genetic transformation. Results Here we describe the use of a bean pod mottle virus (BPMV)-based vector for silencing of endogenous genes in common bean as well as for protein expression. This BPMV-based vector was originally developed for use in soybean, where it has been successfully employed for both protein expression and gene silencing. We tested this vector for applications in common bean by targeting common bean genes encoding nodulin 22 and stearoyl-acyl carrier protein desaturase for silencing. Our results indicate that the BPMV vector can indeed be employed for reverse genetics studies of diverse biological processes in common bean. We also used the BPMV-based vector for expressing the green fluorescent protein (GFP) in common bean and demonstrate stable GFP expression in all common bean tissues where BPMV was detected. Conclusions The availability of this vector is an important advance for the common bean research community, not only because it provides a rapid means for functional studies in common bean, but also because it does so without generating genetically modified plants. Here we describe the detailed methodology and provide essential guidelines for the use of this vector for both gene silencing and protein expression in common bean. The entire VIGS procedure can be completed in 4-5 weeks.

  17. Xenogenomics: Genomic Bioprospecting in Indigenous and Exotic Plants Through EST Discovery, cDNA Microarray-Based Expression Profiling and Functional Genomics

    Directory of Open Access Journals (Sweden)

    German C. Spangenberg

    2006-04-01

    Full Text Available To date, the overwhelming majority of genomics programs in plants have been directed at model or crop plant species, meaning that very little of the naturally occurring sequence diversity found in plants is available for characterization and exploitation. In contrast, ‘xenogenomics’ refers to the discovery and functional analysis of novel genes and alleles from indigenous and exotic species, permitting bioprospecting of biodiversity using high-throughput genomics experimental approaches. Such a program has been initiated to bioprospect for genetic determinants of abiotic stress tolerance in indigenous Australian flora and native Antarctic plants. Uniquely adapted Poaceae and Fabaceae species with enhanced tolerance to salt, drought, elevated soil aluminium concentration, and freezing stress have been identified, based primarily on their eco-physiology, and have been subjected to structural and functional genomics analyses. For each species, EST collections have been derived from plants subjected to appropriate abiotic stresses. Transcript profiling with spotted unigene cDNA micro-arrays has been used to identify genes that are transcriptionally modulated in response to abiotic stress. Candidate genes identified on the basis of sequence annotation or transcript profiling have been assayed in planta and other in vivo systems for their capacity to confer novel phenotypes. Comparative genomics analysis of novel genes and alleles identified in the xenogenomics target plant species has subsequently been undertaken with reference to key model and crop plants.

  18. Genome-wide conserved non-coding microsatellite (CNMS) marker-based integrative genetical genomics for quantitative dissection of seed weight in chickpea

    Science.gov (United States)

    Bajaj, Deepak; Saxena, Maneesha S.; Kujur, Alice; Das, Shouvik; Badoni, Saurabh; Tripathi, Shailesh; Upadhyaya, Hari D.; Gowda, C. L. L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

    2015-01-01

    Phylogenetic footprinting identified 666 genome-wide paralogous and orthologous CNMS (conserved non-coding microsatellite) markers from 5′-untranslated and regulatory regions (URRs) of 603 protein-coding chickpea genes. The (CT)n and (GA)n CNMS carrying CTRMCAMV35S and GAGA8BKN3 regulatory elements, respectively, are abundant in the chickpea genome. The mapped genic CNMS markers with robust amplification efficiencies (94.7%) detected higher intraspecific polymorphic potential (37.6%) among genotypes, implying their immense utility in chickpea breeding and genetic analyses. Seventeen differentially expressed CNMS marker-associated genes showing strong preferential and seed tissue/developmental stage-specific expression in contrasting genotypes were selected to narrow down the gene targets underlying seed weight quantitative trait loci (QTLs)/eQTLs (expression QTLs) through integrative genetical genomics. The integration of transcript profiling with seed weight QTL/eQTL mapping, molecular haplotyping, and association analyses identified potential molecular tags (GAGA8BKN3 and RAV1AAT regulatory elements and alleles/haplotypes) in the LOB-domain-containing protein- and KANADI protein-encoding transcription factor genes controlling the cis-regulated expression for seed weight in the chickpea. This emphasizes the potential of CNMS marker-based integrative genetical genomics for the quantitative genetic dissection of complex seed weight in chickpea. PMID:25504138
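
    A minimal sketch of scanning a sequence for the (CT)n and (GA)n motifs named above uses one regular expression per motif and reports the start position and number of repeat units. The minimum of five repeat units and the function name are hypothetical choices for illustration; the study's own microsatellite-search criteria are not given in the abstract.

```python
import re


def find_dinucleotide_ssrs(seq, motifs=("CT", "GA"), min_repeats=5):
    """Return (motif, start, repeat_count) for tandem runs of each motif, sorted by start."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # e.g. "(?:CT){5,}" matches at least `min_repeats` back-to-back copies of the motif
        pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
        for m in pattern.finditer(seq):
            hits.append((motif, m.start(), (m.end() - m.start()) // len(motif)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    toy = "AAA" + "CT" * 6 + "GGG" + "GA" * 5 + "TT"  # toy sequence, not real data
    print(find_dinucleotide_ssrs(toy))
```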

  19. CRISPR/Cas9-based genome editing for simultaneous interference with gene expression and protein stability

    DEFF Research Database (Denmark)

    Martinez, Virginia; Lauritsen, Ida; Hobel, Tonja

    2017-01-01

    Interference with genes is the foundation of reverse genetics and is key to manipulation of living cells for biomedical and biotechnological applications. However, classical genetic knockout and transcriptional knockdown technologies have different drawbacks and offer no control over existing protein levels. Here, we describe an efficient genome editing approach that affects specific protein abundances by changing the rates of both RNA synthesis and protein degradation, based on the two cross-kingdom control mechanisms CRISPRi and the N-end rule for protein stability. In addition, our approach...

  20. Rapid extraction of genomic DNA from saliva for HLA typing on microarray based on magnetic nanobeads

    Science.gov (United States)

    Xie, Xin; Zhang, Xu; Yu, Bingbin; Gao, Huafang; Zhang, Huan; Fei, Weiyang

    2004-09-01

    A series of simplified protocols was developed for extracting genomic DNA from saliva by using magnetic nanobeads as absorbents. In these protocols, both the enrichment of the target cells and the adsorption of DNA are achieved simultaneously in one step by our functionally modified magnetic beads, and the DNA-nanobead complex can be used directly as a PCR template. HLA typing based on an oligonucleotide array was conducted by hybridization with the PCR products. The results show that the protocols are robust and sensitive.

  1. Comparisons of Shewanella strains based on genome annotations, modeling and experiments

    Energy Technology Data Exchange (ETDEWEB)

    Ong, Wai Kit; Vu, Trang; Lovendahl, Klaus N.; Llull, Jenna; Serres, Margaret; Romine, Margaret F.; Reed, Jennifer L.

    2014-01-01

    Shewanella is a genus of facultatively anaerobic, Gram-negative bacteria that have a highly adaptable metabolism, which allows them to thrive in diverse environments. This quality makes them attractive target bacteria for research in bioremediation and microbial fuel cell applications. Constraint-based modeling is a useful tool for helping researchers gain insights into the metabolic capabilities of these bacteria. However, of the 22 sequenced Shewanella strains, only Shewanella oneidensis MR-1 has a genome-scale metabolic model.

  2. SynMap2 and SynMap3D: web-based whole-genome synteny browsers.

    Science.gov (United States)

    Haug-Baltzell, Asher; Stephens, Sean A; Davey, Sean; Scheidegger, Carlos E; Lyons, Eric

    2017-07-15

    Current synteny visualization tools either focus on small regions of sequence and do not illustrate genome-wide trends, or are complicated to use and create visualizations that are difficult to interpret. To address this challenge, the Comparative Genomics Platform (CoGe) has developed two web-based tools to visualize synteny across whole genomes. SynMap2 and SynMap3D allow researchers to explore whole-genome synteny patterns (across two or three genomes, respectively) in responsive, web-based visualization and virtual reality environments. Both tools have access to the extensive CoGe genome database (containing over 30 000 genomes) as well as the option for users to upload their own data. Because they are built on modern web technologies, no installation is required, making the tools widely accessible and easy to use. Both tools are open source (MIT license) and freely available for use online through CoGe ( https://genomevolution.org ). SynMap2 and SynMap3D can be accessed at http://genomevolution.org/coge/SynMap.pl and http://genomevolution.org/coge/SynMap3D.pl , respectively. Source code is available: https://github.com/LyonsLab/coge . ericlyons@email.arizona.edu. Supplementary data are available at Bioinformatics online.

  3. DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-Ru MA; Shao-Bo XIAO; Ai-Zhen GUO; Jian-Qiang LUE; Huan-Chun CHEN

    2004-01-01

    Sueoka and Lobry each proposed that, in the absence of bias between the two DNA strands in mutation and selection, the base composition within each strand should satisfy A=T and C=G (a state called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene orientation. To determine the relationship of base composition skews with replication orientation, gene function, codon usage bias and phylogenetic evolution, we developed a program called DNAskew for the statistical analysis of strand asymmetry and codon composition bias in DNA sequences. The program can also be used to predict the replication boundaries of genome sequences. The method builds on the fact that there are compositional asymmetries between the leading and the lagging strands of replication. DNAskew was written in Perl and runs on Linux. It works quickly with annotated or unannotated sequences in GBFF (GenBank flatfile) or FASTA format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
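
    The idea of locating replication boundaries from strand asymmetry can be sketched with the widely used GC skew, (G - C)/(G + C), computed in fixed windows and summed cumulatively: in many bacteria the cumulative skew reaches its minimum near the replication origin and its maximum near the terminus. The Python sketch below illustrates that assumption and is not DNAskew's Perl implementation; the window size and function names are hypothetical, and real genomes need smoothing and a check that the skew signal is strong enough to interpret.

```python
def gc_skew(seq, window=1000):
    """GC skew (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def predict_origin_terminus(seq, window=1000):
    """Return (origin, terminus) positions at the min and max of the cumulative GC skew.

    Positions are window boundaries, so the resolution is `window` bases.
    """
    cumulative = [0.0]  # value at position 0, before any window
    for s in gc_skew(seq, window):
        cumulative.append(cumulative[-1] + s)
    boundaries = range(len(cumulative))
    origin = min(boundaries, key=cumulative.__getitem__)
    terminus = max(boundaries, key=cumulative.__getitem__)
    return min(origin * window, len(seq)), min(terminus * window, len(seq))


if __name__ == "__main__":
    toy = "G" * 10 + "C" * 20 + "G" * 10  # toy sequence, not real data
    print(predict_origin_terminus(toy, window=10))
```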

  4. Integrated Genomic and Network-Based Analyses of Complex Diseases and Human Disease Network.

    Science.gov (United States)

    Al-Harazi, Olfat; Al Insaif, Sadiq; Al-Ajlan, Monirah A; Kaya, Namik; Dzimiri, Nduna; Colak, Dilek

    2016-06-20

    A disease phenotype generally reflects various pathobiological processes that interact in a complex network. The highly interconnected nature of the human protein interaction network (interactome) indicates that, at the molecular level, it is difficult to consider diseases as being independent of one another. Recently, genome-wide molecular measurements, data mining and bioinformatics approaches have provided the means to explore human diseases from a molecular basis. The exploration of diseases and a system of disease relationships based on the integration of genome-wide molecular data with the human interactome could offer a powerful perspective for understanding the molecular architecture of diseases. Moreover, subnetwork markers have proven to be more robust and reliable than individual biomarker genes selected on the basis of gene expression profiles alone, and they achieve higher accuracy in disease classification. We have applied one of these methodologies to idiopathic dilated cardiomyopathy (IDCM) data that we generated using a microarray, and identified significant subnetworks associated with the disease. In this paper, we review the recent endeavours in this direction, and summarize the existing methodologies and computational tools for network-based analysis of complex diseases, molecular relationships among apparently different disorders, and the human disease network. We also discuss future research trends and topics in this promising field.

  5. Ethnobotany genomics - discovery and innovation in a new era of exploratory research

    Directory of Open Access Journals (Sweden)

    Ragupathy Subramanyam

    2010-01-01

    Full Text Available Abstract We present here the first use of DNA barcoding in a new approach to ethnobotany that we coined "ethnobotany genomics". This new approach is founded on the concept of 'assemblage' of biodiversity knowledge, which includes a coming together of different ways of knowing and valorizing species variation in a novel approach seeking to add value to both traditional knowledge (TK) and scientific knowledge (SK). We employed a contemporary genomic technology, DNA barcoding, as an important tool for identifying cryptic species that were already recognized as ethnotaxa in the TK classification systems of local cultures in the Velliangiri Hills of India. This research is based on several case studies in our lab, which define an approach that is poised to evolve quickly with the advent of new ideas and technology. Our results show that DNA barcoding validated several plant species new to science that were previously recognized by the TK classifications of the Irulas and Malasars but were lumped together under SK classification. The contribution of local aboriginal knowledge concerning plant diversity and utility in India is considerable; our study presents new ethnomedicine to science. Ethnobotany genomics can also be used to determine the distribution of rare species and their ecological requirements, including traditional ecological knowledge, so that conservation strategies can be implemented. This is aligned with the Convention on Biological Diversity, which was signed by over 150 nations, and thus the world's complex array of human-natural-technological relationships has effectively been re-organized.

  6. BPhyOG: An interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes

    Directory of Open Access Journals (Sweden)

    Lin Kui

    2007-07-01

    Background: Overlapping genes (OGs) in bacterial genomes are pairs of adjacent genes whose coding sequences partly or entirely overlap. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified. Indeed, they might prove to be a consistent feature across all microbial genomes. Our previous work suggests that OGs can serve as robust whole-genome markers for the construction of phylogenies. Biologists need an online, interactive web server for inferring phylogenies so that they can analyze the phylogenetic relationships among a set of bacterial genomes of interest. Description: BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ) and the Unweighted Pair-Group Method using Arithmetic averages (UPGMA). Users can apply either method to generate phylogenetic trees from an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs used to infer the phylogenetic relationships, and provides, through hyperlinks, detailed annotation for each OG pair and the features of its component genes. Users can also retrieve each of the homologous OG pairs determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint. Conclusion: BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, whose annotations and homologous OG pairs are comprehensively integrated. The reliability of phylogenies complemented by...
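
    As a minimal sketch of the workflow described above (a distance matrix built from shared OG pairs, then tree building), the Python below computes pairwise genome distances and clusters them with UPGMA. The abstract does not state BPhyOG's exact normalization, so `og_distance` assumes a hypothetical one (1 minus shared OG pairs over the OG count of the smaller genome); the function names and toy OG identifiers are illustrative rather than BPhyOG's code, and NJ is not shown.

```python
from itertools import combinations


def og_distance(ogs_a: set, ogs_b: set) -> float:
    """Hypothetical normalization: 1 - shared OG pairs / OG pairs in the smaller genome."""
    shared = len(ogs_a & ogs_b)
    return 1.0 - shared / min(len(ogs_a), len(ogs_b))


def distance_matrix(og_sets: dict) -> dict:
    """Map frozenset({genome_a, genome_b}) -> OG-based distance for every genome pair."""
    return {
        frozenset((a, b)): og_distance(og_sets[a], og_sets[b])
        for a, b in combinations(og_sets, 2)
    }


def upgma(names: list, dist: dict):
    """Cluster genomes with UPGMA and return the tree as nested tuples."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, leaf count)
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        for c in clusters:
            # Size-weighted arithmetic mean of the old distances (the "A" in UPGMA).
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
    [(tree, _)] = list(clusters.values())
    return tree


# Toy genomes whose OG pairs are labelled by hypothetical integer IDs.
og_sets = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 6, 7, 8}}
tree = upgma(list(og_sets), distance_matrix(og_sets))
print(tree)  # ('C', ('A', 'B'))
```

    UPGMA assumes a constant rate across lineages; where that assumption is doubtful, NJ (the other method BPhyOG offers) is the usual alternative on the same distance matrix.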

  7. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  8. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center.

    Science.gov (United States)

    Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-01

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  9. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

    Science.gov (United States)

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, and some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized by the number of repeats present and their overall length, and were allocated to their linkage groups. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexanucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB), was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico PCR analysis using two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates. PMID:27648830

  10. The complete mitochondrial genome of the enigmatic bigheaded turtle (Platysternon): description of unusual genomic features and the reconciliation of phylogenetic hypotheses based on mitochondrial and nuclear DNA

    Energy Technology Data Exchange (ETDEWEB)

    Parham, James F.; Feldman, Chris R.; Boore, Jeffrey L.

    2005-12-28

    The big-headed turtle (Platysternon megacephalum) from east Asia is the sole living representative of a poorly studied turtle lineage (Platysternidae). It has no close living relatives, and its phylogenetic position within turtles is one of the outstanding controversies in turtle systematics. Platysternon was traditionally considered to be close to snapping turtles (Chelydridae) on the basis of some studies of its morphology and mitochondrial (mt) DNA; however, other studies of morphology and nuclear (nu) DNA do not support that hypothesis. We sequenced the complete mt genome of Platysternon and the nearly complete mt genomes of two other relevant turtles and compared them to turtle mt genomes from the literature to form the largest molecular dataset used to date to address this issue. The resulting phylogeny robustly rejects the placement of Platysternon with Chelydridae and instead shows that it is a member of the Testudinoidea, a diverse, nearly globally distributed group that includes pond turtles and tortoises. We also discovered that Platysternon mtDNA has large-scale gene rearrangements and possesses two nearly identical control regions, features that distinguish it from all other studied turtles. Our study robustly determines the phylogenetic placement of Platysternon and provides a well-resolved outline of major turtle lineages, while demonstrating the significantly greater resolving power of comparing large amounts of mt sequence over that of short fragments. Earlier phylogenies placing Platysternon with chelydrids required a temporal gap in the fossil record that is now unnecessary. The duplicated control regions and gene rearrangements of the Platysternon mtDNA probably resulted from the duplication of part of the genome followed by the loss of redundant genes. Although it is possible that having two control regions may provide some advantage, explaining why the control regions would be maintained while some of the duplicated genes were eroded...

  11. Complete genome sequence of cyanobacterium Nostoc sp. NIES-3756, a potentially useful strain for phytochrome-based bioengineering.

    Science.gov (United States)

    Hirose, Yuu; Fujisawa, Takatomo; Ohtsubo, Yoshiyuki; Katayama, Mitsunori; Misawa, Naomi; Wakazuki, Sachiko; Shimura, Yohei; Nakamura, Yasukazu; Kawachi, Masanobu; Yoshikawa, Hirofumi; Eki, Toshihiko; Kanesaki, Yu

    2016-01-20

    To explore the diverse photoreceptors of cyanobacteria, we isolated Nostoc sp. strain NIES-3756 from soil at Mimomi-Park, Chiba, Japan, and determined its complete genome sequence. The genome consists of one chromosome and two plasmids (6,987,571 bp in total, with no gaps). The NIES-3756 strain carries 7 phytochrome and 12 cyanobacteriochrome genes, which will facilitate studies of phytochrome-based bioengineering. Copyright © 2015. Published by Elsevier B.V.

  12. Towards fully automated structure-based function prediction in structural genomics: a case study.

    Science.gov (United States)

    Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

    2007-04-13

    As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.

  13. The application of the entropy-based statistic for genomic association study of QTL

    Institute of Scientific and Technical Information of China (English)

    Yang Xiang; Yumei Li; Zaiming Liu; Zhenqiu Sun

    2008-01-01

    An entropy-based statistic T(PE) has been proposed for genomic association studies of disease-susceptibility loci. The statistic T(PE) may be directly adopted and/or extended to quantitative-trait locus (QTL) mapping for quantitative traits. In this article, the statistic T(PE) was extended and applied to quantitative traits for association analysis of QTL by means of selective genotyping. The statistical properties (the type I error rate and the power) were examined by simulation studies under a range of parameters and population-sampling strategies (e.g., various genetic models, various heritabilities, and various sample-selection threshold values). The results indicated that the statistic T(PE) is robust and powerful for genomic association study of QTL. A simulation study based on the haplotype frequencies of 10 single nucleotide polymorphisms (SNPs) of angiotensin-I converting enzyme genes was conducted to evaluate the performance of the statistic T(PE) for genetic association study.

  14. The application of the entropy-based statistic for genomic association study of QTL.

    Science.gov (United States)

    Xiang, Yang; Li, Yumei; Liu, Zaiming; Sun, Zhenqiu

    2008-03-01

    An entropy-based statistic T(PE) has been proposed for genomic association study for disease-susceptibility locus. The statistic T(PE) may be directly adopted and/or extended to quantitative-trait locus (QTL) mapping for quantitative traits. In this article, the statistic T(PE) was extended and applied to quantitative trait for association analysis of QTL by means of selective genotyping. The statistical properties (the type I error rate and the power) were examined under a range of parameters and population-sampling strategies (e.g., various genetic models, various heritabilities, and various sample-selection threshold values) by simulation studies. The results indicated that the statistic T(PE) is robust and powerful for genomic association study of QTL. A simulation study based on the haplotype frequencies of 10 single nucleotide polymorphisms (SNPs) of angiotensin-I converting enzyme genes was conducted to evaluate the performance of the statistic T(PE) for genetic association study.

  15. Messenger RNA- versus retrovirus-based induced pluripotent stem cell reprogramming strategies: analysis of genomic integrity.

    Science.gov (United States)

    Steichen, Clara; Luce, Eléanor; Maluenda, Jérôme; Tosca, Lucie; Moreno-Gimeno, Inmaculada; Desterke, Christophe; Dianat, Noushin; Goulinet-Mainot, Sylvie; Awan-Toor, Sarah; Burks, Deborah; Marie, Joëlle; Weber, Anne; Tachdjian, Gérard; Melki, Judith; Dubart-Kupperschmitt, Anne

    2014-06-01

    The use of synthetic messenger RNAs to generate human induced pluripotent stem cells (iPSCs) is particularly appealing for potential regenerative medicine applications, because it overcomes the common drawbacks of DNA-based or virus-based reprogramming strategies, in particular transgene integration. We compared the genomic integrity of mRNA-derived iPSCs with that of retrovirus-derived iPSCs generated under strictly comparable conditions, using single-nucleotide polymorphism (SNP) and copy number variation (CNV) analyses. We showed that mRNA-derived iPSCs do not differ significantly from the parental fibroblasts in SNP analysis, whereas retrovirus-derived iPSCs do. We found that the number of CNVs seemed independent of the reprogramming method, instead appearing to be clone-dependent. Furthermore, differentiation studies indicated that mRNA-derived iPSCs differentiated efficiently into hepatoblasts and that these cells did not acquire additional CNVs during differentiation. The integration-free hepatoblasts that were generated constitute a new tool for the study of diseased hepatocytes derived from patients' iPSCs and for their use in the context of stem cell-derived hepatocyte transplantation. Our findings also highlight the need to conduct careful studies of genome integrity when selecting iPSC lines for further applications.

  16. Array comparative genomic hybridization-based characterization of genetic alterations in pulmonary neuroendocrine tumors.

    Science.gov (United States)

    Voortman, Johannes; Lee, Jih-Hsiang; Killian, Jonathan Keith; Suuriniemi, Miia; Wang, Yonghong; Lucchi, Marco; Smith, William I; Meltzer, Paul; Wang, Yisong; Giaccone, Giuseppe

    2010-07-20

    The goal of this study was to characterize and classify pulmonary neuroendocrine tumors based on array comparative genomic hybridization (aCGH). Using aCGH, we performed karyotype analysis of 33 small cell lung cancer (SCLC) tumors, 13 SCLC cell lines, 19 bronchial carcinoids, and 9 gastrointestinal carcinoids. In contrast to the relatively conserved karyotypes of carcinoid tumors, the karyotypes of SCLC tumors and cell lines were highly aberrant. High copy number (CN) gains were detected in SCLC tumors and cell lines in cytogenetic bands encoding JAK2, FGFR1, and MYC family members. In some of those samples, the CN of these genes exceeded 100, suggesting that they could represent driver alterations and potential drug targets in subgroups of SCLC patients. In SCLC tumors, as well as in bronchial carcinoids and carcinoids of gastrointestinal origin, recurrent CN alterations were observed in 203 genes, including the RB1 gene and 59 microRNAs, 51 of which are located in the DLK1-DIO3 domain. These findings suggest the existence of partially shared CN alterations in these tumor types. In contrast, CN alterations of the TP53 gene and the MYC family members were predominantly observed in SCLC. Furthermore, we demonstrated that the aCGH profile of SCLC cell lines closely resembles that of clinical SCLC specimens. Finally, by analyzing potential drug targets, we provide a genomics-based rationale for targeting the AKT-mTOR and apoptosis pathways in SCLC.

  17. A revised timescale for human evolution based on ancient mitochondrial genomes

    Science.gov (United States)

    Johnson, Philip L.F.; Bos, Kirsten; Lari, Martina; Bollongino, Ruth; Sun, Chengkai; Giemsch, Liane; Schmitz, Ralf; Burger, Joachim; Ronchitelli, Anna Maria; Martini, Fabio; Cremonesi, Renata G.; Svoboda, Jiří; Bauer, Peter; Caramelli, David; Castellano, Sergi; Reich, David; Pääbo, Svante; Krause, Johannes

    2016-01-01

    Summary Background Recent analyses of de novo DNA mutations in modern humans have suggested a nuclear substitution rate that is approximately half that of previous estimates based on fossil calibration. This result has led to suggestions that major events in human evolution occurred far earlier than previously thought. Result Here we use mitochondrial genome sequences from 10 securely dated ancient modern humans spanning 40,000 years as calibration points for the mitochondrial clock, thus yielding a direct estimate of the mitochondrial substitution rate. Our clock yields mitochondrial divergence times that are in agreement with earlier estimates based on calibration points derived from either fossils or archaeological material. In particular, our results imply a separation of non-Africans from the most closely related sub-Saharan African mitochondrial DNAs (haplogroup L3) of less than 62,000-95,000 years ago. Conclusion Though single loci like mitochondrial DNA (mtDNA) can only provide biased estimates of population split times, they can provide valid upper bounds; our results exclude most of the older dates for African and non-African split times recently suggested by de novo mutation rate estimates in the nuclear genome. PMID:23523248
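
    The abstract does not spell out the clock model. As a hedged illustration only, under the simplest strict-clock assumption (one constant substitution rate $\mu$ per site per year on all lineages), the calibration and dating steps reduce to the relations below, where $\ell$ is a root-to-tip distance in substitutions per site, $t_{\text{ancient}}$ is the age of a dated ancient sample in years, and $d_{ij}$ is the pairwise distance between lineages $i$ and $j$:

```latex
\ell_{\text{modern}} - \ell_{\text{ancient}} \approx \mu \, t_{\text{ancient}}
\quad\Longrightarrow\quad
\hat{\mu} \approx \frac{\ell_{\text{modern}} - \ell_{\text{ancient}}}{t_{\text{ancient}}},
\qquad
\hat{T}_{ij} = \frac{d_{ij}}{2\hat{\mu}}
```

    The factor of 2 arises because substitutions accumulate independently along both lineages after they split ($d_{ij} = 2\mu T_{ij}$). Practical tip-calibrated analyses typically also model rate variation among lineages and uncertainty in the sample ages, which this sketch omits.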

  18. MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.

    Science.gov (United States)

    Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin

    2015-04-01

    Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with an incorrectly specified genotype-phenotype model. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
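
    MGAS builds on the Simes procedure, as extended in GATES to handle correlated SNP tests through effective numbers of tests. As a minimal sketch of the underlying idea only, the plain (unextended) Simes combination of per-SNP p-values for one gene is shown below. It assumes independent or positively dependent tests, and it implements neither the effective-number correction nor MGAS's multivariate extension; the function name and p-values are hypothetical.

```python
def simes_p(pvalues: list) -> float:
    """Simes combined p-value: min over j of m * p_(j) / j, with p-values sorted ascending."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    p = sorted(pvalues)
    m = len(p)
    return min(1.0, min(m * pj / j for j, pj in enumerate(p, start=1)))


# Hypothetical per-SNP p-values for the SNPs mapped to a single gene.
gene_p = simes_p([0.04, 0.01, 0.03])
print(round(gene_p, 4))  # 0.03
```

    In GATES, as described in its publication, $m$ and $j$ are replaced by effective numbers of independent tests derived from SNP correlation, which keeps the gene-level test valid under linkage disequilibrium.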

  19. Protein Interaction-Based Genome-Wide Analysis of Incident Coronary Heart Disease

    DEFF Research Database (Denmark)

    Jensen, Majken Karoline; Pers, Tune Hannes; Dworzynski, Piotr

    2011-01-01

    Background: Network-based approaches may leverage genome-wide association (GWA) analysis by testing for the aggregate association across several pathway members. We aimed to examine if networks of genes that represent experimentally determined protein-protein interactions (PPIs) are enriched...... are involved in abnormal cardiovascular system physiological features based on knockout mice (4-fold enrichment; Fisher exact test, P = 0.006). Ingenuity pathway analysis revealed that canonical pathways, especially related to blood pressure regulation, were significantly enriched in the genes from the top...... complex. Conclusions: The integration of a GWA study with PPI data successfully identifies a set of candidate susceptibility genes for incident CHD that would have been missed in single-marker GWA analysis. (Circ Cardiovasc Genet. 2011; 4:549-556.)...
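
    The enrichment statistics quoted above (a fold enrichment with a Fisher exact P) can be illustrated with a small, standard-library-only sketch. It computes a one-sided Fisher exact test for over-representation from a 2x2 table of gene counts. The table layout, function names and counts are assumptions for illustration; the study's actual table is not given in the abstract.

```python
from math import comb


def fisher_greater_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher exact p-value for the 2x2 table

                     annotated   not annotated
        in set           a             b
        not in set       c             d

    i.e. P(X >= a) for X ~ Hypergeometric(N = a+b+c+d, K = a+c, n = a+b).
    """
    n_set, k_annot, total = a + b, a + c, a + b + c + d
    denom = comb(total, n_set)
    return sum(
        comb(k_annot, x) * comb(total - k_annot, n_set - x)
        for x in range(a, min(n_set, k_annot) + 1)
    ) / denom


def fold_enrichment(a: int, b: int, c: int, d: int) -> float:
    """Annotated fraction in the set divided by the annotated fraction overall."""
    return (a / (a + b)) / ((a + c) / (a + b + c + d))


# Hypothetical counts, not the study's data.
print(fisher_greater_p(3, 1, 1, 3))  # 17/70, about 0.2429
print(fold_enrichment(3, 1, 1, 3))   # 1.5
```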

  20. Fragment-based cocktail crystallography by the Medical Structural Genomics of Pathogenic Protozoa Consortium

    Science.gov (United States)

    Verlinde, Christophe L.M.J.; Fan, Erkang; Shibata, Sayaka; Zhang, Zongsheng; Sun, Zhihua; Deng, Wei; Ross, Jennifer; Kim, Jessica; Xiao, Liren; Arakaki, Tracy L.; Bosch, Jürgen; Caruthers, Jonathan M.; Larson, Eric T.; LeTrong, Isolde; Napuli, Alberto; Kelly, Angela; Mueller, Natasha; Zucker, Frank; Van Voorhis, Wesley C.; Buckner, Frederick S.; Merritt, Ethan A.; Hol, Wim G.J.

    2010-01-01

    The history of fragment-based drug discovery, with an emphasis on crystallographic methods, is sketched, illuminating various contributions, including our own, which preceded the industrial development of the method. Subsequently, the creation of the BMSC fragment cocktails library is described. The BMSC collection currently comprises 68 cocktails of 10 compounds that are shape-wise diverse. The utility of these cocktails for initiating lead discovery in structure-based drug design has been explored by soaking numerous protein crystals obtained by our MSGPP (Medical Structural Genomics of Pathogenic Protozoa) consortium. Details of the fragment selection and cocktail design procedures, as well as examples of the successes obtained are given. The BMSC Fragment Cocktail recipes are available free of charge and are in use in over 20 academic labs. PMID:19929835

  1. Universal spectrum for DNA base CG frequency distribution in Takifugu rubripes (Puffer fish) genome

    CERN Document Server

    Selvam, A M

    2007-01-01

    The frequency distributions of DNA bases A, C, G, T exhibit fractal fluctuations, namely a zigzag pattern of an increase followed by a decrease on all orders of magnitude along the length of the DNA molecule. Self-similar fractal fluctuations are ubiquitous in the space-time fluctuations of dynamical systems in nature. The power spectra of fractal fluctuations exhibit an inverse power-law form signifying long-range space-time correlations, such that there is two-way communication between local (small-scale) and global (large-scale) perturbations. In this paper it is shown that the DNA base CG frequency distribution in the Takifugu rubripes (Puffer fish) Genome Release 4 exhibits the universal inverse power-law form of the statistical normal distribution, consistent with a general systems theory model prediction of quantumlike chaos governing fractal space-time distributions. The model predictions are (i) quasicrystalline Penrose tiling pattern for the nested coiled structure thereby achieving maximum packing efficiency for the DNA m...

  2. Genomic Characterization of Flavobacterium psychrophilum Serotypes and Development of a Multiplex PCR-Based Serotyping Scheme

    Directory of Open Access Journals (Sweden)

    Tatiana Rochat

    2017-09-01

    Flavobacterium psychrophilum is a devastating bacterial pathogen of salmonids reared in freshwater worldwide. So far, serological diversity between isolates has been described, but the underlying molecular factors remain unknown. By combining complete genome sequence analysis with the serotyping method proposed by Lorenzen and Olesen (1997) for a set of 34 strains, we identified key molecular determinants of the serotypes. This knowledge allowed us to develop a robust multiplex PCR-based serotyping scheme, which was applied to 244 bacterial isolates. The results revealed a striking association between PCR-serotype and fish host species and illustrate the use of this approach as a simple and cost-effective method for the determination of F. psychrophilum serogroups. PCR-based serotyping could be a useful tool in a range of applications such as disease surveillance, selection of salmonids for bacterial coldwater disease resistance and future vaccine formulation.

  3. CCor: A whole genome network-based similarity measure between two genes.

    Science.gov (United States)

    Hu, Yiming; Zhao, Hongyu

    2016-12-01

    Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, and only a few also consider network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole-genome network-based similarity measure, called CCor, that makes use of information from all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules.

  4. Identification of genetic bases of Vibrio fluvialis species-specific biochemical pathways and potential virulence factors by comparative genomic analysis.

    Science.gov (United States)

    Lu, Xin; Liang, Weili; Wang, Yunduan; Xu, Jialiang; Zhu, Jun; Kan, Biao

    2014-03-01

    Vibrio fluvialis is an important food-borne pathogen that causes diarrheal illness and sometimes extraintestinal infections in humans. In this study, we sequenced the genome of a clinical V. fluvialis strain and determined its phylogenetic relationships with other Vibrio species by comparative genomic analysis. We found that the closest relationship was between V. fluvialis and V. furnissii, followed by those with V. cholerae and V. mimicus. Moreover, based on genome comparisons and gene complementation experiments, we revealed genetic mechanisms of the biochemical tests that differentiate V. fluvialis from closely related species. Importantly, we identified a variety of genes encoding potential virulence factors, including multiple hemolysins, transcriptional regulators, and environmental survival and adaptation apparatuses, and the type VI secretion system, which is indicative of complex regulatory pathways modulating pathogenesis in this organism. The availability of V. fluvialis genome sequences may promote our understanding of pathogenic mechanisms for this emerging pathogen.

  5. Prokaryotic Contig Annotation Pipeline Server: Web Application for a Prokaryotic Genome Annotation Pipeline Based on the Shiny App Package.

    Science.gov (United States)

    Park, Byeonghyeok; Baek, Min-Jeong; Min, Byoungnam; Choi, In-Geol

    2017-09-01

    Genome annotation is a primary step in genomic research. To establish a light and portable prokaryotic genome annotation pipeline for use in individual laboratories, we developed a Shiny app package designated as "P-CAPS" (Prokaryotic Contig Annotation Pipeline Server). The package is composed of R and Python scripts that integrate publicly available annotation programs into a server application. P-CAPS is not only a browser-based interactive application but also a distributable Shiny app package that can be installed on any personal computer. The final annotation is provided in various standard formats and is summarized in an R markdown document. Annotation can be visualized and examined with a public genome browser. A benchmark test showed that the annotation quality and completeness of P-CAPS were reliable and compatible with those of currently available public pipelines.

  6. RiceGeneThresher: a web-based application for mining genes underlying QTL in rice genome.

    Science.gov (United States)

    Thongjuea, Supat; Ruanjaichon, Vinitchan; Bruskiewich, Richard; Vanavichit, Apichart

    2009-01-01

    RiceGeneThresher is a public online resource for mining genes underlying genome regions of interest or quantitative trait loci (QTL) in the rice genome. It is a compendium of rice genomic resources consisting of genetic markers, genome annotation, expressed sequence tags (ESTs), protein domains, gene ontology, plant stress-responsive genes, metabolic pathways and predicted protein-protein interactions. The RiceGeneThresher system integrates these diverse data sources and provides powerful web-based applications and flexible tools for delivering customized sets of biological data on rice. The system supports whole-genome gene mining for QTL through queries by DNA marker intervals or genomic loci. RiceGeneThresher provides biologically supported evidence that is essential for targeting groups or networks of genes involved in controlling traits underlying QTL. Users can use it to discover and assign the most promising candidate genes in preparation for further gene function validation analysis. The web-based application is freely available at http://rice.kps.ku.ac.th.
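
    Querying by a genomic interval, as described above for QTL regions, amounts to an interval-overlap lookup over gene coordinates. A minimal sketch, assuming marker intervals have already been converted to physical coordinates and that gene models are available as (chromosome, start, end, ID) tuples with 1-based inclusive coordinates; the `GeneIndex` class and the gene IDs are hypothetical and not part of RiceGeneThresher.

```python
from bisect import bisect_right
from collections import defaultdict


class GeneIndex:
    """Hypothetical in-memory index of gene coordinates, grouped by chromosome."""

    def __init__(self, genes):
        # genes: iterable of (chrom, start, end, gene_id), 1-based inclusive coordinates
        self._by_chrom = defaultdict(list)
        for chrom, start, end, gene_id in genes:
            self._by_chrom[chrom].append((start, end, gene_id))
        for rows in self._by_chrom.values():
            rows.sort()
        self._starts = {c: [r[0] for r in rows] for c, rows in self._by_chrom.items()}

    def in_interval(self, chrom: str, qstart: int, qend: int) -> list:
        """IDs of genes overlapping [qstart, qend] on chrom, ordered by start."""
        rows = self._by_chrom.get(chrom, [])
        # Only genes that start at or before qend can overlap the query.
        hi = bisect_right(self._starts.get(chrom, []), qend)
        return [gid for start, end, gid in rows[:hi] if end >= qstart]


idx = GeneIndex([
    ("chr1", 100, 200, "g1"),
    ("chr1", 150, 400, "g2"),
    ("chr1", 500, 600, "g3"),
    ("chr2", 100, 300, "g4"),
])
print(idx.in_interval("chr1", 180, 450))  # ['g1', 'g2']
```

    Sorting by start lets `bisect_right` discard genes that begin after the query, while the `end >= qstart` filter handles the remaining candidates; a full interval tree would avoid scanning long prefixes on gene-dense chromosomes.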

  7. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan;

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary...

  8. Gene expression profiles in squamous cell cervical carcinoma using array-based comparative genomic hybridization analysis.

    Science.gov (United States)

    Choi, Y-W; Bae, S M; Kim, Y-W; Lee, H N; Kim, Y W; Park, T C; Ro, D Y; Shin, J C; Shin, S J; Seo, J-S; Ahn, W S

    2007-01-01

    Our aim was to identify novel genomic regions of interest and to provide high-dynamic-range information on the correlation between squamous cell cervical carcinoma and its related gene expression patterns using genome-wide array-based comparative genomic hybridization (array-CGH). We analyzed 15 cases of cervical cancer from KangNam St Mary's Hospital of the Catholic University of Korea. Microdissection was performed to obtain DNA samples from paraffin-embedded cervical cancer tissues as well as from the adjacent normal tissues. The bacterial artificial chromosome (BAC) array used in this study consisted of 1440 human BACs, with a spacing of 2.08 Mb between clones. All 15 cases of cervical cancer showed differential changes in cervical cancer-associated genetic alterations. The analysis limit of average gains and losses was 53%. A significant positive correlation was found for changes in 8q24.3, 1p36.32, 3q27.1, 7p21.1, 11q13.1, and 3p14.2 during cervical carcinogenesis. The regions of high-level gain were 1p36.33-1p36.32, 8q24.3, 16p13.3, 1p36.33, 3q27.1, and 7p21.1, and the regions of homozygous loss were 2q12.1, 22q11.21, 3p14.2, 6q24.3, 7p15.2, and 11q25. In the high-level gain regions, GSDMDC1, RECQL4, TP73, ABCF3, ALG3, HDAC9, ESRRA, and RPS6KA4 were significantly correlated with cervical cancer. The genes encoded by frequently lost clones were PTPRG, GRM7, ZDHHC3, EXOSC7, LRP1B, and NR3C2. Thus, array-CGH analysis showed that specific genomic alterations critical to the malignant phenotype were maintained in cervical cancer, and it may help identify candidate target genes in the gained or lost clones.
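
    Gain and loss calls of the kind summarized above rest on thresholding clone-level copy-number ratios. As a minimal sketch of that idea, assuming one log2(tumor/reference) ratio per BAC clone and hypothetical cut-offs (the abstract does not state the thresholds the study used), clones could be classified and per-region alteration frequencies computed as follows. The names `call_clone`, `alteration_frequency` and `CUTOFFS` are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical log2(tumor/reference) cut-offs; NOT the values used in the study.
CUTOFFS = {"homozygous_loss": -1.5, "loss": -0.25, "gain": 0.25, "high_gain": 1.0}


def call_clone(log2_ratio: float, cutoffs: Dict[str, float] = CUTOFFS) -> str:
    """Classify one clone's log2 ratio into a copy-number state."""
    if log2_ratio <= cutoffs["homozygous_loss"]:
        return "homozygous_loss"
    if log2_ratio <= cutoffs["loss"]:
        return "loss"
    if log2_ratio >= cutoffs["high_gain"]:
        return "high_gain"
    if log2_ratio >= cutoffs["gain"]:
        return "gain"
    return "neutral"


def alteration_frequency(cases: List[Dict[str, float]], region: str, states: Set[str]) -> float:
    """Fraction of cases whose call for `region` falls in `states`.

    Each case maps a cytoband label (e.g. "8q24.3") to that clone's log2 ratio;
    cases lacking the region are skipped.
    """
    calls = [call_clone(case[region]) for case in cases if region in case]
    if not calls:
        return 0.0
    # Booleans sum as 0/1, giving the count of matching cases.
    return sum(call in states for call in calls) / len(calls)
```

    For instance, `alteration_frequency(cases, "8q24.3", {"gain", "high_gain"})` reports how often that band is gained across the supplied (here, toy) cases; real analyses usually smooth or segment neighbouring clones before calling.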

  9. Detection of genomic imbalances in microdissected Hodgkin and Reed-Sternberg cells of classical Hodgkin's lymphoma by array-based comparative genomic hybridization.

    Science.gov (United States)

    Hartmann, Sylvia; Martin-Subero, José I; Gesk, Stefan; Hüsken, Julia; Giefing, Maciej; Nagel, Inga; Riemke, Jennifer; Chott, Andreas; Klapper, Wolfram; Parrens, Marie; Merlio, Jean-Philippe; Küppers, Ralf; Bräuninger, Andreas; Siebert, Reiner; Hansmann, Martin-Leo

    2008-09-01

    Cytogenetic analysis of classical Hodgkin's lymphoma (cHL) is limited by the low content of neoplastic Hodgkin-Reed-Sternberg cells in the affected tissues. However, available cytogenetic data point to extreme karyotype complexity. To obtain insights into chromosomal imbalances in cHL, we applied array-based comparative genomic hybridization (array-CGH) to DNA from microdissected Hodgkin-Reed-Sternberg cells. To avoid biases introduced by DNA amplification for array-CGH, cHL cases rich in Hodgkin-Reed-Sternberg cells were selected. DNA obtained from approximately 100,000 microdissected Hodgkin-Reed-Sternberg cells from each of ten cHL cases was hybridized onto commercial 105K oligonucleotide comparative genomic hybridization microarrays. Selected imbalances were confirmed by interphase cytogenetics and quantitative polymerase chain reaction (PCR) analysis and further studied in an independent series of cHL. Gains identified in at least five cHL cases affected 2p12-16, 5q15-23, 6p22, 8q13, 8q24, 9p21-24, 9q34, 12q13-14, 17q12, 19p13, 19q13 and 20q11, whereas losses recurrent in at least five cases involved Xp21, 6q23-24 and 13q22. Copy number changes of selected genes and a small deletion (156 kb) of the CDKN2B (p15) gene were confirmed by interphase cytogenetics and PCR analysis, respectively. Several gained regions included genes constitutively expressed in cHL. Among these, gains of STAT6 (12q13), NOTCH1 (9q34) and JUNB (19p13) were present in additional cHL cases with the usual low Hodgkin-Reed-Sternberg cell content. The present study demonstrates that array-CGH of microdissected Hodgkin-Reed-Sternberg cells is suitable for identifying and characterizing chromosomal imbalances. Regions affected by genomic changes in Hodgkin-Reed-Sternberg cells recurrently include genes constitutively

  10. Surface ligation-based resonance light scattering analysis of methylated genomic DNA on a microarray platform.

    Science.gov (United States)

    Ma, Lan; Lei, Zhen; Liu, Xia; Liu, Dianjun; Wang, Zhenxin

    2016-05-10

    DNA methylation is a crucial epigenetic modification and is closely associated with tumorigenesis. Herein, a surface ligation-based high-throughput method combined with bisulfite treatment is developed for the analysis of methylated genomic DNA. In this method, a DNA microarray is employed as the reaction platform, and resonance light scattering (RLS) of nanoparticles is used as the detection principle. Specificity stems from allele-specific ligation by Taq DNA ligase, which is further enhanced by improving the fidelity of Taq DNA ligase in a heterogeneous reaction. Two amplification techniques, rolling circle amplification (RCA) and silver enhancement, are employed after the ligation reaction, and a gold nanoparticle (GNP) labeling procedure is used to amplify the signal. As little as 0.01% methylated DNA (i.e. 2 pmol L⁻¹) can be distinguished from a mixture of methylated and unmethylated DNA by the proposed method. Moreover, the method shows good accuracy and sensitivity in profiling the methylation level of genomic DNA from three selected colon cancer cell lines. This strategy provides a high-throughput alternative with reasonable sensitivity and resolution for cancer research and diagnosis.

  11. Donor plasmid design for codon and single base genome editing using zinc finger nucleases.

    Science.gov (United States)

    Pruett-Miller, Shondra M; Davis, Gregory D

    2015-01-01

    In recent years, CompoZr zinc finger nuclease (ZFN) technology has matured to the point that a user-defined double-strand break (DSB) can be placed at virtually any location in the human genome within 50 bp of a desired site. Such high-resolution ZFN engineering is well within the conversion tract limitations demarcated by the mammalian DNA repair machinery, resulting in a nearly universal ability to create point mutations throughout the human genome. Additionally, new architectures for targeted nuclease engineering have been rapidly developed, namely transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems, further expanding options for the placement of DSBs. This new capability has created a need to explore the practical limitations of delivering plasmid-based information to the sites of chromosomal double-strand breaks so that nuclease-donor methods can be widely deployed in fundamental and therapeutic research. In this chapter, we explore a ZFN-compatible donor design in the context of codon changes at an endogenous locus encoding the human RSK2 kinase.

  12. RNAi-Based Functional Genomics Identifies New Virulence Determinants in Mucormycosis

    Science.gov (United States)

    Sanchis, Marta; Lopez-Fernandez, Loida; Torres-Martínez, Santiago; Garre, Victoriano; Ruiz-Vázquez, Rosa María

    2017-01-01

    Mucorales are an emerging group of human pathogens responsible for the lethal disease mucormycosis. Unfortunately, functional studies on the genetic factors behind the virulence of these organisms are hampered by their limited genetic tractability, since they are recalcitrant to classical genetic tools such as transposable elements or gene mapping. Here, we describe an RNAi-based functional genomic platform that allows the identification of new virulence factors through a forward genetic approach, described here for the first time in Mucorales. This platform contains a whole-genome collection of Mucor circinelloides silenced transformants that displayed a broad assortment of phenotypes related to the main physiological processes in fungi, including virulence, hyphal morphology, mycelial and yeast growth, carotenogenesis and asexual sporulation. Selection of transformants with reduced virulence allowed the identification of mcplD, which encodes a phospholipase D, and mcmyo5, which encodes a probably essential cargo transporter of the myosin V family, as required for a fully virulent phenotype of M. circinelloides. Knock-out mutants for these genes showed reduced virulence in both Galleria mellonella and Mus musculus models, probably due to delayed germination and polarized growth within macrophages. This study provides a robust approach to study virulence in Mucorales and, as a proof of concept, identified new virulence determinants in M. circinelloides that could represent promising targets for future antifungal therapies. PMID:28107502

  13. CRISPR/Cas9-based efficient genome editing in Staphylococcus aureus.

    Science.gov (United States)

    Liu, Qi; Jiang, Yu; Shao, Lei; Yang, Ping; Sun, Bingbing; Yang, Sheng; Chen, Daijie

    2017-09-01

    Staphylococcus aureus is an important pathogenic bacterium prevalent in nosocomial infections and associated with high morbidity and mortality rates, which arise from its significant pathogenicity and multi-drug resistance. However, the typical genetic manipulation tools used to explore the relevant molecular mechanisms of S. aureus have multiple limitations: they leave a scar in the genome, have comparatively low gene-editing efficiency, and require a prolonged experimental period. Here, we present a single-plasmid tool based on the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system that allows rapid and efficient chromosomal manipulation in S. aureus. The plasmid carries the cas9 gene under the control of the constitutive promoter Pxyl/tet, a single guide RNA-encoding sequence transcribed from the strong promoter Pspac, and donor DNA used to repair the double-strand breaks. The function of the CRISPR/Cas9 vector was demonstrated by deleting the tgt and rocA genes and by inserting the ermR cassette in S. aureus. This research establishes a CRISPR/Cas9 genome editing tool in S. aureus that enables marker-free, scarless and rapid genetic manipulation, thus accelerating the study of gene function in S. aureus. © The Author 2017. Published by Oxford University Press on behalf of the Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences.

  14. Cloud computing-based TagSNP selection algorithm for human genome data.

    Science.gov (United States)

    Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2015-01-05

    Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human traits. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetic research has revealed that, within certain haplotype blocks, SNPs give rise to only a few distinct common haplotypes that account for most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. On chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To improve its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-parallelized combinatorial algorithm performed well on real-world data from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used. PMID:25569088
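
    The abstract does not spell out its block definition, so the following is only a minimal, single-process sketch under an assumed criterion: a block is a run of consecutive SNPs whose common sub-haplotypes (frequency at least `min_freq`) jointly cover at least `coverage` of the sampled chromosomes, and blocks are grown greedily from left to right. The function names and default thresholds are hypothetical; this is not the authors' combinatorial algorithm, and the Hadoop MapReduce parallelization is not modeled.

```python
from collections import Counter
from typing import List, Tuple


def common_coverage(haplotypes: List[str], start: int, end: int, min_freq: float) -> float:
    """Fraction of haplotypes whose allele string over SNPs [start, end) is common.

    A sub-haplotype is treated as common if its frequency is at least min_freq.
    """
    n = len(haplotypes)
    counts = Counter(h[start:end] for h in haplotypes)
    covered = sum(c for c in counts.values() if c / n >= min_freq)
    return covered / n


def greedy_blocks(haplotypes: List[str], min_freq: float = 0.05,
                  coverage: float = 0.8) -> List[Tuple[int, int]]:
    """Partition SNP columns into half-open blocks [start, end).

    Each block is extended one SNP at a time while its common sub-haplotypes
    still cover at least `coverage` of the sample; every block has >= 1 SNP.
    """
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    while start < n_snps:
        end = start + 1
        while end < n_snps and common_coverage(haplotypes, start, end + 1, min_freq) >= coverage:
            end += 1
        blocks.append((start, end))
        start = end
    return blocks
```

    Each row is one phased chromosome written as an allele string (for example "0011"); with four such rows, `greedy_blocks(rows, min_freq=0.5, coverage=1.0)` splits the columns wherever the shared two-haplotype pattern breaks down.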

  15. Performance Ratio Based Resource Allocation Decision-Making in Genomic Medicine.

    Science.gov (United States)

    Fragoulakis, Vasilios; Mitropoulou, Christina; Katelidou, Daphne; van Schaik, Ron H; Maniadakis, Nikolaos; Patrinos, George P

    2017-02-01

    In modern healthcare systems, the available resources may influence the morbidity, mortality, and-consequently-the level of healthcare provided in every country. This is of particular interest in developing countries where the resources are limited and must be spent wisely to address social justice and the right for equal access in healthcare services by all the citizens in economically viable terms. In this light, the current allocation is, in practice, inefficient and rests mostly on each country's individual political and historical context and, thus, does not always incorporate decision-making enabled by economic models. In this study, we present a new economic model, specifically for resource allocation for genomic medicine, based on performance ratio, with potential applications in diverse healthcare sectors, which are particularly appealing for developing countries and low-resource environments. The model proposes a new method for resource allocation taking into account (1) the size of innovation of a new technology, (2) the relative effectiveness in comparison with social preferences, and (3) the cost of the technology, which permits the measurement of effectiveness to be determined differently in the context of a specific disease and then to be expressed in a relative form using a common performance ratio. The present work expands on previous work for innovation in economic models pertaining to genomic medicine and supports translational science.

  16. Selection of highly efficient sgRNAs for CRISPR/Cas9-based plant genome editing.

    Science.gov (United States)

    Liang, Gang; Zhang, Huimin; Lou, Dengji; Yu, Diqiu

    2016-02-19

    The CRISPR/Cas9-sgRNA system has been developed to mediate genome editing and has become a powerful tool for biological research. Employing the CRISPR/Cas9-sgRNA system for genome editing and manipulation has accelerated research and expanded researchers' ability to generate genetic models. However, a method for evaluating the efficiency of sgRNAs is lacking in plants. Based on the nucleotide compositions and secondary structures of sgRNAs that have been experimentally validated in plants, we established criteria for designing efficient sgRNAs. To facilitate the assembly of multiple sgRNA cassettes, we also developed a new strategy to rapidly construct CRISPR/Cas9-sgRNA systems for multiplex editing in plants. In theory, up to ten single guide RNA (sgRNA) cassettes can be simultaneously assembled into the final binary vectors. As a proof of concept, 21 sgRNAs complying with the criteria were designed and the corresponding Cas9/sgRNA expression vectors were constructed. Sequencing analysis of transgenic rice plants suggested that 82% of the desired target sites were edited by deletion, insertion, substitution, or inversion, displaying high editing efficiency. This work provides a convenient approach to selecting efficient sgRNAs for targeted editing.
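
    The paper's criteria draw on nucleotide composition and secondary structure, but their exact values are not given in the abstract. As an illustration only, the sketch below scans the forward strand for 20-nt SpCas9 protospacers followed by an NGG PAM and keeps those whose GC fraction lies within an assumed 40 to 70% window and which lack a TTTT run (a common rule of thumb, because poly-T stretches terminate Pol III transcription of the guide). The thresholds and the function name `candidate_sgrnas` are assumptions; secondary-structure scoring and the reverse strand are omitted.

```python
import re
from typing import List, Tuple

# SpCas9 recognizes an NGG PAM immediately 3' of a 20-nt protospacer.
# The zero-width lookahead lets overlapping candidate sites all be reported.
_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_sgrnas(dna: str, gc_min: float = 0.4,
                     gc_max: float = 0.7) -> List[Tuple[int, str]]:
    """Return (start, spacer) pairs on the forward strand that pass simple filters.

    gc_min and gc_max are hypothetical cut-offs, not the paper's criteria.
    """
    dna = dna.upper()
    hits = []
    for match in _SITE.finditer(dna):
        spacer = match.group(1)
        if not gc_min <= gc_fraction(spacer) <= gc_max:
            continue  # composition outside the assumed window
        if "TTTT" in spacer:
            continue  # poly-T would terminate Pol III (e.g. U6) transcription
        hits.append((match.start(), spacer))
    return hits
```

    Scanning the reverse complement with the same function would cover the other strand; ranking the survivors by predicted secondary structure, as the paper's criteria do, would need an RNA folding tool and is left out here.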

  18. Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.

    Science.gov (United States)

    Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul

    2016-07-01

    Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipases with particular catalytic properties from a variety of microorganisms for a wide range of applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of these versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a promising approach for exploring the structural and functional relationships of candidate lipase enzymes. Perspectives are discussed on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanisms and in discovering novel lipid-degrading enzymes of microorganisms.

  19. A surrogate-based approach for post-genomic partner identification

    Directory of Open Access Journals (Sweden)

    Giordano Tony

    2001-09-01

    Background: Modern drug discovery is concerned with the identification and validation of novel protein targets from among the 30,000 or more genes postulated to be present in the human genome. While protein-protein interactions may be central to many disease indications, it has been difficult to identify new chemical entities capable of regulating these interactions as either agonists or antagonists. Results: In this paper, we show that peptide complements (or surrogates) derived from highly diverse random phage display libraries can be used to identify the expected natural biological partners of protein and non-protein targets. Our examples include surrogates isolated against both an extracellular secreted protein (TNFβ) and intracellular disease-related mRNAs. In each case, surrogates binding to these targets were obtained and found to contain partner information embedded in their amino acid sequences. Furthermore, this information could be used to identify the correct biological partners from large human genome databases by rapid, integrated computer-based searches. Conclusions: Modified versions of these surrogates should provide agents capable of modulating the activity of these targets and allow their involvement in specific biological processes to be studied as a means of target validation for downstream drug discovery.

  20. Novel extraction strategy of ribosomal RNA and genomic DNA from cheese for PCR-based investigations.

    Science.gov (United States)

    Bonaïti, Catherine; Parayre, Sandrine; Irlinger, Françoise

    2006-03-15

    Cheese microorganisms, such as bacteria and fungi, constitute a complex ecosystem that plays a central role in cheese ripening. The molecular study of cheese microbial diversity and activity is essential, but the extraction of high-quality nucleic acids may be problematic: cheese samples are characterised by a strong buffering capacity, which negatively influences the yield of extracted rRNA. The objective of this study was to develop an effective method for the direct and simultaneous isolation of yeast and bacterial ribosomal RNA and genomic DNA from the same cheese samples. DNA isolation was based on a protocol used for nucleic acid isolation from an anaerobic digester, without a preliminary washing step, combining a chaotropic agent (acid guanidinium thiocyanate), detergents (SDS, N-lauroylsarcosine), a chelating agent (EDTA) and a mechanical method (bead beating system). DNA was purified by two phenol-chloroform washing steps. RNA was isolated successfully after a second acid extraction step, by recovering it from the phenolic phase of the first acid extraction. The novel method yielded pure preparations of undegraded RNA suitable for reverse transcription-PCR. The extraction protocol for genomic DNA and rRNA was applicable to the complex ecosystems of different cheese matrices.

  1. The case for a rational genome-based vaccine against malaria

    Directory of Open Access Journals (Sweden)

    Carla Proietti

    2015-01-01

    Historically, vaccines have been designed to mimic the immunity induced by natural exposure to the target pathogen, but this approach has not been effective for any parasitic pathogen of humans or for complex pathogens that cause chronic disease in humans, such as Plasmodium. Despite intense efforts by many laboratories around the world on different aspects of Plasmodium spp. molecular and cell biology, epidemiology and immunology, progress towards an effective malaria vaccine has been disappointing. The premise of rational vaccine design is to induce the desired immune response against the key pathogen antigens or epitopes targeted by protective immune responses. We argue that an optimally efficacious malaria vaccine will need to improve on nature, and that this can be accomplished by rational vaccine design facilitated by mining genomic, proteomic and transcriptomic datasets in the context of relevant biological function. In our opinion, modern genome-based rational vaccine design offers enormous potential beyond that of whole-organism vaccine approaches established over 200 years ago, in which immunity is likely suboptimal owing to the many genetic and immunological host-parasite adaptations that have evolved to allow the Plasmodium parasite to coexist in the human host, and which are associated with logistic and regulatory hurdles for production and delivery.

  2. Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2010-02-01

    Background: The growing whole-genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists, as they enable platform-independent analysis of sequence data without complex programming tasks or software compilation. Findings: GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative bacterial genomics (ca. 8 Mb). It enables the visualization of synteny by plotting protein similarity scores between two genomes, and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. Conclusions: GeneOrder4.0 is a user-friendly web-based application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind.

  3. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  4. Genetics- and genomics-based interventions for nutritional enhancement of grain legume crops: status and outlook.

    Science.gov (United States)

    Bohra, Abhishek; Sahrawat, Kanwar L; Kumar, Shiv; Joshi, Rohit; Parihar, Ashok K; Singh, Ummed; Singh, Deepak; Singh, Narendra P

    2015-05-01

    Meeting the food demands and ensuring the nutritional security of the ever-increasing global population in the face of a degrading natural resource base and impending climate change is the biggest challenge of the twenty-first century. The consequences of mineral/micronutrient deficiencies, or hidden hunger, in the developing world are alarming and need urgent attention. In addressing the problems associated with mineral/micronutrient deficiency, grain legumes, as an integral component of farming systems in the developing world, have a crucial role to play. For resource-poor populations, a strategy based on selecting and/or developing grain legume cultivars with grains denser in micronutrients, by biofortification, seems the most appropriate and attractive approach to address the problem. This is evident from the ongoing global research efforts on biofortification to provide nutrient-dense grains for use by the poorest of the poor in developing countries. Towards this end, rapidly growing genomics technologies hold promise to hasten the progress of breeding nutritious legume crops. In conjunction with the many developments in genomics, advances in other 'omics' technologies, particularly plant ionomics or ionome profiling, open up novel opportunities to comprehensively examine the elemental composition and mineral networks of an organism in a rapid and cost-effective manner. These emerging technologies would effectively guide the scientific community to enrich the edible parts of grain legumes with bioavailable minerals and enhancers/promoters. We believe that the application of these new-generation tools would in turn provide crop-based solutions to hidden hunger worldwide for achieving global nutritional security.

  5. Genome-wide interaction-based association analysis identified multiple new susceptibility Loci for common diseases.

    Directory of Open Access Journals (Sweden)

    Yang Liu

    2011-03-01

    Genome-wide interaction-based association (GWIBA) analysis has the potential to identify novel susceptibility loci. These interaction effects could be missed with the prevailing approaches in genome-wide association studies (GWAS). However, no convincing loci have been discovered exclusively from GWIBA methods, and the intensive computation involved is a major barrier to application. Here, we developed a fast, multi-threaded/parallel program named "pair-wise interaction-based association mapping" (PIAM) for exhaustive two-locus searches. With this program, we performed a complete GWIBA analysis on seven diseases with stringent control for false positives, and we validated the results for three of these diseases. We identified one pair-wise interaction between a previously identified locus, C1orf106, and one new locus, TEC, that was specific for Crohn's disease, with a Bonferroni-corrected P < 0.05 (P = 0.039). This interaction was replicated with a pair of proxy linked loci (P = 0.013) on an independent dataset. Five other interactions had corrected P < 0.5. We identified the allelic effect of a locus close to SLC7A13 for coronary artery disease. This was replicated with a linked locus on an independent dataset (P = 1.09 × 10⁻⁷). Through a local validation analysis that evaluated association signals, rather than locus-based associations, we found that several other regions showed association/interaction signals with nominal P < 0.05. In conclusion, this study demonstrated that the GWIBA approach was successful in identifying novel loci, and the results provide new insights into the genetic architecture of common diseases. In addition, our PIAM program was capable of handling the very large GWAS datasets that are likely to be produced in the future.
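
    An exhaustive two-locus search tests every SNP pair, so the number of tests, and hence the Bonferroni penalty, grows quadratically with the number of markers: for m SNPs there are m(m-1)/2 pairs. The sketch below illustrates only that bookkeeping; `pair_test` is a hypothetical callable standing in for whatever interaction statistic PIAM actually computes, which the abstract does not describe, and the loop is serial rather than multi-threaded.

```python
from itertools import combinations
from math import comb
from typing import Callable, List, Tuple


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)


def exhaustive_pair_scan(
    n_snps: int,
    pair_test: Callable[[int, int], float],
    alpha: float = 0.05,
) -> List[Tuple[int, int, float]]:
    """Test every SNP pair (i < j) and keep pairs with corrected P < alpha.

    pair_test(i, j) is a hypothetical user-supplied function returning a raw
    p-value for the interaction between SNPs i and j.
    """
    n_tests = comb(n_snps, 2)  # m(m-1)/2 two-locus tests
    hits = []
    for i, j in combinations(range(n_snps), 2):
        p_corrected = bonferroni(pair_test(i, j), n_tests)
        if p_corrected < alpha:
            hits.append((i, j, p_corrected))
    return hits
```

    To give a sense of scale, `comb(300000, 2)` is 44,999,850,000, so with 300,000 SNPs a raw pair-wise P must fall below roughly 1.1 × 10⁻¹² to reach a corrected P < 0.05, which is why both computational speed and stringent correction matter in this setting.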

  6. Public health and valorization of genome-based technologies: a new model

    Directory of Open Access Journals (Sweden)

    Lal Jonathan A

    2011-12-01

    Background: The success rate of timely translation of genome-based technologies into commercially feasible products/services applicable in health care systems is significantly low. We found that both industry and scientists neglect health policy aspects when commercializing their technology, more specifically Public Health Assessment Tools (PHAT) and early involvement of the decision makers on whom market authorization and reimbursement depend. While Technology Transfer (TT) aims to facilitate the translation of ideas into products, Health Technology Assessment, one component of PHAT, facilitates the translation of products/processes into healthcare services and eventually produces recommendations for decision makers. We aim to propose a new model of valorization to optimize the integration of genome-based technologies into the healthcare system. Methods: The method used to develop our model is an adapted version of the Fish Trap Model and the Basic Design Cycle. Results: We found that, although different, TT and PHAT share similarities. Realizing their potential to be mutually beneficial justified our proposal of their relatively parallel initiation. We observed that the Public Health Genomics Wheel should be included in this relatively parallel activity to ensure that all societal/policy aspects are dealt with preemptively by both stakeholders. On further analysis, we found that this whole process depends on the Value of Information. As a result, we present our LAL (Learning Adapting Leveling) model, which proposes that, based on market demand, TT and PHAT should advocate for relevant technologies through consultation/bilateral communication. This can be achieved by public-private partnerships (PPPs). These widely defined PPPs create the innovation network, a developing, consultative/collaborative networking platform between TT and PHAT. This network has iterations and requires learning, assimilating and using knowledge

  7. Next-generation libraries for robust RNA interference-based genome-wide screens.

    Science.gov (United States)

    Kampmann, Martin; Horlbeck, Max A; Chen, Yuwen; Tsai, Jordan C; Bassik, Michael C; Gilbert, Luke A; Villalta, Jacqueline E; Kwon, S Chul; Chang, Hyeshik; Kim, V Narry; Weissman, Jonathan S

    2015-06-30

    Genetic screening based on loss-of-function phenotypes is a powerful discovery tool in biology. Although the recent development of clustered regularly interspaced short palindromic repeats (CRISPR)-based screening approaches in mammalian cell culture has enormous potential, RNA interference (RNAi)-based screening remains the method of choice in several biological contexts. We previously demonstrated that ultracomplex pooled short-hairpin RNA (shRNA) libraries can largely overcome the problem of RNAi off-target effects in genome-wide screens. Here, we systematically optimize several aspects of our shRNA library, including the promoter and microRNA context for shRNA expression, selection of guide strands, and features relevant for postscreen sample preparation for deep sequencing. We present next-generation high-complexity libraries targeting human and mouse protein-coding genes, which we grouped into 12 sublibraries based on biological function. A pilot screen suggests that our next-generation RNAi library performs comparably to current CRISPR interference (CRISPRi)-based approaches and can yield complementary results with high sensitivity and high specificity.

  8. A Rapid and Reproducible Genomic DNA Extraction Protocol for Sequence-Based Identification of Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and Green Algae

    National Research Council Canada - National Science Library

    Farkhondeh Saba; Moslem Papizadeh; Javad Khansha; Mahshid Sedghi; Mehrnoosh Rasooli; Mohammad Ali Amoozegar; Mohammad Reza Soudi; Seyed Abolhassan Shahzadeh Fazeli

    2017-01-01

    Background: Sequence-based identification of various microorganisms including Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and green algae necessitates an efficient and reproducible genome extraction...

  9. A Rapid and Reproducible Genomic DNA Extraction Protocol for Sequence-Based Identification of Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and Green Algae

    National Research Council Canada - National Science Library

    Farkhondeh Saba; Moslem Papizadeh; Javad Khansha; Mahshid Sedghi; Mehrnoosh Rasooli; Mohammad Ali Amoozegar; Mohammad Reza Soudi; Seyed Abolhassan Shahzadeh Fazeli

    2016-01-01

    Background: Sequence-based identification of various microorganisms including Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and green algae necessitates an efficient and reproducible genome extraction...

  10. Comprehensive cytological characterization of the Gossypium hirsutum genome based on the development of a set of chromosome cytological markers

    Directory of Open Access Journals (Sweden)

    Wenbo Shan

    2016-08-01

    Full Text Available Cotton is the world's most important natural fiber crop. It is also a model system for studying polyploidization, genomic organization, and genome-size variation. Integrating the cytological characterization of cotton with its genetic map will be essential for understanding its genome structure and evolution, as well as for performing further genetic-map-based mapping and cloning. In this study, we isolated a complete set of bacterial artificial chromosome clones anchored to each of the 52 chromosome arms of the tetraploid cotton Gossypium hirsutum. Combining these with telomere and centromere markers, we constructed a standard karyotype for the G. hirsutum inbred line TM-1. We dissected the chromosome arm localizations of the 45S and 5S rDNA and suggest a centromere repositioning event in the homoeologous chromosomes AT09 and DT09. By integrating a systematic karyotype analysis with the genetic linkage map, we observed different genome sizes and chromosomal structures between the subgenomes of the tetraploid cotton and those of its diploid ancestors. Using evidence of conserved coding sequences, we suggest that the different evolutionary paths of non-coding retrotransposons account for most of the variation in size between the subgenomes of tetraploid cotton and its diploid ancestors. These results provide insights into the cotton genome and will facilitate further genome studies in G. hirsutum.

  11. Genome-wide siRNA-based functional genomics of pigmentation identifies novel genes and pathways that impact melanogenesis in human cells.

    Directory of Open Access Journals (Sweden)

    Anand K Ganesan

    2008-12-01

    Full Text Available Melanin protects the skin and eyes from the harmful effects of UV irradiation, protects neural cells from toxic insults, and is required for sound conduction in the inner ear. Aberrant regulation of melanogenesis underlies skin disorders (melasma and vitiligo), neurologic disorders (Parkinson's disease), auditory disorders (Waardenburg's syndrome), and ophthalmologic disorders (age-related macular degeneration). Much of the core synthetic machinery driving melanin production has been identified; however, the spectrum of gene products participating in melanogenesis in different physiological niches is poorly understood. Functional genomics based on RNA-mediated interference (RNAi) provides the opportunity to derive unbiased, comprehensive collections of pharmaceutically tractable single-gene targets supporting melanin production. In this study, we combined a high-throughput, cell-based, one-well/one-gene screening platform with a genome-wide arrayed library of chemically synthesized small interfering RNAs to identify novel biological pathways that govern melanin biogenesis in human melanocytes. Ninety-two novel genes that support pigment production were identified with a low false discovery rate. Secondary validation and preliminary mechanistic studies identified a large panel of targets that converge on tyrosinase expression and stability. Small-molecule inhibition of a family of gene products in this class was sufficient to impair chronic tyrosinase expression in pigmented melanoma cells and UV-induced tyrosinase expression in primary melanocytes. The identification in this screen of molecular machinery known to support autophagosome biosynthesis, together with in vitro and in vivo validation, exposed a close functional relationship between melanogenesis and autophagy. 
In summary, these studies illustrate the power of RNAi-based functional genomics to identify novel genes, pathways, and pharmacologic agents that impact a biological phenotype.
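    The abstract reports hits identified "with a low false discovery rate" but does not say how the FDR was controlled. Purely as an illustration, the sketch below applies the standard Benjamini-Hochberg step-up procedure to per-gene p-values; both the choice of procedure and the p-values are assumptions, not the screen's documented method.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(p_values)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


# Hypothetical per-gene p-values from an arrayed knockdown screen (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))  # prints [0, 1]
```

    Because the rule is step-up, a hypothesis that passes at a higher rank also rescues every better-ranked hypothesis, even one that individually failed its own threshold.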

  12. Sequence-based prediction of single nucleosome positioning and genome-wide nucleosome occupancy.

    Science.gov (United States)

    van der Heijden, Thijn; van Vugt, Joke J F A; Logie, Colin; van Noort, John

    2012-09-18

    Nucleosome positioning dictates eukaryotic DNA compaction and access. To predict nucleosome positions with a statistical mechanics model, we exploited the knowledge that nucleosomes favor DNA sequences with specific periodically occurring dinucleotides. Our model is the first to capture both the dyad position, to within a few base pairs, and the free binding energy, to within 2 k_BT, for all known nucleosome positioning sequences. By applying Percus's equation to the derived energy landscape, we isolate sequence effects on genome-wide nucleosome occupancy from other factors that may influence nucleosome positioning. For both in vitro and in vivo systems, three parameters suffice to predict nucleosome occupancy, with correlation coefficients of 0.74 and 0.66, respectively. As predicted, we find the largest deviations in vivo around transcription start sites. This relatively simple algorithm can be used to guide future studies on the influence of DNA sequence on chromatin organization.
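    The abstract turns a sequence-derived energy landscape into genome-wide occupancy using Percus's equation. The sketch below computes the same kind of quantity by a different, explicitly swapped-in route: an exact forward-backward (transfer-matrix) sum over all arrangements of non-overlapping nucleosomes. The footprint, the chemical potential `mu` and the energy values are assumptions for illustration; the authors' dinucleotide energy model is not reproduced.

```python
import math


def nucleosome_occupancy(energies, footprint=147, mu=0.0):
    """Exact occupancy for non-overlapping nucleosomes on a 1D DNA lattice.

    energies[j] is the free energy (in k_B T) of a nucleosome whose footprint
    starts at base pair j; mu is a chemical potential (in k_B T).
    Returns (start_probability, occupancy), indexed by base pair.
    """
    n_starts = len(energies)
    n_bp = n_starts + footprint - 1          # every start must fit entirely on the DNA
    w = [math.exp(mu - e) for e in energies]  # Boltzmann weight of each start position
    # Forward pass: zf[i] = partition function of base pairs [0, i).
    zf = [1.0] * (n_bp + 1)
    for i in range(1, n_bp + 1):
        zf[i] = zf[i - 1]                    # base pair i - 1 left free
        j = i - footprint                    # or covered by a nucleosome ending at i - 1
        if 0 <= j < n_starts:
            zf[i] += zf[j] * w[j]
    # Backward pass: zb[i] = partition function of base pairs [i, n_bp).
    zb = [1.0] * (n_bp + 1)
    for i in range(n_bp - 1, -1, -1):
        zb[i] = zb[i + 1]
        if i < n_starts:
            zb[i] += w[i] * zb[i + footprint]
    z = zf[n_bp]
    # Probability that a nucleosome starts at j.
    p_start = [zf[j] * w[j] * zb[j + footprint] / z for j in range(n_starts)]
    # Occupancy of a base pair: summed probability of every nucleosome covering it.
    occupancy = [0.0] * n_bp
    for j, p in enumerate(p_start):
        for k in range(j, j + footprint):
            occupancy[k] += p
    return p_start, occupancy


# Hypothetical energy landscape with a 3 bp footprint to keep the output short.
p_start, occ = nucleosome_occupancy([0.0, -2.0, 1.0, 0.5, -1.0], footprint=3)
print([round(x, 3) for x in occ])
```

    For genome-length sequences the partition functions overflow double precision, so a practical version would carry them in log space.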

  13. Scanning for signatures of geographically restricted selection based on population genomics analysis

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Natural selection, as the driving force of human evolution, has a direct impact on population differentiation. However, it is still unclear to what extent genetic differentiation has been caused by natural selection. To explore this question, we performed a genome-wide scan with single nucleotide polymorphism (SNP) data from the International HapMap Project. Single-locus F_ST analysis was applied to assess allele-frequency differences among populations in the autosomes. Based on the empirical distribution of F_ST, we identified 12669 SNPs associated with population differentiation and 1853 candidate genes subject to geographically restricted natural selection. Further interpretation using Gene Ontology revealed 121 biological-process categories enriched for candidate genes. Our results suggest that natural selection may play an important role in human population differentiation. In addition, our analysis provides new clues, as well as research methods, for understanding population differentiation and natural selection.
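    The abstract does not name its F_ST estimator. As a minimal sketch, assuming Nei's heterozygosity-based form F_ST = (H_T − H_S)/H_T with equally weighted populations and no sample-size correction, a single-locus scan that keeps the upper tail of the empirical distribution could look like this. SNP identifiers and frequencies are hypothetical.

```python
def fst_nei(freqs):
    """Nei's F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of the same allele in each population
    (populations weighted equally; no sample-size correction).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                   # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / k   # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def empirical_outliers(snp_freqs, top_fraction=0.01):
    """Return SNP ids whose F_ST falls in the upper tail of the empirical distribution."""
    scores = {snp: fst_nei(f) for snp, f in snp_freqs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_keep]


snps = {
    "rs_a": [0.50, 0.50, 0.50],   # no differentiation
    "rs_b": [0.10, 0.90, 0.50],   # strongly differentiated
    "rs_c": [0.40, 0.50, 0.60],   # weakly differentiated
}
print(empirical_outliers(snps, top_fraction=0.34))  # prints ['rs_b']
```

    Ranking against the genome's own F_ST distribution, rather than a theoretical null, is what makes the outlier cut-off "empirical".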

  14. A genome-based identification approach for members of the genus Bifidobacterium.

    Science.gov (United States)

    Ferrario, Chiara; Milani, Christian; Mancabelli, Leonardo; Lugli, Gabriele Andrea; Turroni, Francesca; Duranti, Sabrina; Mangifesta, Marta; Viappiani, Alice; Sinderen, Douwe van; Ventura, Marco

    2015-03-01

    During recent years, significant and increasing interest in novel bifidobacterial strains with health-promoting characteristics has catalyzed the development of methods for efficient and reliable identification of Bifidobacterium strains at the (sub)species level. We developed an assay, called the Bifidobacterium-ampliseq panel, based on recently acquired bifidobacterial genomic data and involving 98 primer pairs. This panel includes multiplex PCR primers that target both core and variable genes of the pangenome of this genus. Our results demonstrate that the Bifidobacterium-ampliseq panel allows rapid and specific identification of the 48 (sub)species of the Bifidobacterium genus recognized so far, and thus represents a cost- and time-effective bifidobacterial screening methodology.

  15. The phylogenetic position of Neritimorpha based on the mitochondrial genome of Nerita melanotragus (Mollusca: Gastropoda).

    Science.gov (United States)

    Castro, Lyda R; Colgan, D J

    2010-11-01

    This is the first report of the mitochondrial gene order and almost-complete DNA sequence of a representative of the Neritimorpha, the highest-ranking gastropod clade lacking such data. Mitochondrial gene order in Nerita is largely plesiomorphic. Its only difference from the cephalopod Octopus vulgaris is a tRNA transposition shared by Vetigastropoda and Caenogastropoda. Genome arrangements were not informative enough to resolve the evolutionary relationships of Neritimorpha, Vetigastropoda and Caenogastropoda. The sister-group taxon of Neritimorpha varied in sequence-based analyses. Some suggested that Neritimorpha is the sister group of Caenogastropoda plus Heterobranchia and some that Neritimorpha and Caenogastropoda are sister groups. No analysis significantly supported the hypothesis that Vetigastropoda is more closely related to Caenogastropoda than is Neritimorpha.
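    The abstract compares mitochondrial gene orders qualitatively, for example a single tRNA transposition separating Nerita from Octopus vulgaris. One generic way to quantify such differences is the breakpoint count between circular, signed gene orders, sketched below. The authors do not state that they used this measure, and the gene orders are hypothetical.

```python
def adjacencies(order):
    """Adjacent gene pairs of a circular gene order ('-' prefix marks the minus strand)."""
    n = len(order)
    return [(order[i], order[(i + 1) % n]) for i in range(n)]


def flip(gene):
    """Reverse the strand of a signed gene label."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def breakpoints(order_a, order_b):
    """Count adjacencies of order_a absent, in either reading direction, from order_b."""
    conserved = set()
    for x, y in adjacencies(order_b):
        conserved.add((x, y))
        conserved.add((flip(y), flip(x)))  # the same adjacency read from the other strand
    return sum(1 for adj in adjacencies(order_a) if adj not in conserved)


order_1 = ["cox1", "cox2", "trnK", "atp8", "atp6"]
order_2 = ["cox1", "trnK", "cox2", "atp8", "atp6"]  # trnK moved one position
print(breakpoints(order_1, order_2))  # prints 3
```

    Moving a single gene breaks three adjacencies, which is why one tRNA transposition shows up as a breakpoint distance of 3 rather than 1.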

  16. Genome-based proteomic analysis of Lignosus rhinocerotis (Cooke) Ryvarden sclerotium.

    Science.gov (United States)

    Yap, Hui-Yeng Yeannie; Fung, Shin-Yee; Ng, Szu-Ting; Tan, Chon-Seng; Tan, Nget-Hong

    2015-01-01

    Lignosus rhinocerotis (Cooke) Ryvarden (Polyporales, Basidiomycota), also known as the tiger milk mushroom, has received much interest in recent years owing to its wide range of ethnobotanical uses and the recent success in its domestication. The sclerotium is the part with medicinal value. Using two-dimensional gel electrophoresis coupled with mass spectrometry, a total of 16 non-redundant major proteins were identified with high confidence in L. rhinocerotis sclerotium, using its genome as a custom mapping database. Some of these proteins, such as the putative lectins, immunomodulatory proteins, superoxide dismutase, and aegerolysin, may have pharmaceutical potential, while others are involved in nutrient mobilization and the protective antioxidant mechanism in the sclerotium. The findings from this study provide a molecular basis for future research on potential pharmacologically active proteins of L. rhinocerotis.

  17. Haplotype Based Genome-Enabled Prediction of Traits Across Nordic Red Cattle Breeds

    DEFF Research Database (Denmark)

    Castro Dias Cuyabano, Beatriz; Lund, Mogens Sandø; Rosa, G J M;

    SNP markers have been widely explored in genome-based prediction. This study explored the use of haplotype blocks (haploblocks) to predict five milk production traits (fertility, mastitis, protein, fat and milk yield), using a mix of Nordic Red cattle as the reference population for training....... Predictions were performed under a Bayesian approach comparing a GBLUP and a mixture model. In general, predictions were more reliable when using haploblocks instead of individual SNPs as predictors. The Danish Red cattle presented the largest benefit in predictive ability from haploblocks, achieving 5.......1% higher reliability than with the individual SNP approach in mastitis. This work gives evidence that predictions using haploblocks along with a combined training population of dairy cattle may improve prediction accuracy of important traits in the individual populations........
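    The abstract compares a GBLUP with a mixture model but gives no formulas. GBLUP is built on a genomic relationship matrix; a minimal sketch of one common construction, VanRaden's method 1, is below. Whether the authors used this exact matrix, and how haploblock alleles were coded, is not stated; coding each marker (SNP or haploblock allele) as a 0/1/2 count is an assumption here.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)) (VanRaden method 1).

    genotypes: one list per individual of allele counts (0, 1 or 2) per marker.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker, estimated from the sample.
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each genotype by twice the allele frequency: Z = M - 2P.
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


g = vanraden_g([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
for row in g:
    print([round(x, 3) for x in row])
```

    Dividing by 2·Σp(1 − p) scales G so that its average diagonal is close to 1 under Hardy-Weinberg equilibrium, which keeps it comparable to a pedigree relationship matrix.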

  18. Application of Array-based Comparative Genome Hybridization in Children with Developmental Delay or Mental Retardation

    Directory of Open Access Journals (Sweden)

    Jao-Shwann Liang

    2008-12-01

    Full Text Available Children with developmental delay or mental retardation (DD/MR) are commonly encountered in child neurology clinics, and establishing an etiologic diagnosis is a challenge for child neurologists. Among the etiologies, chromosomal imbalance is one of the most important causes. However, many of these chromosomal imbalances are submicroscopic and cannot be detected by conventional cytogenetic methods. Microarray-based comparative genomic hybridization (array CGH) is considered superior for investigating chromosomal deletions or duplications in children with DD/MR, and has been demonstrated to improve the diagnostic detection rate for these small chromosomal abnormalities. Here, we review recent studies of array CGH in the evaluation of patients with idiopathic DD/MR.

  19. A genome-wide association study of neuroticism in a population-based sample.

    Science.gov (United States)

    Calboli, Federico C F; Tozzi, Federica; Galwey, Nicholas W; Antoniades, Athos; Mooser, Vincent; Preisig, Martin; Vollenweider, Peter; Waterworth, Dawn; Waeber, Gerard; Johnson, Michael R; Muglia, Pierandrea; Balding, David J

    2010-07-09

    Neuroticism is a moderately heritable personality trait considered to be a risk factor for developing major depression, anxiety disorders and dementia. We performed a genome-wide association study in 2,235 participants drawn from a population-based study of neuroticism, making this the largest association study for neuroticism to date. Neuroticism was measured by the Eysenck Personality Questionnaire. After quality control, we analysed 430,000 autosomal SNPs together with an additional 1.2 million SNPs imputed with high quality from the HapMap CEU samples. We found a very small effect of population stratification, corrected using one principal component, and some cryptic kinship that required no correction. NKAIN2 showed suggestive evidence of association with neuroticism as a main effect (p neuroticism. Our study was powered to detect almost all SNPs explaining at least 2% of heritability, and so our results effectively exclude the existence of loci having a major effect on neuroticism.
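    The power claim can be sanity-checked with a standard approximation: for an additive quantitative-trait test in N unrelated individuals, a SNP explaining a fraction q² of phenotypic variance gives a 1-df chi-square non-centrality of about N·q²/(1 − q²). The abstract does not state its power model; treating "2% of heritability" as 2% of phenotypic variance and using a genome-wide threshold of α = 5 × 10⁻⁸ are assumptions in this sketch.

```python
from statistics import NormalDist


def gwas_power(n, q2, alpha=5e-8):
    """Approximate power of a 1-df association test for a SNP explaining a
    fraction q2 of trait variance in n unrelated individuals (additive model)."""
    ncp = n * q2 / (1 - q2)               # non-centrality of the chi-square test
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)   # two-sided z threshold matching the chi-square cut-off
    shift = ncp ** 0.5                    # a 1-df chi-square is a squared z with mean sqrt(ncp)
    return std.cdf(shift - z_crit) + std.cdf(-z_crit - shift)


print(round(gwas_power(2235, 0.02), 2))  # prints 0.9
```

    Under these assumptions, 2,235 participants give roughly 90% power for such a SNP, which is consistent with the abstract's "almost all".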

  20. Comparative genomics-based investigation of resequencing targets in Vibrio fischeri: Focus on point miscalls and artefactual expansions

    Directory of Open Access Journals (Sweden)

    Ruby Edward G

    2008-03-01

    Full Text Available Abstract Background Sequence closure often represents the end-point of a genome project, without a system in place for subsequent improvement and refinement. Building on the genome project of Vibrio fischeri ES114, we used a comparative approach to identify and investigate genes that had a high likelihood of sequence error. Results Comparison of the V. fischeri ES114 genome with that of conspecific strain MJ11 identified 82 target loci in ES114 as containing likely errors, and thus of high-priority for resequencing. Analysis of the targets identified 75 loci in which an error had occurred, resulting in the correction of 10,457 base pairs to generate the new ES114 genomic sequence. A majority of the inaccurate loci involved frameshift errors, correction of which fused adjacent ORFs. Although insertions/deletions are thought to be rare in microbial genome assemblies, fourteen of the loci contained extraneous sequence of over 300 bp, likely due to imperfect contig ends that were misassembled in tandem rather than as overlapping segments. Additionally we updated the entire genome annotation with 113 new features including previously uncalled protein-coding genes, regulatory RNA genes and operon leader peptides, and we analyzed the transcriptional apparatus encoded by ES114. Conclusion We demonstrate that errors in microbial genome sequences, thought to largely be confined to point mutations, may also consist of other prevalent large-scale rearrangements such as insertions. Ongoing genome quality control and annotation programs are necessary to accompany technological advancements in data generation. These updates further advance V. fischeri as an important model for understanding intercellular communication and colonization of animal tissue.

  1. Evaluation of genome-wide power of genetic association studies based on empirical data from the HapMap project.

    Science.gov (United States)

    Nannya, Yasuhito; Taura, Kenjiro; Kurokawa, Mineo; Chiba, Shigeru; Ogawa, Seishi

    2007-10-15

    With recent advances in high-throughput single nucleotide polymorphism (SNP) typing technologies, genome-wide association studies have become a realistic approach to identifying the causative genes responsible for common diseases with complex genetic traits. In this strategy, the trade-off between increased genome coverage and the chance of finding SNPs that show large test statistics by chance becomes serious because of extreme multiple-hypothesis testing. We investigated the extent to which this trade-off limits the genome-wide power of this approach by simulating a large number of case-control panels based on the empirical data from the HapMap Project. In our simulations, the statistical cost of multiple hypothesis testing was evaluated by empirically calculating the distributions of the maximum χ² statistic for a series of marker sets with increasing numbers of SNPs; these distributions were used to determine genome-wide thresholds in the subsequent power simulations. With a practical study size, the cost of multiple testing largely offsets the potential benefits of increased genome coverage, given modest genetic effects and/or low frequencies of causal alleles. In most realistic scenarios, increasing genome coverage has less influence on power, while sample size is the predominant determinant of the feasibility of genome-wide association tests. Increasing genome coverage without a corresponding increase in sample size will only consume resources with little gain in power. For common causal alleles with relatively large effect sizes (genotype relative risk ≥ 1.7), we can expect satisfactory power with currently available large-scale genotyping platforms using a realistic sample size (approximately 1,000 per arm).
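    The threshold step described above, taking a quantile of the maximum χ² statistic over a marker set, can be sketched by Monte Carlo. This is a simplification, not the authors' simulation: it assumes independent null markers (HapMap linkage disequilibrium is ignored), so it overstates the effective number of tests relative to real SNP panels.

```python
import random


def max_chi2_threshold(n_snps, n_reps=1000, level=0.05, seed=1):
    """Empirical genome-wide threshold: the (1 - level) quantile of the maximum
    1-df chi-square statistic across n_snps independent null markers."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_reps):
        # Under the null hypothesis each 1-df statistic is a squared standard normal.
        maxima.append(max(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_snps)))
    maxima.sort()
    index = min(n_reps - 1, int((1 - level) * n_reps))
    return maxima[index]


for m in (1, 10, 100, 1000):
    print(m, round(max_chi2_threshold(m), 1))
```

    The threshold climbs from about 3.8 for a single marker as markers are added, which is the "statistical cost" that erodes the benefit of denser coverage when sample size is fixed.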

  2. Genomic and Haplotype Comparison of Butanol Producing Bacteria Based on 16S rDNA

    Directory of Open Access Journals (Sweden)

    Ekwan Nofa Wiratno

    2016-04-01

    Full Text Available High demand for butanol as a transportation fuel is driving the development of butanol production. Exploring butanol-producing bacteria through genomic comparison and biogeography will help develop the butanol industry. The objectives of this research were to measure butanol production and to perform genome comparison and haplotype analysis of butanol-producing bacteria from Ranu Pani Lake sediment using 16S rDNA sequences. The highest butanol concentrations were shown by the Paenibacillus polymyxa RP 2.2 isolate (10.34 g.L-1), followed by the Bacillus methylotrophicus RP 3.2 and B. methylotrophicus RP 7.2 isolates (10.11 g.L-1 and 9.63 g.L-1, respectively). Paenibacillus polymyxa RP 2.2 showed similar nucleotide composition (ATGC) to B. methylotrophicus RP 3.2, B. methylotrophicus RP 7.2, P. polymyxa CR1, Bacillus amyloliquefaciens NELB-12, and Paenibacillus polymyxa WR-2. Clostridium acetobutylicum ATCC 824 showed similar nucleotide composition (ATGC) to Clostridium saccharoperbutylacetonicum N1-4 and Clostridium saccharobutylicum Ox29. The lowest G+C content was found in C. saccharobutylicum Ox29 (51.35%) and the highest in B. methylotrophicus RP 7.2 (55.33%). The conserved region of the 16S rDNA (1044 bp) consisted of 17 conserved sequences. There were 319 parsimony-informative sites (PIS) and 48 singleton sites. We found that all of the butanol-producing bacterial DNA sequences clustered into 8 haplotypes. Based on the origin of the samples, there were three haplotype groups. Bacteria from group A produced 8.9-10.34 g.L-1 butanol, group B 9.2-14.2 g.L-1, and group C 0.47 g.L-1. The haplotype analysis of bacteria based on 16S rDNA sequences in this study could predict the capability for butanol production.
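    The G+C percentages and the parsimony-informative/singleton site counts reported above follow standard definitions, sketched below. The abstract does not name the software used, and the sequences here are hypothetical aligned fragments, not the study's 16S rDNA data.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def variable_site_types(alignment):
    """Count parsimony-informative and singleton sites in aligned sequences.

    A site is parsimony-informative if at least two states each occur in at
    least two sequences; a variable site that is not informative is a
    singleton site. Gaps and ambiguity codes are ignored.
    """
    informative = singleton = 0
    for column in zip(*alignment):
        counts = {}
        for base in column:
            base = base.upper()
            if base in "ACGT":
                counts[base] = counts.get(base, 0) + 1
        if len(counts) < 2:
            continue  # invariant site
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
        else:
            singleton += 1
    return informative, singleton


print(round(gc_content("GGCCAATT"), 2))  # prints 50.0
aln = ["ACGTA", "ACGTT", "AGGTA", "AGCTT"]  # hypothetical aligned fragments
print(variable_site_types(aln))  # prints (2, 1)
```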

  3. Genomic connectivity networks based on the BrainSpan atlas of the developing human brain

    Science.gov (United States)

    Mahfouz, Ahmed; Ziats, Mark N.; Rennert, Owen M.; Lelieveldt, Boudewijn P. F.; Reinders, Marcel J. T.

    2014-03-01

    The human brain comprises systems of networks that span the molecular, cellular, anatomic and functional levels. Molecular studies of the developing brain have focused on elucidating networks among gene products that may drive cellular brain development by functioning together in biological pathways. On the other hand, studies of the brain connectome attempt to determine how anatomically distinct brain regions are connected to each other, either anatomically (diffusion tensor imaging) or functionally (functional MRI and EEG), and how they change over development. A global examination of the relationship between gene expression and connectivity in the developing human brain is necessary to understand how the genetic signature of different brain regions instructs connections to other regions. Furthermore, analyzing the development of connectivity networks based on the spatio-temporal dynamics of gene expression provides new insight into the effect of neurodevelopmental disease genes on brain networks. In this work, we construct connectivity networks between brain regions based on the similarity of their gene expression signatures, termed "Genomic Connectivity Networks" (GCNs). Genomic connectivity networks were constructed using data from the BrainSpan Transcriptional Atlas of the Developing Human Brain. Our goal was to understand how the genetic signatures of anatomically distinct brain regions relate to each other across development. We assessed the neurodevelopmental changes in connectivity patterns of brain regions when networks were constructed with genes implicated in the neurodevelopmental disorder autism (autism spectrum disorder; ASD). Using graph theory metrics to characterize the GCNs, we show that ASD-GCNs are relatively less connected later in development, with the cerebellum showing a very distinct expression pattern of ASD-associated genes compared to other brain regions.
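    The abstract builds networks from the similarity of regional expression signatures but does not specify the similarity measure or how edges are defined. A minimal sketch, assuming Pearson correlation over a shared gene list and a hard correlation threshold, with node degree as the simplest graph metric, is below. Region names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either vector is constant


def genomic_connectivity_network(expression, threshold=0.8):
    """Connect two regions when the correlation of their expression profiles
    (over the same ordered gene list) reaches the threshold; return edges and degrees."""
    regions = sorted(expression)
    edges = set()
    for i, r1 in enumerate(regions):
        for r2 in regions[i + 1:]:
            if pearson(expression[r1], expression[r2]) >= threshold:
                edges.add((r1, r2))
    degree = {r: sum(r in edge for edge in edges) for r in regions}
    return edges, degree


# Hypothetical expression of four genes in three regions (illustrative values only).
expr = {
    "cerebellum": [1.0, 2.0, 3.0, 4.0],
    "hippocampus": [2.0, 4.0, 6.0, 8.5],
    "visual_cortex": [4.0, 3.0, 2.0, 1.0],
}
edges, degree = genomic_connectivity_network(expr)
print(sorted(edges), degree)
```

    Restricting the gene list to a disease-associated set (for example ASD genes) and repeating the construction per developmental stage would give the stage-specific networks the abstract compares.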

  4. CFMDS: CUDA-based fast multidimensional scaling for genome-scale data.

    Science.gov (United States)

    Park, Sungin; Shin, Soo-Yong; Hwang, Kyu-Baek

    2012-01-01

    Multidimensional scaling (MDS) is a widely used approach to dimensionality reduction. It has been applied to feature selection and visualization in various areas. Among diverse MDS methods, classical MDS is a simple and theoretically sound solution for projecting data objects onto a low-dimensional space while preserving the original distances among them as much as possible. However, it is not trivial to apply it to genome-scale data (e.g., microarray gene expression profiles) on regular desktop computers because of its high computational complexity. We implemented a highly efficient software application, called CFMDS (CUDA-based Fast MultiDimensional Scaling), which produces an approximate solution of classical MDS based on CUDA (compute unified device architecture) and the divide-and-conquer principle. CUDA is a parallel computing architecture exploiting the power of the GPU (graphics processing unit). The divide-and-conquer principle was adopted to circumvent the small memory of typical graphics cards. Our application software has been tested on various benchmark datasets, including microarrays, and compared with classical MDS algorithms implemented in C# and MATLAB. In our experiments, CFMDS was more than a hundred times faster than these general solutions for large data. Regarding the quality of dimensionality reduction, our approximate solutions were as good as those from the general solutions, as the Pearson's correlation coefficients between them were larger than 0.9. CFMDS is an expeditious solution to the data dimensionality reduction problem. It is especially useful for processing genome-scale data consisting of several thousand objects within several minutes.
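    For reference, the exact classical MDS computation that CFMDS approximates can be sketched on the CPU in plain Python: double-centre the squared distances and take the top eigenpairs. This is not CFMDS's CUDA or divide-and-conquer code. The eigenpairs are found here by power iteration with deflation, which assumes the distances are Euclidean so the centred matrix has no large negative eigenvalues.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS on a symmetric distance matrix.

    Double-centres the squared distances, B = -1/2 J D^2 J, then extracts the
    top-k eigenpairs of B by power iteration with deflation.
    """
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]  # equals column means because D is symmetric
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining spectrum is numerically zero
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue belonging to the converged unit vector.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass converges to the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(dist, k=2)
# Pairwise distances in the embedding should reproduce the input distances.
print(round(math.dist(embedding[1], embedding[2]), 6))  # prints 5.0
```

    Each iteration costs O(n²), and B itself needs n² memory, which is exactly the cost that motivates splitting the problem into GPU-sized pieces for thousands of objects.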

  5. The complete mitochondrial genome of the enigmatic bigheaded turtle (Platysternon): description of unusual genomic features and the reconciliation of phylogenetic hypotheses based on mitochondrial and nuclear DNA

    Directory of Open Access Journals (Sweden)

    Feldman Chris R

    2006-02-01

    Full Text Available Abstract Background The big-headed turtle (Platysternon megacephalum) from East Asia is the sole living representative of a poorly studied turtle lineage (Platysternidae). It has no close living relatives, and its phylogenetic position within turtles is one of the outstanding controversies in turtle systematics. Platysternon was traditionally considered to be close to snapping turtles (Chelydridae) based on some studies of its morphology and mitochondrial (mt) DNA; however, other studies of morphology and nuclear (nu) DNA do not support that hypothesis. Results We sequenced the complete mt genome of Platysternon and the nearly complete mt genomes of two other relevant turtles and compared them to turtle mt genomes from the literature to form the largest molecular dataset used to date to address this issue. The resulting phylogeny robustly rejects the placement of Platysternon with Chelydridae, but instead shows that it is a member of the Testudinoidea, a diverse, nearly globally distributed group that includes pond turtles and tortoises. We also discovered that Platysternon mtDNA has large-scale gene rearrangements and possesses two nearly identical control regions, features that distinguish it from all other studied turtles. Conclusion Our study robustly determines the phylogenetic placement of Platysternon and provides a well-resolved outline of major turtle lineages, while demonstrating the significantly greater resolving power of comparing large amounts of mt sequence over that of short fragments. Earlier phylogenies placing Platysternon with chelydrids required a temporal gap in the fossil record that is now unnecessary. The duplicated control regions and gene rearrangements of the Platysternon mtDNA probably resulted from the duplication of part of the genome and the subsequent loss of redundant genes. Although it is possible that having two control regions may provide some advantage, explaining why the control regions would be

  6. Herbarium genomics

    DEFF Research Database (Denmark)

    Bakker, Freek T.; Lei, Di; Yu, Jiaying

    2016-01-01

    Herbarium genomics is proving promising as next-generation sequencing approaches are well suited to deal with the usually fragmented nature of archival DNA. We show that routine assembly of partial plastome sequences from herbarium specimens is feasible, from total DNA extracts and with specimens...... up to 146 years old. We use genome skimming and an automated assembly pipeline, Iterative Organelle Genome Assembly, that assembles paired-end reads into a series of candidate assemblies, the best one of which is selected based on likelihood estimation. We used 93 specimens from 12 different...... correlation between plastome coverage and nuclear genome size (C value) in our samples, but the range of C values included is limited. Finally, we conclude that routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome...

  7. Association between chromosomal aberration of COX8C and tethered spinal cord syndrome: array-based comparative genomic hybridization analysis

    Directory of Open Access Journals (Sweden)

    Qiu-jiong Zhao

    2016-01-01

    Full Text Available Copy number variations have been found in patients with neural tube abnormalities. In this study, we performed genome-wide screening using high-resolution array-based comparative genomic hybridization in three children with tethered spinal cord syndrome and two healthy parents. Of eight copy number variations, four were non-polymorphic. These non-polymorphic copy number variations were associated with Angelman and Prader-Willi syndromes, and microcephaly. Gene function enrichment analysis revealed that COX8C, a gene associated with metabolic disorders of the nervous system, was located in the copy number variation region of Patient 1. Our results indicate that array-based comparative genomic hybridization can be used to diagnose tethered spinal cord syndrome. Our results may help determine the pathogenesis of tethered spinal cord syndrome and prevent occurrence of this disease.

  8. Genomics-based early-phase clinical trials in oncology: recommendations from the task force on Methodology for the Development of Innovative Cancer Therapies.

    Science.gov (United States)

    Liu, Stephen V; Miller, Vincent A; Lobbezoo, Marinus W; Giaccone, Giuseppe

    2014-11-01

    The Methodology for the Development of Innovative Cancer Therapies (MDICT) task force discussed incorporation of genomic profiling into early (Phase I and II) clinical trials in oncology. The task force reviewed the challenges of standardising genomics data in a manner conducive to conducting clinical trials. Current barriers to successful and efficient implementation were identified and discussed, as well as the methods of genomic analysis, the proper setting for study and strategies to facilitate timely completion of genomics-based studies. The importance of properly capturing and cataloguing outcomes was also discussed. Several recommendations regarding the use of genomics in these trials are provided.

  9. The Direzione Generale per gli Affari Generali, il Bilancio, le Risorse Umane e la Formazione, in the re-organization of the Ministero per i Beni e le Attività Culturali: institutional purposes bent to Human Resources and Training

    Directory of Open Access Journals (Sweden)

    Alfredo Giacomazzi

    2004-02-01

    Full Text Available The training plan for 2005 is presented as part of the re-organization of the Ministry for Cultural Heritage and Activities and of its Directorate-General for General Affairs, Budget, Human Resources and Training. In particular, it presents the activities that the Administration intends to carry out at the economic-financial and management levels. More precisely, reference is made to the National Project “L-Lifelong Learning” (continuous learning with high technologies) and to basic computer training.

  10. Genome-based modeling and design of metabolic interactions in microbial communities

    Directory of Open Access Journals (Sweden)

    Radhakrishnan Mahadevan

    2012-10-01

    With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.

  11. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

    Science.gov (United States)

    Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun

    2013-10-16

    The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. 
The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining
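The three-matrix pipeline described above (gene-article, normalized gene-MeSH, gene-gene) can be illustrated with a small Python sketch. The gene and article identifiers are hypothetical, row-sum normalization is an assumed choice, and cosine dissimilarity is only a stand-in: the abstract says the authors selected their score from six candidate functions by ROC analysis but does not name it.

```python
import math

# Hypothetical toy inputs: articles that mention each gene, and MeSH terms per article.
GENE_ARTICLES = {
    "geneA": {"pmid1", "pmid2"},
    "geneB": {"pmid2", "pmid3"},
    "geneC": {"pmid4"},
}
ARTICLE_MESH = {
    "pmid1": {"Virulence", "Iron"},
    "pmid2": {"Virulence"},
    "pmid3": {"Virulence", "Iron"},
    "pmid4": {"Flagella"},
}


def gene_mesh_matrix(gene_articles, article_mesh):
    """Gene-article links -> MeSH term counts per gene, each row normalized to sum to 1."""
    matrix = {}
    for gene, articles in gene_articles.items():
        counts = {}
        for pmid in articles:
            for term in article_mesh.get(pmid, ()):
                counts[term] = counts.get(term, 0) + 1
        total = sum(counts.values())
        matrix[gene] = {t: c / total for t, c in counts.items()} if total else {}
    return matrix


def cosine_dissimilarity(u, v):
    """1 - cosine similarity of two sparse term-weight vectors (1.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def gene_gene_matrix(gene_mesh):
    """Pairwise dissimilarities between all genes, keyed by (gene, gene)."""
    genes = sorted(gene_mesh)
    return {(a, b): cosine_dissimilarity(gene_mesh[a], gene_mesh[b]) for a in genes for b in genes}


dissim = gene_gene_matrix(gene_mesh_matrix(GENE_ARTICLES, ARTICLE_MESH))
```

Here geneA and geneB have the same MeSH profile, so their dissimilarity is numerically 0. geneC shares no terms with either of them and scores 1.0.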

  12. A geminivirus-based guide RNA delivery system for CRISPR/Cas9 mediated plant genome editing.

    Science.gov (United States)

    Yin, Kangquan; Han, Ting; Liu, Guang; Chen, Tianyuan; Wang, Ying; Yu, Alice Yunzi L; Liu, Yule

    2015-10-09

    CRISPR/Cas has emerged as a potent genome editing technology and has been applied successfully in many organisms, including several plant species. However, delivery of genome editing reagents remains a challenge in plants. Here, we report a virus-based guide RNA (gRNA) delivery system for CRISPR/Cas9 mediated plant genome editing (VIGE) that can be used to precisely target genome locations and cause mutations. VIGE is performed by using a modified Cabbage Leaf Curl virus (CaLCuV) vector to express gRNAs in stable transgenic plants expressing Cas9. Because CaLCuV can infect plants systemically, DNA sequencing confirmed VIGE of the endogenous NbPDS3 and NbIspH genes in non-inoculated leaves. Moreover, VIGE of NbPDS3 and NbIspH in newly developed leaves caused a photo-bleached phenotype. These results demonstrate that geminivirus-based VIGE could be a powerful tool in plant genome editing.

  13. A DNA minor groove electronegative potential genome map based on photo-chemical probing

    DEFF Research Database (Denmark)

    Lindemose, Søren; Nielsen, Peter Eigil; Hansen, Morten

    2011-01-01

    The double-stranded DNA of the genome contains both sequence information directly relating to the protein and RNA coding as well as functional and structural information relating to protein recognition. Only recently is the importance of DNA shape in this recognition process being fully appreciated...... resolution of any genome, and it is illustrated how such detailed studies of this sequence dependent, inherent property of the DNA may reflect on genome organization, gene expression and chromosomal condensation....

  14. A SSR-based composite genetic linkage map for the cultivated peanut (Arachis hypogaea L.) genome

    Directory of Open Access Journals (Sweden)

    Li Shaoxiong

    2010-01-01

    Background: The construction of genetic linkage maps for cultivated peanut (Arachis hypogaea L.) has been, and continues to be, an important research goal. Such maps facilitate quantitative trait locus (QTL) analysis and gene tagging for marker-assisted selection in breeding. Although a few maps have been developed, they were constructed using diploid or interspecific tetraploid populations. The most recently published intra-specific map was constructed from a cross of cultivated peanuts and contained only 135 simple sequence repeat (SSR) markers, sparsely distributed across 22 linkage groups. A more detailed linkage map with sufficient markers is needed to make QTL identification and marker-assisted selection feasible. The objective of this study was to construct a genetic linkage map of cultivated peanut using SSR markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and "data mining" of sequences released in GenBank. Results: Three recombinant inbred line (RIL) populations were constructed from three crosses with one common female parental line, Yueyou 13, a high-yielding Spanish market type. The four parents were screened with 1044 primer pairs designed to amplify SSRs, and 901 primer pairs produced clear PCR products. Of these 901 primer pairs, 146, 124 and 64 primer pairs (markers) were polymorphic in the three populations, respectively, and were used to genotype these RIL populations. Individual linkage maps were constructed from each of the three populations, and a composite map based on 93 common loci was created using JoinMap. The composite linkage map consists of 22 composite linkage groups (LGs) with 175 SSR markers (including 47 SSRs on the published AA genome maps), representing the 20 chromosomes of A. hypogaea. The total composite map length is 885.4 cM, with an average marker density of 5.8 cM. Segregation distortion in the 3 populations was 23.0%, 13.5% and 7.8% of the markers
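The reported figures can be checked for consistency. A linkage group carrying k markers has k - 1 intervals, so 175 markers in 22 groups span 153 intervals, and 885.4 cM / 153 ≈ 5.8 cM. Reading "average marker density" as the mean inter-marker interval is our interpretation, not stated in the abstract. Dividing by the marker count instead would give about 5.1 cM. A minimal Python sketch:

```python
def mean_marker_interval(total_length_cm: float, n_markers: int, n_linkage_groups: int) -> float:
    """Average distance (cM) between adjacent markers across all linkage groups."""
    # Each linkage group with k markers contributes k - 1 intervals.
    n_intervals = n_markers - n_linkage_groups
    return total_length_cm / n_intervals


print(round(mean_marker_interval(885.4, 175, 22), 1))  # 5.8
print(round(885.4 / 175, 1))                           # 5.1 (dividing by markers instead)
```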

  15. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.
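End Sequencing Profiling rests on a simple geometric test. The two ends of a cloned tumor fragment should map to the reference at the expected distance and in the expected orientation, and pairs that do not are candidate rearrangement breakpoints. The Python sketch below shows that classification. The `EndPair` record, the category names and the clone-size window are illustrative assumptions, not ESP's published implementation.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Reference mapping of the two end sequences of one tumor clone."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify(pair: EndPair, min_insert: int, max_insert: int) -> str:
    """Label a mapped end pair as 'concordant' or by the kind of discordance it shows."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"  # translocation signature
    # Order the two ends by position so orientation can be checked left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if not (left_strand == "+" and right_strand == "-"):
        return "orientation"  # e.g. inversion signature
    span = right_pos - left_pos
    if span < min_insert:
        return "too_short"  # ends closer than expected: insertion in the tumor
    if span > max_insert:
        return "too_long"  # ends farther apart than expected: deletion in the tumor
    return "concordant"
```

For example, with a hypothetical clone-size window of 100-200 kb, `classify(EndPair("chr1", 1_000, "+", "chr1", 151_000, "-"), 100_000, 200_000)` returns `"concordant"`. In practice the window would be estimated from the concordant pairs themselves.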

  16. Comprehensive population-based genome sequencing provides insight into hematopoietic regulatory mechanisms

    Science.gov (United States)

    Guo, Michael H.; Nandakumar, Satish K.; Ulirsch, Jacob C.; Zekavat, Seyedeh M.; Buenrostro, Jason D.; Natarajan, Pradeep; Salem, Rany M.; Chiarle, Roberto; Mitt, Mario; Kals, Mart; Pärn, Kalle; Fischer, Krista; Milani, Lili; Mägi, Reedik; Palta, Priit; Gabriel, Stacey B.; Metspalu, Andres; Lander, Eric S.; Kathiresan, Sekar; Hirschhorn, Joel N.; Esko, Tõnu; Sankaran, Vijay G.

    2017-01-01

    Genetic variants affecting hematopoiesis can influence commonly measured blood cell traits. To identify factors that affect hematopoiesis, we performed association studies for blood cell traits in the population-based Estonian Biobank using high-coverage whole-genome sequencing (WGS) in 2,284 samples and SNP genotyping in an additional 14,904 samples. Using up to 7,134 samples with available phenotype data, our analyses identified 17 associations across 14 blood cell traits. Integration of WGS-based fine-mapping and complementary epigenomic datasets provided evidence for causal mechanisms at several loci, including at a previously undiscovered basophil count-associated locus near the master hematopoietic transcription factor CEBPA. The fine-mapped variant at this basophil count association near CEBPA overlapped an enhancer active in common myeloid progenitors and influenced its activity. In situ perturbation of this enhancer by CRISPR/Cas9 mutagenesis in hematopoietic stem and progenitor cells demonstrated that it is necessary for and specifically regulates CEBPA expression during basophil differentiation. We additionally identified basophil count-associated variation at another more pleiotropic myeloid enhancer near GATA2, highlighting regulatory mechanisms for ordered expression of master hematopoietic regulators during lineage specification. Our study illustrates how population-based genetic studies can provide key insights into poorly understood cell differentiation processes of considerable physiologic relevance. PMID:28031487
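At its simplest, each variant-trait test in an association study like this is a regression of the trait on allele dosage. The Python sketch below fits an ordinary least-squares model for a single variant with no covariates. The abstract does not specify the authors' model, and real analyses would normally adjust for covariates such as age, sex and ancestry principal components.

```python
import math


def association(dosages, phenotypes):
    """OLS regression of a trait on allele dosage (0, 1, 2) for one variant.

    Returns (beta, standard_error, t_statistic); no covariates are included.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx  # per-allele effect on the trait
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


beta, se, t = association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
```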

  17. Physical anthropology and ethnicity in Asia: the transition from anthropometry to genome-based studies.

    Science.gov (United States)

    Bittles, A H; Black, M L; Wang, W

    2007-03-01

    Initial physical anthropology studies into ethnic diversity were largely dependent on comparative whole body and craniometric measurements, and through time assessments of ethnic diversity based on these measures exhibited increasing statistical sophistication. Since the 1990s, in Asia as elsewhere in the world, human diversity studies have increasingly utilized DNA-based analyses, with Y-chromosome and mtDNA markers providing complementary perspectives on the origins and gene pool structures of different ethnic groups. This approach is illustrated in a study of population genetic structure in PR China, in which DNA samples from the Han majority and eight ethnic minorities were analyzed. The Y-chromosome and mtDNA data showed multiple paternal geographical and ethnic origins but restricted maternal ancestries. However, interpretive problems were apparent in the definition of a number of the ethnic study populations, which appear to reflect political as well as genetic influences. In all anthropological studies, whether based on anthropometry or genomic analysis, unambiguous and appropriate community identification is a prerequisite.

  18. SiRNA sequence model: redesign algorithm based on available genome-wide libraries.

    Science.gov (United States)

    Kozak, Karol

    2013-12-01

    The evolution of RNA interference (RNAi) and the development of technologies exploiting its biology have enabled scientists to rapidly examine the consequences of depleting a particular gene product in cells. Because not all siRNAs designed against a given target mRNA are equally effective, design tools based on experimental data have been developed to increase the knockdown efficiency of siRNAs. Currently available design algorithms take an accession, identify conserved regions across its transcript space, find accessible regions within the mRNA, design all possible siRNAs for these regions, filter them against multi-score thresholds, and then perform off-target filtration. Commercial suppliers use these criteria to produce genome-wide siRNA libraries for different organisms. In this article, we analyze existing siRNA design algorithms and evaluate the weight of design parameters for libraries produced in the last decade. We show that siRNA vendors do not currently apply all essential parameters. Based on our evaluation results, we were able to suggest an siRNA sequence pattern. Our findings could help commercial vendors improve the design of RNAi constructs by addressing both potency and specificity.
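The design steps listed above (enumerate all possible siRNAs, then filter them on score thresholds) can be sketched in Python. The 19-nt length, the 30-52% GC window and the ban on runs of four identical bases are illustrative heuristics chosen for this sketch. They are not the parameter weights evaluated in the article. The conservation, accessibility and off-target steps are omitted, and sequences are written in DNA letters for simplicity.

```python
def candidate_sirnas(mrna: str, length: int = 19):
    """Enumerate every (start, sequence) window of the given length along the target."""
    mrna = mrna.upper()
    return [(i, mrna[i:i + length]) for i in range(len(mrna) - length + 1)]


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def has_homopolymer(seq: str, run: int = 4) -> bool:
    """True if any nucleotide repeats `run` or more times in a row."""
    count = 1
    for prev, cur in zip(seq, seq[1:]):
        count = count + 1 if cur == prev else 1
        if count >= run:
            return True
    return False


def passes_filters(seq: str, gc_min: float = 0.30, gc_max: float = 0.52, max_run: int = 4) -> bool:
    """Illustrative threshold filter: GC window plus no long single-base runs."""
    return gc_min <= gc_fraction(seq) <= gc_max and not has_homopolymer(seq, max_run)


def design(mrna: str, length: int = 19):
    """All candidate windows that survive the filters."""
    return [(i, s) for i, s in candidate_sirnas(mrna, length) if passes_filters(s)]
```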

  19. Common minor histocompatibility antigen discovery based upon patient clinical outcomes and genomic data.

    Directory of Open Access Journals (Sweden)

    Paul M Armistead

    BACKGROUND: Minor histocompatibility antigens (mHAs) mediate much of the graft vs. leukemia (GvL) effect and graft vs. host disease (GvHD) in patients who undergo allogeneic stem cell transplantation (SCT). Therapeutic decision making and treatments based upon mHAs will require the evaluation of multiple candidate mHAs and the selection of those with the potential to have the greatest impact on clinical outcomes. We hypothesized that common, immunodominant mHAs, which are presented by HLA-A, B, and C molecules, can mediate clinically significant GvL and/or GvHD, and that these mHAs can be identified through association of genomic data with clinical outcomes. METHODOLOGY/PRINCIPAL FINDINGS: Because most mHAs result from donor/recipient cSNP disparities, we genotyped 57 myeloid leukemia patients and their donors at 13,917 cSNPs. We correlated the frequency of genetically predicted mHA disparities with clinical evidence of an immune response and then computationally screened all peptides mapping to the highly associated cSNPs for their ability to bind to HLA molecules. As proof of concept, we analyzed one predicted antigen, T4A, whose mHA mismatch trended towards improved overall and disease-free survival in our cohort. T4A mHA mismatches occurred at the maximum theoretical frequency for any given SCT. T4A-specific CD8+ T lymphocytes (CTLs) were detected in 3 of 4 evaluable post-transplant patients predicted to have a T4A mismatch. CONCLUSIONS/SIGNIFICANCE: Our method is the first to combine clinical outcomes data with genomics and bioinformatics methods to predict and confirm an mHA. Refinement of this method should enable the discovery of clinically relevant mHAs in the majority of transplant patients and possibly lead to novel immunotherapeutics.
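For each cSNP, counting genetically predicted disparities reduces to one question: does the recipient carry an allele the donor lacks? This is the graft-versus-host direction, in which donor T cells can recognize the recipient's variant peptide as foreign. The Python sketch below assumes genotypes are stored as allele pairs keyed by hypothetical SNP IDs. It does not check whether the variant peptide is actually presented by the recipient's HLA, which the study assessed separately by computational binding prediction.

```python
def gvh_mismatch(donor: tuple, recipient: tuple) -> bool:
    """True if the recipient carries an allele absent from the donor."""
    return bool(set(recipient) - set(donor))


def count_gvh_disparities(donor_genotypes: dict, recipient_genotypes: dict) -> int:
    """Number of cSNPs genotyped in both individuals with a GvH-direction mismatch."""
    shared = donor_genotypes.keys() & recipient_genotypes.keys()
    return sum(gvh_mismatch(donor_genotypes[s], recipient_genotypes[s]) for s in shared)


# Hypothetical genotypes at three cSNPs.
donor = {"snp1": ("A", "A"), "snp2": ("A", "G"), "snp3": ("G", "G")}
recipient = {"snp1": ("A", "G"), "snp2": ("A", "A"), "snp3": ("G", "G")}
print(count_gvh_disparities(donor, recipient))  # 1 (snp1)
```

snp2 is a mismatch only in the opposite (host-versus-graft) direction, so it is not counted.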

  20. Prediction of drug-target interactions for drug repositioning only based on genomic expression similarity.

    Directory of Open Access Journals (Sweden)

    Kejian Wang

    Small drug molecules usually bind to multiple protein targets or even unintended off-targets. Such drug promiscuity has often led to unwanted or unexplained drug reactions, resulting in side effects or drug repositioning opportunities. Identifying potential drug-target interactions (DTIs) is therefore an important issue in pharmacology. However, experimental DTI discovery remains challenging because of its high cost in time and resources. Many computational methods have therefore been developed to predict DTIs from high-throughput biological and clinical data. Here, we demonstrate that on-target and off-target effects can be characterized by drug-induced in vitro genomic expression changes, e.g. the data in the Connectivity Map (CMap). Thus, unknown ligands of a given target can be found among compounds whose gene-expression profiles are highly similar to those of its known ligands. To clarify how CMap-based DTI prediction works in practice, we objectively evaluate how well each target is characterized by CMap. The results suggest that (1) some targets are better characterized than others, so prediction models specific to these well-characterized targets would be more accurate and reliable; and (2) in some cases, a family of ligands for the same target tends to interact with common off-targets, which may help increase the efficiency of DTI discovery and explain the mechanisms of complicated drug actions. In the present study, CMap expression similarity is proposed as a novel indicator of drug-target interactions. Detailed strategies for improving data quality by reducing batch effects and for building prediction models are also established. We believe the success with CMap can be translated to other public and commercial genomic expression data, increasing research productivity towards valid drug repositioning and minimal side effects.
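The core idea is that a compound whose expression signature resembles those of a target's known ligands may itself bind that target. The Python sketch below illustrates it. Pearson correlation is a swapped-in similarity measure (CMap's own connectivity scoring is rank-based), and the compound names and profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)  # undefined (division by zero) for constant profiles


def rank_candidates(profiles, known_ligands):
    """Rank non-ligand compounds by their best correlation with any known ligand."""
    scores = {
        name: max(pearson(profile, profiles[lig]) for lig in known_ligands)
        for name, profile in profiles.items()
        if name not in known_ligands
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical log fold-changes of four genes after treatment with each compound.
PROFILES = {
    "ligand1": [2.0, -1.0, 0.5, -0.5],
    "cmpdX": [1.8, -0.9, 0.6, -0.4],
    "cmpdY": [-2.0, 1.0, -0.5, 0.5],
}
ranking = rank_candidates(PROFILES, {"ligand1"})
```

cmpdX, whose profile closely tracks ligand1, ranks first. cmpdY, with an inverted profile, ranks last at a correlation of -1.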

  1. Biosurveillance enterprise for operational awareness, a genomic-based approach for tracking pathogen virulence.

    Science.gov (United States)

    Valdivia-Granda, Willy A

    2013-11-15

    To protect our civilians and warfighters against both known and unknown pathogens, biodefense stakeholders must be able to foresee possible technological trends that could affect their threat risk assessment. However, significant flaws in how we prioritize our countermeasure-needs continue to limit their development. As recombinant biotechnology becomes increasingly simplified and inexpensive, small groups, and even individuals, can now achieve the design, synthesis, and production of pathogenic organisms for offensive purposes. Under these daunting circumstances, a reliable biosurveillance approach that supports a diversity of users could better provide early warnings about the emergence of new pathogens (both natural and manmade), reverse engineer pathogens carrying traits to avoid available countermeasures, and suggest the most appropriate detection, prophylactic, and therapeutic solutions. While impressive in data mining capabilities, real-time content analysis of social media data misses much of the complexity in the factual reality. Quality issues within freeform user-provided hashtags and biased referencing can significantly undermine our confidence in the information obtained to make critical decisions about the natural vs. intentional emergence of a pathogen. At the same time, errors in pathogen genomic records, the narrow scope of most databases, and the lack of standards and interoperability across different detection and diagnostic devices, continue to restrict the multidimensional biothreat assessment. The fragmentation of our biosurveillance efforts into different approaches has stultified attempts to implement any new foundational enterprise that is more reliable, more realistic and that avoids the scenario of the warning that comes too late. 
This discussion focuses on the development of a genomic-based decentralized medical intelligence and laboratory system to track emerging and novel microbial health threats in both military and civilian settings and

  2. Effect of predictor traits on accuracy of genomic breeding values for feed intake based on a limited cow reference population.

    Science.gov (United States)

    Pszczola, M; Veerkamp, R F; de Haas, Y; Wall, E; Strabel, T; Calus, M P L

    2013-11-01

    The genomic breeding value accuracy of scarcely recorded traits is low because of the limited number of phenotypic observations. One solution to increase the breeding value accuracy is to use predictor traits. This study investigated the impact of recording additional phenotypic observations for predictor traits on reference and evaluated animals on the genomic breeding value accuracy for a scarcely recorded trait. The scarcely recorded trait was dry matter intake (DMI, n = 869) and the predictor traits were fat-protein-corrected milk (FPCM, n = 1520) and live weight (LW, n = 1309). All phenotyped animals were genotyped and originated from research farms in Ireland, the United Kingdom and the Netherlands. Multi-trait REML was used to simultaneously estimate variance components and breeding values for DMI using available predictors. In addition, analyses using only pedigree relationships were performed. Breeding value accuracy was assessed through cross-validation (CV) and prediction error variance (PEV). CV groups (n = 7) were defined by splitting animals across genetic lines and management groups within country. With no additional traits recorded for the evaluated animals, both CV- and PEV-based accuracies for DMI were substantially higher for genomic than for pedigree analyses (CV: max. 0.26 for pedigree and 0.33 for genomic analyses; PEV: max. 0.45 and 0.52, respectively). With additional traits available, the differences between pedigree and genomic accuracies diminished. With additional recording for FPCM, pedigree accuracies increased from 0.26 to 0.47 for CV and from 0.45 to 0.48 for PEV. Genomic accuracies increased from 0.33 to 0.50 for CV and from 0.52 to 0.53 for PEV. With additional recording for LW instead of FPCM, pedigree accuracies increased to 0.54 for CV and to 0.61 for PEV. Genomic accuracies increased to 0.57 for CV and to 0.60 for PEV. 
With both FPCM and LW available for evaluated animals, accuracy was highest (0.62 for CV and 0.61 for PEV in
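PEV-based accuracies such as those quoted above are usually computed from the standard relation r = sqrt(1 - PEV / σ²_a), where σ²_a is the additive genetic variance. The small Python helpers below make the conversion explicit. The function names are our own, and inbreeding, which would rescale σ²_a, is ignored.

```python
import math


def pev_accuracy(pev: float, additive_variance: float) -> float:
    """Accuracy r = sqrt(1 - PEV / sigma_a^2) of a predicted breeding value."""
    if additive_variance <= 0:
        raise ValueError("additive variance must be positive")
    # Clip the ratio to [0, 1] to guard against numerical noise.
    ratio = min(max(pev / additive_variance, 0.0), 1.0)
    return math.sqrt(1.0 - ratio)


def pev_from_accuracy(r: float, additive_variance: float) -> float:
    """Inverse relation: the PEV implied by an accuracy r."""
    return (1.0 - r * r) * additive_variance
```

For instance, an accuracy of 0.52 corresponds to a PEV of about 0.73 σ²_a (1 - 0.52² = 0.7296).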

  3. Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains : a web-based resource

    Directory of Open Access Journals (Sweden)

    Vergnaud Gilles

    2004-01-01

    Background: Polymorphic tandem repeat typing is a new generic technology that has proven very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila and Y. pestis. The previously developed Tandem Repeats Database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. Developing an assay then requires evaluating tandem repeat polymorphism on well-selected sets of isolates. For major human pathogens such as S. aureus, more than one strain is being sequenced, so the tandem repeats most likely to be polymorphic can now be selected in silico by comparing genome sequences. Results: In addition to the previously described general Tandem Repeats Database, we have developed a tool that automatically identifies tandem repeats differing in length between the genome sequences of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed into a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species and for the orthopox viruses, including the variola virus and three of its close neighbors. Conclusions: We present an internet-based resource to help develop and perform tandem-repeat-based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats differing between different genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes. 
The "Bacterial
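Selecting likely polymorphic tandem repeats by genome comparison amounts to finding loci whose repeat copy number differs between strains. The predicted PCR product size difference is then the copy-number difference times the unit length. The Python sketch below uses exact repeat matching on two orthologous locus sequences. This is a simplification, since practical tools typically tolerate imperfect repeat copies, and the repeat unit and sequences are hypothetical.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Longest run of consecutive exact copies of `unit` anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Step forward one unit at a time while the next unit matches exactly.
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


def predicted_size_difference(locus_a: str, locus_b: str, unit: str) -> int:
    """Expected PCR product size difference (bp) at one locus between two strains."""
    return (max_tandem_copies(locus_a, unit) - max_tandem_copies(locus_b, unit)) * len(unit)


UNIT = "ACGTTG"  # hypothetical 6-bp repeat unit
strain_a = "TTGCA" + UNIT * 5 + "GGATC"
strain_b = "TTGCA" + UNIT * 3 + "GGATC"
print(predicted_size_difference(strain_a, strain_b, UNIT))  # 12
```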

  4. An siRNA-based functional genomics screen for the identification of regulators of ciliogenesis and ciliopathy genes

    NARCIS (Netherlands)

    Wheway, G.; Schmidts, M.; Mans, D.A.; Szymanska, K.; Nguyen, T.M.; Racher, H.; Phelps, I.G.; Toedt, G.; Kennedy, J.; Wunderlich, K.A.; Sorusch, N.; Abdelhamed, Z.A.; Natarajan, S.; Herridge, W.; Reeuwijk, J. van; Horn, N.; Boldt, K.; Parry, D.A.; Letteboer, S.J.F.; Roosing, S.; Adams, M.; Bell, S.M.; Bond, J.; Higgins, J.; Morrison, E.E.; Tomlinson, D.C.; Slaats, G.G.; Dam, T.J.P. van; Huang, L.; Kessler, K.; Giessl, A.; Logan, C.V.; Boyle, E.A.; Shendure, J.; Anazi, S.; Aldahmesh, M.; Hazzaa, S. Al; Hegele, R.A.; Ober, C.; Frosk, P.; Mhanni, A.A.; Chodirker, B.N.; Chudley, A.E.; Lamont, R.; Bernier, F.P.; Beaulieu, C.L.; Gordon, P.; Pon, R.T.; Donahue, C.; Barkovich, A.J.; Wolf, L.; Toomes, C.; Thiel, C.T.; Boycott, K.M.; McKibbin, M.; Inglehearn, C.F.; Stewart, F.; Omran, H.; Huynen, M.A.; Sergouniotis, P.I.; Alkuraya, F.S.; Parboosingh, J.S.; Innes, A.M.; Willoughby, C.E.; Giles, R.H.; Webster, A.R.; Ueffing, M.; Blacque, O.; Gleeson, J.G.; Wolfrum, U.; Beales, P.L.; Gibson, T.; Doherty, D.; Mitchison, H.M.; Roepman, R.; Johnson, C.A.

    2015-01-01

    Defects in primary cilium biogenesis underlie the ciliopathies, a growing group of genetic disorders. We describe a whole-genome siRNA-based reverse genetics screen for defects in biogenesis and/or maintenance of the primary cilium, obtaining a global resource. We identify 112 candidate ciliogenesis

  5. Spontaneous germline excision of Tol1, a DNA-based transposable element naturally occurring in the medaka fish genome.

    Science.gov (United States)

    Watanabe, Kohei; Koga, Hajime; Nakamura, Kodai; Fujita, Akiko; Hattori, Akimasa; Matsuda, Masaru; Koga, Akihiko

    2014-04-01

    DNA-based transposable elements are ubiquitous constituents of eukaryotic genomes. Vertebrates are, however, exceptional in that most of their DNA-based elements appear to be inactivated. The Tol1 element of the medaka fish, Oryzias latipes, is one of the few elements for which copies containing an undamaged gene have been found. Spontaneous transposition of this element in somatic cells has previously been demonstrated, but there is only indirect evidence for its germline transposition. Here, we show direct evidence of spontaneous excision in the germline. Tyrosinase is the key enzyme in melanin biosynthesis. In an albino laboratory strain of medaka fish, which is homozygous for a mutant tyrosinase gene in which a Tol1 copy is inserted, we identified de novo reversion mutations related to melanin pigmentation. The gamete-based reversion rate was as high as 0.4%. The revertant fish carried the tyrosinase gene from which the Tol1 copy had been excised. We previously reported the germline transposition of Tol2, another DNA-based element that is thought to be a recent invader of the medaka fish genome. Tol1 is an ancient resident of the genome. Our results indicate that even an old element can contribute to genetic variation in the host genome as a natural mutator.

  6. Meta-analysis of genome-wide association for migraine in six population-based European cohorts

    NARCIS (Netherlands)

    Ligthart, Lannie; de Vries, Boukje; Smith, Albert V.; Ikram, M. Arfan; Amin, Najaf; Hottenga, Jouke-Jan; Koelewijn, Stephany C.; Kattenberg, V. Mathijs; de Moor, Marleen H. M.; Janssens, A. Cecile J. W.; Aulchenko, Yurii S.; Oostra, Ben A.; de Geus, Eco J. C.; Smit, Johannes H.; Zitman, Frans G.; Uitterlinden, Andre G.; Hofman, Albert; Willemsen, Gonneke; Nyholt, Dale R.; Montgomery, Grant W.; Terwindt, Gisela M.; Gudnason, Vilmundur; Penninx, Brenda W. J. H.; Breteler, Monique; Ferrari, Michel D.; Launer, Lenore J.; van Duijn, Cornelia M.; van den Maagdenberg, Arn M. J. M.; Boomsma, Dorret I.

    2011-01-01

    Migraine is a common neurological disorder with a genetically complex background. This paper describes a meta-analysis of genome-wide association (GWA) studies on migraine, performed by the Dutch-Icelandic migraine genetics (DICE) consortium, which brings together six population-based European migra

  7. Next-generation sequencing-based genome diagnostics across clinical genetics centers : implementation choices and their effects

    NARCIS (Netherlands)

    Vrijenhoek, Terry; Kraaijeveld, Ken; Elferink, Martin; de Ligt, Joep; Kranendonk, Elcke; Santen, Gijs; Nijman, Isaac J.; Butler, Derek; Claes, Godelieve; Costessi, Adalberto; Dorlijn, Wim; van Eyndhoven, Winfried; Halley, Dicky J. J.; van den Hout, Mirjam C. G. N.; van Hove, Steven; Johansson, Lennart F.; Jongbloed, Jan D. H.; Kamps, Rick; Kockx, Christel E. M.; de Koning, Bart; Kriek, Marjolein; Deprez, Ronald Lekanne Dit; Lunstroo, Hans; Mannens, Marcel; Mook, Olaf R.; Nelen, Marcel; Ploem, Corrette; Rijnen, Marco; Saris, Jasper J.; Sinke, Richard; Sistermans, Erik; van Slegtenhorst, Marjon; Sleutels, Frank; van der Stoep, Nienke; van Tienhoven, Marianne; Vermaat, Martijn; Vogel, Maartje; Waisfisz, Quinten; Weiss, Janneke Marjan; van den Wijngaard, Arthur; van Workum, Wilbert; Ijntema, Helger; van der Zwaag, Bert; van IJcken, Wilfred F. J.; den Dunnen, Johan T.; Veltman, Joris A.; Hennekam, Raoul; Cuppen, Edwin

    2015-01-01

    Implementation of next-generation DNA sequencing (NGS) technology into routine diagnostic genome care requires strategic choices. Instead of theoretical discussions on the consequences of such choices, we compared NGS-based diagnostic practices in eight clinical genetic centers in the Netherlands, b

  8. Application of a Novel "Pan-Genome"-Based Strategy for Assigning RNAseq Transcript Reads to Staphylococcus aureus Strains.

    Science.gov (United States)

    Chaves-Moreno, Diego; Wos-Oxley, Melissa L; Jáuregui, Ruy; Medina, Eva; Oxley, Andrew P A; Pieper, Dietmar H

    2015-01-01

    Understanding the behaviour of opportunistic pathogens such as Staphylococcus aureus in their natural human niche holds great medical interest. With the development of sensitive molecular methods and deep-sequencing technology, it is now possible to robustly assess the global transcriptome of bacterial species in their human habitat. However, as the genomes of the colonizing strains are often unavailable, compiling the pan-genome of the species of interest may provide an effective way to reliably and rapidly compile the transcriptome of a bacterial species. The pan-genome of S. aureus, with its associated core and accessory components, was compiled from 25 genomes and comprises a total of 65,557 proteins clustering into 4,198 Orthologous Groups (OGs). The generated gene catalogue was used to assign RNAseq-derived sequence reads to S. aureus in a variety of in vitro and in vivo samples. In all cases, more reads could be assigned to S. aureus using the OG database than using a reference genome. Growth of two S. aureus strains in synthetic nasal medium confirmed that both strains experienced strong iron starvation. Traits such as purine metabolism appeared to be more affected in a typical nasal colonizer than in a strain representative of the S. aureus USA300 lineage. Mapping sequencing reads from a metatranscriptome generated from the human anterior nares allowed the identification of genes highly expressed by S. aureus in vivo. The OG database generated in this study represents a useful tool to obtain a snapshot of the functional attributes of S. aureus under different in vitro and in vivo conditions. The approach proved advantageous for assigning sequencing reads to bacterial strains when RNAseq data are derived from samples where strain information and/or the corresponding genome(s) are unavailable.

  10. Application of a Novel "Pan-Genome"-Based Strategy for Assigning RNAseq Transcript Reads to Staphylococcus aureus Strains.

    Directory of Open Access Journals (Sweden)

    Diego Chaves-Moreno

    Understanding the behaviour of opportunistic pathogens such as Staphylococcus aureus in their natural human niche holds great medical interest. With the development of sensitive molecular methods and deep-sequencing technology, it is now possible to robustly assess the global transcriptome of bacterial species in their human habitat. However, as the genomes of the colonizing strains are often unavailable, compiling the pan-genome of the species of interest may provide an effective way to reliably and rapidly compile the transcriptome of a bacterial species. The pan-genome of S. aureus, with its associated core and accessory components, was compiled from 25 genomes and comprises a total of 65,557 proteins clustering into 4,198 Orthologous Groups (OGs). The generated gene catalogue was used to assign RNAseq-derived sequence reads to S. aureus in a variety of in vitro and in vivo samples. In all cases, more reads could be assigned to S. aureus using the OG database than using a reference genome. Growth of two S. aureus strains in synthetic nasal medium confirmed that both strains experienced strong iron starvation. Traits such as purine metabolism appeared to be more affected in a typical nasal colonizer than in a strain representative of the S. aureus USA300 lineage. Mapping sequencing reads from a metatranscriptome generated from the human anterior nares allowed the identification of genes highly expressed by S. aureus in vivo. The OG database generated in this study represents a useful tool to obtain a snapshot of the functional attributes of S. aureus under different in vitro and in vivo conditions. The approach proved advantageous for assigning sequencing reads to bacterial strains when RNAseq data are derived from samples where strain information and/or the corresponding genome(s) are unavailable.

  11. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA

    Directory of Open Access Journals (Sweden)

    Chan Alan

    2006-06-01

    Background: Current methods for measuring copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH), we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach that compares the intensity of Cy3-labelled MAPH probes amplified from test samples with that of similarly amplified, co-hybridised Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that amplified probes are discriminated by sequence rather than by fragment length. Results: In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH, and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of whom were known to have Charcot-Marie-Tooth disease type 1A (CMT1A). Conclusion: Microarray-based endpoints for MAPH appear to be of comparable accuracy to electrophoretic methods and hold the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or for expanded sets of probes from dispersed genomic locations.
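
    The dual-label comparison can be illustrated with a short Python sketch. It is not the authors' analysis pipeline: the median normalisation against probes assumed to be present at two copies, the probe names and the intensity values are all hypothetical. Because CMT1A is usually caused by a duplication that includes PMP22, probes in that gene would be expected to report about three copies in affected individuals.

```python
from statistics import median


def estimate_copy_number(cy3, cy5, control_probes, ploidy=2):
    """Estimate per-probe copy number from paired Cy3 (test) and Cy5 (reference) intensities.

    Assumption (not from the source): probes in `control_probes` sit at `ploidy`
    copies in the test sample, so their median Cy3/Cy5 ratio defines normal dosage.
    """
    ratios = {probe: cy3[probe] / cy5[probe] for probe in cy3}
    scale = median(ratios[probe] for probe in control_probes)
    if scale <= 0:
        raise ValueError("control probes must have a positive median ratio")
    return {probe: ploidy * ratio / scale for probe, ratio in ratios.items()}


# Hypothetical intensities: three control probes and one PMP22 probe
cy3 = {"ctrl_1": 1000.0, "ctrl_2": 1100.0, "ctrl_3": 900.0, "PMP22_ex1": 1500.0}
cy5 = {"ctrl_1": 1000.0, "ctrl_2": 1100.0, "ctrl_3": 900.0, "PMP22_ex1": 1000.0}
copies = estimate_copy_number(cy3, cy5, ["ctrl_1", "ctrl_2", "ctrl_3"])
print({probe: round(cn, 2) for probe, cn in copies.items()})
```

    Normalising to the control-probe median rather than to a single probe makes the estimate less sensitive to one badly hybridised control.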

  12. Family-Based Genome-Wide Association Scan of Attention-Deficit/Hyperactivity Disorder

    Science.gov (United States)

    Mick, Eric; Todorov, Alexandre; Smalley, Susan; Hu, Xiaolan; Loo, Sandra; Todd, Richard D.; Biederman, Joseph; Byrne, Deirdre; Dechairo, Bryan; Guiney, Allan; McCracken, James; McGough, James; Nelson, Stanley F.; Reiersen, Angela M.; Wilens, Timothy E.; Wozniak, Janet; Neale, Benjamin M.; Faraone, Stephen V.

    2010-01-01

    Objective: Genes likely play a substantial role in the etiology of attention-deficit/hyperactivity disorder (ADHD). However, the genetic architecture of the disorder is unknown, and prior genome-wide association studies (GWAS) have not identified a genome-wide significant association. We have conducted a third, independent, multisite GWAS of…

  13. An international collaborative family-based whole genome quantitative trait linkage scan for myopic refractive error

    DEFF Research Database (Denmark)

    Abbott, Diana; Li, Yi-Ju; Guggenheim, Jeremy A;

    2012-01-01

    To investigate quantitative trait loci linked to refractive error, we performed a genome-wide quantitative trait linkage analysis using single nucleotide polymorphism markers and family data from five international sites.

  14. Rapid development of PCR-based genome-specific repetitive DNA junction markers in wheat

    Science.gov (United States)

    In hexaploid wheat (Triticum aestivum L.) (AABBDD, C = 17,000 Mb), repeat DNA accounts for ~90% of the genome, of which transposable elements (TEs) constitute 60-80%. Despite the dynamic evolution of TEs, our previous study indicated that the majority of TEs between the homologous wheat genomes are co...

  15. Segmenting the human genome based on states of neutral genetic divergence.

    Science.gov (United States)

    Kuruppumullage Don, Prabhani; Ananda, Guruprasad; Chiaromonte, Francesca; Makova, Kateryna D

    2013-09-03

    Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states, each uniquely characterized by a specific combination of divergence levels. We then parse the mutagenic contributions of various biochemical processes by associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with the lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants, including those associated with cancer and other diseases, and to improve computational predictions of noncoding functional elements.
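
    Segmenting a genome with a hidden Markov model typically relies on the Viterbi algorithm to find the most likely state for each window. The sketch below is a deliberately reduced illustration, not the authors' model: it uses a single divergence value per window with Gaussian emissions (the study combined insertion, deletion, substitution and microsatellite divergence), and the two states, their means, standard deviations and transition probabilities are hypothetical values fixed by hand rather than estimated from data.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution, used as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, means, sds, log_trans, log_start):
    """Most likely hidden-state path for a Gaussian-emission HMM (log space)."""
    n_states = len(means)
    # score[s] = best log-probability of any path that ends in state s
    score = [log_start[s] + gaussian_logpdf(obs[0], means[s], sds[s])
             for s in range(n_states)]
    backpointers = []
    for x in obs[1:]:
        prev = score
        score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + gaussian_logpdf(x, means[s], sds[s]))
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_segments(path):
    """Collapse a per-window state path into (start, end, state) runs; end is exclusive."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((start, i, path[start]))
            start = i
    return segments


# Hypothetical per-window divergence values; state 0 = low, state 1 = high divergence
divergence = [0.010, 0.012, 0.011, 0.030, 0.032, 0.029, 0.010]
stay, switch = math.log(0.9), math.log(0.1)
path = viterbi(divergence, means=[0.010, 0.030], sds=[0.003, 0.003],
               log_trans=[[stay, switch], [switch, stay]],
               log_start=[math.log(0.5), math.log(0.5)])
print(to_segments(path))
```

    The "sticky" transition matrix (high probability of staying in a state) is what turns noisy per-window values into contiguous segments rather than a window-by-window classification.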

  16. Integrating genome-based informatics to modernize global disease monitoring, information sharing, and response

    DEFF Research Database (Denmark)

    Aarestrup, Frank Møller; Brown, Eric W; Detter, Chris;

    2012-01-01

    The rapid advancement of genome technologies holds great promise for improving the quality and speed of clinical and public health laboratory investigations and for decreasing their cost. The latest generation of genome DNA sequencers can provide highly detailed and robust information on disease-...

  17. A proposed genus boundary for the prokaryotes based on genomic insights.

    Science.gov (United States)

    Qin, Qi-Long; Xie, Bin-Bin; Zhang, Xi-Ying; Chen, Xiu-Lan; Zhou, Bai-Cheng; Zhou, Jizhong; Oren, Aharon; Zhang, Yu-Zhong

    2014-06-01

    Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
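
    The genus criterion can be written down directly. This sketch assumes the usual POCP definition, in which the conserved proteins found in each of the two genomes are summed and divided by the two genomes' combined protein counts; how a protein is judged "conserved" (the homology-search cutoffs) is left outside the sketch, and the protein counts in the example are hypothetical.

```python
def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins (POCP) between two genomes.

    conserved_1: proteins of genome 1 with a conserved match in genome 2
    conserved_2: proteins of genome 2 with a conserved match in genome 1
    total_1, total_2: total protein counts of the two genomes
    """
    if total_1 <= 0 or total_2 <= 0:
        raise ValueError("total protein counts must be positive")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def same_genus(pairwise_pocp, threshold=50.0):
    """Genus criterion from the abstract: every pairwise POCP value exceeds 50%."""
    return all(value > threshold for value in pairwise_pocp)


# Hypothetical pair: 2,400 of 4,000 and 1,800 of 3,000 proteins conserved
value = pocp(2400, 1800, 4000, 3000)
print(f"POCP = {value:.1f}%, same genus: {same_genus([value])}")
```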

  18. Efficient CRISPR/Cas9-based Genome Engineering in Human Pluripotent Stem Cells

    Science.gov (United States)

    Kime, Cody; Mandegar, Mohammad A.; Srivastava, Deepak; Yamanaka, Shinya; Conklin, Bruce R.; Rand, Tim A.

    2016-01-01

    Human pluripotent stem cells (hPSCs) are rapidly emerging as a powerful tool for biomedical discovery. The advent of human induced pluripotent stem (hiPS) cells with human embryonic stem (hES) cell-like properties has led to hPSCs with disease-specific genetic backgrounds for in vitro disease modeling, drug discovery, and mechanistic and developmental studies. To fully realize this potential, it will be necessary to modify the genome of hPSCs with precision and flexibility. Pioneering experiments utilizing site-specific double strand break (DSB)-mediated genome engineering tools, including Zinc Finger Nucleases (ZFNs) and Transcription Activator-Like Effector Nucleases (TALENs), have paved the way to genome engineering in previously recalcitrant systems such as hPSCs. However, these methods are technically cumbersome and require significant expertise, which has limited their adoption. A major recent advance involving the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) endonuclease has dramatically simplified the effort required for genome engineering and will likely be adopted widely as the most rapid and flexible system for genome editing in hPSCs. Herein, we describe commonly practiced methods for CRISPR endonuclease genomic editing of hPSCs into cell lines containing genomes altered by Insertion/Deletion (INDEL) mutagenesis or insertion of recombinant genomic DNA. PMID:26724721

  19. Efficient CRISPR/Cas9-Based Genome Engineering in Human Pluripotent Stem Cells.

    Science.gov (United States)

    Kime, Cody; Mandegar, Mohammad A; Srivastava, Deepak; Yamanaka, Shinya; Conklin, Bruce R; Rand, Tim A

    2016-01-01

    Human pluripotent stem cells (hPS cells) are rapidly emerging as a powerful tool for biomedical discovery. The advent of human induced pluripotent stem cells (hiPS cells) with human embryonic stem (hES)-cell-like properties has led to hPS cells with disease-specific genetic backgrounds for in vitro disease modeling and drug discovery as well as mechanistic and developmental studies. To fully realize this potential, it will be necessary to modify the genome of hPS cells with precision and flexibility. Pioneering experiments utilizing site-specific double-strand break (DSB)-mediated genome engineering tools, including zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), have paved the way to genome engineering in previously recalcitrant systems such as hPS cells. However, these methods are technically cumbersome and require significant expertise, which has limited adoption. A major recent advance involving the clustered regularly interspaced short palindromic repeats (CRISPR) endonuclease has dramatically simplified the effort required for genome engineering and will likely be adopted widely as the most rapid and flexible system for genome editing in hPS cells. In this unit, we describe commonly practiced methods for CRISPR endonuclease genomic editing of hPS cells into cell lines containing genomes altered by insertion/deletion (indel) mutagenesis or insertion of recombinant genomic DNA.

  20. Bayesian prediction of bacterial growth temperature range based on genome sequences

    DEFF Research Database (Denmark)

    Jensen, Dan Børge; Vesth, Tammi Camilla; Hallin, Peter Fischer

    2012-01-01

    … on a genomic sequence, would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinguishing between three thermophilicity classes (thermophiles, mesophiles and psychrophiles … and psychrophilic adapted bacterial genomes. …

  1. Genome size and base composition of five Pinus species from the Balkan region.

    Science.gov (United States)

    Bogunic, F; Muratovic, E; Brown, S C; Siljak-Yakovlev, S

    2003-08-01

    The 2C DNA content and base composition of five Pinus (2n = 24) species and two Pinus subspecies from the Balkan region have been estimated by flow cytometry. P. heldreichii (five populations) and P. peuce (one population) were assessed for the first time, as were subspecies of P. nigra (three populations: two of subspecies nigra and one of subspecies dalmatica), along with P. sylvestris and P. mugo from the same region. The 2C DNA values of these Pinus taxa ranged from 42.5 pg to 54.9 pg (41.7-53.8 × 10^9 bp), and the base composition was quite stable (about 39.5% GC). Significant differences were observed between the two subspecies of P. nigra and even between two populations of subsp. nigra. The two other species (P. sylvestris and P. mugo) had 2C values of 42.5 pg and 42.8 pg, respectively, while that of P. peuce was 54.9 pg. These genome sizes are in accordance with published values except for that of P. sylvestris, which was 20% below estimates made by other authors.
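
    The picogram and base-pair figures in this abstract are linked by a single conversion factor. The abstract does not state the factor, but its own numbers (42.5 pg to 41.7 × 10^9 bp, 54.9 pg to 53.8 × 10^9 bp) imply roughly 0.98 × 10^9 bp per pg, which the sketch below assumes. For these diploid (2n = 24) pines, halving a 2C value gives the 1C value.

```python
# Assumption: conversion factor inferred from the abstract's own figures
# (42.5 pg -> 41.7 x 10^9 bp), not stated by the authors.
BP_PER_PG = 0.98e9


def pg_to_bp(picograms, bp_per_pg=BP_PER_PG):
    """Convert a DNA mass in picograms to base pairs."""
    return picograms * bp_per_pg


def one_c_bp(two_c_pg, bp_per_pg=BP_PER_PG):
    """1C genome size in bp from the 2C value of a diploid: half the 2C content."""
    return pg_to_bp(two_c_pg, bp_per_pg) / 2


for species, two_c in [("P. sylvestris", 42.5), ("P. mugo", 42.8), ("P. peuce", 54.9)]:
    print(f"{species}: 2C = {pg_to_bp(two_c) / 1e9:.2f} x 10^9 bp, "
          f"1C = {one_c_bp(two_c) / 1e9:.2f} x 10^9 bp")
```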

  2. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed the phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees under different methods of tree construction, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstruction. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  3. P-value based analysis for shared controls design in genome-wide association studies.

    Science.gov (United States)

    Zaykin, Dmitri V; Kozbur, Damian O

    2010-11-01

    An appealing genome-wide association study design compares one large control group against several disease samples. A pioneering study by the Wellcome Trust Case Control Consortium that employed such a design identified multiple susceptibility regions, many of which have been independently replicated. While reusing a control sample makes effective use of the data, it also creates correlation between association statistics across diseases. An observation of a large association statistic for one of the diseases may greatly increase the chances of observing a spuriously large association for a different disease. Accounting for the correlation is also particularly important when screening for SNPs that might be involved in a set of diseases with overlapping etiology. We describe methods that correct association statistics for dependency due to shared controls, and we describe ways to obtain a measure of overall evidence and to combine association signals across multiple diseases. The methods we describe require no access to individual subject data; instead, they efficiently utilize information contained in P-values for association reported for individual diseases. P-value-based combined tests for association are flexible and essentially as powerful as the approach based on aggregating the individual subject data.
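
    The dependence created by a shared control group can be estimated from sample sizes alone, which is what makes a P-value-only correction feasible. The sketch below is a generic illustration, not necessarily the authors' method. It assumes allele-frequency z-tests under the null hypothesis, where two tests with n_a and n_b cases sharing n_0 controls have correlation sqrt(n_a n_b / ((n_a + n_0)(n_b + n_0))), and it assumes one-sided P-values with effects in the same direction; the sample sizes and P-values are hypothetical.

```python
import math
from statistics import NormalDist


def shared_control_corr(n_cases_a, n_cases_b, n_controls):
    """Null correlation of two case-control z-statistics that share one control group."""
    return math.sqrt(n_cases_a * n_cases_b /
                     ((n_cases_a + n_controls) * (n_cases_b + n_controls)))


def combined_p(one_sided_ps, n_cases, n_controls):
    """Combine one-sided P-values across diseases, correcting for shared controls.

    Each P-value is turned into a z-score; the sum of the z-scores is divided by
    its standard deviation, which includes the shared-control correlations.
    """
    std_normal = NormalDist()
    z = [std_normal.inv_cdf(1.0 - p) for p in one_sided_ps]
    k = len(z)
    variance = 0.0
    for i in range(k):
        for j in range(k):
            variance += 1.0 if i == j else shared_control_corr(n_cases[i], n_cases[j], n_controls)
    z_combined = sum(z) / math.sqrt(variance)
    return 1.0 - std_normal.cdf(z_combined)


# Hypothetical: two diseases with 2,000 cases each and 3,000 shared controls
p_shared = combined_p([0.01, 0.01], [2000, 2000], 3000)

# Naive combination that wrongly treats the two tests as independent
z_single = NormalDist().inv_cdf(1.0 - 0.01)
naive = 1.0 - NormalDist().cdf(2 * z_single / math.sqrt(2))

print(shared_control_corr(2000, 2000, 3000), p_shared, naive)
```

    Ignoring the correlation makes the naive combined P-value smaller than the corrected one, which is the overstatement of evidence the abstract warns about.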

  4. Genome based analyses of six hexacorallian species reject the “naked coral” hypothesis

    KAUST Repository

    Wang, Xin

    2017-09-23

    Scleractinian corals are the foundation species of the coral-reef ecosystem. Their calcium carbonate skeletons form extensive structures that are home to millions of species, making coral reefs one of the most diverse ecosystems of our planet. However, our understanding of how reef-building corals have evolved the ability to calcify and become the ecosystem builders they are today is hampered by uncertain relationships within their subclass Hexacorallia. Corallimorpharians have been proposed to originate from a complex scleractinian ancestor that lost the ability to calcify in response to increasing ocean acidification, suggesting the possibility for corals to lose and gain the ability to calcify in response to increasing ocean acidification. Here we employed a phylogenomic approach using whole-genome data from six hexacorallian species to resolve the evolutionary relationship between reef-building corals and their non-calcifying relatives. Phylogenetic analysis based on 1,421 single-copy orthologs, as well as gene presence/absence and synteny information, converged on the same topologies, showing strong support for scleractinian monophyly and a corallimorpharian sister clade. Our broad phylogenomic approach using sequence-based and sequence-independent analyses provides unambiguous evidence for the monophyly of scleractinian corals and the rejection of corallimorpharians as descendants of a complex coral ancestor.

  5. Rapid in vitro splicing of coding sequences from genomic DNA by isothermal recombination reaction-based PCR

    Directory of Open Access Journals (Sweden)

    Wenxuan Chen

    2016-09-01

    Cloning of the coding sequence (CDS) is an important step in gene function research. Here, we report a simple and efficient strategy for assembling multiple exons into an intron-free CDS from genomic DNA (gDNA) by an isothermal recombination reaction-based PCR (IRR-PCR) method. As an example, a 2067-bp full-length CDS of the anther-specific expression gene OsABCG15, which is composed of seven exons and six introns, was generated by IRR-PCR using genomic DNA of rice leaf as the template. This approach can be widely applied to the assembly of any DNA sequences to achieve CDS cloning, gene fusion and multiple site-directed mutagenesis in functional genomics studies in vitro.

  6. High accuracy genotyping directly from genomic DNA using a rolling circle amplification based assay

    Directory of Open Access Journals (Sweden)

    Du Yuefen

    2003-05-01

    Background: Rolling circle amplification of ligated probes is a simple and sensitive means of genotyping directly from genomic DNA. SNPs and mutations are interrogated with open circle probes (OCPs) that can be circularized by DNA ligase when the probe matches the genotype. An amplified detection signal is generated by exponential rolling circle amplification (ERCA) of the circularized probe. The low cost and scalability of ligation/ERCA genotyping make it ideally suited for automated, high-throughput methods. Results: A retrospective study using human genomic DNA samples of known genotype was performed for four different clinically relevant mutations: Factor V Leiden, Factor II prothrombin, and two hemochromatosis mutations, C282Y and H63D. Greater than 99% accuracy was obtained genotyping genomic DNA samples from hundreds of different individuals. The combined process of ligation/ERCA was performed in a single tube and produced a fluorescent signal directly from genomic DNA in less than an hour. In each assay, the probes for both normal and mutant alleles were combined in a single reaction. Multiple ERCA primers combined with a quenched peptide nucleic acid (Q-PNA) fluorescent detection system greatly accelerated the appearance of signal. Probes designed with hairpin structures reduced misamplification. Genotyping accuracy was identical for purified genomic DNA and for genomic DNA generated using whole genome amplification (WGA). Fluorescent signal output was measured in real time and as an end point. Conclusions: Combining the optimal elements for ligation/ERCA genotyping has resulted in a highly accurate single-tube assay for genotyping directly from genomic DNA samples. Accuracy exceeded 99% for four probe sets targeting clinically relevant mutations. No genotypes were called incorrectly using either genomic DNA or whole genome amplified samples.
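
    Because the probes for the normal and mutant alleles are combined in one reaction, each sample yields two allele-specific signals. One simple way to turn such a pair into a genotype call is sketched below; the calling rule, the signal-fraction cutoffs and the minimum-signal threshold are hypothetical illustrations, not the assay's published thresholds.

```python
def call_genotype(normal_signal, mutant_signal, min_total=100.0, low=0.2, high=0.8):
    """Call a biallelic genotype from two allele-specific fluorescence signals.

    All thresholds are hypothetical: samples with too little total signal are
    not called, and the mutant-signal fraction decides between the three genotypes.
    """
    total = normal_signal + mutant_signal
    if total < min_total:
        return "no call"
    mutant_fraction = mutant_signal / total
    if mutant_fraction < low:
        return "homozygous normal"
    if mutant_fraction > high:
        return "homozygous mutant"
    return "heterozygous"


# Hypothetical end-point signals (arbitrary fluorescence units)
for normal, mutant in [(900.0, 50.0), (500.0, 480.0), (30.0, 900.0), (20.0, 30.0)]:
    print(normal, mutant, call_genotype(normal, mutant))
```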

  7. Genome-enabled Modeling of Microbial Biogeochemistry using a Trait-based Approach. Does Increasing Metabolic Complexity Increase Predictive Capabilities?

    Science.gov (United States)

    King, E.; Karaoz, U.; Molins, S.; Bouskill, N.; Anantharaman, K.; Beller, H. R.; Banfield, J. F.; Steefel, C. I.; Brodie, E.

    2015-12-01

    The biogeochemical functioning of ecosystems is shaped in part by genomic information stored in the subsurface microbiome. Cultivation-independent approaches allow us to extract this information through reconstruction of thousands of genomes from a microbial community. Analysis of these genomes, in turn, gives an indication of the organisms present and their functional roles. However, metagenomic analyses can currently deliver thousands of different genomes that range in abundance/importance, requiring the identification and assimilation of key physiologies and metabolisms to be represented as traits for successful simulation of subsurface processes. Here we focus on incorporating -omics information into BioCrunch, a genome-informed trait-based model that represents the diversity of microbial functional processes within a reactive transport framework. This approach models the rate of nutrient uptake and the thermodynamics of coupled electron donors and acceptors for a range of microbial metabolisms including heterotrophs and chemolithotrophs. Metabolism of exogenous substrates fuels catabolic and anabolic processes, with the proportion of energy used for cellular maintenance, respiration, biomass development, and enzyme production based upon dynamic intracellular and environmental conditions. This internal resource partitioning represents a trade-off against biomass formation and results in microbial community emergence across a fitness landscape. BioCrunch was used here in simulations that included organisms and metabolic pathways derived from a dataset of ~1200 non-redundant genomes reflecting a microbial community in a floodplain aquifer. Metagenomic data were used directly to parameterize trait values related to growth and to identify trait linkages associated with respiration, fermentation, and key enzymatic functions such as plant polymer degradation. Simulations spanned a range of metabolic complexities and highlight benefits originating from simulations

  8. Pedigree-based analysis of derivation of genome segments of an elite rice reveals key regions during its breeding.

    Science.gov (United States)

    Zhou, Degui; Chen, Wei; Lin, Zechuan; Chen, Haodong; Wang, Chongrong; Li, Hong; Yu, Renbo; Zhang, Fengyun; Zhen, Gang; Yi, Junliang; Li, Kanghuo; Liu, Yaoguang; Terzaghi, William; Tang, Xiaoyan; He, Hang; Zhou, Shaochuan; Deng, Xing Wang

    2016-02-01

    Analyses of genome variations with high-throughput assays have improved our understanding of the genetic basis of crop domestication and identified the selected genome regions, but little is known about the genetic basis of modern breeding, which has limited the usefulness of the many elite cultivars in further breeding. Here we deploy pedigree-based analysis of an elite rice, Huanghuazhan, to identify key genome regions during its breeding. The cultivars in the pedigree were resequenced at 7.6× depth on average, and 2.1 million high-quality single nucleotide polymorphisms (SNPs) were obtained. Tracing the derivation of genome blocks with the pedigree and SNP information revealed the chromosomal recombination during breeding and showed that 26.22% of the Huanghuazhan genome consists of strictly conserved key regions. These major-effect regions were further supported by QTL mapping of 260 recombinant inbred lines derived from the cross of Huanghuazhan and a very dissimilar cultivar, Shuanggui 36, and by the genome profiles of eight cultivars and 36 elite lines derived from Huanghuazhan. Overlaying cloned genes on these regions revealed that they include a number of key genes, which were then used to demonstrate how Huanghuazhan was bred over 30 years of effort and to dissect the deficiencies of artificial selection. We conclude that these regions will be helpful for further breeding based on this pedigree and for breeding by design. Our study provides a genetic dissection of modern rice breeding and sheds new light on how to perform genome-wide breeding by design. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  9. A CRISPR/Cas9-based method and primer design tool for seamless genome editing in fission yeast

    Science.gov (United States)

    Rodríguez-López, María; Cotobal, Cristina; Fernández-Sánchez, Oscar; Borbarán Bravo, Natalia; Oktriani, Risky; Abendroth, Heike; Uka, Dardan; Hoti, Mimoza; Wang, Jin; Zaratiegui, Mikel; Bähler, Jürg

    2017-01-01

    In the fission yeast Schizosaccharomyces pombe the prevailing approach for gene manipulations is based on homologous recombination of a PCR product that contains genomic target sequences and a selectable marker. The CRISPR/Cas9 system has recently been implemented in fission yeast, which allows for seamless genome editing without integration of a selection marker or leaving any other genomic ‘scars’. The published method involves manual design of the single guide RNA (sgRNA), and digestion of a large plasmid with a problematic restriction enzyme to clone the sgRNA. To increase the efficiency of this approach, we have established and optimized a PCR-based system to clone the sgRNA without restriction enzymes into a plasmid with a dominant natMX6 (nourseothricin) selection marker. We also provide a web-tool, CRISPR4P, to support the design of the sgRNAs and the primers required for the entire process of seamless DNA deletion. Moreover, we report the preparation of G1-synchronized and cryopreserved S. pombe cells, which greatly increases the efficiency and speed for transformations, and may also facilitate standard gene manipulations. Applying this optimized CRISPR/Cas9-based approach, we have successfully deleted over 80 different non-coding RNA genes, which are generally lowly expressed, and have inserted 7 point mutations in 4 different genomic regions. PMID:28612052
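
    The core constraint behind sgRNA design for the commonly used SpCas9 is that the 20-nt target (protospacer) must sit immediately 5' of an NGG PAM, on either strand. The sketch below only enumerates such sites on both strands; it is not CRISPR4P's algorithm and ignores off-target checks, GC content and the seamless-deletion primers that CRISPR4P also designs. The example sequences are made up.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq, spacer_len=20):
    """List (strand, start, spacer, pam) for every N{spacer_len} + NGG site.

    `start` is the 0-based forward-strand coordinate of the leftmost base of the
    spacer's footprint, for hits on either strand.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            start = match.start()
            if strand == "-":
                # Map a reverse-strand spacer back to forward-strand coordinates
                start = len(strand_seq) - (start + spacer_len)
            sites.append((strand, start, match.group(1), match.group(2)))
    return sites


example = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAAA"
for site in find_spcas9_sites(example):
    print(site)
```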

  10. Clusters versus affinity-based approaches in F. tularensis whole genome search of CTL epitopes.

    Directory of Open Access Journals (Sweden)

    Anat Zvi

    Deciphering the cellular immunome of a bacterial pathogen is challenging due to the enormous number of putative peptidic determinants. State-of-the-art prediction methods developed in recent years make it possible to significantly reduce the number of peptides to be screened, yet the number of remaining candidates for experimental evaluation is still in the tens of thousands, even for a limited coverage of MHC alleles. We have recently established a resource-efficient approach for down-selection of candidates and enrichment of true positives, based on selection of predicted MHC binders located in high-density "hotspots" of putative epitopes. This cluster-based approach was applied to an unbiased, whole-genome search of Francisella tularensis CTL epitopes and was shown to yield a 17-25-fold higher level of responders compared with randomly selected predicted epitopes tested in Kb/Db C57BL/6 mice. In the present study, we further evaluate the cluster-based approach (down to a lower density range) and compare it with the classical affinity-based approach by testing putative CTL epitopes with predicted IC50 values of <10 nM. We demonstrate that while the percentage of responders achieved by both approaches is similar, the profile of responders is different, and the predicted binding affinity of most responders in the cluster-based approach is relatively low (geometric mean of 170 nM), rendering the two approaches complementary. The cluster-based approach is further validated in BALB/c F. tularensis-immunized mice belonging to another allelic restriction (Kd/Dd) group. To date, the cluster-based approach has yielded over 200 novel F. tularensis peptides eliciting a cellular response, all of which were verified as MHC class I binders, thereby substantially increasing the F. tularensis dataset of known CTL epitopes. The generality and power of the high density cluster-based approach suggest that it can be a valuable tool for identification of novel CTLs in
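
    Selecting predicted binders that fall in high-density "hotspots" can be sketched as a sliding-window count over binder positions along a protein. The window length, the minimum number of binders per window and the example positions are hypothetical; the abstract does not give the density measure the authors used.

```python
def dense_windows(binder_starts, protein_length, window=20, min_binders=4):
    """Return (window_start, count) for every window holding >= min_binders predicted binders."""
    hits = []
    for w_start in range(max(protein_length - window + 1, 1)):
        count = sum(1 for s in binder_starts if w_start <= s < w_start + window)
        if count >= min_binders:
            hits.append((w_start, count))
    return hits


def merge_windows(windows, window=20):
    """Merge overlapping dense windows (sorted by start) into hotspot regions [start, end)."""
    regions = []
    for w_start, _ in windows:
        w_end = w_start + window
        if regions and w_start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], w_end)
        else:
            regions.append([w_start, w_end])
    return [tuple(region) for region in regions]


# Hypothetical start positions of predicted MHC class I binders in a 100-residue protein
binders = [10, 12, 14, 16, 80]
windows = dense_windows(binders, protein_length=100)
print(merge_windows(windows))
```

    The isolated binder at position 80 is dropped, which mirrors the idea of prioritising clustered candidates over scattered ones regardless of their individual predicted affinity.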

  11. GenExp: an interactive web-based genomic DAS client with client-side data rendering.

    Directory of Open Access Journals (Sweden)

    Bernat Gel Moreno

    Full Text Available BACKGROUND: The Distributed Annotation System (DAS) offers a standard protocol for sharing and integrating annotations on biological sequences. There are more than 1000 DAS sources available, and the number is steadily increasing. Clients are an essential part of the DAS system: they integrate data from several independent sources to create a useful representation for the user. While web-based DAS clients exist, most of them lack direct interaction capabilities such as dragging and zooming with the mouse. RESULTS: Here we present GenExp, a web-based and fully interactive visual DAS client. GenExp is a genome-oriented DAS client capable of creating informative representations of genomic data at any scale, from base level out to complete chromosomes. It proposes a novel approach to genomic data rendering and uses the latest HTML5 web technologies to create the data representation inside the client browser. Thanks to client-side rendering, most position changes do not need a network request to the server, so responses to zooming and panning are almost immediate. In GenExp, the genome can be explored intuitively by dragging it with the mouse, as in geographical map applications. Additionally, GenExp can show more than one data viewer at the same time and can save the current state of the application so that it can be revisited later. CONCLUSIONS: GenExp is a new interactive web-based client for DAS that addresses some of the shortcomings of the existing clients. It uses client-side data rendering techniques, making genome browsing and exploration easier. GenExp is open source under the GPL license and is freely available at http://gralggen.lsi.upc.edu/recerca/genexp.
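
The near-immediate zooming and panning described above comes from keeping data on the client and re-rendering locally. A minimal sketch of that caching idea, assuming a hypothetical `fetch` callable that stands in for a DAS features request: fetch a padded window once, then answer any view that stays inside it without another request. GenExp itself runs in the browser on HTML5; this Python version only illustrates the logic, and the padding factor is an arbitrary assumption.

```python
class RegionCache:
    """Fetch a padded genomic window once; serve views inside it locally."""

    def __init__(self, fetch, padding=1.0):
        self.fetch = fetch        # hypothetical callable: (start, end) -> [(pos, feature), ...]
        self.padding = padding    # extra span fetched on each side, as a fraction of the view
        self.lo = self.hi = None
        self.features = []
        self.requests = 0         # number of (simulated) network requests made

    def view(self, start, end):
        # Only go to the "server" when the requested view leaves the cached window.
        if self.lo is None or start < self.lo or end > self.hi:
            pad = int((end - start) * self.padding)
            self.lo, self.hi = max(0, start - pad), end + pad
            self.features = self.fetch(self.lo, self.hi)
            self.requests += 1
        return [f for f in self.features if start <= f[0] < end]


fake_fetch = lambda lo, hi: [(p, "feature_%d" % p) for p in range(lo, hi, 10)]
cache = RegionCache(fake_fetch)
cache.view(1000, 2000)   # first view triggers one fetch
cache.view(1500, 2500)   # pan within the cached window: no new request
```

Padding trades a larger initial transfer for fewer round trips while the user drags the view.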

  12. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes.

    Science.gov (United States)

    Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Minuchehr, Zarrin; Nassiri, Mohammad Reza

    2015-03-01

    In recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling these data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses linear referencing, one of the GIS functionalities, to represent genes as linear events on the genome layer: users can define or change the attributes of genes in an event table and interactively see the gene events on the genome layer. The application adopts an ontology to represent and store genomic data in a semantic framework, which facilitates data sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the annotated genome data were prepared and stored in a MySQL database. The second step involved connecting the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We designed this application specifically to manage the annotated genome data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.
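
In linear referencing, each gene is stored as an event with a start and end measure along a route, which here is the genome. The plain-Python sketch below shows that event-table model and a range query. It does not use ArcGIS's own linear referencing tools, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneEvent:
    """One row of a hypothetical event table: a gene as a linear event on a genome route."""
    gene: str
    from_measure: int  # start coordinate along the genome (the "route")
    to_measure: int    # end coordinate along the genome


def events_in_range(events, start, end):
    """Return genes whose [from_measure, to_measure] interval overlaps [start, end]."""
    return [e.gene for e in events
            if e.from_measure <= end and e.to_measure >= start]


table = [GeneEvent("g1", 100, 500), GeneEvent("g2", 600, 900), GeneEvent("g3", 950, 1200)]
print(events_in_range(table, 450, 700))  # ['g1', 'g2']
```

Editing a gene's attributes then means updating one row; the map layer is simply redrawn from the table.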

  13. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions

    Science.gov (United States)

    Burton, Joshua N.; Adey, Andrew; Patwardhan, Rupali P.; Qiu, Ruolan; Kitzman, Jacob O.; Shendure, Jay

    2014-01-01

    Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of H. sapiens and key model organisms generated by the Human Genome Project. To address this, we need scalable, cost-effective methods enabling chromosome-scale contiguity. Here we show that genome-wide chromatin interaction datasets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving, for human, 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes. PMID:24185095
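
As a toy illustration of how Hi-C links can assign contigs to chromosome groups, the sketch below repeatedly merges the pair of groups with the highest link density. It assumes link counts have already been tallied per contig pair, and it normalizes by the product of group lengths. Both the normalization and the greedy merge rule are simplifying assumptions, not the published algorithm, which also orders and orients scaffolds within each group.

```python
from itertools import combinations


def cluster_contigs(links, lengths, n_groups):
    """Greedy agglomerative grouping of contigs by Hi-C link density.

    links:   {frozenset({contig_a, contig_b}): number of Hi-C read pairs}
    lengths: {contig: length}, used here as a hypothetical normalizer
    """
    groups = {c: {c} for c in lengths}

    def density(a, b):
        total = sum(links.get(frozenset((x, y)), 0)
                    for x in groups[a] for y in groups[b])
        size_a = sum(lengths[x] for x in groups[a])
        size_b = sum(lengths[y] for y in groups[b])
        return total / (size_a * size_b)

    while len(groups) > n_groups:
        # Merge the two groups that are most densely linked to each other.
        a, b = max(combinations(groups, 2), key=lambda pair: density(*pair))
        groups[a] |= groups.pop(b)
    return sorted(sorted(g) for g in groups.values())


links = {frozenset(("A", "B")): 50, frozenset(("C", "D")): 40,
         frozenset(("A", "C")): 1, frozenset(("B", "D")): 2}
print(cluster_contigs(links, {"A": 1, "B": 1, "C": 1, "D": 1}, n_groups=2))
# [['A', 'B'], ['C', 'D']]
```

Normalizing matters because long contigs collect more links simply by being long; without it they would absorb unrelated neighbours.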

  14. Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector

    OpenAIRE

    Kabadi, Ami M.; Ousterout, David G.; Hilton, Isaac B.; Gersbach, Charles A.

    2014-01-01

    Engineered DNA-binding proteins that manipulate the human genome and transcriptome have enabled rapid advances in biomedical research. In particular, the RNA-guided CRISPR/Cas9 system has recently been engineered to create site-specific double-strand breaks for genome editing or to direct targeted transcriptional regulation. A unique capability of the CRISPR/Cas9 system is multiplex genome engineering by delivering a single Cas9 enzyme and two or more single guide RNAs (sgRNAs) targeted to di...

  16. DNA barcoding: a genomic-based tool for authentication of phytomedicinals and its products

    Directory of Open Access Journals (Sweden)

    Balachandran KRS

    2015-12-01

    Full Text Available Karpaga Raja Sundari Balachandran, Saravanan Mohanasundaram, Sathishkumar Ramalingam (Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, Tamil Nadu, India). Abstract: DNA barcoding helps to identify plant materials based on short, standardized gene sequences in a rapid, accurate, and cost-effective manner. Recent reports reveal that DNA barcoding can be used for the assignment of unknown specimens to a taxonomic group, the authentic identification of phytomedicinals, and plant biodiversity conservation. Research indicates that there is no single universal barcode candidate for identifying all plant groups. Hence, comparative analysis of plant barcode loci is essential for choosing the best candidate for authenticating a particular medicinal plant genus or family. Currently, both chloroplast and nuclear regions are used as universal barcodes for the authentication of phytomedicinals. Recent advances in genomics have further enhanced the progress of DNA barcoding in plants through high-throughput techniques such as next-generation sequencing, which has paved the way for complete plastome sequencing, the products of which are now termed super-barcodes. These approaches could improve traditional ethnobotanical and scientific knowledge of phytomedicinals and their safe use. The current focus is therefore on investigating the integrity and authenticity of phytomedicinals and herbal products through DNA barcoding, with the goal of protecting consumers from potential health risks associated with product substitution and contamination. Keywords: phytomedicinals, DNA barcoding, NGS, super-barcodes, authentication, ethno-genetics
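
Assigning an unknown specimen with a barcode amounts to comparing its sequence with a reference library and accepting the closest match only if it is close enough. The sketch below uses uncorrected p-distance on already-aligned, equal-length sequences. The reference sequences, labels and distance threshold are hypothetical, and real studies typically align sequences first and may use other distance or tree-based criteria.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_specimen(query, references, max_distance=0.02):
    """Return (best_label, distance), or (None, distance) if no reference is close enough."""
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        d = p_distance(query, ref)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


# Hypothetical 10-bp "barcodes", far shorter than real barcode loci.
refs = {"species_X": "ACGTACGTAC", "species_Y": "TTGTACGTAA"}
print(assign_specimen("ACGTACGTAA", refs, max_distance=0.15))  # ('species_X', 0.1)
```

Returning `None` when nothing is within the threshold reflects the point above that no single locus resolves every plant group.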

  17. Using Array-Based Comparative Genomic Hybridization to Diagnose Pallister-Killian Syndrome.

    Science.gov (United States)

    Lee, Mi Na; Lee, Jiwon; Yu, Hee Joon; Lee, Jeehun; Kim, Sun Hee

    2017-01-01

    Pallister-Killian syndrome (PKS) is a rare multisystem disorder characterized by isochromosome 12p and tissue-limited mosaic tetrasomy 12p. In this study, we diagnosed three pediatric patients suspected of having PKS using array-based comparative genomic hybridization (array CGH) and FISH analyses performed on peripheral lymphocytes. Patients 1 and 2 presented with craniofacial dysmorphic features, hypotonia, and developmental delay. Array CGH revealed two to three copies of 12p in patient 1 and three copies in patient 2. FISH analysis showed trisomy or tetrasomy 12p. Patient 3, who had clinical features comparable to those of patients 1 and 2, was diagnosed using FISH analysis alone. Here, we report three patients with mosaic tetrasomy 12p. Previously reported cases were diagnosed only by chromosome analysis and FISH analysis on skin fibroblasts or amniotic fluid. To our knowledge, patient 1 is the first case in Korea diagnosed by array CGH performed on peripheral lymphocytes.
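
Why a mosaic tetrasomy can read as "two to three copies" on an array follows from simple averaging. If a fraction f of cells carries four copies of 12p and the rest carry two, the average copy number is 2 + 2f, so an apparent two to three copies corresponds to roughly 0% to 50% abnormal cells. The sketch below encodes that idealized relation. It assumes a diploid reference and ignores the ratio compression commonly seen on real arrays, so it is an illustration, not a calibrated estimator.

```python
import math


def expected_log2_ratio(mosaic_fraction, gained_copies=4, ploidy=2):
    """Idealized log2(test/reference) for a region carried in `gained_copies`
    copies by a fraction of cells and in `ploidy` copies by the rest."""
    mean_copies = (1 - mosaic_fraction) * ploidy + mosaic_fraction * gained_copies
    return math.log2(mean_copies / ploidy)


def mosaic_fraction_from_ratio(log2_ratio, gained_copies=4, ploidy=2):
    """Invert expected_log2_ratio to estimate the abnormal-cell fraction."""
    return (2 ** log2_ratio - 1) * ploidy / (gained_copies - ploidy)


# Full (non-mosaic) tetrasomy doubles the copy number: log2 ratio of 1.
print(expected_log2_ratio(1.0))  # 1.0
```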

  18. Stochastic segmentation models for array-based comparative genomic hybridization data analysis.

    Science.gov (United States)

    Lai, Tze Leung; Xing, Haipeng; Zhang, Nancy

    2008-04-01

    Array-based comparative genomic hybridization (array-CGH) is a high-throughput, high-resolution technique for studying the genetics of cancer. Analysis of array-CGH data typically involves estimating the underlying chromosome copy numbers from the log fluorescence ratios and segmenting the chromosome into regions with the same copy number at each location. For the analysis of array-CGH data, we propose a new stochastic segmentation model and an associated estimation procedure that has attractive statistical and computational properties. An important benefit of this Bayesian segmentation model is that it yields explicit formulas for posterior means, which can be used to estimate the signal directly without performing segmentation. Other quantities relating to the posterior distribution that are useful for providing confidence assessments of any given segmentation can also be estimated using our method. We propose an approximation method whose computation time is linear in sequence length, which makes our method practically applicable to the new higher-density arrays. Simulation studies and applications to real array-CGH data illustrate the advantages of the proposed approach.
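
The sketch below illustrates the kind of model described, under assumptions stated here: the mean log ratio is piecewise constant, at each probe it jumps to a fresh N(mu, v) level with probability p, and each observation adds Gaussian noise with variance sigma2. For brevity it computes only forward-filtered means E[theta_t | y_1..t] by tracking a mixture over the most recent change point, at O(n^2) cost. The paper's posterior means use the whole sequence and come with a linear-time approximation, neither of which is reproduced here, and all parameter values are illustrative.

```python
import math


def _normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def filtered_means(y, p=0.01, mu=0.0, v=1.0, sigma2=0.04):
    """Forward-filtered posterior means under a change-point model in which
    the level jumps to a fresh N(mu, v) draw with probability p at each probe."""
    comps = []  # one entry per candidate last change point: [weight, sum_y, count]
    out = []
    for t, yt in enumerate(y):
        if t == 0:
            prior = [[1.0, 0.0, 0]]
        else:
            # Either the current segment continues, or a new one starts at t.
            prior = [[(1 - p) * w, s, n] for w, s, n in comps] + [[p, 0.0, 0]]
        updated = []
        for w, s, n in prior:
            prec = 1 / v + n / sigma2                  # posterior precision of the level
            m = (mu / v + s / sigma2) / prec           # posterior mean of the level
            like = _normal_pdf(yt, m, 1 / prec + sigma2)  # predictive density of y_t
            updated.append([w * like, s + yt, n + 1])
        total = sum(c[0] for c in updated)
        comps = [[w / total, s, n] for w, s, n in updated]
        out.append(sum(w * (mu / v + s / sigma2) / (1 / v + n / sigma2)
                       for w, s, n in comps))
    return out


signal = [0.0] * 20 + [2.0] * 20   # a synthetic single-step "copy-number gain"
means = filtered_means(signal)
```

Because the means are computed directly from the mixture, no explicit segmentation step is needed, which mirrors the benefit highlighted in the abstract.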

  19. [Optimization of genomic DNA extraction with magnetic bead-based semi-automatic system].

    Science.gov (United States)

    Ling, Jie; Wang, Hao; Zhang, Shuai; Zhang, Dan-dan; Lai, Mao-de; Zhu, Yi-min

    2012-05-01

    To develop a rapid and effective method for genomic DNA extraction with a magnetic bead-based semi-automatic system. DNA was extracted from whole blood samples semi-automatically with an automatic nucleic acid extraction system. The concentration and purity of the samples were determined by UV spectrophotometry. An orthogonal design was used to analyze the main effects of lysis time, blood volume, magnetic bead quantity and ethanol concentration on DNA yield, as well as the two-way interactions of these factors. Lysis time, blood volume, magnetic bead quantity and ethanol concentration were associated with DNA yield (P…). DNA yield was higher with 15 min of lysis time, 100 μl of blood, 80 μl of magnetic beads and 80% ethanol. A significant association was found between magnetic bead quantity and DNA purity OD260/OD280 (P=0.008). An interaction between blood volume and lysis time was also found (P=0.013). DNA purity was better with 40 μl of magnetic beads, 15 min of lysis time and 100 μl of blood. Magnetic beads and ethanol concentration were associated with DNA purity OD260/OD230 (P=0.017 and P…) … genomic DNA from the whole blood samples.
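
Main effects in an orthogonal design are usually read off as the mean response at each level of each factor, and the spread between level means (the "range") ranks the factors. The abstract does not name the orthogonal array used, so the standard L9(3^4) array, which fits four three-level factors such as lysis time, blood volume, bead quantity and ethanol concentration, is assumed here purely for illustration. The responses in the usage line are synthetic, not the study's data.

```python
# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, levels coded 1..3 (assumed design).
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]


def main_effects(design, responses, levels=3):
    """Level means and range (max - min of level means) for each factor."""
    effects = []
    for f in range(len(design[0])):
        means = []
        for lvl in range(1, levels + 1):
            vals = [r for row, r in zip(design, responses) if row[f] == lvl]
            means.append(sum(vals) / len(vals))
        effects.append({"level_means": means, "range": max(means) - min(means)})
    return effects


# Synthetic response that depends only on factor 1; orthogonality keeps the
# other factors' level means equal, so their ranges come out as zero.
synthetic = [10 + 2 * (row[0] - 1) for row in L9]
for i, eff in enumerate(main_effects(L9, synthetic), start=1):
    print(i, eff["level_means"], eff["range"])
```

Two-way interactions, which the study also examined, need either interaction columns in a larger array or a regression model; they are not estimated by this range analysis.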

  20. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties.

    Science.gov (United States)

    Yue, Zhenyu; Zhang, Wenna; Lu, Yongming; Yang, Qiaoyue; Ding, Qiuying; Xia, Junfeng; Chen, Yan

    2015-01-01

    Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures that can be used as templates for constructing novel drugs with enhanced antitumor activity. Traditional research approaches studied the structure-activity relationships of natural products and obtained key structural properties, such as chemical bonds or groups, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural product responses across a panel of cancer cell lines based on both gene expression and the chemical properties of natural products. The results on two datasets, a training set and an independent test set, show that the proposed method yields significantly better prediction accuracy. In addition, we demonstrate the predictive power of the method by modeling cancer cell sensitivity to two natural products, curcumin and resveratrol, which indicates that the method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  1. A genome-wide association study of neuroticism in a population-based sample.

    Directory of Open Access Journals (Sweden)

    Federico C F Calboli

    Full Text Available Neuroticism is a moderately heritable personality trait considered to be a risk factor for developing major depression, anxiety disorders and dementia. We performed a genome-wide association study in 2,235 participants drawn from a population-based study of neuroticism, making this the largest association study for neuroticism to date. Neuroticism was measured by the Eysenck Personality Questionnaire. After quality control, we analysed 430,000 autosomal SNPs together with an additional 1.2 million SNPs imputed with high quality from the HapMap CEU samples. We found a very small effect of population stratification, corrected using one principal component, and some cryptic kinship that required no correction. NKAIN2 showed suggestive evidence of association with neuroticism as a main effect (p < 10^-6), and GPC6 showed suggestive evidence for interaction with age (p ≈ 10^-7). We found support for one previously reported association (PDE4D) but failed to replicate other recent reports. These results suggest that common SNP variation does not strongly influence neuroticism. Our study was powered to detect almost all SNPs explaining at least 2% of heritability, so our results effectively exclude the existence of loci having a major effect on neuroticism.
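
The abstract does not say how the size of the stratification effect was judged. A common way to quantify it in GWAS is the genomic-control inflation factor lambda: the median observed 1-df chi-square statistic divided by its expected median under the null, about 0.4549. Values near 1 indicate little inflation. The sketch below, which assumes two-sided 1-df p-values as input, is offered only as an illustration of that standard metric, not as the authors' analysis.

```python
from statistics import NormalDist, median

# Median of the 1-df chi-square distribution: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_from_p(p_value):
    """Convert a two-sided p-value from a 1-df test into its chi-square statistic.
    Note: for extremely small p-values, 1 - p/2 rounds to 1.0 in floating point."""
    return NormalDist().inv_cdf(1 - p_value / 2) ** 2


def genomic_inflation(p_values):
    """Genomic-control lambda: median observed chi-square / expected null median."""
    return median(chi2_from_p(p) for p in p_values) / CHI2_1DF_MEDIAN


# Uniformly spread p-values (no inflation) give lambda = 1.
print(genomic_inflation([0.1, 0.3, 0.5, 0.7, 0.9]))
```

Using the median rather than the mean keeps lambda robust to the few true associations expected in the tail.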

  2. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties

    Directory of Open Access Journals (Sweden)

    Zhenyu Yue

    2015-11-01

    Full Text Available Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures that can be used as templates for constructing novel drugs with enhanced antitumor activity. Traditional research approaches studied the structure-activity relationships of natural products and obtained key structural properties, such as chemical bonds or groups, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural product responses across a panel of cancer cell lines based on both gene expression and the chemical properties of natural products. The results on two datasets, a training set and an independent test set, show that the proposed method yields significantly better prediction accuracy. In addition, we demonstrate the predictive power of the method by modeling cancer cell sensitivity to two natural products, curcumin and resveratrol, which indicates that the method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  3. Geosmin induces genomic instability in the mammalian cell microplate-based comet assay.

    Science.gov (United States)

    Silva, Aline Flor; Lehmann, Mauricio; Dihl, Rafael Rodrigues

    2015-11-01

    Geosmin (GEO) (trans-1,10-dimethyl-trans-9-decalol) is a metabolite that imparts an earthy, musty taste and odor to water. Data on GEO genotoxicity in mammalian cells are scarce in the literature. Thus, the present study assessed the genotoxicity of GEO in Chinese hamster ovary (CHO) cells using the microplate-based comet assay. The percentage of DNA in the tail (tail intensity, TI), tail moment (TM), and tail length (TL) were used as parameters for DNA damage assessment. The results demonstrated that GEO concentrations of 30 and 60 μg/mL were genotoxic to CHO cells after both 4- and 24-h exposure periods in all three parameters evaluated (TI, TM, and TL). Additionally, GEO at 15 μg/mL was genotoxic in all three parameters, but only after the 24-h exposure. GEO at 7.5 μg/mL also induced significant DNA damage after the 24-h treatment, but only as measured by TI. The results provide evidence that exposure to GEO may be associated with genomic instability in mammalian cells.
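
The three comet-assay readouts are related. One common definition of tail moment, the "extent" tail moment, is tail length multiplied by the fraction of DNA in the tail. The abstract does not state which tail-moment definition was used (the Olive tail moment is a frequent alternative), so the formula below is an assumption, and the per-cell values in the usage line are hypothetical.

```python
from statistics import mean


def tail_moment(tail_length_um, tail_dna_percent):
    """Extent tail moment: tail length (um) x fraction of DNA in the tail."""
    return tail_length_um * tail_dna_percent / 100.0


def summarize_cells(cells):
    """cells: list of (tail_length_um, tail_dna_percent), one per scored nucleus."""
    return {
        "TL": mean(tl for tl, _ in cells),
        "TI": mean(ti for _, ti in cells),
        "TM": mean(tail_moment(tl, ti) for tl, ti in cells),
    }


print(summarize_cells([(20.0, 30.0), (10.0, 10.0)]))
# {'TL': 15.0, 'TI': 20.0, 'TM': 3.5}
```

Because TM combines both other measures, a treatment can shift TI significantly without moving TM or TL, consistent with the lowest concentration registering only as TI.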

  4. Homology-based double-strand break-induced genome engineering in plants.

    Science.gov (United States)

    Steinert, Jeannette; Schiml, Simon; Puchta, Holger

    2016-07-01

    This review summarises the recent progress in double-strand break (DSB)-induced gene targeting by homologous recombination in plants. We are getting closer to efficiently inserting genes or precisely exchanging single amino acids. Although the basic features of DSB-induced genome engineering were established more than 20 years ago, only in recent years has the technique come into the focus of plant biologists. Today, most scientists apply the recently discovered CRISPR/Cas system to induce site-specific DSBs in genes of interest and obtain mutations by non-homologous end joining (NHEJ), which is the prevailing and often imprecise mechanism of DSB repair in somatic plant cells. However, predefined changes such as the site-specific insertion of foreign genes or the exchange of single amino acids can be achieved by DSB-induced homologous recombination (HR). Although DSB induction drastically enhances the efficiency of HR, that efficiency is still about two orders of magnitude lower than that of NHEJ. Therefore, significant effort has been put into improving DSB-induced HR-based technologies. This review summarises previous studies and discusses the most recent developments in using the CRISPR/Cas system to improve these processes in plants.

  5. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

    Science.gov (United States)

    Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; Davenport, Karen W.; Bishop-Lilly, Kimberly A.; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P.; Chain, Patrick S.G.

    2017-01-01

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. PMID:27899609

  6. An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage

    Directory of Open Access Journals (Sweden)

    Stuart Gary W

    2004-12-01

    Full Text Available Abstract Background Eukaryotic whole genome sequences are accumulating at an impressive rate. Effective methods for comparing multiple whole eukaryotic genomes on a large scale are needed. Most attempted solutions involve the production of large-scale alignments, and many of these require a high-stringency pre-screen for putative orthologs in order to reduce the effective size of the dataset and provide a reasonably high but unknown fraction of correctly aligned homologous sites for comparison. As an alternative, highly efficient methods that do not require the pre-alignment of operationally defined orthologs are also being explored. Results A non-alignment method based on the Singular Value Decomposition (SVD) was used to compare the predicted protein complement of nine whole eukaryotic genomes ranging from yeast to man. This analysis resulted in the simultaneous identification and definition of a large number of well-conserved motifs and gene families, and produced a species tree supporting one of two conflicting hypotheses of metazoan relationships. Conclusions Our SVD-based analysis of the entire protein complement of nine whole eukaryotic genomes suggests that highly conserved motifs and gene families can be identified and effectively compared in a single coherent definition space for the easy extraction of gene and species trees. While this occurs without the explicit definition of orthologs or homologous sites, the analysis can provide a basis for these definitions.

  7. Comparative BAC-based mapping in the white-throated sparrow, a novel behavioral genomics model, using interspecies overgo hybridization

    Directory of Open Access Journals (Sweden)

    Gonser Rusty A

    2011-06-01

    Full Text Available Abstract Background The genomics era has produced an arsenal of resources from sequenced organisms, allowing researchers to target species that do not have comparable mapping and sequence information. These new "non-model" organisms offer unique opportunities to examine environmental effects on genomic patterns and processes. Here we use comparative mapping as a first step in characterizing the genome organization of a novel animal model, the white-throated sparrow (Zonotrichia albicollis), which occurs as white or tan morphs that exhibit alternative behaviors and physiology. Morph is determined by the presence or absence of a complex chromosomal rearrangement. This species is an ideal model for behavioral genomics because the association between genotype and phenotype is absolute, making it possible to identify the genomic bases of phenotypic variation. Findings We initiated a genomic study in this species by characterizing the white-throated sparrow BAC library via filter hybridization with overgo probes designed for the chicken, turkey, and zebra finch. Cross-species hybridization resulted in 640 positive sparrow BACs assigned to 77 chicken loci across almost all macro- and microchromosomes, with a focus on the chromosomes associated with morph. Out of 216 overgos, 36% of the probes hybridized successfully, with an average of 3.0 positive sparrow BACs per overgo. Conclusions These data will be used for determining chromosomal architecture and for fine-scale mapping of candidate genes associated with phenotypic differences. Our research confirms the utility of interspecies hybridization for developing comparative maps in other non-model organisms.

  8. Accelerating genome editing in CHO cells using CRISPR Cas9 and CRISPy, a web-based target finding tool.

    Science.gov (United States)

    Ronda, Carlotta; Pedersen, Lasse Ebdrup; Hansen, Henning Gram; Kallehauge, Thomas Beuchert; Betenbaugh, Michael J; Nielsen, Alex Toftgaard; Kildegaard, Helene Faustrup

    2014-08-01

    Chinese hamster ovary (CHO) cells are widely used in the biopharmaceutical industry as a host for the production of complex pharmaceutical proteins. Thus, genome engineering of CHO cells for improved product quality and yield is of great interest. Here, we demonstrate for the first time the efficacy of CRISPR Cas9 technology in CHO cells by generating site-specific gene disruptions in COSMC and FUT8, both of which encode proteins involved in glycosylation. The tested single guide RNAs (sgRNAs) produced indel frequencies of up to 47.3% in COSMC, while indel frequencies of up to 99.7% in FUT8 were achieved by applying lectin selection. All eight sgRNAs examined in this study resulted in relatively high indel frequencies, demonstrating that the Cas9 system is a robust and efficient genome-editing methodology in CHO cells. Deep sequencing revealed that 85% of the indels created by Cas9 resulted in frameshift mutations at the target sites, with a strong preference for single-base indels. Finally, we have developed a user-friendly bioinformatics tool, named "CRISPy", for rapid identification of sgRNA target sequences in the CHO-K1 genome. The CRISPy tool identified 1,970,449 CRISPR targets distributed across 27,553 genes and lists the number of off-target sites in the genome. In conclusion, the proven functionality of Cas9 for editing CHO genomes, combined with our CRISPy database, has the potential to accelerate genome editing and synthetic biology efforts in CHO cells.
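
The core of an sgRNA target finder is a scan for SpCas9 sites: a 20-nt protospacer immediately followed by an NGG PAM, on either strand. The sketch below is a hypothetical, minimal version of that scan. It is not CRISPy's implementation, and it does not count off-target sites, which the tool additionally reports.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas9_sites(seq, guide_len=20):
    """Return (strand, start, protospacer, pam) for every guide_len-nt site
    followed by an NGG PAM. `start` is the 0-based forward-strand coordinate
    of the leftmost base of the full protospacer+PAM site."""
    seq = seq.upper()
    # A lookahead lets overlapping sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            if strand == "+":
                start = m.start(1)
            else:
                # Map the site's end on the reverse strand back to forward coordinates.
                start = len(seq) - m.end(2)
            sites.append((strand, start, m.group(1), m.group(2)))
    return sites


print(find_cas9_sites("A" * 20 + "TGG"))
# [('+', 0, 'AAAAAAAAAAAAAAAAAAAA', 'TGG')]
```

Counting off-targets would then mean searching the whole genome for near-matches of each protospacer, which is the expensive part a dedicated tool precomputes.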

  9. Genome-based polymorphic microsatellite development and validation in the mosquito Aedes aegypti and application to population genetics in Haiti

    Directory of Open Access Journals (Sweden)

    Streit Thomas G

    2009-12-01

    Full Text Available Abstract Background Microsatellite markers have proven useful in genetic studies in many organisms, yet microsatellite-based studies of the dengue and yellow fever vector mosquito Aedes aegypti have been limited by the number of assayable and polymorphic loci available, despite multiple independent efforts to identify them. Here we present strategies for efficient identification and development of useful microsatellites with broad coverage across the Aedes aegypti genome, development of multiplex-ready PCR groups of microsatellite loci, and validation of their utility for population analysis with field collections from Haiti. Results From 79 putative microsatellite loci representing 31 motifs identified in 42 whole genome sequence supercontig assemblies in the Aedes aegypti genome, 33 microsatellites providing genome-wide coverage amplified as single-copy sequences in four lab strains, with a range of 2-6 alleles per locus. The tri-nucleotide motifs represented the majority (51%) of the polymorphic single-copy loci, and none of these was located within a putative open reading frame. Seven groups of 4-5 microsatellite loci each were developed for multiplex-ready PCR. Four multiplex-ready groups were used to investigate population genetics of Aedes aegypti populations sampled in Haiti. Of the 23 loci represented in these groups, 20 were polymorphic with a range of 3-24 alleles per locus (mean = 8.75). Allelic polymorphic information content varied from 0.171 to 0.867 (mean = 0.545). Most loci met Hardy-Weinberg expectations across populations, and pairwise FST comparisons identified significant genetic differentiation between some populations. No evidence for genetic isolation by distance was observed. Conclusion Despite limited success in previous reports, we demonstrate that the Aedes aegypti genome is well-populated with single-copy, polymorphic microsatellite loci that can be uncovered using the strategy developed here for rapid and efficient
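
The abstract reports polymorphic information content (PIC) values without giving the formula. The widely used definition of Botstein et al. (1980) is assumed below: expected heterozygosity minus a correction for matings in which both parents are heterozygous for the same two alleles. The sketch takes allele counts or frequencies for one locus; the example counts are hypothetical.

```python
from itertools import combinations


def polymorphic_information_content(allele_counts):
    """PIC (Botstein et al., 1980) from allele counts or frequencies at one locus."""
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]
    heterozygosity = 1 - sum(pi ** 2 for pi in p)
    correction = sum(2 * pi ** 2 * pj ** 2 for pi, pj in combinations(p, 2))
    return heterozygosity - correction


# Two equally common alleles: 1 - 0.5 - 2 * 0.25 * 0.25 = 0.375.
print(polymorphic_information_content([10, 10]))  # 0.375
```

Loci with many evenly distributed alleles approach a PIC of 1, which is why the highly polymorphic loci above (up to 24 alleles) reach values near 0.87.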

  10. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming

    2012-06-28

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.
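
The first stage described above, indexing proteins by hash seeds and turning shared seeds into weighted graph edges, can be sketched as follows. As a simplifying assumption, exact k-mers stand in for SECOM's hash seed function, and the value of k and the example sequences are hypothetical. The overlapping community detection (backward graph percolation) step is not shown.

```python
from collections import defaultdict
from itertools import combinations


def seed_graph(proteins, k=3):
    """Index proteins by exact k-mer seeds; weight each edge by the number of
    distinct seeds shared by the two proteins (no all-against-all alignment)."""
    index = defaultdict(set)  # seed -> ids of proteins containing it
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(pid)
    edges = defaultdict(int)
    for pids in index.values():
        for a, b in combinations(sorted(pids), 2):
            edges[(a, b)] += 1
    return dict(edges)


toy = {"p1": "MKVLAAG", "p2": "QQMKVLA", "p3": "WWWWW"}
print(seed_graph(toy))  # {('p1', 'p2'): 3}
```

Only proteins that actually share seeds are ever compared, which is where the speed advantage over all-against-all alignment comes from.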

  11. The Micronutrient Genomics Project: A community-driven knowledge base for micronutrient research

    NARCIS (Netherlands)

    Ommen, B. van; El-Sohemy, A.; Hesketh, J.; Kaput, J.; Fenech, M.; Evelo, C.T.; McArdle, H.J.; Bouwman, J.; Lietz, G.; Mathers, J.C.; Fairweather-Tait, S.; Kranen, H. van; Elliott, R.; Wopereis, S.; Ferguson, L.R.; Méplan, C.; Perozzi, G.; Allen, L.; Rivero, D.

    2010-01-01

    Micronutrients influence multiple metabolic pathways including oxidative and inflammatory processes. Optimum micronutrient supply is important for the maintenance of homeostasis in metabolism and, ultimately, for maintaining good health. With advances in systems biology and genomics technologies, it

  12. Innovative Graphite Oxide-Cellulose Based Material Specific for Genomic DNA Extraction

    Science.gov (United States)

    Akceoglu, Garbis Atam; Li, Oi Lun; Saito, Nagahiro

    2015-11-01

    Extraction of genomic DNA from various types of samples is often challenging for commercial silica spin columns. In this study, we proposed a graphite oxide (GO)/cellulose composite as an alternative material for genomic DNA extraction. The purity of the DNA and the extraction efficiency were compared with those of a commercial silica product. The GO content of the GO/cellulose composite was fixed at 4.15 wt%. Chewed gum, nail clippings, cigarette butt paper, animal tissue and hair samples were used as genomic DNA sources for the extraction experiments. For all sample types, the extraction efficiencies were 4 to 12 times higher than those of the commercial silica spin column. The absorbance ratio at 260 nm to 280 nm (A260/A280) of all samples ranged between 1.6 and 2.0. The results demonstrated that GO/cellulose composites might serve as an innovative solid support material for genomic DNA extraction.

  13. Identification of conserved gene clusters in multiple genomes based on synteny and homology

    Directory of Open Access Journals (Sweden)

    Nikolski Macha

    2011-10-01

    Abstract Background Uncovering the relationship between conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build on a series of works on gene teams and homology teams. Results Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemiascomycete phylum. Conclusions The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. The resulting orthologous clusters are comparable to those obtained by manual curation.
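
    The idea of splitting a family by the neighborhoods of its members can be sketched as follows. This is not SYNS: it omits the conserved-gene-cluster (gene team) computation and instead links two family members when their windows share at least `min_shared` other families. The window size, threshold, data layout and the names `neighborhood`/`split_family` are all hypothetical.

```python
from itertools import combinations


def neighborhood(genome, idx, window):
    """Families of genes within `window` positions of genome[idx], excluding that gene."""
    lo, hi = max(0, idx - window), min(len(genome), idx + window + 1)
    return {genome[j][1] for j in range(lo, hi) if j != idx}


def split_family(genomes, family, window=2, min_shared=2):
    """Split `family` into candidate orthologous sub-families.

    genomes: {name: [(gene_id, family_id), ...]} in chromosomal order; gene IDs
    are assumed unique across genomes. Two members are linked when their
    neighborhoods share at least `min_shared` other families, and the
    sub-families are the connected components of those links."""
    members = {}
    for genome in genomes.values():
        for idx, (gene, fam) in enumerate(genome):
            if fam == family:
                members[gene] = neighborhood(genome, idx, window) - {family}

    parent = {g: g for g in members}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(members, 2):
        if len(members[a] & members[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for g in members:
        groups.setdefault(find(g), set()).add(g)
    return sorted(sorted(s) for s in groups.values())


genomes = {
    "G1": [("a1", "A"), ("x1", "X"), ("f1", "F"), ("y1", "Y"), ("b1", "B")],
    "G2": [("a2", "A"), ("x2", "X"), ("f2", "F"), ("y2", "Y"),
           ("c2", "C"), ("p2", "P"), ("f3", "F"), ("q2", "Q")],
}
print(split_family(genomes, "F"))  # [['f1', 'f2'], ['f3']]
```

    Here the paralog `f3` sits in a different neighborhood from `f1` and `f2`, so it ends up in its own sub-family even though all three belong to family F.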

  15. MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement

    Directory of Open Access Journals (Sweden)

    Zhang Liqing

    2010-01-01

    Abstract Background Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model). Results In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene is deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pair), and MSOAR is then invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 achieves better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, the Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments
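
    The preprocessing step (keep one gene from each set of tandemly duplicated inparalogs before invoking MSOAR) can be sketched on a gene-order list. As an assumption, a caller-supplied predicate stands in for MSOAR 2.0's tree-reconciliation test, and which copy is retained is an arbitrary choice here; `collapse_tandem_runs` is a hypothetical name.

```python
def collapse_tandem_runs(genome, is_inparalog_run=lambda run: True):
    """Keep one gene per tandem run of same-family genes.

    genome: ordered list of (gene_id, family_id). Consecutive genes of one
    family form a tandem run. `is_inparalog_run` stands in for MSOAR 2.0's
    tree-reconciliation test; by default every run is treated as inparalogs."""
    result, i = [], 0
    while i < len(genome):
        j = i
        while j + 1 < len(genome) and genome[j + 1][1] == genome[i][1]:
            j += 1
        run = genome[i:j + 1]
        if len(run) > 1 and is_inparalog_run(run):
            result.append(run[0])  # arbitrary choice of the retained copy
        else:
            result.extend(run)
        i = j + 1
    return result


genome = [("g1", "A"), ("g2", "B"), ("g3", "B"), ("g4", "B"), ("g5", "C"), ("g6", "B")]
print(collapse_tandem_runs(genome))
# [('g1', 'A'), ('g2', 'B'), ('g5', 'C'), ('g6', 'B')]
```

    Note that the non-adjacent copy `g6` is left in place: only tandem (adjacent) duplicates are collapsed, which matches the tandem duplication model described above.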

  16. CREST maps somatic structural variation in cancer genomes with base-pair resolution.

    Science.gov (United States)

    Wang, Jianmin; Mullighan, Charles G; Easton, John; Roberts, Stefan; Heatley, Sue L; Ma, Jing; Rusch, Michael C; Chen, Ken; Harris, Christopher C; Ding, Li; Holmfeldt, Linda; Payne-Turner, Debbie; Fan, Xian; Wei, Lei; Zhao, David; Obenauer, John C; Naeve, Clayton; Mardis, Elaine R; Wilson, Richard K; Downing, James R; Zhang, Jinghui

    2011-06-12

    We developed 'clipping reveals structure' (CREST), an algorithm that uses next-generation sequencing reads with partial alignments to a reference genome to directly map structural variations at the nucleotide level of resolution. Application of CREST to whole-genome sequencing data from five pediatric T-lineage acute lymphoblastic leukemias (T-ALLs) and a human melanoma cell line, COLO-829, identified 160 somatic structural variations. Experimental validation exceeded 80%, demonstrating that CREST had a high predictive accuracy.
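
    The core signal CREST exploits, reads whose alignments are soft-clipped at a breakpoint, can be illustrated by reading CIGAR strings. This minimal sketch only tallies clip positions supported by several reads; CREST's subsequent assembly and remapping of the clipped sequence is not reproduced, and the read tuples, `min_support` threshold and function names are hypothetical. CIGAR operators and 1-based positions follow the SAM convention.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operators that advance along the reference


def clip_sites(pos, cigar):
    """Yield (side, reference position) for soft clips in one alignment.
    pos is the 1-based leftmost aligned base, as in SAM."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    core = [o for o in ops if o[1] != "H"]  # hard clips may flank soft clips
    if core and core[0][1] == "S":
        yield ("left", pos)  # clipped bases precede the first aligned base
    if core and core[-1][1] == "S":
        yield ("right", pos + ref_len - 1)  # clipped bases follow the last aligned base


def candidate_breakpoints(reads, min_support=2):
    """reads: iterable of (chrom, pos, cigar). Returns clip sites seen >= min_support times."""
    counts = Counter()
    for chrom, pos, cigar in reads:
        for side, site in clip_sites(pos, cigar):
            counts[(chrom, side, site)] += 1
    return sorted(key for key, c in counts.items() if c >= min_support)


reads = [
    ("chr1", 100, "20S80M"),
    ("chr1", 100, "30S70M"),
    ("chr1", 50, "50M50S"),
    ("chr1", 60, "40M60S"),
    ("chr2", 10, "100M"),
]
print(candidate_breakpoints(reads))
# [('chr1', 'left', 100), ('chr1', 'right', 99)]
```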

  17. CREST maps somatic structural variation in cancer genomes with base-pair resolution

    OpenAIRE

    2011-01-01

    We developed CREST (Clipping REveals STructure), an algorithm that uses next-generation sequencing reads with partial alignments to a reference genome to directly map structural variations at the nucleotide level of resolution. Application of CREST to whole-genome sequencing data from five pediatric T-lineage acute lymphoblastic leukemias (T-ALLs) and a human melanoma cell line, COLO-829, identified 160 somatic structural variations. Experimental validation exceeded 80%, demonstrating that CRE...

  18. Genome-Based Identification of Chromosomal Regions Specific for Salmonella spp.

    OpenAIRE

    Hansen-Wester, Imke; Hensel, Michael

    2002-01-01

    Acquisition of genomic elements by horizontal gene transfer represents an important mechanism in the evolution of bacterial species. Pathogenicity islands are a subset of horizontally acquired elements present in various pathogens. These elements are frequently located adjacent to tRNA genes. We performed a comparative genome analysis of Salmonella enterica serovars Typhi and Typhimurium and Escherichia coli and scanned tRNA loci for the presence of species-specific, horizontally acquired gen...

  19. Supervised Learning-Based tagSNP Selection for Genome-Wide Disease Classifications

    OpenAIRE

    Yang Mary Qu; Chen Zhongxue; Yang Jack; Liu Qingzhong; Sung Andrew H; Huang Xudong

    2008-01-01

    Abstract Background Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information...

  20. Revisiting the classification of curtoviruses based on genome-wide pairwise identity

    KAUST Repository

    Varsani, Arvind

    2014-01-25

    Members of the genus Curtovirus (family Geminiviridae) are important pathogens of many wild and cultivated plant species. Until recently, relatively few full curtovirus genomes had been characterised. However, with the 19 full genome sequences now available in public databases, we revisit the proposed curtovirus species and strain classification criteria. Using pairwise identities coupled with phylogenetic evidence, revised species and strain demarcation guidelines have been instituted. Specifically, we have established 77% genome-wide pairwise identity as a species demarcation threshold and 94% genome-wide pairwise identity as a strain demarcation threshold. Hence, whereas curtovirus sequences with >77% genome-wide pairwise identity would be classified as belonging to the same species, those sharing >94% identity would be classified as belonging to the same strain. We provide step-by-step guidelines to facilitate the classification of newly discovered curtovirus full genome sequences and a set of defined criteria for naming new species and strains. The revision yields three curtovirus species: Beet curly top virus (BCTV), Spinach severe curly top virus (SpSCTV) and Horseradish curly top virus (HrCTV). © 2014 Springer-Verlag Wien.
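
    The demarcation rule stated above can be written as a small function. The sketch assumes the two genomes are already aligned and excludes gap columns when computing identity; that is only one of several conventions, and dedicated tools may handle gaps differently, so treat `pairwise_identity` as illustrative. The thresholds are the 77% and 94% values from the abstract.

```python
def pairwise_identity(seq_a, seq_b):
    """Percent identity over an existing pairwise alignment of equal length.
    Columns where either sequence has a gap ('-') are excluded (an assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
                if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


def curtovirus_rank(identity, species_cutoff=77.0, strain_cutoff=94.0):
    """>94% identity: same strain; >77%: same species; otherwise below the species threshold."""
    if identity > strain_cutoff:
        return "same strain"
    if identity > species_cutoff:
        return "same species"
    return "below species threshold"


ident = pairwise_identity("ACGT-A", "ACGTTC")  # 4 of 5 ungapped columns match
print(ident, curtovirus_rank(ident))  # 80.0 same species
```

    Because both cut-offs are strict inequalities in the text, a pair at exactly 94% is classified as the same species but not the same strain.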

  1. Marine genomics

    DEFF Research Database (Denmark)

    Oliveira Ribeiro, Ângela Maria; Foote, Andrew D.; Kupczok, Anne

    2017-01-01

    Marine ecosystems occupy 71% of the surface of our planet, yet we know little about their diversity. Although the inventory of species is continually increasing, as registered by the Census of Marine Life program, only about 10% of the estimated two million marine species are known. This lag......-throughput sequencing approaches have been helping to improve our knowledge of marine biodiversity, from the rich microbial biota that forms the base of the tree of life to a wealth of plant and animal species. In this review, we present an overview of the applications of genomics to the study of marine life, from...... evolutionary biology of non-model organisms to species of commercial relevance for fishing, aquaculture and biomedicine. Instead of providing an exhaustive list of available genomic data, we rather set to present contextualized examples that best represent the current status of the field of marine genomics....

  2. Genome databases

    Energy Technology Data Exchange (ETDEWEB)

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  3. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.)

    Directory of Open Access Journals (Sweden)

    Bin Han

    Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Although an ordered draft sequence of hexaploid bread wheat has been announced, a systematic analysis of SSRs in wheat has not yet been reported. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, a density of 36.68 SSRs/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides. Dinucleotide repeats dominated, accounting for approximately 42.52% of all SSRs, while tri- to hexanucleotide repeats accounted for 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent di- to hexanucleotide repeats, respectively. Among the 21 chromosomes of CSW, repeat density was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats were almost identical on each chromosome and across the whole genome. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, covering the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) polymorphic. A total of 45 monomorphic markers were selected randomly for validation; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. A dendrogram was then generated from the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors, showing that monomorphic SSR markers represented a promising
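
    A genome-wide SSR scan of this kind can be sketched with regular expressions: for each motif length from 2 to 6, look for a motif followed by enough tandem copies of itself. The minimum repeat counts below are hypothetical (the study's search criteria are not given here), and motifs that are themselves repeats of a shorter unit (e.g. "ATAT") are skipped so that a dinucleotide run is not also reported as a tetranucleotide one.

```python
import re

# Hypothetical minimum number of tandem copies per motif length (not from the paper).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if motif is not a repetition of a shorter unit ('ATAT' is not primitive)."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for di- to hexanucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


def density_per_mb(n_ssrs, genome_length_bp):
    """SSRs per megabase of sequence."""
    return n_ssrs / (genome_length_bp / 1e6)


seq = "C" + "AG" * 8 + "TTT" + "AAG" * 6 + "C"
print(find_ssrs(seq))  # [(1, 'AG', 8), (20, 'AAG', 6)]
```

    Applying `density_per_mb` to the 364,347 SSRs reported above and the total scanned length would give the per-Mb density quoted in the abstract.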

  4. Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress

    Directory of Open Access Journals (Sweden)

    Benham Craig J

    2006-05-01

    Abstract Background In our previous studies, we found that the sites in prokaryotic genomes which are most susceptible to duplex destabilization under the negative superhelical stresses that occur in vivo are statistically highly significantly associated with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters. Results We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is found to be bimodally distributed. When compared with DNA curvature, deformability, thermostability or sequence motif scores within the -10 region, SIDD is found to be the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested with promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes. Conclusion In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as a sole predictor or in
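
    The combination of SIDD with -10 motif scores in a linear classification function can be sketched as follows. The sketch assumes the SIDD profile is supplied as a per-position destabilization energy G(x), where lower values mean the duplex opens more easily, and summarizes a candidate region by its minimum G(x). The weights, bias and threshold are placeholders rather than fitted values, and all names are hypothetical.

```python
def min_destabilization_energy(g_profile, start, end):
    """Lowest G(x) within positions [start, end) of a per-base SIDD profile."""
    window = g_profile[start:end]
    if not window:
        raise ValueError("empty window")
    return min(window)


def linear_promoter_score(g_min, motif_score, w_sidd=-1.0, w_motif=1.0, bias=0.0):
    """Linear discriminant: a lower G(x) (stronger destabilization) and a higher
    -10 motif score both raise the score. Weights are placeholders, not fitted."""
    return w_sidd * g_min + w_motif * motif_score + bias


def is_predicted_promoter(g_profile, start, end, motif_score, threshold=0.0, **weights):
    """Classify a region as promoter-containing if its linear score exceeds threshold."""
    g_min = min_destabilization_energy(g_profile, start, end)
    return linear_promoter_score(g_min, motif_score, **weights) > threshold


profile = [8.0, 6.5, 0.4, 0.2, 5.0]  # toy G(x) values, kcal/mol
print(is_predicted_promoter(profile, 1, 4, motif_score=3.0))  # True
```

    In practice the weights would be fitted on labelled promoter and non-promoter regions, for example with linear discriminant analysis.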

  5. Ebolavirus comparative genomics

    DEFF Research Database (Denmark)

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms...
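
    Oligomer frequency analysis compares genomes by their k-mer composition rather than by alignment. The abstract is truncated before the method details, so the oligomer length, normalization and distance used below (relative tetranucleotide frequencies and Euclidean distance) are assumptions chosen for illustration only.

```python
import math
from collections import Counter
from itertools import product


def oligomer_profile(seq, k=4):
    """Relative frequencies of all 4**k DNA oligomers (overlapping windows);
    windows containing non-ACGT characters are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        raise ValueError("sequence shorter than k or lacks ACGT oligomers")
    return [counts[km] / total for km in kmers]


def profile_distance(p, q):
    """Euclidean distance between two oligomer-frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


a = oligomer_profile("ACGTACGTTAGC", k=2)
b = oligomer_profile("ACGTTCGAACGA", k=2)
print(round(profile_distance(a, b), 3))
```

    A matrix of such pairwise distances could then be fed to any clustering or tree-building method.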

  6. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    Directory of Open Access Journals (Sweden)

    Bonten Marc JM

    2010-04-01

    Abstract Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates, several antibiotic resistance genes were identified, including, in two strains, the vanA transposon, which confers resistance to vancomycin. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis, based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA into its gene pool. One of the most prominent sources of genomic diversity is bacteriophages that have integrated into the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI), which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come.
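
    An "essentially unlimited" (open) pan-genome means that each additional genome keeps contributing new gene families. The bookkeeping behind such an estimate can be sketched with gene-family sets: the pan-genome is the running union and the core genome the running intersection. The sets below are invented, and a real analysis would average over many random genome orders and fit a growth model; this sketch shows only one ordering.

```python
def accumulation_curves(genomes):
    """genomes: list of sets of gene-family IDs, in the order they are added.
    Returns (pan, core) sizes after each addition: pan = union, core = intersection."""
    pan_sizes, core_sizes = [], []
    pan, core = set(), None
    for families in genomes:
        pan |= families
        core = set(families) if core is None else core & families
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes


def new_families_per_genome(pan_sizes):
    """Number of previously unseen families contributed by each successive genome."""
    return [b - a for a, b in zip([0] + pan_sizes[:-1], pan_sizes)]


genomes = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e", "f"}]  # invented sets
pan, core = accumulation_curves(genomes)
print(pan, core, new_families_per_genome(pan))  # [3, 4, 6] [3, 2, 1] [3, 1, 2]
```

    If the number of new families per genome does not approach zero as genomes are added, the pan-genome is described as open.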

  7. Isotope-based medical research in the post genome era: Gene-orchestrated life functions in medicine seen and affected by isotopes. Workshop report

    Energy Technology Data Exchange (ETDEWEB)

    Feinendegen, L.E.

    1997-12-31

    The US Department of Energy (DOE) and the National Institutes of Health (NIH) conducted a workshop on Isotope-Based Medical Research in the Post Genome Era at NIH, Bethesda, Maryland, November 12-14, 1997. The workshop aimed at identifying the role of stable and radioisotopes for advanced diagnosis and therapy of a wide range of illnesses using the new information that comes from the human genome program. In this sense, the agenda addressed the challenge of functional genomics in humans. The workshop addressed: functional genomics in clinical medicine; new diagnostic potentials; new therapy potentials; challenge to tracer- and effector-pharmaceutical chemistry; and project plans for joint ventures.

  8. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, together with its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest in oil palm because they determine oil yield and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon), with over half of the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of all oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA
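
    GC3 as defined above (the fraction of codons with G or C in the third position) is straightforward to compute from a coding sequence. This sketch counts every complete codon, including any stop codon, and ignores trailing bases; whether the study excluded stop codons is not stated, so that choice is an assumption. The 0.75286 cut-off is the value quoted in the abstract.

```python
GC3_RICH_CUTOFF = 0.75286  # cut-off quoted in the abstract


def gc3(cds):
    """Fraction of complete codons whose third base is G or C.
    Trailing bases that do not form a full codon are ignored."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        raise ValueError("CDS shorter than one codon")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def is_gc3_rich(cds):
    return gc3(cds) >= GC3_RICH_CUTOFF


print(gc3("ATGGCC"), gc3("ATGAAA"))  # 1.0 0.5
print(is_gc3_rich("ATGGCC"), is_gc3_rich("ATGAAA"))  # True False
```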

  9. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    Science.gov (United States)

    2010-01-01

    Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly [1]. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the updated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information about gene structure and

  10. Controlling inbreeding and maximizing genetic gain using semi-definite programming with pedigree-based and genomic relationships.

    Science.gov (United States)

    Schierenbeck, S; Pimentel, E C G; Tietze, M; Körte, J; Reents, R; Reinhardt, F; Simianer, H; König, S

    2011-12-01

    Because of the relatively high levels of genetic relationships among potential bull sires and bull dams, innovative selection tools should consider both genetic gain and genetic relationships in a long-term perspective. Optimum genetic contribution theory using official estimated breeding values for a moderately heritable trait (production index, Index-PROD) and a lowly heritable functional trait (index for somatic cell score, Index-SCS) was applied to find optimal allocations of bull dams and bull sires. In contrast to previous practical applications using optimizations based on Lagrange multipliers, we focused on semi-definite programming (SDP). The SDP methodology was combined with either pedigree (a(ij)) or genomic relationships (f(ij)) among selection candidates. Selection candidates were 484 genotyped bulls and 499 preselected genotyped bull dams that had completed a central test on station. In different scenarios, separately for PROD and SCS, constraints on the average pedigree relationships among future progeny were varied from a(ij) = 0.08 to a(ij) = 0.20 in increments of 0.01. Corresponding constraints for single nucleotide polymorphism-based kinship coefficients were derived from regression analysis. Applying the coefficient of 0.52 with an intercept of 0.14, estimated by regressing genomic relationships on pedigree relationships, the corresponding range of genomic relationship constraints was f(ij) = 0.18 to f(ij) = 0.24. Despite differences for some bulls in genomic and pedigree relationships, the same trends were observed for constraints on pedigree and corresponding genomic relationships regarding results in genetic gain and achieved coefficients of relationships. Generally, allowing higher values for relationships resulted in an increase of genetic gain for Index-PROD and Index-SCS and in a reduction in the number of selected sires. Interestingly, more sires were selected for all scenarios when restricting genomic relationships compared with restricting
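
    The translation of pedigree constraints into genomic constraints is a one-line linear map. The quoted endpoints are consistent with the fitted line f(ij) = 0.14 + 0.52 · a(ij): a(ij) = 0.08 gives 0.1816 (about 0.18) and a(ij) = 0.20 gives 0.244 (about 0.24). The sketch below assumes that form; `genomic_constraint` is a hypothetical helper name.

```python
INTERCEPT, SLOPE = 0.14, 0.52  # regression values quoted in the abstract


def genomic_constraint(pedigree_constraint):
    """Map a pedigree-relationship cap a(ij) to the matching genomic cap f(ij),
    assuming the fitted line f(ij) = 0.14 + 0.52 * a(ij)."""
    return INTERCEPT + SLOPE * pedigree_constraint


for step in range(13):  # a(ij) = 0.08, 0.09, ..., 0.20
    a = round(0.08 + 0.01 * step, 2)
    print(f"a(ij) = {a:.2f} -> f(ij) = {genomic_constraint(a):.3f}")
```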

  11. Whole genome semiconductor based sequencing of farmed European sea bass (Dicentrarchus labrax) Mediterranean genetic stocks using a DNA pooling approach.

    Science.gov (United States)

    Bertolini, Francesca; Geraci, Claudia; Schiavo, Giuseppina; Sardina, Maria Teresa; Chiofalo, Vincenzo; Fontanesi, Luca

    2016-08-01

    European sea bass (Dicentrarchus labrax) is an important marine species for commercial and sport fisheries and aquaculture production. Recently, the European sea bass genome has been sequenced and assembled. This resource opens new opportunities to evaluate and monitor variability and to identify variants that could contribute to adaptation to farming conditions. In this work, two DNA pools constructed from cultivated European sea bass were sequenced using a next-generation semiconductor sequencing approach based on the Ion Proton sequencer. Using the first draft version of the D. labrax genome as reference, the sequenced reads yielded a total of about 1.6 million single nucleotide polymorphisms (SNPs), spread across all chromosomes. The transition/transversion (Ti/Tv) ratio was 1.28, comparable to values already reported in salmon species. A pilot homozygosity analysis across the D. labrax genome using the DNA pool sequence datasets indicated that this approach can identify chromosome regions with putative signatures of selection, including genes involved in ion transport and chloride channel functions, amino acid metabolism, and the circadian clock and related neurological systems. This is the first study to report genome-wide polymorphisms in a fish species obtained with the Ion Proton sequencer. Moreover, this study provides a methodological approach for selective sweep analysis in this species.
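
    The Ti/Tv ratio reported above is the number of transitions (A<->G, C<->T) divided by the number of transversions (all other base changes). A minimal sketch over (reference, alternative) allele pairs follows; the input format is an assumption, and any records that are not valid single-base substitutions are simply skipped.

```python
BASES = set("ACGT")
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def ti_tv_ratio(snps):
    """snps: iterable of (ref, alt) single-base alleles. Returns transitions / transversions."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue  # not a valid single-nucleotide substitution
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv


print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
```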

  12. Development of a 10,000 locus genetic map of the sunflower genome based on multiple crosses.

    Science.gov (United States)

    Bowers, John E; Bachlava, Eleni; Brunick, Robert L; Rieseberg, Loren H; Knapp, Steven J; Burke, John M

    2012-07-01

    Genetic linkage maps have the potential to facilitate the genetic dissection of complex traits and comparative analyses of genome structure, as well as molecular breeding efforts in species of agronomic importance. Until recently, the majority of such maps were based on relatively low-throughput marker technologies, which limited marker density across the genome. The availability of high-throughput genotyping technologies has, however, made possible the efficient development of high-density genetic maps. Here, we describe the analysis and integration of genotypic data from four sunflower (Helianthus annuus L.) mapping populations to produce a consensus linkage map of the sunflower genome. Although the individual maps (which contained 3500-5500 loci each) were highly colinear, we observed localized variation in recombination rates in several genomic regions. We also observed several gaps up to 26 cM in length that completely lacked mappable markers in individual crosses, presumably due to regions of identity by descent in the mapping parents. Because these regions differed by cross, the consensus map of 10,080 loci contained no such gaps, clearly illustrating the value of simultaneously analyzing multiple mapping populations.

  13. dbCRY: a Web-based comparative and evolutionary genomics platform for blue-light receptors.

    Science.gov (United States)

    Kim, Yong-Min; Choi, Jaeyoung; Lee, Hye-Young; Lee, Gir-Won; Lee, Yong-Hwan; Choi, Doil

    2014-01-01

    Cryptochromes are flavoproteins that play a central role in the circadian oscillations of all living organisms except archaea. Cryptochromes are clustered into three subfamilies: plant-type cryptochromes, animal-type cryptochromes and cryptochrome-DASH proteins. Together with 6-4 photolyases and cyclobutane pyrimidine dimer photolyases, these subfamilies make up the photolyase/cryptochrome superfamily. Cryptochromes have conserved domain architectures with two distinct domains, an N-terminal photolyase-related domain and a C-terminal domain. Although the molecular function and domain architecture of cryptochromes are conserved, their molecular mechanisms differ between plants and animals. Cryptochromes are therefore among the best candidates for comparative and evolutionary studies. Here, we have developed a Web-based platform for comparative and evolutionary studies of cryptochromes, dbCRY (http://www.dbcryptochrome.org/). A pipeline built upon the consensus domain profile was applied to 1438 genomes and identified 1309 genes. To support comparative and evolutionary genomics studies, the Web interface provides diverse functions such as (i) browsing by species, (ii) protein domain analysis, (iii) multiple sequence alignment, (iv) homology search and (v) extended analysis opportunities through the implementation of 'Favorite Browser' powered by the Comparative Fungal Genomics Platform 2.0 (CFGP 2.0; http://cfgp.snu.ac.kr/). dbCRY can serve as a standardized and systematic solution for cryptochrome genomics studies. Database URL: http://www.dbcryptochrome.org/

  14. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy.

    Science.gov (United States)

    Zuo, Guanghong; Hao, Bailin

    2015-10-01

    A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.
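
    The alignment-free composition-vector (CV) idea behind CVTree can be sketched as follows, as we understand it: count length-K strings in a proteome, subtract a Markov-style prediction built from (K-1)- and (K-2)-string frequencies, and compare genomes by the cosine correlation C of the resulting vectors, with distance D = (1 - C)/2. The value of K, the toy protein strings and all function names are assumptions; this is an illustration, not the server's code, and should be checked against the CVTree publications before being relied on.

```python
import math
from collections import Counter, defaultdict


def kmer_freqs(proteins, k):
    """Relative frequencies of length-k strings, counted inside each protein."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def composition_vector(proteins, k=5):
    """Sparse a(w) = (f(w) - f0(w)) / f0(w), where the prediction
    f0(w) = f(w[:-1]) * f(w[1:]) / f(w[1:-1]) uses shorter strings."""
    if k < 3:
        raise ValueError("k must be at least 3")
    fk, fk1, fk2 = (kmer_freqs(proteins, n) for n in (k, k - 1, k - 2))
    by_prefix = defaultdict(list)  # (k-2)-prefix -> observed (k-1)-strings
    for v in fk1:
        by_prefix[v[:-1]].append(v)
    vec = {}
    for u, fu in fk1.items():
        s = u[1:]  # overlap of u and v; observed because it occurs inside u
        for v in by_prefix.get(s, []):
            w = u + v[-1]
            f0 = fu * fk1[v] / fk2[s]
            vec[w] = (fk.get(w, 0.0) - f0) / f0
    return vec


def cv_distance(vec_a, vec_b):
    """D = (1 - C) / 2, with C the cosine correlation of two composition vectors."""
    dot = sum(x * vec_b.get(w, 0.0) for w, x in vec_a.items())
    norm = math.sqrt(sum(x * x for x in vec_a.values()) *
                     sum(x * x for x in vec_b.values()))
    if norm == 0:
        raise ValueError("zero composition vector")
    return (1 - dot / norm) / 2


va = composition_vector(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"], k=3)
vb = composition_vector(["MSLFDKLLGKQRQISFVEAHLSRQ"], k=3)
print(round(cv_distance(va, vb), 4))
```

    Subtracting the shorter-string prediction is meant to suppress background composition shared by all genomes, so that the remaining signal reflects lineage-specific K-strings; the resulting distance matrix would then be passed to a tree-building method.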

  15. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy

    Directory of Open Access Journals (Sweden)

    Guanghong Zuo

    2015-10-01

    A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in Release 3.0 of the alignment-free, whole-genome-based web server CVTree3. The server runs on a 64-core cluster and is equipped with an interactive, collapsible, and expandable tree display. It can compare the tree branching order with prokaryotic classification at all taxonomic ranks, from domains down to species and strains. CVTree3 supports queries by taxon name and lets users try out lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks and produces print-quality subtree figures. After an overview of the retrospective verification of the CVTree approach, the new server's capabilities are demonstrated for the mega-classification of prokaryotes and for determining the taxonomic placement of some newly sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.
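
    The abstract does not spell out how CVTree compares genomes. As a rough illustration only, the sketch below assumes the composition-vector approach described in earlier CVTree publications: count overlapping K-mers, subtract the frequency expected from a (K-2)-order Markov model, and compare the resulting vectors by cosine. All function names are hypothetical, and the toy DNA strings stand in for the whole proteomes that CVTree works on; this is not the server's code.

```python
from collections import Counter
from math import sqrt

def kmer_freqs(seq: str, k: int) -> dict:
    """Relative frequencies of all overlapping k-mers in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {w: c / total for w, c in counts.items()}

def composition_vector(seq: str, k: int) -> dict:
    """k-mer composition vector with a (k-2)-order Markov background subtracted."""
    if k < 3 or len(seq) < k:
        raise ValueError("need k >= 3 and a sequence at least k long")
    fk, fk1, fk2 = kmer_freqs(seq, k), kmer_freqs(seq, k - 1), kmer_freqs(seq, k - 2)
    vec = {}
    for w, f in fk.items():
        # Frequency of w expected from its two overlapping (k-1)-mers and
        # their shared (k-2)-mer core; every sub-word of an observed w occurs.
        f0 = fk1[w[:-1]] * fk1[w[1:]] / fk2[w[1:-1]]
        vec[w] = (f - f0) / f0
    return vec

def cv_distance(a: dict, b: dict) -> float:
    """Distance (1 - cosine) / 2 between two sparse composition vectors, in [0, 1]."""
    dot = sum(v * b[w] for w, v in a.items() if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.5  # no usable signal: treat as uncorrelated
    return (1 - dot / (na * nb)) / 2
```

    In a full pipeline, the pairwise distances would then be passed to a tree-building step such as neighbor joining; subtracting the Markov background is meant to emphasize K-mers shaped by selection rather than by short-range sequence bias.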

  16. Genomics of Sorghum

    OpenAIRE

    Paterson, Andrew H

    2008-01-01

    Sorghum (Sorghum bicolor (L.) Moench) is a subject of plant genomics research based on its importance as one of the world's leading cereal crops, a biofuels crop of high and growing importance, a progenitor of one of the world's most noxious weeds, and a botanical model for many tropical grasses with complex genomes. A rich history of genome analysis, culminating in the recent complete sequencing of the genome of a leading inbred, provides a foundation for invigorating progress toward relatin...

  17. Genomic DNA pooling strategy for next-generation sequencing-based rare variant discovery in abdominal aortic aneurysm regions of interest-challenges and limitations

    NARCIS (Netherlands)

    Harakalova, M.; Nijman, I.J.; Medic, J.; Mokry, M.; Renkens, I.; Blankensteijn, J.D.; Kloosterman, W.P.; Baas, A.F.; Cuppen, E.

    2011-01-01

    The costs and efforts for sample preparation of hundreds of individuals, their genomic enrichment for regions of interest, and sufficient deep sequencing bring a significant burden to next-generation sequencing-based experiments. We investigated whether pooling of samples at the level of genomic DNA...

  18. Genome-wide copy number profiling on high-density bacterial artificial chromosomes, single-nucleotide polymorphisms, and oligonucleotide microarrays: a platform comparison based on statistical power analysis.

    NARCIS (Netherlands)

    Hehir-Kwa, J.Y.; Egmont-Peterson, M.; Janssen, I.M.; Smeets, D.F.C.M.; Geurts van Kessel, A.H.M.; Veltman, J.A.

    2007-01-01

    Recently, comparative genomic hybridization onto bacterial artificial chromosome (BAC) arrays (array-based comparative genomic hybridization) has proved to be successful for the detection of submicroscopic DNA copy-number variations in health and disease. Technological improvements to achieve a high...

  19. A genome-wide tree- and forest-based association analysis of comorbidity of alcoholism and smoking

    OpenAIRE

    Ye, Yuanqing; Zhong, Xiaoyun; Zhang, Heping

    2005-01-01

    Genetic mechanisms underlying alcoholism are complex. Understanding the etiology of alcohol dependence and its comorbid conditions such as smoking is important because of the significant health concerns. In this report, we describe a method based on classification trees and deterministic forests for association studies to perform a genome-wide joint association analysis of alcoholism and smoking. This approach is used to analyze the single-nucleotide polymorphism data from the Collaborative S...

  20. Family-based Genome-wide Association Study of Frontal Theta Oscillations Identifies Potassium Channel Gene KCNJ6

    OpenAIRE

    Kang, Sun J.; Rangaswamy, Madhavi; Manz, Niklas; Wang, Jen-Chyong; Wetherill, Leah; Hinrichs, Tony; Almasy, Laura; Brooks, Andy; Chorlian, David B.; Dick, Danielle; Hesselbrock, Victor; Kramer, John; Kuperman, Sam; Nurnberger, John; Rice, John

    2012-01-01

    Event-related oscillations (EROs) represent highly heritable neuroelectric correlates of cognitive processes that manifest deficits in alcoholics and in offspring at high risk to develop alcoholism. Theta ERO to targets in the visual oddball task has been shown to be an endophenotype for alcoholism. A family-based genome-wide association study was performed for the frontal theta ERO phenotype using 634583 autosomal single nucleotide polymorphisms (SNPs) genotyped in 1560 family members from 1...

  1. Evidence-based annotation of the malaria parasite's genome using comparative expression profiling.

    Directory of Open Access Journals (Sweden)

    Yingyao Zhou

    A fundamental problem in systems biology and whole genome sequence analysis is how to infer functions for the many uncharacterized proteins that are identified, whether they are conserved across organisms of different phyla or are phylum-specific. This problem is especially acute in pathogens, such as malaria parasites, where genetic and biochemical investigations are likely to be more difficult. Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function. These analyses, together with protein-protein interaction data, provide probabilistic models that predict the function of 926 uncharacterized malaria genes and also suggest that malaria parasites may provide a simple model system for the study of some human processes. These data also provide a foundation for further studies of transcriptional regulation in malaria parasites.

  2. A network-based approach to prioritize results from genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Nirmala Akula

    Genome-wide association studies (GWAS) are a valuable approach to understanding the genetic basis of complex traits. One of the challenges of GWAS is the translation of genetic association results into biological hypotheses suitable for further investigation in the laboratory. To address this challenge, we introduce Network Interface Miner for Multigenic Interactions (NIMMI), a network-based method that combines GWAS data with human protein-protein interaction data (PPI). NIMMI builds biological networks weighted by connectivity, which is estimated by use of a modification of the Google PageRank algorithm. These weights are then combined with genetic association p-values derived from GWAS, producing what we call 'trait prioritized sub-networks.' As a proof of principle, NIMMI was tested on three GWAS datasets previously analyzed for height, a classical polygenic trait. Despite differences in sample size and ancestry, NIMMI captured 95% of the known height associated genes within the top 20% of ranked sub-networks, far better than what could be achieved by a single-locus approach. The top 2% of NIMMI height-prioritized sub-networks were significantly enriched for genes involved in transcription, signal transduction, transport, and gene expression, as well as nucleic acid, phosphate, protein, and zinc metabolism. All of these sub-networks were ranked near the top across all three height GWAS datasets we tested. We also tested NIMMI on a categorical phenotype, Crohn's disease. NIMMI prioritized sub-networks involved in B- and T-cell receptor, chemokine, interleukin, and other pathways consistent with the known autoimmune nature of Crohn's disease. NIMMI is a simple, user-friendly, open-source software tool that efficiently combines genetic association data with biological networks, translating GWAS findings into biological hypotheses.
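
    A minimal sketch of the two ingredients named in the abstract, connectivity weights from PageRank and GWAS p-values, assuming an undirected PPI graph given as edge pairs. NIMMI uses a modified PageRank and ranks sub-networks; this sketch instead uses plain PageRank and ranks individual genes by the hypothetical score PageRank x -log10(p). Function and gene names are made up for illustration.

```python
from math import log10

def pagerank(edges, damping=0.85, iters=100):
    """Plain power-iteration PageRank on an undirected graph of edge pairs."""
    nodes = sorted({v for e in edges for v in e})
    nbrs = {v: [] for v in nodes}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / len(nodes) for v in nodes}
        for v in nodes:
            # Each node spreads its damped rank evenly over its neighbours.
            share = damping * rank[v] / len(nbrs[v])
            for u in nbrs[v]:
                new[u] += share
        rank = new
    return rank

def prioritize(edges, pvalues):
    """Rank genes by the hypothetical score PageRank * -log10(GWAS p-value)."""
    pr = pagerank(edges)
    scores = {g: pr[g] * -log10(p) for g, p in pvalues.items() if g in pr}
    return sorted(scores, key=scores.get, reverse=True)
```

    With this product score, a weakly connected gene can still rank first if its association is strong enough, while a hub with a null p-value is pulled down.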

  3. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs.

    Directory of Open Access Journals (Sweden)

    Adam H Freedman

    2016-03-01

    Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, a role supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
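
    The abstract computes FDRs from an inferred demographic model. One common way to do this, assumed here rather than taken from the paper, is to simulate a per-window test statistic under the neutral demographic model and use the simulated values as the null distribution. The statistic, the function names, and the example numbers are all hypothetical.

```python
def empirical_fdr(observed, null, threshold):
    """Estimated FDR for calling windows whose statistic is >= threshold.

    observed: statistics for the real genomic windows.
    null: statistics simulated under the neutral demographic model.
    FDR(t) ~ (fraction of null >= t) * (number of windows) / (windows called).
    """
    called = sum(1 for s in observed if s >= threshold)
    if called == 0:
        return 0.0  # nothing called, so no false discoveries
    null_frac = sum(1 for s in null if s >= threshold) / len(null)
    return min(1.0, null_frac * len(observed) / called)

def threshold_for_fdr(observed, null, alpha):
    """Lowest observed score whose estimated FDR is at most alpha, or None."""
    for t in sorted(set(observed)):
        if empirical_fdr(observed, null, t) <= alpha:
            return t
    return None
```

    Because the null comes from the demographic model rather than from the genome-wide empirical distribution, windows that look extreme only because of bottlenecks or expansions are expected to appear in the simulated null as well.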

  4. New vaccine design based on defective genomes that combines features of attenuated and inactivated vaccines.

    Directory of Open Access Journals (Sweden)

    Teresa Rodríguez-Calvo

    BACKGROUND: New vaccine designs are needed to control diseases associated with antigenically variable RNA viruses. Foot-and-mouth disease (FMD) is a highly contagious disease of livestock that inflicts severe economic losses. Although the current whole-virus chemically inactivated vaccine has proven effective, it has led to new outbreaks of FMD because of incomplete inactivation of the virus or the escape of infectious virus from vaccine production premises. We have previously shown that serial passages of FMD virus (FMDV) C-S8c1 at high multiplicity of infection in cell culture resulted in virus populations consisting of defective genomes that are infectious by complementation (termed C-S8p260). PRINCIPAL FINDING: Here we evaluate the immunogenicity of C-S8p260, first in a mouse model system to establish a proof of principle, and second, in swine, the natural host of FMDV C-S8c1. Mice were completely protected against a lethal challenge with FMDV C-S8c1, after vaccination with a single dose of C-S8p260. Pigs immunized with different C-S8p260 doses and challenged with FMDV C-S8c1 either did not develop any clinical signs or showed delayed and mild disease symptoms. C-S8p260 induced high titers of both FMDV-specific, neutralizing antibodies and activated FMDV-specific T cells in swine, which correlated with solid protection against FMDV. CONCLUSIONS: The defective virus-based vaccine did not produce detectable levels of transmissible FMDV. Therefore, a segmented, replication-competent form of a virus, such as FMDV C-S8p260, can provide the basis of a new generation of attenuated antiviral vaccines with two safety barriers. The design can be extended to any viral pathogen that encodes trans-acting gene products, allowing complementation between replication-competent, defective forms.

  5. New vaccine design based on defective genomes that combines features of attenuated and inactivated vaccines.

    Science.gov (United States)

    Rodríguez-Calvo, Teresa; Ojosnegros, Samuel; Sanz-Ramos, Marta; García-Arriaza, Juan; Escarmís, Cristina; Domingo, Esteban; Sevilla, Noemí

    2010-04-29

    New vaccine designs are needed to control diseases associated with antigenically variable RNA viruses. Foot-and-mouth disease (FMD) is a highly contagious disease of livestock that inflicts severe economic losses. Although the current whole-virus chemically inactivated vaccine has proven effective, it has led to new outbreaks of FMD because of incomplete inactivation of the virus or the escape of infectious virus from vaccine production premises. We have previously shown that serial passages of FMD virus (FMDV) C-S8c1 at high multiplicity of infection in cell culture resulted in virus populations consisting of defective genomes that are infectious by complementation (termed C-S8p260). Here we evaluate the immunogenicity of C-S8p260, first in a mouse model system to establish a proof of principle, and second, in swine, the natural host of FMDV C-S8c1. Mice were completely protected against a lethal challenge with FMDV C-S8c1, after vaccination with a single dose of C-S8p260. Pigs immunized with different C-S8p260 doses and challenged with FMDV C-S8c1 either did not develop any clinical signs or showed delayed and mild disease symptoms. C-S8p260 induced high titers of both FMDV-specific, neutralizing antibodies and activated FMDV-specific T cells in swine, which correlated with solid protection against FMDV. The defective virus-based vaccine did not produce detectable levels of transmissible FMDV. Therefore, a segmented, replication-competent form of a virus, such as FMDV C-S8p260, can provide the basis of a new generation of attenuated antiviral vaccines with two safety barriers. The design can be extended to any viral pathogen that encodes trans-acting gene products, allowing complementation between replication-competent, defective forms.

  6. Inverse PCR-based method for isolating novel SINEs from genome.

    Science.gov (United States)

    Han, Yawei; Chen, Liping; Guan, Lihong; He, Shunping

    2014-04-01

    Short interspersed elements (SINEs) are moderately repetitive DNA sequences in eukaryotic genomes. Although eukaryotic genomes contain numerous SINE copies, it is very difficult and laborious to isolate and identify them by the reported methods. In this study, inverse PCR was successfully applied to isolate SINEs from the genome of Opsariichthys bidens, an East Asian cyprinid. A group of SINEs derived from tRNA(Ala) molecules was identified and named Opsar, after Opsariichthys. Opsar exhibited typical SINE characteristics: a tRNA(Ala)-derived region at the 5' end, a tRNA-unrelated region, and an AT-rich region at the 3' end. The tRNA-derived region of Opsar shared 76% sequence similarity with the tRNA(Ala) gene, indicating that Opsar may have derived from an inactive tRNA(Ala) gene or a tRNA(Ala) pseudogene. The reliability of the method was tested by obtaining C-SINE, Ct-SINE, and M-SINEs from the Ctenopharyngodon idellus, Megalobrama amblycephala, and Cyprinus carpio genomes. This method is simpler than previously reported methods because it omits many steps, such as probe preparation, genomic library construction, and hybridization.

  7. GenoMatrix: A Software Package for Pedigree-Based and Genomic Prediction Analyses on Complex Traits.

    Science.gov (United States)

    Nazarian, Alireza; Gezan, Salvador Alejandro

    2016-07-01

    Genomic and pedigree-based best linear unbiased prediction methodologies (G-BLUP and P-BLUP) have proven efficient for partitioning the phenotypic variance of complex traits into its components, estimating individuals' genetic merits, and predicting unobserved (or yet-to-be-observed) phenotypes in many species and fields of study. The GenoMatrix software, presented here, is a user-friendly package that facilitates the use of genome-wide marker data and parentage information for G-BLUP and P-BLUP analyses of complex traits. It provides a collection of applications for tasks ranging from data quality control to constructing and manipulating genomic and pedigree-based relationship matrices and obtaining their inverses. These matrices can then be used in downstream analyses by other statistical packages. The package also enables users to obtain predicted values for unobserved individuals based on the genetic values of observed related individuals. GenoMatrix is available to the research community as a 64-bit Windows executable and can be downloaded free of charge at: http://compbio.ufl.edu/software/genomatrix/. © The American Genetic Association. 2016. All rights reserved.
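
    GenoMatrix's exact formulas are not given in the abstract. As an assumed example of the kind of genomic relationship matrix used in G-BLUP, the sketch below builds VanRaden's (2008) first G matrix from 0/1/2 marker counts in pure Python; the function name is hypothetical, and a real analysis would use a linear-algebra library and handle missing genotypes.

```python
def genomic_relationship(genotypes):
    """VanRaden (2008) method-1 genomic relationship matrix.

    genotypes: one list per individual of 0/1/2 allele counts per marker.
    Returns G = Z Z' / (2 * sum_j p_j * (1 - p_j)), where Z holds the
    genotypes centred on twice the allele frequency p_j of each marker.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency per marker, estimated from the genotyped individuals.
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]
```

    Scaling by 2 * sum p(1 - p) puts G on roughly the same scale as the pedigree-based matrix used in P-BLUP, which is what lets the two be compared or combined downstream.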

  8. GLIDERS - A web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs

    Directory of Open Access Journals (Sweden)

    Broxholme John

    2009-10-01

    Background: A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r² ≥ 0.3 across the human genome for any SNP genotyped within HapMap phases 2 and 3, regardless of the distance between the markers. Description: GLIDERS is an easy-to-use web tool that only requires the user to enter the rs numbers of the SNPs for which they want to retrieve genome-wide LD (both nearby and long-range). The intuitive web interface handles manual entry of SNP IDs and also allows users to upload files of SNP IDs. Users can limit the resulting inter-SNP associations with easy-to-use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r² (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which links to a downloadable tab-delimited text file. Conclusion: GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD, which can highlight mis-mapping or other potential problems in localizing association signals.
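
    The core query GLIDERS answers, which SNPs reach r² ≥ 0.3 with a given SNP at any distance, can be sketched as below. This assumes r² is approximated by the squared correlation of unphased 0/1/2 genotype counts; GLIDERS may instead estimate haplotype-based r² from HapMap data. Function names and the SNP IDs in the test are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation of two SNPs' 0/1/2 genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: LD undefined, report none
    return sxy * sxy / (sxx * syy)

def ld_partners(snps, query, min_r2=0.3):
    """All SNPs with r^2 >= min_r2 to the query SNP, with no distance cutoff."""
    q = snps[query]
    r2 = {sid: genotype_r2(q, g) for sid, g in snps.items() if sid != query}
    return {sid: v for sid, v in r2.items() if v >= min_r2}
```

    Omitting any distance window is what distinguishes this from typical local-LD tools; the cost is a comparison against every SNP, which is why a precomputed repository is attractive at genome scale.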

  9. Genome-scale comparison and constraint-based metabolic reconstruction of the facultative anaerobic Fe(III)-reducer Rhodoferax ferrireducens

    Directory of Open Access Journals (Sweden)

    Daugherty Sean

    2009-09-01

    Background: Rhodoferax ferrireducens is a metabolically versatile, Fe(III)-reducing, subsurface microorganism that is likely to play an important role in the carbon and metal cycles in the subsurface. It also has the unique ability to convert sugars to electricity, oxidizing the sugars to carbon dioxide with quantitative electron transfer to graphite electrodes in microbial fuel cells. In order to expand our limited knowledge about R. ferrireducens, the complete genome sequence of this organism was further annotated and then the physiology of R. ferrireducens was investigated with a constraint-based, genome-scale in silico metabolic model and laboratory studies. Results: The iterative modeling and experimental approach unveiled exciting, previously unknown physiological features, including an expanded range of substrates that support growth, such as cellobiose and citrate, and provided additional insights into important features such as the stoichiometry of the electron transport chain and the ability to grow via fumarate dismutation. Further analysis explained why R. ferrireducens is unable to grow via photosynthesis or fermentation of sugars like other members of this genus and uncovered novel genes for benzoate metabolism. The genome also revealed that R. ferrireducens is well-adapted for growth in the subsurface because it appears to be capable of dealing with a number of environmental insults, including heavy metals, aromatic compounds, nutrient limitation and oxidative stress. Conclusion: This study demonstrates that combining genome-scale modeling with the annotation of a new genome sequence can guide experimental studies and accelerate the understanding of the physiology of under-studied yet environmentally relevant microorganisms.

  10. Final Report: Transport and its regulation in Marine Microorganisms: A Genomic Based Approach

    Energy Technology Data Exchange (ETDEWEB)

    Brian Palenik; Bianca Brahamsha; Ian Paulsen

    2009-09-03

    This grant funded the analysis and annotation of the genomes of Synechococcus and Ostreococcus, major marine primary producers. Particular attention was paid to the analysis of transporters using state of the art bioinformatics analyses. During the analysis of the Synechococcus genome, some of the components of the unique bacterial swimming apparatus of one species of Synechococcus (Clade III, strain WH8102) were determined and these included transporters, novel giant proteins and glycosyltransferases. This grant funded the analysis of gene expression in Synechococcus using whole genome microarrays. These analyses revealed the strategies by which marine cyanobacteria respond to environmental conditions such as the absence of phosphorus, a common limiting nutrient, and the interaction of Synechococcus with other microbes. These analyses will help develop models of gene regulation in cyanobacteria and thus help predict their responses to changes in environmental conditions.

  11. Development of Highly Informative Genome-Wide Single Sequence Repeat Markers for Breeding Applications in Sesame and Construction of a Web Resource: SisatBase

    Science.gov (United States)

    Dossa, Komivi; Yu, Jingyin; Liao, Boshou; Cisse, Ndiaga; Zhang, Xiurong

    2017-01-01

    The sequencing of the full nuclear genome of sesame (Sesamum indicum L.) provides the platform for functional analyses of genome components and their application in breeding programs. Although the importance of microsatellite markers, or simple sequence repeats (SSRs), in crop genotyping, genetics, and breeding applications is well established, little information exists on SSRs at the whole-genome level in sesame. In addition, SSRs are a suitable marker type for sesame molecular breeding in developing countries, where sesame is mainly grown. In this study, we identified 138,194 genome-wide SSRs, of which 76.5% were physically mapped onto the 13 pseudo-chromosomes. Among these SSRs, up to three primer pairs were supplied for 101,930 SSRs and used to amplify in silico the reference genome together with two newly sequenced sesame accessions. A total of 79,957 SSRs (78%) were polymorphic among the three genomes, suggesting their promising use in different genomics-assisted breeding applications. From these polymorphic SSRs, 23 were selected and validated to have high polymorphic potential in 48 sesame accessions from different growing areas of Africa. Furthermore, we have developed an online user-friendly database, SisatBase (http://www.sesame-bioinfo.org/SisatBase/), which provides free access to SSR data as well as an integrated platform for functional analyses. Altogether, the reference SSRs and SisatBase should serve as useful resources for genetic assessment, genomic studies, and breeding advancement in sesame, especially in developing countries. PMID:28878802

  12. Development of Highly Informative Genome-Wide Single Sequence Repeat Markers for Breeding Applications in Sesame and Construction of a Web Resource: SisatBase

    Directory of Open Access Journals (Sweden)

    Komivi Dossa

    2017-08-01

    The sequencing of the full nuclear genome of sesame (Sesamum indicum L.) provides the platform for functional analyses of genome components and their application in breeding programs. Although the importance of microsatellite markers, or simple sequence repeats (SSRs), in crop genotyping, genetics, and breeding applications is well established, little information exists on SSRs at the whole-genome level in sesame. In addition, SSRs are a suitable marker type for sesame molecular breeding in developing countries, where sesame is mainly grown. In this study, we identified 138,194 genome-wide SSRs, of which 76.5% were physically mapped onto the 13 pseudo-chromosomes. Among these SSRs, up to three primer pairs were supplied for 101,930 SSRs and used to amplify in silico the reference genome together with two newly sequenced sesame accessions. A total of 79,957 SSRs (78%) were polymorphic among the three genomes, suggesting their promising use in different genomics-assisted breeding applications. From these polymorphic SSRs, 23 were selected and validated to have high polymorphic potential in 48 sesame accessions from different growing areas of Africa. Furthermore, we have developed an online user-friendly database, SisatBase (http://www.sesame-bioinfo.org/SisatBase/), which provides free access to SSR data as well as an integrated platform for functional analyses. Altogether, the reference SSRs and SisatBase should serve as useful resources for genetic assessment, genomic studies, and breeding advancement in sesame, especially in developing countries.
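
    A minimal sketch of genome-wide perfect-SSR detection with regular expressions. The motif lengths (1-6 bp) and minimum repeat counts are assumptions for illustration, not the thresholds used for SisatBase, and compound motifs such as ATAT (really a dinucleotide repeat) are skipped so that each repeat is reported at its shortest unit. Function names are hypothetical.

```python
import re

# Assumed minimum repeat counts per motif length (bp); illustrative only.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_compound(motif):
    """True if motif is itself a repeat of a shorter unit, e.g. 'ATAT' or 'AAA'."""
    size = len(motif)
    return any(size % d == 0 and motif == motif[:d] * (size // d)
               for d in range(1, size))

def find_ssrs(seq):
    """Return perfect SSRs as sorted (start, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least min_rep - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_compound(motif):
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)
```

    Polymorphism between genomes, as assessed in the abstract via in silico amplification, would then show up as different repeat counts for the same locus in each assembly.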

  13. Microalterations of inherently unstable genomic regions in rat mammary carcinomas as revealed by long oligonucleotide array-based comparative genomic hybridization

    NARCIS (Netherlands)

    Adamovic, T.; McAllister, D.; Guryev, V.; Wang, X.; Andrae, J.W.; Cuppen, E.; Jacob, H.; Sugg, S.L.

    2009-01-01

    The presence of copy number variants in normal genomes poses a challenge to identify small genuine somatic copy number changes in high-resolution cancer genome profiling studies due to the use of unpaired reference DNA. Another problem is the well-known rearrangements of immunoglobulin and T-cell...

  14. Microalterations of inherently unstable genomic regions in rat mammary carcinomas as revealed by long oligonucleotide array-based comparative genomic hybridization

    NARCIS (Netherlands)

    Adamovic, T.; McAllister, D.; Guryev, V.; Wang, X.; Andrae, J.W.; Cuppen, E.; Jacob, H.; Sugg, S.L.

    2009-01-01

    The presence of copy number variants in normal genomes poses a challenge to identify small genuine somatic copy number changes in high-resolution cancer genome profiling studies due to the use of unpaired reference DNA. Another problem is the well-known rearrangements of immunoglobulin and T-cell re...

  15. Microalterations of Inherently Unstable Genomic Regions in Rat Mammary Carcinomas as Revealed by Long Oligonucleotide Array-Based Comparative Genomic Hybridization

    NARCIS (Netherlands)

    Adamovic, Tatjana; McAllister, Donna; Guryev, Victor; Wang, Xujing; Andrae, Jaime Wendt; Cuppen, Edwin; Jacob, Howard J.; Sugg, Sonia L.

    2009-01-01

    The presence of copy number variants in normal genomes poses a challenge to identify small genuine somatic copy number changes in high-resolution cancer genome profiling studies due to the use of unpaired reference DNA. Another problem is the well-known rearrangements of immunoglobulin and T-cell re...

  16. Phylogenetic Relationships of the Fern Cyrtomium falcatum (Dryopteridaceae) from Dokdo Island Based on Chloroplast Genome Sequencing.

    Science.gov (United States)

    Raman, Gurusamy; Choi, Kyoung Su; Park, SeonJoo

    2016-12-02

    Cyrtomium falcatum is a popular ornamental fern cultivated worldwide. Native to the Korean Peninsula, Japan, and Dokdo Island in the Sea of Japan, it is the only fern present on Dokdo Island. We isolated and characterized the chloroplast (cp) genome of C. falcatum, and compared it with those of closely related species. The genes trnV-GAC and trnV-GAU were found to be present within the cp genome of C. falcatum, whereas trnP-GGG and rpl21 were lacking. Moreover, the cp genomes of Cyrtomium devexiscapulae and Adiantum capillus-veneris lack trnP-GGG and rpl21, suggesting these genes are not conserved among angiosperm cp genomes. The deletion of trnR-UCG, trnR-CCG, and trnSeC in the cp genomes of C. falcatum and other eupolypod ferns indicates these genes are restricted to tree ferns, non-core leptosporangiates, and basal ferns. The C. falcatum cp genome also encodes ndhF and rps7 with GUG start codons, which are conserved only in polypod ferns, and it shares two significant inversions with other ferns: a minor inversion of the trnD-GUC region and an approximately 3 kb inversion of the trnG-trnT region. Phylogenetic analyses showed that Equisetum is a sister clade to Psilotales-Ophioglossales, with a 100% bootstrap (BS) value. The sister relationship between Pteridaceae and eupolypods was also strongly supported (100% BS), and Bayesian molecular clock analyses suggested that C. falcatum diversified in the mid-Paleogene period (45.15 ± 4.93 million years ago) and might have moved from Eurasia to Dokdo Island.

  17. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  18. Discovering Genome-Wide Tag SNPs Based on the Mutual Information of the Variants

    Science.gov (United States)

    Elmas, Abdulkadir; Ou Yang, Tai-Hsien; Wang, Xiaodong

    2016-01-01

    Exploring linkage disequilibrium (LD) patterns among single nucleotide polymorphism (SNP) sites can improve the accuracy and cost-effectiveness of genomic association studies, whereby representative (tag) SNPs are identified to sufficiently represent the genomic diversity in populations. There has been a considerable amount of effort in developing efficient algorithms to select tag SNPs from the growing large-scale data sets. Methods using the classical pairwise-LD and multi-locus LD measures have been proposed that aim to reduce the computational complexity and to increase the accuracy, respectively. The present work solves the tag SNP selection problem by efficiently balancing computational complexity and accuracy, and improves the coverage of genomic diversity in a cost-effective manner. The employed algorithm uses mutual information to explore the multi-locus association between SNPs and can handle different data types and conditions. Experiments with benchmark HapMap data sets show comparable or better performance against state-of-the-art algorithms. In particular, as a novel application, genome-wide SNP tagging is performed on the 1000 Genomes Project data sets, producing a well-annotated database of tagging variants that capture the common genotype diversity in 2,504 samples from 26 human populations. Compared to conventional methods, the algorithm requires as input only the genotype (or haplotype) sequences, can scale up to genome-wide analyses, and produces accurate solutions with more information-rich output, providing an improved platform for researchers towards subsequent association studies. PMID:27992465
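
    The pairwise building block, mutual information between two SNPs' genotype vectors, can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: it uses plug-in (empirical-frequency) entropy estimates, treats a SNP as captured when its mutual information with a chosen tag is at least a hypothetical fraction `threshold` of its own entropy, and picks tags greedily by coverage. The published method explores multi-locus association, which this pairwise sketch does not attempt.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy (bits) from empirical frequencies."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two genotype vectors."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def greedy_tag_snps(snps: List[Sequence[int]], threshold: float = 0.8) -> List[int]:
    """Greedy tag selection; snps[j] holds SNP j's genotypes across samples.

    SNP j counts as captured by tag t when I(t; j) >= threshold * H(j).
    `threshold` is an illustrative parameter, not taken from the paper.
    """
    m = len(snps)
    h = [entropy(s) for s in snps]
    # covers[t]: SNPs captured by tag t (a SNP always captures itself;
    # monomorphic SNPs carry no information and are captured by any tag).
    covers = [
        {j for j in range(m)
         if j == t or h[j] == 0.0
         or mutual_information(snps[t], snps[j]) >= threshold * h[j]}
        for t in range(m)
    ]
    uncovered, tags = set(range(m)), []
    while uncovered:
        # Pick the SNP that captures the most still-uncovered SNPs.
        best = max(range(m), key=lambda t: len(covers[t] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0
    [2, 1, 0, 2, 1, 0],  # SNP 0 with genotype codes relabeled
    [0, 0, 1, 1, 2, 2],  # not captured by SNP 0 at threshold 0.8
]
print(greedy_tag_snps(snps))  # [0, 3]
```

    Because mutual information is invariant to relabeling, SNP 2 is captured by SNP 0 even though its genotype codes are reversed, which a simple correlation on raw codes would also detect here but not for arbitrary relabelings.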

  19. [Genomics innovative teaching pattern based upon amalgamation between modern educational technology and constructivism studying theory].

    Science.gov (United States)

    Liang, Xu-Fang; Peng, Jing; Zhou, Tian-Hong

    2007-04-01

    To overcome various shortcomings of traditional teaching methods, and as part of the Guangdong Province excellent-course project for molecular biology, we reformed the teaching of genomics. The reforms include using original foreign-language teaching materials, bilingual teaching, constructivism-directed discussion teaching, and multimedia computer-assisted instruction. To improve the assessment method and the laboratory course, we introduced a multi-component assessment system and student-designed experiments. Through this teaching reform in genomics, we have gradually improved the molecular biology curriculum system.

  20. New approach for phylogenetic tree recovery based on genome-scale metabolic networks.

    Science.gov (United States)

    Gamermann, Daniel; Montagud, Arnaud; Conejero, J Alberto; Urchueguía, Javier F; de Córdoba, Pedro Fernández

    2014-07-01

    Genome-scale metabolic models have been used in a wide range of applications and research. In this work, we describe an innovative methodology for comparing metabolic networks constructed from genome-scale metabolic models and for applying this comparison to infer evolutionary distances between organisms. Our methodology quantifies the metabolic differences between species from a broad range of families and even kingdoms. This quantification is then used to reconstruct phylogenetic trees for sets of various organisms.
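
    One generic way to turn such a comparison into a tree can be sketched as follows. The abstract does not specify the comparison measure, so this sketch assumes, purely for illustration, that each organism's network is reduced to its set of reaction identifiers, that the distance between two organisms is the Jaccard distance between these sets, and that the tree is built by UPGMA (average-linkage) clustering. The function names and reaction identifiers are hypothetical, and the output is a Newick topology without branch lengths.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma_newick(reactions: Dict[str, Set[str]]) -> str:
    """Average-linkage (UPGMA) tree of organisms, returned as a Newick topology."""
    # cluster id -> (Newick subtree, number of organisms in it)
    clusters: Dict[str, Tuple[str, int]] = {name: (name, 1) for name in reactions}
    dist: Dict[FrozenSet[str], float] = {
        frozenset((x, y)): jaccard_distance(reactions[x], reactions[y])
        for x, y in combinations(sorted(reactions), 2)
    }
    merges = 0
    while len(clusters) > 1:
        x, y = sorted(min(dist, key=dist.get))  # closest pair of clusters
        (tx, nx), (ty, ny) = clusters.pop(x), clusters.pop(y)
        del dist[frozenset((x, y))]
        new = f"#{merges}"  # internal id; assumes no organism name starts with '#'
        merges += 1
        for z in clusters:
            # UPGMA: size-weighted average of the merged clusters' distances.
            dxz = dist.pop(frozenset((x, z)))
            dyz = dist.pop(frozenset((y, z)))
            dist[frozenset((new, z))] = (nx * dxz + ny * dyz) / (nx + ny)
        clusters[new] = (f"({tx},{ty})", nx + ny)
    (tree, _), = clusters.values()
    return tree + ";"


networks = {
    "A": {"r1", "r2", "r3"},
    "B": {"r1", "r2", "r4"},
    "C": {"r5", "r6"},
}
print(upgma_newick(networks))  # ((A,B),C);
```

    Reaction-set overlap ignores network topology; a measure that compares the wiring of the networks would be a drop-in replacement for `jaccard_distance` without changing the clustering step.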

  1. Rapid enrichment of leucocytes and genomic DNA from blood based on bifunctional core shell magnetic nanoparticles

    Science.gov (United States)

    Xie, Xin; Nie, Xiaorong; Yu, Bingbin; Zhang, Xu

    2007-04-01

    A series of protocols is proposed for extracting genomic DNA from whole blood at different scales, using carboxyl-functionalized magnetic nanoparticles as solid-phase adsorbents. Both the enrichment of leucocytes and the adsorption of genomic DNA can be achieved with the same carboxyl-functionalized magnetic nanoparticles. The DNA bound to the bead surfaces can be used directly as a PCR template. By coupling cell separation and DNA purification, the whole operation can be completed in a few minutes. Our simplified protocols proved to be rapid, low-cost, and biologically and chemically non-hazardous, and are therefore promising for microfabrication of a DNA-preparation chip and for routine laboratory use.

  2. CRISPR-Cas9 Based Genome Engineering: Opportunities in Agri-Food-Nutrition and Healthcare.

    Science.gov (United States)

    Rajendran, Subin Raj Cheri Kunnumal; Yau, Yuan-Yeu; Pandey, Dinesh; Kumar, Anil

    2015-05-01

    Recently developed strategies and techniques that use the vast amount of available genetic information to perform targeted perturbations in the genomes of living organisms are collectively referred to as genome engineering. The wide array of applications made possible by this technology ranges from agriculture to healthcare. This, along with applications in basic biological research, has made it a very dynamic and active field of research. This review focuses on the CRISPR system, from its discovery and role in bacterial adaptive immunity to the most recent developments, and on its possible applications in agriculture and modern medicine.

  3. Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor.

    Science.gov (United States)

    Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas

    2015-01-01

    Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism, including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing to assess statistical significance. We demonstrate the first-ever inference of a plant whole-genome regulatory network on a single chip by constructing a 15,575-gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor offers lessons that are applicable to other domains.
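
    The statistical core of such a reconstruction, estimating mutual information between two expression profiles and testing it against shuffled data, can be sketched sequentially as follows. The estimator TINGe actually uses is not described here, so this sketch swaps in a simple equal-width histogram estimator; the bin count, the number of permutations and all function names are illustrative assumptions. A whole-genome run would repeat this for every gene pair, which is the work the paper parallelizes.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def discretize(values: Sequence[float], bins: int) -> List[int]:
    """Map each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant profiles
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Plug-in estimate of I(A;B) in bits from paired discrete labels."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # Sum of p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts.
    return sum(
        (c / n) * math.log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items()
    )


def mi_permutation_test(
    x: Sequence[float], y: Sequence[float],
    bins: int = 4, permutations: int = 200, seed: int = 0,
) -> Tuple[float, float]:
    """Return the observed MI and a permutation p-value for it."""
    a, b = discretize(x, bins), discretize(y, bins)
    observed = mutual_information(a, b)
    rng = random.Random(seed)
    shuffled = list(b)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # breaks any x-y dependence, keeps the marginals
        if mutual_information(a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (permutations + 1)


x = [float(v) for v in range(20)]
y = [2.0 * v + 1.0 for v in x]  # perfectly dependent toy profiles
mi, p = mi_permutation_test(x, y)
print(round(mi, 6))  # 2.0
```

    An edge between two genes would be kept when `p` falls below a chosen significance level; how that level is set and corrected for the number of gene pairs is a separate design decision not covered by this sketch.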

  4. The pan-genome: towards a knowledge-based discovery of novel targets for vaccines and antibacterials.

    Science.gov (United States)

    Muzzi, Alessandro; Masignani, Vega; Rappuoli, Rino

    2007-06-01

    During the past decade, sequencing of the entire genome of pathogenic bacteria has become a widely used practice in microbiology research. More recently, sequence data from multiple isolates of a single pathogen have provided new insights into the microevolution of a species as well as helping researchers to decipher its virulence mechanisms. The comparison of multiple strains of a single species has resulted in the definition of the species pan-genome, as a measure of the total gene repertoire that can pertain to a given microorganism. This concept can be exploited not only to study the diversity of a species, but also, as we discuss here, to provide the opportunity to use a knowledge-based approach for the development of novel vaccine candidates and new-generation targets for antimicrobials.
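
    The set arithmetic behind the pan-genome concept can be sketched as follows, assuming (as a hypothetical input format) that each sequenced strain has already been reduced to a set of orthologous gene-family identifiers; the strain and gene names are invented. The pan-genome is the union of these sets, the core genome is their intersection, and tracking the size of the union as strains are added shows whether new isolates keep contributing genes.

```python
from typing import Dict, List, Set, Tuple


def pan_and_core(strains: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Pan-genome (union) and core genome (intersection) of gene-family sets."""
    gene_sets = list(strains.values())
    if not gene_sets:
        return set(), set()
    return set().union(*gene_sets), set.intersection(*gene_sets)


def pan_genome_curve(strains: Dict[str, Set[str]]) -> List[int]:
    """Pan-genome size after adding each strain, in the dict's insertion order."""
    seen: Set[str] = set()
    sizes = []
    for genes in strains.values():
        seen |= genes
        sizes.append(len(seen))
    return sizes


strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneE"},
}
pan, core = pan_and_core(strains)
accessory = pan - core  # genes present in some but not all strains
print(sorted(core), pan_genome_curve(strains))  # ['geneA'] [3, 4, 5]
```

    Core genes, present in every sequenced strain, are one natural first filter when looking for targets conserved across a species, while the accessory set captures strain-to-strain diversity.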

  5. An overview of the BioExtract Server: a distributed, Web-based system for genomic analysis.

    Science.gov (United States)

    Lushbough, C M; Brendel, V P

    2010-01-01

    Genome research is becoming increasingly dependent on access to multiple distributed data sources and bioinformatic tools. The importance of integration across distributed databases and Web services will continue to grow as the number of requisite resources expands. Use of bioinformatic workflows has grown considerably in recent years as scientific research becomes increasingly dependent on the analysis of large data sets and the use of distributed resources. The BioExtract Server (http://bioextract.org) is a Web-based system designed to aid researchers in the analysis of distributed genomic data by providing a platform that facilitates the creation of bioinformatic workflows. Scientific workflows are created within the system by recording the analytic tasks performed by researchers. These steps may include querying multiple data sources, saving query results as searchable data extracts, and executing local and Web-accessible analytic tools. The series of recorded tasks can be saved as a computational workflow simply by providing a name and description.
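
    The record-then-save pattern described above can be sketched as follows. This is a hypothetical illustration of the idea, not the BioExtract Server's actual interface: the `Recorder` class, the `replay` function, the tool names and their parameters are invented for the example. Each recorded step stores only a tool name and the parameters it was called with, and replaying a saved workflow reruns those steps in order.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    tool: str               # name of the analytic tool or query that was run
    params: Dict[str, Any]  # arguments it was run with


@dataclass
class Workflow:
    name: str
    description: str
    steps: List[Step]


class Recorder:
    """Hypothetical recorder: runs tools and logs each call as a workflow step."""

    def __init__(self, tools: Dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.steps: List[Step] = []

    def run(self, tool: str, **params: Any) -> Any:
        result = self.tools[tool](**params)
        self.steps.append(Step(tool, dict(params)))  # only reached if the call succeeded
        return result

    def save(self, name: str, description: str) -> Workflow:
        # A workflow is the recorded steps plus a name and a description.
        return Workflow(name, description, list(self.steps))


def replay(workflow: Workflow, tools: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Re-run every recorded step in order and collect the results."""
    return [tools[step.tool](**step.params) for step in workflow.steps]


# Stand-in tools; a real system would query remote databases and Web services.
tools = {
    "query": lambda term: [f"{term}_hit{i}" for i in range(3)],
    "count": lambda items: len(items),
}
rec = Recorder(tools)
hits = rec.run("query", term="BRCA1")
n_hits = rec.run("count", items=hits)
wf = rec.save("brca1-demo", "Query for BRCA1 and count the hits")
```

    Storing parameters rather than results keeps a saved workflow small and lets it be rerun later against updated data sources.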

  6. New traits in crops produced by genome editing techniques based on deletions

    NARCIS (Netherlands)

    Wiel, van de C.C.M.; Schaart, J.G.; Lotz, L.A.P.; Smulders, M.J.M.

    2017-01-01

    One of the most promising New Plant Breeding Techniques is genome editing (also called gene editing) with the help of a programmable site-directed nuclease (SDN). In this review, we focus on SDN-1, which is the generation of small deletions or insertions (indels) at a precisely defined location in t

  7. Meiotic homoeologous recombination-based alien gene introgression in the genomics era of wheat

    Science.gov (United States)

    Wheat (Triticum spp.) has a narrow genetic basis due to its allopolyploid origin. However, wheat has numerous wild relatives usable for expanding genetic variability of its genome through meiotic homoeologous recombination. Traditionally, laborious cytological analyses have been employed to detect h...

  8. Genomic epidemiology of Salmonella enterica serotype Enteritidis based on population structure of prevalent lineages

    Science.gov (United States)

    Salmonella enterica serotype Enteritidis (SE) is one of the most commonly reported causes of human salmonellosis. The low genetic diversity of SE measured by fingerprinting methods has made subtyping a challenge. In this study, we used whole genome sequencing to characterize a total of 125 SE and Sa...

  9. Plant-based raw material: Improved food quality for better nutrition via plant genomics

    NARCIS (Netherlands)

    Meer, van der I.M.; Bovy, A.G.; Bosch, H.J.

    2001-01-01

    Plants form the basis of the human food chain. Characteristics of plants are therefore crucial to the quantity and quality of human food. In this review, it is discussed how technological developments in the area of plant genomics and plant genetics help to mobilise the potential of plants to improv

  10. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences has revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these

  11. MOVE : A Multi-Level Ontology-Based Visualization and Exploration Framework for Genomic Networks

    NARCIS (Netherlands)

    Bosman, Diederik W.J.; Blom, Evert-Jan; Ogao, Patrick J.; Kuipers, Oscar P.; Roerdink, Jos B.T.M.; Wingender, E.

    2007-01-01

    Among the various research areas that comprise bioinformatics, systems biology is gaining increasing attention. An important goal of systems biology is the unraveling of dynamic interactions between components of living cells (e.g., proteins, genes). These interactions exist among others on genomic,

  13. Insights into phylogeny, sex function and age of Fragaria based on whole chloroplast genome sequencing

    Science.gov (United States)

    Wambui Njunguna; Aaron Liston; Richard Cronn; Tia-Lynn Ashman; Nahla Bassil

    2013-01-01

    The cultivated strawberry is one of the youngest domesticated plants, developed in France in the 1700s from chance hybridization between two western hemisphere octoploid species. However, little is known about the evolution of the species that gave rise to this important fruit crop. Phylogenetic analysis of chloroplast genome sequences of 21 Fragaria...

  14. Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response

    NARCIS (Netherlands)

    Lee, R. van der; Feng, Q.; Langereis, M.A.; Horst, R. ter; Szklarczyk, R.J.; Netea, M.G.; Andeweg, A.C.; Kuppeveld, F.J.M. van; Huynen, M.A.

    2015-01-01

    The RIG-I-like receptor (RLR) pathway is essential for detecting cytosolic viral RNA to trigger the production of type I interferons (IFNα/β) that initiate an innate antiviral response. Through systematic assessment of a wide variety of genomics data, we discovered 10 molecular signatures of

  15. Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response

    NARCIS (Netherlands)

    van der Lee, Robin; Feng, Qian; Langereis, Martijn A; Ter Horst, Rob; Szklarczyk, Radek; Netea, Mihai G; Andeweg, Arno C; van Kuppeveld, Frank J M; Huynen, Martijn A

    2015-01-01

    The RIG-I-like receptor (RLR) pathway is essential for detecting cytosolic viral RNA to trigger the production of type I interferons (IFNα/β) that initiate an innate antiviral response. Through systematic assessment of a wide variety of genomics data, we discovered 10 molecular signatures of known R

  16. Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response

    NARCIS (Netherlands)

    R. van der Lee (Robin); Q. Feng (Qian); M.A. Langereis (Martijn A.); R. ter Horst (Rob); R. Szklarczyk (Radek); M.G. Netea (Mihai); A.C. Andeweg (Arno); F.J.M. van Kuppeveld (Frank); M. Huynen (Martijn)

    2015-01-01

    The RIG-I-like receptor (RLR) pathway is essential for detecting cytosolic viral RNA to trigger the production of type I interferons (IFNα/β) that initiate an innate antiviral response. Through systematic assessment of a wide variety of genomics data, we discovered 10 molecular signature

  17. Genomic Epidemiology of Salmonella enterica Serotype Enteritidis based on Population Structure of Prevalent Lineages

    DEFF Research Database (Denmark)

    Deng, Xiangyu; Desai, Prerak T.; den Bakker, Henk