WorldWideScience

Sample records for genomics tool gene

  1. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets.

    Science.gov (United States)

    Khan, Aziz; Mathelier, Anthony

    2017-05-31

    A common task for scientists is comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited. To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets. Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene, with the web application available at https://asntech.shinyapps.io/intervene.
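
As a rough illustration of the kind of set arithmetic that Venn diagrams, UpSet plots and pairwise heat maps are drawn from, the sketch below works on plain gene lists. It is not Intervene's code or command-line interface: the function names `exclusive_intersections` and `pairwise_jaccard`, the Jaccard index as the pairwise measure, and the example gene lists are all assumptions made for illustration. Genomic region sets would additionally need interval-overlap logic, which is not shown.

```python
from itertools import combinations


def exclusive_intersections(sets):
    """Map each combination of set names to the items found in exactly those sets."""
    names = sorted(sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items shared by every set in the combination...
            inside = set.intersection(*(sets[name] for name in combo))
            # ...minus anything that also occurs in a set outside the combination.
            outside = set().union(*(sets[name] for name in names if name not in combo))
            exclusive = inside - outside
            if exclusive:
                result[combo] = exclusive
    return result


def pairwise_jaccard(sets):
    """Return the Jaccard index for every ordered pair of set names."""
    names = sorted(sets)
    matrix = {}
    for a in names:
        for b in names:
            union = sets[a] | sets[b]
            matrix[(a, b)] = len(sets[a] & sets[b]) / len(union) if union else 0.0
    return matrix


# Hypothetical gene lists, used only to exercise the functions.
gene_lists = {
    "A": {"TP53", "MYC", "EGFR"},
    "B": {"MYC", "EGFR", "KRAS"},
    "C": {"EGFR", "BRCA1"},
}
regions = exclusive_intersections(gene_lists)
similarity = pairwise_jaccard(gene_lists)
```

The exclusive intersections correspond to the bars of an UpSet plot or the regions of a Venn diagram, and the pairwise matrix is the sort of table a clustered heat map could be built from.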

  2. CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2009-08-01

    Background: Viruses and small-genome bacteria (~2 megabases and smaller) comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools to analyze. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms. Findings: CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes. Conclusion: CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.
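
The core/unique split described above can be sketched once the BLASTP searches have been run. The snippet below is a minimal sketch, not CGUG's implementation: it assumes the search results have already been reduced to one table per query genome that maps a reference protein to its best hit, and the function name `core_and_unique`, the gene names and the hit identifiers are hypothetical.

```python
def core_and_unique(reference_genes, hits_by_query):
    """Split reference genes into those matched in every query and those matched in none.

    hits_by_query maps a query genome name to a dict {reference_gene: best_hit_id}.
    Genes matched in only some of the queries fall into neither list.
    """
    core, unique = [], []
    for gene in reference_genes:
        matched = [query for query, hits in hits_by_query.items() if gene in hits]
        if hits_by_query and len(matched) == len(hits_by_query):
            core.append(gene)
        elif not matched:
            unique.append(gene)
    return core, unique


# Hypothetical reference proteome and pre-computed best hits in two query genomes.
reference = ["dnaA", "gyrB", "recA", "hypX"]
hits = {
    "query1": {"dnaA": "q1_0001", "gyrB": "q1_0002", "recA": "q1_0417"},
    "query2": {"dnaA": "q2_0001", "recA": "q2_0388"},
}
core, unique = core_and_unique(reference, hits)
```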

  3. MutaNET: a tool for automated analysis of genomic mutations in gene regulatory networks.

    Science.gov (United States)

    Hollander, Markus; Hamed, Mohamed; Helms, Volkhard; Neininger, Kerstin

    2018-03-01

    Mutations in genomic key elements can influence gene expression and function in various ways, and hence greatly contribute to the phenotype. We developed MutaNET to score the impact of individual mutations on gene regulation and function of a given genome. MutaNET performs statistical analyses of mutations in different genomic regions. The tool also incorporates the mutations in a provided gene regulatory network to estimate their global impact. The integration of a next-generation sequencing pipeline enables calling mutations prior to the analyses. As an application example, we used MutaNET to analyze the impact of mutations in antibiotic resistance (AR) genes and their potential effect on AR of bacterial strains. MutaNET is freely available at https://sourceforge.net/projects/mutanet/. It is implemented in Python and supported on Mac OS X, Linux and MS Windows. Step-by-step instructions are available at http://service.bioinformatik.uni-saarland.de/mutanet/. Contact: volkhard.helms@bioinformatik.uni-saarland.de. Supplementary data are available at Bioinformatics online.

  4. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) progr...... is freely available on a web server at http://fgf.genomics.org.cn/...

  5. Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing.

    Science.gov (United States)

    Pang, Chi Nam Ignatius; Tay, Aidan P; Aya, Carlos; Twine, Natalie A; Harkness, Linda; Hart-Smith, Gene; Chia, Samantha Z; Chen, Zhiliang; Deshpande, Nandan P; Kaakoush, Nadeem O; Mitchell, Hazel M; Kassem, Moustapha; Wilkins, Marc R

    2014-01-03

    Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of a lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrative Genomics Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from https://github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.

  6. GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies.

    Science.gov (United States)

    Yung, Ling Sing; Yang, Can; Wan, Xiang; Yu, Weichuan

    2011-05-01

    Collecting millions of genetic variations is feasible with the advanced genotyping technology. With a huge amount of genetic variations data in hand, developing efficient algorithms to carry out the gene-gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene-gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphic processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene-gene interactions. We implement the BOOST method based on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with Nvidia GeForce GTX 285 display card. GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.
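
The abstract does not spell out how Boolean operations enter the analysis. One common way to exploit them, sketched here as an assumption and not as the BOOST or GBOOST implementation, is to store each SNP as three bitmasks (one per genotype 0, 1 or 2, with one bit per individual), so that every cell of a two-SNP contingency table is a bitwise AND followed by a population count. The function names and the toy genotypes are hypothetical, and Python integers stand in for the fixed-width words a GPU kernel would process in parallel.

```python
def encode_snp(genotypes):
    """Return three bitmasks; bit i of mask g is set when individual i has genotype g."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def contingency_table(snp_a, snp_b, case_mask, n_individuals):
    """Build the 3x3 genotype tables for cases and controls using AND plus popcount."""
    all_mask = (1 << n_individuals) - 1
    control_mask = all_mask & ~case_mask
    table = {}
    for status, status_mask in (("case", case_mask), ("control", control_mask)):
        table[status] = [
            [popcount(snp_a[i] & snp_b[j] & status_mask) for j in range(3)]
            for i in range(3)
        ]
    return table


# Toy data: six individuals, the first three of which are cases.
snp_a = encode_snp([0, 1, 2, 1, 0, 2])
snp_b = encode_snp([0, 0, 1, 1, 2, 2])
case_mask = 0b000111
table = contingency_table(snp_a, snp_b, case_mask, n_individuals=6)
```

Each SNP pair then costs a fixed number of word-level operations regardless of how the interaction statistic is computed from the table, which is what makes the pairwise scan amenable to massive parallelism.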

  7. Development of functional genomic tools in trematodes: RNA interference and luciferase reporter gene activity in Fasciola hepatica.

    Directory of Open Access Journals (Sweden)

    Gabriel Rinaldi

    2008-07-01

    The growing availability of sequence information from diverse parasites through genomic and transcriptomic projects offers new opportunities for the identification of key mediators in the parasite-host interaction. Functional genomics approaches and methods for the manipulation of genes are essential tools for deciphering the roles of genes and identifying new intervention targets in parasites. Exciting advances in functional genomics for parasitic helminths are starting to occur, with transgene expression and RNA interference (RNAi) reported in several species of nematodes, but the area is still in its infancy in flatworms, with reports in just three species. While advancing in model organisms, there is a need to rapidly extend these technologies to other parasites responsible for several chronic diseases of humans and cattle. In order to extend these approaches to less well studied parasitic worms, we developed a test method for the presence of a viable RNAi pathway by silencing the exogenous reporter gene, firefly luciferase (fLUC). We established the method in the human blood fluke Schistosoma mansoni and then confirmed its utility in the liver fluke Fasciola hepatica. We transformed newly excysted juveniles of F. hepatica by electroporation with mRNA of fLUC and three hours later were able to detect luciferase enzyme activity, concentrated mainly in the digestive ceca. Subsequently, we tested the presence of an active RNAi pathway in F. hepatica by knocking down the exogenous luciferase activity by introduction into the transformed parasites of double-stranded RNA (dsRNA) specific for fLUC. In addition, we tested the RNAi pathway targeting an endogenous F. hepatica gene encoding leucine aminopeptidase (FhLAP), and observed a significant reduction in specific mRNA levels. In summary, these studies demonstrated the utility of RNAi targeting reporter fLUC as a reporter gene assay to establish the presence of an intact RNAi pathway in helminth

  8. Application of DETECTER, an evolutionary genomic tool to analyze genetic variation, to the cystic fibrosis gene family

    Directory of Open Access Journals (Sweden)

    De Kee Danny W

    2006-03-01

    Background: The medical community requires computational tools that distinguish missense genetic differences having phenotypic impact within the vast number of sense mutations that do not. Tools that do this will become increasingly important for those seeking to use human genome sequence data to predict disease, make prognoses, and customize therapy to individual patients. Results: An approach, termed DETECTER, is proposed to identify sites in a protein sequence where amino acid replacements are likely to have a significant effect on phenotype, including causing genetic disease. This approach uses a model-dependent tool to estimate the normalized replacement rate at individual sites in a protein sequence, based on a history of those sites extracted from an evolutionary analysis of the corresponding protein family. This tool identifies sites that have higher-than-average, average, or lower-than-average rates of change in the lineage leading to the sequence in the population of interest. The rates are then combined with sequence data to determine the likelihoods that particular amino acids were present at individual sites in the evolutionary history of the gene family. These likelihoods are used to predict whether any specific amino acid replacements, if introduced at the site in a modern human population, would have a significant impact on fitness. The DETECTER tool is used to analyze the cystic fibrosis transmembrane conductance regulator (CFTR) gene family. Conclusion: In this system, DETECTER retrodicts amino acid replacements associated with the cystic fibrosis disease with greater accuracy than alternative approaches. While this result validates this approach for this particular family of proteins only, the approach may be applicable to the analysis of polymorphisms generally, including SNPs in a human population.

  9. Uses of antimicrobial genes from microbial genome

    Science.gov (United States)

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  10. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...

  11. ACE-it: a tool for genome-wide integration of gene dosage and RNA expression data

    NARCIS (Netherlands)

    van Wieringen, W.N.; Belien, J.A.M.; Vosse, S.; Achame, E.M.; Ylstra, B.

    2006-01-01

    Summary: We describe a tool, called ACE-it (Array CGH Expression integration tool). ACE-it links the chromosomal position of the gene dosage measured by array CGH to the genes measured by the expression array. ACE-it uses this link to statistically test whether gene dosage affects RNA expression.

  12. Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing

    DEFF Research Database (Denmark)

    Pang, Chi; Tay, Aidan; Aya, Carlos

    2014-01-01

    contigs, along with RNA-seq reads. This is done in the Integrative Genomics Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates...... the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene...

  13. Application of genomic tools in plant breeding.

    Science.gov (United States)

    Pérez-de-Castro, A M; Vilanova, S; Cañizares, J; Pascual, L; Blanca, J M; Díez, M J; Prohens, J; Picó, B

    2012-05-01

    Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic information. The analysis of NGS data by means of bioinformatics developments allows the discovery of new genes and regulatory sequences and their positions, and makes available large collections of molecular markers. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. Genomic approaches include TILLING and EcoTILLING, which make it possible to screen mutant and germplasm collections for allelic variants in target genes. Re-sequencing of genomes is very useful for the genome-wide discovery of markers amenable to high-throughput genotyping platforms, like SSRs and SNPs, or the construction of high density genetic maps. All these tools and resources facilitate studying genetic diversity, which is important for germplasm management, enhancement and use. Also, they allow the identification of markers linked to genes and QTLs, using a diversity of techniques like bulked segregant analysis (BSA), fine genetic mapping, or association mapping. These new markers are used for marker assisted selection, including marker assisted backcross selection, 'breeding by design', or new strategies, like genomic selection. In conclusion, advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the 'superdomestication' of crops and the genetic dissection and breeding for complex traits.

  15. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the inappropriateness of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models which correlate with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID: 29511354
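
To make the idea of an alignment-free comparison concrete, the sketch below counts overlapping tetranucleotides in two sequences and measures how far apart the resulting frequency profiles are. It is an illustration under stated assumptions, not the SWPhylo method: the abstract does not say how OUPs are normalized or compared, so the use of raw relative frequencies, the Euclidean distance, the function names and the toy sequences are all choices made here.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order, so that profiles are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the relative frequency of each tetranucleotide (overlapping windows)."""
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [counts[word] / total for word in TETRAMERS]


def euclidean_distance(p, q):
    """Distance between two frequency profiles of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy sequences; real input would be complete genome sequences.
profile_1 = tetranucleotide_frequencies("ACGTACGT")
profile_2 = tetranucleotide_frequencies("ACGTTTTTACGA")
distance = euclidean_distance(profile_1, profile_2)
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method; that step is left out here.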

  16. Visualizing conserved gene location across microbe genomes

    Science.gov (United States)

    Shaw, Chris D.

    2009-01-01

    This paper introduces an analysis-based zoomable visualization technique for displaying the location of genes across many related species of microbes. The purpose of this visualization is to enable a biologist to examine the layout of genes in the organism of interest with respect to the gene organization of related organisms. During the genomic annotation process, the ability to observe gene organization in common with previously annotated genomes can help a biologist better confirm the structure and function of newly analyzed microbe DNA sequences. We have developed a visualization and analysis tool that enables the biologist to observe and examine gene organization among genomes, in the context of the primary sequence of interest. This paper describes the visualization and analysis steps, and presents a case study using a number of Rickettsia genomes.

  17. VISTA - computational tools for comparative genomics

    Energy Technology Data Exchange (ETDEWEB)

    Frazer, Kelly A.; Pachter, Lior; Poliakov, Alexander; Rubin, Edward M.; Dubchak, Inna

    2004-01-01

    Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/VISTA/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, submit their own sequences of interest to several VISTA servers for various types of comparative analysis, and obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kilobase (kb) interval on human chromosome 5 that encodes the kinesin family member 3A (KIF3A) protein.

  18. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Wasnick Michael

    2008-03-01

    Background: The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results: PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion: PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any

  19. Genomic variation in Salmonella enterica core genes for epidemiological typing

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis

    2012-01-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...

  20. Dcode.org anthology of comparative genomic tools.

    Science.gov (United States)

    Loots, Gabriela G; Ovcharenko, Ivan

    2005-07-01

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow, for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples of their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.

  1. Analysis tools for the interplay between genome layout and regulation.

    Science.gov (United States)

    Bouyioukos, Costas; Elati, Mohamed; Képès, François

    2016-06-06

    Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets, which tend to be incomplete and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying, on the one hand, the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli and, on the other hand, by improving TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation.

  2. The molecular genetic linkage map of the model legume Medicago truncatula: an essential tool for comparative legume genomics and the isolation of agronomically important genes

    Directory of Open Access Journals (Sweden)

    Ané Jean-Michel

    2002-01-01

    Background: The legume Medicago truncatula has emerged as a model plant for the molecular and genetic dissection of various plant processes involved in rhizobial, mycorrhizal and pathogenic plant-microbe interactions. Aiming to develop essential tools for such genetic approaches, we have established the first genetic map of this species. Two parental homozygous lines were selected from the cultivar Jemalong and from the Algerian natural population (DZA315) on the basis of their molecular and phenotypic polymorphism. Results: An F2 segregating population of 124 individuals between these two lines was obtained using an efficient manual crossing technique established for M. truncatula and was used to construct a genetic map. This map spans 1225 cM (average 470 kb/cM) and comprises 289 markers including RAPD, AFLP, known genes and isoenzymes arranged in 8 linkage groups (2n = 16). Markers are uniformly distributed throughout the map and segregation distortion is limited to only 3 linkage groups. By mapping a number of common markers, the eight linkage groups are shown to be homologous to those of diploid alfalfa (M. sativa), implying a good level of macrosynteny between the two genomes. Using this M. truncatula map and the derived F3 populations, we were able to map the Mtsym6 symbiotic gene on linkage group 8 and the SPC gene, responsible for the direction of pod coiling, on linkage group 7. Conclusions: These results demonstrate that Medicago truncatula is amenable to diploid genetic analysis and they open the way to map-based cloning of symbiotic or other agronomically-important genes using this model plant.
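
The map statistics quoted in this abstract can be related to each other with simple arithmetic. The short calculation below takes the quoted figures at face value. The derived quantities (implied physical size, average map length per marker and per linkage group) are not stated in the abstract and are computed here only as a consistency check; the variable names are illustrative.

```python
# Figures quoted in the abstract.
map_length_cm = 1225      # total genetic map length in centimorgans
kb_per_cm = 470           # quoted average physical distance per centimorgan
n_markers = 289
n_linkage_groups = 8

# Physical size implied by the quoted average, converted from kb to Mb.
implied_size_mb = map_length_cm * kb_per_cm / 1000

# Average map length per marker and per linkage group.
cm_per_marker = map_length_cm / n_markers
cm_per_group = map_length_cm / n_linkage_groups

print(f"implied physical size: {implied_size_mb:.2f} Mb")
print(f"average map length per marker: {cm_per_marker:.2f} cM")
print(f"average linkage group length: {cm_per_group:.1f} cM")
```

Under these figures the map corresponds to roughly 576 Mb of sequence, with about 4.2 cM of map per marker and about 153 cM per linkage group.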

  3. Application of Genomic Tools in Plant Breeding

    OpenAIRE

    Pérez-de-Castro, A.M.; Vilanova, S.; Cañizares, J.; Pascual, L.; Blanca, J.M.; Díez, M.J.; Prohens, J.; Picó, B.

    2012-01-01

    Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution of plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic...

  4. Genome position and gene amplification

    Czech Academy of Sciences Publication Activity Database

    Jirsová, Pavla; Snijders, A.M.; Kwek, S.; Roydasgupta, R.; Fridlyand, J.; Tokuyasu, T.; Pinkel, D.; Albertson, D. G.

    2007-01-01

    Vol. 8, No. 6 (2007), r120 ISSN 1474-760X Institutional research plan: CEZ:AV0Z50040507; CEZ:AV0Z50040702 Keywords: gene amplification * array comparative genomic hybridization * oncogene Subject RIV: BO - Biophysics Impact factor: 6.589, year: 2007

  5. A combined coalescence gene-dropping tool for evaluating genomic selection in complex scenarios (ms2gs).

    Science.gov (United States)

    Pérez-Enciso, M; Legarra, A

    2016-04-01

    We present ms2gs, a combined coalescence - gene dropping (i.e. backward-forward) simulator for complex traits. It therefore aims at combining the advantages of both approaches. It is primarily conceived for very short term, recent scenarios such as those that are of interest in animal and plant breeding. It is very flexible in terms of defining QTL architecture and SNP ascertainment bias, and it allows for easy modelling of alternative markers such as RADs. It can use real sequence or chip data or generate molecular polymorphisms via the coalescence. It can generate QTL conditional on extant molecular information, such as low-density genotyping. It models (simplistically) sequence, imputation or genotyping errors. It requires as input both genotypic data in plink or ms formats, and a pedigree that is used to perform the gene dropping. By default, it compares accuracy for BLUP, SNP ascertained data, sequence, and causal SNPs. It employs VanRaden's linear (GBLUP) and nonlinear method for incorporating molecular information. To illustrate the program, we present a small application in a half-sib population and a multiparental (MAGIC) cross. The program, manual and examples are available at https://github.com/mperezenciso/ms2gs. © 2016 Blackwell Verlag GmbH.
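
    The forward ("gene dropping") half of such a simulator can be illustrated with a minimal sketch. This is not the ms2gs code: the function names, the uniform per-interval recombination rate and the toy pedigree are assumptions made for illustration, and founder haplotypes are simply given here rather than generated by the coalescent.

```python
import random

def make_gamete(parent_haps, recomb_rate, rng):
    """Form one gamete from a parent's two haplotypes, allowing crossovers."""
    current = rng.randrange(2)  # start on a randomly chosen parental haplotype
    gamete = []
    for locus in range(len(parent_haps[0])):
        # Between adjacent loci, switch haplotype with probability recomb_rate.
        if locus > 0 and rng.random() < recomb_rate:
            current = 1 - current
        gamete.append(parent_haps[current][locus])
    return gamete

def gene_drop(pedigree, founder_haps, recomb_rate=0.01, seed=0):
    """Drop founder alleles down a pedigree.

    pedigree: list of (id, sire, dam) with parents listed before offspring;
              sire and dam are None for founders.
    founder_haps: dict id -> (haplotype, haplotype), each a list of alleles.
    Returns dict id -> (paternal haplotype, maternal haplotype).
    """
    rng = random.Random(seed)
    haps = dict(founder_haps)
    for ind, sire, dam in pedigree:
        if sire is None and dam is None:
            continue  # founder genotypes are supplied by the caller
        haps[ind] = (make_gamete(haps[sire], recomb_rate, rng),
                     make_gamete(haps[dam], recomb_rate, rng))
    return haps

# Hypothetical half-sib family: one sire, two dams, one offspring per dam.
founders = {
    "sire": ([0] * 5, [1] * 5),
    "dam1": ([2] * 5, [3] * 5),
    "dam2": ([4] * 5, [5] * 5),
}
pedigree = [("sire", None, None), ("dam1", None, None), ("dam2", None, None),
            ("kid1", "sire", "dam1"), ("kid2", "sire", "dam2")]
dropped = gene_drop(pedigree, founders, recomb_rate=0.1, seed=42)
```

    Labelling each founder haplotype with a distinct allele, as in the toy data, makes it easy to trace which founder segment every offspring position descends from.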

  6. RatMap--rat genome tools and data.

    Science.gov (United States)

    Petersen, Greta; Johnson, Per; Andersson, Lars; Klinga-Levan, Karin; Gómez-Fabre, Pedro M; Ståhl, Fredrik

    2005-01-01

    The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB-Genetics at Göteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided.

  7. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    Science.gov (United States)

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
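
    The idea of building an extended annotation by combining the output of several annotation methods can be sketched as follows. This is only an illustration under stated assumptions, not BEACON's algorithm: the input layout, the list of uninformative labels and the function names are hypothetical, and the toy data are invented.

```python
# Labels treated as "no function assigned" (an assumption for this sketch).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}

def is_informative(function):
    """True if the function string carries a putative function."""
    return function is not None and function.strip().lower() not in UNINFORMATIVE

def extend_annotation(annotations):
    """Combine per-method annotations into one extended annotation.

    annotations: dict method name -> dict gene id -> function string.
    Returns dict gene id -> sorted list of distinct informative functions.
    """
    extended = {}
    for per_gene in annotations.values():
        for gene, function in per_gene.items():
            funcs = extended.setdefault(gene, set())
            if is_informative(function):
                funcs.add(function.strip())
    return {gene: sorted(funcs) for gene, funcs in extended.items()}

def annotated_count(gene_to_funcs):
    """Number of genes with at least one putative function."""
    return sum(1 for funcs in gene_to_funcs.values() if funcs)

# Invented toy input: two annotation methods (AM1, AM2) on three genes.
toy = {
    "AM1": {"g1": "DNA polymerase", "g2": "hypothetical protein", "g3": "hypothetical protein"},
    "AM2": {"g1": "DNA polymerase", "g2": "ABC transporter", "g3": ""},
}
combined = extend_annotation(toy)
print(annotated_count(extend_annotation({"AM1": toy["AM1"]})), annotated_count(combined))
```

    In the toy input the combined annotation assigns a putative function to one more gene than AM1 alone, which is the kind of gain the abstract reports for real genomes.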

  8. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  9. GREAT: a web portal for Genome Regulatory Architecture Tools.

    Science.gov (United States)

    Bouyioukos, Costas; Bucchini, François; Elati, Mohamed; Képès, François

    2016-07-08

    GREAT (Genome REgulatory Architecture Tools) is a novel web portal for tools designed to generate user-friendly and biologically useful analysis of genome architecture and regulation. The online tools of GREAT are freely accessible and compatible with essentially any operating system which runs a modern browser. GREAT is based on the analysis of genome layout (defined as the respective positioning of co-functional genes) and its relation with chromosome architecture and gene expression. GREAT tools allow users to systematically detect regular patterns along co-functional genomic features in an automatic way consisting of three individual steps and respective interactive visualizations. In addition to the complete analysis of regularities, GREAT tools enable the use of periodicity and position information for improving the prediction of transcription factor binding sites using a multi-view machine learning approach. The outcome of this integrative approach features a multivariate analysis of the interplay between the location of a gene and its regulatory sequence. GREAT results are plotted in web interactive graphs and are available for download either as individual plots, self-contained interactive pages or as machine readable tables for downstream analysis. The GREAT portal can be reached at the following URL https://absynth.issb.genopole.fr/GREAT and each individual GREAT tool is available for downloading. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
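
    One simple way to test gene positions for a regular spacing is to wrap them onto a circle of a candidate period and measure how tightly the resulting phases cluster. The sketch below uses this mean resultant length as a score; it is a generic circular-statistics illustration with hypothetical function names and invented positions, and is not claimed to be the procedure implemented in GREAT.

```python
import cmath
import math

def periodicity_score(positions, period):
    """Mean resultant length of positions wrapped onto a circle of the given
    period: close to 1 when positions share a phase, close to 0 when spread."""
    phases = [cmath.exp(2j * math.pi * (x % period) / period) for x in positions]
    return abs(sum(phases)) / len(phases)

def best_period(positions, candidate_periods):
    """Candidate period with the highest periodicity score."""
    return max(candidate_periods, key=lambda p: periodicity_score(positions, p))

# Invented gene positions (arbitrary units) spaced exactly 100 apart.
genes = [0, 100, 200, 300, 400]
print(best_period(genes, [70, 100, 130]))
```

    Any divisor of the true period (here 50 or 25) would score equally well, so in practice the candidate list and a significance assessment against shuffled positions matter as much as the score itself.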

  10. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  11. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    Full Text Available As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads matching a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  12. MaGnET: Malaria Genome Exploration Tool.

    Science.gov (United States)

    Sharman, Joanna L; Gerloff, Dietlind L

    2013-09-15

    The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive 'exploration-style' visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein-protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Availability: free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org. Contact: joanna.sharman@ed.ac.uk or dgerloff@ffame.org. Supplementary data are available at Bioinformatics online.

  13. The Switchgrass Genome: Tools and Strategies

    Directory of Open Access Journals (Sweden)

    Michael D. Casler

    2011-11-01

    Full Text Available Switchgrass ( L. is a perennial grass species receiving significant focus as a potential bioenergy crop. In the last 5 yr the switchgrass research community has produced a genetic linkage map, an expressed sequence tag (EST database, a set of single nucleotide polymorphism (SNP markers that are distributed across the 18 linkage groups, 4x sampling of the AP13 genome in 400-bp reads, and bacterial artificial chromosome (BAC libraries containing over 200,000 clones. These studies have revealed close collinearity of the switchgrass genome with those of sorghum [ (L. Moench], rice ( L., and (L. P. Beauv. Switchgrass researchers have also developed several microarray technologies for gene expression studies. Switchgrass genomic resources will accelerate the ability of plant breeders to enhance productivity, pest resistance, and nutritional quality. Because switchgrass is a relative newcomer to the genomics world, many secrets of the switchgrass genome have yet to be revealed. To continue to efficiently explore basic and applied topics in switchgrass, it will be critical to capture and exploit the knowledge of plant geneticists and breeders on the next logical steps in the development and utilization of genomic resources for this species. To this end, the community has established a switchgrass genomics executive committee and work group ( [verified 28 Oct. 2011].

  14. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is tolerant of sequence errors, maintaining its capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (such as those obtained from metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  15. Genomic Tools and Animal Health.

    Science.gov (United States)

    Zanella, Ricardo

    2016-09-07

    Animals have been selected to improve their productivity in order to increase the profitability to the producer. In this scenario, not much attention was given to health traits. As a consequence of that, selection was made for animals with higher production and a shortened productive life. In addition to that, the intense production system used in livestock has forced animals to be exposed to higher pathogen loads, therefore predisposing them to infections. Infectious diseases are known to be caused by micro-organisms that are able to infect and colonize the host, affecting their physiological functions and causing problems in their production and on animal welfare. Even with the best management practices, diseases are still the most important cause of economic losses in the animal industry. In this review article we have addressed the new tools that could be used to select animals to better cope with diseases and pathogens.

  16. GAPIT: genome association and prediction integrated tool.

    Science.gov (United States)

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. Availability: http://www.maizegenetics.net/GAPIT. Contact: zhiwu.zhang@cornell.edu. Supplementary data are available at Bioinformatics online.

  17. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  18. Comparative genomics using data mining tools

    Indian Academy of Sciences (India)

    We have analysed the genomes of representatives of three kingdoms of life, namely, archaea, eubacteria and eukaryota using data mining tools based on compositional analyses of the protein sequences. The representatives chosen in this analysis were Methanococcus jannaschii, Haemophilus influenzae and ...

  19. Persistence drives gene clustering in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Rocha Eduardo PC

    2008-01-01

    Full Text Available Abstract Background Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion We show that it is the strong selective pressure acting on the function of persistent genes, in a permanent state of flux of genes in bacterial genomes that keeps their size fairly constant, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering.

  20. Gene conversion in the rice genome

    DEFF Research Database (Denmark)

    Xu, Shuqing; Clark, Terry; Zheng, Hongkun

    2008-01-01

    -chromosomal conversions distributed between chromosome 1 and 5, 2 and 6, and 3 and 5 are more frequent than genome average (Z-test, P ... is not tightly linked to natural selection in the rice genome. To assess the contribution of segmental duplication on gene conversion statistics, we determined locations of conversion partners with respect to inter-chromosomal segment duplication. The number of conversions associated with segmentation is less...... involved in conversion events. CONCLUSION: The evolution of gene families in the rice genome may have been accelerated by conversion with pseudogenes. Our analysis suggests a possible role for gene conversion in the evolution of pathogen-response genes....

  1. QUAST: quality assessment tool for genome assemblies.

    Science.gov (United States)

    Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay; Tesler, Glenn

    2013-04-15

    Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST, a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with and without a reference genome. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website. Availability: http://bioinf.spbau.ru/quast. Supplementary data are available at Bioinformatics online.

  2. Functional genomics tools applied to plant metabolism: a survey on plant respiration, its connections and the annotation of complex gene functions

    Directory of Open Access Journals (Sweden)

    Wagner L. Araújo

    2012-09-01

    Full Text Available The application of post-genomic techniques in plant respiration studies has greatly improved our ability to assign functions to gene products. In addition, it has also revealed previously unappreciated interactions between distal elements of metabolism. Such results have reinforced the need to consider plant respiratory metabolism as part of a complex network, and making sense of such interactions will ultimately require the construction of predictive and mechanistic models. Transcriptomics, proteomics, metabolomics and the quantification of metabolic flux will be of great value in creating such models, both by facilitating the annotation of complex gene function and determining their structure, and by furnishing the quantitative data required to test them. In this review we highlight how these experimental approaches have contributed to our current understanding of plant respiratory metabolism and its interplay with associated processes (e.g. photosynthesis, photorespiration and nitrogen metabolism). We also discuss how data from these techniques may be integrated, with the ultimate aim of identifying mechanisms that control and regulate plant respiration and discovering novel gene functions with potential biotechnological implications.

  3. Genome-Wide Comparative Gene Family Classification

    Science.gov (United States)

    Frech, Christian; Chen, Nansheng

    2010-01-01

    Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221
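
    The comparative strategy, choosing clustering parameters by how well they reproduce curated families in a reference species, can be sketched as below. To keep the example short and self-contained, single-linkage clustering with a similarity threshold stands in for TRIBE-MCL and its inflation parameter; the pairwise F1 score, the function names and the toy similarities are likewise assumptions for illustration only.

```python
from itertools import combinations

def cluster(genes, similarities, threshold):
    """Single-linkage clustering (union-find): genes whose similarity is at
    least `threshold` end up in the same cluster. Returns gene -> cluster id."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for (a, b), score in similarities.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    return {g: find(g) for g in genes}

def co_clustered_pairs(labels):
    """Set of unordered gene pairs that share a cluster (or family) label."""
    return {frozenset(pair) for pair in combinations(sorted(labels), 2)
            if labels[pair[0]] == labels[pair[1]]}

def pairwise_f1(predicted, curated):
    """F1 score over co-clustered gene pairs, predicted versus curated."""
    pred, true = co_clustered_pairs(predicted), co_clustered_pairs(curated)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)

def best_threshold(genes, similarities, curated, candidates):
    """Candidate threshold that best reproduces the curated families."""
    return max(candidates,
               key=lambda t: pairwise_f1(cluster(genes, similarities, t), curated))

# Invented reference species: two curated families and pairwise similarities.
ref_genes = ["a", "b", "c", "d"]
ref_curated = {"a": "fam1", "b": "fam1", "c": "fam2", "d": "fam2"}
ref_sims = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.3}
print(best_threshold(ref_genes, ref_sims, ref_curated, [0.2, 0.5, 0.95]))
```

    The threshold selected on the reference species would then be reused when clustering genes of a related, newly sequenced genome, which is the step the abstract describes as avoiding manual parameterization.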

  4. RatMap—rat genome tools and data

    Science.gov (United States)

    Petersen, Greta; Johnson, Per; Andersson, Lars; Klinga-Levan, Karin; Gómez-Fabre, Pedro M.; Ståhl, Fredrik

    2005-01-01

    The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB–Genetics at Göteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided. PMID:15608244

  5. From Genomics to Gene Therapy: Induced Pluripotent Stem Cells Meet Genome Editing.

    Science.gov (United States)

    Hotta, Akitsu; Yamanaka, Shinya

    2015-01-01

    The advent of induced pluripotent stem (iPS) cells has opened up numerous avenues of opportunity for cell therapy, including the initiation in September 2014 of the first human clinical trial to treat dry age-related macular degeneration. In parallel, advances in genome-editing technologies by site-specific nucleases have dramatically improved our ability to edit endogenous genomic sequences at targeted sites of interest. In fact, clinical trials have already begun to implement this technology to control HIV infection. Genome editing in iPS cells is a powerful tool and enables researchers to investigate the intricacies of the human genome in a dish. In the near future, the groundwork laid by such an approach may expand the possibilities of gene therapy for treating congenital disorders. In this review, we summarize the exciting progress being made in the utilization of genomic editing technologies in pluripotent stem cells and discuss remaining challenges toward gene therapy applications.

  6. Gene hunting: molecular analysis of the chicken genome

    NARCIS (Netherlands)

    Crooijmans, R.P.M.A.

    2000-01-01

    This dissertation describes the development of molecular tools to identify genes that are involved in production and health traits in poultry. To unravel the chicken genome, fluorescent molecular markers (microsatellite markers) were developed and optimized to perform high throughput

  7. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high amount of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and transcriptomes are also being sequenced. To use these vast amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  8. Genephony: a knowledge management tool for genome-wide research

    Directory of Open Access Journals (Sweden)

    Riva Alberto

    2009-09-01

    Full Text Available Abstract Background One of the consequences of the rapid and widespread adoption of high-throughput experimental technologies is an exponential increase of the amount of data produced by genome-wide experiments. Researchers increasingly need to handle very large volumes of heterogeneous data, including both the data generated by their own experiments and the data retrieved from publicly available repositories of genomic knowledge. Integration, exploration, manipulation and interpretation of data and information therefore need to become as automated as possible, since their scale and breadth are, in general, beyond the limits of what individual researchers and the basic data management tools in normal use can handle. This paper describes Genephony, a tool we are developing to address these challenges. Results We describe how Genephony can be used to manage large datasets of genomic information, integrating them with existing knowledge repositories. We illustrate its functionalities with an example of a complex annotation task, in which a set of SNPs coming from a genotyping experiment is annotated with genes known to be associated to a phenotype of interest. We show how, thanks to the modular architecture of Genephony and its user-friendly interface, this task can be performed in a few simple steps. Conclusion Genephony is an online tool for the manipulation of large datasets of genomic information. It can be used as a browser for genomic data, as a high-throughput annotation tool, and as a knowledge discovery tool. It is designed to be easy to use, flexible and extensible. Its knowledge management engine provides fine-grained control over individual data elements, as well as efficient operations on large datasets.

  9. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
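
    The idea of a recurrence time can be sketched in a few lines. The sketch below is not the authors' method: it assumes a recurrence time is simply the gap between successive occurrences of the same k-mer, and the function name and the toy sequence are hypothetical. A histogram of such gaps peaks at the period of any repeat, which is the kind of periodicity signal the abstract refers to.

```python
from collections import Counter


def recurrence_times(seq, k=3):
    """Histogram of gaps between successive occurrences of each k-mer.

    Returns a Counter mapping gap length -> number of times that gap occurred.
    """
    last_seen = {}   # k-mer -> index of its most recent occurrence
    gaps = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps[i - last_seen[word]] += 1
        last_seen[word] = i
    return gaps


# Toy sequence with a period-3 repeat (ATG repeated three times).
hist = recurrence_times("ATGATGATGCCC", k=3)
```

    On the toy sequence every recurrence has gap 3, reflecting the period-3 repeat; a single pass with a dictionary keeps the cost linear in sequence length.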

  10. Simultaneous gene finding in multiple genomes.

    Science.gov (United States)

    König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario

    2016-11-15

    As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or, if not, where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/. Contact: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. Gene disruptions using P transposable elements: an integral component of the Drosophila genome project.

    OpenAIRE

    Spradling, A C; Stern, D M; Kiss, I; Roote, J; Laverty, T; Rubin, G M

    1995-01-01

    Biologists require genetic as well as molecular tools to decipher genomic information and ultimately to understand gene function. The Berkeley Drosophila Genome Project is addressing these needs with a massive gene disruption project that uses individual, genetically engineered P transposable elements to target open reading frames throughout the Drosophila genome. DNA flanking the insertions is sequenced, thereby placing an extensive series of genetic markers on the physical genomic map and a...

  12. Comparative Genomic Analysis of Soybean Flowering Genes

    Science.gov (United States)

    Jung, Chol-Hee; Wong, Chui E.; Singh, Mohan B.; Bhalla, Prem L.

    2012-01-01

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant, Arabidopsis. 
PMID:22679494

  13. Mutated genes as research tool

    International Nuclear Information System (INIS)

    1981-01-01

    mutations, it was pointed out that analogous genetical structures exist in all living organisms, the more closely related, the more similar. This is reflected in strikingly similar biochemical pathways, leading from the primary gene message to the ultimate compound or trait. Induced mutations are a unique tool for analysing these gene-controlled pathways, thus leading also to a better understanding of natural evolution

  14. KAIKObase: An integrated silkworm genome database and data mining tool

    Directory of Open Access Journals (Sweden)

    Nagaraju Javaregowda

    2009-10-01

    Full Text Available Abstract Background The silkworm, Bombyx mori, is one of the most economically important insects in many developing countries owing to its large-scale cultivation for silk production. With the development of genomic and biotechnological tools, B. mori has also become an important bioreactor for production of various recombinant proteins of biomedical interest. In 2004, two genome sequencing projects for B. mori were reported independently by Chinese and Japanese teams; however, the datasets were insufficient for building long genomic scaffolds which are essential for unambiguous annotation of the genome. Now, both the datasets have been merged and assembled through a joint collaboration between the two groups. Description Integration of the two data sets of silkworm whole-genome-shotgun sequencing by the Japanese and Chinese groups together with newly obtained fosmid- and BAC-end sequences produced the best continuity (~3.7 Mb in N50 scaffold size) among the sequenced insect genomes and provided a high degree of nucleotide coverage (88%) of all 28 chromosomes. In addition, a physical map of BAC contigs constructed by fingerprinting BAC clones and a SNP linkage map constructed using BAC-end sequences were available. In parallel, proteomic data from two-dimensional polyacrylamide gel electrophoresis in various tissues and developmental stages were compiled into a silkworm proteome database. Finally, a Bombyx trap database was constructed for documenting insertion positions and expression data of transposon insertion lines. Conclusion For efficient usage of genome information for functional studies, genomic sequences, physical and genetic map information and EST data were compiled into KAIKObase, an integrated silkworm genome database which consists of 4 map viewers, a gene viewer, and sequence, keyword and position search systems to display results and data at the level of nucleotide sequence, gene, scaffold and chromosome. Integration of the

  15. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    Science.gov (United States)

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  16. In-silico human genomics with GeneCards

    Directory of Open Access Journals (Sweden)

    Stelzer Gil

    2011-10-01

    Full Text Available Abstract Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on inhouse and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.

  17. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    ://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence......The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced...... genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including...

  18. Tools for the Validation of Genomes and Transcriptomes with Proteomics data

    DEFF Research Database (Denmark)

    Pang, Chi Nam Ignatius; Aya, Carlos; Tay, Aidan

    data generated from protein mass spectrometry. We are developing a set of tools which allow users to: •Co-visualise genomics, transcriptomics, and proteomics data using the Integrated Genomics Viewer (IGV).1 •Validate the existence of genes and mRNAs using peptides identified from mass spectrometry...

  19. Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss.

    Science.gov (United States)

    Avdeyev, Pavel; Jiang, Shuai; Aganezov, Sergey; Hu, Fei; Alekseyev, Max A

    2016-03-01

    Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice as it was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is presence of breakpoint reuses when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.

  20. Genomic suppression subtractive hybridization as a tool to identify differences in mycorrhizal fungal genomes.

    Science.gov (United States)

    Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola

    2011-05-01

    Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for mycorrhizal fungi tracking and barcoding, to identify functional genes and to investigate the genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  1. DNA Microarrays: a Powerful Genomic Tool for Biomedical and Clinical Research

    OpenAIRE

    Trevino, Victor; Falciani, Francesco; Barrera-Saldaña, Hugo A

    2007-01-01

    Among the many benefits of the Human Genome Project are new and powerful tools such as the genome-wide hybridization devices referred to as microarrays. Initially designed to measure gene transcriptional levels, microarray technologies are now used for comparing other genome features among individuals and their tissues and cells. Results provide valuable information on disease subcategories, disease prognosis, and treatment outcome. Likewise, they reveal differences in genetic makeup, regulat...

  2. Identification of neural outgrowth genes using genome-wide RNAi.

    Directory of Open Access Journals (Sweden)

    Katharine J Sepp

    2008-07-01

    Full Text Available While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively.
Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new

  3. Tandemly Arrayed Genes in Vertebrate Genomes

    Directory of Open Access Journals (Sweden)

    Deng Pan

    2008-01-01

    Full Text Available Tandemly arrayed genes (TAGs) are duplicated genes that are linked as neighbors on a chromosome, many of which have important physiological and biochemical functions. Here we performed a survey of these genes in 11 available vertebrate genomes. TAGs account for an average of about 14% of all genes in these vertebrate genomes, and about 25% of all duplications. The majority of TAGs (72–94%) have parallel transcription orientation (i.e., they are encoded on the same strand) in contrast to the genome, which has about 50% of its genes in parallel transcription orientation. The majority of tandem arrays have only two members. In all species, the proportion of genes that belong to TAGs tends to be higher in large gene families than in small ones; together with our recent finding that tandem duplication played a more important role than retroposition in large families, this fact suggests that among all types of duplication mechanisms, tandem duplication is the predominant mechanism of duplication, especially in large families. Finally, several species have a higher proportion of large tandem arrays that are species-specific than random expectation.
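
    The two quantities discussed in this record, tandem arrays and their transcription orientation, can be sketched as follows. This is an illustrative simplification, not the survey's pipeline: it assumes a tandem array is a run of strictly adjacent genes from the same family (no intervening spacer genes allowed), and the function names, tuple layout, and toy gene list are hypothetical.

```python
from itertools import groupby


def tandem_arrays(genes):
    """Find runs of two or more adjacent genes that share a family.

    genes: list of (name, family, strand) in chromosomal order.
    """
    arrays = []
    for _, run in groupby(genes, key=lambda g: g[1]):
        run = list(run)
        if len(run) >= 2:
            arrays.append(run)
    return arrays


def parallel_fraction(arrays):
    """Fraction of neighboring gene pairs within arrays on the same strand."""
    pairs = [(a[2], b[2]) for arr in arrays for a, b in zip(arr, arr[1:])]
    if not pairs:
        return 0.0
    return sum(s1 == s2 for s1, s2 in pairs) / len(pairs)


# Toy chromosome; names and families are made up for illustration.
toy_genes = [
    ("g1", "famA", "+"), ("g2", "famA", "+"),
    ("g3", "famB", "-"),
    ("g4", "famC", "+"), ("g5", "famC", "-"), ("g6", "famC", "-"),
]
toy_arrays = tandem_arrays(toy_genes)
toy_parallel = parallel_fraction(toy_arrays)
```

    In the toy data there are two arrays (sizes 2 and 3) and two of the three neighboring pairs share a strand, giving a parallel fraction of 2/3.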

  4. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

    Science.gov (United States)

    Holt, Carson; Yandell, Mark

    2011-12-22

    Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  5. Molecular Tools for Exploring Polyploid Genomes in Plants

    Directory of Open Access Journals (Sweden)

    Domenico Carputo

    2012-08-01

    Full Text Available Polyploidy is a very common phenomenon in the plant kingdom, where even diploid species are often described as paleopolyploids. The polyploid condition may bring about several advantages compared to the diploid state. Polyploids often show phenotypes that are not present in their diploid progenitors or exceed the range of the contributing species. Some of these traits may play a role in heterosis or could favor adaptation to new ecological niches. Advances in genomics and sequencing technology may create unprecedented opportunities for discovering and monitoring the molecular effects of polyploidization. Through this review, we provide an overview of technologies and strategies that may allow an in-depth analysis of polyploid genomes. After introducing some basic aspects on the origin and genetics of polyploids, we highlight the main tools available for genome and gene expression analysis and summarize major findings. In the last part of this review, the implications of next generation sequencing are briefly discussed. The accumulation of knowledge on polyploid formation, maintenance, and divergence at whole-genome and subgenome levels will not only help plant biologists to understand how plants have evolved and diversified, but also assist plant breeders in designing new strategies for crop improvement.

  6. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. 
Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  7. On Computing Breakpoint Distances for Genomes with Duplicate Genes.

    Science.gov (United States)

    Shao, Mingfu; Moret, Bernard M E

    2017-06-01

    A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of its higher level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although we provided last year a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignment, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the "any matching" formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.
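
    For the easy case mentioned at the start of this record (two genomes without duplicate genes), the breakpoint distance can be sketched directly from its definition as the number of nonconserved adjacencies. The sketch assumes single linear chromosomes over the same gene set, signed integers for gene orientation, and no special treatment of chromosome ends; the function names and toy genomes are hypothetical, and this is not the integer-linear-program approach the article describes for duplicate genes.

```python
def adjacencies(genome):
    """Set of adjacencies of a signed gene order, in both reading directions.

    The adjacency (a, b) read on the opposite strand is (-b, -a).
    """
    adj = set()
    for a, b in zip(genome, genome[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not conserved in g2."""
    adj2 = adjacencies(g2)
    return sum((a, b) not in adj2 for a, b in zip(g1, g1[1:]))


# Toy genomes: g2 is g1 with the segment (2, 3) inverted.
g1 = [1, 2, 3, 4, 5]
g2 = [1, -3, -2, 4, 5]
dist = breakpoint_distance(g1, g2)
```

    The inversion breaks the adjacencies (1, 2) and (3, 4) while (2, 3) and (4, 5) survive, so the distance here is 2.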

  8. Gene therapy and genome surgery in the retina.

    Science.gov (United States)

    DiCarlo, James E; Mahajan, Vinit B; Tsang, Stephen H

    2018-06-01

    Precision medicine seeks to treat disease with molecular specificity. Advances in genome sequence analysis, gene delivery, and genome surgery have allowed clinician-scientists to treat genetic conditions at the level of their pathology. As a result, progress in treating retinal disease using genetic tools has advanced tremendously over the past several decades. Breakthroughs in gene delivery vectors, both viral and nonviral, have allowed the delivery of genetic payloads in preclinical models of retinal disorders and have paved the way for numerous successful clinical trials. Moreover, the adaptation of CRISPR-Cas systems for genome engineering has enabled the correction of both recessive and dominant pathogenic alleles, expanding the disease-modifying power of gene therapies. Here, we highlight the translational progress of gene therapy and genome editing of several retinal disorders, including RPE65-, CEP290-, and GUCY2D-associated Leber congenital amaurosis, as well as choroideremia, achromatopsia, Mer tyrosine kinase- (MERTK-) and RPGR X-linked retinitis pigmentosa, Usher syndrome, neovascular age-related macular degeneration, X-linked retinoschisis, Stargardt disease, and Leber hereditary optic neuropathy.

  9. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Full Text Available Abstract Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  10. Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome

    Science.gov (United States)

    Kim, Woonsu; Park, Hyesun; Seo, Seongwon

    2016-01-01

    The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline for amalgamating an organism's genomic annotations from multiple sources was updated. Using this, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxifications. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle. PMID

  11. RGmatch: matching genomic regions to proximal genes in omics data integration

    Directory of Open Access Journals (Sweden)

    Pedro Furió-Tarí

    2016-11-01

    Full Text Available Abstract Background The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
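
The idea of associating a region with its closest gene and labelling where the region falls can be illustrated with a minimal Python sketch. This is not RGmatch's implementation or rule set: the `Gene` class, the three location labels, the midpoint rule, and the example coordinates are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def distance(region_start: int, region_end: int, gene: Gene) -> int:
    """Gap in bp between a region and a gene; 0 if they overlap."""
    if region_end < gene.start:
        return gene.start - region_end
    if region_start > gene.end:
        return region_start - gene.end
    return 0


def location(region_start: int, region_end: int, gene: Gene) -> str:
    """Label the region midpoint relative to the gene (assumed simple rule)."""
    mid = (region_start + region_end) // 2
    if gene.start <= mid <= gene.end:
        return "GENE_BODY"
    # Upstream depends on strand: before the start on "+", after the end on "-".
    upstream = mid < gene.start if gene.strand == "+" else mid > gene.end
    return "UPSTREAM" if upstream else "DOWNSTREAM"


def closest_gene(region_start: int, region_end: int, genes: list[Gene]):
    """Return (gene name, distance, location label) for the nearest gene."""
    best = min(genes, key=lambda g: distance(region_start, region_end, g))
    return (
        best.name,
        distance(region_start, region_end, best),
        location(region_start, region_end, best),
    )


# Made-up annotation for a single chromosome.
genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 5000, 6000, "-")]
print(closest_gene(2100, 2200, genes))
print(closest_gene(6100, 6300, genes))
```

A real tool would work per chromosome, distinguish transcripts and exons, and use finer labels (for example promoter or intron); the sketch only shows why strand matters when deciding whether a region is upstream or downstream.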

  12. Comparative genomics and transcriptomics of trait-gene association

    Directory of Open Access Journals (Sweden)

    Pierlé Sebastián

    2012-11-01

    Full Text Available Abstract Background The Order Rickettsiales includes important tick-borne pathogens, from Rickettsia rickettsii, which causes Rocky Mountain spotted fever, to Anaplasma marginale, the most prevalent vector-borne pathogen of cattle. Although most pathogens in this Order are transmitted by arthropod vectors, little is known about the microbial determinants of transmission. A. marginale provides unique tools for studying the determinants of transmission, with multiple strain sequences available that display distinct and reproducible transmission phenotypes. The closed core A. marginale genome suggests that any phenotypic differences are due to single nucleotide polymorphisms (SNPs). We combined DNA/RNA comparative genomic approaches using strains with different tick transmission phenotypes and identified genes that segregate with transmissibility. Results Comparison of seven strains with different transmission phenotypes generated a list of SNPs affecting 18 genes and nine promoters. Transcriptional analysis found two candidate genes downstream from promoter SNPs that were differentially transcribed. To corroborate the comparative genomics approach we used three RNA-seq platforms to analyze the transcriptomes from two A. marginale strains with different transmission phenotypes. RNA-seq analysis confirmed the comparative genomics data and found 10 additional genes whose transcription between strains with distinct transmission efficiencies was significantly different. Six regions of the genome that contained no annotation were found to be transcriptionally active, and two of these newly identified transcripts were differentially transcribed. Conclusions This approach identified 30 genes and two novel transcripts potentially involved in tick transmission. We describe the transcriptome of an obligate intracellular bacterium in depth, while employing massive parallel sequencing to dissect an important trait in bacterial pathogenesis.

  13. Enhanced genome editing tools for multi-gene deletion knock-out approaches using paired CRISPR sgRNAs in CHO cells

    DEFF Research Database (Denmark)

    Schmieder, Valerie; Bydlinski, Nina; Strasser, Richard

    2017-01-01

    (sgRNAs) for full gene deletions. This strategy also enables the targeting of regulatory regions, which would not respond to the conventional frameshift mutations, as shown by deleting the α-1,6-Fucosyltransferase 8 (FUT8) promoter resulting in a functional knock-out. Fut8 also served as model...

  14. Whole genome DNA methylation: beyond genes silencing

    OpenAIRE

    Tirado-Magallanes, Roberto; Rebbani, Khadija; Lim, Ricky; Pradhan, Sriharsa; Benoukraf, Touati

    2016-01-01

    The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation at near base pair level resolution, far beyond that of the kilobase-long canonical CpG islands that initially revealed the biological relevance of this covalent DNA modification. The latest high-resolution studies have revealed a role for very punctual DNA methylation in chromatin plasticity, gene regulation and splicing. Here, we aim to outline the ...

  15. Gene Composer in a structural genomics environment

    International Nuclear Information System (INIS)

    Lorimer, Don; Raymond, Amy; Mixon, Mark; Burgin, Alex; Staker, Bart; Stewart, Lance

    2011-01-01

    For structural biology applications, protein-construct engineering is guided by comparative sequence analysis and structural information, which allow the researcher to better define domain boundaries for terminal deletions and nonconserved regions for surface mutants. A database software application called Gene Composer has been developed to facilitate construct design. The structural genomics effort at the Seattle Structural Genomics Center for Infectious Disease (SSGCID) requires the manipulation of large numbers of amino-acid sequences and the underlying DNA sequences which are to be cloned into expression vectors. To improve efficiency in high-throughput protein structure determination, a database software package, Gene Composer, has been developed which facilitates the information-rich design of protein constructs and their underlying gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. An example of the structure determination of H1N1 RNA-dependent RNA polymerase PB2 subunit is given

  16. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...

  17. Computational Tools and Algorithms for Designing Customized Synthetic Genes

    Directory of Open Access Journals (Sweden)

    Nathan Gould

    2014-10-01

    Full Text Available Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de-novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations.
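
As a rough illustration of the single-objective, first-generation approach mentioned above (codon usage optimization), the following Python sketch replaces each codon with the most frequent synonymous codon from a usage table. It is not taken from any of the reviewed tools. The `USAGE` table covers only five amino-acid classes and its frequencies are made-up placeholder values, not measured data; `optimize` and `translate` are hypothetical helper names.

```python
# Illustrative codon-usage table: amino acid -> {codon: relative frequency}.
# The frequencies are placeholders for demonstration, not measured values.
USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "E": {"GAA": 0.68, "GAG": 0.32},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}

# Reverse lookup: codon -> amino acid (only for codons in the table above).
CODON_TO_AA = {codon: aa for aa, codons in USAGE.items() for codon in codons}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into codons, rejecting partial codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate using the partial table; unknown codons raise KeyError."""
    return "".join(CODON_TO_AA[c] for c in split_codons(cds))


def optimize(cds: str) -> str:
    """Swap every codon for the most frequent synonymous codon."""
    best = {aa: max(codons, key=codons.get) for aa, codons in USAGE.items()}
    return "".join(best[CODON_TO_AA[c]] for c in split_codons(cds))


original = "ATGAAGTTAGAGTAG"
redesigned = optimize(original)
print(redesigned)
# The encoded protein is unchanged by the synonymous substitutions.
assert translate(original) == translate(redesigned)
```

A greedy per-codon choice like this handles one objective only. Combining it with other objectives, such as adding or avoiding restriction sites, makes the codon choices interdependent, which is one reason multi-objective tools turn to heuristics.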

  18. Computational Tools and Algorithms for Designing Customized Synthetic Genes

    Energy Technology Data Exchange (ETDEWEB)

    Gould, Nathan [Department of Computer Science, The College of New Jersey, Ewing, NJ (United States); Hendy, Oliver [Department of Biology, The College of New Jersey, Ewing, NJ (United States); Papamichail, Dimitris, E-mail: papamicd@tcnj.edu [Department of Computer Science, The College of New Jersey, Ewing, NJ (United States)

    2014-10-06

    Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations.

  19. Computational Tools and Algorithms for Designing Customized Synthetic Genes

    International Nuclear Information System (INIS)

    Gould, Nathan; Hendy, Oliver; Papamichail, Dimitris

    2014-01-01

    Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations.

  20. Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    Abstract Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
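
The notion of an extended annotation built by combining individual ones can be sketched in a few lines of Python. This is only an assumed simplification, not BEACON's actual procedure: it presumes that genes from different annotation methods have already been matched to shared identifiers, and the `UNINFORMATIVE` labels, function names, and example data are hypothetical.

```python
# Labels treated as "no function assigned" (an assumption for this sketch).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}


def is_informative(function: str) -> bool:
    """True if the annotation string names a putative function."""
    return function.strip().lower() not in UNINFORMATIVE


def extend_annotation(annotations: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Combine per-method annotations into one extended annotation.

    `annotations` maps method name -> {gene id -> function string}.
    The result maps each gene id to the distinct informative functions
    proposed by any method (empty list if none).
    """
    gene_ids = sorted(set().union(*(calls.keys() for calls in annotations.values())))
    extended = {}
    for gene_id in gene_ids:
        functions = []
        for calls in annotations.values():
            function = calls.get(gene_id, "")
            if is_informative(function) and function not in functions:
                functions.append(function)
        extended[gene_id] = functions
    return extended


# Made-up annotations from two hypothetical methods.
annotations = {
    "AM1": {"g1": "DNA polymerase III", "g2": "hypothetical protein"},
    "AM2": {"g1": "DNA polymerase III", "g2": "ABC transporter permease",
            "g3": "hypothetical protein"},
}
extended = extend_annotation(annotations)
print(extended)
```

In this toy input, gene `g2` gains a putative function from the second method while `g3` remains unassigned, which mirrors the kind of gain the abstract reports in aggregate.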

  1. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  2. Viral Cre-LoxP tools aid genome engineering in mammalian cells.

    Science.gov (United States)

    Sengupta, Ranjita; Mendenhall, Amy; Sarkar, Nandita; Mukherjee, Chandreyee; Afshari, Amirali; Huang, Joseph; Lu, Biao

    2017-01-01

    Targeted nucleases have transformed genome editing technology, providing more efficient methods to make targeted changes in mammalian genomes. In parallel, there is an increasing demand for Cre-LoxP technology for complex genome manipulation such as large deletion, addition, gene fusion and conditional removal of gene sequences at the target site. However, an efficient and easy-to-use Cre-recombinase delivery system remains lacking. We designed and constructed two sets of expression vectors for Cre-recombinase using two highly efficient viral systems, the integrative lentivirus and non-integrative adeno-associated virus. We demonstrate the effectiveness of those methods in Cre-delivery into stably-engineered HEK293 cells harboring LoxP-floxed red fluorescent protein (RFP) and puromycin (Puro) resistant reporters. The delivered Cre recombinase effectively excised the floxed RFP-Puro either directly or conditionally, therefore validating the function of these molecular tools. Given the convenient options of two selection markers, these viral-based systems offer a robust and easy-to-use tool for advanced genome editing, expanding complicated genome engineering to a variety of cell types and conditions. We have developed and functionally validated two viral-based Cre-recombinase delivery systems for efficient genome manipulation in various mammalian cells. The ease of gene delivery with the built-in reporters and inducible element enables live cell monitoring, drug selection and temporal knockout, broadening applications of genome editing.

  3. Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.

    Science.gov (United States)

    Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan

    2017-10-01

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 ( www.comparativego.com ). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.

  4. CRISPR-Cas9: tool for qualitative and quantitative plant genome editing

    Directory of Open Access Journals (Sweden)

    Ali Noman

    2016-11-01

    Full Text Available Advances in genome editing have made many previously unachievable ideas practical. Increased adoption of genome editing has been driven by the swiftly developing CRISPR-Cas9 technology. This technique is emerging as a driving force for innovation in diverse branches of plant biology. CRISPR-mediated genome editing is being used for rapid, easy and efficient alteration of indigenous genes among diverse plant species. With the conceptual work on CRISPR/Cas9 approaching completion, plant scientists are applying this genome editing tool to enhance crop attributes. The capability of CRISPR-Cas9 systems for performing targeted and efficient modifications in genome sequence as well as gene expression will certainly spur novel developments not only in model plants but also in crop plants. Additionally, because no foreign DNA is involved, this technique may help alleviate regulatory issues associated with GM plants. We expect that prevailing challenges in plant science, such as genomic region manipulation and crop-specific vectors, will be addressed along with sustained growth of this genome editing tool. In this review, recent progress of CRISPR/Cas9 technology in plants is summarized and discussed. We review the potential of CRISPR/Cas9 for different aspects of plant life. The review also covers the strengths of this technique in comparison with other genome editing techniques, e.g. ZFNs and TALENs, and describes potential challenges in coming decades.

  5. New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome.

    Science.gov (United States)

    Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton

    2015-01-01

    Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of human genome. Deep sequencing technologies provide clues to understanding of functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LRs database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERV/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.

  6. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
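
The pan-genome, core-genome, and group-specific gene sets described above reduce to set operations once genes have been clustered into families. The Python sketch below assumes that clustering has already been done and that each isolate is represented by a set of shared gene-family identifiers; it does not reproduce PanCoreGen's own profiling, and the isolate and family names are invented.

```python
def pan_and_core(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (pan-genome, core genome) for isolate -> gene-family sets.

    Pan-genome: families present in at least one isolate (union).
    Core genome: families present in every isolate (intersection).
    """
    gene_sets = [set(families) for families in genomes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


def group_specific(genomes: dict[str, set[str]], group: list[str]) -> set[str]:
    """Families shared by every isolate in `group` and absent elsewhere."""
    if not group:
        raise ValueError("group must contain at least one isolate")
    in_group = set.intersection(*(set(genomes[name]) for name in group))
    outside = set().union(
        *(set(families) for name, families in genomes.items() if name not in group)
    )
    return in_group - outside


# Invented example: three isolates described by gene-family identifiers.
genomes = {
    "isolate_A": {"fam1", "fam2", "fam3"},
    "isolate_B": {"fam1", "fam2", "fam4"},
    "isolate_C": {"fam1", "fam5"},
}
pan, core = pan_and_core(genomes)
print(sorted(pan), sorted(core))
print(sorted(group_specific(genomes, ["isolate_A", "isolate_B"])))
```

In practice the hard part is deciding which genes belong to the same family, especially in draft genomes; the set arithmetic itself is the simple final step.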

  7. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591

  8. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    OpenAIRE

    Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V

    2007-01-01

    Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...

  9. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species, from primitive single cell algae to angiosperms, with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. The present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches and provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in the reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will aid as an important resource for the soybean research

  10. The zebrafish genome: a review and msx gene case study.

    Science.gov (United States)

    Postlethwait, J H

    2006-01-01

    Zebrafish is one of several important teleost models for understanding principles of vertebrate developmental, molecular, organismal, genetic, evolutionary, and genomic biology. Efficient investigation of the molecular genetic basis of induced mutations depends on knowledge of the zebrafish genome. Principles of zebrafish genomic analysis, including gene mapping, ortholog identification, conservation of syntenies, genome duplication, and evolution of duplicate gene function are discussed here using as a case study the zebrafish msxa, msxb, msxc, msxd, and msxe genes, which together constitute zebrafish orthologs of tetrapod Msx1, Msx2, and Msx3. Genomic analysis suggests orthologs for this difficult to understand group of paralogs.

  11. Genomic analysis of primordial dwarfism reveals novel disease genes.

    Science.gov (United States)

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  12. Comparative genomics on Norrie disease gene.

    Science.gov (United States)

    Katoh, Masuko; Katoh, Masaru

    2005-05-01

    DAND1 (NBL1), DAND2 (CKTSF1B1 or GREM1 or GREMLIN), DAND3 (CKTSF1B2 or GREM2 or PRDC), DAND4 (CER1), DAND5 (CKTSF1B3 or GREM3 or DANTE), MUC2, MUC5AC, MUC5B, MUC6, MUC19, WISP1, WISP2, WISP3, VWF, NOV and Norrie disease (NDP or NORRIN) genes encode proteins with a cysteine knot domain. Cysteine-knot superfamily proteins regulate ligand-receptor interactions for a variety of signaling pathways implicated in embryogenesis, homeostasis, and carcinogenesis. Although Ndp is unrelated to Wnt family members, Ndp is claimed to function as a ligand for Fzd4. Here, we identified and characterized rat Ndp, cow Ndp, chicken ndp and zebrafish ndp genes by using bioinformatics. The rat Ndp gene, consisting of three exons, was located within the AC105563.4 genome sequence. Cow Ndp and chicken ndp complete CDS were derived from the CB467544.1 EST and BX932859.2 cDNA, respectively. The zebrafish ndp gene was located within the BX572627.5 genome sequence. Rat Ndp (131 aa) was a secreted protein with a C-terminal cysteine knot-like (CTCK) domain. Rat Ndp showed 100%, 96.9%, 95.4%, 87.8% and 66.4% total-amino-acid identity with mouse Ndp, cow Ndp, human NDP, chicken ndp and zebrafish ndp, respectively. The exon-intron structure of mammalian Ndp orthologs was well conserved. FOXA2, CUTL1 (CCAAT displacement protein), LMO2, CEBPA (C/EBPalpha)-binding sites and triple POU2F1 (OCT1)-binding sites were conserved among promoters of mammalian Ndp orthologs.
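
    As a rough illustration of how a total-amino-acid identity figure like those above can be obtained, the sketch below counts identical residues in a pairwise alignment and divides by the ungapped reference length. The function name, the toy sequences and the choice of denominator are assumptions made for illustration; the abstract does not say how the authors computed their values. The final lines only check that the reported percentages are arithmetically consistent with whole numbers of identical residues out of 131, which is an inference rather than a reported result.

```python
def percent_identity(ref_aln: str, other_aln: str) -> float:
    """Percent of reference residues that are identical in the aligned partner.

    Both strings must come from the same pairwise alignment (equal length,
    '-' marks a gap). The denominator is the ungapped reference length,
    which is an assumption; other conventions divide by alignment length.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(ref_aln, other_aln) if a == b and a != "-")
    ref_len = sum(1 for a in ref_aln if a != "-")
    if ref_len == 0:
        raise ValueError("reference sequence is empty")
    return 100.0 * matches / ref_len


# Toy alignment (hypothetical sequences): 4 identical residues out of 5.
toy = percent_identity("MKT-AV", "MKTQAL")

# Consistency check for a 131-residue protein: these residue counts are
# inferred from the percentages, not taken from the paper.
implied = {n: round(100.0 * n / 131, 1) for n in (131, 127, 125, 115, 87)}
```

    Under the 131-residue assumption, 127, 125, 115 and 87 identical residues round to 96.9%, 95.4%, 87.8% and 66.4% respectively.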

  13. G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes

    Directory of Open Access Journals (Sweden)

    Lemay Danielle G

    2012-09-01

    Full Text Available Abstract Background In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST), which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously. Results Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods. Conclusions Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains. The
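
    The idea of scoring candidate neighborhoods at every window size, rather than fixing one size in advance, can be sketched as follows. This is not G-NEST's published scoring function: the score used here (mean pairwise Pearson correlation of expression profiles) is a stand-in chosen for brevity, it ignores the conservation data that G-NEST also uses, and all names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def window_scores(expr, min_size=2, max_size=None):
    """Score every run of adjacent genes, for every window size.

    expr: expression profiles, one list per gene, ordered by genomic position.
    Returns {(start_index, size): mean pairwise correlation within the window}.
    """
    n = len(expr)
    max_size = n if max_size is None else min(max_size, n)
    scores = {}
    for size in range(min_size, max_size + 1):
        for start in range(0, n - size + 1):
            pairs = combinations(range(start, start + size), 2)
            cors = [pearson(expr[i], expr[j]) for i, j in pairs]
            scores[(start, size)] = sum(cors) / len(cors)
    return scores


# Toy data: four adjacent genes measured in three tissues (made-up values).
expr = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [1, 3, 2]]
scores = window_scores(expr)
best = max(scores, key=scores.get)  # highest-scoring (start, size) window
```

    Enumerating all windows is quadratic in the number of genes per chromosome, which is acceptable for a sketch; a real tool would also need a null model to judge whether a score is unusually high.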

  14. Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances.

    Science.gov (United States)

    Xu, Jianping

    2006-06-01

    Microbial ecology examines the diversity and activity of micro-organisms in Earth's biosphere. In the last 20 years, the application of genomics tools has revolutionized microbial ecological studies and drastically expanded our view on the previously underappreciated microbial world. This review first introduces the basic concepts in microbial ecology and the main genomics methods that have been used to examine natural microbial populations and communities. In the ensuing three specific sections, the applications of genomics in microbial ecological research are highlighted. The first describes the widespread application of multilocus sequence typing and representational difference analysis in studying genetic variation within microbial species. Such investigations have identified that migration, horizontal gene transfer and recombination are common in natural microbial populations and that microbial strains can be highly variable in genome size and gene content. The second section highlights and summarizes the use of four specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA-DNA re-association kinetics, metagenomics, and micro-arrays) in analysing the diversity and potential activity of microbial populations and communities from a variety of terrestrial and aquatic environments. Such analyses have identified many unexpected phylogenetic lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of environmental DNA also revealed highly prevalent, but previously unknown, metabolic processes in natural microbial communities. In the third section, the ecological implications of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic genomic sequences suggest the importance of ecology in determining microbial genome size and gene content. The significant variability in genome size and gene content among strains and species of prokaryotes indicates the highly fluid nature of prokaryotic

  15. Genomic Islands: an overview of current software tools and future improvements

    Directory of Open Access Journals (Sweden)

    Soares Siomar de Castro

    2016-03-01

    Full Text Available Microbes are highly diverse and widely distributed organisms. They account for ~60% of Earth’s biomass and new predictions point to the existence of 10^11 to 10^12 species, which are constantly sharing genes through several different mechanisms. Genomic Islands (GIs) are critical in this context, as they are large regions acquired through horizontal gene transfer. Also, they present common features like genomic signature deviation, transposase genes, flanking tRNAs and insertion sequences. GIs carry large numbers of genes related to a specific lifestyle and are commonly classified as Pathogenicity, Resistance, Metabolic or Symbiotic Islands. With the advent of next-generation sequencing technologies and the deluge of genomic data, many software tools have been developed that aim to tackle the problem of GI prediction, and they are all based on the prediction of common GI features. However, there is still room for the development of new software tools that implement new approaches, such as machine learning and pangenomics-based analyses. Finally, GIs will always hold a potential application in every newly invented genomic approach, as they are directly responsible for much of the genomic plasticity of bacteria.

  16. Genomic Islands: an overview of current software tools and future improvements.

    Science.gov (United States)

    Soares, Siomar de Castro; Oliveira, Letícia de Castro; Jaiswal, Arun Kumar; Azevedo, Vasco

    2016-03-01

    Microbes are highly diverse and widely distributed organisms. They account for ~60% of Earth's biomass and new predictions point to the existence of 10^11 to 10^12 species, which are constantly sharing genes through several different mechanisms. Genomic Islands (GIs) are critical in this context, as they are large regions acquired through horizontal gene transfer. Also, they present common features like genomic signature deviation, transposase genes, flanking tRNAs and insertion sequences. GIs carry large numbers of genes related to a specific lifestyle and are commonly classified as Pathogenicity, Resistance, Metabolic or Symbiotic Islands. With the advent of next-generation sequencing technologies and the deluge of genomic data, many software tools have been developed that aim to tackle the problem of GI prediction, and they are all based on the prediction of common GI features. However, there is still room for the development of new software tools that implement new approaches, such as machine learning and pangenomics-based analyses. Finally, GIs will always hold a potential application in every newly invented genomic approach, as they are directly responsible for much of the genomic plasticity of bacteria.
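
    One of the GI features named above, genomic signature deviation, is often approximated in its simplest form by comparing the GC content of a sliding window with that of the whole genome. The sketch below illustrates only that single feature. The window size, step and threshold are arbitrary assumptions, the function names are hypothetical, and it does not reproduce any of the tools surveyed in the review, which combine several features.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_deviant_windows(genome: str, window=1000, step=500, threshold=0.1):
    """Return (start, end, deviation) for windows whose GC content differs
    from the genome-wide GC content by at least `threshold`."""
    genome_gc = gc_fraction(genome)
    hits = []
    last_start = max(len(genome) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = genome[start:start + window]
        deviation = gc_fraction(chunk) - genome_gc
        if abs(deviation) >= threshold:
            hits.append((start, start + len(chunk), round(deviation, 3)))
    return hits


# Toy genome (made up): an AT-rich backbone with a GC-rich 200 bp insert.
toy_genome = "AT" * 500 + "GC" * 100 + "AT" * 500
hits = gc_deviant_windows(toy_genome, window=200, step=200, threshold=0.5)
```

    On the toy genome only the inserted block is flagged. Real genomes need additional evidence such as flanking tRNAs or transposase genes, since GC content alone produces many false positives.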

  17. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  18. GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Stærfeldt, Hans Henrik; Rotenberg, Eva

    2009-01-01

    , standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. [Supplemental material including interactive...... atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/]....... readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot, allowing for example visualization of gene expression and regulation data. Further...

  19. Coral Reef Genomics: Developing tools for functional genomics of coral symbiosis

    Energy Technology Data Exchange (ETDEWEB)

    Schwarz, Jodi; Brokstein, Peter; Manohar, Chitra; Coffroth, MaryAlice; Szmant, Alina; Medina, Monica

    2005-03-01

    Symbioses between cnidarians and dinoflagellates in the genus Symbiodinium are widespread in the marine environment. The importance of this symbiosis to reef-building corals and reef nutrient and carbon cycles is well documented, but little is known about the mechanisms by which the partners establish and regulate the symbiosis. Because the dinoflagellate symbionts live inside the cells of their host coral, the interactions between the partners occur on cellular and molecular levels, as each partner alters the expression of genes and proteins to facilitate the partnership. These interactions can be examined using high-throughput techniques that allow thousands of genes to be examined simultaneously. We are developing the groundwork so that we can use DNA microarray profiling to identify genes involved in the Montastraea faveolata and Acropora palmata symbioses. Here we report results from the initial steps in this microarray initiative, that is, the construction of cDNA libraries from 4 of 16 target stages, sequencing of 3450 cDNA clones to generate Expressed Sequence Tags (ESTs), and annotation of the ESTs to identify candidate genes to include in the microarrays. An understanding of how the coral-dinoflagellate symbiosis is regulated will have implications for atmospheric and ocean sciences, conservation biology, the study and diagnosis of coral bleaching and disease, and comparative studies of animal-protist interactions.

  20. Widespread of horizontal gene transfer in the human genome.

    Science.gov (United States)

    Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun

    2017-04-04

    A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is horizontal gene transfer (HGT), which involves movement of genetic material between different species. Horizontal gene transfer has been found to be prevalent in prokaryotes but very rare in eukaryotes. In this paper, we investigate horizontal gene transfer in the human genome. From the pair-wise alignments between the human genome and 53 vertebrate genomes, 1,467 human genome regions (2.6 M bases) from all chromosomes were found to be more conserved with non-mammals than with most mammals. These human genome regions involve 642 known genes, which are enriched with ion binding. Compared to known horizontal gene transfer regions in the human genome, there were few overlapping regions, which indicated that horizontal gene transfer is more common than we expected in the human genome. Horizontal gene transfer impacts hundreds of human genes, and this study provided insight into potential mechanisms of HGT in the human genome.

  1. BRED: a simple and powerful tool for constructing mutant and recombinant bacteriophage genomes.

    Directory of Open Access Journals (Sweden)

    Laura J Marinelli

    Full Text Available Advances in DNA sequencing technology have facilitated the determination of hundreds of complete genome sequences both for bacteria and their bacteriophages. Some of these bacteria have well-developed and facile genetic systems for constructing mutants to determine gene function, and recombineering is a particularly effective tool. However, generally applicable methods for constructing defined mutants of bacteriophages are poorly developed, in part because of the inability to use selectable markers such as drug resistance genes during viral lytic growth. Here we describe a method for simple and effective directed mutagenesis of bacteriophage genomes using Bacteriophage Recombineering of Electroporated DNA (BRED), in which a highly efficient recombineering system is utilized directly on electroporated phage DNA; no selection is required and mutants can be readily detected by PCR. We describe the use of BRED to construct unmarked gene deletions, in-frame internal deletions, base substitutions, precise gene replacements, and the addition of gene tags.

  2. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    Directory of Open Access Journals (Sweden)

    Brahmbhatt Sonal

    2008-11-01

    Full Text Available Abstract Background Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most widely studied groups of fish. Results 298,304 expressed sequence tags (ESTs) from Atlantic salmon (69% of the total), 11,664 chinook, 10,813 sockeye, 10,051 brook trout, 10,975 grayling, 8,630 lake whitefish, and 3,624 northern pike ESTs were obtained in this study and have been deposited into the public databases. Contigs were built and putative full-length Atlantic salmon clones have been identified. A database containing ESTs, assemblies, consensus sequences, open reading frames, gene predictions and putative annotation is available. The overall similarity between Atlantic salmon ESTs and those of rainbow trout, chinook, sockeye, brook trout, grayling, lake whitefish, northern pike and rainbow smelt is 93.4, 94.2, 94.6, 94.4, 92.5, 91.7, 89.6, and 86.2% respectively. An analysis of 78 transcript sets shows Salmo as a sister group to Oncorhynchus and Salvelinus within Salmoninae, and Thymallinae as a sister group to Salmoninae and Coregoninae within Salmonidae. Extensive gene duplication is consistent with a genome duplication in the common ancestor of salmonids. Using all of the available EST data, a new expanded salmonid cDNA microarray of 32,000 features was created. Cross-species hybridizations to this cDNA microarray indicate that this resource will be useful for studies of all 68 salmonid species. Conclusion An extensive collection and analysis of salmonid RNA putative transcripts indicate that Pacific salmon, Atlantic salmon and charr are 94–96% similar while the more distant whitefish, grayling, pike and smelt are 93, 92, 89 and 86% similar to salmon. The salmonid transcriptome reveals a complex history of gene duplication that is

  3. The genomic view of genes responsive to the antagonistic phytohormones, abscisic acid, and gibberellin.

    Science.gov (United States)

    Yazaki, Junshi; Kikuchi, Shoshi

    2005-01-01

    Various genomics tools are now available for a monocot (Oryza sativa) and a dicot (Arabidopsis thaliana) plant. Plants are not only a very important agricultural resource but also model organisms for biological research. It is important to use genomics tools to investigate how the interaction between ABA and GA controls the transition from embryogenesis to germination in seeds. Such studies have investigated the relationship between dormancy and germination. Genomics tools identified genes that had never before been annotated as ABA- or GA-responsive genes in plants, detected new interactions between genes responsive to the two hormones, comprehensively characterized cis-elements of hormone-responsive genes, and characterized cis-elements of rice and Arabidopsis. In this research, ABA- and GA-regulated genes have been classified as functional proteins (proteins that probably function in stress or PR tolerance) and regulatory proteins (protein factors involved in further regulation of signal transduction). Comparison between ABA- and/or GA-responsive genes in rice and those in Arabidopsis has shown that the cis-element has specificity in each species. cis-Elements for the dehydration-stress response have been specified in Arabidopsis but not in rice. cis-Elements for protein storage are remarkably richer in the upstream regions of the rice genes than in those of Arabidopsis.

  4. Construction of the BAC Library of Small Abalone (Haliotis diversicolor) for Gene Screening and Genome Characterization.

    Science.gov (United States)

    Jiang, Likun; You, Weiwei; Zhang, Xiaojun; Xu, Jian; Jiang, Yanliang; Wang, Kai; Zhao, Zixia; Chen, Baohua; Zhao, Yunfeng; Mahboob, Shahid; Al-Ghanim, Khalid A; Ke, Caihuan; Xu, Peng

    2016-02-01

    The small abalone (Haliotis diversicolor) is one of the most important aquaculture species in East Asia. To facilitate gene cloning and characterization, genome analysis, and genetic breeding of this species, we constructed a large-insert bacterial artificial chromosome (BAC) library, which is an important genetic tool for advanced genetics and genomics research. The small abalone BAC library includes 92,610 clones with an average insert size of 120 Kb, equivalent to approximately 7.6× coverage of the small abalone genome. We set up three-dimensional pools and super pools of 18,432 BAC clones for target gene screening using a PCR method. To assess the approach, we screened 12 target genes in these 18,432 BAC clones and identified 16 positive BAC clones. Eight positive BAC clones were then sequenced and assembled with a next-generation sequencing platform. The assembled contigs representing these 8 BAC clones spanned 928 Kb of the small abalone genome, providing the first batch of genome sequences for genome evaluation and characterization. The average GC content of the small abalone genome was estimated as 40.33%. A total of 21 protein-coding genes, including 7 target genes, were annotated in the 8 BACs, which demonstrated the feasibility of the PCR screening approach with three-dimensional pools in the small abalone BAC library. One hundred fifty microsatellite loci were also identified from the sequences for future marker development. The BAC library and clone pools provide valuable resources and tools for genetic breeding and conservation of H. diversicolor.
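
    The coverage figure quoted above follows from simple arithmetic: the number of clones times the average insert size, divided by the genome size. The sketch below works backwards from the reported 7.6× to the genome size it implies, and adds the standard Clarke-Carbon estimate of the chance that a given locus is present in the library. The genome size is not stated in the abstract, so the value computed here is an inference, and the function names are hypothetical.

```python
def library_coverage(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Genome equivalents represented by a clone library."""
    return n_clones * insert_kb / genome_kb


def implied_genome_kb(n_clones: int, insert_kb: float, coverage: float) -> float:
    """Genome size (kb) implied by a reported library coverage."""
    return n_clones * insert_kb / coverage


def prob_locus_present(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome) ** n_clones."""
    fraction = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction) ** n_clones


# Values from the abstract: 92,610 clones, 120 kb average insert, ~7.6x coverage.
genome_kb = implied_genome_kb(92_610, 120, 7.6)       # about 1.46 million kb
coverage = library_coverage(92_610, 120, genome_kb)   # recovers 7.6 by construction
p_found = prob_locus_present(92_610, 120, genome_kb)  # chance a locus is in the library
```

    Under these assumptions the implied genome size is roughly 1.46 Gb, and the probability that any single-copy locus is represented exceeds 99.9%. The estimate treats clones as randomly distributed, which real libraries only approximate.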

  5. Genetic, genomic, and molecular tools for studying the protoploid yeast, L. waltii.

    Science.gov (United States)

    Di Rienzi, Sara C; Lindstrom, Kimberly C; Lancaster, Ragina; Rolczynski, Lisa; Raghuraman, M K; Brewer, Bonita J

    2011-02-01

    Sequencing of the yeast Kluyveromyces waltii (recently renamed Lachancea waltii) provided evidence of a whole genome duplication event in the lineage leading to the well-studied Saccharomyces cerevisiae. While comparative genomic analyses of these yeasts have proven to be extremely instructive in modeling the loss or maintenance of gene duplicates, experimental tests of the ramifications following such genome alterations remain difficult. To transform L. waltii from an organism of the computational comparative genomic literature into an organism of the functional comparative genomic literature, we have developed genetic, molecular and genomic tools for working with L. waltii. In particular, we have characterized basic properties of L. waltii (growth, ploidy, molecular karyotype, mating type and the sexual cycle), developed transformation, cell cycle arrest and synchronization protocols, and have created centromeric and non-centromeric vectors as well as a genome browser for L. waltii. We hope that these tools will be used by the community to follow up on the ideas generated by sequence data and lead to a greater understanding of eukaryotic biology and genome evolution. 2010 John Wiley & Sons, Ltd.

  6. Brief Guide to Genomics: DNA, Genes and Genomes

    Science.gov (United States)

    ... clinic. Most new drugs based on genome-based research are estimated to be at least 10 to 15 years away, though recent genome-driven efforts in lipid-lowering therapy have considerably shortened that interval. According ...

  7. CHESS (CgHExpreSS): a comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome.

    Science.gov (United States)

    Lee, Mikyung; Kim, Yangseok

    2009-12-16

    Genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify the expression level of genes due to altered copy number in the corresponding region of the chromosome. An accumulating body of evidence supports the possibility that strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression. A well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs have already been introduced for integrative analysis. However, published software remains limited in its ability to perform comprehensive integrated analysis because of limitations in the implemented algorithms and visualization modules. To address this issue, we have implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profile. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alteration by applying a threshold-based method or the SW-ARRAY algorithm and investigates whether the detected alteration is phenotype specific or not. The integrative analysis module, on the other hand, measures the genomic alteration's influence on gene expression. It is divided into two separate parts. The first part calculates overall correlation between comparative genomic hybridization ratio and gene expression level by applying the following three statistical methods: simple linear regression, Spearman rank correlation and Pearson's correlation. In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern with three alternative statistical approaches: Student's t-test, Fisher's exact test and Chi square
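
    The three statistics named for the first part of the integrative analysis can be sketched in a few lines. This is a generic illustration written in Python for brevity, not CHESS's Java implementation; the function names and the toy values are assumptions, and the inputs stand in for paired CGH ratios and expression levels of the same genes.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def linear_fit(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up paired values: copy-number ratio vs. expression for five genes.
cgh_ratio = [1, 2, 3, 4, 5]
expression = [1, 4, 9, 16, 25]
r_pearson = pearson(cgh_ratio, expression)
r_spearman = spearman(cgh_ratio, expression)
slope, intercept = linear_fit([0, 1, 2], [1, 3, 5])
```

    The toy data are monotonic but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is slightly lower, which is one reason to report both.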

  8. Development of an Efficient Genome Editing Tool in Bacillus licheniformis Using CRISPR-Cas9 Nickase.

    Science.gov (United States)

    Li, Kaifeng; Cai, Dongbo; Wang, Zhangqian; He, Zhili; Chen, Shouwen

    2018-03-15

    Bacillus strains are important industrial bacteria that can produce various biochemical products. However, low transformation efficiencies and a lack of effective genome editing tools have hindered their widespread application. Recently, clustered regularly interspaced short palindromic repeat (CRISPR)-Cas9 techniques have been utilized in many organisms as genome editing tools because of their high efficiency and easy manipulation. In this study, an efficient genome editing method was developed for Bacillus licheniformis using a CRISPR-Cas9 nickase integrated into the genome of B. licheniformis DW2 with overexpression driven by the P43 promoter. The yvmC gene was deleted using the CRISPR-Cas9n technique with homology arms of 1.0 kb as a representative example, and an efficiency of 100% was achieved. In addition, two genes were simultaneously disrupted with an efficiency of 11.6%, and the large DNA fragment bacABC (42.7 kb) was deleted with an efficiency of 79.0%. Furthermore, the heterologous reporter gene aprN, which codes for nattokinase in Bacillus subtilis, was inserted into the chromosome of B. licheniformis with an efficiency of 76.5%. The activity of nattokinase in the DWc9nΔ7/pP43SNT-S sacC strain reached 59.7 fibrinolytic units (FU)/ml, which was 25.7% higher than that of DWc9n/pP43SNT-S sacC. Finally, the engineered strain DWc9nΔ7 (Δepr ΔwprA Δmpr ΔaprE Δvpr ΔbprA ΔbacABC), with multiple disrupted genes, was constructed using the CRISPR-Cas9n technique. Taken together, we have developed an efficient genome editing tool based on CRISPR-Cas9n in B. licheniformis. This tool could be applied to strain improvement for future research. IMPORTANCE As important industrial bacteria, Bacillus strains have attracted significant attention due to their production of biological products. However, genetic manipulation of these bacteria is difficult. The CRISPR-Cas9 system has been applied to genome editing in some bacteria, and CRISPR-Cas9n was proven to

  9. The GATO gene annotation tool for research laboratories

    Directory of Open Access Journals (Sweden)

    A. Fujita

    2005-11-01

    Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from anywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.

  10. New tools in regenerative medicine: gene therapy.

    Science.gov (United States)

    Muñoz Ruiz, Miguel; Regueiro, José R

    2012-01-01

    Gene therapy aims to transfer genetic material into cells to provide them with new functions. A gene transfer agent has to be safe, capable of expressing the desired gene for a sustained period of time in a sufficiently large population of cells to produce a biological effect. Identifying a gene transfer tool that meets all of these criteria has proven to be a difficult objective. Viral and nonviral vectors, in vivo, ex vivo and in situ strategies co-exist at present, although ex vivo lenti- or retroviral vectors are presently the most popular. Natural stem cells (from embryonic, hematopoietic, mesenchymal, or adult tissues) or induced progenitor stem (iPS) cells can be modified by gene therapy for use in regenerative medicine. Among them, hematopoietic stem cells have shown clear clinical benefit, but iPS cells hold humongous potential with no ethical concerns.

  11. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    to protein: through epigenetic modifications, transcription regulators or post-transcriptional controls. The following papers concern several layers of gene regulation with questions answered by different HTS approaches. Genome-wide screening of epigenetic changes by ChIP-seq allowed us to study both spatial...... and temporal alterations of histone modifications (Papers I and II). Coupling the data with machine learning approaches, we established a prediction framework to assess the most informative histone marks as well as their most influential nucleosome positions in predicting the promoter usages. (Papers I...... they regulated or if the sites had global elevated usage rates by multiple TFs. Using RNA-seq, 5’end-seq in combination with depletion of 5’exonuclease as well as nonsensemediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V...

  12. Next generation tools for genomic data generation, distribution, and visualization

    Directory of Open Access Journals (Sweden)

    Nix David A

    2010-09-01

    Full Text Available Abstract Background With the rapidly falling cost and availability of high throughput sequencing and microarray technologies, the bottleneck for effectively using genomic analysis in the laboratory and clinic is shifting to one of effectively managing, analyzing, and sharing genomic data. Results Here we present three open-source, platform independent, software tools for generating, analyzing, distributing, and visualizing genomic data. These include a next generation sequencing/microarray LIMS and analysis project center (GNomEx); an application for annotating and programmatically distributing genomic data using the community vetted DAS/2 data exchange protocol (GenoPub); and a standalone Java Swing application (GWrap) that makes cutting edge command line analysis tools available to those who prefer graphical user interfaces. Both GNomEx and GenoPub use the rich client Flex/Flash web browser interface to interact with Java classes and a relational database on a remote server. Both employ a public-private user-group security model enabling controlled distribution of patient and unpublished data alongside public resources. As such, they function as genomic data repositories that can be accessed manually or programmatically through DAS/2-enabled client applications such as the Integrated Genome Browser. Conclusions These tools have gained wide use in our core facilities, research laboratories and clinics and are freely available for non-profit use. See http://sourceforge.net/projects/gnomex/, http://sourceforge.net/projects/genoviz/, and http://sourceforge.net/projects/useq.

  13. Next generation tools for genomic data generation, distribution, and visualization.

    Science.gov (United States)

    Nix, David A; Di Sera, Tonya L; Dalley, Brian K; Milash, Brett A; Cundick, Robert M; Quinn, Kevin S; Courdy, Samir J

    2010-09-09

    With the rapidly falling cost and availability of high throughput sequencing and microarray technologies, the bottleneck for effectively using genomic analysis in the laboratory and clinic is shifting to one of effectively managing, analyzing, and sharing genomic data. Here we present three open-source, platform independent, software tools for generating, analyzing, distributing, and visualizing genomic data. These include a next generation sequencing/microarray LIMS and analysis project center (GNomEx); an application for annotating and programmatically distributing genomic data using the community vetted DAS/2 data exchange protocol (GenoPub); and a standalone Java Swing application (GWrap) that makes cutting edge command line analysis tools available to those who prefer graphical user interfaces. Both GNomEx and GenoPub use the rich client Flex/Flash web browser interface to interact with Java classes and a relational database on a remote server. Both employ a public-private user-group security model enabling controlled distribution of patient and unpublished data alongside public resources. As such, they function as genomic data repositories that can be accessed manually or programmatically through DAS/2-enabled client applications such as the Integrated Genome Browser. These tools have gained wide use in our core facilities, research laboratories and clinics and are freely available for non-profit use. See http://sourceforge.net/projects/gnomex/, http://sourceforge.net/projects/genoviz/, and http://sourceforge.net/projects/useq.

  14. New Genome Similarity Measures based on Conserved Gene Adjacencies.

    Science.gov (United States)

    Doerr, Daniel; Kowada, Luis Antonio B; Araujo, Eloi; Deshpande, Shachi; Dantas, Simone; Moret, Bernard M E; Stoye, Jens

    2017-06-01

    Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful, but also most complex, models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases the computational costs are the same as in the general family-free case, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that differ in expressive power. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.
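
    A minimal sketch of the simplest model mentioned above, in which both genomes contain the same genes, each in exactly one copy. It is not the gene connections model or any of the measures defined in the article; the signed-integer encoding of genes (sign = strand, zero not allowed) and the function names are assumptions made for illustration.

```python
def adjacencies(genome):
    """Return the set of adjacencies of a signed, linear gene order.

    Each adjacency is stored in a canonical form so that reading the
    chromosome in the opposite direction yields the same set.
    """
    adj = set()
    for left, right in zip(genome, genome[1:]):
        # (a, b) read forward is the same adjacency as (-b, -a) read backward.
        adj.add(min((left, right), (-right, -left)))
    return adj


def adjacency_similarity(genome_a, genome_b):
    """Fraction of adjacencies of genome_a that are conserved in genome_b."""
    shared = adjacencies(genome_a) & adjacencies(genome_b)
    total = max(len(genome_a) - 1, 1)  # avoid division by zero for one gene
    return len(shared) / total


# Toy data: genome_b carries an inversion of the segment (3, 4).
genome_a = [1, 2, 3, 4, 5]
genome_b = [1, 2, -4, -3, 5]
print(adjacency_similarity(genome_a, genome_b))
```

    The inverted segment in genome_b keeps the adjacency between genes 3 and 4, so two of the four adjacencies are shared.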

  15. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    Science.gov (United States)

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.

  16. Comparative genomics of the relationship between gene structure and expression

    NARCIS (Netherlands)

    Ren, X.

    2006-01-01

    The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene

  17. LEMONS - A Tool for the Identification of Splice Junctions in Transcriptomes of Organisms Lacking Reference Genomes.

    Directory of Open Access Journals (Sweden)

    Liron Levin

    RNA-seq is becoming a preferred tool for genomics studies of model and non-model organisms. However, DNA-based analysis of organisms lacking sequenced genomes cannot rely on RNA-seq data alone to isolate most genes of interest, as DNA codes both exons and introns. With this in mind, we designed a novel tool, LEMONS, that exploits the evolutionary conservation of both exon/intron boundary positions and splice junction recognition signals to produce high throughput splice-junction predictions in the absence of a reference genome. When tested on multiple annotated vertebrate mRNA data, LEMONS accurately identified 87% (average) of the splice-junctions. LEMONS was then applied to our updated Mediterranean chameleon transcriptome, which lacks a reference genome, and predicted a total of 90,820 exon-exon junctions. We experimentally verified these splice-junction predictions by amplifying and sequencing twenty randomly selected genes from chameleon DNA templates. Exons and introns were detected in 19 of 20 of the positions predicted by LEMONS. To the best of our knowledge, LEMONS is currently the only experimentally verified tool that can accurately predict splice-junctions in organisms that lack a reference genome.

  18. Regulation of methane genes and genome expression

    Energy Technology Data Exchange (ETDEWEB)

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, which encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  19. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    BACKGROUND: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment of ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in the Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of
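
    The SSR classes listed above (runs of one- to six-base motifs) can be illustrated with a short regular-expression scan. This is only a sketch: the abstract does not state which software or thresholds the study used, so the minimum repeat counts and the function names below are assumptions.

```python
import re

# Minimum number of repeats per motif length. Illustrative assumptions only;
# these are not the thresholds used in the study.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip e.g. "AA" runs, which are already reported as "A" repeats.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Toy sequence with one mono-, one di-, and one trinucleotide repeat.
demo = "GGC" + "A" * 10 + "CGG" + "AT" * 5 + "GTT" + "AGC" * 4 + "TT"
for start, motif, repeats in find_ssrs(demo):
    print(start, motif, repeats)
```

    Only perfect repeats on the given strand are reported; compound or interrupted SSRs would need additional logic.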

  20. Evolution of closely linked gene pairs in vertebrate genomes

    NARCIS (Netherlands)

    Franck, E.; Hulsen, T.; Huynen, M.A.; Jong, de W.W.; Lunsen, N.H.; Madsen, O.

    2008-01-01

    The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of

  1. Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.

    Science.gov (United States)

    Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup

    2011-09-01

    The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.

  2. Genomic tools in pearl millet breeding for drought tolerance: Status and prospects

    Directory of Open Access Journals (Sweden)

    Desalegn Debelo Serba

    2016-11-01

    Pearl millet (Pennisetum glaucum (L.) R. Br.) is a hardy cereal crop grown in the arid and semiarid tropics where other cereals are likely to fail to produce economic yields due to drought and heat stresses. Adaptive evolution, a form of natural selection, shaped the crop to grow and yield satisfactorily with limited moisture supply or under periodic water deficits in the soil. Drought tolerance is a complex polygenic trait in which various morphological and physiological responses are controlled by hundreds of genes and significantly influenced by the environment. The development of genomic tools will have enormous potential to improve the efficiency and precision of conventional breeding. The apparent independent domestication events, highly outcrossing nature and traditional cultivation in stressful environments have maintained a tremendous amount of polymorphism in pearl millet. This high polymorphism of the crop has been revealed by genome mapping, which in turn stimulated the mapping and tagging of genomic regions controlling important traits such as drought tolerance. Mapping of a major QTL for terminal drought tolerance in independent populations opened the prospect of developing molecular breeding in pearl millet. Targeted novel approaches to accelerate genetic gains for drought tolerance, such as establishment of marker-trait associations, genomic selection tools, genome sequence and genotyping-by-sequencing, are still limited. Development and application of high throughput genomic tools need to be intensified to improve the breeding efficiency of pearl millet to minimize the impact of climate change on its production.

  3. A Novel Tool for Microbial Genome Editing Using the Restriction-Modification System.

    Science.gov (United States)

    Bai, Hua; Deng, Aihua; Liu, Shuwen; Cui, Di; Qiu, Qidi; Wang, Laiyou; Yang, Zhao; Wu, Jie; Shang, Xiuling; Zhang, Yun; Wen, Tingyi

    2018-01-19

    Scarless genetic manipulation of genomes is an essential tool for biological research. The restriction-modification (R-M) system is a defense system in bacteria that protects against invading genomes on the basis of its ability to distinguish foreign DNA from self DNA. Here, we designed an R-M system-mediated genome editing (RMGE) technique for scarless genetic manipulation in different microorganisms. For bacteria with Type IV REase, an RMGE technique using the inducible DNA methyltransferase gene, bceSIIM (RMGE-bceSIIM), as the counter-selection cassette was developed to edit the genome of Escherichia coli. For bacteria without Type IV REase, an RMGE technique based on a restriction endonuclease (RMGE-mcrA) was established in Bacillus subtilis. These techniques were successfully used for gene deletion and replacement with nearly 100% counter-selection efficiencies, which were higher and more stable compared to conventional methods. Furthermore, precise point mutation without limiting sites was achieved in E. coli using RMGE-bceSIIM to introduce a single base mutation of A128C into the rpsL gene. In addition, the RMGE-mcrA technique was applied to delete the CAN1 gene in Saccharomyces cerevisiae DAY414 with 100% counter-selection efficiency. The effectiveness of the RMGE technique in E. coli, B. subtilis, and S. cerevisiae suggests the potential universal usefulness of this technique for microbial genome manipulation.

  4. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Background: Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
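
    The abstract does not define the Entropy Density Profile. The sketch below assumes one formulation, s_i = -(p_i log p_i) / H with H = -sum_j p_j log p_j over the amino acid frequencies p_i of a translated ORF, and compares two profiles by Euclidean distance. Both the formula and the distance are assumptions for illustration and should not be read as the MED 2.0 implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density_profile(protein):
    """Return a 20-component profile (assumed EDP form) for a protein string."""
    counts = Counter(res for res in protein.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    freqs = [counts[aa] / total for aa in AMINO_ACIDS]
    # Per-residue entropy terms; a frequency of zero contributes nothing.
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:
        raise ValueError("profile undefined for a single-residue composition")
    # Normalising by the total entropy makes the components sum to one.
    return [t / entropy for t in terms]


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two profiles (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical toy peptides, not taken from any genome.
orf_like = entropy_density_profile("MKTAYIAKQRQISFVKSHFSRQ")
low_complexity = entropy_density_profile("MAAAAAAAAGGGGGGGGAAAAG")
print(round(profile_distance(orf_like, low_complexity), 3))
```

    In a gene finder, profiles of candidate ORFs would be compared against reference coding and non-coding profiles; how those references are obtained is outside this sketch.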

  5. Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

    Science.gov (United States)

    Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

    2017-01-01

    The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
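
    Finding genes common to multiple gene sets, as described above, reduces to counting set membership. The sketch below uses plain Python rather than any GeneWeaver interface; the gene identifiers, set names, and function names are hypothetical.

```python
from collections import Counter


def rank_by_set_membership(gene_sets):
    """Return (gene, number_of_sets) pairs, most widely shared genes first."""
    counts = Counter()
    for genes in gene_sets.values():
        counts.update(set(genes))  # count each gene once per set
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def genes_in_at_least(gene_sets, min_sets):
    """Return genes that occur in at least min_sets of the given sets."""
    return [g for g, n in rank_by_set_membership(gene_sets) if n >= min_sets]


# Hypothetical gene sets from three kinds of study.
gene_sets = {
    "qtl_candidates": ["geneA", "geneB", "geneC", "geneD"],
    "differential_expression": ["geneB", "geneC", "geneE"],
    "literature_set": ["geneC", "geneF", "geneB", "geneB"],
}
print(genes_in_at_least(gene_sets, 3))
```

    Lowering min_sets gives a simple way to prioritize candidates that recur across some, but not all, of the sets.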

  6. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    Science.gov (United States)

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a

  7. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

    Science.gov (United States)

    Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

    2016-03-01

    Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when using RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/. Contact: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  8. Genome editing: The efficient tool CRISPR–Cpf1

    KAUST Repository

    Mahfouz, Magdy M.

    2017-01-01

    The novel features of the CRISPR–Cpf1 RNA-guided endonuclease system facilitate precise and efficient genome engineering. Application of CRISPR–Cpf1 in plants shows promise for robust gene editing and regulation, opening exciting possibilities for targeted trait improvement in crops.

  9. Genome editing: The efficient tool CRISPR–Cpf1

    KAUST Repository

    Mahfouz, Magdy M.

    2017-03-01

    The novel features of the CRISPR–Cpf1 RNA-guided endonuclease system facilitate precise and efficient genome engineering. Application of CRISPR–Cpf1 in plants shows promise for robust gene editing and regulation, opening exciting possibilities for targeted trait improvement in crops.

  10. Adenoviral vectors as genome editing tools : repairing defective DMD alleles

    NARCIS (Netherlands)

    Maggio, Ignazio

    2016-01-01

    Adenoviral vectors (AdVs) constitute powerful gene delivery vehicles. However, so far, their potential for genome editing has not been extensively investigated. By tailoring AdVs as carriers of designer nucleases and donor DNA sequences, the research presented in this thesis expands the utility of

  11. Recognizing genes and other components of genomic structure

    Energy Technology Data Exchange (ETDEWEB)

    Burks, C. (Los Alamos National Lab., NM (USA)); Myers, E. (Arizona Univ., Tucson, AZ (USA). Dept. of Computer Science); Stormo, G.D. (Colorado Univ., Boulder, CO (USA). Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as "electronic blackboards" and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  12. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs can be annotated by the text mining tool Open Biomedical Annotator (OBA), and each Entrez gene can be mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the GeneRIF records about human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of the human genome. The high consistency of the term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and the gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) between GeneRIFs and GOA shows that our annotation resource is very reliable.

  13. Directed evolution combined with synthetic biology strategies expedite semi-rational engineering of genes and genomes.

    Science.gov (United States)

    Kang, Zhen; Zhang, Junli; Jin, Peng; Yang, Sen

    2015-01-01

    Owing to our limited understanding of the relationship between sequence and function and the interaction between intracellular pathways and regulatory systems, the rational design of enzyme-coding genes and de novo assembly of a brand-new artificial genome for a desired functionality or phenotype are difficult to achieve. As an alternative approach, directed evolution has been widely used to engineer genomes and enzyme-coding genes. In particular, DNA synthesis, DNA assembly (in vitro or in vivo), recombination-mediated genetic engineering, and high-throughput screening techniques in the field of synthetic biology have matured and been widely adopted, enabling rapid semi-rational genome engineering to generate variants with desired properties. In this commentary, these novel tools and their corresponding applications in the directed evolution of genomes and enzymes are discussed. Moreover, strategies for genome engineering and rapid in vitro enzyme evolution are also proposed.

  14. A dual selection based, targeted gene replacement tool for Magnaporthe grisea and Fusarium oxysporum.

    Science.gov (United States)

    Khang, Chang Hyun; Park, Sook-Young; Lee, Yong-Hwan; Kang, Seogchan

    2005-06-01

    Rapid progress in fungal genome sequencing presents many new opportunities for functional genomic analysis of fungal biology through the systematic mutagenesis of the genes identified through sequencing. However, the lack of efficient tools for targeted gene replacement is a limiting factor for fungal functional genomics, as it often necessitates the screening of a large number of transformants to identify the desired mutant. We developed an efficient method of gene replacement and evaluated factors affecting the efficiency of this method using two plant pathogenic fungi, Magnaporthe grisea and Fusarium oxysporum. This method is based on Agrobacterium tumefaciens-mediated transformation with a mutant allele of the target gene flanked by the herpes simplex virus thymidine kinase (HSVtk) gene as a conditional negative selection marker against ectopic transformants. The HSVtk gene product converts 5-fluoro-2'-deoxyuridine to a compound toxic to diverse fungi. Because ectopic transformants express HSVtk, while gene replacement mutants lack HSVtk, growing transformants on a medium amended with 5-fluoro-2'-deoxyuridine facilitates the identification of targeted mutants by counter-selecting against ectopic transformants. In addition to M. grisea and F. oxysporum, the method and associated vectors are likely to be applicable to manipulating genes in a broad spectrum of fungi, thus potentially serving as an efficient, universal functional genomic tool for harnessing the growing body of fungal genome sequence data to study fungal biology.

  15. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA
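
    The GC3 statistic defined in the abstract (fraction of G or C at the third codon position) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the example sequence are hypothetical, and the sketch assumes an in-frame coding sequence and counts the stop codon if it is present.

```python
GC3_RICH_THRESHOLD = 0.75286  # cut-off quoted in the abstract


def gc3(cds: str) -> float:
    """Fraction of G or C among third-codon-position bases of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    third_positions = seq[2:usable:3]     # every third base, starting at index 2
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def is_gc3_rich(cds: str) -> bool:
    """Apply the abstract's GC3 >= 0.75286 cut-off (assumed to be inclusive)."""
    return gc3(cds) >= GC3_RICH_THRESHOLD


# Hypothetical toy sequence: codons ATG GCC GCG TTC TGA
example = "ATGGCCGCGTTCTGA"
print(gc3(example), is_gc3_rich(example))
```

    Whether stop codons or partial codons were excluded in the original analysis is not stated in the abstract, so those choices are assumptions of this sketch.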

  16. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait... to investigate locomotor activity, and applied genomic feature prediction models to identify gene ontology (GO) categories predictive of this phenotype. Next, we applied the covariance association test to partition the genomic variance of the predictive GO terms to the genes within these terms. We... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated...

  17. Widespread of horizontal gene transfer in the human genome

    OpenAIRE

    Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun

    2017-01-01

    Background A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic material between different species. Horizontal gene transfer has been found to be prevalent in prokaryotes but very rare in eukaryotes. In this paper, we investigate horizontal gene transfer in the human genome. Results From the pa...

  18. Cross-family translational genomics of abiotic stress-responsive genes between Arabidopsis and Medicago truncatula.

    Directory of Open Access Journals (Sweden)

    Daejin Hyung

    Full Text Available Cross-species translation of genomic information may play a pivotal role in applying biological knowledge gained from relatively simple model systems to other less studied, but related, genomes. The information of abiotic stress (ABS)-responsive genes in Arabidopsis was identified and translated into the legume model system, Medicago truncatula. Various data resources, such as TAIR/AtGI DB, expression profiles and the literature, were used to build a genome-wide list of ABS genes. tBlastX/BlastP similarity search tools and manual inspection of alignments were used to identify orthologous genes between the two genomes. A total of 1,377 genes were finally collected and classified into 18 functional criteria of gene ontology (GO). The data analysis according to the expression cues showed that there was a substantial level of interaction among three major types (i.e., drought, salinity and cold stress) of abiotic stresses. In an attempt to translate the ABS genes between these two species, genomic locations for each gene were mapped using an in-house-developed comparative analysis platform. The comparative analysis revealed that fragmental colinearity, represented by only 37 synteny blocks, existed between Arabidopsis and M. truncatula. Based on the combination of E-value and alignment remarks, the estimated translation rate was 60.2% for this cross-family translation. As a prelude to the functional comparative genomic approaches, in-silico gene network/interactome analyses were conducted to predict key components in the ABS responses, and one of the sub-networks was integrated with the corresponding comparative map. The results demonstrated that core members of the sub-network were well aligned with previously reported ABS regulatory networks. Taken together, the results indicate that network-based integrative approaches of comparative and functional genomics are important to interpret and translate genomic information for complex traits such as abiotic stresses.

  19. Gene copy number variation throughout the Plasmodium falciparum genome

    Directory of Open Access Journals (Sweden)

    Stewart Lindsay B

    2009-08-01

    Full Text Available Abstract Background Gene copy number variation (CNV) is responsible for several important phenotypes of the malaria parasite Plasmodium falciparum, including drug resistance, loss of infected erythrocyte cytoadherence and alteration of receptor usage for erythrocyte invasion. Despite the known effects of CNV, little is known about its extent throughout the genome. Results We performed a whole-genome survey of CNV genes in P. falciparum using comparative genome hybridisation of a diverse set of 16 laboratory culture-adapted isolates to a custom designed high density Affymetrix GeneChip array. Overall, 186 genes showed hybridisation signals consistent with deletion or amplification in one or more isolates. There is a strong association of CNV with gene length, genomic location, and low orthology to genes in other Plasmodium species. Sub-telomeric regions of all chromosomes are strongly associated with CNV genes independent from members of previously described multigene families. However, ~40% of CNV genes were located in more central regions of the chromosomes. Among the previously undescribed CNV genes, several that are of potential phenotypic relevance are identified. Conclusion CNV represents a major form of genetic variation within the P. falciparum genome; the distribution of gene features indicates the involvement of highly non-random mutational and selective processes. Additional studies should be directed at examining CNV in natural parasite populations to extend conclusions to clinical settings.

  20. [CRISPR/CAS9, the King of Genome Editing Tools].

    Science.gov (United States)

    Bannikov, A V; Lavrov, A V

    2017-01-01

    The discovery of CRISPR/Cas9 brought a hope for having an efficient, reliable, and readily available tool for genome editing. CRISPR/Cas9 is certainly easy to use, while its efficiency and reliability remain the focus of studies. The review describes the general principles of the organization and function of Cas nucleases and a number of important issues to be considered while planning genome editing experiments with CRISPR/Cas9. The issues include evaluation of the efficiency and specificity for Cas9, sgRNA selection, Cas9 variants designed artificially, and use of homologous recombination and nonhomologous end joining in DNA editing.

  1. Databases and web tools for cancer genomics study.

    Science.gov (United States)

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the use of these resources in the cancer research community. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  2. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    International Nuclear Information System (INIS)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-01-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society
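
    The kind of model the abstract describes (random gene duplication plus point mutation) can be illustrated with a toy simulation. This is a sketch under strong assumptions, not the authors' model: mutation is collapsed into a single hypothetical probability `p_diverge` that a new copy has diverged enough to found its own family, and all parameter values are invented for illustration.

```python
import random
from collections import Counter


def simulate_family_sizes(n_precursors, n_genes, p_diverge, seed=0):
    """Grow a genome by random duplication; return {family size: number of families}.

    Each step copies a uniformly chosen gene. With probability p_diverge the
    copy is treated as a new family (stand-in for accumulated point mutations);
    otherwise it joins the family of its parent.
    """
    rng = random.Random(seed)
    family_of = list(range(n_precursors))   # gene index -> family id
    next_family = n_precursors
    while len(family_of) < n_genes:
        parent = rng.randrange(len(family_of))
        if rng.random() < p_diverge:
            family_of.append(next_family)   # diverged copy founds a family
            next_family += 1
        else:
            family_of.append(family_of[parent])
    members_per_family = Counter(family_of)
    return Counter(members_per_family.values())


# Hypothetical parameters: 50 precursor genes grown to a 2000-gene genome.
histogram = simulate_family_sizes(n_precursors=50, n_genes=2000, p_diverge=0.3)
for size in sorted(histogram)[:5]:
    print(size, histogram[size])
```

    Comparing such a histogram with observed family-size counts is the sort of check the abstract alludes to; the real model and its fitting procedure are described in the paper itself.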

  3. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    Energy Technology Data Exchange (ETDEWEB)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.

  4. PSP: rapid identification of orthologous coding genes under positive selection across multiple closely related prokaryotic genomes.

    Science.gov (United States)

    Su, Fei; Ou, Hong-Yu; Tao, Fei; Tang, Hongzhi; Xu, Ping

    2013-12-27

    With genomic sequences of many closely related bacterial strains made available by deep sequencing, it is now possible to investigate trends in prokaryotic microevolution. Positive selection is a sub-process of microevolution, in which a particular mutation is favored, causing the allele frequency to continuously shift in one direction. Wide scanning of prokaryotic genomes has shown that positive selection at the molecular level is much more frequent than expected. Genes with significant positive selection may play key roles in bacterial adaptation to different environmental pressures. However, selection pressure analyses are computationally intensive and awkward to configure. Here we describe an open access web server, designated PSP (Positive Selection analysis for Prokaryotic genomes), for performing evolutionary analysis on orthologous coding genes, specifically designed for rapid comparison of dozens of closely related prokaryotic genomes. Notably, PSP facilitates functional exploration at multiple levels by assignments and enrichments of KO, GO or COG terms. To illustrate this user-friendly tool, we analyzed Escherichia coli and Bacillus cereus genomes and found that several genes, which play key roles in human infection and antibiotic resistance, show significant evidence of positive selection. PSP is freely available to all users without any login requirement at: http://db-mml.sjtu.edu.cn/PSP/. PSP ultimately allows researchers to do genome-scale analysis for evolutionary selection across multiple prokaryotic genomes rapidly and easily, and identify the genes undergoing positive selection, which may play key roles in host-pathogen interactions and/or environmental adaptation.

  5. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Mundy, J.; Willenbrock, Hanni

    2007-01-01

    The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental facto...

  6. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    Full Text Available This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D, and heat, H) for the highly heritable traits days to heading (DTH) and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm
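
    The TRN20-TST80 scheme named in the abstract can be sketched as a random partition followed by an accuracy score. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that prediction accuracy means the Pearson correlation between predicted and observed values, which is a common convention but is not stated in the abstract.

```python
import math
import random


def trn_tst_split(ids, train_fraction=0.20, seed=0):
    """Randomly assign accession ids to training (TRN) and testing (TST) sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def pearson(x, y):
    """Pearson correlation, used here as an assumed measure of prediction accuracy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 8416 is the number of Mexican accessions given in the abstract.
trn, tst = trn_tst_split(range(8416))
print(len(trn), len(tst))
```

    In a real analysis a genomic prediction model would be fitted on `trn`, used to predict the phenotypes of `tst`, and scored with `pearson`; repeating the split with different seeds gives the random cross-validation the abstract mentions.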

  7. Genes but not genomes reveal bacterial domestication of Lactococcus lactis.

    Directory of Open Access Journals (Sweden)

    Delphine Passerini

    Full Text Available BACKGROUND: The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). METHODOLOGY/PRINCIPAL FINDINGS: The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. CONCLUSION/SIGNIFICANCE: The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between "environmental" strains, the main contributors to the genetic diversity within the subspecies, and "domesticated" strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the "domesticated" strains essentially arose through substantial genomic flux within the dispensable

  8. A comparative evaluation of genome assembly reconciliation tools.

    Science.gov (United States)

    Alhakami, Hind; Mirebrahim, Hamid; Lonardi, Stefano

    2017-05-18

    The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher quality consensus assembly, which is the objective of assembly reconciliation. Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. We fill this need with this work, in which we report on an extensive comparative evaluation of several tools. Specifically, we evaluate contiguity, correctness, coverage, and the duplication ratio of the merged assembly compared to the individual assemblies provided as input. None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies already have high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from being comparable to the best of the input assembly to being comparable to the worst of the input assembly.
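
    Contiguity, one of the four criteria listed in the abstract, is often summarised with the N50 statistic. The abstract does not say which contiguity measure the authors used, so N50 is an assumption here, and the contig lengths below are invented for illustration.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical input assembly and a merged assembly of the same total size.
input_assembly = [100, 80, 50, 30, 20, 10, 10]
merged_assembly = [180, 80, 30, 10]
print(n50(input_assembly), n50(merged_assembly))
```

    A higher N50 after merging indicates better contiguity but says nothing about correctness, which is why the evaluation described above also tracks misassemblies, coverage, and duplication ratio.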

  9. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    Josephine Erhiakporeh

    2016-07-06

    Jul 6, 2016 ... candidate genes for drought tolerance in sesame. (Sesamum ... Our results provided genomic resources for further functional analysis and genetic engineering .... reverse transcribed using the Reverse Transcription System.

  10. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in such outcomes as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed for the pan-genome and core genome by identifying their composition and function. The results showed that the total of 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44 % of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35 % of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the growth and reproduction of Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of intracellular survival in Brucella spp.
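
    The three-way split reported in the abstract (1710 core + 1182 strain-specific + 2477 dispensable = 5369 clusters) follows from counting how many genomes each gene cluster occurs in. A minimal sketch, assuming the usual definitions (core: present in every genome; strain-specific: present in exactly one; dispensable: everything in between); the cluster and genome names are hypothetical.

```python
def classify_clusters(presence, n_genomes):
    """Count core, dispensable and strain-specific clusters.

    presence maps a cluster id to the collection of genomes containing it.
    """
    counts = {"core": 0, "dispensable": 0, "strain_specific": 0}
    for genomes in presence.values():
        k = len(set(genomes))
        if k == n_genomes:
            counts["core"] += 1
        elif k == 1:
            counts["strain_specific"] += 1
        else:
            counts["dispensable"] += 1
    return counts


# Hypothetical toy pan-genome over three genomes.
toy = {
    "cluster_1": ["genomeA", "genomeB", "genomeC"],
    "cluster_2": ["genomeA", "genomeB"],
    "cluster_3": ["genomeC"],
    "cluster_4": ["genomeA", "genomeB", "genomeC"],
}
print(classify_clusters(toy, n_genomes=3))

# The category sizes quoted in the abstract add up to the reported total.
assert 1710 + 1182 + 2477 == 5369
```

    The exact presence thresholds used in the original study are not given in the abstract, so the strict "all genomes" rule for the core set is an assumption.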

  11. Synthetic RNAs for Gene Regulation: Design Principles and Computational Tools

    International Nuclear Information System (INIS)

    Laganà, Alessandro; Shasha, Dennis; Croce, Carlo Maria

    2014-01-01

    The use of synthetic non-coding RNAs for post-transcriptional regulation of gene expression has not only become a standard laboratory tool for gene functional studies but it has also opened up new perspectives in the design of new and potentially promising therapeutic strategies. Bioinformatics has provided researchers with a variety of tools for the design, the analysis, and the evaluation of RNAi agents such as small-interfering RNA (siRNA), short-hairpin RNA (shRNA), artificial microRNA (a-miR), and microRNA sponges. More recently, a new system for genome engineering based on the bacterial CRISPR-Cas9 system (Clustered Regularly Interspaced Short Palindromic Repeats), was shown to have the potential to also regulate gene expression at both transcriptional and post-transcriptional level in a more specific way. In this mini review, we present RNAi and CRISPRi design principles and discuss the advantages and limitations of the current design approaches.

  12. Synthetic RNAs for Gene Regulation: Design Principles and Computational Tools

    Energy Technology Data Exchange (ETDEWEB)

    Laganà, Alessandro [Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH (United States); Shasha, Dennis [Courant Institute of Mathematical Sciences, New York University, New York, NY (United States); Croce, Carlo Maria [Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH (United States)

    2014-12-11

    The use of synthetic non-coding RNAs for post-transcriptional regulation of gene expression has not only become a standard laboratory tool for gene functional studies but it has also opened up new perspectives in the design of new and potentially promising therapeutic strategies. Bioinformatics has provided researchers with a variety of tools for the design, the analysis, and the evaluation of RNAi agents such as small-interfering RNA (siRNA), short-hairpin RNA (shRNA), artificial microRNA (a-miR), and microRNA sponges. More recently, a new system for genome engineering based on the bacterial CRISPR-Cas9 system (Clustered Regularly Interspaced Short Palindromic Repeats), was shown to have the potential to also regulate gene expression at both transcriptional and post-transcriptional level in a more specific way. In this mini review, we present RNAi and CRISPRi design principles and discuss the advantages and limitations of the current design approaches.

  13. Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

    Directory of Open Access Journals (Sweden)

    Yubo Hou

    Full Text Available The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') versus log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200 + 22.678X'), whereas non-eukaryotes a linear model, Y' = 0.045 + 0.977X', both with high significance (p0.91. Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10^6 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10^6 kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.
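
    The two regression forms quoted in the abstract can be evaluated directly. The sketch below simply plugs genome size (in kbp) into the printed coefficients; the function names and the non-eukaryote example size are hypothetical. Evaluated this way, the eukaryote curve gives values near, but not identical to, the 38,188 and 87,688 protein-coding genes quoted above, and the abstract does not give enough detail to say why, so the code should be read as an illustration of the functional form only.

```python
import math


def eukaryote_protein_coding_genes(genome_kbp):
    """Logarithmic model from the abstract: Y' = ln(-46.200 + 22.678 X')."""
    x_prime = math.log10(genome_kbp)
    y_prime = math.log(-46.200 + 22.678 * x_prime)
    return 10 ** y_prime          # back-transform from log10 gene number


def non_eukaryote_protein_coding_genes(genome_kbp):
    """Linear model from the abstract: Y' = 0.045 + 0.977 X'."""
    x_prime = math.log10(genome_kbp)
    return 10 ** (0.045 + 0.977 * x_prime)


# Dinoflagellate genome sizes given in the abstract (kbp).
for size_kbp in (3e6, 245e6):
    print(size_kbp, round(eukaryote_protein_coding_genes(size_kbp)))

# Hypothetical bacterial-scale genome of 4,600 kbp.
print(round(non_eukaryote_protein_coding_genes(4600)))
```

    Note that the logarithmic model is only defined where -46.200 + 22.678 X' is positive, that is, for genomes larger than roughly 110 kbp.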

  14. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    Directory of Open Access Journals (Sweden)

    Kudrna David

    2011-03-01

    Full Text Available Abstract Background Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 kb (EG_Bb) to 157 kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single copy gene ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. Conclusions The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae
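
    The link between library coverage and the probability of recovering a single-copy gene can be checked with a standard approximation. The abstract does not say which formula the authors used; the sketch assumes the Poisson form of the Clarke-Carbon estimate, P = 1 - exp(-c), where c is the coverage in haploid genome equivalents. The function name is hypothetical.

```python
import math


def prob_gene_in_library(coverage):
    """Approximate probability that a given single-copy locus is in at least one clone."""
    return 1.0 - math.exp(-coverage)


# Coverages reported in the abstract: 17x (EG_Ba) and 15x (EG_Bb).
for name, coverage in (("EG_Ba", 17), ("EG_Bb", 15)):
    p = prob_gene_in_library(coverage)
    print(name, f"{p:.7f}", p >= 0.9999)
```

    Under this assumption both libraries comfortably exceed the 99.99% figure quoted above; even at 10× coverage the approximation already gives a probability above 99.99%.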

  15. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
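
    The core selection step described in the abstract (genes annotated to at least two distinct biological processes) can be sketched as a filter over an annotation table. This is a deliberate simplification, not the authors' method: it treats any two different labels as distinct processes, whereas a real analysis would need to account for related or nested annotation terms. The gene and process names are hypothetical.

```python
def multifunctional_genes(annotations, min_processes=2):
    """Return genes annotated with at least min_processes different process labels."""
    return {
        gene
        for gene, processes in annotations.items()
        if len(set(processes)) >= min_processes
    }


# Hypothetical annotation table: gene -> biological process labels.
toy_annotations = {
    "geneA": ["DNA repair", "cell cycle"],
    "geneB": ["translation"],
    "geneC": ["transport", "transport"],          # duplicate label, one process
    "geneD": ["signalling", "apoptosis", "development"],
}
print(sorted(multifunctional_genes(toy_annotations)))
```

    The same filter applies unchanged when multifunctionality is defined over molecular functions instead of biological processes, as the abstract notes.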

  16. The CRISPR/Cas genome-editing tool: application in improvement of crops

    Directory of Open Access Journals (Sweden)

    Surender Khatodia

    2016-04-01

    Full Text Available The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated Cas9/sgRNA system is a recently developed targeted genome-editing technique derived from the bacterial immune system. It is cheap, easy to use, and the most rapidly adopted genome-editing tool to date. The Cas9 protein is an RNA-guided endonuclease used to create targeted double-stranded breaks in animals and plants, with only a short RNA sequence needed to confer recognition of the target. Because it can produce genetically edited (GE) crops similar to those developed by conventional or mutation breeding, the technique is a promising and extremely versatile tool for sustainable, productive agriculture to feed a rapidly growing population in a changing climate. Emerging areas of research for genome editing in plants include interrogating gene function, rewiring regulatory signaling networks, and building sgRNA libraries for high-throughput loss-of-function screening. In this review, we discuss the broad applicability of Cas9 nuclease-mediated targeted plant genome editing for the development of designer crops. The regulatory uncertainty and social acceptance of plant breeding by Cas9 genome editing are also discussed. Non-GM, genetically edited designer plants could offer prospects for climate-resilient and sustainable energy agriculture, maximizing yield by combating abiotic and biotic stresses with this new plant breeding technique.

  17. The invasive MED/Q Bemisia tabaci genome: a tale of gene loss and gene gain

    Science.gov (United States)

    Whiteflies are a group of invasive crop pests that impact global agriculture. An analysis was conducted to compare draft genomes of two whitefly strains, which demonstrated the relative conserved gene order, but a number of genes were either novel (added) or omitted (deleted) between genomes. This...

  18. Data on the genome-wide identification of CNL R-genes in Setaria italica (L.) P. Beauv.

    Science.gov (United States)

    Andersen, Ethan J; Nepal, Madhav P

    2017-08-01

    We report data associated with the identification of 242 disease resistance genes (R-genes) in the genome of Setaria italica as presented in "Genetic diversity of disease resistance genes in foxtail millet (Setaria italica L.)" (Andersen and Nepal, 2017) [1]. Our data describe the structure and evolution of the Coiled-coil, Nucleotide-binding site, Leucine-rich repeat (CNL) R-genes in foxtail millet. The CNL genes were identified through rigorous extraction and analysis of recently available plant genome sequences using cutting-edge analytical software. Data visualization includes gene structure diagrams, chromosomal syntenic maps, a chromosomal density plot, and a maximum-likelihood phylogenetic tree comparing Sorghum bicolor, Panicum virgatum, Setaria italica, and Arabidopsis thaliana. Compilation of InterProScan annotations, Gene Ontology (GO) annotations, and Basic Local Alignment Search Tool (BLAST) results for the 242 R-genes identified in the foxtail millet genome are also included in tabular format.

  19. Analysis and prediction of gene splice sites in four Aspergillus genomes

    DEFF Research Database (Denmark)

    Wang, Kai; Ussery, David; Brunak, Søren

    2009-01-01

    Several Aspergillus fungal genomic sequences have been published, with many more in progress. Obviously, it is essential to have high-quality, consistently annotated sets of proteins from each of the genomes, in order to make meaningful comparisons. We have developed a dedicated, publicly available......, splice site prediction program called NetAspGene, for the genus Aspergillus. Gene sequences from Aspergillus fumigatus, the most common mould pathogen, were used to build and test our model. Compared to many animals and plants, Aspergillus contains smaller introns; thus we have applied a larger window...... better splice site prediction than other available tools. NetAspGene will be very helpful for the study in Aspergillus splice sites and especially in alternative splicing. A webpage for NetAspGene is publicly available at http://www.cbs.dtu.dk/services/NetAspGene....

  20. Genomic-based-breeding tools for tropical maize improvement.

    Science.gov (United States)

    Chakradhar, Thammineni; Hindu, Vemuri; Reddy, Palakolanu Sudhakar

    2017-12-01

    Maize has traditionally been the main staple diet in Southern Asia and Sub-Saharan Africa and is widely grown by millions of resource-poor small-scale farmers. Approximately 35.4 million hectares are sown to tropical maize, constituting around 59% of the developing worlds. Tropical maize encounters tremendous challenges besides poor agro-climatic situations, with average yields recorded at <3 tonnes/hectare, far less than the average of developed countries. In contrast to these poor yields, the demand for maize as food, feed, and fuel is continuously increasing in these regions. Heterosis breeding introduced in the early 90s improved maize yields significantly, but genetic gains are still a mirage, particularly for crops growing under marginal environments. Application of molecular markers has accelerated the pace of maize breeding to some extent. The availability of an array of sequencing and genotyping technologies offers unrivalled service to improve precision in maize-breeding programs through modern approaches such as genomic selection, genome-wide association studies, bulk segregant analysis-based sequencing approaches, etc. Superior alleles underlying complex traits can easily be identified and introgressed efficiently using these sequence-based approaches. Integration of genomic tools and techniques with advanced genetic resources such as nested association mapping and backcross nested association mapping could certainly address the genetic issues in maize improvement programs in developing countries. The huge diversity in tropical maize and its inherent capacity for doubled haploid technology offer an advantage in applying next-generation genomic tools for accelerating production in marginal environments of the tropical and subtropical world. Precision in phenotyping is the key to the success of any molecular-breeding approach. This article reviews genomic technologies and their application to improve agronomic traits in tropical maize breeding has been reviewed in

  1. Conserved genomic organisation of Group B Sox genes in insects.

    Directory of Open Access Journals (Sweden)

    Woerfel Gertrud

    2005-05-01

    Full Text Available Abstract Background Sox domain containing genes are important metazoan transcriptional regulators implicated in a wide range of developmental processes. The vertebrate B subgroup contains the Sox1, Sox2 and Sox3 genes that have early functions in neural development. Previous studies show that Drosophila Group B genes have been functionally conserved since they play essential roles in early neural specification and mutations in the Drosophila Dichaete and SoxN genes can be rescued with mammalian Sox genes. Despite their importance, the extent and organisation of the Group B family in Drosophila has not been fully characterised, an important step in using Drosophila to examine conserved aspects of Group B Sox gene function. Results We have used directed cDNA sequencing along with the output from the publicly available genome sequencing projects to examine the structure of Group B Sox domain genes in Drosophila melanogaster, Drosophila pseudoobscura, Anopheles gambiae and Apis mellifera. All of the insect genomes contain four genes encoding Group B proteins, two of which are intronless, as is the case with vertebrate group B genes. As has been previously reported and unusually for Group B genes, two of the insect group B genes, Sox21a and Sox21b, contain introns within their DNA-binding domains. We find that the highly unusual multi-exon structure of the Sox21b gene is common to the insects. In addition, we find that three of the group B Sox genes are organised in a linked cluster in the insect genomes. By in situ hybridisation we show that the pattern of expression of each of the four group B genes during embryogenesis is conserved between D. melanogaster and D. pseudoobscura. Conclusion The DNA-binding domain sequences and genomic organisation of the group B genes have been conserved over 300 My of evolution since the last common ancestor of the Hymenoptera and the Diptera. Our analysis suggests insects have two Group B1 genes, SoxN and

  2. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2007-11-01

    Full Text Available Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. Results New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile

  3. Evolution of genes and genomes on the Drosophila phylogeny

    DEFF Research Database (Denmark)

    Clark, Andrew G; Eisen, Michael B; Smith, Douglas R

    2007-01-01

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the ...... tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila...

  4. MVisAGe Identifies Concordant and Discordant Genomic Alterations of Driver Genes in Squamous Tumors.

    Science.gov (United States)

    Walter, Vonn; Du, Ying; Danilova, Ludmila; Hayward, Michele C; Hayes, D Neil

    2018-06-15

    Integrated analyses of multiple genomic datatypes are now common in cancer profiling studies. Such data present opportunities for numerous computational experiments, yet analytic pipelines are limited. Tools such as the cBioPortal and Regulome Explorer, although useful, are not easy to access programmatically or to implement locally. Here, we introduce the MVisAGe R package, which allows users to quantify gene-level associations between two genomic datatypes to investigate the effect of genomic alterations (e.g., DNA copy number changes) on gene expression. Visualizing Pearson/Spearman correlation coefficients according to the genomic positions of the underlying genes provides a powerful yet novel tool for conducting exploratory analyses. We demonstrate its utility by analyzing three publicly available cancer datasets. Our approach highlights canonical oncogenes in chr11q13 that displayed the strongest associations between expression and copy number, including CCND1 and CTTN, genes not identified by copy number analysis in the primary reports. We demonstrate highly concordant usage of shared oncogenes on chr3q, yet strikingly diverse oncogene usage on chr11q as a function of HPV infection status. Regions of chr19 that display remarkable associations between methylation and gene expression were identified, as were previously unreported miRNA-gene expression associations that may contribute to the epithelial-to-mesenchymal transition. Significance: This study presents an important bioinformatics tool that will enable integrated analyses of multiple genomic datatypes. Cancer Res; 78(12); 3375-85. ©2018 AACR.
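
    The gene-level association described in this abstract can be illustrated with a minimal Python sketch. It is not the MVisAGe R package or its API; the function names, the dictionary layout (gene name mapped to per-sample values), and the gene labels are assumptions made for illustration only. It computes one Pearson coefficient per gene between copy number and expression across the same samples.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / sqrt(sxx * syy)


def gene_level_correlations(copy_number, expression):
    """Correlate the two datatypes gene by gene.

    Both arguments map a gene name to a list of per-sample values,
    with samples in the same order (hypothetical layout).
    """
    shared = copy_number.keys() & expression.keys()
    return {g: pearson(copy_number[g], expression[g]) for g in sorted(shared)}


# Hypothetical toy data: three samples, two genes.
cn = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [2.0, 2.0, 4.0]}
ex = {"GENE_A": [2.0, 4.0, 6.0], "GENE_B": [5.0, 3.0, 1.0]}
correlations = gene_level_correlations(cn, ex)
```

    Ordering the resulting coefficients by the genomic position of each gene would give the kind of positional view the abstract describes; that step needs gene coordinates, which are omitted here.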

  5. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    Energy Technology Data Exchange (ETDEWEB)

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxon. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  6. Genomic Tools in Pea Breeding Programs: Status and Perspectives

    Science.gov (United States)

    Tayeh, Nadim; Aubert, Grégoire; Pilet-Nayel, Marie-Laure; Lejeune-Hénaut, Isabelle; Warkentin, Thomas D.; Burstin, Judith

    2015-01-01

    Pea (Pisum sativum L.) is an annual cool-season legume and one of the oldest domesticated crops. Dry pea seeds contain 22–25% protein, complex starch and fiber constituents, and a rich array of vitamins, minerals, and phytochemicals which make them a valuable source for human consumption and livestock feed. Dry pea ranks third to common bean and chickpea as the most widely grown pulse in the world with more than 11 million tons produced in 2013. Pea breeding has achieved great success since the time of Mendel's experiments in the mid-1800s. However, several traits still require significant improvement for better yield stability in a larger growing area. Key breeding objectives in pea include improving biotic and abiotic stress resistance and enhancing yield components and seed quality. Taking advantage of the diversity present in the pea genepool, many mapping populations have been constructed in the last decades and efforts have been deployed to identify loci involved in the control of target traits and further introgress them into elite breeding materials. Pea now benefits from next-generation sequencing and high-throughput genotyping technologies that are paving the way for genome-wide association studies and genomic selection approaches. This review covers the significant development and deployment of genomic tools for pea breeding in recent years. Future prospects are discussed especially in light of current progress toward deciphering the pea genome. PMID:26640470

  7. Genomic tools in pea breeding programs: status and perspectives

    Directory of Open Access Journals (Sweden)

    Nadim Tayeh

    2015-11-01

    Full Text Available Pea (Pisum sativum L.) is an annual cool-season legume and one of the oldest domesticated crops. Dry pea seeds contain 22-25 percent protein, complex starch and fibre constituents and a rich array of vitamins, minerals, and phytochemicals which make them a valuable source for human consumption and livestock feed. Dry pea ranks third to common bean and chickpea as the most widely grown pulse in the world with more than 11 million tonnes produced in 2013. Pea breeding has achieved great success since the time of Mendel’s experiments in the mid-1800s. However, several traits still require significant improvement for better yield stability in a larger growing area. Key breeding objectives in pea include improving biotic and abiotic stress resistance and enhancing yield components and seed quality. Taking advantage of the diversity present in the pea genepool, many mapping populations have been constructed in the last decades and efforts have been deployed to identify loci involved in the control of target traits and further introgress them into elite breeding materials. Pea now benefits from next-generation sequencing and high-throughput genotyping technologies that are paving the way for genome-wide association studies and genomic selection approaches. This review covers the significant development and deployment of genomic tools for pea breeding in recent years. Future prospects are discussed especially in light of current progress towards deciphering the pea genome.

  8. Expression of a transferred nuclear gene in a mitochondrial genome

    Directory of Open Access Journals (Sweden)

    Yichun Qiu

    2014-08-01

    Full Text Available Transfer of mitochondrial genes to the nucleus, and subsequent gain of regulatory elements for expression, is an ongoing evolutionary process in plants. Many examples have been characterized, which in some cases have revealed sources of mitochondrial targeting sequences and cis-regulatory elements. In contrast, there have been no reports of a nuclear gene that has undergone intracellular transfer to the mitochondrial genome and become expressed. Here we show that the orf164 gene in the mitochondrial genome of several Brassicaceae species, including Arabidopsis, is derived from the nuclear ARF17 gene that codes for an auxin responsive protein and is present across flowering plants. Orf164 corresponds to a portion of ARF17, and the nucleotide and amino acid sequences are 79% and 81% identical, respectively. Orf164 is transcribed in several organ types of Arabidopsis thaliana, as detected by RT-PCR. In addition, orf164 is transcribed in five other Brassicaceae within the tribes Camelineae, Erysimeae and Cardamineae, but the gene is not present in Brassica or Raphanus. This study shows that nuclear genes can be transferred to the mitochondrial genome and become expressed, providing a new perspective on the movement of genes between the genomes of subcellular compartments.

  9. Genome engineering using a synthetic gene circuit in Bacillus subtilis.

    Science.gov (United States)

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-03-31

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac-chloramphenicol resistance genes (cat) were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression; LacI then repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
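
    The selection logic of this circuit can be followed more easily as a small Boolean model. The Python sketch below is an illustrative abstraction only, not the authors' method: the function and parameter names are hypothetical, expression is treated as all-or-nothing, and leaky expression and kinetics are ignored.

```python
def is_chloramphenicol_resistant(xylose, has_pxyl_laci, has_helper_plasmid=True):
    """Boolean abstraction of the counter-selection circuit (assumed, simplified).

    xylose             -- xylose is present in the medium
    has_pxyl_laci      -- the Pxyl-lacI cassette is still integrated in the genome
    has_helper_plasmid -- the plasmid carrying xylR and Pspac-cat is present
    """
    # XylR (encoded on the helper plasmid) represses Pxyl unless xylose relieves it.
    xylr_active = has_helper_plasmid and not xylose
    # LacI is produced only if the cassette is present and Pxyl is not repressed.
    laci_expressed = has_pxyl_laci and not xylr_active
    # cat is driven by Pspac on the helper plasmid; LacI represses Pspac.
    cat_expressed = has_helper_plasmid and not laci_expressed
    return cat_expressed


# With xylose, cells that still carry Pxyl-lacI are sensitive ...
sensitive_case = is_chloramphenicol_resistant(xylose=True, has_pxyl_laci=True)
# ... so only cells that have recombined the cassette out survive selection.
survivor_case = is_chloramphenicol_resistant(xylose=True, has_pxyl_laci=False)
```

    Under these assumptions, plating on xylose plus chloramphenicol selects for loss of Pxyl-lacI, which is the counter-selection step the abstract describes.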

  10. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated...

  11. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives

  12. Genovar: a detection and visualization tool for genomic variants.

    Science.gov (United States)

    Jung, Kwang Su; Moon, Sanghoon; Kim, Young Jin; Kim, Bong-Jo; Park, Kiejung

    2012-05-08

    Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results due to limitations such as low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools support this function and the simultaneous visualization of comparative genomic hybridization arrays (aCGH) and sequence alignment. The purpose of the present study was to develop a useful program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals. A JAVA-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar is capable of visualizing genomic data from sources such as the aCGH data file and sequence alignment format files. Genovar is freely accessible and provides a user-friendly graphic user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help in the elimination of spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results. http://genovar.sourceforge.net/.
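
    Comparing a detected CNV region against previously reported regions is, at its core, an interval-overlap test. The Python sketch below illustrates one common way to do it; it is not Genovar's code, and the `Region` type, the function names, and the 50% overlap threshold are assumptions chosen for illustration, since the abstract does not state the criterion the program applies.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlap_fraction(query, known):
    """Fraction of the query region covered by a known region."""
    if query.chrom != known.chrom:
        return 0.0
    shared = min(query.end, known.end) - max(query.start, known.start)
    length = query.end - query.start
    return max(shared, 0) / length if length > 0 else 0.0


def is_novel(query, known_regions, min_fraction=0.5):
    """Treat a CNV as novel if no known region covers min_fraction of it.

    The threshold is an assumed, adjustable cutoff.
    """
    return all(overlap_fraction(query, k) < min_fraction for k in known_regions)


# Hypothetical coordinates standing in for database entries.
known = [Region("chr1", 150, 300), Region("chr2", 1000, 2000)]
reported = is_novel(Region("chr1", 100, 200), known)   # half covered
unreported = is_novel(Region("chr3", 100, 200), known)  # no overlap at all
```

    A linear scan is enough for a sketch; against a full database, an interval tree or a sorted sweep would avoid comparing every query with every known region.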

  13. The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes.

    Directory of Open Access Journals (Sweden)

    Marion Ouedraogo

    Full Text Available BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Gene Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Gene Database can be freely accessed through the DGD website at http://dgd.genouest.org.

  14. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    OpenAIRE

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Deng, Zixin; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’....
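
    The merging step attributed to ArrayOme, joining adjacent genes classified as 'present' into 'inferred contigs', can be sketched in a few lines of Python. This is an illustration of the idea rather than ArrayOme's implementation; the function name and the input layout (gene identifiers in genome order, each with a present/absent call) are assumptions.

```python
from itertools import groupby


def inferred_contigs(calls):
    """Merge runs of adjacent 'present' genes into inferred contigs.

    calls -- list of (gene_id, is_present) tuples in genome order
             (hypothetical input layout).
    Returns a list of contigs, each a list of gene ids.
    """
    contigs = []
    # groupby yields maximal runs of consecutive genes sharing the same call.
    for present, run in groupby(calls, key=lambda call: call[1]):
        if present:
            contigs.append([gene_id for gene_id, _ in run])
    return contigs


# Hypothetical microarray calls for five genes in genome order.
calls = [("g1", True), ("g2", True), ("g3", False), ("g4", True), ("g5", False)]
contigs = inferred_contigs(calls)
```

    Each absent gene breaks a run, so the example yields two contigs: one spanning g1 and g2, and one containing only g4.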

  15. The genomic structure of the DMBT1 gene

    DEFF Research Database (Denmark)

    Mollenhauer, J; Holmskov, U; Wiemann, S

    1999-01-01

    Increasing evidence has accumulated for an involvement of the inactivation of tumour suppressor genes at chromosome 10q in the carcinogenesis of brain tumours, melanomas, and carcinomas of the lung, the prostate, the pancreas, and the endometrium. The gene DMBT1 (Deleted in Malignant Brain Tumours...... 1) is located at chromosome 10q25.3-q26.1, within one of the putative intervals for tumour suppressor genes. DMBT1 is a member of the scavenger-receptor cysteine-rich (SRCR) superfamily and displays homozygous deletions or lack of expression in glioblastoma multiforme, medulloblastoma......, and in gastrointestinal and lung cancers. Based on these properties, DMBT1 has been proposed to be a candidate tumour suppressor gene. We have determined the genomic sequence of DMBT1 to allow analyses of mutations. The gene has at least 54 exons that span a genomic region of about 80 kb. We have identified a putative...

  16. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...... in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....

  17. iPat: intelligent prediction and association tool for genomic research.

    Science.gov (United States)

    Chen, Chunpeng James; Zhang, Zhiwu

    2018-06-01

    The ultimate goal of genomic research is to effectively predict phenotypes from genotypes so that medical management can improve human health and molecular breeding can increase agricultural production. Genomic prediction or selection (GS) plays a complementary role to genome-wide association studies (GWAS), which is the primary method to identify genes underlying phenotypes. Unfortunately, most computing tools cannot perform data analyses for both GWAS and GS. Furthermore, the majority of these tools are executed through a command-line interface (CLI), which requires programming skills. Non-programmers struggle to use them efficiently because of the steep learning curves and zero tolerance for data formats and mistakes when inputting keywords and parameters. To address these problems, this study developed a software package, named the Intelligent Prediction and Association Tool (iPat), with a user-friendly graphical user interface. With iPat, GWAS or GS can be performed using a pointing device to simply drag and/or click on graphical elements to specify input data files, choose input parameters and select analytical models. Models available to users include those implemented in third party CLI packages such as GAPIT, PLINK, FarmCPU, BLINK, rrBLUP and BGLR. Users can choose any data format and conduct analyses with any of these packages. File conversions are automatically conducted for specified input data and selected packages. A GWAS-assisted genomic prediction method was implemented to perform genomic prediction using any GWAS method such as FarmCPU. iPat was written in Java for adaptation to multiple operating systems including Windows, Mac and Linux. The iPat executable file, user manual, tutorials and example datasets are freely available at http://zzlab.net/iPat. zhiwu.zhang@wsu.edu.

  18. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    Science.gov (United States)

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.

  19. BFAST: an alignment tool for large scale genome resequencing.

    Directory of Open Access Journals (Sweden)

    Nils Homer

    2009-11-01

    Full Text Available The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole-genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
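
    The final Smith-Waterman step mentioned above can be illustrated with a minimal sketch. This is not BFAST's implementation: the function name, the scoring values (match, mismatch, gap) and the linear gap penalty are assumptions chosen only for illustration, and the index-based candidate lookup is omitted.

```python
def smith_waterman_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between read and ref.

    Illustrative scoring only (hypothetical values, linear gap penalty);
    the abstract does not state the parameters BFAST uses.
    """
    cols = len(ref) + 1
    prev = [0] * cols          # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            # Diagonal move: align read[i-1] with ref[j-1].
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Vertical/horizontal moves open a gap; 0 restarts the local alignment.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A read that matches exactly inside a longer reference window.
    print(smith_waterman_score("ACGT", "TTACGTTT"))
    # The same read against a window carrying a one-base insertion.
    print(smith_waterman_score("ACGT", "ACGGT"))
```

    Allowing gap moves in the recurrence is what lets a read still score well across a small insertion or deletion, which is the property the abstract highlights.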

  20. Investigation of mutations in the HBB gene using the 1,000 genomes database.

    Science.gov (United States)

    Carlice-Dos-Reis, Tânia; Viana, Jaime; Moreira, Fabiano Cordeiro; Cardoso, Greice de Lemos; Guerreiro, João; Santos, Sidney; Ribeiro-Dos-Santos, Ândrea

    2017-01-01

    Mutations in the HBB gene are responsible for several serious hemoglobinopathies, such as sickle cell anemia and β-thalassemia. Sickle cell anemia is one of the most common monogenic diseases worldwide. Due to its prevalence, diverse strategies have been developed for a better understanding of its molecular mechanisms. In silico analysis has been increasingly used to investigate the genotype-phenotype relationship of many diseases, and the sequences of healthy individuals deposited in the 1,000 Genomes database appear to be an excellent tool for such analysis. The objective of this study is to analyze the variations in the HBB gene in the 1,000 Genomes database, to describe the mutation frequencies in the different population groups, and to investigate the pattern of pathogenicity. The computational tool SNPEFF was used to align the data from 2,504 samples of the 1,000 Genomes database with the HG19 genome reference. The pathogenicity of each amino acid change was investigated using the databases CLINVAR, dbSNP and HbVar and five different predictors. Twenty different mutations were found in 209 healthy individuals. The African group had the highest number of individuals with mutations, and the European group had the lowest number. Thus, it is concluded that approximately 8.3% of phenotypically healthy individuals from the 1,000 Genomes database have some mutation in the HBB gene. The frequency of mutated genes was estimated at 0.042, so that the expected frequency of being homozygous or compound heterozygous for these variants in the next generation is approximately 0.002. In total, 193 subjects had a non-synonymous mutation, of whom 186 (7.4%) had a deleterious mutation. Considering that the 1,000 Genomes database is representative of the world's population, it can be estimated that fourteen out of every 10,000 individuals in the world will have a hemoglobinopathy in the next generation.
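
    The frequencies quoted above can be reproduced with a short calculation. The sketch below rests on two assumptions that the abstract does not state explicitly: each carrier is counted as holding one variant allele, and Hardy-Weinberg proportions apply to the next generation. The function names are hypothetical.

```python
SAMPLES = 2504               # individuals analysed (from the abstract)
CARRIERS_ANY = 209           # individuals with any HBB mutation
CARRIERS_DELETERIOUS = 186   # individuals with a deleterious mutation


def carrier_fraction(carriers, samples=SAMPLES):
    """Share of individuals carrying a variant."""
    return carriers / samples


def allele_frequency(carriers, samples=SAMPLES):
    """Assumed: one variant allele per carrier, two alleles per individual."""
    return carriers / (2 * samples)


def expected_biallelic(q):
    """Hardy-Weinberg expectation of carrying two variant alleles: q squared."""
    return q ** 2


if __name__ == "__main__":
    q_any = allele_frequency(CARRIERS_ANY)
    q_del = allele_frequency(CARRIERS_DELETERIOUS)
    print(f"carriers, any mutation: {carrier_fraction(CARRIERS_ANY):.1%}")
    print(f"variant allele frequency: {q_any:.3f}")
    print(f"expected two-variant genotype: {expected_biallelic(q_any):.3f}")
    print(f"carriers, deleterious: {carrier_fraction(CARRIERS_DELETERIOUS):.1%}")
    print(f"per 10,000 (deleterious): {expected_biallelic(q_del) * 10000:.0f}")
```

    Under these assumptions the rounded results agree with the abstract's 8.3%, 0.042, 0.002, 7.4% and fourteen per 10,000.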

  1. PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix.

    Science.gov (United States)

    Ambrosini, Giovanna; Groux, Romain; Bucher, Philipp

    2018-03-05

    Transcription factors (TFs) regulate gene expression by binding to specific short DNA sequences of 5 to 20 bp, thereby controlling the rate of transcription of genetic information from DNA to messenger RNA. We present PWMScan, a fast web-based tool to scan server-resident genomes for matches to a user-supplied PWM or TF binding site model from a public database. The web server and source code are available at http://ccg.vital-it.ch/pwmscan and https://sourceforge.net/projects/pwmscan, respectively. Contact: giovanna.ambrosini@epfl.ch. Supplementary data are available at Bioinformatics online.
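
    The general idea of scanning a sequence with a position weight matrix can be sketched as follows. This is a generic illustration, not PWMScan's algorithm or code: the function names, the pseudocount, the uniform background and the threshold are assumptions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts mapping each base to its count at that position.
    Pseudocount and uniform background are illustrative assumptions.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Slide the matrix along the sequence and report windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][ch] for i, ch in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Toy count matrix with consensus TATA (hypothetical data).
    consensus = "TATA"
    counts = [{b: (10 if b == c else 0) for b in BASES} for c in consensus]
    pwm = log_odds_matrix(counts)
    for start, window, score in scan("GGTATAGG", pwm, threshold=5.0):
        print(start, window, round(score, 2))
```

    A whole-genome tool would also scan the reverse complement and use a much faster matching strategy; the loop above only shows what a "match to a PWM" means.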

  2. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    Science.gov (United States)

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813

  3. Sampling the genomic pool of protein tyrosine kinase genes using the polymerase chain reaction with genomic DNA.

    Science.gov (United States)

    Oates, A C; Wollberg, P; Achen, M G; Wilks, A F

    1998-08-28

    The polymerase chain reaction (PCR), with cDNA as template, has been widely used to identify members of protein families from many species. A major limitation of using cDNA in PCR is that detection of a family member is dependent on temporal and spatial patterns of gene expression. To circumvent this restriction, and in order to develop a technique that is broadly applicable we have tested the use of genomic DNA as PCR template to identify members of protein families in an expression-independent manner. This test involved amplification of DNA encoding protein tyrosine kinase (PTK) genes from the genomes of three animal species that are well known development models; namely, the mouse Mus musculus, the fruit fly Drosophila melanogaster, and the nematode worm Caenorhabditis elegans. Ten PTK genes were identified from the mouse, 13 from the fruit fly, and 13 from the nematode worm. Among these kinases were 13 members of the PTK family that had not been reported previously. Selected PTKs from this screen were shown to be expressed during development, demonstrating that the amplified fragments did not arise from pseudogenes. This approach will be useful for the identification of many novel members of gene families in organisms of agricultural, medical, developmental and evolutionary significance and for analysis of gene families from any species, or biological sample whose habitat precludes the isolation of mRNA. Furthermore, as a tool to hasten the discovery of members of gene families that are of particular interest, this method offers an opportunity to sample the genome for new members irrespective of their expression pattern.

  4. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. 
    Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression

  5. Zebrafish Expression Ontology of Gene Sets (ZEOGS): A Tool to Analyze Enrichment of Zebrafish Anatomical Terms in Large Gene Sets

    Science.gov (United States)

    Marsico, Annalisa

    2013-01-01

    Abstract The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene

  6. Sugar Lego: gene composition of bacterial carbohydrate metabolism genomic loci.

    Science.gov (United States)

    Kaznadzey, Anna; Shelyakin, Pavel; Gelfand, Mikhail S

    2017-11-25

    Bacterial carbohydrate metabolism is extremely diverse, since carbohydrates serve as a major energy source and are involved in a variety of cellular processes. Bacterial genes belonging to the same metabolic pathway are often co-localized in the chromosome, but this is not a strict rule. Gene co-localization is linked to co-evolution and co-regulation. This study focuses on a large-scale analysis of bacterial genomic loci related to carbohydrate metabolism. We demonstrate that only 53% of 148,000 studied genes from over six hundred bacterial genomes are co-localized in bacterial genomes with other carbohydrate metabolism genes, which points to a significant role of singleton genes. Co-localized genes form cassettes, ranging in size from two to fifteen genes. Two major factors influencing the cassette-forming tendency are gene function and bacterial phylogeny. We have obtained a comprehensive picture of co-localization preferences of genes for nineteen major carbohydrate metabolism functional classes, over two hundred gene orthologous clusters, and thirty bacterial classes, and characterized the cassette variety in size and content among different species, highlighting a significant role of short cassettes. The preference towards co-localization of carbohydrate metabolism genes varies between 40 and 76% for bacterial taxa. Analysis of frequently co-localized genes yielded forty-five significant pairwise links between genes belonging to different functional classes. The number of such links per class ranges from zero to eight, demonstrating varying preferences of the respective genes towards a specific chromosomal neighborhood. Genes from eleven functional classes tend to co-localize with genes from the same class, indicating an important role of clustering of genes with similar functions. Moreover, in most cases such co-localization does not originate from local duplication events. Overall, we describe a complex web formed by evolutionary relationships of bacterial

  7. Gene organization inside replication domains in mammalian genomes

    Science.gov (United States)

    Zaghloul, Lamia; Baker, Antoine; Audit, Benjamin; Arneodo, Alain

    2012-11-01

    We investigate the large-scale organization of human genes with respect to "master" replication origins that were previously identified as bordering nucleotide compositional skew domains. We separate genes into two categories depending on their CpG enrichment at the promoter, which can be considered a marker of germline DNA methylation. Using expression data in mouse, we confirm that CpG-rich genes are highly expressed in the germline whereas CpG-poor genes are in a silent state. We further show that, whether tissue-specific or broadly expressed (housekeeping genes), the CpG-rich genes are over-represented close to the replication skew domain borders, suggesting some coordination of replication and transcription. We also reveal that the transcription of the longest CpG-rich genes is co-oriented with replication fork progression, so that the promoters of these transcriptionally active genes are located in the accessible open chromatin environment surrounding the master replication origins that border the replication skew domains. The observation of a similar gene organization in the mouse genome confirms the interplay of replication, transcription and chromatin structure as the cornerstone of mammalian genome architecture.

  8. Differential retention of metabolic genes following whole-genome duplication.

    Science.gov (United States)

    Gout, Jean-François; Duret, Laurent; Kahn, Daniel

    2009-05-01

    Classical studies in Metabolic Control Theory have shown that metabolic fluxes usually exhibit little sensitivity to changes in individual enzyme activity, yet remain sensitive to global changes of all enzymes in a pathway. Therefore, little selective pressure is expected on the dosage or expression of individual metabolic genes, yet entire pathways should still be constrained. However, a direct estimate of this selective pressure had not been evaluated. Whole-genome duplications (WGDs) offer a good opportunity to address this question by analyzing the fates of metabolic genes during the massive gene losses that follow. Here, we take advantage of the successive rounds of WGD that occurred in the Paramecium lineage. We show that metabolic genes exhibit different gene retention patterns than nonmetabolic genes. Contrary to what was expected for individual genes, metabolic genes appeared more retained than other genes after the recent WGD, which was best explained by selection for gene expression operating on entire pathways. Metabolic genes also tend to be less retained when present at high copy number before WGD, contrary to other genes that show a positive correlation between gene retention and preduplication copy number. This is rationalized on the basis of the classical concave relationship relating metabolic fluxes with enzyme expression.

  9. Detailed analysis of putative genes encoding small proteins in legume genomes

    Directory of Open Access Journals (Sweden)

    Gabriel Guillén

    2013-06-01

    Full Text Available Diverse plant genome sequencing projects coupled with powerful bioinformatics tools have facilitated massive data analysis to construct specialized databases classified according to cellular function. However, there are still a considerable number of genes encoding proteins whose function has not yet been characterized. Included in this category are small proteins (SPs, 30-150 amino acids) encoded by short open reading frames (sORFs). SPs play important roles in plant physiology, growth, and development. Unfortunately, protocols focused on the genome-wide identification and characterization of sORFs are scarce or remain poorly implemented. As a result, these genes are underrepresented in many genome annotations. In this work, we exploited publicly available genome sequences of Phaseolus vulgaris, Medicago truncatula, Glycine max and Lotus japonicus to analyze the abundance of annotated SPs in plant legumes. Our strategy to uncover bona fide sORFs at the genome level was centered on bioinformatics analysis of characteristics such as evidence of expression (transcription), presence of known protein regions or domains, and identification of orthologous genes in the genomes explored. We collected 6170, 10461, 30521, and 23599 putative sORFs from P. vulgaris, G. max, M. truncatula, and L. japonicus genomes, respectively. Expressed sequence tags (ESTs) available in the DFCI Gene Index database provided evidence that approximately one-third of the predicted legume sORFs are expressed. Most potential SPs have a counterpart in a different plant species and counterpart regions or domains in larger proteins. Potential functional sORFs were also classified according to a reduced set of GO categories, and the expression of 13 of them during P. vulgaris nodule ontogeny was confirmed by qPCR. This analysis provides a collection of sORFs that potentially encode meaningful SPs, and offers the possibility of their further functional evaluation.

  10. Genetic addiction: selfish gene's strategy for symbiosis in the genome.

    Science.gov (United States)

    Mochizuki, Atsushi; Yahara, Koji; Kobayashi, Ichizo; Iwasa, Yoh

    2006-02-01

    The evolution and maintenance of the phenomenon of postsegregational host killing or genetic addiction are paradoxical. In this phenomenon, a gene complex, once established in a genome, programs death of a host cell that has eliminated it. The intact form of the gene complex would survive in other members of the host population. It is controversial why these genetic elements are maintained: whether because of the lethal effects of host killing, or because some other properties are beneficial to the host. We analyzed their population dynamics by analytical methods and computer simulations. Genetic addiction turned out to be advantageous to the gene complex in the presence of a competitor genetic element. The advantage is, however, limited in a population without spatial structure, such as that in a well-mixed liquid culture. In contrast, in a structured habitat, such as the surface of a solid medium, the addiction gene complex can increase in frequency, irrespective of its initial density. Our demonstration that genomes can evolve through acquisition of addiction genes has implications for the general question of how a genome can evolve as a community of potentially selfish genes.

  11. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    Energy Technology Data Exchange (ETDEWEB)

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  12. University of Texas Southwestern Medical Center (UTSW): Functional Signature Ontology Tool: Triplicate Measurements of Reporter Gene Expression in Response to Individual Genetic and Chemical Perturbations in HCT116 Cells | Office of Cancer Genomics

    Science.gov (United States)

    The goal of this project is to use an eight-gene expression profile to define functional signatures for small molecules and natural products with heretofore undefined mechanisms of action. Two genes in the eight-gene set are used as internal controls and do not vary across gene expression array data collected from the public domain. The remaining six genes are found to vary independently across a large collection of publicly available gene expression array datasets.

  13. University of Texas Southwestern Medical Center: Functional Signature Ontology Tool: Triplicate Measurements of Reporter Gene Expression in Response to Individual Genetic and Chemical Perturbations in HCT116 Cells | Office of Cancer Genomics

    Science.gov (United States)

    The goal of this project is to use an eight-gene expression profile to define functional signatures for small molecules and natural products with heretofore undefined mechanisms of action. Two genes in the eight-gene set are used as internal controls and do not vary across gene expression array data collected from the public domain. The remaining six genes are found to vary independently across a large collection of publicly available gene expression array datasets.

  14. Identification of candidate new cancer susceptibility genes using yeast genomics

    International Nuclear Information System (INIS)

    Brown, M.; Brown, J.A.; Game, J.C.

    2003-01-01

    A large proportion of cancer susceptibility syndromes are the result of mutations in genes involved in DNA repair or in cell-cycle checkpoints in response to DNA damage, such as ataxia telangiectasia (AT), Fanconi's anemia (FA), Bloom's syndrome (BS), Nijmegen breakage syndrome (NBS), and xeroderma pigmentosum (XP). Mutations in these genes often cause gross chromosomal instability leading to an increased mutation rate of all genes, including those directly responsible for cancer. We have proposed that, because the orthologs of these genes in budding yeast, S. cerevisiae, confer protection against killing by DNA-damaging agents, it should be possible to identify new cancer susceptibility genes by identifying yeast genes whose deletion causes sensitivity to DNA damage. We therefore screened the recently completed collection of individual gene deletion mutants to identify genes that affect sensitivity to DNA-damaging agents. Screening for sensitivity in this [...] the fact that each deleted gene is replaced by a cassette containing two molecular 'barcodes', or 20-mers, that uniquely identify the strain when DNA from a pool of strains is hybridized to an oligonucleotide array containing the complementary sequences of the barcodes. We performed the screen with UV, IR, H2O2 and other DNA-damaging agents. In addition to identifying genes already known to confer resistance to DNA-damaging agents, we have identified, and individually confirmed, several genes not previously associated with resistance. Several of these are of unknown function. We have also examined the chromosomal stability of selected strains and found that IR-sensitive strains often but not always exhibit genomic instability. We are presently constructing a yeast artificial chromosome to globally interrogate all the genes in the deletion pool for their involvement in genomic stability. This work shows that budding yeast is a valuable eukaryotic model organism to identify

  15. Population genomics of the immune evasion (var) genes of Plasmodium falciparum.

    Directory of Open Access Journals (Sweden)

    Alyssa E Barry

    2007-03-01

    Full Text Available Var genes encode the major surface antigen (PfEMP1) of the blood stages of the human malaria parasite Plasmodium falciparum. Differential expression of up to 60 diverse var genes in each parasite genome underlies immune evasion. We compared the diversity of the DBLalpha domain of var genes sampled from 30 parasite isolates from a malaria endemic area of Papua New Guinea (PNG) and 59 from widespread geographic origins (global). Overall, we obtained over 8,000 quality-controlled DBLalpha sequences. Within our sampling frame, the global population had a total of 895 distinct DBLalpha "types" and negligible overlap among repertoires. This indicated that var gene diversity on a global scale is so immense that many genomes would need to be sequenced to capture its true extent. In contrast, we found a much lower diversity in PNG of 185 DBLalpha types, with an average of approximately 7% overlap among repertoires. While we identify marked geographic structuring, nearly 40% of types identified in PNG were also found in samples from different countries, showing a cosmopolitan distribution for much of the diversity. We also present evidence to suggest that recombination plays a key role in maintaining the unprecedented levels of polymorphism found in these immune evasion genes. This population genomic framework provides a cost-effective molecular epidemiological tool to rapidly explore the geographic diversity of var genes.

  16. Genome Binding and Gene Regulation by Stem Cell Transcription Factors

    NARCIS (Netherlands)

    J.H. Brandsma (Johan)

    2016-01-01

    Nearly all cells of an individual organism contain the same genome. However, each cell type transcribes a different set of genes due to the presence of different sets of cell type-specific transcription factors. Such transcription factors bind to regulatory regions such as promoters

  17. Genomic dissection and prioritizing of candidate genes of QTL for ...

    Indian Academy of Sciences (India)

    of Anatomy and Neurobiology, University of Tennessee Health Science Center, Memphis, TN 38163, USA. 5Mudanjiang ..... Fragile X mental retardation gene 1,. −2.1 ... stimulus/stress and signalling associated with acute-phase response were .... This work was supported by the Center of Genomics and Bioinformatics and ...

  18. Re-Examining the Gene in Personalized Genomics

    Science.gov (United States)

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  19. Genomic dissection and prioritizing of candidate genes of QTL for ...

    Indian Academy of Sciences (India)

    Genomic dissection and prioritizing of candidate genes of QTL for regulating spontaneous arthritis on chromosome 1 in mice deficient for interleukin-1 receptor antagonist. Yanhong Cao, Jifei Zhang, Yan Jiao, Jian Yan, Feng Jiao, XiaoYun Liu, Robert W. Williams, Karen A. Hasty, John M. Stuart and Weikuan Gu. J. Genet.

  20. Gene therapy as a potential tool for treating neuroblastoma-a focused review.

    Science.gov (United States)

    Kumar, M D; Dravid, A; Kumar, A; Sen, D

    2016-05-01

    Neuroblastoma, a solid tumor caused by rapid division of undifferentiated neuroblasts, is the most common childhood malignancy affecting children aged [...] genes is restored to normalcy. Gene therapy is a powerful tool with the potential to inhibit the deleterious effects of oncogenes by inserting corrected/normal genes into the genome. Both viral and non-viral vector-based gene therapies have been developed and adopted to deliver the target genes into neuroblastoma cells. These attempts have raised hope for a new regime of treatment against neuroblastoma. A few gene-therapy-based treatment strategies have been tested in limited clinical trials, yielding some positive results. This mini review is an attempt to provide an overview of the available options of gene therapy to treat neuroblastoma.

  1. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    Science.gov (United States)

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and

  2. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Full Text Available Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we identified the conserved essential gene set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

  3. Genomic Features That Predict Allelic Imbalance in Humans Suggest Patterns of Constraint on Gene Expression Variation

    Science.gov (United States)

    Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.

    2009-01-01

    Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. 
    These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary

  4. Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project

    Directory of Open Access Journals (Sweden)

    McDonagh Paul D

    2003-06-01

    Full Text Available Abstract Background Seattle Biomedical Research Institute (SBRI), as part of the Leishmania Genome Network (LGN), is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Results Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. Conclusion An improvised and powerful tool for gene prediction has been developed by importing data from widely-used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project, where there is a large proportion of novel genes requiring manual annotation.

  5. Transposons As Tools for Functional Genomics in Vertebrate Models.

    Science.gov (United States)

    Kawakami, Koichi; Largaespada, David A; Ivics, Zoltán

    2017-11-01

    Genetic tools and mutagenesis strategies based on transposable elements are currently under development with a vision to link primary DNA sequence information to gene functions in vertebrate models. By virtue of their inherent capacity to insert into DNA, transposons can be developed into powerful tools for chromosomal manipulations. Transposon-based forward mutagenesis screens have numerous advantages including high throughput, easy identification of mutated alleles, and providing insight into genetic networks and pathways based on phenotypes. For example, the Sleeping Beauty transposon has become highly instrumental to induce tumors in experimental animals in a tissue-specific manner with the aim of uncovering the genetic basis of diverse cancers. Here, we describe a battery of mutagenic cassettes that can be applied in conjunction with transposon vectors to mutagenize genes, and highlight versatile experimental strategies for the generation of engineered chromosomes for loss-of-function as well as gain-of-function mutagenesis for functional gene annotation in vertebrate models, including zebrafish, mice, and rats. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Development of Genomic and Genetic Tools for Foxtail Millet, and Use of These Tools in the Improvement of Biomass Production for Bioenergy Crops

    Energy Technology Data Exchange (ETDEWEB)

    Doust, Andrew, N.

    2011-11-11

    The overall aim of this research was to develop genomic and genetic tools in foxtail millet that will be useful in improving biomass production in bioenergy crops such as switchgrass, napier grass, and pearl millet. A variety of approaches have been implemented, and our lab has been primarily involved in genome analysis and quantitative genetic analysis. Our progress in these activities has been substantially helped by the genomic sequence of foxtail millet produced by the Joint Genome Institute (Bennetzen et al., in prep). In particular, the annotation and analysis of candidate genes for architecture, biomass production and flowering has led to new insights into the control of branching and flowering time, and has shown how closely related flowering time is to vegetative architectural development and biomass accumulation. The differences in genetic control identified at high and low density plantings have direct relevance to the breeding of bioenergy grasses that are tolerant of high planting densities. The developmental analyses have shown how plant architecture changes over time and may indicate which genes may best be manipulated at various times during development to obtain required biomass characteristics. This data contributes to the overall aim of significantly improving genetic and genomic tools in foxtail millet that can be directed to improvement of bioenergy grasses such as switchgrass, where it is important to maximize vegetative growth for greatest biomass production.

  7. HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies

    Science.gov (United States)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.

    2017-10-01

    PanDA (Production and Distributed Analysis Workload Management System) was developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of projects using PanDA beyond HEP and the Grid has drawn attention from other compute-intensive sciences such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed and made available to bioinformaticians worldwide. Analysis of genome sequencing data using the popular software pipeline PALEOMIX can take a month even when run on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run in a distributed computing environment powered by PanDA. To run the pipeline, we split the input files into chunks, which are run separately on different nodes as separate inputs for PALEOMIX, and finally merge the output files; this is very similar to what is done by ATLAS to process and simulate data. We dramatically decreased the total walltime thanks to the automation of job (re)submission and brokering within PanDA. Using software tools developed initially for HEP and the Grid can reduce payload execution time for mammoth DNA samples from weeks to days.
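
    The split, run and merge pattern described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not call PanDA or PALEOMIX, the function names (split_into_chunks, process_chunk, run_scatter_gather) are hypothetical, and a local thread pool stands in for jobs brokered to different nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Split a list of input records into consecutive chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Hypothetical stand-in for one pipeline job (NOT a real PALEOMIX call).

    Here it simply uppercases each read so the effect is easy to check.
    """
    return [read.upper() for read in chunk]


def run_scatter_gather(records, chunk_size=2, workers=4):
    """Run process_chunk on every chunk in parallel and merge the outputs."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so merging is a concatenation.
        partial_results = list(pool.map(process_chunk, chunks))
    return [item for part in partial_results for item in part]


reads = ["acgt", "ttga", "ccat", "gga", "tac"]
merged = run_scatter_gather(reads)
```

    Because the chunks are independent, a failed chunk could be resubmitted on its own without repeating the rest of the work, which is the property the abstract credits for the reduced walltime.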

  8. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  9. From genes to milk: genomic organization and epigenetic regulation of the mammary transcriptome.

    Science.gov (United States)

    Lemay, Danielle G; Pollard, Katherine S; Martin, William F; Freeman Zadrowski, Courtneay; Hernandez, Joseph; Korf, Ian; German, J Bruce; Rijnkels, Monique

    2013-01-01

    Even in genomes lacking operons, a gene's position in the genome influences its potential for expression. The mechanisms by which adjacent genes are co-expressed are still not completely understood. Using lactation and the mammary gland as a model system, we explore the hypothesis that chromatin state contributes to the co-regulation of gene neighborhoods. The mammary gland represents a unique evolutionary model, due to its recent appearance, in the context of vertebrate genomes. An understanding of how the mammary gland is regulated to produce milk is also of biomedical and agricultural importance for human lactation and dairying. Here, we integrate epigenomic and transcriptomic data to develop a comprehensive regulatory model. Neighborhoods of mammary-expressed genes were determined using expression data derived from pregnant and lactating mice and a neighborhood scoring tool, G-NEST. Regions of open and closed chromatin were identified by ChIP-Seq of histone modifications H3K36me3, H3K4me2, and H3K27me3 in the mouse mammary gland and liver tissue during lactation. We found that neighborhoods of genes in regions of uniquely active chromatin in the lactating mammary gland, compared with liver tissue, were extremely rare. Rather, genes in most neighborhoods were suppressed during lactation as reflected in their expression levels and their location in regions of silenced chromatin. Chromatin silencing was largely shared between the liver and mammary gland during lactation, and what distinguished the mammary gland was mainly a small tissue-specific repertoire of isolated, expressed genes. These findings suggest that an advantage of the neighborhood organization is in the collective repression of groups of genes via a shared mechanism of chromatin repression. 
    Genes essential to the mammary gland's uniqueness are isolated from neighbors, and likely have less tolerance for variation in expression, properties they share with genes responsible for an organism's survival.

  10. Soft rot erwiniae: from genes to genomes.

    Science.gov (United States)

    Toth, Ian K; Bell, Kenneth S; Holeva, Maria C; Birch, Paul R J

    2003-01-01

    SUMMARY The soft rot erwiniae, Erwinia carotovora ssp. atroseptica (Eca), E. carotovora ssp. carotovora (Ecc) and E. chrysanthemi (Ech) are major bacterial pathogens of potato and other crops world-wide. We currently understand much about how these bacteria attack plants and protect themselves against plant defences. However, the processes underlying the establishment of infection, differences in host range and their ability to survive when not causing disease, largely remain a mystery. This review will focus on our current knowledge of pathogenesis in these organisms and discuss how modern genomic approaches, including complete genome sequencing of Eca and Ech, may open the door to a new understanding of the potential subtlety and complexity of soft rot erwiniae and their interactions with plants. The soft rot erwiniae are members of the Enterobacteriaceae, along with other plant pathogens such as Erwinia amylovora and human pathogens such as Escherichia coli, Salmonella spp. and Yersinia spp. Although the genus name Erwinia is most often used to describe the group, an alternative genus name Pectobacterium was recently proposed for the soft rot species. Ech mainly affects crops and other plants in tropical and subtropical regions and has a wide host range that includes potato and the important model host African violet (Saintpaulia ionantha). Ecc affects crops and other plants in subtropical and temperate regions and has probably the widest host range, which also includes potato. Eca, on the other hand, has a host range limited almost exclusively to potato in temperate regions only. Disease symptoms: Soft rot erwiniae cause general tissue maceration, termed soft rot disease, through the production of plant cell wall degrading enzymes. Environmental factors such as temperature, low oxygen concentration and free water play an essential role in disease development. On potato, and possibly other plants, disease symptoms may differ, e.g. blackleg disease is associated

  11. Genome-wide identification of key modulators of gene-gene interaction networks in breast cancer.

    Science.gov (United States)

    Chiu, Yu-Chiao; Wang, Li-Ju; Hsiao, Tzu-Hung; Chuang, Eric Y; Chen, Yidong

    2017-10-03

    With the advances in high-throughput gene profiling technologies, a large volume of gene interaction maps has been constructed. A higher-level layer of gene-gene interaction, namely modulated gene interaction, is composed of gene pairs whose interaction strengths are modulated by (i.e., dependent on) the expression level of a key modulator gene. Systematic investigations into the modulation by estrogen receptor (ER), the best-known modulator gene, have revealed the functional and prognostic significance in breast cancer. However, a genome-wide identification of key modulator genes that may further unveil the landscape of modulated gene interaction is still lacking. We proposed a systematic workflow to screen for key modulators based on genome-wide gene expression profiles. We designed four modularity parameters to measure the ability of a putative modulator to perturb gene interaction networks. Applying the method to a dataset of 286 breast tumors, we comprehensively characterized the modularity parameters and identified a total of 973 key modulator genes. The modularity of these modulators was verified in three independent breast cancer datasets. ESR1, the encoding gene of ER, appeared in the list, and abundant novel modulators were illuminated. For instance, a prognostic predictor of breast cancer, SFRP1, was found to be the second-ranked modulator. Functional annotation analysis of the 973 modulators revealed involvements in ER-related cellular processes as well as immune- and tumor-associated functions. Here we present, as far as we know, the first comprehensive analysis of key modulator genes on a genome-wide scale. The validity of filtering parameters as well as the conservativity of modulators among cohorts were corroborated. Our data bring new insights into the modulated layer of gene-gene interaction and provide candidates for further biological investigations.

  12. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  13. Comparative genome analysis of PHB gene family reveals deep evolutionary origins and diverse gene function.

    Science.gov (United States)

    Di, Chao; Xu, Wenying; Su, Zhen; Yuan, Joshua S

    2010-10-07

    The PHB (Prohibitin) gene family is involved in a variety of functions important for different biological processes. PHB genes are ubiquitously present in divergent species from prokaryotes to eukaryotes. Human PHB genes have been found to be associated with various diseases. Recent studies by our group and others have shown diverse functions of PHB genes in plants for development, senescence, defence, and others. Despite the importance of the PHB gene family, no comprehensive gene family analysis has been carried out to evaluate the relatedness of PHB genes across different species. In order to better guide the gene function analysis and understand the evolution of the PHB gene family, we therefore carried out a comparative genome analysis of the PHB genes across different kingdoms. The relatedness, motif distribution, and intron/exon distribution all indicated that the PHB genes are a relatively conserved gene family. The PHB genes can be classified into 5 classes, and each class has a very deep evolutionary origin. The PHB genes within a class maintained the same motif patterns during evolution. With Arabidopsis as the model species, we found that PHB gene intron/exon structure and domains are also conserved during evolution. Despite being a conserved gene family, various gene duplication events led to the expansion of the PHB genes. Both segmental and tandem gene duplication were involved in Arabidopsis PHB gene family expansion. However, segmental duplication is predominant in Arabidopsis. Moreover, most of the duplicated genes experienced neofunctionalization. The results highlighted that PHB genes might be involved in important functions so that the duplicated genes are under evolutionary pressure to derive new functions. The PHB gene family is a conserved gene family and accounts for diverse but important biological functions based on similar molecular mechanisms. The highly diverse biological function indicated that more research needs to be carried out

  14. Single virus genomics: a new tool for virus discovery.

    Directory of Open Access Journals (Sweden)

    Lisa Zeigler Allen

    Full Text Available Whole genome amplification and sequencing of single microbial cells has significantly influenced genomics and microbial ecology by facilitating direct recovery of reference genome data. However, viral genomics continues to suffer due to difficulties related to the isolation and characterization of uncultivated viruses. We report here on a new approach called 'Single Virus Genomics', which enabled the isolation and complete genome sequencing of the first single virus particle. A mixed assemblage of two known viruses, E. coli bacteriophages lambda and T4, was sorted using flow cytometric methods and subsequently immobilized in an agarose matrix. Genome amplification was then achieved in situ via multiple displacement amplification (MDA). The complete lambda phage genome was recovered with an average depth of coverage of approximately 437X. The isolation and genome sequencing of uncultivated viruses using Single Virus Genomics approaches will enable researchers to address questions about viral diversity, evolution, adaptation and ecology that were previously unattainable.

  15. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

    Science.gov (United States)

    Hiscock, D; Upton, C

    2000-05-01

    The Viral Genome DataBase (VGDB) contains detailed information of the genes and predicted protein sequences from 15 completely sequenced genomes of large (>100 kb) viruses (2847 genes). The data that is stored includes DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a mySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .
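
    Three of the per-gene statistics listed above (A + T%, nucleotide frequency and dinucleotide frequency) are simple to compute from a DNA string. The sketch below is illustrative only and is not taken from the VGDB software; the function names are hypothetical, and it assumes dinucleotides are counted over overlapping windows, which the abstract does not specify.

```python
from collections import Counter


def at_percent(seq):
    """Percentage of A and T bases in a DNA sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    at_count = sum(1 for base in seq if base in "AT")
    return 100.0 * at_count / len(seq)


def nucleotide_frequency(seq):
    """Count of each single nucleotide."""
    return Counter(seq.upper())


def dinucleotide_frequency(seq):
    """Count of each dinucleotide, using overlapping windows (an assumption)."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


example = "ATGCGATA"
example_at = at_percent(example)
example_mono = nucleotide_frequency(example)
example_di = dinucleotide_frequency(example)
```

    A sequence of length n yields n - 1 overlapping dinucleotides, so the dinucleotide counts for the 8-base example sum to 7.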

  16. Gene Conversion in Angiosperm Genomes with an Emphasis on Genes Duplicated by Polyploidization

    Directory of Open Access Journals (Sweden)

    Xi-Yin Wang

    2011-01-01

    Full Text Available Angiosperm genomes differ from those of mammals by extensive and recursive polyploidizations. The resulting gene duplication provides opportunities both for genetic innovation, and for concerted evolution. Though most genes may escape conversion by their homologs, concerted evolution of duplicated genes can last for millions of years or longer after their origin. Indeed, paralogous genes on two rice chromosomes duplicated an estimated 60–70 million years ago have experienced gene conversion in the past 400,000 years. Gene conversion preserves similarity of paralogous genes, but appears to accelerate their divergence from orthologous genes in other species. The mutagenic nature of recombination coupled with the buffering effect provided by gene redundancy, may facilitate the evolution of novel alleles that confer functional innovations while insulating biological fitness of affected plants. A mixed evolutionary model, characterized by a primary birth-and-death process and occasional homoeologous recombination and gene conversion, may best explain the evolution of multigene families.

  17. Genome-wide associations of gene expression variation in humans.

    Directory of Open Access Journals (Sweden)

    Barbara E Stranger

    2005-12-01

    Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
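
    Two of the multiple test corrections named in this abstract can be illustrated with a short Python sketch. It assumes that "false discovery rate" refers to the Benjamini-Hochberg procedure, which the abstract does not state, and it omits the permutation approach. The p-values are invented for illustration and are not results from the study.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, e.g. one per SNP-expression association test.
pvals = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

    Bonferroni controls the chance of any false positive and is therefore the stricter of the two; the Benjamini-Hochberg values are never larger than the Bonferroni ones for the same input.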

  19. Confluence of genes, environment, development, and behavior in a post Genome-Wide Association Study world.

    Science.gov (United States)

    Vrieze, Scott I; Iacono, William G; McGue, Matt

    2012-11-01

    This article serves to outline a research paradigm to investigate main effects and interactions of genes, environment, and development on behavior and psychiatric illness. We provide a historical context for candidate gene studies and genome-wide association studies, including benefits, limitations, and expected payoffs. Using substance use and abuse as our driving example, we then turn to the importance of etiological psychological theory in guiding genetic, environmental, and developmental research, as well as the utility of refined phenotypic measures, such as endophenotypes, in the pursuit of etiological understanding and focused tests of genetic and environmental associations. Phenotypic measurement has received considerable attention in the history of psychology and is informed by psychometrics, whereas the environment remains relatively poorly measured and is often confounded with genetic effects (i.e., gene-environment correlation). Genetically informed designs, which are no longer limited to twin and adoption studies thanks to ever-cheaper genotyping, are required to understand environmental influences. Finally, we outline the vast amount of individual difference in structural genomic variation, most of which remains to be leveraged in genetic association tests. Although the genetic data can be massive and burdensome (tens of millions of variants per person), we argue that improved understanding of genomic structure and function will provide investigators with new tools to test specific a priori hypotheses derived from etiological psychological theory, much like current candidate gene research but with less confusion and more payoff than candidate gene research has to date.

  20. Development of Genome Engineering Tools from Plant-Specific PPR Proteins Using Animal Cultured Cells.

    Science.gov (United States)

    Kobayashi, Takehito; Yagi, Yusuke; Nakamura, Takahiro

    2016-01-01

    The pentatricopeptide repeat (PPR) motif is a sequence-specific RNA/DNA-binding module. Elucidation of the RNA/DNA recognition mechanism has enabled engineering of PPR motifs as new RNA/DNA manipulation tools in living cells, including for genome editing. However, the biochemical characteristics of PPR proteins remain unknown, mostly due to the instability and/or unfolding propensities of PPR proteins in heterologous expression systems such as bacteria and yeast. To overcome this issue, we constructed reporter systems using animal cultured cells. The cell-based system has highly attractive features for PPR engineering: robust eukaryotic gene expression; availability of various vectors, reagents, and antibodies; highly efficient DNA delivery ratio (>80%); and rapid, high-throughput data production. In this chapter, we introduce an example of such reporter systems: a PPR-based sequence-specific translational activation system. The cell-based reporter system can be applied to characterize plant genes of interest and to PPR engineering.

  1. Sequencing Single Cell Microbial Genomes with Microfluidic Amplification Tools (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Quake, Steve

    2011-10-12

    Stanford University's Steve Quake on "Sequencing Single Cell Microbial Genomes with Microfluidic Amplification Tools" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  2. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    Science.gov (United States)

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  3. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies.

    Science.gov (United States)

    Sulovari, Arvis; Li, Dawei

    2014-07-19

    Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly-used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep

  4. Evolutionary maintenance of filovirus-like genes in bat genomes

    Directory of Open Access Journals (Sweden)

    Taylor Derek J

    2011-11-01

    Full Text Available Abstract Background Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13.4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appears to be, or have been, functionally maintained

  5. Integrated genomic and gene expression profiling identifies two major genomic circuits in urothelial carcinoma.

    Directory of Open Access Journals (Sweden)

    David Lindgren

    Full Text Available Similar to other malignancies, urothelial carcinoma (UC) is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21) and BCL2L1 (20q11). We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E2F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common for advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and that these circuits may reflect distinct pathways of tumor development.

  6. The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

    Directory of Open Access Journals (Sweden)

    Adam Alexander Thil Smith

    2012-05-01

    Full Text Available Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.

  7. Gene Discovery through Genomic Sequencing of Brucella abortus

    Science.gov (United States)

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10⁻⁵) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  8. Degrees of separation as a statistical tool for evaluating candidate genes.

    Science.gov (United States)

    Nelson, Ronald M; Pettersson, Mats E

    2014-12-01

    Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, it can cause misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes to currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available. Copyright © 2014 Elsevier Ltd. All rights reserved.
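
    The degree of separation (DoS) idea in the abstract above can be sketched as a shortest-path computation on a gene network: the DoS of a candidate pair is compared with the average DoS over all connected pairs. The code below is a minimal sketch under that reading; it is not the CandidateBacon API, and the function names and the toy network are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Optional, Set


def degrees_of_separation(network: Dict[str, Set[str]], source: str, target: str) -> Optional[int]:
    """Shortest path length (in edges) between two genes, or None if they are unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    # Breadth-first search visits genes in order of increasing distance from the source.
    while queue:
        gene, dist = queue.popleft()
        for neighbour in network.get(gene, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_dos(network: Dict[str, Set[str]]) -> Optional[float]:
    """Mean DoS over all connected gene pairs in the network."""
    distances = []
    for a, b in combinations(sorted(network), 2):
        d = degrees_of_separation(network, a, b)
        if d is not None:
            distances.append(d)
    return sum(distances) / len(distances) if distances else None


if __name__ == "__main__":
    # Hypothetical undirected network: a simple chain geneA - geneB - geneC - geneD.
    toy_network = {
        "geneA": {"geneB"},
        "geneB": {"geneA", "geneC"},
        "geneC": {"geneB", "geneD"},
        "geneD": {"geneC"},
    }
    print("Candidate pair DoS:", degrees_of_separation(toy_network, "geneA", "geneB"))
    print("Network average DoS:", average_dos(toy_network))
```

    A candidate pair whose DoS is no smaller than the network average gains little support from the network alone, which is the kind of artefact the abstract warns about.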

  9. Genomic Tools in Cowpea Breeding Programs: Status and Perspectives

    Science.gov (United States)

    Boukar, Ousmane; Fatokun, Christian A.; Huynh, Bao-Lam; Roberts, Philip A.; Close, Timothy J.

    2016-01-01

    Cowpea is one of the most important grain legumes in sub-Saharan Africa (SSA). It provides strong support to the livelihood of small-scale farmers through its contributions to their nutritional security, income generation and soil fertility enhancement. Worldwide about 6.5 million metric tons of cowpea are produced annually on about 14.5 million hectares. The low productivity of cowpea is attributable to numerous abiotic and biotic constraints. The abiotic stress factors comprise drought, low soil fertility, and heat while biotic constraints include insects, diseases, parasitic weeds, and nematodes. Cowpea farmers also have limited access to quality seeds of improved varieties for planting. Some progress has been made through conventional breeding at international and national research institutions in the last three decades. Cowpea improvement could also benefit from modern breeding methods based on molecular genetic tools. A number of advances in cowpea genetic linkage maps, and quantitative trait loci associated with some desirable traits such as resistance to Striga, Macrophomina, Fusarium wilt, bacterial blight, root-knot nematodes, aphids, and foliar thrips have been reported. An improved consensus genetic linkage map has been developed and used to identify QTLs of additional traits. In order to take advantage of these developments single nucleotide polymorphism (SNP) genotyping is being streamlined to establish an efficient workflow supported by genotyping support service (GSS)-client interactions. About 1100 SNPs mapped on the cowpea genome were converted by LGC Genomics to KASP assays. Several cowpea breeding programs have been exploiting these resources to implement molecular breeding, especially for MARS and MABC, to accelerate cowpea variety improvement. The combination of conventional breeding and molecular breeding strategies, with workflow managed through the CGIAR breeding management system (BMS), promises an increase in the number of improved

  10. Genomic tools in cowpea breeding programs: status and perspectives

    Directory of Open Access Journals (Sweden)

    Ousmane Boukar

    2016-06-01

    Full Text Available Cowpea is one of the most important grain legumes in sub-Saharan Africa (SSA). It provides strong support to the livelihood of small-scale farmers through its contributions to their nutritional security, income generation and soil fertility enhancement. Worldwide about 6.5 million metric tons of cowpea are produced annually on about 14.5 million hectares. The low productivity of cowpea is attributable to numerous abiotic and biotic constraints. The abiotic stress factors comprise drought, low soil fertility, and heat while biotic constraints include insects, diseases, parasitic weeds and nematodes. Cowpea farmers also have limited access to quality seeds of improved varieties for planting. Some progress has been made through conventional breeding at international and national research institutions in the last three decades. Cowpea improvement could also benefit from modern breeding methods based on molecular genetic tools. A number of advances in cowpea genetic linkage maps, and quantitative trait loci associated with some desirable traits such as resistance to Striga, Macrophomina, Fusarium wilt, bacterial blight, root-knot nematodes, aphids and foliar thrips have been reported. An improved consensus genetic linkage map has been developed and used to identify QTLs of additional traits. In order to take advantage of these developments single nucleotide polymorphism (SNP) genotyping is being streamlined to establish an efficient workflow supported by genotyping support service (GSS)-client interactions. About 1100 SNPs mapped on the cowpea genome were converted by LGC Genomics to KASP assays. Several cowpea breeding programs have been exploiting these resources to implement molecular breeding, especially for MARS and MABC, to accelerate cowpea variety improvement. The combination of conventional breeding and molecular breeding strategies, with workflow managed through the CGIAR breeding management system (BMS), promises an increase in the number of

  11. GeneBins: a database for classifying gene expression data, with application to plant genome arrays

    Directory of Open Access Journals (Sweden)

    Weiller Georg

    2007-03-01

    Full Text Available Abstract Background To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. Results We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG ontology. The GeneBins database currently supports the functional classification of expression data from four Affymetrix arrays: Arabidopsis thaliana, Oryza sativa, Glycine max and Medicago truncatula. An online analysis tool to identify relevant functions is also provided. Conclusion GeneBins provides resources to interpret gene expression results from microarray experiments. It is available at http://bioinfoserver.rsbs.anu.edu.au/utils/GeneBins/

  12. Identification of DNA repair genes in the human genome

    International Nuclear Information System (INIS)

    Hoeijmakers, J.H.J.; van Duin, M.; Westerveld, A.; Yasui, A.; Bootsma, D.

    1986-01-01

    To identify human DNA repair genes we have transfected human genomic DNA ligated to a dominant marker to excision repair deficient xeroderma pigmentosum (XP) and CHO cells. This resulted in the cloning of a human gene, ERCC-1, that complements the defect of a UV- and mitomycin-C sensitive CHO mutant 43-3B. The ERCC-1 gene has a size of 15 kb, consists of 10 exons and is located in the region 19q13.2-q13.3. Its primary transcript is processed into two mRNAs by alternative splicing of an internal coding exon. One of these transcripts encodes a polypeptide of 297 amino acids. A putative DNA binding protein domain and nuclear location signal could be identified. Significant amino acid homology is found between ERCC-1 and the yeast excision repair gene RAD10. 58 references, 6 figures, 1 table

  13. Re-examining the Gene in Personalized Genomics

    Science.gov (United States)

    Bartol, Jordan

    2013-10-01

    Personalized genomics companies (PG; also called 'direct-to-consumer genetics') are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept presented to customers and the relation between the information given and the science behind PG. Two quite different gene concepts are present in company rhetoric, but only one features in the science. To explain this, we must appreciate the delicate tension between PG, academic science, public expectation, and market forces.

  14. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful...... for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps...... more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...
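
    The distinction the abstract above draws between a 'soft core' (gene clusters present in at least 95% of genomes) and a strict core (present in 100%) can be illustrated with a simple presence/absence table. This is a minimal sketch, not the authors' pipeline; the function name, cluster names, and genome labels are hypothetical.

```python
from typing import Dict, Set


def core_clusters(presence: Dict[str, Set[str]], n_genomes: int, min_percent: int) -> Set[str]:
    """Return gene clusters found in at least min_percent of the genomes.

    presence maps a cluster name to the set of genomes in which it occurs.
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    return {
        cluster
        for cluster, genomes in presence.items()
        if len(genomes) * 100 >= min_percent * n_genomes
    }


if __name__ == "__main__":
    # Hypothetical data set of 20 genomes labelled g0 .. g19.
    genomes = [f"g{i}" for i in range(20)]
    toy_presence = {
        "clusterA": set(genomes),       # present in all 20 genomes
        "clusterB": set(genomes[:19]),  # missing from one genome (95%)
        "clusterC": set(genomes[:10]),  # present in half of the genomes
    }
    print("Soft core (>=95%):", sorted(core_clusters(toy_presence, 20, 95)))
    print("Strict core (100%):", sorted(core_clusters(toy_presence, 20, 100)))
```

    With draft-quality assemblies a gene can be missed in a single genome for technical reasons, which is why the abstract suggests the 95% threshold may be the more biologically relevant one.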

  15. Versatile Gene-Specific Sequence Tags for Arabidopsis Functional Genomics: Transcript Profiling and Reverse Genetics Applications

    Science.gov (United States)

    Hilson, Pierre; Allemeersch, Joke; Altmann, Thomas; Aubourg, Sébastien; Avon, Alexandra; Beynon, Jim; Bhalerao, Rishikesh P.; Bitton, Frédérique; Caboche, Michel; Cannoot, Bernard; Chardakov, Vasil; Cognet-Holliger, Cécile; Colot, Vincent; Crowe, Mark; Darimont, Caroline; Durinck, Steffen; Eickhoff, Holger; de Longevialle, Andéol Falcon; Farmer, Edward E.; Grant, Murray; Kuiper, Martin T.R.; Lehrach, Hans; Léon, Céline; Leyva, Antonio; Lundeberg, Joakim; Lurin, Claire; Moreau, Yves; Nietfeld, Wilfried; Paz-Ares, Javier; Reymond, Philippe; Rouzé, Pierre; Sandberg, Goran; Segura, Maria Dolores; Serizet, Carine; Tabrett, Alexandra; Taconnat, Ludivine; Thareau, Vincent; Van Hummelen, Paul; Vercruysse, Steven; Vuylsteke, Marnik; Weingartner, Magdalena; Weisbeek, Peter J.; Wirta, Valtteri; Wittink, Floyd R.A.; Zabeau, Marc; Small, Ian

    2004-01-01

    Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, inciting us to create a collection of gene-specific sequence tags (GSTs) representing at least 21,500 Arabidopsis genes and which are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics. PMID:15489341

  16. SCNS: a graphical tool for reconstructing executable regulatory networks from single-cell genomic data.

    Science.gov (United States)

    Woodhouse, Steven; Piterman, Nir; Wintersteiger, Christoph M; Göttgens, Berthold; Fisher, Jasmin

    2018-05-25

    Reconstruction of executable mechanistic models from single-cell gene expression data represents a powerful approach to understanding developmental and disease processes. New ambitious efforts like the Human Cell Atlas will soon lead to an explosion of data with potential for uncovering and understanding the regulatory networks which underlie the behaviour of all human cells. In order to take advantage of this data, however, there is a need for general-purpose, user-friendly and efficient computational tools that can be readily used by biologists who do not have specialist computer science knowledge. The Single Cell Network Synthesis toolkit (SCNS) is a general-purpose computational tool for the reconstruction and analysis of executable models from single-cell gene expression data. Through a graphical user interface, SCNS takes single-cell qPCR or RNA-sequencing data taken across a time course, and searches for logical rules that drive transitions from early cell states towards late cell states. Because the resulting reconstructed models are executable, they can be used to make predictions about the effect of specific gene perturbations on the generation of specific lineages. SCNS should be of broad interest to the growing number of researchers working in single-cell genomics and will help further facilitate the generation of valuable mechanistic insights into developmental, homeostatic and disease processes.

  17. GAViT: Genome Assembly Visualization Tool for Short Read Data

    Energy Technology Data Exchange (ETDEWEB)

    Syed, Aijazuddin; Shapiro, Harris; Tu, Hank; Pangilinan, Jasmyn; Trong, Stephan

    2008-03-14

    It is a challenging job for genome analysts to accurately debug, troubleshoot, and validate genome assembly results. Genome analysts rely on visualization tools to help validate and troubleshoot assembly results, including such problems as mis-assemblies, low-quality regions, and repeats. Short read data adds further complexity and makes it extremely challenging for the visualization tools to scale and to view all needed assembly information. As a result, there is a need for a visualization tool that can scale to display assembly data from the new sequencing technologies. We present Genome Assembly Visualization Tool (GAViT), a highly scalable and interactive assembly visualization tool developed at the DOE Joint Genome Institute (JGI).

  18. Whole genome sequencing as the ultimate tool to diagnose tuberculosis

    Directory of Open Access Journals (Sweden)

    Dick van Soolingen

    2016-01-01

    Full Text Available In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB). The (sub)species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. The resistance is predicted by the detection of mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. The production of a strain-characteristic DNA fingerprint, to investigate the epidemiology of TB, is done by the 24-locus variable number tandem repeat (VNTR) typing. However, most of the molecular techniques in the diagnosis of TB can eventually be replaced by whole genome sequencing (WGS). Many international TB reference laboratories are currently working on the introduction of WGS; however, standardization in the international context is lacking. The European Centre for Infectious Disease Prevention and Control in Stockholm, Sweden organizes a yearly round of quality control on VNTR typing and, in 2015, for the first time also on WGS. In this first proficiency study, only three out of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. In this presentation, many aspects will be covered that influence the quality and interpretation of WGS results. The turn-around-time, analysis, and utility of WGS will be discussed. Moreover, the experiences in the use of WGS in the molecular epidemiology of TB in The Netherlands are detailed. It can be concluded that many difficulties still have to be conquered. The state of the art is that bacteria still have to be cultured to have sufficient quality and quantity of DNA for successful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has, therefore

  19. Genes on B chromosomes: Old questions revisited with new tools

    Czech Academy of Sciences Publication Activity Database

    Banaei-Moghaddam, A.M.; Martis, M.M.; Macas, Jiří; Gundlach, H.; Himmelbach, A.; Altschmied, L.; Mayer, K. F. X.; Houben, A.

    2015-01-01

    Vol. 1849, No. 1 (2015), pp. 64-70 ISSN 1874-9399 R&D Projects: GA ČR GBP501/12/G090 Institutional support: RVO:60077344 Keywords: Gene regulation * genome evolution * junk DNA * pseudogene * transcription Subject RIV: EB - Genetics; Molecular Biology Impact factor: 5.373, year: 2015

  20. Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases

    Directory of Open Access Journals (Sweden)

    Alkharouf Nadim W.

    2005-01-01

    Full Text Available Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQLServer2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments is stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared to traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB.

  1. Online analytical processing (OLAP): a fast and effective data mining tool for gene expression databases.

    Science.gov (United States)

    Alkharouf, Nadim W; Jamison, D Curtis; Matthews, Benjamin F

    2005-06-30

    Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQLServer2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments is stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared to traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB.

  2. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    Energy Technology Data Exchange (ETDEWEB)

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.
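
    The two-model idea in this abstract can be illustrated with a minimal sketch: train one Markov chain on "typical" genes and another on "atypical" genes, then assign a new sequence to whichever model gives it the higher log-likelihood. This is not the GeneMark.hmm implementation; the function names, the fixed chain order, the pseudocount and the toy training sequences are all assumptions made for illustration.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate log transition probabilities of a fixed-order Markov chain."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0
    model = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values()) + len(BASES) * pseudocount
        model[ctx] = {b: math.log((nxt.get(b, 0.0) + pseudocount) / total)
                      for b in BASES}
    return model

def log_likelihood(seq, model, order=2):
    """Sum log P(base | preceding context); unseen contexts fall back to uniform."""
    uniform = math.log(1.0 / len(BASES))
    return sum(model.get(seq[i - order:i], {}).get(seq[i], uniform)
               for i in range(order, len(seq)))

def classify(seq, typical, atypical, order=2):
    """Label a sequence by the model that explains it better."""
    if log_likelihood(seq, typical, order) >= log_likelihood(seq, atypical, order):
        return "typical"
    return "atypical"

# Hypothetical toy training sets: GC-rich "typical" genes, AT-rich "atypical" genes.
typical_model = train_markov(["GCGCGCGCGCGCGC", "GGCCGGCCGGCC"])
atypical_model = train_markov(["ATATATATATATAT", "AATTAATTAATT"])
```

    In this toy setting, a GC-rich query is labeled typical and an AT-rich query atypical; real gene finders use far richer models and training data.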

  3. Conditions for the evolution of gene clusters in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Sara Ballouz

    2010-02-01

    Full Text Available Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.

  4. Conditions for the Evolution of Gene Clusters in Bacterial Genomes

    Science.gov (United States)

    Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.

    2010-01-01

    Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992
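
    The Moran process mentioned in this abstract can be illustrated in a much reduced form. The sketch below is not the authors' model: it assumes only two types (a "clustered" and an "unclustered" gene arrangement), omits inversion events entirely, and uses a hypothetical selective advantage `fitness_adv` for the clustered type. It estimates how often a single clustered genome takes over a population and compares the estimate with the standard closed-form fixation probability for a Moran process.

```python
import random

def moran_fixation(n, fitness_adv, start=1, rng=None):
    """Run one Moran process; return True if the clustered type reaches fixation."""
    if rng is None:
        rng = random.Random()
    r = 1.0 + fitness_adv   # relative fitness of the clustered type (assumption)
    k = start               # current number of clustered individuals
    while 0 < k < n:
        # One individual reproduces, chosen in proportion to fitness.
        birth_clustered = rng.random() < r * k / (r * k + (n - k))
        # One individual dies, chosen uniformly at random.
        death_clustered = rng.random() < k / n
        k += int(birth_clustered) - int(death_clustered)
    return k == n

def fixation_probability(n, fitness_adv, trials=2000, seed=0):
    """Monte Carlo estimate of the fixation probability of one clustered mutant."""
    rng = random.Random(seed)
    wins = sum(moran_fixation(n, fitness_adv, rng=rng) for _ in range(trials))
    return wins / trials

def exact_fixation(n, fitness_adv):
    """Closed-form fixation probability of a single mutant in a Moran process."""
    r = 1.0 + fitness_adv
    if r == 1.0:
        return 1.0 / n
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))
```

    Under these assumptions, calling `fixation_probability(20, 0.1)` should land close to `exact_fixation(20, 0.1)`; extending the state to track gene order and inversions, as the abstract describes, is beyond this sketch.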

  5. GenPlay Multi-Genome, a tool to compare and analyze multiple human genomes in a graphical interface.

    Science.gov (United States)

    Lajugie, Julien; Fourel, Nicolas; Bouhassira, Eric E

    2015-01-01

    Parallel visualization of multiple individual human genomes is a complex endeavor that is rapidly gaining importance with the increasing number of personal, phased and cancer genomes that are being generated. It requires the display of variants such as SNPs, indels and structural variants that are unique to specific genomes and the introduction of multiple overlapping gaps in the reference sequence. Here, we describe GenPlay Multi-Genome, an application specifically written to visualize and analyze multiple human genomes in parallel. GenPlay Multi-Genome is ideally suited for the comparison of allele-specific expression and functional genomic data obtained from multiple phased genomes in a graphical interface with access to multiple-track operation. It also allows the analysis of data that have been aligned to custom genomes rather than to a standard reference and can be used as a variant calling format file browser and as a tool to compare different genome assemblies, such as hg19 and hg38. GenPlay is available under the GNU public license (GPL-3) from http://genplay.einstein.yu.edu. The source code is available at https://github.com/JulienLajugie/GenPlay. © The Author 2014. Published by Oxford University Press.

  6. Complete Chloroplast Genome of Pinus massoniana (Pinaceae): Gene Rearrangements, Loss of ndh Genes, and Short Inverted Repeats Contraction, Expansion.

    Science.gov (United States)

    Ni, ZhouXian; Ye, YouJu; Bai, Tiandao; Xu, Meng; Xu, Li-An

    2017-09-11

    The chloroplast genome (CPG) of Pinus massoniana, a member of the genus Pinus (Pinaceae) and a primary source of turpentine, was sequenced and analyzed in terms of gene rearrangements, ndh gene loss, and the contraction and expansion of short inverted repeats (IRs). The P. massoniana CPG has a typical quadripartite structure that includes a large single copy (LSC) region (65,563 bp), a small single copy (SSC) region (53,230 bp) and two IRs (IRa and IRb, 485 bp). A total of 108 unique genes were identified, including 73 protein-coding genes, 31 tRNAs, and 4 rRNAs. Most of the 81 simple sequence repeats (SSRs) identified in the CPG were mononucleotide motifs of the A/T type and were located in non-coding regions. Comparisons with related species revealed an inversion (21,556 bp) in the LSC region; the P. massoniana CPG lacks all 11 intact ndh genes (four ndh genes were lost completely; five remain truncated as pseudogenes; and the other two ndh genes remain as pseudogenes because of short insertions or deletions). A pair of short IRs was found instead of large IRs, and size variations among pine species were observed, which resulted from short insertions or deletions and non-synchronized variations between "IRa" and "IRb". The results of phylogenetic analyses based on whole CPG sequences of 16 conifers indicated that whole CPG sequences could be used as a powerful tool in phylogenetic analyses.

  7. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    Science.gov (United States)

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; 
    Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  8. Genome-wide identification of KANADI1 target genes.

    Directory of Open Access Journals (Sweden)

    Paz Merelo

    Full Text Available Plant organ development and polarity establishment are mediated by the action of several transcription factors. Among these, the KANADI (KAN) subclade of the GARP protein family plays important roles in polarity-associated processes during embryo, shoot and root patterning. In this study, we have identified a set of potential direct target genes of KAN1 through a combination of chromatin immunoprecipitation/DNA sequencing (ChIP-Seq) and genome-wide transcriptional profiling using tiling arrays. Target genes are over-represented for genes involved in the regulation of organ development as well as in the response to auxin. KAN1 directly affects the expression of several genes previously shown to be important in the establishment of polarity during lateral organ and vascular tissue development. We also show that KAN1 controls, through its target genes, auxin effects on organ development at different levels: transport and its regulation, and signaling. In addition, KAN1 regulates genes involved in the response to abscisic acid, jasmonic acid, brassinosteroids, ethylene, cytokinins and gibberellins. The role of KAN1 in organ polarity is antagonized by HD-ZIPIII transcription factors, including REVOLUTA (REV). A comparison of their target genes reveals that the REV/KAN1 module acts in organ patterning through opposite regulation of shared targets. Evidence of mutual repression between closely related family members is also shown.

  9. AnGeLi: A Tool for the Analysis of Gene Lists from Fission Yeast

    Directory of Open Access Journals (Sweden)

    Danny A Bitton

    2015-11-01

    Full Text Available Genome-wide assays and screens typically result in large lists of genes or proteins. Enrichments of functional or other biological properties within such lists can provide valuable insights and testable hypotheses. Systematically detecting these enrichments can be challenging and time-consuming, because relevant data to compare against query gene lists are spread over many different sources. We have developed AnGeLi (Analysis of Gene Lists), an intuitive, integrated web-tool for comprehensive and customized interrogation of gene lists from the fission yeast, Schizosaccharomyces pombe. AnGeLi searches for significant enrichments among multiple qualitative and quantitative information sources, including gene and phenotype ontologies, genetic and protein interactions, numerous features of genes, transcripts, translation, and proteins such as copy numbers, chromosomal positions, genetic diversity, RNA polymerase II and ribosome occupancy, localization, conservation, half-lives, domains and molecular weight among others, as well as diverse sets of genes that are co-regulated or lead to the same phenotypes when mutated. AnGeLi uses robust statistics which can be tailored to specific needs. It also provides the option to upload user-defined gene sets to compare against the query list. Through an integrated data submission form, AnGeLi encourages the community to contribute additional curated gene lists to further increase the usefulness of this resource and to get the most from the ever increasing large-scale experiments. AnGeLi offers a rigorous yet flexible statistical analysis platform for rich insights into functional enrichments and biological context for query gene lists, thus providing a powerful exploratory tool through which S. pombe researchers can uncover fresh perspectives and unexpected connections from genomic data. AnGeLi is freely available at: www.bahlerlab.info/AnGeLi

  10. Novel gene expression tools for rice biotechnology

    Science.gov (United States)

    Biotechnology is an effective and important method of improving both quality and agronomic traits in rice. We are developing novel molecular tools for genetic engineering, with a focus on developing novel transgene expression control elements (i.e. promoters) for rice. A suite of monocot grass promo...

  11. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

    Directory of Open Access Journals (Sweden)

    Stajich Jason E

    2006-11-01

    Full Text Available Abstract Background To date, most fungal phylogenies have been derived from single gene comparisons, or from concatenated alignments of a small number of genes. The increase in fungal genome sequencing presents an opportunity to reconstruct evolutionary events using entire genomes. As a tool for future comparative, phylogenomic and phylogenetic studies, we used both supertrees and concatenated alignments to infer relationships between 42 species of fungi for which complete genome sequences are available. Results A dataset of 345,829 genes was extracted from 42 publicly available fungal genomes. Supertree methods were employed to derive phylogenies from 4,805 single gene families. We found that the average consensus supertree method may suffer from long-branch attraction artifacts, while matrix representation with parsimony (MRP) appears to be immune from these. A genome phylogeny was also reconstructed from a concatenated alignment of 153 universally distributed orthologs. Our MRP supertree and concatenated phylogeny are highly congruent. Within the Ascomycota, the sub-phyla Pezizomycotina and Saccharomycotina were resolved. Both phylogenies infer that the Leotiomycetes are the closest sister group to the Sordariomycetes. There is some ambiguity regarding the placement of Stagonospora nodorum, the sole member of the class Dothideomycetes present in the dataset. Within the Saccharomycotina, a monophyletic clade containing organisms that translate CTG as serine instead of leucine is evident. There is also strong support for two groups within the CTG clade, one containing the fully sexual species Candida lusitaniae, Candida guilliermondii and Debaryomyces hansenii, and the second group containing Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis and Lodderomyces elongisporus. The second major clade within the Saccharomycotina contains species whose genomes have undergone a whole genome duplication (WGD), and their close

  12. Genomic and gene variation in Mycoplasma hominis strains

    DEFF Research Database (Denmark)

    Christiansen, Gunna; Andersen, H; Birkelund, Svend

    1987-01-01

    DNAs from 14 strains of Mycoplasma hominis isolated from various habitats, including strain PG21, were analyzed for genomic heterogeneity. DNA-DNA filter hybridization values were from 51 to 91%. Restriction endonuclease digestion patterns, analyzed by agarose gel electrophoresis, revealed...... no identity or cluster formation between strains. Variation within M. hominis rRNA genes was analyzed by Southern hybridization of EcoRI-cleaved DNA hybridized with a cloned fragment of the rRNA gene from the mycoplasma strain PG50. Five of the M. hominis strains showed identical hybridization patterns....... These hybridization patterns were compared with those of 12 other mycoplasma species, which showed a much more complex band pattern. Cloned nonribosomal RNA gene fragments of M. hominis PG21 DNA were analyzed, and the fragments were used to demonstrate heterogeneity among the strains. A monoclonal antibody against...

  13. Combining genetical genomics and bulked segregant analysis differential expression: an approach to gene localization

    NARCIS (Netherlands)

    Chen, Xinwei; Hedley, P.E.; Morris, J.; Liu, Hui; Niks, R.E.; Waugh, R.

    2011-01-01

    Positional gene isolation in unsequenced species generally requires either a reference genome sequence or an inference of gene content based on conservation of synteny with a genomic model. In the large unsequenced genomes of the Triticeae cereals the latter, i.e. conservation of synteny with the

  14. Mapping our genes: The genome projects: How big, how fast

    Energy Technology Data Exchange (ETDEWEB)

    none,

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  15. Mapping Our Genes: The Genome Projects: How Big, How Fast

    Science.gov (United States)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  16. Evolution of endogenous non-retroviral genes integrated into plant genomes

    Directory of Open Access Journals (Sweden)

    Hyosub Chu

    2014-08-01

    Full Text Available Numerous comparative genome analyses have revealed the wide extent of horizontal gene transfer (HGT) in living organisms, which contributes to their evolution and genetic diversity. Viruses play important roles in HGT. Endogenous viral elements (EVEs) are defined as viral DNA sequences present within the genomes of non-viral organisms. In eukaryotic cells, the majority of EVEs are derived from RNA viruses using reverse transcription. In contrast, endogenous non-retroviral elements (ENREs) are poorly studied. However, the increasing availability of genomic data and the rapid development of bioinformatics tools have enabled the identification of several ENREs in various eukaryotic organisms. To date, a small number of ENREs integrated into plant genomes have been identified. Of the known non-retroviruses, most identified ENREs are derived from double-strand (ds) RNA viruses, followed by single-strand (ss) DNA and ssRNA viruses. At least eight virus families have been identified. Of these, viruses in the family Partitiviridae are dominant, followed by viruses of the families Chrysoviridae and Geminiviridae. The identified ENREs have been primarily identified in eudicots, followed by monocots. In this review, we briefly discuss the current view on non-retroviral sequences integrated into plant genomes that are associated with plant-virus evolution and their possible roles in antiviral resistance.

  17. Census of solo LuxR genes in prokaryotic genomes.

    Science.gov (United States)

    Hudaiberdiev, Sanjarbek; Choudhary, Kumari S; Vera Alvarez, Roberto; Gelencsér, Zsolt; Ligeti, Balázs; Lamba, Doriano; Pongor, Sándor

    2015-01-01

    luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram-negative bacteria. On the bacterial chromosome, luxR genes are usually found next to or near a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may either respond to internal AHL signals produced by a non-adjacent luxI in the chromosome, or respond to exogenous signals. Here we present a survey of solo luxR genes found in complete and draft bacterial genomes in the NCBI databases using HMMs. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. We also found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL-binding LuxR proteins, so there is room to speculate that the solos may be involved in sensing hitherto unknown signals. It was noted that only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of the cysteines may be involved in disulfide formation, which makes us speculate that some LuxR proteins, including some of the solos, may be involved in redox regulation.
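
    The notion of a "solo" luxR gene can be made concrete with a small sketch: given annotated gene coordinates, a luxR gene is flagged as solo when no luxI gene lies within a chosen distance on the same replicon. The abstract does not state how "vicinity" was defined, so the `window` threshold, the `Gene` record and the `solo_luxr` function are all hypothetical; the sketch also treats replicons as linear and ignores chromosome circularity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    replicon: str   # chromosome or plasmid identifier
    start: int
    end: int
    family: str     # e.g. "luxR", "luxI", or anything else

def gap(a, b):
    """Base pairs separating two genes on linear coordinates (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))

def solo_luxr(genes, window=3000):
    """Return luxR genes with no luxI gene within `window` bp on the same replicon."""
    luxi = [g for g in genes if g.family == "luxI"]
    solos = []
    for g in genes:
        if g.family != "luxR":
            continue
        has_neighbour = any(i.replicon == g.replicon and gap(g, i) <= window
                            for i in luxi)
        if not has_neighbour:
            solos.append(g)
    return solos

# Hypothetical annotation: one luxR-luxI pair and one isolated luxR.
genes = [
    Gene("chr", 1000, 1750, "luxR"),
    Gene("chr", 1800, 2400, "luxI"),
    Gene("chr", 500000, 500750, "luxR"),
]
solos = solo_luxr(genes)
```

    For scale, the counts reported in the abstract (2698 solos among 3550 luxR genes) correspond to 76% of the luxR genes found.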

  18. Development of genomic SSR markers for fingerprinting lettuce (Lactuca sativa L.) cultivars and mapping genes.

    Science.gov (United States)

    Rauscher, Gilda; Simko, Ivan

    2013-01-22

    Lettuce (Lactuca sativa L.) is the major crop from the group of leafy vegetables. Several types of molecular markers were developed that are effectively used in lettuce breeding and genetic studies. However, only a very limited number of microsatellite-based markers are publicly available. We have employed the method of enriched microsatellite libraries to develop 97 genomic SSR markers. Testing of newly developed markers on a set of 36 Lactuca accessions (33 L. sativa, and one of each L. serriola L., L. saligna L., and L. virosa L.) revealed that both the genetic heterozygosity (UHe = 0.56) and the number of loci per SSR (Na = 5.50) are significantly higher for genomic SSR markers than for previously developed EST-based SSR markers (UHe = 0.32, Na = 3.56). Fifty-four genomic SSR markers were placed on the molecular linkage map of lettuce. Distribution of markers in the genome appeared to be random, with the exception of a possible cluster on linkage group 6. Any combination of 32 genomic SSRs was able to distinguish genotypes of all 36 accessions. Fourteen of the newly developed SSR markers originate from fragments with high sequence similarity to resistance gene candidates (RGCs) and RGC pseudogenes. Analysis of molecular variance (AMOVA) of L. sativa accessions showed that approximately 3% of genetic diversity was within accessions, 79% among accessions, and 18% among horticultural types. The newly developed genomic SSR markers were added to the pool of previously developed EST-SSR markers. These two types of SSR-based markers provide useful tools for lettuce cultivar fingerprinting, development of integrated molecular linkage maps, and mapping of genes.
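
    The summary statistics quoted in this abstract can be illustrated with a short sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: Na as the number of distinct alleles observed at a marker, He = 1 - sum(p_i^2) over allele frequencies p_i, and the unbiased estimate UHe = He * 2N / (2N - 1) for N diploid individuals. The function name and the genotype data (allele sizes in bp) are hypothetical.

```python
from collections import Counter

def allele_stats(genotypes):
    """Return (Na, He, UHe) for one diploid locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    """
    alleles = [a for genotype in genotypes for a in genotype]
    n_ind = len(genotypes)
    counts = Counter(alleles)
    total = len(alleles)
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    uhe = he * (2 * n_ind) / (2 * n_ind - 1)   # small-sample correction
    return len(counts), he, uhe

# Hypothetical SSR genotypes for four accessions at a single marker.
na, he, uhe = allele_stats([(150, 150), (150, 154), (154, 158), (158, 158)])
```

    For this toy locus the allele frequencies are 0.375, 0.25 and 0.375, giving Na = 3, He = 0.65625 and UHe = 0.75; averaging such per-marker values across markers would give summary figures of the kind reported above.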

  19. webMGR: an online tool for the multiple genome rearrangement problem.

    Science.gov (United States)

    Lin, Chi Ho; Zhao, Hao; Lowcay, Sean Harry; Shahab, Atif; Bourque, Guillaume

    2010-02-01

    The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results. webMGR can be accessed via http://www.gis.a-star.edu.sg/~bourque. The source code of the improved standalone version of MGR is also freely available from the web site. Supplementary data are available at Bioinformatics online.

  20. Mass spectrometric imaging as a high-spatial resolution tool for functional genomics: Tissue-specific gene expression of TT7 inferred from heterogeneous distribution of metabolites in Arabidopsis flowers

    Energy Technology Data Exchange (ETDEWEB)

    Korte, Andrew R.; Song, Zhihong; Nikolau, Basil J.; Lee, Young Jin

    2011-12-23

    Laser desorption/ionization (LDI) mass spectrometry imaging (MSI) was used to acquire chemical images of flavonoid metabolites on the surface of wild-type and mutant (tt7) Arabidopsis thaliana flowers. Flavonoids were localized to the petals and carpels of flowers, with tissue heterogeneity in the petals. Specifically, kaempferol and/or its glycosides were abundant in the distal region of petals and quercetin and its downstream flavonoids were highly enriched in the more proximal region of petals. As a result of a mutation in the TT7 gene which blocks the conversion of dihydrokaempferol to dihydroquercetin, the downstream metabolites, quercetin, isorhamnetin, and their glycosides, were not observed in the mutant flowers. Instead, the metabolites in an alternative pathway, kaempferol and/or its glycosides, were as highly abundant on the proximal region of the petals as in the distal region. In addition, the combined flavonoid amounts on the proximal region of petals in the wild-type are almost equivalent to the amounts of kaempferol and/or its glycosides in the mutant. This strongly suggests that the expression of the TT7 gene is localized on the proximal part of the petal while the other genes in the upstream pathway are evenly expressed throughout the petal. Most importantly, this work demonstrates MSI of metabolites can be utilized for the localization of gene expression.

  1. BPhyOG: An interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes

    Directory of Open Access Journals (Sweden)

    Lin Kui

    2007-07-01

    Full Text Available. Abstract. Background: Overlapping genes (OGs) in bacterial genomes are pairs of adjacent genes of which the coding sequences overlap partly or entirely. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified. Indeed, these might prove a consistent feature across all microbial genomes. Our previous work suggests that OGs can be considered as robust markers at the whole genome level for the construction of phylogenies. An online, interactive web server for inferring phylogenies is needed for biologists to analyze phylogenetic relationships among a set of bacterial genomes of interest. Description: BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ) and Unweighted Pair-Group Method using Arithmetic averages (UPGMA). Users can apply the desired method to generate phylogenetic trees, which are based on an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs that were used to infer the phylogenetic relationships. It provides detailed annotation for each OG pair and the features of the component genes through hyperlinks. Users can also retrieve each of the homologous OG pairs that have been determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint. Conclusion: BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, the annotation and homologous OG pairs of which are integrated comprehensively. The reliability of phylogenies complemented by
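
    The distance-matrix approach described in this record can be sketched as follows. The abstract does not give the normalization, so `og_distance` below assumes one plausible choice (one minus the shared count divided by the smaller set size); it is not BPhyOG's formula. The UPGMA routine is a plain textbook version, and the genome names and OG-pair identifiers are hypothetical.

```python
from itertools import combinations

def og_distance(pairs_a, pairs_b):
    """Assumed normalization: 1 - shared OG pairs / size of the smaller set."""
    shared = len(pairs_a & pairs_b)
    return 1.0 - shared / min(len(pairs_a), len(pairs_b))

def upgma(names, dist):
    """dist maps frozenset({x, y}) -> distance. Returns the tree as nested tuples."""
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        nx, ny = clusters.pop(x), clusters.pop(y)
        # Size-weighted average distance from the new cluster to every other one.
        for z in clusters:
            d[frozenset((merged, z))] = (
                nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]
            ) / (nx + ny)
        clusters[merged] = nx + ny
    return next(iter(clusters))

# Hypothetical genomes, each described by the IDs of its homologous OG pairs.
genomes = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 6, 7, 8}}
dist = {
    frozenset((a, b)): og_distance(genomes[a], genomes[b])
    for a, b in combinations(genomes, 2)
}
tree = upgma(list(genomes), dist)
print(tree)
```

    In this toy input, A and B share three of four OG pairs and are joined first; C attaches last.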

  2. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    Science.gov (United States)

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80 kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.
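
    The "detected by at least three tools" criterion in this record can be illustrated with a small consensus filter. The abstract does not say how calls from different tools were matched, so the 50% reciprocal-overlap rule used here is an assumption, and all coordinates are hypothetical.

```python
def reciprocal_overlap(a, b):
    """a, b: (chrom, start, end). Returns the smaller of the two overlap fractions."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def consensus_deletions(calls_by_tool, min_tools=3, min_ro=0.5):
    """Keep one representative per deletion supported by at least min_tools callers."""
    kept = []
    for tool, calls in calls_by_tool.items():
        for call in calls:
            support = {tool}
            for other, other_calls in calls_by_tool.items():
                if other != tool and any(
                    reciprocal_overlap(call, c) >= min_ro for c in other_calls
                ):
                    support.add(other)
            # Skip events already represented by a previously kept call.
            already_kept = any(reciprocal_overlap(call, k) >= min_ro for k, _ in kept)
            if len(support) >= min_tools and not already_kept:
                kept.append((call, sorted(support)))
    return kept

# Hypothetical deletion calls (assumed matching rule: >= 50% reciprocal overlap).
calls = {
    "Delly": [("chr17", 1000, 81000)],
    "Manta": [("chr17", 1200, 80500)],
    "Pindel": [("chr17", 900, 82000), ("chr13", 5000, 6000)],
    "BreakDancer": [("chr13", 5100, 6100)],
}
consensus = consensus_deletions(calls)
print(consensus)
```

    Only the chr17 event has three supporting callers here; the chr13 event is seen by two and is dropped.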

  3. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling

    Science.gov (United States)

    Sato, Yukuto; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-01-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis. PMID:26578810

  4. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum.

    Science.gov (United States)

    Rao, Soumya; Nandineni, Madhusudan R

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.

  5. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
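
    Gene list enrichment of the kind mentioned in this record is commonly framed as an over-representation test. The sketch below is a generic hypergeometric upper-tail calculation, offered only as an illustration; it is not taken from PANTHER, and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_genome, n_category, n_list, n_hits):
    """P(X >= n_hits) when n_list genes are drawn without replacement from n_genome
    genes, of which n_category belong to the annotation category of interest."""
    upper = min(n_category, n_list)
    total = comb(n_genome, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_genome - n_category, n_list - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 annotated genes, 200 in a pathway,
# and 10 of a 100-gene experimental list fall in that pathway (about 1 expected).
p_value = hypergeom_enrichment_p(20000, 200, 100, 10)
print(f"p = {p_value:.2e}")
```

    In practice a multiple-testing correction would be applied across all categories tested; that step is omitted here.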

  6. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    Science.gov (United States)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  7. DR-Integrator: a new analytic tool for integrating DNA copy number and gene expression data.

    Science.gov (United States)

    Salari, Keyan; Tibshirani, Robert; Pollack, Jonathan R

    2010-02-01

    DNA copy number alterations (CNA) frequently underlie gene expression changes by increasing or decreasing gene dosage. However, only a subset of genes with altered dosage exhibit concordant changes in gene expression. This subset is likely to be enriched for oncogenes and tumor suppressor genes, and can be identified by integrating these two layers of genome-scale data. We introduce DNA/RNA-Integrator (DR-Integrator), a statistical software tool to perform integrative analyses on paired DNA copy number and gene expression data. DR-Integrator identifies genes with significant correlations between DNA copy number and gene expression, and implements a supervised analysis that captures genes with significant alterations in both DNA copy number and gene expression between two sample classes. DR-Integrator is freely available for non-commercial use from the Pollack Lab at http://pollacklab.stanford.edu/ and can be downloaded as a plug-in application to Microsoft Excel and as a package for the R statistical computing environment. The R package is available under the name 'DRI' at http://cran.r-project.org/. An example analysis using DR-Integrator is included as supplemental material. Supplementary data are available at Bioinformatics online.
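
    The idea of finding genes whose expression tracks their copy number can be sketched with a per-gene correlation. This is a simplified stand-in that uses a plain Pearson coefficient and an arbitrary cutoff of 0.8; the abstract does not describe DR-Integrator's own statistics, and the gene names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so correlation is undefined; treat as none
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements across five samples.
copy_number = {"GENE_A": [2, 2, 4, 6, 8], "GENE_B": [2, 3, 2, 3, 2]}
expression = {"GENE_A": [1.0, 1.1, 2.0, 3.1, 3.9], "GENE_B": [5.0, 1.0, 4.0, 2.5, 3.0]}

correlations = {g: pearson(copy_number[g], expression[g]) for g in copy_number}
# Assumed cutoff for calling a gene "concordant".
concordant = [g for g, r in correlations.items() if r > 0.8]
print(correlations, concordant)
```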

  8. PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome.

    Science.gov (United States)

    Sarika; Arora, Vasu; Iquebal, M A; Rai, Anil; Kumar, Dinesh

    2013-01-01

    Molecular markers play a significant role in crop improvement for desirable characteristics, such as high yield, disease resistance and others that will benefit the crop in the long term. Pigeonpea (Cajanus cajan L.) is a legume recently sequenced by a global consortium led by ICRISAT (Hyderabad, India) and has been analysed for gene prediction, synteny maps, markers, etc. We present the PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer designing tool for the pigeonpea genome, based on chromosome-wise as well as location-wise search of primers. A total of 123 387 Short Tandem Repeats (STRs) were extracted from the publicly available pigeonpea genome using the MIcroSAtellite tool (MISA). The database is an online relational database based on a 'three-tier architecture' that catalogues microsatellite information in MySQL, with a user-friendly interface developed in PHP. Search for STRs may be customized by limiting their location on the chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further integrated with Primer3 for primer design for selected markers, with left and right flanking regions of up to 500 bp. This will enable researchers to select markers of choice at a desired interval over the chromosome. Furthermore, one can use individual STRs of a targeted region of a chromosome to narrow down the location of a gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, searching for markers based on the characteristics and location of STRs is expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/
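
    STR extraction of the kind described in this record can be illustrated with a regular-expression scan for perfect tandem repeats. This is a minimal sketch, not MISA itself; the minimum repeat counts per motif length are assumptions chosen for illustration, and the sequence is hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of repeats.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_strs(sequence, min_repeats=None):
    """Return (start, end, motif, repeat_count) for perfect tandem repeats.

    Coordinates are 0-based with an exclusive end.
    """
    thresholds = min_repeats or DEFAULT_MIN_REPEATS
    sequence = sequence.upper()
    hits = []
    for unit_len, n in sorted(thresholds.items()):
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, n - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append(
                (match.start(), match.end(), motif, len(match.group(1)) // unit_len)
            )
    return sorted(hits)

# Hypothetical sequence containing an (AT)7 and a (GAA)5 repeat.
sequence = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "TT"
strs = find_strs(sequence)
print(strs)
```

    Flanking sequence around each reported interval (up to 500 bp on each side in the database described above) could then be passed to a primer design tool.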

  9. The compact Selaginella genome identifies changes in gene content associated with the evolution of vascular plants

    Energy Technology Data Exchange (ETDEWEB)

    Grigoriev, Igor V.; Banks, Jo Ann; Nishiyama, Tomoaki; Hasebe, Mitsuyasu; Bowman, John L.; Gribskov, Michael; dePamphilis, Claude; Albert, Victor A.; Aono, Naoki; Aoyama, Tsuyoshi; Ambrose, Barbara A.; Ashton, Neil W.; Axtell, Michael J.; Barker, Elizabeth; Barker, Michael S.; Bennetzen, Jeffrey L.; Bonawitz, Nicholas D.; Chapple, Clint; Cheng, Chaoyang; Correa, Luiz Gustavo Guedes; Dacre, Michael; DeBarry, Jeremy; Dreyer, Ingo; Elias, Marek; Engstrom, Eric M.; Estelle, Mark; Feng, Liang; Finet, Cedric; Floyd, Sandra K.; Frommer, Wolf B.; Fujita, Tomomichi; Gramzow, Lydia; Gutensohn, Michael; Harholt, Jesper; Hattori, Mitsuru; Heyl, Alexander; Hirai, Tadayoshi; Hiwatashi, Yuji; Ishikawa, Masaki; Iwata, Mineko; Karol, Kenneth G.; Koehler, Barbara; Kolukisaoglu, Uener; Kubo, Minoru; Kurata, Tetsuya; Lalonde, Sylvie; Li, Kejie; Li, Ying; Litt, Amy; Lyons, Eric; Manning, Gerard; Maruyama, Takeshi; Michael, Todd P.; Mikami, Koji; Miyazaki, Saori; Morinaga, Shin-ichi; Murata, Takashi; Mueller-Roeber, Bernd; Nelson, David R.; Obara, Mari; Oguri, Yasuko; Olmstead, Richard G.; Onodera, Naoko; Petersen, Bent Larsen; Pils, Birgit; Prigge, Michael; Rensing, Stefan A.; Riano-Pachon, Diego Mauricio; Roberts, Alison W.; Sato, Yoshikatsu; Scheller, Henrik Vibe; Schulz, Burkhard; Schulz, Christian; Shakirov, Eugene V.; Shibagaki, Nakako; Shinohara, Naoki; Shippen, Dorothy E.; Sorensen, Iben; Sotooka, Ryo; Sugimoto, Nagisa; Sugita, Mamoru; Sumikawa, Naomi; Tanurdzic, Milos; Theilsen, Gunter; Ulvskov, Peter; Wakazuki, Sachiko; Weng, Jing-Ke; Willats, William W.G.T.; Wipf, Daniel; Wolf, Paul G.; Yang, Lixing; Zimmer, Andreas D.; Zhu, Qihui; Mitros, Therese; Hellsten, Uffe; Loque, Dominique; Otillar, Robert; Salamov, Asaf; Schmutz, Jeremy; Shapiro, Harris; Lindquist, Erika; Lucas, Susan; Rokhsar, Daniel

    2011-04-28

    We report the genome sequence of the nonseed vascular plant, Selaginella moellendorffii, and by comparative genomics identify genes that likely played important roles in the early evolution of vascular plants and their subsequent evolution

  10. FunGeneNet: a web tool to estimate enrichment of functional interactions in experimental gene sets.

    Science.gov (United States)

    Tiys, Evgeny S; Ivanisenko, Timofey V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2018-02-09

    Estimation of functional connectivity in gene sets derived from genome-wide or other biological experiments is one of the essential tasks of bioinformatics. A promising approach for solving this problem is to compare gene networks built using experimental gene sets with random networks. One of the resources that make such an analysis possible is CrossTalkZ, which uses the FunCoup database. However, existing methods, including CrossTalkZ, do not take into account individual types of interactions, such as protein/protein interactions, expression regulation, transport regulation, catalytic reactions, etc., but rather work with generalized types characterizing the existence of any connection between network members. We developed the online tool FunGeneNet, which utilizes the ANDSystem and STRING to reconstruct gene networks using experimental gene sets and to estimate their difference from random networks. To compare the reconstructed networks with random ones, the node permutation algorithm implemented in CrossTalkZ was taken as a basis. To study the FunGeneNet applicability, the functional connectivity analysis of networks constructed for gene sets involved in the Gene Ontology biological processes was conducted. We showed that the method sensitivity exceeds 0.8 at a specificity of 0.95. We found that the significance level of the difference between gene networks of biological processes and random networks is determined by the type of connections considered between objects. At the same time, the highest reliability is achieved for the generalized form of connections that takes into account all the individual types of connections. By taking examples of the thyroid cancer networks and the apoptosis network, it is demonstrated that key participants in these processes are involved in the interactions of those types by which these networks differ from random ones. FunGeneNet is a web tool aimed at proving the functionality of networks in a wide range of sizes of
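
    Comparing a gene-set network with random networks can be sketched as an empirical test on the number of interactions inside the set. The version below draws random gene sets of the same size, which is a deliberately simplified stand-in for the node permutation algorithm of CrossTalkZ mentioned above (it ignores node degree). The network, gene names and function names are hypothetical.

```python
import random

def count_internal_edges(edges, nodes):
    """Number of edges whose two endpoints both lie in the node set."""
    node_set = set(nodes)
    return sum(1 for a, b in edges if a in node_set and b in node_set)

def connectivity_p_value(edges, gene_set, all_genes, n_random=1000, seed=0):
    """Empirical P(random set has >= observed internal edges).

    Simplification: random sets are sampled uniformly, not degree-matched.
    """
    rng = random.Random(seed)
    observed = count_internal_edges(edges, gene_set)
    pool = sorted(all_genes)
    at_least = sum(
        count_internal_edges(edges, rng.sample(pool, len(gene_set))) >= observed
        for _ in range(n_random)
    )
    return observed, (at_least + 1) / (n_random + 1)

# Hypothetical network: a fully connected 5-gene module plus sparse background edges.
genes = [f"g{i}" for i in range(50)]
module = genes[:5]
edges = [(a, b) for i, a in enumerate(module) for b in module[i + 1:]]
edges += [(genes[i], genes[i + 1]) for i in range(5, 49, 4)]

observed, p_value = connectivity_p_value(edges, module, genes, n_random=200)
print(observed, p_value)
```

    Restricting `edges` to a single interaction type before running the test would mimic the per-type analysis the abstract describes.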

  11. Paralogous Genes as a Tool to Study the Regulation of Gene Expression

    DEFF Research Database (Denmark)

    Hoffmann, Robert D

    The genomes of plants are marked by reoccurring events of whole-genome duplication. These events are major contributors to speciation and provide the genetic material for organisms to evolve ever greater complexity. Duplicated genes, referred to as paralogs, may be retained because they acquired...... regions. These results suggest that a concurrent purifying selection acts on coding and non-coding sequences of paralogous genes in A. thaliana. Mutational analyses of the promoters from a paralogous gene pair were performed in transgenic A. thaliana plants. The results revealed a 170-bp long DNA sequence...... that forms a bifunctional cis-regulatory module; it represses gene expression in the sporophyte while activating it in pollen. This finding is important for many aspects of gene regulation and the transcriptional changes underlying gametophyte development. In conclusion, the presented thesis suggests that...

  12. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    Science.gov (United States)

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  13. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting | Office of Cancer Genomics

    Science.gov (United States)

    The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.

  14. Gene prediction and RFX transcriptional regulation analysis using comparative genomics

    OpenAIRE

    Chu, Jeffrey Shih Chieh

    2011-01-01

    Regulatory Factor X (RFX) is a family of transcription factors (TF) that is conserved in all metazoans, in some fungi, and in only a few single-cellular organisms. Seven members are found in mammals, nine in fishes, three in fruit flies, and a single member in nematodes and fungi. RFX is involved in many different roles in humans, but a particular function that is conserved in many metazoans is its regulation of ciliogenesis. Probing over 150 genomes for the presence of RFX and ciliary genes ...

  15. Cognitive genomics: Linking genes to behavior in the human brain

    Directory of Open Access Journals (Sweden)

    Genevieve Konopka

    2017-02-01

    Full Text Available Correlations of genetic variation in DNA with functional brain activity have already provided a starting point for delving into human cognitive mechanisms. However, these analyses do not provide the specific genes driving the associations, which are complicated by intergenic localization as well as tissue-specific epigenetics and expression. The use of brain-derived expression datasets could build upon the foundation of these initial genetic insights and yield genes and molecular pathways for testing new hypotheses regarding the molecular bases of human brain development, cognition, and disease. Thus, coupling these human brain gene expression data with measurements of brain activity may provide genes with critical roles in brain function. However, these brain gene expression datasets have their own set of caveats, most notably a reliance on postmortem tissue. In this perspective, I summarize and examine the progress that has been made in this realm to date, and discuss the various frontiers remaining, such as the inclusion of cell-type-specific information, additional physiological measurements, and genomic data from patient cohorts.

  16. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure.

    Science.gov (United States)

    Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P

    2017-12-19

    While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.
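
    The distinction between genes present in all lines and genes present in only some lines reduces to simple set operations once each line's gene content is known. The sketch below assumes genes have already been grouped into comparable families across lines; the line names and gene identifiers are hypothetical.

```python
def partition_pan_genome(gene_presence):
    """gene_presence: dict mapping line name -> set of gene-family IDs.

    Returns (pan, core, variable): all genes, genes in every line,
    and genes missing from at least one line.
    """
    gene_sets = list(gene_presence.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core, pan - core

# Hypothetical gene content for three lines.
lines = {
    "line1": {"g1", "g2", "g3"},
    "line2": {"g1", "g2", "g4"},
    "line3": {"g1", "g3", "g5"},
}
pan, core, variable = partition_pan_genome(lines)
print(len(pan), sorted(core), sorted(variable))
```

    The ratio of pan-genome size to the gene count of any single line is the kind of quantity behind the "nearly twice" figure reported above.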

  17. WebScipio: An online tool for the determination of gene structures using protein sequences

    Directory of Open Access Journals (Sweden)

    Waack Stephan

    2008-09-01

    Full Text Available. Abstract. Background: Obtaining the gene structure for a given protein-encoding gene is an important step in many analyses. Software suited for this task should be readily accessible, accurate, easy to handle and should provide the user with a coherent representation of the most probable gene structure. It should be rigorous enough to optimise features on the level of single bases and at the same time flexible enough to allow for cross-species searches. Results: WebScipio, a web interface to the Scipio software, allows a user to obtain the coding sequence structure corresponding to a given query protein sequence that belongs to an already assembled eukaryotic genome. The resulting gene structure is presented in various human-readable formats, such as a schematic representation and a detailed alignment of the query and the target sequence highlighting any discrepancies. WebScipio can also be used to identify and characterise the gene structures of homologs in related organisms. In addition, it offers a web service for integration with other programs. Conclusion: WebScipio is a tool that allows users to get a high-quality gene structure prediction from a protein query. It offers more than 250 eukaryotic genomes that can be searched and produces predictions that are close to what can be achieved by manual annotation, for in-species and cross-species searches alike. WebScipio is freely accessible at http://www.webscipio.org.

  18. Comparative genomics of Geobacter chemotaxis genes reveals diverse signaling function

    Directory of Open Access Journals (Sweden)

    Antommattei Frances M

    2008-10-01

    Full Text Available. Abstract. Background: Geobacter species are δ-Proteobacteria and are often the predominant species in a variety of sedimentary environments where Fe(III) reduction is important. Their ability to remediate contaminated environments and produce electricity makes them attractive for further study. Cell motility, biofilm formation, and type IV pili all appear important for the growth of Geobacter in changing environments and for electricity production. Recent studies in other bacteria have demonstrated that signaling pathways homologous to the paradigm established for Escherichia coli chemotaxis can regulate type IV pili-dependent motility, the synthesis of flagella and type IV pili, the production of extracellular matrix material, and biofilm formation. The classification of these pathways by comparative genomics improves the ability to understand how Geobacter thrives in natural environments and to improve their use in microbial fuel cells. Results: The genomes of G. sulfurreducens, G. metallireducens, and G. uraniireducens contain multiple (~70) homologs of chemotaxis genes arranged in several major clusters (six, seven, and seven, respectively). Unlike the single gene cluster of E. coli, the Geobacter clusters are not all located near the flagellar genes. The probable functions of some Geobacter clusters are assignable by homology to known pathways; others appear to be unique to the Geobacter sp. and contain genes of unknown function. We identified large numbers of methyl-accepting chemotaxis protein (MCP) homologs that have diverse sensing domain architectures and generate a potential for sensing a great variety of environmental signals. We discuss mechanisms for class-specific segregation of the MCPs in the cell membrane, which serve to maintain pathway specificity and diminish crosstalk. Finally, the regulation of gene expression in Geobacter differs from E. coli. The sequences of predicted promoter elements suggest that the alternative sigma factors

  19. Semantic integration of gene expression analysis tools and data sources using software connectors

    Science.gov (United States)

    2013-01-01

    Background The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data.
Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools

  20. Evolution of red algal plastid genomes: ancient architectures, introns, horizontal gene transfer, and taxonomic utility of plastid markers.

    Directory of Open Access Journals (Sweden)

    Jan Janouškovec

    Full Text Available Red algae have the most gene-rich plastid genomes known, but despite their evolutionary importance these genomes remain poorly sampled. Here we characterize three complete and one partial plastid genome from a diverse range of florideophytes. By unifying annotations across all available red algal plastid genomes we show they all share a highly compact and slowly-evolving architecture and uniquely rich gene complements. Both chromosome structure and gene content have changed very little during red algal diversification, and suggest that plastid-to-nucleus gene transfers have been rare. Despite their ancient character, however, the red algal plastids also contain several unprecedented features, including a group II intron in a tRNA-Met gene that encodes the first example of red algal plastid intron maturase - a feature uniquely shared among florideophytes. We also identify a rare case of a horizontally-acquired proteobacterial operon, and propose this operon may have been recruited for plastid function and potentially replaced a nucleus-encoded plastid-targeted paralogue. Plastid genome phylogenies yield a fully resolved tree and suggest that plastid DNA is a useful tool for resolving red algal relationships. Lastly, we estimate the evolutionary rates among more than 200 plastid genes, and assess their usefulness for species and subspecies taxonomy by comparison to well-established barcoding markers such as cox1 and rbcL. Overall, these data demonstrate that red algal plastid genomes are easily obtainable using high-throughput sequencing of total genomic DNA, interesting from evolutionary perspectives, and promising in resolving red algal relationships at evolutionarily-deep and species/subspecies levels.

  1. Annotating novel genes by integrating synthetic lethals and genomic information

    Directory of Open Access Journals (Sweden)

    Faty Mahamadou

    2008-01-01

    Full Text Available Abstract Background Large scale screening for synthetic lethality serves as a common tool in yeast genetics to systematically search for genes that play a role in specific biological processes. Often the amounts of data resulting from a single large scale screen far exceed the capacities of experimental characterization of every identified target. Thus, there is need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size. Results We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel members in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Different from existing methods that require large amounts of synthetic lethal data, our method merely relies on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features that assign the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates and we present she1 (YBL031W) as a novel gene involved in spindle migration. We applied the statistical methodology also to TOR2 signaling as another example. Conclusion We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process.

  2. The plant ontology as a tool for comparative plant anatomy and genomic analyses

    Science.gov (United States)

    Plant science is now a major player in the fields of genomics, gene expression analysis, phenomics and metabolomics. Recent advances in sequencing technologies have led to a windfall of data, with new species being added rapidly to the list of species whose genomes have been decoded. The Plant Ontol...

  3. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

    Directory of Open Access Journals (Sweden)

    Steinfeld Israel

    2009-02-01

    Full Text Available Abstract Background Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. Results GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms in the order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. Conclusion GOrilla is an efficient GO analysis tool with unique features that make a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation.
GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
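
    The flexible-threshold idea in the GOrilla abstract can be illustrated with a small sketch. The code below computes only the raw minimum hypergeometric (mHG) statistic: for every cutoff n at the top of a ranked list, it takes the hypergeometric tail probability of seeing at least the observed number of annotated genes, and keeps the smallest value. It does not reproduce the exact p-value that GOrilla reports, which additionally corrects for having tried many cutoffs. The function names and the 0/1 list encoding are assumptions made for illustration and are not taken from the GOrilla software.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n genes from N, of which B carry the GO term."""
    total = comb(N, n)
    upper = min(n, B)
    # math.comb returns 0 when k > n, so impossible terms drop out on their own.
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1)) / total


def mhg_score(labels):
    """Raw mHG statistic of a ranked 0/1 list (1 = gene annotated with the term).

    Returns (smallest tail probability, cutoff n at which it occurs).
    This is NOT a corrected p-value; it is the statistic a p-value is built on.
    """
    N = len(labels)
    B = sum(labels)
    best, best_n, hits = 1.0, 0, 0
    for n, flag in enumerate(labels, start=1):
        hits += flag
        p = hypergeom_tail(N, B, n, hits)
        if p < best:
            best, best_n = p, n
    return best, best_n


# Hypothetical ranked list: the three annotated genes sit at the very top.
score, cutoff = mhg_score([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
```

    Because the minimum is taken over all cutoffs, the raw score is optimistic; that is why a correction such as the exact mHG p-value described in the abstract is needed before calling a term significant.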

  4. Candidate Essential Genes in Burkholderia cenocepacia J2315 Identified by Genome-Wide TraDIS

    KAUST Repository

    Wong, Yee-Chin

    2016-08-22

    Burkholderia cenocepacia infection often leads to fatal cepacia syndrome in cystic fibrosis patients. However, antibiotic therapy rarely results in complete eradication of the pathogen due to its intrinsic resistance to many clinically available antibiotics. Recent attention has turned to the identification of essential genes as the proteins encoded by these genes may serve as potential targets for development of novel antimicrobials. In this study, we utilized TraDIS (Transposon Directed Insertion-site Sequencing) as a genome-wide screening tool to facilitate the identification of B. cenocepacia genes essential for its growth and viability. A transposon mutant pool consisting of approximately 500,000 mutants was successfully constructed, with more than 400,000 unique transposon insertion sites identified by computational analysis of TraDIS datasets. The saturated library allowed for the identification of 383 genes that were predicted to be essential in B. cenocepacia. We extended the application of TraDIS to identify conditionally essential genes required for in vitro growth and revealed an additional repertoire of 439 genes to be crucial for B. cenocepacia growth under nutrient-depleted conditions. The library of B. cenocepacia mutants can subsequently be subjected to various biologically related conditions to facilitate the discovery of genes involved in niche adaptation as well as pathogenicity and virulence.
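
    One common way to turn transposon insertion sites into a list of candidate essential genes is to compute an insertion index (unique insertion sites divided by gene length) and flag genes that fall below a cutoff. The sketch below shows that idea only; it is not the authors' pipeline, and the gene coordinates, the site list and the cutoff value are hypothetical. Real analyses usually derive the cutoff from the observed distribution of insertion indices rather than fixing it by hand.

```python
from bisect import bisect_left, bisect_right


def insertion_index(genes, insertion_sites):
    """Map gene name -> unique insertion sites per base pair.

    genes: dict of name -> (start, end), 1-based inclusive coordinates.
    insertion_sites: iterable of genomic positions (duplicates are collapsed).
    """
    sites = sorted(set(insertion_sites))
    result = {}
    for name, (start, end) in genes.items():
        # Count sites with start <= position <= end using binary search.
        count = bisect_right(sites, end) - bisect_left(sites, start)
        result[name] = count / (end - start + 1)
    return result


def candidate_essential(genes, insertion_sites, cutoff):
    """Names of genes whose insertion index is below the (assumed) cutoff."""
    index = insertion_index(genes, insertion_sites)
    return sorted(name for name, value in index.items() if value < cutoff)


# Hypothetical toy data: geneB tolerates far fewer insertions than geneA.
toy_genes = {"geneA": (1, 100), "geneB": (200, 299)}
toy_sites = [5, 20, 20, 50, 90, 250]
flagged = candidate_essential(toy_genes, toy_sites, cutoff=0.02)
```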

  5. Candidate Essential Genes in Burkholderia cenocepacia J2315 Identified by Genome-Wide TraDIS

    KAUST Repository

    Wong, Yee-Chin; Abd El Ghany, Moataz; Naeem, Raeece; Lee, Kok-Wei; Tan, Yung-Chie; Pain, Arnab; Nathan, Sheila

    2016-01-01

    Burkholderia cenocepacia infection often leads to fatal cepacia syndrome in cystic fibrosis patients. However, antibiotic therapy rarely results in complete eradication of the pathogen due to its intrinsic resistance to many clinically available antibiotics. Recent attention has turned to the identification of essential genes as the proteins encoded by these genes may serve as potential targets for development of novel antimicrobials. In this study, we utilized TraDIS (Transposon Directed Insertion-site Sequencing) as a genome-wide screening tool to facilitate the identification of B. cenocepacia genes essential for its growth and viability. A transposon mutant pool consisting of approximately 500,000 mutants was successfully constructed, with more than 400,000 unique transposon insertion sites identified by computational analysis of TraDIS datasets. The saturated library allowed for the identification of 383 genes that were predicted to be essential in B. cenocepacia. We extended the application of TraDIS to identify conditionally essential genes required for in vitro growth and revealed an additional repertoire of 439 genes to be crucial for B. cenocepacia growth under nutrient-depleted conditions. The library of B. cenocepacia mutants can subsequently be subjected to various biologically related conditions to facilitate the discovery of genes involved in niche adaptation as well as pathogenicity and virulence.

  6. Candidate essential genes in Burkholderia cenocepacia J2315 identified by genome-wide TraDIS

    Directory of Open Access Journals (Sweden)

    Yee-Chin Wong

    2016-08-01

    Full Text Available Burkholderia cenocepacia infection often leads to fatal cepacia syndrome in cystic fibrosis patients. However, antibiotic therapy rarely results in complete eradication of the pathogen due to its intrinsic resistance to many clinically available antibiotics. Recent attention has turned to the identification of essential genes as the proteins encoded by these genes may serve as potential targets for development of novel antimicrobials. In this study, we utilized TraDIS (Transposon Directed Insertion-site Sequencing) as a genome-wide screening tool to facilitate the identification of B. cenocepacia genes essential for its growth and viability. A transposon mutant pool consisting of approximately 500,000 mutants was successfully constructed, with more than 400,000 unique transposon insertion sites identified by computational analysis of TraDIS datasets. The saturated library allowed for the identification of 383 genes that were predicted to be essential in B. cenocepacia. We extended the application of TraDIS to identify conditionally essential genes required for in vitro growth and revealed an additional repertoire of 439 genes to be crucial for B. cenocepacia growth under nutrient-depleted conditions. The library of B. cenocepacia mutants can subsequently be subjected to various biologically related conditions to facilitate the discovery of genes involved in niche adaptation as well as pathogenicity and virulence.

  7. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    Science.gov (United States)

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis

  8. The genome of Chelonid herpesvirus 5 harbors atypical genes

    Science.gov (United States)

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J.; Lewis, Teresa D.; Schetle, Nelli; Work, Thierry M.; Dagenais, Julie; Balazs, George H.; Leong, Jo-Ann C.

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of “atypical” DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis

  9. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    Energy Technology Data Exchange (ETDEWEB)

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oak Ridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et al. 2004, Meilan et al. 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root predominant promoters, a set of three promoters were tested for their tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et al. 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from Arabidopsis root specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post transcriptional silencing in the model Arabidopsis system; these include the floral

  10. Lineage-specific evolution of the vertebrate Otopetrin gene family revealed by comparative genomic analyses

    Directory of Open Access Journals (Sweden)

    Ryan Joseph F

    2011-01-01

    Full Text Available Abstract Background Mutations in the Otopetrin 1 gene (Otop1) in mice and fish produce an unusual bilateral vestibular pathology that involves the absence of otoconia without hearing impairment. The encoded protein, Otop1, is the only functionally characterized member of the Otopetrin Domain Protein (ODP) family; the extended sequence and structural preservation of ODP proteins in metazoans suggest a conserved functional role. Here, we use the tools of sequence- and cytogenetic-based comparative genomics to study the Otop1 and the Otop2-Otop3 genes and to establish their genomic context in 25 vertebrates. We extend our evolutionary study to include the gene mutated in Usher syndrome (USH) subtype 1G (Ush1g), both because of the head-to-tail clustering of Ush1g with Otop2 and because Otop1 and Ush1g mutations result in inner ear phenotypes. Results We established that OTOP1 is the boundary gene of an inversion polymorphism on human chromosome 4p16 that originated in the common human-chimpanzee lineage more than 6 million years ago. Other lineage-specific evolutionary events included a three-fold expansion of the Otop genes in Xenopus tropicalis and of Ush1g in teleostei fish. The tight physical linkage between Otop2 and Ush1g is conserved in all vertebrates. To further understand the functional organization of the Ush1g-Otop2 locus, we deduced a putative map of binding sites for CCCTC-binding factor (CTCF), a mammalian insulator transcription factor, from genome-wide chromatin immunoprecipitation-sequencing (ChIP-seq) data in mouse and human embryonic stem (ES) cells combined with detection of CTCF-binding motifs. Conclusions The results presented here clarify the evolutionary history of the vertebrate Otop and Ush1g families, and establish a framework for studying the possible interaction(s) of Ush1g and Otop in developmental pathways.

  11. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    Directory of Open Access Journals (Sweden)

    Yunsheng Wang

    Full Text Available In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  12. Data on the genome-wide identification of CNL R-genes in Setaria italica (L.) P. Beauv.

    Directory of Open Access Journals (Sweden)

    Ethan J. Andersen

    2017-08-01

    Full Text Available We report data associated with the identification of 242 disease resistance genes (R-genes) in the genome of Setaria italica as presented in “Genetic diversity of disease resistance genes in foxtail millet (Setaria italica L.)” (Andersen and Nepal, 2017) [1]. Our data describe the structure and evolution of the Coiled-coil, Nucleotide-binding site, Leucine-rich repeat (CNL) R-genes in foxtail millet. The CNL genes were identified through rigorous extraction and analysis of recently available plant genome sequences using cutting-edge analytical software. Data visualization includes gene structure diagrams, chromosomal syntenic maps, a chromosomal density plot, and a maximum-likelihood phylogenetic tree comparing Sorghum bicolor, Panicum virgatum, Setaria italica, and Arabidopsis thaliana. Compilations of InterProScan annotations, Gene Ontology (GO) annotations, and Basic Local Alignment Search Tool (BLAST) results for the 242 R-genes identified in the foxtail millet genome are also included in tabular format.

  13. GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

    Science.gov (United States)

    Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

    2006-03-31

    Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
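
    The kind of query the GenoMycDB abstract describes (producing tables of potential homologs from stored pairwise similarity parameters under user-defined criteria) can be sketched with an in-memory SQLite table. The table name, column names, thresholds and example rows below are all hypothetical and chosen for illustration; the actual GenoMycDB schema is not given in the abstract.

```python
import sqlite3


def build_demo_db(rows):
    """Create an in-memory table of pairwise alignment results (assumed schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE alignment ("
        "query TEXT, subject TEXT, identity REAL, coverage REAL, evalue REAL)"
    )
    con.executemany("INSERT INTO alignment VALUES (?, ?, ?, ?, ?)", rows)
    return con


def potential_homologs(con, min_identity, min_coverage, max_evalue):
    """Return (query, subject) pairs that satisfy all similarity criteria."""
    cur = con.execute(
        "SELECT query, subject FROM alignment "
        "WHERE identity >= ? AND coverage >= ? AND evalue <= ? "
        "ORDER BY query, subject",
        (min_identity, min_coverage, max_evalue),
    )
    return cur.fetchall()


# Hypothetical alignment records: (query, subject, % identity, % coverage, E-value).
demo_rows = [
    ("protA", "protX", 85.0, 95.0, 1e-50),
    ("protB", "protY", 30.0, 40.0, 1e-3),
    ("protC", "protZ", 60.0, 90.0, 1e-20),
]
con = build_demo_db(demo_rows)
pairs = potential_homologs(con, min_identity=50.0, min_coverage=80.0, max_evalue=1e-10)
```

    Combining several similarity parameters in one WHERE clause mirrors the abstract's "one or multiple sequence similarity parameters"; further restrictions such as subcellular localization or DNA strand would be additional columns and conditions in the same style.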

  14. A tutorial of diverse genome analysis tools found in the CoGe web-platform using Plasmodium spp. as a model

    Science.gov (United States)

    Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric

    2018-01-01

    Abstract Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families—serine repeat antigen, and cytoadherence-linked asexual gene—as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/
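
    The first CoGe workflow looks at trends in genome composition, which for Plasmodium largely means AT/GC content. As a rough illustration of the underlying arithmetic, the sketch below computes GC fraction for a whole sequence and for sliding windows. It is independent of CoGe and does not use any CoGe API; the function names, window sizes and the choice to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(sequence):
    """GC fraction among unambiguous bases (A, C, G, T); None if there are none."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    if called == 0:
        return None  # e.g. an empty string or a run of N
    return gc / called


def windowed_gc(sequence, window, step):
    """GC fraction for each full-length window, moving `step` bases at a time."""
    return [
        gc_content(sequence[i:i + window])
        for i in range(0, len(sequence) - window + 1, step)
    ]


# Hypothetical toy sequence: a GC-rich half followed by an AT-rich half.
profile = windowed_gc("GGGGAAAA", window=4, step=4)
```

    Windowed values are what make compositional heterogeneity visible along a chromosome; a single genome-wide figure would hide the contrast between the two halves of the toy sequence.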

  15. CRISPR-Cpf1: A New Tool for Plant Genome Editing

    KAUST Repository

    Zaidi, Syed Shan-e-Ali

    2017-05-19

    Clustered regularly interspaced palindromic repeats (CRISPR)-CRISPR-associated proteins (CRISPR-Cas), a groundbreaking genome-engineering tool, has facilitated targeted trait improvement in plants. Recently, CRISPR-CRISPR from Prevotella and Francisella 1 (Cpf1) has emerged as a new tool for efficient genome editing, including DNA-free editing in plants, with higher efficiency, specificity, and potentially wider applications than CRISPR-Cas9.

  16. CRISPR-Cpf1: A New Tool for Plant Genome Editing

    KAUST Repository

    Zaidi, Syed Shan-e-Ali; Mahfouz, Magdy M.; Mansoor, Shahid

    2017-01-01

    Clustered regularly interspaced palindromic repeats (CRISPR)-CRISPR-associated proteins (CRISPR-Cas), a groundbreaking genome-engineering tool, has facilitated targeted trait improvement in plants. Recently, CRISPR-CRISPR from Prevotella and Francisella 1 (Cpf1) has emerged as a new tool for efficient genome editing, including DNA-free editing in plants, with higher efficiency, specificity, and potentially wider applications than CRISPR-Cas9.

  17. MODEST: a web-based design tool for oligonucleotide-mediated genome engineering and recombineering

    DEFF Research Database (Denmark)

    Bonde, Mads; Klausen, Michael Schantz; Anderson, Mads Valdemar

    2014-01-01

    Recombineering and multiplex automated genome engineering (MAGE) offer the possibility to rapidly modify multiple genomic or plasmid sites at high efficiencies. This enables efficient creation of genetic variants including both single mutants with specifically targeted modifications as well......, which confers the corresponding genetic change, is performed manually. To address these challenges, we have developed the MAGE Oligo Design Tool (MODEST). This web-based tool allows designing of MAGE oligos for (i) tuning translation rates by modifying the ribosomal binding site, (ii) generating...

  18. Comparative genome analysis and resistance gene mapping in grain legumes

    International Nuclear Information System (INIS)

    Young, N.D.

    1998-01-01

    Using DNA markers and genome organization, several important disease resistance genes have been analyzed in mungbean (Vigna radiata), cowpea (Vigna unguiculata), common bean (Phaseolus vulgaris), and soybean (Glycine max). In the process, medium-density linkage maps consisting of restriction fragment length polymorphism (RFLP) markers were constructed for both mungbean and cowpea. Comparisons between these maps, as well as the maps of soybean and common bean, indicate that there is significant conservation of DNA marker order, though the conserved blocks in soybean are much shorter than in the others. DNA mapping results also indicate that a gene for seed weight may be conserved between mungbean and cowpea. Using the linkage maps, genes that control bruchid (genus Callosobruchus) and powdery mildew (Erysiphe polygoni) resistance in mungbean, aphid (Aphis craccivora) resistance in cowpea, and cyst nematode (Heterodera glycines) resistance in soybean have all been mapped and characterized. For some of these traits, resistance was found to be oligogenic, and DNA mapping uncovered multiple genes involved in the phenotype. (author)

  19. Plant ion channels: gene families, physiology, and functional genomics analyses.

    Science.gov (United States)

    Ward, John M; Mäser, Pascal; Schroeder, Julian I

    2009-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide-gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport.

  20. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, Marie; Jensen, L.J.; Brunak, Søren

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...... distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300...... genes, we show that it probably has only ~3800 genes, and that a similar discrepancy exists for almost all published genomes....

  1. Statistical Viewer: a tool to upload and integrate linkage and association data as plots displayed within the Ensembl genome browser

    Directory of Open Access Journals (Sweden)

    Hauser Elizabeth R

    2005-04-01

    Background: To facilitate efficient selection and prioritization of candidate complex disease susceptibility genes for association analysis, increasingly comprehensive annotation tools are essential to integrate, visualize and analyze vast quantities of disparate data generated by genomic screens, public human genome sequence annotation and ancillary biological databases. We have developed a plug-in package for Ensembl called "Statistical Viewer" that facilitates the analysis of genomic features and annotation in the regions of interest defined by linkage analysis. Results: Statistical Viewer is an add-on package to the open-source Ensembl Genome Browser and Annotation System that displays disease study-specific linkage and/or association data as two-dimensional plots in new panels in the context of Ensembl's Contig View and Cyto View pages. An enhanced upload server facilitates the upload of statistical data, as well as additional feature annotation to be displayed in DAS tracks, in the form of Excel files. The Statistical View panel, drawn directly under the ideogram, illustrates lod score values for markers from a study of interest, plotted against their position in base pairs. A module called "Get Map" easily converts the genetic locations of markers to genomic coordinates. The graph placed under the corresponding ideogram features a synchronized vertical sliding selection box that is seamlessly integrated into Ensembl's Contig View and Cyto View pages to choose the region to be displayed in Ensembl's "Overview" and "Detailed View" panels. To resolve association and fine-mapping data plots, a "Detailed Statistic View" plot corresponding to the "Detailed View" may be displayed underneath. Conclusion: Features mapping to regions of linkage are accentuated when Statistic View is used in conjunction with the Distributed Annotation System (DAS) to display supplemental laboratory information such as differentially expressed disease

  2. OxyGene: an innovative platform for investigating oxidative-response genes in whole prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Barloy-Hubler Frédérique

    2008-12-01

    Background: Oxidative stress is a common stress encountered by living organisms and is due to an imbalance between intracellular reactive oxygen and nitrogen species (ROS, RNS) and cellular antioxidant defence. To defend themselves against ROS/RNS, bacteria possess a subsystem of detoxification enzymes, which are classified with regard to their substrates. To identify such enzymes in prokaryotic genomes, different approaches based on similarity, enzyme profiles or patterns exist. Unfortunately, several problems persist in the annotation, classification and naming of these enzymes, due mainly to erroneous database entries, error propagation, lack of updating and disparities in function description. Description: In order to improve the current annotation of oxidative stress subsystems, an innovative platform named OxyGene has been developed. It integrates an original database called OxyDB, holding thoroughly tested anchor-based signatures associated with subfamilies of oxidative stress enzymes, and a new anchor-driven annotator for ab initio detection of ROS/RNS response genes. All complete bacterial and archaeal genomes have been re-annotated, and the results stored in the OxyGene repository can be interrogated via a graphical user interface. Conclusion: OxyGene enables the exploration and comparative analysis of enzymes belonging to 37 detoxification subclasses in 664 microbial genomes. It proposes a new classification that improves both the ontology and the annotation of the detoxification subsystems in prokaryotic whole genomes, while discovering new ORFs and attributing precise functions to hypothetical annotated proteins. OxyGene is freely available at: http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software

  3. Population genomics of the Arabidopsis thaliana flowering time gene network.

    Science.gov (United States)

    Flowers, Jonathan M; Hanzawa, Yoshie; Hall, Megan C; Moore, Richard C; Purugganan, Michael D

    2009-11-01

    The time to flowering is a key component of the life-history strategy of the model plant Arabidopsis thaliana that varies quantitatively among genotypes. A significant problem for evolutionary and ecological genetics is to understand how natural selection may operate on this ecologically significant trait. Here, we conduct a population genomic study of resequencing data from 52 genes in the flowering time network. McDonald-Kreitman tests of neutrality suggested a strong excess of amino acid polymorphism when pooling across loci. This excess of replacement polymorphism across the flowering time network and a skewed derived frequency spectrum toward rare alleles for both replacement and noncoding polymorphisms relative to synonymous changes is consistent with a large class of deleterious polymorphisms segregating in these genes. Assuming selective neutrality of synonymous changes, we estimate that approximately 30% of amino acid polymorphisms are deleterious. Evidence of adaptive substitution is less prominent in our analysis. The photoperiod regulatory gene, CO, and a gibberellic acid transcription factor, AtMYB33, show evidence of adaptive fixation of amino acid mutations. A test for extended haplotypes revealed no examples of flowering time alleles with haplotypes comparable in length to those associated with the null fri(Col) allele reported previously. This suggests that the FRI gene likely has a uniquely intense or recent history of selection among the flowering time genes considered here. Although there is some evidence for adaptive evolution in these life-history genes, it appears that slightly deleterious polymorphisms are a major component of natural molecular variation in the flowering time network of A. thaliana.
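
    The McDonald-Kreitman comparison referred to in this abstract can be illustrated with a small calculation. The sketch below is not the authors' analysis: it uses the standard 2x2 table of fixed differences and polymorphisms, the function name is hypothetical, and the counts are invented purely for illustration.

```python
def mcdonald_kreitman(dn, ds, pn, ps):
    """Summarise a 2x2 McDonald-Kreitman table.

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within the species
    Returns (neutrality_index, alpha).
    """
    if 0 in (dn, ds, pn, ps):
        raise ValueError("all four counts must be non-zero")
    # Neutrality index: ratio of the polymorphism ratio to the divergence ratio.
    neutrality_index = (pn / ps) / (dn / ds)
    # alpha: estimated proportion of adaptive amino acid substitutions.
    alpha = 1.0 - neutrality_index
    return neutrality_index, alpha


# Hypothetical counts, not taken from the study.
ni, alpha = mcdonald_kreitman(dn=20, ds=40, pn=30, ps=30)
print(ni, alpha)
```

    A neutrality index above 1 (and a negative alpha) corresponds to the excess of amino acid polymorphism described above.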

  4. Constraints on genome dynamics revealed from gene distribution among the Ralstonia solanacearum species.

    Directory of Open Access Journals (Sweden)

    Pierre Lefeuvre

    Because it is suspected that gene content may partly explain host adaptation and ecology of pathogenic bacteria, it is important to study factors affecting genome composition and its evolution. While recent genomic advances have revealed extremely large pan-genomes for some bacterial species, it remains difficult to predict to what extent the gene pool is accessible within or transferable between populations. As genomes bear imprints of the history of the organisms, gene distribution pattern analyses should provide insights into the forces and factors at play in the shaping and maintaining of bacterial genomes. In this study, we revisited the data obtained from a previous CGH microarray analysis in order to assess the genomic plasticity of the R. solanacearum species complex. Gene distribution analyses demonstrated the remarkably dispersed genome of R. solanacearum, with more than half of the genes being accessory. From the reconstruction of the ancestral genome compositions, we were able to infer the number of gene gain and loss events along the phylogeny. Analyses of gene movement patterns reveal that factors associated with gene function, genomic localization and ecology delineate gene flow patterns. While the chromosome displayed lower rates of movement, the megaplasmid was clearly associated with hot-spots of gene gain and loss. Gene function was also confirmed to be an essential factor in gene gain and loss dynamics, with significant differences in movement patterns between different COG categories. Finally, analyses of gene distribution highlighted possible highways of horizontal gene transfer. Due to sampling and design bias, we can only speculate on factors at play in this gene movement dynamic. Further studies examining the precise conditions that favor gene transfer would provide invaluable insights into the fate of bacteria, species delineation and the emergence of successful pathogens.

  5. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  6. Complete genome sequence of Brachyspira intermedia reveals unique genomic features in Brachyspira species and phage-mediated horizontal gene transfer

    Science.gov (United States)

    2011-01-01

    Background: Brachyspira spp. colonize the intestines of some mammalian and avian species and show different degrees of enteropathogenicity. Brachyspira intermedia can cause production losses in chickens and strain PWS/AT now becomes the fourth genome to be completed in the genus Brachyspira. Results: 15 classes of unique and shared genes were analyzed in B. intermedia, B. murdochii, B. hyodysenteriae and B. pilosicoli. The largest number of unique genes was found in B. intermedia and B. murdochii. This indicates the presence of larger pan-genomes. In general, hypothetical protein annotations are overrepresented among the unique genes. A 3.2 kb plasmid was found in B. intermedia strain PWS/AT. The plasmid was also present in the B. murdochii strain but not in nine other Brachyspira isolates. Within the Brachyspira genomes, genes had been translocated and also frequently switched between leading and lagging strands, a process that can be followed by different AT-skews in the third positions of synonymous codons. We also found evidence that bacteriophages were being remodeled and genes incorporated into them. Conclusions: The accessory gene pool shapes species-specific traits. It is also influenced by reductive genome evolution and horizontal gene transfer. Gene-transfer events can cross both species and genus boundaries and bacteriophages appear to play an important role in this process. A mechanism for horizontal gene transfer appears to be gene translocations leading to remodeling of bacteriophages in combination with broad tropism. PMID:21816042
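
    The AT-skew mentioned in this abstract is conventionally computed as (A - T) / (A + T). The following is a minimal sketch, assuming an in-frame coding sequence. As a simplification that is not taken from the paper, it counts every third codon position rather than only those of synonymous codons; the function name and the example sequence are hypothetical.

```python
def at_skew_third_positions(cds):
    """Return (A - T) / (A + T) over the third codon positions of `cds`.

    `cds` is assumed to start in frame; trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop an incomplete final codon
    thirds = cds[2:usable:3]           # every third base: codon position 3
    a = thirds.count("A")
    t = thirds.count("T")
    if a + t == 0:
        return 0.0                     # no A or T observed at third positions
    return (a - t) / (a + t)


# Codons GCA GCT GCA GCA have third positions A, T, A, A.
print(at_skew_third_positions("GCAGCTGCAGCA"))
```

    Comparing this value between genes on the leading and lagging strands is one way a strand switch could show up, under the simplifying assumptions stated above.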

  7. Tools to kill: Genome of one of the most destructive plant pathogenic fungi Macrophomina phaseolina

    Directory of Open Access Journals (Sweden)

    Islam Md

    2012-09-01

    Background: Macrophomina phaseolina is one of the most destructive necrotrophic fungal pathogens that infect more than 500 plant species throughout the world. It can grow rapidly in infected plants and subsequently produces a large amount of sclerotia that plugs the vessels, resulting in wilting of the plant. Results: We sequenced and assembled ~49 Mb into 15 super-scaffolds covering 92.83% of the M. phaseolina genome. We predict 14,249 open reading frames (ORFs), of which 9,934 are validated by the transcriptome. This phytopathogen has an abundance of secreted oxidases, peroxidases, and hydrolytic enzymes for degrading cell wall polysaccharides and lignocelluloses to penetrate into the host tissue. To overcome the host plant defense response, M. phaseolina encodes a significant number of P450s, MFS type membrane transporters, glycosidases, transposases, and secondary metabolites in comparison to all sequenced ascomycete species. A strikingly distinct set of carbohydrate esterases (CE) are present in M. phaseolina, with the CE9 and CE10 families remarkably higher than any other fungi. The phenotypic microarray data indicates that M. phaseolina can adapt to a wide range of osmotic and pH environments. As a broad host range pathogen, M. phaseolina possesses a large number of pathogen-host interaction genes including those for adhesion, signal transduction, cell wall breakdown, purine biosynthesis, and potent mycotoxin patulin. Conclusions: The M. phaseolina genome provides a framework of the infection process at the cytological and molecular level which uses a diverse arsenal of enzymatic and toxin tools to destroy the host plants. Further understanding of the M. phaseolina genome-based plant-pathogen interactions will be instrumental in designing rational strategies for disease control, essential to ensuring global agricultural crop production and security.

  8. Tools to kill: genome of one of the most destructive plant pathogenic fungi Macrophomina phaseolina.

    Science.gov (United States)

    Islam, Md Shahidul; Haque, Md Samiul; Islam, Mohammad Moinul; Emdad, Emdadul Mannan; Halim, Abdul; Hossen, Quazi Md Mosaddeque; Hossain, Md Zakir; Ahmed, Borhan; Rahim, Sifatur; Rahman, Md Sharifur; Alam, Md Monjurul; Hou, Shaobin; Wan, Xuehua; Saito, Jennifer A; Alam, Maqsudul

    2012-09-19

    Macrophomina phaseolina is one of the most destructive necrotrophic fungal pathogens that infect more than 500 plant species throughout the world. It can grow rapidly in infected plants and subsequently produces a large amount of sclerotia that plugs the vessels, resulting in wilting of the plant. We sequenced and assembled ~49 Mb into 15 super-scaffolds covering 92.83% of the M. phaseolina genome. We predict 14,249 open reading frames (ORFs) of which 9,934 are validated by the transcriptome. This phytopathogen has an abundance of secreted oxidases, peroxidases, and hydrolytic enzymes for degrading cell wall polysaccharides and lignocelluloses to penetrate into the host tissue. To overcome the host plant defense response, M. phaseolina encodes a significant number of P450s, MFS type membrane transporters, glycosidases, transposases, and secondary metabolites in comparison to all sequenced ascomycete species. A strikingly distinct set of carbohydrate esterases (CE) are present in M. phaseolina, with the CE9 and CE10 families remarkably higher than any other fungi. The phenotypic microarray data indicates that M. phaseolina can adapt to a wide range of osmotic and pH environments. As a broad host range pathogen, M. phaseolina possesses a large number of pathogen-host interaction genes including those for adhesion, signal transduction, cell wall breakdown, purine biosynthesis, and potent mycotoxin patulin. The M. phaseolina genome provides a framework of the infection process at the cytological and molecular level which uses a diverse arsenal of enzymatic and toxin tools to destroy the host plants. Further understanding of the M. phaseolina genome-based plant-pathogen interactions will be instrumental in designing rational strategies for disease control, essential to ensuring global agricultural crop production and security.

  9. Genes on B chromosomes: old questions revisited with new tools.

    Science.gov (United States)

    Banaei-Moghaddam, Ali M; Martis, Mihaela M; Macas, Jiří; Gundlach, Heidrun; Himmelbach, Axel; Altschmied, Lothar; Mayer, Klaus F X; Houben, Andreas

    2015-01-01

    B chromosomes are supernumerary dispensable parts of the karyotype which appear in some individuals of some populations in some species. Often, they have been considered as 'junk DNA' or genomic parasites without functional genes. Due to recent advances in sequencing technologies, it has become possible to investigate their DNA composition, transcriptional activity and effects on the host transcriptome profile in detail. Here, we review the most recent findings regarding the gene content of B chromosomes and their transcriptional activities and discuss these findings in the context of comparable biological phenomena, like sex chromosomes, aneuploidy and pseudogenes. Recent data suggest that B chromosomes carry transcriptionally active genic sequences which could affect the transcriptome profile of their host genome. These findings are gradually changing our view that B chromosomes are solely genetically inert selfish elements without any functional genes. On the one hand, this could partly explain the deleterious effects associated with their presence. On the other hand, it makes B chromosomes a useful model for studying regulatory mechanisms of duplicated genes and their evolutionary consequences.

  10. Analysis of the Genome and Chromium Metabolism-Related Genes of Serratia sp. S2.

    Science.gov (United States)

    Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong

    2018-05-01

    This study investigates the genome sequence of Serratia sp. S2. The genomic DNA of Serratia sp. S2 was extracted and a sequencing library was constructed. Sequencing was carried out by Illumina 2000 and complete genomic sequences were obtained. Gene function annotation and bioinformatics analysis were performed by comparison with known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. There were 5373 protein-coding genes, of which 3732, 3614, and 3942 were annotated in the GO, KEGG, and COG databases, respectively. There were 12 genes related to chromium metabolism in the Serratia sp. S2 genome. The whole genome sequence of Serratia sp. S2 has been submitted to the GenBank database under accession number LNRP00000000. Our findings may provide a theoretical basis for the subsequent development of new biotechnology to remediate environmental chromium pollution.

  11. Gene loss and horizontal gene transfer contributed to the genome evolution of the extreme acidophile Ferrovum

    Directory of Open Access Journals (Sweden)

    Sophie Roxana Ullrich

    2016-05-01

    Acid mine drainage (AMD), associated with active and abandoned mining sites, is a habitat for acidophilic microorganisms that gain energy from the oxidation of reduced sulfur compounds and ferrous iron and that thrive at pH below 4. Members of the recently proposed genus Ferrovum are the first acidophilic iron oxidizers to be described within the Betaproteobacteria. Although they have been detected as typical community members in AMD habitats worldwide, knowledge of their phylogenetic and metabolic diversity is scarce. Genomics approaches appear to be most promising in addressing this lacuna since isolation and cultivation of Ferrovum has proven to be extremely difficult and has so far only been successful for the designated type strain Ferrovum myxofaciens P3G. In this study, the genomes of two novel strains of Ferrovum (PN-J185 and Z-31) derived from water samples of a mine water treatment plant were sequenced. These genomes were compared with those of Ferrovum sp. JA12, which also originated from the mine water treatment plant, and of the type strain (P3G). Phylogenomic scrutiny suggests that the four strains represent three Ferrovum species that cluster in two groups (1 and 2). Comprehensive analysis of their predicted metabolic pathways revealed that these groups harbor characteristic metabolic profiles, notably with respect to motility, chemotaxis, nitrogen metabolism, biofilm formation and their potential strategies to cope with the acidic environment. For example, while the F. myxofaciens strains (group 1) appear to be motile and diazotrophic, the non-motile group 2 strains have the predicted potential to use a greater variety of fixed nitrogen sources. Furthermore, analysis of their genome synteny provides first insights into their genome evolution, suggesting that horizontal gene transfer and genome reduction in the group 2 strains by loss of genes encoding complete metabolic pathways or physiological features contributed to the observed

  12. Genetics and Genomics: Discovery, Validation, and Utility of Novel Tools for management of Prostate Cancer

    Directory of Open Access Journals (Sweden)

    Alan W. Shindel

    2017-01-01

    Genomics is the science of how genes influence human health and disease states. It differs from traditional genetic screening in that the transcriptional activity (or other markers) in full panels of related genes is studied. Compared to simple genetic testing, assessment of expression levels in a panel of genes provides a more nuanced and holistic understanding of genetic modulation of human disease. Genomic testing may be used to great effect in resolving controversial questions on detection and treatment of prostate cancer. Genomic tests are currently in use for numerous facets of prostate cancer care, including screening, biopsy, and treatment planning. The clinical validity (predictive capacity) of these assays has been well established; studies on clinical utility (i.e. usefulness of these tests in guiding patient/provider decisions) have shown promising results. Men’s health specialists should be familiar with the role genomic testing will play in contemporary management of prostate cancer.

  13. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Science.gov (United States)

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison with the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
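
    The idea of using identified peptides to discriminate among candidate gene models at one locus can be sketched as follows. This is an illustration only, not the scoring used in the study: "best" is simply taken to mean the model whose predicted protein contains the most distinct peptides, and the function name, model identifiers and sequences are all hypothetical.

```python
def best_supported_model(models, peptides):
    """Pick the candidate gene model explaining the most identified peptides.

    models:   dict mapping a model identifier to its predicted protein sequence
    peptides: iterable of identified peptide sequences
    Returns (model_id, sorted list of peptides found in that model).
    Ties are broken by model identifier so the result is deterministic.
    """
    unique_peptides = set(peptides)
    support = {
        model_id: sorted(p for p in unique_peptides if p in protein)
        for model_id, protein in models.items()
    }
    # max() keeps the first maximal element, so iterate identifiers in sorted order.
    best = max(sorted(support), key=lambda model_id: len(support[model_id]))
    return best, support[best]


# Two hypothetical models at one locus and two hypothetical peptides.
models = {
    "model_a": "MKTAYIAKQRQISFVK",
    "model_b": "MKTAYIAKQRGGSLLNDER",
}
print(best_supported_model(models, ["TAYIAK", "GGSLLNDER"]))
```

    A locus where the annotators' chosen model is not the one returned here would be flagged for review in this simplified scheme.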

  14. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, involves difficulty in setting an overall p-value because of the complicated correlation structure, namely the multiple testing problem, which causes unacceptable false negative results. The number of SNP-SNP pairs being larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed. Results: We adopt an up-to-date method for ultrahigh-dimensional variable selection, termed sure independence screening (SIS), to handle the large number of SNP-SNP interactions appropriately by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by a variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
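
    The screening step of sure independence screening, ranking each candidate predictor by a marginal measure of association and keeping only the top few, can be sketched in a few lines. This is not the EPISIS implementation: the product coding of each SNP-SNP pair and the use of absolute Pearson correlation as the marginal score are assumptions made for illustration (the abstract describes dummy codings and logistic regression), and the function names and toy genotypes are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_interactions(genotypes, status, keep):
    """Rank all SNP-SNP pairs by marginal association with case/control status.

    genotypes: dict mapping SNP name to a list of 0/1/2 codes, one per individual
    status:    list of 0/1 case-control labels, one per individual
    keep:      number of top-ranked pairs to retain
    """
    scored = []
    for a, b in combinations(sorted(genotypes), 2):
        # Assumed coding: the interaction term is the product of the two genotype codes.
        product = [u * v for u, v in zip(genotypes[a], genotypes[b])]
        scored.append((abs(pearson(product, status)), (a, b)))
    # Highest score first; ties broken by pair name for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pair for _, pair in scored[:keep]]


# Toy data for eight individuals (hypothetical).
genotypes = {
    "rs1": [0, 1, 2, 1, 0, 2, 1, 0],
    "rs2": [0, 1, 1, 2, 0, 2, 0, 1],
    "rs3": [1, 0, 1, 0, 1, 0, 1, 0],
}
status = [0, 1, 1, 1, 0, 1, 0, 0]
print(screen_interactions(genotypes, status, keep=2))
```

    In a full analysis, the retained pairs would then enter a joint model for variable selection. The exhaustive pairwise loop is the part that grows quadratically with the number of SNPs, which is the step the abstract reports accelerating on GPUs.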

  15. Comparing genomes: databases and computational tools for comparative analysis of prokaryotic genomes - DOI: 10.3395/reciis.v1i2.Sup.105en

    Directory of Open Access Journals (Sweden)

    Marcos Catanho

    2007-12-01

    Since the 1990s, the complete genetic codes of more than 600 living organisms have been deciphered, including bacteria, yeasts, protozoan parasites, invertebrates and vertebrates (among them Homo sapiens), and plants. More than 2,000 other genome projects representing medical, commercial, environmental and industrial interests, or comprising model organisms important for the development of scientific research, are currently in progress. The achievement of complete genome sequences of numerous species, combined with the tremendous progress in computation that occurred in the last few decades, allowed the use of new holistic approaches in the study of genome structure, organization and evolution, as well as in the field of gene prediction and functional classification. Numerous public or proprietary databases and computational tools have been created attempting to optimize the access to this information through the web. In this review, we present the main resources available through the web for comparative analysis of prokaryotic genomes. We concentrated on the group of mycobacteria that contains important human and animal pathogens. The birth of Bioinformatics and Computational Biology and the contributions of these disciplines to the scientific development of this field are also discussed.

  16. Mojo Hand, a TALEN design tool for genome editing applications

    Directory of Open Access Journals (Sweden)

    Neff Kevin L

    2013-01-01

    Background: Recent studies of transcription activator-like (TAL) effector domains fused to nucleases (TALENs) demonstrate enormous potential for genome editing. Effective design of TALENs requires a combination of selecting appropriate genetic features, finding pairs of binding sites based on a consensus sequence, and, in some cases, identifying endogenous restriction sites for downstream molecular genetic applications. Results: We present the web-based program Mojo Hand for designing TAL and TALEN constructs for genome editing applications (http://www.talendesign.org). We describe the algorithm and its implementation. The features of Mojo Hand include (1) automatic download of genomic data from the National Center for Biotechnology Information, (2) analysis of any DNA sequence to reveal pairs of binding sites based on a user-defined template, (3) selection of restriction-enzyme recognition sites in the spacer between the TAL monomer binding sites including options for the selection of restriction enzyme suppliers, and (4) output files designed for subsequent TALEN construction using the Golden Gate assembly method. Conclusions: Mojo Hand enables the rapid identification of TAL binding sites for use in TALEN design. The assembly of TALEN constructs is also simplified by using the TAL-site prediction program in conjunction with a spreadsheet management aid of reagent concentrations and TALEN formulation. Mojo Hand enables scientists to more rapidly deploy TALENs for genome editing applications.

  17. Development of genomic tools for verification of hybrids and selfed ...

    African Journals Online (AJOL)

    The petiole color trait was also used to verify TMS 96/1089A X TME117 where the pink color of the male parent was dominant over the female's green color. The pace of genomic analysis of populations used in the study was enhanced using a modified, quicker DNA isolation protocol which slashed extraction time by 60%.

  18. Mojo Hand, a TALEN design tool for genome editing applications.

    Science.gov (United States)

    Neff, Kevin L; Argue, David P; Ma, Alvin C; Lee, Han B; Clark, Karl J; Ekker, Stephen C

    2013-01-16

    Recent studies of transcription activator-like (TAL) effector domains fused to nucleases (TALENs) demonstrate enormous potential for genome editing. Effective design of TALENs requires a combination of selecting appropriate genetic features, finding pairs of binding sites based on a consensus sequence, and, in some cases, identifying endogenous restriction sites for downstream molecular genetic applications. We present the web-based program Mojo Hand for designing TAL and TALEN constructs for genome editing applications (http://www.talendesign.org). We describe the algorithm and its implementation. The features of Mojo Hand include (1) automatic download of genomic data from the National Center for Biotechnology Information, (2) analysis of any DNA sequence to reveal pairs of binding sites based on a user-defined template, (3) selection of restriction-enzyme recognition sites in the spacer between the TAL monomer binding sites including options for the selection of restriction enzyme suppliers, and (4) output files designed for subsequent TALEN construction using the Golden Gate assembly method. Mojo Hand enables the rapid identification of TAL binding sites for use in TALEN design. The assembly of TALEN constructs is also simplified by using the TAL-site prediction program in conjunction with a spreadsheet management aid of reagent concentrations and TALEN formulation. Mojo Hand enables scientists to more rapidly deploy TALENs for genome editing applications.
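The pairing of binding sites around a spacer, with a check for restriction sites inside that spacer, can be illustrated with a minimal Python sketch. It is not Mojo Hand's algorithm, which the abstract does not spell out. The function name `find_site_pairs`, the default site and spacer lengths, and the template rule (left site preceded by a 5' T, right site followed by an A on the top strand) are assumptions chosen for illustration; the two recognition sequences are the standard EcoRI and BamHI sites.

```python
def find_site_pairs(seq, site_len=15, spacer_min=14, spacer_max=18, enzymes=None):
    """Hypothetical helper: list candidate TAL binding-site pairs in `seq`.

    Returns tuples (left_start, spacer, right_end, enzymes_in_spacer).
    The template used here is an assumption, not Mojo Hand's own rule set.
    """
    seq = seq.upper()
    if enzymes is None:
        enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] != "T":  # assumed rule: left site is preceded by a 5' T
            continue
        left_start = i + 1
        left_end = left_start + site_len
        for spacer_len in range(spacer_min, spacer_max + 1):
            right_start = left_end + spacer_len
            right_end = right_start + site_len
            if right_end >= len(seq):
                break  # longer spacers would run off the sequence
            if seq[right_end] != "A":  # assumed rule: 5' T on the bottom strand
                continue
            spacer = seq[left_end:right_start]
            # Report which recognition sequences occur inside the spacer.
            found = [name for name, site in enzymes.items() if site in spacer]
            hits.append((left_start, spacer, right_end, found))
    return hits


if __name__ == "__main__":
    demo = "T" + "C" * 15 + "ACGTGAATTCACGT" + "G" * 15 + "A"
    for hit in find_site_pairs(demo):
        print(hit)
```

A real design tool would also score the repeat-variable di-residue composition of each site; the sketch only covers the geometric search described in features (2) and (3).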

  19. MUTAGEN: Multi-user tool for annotating GENomes

    DEFF Research Database (Denmark)

    Brugger, K.; Redder, P.; Skovgaard, Marie

    2003-01-01

    MUTAGEN is a free prokaryotic annotation system. It offers the advantages of genome comparison, graphical sequence browsers, search facilities and open-source for user-specific adjustments. The web-interface allows several users to access the system from standard desktop computers. The Sulfolobus...

  20. Powerful tools for genetic modification: Advances in gene editing.

    Science.gov (United States)

    Roesch, Erica A; Drumm, Mitchell L

    2017-11-01

    Recent discoveries and technical advances in genetic engineering, methods called gene or genome editing, provide hope for repairing genes that cause diseases like cystic fibrosis (CF) or otherwise altering a gene for therapeutic benefit. There are both hopes and hurdles with these technologies, with new ideas emerging almost daily. Initial studies using intestinal organoid cultures carrying the common F508del mutation have shown that gene editing by CRISPR/Cas9 can convert cells lacking CFTR function to cells with normal channel function, providing a precedent that this technology can be harnessed for CF. While this is an important precedent, the challenges that remain are not trivial. A logistical issue for this and many other genetic diseases is genetic heterogeneity. Approximately 2000 mutations associated with CF have been found in CFTR, the gene responsible for CF, and thus a feasible strategy that would encompass all individuals affected by the disease is particularly difficult to envision. However, single strategies that would be applicable to all subjects affected by CF have been conceived and are being investigated. With all of these approaches, efficiency (the proportion of cells edited), accuracy (how often other sites in the genome are affected), and delivery of the gene editing components to the desired cells are perhaps the most significant, impending hurdles. Our understanding of each of these areas is increasing rapidly, and while it is impossible to predict when a successful strategy will reach the clinic, there is every reason to believe it is a question of "when" and not "if." © 2017 Wiley Periodicals, Inc.

  1. Mapping Determinants of Gene Expression Plasticity by Genetical Genomics in C. elegans

    NARCIS (Netherlands)

    Li, Y.; Alda Alvarez, O.; Gutteling, E.W.; Tijsterman, M.; Fu, J.; Riksen, J.A.G.; Hazendonk, E.; Prins, J.C.P.; Plasterk, R.H.A.; Jansen, R.C.; Breitling, R.; Kammenga, J.E.

    2006-01-01

    Recent genetical genomics studies have provided intimate views on gene regulatory networks. Gene expression variations between genetically different individuals have been mapped to the causal regulatory regions, termed expression quantitative trait loci. Whether the environment-induced plastic

  2. Mapping determinants of gene expression plasticity by genetical genomics in C. elegans.

    NARCIS (Netherlands)

    Li, Y.; Alvarez, O.A.; Gutteling, E.W.; Tijsterman, M.; Fu, J.; Riksen, J.A.; Hazendonk, M.G.A.; Prins, P.; Plasterk, R.H.A.; Jansen, R.C.; Breitling, R.; Kammenga, J.E.

    2006-01-01

    Recent genetical genomics studies have provided intimate views on gene regulatory networks. Gene expression variations between genetically different individuals have been mapped to the causal regulatory regions, termed expression quantitative trait loci. Whether the environment-induced plastic

  3. Genome-wide analysis of regions similar to promoters of histone genes

    KAUST Repository

    Chowdhary, Rajesh; Bajic, Vladimir B.; Dong, Difeng; Wong, Limsoon; Liu, Jun S

    2010-01-01

    of histone and histone-coregulated gene transcription initiation. While these hypotheses still remain to be verified, we believe that these form a useful resource for researchers to further explore regulation of human histone genes and human genome

  4. MicroScope-an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data.

    Science.gov (United States)

    Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David

    2017-09-12

    The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way that provides practical guidance and help to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original, recently developed method (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to maintaining accurate resources. © The Author 2017. Published by Oxford University Press.

  5. Public data and open source tools for multi-assay genomic investigation of disease.

    Science.gov (United States)

    Kannan, Lavanya; Ramos, Marcel; Re, Angela; El-Hachem, Nehme; Safikhani, Zhaleh; Gendoo, Deena M A; Davis, Sean; Gomez-Cabrero, David; Castelo, Robert; Hansen, Kasper D; Carey, Vincent J; Morgan, Martin; Culhane, Aedín C; Haibe-Kains, Benjamin; Waldron, Levi

    2016-07-01

    Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays, has the potential to provide a systems level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics and regulatory genomics and other areas, and tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods. © The Author 2015. Published by Oxford University Press.

  6. A Simulation Tool for the Study of Symmetric Inversions in Bacterial Genomes

    Science.gov (United States)

    Dias, Ulisses; Dias, Zanoni; Setubal, João C.

    We present the tool SIB that simulates genomic inversions in bacterial chromosomes. The tool simulates symmetric inversions but allows the appearance of nonsymmetric inversions by simulating small syntenic blocks frequently observed on bacterial genome comparisons. We evaluate SIB by comparing its results to real genome alignments. We develop measures that allow quantitative comparisons between real pairwise alignments (in terms of dotplots) and simulated ones. These measures allow an evaluation of SIB in terms of dendrograms. We evaluate SIB by comparing its results to whole chromosome alignments and maximum likelihood trees for three bacterial groups (the Pseudomonadaceae family and the Xanthomonas and Shewanella genera). We demonstrate an application of SIB by using it to evaluate the ancestral genome reconstruction tool MGR.

  7. Genome-wide study of correlations between genomic features and their relationship with the regulation of gene expression.

    Science.gov (United States)

    Kravatsky, Yuri V; Chechetkin, Vladimir R; Tchurikov, Nikolai A; Kravatskaya, Galina I

    2015-02-01

    The broad class of tasks in genetics and epigenetics can be reduced to the study of various features that are distributed over the genome (genome tracks). The rapid and efficient processing of the huge amount of data stored in the genome-scale databases cannot be achieved without the software packages based on the analytical criteria. However, strong inhomogeneity of genome tracks hampers the development of relevant statistics. We developed the criteria for the assessment of genome track inhomogeneity and correlations between two genome tracks. We also developed a software package, Genome Track Analyzer, based on this theory. The theory and software were tested on simulated data and were applied to the study of correlations between CpG islands and transcription start sites in the Homo sapiens genome, between profiles of protein-binding sites in chromosomes of Drosophila melanogaster, and between DNA double-strand breaks and histone marks in the H. sapiens genome. Significant correlations between transcription start sites on the forward and the reverse strands were observed in genomes of D. melanogaster, Caenorhabditis elegans, Mus musculus, H. sapiens, and Danio rerio. The observed correlations may be related to the regulation of gene expression in eukaryotes. Genome Track Analyzer is freely available at http://ancorr.eimb.ru/. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  8. Genomic Characterization of Phenylalanine Ammonia Lyase Gene in Buckwheat.

    Directory of Open Access Journals (Sweden)

    Karthikeyan Thiyagarajan

    Full Text Available The Phenylalanine Ammonia Lyase (PAL) gene, which plays a key role in the biosynthesis of the medicinally important compounds rutin and quercetin, was sequence characterized for its efficient genomics application. These compounds possess anti-diabetic and anti-cancer properties and are predominantly produced by Fagopyrum spp. In the present study, the PAL gene was sequenced from three Fagopyrum spp. (F. tataricum, F. esculentum and F. dibotrys) and showed the presence of three SNPs and four insertions/deletions at the intra- and interspecific level. Among them, the potential SNP (position 949 bp, G>C) with a Parsimony Informative Site was selected and successfully utilised to determine the zygosity/allelic variation of 16 F. tataricum varieties. Insertion mutations were identified in the coding region, which resulted in the change of a stretch of 39 amino acids in the putative protein. Our study revealed that the autogamous species (F. tataricum) has a lower frequency of observed SNPs than the allogamous species (F. dibotrys and F. esculentum). The identified SNPs in F. tataricum did not result in amino acid changes, while in the other two species they caused both conservative and non-conservative variations. The consistent pattern of SNPs across the species revealed their phylogenetic importance. We found two groups of F. tataricum, one of which was closely related to F. dibotrys. The sequence characterization information of the PAL gene reported in the present investigation can be utilized in the genetic improvement of buckwheat with reference to its medicinal value.

  9. CRISPR/Cas9-loxP-Mediated Gene Editing as a Novel Site-Specific Genetic Manipulation Tool.

    Science.gov (United States)

    Yang, Fayu; Liu, Changbao; Chen, Ding; Tu, Mengjun; Xie, Haihua; Sun, Huihui; Ge, Xianglian; Tang, Lianchao; Li, Jin; Zheng, Jiayong; Song, Zongming; Qu, Jia; Gu, Feng

    2017-06-16

    Cre-loxP, one of the site-specific genetic manipulation tools, offers a method to study the spatial and temporal regulation of gene expression/inactivation in order to decipher gene function. CRISPR/Cas9-mediated targeted genome engineering technologies are sparking a new revolution in biological research. Whether the traditional site-specific genetic manipulation tool and CRISPR/Cas9 could be combined to create a novel genetic tool for highly specific gene editing is not clear. Here, we successfully generated a CRISPR/Cas9-loxP system to perform gene editing in human cells, providing the first proof of principle that these two technologies can be used together. We also showed that CRISPR/Cas9-mediated editing of the target sequence yields distinct non-homologous end-joining (NHEJ) patterns at the level of plasmids (episomal) and chromosomes. Specifically, the CRISPR/Cas9-mediated NHEJ pattern in the nuclear genome favors deletions (64%-68% at the human AAVS1 locus versus 4%-28% in plasmid DNA). CRISPR/Cas9-loxP, a novel site-specific genetic manipulation tool, offers a platform for the dissection of gene function and molecular insights into DNA-repair pathways. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  10. Genome-wide target profiling of piggyBac and Tol2 in HEK 293: pros and cons for gene discovery and gene therapy

    Science.gov (United States)

    2011-01-01

    Background DNA transposons have emerged as indispensable tools for manipulating vertebrate genomes with applications ranging from insertional mutagenesis and transgenesis to gene therapy. To fully explore the potential of two highly active DNA transposons, piggyBac and Tol2, as mammalian genetic tools, we have conducted a side-by-side comparison of the two transposon systems in the same setting to evaluate their advantages and disadvantages for use in gene therapy and gene discovery. Results We have observed that (1) the Tol2 transposase (but not piggyBac) is highly sensitive to molecular engineering; (2) the piggyBac donor with only the 40 bp 3'- and 67 bp 5'-terminal repeat domains is sufficient for effective transposition; and (3) a small amount of piggyBac transposase results in robust transposition, suggesting that the piggyBac transposase is highly active. Performing genome-wide target profiling on data sets obtained by retrieving chromosomal targeting sequences from individual clones, we have identified several piggyBac and Tol2 hotspots and observed that (4) piggyBac and Tol2 display a clear difference in targeting preferences in the human genome. Finally, we have observed that (5) only sites with a particular sequence context can be targeted by either piggyBac or Tol2. Conclusions The non-overlapping targeting preference of piggyBac and Tol2 makes them complementary research tools for manipulating mammalian genomes. PiggyBac is the most promising transposon-based vector system for achieving site-specific targeting of therapeutic genes due to the flexibility of its transposase for being molecularly engineered. Insights from this study will provide a basis for engineering piggyBac transposases to achieve site-specific therapeutic gene targeting. PMID:21447194

  11. Genome Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio).

    Directory of Open Access Journals (Sweden)

    Chuanju Dong

    Full Text Available Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication. In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most of the aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that aqp genes in other teleosts have been lost during their evolutionary history because their functions were redundant and conserved. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences have been observed, revealing substantial expression divergences or functional divergences in those duplicated aqp genes after the latest WGD event. To some extent, the gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family provides an

  12. Endonucleases: new tools to edit the mouse genome

    NARCIS (Netherlands)

    Wijshake, Tobias; Baker, Darren J.; van de Sluis, Bart

    2014-01-01

    Mouse transgenesis has been instrumental in determining the function of genes in the pathophysiology of human diseases and modification of genes by homologous recombination in mouse embryonic stem cells remains a widely used technology. However, this approach harbors a number of disadvantages, as it

  13. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

    NARCIS (Netherlands)

    S.L. Macrae (Sheila L.); Q. Zhang (Quanwei); C. Lemetre (Christophe); I. Seim (Inge); R.B. Calder (Robert B.); J.H.J. Hoeijmakers (Jan); Y. Suh (Yousin); V.N. Gladyshev (Vadim N.); A. Seluanov (Andrei); V. Gorbunova (Vera); J. Vijg (Jan); Z.D. Zhang (Zhengdong D.)

    2015-01-01

    Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM

  14. Analysis of cassava (Manihot esculenta) ESTs: A tool for the discovery of genes

    International Nuclear Information System (INIS)

    Zapata, Andres; Neme, Rafik; Sanabria, Carolina; Lopez, Camilo

    2011-01-01

    Cassava (Manihot esculenta) is the main source of calories for more than 1,000 million people around the world and has been consolidated as the fourth most important crop after rice, corn and wheat. Cassava is considered tolerant to abiotic and biotic stress conditions; nevertheless, these characteristics are mainly present in non-commercial varieties. Genetic breeding strategies represent an alternative to introduce the desirable characteristics into commercial varieties. A fundamental step for accelerating the genetic breeding process in cassava is the identification of genes associated with these characteristics. One rapid strategy for the identification of genes is to have a large collection of ESTs (expressed sequence tags). In this study, a complete analysis of cassava ESTs was done. The cassava ESTs represent 80,459 sequences, which were assembled into a set of 29,231 unique genes (unigenes), comprising 10,945 contigs and 18,286 singletons. These 29,231 unique genes represent about 80% of the genes of the cassava genome. Between 5% and 10% of the cassava unigenes do not show similarity to any sequences present in the NCBI database and could be considered cassava-specific genes. A functional category was assigned to a group of sequences of the unigene set (29%) following the Gene Ontology vocabulary. The molecular function component was the best represented with 43% of the sequences, followed by the biological process component (38%) and finally the cellular component with 19%. In the cassava EST collection, 3,709 microsatellites were identified, and they could be used as molecular markers. This study represents an important contribution to the knowledge of the functional genomic structure of cassava and constitutes an important tool for the identification of genes associated with agricultural characteristics of interest that could be employed in cassava breeding programs.
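Microsatellite detection in EST sequences, as mentioned above, is commonly done by scanning for short tandem repeats. The following Python sketch shows one simple way to do that with regular expressions. The abstract does not say which tool or thresholds the authors used, so the function name `find_microsatellites` and the minimum repeat counts (6 for di-, 5 for tri-, 4 for tetranucleotide units) are assumptions for illustration only.

```python
import re


def find_microsatellites(seq, min_repeats=None):
    """Hypothetical helper: return (start, unit, repeat_count) for each SSR.

    `min_repeats` maps repeat-unit length to the minimum number of tandem
    copies; the defaults are illustrative, not taken from the cassava study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, count in min_repeats.items():
        # One unit followed by at least (count - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, count - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if len(set(unit)) == 1:  # skip homopolymer runs such as AAAAAA
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    est = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "T"
    print(find_microsatellites(est))
```

Because the regular expression reports the leftmost match, the repeat unit is named by the phase in which it is first encountered (for example `GCA` rather than `CAG` if a G precedes the run); a production tool would normalise units to a canonical rotation.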

  15. EVA: Exome Variation Analyzer, an efficient and versatile tool for filtering strategies in medical genomics

    Directory of Open Access Journals (Sweden)

    Coutant Sophie

    2012-09-01

    Full Text Available Abstract Background Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in medical genetics history, impacting both fundamental research and diagnostic methods leading to personalized medicine. A plethora of efficient algorithms has been developed to ensure variant discovery. They generally lead to ~20,000 variations that have to be narrowed down to find the potential pathogenic allelic variant(s) and the affected gene(s). For this purpose, commonly adopted procedures that involve various filtering strategies have emerged: exclusion of common variations, type of allelic variants, pathogenicity effect prediction, modes of inheritance and multiple individuals for exome comparison. To deal with the expansion of WES in individual medical genomics laboratories, new user-friendly and versatile software tools have to implement these filtering steps. Non-programmer biologists must be able to combine different filtering criteria themselves and pursue a personal strategy depending on their assumptions and study design. Results We describe EVA (Exome Variation Analyzer), a user-friendly web-interfaced software dedicated to the filtering strategies for medical WES. Thanks to different modules, EVA (i) integrates and stores annotated exome variation data as strictly confidential to the project owner, (ii) allows combining the main filters dealing with common variations, molecular types, inheritance mode and multiple samples, (iii) offers the browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and finally (iv) offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes. We report a demonstrative case study that allowed us to identify a new candidate gene
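The combination of filters listed in the abstract (common variations, molecular type, inheritance mode, multiple samples) can be sketched in a few lines of Python. This is not EVA's code or data model: the `Variant` class, its field names, the `filter_variants` function, the 1% frequency cut-off and the consequence labels are all hypothetical and serve only to show how such criteria compose.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative, not EVA's schema."""
    gene: str
    consequence: str        # e.g. "missense", "synonymous"
    population_freq: float  # allele frequency in a reference population
    genotypes: dict         # sample name -> "0/0", "0/1" or "1/1"


def filter_variants(variants, affected, unaffected, max_freq=0.01,
                    keep_consequences=("missense", "nonsense",
                                       "frameshift", "splice"),
                    inheritance="recessive"):
    """Keep rare, potentially damaging variants that fit the inheritance mode."""
    wanted = "1/1" if inheritance == "recessive" else "0/1"
    kept = []
    for v in variants:
        if v.population_freq > max_freq:            # exclude common variations
            continue
        if v.consequence not in keep_consequences:  # molecular type filter
            continue
        # Every affected individual must carry the expected genotype.
        if not all(v.genotypes.get(s) == wanted for s in affected):
            continue
        if inheritance == "recessive":
            # Unaffected relatives may be carriers but not homozygous.
            if any(v.genotypes.get(s) == "1/1" for s in unaffected):
                continue
        else:
            # Dominant model: unaffected relatives must not carry the allele.
            if any(v.genotypes.get(s) in ("0/1", "1/1") for s in unaffected):
                continue
        kept.append(v)
    return kept


if __name__ == "__main__":
    trio = {"child": "1/1", "mother": "0/1", "father": "0/1"}
    calls = [Variant("GENE1", "missense", 0.001, trio),
             Variant("GENE2", "synonymous", 0.001, trio),
             Variant("GENE3", "missense", 0.2, trio)]
    for v in filter_variants(calls, ["child"], ["mother", "father"]):
        print(v.gene)
```

Keeping each criterion as a separate early exit mirrors the idea that users switch individual filters on or off according to their study design.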

  16. Agrobacterium-mediated transformation as a tool for functional genomics in fungi

    NARCIS (Netherlands)

    Michielse, C.B.; Hooykaas, P.J.J.; Hondel, C.A.M.J.J. van den; Ram, A.F.J.

    2005-01-01

    In the era of functional genomics, the need for tools to perform large-scale targeted and random mutagenesis is increasing. A potential tool is Agrobacterium-mediated fungal transformation. A. tumefaciens is able to transfer a part of its DNA (transferred DNA; T-DNA) to a wide variety of fungi and

  17. Theories of Population Variation in Genes and Genomes

    DEFF Research Database (Denmark)

    Christiansen, Freddy

    This textbook provides an authoritative introduction to both classical and coalescent approaches to population genetics. Written for graduate students and advanced undergraduates by one of the world’s leading authorities in the field, the book focuses on the theoretical background of population...... genetics, while emphasizing the close interplay between theory and empiricism. Traditional topics such as genetic and phenotypic variation, mutation, migration, and linkage are covered and advanced by contemporary coalescent theory, which describes the genealogy of genes in a population, ultimately...... connecting them to a single common ancestor. Effects of selection, particularly genomic effects, are discussed with reference to molecular genetic variation. The book is designed for students of population genetics, bioinformatics, evolutionary biology, molecular evolution, and theoretical biology—as well...

  18. Genome wide analyses of metal responsive genes in Caenorhabditis elegans

    Directory of Open Access Journals (Sweden)

    Michael Aschner

    2012-04-01

    Full Text Available Metals are major contaminants that influence human health. Many metals have physiologic roles, but excessive levels can be harmful. Advances in technology have made toxicogenomic analyses possible to characterize the effects of metal exposure on the entire genome. Much of what is known about cellular responses to metals has come from mammalian systems; however, the use of non-mammalian species is gaining wider attention. Caenorhabditis elegans (C. elegans) is a small round worm whose genome has been fully sequenced and whose development from egg to adult is well characterized. It is an attractive model for high throughput screens due to its short lifespan, ease of genetic mutability, low cost and high homology with humans. Research performed in C. elegans has led to insights in apoptosis, gene expression and neurodegeneration, all of which can be altered by metal exposure. Additionally, by using worms one can potentially study the mechanisms that underlie differential responses to metals in nematodes and humans, allowing for identification of novel pathways and therapeutic targets. In this review, toxicogenomic studies performed in C. elegans exposed to various metals will be discussed, highlighting how this non-mammalian system can be utilized to study cellular processes and pathways induced by metals. Recent work focusing on neurodegeneration in Parkinson’s disease will be discussed as an example of the usefulness of genetic screens in C. elegans and the novel findings that can be produced.

  19. Genomes correction and assembling: present methods and tools

    Science.gov (United States)

    Wojcieszek, Michał; Pawełkowicz, Magdalena; Nowak, Robert; Przybecki, Zbigniew

    2014-11-01

    The recent rapid development of next generation sequencing (NGS) technologies has had a significant impact on the field of genomics, enabling many de novo sequencing projects for new species that were previously limited by technological costs. The advancement of NGS created a need to adapt assembly programs. New algorithms must cope with massive amounts of data within reasonable time limits, and processing power and hardware are also important factors. In this paper, we address the de novo genome assembly pipelines provided by programs presently available to scientists as both commercial and open-source software. The main focus of our discussion is the implementation of four different approaches (Greedy, Overlap-Layout-Consensus (OLC), De Bruijn and Integrated) and the resulting variation in performance, with additional insight into the correction of short and long reads.
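Of the four assembly approaches named above, the De Bruijn approach is the easiest to illustrate compactly. The Python sketch below builds a De Bruijn graph from reads and spells out a contig along an unambiguous path. It is a toy under stated assumptions: error-free reads, a single strand, and no coverage handling. The function names `de_bruijn` and `extend_contig` are hypothetical and do not come from any of the programs reviewed in the paper.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix node -> suffix node
    return graph


def extend_contig(graph, start):
    """Follow edges from `start` while the next node is unambiguous."""
    contig, node, seen = start, start, {start}
    while True:
        successors = graph.get(node, set())
        if len(successors) != 1:  # dead end or branch: stop extending
            break
        (nxt,) = successors
        if nxt in seen:           # avoid looping forever on repeats
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ACGTG", "GTGCA"]
    graph = de_bruijn(reads, k=3)
    print(extend_contig(graph, "AC"))
```

Real assemblers add read-error correction, reverse-complement handling and graph simplification on top of this core idea, which is where much of the performance variation between tools arises.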

  20. Meta genome-wide network from functional linkages of genes in human gut microbial ecosystems.

    Science.gov (United States)

    Ji, Yan; Shi, Yixiang; Wang, Chuan; Dai, Jianliang; Li, Yixue

    2013-03-01

    The human gut microbial ecosystem (HGME) exerts an important influence on human health. In recent research, meta-genomics has provided deep insights into the HGME in terms of gene content, metabolic processes and genome constitution of the meta-genome. Here we present a novel methodology to investigate the HGME on the basis of a set of functionally coupled genes, regardless of their genome origins, by considering the co-evolution properties of genes. By analyzing these coupled genes, we showed that some basic properties of the HGME are significantly associated with each other, and further constructed a protein interaction map of the human gut meta-genome to discover functional modules that may relate to essential metabolic processes. Compared with other studies, our method provides a new way to extract basic functional elements from meta-genome systems and to investigate complex microbial environments by associating their biological traits with the co-evolutionary fingerprints encoded in them.

  1. GeneDig: a web application for accessing genomic and bioinformatics knowledge.

    Science.gov (United States)

    Suciu, Radu M; Aydin, Emir; Chen, Brian E

    2015-02-28

    With the exponential increase and widespread availability of genomic, transcriptomic, and proteomic data, accessing these '-omics' data is becoming increasingly difficult. The current resources for accessing and analyzing these data have been created to perform highly specific functions intended for specialists, and thus typically emphasize functionality over user experience. We have developed a web-based application, GeneDig.org, that allows any general user access to genomic information with ease and efficiency. GeneDig allows for searching and browsing genes and genomes, while a dynamic navigator displays genomic, RNA, and protein information simultaneously for co-navigation. We demonstrate that our application allows more than five times faster and more efficient access to genomic information than any currently available method. We have developed GeneDig as a platform for bioinformatics integration with usability as its central design focus. This platform will introduce genomic navigation to broader audiences while aiding the bioinformatics analyses performed in everyday biology research.

  2. Hypothesis: Gene-rich plastid genomes in red algae may be an outcome of nuclear genome reduction.

    Science.gov (United States)

    Qiu, Huan; Lee, Jun Mo; Yoon, Hwan Su; Bhattacharya, Debashish

    2017-06-01

    Red algae (Rhodophyta) putatively diverged from the eukaryote tree of life >1.2 billion years ago and are the source of plastids in the ecologically important diatoms, haptophytes, and dinoflagellates. In general, red algae contain the largest plastid gene inventory among all such organelles derived from primary, secondary, or additional rounds of endosymbiosis. In contrast, their nuclear gene inventory is reduced when compared to their putative sister lineage, the Viridiplantae, and other photosynthetic lineages. The latter is thought to have resulted from a phase of genome reduction that occurred in the stem lineage of Rhodophyta. A recent comparative analysis of a taxonomically broad collection of red algal and Viridiplantae plastid genomes demonstrates that the red algal ancestor encoded ~1.5× more plastid genes than Viridiplantae. This difference is primarily explained by more extensive endosymbiotic gene transfer (EGT) in the stem lineage of Viridiplantae, when compared to red algae. We postulate that limited EGT in Rhodophytes resulted from the countervailing force of ancient, and likely recurrent, nuclear genome reduction. In other words, the propensity for nuclear gene loss led to the retention of red algal plastid genes that would otherwise have undergone intracellular gene transfer to the nucleus. This hypothesis recognizes the primacy of nuclear genome evolution over that of plastids, which have no inherent control of their gene inventory and can change dramatically (e.g., secondarily non-photosynthetic eukaryotes, dinoflagellates) in response to selection acting on the host lineage. © 2017 Phycological Society of America.

  3. Strategies used for genetically modifying bacterial genome: site-directed mutagenesis, gene inactivation, and gene over-expression

    Science.gov (United States)

    Xu, Jian-zhong; Zhang, Wei-guo

    2016-01-01

    With the availability of the whole genome sequence of Escherichia coli or Corynebacterium glutamicum, strategies for directed DNA manipulation have developed rapidly. DNA manipulation plays an important role in understanding the function of genes and in constructing novel engineering bacteria according to requirement. DNA manipulation involves modifying the autologous genes and expressing the heterogenous genes. Two alternative approaches, using electroporation linear DNA or recombinant suicide plasmid, allow a wide variety of DNA manipulation. However, the over-expression of the desired gene is generally executed via plasmid-mediation. The current review summarizes the common strategies used for genetically modifying E. coli and C. glutamicum genomes, and discusses the technical problem of multi-layered DNA manipulation. Strategies for gene over-expression via integrating into genome are proposed. This review is intended to be an accessible introduction to DNA manipulation within the bacterial genome for novices and a source of the latest experimental information for experienced investigators. PMID:26834010

  4. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum

    OpenAIRE

    Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard; Hertweck, Christian; Linde, Jörg

    2015-01-01

    Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies.

  5. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    Science.gov (United States)

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data, especially from chloroplast and mitochondrial DNAs.

  6. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    Directory of Open Access Journals (Sweden)

    Ning Zhang

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data, especially from chloroplast and mitochondrial DNAs.

  7. Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.

    Science.gov (United States)

    Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P

    2010-12-22

    Comparative genomics resources, such as ortholog detection tools and repositories, are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource, Roundup, using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
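The idea of ordering jobs by a predicted runtime before submission can be illustrated with a small sketch. This is not the authors' model: the cost function `estimate_runtime` (runtime proportional to the product of genome sizes) and the greedy longest-first assignment are assumptions chosen only to show why submission order affects how long a fixed-size cluster stays busy.

```python
import heapq


def estimate_runtime(size_a, size_b, seconds_per_unit=1e-6):
    """Hypothetical cost model: runtime grows with the product of the two genome sizes."""
    return size_a * size_b * seconds_per_unit


def schedule_longest_first(jobs, n_nodes):
    """Greedy longest-first assignment of jobs to the least-loaded node.

    jobs: list of (job_id, estimated_runtime) pairs.
    Returns (makespan, assignment), where assignment maps node index to job ids.
    """
    # Submit the longest jobs first so short jobs can fill the gaps at the end.
    ordered = sorted(jobs, key=lambda job: job[1], reverse=True)
    # Min-heap of (current load, node index): pop always yields the least-loaded node.
    heap = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    for job_id, runtime in ordered:
        load, node = heapq.heappop(heap)
        assignment[node].append(job_id)
        heapq.heappush(heap, (load + runtime, node))
    makespan = max(load for load, _ in heap)
    return makespan, assignment
```

The makespan returned here is the time until the busiest node finishes; on a cloud billed per node-hour, a shorter makespan for the same cluster size means less idle, paid-for capacity.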

  8. G-NEST: A gene neighborhood scoring tool to identify co-conserved, co-expressed genes

    Science.gov (United States)

    In previous studies, gene neighborhoods--spatial clusters of co-expressed genes in the genome--have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Sc...

  9. Emerging Genomic Tools for Legume Breeding: Current Status and Future Prospects

    Science.gov (United States)

    Pandey, Manish K.; Roorkiwal, Manish; Singh, Vikas K.; Ramalingam, Abirami; Kudapa, Himabindu; Thudi, Mahendar; Chitikineni, Anu; Rathore, Abhishek; Varshney, Rajeev K.

    2016-01-01

    Legumes play a vital role in ensuring global nutritional food security and improving soil quality through nitrogen fixation. Accelerated, higher genetic gains are required to meet the demand of an ever-increasing global population. In recent years, speedy developments have been witnessed in legume genomics due to advancements in next-generation sequencing (NGS) and high-throughput genotyping technologies. Reference genome sequences for many legume crops have been reported in the last 5 years. The availability of the draft genome sequences and re-sequencing of elite genotypes for several important legume crops have made it possible to identify structural variations at large scale. Availability of large-scale genomic resources and low-cost and high-throughput genotyping technologies are enhancing the efficiency and resolution of genetic mapping and marker-trait association studies. Most importantly, deployment of molecular breeding approaches has resulted in development of improved lines in some legume crops such as chickpea and groundnut. In order to support genomics-driven crop improvement at a fast pace, the deployment of breeder-friendly genomics and decision support tools appears to be critical in breeding programs in developing countries. This review provides an overview of emerging genomics and informatics tools/approaches that will be the key driving force for accelerating genomics-assisted breeding and ultimately ensuring nutritional and food security in developing countries. PMID:27199998

  10. antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth

    2015-01-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products...

  11. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    OpenAIRE

    Wright, James C.; Sugden, Deana; Francis-McIntyre, Sue; Riba Garcia, Isabel; Gaskell, Simon J.; Grigoriev, Igor V.; Baker, Scott E.; Beynon, Robert J.; Hubbard, Simon J.

    2009-01-01

    Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were ac...

  12. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes

    Science.gov (United States)

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approxima...

  13. The First Myriapod Genome Sequence Reveals Conservative Arthropod Gene Content and Genome Organisation in the Centipede Strigamia maritima

    Science.gov (United States)

    Chipman, Ariel D.; Ferrier, David E. K.; Brena, Carlo; Qu, Jiaxin; Hughes, Daniel S. T.; Schröder, Reinhard; Torres-Oliva, Montserrat; Znassi, Nadia; Jiang, Huaiyang; Almeida, Francisca C.; Alonso, Claudio R.; Apostolou, Zivkos; Aqrawi, Peshtewani; Arthur, Wallace; Barna, Jennifer C. J.; Blankenburg, Kerstin P.; Brites, Daniela; Capella-Gutiérrez, Salvador; Coyle, Marcus; Dearden, Peter K.; Du Pasquier, Louis; Duncan, Elizabeth J.; Ebert, Dieter; Eibner, Cornelius; Erikson, Galina; Evans, Peter D.; Extavour, Cassandra G.; Francisco, Liezl; Gabaldón, Toni; Gillis, William J.; Goodwin-Horn, Elizabeth A.; Green, Jack E.; Griffiths-Jones, Sam; Grimmelikhuijzen, Cornelis J. P.; Gubbala, Sai; Guigó, Roderic; Han, Yi; Hauser, Frank; Havlak, Paul; Hayden, Luke; Helbing, Sophie; Holder, Michael; Hui, Jerome H. L.; Hunn, Julia P.; Hunnekuhl, Vera S.; Jackson, LaRonda; Javaid, Mehwish; Jhangiani, Shalini N.; Jiggins, Francis M.; Jones, Tamsin E.; Kaiser, Tobias S.; Kalra, Divya; Kenny, Nathan J.; Korchina, Viktoriya; Kovar, Christie L.; Kraus, F. Bernhard; Lapraz, François; Lee, Sandra L.; Lv, Jie; Mandapat, Christigale; Manning, Gerard; Mariotti, Marco; Mata, Robert; Mathew, Tittu; Neumann, Tobias; Newsham, Irene; Ngo, Dinh N.; Ninova, Maria; Okwuonu, Geoffrey; Ongeri, Fiona; Palmer, William J.; Patil, Shobha; Patraquim, Pedro; Pham, Christopher; Pu, Ling-Ling; Putman, Nicholas H.; Rabouille, Catherine; Ramos, Olivia Mendivil; Rhodes, Adelaide C.; Robertson, Helen E.; Robertson, Hugh M.; Ronshaugen, Matthew; Rozas, Julio; Saada, Nehad; Sánchez-Gracia, Alejandro; Scherer, Steven E.; Schurko, Andrew M.; Siggens, Kenneth W.; Simmons, DeNard; Stief, Anna; Stolle, Eckart; Telford, Maximilian J.; Tessmar-Raible, Kristin; Thornton, Rebecca; van der Zee, Maurijn; von Haeseler, Arndt; Williams, James M.; Willis, Judith H.; Wu, Yuanqing; Zou, Xiaoyan; Lawson, Daniel; Muzny, Donna M.; Worley, Kim C.; Gibbs, Richard A.; Akam, Michael; Richards, Stephen

    2014-01-01

    Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific

  14. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, M; Jensen, L J; Brunak, S

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.

  15. The impact of genome triplication on tandem gene evolution in Brassica rapa

    Directory of Open Access Journals (Sweden)

    Lu Fang

    2012-11-01

    Whole genome duplication (WGD) and tandem duplication (TD) are both important modes of gene expansion. However, how whole genome duplication influences tandemly duplicated genes is not well studied. We used Brassica rapa, which has undergone an additional genome triplication (WGT) and shares a common ancestor with Arabidopsis thaliana, Arabidopsis lyrata and Thellungiella parvula, to investigate the impact of genome triplication on tandem gene evolution. We identified 2,137, 1,569, 1,751 and 1,135 tandem gene arrays in B. rapa, A. thaliana, A. lyrata and T. parvula respectively. Among them, 414 conserved tandem arrays are shared by the 3 species without WGT, which were also considered as existing in the diploid ancestor of B. rapa. Thus, after genome triplication, B. rapa should have 1,242 tandem arrays according to the 414 conserved tandems. Here, we found 400 out of the 414 tandems had at least one syntenic ortholog in the genome of B. rapa. Furthermore, 294 out of the 400 shared syntenic orthologs maintain tandem arrays (more than one gene for each syntenic hit) in B. rapa. For the 294 tandem arrays, we obtained 426 copies of syntenic paralogous tandems in the triplicated genome of B. rapa. In this study, we demonstrated that tandem arrays in B. rapa were dramatically fractionated after WGT when compared either to non-tandem genes in the B. rapa genome or to the tandem arrays in closely related species that have not experienced a recent whole-genome polyploidization event.
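The counts quoted in this abstract can be turned into retention fractions with a few lines of arithmetic. The sketch below only restates the reported numbers; reading 426 out of the expected 1,242 as a "retained" fraction is an interpretation made here for illustration, not a statistic reported in the abstract, and the function name is hypothetical.

```python
def tandem_retention_summary():
    """Derive simple ratios from the tandem-array counts quoted in the abstract."""
    conserved_ancestral = 414        # arrays shared by the 3 species without WGT
    with_syntenic_ortholog = 400     # of those, arrays with a syntenic ortholog in B. rapa
    still_tandem = 294               # of those, arrays still tandem in B. rapa
    paralogous_tandem_copies = 426   # syntenic paralogous tandem copies observed

    # A triplication of every ancestral array would give three copies each.
    expected_after_wgt = 3 * conserved_ancestral

    return {
        "expected_after_wgt": expected_after_wgt,
        "frac_with_ortholog": with_syntenic_ortholog / conserved_ancestral,
        "frac_still_tandem": still_tandem / with_syntenic_ortholog,
        # Assumption: observed copies compared against the full triplicated expectation.
        "frac_copies_vs_expected": paralogous_tandem_copies / expected_after_wgt,
    }


if __name__ == "__main__":
    for name, value in tandem_retention_summary().items():
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under this reading, roughly a third of the tandem copies expected from a clean triplication are still present as tandems, which is consistent with the abstract's statement that tandem arrays were heavily fractionated after WGT.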

  16. The origin and evolution of Basigin(BSG) gene: A comparative genomic and phylogenetic analysis.

    Science.gov (United States)

    Zhu, Xinyan; Wang, Shenglan; Shao, Mingjie; Yan, Jie; Liu, Fei

    2017-07-01

    Basigin (BSG), also known as extracellular matrix metalloproteinase inducer (EMMPRIN) or cluster of differentiation 147 (CD147), plays various fundamental roles in the intercellular recognition involved in immunologic phenomena, differentiation, and development. In this study, we aimed to compare the similarities and differences of BSG among organisms and explore possible evolutionary relationships based on the comparison result. We used the extensive BLAST tool to search the metazoan genomes, N-glycosylation sites, the transmembrane region and other functional sites. We then identified BSG homologs from genomic sequences and analyzed their phylogenetic relationships. We identified that BSG genes exist not only in the vertebrate metazoans but also in the invertebrate metazoans such as Amphioxus B. floridae, D. melanogaster, A. mellifera, S. japonicum, C. gigas, and T. patagoniensis. After sequence analysis, we confirmed that only vertebrate metazoans and Cephalochordate (amphioxus B. floridae) have the classic structure (a signal peptide, two Ig-like domains (IgC2 and IgI), a transmembrane region, and an intracellular domain). The invertebrate metazoans (excluding amphioxus B. floridae) lack the N-terminal signal peptides and IgC2 domain. We then generated a phylogenetic tree, genome organization comparison, and chromosomal disposition analysis based on the biological information obtained from the NCBI and Ensembl databases. Finally, we established the possible evolutionary scenario of the BSG gene, which showed the restricted exon rearrangement that has occurred during evolution, forming the present-day BSG gene. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Inter-genomic DNA Exchanges and Homeologous Gene Silencing Shaped the Nascent Allopolyploid Coffee Genome (Coffea arabica L.)

    Directory of Open Access Journals (Sweden)

    Philippe Lashermes

    2016-09-01

    Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were evidenced that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing.

  18. Convergent functional genomics in addiction research - a translational approach to study candidate genes and gene networks.

    Science.gov (United States)

    Spanagel, Rainer

    2013-01-01

    Convergent functional genomics (CFG) is a translational methodology that integrates, in a Bayesian fashion, multiple lines of evidence from studies in human and animal models to get a better understanding of the genetics of a disease or pathological behavior. Here the integration of data sets that derive from forward genetics in animals and genetic association studies, including genome-wide association studies (GWAS) in humans, is described for addictive behavior. The aim of forward genetics in animals and association studies in humans is to identify mutations (e.g. SNPs) that produce a certain phenotype; i.e. "from phenotype to genotype". Most powerful in terms of forward genetics is combined quantitative trait loci (QTL) analysis and gene expression profiling in recombinant inbred rodent lines or animals genetically selected for a specific phenotype, e.g. high vs. low drug consumption. By Bayesian scoring, genomic information from forward genetics in animals is then combined with human GWAS data on a similar addiction-relevant phenotype. This integrative approach generates a robust candidate gene list that has to be functionally validated by means of reverse genetics in animals; i.e. "from genotype to phenotype". It is proposed that studying addiction-relevant phenotypes and endophenotypes by this CFG approach will allow a better determination of the genetics of addictive behavior.

  19. Complementary Information Derived from CRISPR Cas9 Mediated Gene Deletion and Suppression.

    Science.gov (United States)

    CRISPR-Cas9 provides the means to perform genome editing and facilitates loss-of-function screens. However, we and others demonstrated that expression of the Cas9 endonuclease induces a gene-independent response that correlates with the number of target sequences in the genome. An alternative approach to suppressing gene expression is to block transcription using a catalytically inactive Cas9 (dCas9). Here we directly compare genome editing by CRISPR-Cas9 (cutting, CRISPRc) and gene suppression using KRAB-dCas9 (CRISPRi) in loss-of-function screens to identify cell essential genes.

  20. DivStat: a user-friendly tool for single nucleotide polymorphism analysis of genomic diversity.

    Directory of Open Access Journals (Sweden)

    Inês Soares

    Recent developments have led to an enormous increase in publicly available large genomic data sets, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes, and allowing for a myriad of large-scale studies on human genetic variation. However, the tools currently available are insufficient when the goal concerns some analyses of data sets encompassing more than hundreds of base pairs and when considering haplotype sequences of single nucleotide polymorphisms (SNPs). Here, we present a new and potent tool to deal with large data sets, allowing the computation of a variety of summary statistics of population genetic data and increasing the speed of data analysis.

  1. Finding the missing honey bee genes: Lessons learned from a genome upgrade

    KAUST Repository

    Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A; Patil, S.; Gubbala, S.; Aqrawi, P.; Arias, F.; Bess, C.; Blankenburg, K. B.; Brocchini, M.; Buhay, C.; Challis, D.; Chang, K.; Chen, D.; Coleman, P.; Drummond, J.; English, A.; Evani, U.; Francisco, L.; Fu, Q.; Goodspeed, R.; Haessly, T. H.; Hale, W.; Han, H.; Hu, Y.; Jackson, L.; Jakkamsetti, A.; Jayaseelan, J. C.; Kakkar, N.; Kalra, D.; Kandadi, H.; Lee, S.; Li, H.; Liu, Y.; Macmil, S.; Mandapat, C. M.; Mata, R.; Mathew, T.; Matskevitch, T.; Munidasa, M.; Nagaswamy, U.; Najjar, R.; Nguyen, N.; Niu, J.; Opheim, D.; Palculict, T.; Paul, S.; Pellon, M.; Perales, L.; Pham, C.; Pham, P.

    2014-01-01

    Background: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. 2014 Elsik et al.; licensee BioMed Central Ltd.

  2. Finding the missing honey bee genes: lessons learned from a genome upgrade.

    Science.gov (United States)

    Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A

    2014-01-30

    The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

  3. Finding the missing honey bee genes: Lessons learned from a genome upgrade

    KAUST Repository

    Elsik, Christine G

    2014-01-30

    Background: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

  4. A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm

    Directory of Open Access Journals (Sweden)

    Allen Eric E

    2008-10-01

    Background: The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not. Description: The DarkHorse algorithm has been applied to 955 microbial genome sequences, and the results organized into a web-searchable relational database, called the DarkHorse HGT Candidate Resource (http://darkhorse.ucsd.edu). Users can select individual genomes or groups of genomes to screen by LPI score, search for protein functions by descriptive annotation or amino acid sequence similarity, or select proteins with unusual G+C composition in their underlying coding sequences. The search engine reports LPI scores for match partners as well as query sequences, providing the opportunity to explore whether potential HGT donor sequences are phylogenetically typical or atypical within their own genomes. This information can be used to predict whether or not sufficient information is available to build a well-supported phylogenetic tree using the potential donor sequence. Conclusion: The DarkHorse HGT Candidate database provides a powerful, flexible set of tools for identifying phylogenetically atypical proteins, allowing researchers to explore both individual HGT events in single genomes, and ...

  5. Extensive error in the number of genes inferred from draft genome assemblies.

    Directory of Open Access Journals (Sweden)

    James F Denton

    2014-12-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
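
    The comparison described in this record, per-family gene counts in a draft assembly checked against a higher-quality assembly of the same genome, can be illustrated with a minimal Python sketch. The family names and counts below are hypothetical and are not data from the study; the function name is likewise an assumption made for illustration.

```python
def family_count_errors(draft, reference):
    """Compare per-family gene counts between a draft and a reference annotation.

    Returns the fraction of families whose count differs, plus the lists of
    families that gained genes (inflated) or lost genes (deflated) in the draft.
    """
    families = set(draft) | set(reference)
    if not families:
        return 0.0, [], []
    inflated = sorted(f for f in families if draft.get(f, 0) > reference.get(f, 0))
    deflated = sorted(f for f in families if draft.get(f, 0) < reference.get(f, 0))
    wrong_fraction = (len(inflated) + len(deflated)) / len(families)
    return wrong_fraction, inflated, deflated


# Hypothetical counts, for illustration only.
reference = {"OR": 12, "HOX": 4, "KRT": 8, "GST": 5, "CYP": 10}
draft = {"OR": 15, "HOX": 4, "KRT": 6, "GST": 5, "CYP": 10}

wrong_fraction, inflated, deflated = family_count_errors(draft, reference)
print(wrong_fraction, inflated, deflated)
```

    In this toy input, one family is inflated and one is deflated, which mirrors the abstract's point that draft assemblies both add and subtract genes.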

  6. The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists.

    OpenAIRE

    Sven Heinicke; Michael S Livstone; Charles Lu; Rose Oughtred; Fan Kang; Samuel V Angiuoli; Owen White; David Botstein; Kara Dolinski

    2007-01-01

    Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic r...

  7. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

    Directory of Open Access Journals (Sweden)

    Merchant Sabeeha S

    2011-07-01

    Background: Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description: The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of ...

  8. Genome-wide search for gene-gene interactions in colorectal cancer.

    Directory of Open Access Journals (Sweden)

    Shuo Jiao

    Genome-wide association studies (GWAS) have successfully identified a number of single-nucleotide polymorphisms (SNPs) associated with colorectal cancer (CRC) risk. However, these susceptibility loci known today explain only a small fraction of the genetic risk. Gene-gene interaction (GxG) is considered to be one source of the missing heritability. To address this, we performed a genome-wide search for pair-wise GxG associated with CRC risk using 8,380 cases and 10,558 controls in the discovery phase and 2,527 cases and 2,658 controls in the replication phase. We developed a simple, but powerful method for testing interaction, which we term the Average Risk Due to Interaction (ARDI). With this method, we conducted a genome-wide search to identify SNPs showing evidence for GxG with previously identified CRC susceptibility loci from 14 independent regions. We also conducted a genome-wide search for GxG using the marginal association screening and examining interaction among SNPs that pass the screening threshold (p < 10^-4). For the known locus rs10795668 (10p14), we found an interacting SNP rs367615 (5q21) with replication p = 0.01 and combined p = 4.19×10^-8. Among the top marginal SNPs after LD pruning (n = 163), we identified an interaction between rs1571218 (20p12.3) and rs10879357 (12q21.1) (nominal combined p = 2.51×10^-6; Bonferroni adjusted p = 0.03). Our study represents the first comprehensive search for GxG in CRC, and our results may provide new insight into the genetic etiology of CRC.
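
    The relationship between the nominal and the Bonferroni-adjusted p-value reported in this record can be checked with a short Python sketch. It assumes, as an illustration only, that one interaction test was run for every unordered pair among the 163 pruned SNPs; the abstract does not state the exact number of tests.

```python
from math import comb

n_snps = 163          # top marginal SNPs after LD pruning (from the abstract)
nominal_p = 2.51e-6   # nominal combined p-value (from the abstract)

# Assumption: one interaction test per unordered SNP pair.
n_tests = comb(n_snps, 2)

# Bonferroni adjustment: multiply by the number of tests, cap at 1.
adjusted_p = min(1.0, nominal_p * n_tests)

print(n_tests)               # 13203
print(round(adjusted_p, 2))  # 0.03
```

    Under that assumption the adjusted value rounds to the 0.03 quoted in the abstract, which suggests the correction was applied over all pairs of the screened SNPs.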

  9. Accessing the SEED genome databases via Web services API: tools for programmers.

    Science.gov (United States)

    Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

    2010-06-14

    The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent, service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

  10. Forest and fibre genomics: biotechnology tools for applied tree ...

    African Journals Online (AJOL)

    A milestone for eucalypt research, the project will facilitate the development of new biotechnology tools that will accelerate the domestication, improvement and ... The application of DNA fingerprinting in eucalypt breeding programmes represented an early technology delivery to industry with practical, short-term benefits, ...

  11. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci

    DEFF Research Database (Denmark)

    Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt

    2016-01-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients... observed in single gene analyses. Species identification based on single gene analysis showed its limitations when more strains were included. In contrast, analyses incorporating more sequence data, like MLSA, SNPs and core-genome analyses, provided more distinct clustering. The core-genome tree showed...

  12. VarB Plus: An Integrated Tool for Visualization of Genome Variation Datasets

    KAUST Repository

    Hidayah, Lailatul

    2012-07-01

    Research on genomic sequences has been improving significantly as more advanced technology for sequencing has been developed. This opens enormous opportunities for sequence analysis. Various analytical tools have been built for purposes such as sequence assembly, read alignments, genome browsing, comparative genomics, and visualization. From the visualization perspective, there is an increasing trend towards use of large-scale computation. However, more than power is required to produce an informative image. This is a challenge that we address by providing several ways of representing biological data in order to advance the inference endeavors of biologists. This thesis focuses on visualization of variations found in genomic sequences. We develop several visualization functions and embed them in an existing variation visualization tool as extensions. The tool we improved is named VarB, hence the nomenclature for our enhancement is VarB Plus. To the best of our knowledge, besides VarB, there is no tool that provides the capability of dynamic visualization of genome variation datasets as well as statistical analysis. Dynamic visualization allows users to toggle different parameters on and off and see the results on the fly. The statistical analysis includes Fixation Index, Relative Variant Density, and Tajima’s D. Hence we focused our efforts on this tool. The scope of our work includes plots of per-base genome coverage, Principal Coordinate Analysis (PCoA), integration with a read alignment viewer named LookSeq, and visualization of geo-biological data. In addition to description of embedded functionalities, significance, and limitations, future improvements are discussed. The result is four extensions embedded successfully in the original tool, which is built on the Qt framework in C++. Hence it is portable to numerous platforms. Our extensions have shown acceptable execution time in a beta testing with various high-volume published datasets, as well as positive

  13. A comprehensive evaluation of rodent malaria parasite genomes and gene expression

    KAUST Repository

    Otto, Thomas D

    2014-10-30

    Background: Rodent malaria parasites (RMP) are used extensively as models of human malaria. Draft RMP genomes have been published for Plasmodium yoelii, P. berghei ANKA (PbA) and P. chabaudi AS (PcAS). Although availability of these genomes made a significant impact on recent malaria research, these genomes were highly fragmented and were annotated with little manual curation. The fragmented nature of the genomes has hampered genome wide analysis of Plasmodium gene regulation and function. Results: We have greatly improved the genome assemblies of PbA and PcAS, newly sequenced the virulent parasite P. yoelii YM genome, sequenced additional RMP isolates/lines and have characterized genotypic diversity within RMP species. We have produced RNA-seq data and utilized it to improve gene-model prediction and to provide quantitative, genome-wide, data on gene expression. Comparison of the RMP genomes with the genome of the human malaria parasite P. falciparum and RNA-seq mapping permitted gene annotation at base-pair resolution. Full-length chromosomal annotation permitted a comprehensive classification of all subtelomeric multigene families including the 'Plasmodium interspersed repeat genes' (pir). Phylogenetic classification of the pir family, combined with pir expression patterns, indicates functional diversification within this family. Conclusions: Complete RMP genomes, RNA-seq and genotypic diversity data are excellent and important resources for gene-function and post-genomic analyses and to better interrogate Plasmodium biology. Genotypic diversity between P. chabaudi isolates makes this species an excellent parasite to study genotype-phenotype relationships. The improved classification of multigene families will enhance studies on the role of (variant) exported proteins in virulence and immune evasion/modulation.

  14. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

    Science.gov (United States)

    Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A

    2017-10-15

    Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact: favorov@sensi.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
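
    The idea of correlating features at neighboring rather than only identical positions can be illustrated with a small Python sketch. This is not StereoGene's algorithm or code: it simply smooths two toy signal tracks with a truncated Gaussian kernel and compares the Pearson correlation before and after smoothing. The function names, kernel width and track values are all assumptions chosen for illustration.

```python
import math


def gaussian_smooth(track, sigma=1.0, radius=3):
    """Smooth a per-position signal with a truncated Gaussian kernel."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in offsets]
    total = sum(weights)
    smoothed = []
    for i in range(len(track)):
        value = 0.0
        for offset, w in zip(offsets, weights):
            j = i + offset
            if 0 <= j < len(track):  # positions off the ends contribute nothing
                value += track[j] * w
        smoothed.append(value / total)
    return smoothed


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Two toy tracks whose single peaks sit one position apart.
track_a = [0, 0, 1, 0, 0, 0, 0, 0]
track_b = [0, 0, 0, 1, 0, 0, 0, 0]

raw_r = pearson(track_a, track_b)
smooth_r = pearson(gaussian_smooth(track_a), gaussian_smooth(track_b))
print(raw_r, smooth_r)
```

    Position-by-position, the two peaks never coincide, so the raw correlation is slightly negative; after kernel smoothing the adjacent peaks overlap and the correlation becomes clearly positive.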

  15. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Background: Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results: Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion: The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.

  16. Mammalian-specific genomic functions: Newly acquired traits generated by genomic imprinting and LTR retrotransposon-derived genes in mammals.

    Science.gov (United States)

    Kaneko-Ishino, Tomoko; Ishino, Fumitoshi

    2015-01-01

    Mammals, including human beings, have evolved a unique viviparous reproductive system and a highly developed central nervous system. How did these unique characteristics emerge in mammalian evolution, and what kinds of changes did occur in the mammalian genomes as evolution proceeded? A key conceptual term in approaching these issues is "mammalian-specific genomic functions", a concept covering both mammalian-specific epigenetics and genetics. Genomic imprinting and LTR retrotransposon-derived genes are reviewed as the representative, mammalian-specific genomic functions that are essential not only for the current mammalian developmental system, but also mammalian evolution itself. First, the essential roles of genomic imprinting in mammalian development, especially related to viviparous reproduction via placental function, as well as the emergence of genomic imprinting in mammalian evolution, are discussed. Second, we introduce the novel concept of "mammalian-specific traits generated by mammalian-specific genes from LTR retrotransposons", based on the finding that LTR retrotransposons served as a critical driving force in the mammalian evolution via generating mammalian-specific genes.

  17. Genome-wide analysis of regions similar to promoters of histone genes

    KAUST Repository

    Chowdhary, Rajesh

    2010-05-28

    Background: The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (histone genes for short), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have a structure similar to that of promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii) identify in this way genes that have a high likelihood of being coregulated with the histone genes. Results: We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on a leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with a structure similar to that of histone promoters. Conclusions: This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with their structures similar to promoters of histone genes. These regions may be promoters of yet unidentified genes, or may represent remote control regions that
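
    The three accuracy measures quoted in this record (sensitivity, specificity and positive predictive value) have standard definitions in terms of a confusion matrix. The Python sketch below shows those definitions. The counts are hypothetical and were not taken from the study, whose underlying confusion matrix is not given in the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)   # fraction of true promoters that were found
    specificity = tn / (tn + fp)   # fraction of non-promoters correctly rejected
    ppv = tp / (tp + fp)           # fraction of predictions that are correct
    return sensitivity, specificity, ppv


# Hypothetical counts, for illustration only.
sens, spec, ppv = classification_metrics(tp=90, fp=10, tn=190, fn=10)
print(sens, spec, ppv)
```

    In a leave-one-out setting such as the one the abstract mentions, each sequence would be scored by a model trained on all the others, and the four counts would be accumulated over those held-out predictions.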

  18. Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

    Science.gov (United States)

    Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

    2014-11-25

    The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. To know whether there is a set of genes not only conserved in position

  19. Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

    Science.gov (United States)

    Evans, Teri; Johnson, Andrew D; Loose, Matthew

    2018-01-12

    Large, repeat-rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n. The software pipeline is available from https://github.com/LooseLab/iterassemble.
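
    The notion of starting from an exon and iteratively extending it into the surrounding genomic sequence can be illustrated with a toy Python sketch. This is not the authors' pipeline: it is a naive greedy extension that repeatedly appends the longest overhang among reads whose prefix matches the current contig's suffix. The function name, parameters and sequences are hypothetical, and real data would need handling of sequencing errors, reverse complements and repeats.

```python
def extend_right(seed, reads, min_overlap=4, max_rounds=100):
    """Greedily extend `seed` to the right using overlapping reads."""
    contig = seed
    for _ in range(max_rounds):
        best_extension = ""
        for read in reads:
            # Try the longest proper-prefix overlap first, down to min_overlap.
            longest = min(len(read) - 1, len(contig))
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    extension = read[k:]
                    if len(extension) > len(best_extension):
                        best_extension = extension
                    break
        if not best_extension:
            break  # no read extends the contig any further
        contig += best_extension
    return contig


# Toy example: a short "exon" seed and three overlapping reads.
genome = "ATGGCGTACGTTAGCCTAGGATCC"
seed = "ATGGCGTA"
reads = ["CGTACGTTAG", "GTTAGCCTAG", "CTAGGATCC"]

contig = extend_right(seed, reads)
print(contig)
```

    Each round walks the contig one read further from the seed, which is the sense in which a local assembly can be grown outward from transcript-derived exons without assembling the whole genome.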

  20. The Use of a Combined Bioinformatics Approach to Locate Antibiotic Resistance Genes on Plasmids From Whole Genome Sequences of Salmonella enterica Serovars From Humans in Ghana

    Directory of Open Access Journals (Sweden)

    Egle Kudirkiene

    2018-05-01

    In the current study, we identified plasmids carrying antimicrobial resistance genes in draft whole genome sequences of 16 selected Salmonella enterica isolates representing six different serovars from humans in Ghana. The plasmids and the location of resistance genes in the genomes were predicted using a combination of PlasmidFinder, ResFinder, plasmidSPAdes and BLAST genomic analysis tools. Subsequently, S1-PFGE was employed for analysis of plasmid profiles. Whole genome sequencing confirmed the presence of antimicrobial resistance genes in Salmonella isolates showing multidrug resistance phenotypically. ESBL genes, either blaTEM52-B or blaCTX-M15, were present in two cephalosporin-resistant isolates of S. Virchow and S. Poona, respectively. The systematic genome analysis revealed the presence of different plasmids in different serovars, with or without insertion of antimicrobial resistance genes. In S. Enteritidis, resistance genes were carried predominantly on plasmids of IncN type, in S. Typhimurium on plasmids of IncFII(S)/IncFIB(S)/IncQ1 type. In S. Virchow and in S. Poona, resistance genes were detected on plasmids of IncX1 and TrfA/IncHI2/IncHI2A type, respectively. The latter two plasmids were described for the first time in these serovars. The combination of genomic analytical tools allowed nearly full mapping of the resistance plasmids in all Salmonella strains analyzed. The results suggest that the improved analytical approach used in the current study may be used to identify plasmids that are specifically associated with resistance phenotypes in whole genome sequences. Such knowledge would allow the development of rapid multidrug resistance tracking tools in Salmonella populations using WGS.

  1. The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows.

    Science.gov (United States)

    O'Connor, Brian D; Yuen, Denis; Chung, Vincent; Duncan, Andrew G; Liu, Xiang Kun; Patricia, Janice; Paten, Benedict; Stein, Lincoln; Ferretti, Vincent

    2017-01-01

    As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).

  2. Comprehensive search for intra- and inter-specific sequence polymorphisms among coding envelope genes of retroviral origin found in the human genome: genes and pseudogenes

    Directory of Open Access Journals (Sweden)

    Vasilescu Alexandre

    2005-09-01

    Full Text Available Abstract Background The human genome carries a high load of proviral-like sequences, called Human Endogenous Retroviruses (HERVs), which are the genomic traces of ancient infections by active retroviruses. These elements are in most cases defective, but open reading frames can still be found for the retroviral envelope gene, with sixteen such genes identified so far. Several of them are conserved during primate evolution, having possibly been co-opted by their host for a physiological role. Results To characterize further their status, we presently sequenced 12 of these genes from a panel of 91 Caucasian individuals. Genomic analyses reveal strong sequence conservation (only two non-synonymous Single Nucleotide Polymorphisms [SNPs]) for the two HERV-W and HERV-FRD envelope genes, i.e. for the two genes specifically expressed in the placenta and possibly involved in syncytiotrophoblast formation. We further show – using an ex vivo fusion assay for each allelic form – that none of these SNPs impairs the fusogenic function. The other envelope proteins disclose variable polymorphisms, with the occurrence of a stop codon and/or frameshift for most – but not all – of them. Moreover, the sequence conservation analysis of the orthologous genes that can be found in primates shows that three env genes have been maintained in a fully coding state throughout evolution including envW and envFRD. Conclusion Altogether, the present study strongly suggests that some but not all envelope encoding sequences are bona fide genes. It also provides new tools to elucidate the possible role of endogenous envelope proteins as susceptibility factors in a number of pathologies where HERVs have been suspected to be involved.

  3. DFAST and DAGA: web-based integrated genome annotation tools and resources.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kaminuma, Eli; Nakamura, Yasukazu; Arita, Masanori

    2016-01-01

    Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.
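
    The ANI-based check described above reduces, at its simplest, to comparing a precomputed identity value against the 95% species threshold. The following is a minimal sketch under that assumption. The record layout, genome identifiers and ANI values are hypothetical, and computing ANI itself is outside its scope.

```python
ANI_SPECIES_THRESHOLD = 95.0  # percent; the threshold cited in the abstract


def flag_possible_mislabels(records, threshold=ANI_SPECIES_THRESHOLD):
    """Return IDs of genomes whose ANI to their labeled species' reference is low.

    `records` is an iterable of (genome_id, labeled_species, ani_percent)
    tuples; this layout is an assumption made for illustration.
    """
    return [genome_id for genome_id, _species, ani in records if ani < threshold]


# Hypothetical ANI values against a reference genome of the labeled species.
records = [
    ("genome_A", "Lactobacillus gasseri", 98.7),
    ("genome_B", "Lactobacillus gasseri", 93.1),
]
print(flag_possible_mislabels(records))
```

    A flagged genome is a prompt for review rather than a verdict: as the abstract notes, some accepted species contain subgroups that fall below the 95% threshold.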

  4. Whole genome duplications and expansion of the vertebrate GATA transcription factor gene family

    Directory of Open Access Journals (Sweden)

    Bowerman Bruce

    2009-08-01

    Full Text Available Abstract Background GATA transcription factors influence many developmental processes, including the specification of embryonic germ layers. The GATA gene family has significantly expanded in many animal lineages: whereas diverse cnidarians have only one GATA transcription factor, six GATA genes have been identified in many vertebrates, five in many insects, and eleven to thirteen in Caenorhabditis nematodes. All bilaterian animal genomes have at least one member each of two classes, GATA123 and GATA456. Results We have identified one GATA123 gene and one GATA456 gene from the genomic sequence of two invertebrate deuterostomes, a cephalochordate (Branchiostoma floridae) and a hemichordate (Saccoglossus kowalevskii). We also have confirmed the presence of six GATA genes in all vertebrate genomes, as well as additional GATA genes in teleost fish. Analyses of conserved sequence motifs and of changes to the exon-intron structure, and molecular phylogenetic analyses of these deuterostome GATA genes support their origin from two ancestral deuterostome genes, one GATA123 and one GATA456. Comparison of the conserved genomic organization across vertebrates identified eighteen paralogous gene families linked to multiple vertebrate GATA genes (GATA paralogons), providing the strongest evidence yet for expansion of vertebrate GATA gene families via genome duplication events. Conclusion From our analysis, we infer the evolutionary birth order and relationships among vertebrate GATA transcription factors, and define their expansion via multiple rounds of whole genome duplication events. As the genomes of four independent invertebrate deuterostome lineages contain single copy GATA123 and GATA456 genes, we infer that the 0R (pre-genome duplication) invertebrate deuterostome ancestor also had two GATA genes, one of each class. Synteny analyses identify duplications of paralogous chromosomal regions (paralogons), from single ancestral vertebrate GATA123 and GATA456

  5. Gene Overexpression Resources in Cereals for Functional Genomics and Discovery of Useful Genes

    Directory of Open Access Journals (Sweden)

    Kiyomi Abe

    2016-09-01

    Full Text Available Identification and elucidation of functions of plant genes is valuable for both basic and applied research. In addition to natural variation in model plants, numerous loss-of-function resources have been produced by mutagenesis with chemicals, irradiation, or insertions of transposable elements or T-DNA. However, we may be unable to observe loss-of-function phenotypes for genes with functionally redundant homologs, and for those essential for growth and development. To offset such disadvantages, gain-of-function transgenic resources have been exploited. Activation-tagged lines have been generated using obligatory overexpression of endogenous genes by random insertion of an enhancer. Recent progress in DNA sequencing technology and bioinformatics has enabled the preparation of genomewide collections of full-length cDNAs (fl-cDNAs) in some model species. Using the fl-cDNA clones, a novel gain-of-function strategy, the Fl-cDNA OvereXpressor gene (FOX)-hunting system, has been developed. A mutant phenotype in a FOX line can be directly attributed to the overexpressed fl-cDNA. Investigating a large population of FOX lines could reveal important genes conferring favorable phenotypes for crop breeding. Alternatively, a unique loss-of-function approach, Chimeric REpressor gene Silencing Technology (CRES-T), has been developed. In CRES-T, overexpression of a chimeric repressor, composed of the coding sequence of a transcription factor (TF) and a short peptide designated as the repression domain, could interfere with the action of endogenous TF in plants. Although plant TFs usually consist of gene families, CRES-T is effective, in principle, even for the TFs with functional redundancy. In this review, we focus on the current status of the gene-overexpression strategies and resources for identifying and elucidating novel functions of cereal genes. We discuss the potential of these research tools for identifying useful genes and phenotypes for application in crop

  6. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high

  7. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  8. Accelerating Genome Editing in CHO Cells Using CRISPR Cas9 and CRISPy, a Web-Based Target Finding Tool

    DEFF Research Database (Denmark)

    Ronda, Carlotta; Pedersen, Lasse Ebdrup; Hansen, Henning Gram

    2014-01-01

    of the CRISPR Cas9 technology in CHO cells by generating site-specific gene disruptions in COSMC and FUT8, both of which encode proteins involved in glycosylation. The tested single guide RNAs (sgRNAs) created an indel frequency up to 47.3% in COSMC, while an indel frequency up to 99.7% in FUT8 was achieved...... mutations at the target sites, with a strong preference for single base indels. Finally, we have developed a user-friendly bioinformatics tool, named “CRISPy” for rapid identification of sgRNA target sequences in the CHO-K1 genome. The CRISPy tool identified 1,970,449 CRISPR targets divided into 27...
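
    The abstract above is truncated and does not state the rules CRISPy uses to select targets. As a rough illustration of what sgRNA target finding involves, the sketch below scans both strands of a sequence for a 20-nt protospacer followed by an NGG PAM, the motif commonly associated with Streptococcus pyogenes Cas9. The function names, the 20-nt length and the absence of any off-target filtering are assumptions of this sketch, not a description of CRISPy.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sgrna_targets(seq, protospacer_len=20):
    """List (strand, start, protospacer, pam) for every N20-NGG site.

    `start` is 0-based on the scanned strand, so minus-strand positions are
    relative to the reverse complement, not to the input sequence.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Toy sequence: twenty A's followed by a TGG PAM.
print(find_sgrna_targets("A" * 20 + "TGG"))
```

    A practical target finder would also score candidates for uniqueness across the genome, which this sketch does not attempt.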

  9. WebaCGH: an interactive online tool for the analysis and display of array comparative genomic hybridisation data.

    Science.gov (United States)

    Frankenberger, Casey; Wu, Xiaolin; Harmon, Jerry; Church, Deanna; Gangi, Lisa M; Munroe, David J; Urzúa, Ulises

    2006-01-01

    Gene copy number variations occur both in normal cells and in numerous pathologies including cancer and developmental diseases. Array comparative genomic hybridisation (aCGH) is an emerging technology that allows detection of chromosomal gains and losses in a high-resolution format. When aCGH is performed on cDNA and oligonucleotide microarrays, the impact of DNA copy number on gene transcription profiles may be directly compared. We have created an online software tool, WebaCGH, that functions to (i) upload aCGH and gene transcription results from multiple experiments; (ii) identify significant aberrant regions using a local Z-score threshold in user-selected chromosomal segments subjected to smoothing with moving averages; and (iii) display results in a graphical format with full genome and individual chromosome views. In the individual chromosome display, data can be zoomed in/out in both dimensions (i.e. ratio and physical location) and plotted features can have 'mouse over' linking to outside databases to identify loci of interest. Uploaded data can be stored indefinitely for subsequent retrieval and analysis. WebaCGH was created as a Java-based web application using the open-source database MySQL. WebaCGH is freely accessible at http://129.43.22.27/WebaCGH/welcome.htm. Contact: Xiaolin Wu (forestwu@mail.nih.gov) or Ulises Urzúa (uurzua@med.uchile.cl).
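
    Step (ii) combines two simple operations: smoothing the ratios along a chromosomal segment with a moving average, then flagging positions whose smoothed value deviates from the segment mean by more than a Z-score threshold. The sketch below illustrates that idea with the Python standard library. The window size, the threshold, the edge handling and the function names are assumptions for illustration and are not taken from WebaCGH.

```python
from statistics import mean, pstdev


def moving_average(values, window=3):
    """Centered moving average; windows are truncated at the segment edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def aberrant_indices(log_ratios, window=3, z_threshold=2.0):
    """Indices whose smoothed ratio has |Z| >= z_threshold within the segment."""
    smoothed = moving_average(log_ratios, window)
    mu, sigma = mean(smoothed), pstdev(smoothed)
    if sigma == 0:
        return []  # a flat segment has no outliers
    return [i for i, v in enumerate(smoothed) if abs((v - mu) / sigma) >= z_threshold]


# Toy segment: a three-probe gain in an otherwise flat profile.
segment = [0.0] * 10 + [1.0] * 3 + [0.0] * 10
print(aberrant_indices(segment, window=3, z_threshold=1.5))
```

    Because the Z-score is computed within the segment passed in, the result depends on which chromosomal segment the user selects.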

  10. Mitogenomes from type specimens, a genotyping tool for morphologically simple species: ten genomes of agar-producing red algae.

    Science.gov (United States)

    Boo, Ga Hun; Hughey, Jeffery R; Miller, Kathy Ann; Boo, Sung Min

    2016-10-14

    DNA sequences from type specimens provide independent, objective characters that enhance the value of type specimens and permit the correct application of species names to phylogenetic clades and specimens. We provide mitochondrial genomes (mitogenomes) from archival type specimens of ten species in agar-producing red algal genera Gelidium and Pterocladiella. The genomes contain 43-44 genes, ranging in size from 24,910 to 24,970 bp with highly conserved gene synteny. Low Ka/Ks ratios of apocytochrome b and cytochrome oxidase genes support their utility as markers. Phylogenies of mitogenomes and cox1+rbcL sequences clarified classification at the genus and species levels. Three species formerly in Gelidium and Pterocladia are transferred to Pterocladiella: P. media comb. nov., P. musciformis comb. nov., and P. luxurians comb. and stat. nov. Gelidium sinicola is merged with G. coulteri because they share identical cox1 and rbcL sequences. We describe a new species, Gelidium millariana sp. nov., previously identified as G. isabelae from Australia. We demonstrate that mitogenomes from type specimens provide a new tool for typifying species in the Gelidiales and that there is an urgent need for analyzing mitogenomes from type specimens of red algae and other morphologically simple organisms for insight into their nomenclature, taxonomy and evolution.

  11. Genome-wide gene expression regulation as a function of genotype and age in C. elegans

    NARCIS (Netherlands)

    Viñuela Rodriguez, A.; Snoek, L.B.; Riksen, J.A.G.; Kammenga, J.E.

    2010-01-01

    Gene expression becomes more variable with age, and it is widely assumed that this is due to a decrease in expression regulation. But currently there is no understanding how gene expression regulatory patterns progress with age. Here we explored genome-wide gene expression variation and regulatory

  12. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis

    DEFF Research Database (Denmark)

    Lin, Senjie; Cheng, Shifeng; Song, Bo

    2015-01-01

    Symbiodinium-specific gene families. No whole-genome duplication was observed, but instead we found active (retro) transposition and gene family expansion, especially in processes important for successful symbiosis with corals. We also documented genes potentially governing sexual reproduction and cyst...... the molecular basis and evolution of coral symbiosis....

  13. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes.

    Directory of Open Access Journals (Sweden)

    Henrik Bjørn Nielsen

    2007-08-01

    Full Text Available The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental factors including treatments, mutations and pathogen infections. Similarly, drugs may be discovered by the relationship between the transcript profiles effectuated or impacted by a candidate drug and by the target disease. The integration of such data enables systems biology to predict the interplay between experimental factors affecting a biological system. Unfortunately, direct comparisons of gene expression profiles obtained in independent, publicly available microarray experiments are typically compromised by substantial, experiment-specific biases. Here we suggest a novel yet conceptually simple approach for deriving 'Functional Association(s) by Response Overlap' (FARO) between microarray gene expression studies. The transcriptional response is defined by the set of differentially expressed genes independent from the magnitude or direction of the change. This approach overcomes the limited comparability between studies that is typical for methods that rely on correlation in gene expression. We apply FARO to a compendium of 242 diverse Arabidopsis microarray experimental factors, including phyto-hormones, stresses and pathogens, growth conditions/stages, tissue types and mutants. We also use FARO to confirm and further delineate the functions of Arabidopsis MAP kinase 4 in disease and stress responses. Furthermore, we find that a large, well-defined set of genes responds in opposing directions to different stress conditions and predict the effects of different stress combinations. This demonstrates the usefulness of our approach for exploiting public microarray data to derive biologically meaningful associations between experimental factors.
Finally, our
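
    The core idea, treating each response as a set of differentially expressed genes and measuring how much two sets overlap, can be sketched in a few lines. The overlap count follows the abstract. Scoring the overlap with a hypergeometric tail probability is an assumption of this sketch, since the abstract does not say how FARO assesses significance, and the gene identifiers are hypothetical.

```python
from math import comb


def response_overlap(response_a, response_b):
    """Number of genes differentially expressed in both responses.

    Direction and magnitude of change are ignored, as in the abstract.
    """
    return len(set(response_a) & set(response_b))


def overlap_p_value(n_universe, n_a, n_b, n_overlap):
    """P(overlap >= n_overlap) when n_b genes are drawn at random from a
    universe of n_universe genes, n_a of which belong to response A
    (hypergeometric upper tail; an assumed scoring choice)."""
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / total


# Hypothetical gene sets on a 10-gene array.
stress = {"g1", "g2", "g3"}
mutant = {"g1", "g2", "g3"}
shared = response_overlap(stress, mutant)
print(shared, overlap_p_value(10, len(stress), len(mutant), shared))
```

    Working with sets rather than expression values is what makes such a comparison insensitive to experiment-specific scaling.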

  14. Agrobacterium rhizogenes-induced cotton hairy root culture as an alternative tool for cotton functional genomics

    Science.gov (United States)

    Although well-accepted as the ultimate method for cotton functional genomics, Agrobacterium tumefaciens-mediated cotton transformation is not widely used for functional analyses of cotton genes and their promoters since regeneration of cotton in tissue culture is lengthy and labor intensive. In cer...

  15. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    Directory of Open Access Journals (Sweden)

    Grigoriev Igor V

    2009-02-01

    Full Text Available Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
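
    Reverse database searching is commonly used to estimate FDR by counting matches to reversed (decoy) sequences among the accepted identifications. The sketch below shows one simple form of that estimate, decoys divided by targets above a score cutoff. It assumes each match is a (score, is_decoy) pair with higher scores being better; it does not reproduce Average Peptide Scoring, and the scores are hypothetical.

```python
def score_threshold_for_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    `matches` is an iterable of (score, is_decoy) pairs. The FDR at a cutoff
    is estimated as (decoy hits) / (target hits) among matches at or above
    it. Returns None if no cutoff qualifies. Tied scores are not grouped.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical peptide-spectrum matches: (score, matched a reversed sequence?)
matches = [(10, False), (9, False), (8, True), (7, False), (6, False)]
print(score_threshold_for_fdr(matches, max_fdr=0.25))
```

    Other decoy-based estimators exist, for example ones that double the decoy count when targets and decoys are searched together; the choice here is only illustrative.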

  16. High GC Content Cas9-Mediated Genome-Editing and Biosynthetic Gene Cluster Activation in Saccharopolyspora erythraea.

    Science.gov (United States)

    Liu, Yong; Wei, Wen-Ping; Ye, Bang-Ce

    2018-05-18

    The overexpression of bacterial secondary metabolite biosynthetic enzymes is the basis for industrial overproducing strains. Genome editing tools can be used to further improve gene expression and yield. Saccharopolyspora erythraea produces erythromycin, which has extensive clinical applications. In this study, the CRISPR-Cas9 system was used to edit genes in the S. erythraea genome. A temperature-sensitive plasmid containing the PermE promoter, to drive Cas9 expression, and the Pj23119 and PkasO promoters, to drive sgRNAs, was designed. Erythromycin esterase, encoded by S. erythraea SACE_1765, inactivates erythromycin by hydrolyzing the macrolactone ring. Sequencing and qRT-PCR confirmed that reporter genes were successfully inserted into the SACE_1765 gene. Deletion of SACE_1765 in a high-producing strain resulted in a 12.7% increase in erythromycin levels. Subsequent PermE-egfp knock-in at the SACE_0712 locus resulted in an 80.3% increase in erythromycin production compared with that of wild type. Further investigation showed that PermE promoter knock-in activated the erythromycin biosynthetic gene clusters at the SACE_0712 locus. Additionally, deletion of indA (SACE_1229) using dual sgRNA targeting without markers increased the editing efficiency to 65%. In summary, we have successfully applied Cas9-based genome editing to a bacterial strain, S. erythraea, with a high GC content. This system has potential application for both genome-editing and biosynthetic gene cluster activation in Actinobacteria.

  17. Using FlyBase, a Database of Drosophila Genes and Genomes.

    Science.gov (United States)

    Marygold, Steven J; Crosby, Madeline A; Goodman, Joshua L

    2016-01-01

    For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.

  18. Network graph analysis of gene-gene interactions in genome-wide association study data.

    Science.gov (United States)

    Lee, Sungyoung; Kwon, Min-Seok; Park, Taesung

    2012-12-01

    Most common complex traits, such as obesity, hypertension, diabetes, and cancers, are known to be associated with multiple genes, environmental factors, and their epistasis. Recently, the development of advanced genotyping technologies has allowed us to perform genome-wide association studies (GWASs). For detecting the effects of multiple genes on complex traits, many approaches have been proposed for GWASs. Multifactor dimensionality reduction (MDR) is one of the powerful and efficient methods for detecting high-order gene-gene (GxG) interactions. However, the biological interpretation of GxG interactions identified by MDR analysis is not easy. In order to aid the interpretation of MDR results, we propose a network graph analysis to elucidate the meaning of identified GxG interactions. The proposed network graph analysis consists of three steps. The first step is for performing GxG interaction analysis using MDR analysis. The second step is to draw the network graph using the MDR result. The third step is to provide biological evidence of the identified GxG interaction using external biological databases. The proposed method was applied to Korean Association Resource (KARE) data, containing 8838 individuals with 327,632 single-nucleotide polymorphisms, in order to perform GxG interaction analysis of body mass index (BMI). Our network graph analysis successfully showed that many identified GxG interactions have known biological evidence related to BMI. We expect that our network graph analysis will be helpful to interpret the biological meaning of GxG interactions.
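
    The second of the three steps, turning pairwise interactions into a network, can be sketched with plain dictionaries. The example assumes the MDR result has already been reduced to a list of interacting gene pairs, and it treats genes with several partners as candidates for closer inspection. The pair format, the gene labels and the degree cutoff are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_gene_network(interactions):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    graph = defaultdict(set)
    for gene_a, gene_b in interactions:
        if gene_a != gene_b:  # ignore self-pairs
            graph[gene_a].add(gene_b)
            graph[gene_b].add(gene_a)
    return dict(graph)


def hub_genes(graph, min_degree=2):
    """Genes connected to at least `min_degree` distinct partners."""
    return sorted(g for g, partners in graph.items() if len(partners) >= min_degree)


# Hypothetical interacting pairs standing in for an MDR result.
pairs = [("GENE1", "GENE2"), ("GENE1", "GENE3"), ("GENE4", "GENE5")]
network = build_gene_network(pairs)
print(hub_genes(network))
```

    The third step would then look up the hubs and their partners in external biological databases, which is not shown here.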

  19. Gene expression profile and genomic alterations in colonic tumours induced by 1,2-dimethylhydrazine (DMH) in rats

    International Nuclear Information System (INIS)

    Femia, Angelo Pietro; Luceri, Cristina; Toti, Simona; Giannini, Augusto; Dolara, Piero; Caderni, Giovanna

    2010-01-01

    Azoxymethane (AOM) or 1,2-dimethylhydrazine (DMH)-induced colon carcinogenesis in rats shares many phenotypical similarities with human sporadic colon cancer and is a reliable model for identifying chemopreventive agents. Genetic mutations relevant to human colon cancer have been described in this model, but comprehensive gene expression and genomic analysis have not been reported so far. Therefore, we applied genome-wide technologies to study variations in gene expression and genomic alterations in DMH-induced colon cancer in F344 rats. For gene expression analysis, 9 tumours (TUM) and their paired normal mucosa (NM) were hybridized on 4 × 44K Whole rat arrays (Agilent) and selected genes were validated by semi-quantitative RT-PCR. Functional analysis on microarray data was performed by GenMAPP/MappFinder analysis. Array-comparative genomic hybridization (a-CGH) was performed on 10 paired TUM-NM samples hybridized on Rat genome arrays 2 × 105K (Agilent) and the results were analyzed by CGH Analytics (Agilent). Microarray gene expression analysis showed that Defcr4, Igfbp5, Mmp7, Nos2, S100A8 and S100A9 were among the most up-regulated genes in tumours (Fold Change (FC) compared with NM: 183, 48, 39, 38, 36 and 32, respectively), while Slc26a3, Mptx, Retlna and Muc2 were strongly down-regulated (FC: -500, -376, -167 and -79, respectively). Functional analysis showed that pathways controlling cell cycle, protein synthesis, matrix metalloproteinases, TNFα/NFkB, and inflammatory responses were up-regulated in tumours, while Krebs cycle, the electron transport chain, and fatty acid beta oxidation were down-regulated. a-CGH analysis showed that four TUM out of ten had one or two chromosomal aberrations. Importantly, one sample showed a deletion on chromosome 18 including Apc. The results showed complex gene expression alterations in adenocarcinomas encompassing many altered pathways. While a-CGH analysis showed a low degree of genomic imbalance, it is interesting to

  20. New tools for Mendelian disease gene identification: PhenoDB variant analysis module; and GeneMatcher, a web-based tool for linking investigators with an interest in the same gene.

    Science.gov (United States)

    Sobreira, Nara; Schiettecatte, François; Boehm, Corinne; Valle, David; Hamosh, Ada

    2015-04-01

    Identifying the causative variant from among the thousands identified by whole-exome sequencing or whole-genome sequencing is a formidable challenge. To make this process as efficient and flexible as possible, we have developed a Variant Analysis Module coupled to our previously described Web-based phenotype intake tool, PhenoDB (http://researchphenodb.net and http://phenodb.org). When a small number of candidate-causative variants have been identified in a study of a particular patient or family, a second, more difficult challenge becomes proof of causality for any given variant. One approach to this problem is to find other cases with a similar phenotype and mutations in the same candidate gene. Alternatively, it may be possible to develop biological evidence for causality, an approach that is assisted by making connections to basic scientists studying the gene of interest, often in the setting of a model organism. Both of these strategies benefit from an open access, online site where individual clinicians and investigators could post genes of interest. To this end, we developed GeneMatcher (http://genematcher.org), a freely accessible Website that enables connections between clinicians and researchers across the world who share an interest in the same gene(s). © 2015 WILEY PERIODICALS, INC.

  1. CRISPR-Cas: From the Bacterial Adaptive Immune System to a Versatile Tool for Genome Engineering.

    Science.gov (United States)

    Kirchner, Marion; Schneider, Sabine

    2015-11-09

    The field of biology has been revolutionized by the recent advancement of an adaptive bacterial immune system as a universal genome engineering tool. Bacteria and archaea use repetitive genomic elements termed clustered regularly interspaced short palindromic repeats (CRISPR) in combination with an RNA-guided nuclease (CRISPR-associated nuclease: Cas) to target and destroy invading DNA. By choosing the appropriate sequence of the guide RNA, this two-component system can be used to efficiently modify, target, and edit genomic loci of interest in plants, insects, fungi, mammalian cells, and whole organisms. This has opened up new frontiers in genome engineering, including the potential to treat or cure human genetic disorders. Now the potential risks as well as the ethical, social, and legal implications of this powerful new technique move into the limelight. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.

    Science.gov (United States)

    Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V

    2016-01-01

    Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha, an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of the two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that the gene arrangement specific to Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.

  3. Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.

    Directory of Open Access Journals (Sweden)

    Olga V Popova

    Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha, an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even

  4. CRISPR-Cas9; an efficient tool for precise plant genome editing.

    Science.gov (United States)

    Islam, Waqar

    2018-04-03

    Efficient plant genome editing depends on the induction of double-stranded DNA breaks (DSBs) by site-specific nucleases. These DSBs initiate DNA repair, which can be based on either homologous recombination (HR) or non-homologous end joining (NHEJ). Recently, the CRISPR-Cas9 mechanism has been highlighted as a revolutionary genetic tool because of its simpler framework and its broad range of adaptability and applications. In this review, I summarize the application of this biotechnological tool in plant genome editing. Furthermore, I describe the successful adaptation of CRISPR in various plant species, where it is used to generate stable mutations in a steadily growing number of species through NHEJ. The review also sheds light on other biotechnological approaches that rely on single DNA lesion induction, such as genomic deletion or paired nickases for avoiding off-target effects. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. Genome-wide comparative analysis of NBS-encoding genes between Brassica species and Arabidopsis thaliana.

    Science.gov (United States)

    Yu, Jingyin; Tehrim, Sadia; Zhang, Fengqi; Tong, Chaobo; Huang, Junyan; Cheng, Xiaohui; Dong, Caihua; Zhou, Yanqiu; Qin, Rui; Hua, Wei; Liu, Shengyi

    2014-01-03

    Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in offering resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare them with analogues in Arabidopsis thaliana based on a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after the split from A. thaliana. Here we present a genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Through the employment of HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in the B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among the 3 species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in the Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated the differential expression pattern of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among the 3 species suggested that orthologous genes in B. rapa have undergone stronger negative selection than those in B. oleracea. For the TNL type, however, there are no significant differences in the orthologous gene pairs between the two species. This study is the first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome

  6. A massive incorporation of microbial genes into the genome of Tetranychus urticae, a polyphagous arthropod herbivore.

    Science.gov (United States)

    Wybouw, N; Van Leeuwen, T; Dermauw, W

    2018-06-01

    A number of horizontal gene transfers (HGTs) have been identified in the spider mite Tetranychus urticae, a chelicerate herbivore. However, the genome of this mite species has at present not been thoroughly mined for the presence of HGT genes. Here, we performed a systematic screen for HGT genes in the T. urticae genome using the h-index metric. Our results not only validated previously identified HGT genes but also uncovered 25 novel HGT genes. In addition to HGT genes with a predicted biochemical function in carbohydrate, lipid and folate metabolism, we also identified the horizontal transfer of a ketopantoate hydroxymethyltransferase and a pantoate β-alanine ligase gene. In plants and bacteria, both genes are essential for vitamin B5 biosynthesis and their presence in the mite genome strongly suggests that spider mites, similar to Bemisia tabaci and nematodes, can synthesize their own vitamin B5. We further show that HGT genes were physically embedded within the mite genome and were expressed in different life stages. By screening chelicerate genomes and transcriptomes, we were able to estimate the evolutionary histories of these HGTs during chelicerate evolution. Our study suggests that HGT has made a significant and underestimated impact on the metabolic repertoire of plant-feeding spider mites. © 2018 The Royal Entomological Society.

  7. Genic regions of a large salamander genome contain long introns and novel genes

    Directory of Open Access Journals (Sweden)

    Bryant Susan V

    2009-01-01

    Background: The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum: c-value = 32 × 10^9 bp) were isolated and sequenced to characterize the structure of genic regions. Results: Annotation of genes within BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered while manually annotating BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage specific genes in the axolotl genome than the human genome, but the great majority (86%) of genes between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be incredibly large, approximately 2.8 gigabases! Conclusion: This study shows that a large salamander genome has a correspondingly large genic component, primarily because genes have incredibly long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes that are unique to salamanders.

  8. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Science.gov (United States)

    Bargsten, Joachim W; Folta, Adam; Mlynárová, Ludmila; Nap, Jan-Peter

    2013-01-01

    As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes). The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  9. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Directory of Open Access Journals (Sweden)

    Joachim W Bargsten

    As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes). The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  10. Genome-wide identification of SAUR genes in watermelon (Citrullus lanatus).

    Science.gov (United States)

    Zhang, Na; Huang, Xing; Bao, Yaning; Wang, Bo; Zeng, Hongxia; Cheng, Weishun; Tang, Mi; Li, Yuhua; Ren, Jian; Sun, Yuhong

    2017-07-01

    The early auxin-responsive SAUR family is an important gene family in auxin signal transduction. We here present the first report of a genome-wide identification of SAUR genes in the watermelon genome. We identified 65 ClaSAURs and provide a genomic framework for future study of these genes. Phylogenetic analysis revealed a Cucurbitaceae-specific SAUR subfamily and contributes to understanding the evolutionary pattern of SAUR genes in plants. Quantitative RT-PCR analysis confirmed the expression of 11 randomly selected SAUR genes in watermelon tissues. ClaSAUR36 was highly expressed in fruit, and its further study might bring a new perspective on watermelon fruit development. Moreover, correlation analysis revealed similar expression profiles of SAUR genes in watermelon and Arabidopsis during shoot organogenesis. This work provides new support for the conserved auxin machinery in plants.

  11. Bioinformatics Tools for Genome-Wide Epigenetic Research.

    Science.gov (United States)

    Angarica, Vladimir Espinosa; Del Sol, Antonio

    2017-01-01

    Epigenetics play a central role in the regulation of many important cellular processes, and dysregulations at the epigenetic level could be the source of serious pathologies, such as neurological disorders affecting brain development, neurodegeneration, and intellectual disability. Despite significant technological advances for epigenetic profiling, there is still a need for a systematic understanding of how epigenetics shapes cellular circuitry, and disease pathogenesis. The development of accurate computational approaches for analyzing complex epigenetic profiles is essential for disentangling the mechanisms underlying cellular development, and the intricate interaction networks determining and sensing chromatin modifications and DNA methylation to control gene expression. In this chapter, we review the recent advances in the field of "computational epigenetics," including computational methods for processing different types of epigenetic data, prediction of chromatin states, and study of protein dynamics. We also discuss how "computational epigenetics" has complemented the fast growth in the generation of epigenetic data for uncovering the main differences and similarities at the epigenetic level between individuals and the mechanisms underlying disease onset and progression.

  12. Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

    KAUST Repository

    Magana-Mora, Arturo

    2017-08-15

    Background: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. Results: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. Conclusions: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/.

  13. CRISPR/Cas9: A powerful tool for crop genome editing

    Institute of Scientific and Technical Information of China (English)

    Gaoyuan Song; Meiling Jia; Kai Chen; Xingchen Kong; Bushra Khattak; Chuanxiao Xie; Aili Li; Long Mao

    2016-01-01

    The CRISPR/Cas9 technology is evolved from a type II bacterial immune system and represents a new generation of targeted genome editing technology that can be applied to nearly all organisms. Site-specific modification is achieved by a single guide RNA (usually about 20 nucleotides) that is complementary to a target gene or locus and is anchored by a protospacer-adjacent motif. Cas9 nuclease then cleaves the targeted DNA to generate double-strand breaks (DSBs), which are subsequently repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR) mechanisms. NHEJ may introduce indels that cause frame shift mutations and hence the disruption of gene functions. When combined with double or multiplex guide RNA design, NHEJ may also introduce targeted chromosome deletions, whereas HDR can be engineered for target gene correction, gene replacement, and gene knock-in. In this review, we briefly survey the history of the CRISPR/Cas9 system invention and its genome-editing mechanism. We also describe the most recent innovation of the CRISPR/Cas9 technology, particularly the broad applications of modified Cas9 variants, and discuss the potential of this system for targeted genome editing and modification for crop improvement.
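
    The targeting rule described above (a guide of about 20 nucleotides that must sit next to a protospacer-adjacent motif) can be illustrated with a minimal sketch. It assumes an NGG motif immediately 3' of the protospacer, which the abstract does not specify, and scans only the forward strand; the function name and the example sequence are hypothetical and are not taken from the review.

```python
import re


def find_candidate_sites(seq, guide_len=20, pam_regex="[ACGT]GG"):
    """Return (start, protospacer, pam) tuples for forward-strand candidate sites.

    Assumption: the PAM lies immediately 3' of the protospacer and matches NGG.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (guide_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical input sequence, for illustration only.
example = "TTGACCTGAAGCTAGCTAGGATCCGATCGTAGCTAGCTAGGCTAAGG"
for start, protospacer, pam in find_candidate_sites(example):
    print(start, protospacer, pam)
```

    A real guide design workflow would also scan the reverse complement and score off-target matches; both are left out here to keep the sketch short.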

  14. CRISPR/Cas9: A powerful tool for crop genome editing

    Institute of Scientific and Technical Information of China (English)

    Gaoyuan Song; Meiling Jia; Kai Chen; Xingchen Kong; Bushra Khattak; Chuanxiao Xie; Aili Li; Long Mao

    2016-01-01

    The CRISPR/Cas9 technology is evolved from a type II bacterial immune system and represents a new generation of targeted genome editing technology that can be applied to nearly all organisms. Site-specific modification is achieved by a single guide RNA (usually about 20 nucleotides) that is complementary to a target gene or locus and is anchored by a protospacer-adjacent motif. Cas9 nuclease then cleaves the targeted DNA to generate double-strand breaks (DSBs), which are subsequently repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR) mechanisms. NHEJ may introduce indels that cause frame shift mutations and hence the disruption of gene functions. When combined with double or multiplex guide RNA design, NHEJ may also introduce targeted chromosome deletions, whereas HDR can be engineered for target gene correction, gene replacement, and gene knock-in. In this review, we briefly survey the history of the CRISPR/Cas9 system invention and its genome-editing mechanism. We also describe the most recent innovation of the CRISPR/Cas9 technology, particularly the broad applications of modified Cas9 variants, and discuss the potential of this system for targeted genome editing and modification for crop improvement.

  15. In silico exploration of Red Sea Bacillus genomes for natural product biosynthetic gene clusters

    KAUST Repository

    Othoum, Ghofran K; Bougouffa, Salim; Razali, Rozaimi; Bokhari, Ameerah; Alamoudi, Soha; Antunes, André ; Gao, Xin; Hoehndorf, Robert; Arold, Stefan T.; Gojobori, Takashi; Hirt, Heribert; Mijakovic, Ivan; Bajic, Vladimir B.; Lafi, Feras Fawzi; Essack, Magbubah

    2018-01-01

    are better potential sources for novel antibiotics. Moreover, the genome of the Red Sea strain B. paralicheniformis Bac48 is more enriched in modular PKS genes compared to B. licheniformis strains and other B. paralicheniformis strains. This may be linked

  16. Comparative inference of duplicated genes produced by polyploidization in soybean genome.

    Science.gov (United States)

    Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

    2013-01-01

    Soybean (Glycine max) is one of the most important crop plants for providing protein and oil, and its genome is worth investigating for both its economic and scientific value. Polyploidy is a widespread and recurrent phenomenon during plant evolution, and it can generate massive numbers of duplicated genes, which are an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, duplicated genes by whole genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that the whole genome duplication event occurred more than once in the genome evolution of soybean, which is often distributed near the ends of chromosomes.

  17. Site-Specific Integration of Exogenous Genes Using Genome Editing Technologies in Zebrafish

    Directory of Open Access Journals (Sweden)

    Atsuo Kawahara

    2016-05-01

    The zebrafish (Danio rerio) is an ideal vertebrate model to investigate the developmental molecular mechanism of organogenesis and regeneration. Recent innovations in genome editing technologies, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated protein 9 (Cas9) system, have allowed researchers to generate diverse genomic modifications in whole animals and in cultured cells. The CRISPR/Cas9 and TALEN techniques frequently induce DNA double-strand breaks (DSBs) at the targeted gene, resulting in frameshift-mediated gene disruption. As a useful application of genome editing technology, several groups have recently reported efficient site-specific integration of exogenous genes into targeted genomic loci. In this review, we provide an overview of TALEN- and CRISPR/Cas9-mediated site-specific integration of exogenous genes in zebrafish.

  18. Analysis of genomic imbalances and gene expression changes in transformed follicular lymphoma (FL)

    DEFF Research Database (Denmark)

    Obel, G.; Farinha, P.; Lam, W.

    2005-01-01

    American patients with transformed FL. Methods: High-resolution BAC-array comparative genomic hybridisation (CGH) was used to detect genomic imbalances. Gene expression profiling was performed using cDNA microarrays (Affymetrix). Results: Of 9 biopsy pairs identified so far, analysis results of the first 4...

  19. Implementing an online tool for genome-wide validation of survival-associated biomarkers in ovarian-cancer using microarray data from 1287 patients

    DEFF Research Database (Denmark)

    Győrffy, Balázs; Lánczky, András; Szállási, Zoltán

    2012-01-01

    was set up using gene expression data and survival information of 1287 ovarian cancer patients downloaded from Gene Expression Omnibus and The Cancer Genome Atlas (Affymetrix HG-U133A, HG-U133A 2.0, and HG-U133 Plus 2.0 microarrays). After quality control and normalization, only probes present on all......). A Kaplan–Meier survival plot was generated and significance was computed. The tool can be accessed online at www.kmplot.com/ovar. We used this integrative data analysis tool to validate the prognostic power of 37 biomarkers identified in the literature. Of these, CA125 (MUC16; P=3.7x10–5, hazard ratio (HR...... biomarker validation platform that mines all available microarray data to assess the prognostic power of 22 277 genes in 1287 ovarian cancer patients. We specifically used this tool to evaluate the effect of 37 previously published biomarkers on ovarian cancer prognosis....
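
    The abstract mentions that a Kaplan-Meier survival plot is generated. As a rough illustration of what such a curve estimates, the sketch below computes the standard product-limit estimate from follow-up times and event indicators. It is not the implementation used by the online tool; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy data, for illustration only: four patients, one censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

    Comparing two such curves (for example, high versus low expression of a biomarker) additionally needs a significance test such as the log-rank test, which is not sketched here.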

  20. Developing tools for the study of molluscan immunity: The sequencing of the genome of the eastern oyster, Crassostrea virginica.

    Science.gov (United States)

    Gómez-Chiarri, Marta; Warren, Wesley C; Guo, Ximing; Proestou, Dina

    2015-09-01

    The eastern oyster, Crassostrea virginica, provides important ecological and economical services, making it the target of restoration projects and supporting a significant fishery/aquaculture industry with landings valued at more than $100 million in 2012 in the United States of America. Due to the impact of infectious diseases on wild, restored, and cultured populations, the eastern oyster has been the focus of studies on host-pathogen interactions and immunity, as well as the target of selective breeding efforts for disease resistant oyster lines. Despite these efforts, relatively little is known about the genetic basis of resistance to diseases or environmental stress, not only in eastern oyster, but also in other molluscan species of commercial interest worldwide. In order to develop tools and resources to assist in the elucidation of the genomic basis of traits of commercial, biological, and ecological interest in oysters, a team of genome and bioinformatics experts, in collaboration with the oyster research community, is sequencing, assembling, and annotating the first reference genome for the eastern oyster and producing an exhaustive transcriptome from a variety of oyster developmental stages and tissues in response to a diverse set of environmentally-relevant stimuli. These transcriptomes and reference genome for the eastern oyster, added to the already available genome and transcriptomes for the Pacific oyster (Crassostrea gigas) and other bivalve species, will be an essential resource for the discovery of candidate genes and markers associated with traits of commercial, biological, and ecologic importance in bivalve molluscs, including those related to host-pathogen interactions and immunity. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Correcting Inconsistencies and Errors in Bacterial Genome Metadata Using an Automated Curation Tool in Excel (AutoCurE).

    Science.gov (United States)

    Schmedes, Sarah E; King, Jonathan L; Budowle, Bruce

    2015-01-01

    Whole-genome data are invaluable for large-scale comparative genomic studies. Current sequencing technologies have made it feasible to sequence entire bacterial genomes with relative ease and speed and at a substantially reduced cost per nucleotide, hence cost per genome. More than 3,000 bacterial genomes have been sequenced and are available at the finished status. Publicly available genomes can be readily downloaded; however, it is challenging to verify the specific supporting data contained within the download and to identify errors and inconsistencies that may be present within the organizational data content and metadata. AutoCurE, an automated tool for bacterial genome database curation in Excel, was developed to facilitate local database curation of supporting data that accompany downloaded genomes from the National Center for Biotechnology Information. AutoCurE provides an automated approach to curate local genomic databases by flagging inconsistencies or errors: it compares the downloaded supporting data to the genome reports to verify genome name, RefSeq accession numbers, the presence of archaea, BioProject/UIDs, and sequence file descriptions. Flags are generated for nine metadata fields if there are inconsistencies between the downloaded genomes and the genome reports and if erroneous or missing data are evident. AutoCurE is an easy-to-use tool for local database curation for large-scale genome data prior to downstream analyses.
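
    The flagging approach described in this abstract, comparing downloaded supporting data with a genome report field by field, might be sketched as follows. AutoCurE itself runs in Excel; this Python version only illustrates the idea, and the function name, field names and accession placeholders are all hypothetical.

```python
def flag_inconsistencies(downloaded, report, fields):
    """Compare downloaded metadata with a genome report, keyed by accession.

    Returns a list of (accession, field, reason) flags.
    """
    flags = []
    for accession, record in downloaded.items():
        expected = report.get(accession)
        if expected is None:
            flags.append((accession, "accession", "missing from report"))
            continue
        for field in fields:
            got = record.get(field)
            want = expected.get(field)
            if not got:
                flags.append((accession, field, "missing"))
            elif got != want:
                flags.append((accession, field, "mismatch"))
    return flags


# Placeholder records, for illustration only.
downloaded = {
    "ACC_0001": {"name": "Genome A", "bioproject": "P1"},
    "ACC_0002": {"name": "Genome B", "bioproject": ""},
    "ACC_0003": {"name": "Genome C", "bioproject": "P3"},
}
report = {
    "ACC_0001": {"name": "Genome A", "bioproject": "P9"},
    "ACC_0002": {"name": "Genome B", "bioproject": "P2"},
}
for flag in flag_inconsistencies(downloaded, report, ["name", "bioproject"]):
    print(flag)
```

    Returning flags rather than correcting values in place mirrors the abstract's description of a tool that reports inconsistencies for the curator to resolve.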

  2. Genome-wide methylation analysis identifies genes silenced in non-seminoma cell lines.

    Science.gov (United States)

    Noor, Dzul Azri Mohamed; Jeyapalan, Jennie N; Alhazmi, Safiah; Carr, Matthew; Squibb, Benjamin; Wallace, Claire; Tan, Christopher; Cusack, Martin; Hughes, Jaime; Reader, Tom; Shipley, Janet; Sheer, Denise; Scotting, Paul J

    2016-01-01

    Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours' biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina infinium HumanMethylome450 bead chip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2-deoxycytidine and reverse transcription-quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes.

  3. Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.

    Science.gov (United States)

    Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V

    2017-09-30

    Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.

  4. Candidate genes revealed by a genome scan for mosquito resistance to a bacterial insecticide: sequence and gene expression variations

    Directory of Open Access Journals (Sweden)

    David Jean-Philippe

    2009-11-01

    Full Text Available Abstract Background Genome scans are becoming an increasingly popular approach to study the genetic basis of adaptation and speciation, but on their own, they are often helpless at identifying the specific gene(s) or mutation(s) targeted by selection. This shortcoming is hopefully bound to disappear in the near future, thanks to the wealth of new genomic resources that are currently being developed for many species. In this article, we provide a foretaste of this exciting new era by conducting a genome scan in the mosquito Aedes aegypti with the aim to look for candidate genes involved in resistance to Bacillus thuringiensis subsp. israelensis (Bti) insecticidal toxins. Results The genomes of a Bti-resistant and a Bti-susceptible strain were surveyed using about 500 MITE-based molecular markers, and the loci showing the highest inter-strain genetic differentiation were sequenced and mapped on the Aedes aegypti genome sequence. Several good candidate genes for Bti-resistance were identified in the vicinity of these highly differentiated markers. Two of them, coding for a cadherin and a leucine aminopeptidase, were further examined at the sequence and gene expression levels. In the resistant strain, the cadherin gene displayed patterns of nucleotide polymorphisms consistent with the action of positive selection (e.g. an excess of high compared to intermediate frequency mutations), as well as a significant under-expression compared to the susceptible strain. Conclusion Both sequence and gene expression analyses agree to suggest a role for positive selection in the evolution of this cadherin gene in the resistant strain. However, it is unlikely that resistance to Bti is conferred by this gene alone, and further investigation will be needed to characterize other genes significantly associated with Bti resistance in Ae. aegypti. Beyond these results, this article illustrates how genome scans can build on the body of new genomic information (here, full

  5. Regulation of gene expression in Mycoplasmas: contribution from Mycoplasma hyopneumoniae and Mycoplasma synoviae genome sequences

    Directory of Open Access Journals (Sweden)

    Humberto Maciel França Madeira

    2007-01-01

    Full Text Available This report describes the transcription apparatus of Mycoplasma hyopneumoniae (strains J and 7448) and Mycoplasma synoviae, using a comparative genomics approach to summarize the main features related to transcription and control of gene expression in mycoplasmas. Most of the transcription-related genes present in the three strains are well conserved among mycoplasmas. Some unique aspects of transcription in mycoplasmas and the scarcity of regulatory proteins in mycoplasma genomes are discussed.

  6. The Fanconi anemia/BRCA gene network in zebrafish: Embryonic expression and comparative genomics

    OpenAIRE

    Titus, Tom A.; Yan, Yi-Lin; Wilson, Catherine; Starks, Amber M.; Frohnmayer, Jonathan D.; Canestro, Cristian; Rodriguez-Mari, Adriana; He, Xinjun; Postlethwait, John H.

    2008-01-01

    Fanconi anemia (FA) is a genetic disease resulting in bone marrow failure, high cancer risks, infertility, and developmental anomalies including microphthalmia, microcephaly, hypoplastic radius and thumb. Here we present cDNA sequences, genetic mapping, and genomic analyses for the four previously undescribed zebrafish FA genes (fanci, fancj, fancm, and fancn), and show that they reverted to single copy after the teleost genome duplication. We tested the hypothesis that FA genes are expresse...

  7. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome, as was once thought, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
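
    The comparison of observed clusters against chance can be sketched with a simple permutation test. The abstract does not state how clusters were defined or how the chance expectation was computed, so everything below is an assumption: a cluster is taken to be a run of at least two adjacent tissue-specific genes in chromosomal order, and the null model shuffles gene labels along the chromosome.

```python
import random


def count_runs(flags, min_len=2):
    """Count maximal runs of consecutive True values that are >= min_len long.

    `flags` lists genes in chromosomal order; True marks a tissue-specific gene.
    """
    runs = 0
    length = 0
    for is_specific in flags:
        if is_specific:
            length += 1
        else:
            if length >= min_len:
                runs += 1
            length = 0
    if length >= min_len:  # close a run that reaches the chromosome end
        runs += 1
    return runs


def permutation_p_value(flags, n_perm=1000, seed=0):
    """Return (observed cluster count, one-sided empirical p-value).

    The p-value is the fraction of label shuffles with at least as many
    clusters as observed, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = count_runs(flags)
    shuffled = list(flags)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_runs(shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Toy chromosome: 2 clusters among 8 genes (illustrative data only).
toy = [True, True, False, True, False, True, True, True]
observed, p_value = permutation_p_value(toy, n_perm=200)
```

    A real analysis would run this per chromosome on annotated gene order and would first collapse tandem duplicates, as the abstract indicates.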

  8. G2D: a tool for mining genes associated with disease

    OpenAIRE

    Perez-Iratxeta, Carolina; Wjst, Matthias; Bork, Peer; Andrade, Miguel A

    2005-01-01

    Abstract Background Human inherited diseases can be associated by genetic linkage with one or more genomic regions. The availability of the complete sequence of the human genome allows examining those locations for an associated gene. We previously developed an algorithm to prioritize genes on a chromosomal region according to their possible relation to an inherited disease using a combination of data mining on biomedical databases and gene sequence analysis. Results We have implemented this ...

  9. Prospects: the tomato genome as a cornerstone for gene discovery

    Science.gov (United States)

    Those involved in the international tomato genome sequencing effort contributed to not only the development of an important genome sequence relevant to a major economic and nutritional crop, but also to the tomato experimental system as a model for plant biology. Without question, prior seminal work...

  10. Genome Mutational and Transcriptional Hotspots Are Traps for Duplicated Genes and Sources of Adaptations.

    Science.gov (United States)

    Fares, Mario A; Sabater-Muñoz, Beatriz; Toft, Christina

    2017-05-01

    Gene duplication generates new genetic material, which has been shown to lead to major innovations in unicellular and multicellular organisms. A whole-genome duplication occurred in the ancestor of Saccharomyces yeast species but 92% of duplicates returned to single-copy genes shortly after duplication. The persisting duplicated genes in Saccharomyces led to the origin of major metabolic innovations, which have been the source of the unique biotechnological capabilities in the Baker's yeast Saccharomyces cerevisiae. What factors have determined the fate of duplicated genes remains unknown. Here, we report the first demonstration that the local genome mutation and transcription rates determine the fate of duplicates. We show, for the first time, a preferential location of duplicated genes in the mutational and transcriptional hotspots of S. cerevisiae genome. The mechanism of duplication matters, with whole-genome duplicates exhibiting different preservation trends compared to small-scale duplicates. Genome mutational and transcriptional hotspots are rich in duplicates with large repetitive promoter elements. Saccharomyces cerevisiae shows more tolerance to deleterious mutations in duplicates with repetitive promoter elements, which in turn exhibit higher transcriptional plasticity against environmental perturbations. Our data demonstrate that the genome traps duplicates through the accelerated regulatory and functional divergence of their gene copies providing a source of novel adaptations in yeast. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. Systematic discovery of unannotated genes in 11 yeast species using a database of orthologous genomic segments

    LENUS (Irish Health Repository)

    OhEigeartaigh, Sean S

    2011-07-26

    Abstract Background In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically. Results We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis. Conclusions SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external
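
    The central idea, keeping a very weak hit only when its genomic location matches the query's syntenic context, can be sketched as follows. This is not the SearchDOGS implementation: the `Hit` record, the E-value thresholds, and the `orthologous_segments` lookup are hypothetical names and values chosen for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str            # query gene identifier
    subject: str          # database hit identifier
    evalue: float         # BLAST E-value
    subject_segment: str  # genomic segment containing the hit


def filter_hits(hits, orthologous_segments, strong=1e-5, weak=10.0):
    """Keep strong hits outright; keep weak hits only if they are syntenic.

    `orthologous_segments` maps each query to the set of segment ids already
    known to be orthologous to the query's own segment. Both thresholds are
    assumed values, not taken from the paper.
    """
    kept = []
    for hit in hits:
        if hit.evalue <= strong:
            kept.append(hit)
        elif hit.evalue <= weak and hit.subject_segment in orthologous_segments.get(hit.query, set()):
            kept.append(hit)
    return kept


hits = [
    Hit("geneA", "orf1", 1e-30, "seg9"),  # strong: kept regardless of location
    Hit("geneA", "orf2", 2.5, "seg1"),    # weak but syntenic: kept
    Hit("geneA", "orf3", 2.5, "seg7"),    # weak and not syntenic: dropped
]
kept = filter_hits(hits, {"geneA": {"seg1", "seg2"}})
```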

  12. AthaMap web tools for the analysis of transcriptional and posttranscriptional regulation of gene expression in Arabidopsis thaliana.

    Science.gov (United States)

    Hehl, Reinhard; Bülow, Lorenz

    2014-01-01

    The AthaMap database provides a map of verified and predicted transcription factor (TF) and small RNA-binding sites for the A. thaliana genome. The database can be used for bioinformatic predictions of putative regulatory sites. Several online web tools are available that address specific questions. Starting with the identification of transcription factor-binding sites (TFBS) in any gene of interest, colocalizing TFBS can be identified as well as common TFBS in a set of user-provided genes. Furthermore, genes can be identified that are potentially targeted by specific transcription factors or small inhibitory RNAs. This chapter provides detailed information on how each AthaMap web tool can be used online. Examples on how this database is used to address questions in circadian and diurnal regulation are given. Furthermore, complementary databases and databases that go beyond questions addressed with AthaMap are discussed.

  13. CRISPR/Cas9 Genome Editing: A Promising Tool for Therapeutic Applications of Induced Pluripotent Stem Cells.

    Science.gov (United States)

    Zhang, Yanli; Sastre, Danuta; Wang, Feng

    2018-01-01

    Induced pluripotent stem cells hold tremendous potential for biological and therapeutic applications. The development of efficient technologies for targeted genome alteration of stem cells in disease models is a prerequisite for utilizing stem cells to their full potential. The revolutionary technology for genome editing known as the clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein 9 (Cas9) system has recently been recognized as a powerful tool for editing DNA at specific loci. The ease of use of the CRISPR-Cas9 technology will allow us to improve our understanding of genomic variation in disease processes via cellular and animal models. More recently, this system was modified to repress (CRISPR interference, CRISPRi) or activate (CRISPR activation, CRISPRa) gene expression without alterations in the DNA, which broadened the scope of applications of CRISPR systems for stem cell biology. Here, we highlight the latest advances of CRISPR-associated applications in human pluripotent stem cells. The challenges and future prospects of CRISPR-based systems for human research are also discussed. Copyright© Bentham Science Publishers.

  14. Profiling of gene duplication patterns of sequenced teleost genomes: evidence for rapid lineage-specific genome expansion mediated by recent tandem duplications.

    Science.gov (United States)

    Lu, Jianguo; Peatman, Eric; Tang, Haibao; Lewis, Joshua; Liu, Zhanjiang

    2012-06-15

    Gene duplication has had a major impact on genome evolution. Localized (or tandem) duplication resulting from unequal crossing over and whole genome duplication are believed to be the two dominant mechanisms contributing to vertebrate genome evolution. While much scrutiny has been directed toward discerning patterns indicative of whole-genome duplication events in teleost species, less attention has been paid to the continuous nature of gene duplications and their impact on the size, gene content, functional diversity, and overall architecture of teleost genomes. Here, using a Markov clustering algorithm directed approach we catalogue and analyze patterns of gene duplication in the four model teleost species with chromosomal coordinates: zebrafish, medaka, stickleback, and Tetraodon. Our analyses based on set size, duplication type, synonymous substitution rate (Ks), and gene ontology emphasize shared and lineage-specific patterns of genome evolution via gene duplication. Most strikingly, our analyses highlight the extraordinary duplication and retention rate of recent duplicates in zebrafish and their likely role in the structural and functional expansion of the zebrafish genome. We find that the zebrafish genome is remarkable in its large number of duplicated genes, small duplicate set size, biased Ks distribution toward minimal mutational divergence, and proportion of tandem and intra-chromosomal duplicates when compared with the other teleost model genomes. The observed gene duplication patterns have played significant roles in shaping the architecture of teleost genomes and appear to have contributed to the recent functional diversification and divergence of important physiological processes in zebrafish. We have analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting their origin of independent and continuous duplication

  15. The FUN of identifying gene function in bacterial pathogens; insights from Salmonella functional genomics.

    Science.gov (United States)

    Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D

    2013-10-01

    The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
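
    The intersection strategy amounts to a set operation over gene identifiers. A minimal sketch, assuming each dataset has already been reduced to a collection of gene ids (the function name and the ids are hypothetical):

```python
def candidate_virulence_genes(fun_genes, differentially_expressed, required_for_infection):
    """Return FUN genes supported by both functional genomic approaches.

    A gene is a candidate only if it has unknown function, changes expression
    during infection (transcriptomics), and is required to cause disease
    (global mutagenesis). The result is sorted for reproducible output.
    """
    return sorted(
        set(fun_genes) & set(differentially_expressed) & set(required_for_infection)
    )


# Made-up identifiers for illustration only.
fun = ["g001", "g002", "g003", "g004"]
expressed = ["g002", "g003", "g010"]
required = ["g003", "g004", "g002", "g020"]
candidates = candidate_virulence_genes(fun, expressed, required)
```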

  16. Collective Dynamics of Specific Gene Ensembles Crucial for Neutrophil Differentiation: The Existence of Genome Vehicles Revealed

    Science.gov (United States)

    Giuliani, Alessandro; Tomita, Masaru

    2010-01-01

    Cell fate decision remarkably generates a specific cell differentiation path among the multiple possibilities that can arise through the complex interplay of high-dimensional genome activities. The coordinated action of thousands of genes to switch cell fate decision has indicated the existence of stable attractors guiding the process. However, the origins of the intracellular mechanisms that create a “cellular attractor” still remain unknown. Here, we examined the collective behavior of genome-wide expressions for neutrophil differentiation through two different stimuli, dimethyl sulfoxide (DMSO) and all-trans-retinoic acid (atRA). To overcome the difficulties of dealing with single gene expression noises, we grouped genes into ensembles and analyzed their expression dynamics in correlation space defined by Pearson correlation and mutual information. The standard deviation of correlation distributions of gene ensembles reduces when the ensemble size is increased following the inverse square root law, for both ensembles chosen randomly from whole genome and ranked according to expression variances across time. Choosing the ensemble size of 200 genes, we show the two probability distributions of correlations of randomly selected genes for atRA and DMSO responses overlapped after 48 hours, defining the neutrophil attractor. Next, tracking the ranked ensembles' trajectories, we noticed that only certain, not all, fall into the attractor in a fractal-like manner. The removal of these genome elements from the whole genomes, for both atRA and DMSO responses, destroys the attractor providing evidence for the existence of specific genome elements (named “genome vehicle”) responsible for the neutrophil attractor. Notably, within the genome vehicles, genes with low or moderate expression changes, which are often considered noisy and insignificant, are essential components for the creation of the neutrophil attractor. Further investigations along with our findings might

  17. Agrobacterium rhizogenes-mediated transformation of Superroot-derived Lotus corniculatus plants: a valuable tool for functional genomics

    Directory of Open Access Journals (Sweden)

    Liu Wei

    2009-06-01

    Full Text Available Abstract Background Transgenic approaches provide a powerful tool for gene function investigations in plants. However, some legumes are still recalcitrant to current transformation technologies, limiting the extent to which functional genomic studies can be performed. Superroot of Lotus corniculatus is a continuous root cloning system allowing direct somatic embryogenesis and mass regeneration of plants. Recently, a technique to obtain transgenic L. corniculatus plants from Superroot-derived leaves through A. tumefaciens-mediated transformation was described. However, transformation efficiency was low and it took about six months from gene transfer to PCR identification. Results In the present study, we developed an A. rhizogenes-mediated transformation of Superroot-derived L. corniculatus for gene function investigation, combining the efficient A. rhizogenes-mediated transformation and the rapid regeneration system of Superroot. The transformation system using A. rhizogenes K599 harbouring pGFPGUSPlus was improved by validating some parameters which may influence the transformation frequency. Using stem sections with one node as explants, a 2-day pre-culture of explants, infection with K599 at OD600 = 0.6, and co-cultivation on medium (pH 5.4) at 22°C for 2 days enhanced the transformation frequency significantly. As proof of concept, Superroot-derived L. corniculatus was transformed with a gene from wheat encoding an Na+/H+ antiporter (TaNHX2) using the described system. Transgenic Superroot plants were obtained and had increased salt tolerance, as expected from the expression of TaNHX2. Conclusion A rapid and efficient tool for gene function investigation in L. corniculatus was developed, combining the simplicity and high efficiency of the Superroot regeneration system and the availability of A. rhizogenes-mediated transformation. This system was improved by validating some parameters influencing the transformation frequency, which could

  18. Genome-wide identification and characterization of WRKY gene family in peanut

    Directory of Open Access Journals (Sweden)

    Hui Song

    2016-04-01

    Full Text Available WRKY, an important transcription factor family, is widely distributed in the plant kingdom. Many reports focused on analysis of phylogenetic relationship and biological function of WRKY protein at the whole genome level in different plant species. However, little is known about WRKY proteins in the genome of Arachis species and their response to salicylic acid (SA) and jasmonic acid (JA) treatment. In this study, we identified 77 and 75 WRKY proteins from the two wild ancestral diploid genomes of cultivated tetraploid peanut, Arachis duranensis and Arachis ipaënsis, using bioinformatics approaches. Most peanut WRKY coding genes were located on A. duranensis chromosome A6 and A. ipaënsis chromosome B3, while the least number of WRKY genes was found in chromosome 9. The WRKY orthologous gene pairs in A. duranensis and A. ipaënsis chromosomes were highly syntenic. Our analysis indicated that segmental duplication events played a major role in AdWRKY and AiWRKY genes, and strong purifying selection was observed in gene duplication pairs. Furthermore, we translate the knowledge gained from the genome-wide analysis result of wild ancestral peanut to cultivated peanut to reveal that gene activities of specific cultivated peanut WRKY gene were changed due to SA and JA treatment. Peanut WRKY7, 8 and 13 genes were down-regulated, whereas WRKY1 and 12 genes were up-regulated with SA and JA treatment. These results could provide valuable information for peanut improvement.

  19. Genome-Wide Identification and Characterization of WRKY Gene Family in Peanut.

    Science.gov (United States)

    Song, Hui; Wang, Pengfei; Lin, Jer-Young; Zhao, Chuanzhi; Bi, Yuping; Wang, Xingjun

    2016-01-01

    WRKY, an important transcription factor family, is widely distributed in the plant kingdom. Many reports focused on analysis of phylogenetic relationship and biological function of WRKY protein at the whole genome level in different plant species. However, little is known about WRKY proteins in the genome of Arachis species and their response to salicylic acid (SA) and jasmonic acid (JA) treatment. In this study, we identified 77 and 75 WRKY proteins from the two wild ancestral diploid genomes of cultivated tetraploid peanut, Arachis duranensis and Arachis ipaënsis, using bioinformatics approaches. Most peanut WRKY coding genes were located on A. duranensis chromosome A6 and A. ipaënsis chromosome B3, while the least number of WRKY genes was found in chromosome 9. The WRKY orthologous gene pairs in A. duranensis and A. ipaënsis chromosomes were highly syntenic. Our analysis indic