WorldWideScience

Sample records for genes multiple genomic

  1. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  2. Evolution of paralogous genes: Reconstruction of genome rearrangements through comparison of multiple genomes within Staphylococcus aureus.

    Science.gov (United States)

    Tsuru, Takeshi; Kawai, Mikihiko; Mizutani-Ui, Yoko; Uchiyama, Ikuo; Kobayashi, Ichizo

    2006-06-01

    Analysis of evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. With species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce mechanisms of genome rearrangements that have formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at the single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed importance of recombination involving long conserved sequence in the evolution of paralogous genes in the genome.

  3. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling.

    Science.gov (United States)

    Inoue, Jun; Sato, Yukuto; Sinclair, Robert; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-12-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post-teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70-80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis.

  4. Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data

    Directory of Open Access Journals (Sweden)

    Kennedy Breandan

    2010-01-01

    Background: The Affymetrix GeneChip is a widely used gene expression profiling platform. Since the chips were originally designed, the genome databases and gene definitions have been considerably updated. Thus, more accurate interpretation of microarray data requires parallel updating of the specificity of GeneChip probes. We propose a new probe remapping protocol, using the zebrafish GeneChips as an example, by removing nonspecific probes, and grouping the probes into transcript level probe sets using an integrated zebrafish genome annotation. This genome annotation is based on combining transcript information from multiple databases. This new remapping protocol, especially the new genome annotation, is shown here to be an important factor in improving the interpretation of gene expression microarray data. Results: Transcript data from the RefSeq, GenBank and Ensembl databases were downloaded from the UCSC genome browser, and integrated to generate a combined zebrafish genome annotation. Affymetrix probes were filtered and remapped according to the new annotation. The influence of transcript collection and gene definition methods was tested using two microarray data sets. Compared to remapping using a single database, this new remapping protocol results in up to 20% more probes being retained in the remapping, leading to approximately 1,000 more genes being detected. The differentially expressed gene lists are consequently increased by up to 30%. We are also able to detect up to three times more alternative splicing events. A small number of the bioinformatics predictions were confirmed using real-time PCR validation. Conclusions: By combining gene definitions from multiple databases, it is possible to greatly increase the numbers of genes and splice variants that can be detected in microarray gene expression experiments.
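
    The remapping idea described in this abstract (discard probes that are not specific to one gene, then group the surviving probes into transcript-level probe sets) can be sketched roughly as follows. This is only an illustration under stated assumptions: the input dictionaries, the function name `remap_probes` and the `min_probes` threshold are hypothetical and are not taken from the paper or from any Affymetrix tool.

```python
from collections import defaultdict


def remap_probes(alignments, transcript_to_gene, min_probes=3):
    """Group gene-specific probes into transcript-level probe sets.

    alignments: dict mapping probe id -> set of transcript ids it matches
    transcript_to_gene: dict mapping transcript id -> gene id
    min_probes: hypothetical minimum size for a probe set to be kept
    """
    probe_sets = defaultdict(list)
    for probe, transcripts in sorted(alignments.items()):
        genes = {transcript_to_gene[t] for t in transcripts}
        if len(genes) != 1:
            # Skip probes that match nothing or match more than one gene.
            continue
        for transcript in sorted(transcripts):
            probe_sets[transcript].append(probe)
    # Keep only probe sets with enough supporting probes.
    return {t: p for t, p in probe_sets.items() if len(p) >= min_probes}


if __name__ == "__main__":
    demo_alignments = {"p1": {"t1"}, "p2": {"t1", "t2"}, "p3": {"t1", "t3"}}
    demo_genes = {"t1": "g1", "t2": "g1", "t3": "g2"}
    print(remap_probes(demo_alignments, demo_genes, min_probes=2))
```

    In this toy input, `p3` is dropped because it matches transcripts of two different genes, while `p2` is kept because both of its transcripts belong to the same gene.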

  5. Identification of conserved gene clusters in multiple genomes based on synteny and homology

    Directory of Open Access Journals (Sweden)

    Nikolski Macha

    2011-10-01

    Background: Uncovering the relationship between the conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams. Results: Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemiascomycete phylum. Conclusions: The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. Resulting orthologous clusters are comparable to those obtained by manual curation.

  6. A method to find groups of orthologous genes across multiple genomes

    Directory of Open Access Journals (Sweden)

    ALMEIDA, N.F.

    2013-12-01

    In this work we propose a simple method to obtain groups of homologous genes across multiple (k) organisms, called kGC. Our method takes as input all-against-all Blastp comparisons and produces groups of homologous sequences. First, homologies among groups of paralogs of all the k compared genomes are found, followed by homologies of groups among k - 1 genomes and so on, until groups belonging exclusively to only one genome, that is, groups of one genome not presenting strong similarities with any group of any other genome, are identified. We have used our method to determine homologous groups across six Actinobacterial complete genomes. To validate kGC, we first investigate the Pfam classification of the homologous groups, and then compare our results with those produced by OrthoMCL. Although kGC is much simpler than OrthoMCL, it presented similar results with respect to Pfam classification.

  7. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets.

    Science.gov (United States)

    Khan, Aziz; Mathelier, Anthony

    2017-05-31

    A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited. To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets. Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene , with the web application available at https://asntech.shinyapps.io/intervene .
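
    The kinds of set computations that sit behind Venn diagrams, UpSet plots and pairwise heat maps can be illustrated with plain Python sets. The sketch below is not Intervene's code or command-line interface; the function names are hypothetical, and it handles only lists of identifiers (such as gene names), not genomic intervals, which would need overlap logic.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pairwise_intersections(named_sets):
    """Return {(a, b): shared item count} for every ordered pair of set names."""
    names = sorted(named_sets)
    counts = {(n, n): len(named_sets[n]) for n in names}
    for a, b in combinations(names, 2):
        shared = len(named_sets[a] & named_sets[b])
        counts[(a, b)] = counts[(b, a)] = shared
    return counts


def exclusive_regions(named_sets):
    """Count items by the exact combination of sets that contain them.

    Each key is a frozenset of set names, which corresponds to one region
    of a Venn diagram or one bar of an UpSet plot.
    """
    membership = defaultdict(set)
    for name, items in named_sets.items():
        for item in items:
            membership[item].add(name)
    return dict(Counter(frozenset(names) for names in membership.values()))


def jaccard(a, b):
    """Jaccard similarity of two sets, a common value for pairwise heat maps."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    demo = {"A": {"g1", "g2", "g3"}, "B": {"g2", "g3", "g4"}, "C": {"g3", "g5"}}
    print(pairwise_intersections(demo))
    print(exclusive_regions(demo))
```

    Keying the regions by a frozenset of names keeps the result independent of the order in which the input sets are supplied.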

  8. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  9. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Directory of Open Access Journals (Sweden)

    Param Priya Singh

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  10. Identification of genes for complex diseases using integrated analysis of multiple types of genomic data.

    Directory of Open Access Journals (Sweden)

    Hongbao Cao

    Various types of genomic data (e.g., SNPs and mRNA transcripts) have been employed to identify risk genes for complex diseases. However, the analysis of these data has largely been performed in isolation. Combining these multiple data for integrative analysis can take advantage of complementary information and thus can have higher power to identify genes (and/or their functions) that would otherwise be impossible with individual data analysis. Due to the different nature, structure, and format of diverse sets of genomic data, multiple genomic data integration is challenging. Here we address the problem by developing a sparse representation based clustering (SRC) method for integrative data analysis. As an example, we applied the SRC method to the integrative analysis of 376821 SNPs in 200 subjects (100 cases and 100 controls) and expression data for 22283 genes in 80 subjects (40 cases and 40 controls) to identify significant genes for osteoporosis (OP). Comparing our results with previous studies, we identified some genes known related to OP risk (e.g., 'THSD4', 'CRHR1', 'HSD11B1', 'THSD7A', 'BMPR1B' 'ADCY10', 'PRL', 'CA8','ESRRA', 'CALM1', 'CALM1', 'SPARC', and 'LRP1'). Moreover, we uncovered novel osteoporosis susceptible genes ('DICER1', 'PTMA', etc.) that were not found previously but play functionally important roles in osteoporosis etiology from existing studies. In addition, the SRC method identified genes can lead to higher accuracy for the diagnosis/classification of osteoporosis subjects when compared with the traditional T-test and Fisher-exact test, which further validates the proposed SRC approach for integrative analysis.

  11. Identification of genes for complex diseases using integrated analysis of multiple types of genomic data.

    Science.gov (United States)

    Cao, Hongbao; Lei, Shufeng; Deng, Hong-Wen; Wang, Yu-Ping

    2012-01-01

    Various types of genomic data (e.g., SNPs and mRNA transcripts) have been employed to identify risk genes for complex diseases. However, the analysis of these data has largely been performed in isolation. Combining these multiple data for integrative analysis can take advantage of complementary information and thus can have higher power to identify genes (and/or their functions) that would otherwise be impossible with individual data analysis. Due to the different nature, structure, and format of diverse sets of genomic data, multiple genomic data integration is challenging. Here we address the problem by developing a sparse representation based clustering (SRC) method for integrative data analysis. As an example, we applied the SRC method to the integrative analysis of 376821 SNPs in 200 subjects (100 cases and 100 controls) and expression data for 22283 genes in 80 subjects (40 cases and 40 controls) to identify significant genes for osteoporosis (OP). Comparing our results with previous studies, we identified some genes known related to OP risk (e.g., 'THSD4', 'CRHR1', 'HSD11B1', 'THSD7A', 'BMPR1B' 'ADCY10', 'PRL', 'CA8','ESRRA', 'CALM1', 'CALM1', 'SPARC', and 'LRP1'). Moreover, we uncovered novel osteoporosis susceptible genes ('DICER1', 'PTMA', etc.) that were not found previously but play functionally important roles in osteoporosis etiology from existing studies. In addition, the SRC method identified genes can lead to higher accuracy for the diagnosis/classification of osteoporosis subjects when compared with the traditional T-test and Fisher-exact test, which further validates the proposed SRC approach for integrative analysis.

  12. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates

    OpenAIRE

    Kikuta, Hiroshi; Laplante, Mary; Navrátilová, Pavla; Komisarczuk, Anna Zofia; Engström, Pär G.; Fredman, David; Akalin, Altuna; Caccamo, Mario; Sealy, Ian; Howe, Kerstin; Ghislain, Julien; Pezeron, Guillaume; Mourrain, Philippe; Ellingsen, Staale; Oates, Andrew C.

    2007-01-01

    We report evidence for a mechanism for the maintenance of long-range conserved synteny across vertebrate genomes. We found the largest mammal-teleost conserved chromosomal segments to be spanned by highly conserved noncoding elements (HCNEs), their developmental regulatory target genes, and phylogenetically and functionally unrelated “bystander” genes. Bystander genes are not specifically under the control of the regulatory elements that drive the target genes and are expressed in patterns th...

  13. The multiple facets of homology and their use in comparative genomics to study the evolution of genes, genomes, and species.

    Science.gov (United States)

    Descorps-Declère, Stéphane; Lemoine, Frédéric; Sculo, Quentin; Lespinet, Olivier; Labedan, Bernard

    2008-04-01

    The incredible development of comparative genomics during the last decade has required a correct use of the concept of homology that was previously utilized only by evolutionary biologists. Unhappily, this concept has been often misunderstood and thus misused when exploited outside its evolutionary context. This review brings back to the correct definition of homology and explains how this definition has been progressively refined in order to adapt it to the various new kinds of analysis of gene properties and of their products that appear with the progress of comparative genomics. Then, we illustrate the power and the proficiency of such a concept when using the available genomics data in order to study the evolution of individual genes, of entire genomes and of species, respectively. After explaining how we detect homologues by an exhaustive comparison of a hundred of complete proteomes, we describe three main lines of research we have developed in the recent years. The first one exploits synteny and gene context data to better understand the mechanisms of genome evolution in prokaryotes. The second one is based on phylogenomics approaches to reconstruct the tree of life. The last one is devoted to reminding that protein homology is often limited to structural segments (SOH=segment of homology or module). Detecting and numbering modules allows tracing back protein history by identifying the events of gene duplication and gene fusion. We insist that one of the main present difficulties in such studies is a lack of a reliable method to identify genuine orthologues. Finally, we show how these homology studies are helpful to annotate genes and genomes and to study the complexity of the relationships between sequence and function of a gene.

  14. The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

    Directory of Open Access Journals (Sweden)

    Adam Alexander Thil Smith

    2012-05-01

    Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.

  15. Gene relocations within chloroplast genomes of Jasminum and Menodora (Oleaceae) are due to multiple, overlapping inversions.

    Science.gov (United States)

    Lee, Hae-Lim; Jansen, Robert K; Chumley, Timothy W; Kim, Ki-Joong

    2007-05-01

    The chloroplast (cp) DNA sequence of Jasminum nudiflorum (Oleaceae-Jasmineae) is completed and compared with the large single-copy region sequences from 6 related species. The cp genomes of the tribe Jasmineae (Jasminum and Menodora) show several distinctive rearrangements, including inversions, gene duplications, insertions, inverted repeat expansions, and gene and intron losses. The ycf4-psaI region in Jasminum section Primulina was relocated as a result of 2 overlapping inversions of 21,169 and 18,414 bp. The 1st, larger inversion is shared by all members of the Jasmineae indicating that it occurred in the common ancestor of the tribe. Similar rearrangements were also identified in the cp genome of Menodora. In this case, 2 fragments including ycf4 and rps4-trnS-ycf3 genes were moved by 2 additional inversions of 14 and 59 kb that are unique to Menodora. Other rearrangements in the Oleaceae are confined to certain regions of the Jasminum and Menodora cp genomes, including the presence of highly repeated sequences and duplications of coding and noncoding sequences that are inserted into clpP and between rbcL and psaI. These insertions are correlated with the loss of 2 introns in clpP and a serial loss of segments of accD. The loss of the accD gene and clpP introns in both the monocot family Poaceae and the eudicot family Oleaceae are clearly independent evolutionary events. However, their genome organization is surprisingly similar despite the distant relationship of these 2 angiosperm families.

  16. Improving pan-genome annotation using whole genome multiple alignment

    Directory of Open Access Journals (Sweden)

    Salzberg Steven L

    2011-06-01

    Background: Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. Results: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. Conclusions: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.

  17. The effect of multiple evolutionary selections on synonymous codon usage of genes in the Mycoplasma bovis genome.

    Directory of Open Access Journals (Sweden)

    Jian-hua Zhou

    Mycoplasma bovis is a major pathogen causing arthritis, respiratory disease and mastitis in cattle. A better understanding of its genetic features and evolution might represent evidence of surviving host environments. In this study, multiple factors influencing synonymous codon usage patterns in M. bovis (three strains') genomes were analyzed. The overall nucleotide content of genes in the M. bovis genome is AT-rich. Although the G and C contents at the third codon position of genes in the leading strand differ from those in the lagging strand (p<0.05), the 59 synonymous codon usage patterns of genes in the leading strand are highly similar to those in the lagging strand. The over-represented codons and the under-represented codons were identified. A comparison of the synonymous codon usage pattern of M. bovis and cattle (susceptible host) indicated the independent formation of synonymous codon usage of M. bovis. Principal component analysis revealed that (i) strand-specific mutational bias fails to affect the synonymous codon usage pattern in the leading and lagging strands, (ii) mutation pressure from nucleotide content plays a role in shaping the overall codon usage, and (iii) the major trend of synonymous codon usage has a significant correlation with the gene expression level that is estimated by the codon adaptation index. The plot of the effective number of codons against the G+C content at the third codon position also reveals that mutation pressure undoubtedly contributes to the synonymous codon usage pattern of M. bovis. Additionally, the formation of the overall codon usage is determined by certain evolutionary selections for gene function classification (30S protein, 50S protein, transposase, membrane protein, and lipoprotein) and translation elongation region of genes in M. bovis. The information could be helpful in further investigations of evolutionary mechanisms of the Mycoplasma family and heterologous expression of its functionally
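
    Two of the quantities mentioned in this abstract, the G+C content at the third codon position (GC3) and the usage of the 59 synonymous codons, can be computed with a short script. The sketch below uses relative synonymous codon usage (RSCU), a standard measure that the abstract does not name explicitly, and it assumes the standard genetic code for simplicity. Mycoplasma species read TGA as tryptophan, so a real analysis would need the appropriate translation table. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order (an assumption; see the note above).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(cds):
    """Split a coding sequence into complete codons, ignoring a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc3(cds):
    """Fraction of codons whose third base is G or C."""
    codons = split_codons(cds)
    if not codons:
        return 0.0
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def rscu(cds):
    """Relative synonymous codon usage for amino acids seen in the sequence.

    RSCU = observed codon count / (amino-acid total / number of synonyms).
    Single-codon amino acids (Met, Trp) and stop codons are excluded.
    """
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    example = "AAAAAAAAGGCTGCC"  # codons: AAA AAA AAG GCT GCC
    print(gc3(example))
    print(rscu(example))
```

    An RSCU value above 1 marks a codon used more often than expected under equal usage of its synonyms, and a value below 1 marks one used less often.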

  18. Genomic organisation of the channel catfish Mx1 gene and characterisation of multiple channel catfish Mx gene promoters.

    Science.gov (United States)

    Plant, Karen P; Thune, Ronald L

    2008-05-01

    In order to further characterise channel catfish (Ictalurus punctatus) Mx1, studies were initiated to amplify and clone the Mx1 promoter into a reporter vector, pGL3basic. Initially the Mx1 gene was amplified from genomic DNA and was found to have 12 exons and 11 introns, spanning a region over 6 kilobases (kb) in length. The Mx1 promoter was amplified using genome walking and during this process four additional Mx promoters were identified, suggesting the presence of five Mx genes in the channel catfish. All five promoters possess an interferon stimulated response element (ISRE) and the Mx1 promoter possessed two potential NF-κB transcription sites. Following cloning each construct was transiently transfected into COS-7 and EPC cells for 24 h and treated with 5 µg/ml poly I:C for 24 h. An increase in expression of the reporter gene in response to poly I:C was noted in both cell lines in the pGL3Mx1 construct only. However, the reporter gene was also constitutively expressed in these cells. Constitutive expression was also observed in channel catfish ovary cells transiently transfected with pGL3Mx1 only. Treatment with 5 µg/ml poly I:C did not increase this expression, which may be due to high levels of cell death in this difficult to transfect cell line. The constitutive expression observed implies that a repressor element is missing in the 390 base pair sequence of the Mx1 promoter used in this study. These results suggest that only channel catfish Mx1 is involved in the type I interferon pathway and that the presence of an ISRE in a regulatory region is not necessarily indicative of a role in the type I interferon response.

  19. Multiple pathways for steel regulation suggested by genomic and sequence analysis of the murine Steel gene

    Energy Technology Data Exchange (ETDEWEB)

    Bedell, M.A.; Copeland, N.G.; Jenkins, N.A. [NCI-Frederick Cancer Research and Development Center, Frederick, MD (United States)

    1996-03-01

    The Steel (Sl) locus encodes mast cell growth factor (Mgf) that is required for the development of germ cells, hematopoietic cells and melanocytes. Although the expression patterns of the Mgf gene are well characterized, little is known of the factors which regulate its expression. Here, we describe the cloning and sequence of the full-length transcription unit and the 5′ flanking region of the murine Mgf gene. The full-length Mgf mRNA consists of a short 5′ untranslated region (UTR), a 0.8-kb ORF and a long 3′ UTR. A single transcription initiation site is used in a number of mouse tissues and is located just downstream of binding sites for several known transcription factors. In the 5′ UTR, two ATGs were found upstream of the initiator methionine and are conserved among different species, suggesting that Mgf may be translationally regulated. At least two Mgf mRNAs are produced by alternative use of polyadenylation sites, but numerous other potential polyadenylation sites were found in the 3′ UTR. In addition, the 3′ UTR contains numerous sequence motifs that may regulate Mgf mRNA stability. These studies suggest multiple ways in which expression of Mgf may be regulated. 39 refs., 4 figs.

  20. Scanning for genes in large genomic regions: cosmid-based exon trapping of multiple exons in a single product.

    Science.gov (United States)

    Datson, N A; van de Vosse, E; Dauwerse, H G; Bout, M; van Ommen, G J; den Dunnen, J T

    1996-03-15

    To facilitate the scanning of large genomic regions for the presence of exonic gene segments we have constructed a cosmid-based exon trap vector. The vector serves a dual purpose since it is also suitable for contig construction and physical mapping. The exon trap cassette of vector sCOGH1 consists of the human growth hormone gene driven by the mouse metallothionein-1 promoter. Inserts are cloned in the multicloning site located in intron 2 of the hGH gene. The efficiency of the system is demonstrated with cosmids containing multiple exons of the Duchenne Muscular Dystrophy gene. All exons present in the inserts were successfully retrieved and no cryptic products were detected. Up to seven exons were isolated simultaneously in a single spliced product. The system has been extended by a transcription-translation-test protocol to determine the presence of large open reading frames in the trapped products, using a combination of tailed PCR primers directing protein synthesis in three different reading frames, followed by in vitro transcription-translation. Having larger stretches of coding sequence in a single exon trap product rather than small single exons greatly facilitates further analysis of potential genes and offers new possibilities for direct mutation analysis of exon trap material.
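
    The idea of testing a trapped product in three reading frames can be illustrated computationally. The sketch below is only an in silico analogue, not the in vitro transcription-translation protocol the abstract describes: it scans the three forward frames of a sequence and reports the longest stretch of codons not interrupted by a stop codon. The function names and the example sequence are hypothetical.

```python
# Illustrative only: an in silico analogue of checking a trapped product
# for a large open reading frame in each of the three forward frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_stretch(seq, frame):
    """Return the length (in codons) of the longest stop-free run in one frame."""
    seq = seq.upper()
    best = current = 0
    # Step through complete codons only, starting at the frame offset.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0
        else:
            current += 1
            best = max(best, current)
    return best


def scan_three_frames(seq):
    """Map each forward frame (0, 1, 2) to its longest open stretch in codons."""
    return {frame: longest_open_stretch(seq, frame) for frame in range(3)}


if __name__ == "__main__":
    # Hypothetical 12-nt sequence; frame 0 contains a TAG stop codon.
    print(scan_three_frames("ATGAAATAGCCC"))
```

    A frame whose longest open stretch spans most of the product would be the candidate coding frame; a real analysis would also consider the reverse strand and the known splice junctions.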

  1. Multiple source genes of HAmo SINE actively expanded and ongoing retroposition in cyprinid genomes relying on its partner LINE

    Directory of Open Access Journals (Sweden)

    Gan Xiaoni

    2010-04-01

    Full Text Available Abstract Background We recently characterized HAmo SINE and its partner LINE in silver carp and bighead carp based on hybridization capture of repetitive elements from digested genomic DNA in solution using a bead-probe [1]. To reveal the distribution and evolutionary history of SINEs and LINEs in cyprinid genomes, we performed a multi-species search for HAmo SINE and its partner LINE using the bead-probe capture and internal-primer-SINE polymerase chain reaction (PCR) techniques. Results Sixty-seven full-size and 125 internal-SINE sequences (as well as 34 full-size and 9 internal sequences previously reported in bighead carp and silver carp) from 17 species of the family Cyprinidae were aligned, as well as 14 newly isolated HAmoL2 sequences. Four subfamilies (type I, II, III and IV), which were divided based on diagnostic nucleotides in the tRNA-unrelated region, expanded preferentially within a certain lineage or within the whole family of Cyprinidae as multiple active source genes. The copy numbers of HAmo SINEs were estimated to vary from 10^4 to 10^6 in cyprinid genomes by quantitative RT-PCR. Over one hundred type IV members were identified and characterized in the primitive cyprinid Danio rerio genome but only tens of sequences were found to be similar to type I, II and III, since type IV was the oldest subfamily and its members are dispersed in almost all investigated cyprinid fishes. To determine the taxonomic distribution of HAmo SINE, inter-primer SINE PCR was conducted in other non-cyprinid fishes; the results show that HAmo SINE-related sequences may be dispersed in other families of the order Cypriniformes but are absent in other orders of bony fishes: Siluriformes, Polypteriformes, Lepidosteiformes, Acipenseriformes and Osteoglossiformes. Conclusions Depending on HAmo LINE2, multiple source genes (subfamilies) of HAmo SINE actively expanded and underwent retroposition in a certain lineage or within the whole family of Cyprinidae. From this

  2. Evolutionary changes of multiple visual pigment genes in the complete genome of Pacific bluefin tuna

    OpenAIRE

    Nakamura, Yoji; Mori, Kazuki; Saitoh, Kenji; Oshima, Kenshiro; Mekuchi, Miyuki; Sugaya, Takuma; Shigenobu, Yuya; Ojima, Nobuhiko; Muta, Shigeru; Fujiwara, Atushi; Yasuike, Motoshige; Oohara, Ichiro; Hirakawa, Hideki; Chowdhury, Vishwajit Sur; Kobayashi, Takanori

    2013-01-01

    Tunas are migratory fishes in offshore habitats and top predators with unique features. Despite their ecological importance and high market values, the open-ocean lifestyle of tuna, in which effective sensing systems such as color vision are required for capture of prey, has been poorly understood. To elucidate the genetic and evolutionary basis of optic adaptation of tuna, we determined the genome sequence of the Pacific bluefin tuna (Thunnus orientalis), using next-generation sequencing tec...

  3. Multiple type 2 diabetes susceptibility genes following genome-wide association scan in UK samples

    OpenAIRE

    Zeggini, Eleftheria; Michael N. Weedon; Lindgren, Cecilia M.; Frayling, Timothy M; Elliott, Katherine S.; Lango, Hana; Nicholas J Timpson; Perry, John R B; Nigel W Rayner; Freathy, Rachel M; Barrett, Jeffrey C.; Shields, Beverley; Andrew P Morris; Ellard, Sian; Groves, Christopher J

    2007-01-01

    The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibilit...

  4. Genome-wide analysis of the sox family in the calcareous sponge Sycon ciliatum: multiple genes with unique expression patterns

    Directory of Open Access Journals (Sweden)

    Fortunato Sofia

    2012-07-01

    Full Text Available Abstract Background Sox genes are HMG-domain containing transcription factors with important roles in developmental processes in animals; many of them appear to have conserved functions among eumetazoans. Demosponges have fewer Sox genes than eumetazoans, but their roles remain unclear. The aim of this study is to gain insight into the early evolutionary history of the Sox gene family by identification and expression analysis of Sox genes in the calcareous sponge Sycon ciliatum. Methods Calcaronean Sox related sequences were retrieved by searching recently generated genomic and transcriptome sequence resources and analyzed using a variety of phylogenetic methods and identification of conserved motifs. Expression was studied by whole mount in situ hybridization. Results We have identified seven Sox genes and four Sox-related genes in the complete genome of Sycon ciliatum. Phylogenetic and conserved motif analyses showed that five of the Sycon Sox genes represent groups B, C, E, and F present in cnidarians and bilaterians. Two additional genes are classified as Sox genes but cannot be assigned to specific subfamilies, and four genes are more similar to Sox genes than to other HMG-containing genes. Thus, the repertoire of Sox genes is larger in this representative of calcareous sponges than in the demosponge Amphimedon queenslandica. It remains unclear whether this is due to the expansion of the gene family in Sycon or a secondary reduction in the Amphimedon genome. In situ hybridization of Sycon Sox genes revealed a variety of expression patterns during embryogenesis and in specific cell types of adult sponges. Conclusions In this study, we describe a large family of Sox genes in Sycon ciliatum with dynamic expression patterns, indicating that Sox genes are regulators in development and cell type determination in sponges, as observed in higher animals. The revealed differences between the demosponge and calcisponge Sox gene repertoires highlight the need to

  5. Multiple horizontal gene transfer events and domain fusions have created novel regulatory and metabolic networks in the oomycete genome.

    Directory of Open Access Journals (Sweden)

    Paul Francis Morris

    Full Text Available Complex enzymes with multiple catalytic activities are hypothesized to have evolved from more primitive precursors. Global analysis of the Phytophthora sojae genome using conservative criteria for evaluation of complex proteins identified 273 novel multifunctional proteins that were also conserved in P. ramorum. Each of these proteins contains combinations of protein motifs that are not present in bacterial, plant, animal, or fungal genomes. A subset of these proteins were also identified in the two diatom genomes, but the majority of these proteins have formed after the split between diatoms and oomycetes. Documentation of multiple cases of domain fusions that are common to both oomycetes and diatom genomes lends additional support for the hypothesis that oomycetes and diatoms are monophyletic. Bifunctional proteins that catalyze two steps in a metabolic pathway can be used to infer the interaction of orthologous proteins that exist as separate entities in other genomes. We postulated that the novel multifunctional proteins of oomycetes could function as potential Rosetta Stones to identify interacting proteins of conserved metabolic and regulatory networks in other eukaryotic genomes. However ortholog analysis of each domain within our set of 273 multifunctional proteins against 39 sequenced bacterial and eukaryotic genomes, identified only 18 candidate Rosetta Stone proteins. Thus the majority of multifunctional proteins are not Rosetta Stones, but they may nonetheless be useful in identifying novel metabolic and regulatory networks in oomycetes. Phylogenetic analysis of all the enzymes in three pathways with one or more novel multifunctional proteins was conducted to determine the probable origins of individual enzymes. These analyses revealed multiple examples of horizontal transfer from both bacterial genomes and the photosynthetic endosymbiont in the ancestral genome of Stramenopiles. The complexity of the phylogenetic origins of these

  6. Prosecutor : parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    NARCIS (Netherlands)

    Blom, E.J.; Breitling, R.; Hofstede, K.J.; Roerdink, J.B.T.M.; van Hijum, S.A F T; Kuipers, O.P.

    2008-01-01

    Background: Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar

  7. Multiple Genome Comparison within a Bacterial Species Reveals a Unit of Evolution Spanning Two Adjacent Genes in a Tandem Paralog Cluster

    Science.gov (United States)

    Tsuru, Takeshi

    2008-01-01

    It has been assumed that an open reading frame (ORF) represents a unit of gene evolution as well as a unit of gene expression and function. In the present work, we report a case in which a unit comprising the 3′ region of an ORF linked to a downstream intergenic region that is in turn linked to the 5′ region of a downstream ORF has been conserved, and has served as the unit of gene evolution. The genes are tandem paralogous genes from the bacterium Staphylococcus aureus, for which more than ten entire genomes have been sequenced. We compared these multiple genome sequences at a locus for the lpl (lipoprotein-like) cluster (encoding lipoprotein homologs presumably related to their host interaction) in the genomic island termed νSaα. A highly conserved nucleotide sequence found within every lpl ORF is likely to provide a site for homologous recombination. Comparison of phylogenies of the 5′-variable region and the 3′-variable region within the same ORF revealed significant incongruence. In contrast, pairs of the 3′-variable region of an ORF and the 5′-variable region of the next downstream ORF gave more congruent phylogenies, with distinct groups of conserved pairs. The intergenic region seemed to have coevolved with the flanking variable regions. Multiple recombination events at the central conserved region appear to have caused various types of rearrangements among strains, shuffling the two variable regions in one ORF, but maintaining a conserved unit comprising the 3′-variable region, the intergenic region, and the 5′-variable region spanning adjacent ORFs. This result has strong impact on our understanding of gene evolution because most gene lineages underwent tandem duplication and then diversified. This work also illustrates the use of multiple genome sequences for high-resolution evolutionary analysis within the same species. PMID:18765438

  8. Scanning for genes in large genomic regions: cosmid-based exon trapping of multiple exons in a single product.

    OpenAIRE

    Datson, N.A.; Vosse, E van de; Dauwerse, H.G.; Bout, M; van Ommen, G J; J T den Dunnen

    1996-01-01

    To facilitate the scanning of large genomic regions for the presence of exonic gene segments we have constructed a cosmid-based exon trap vector. The vector serves a dual purpose since it is also suitable for contig construction and physical mapping. The exon trap cassette of vector sCOGH1 consists of the human growth hormone gene driven by the mouse metallothionein-1 promoter. Inserts are cloned in the multicloning site located in intron 2 of the hGH gene. The efficiency of the system is de...

  9. Positive selection and multiple losses of the LINE-1-derived L1TD1 gene in mammals suggest a dual role in genome defense and pluripotency.

    Science.gov (United States)

    McLaughlin, Richard N; Young, Janet M; Yang, Lei; Neme, Rafik; Wichman, Holly A; Malik, Harmit S

    2014-09-01

    Mammalian genomes comprise many active and fossilized retroelements. The obligate requirement for retroelement integration affords host genomes an opportunity to 'domesticate' retroelement genes for their own purpose, leading to important innovations in genome defense and placentation. While many such exaptations involve retroviruses, the L1TD1 gene is the only known domesticated gene whose protein-coding sequence is almost entirely derived from a LINE-1 (L1) retroelement. Human L1TD1 has been shown to play an important role in pluripotency maintenance. To investigate how this role was acquired, we traced the origin and evolution of L1TD1. We find that L1TD1 originated in the common ancestor of eutherian mammals, but was lost or pseudogenized multiple times during mammalian evolution. We also find that L1TD1 has evolved under positive selection during primate and mouse evolution, and that one prosimian L1TD1 has 'replenished' itself with a more recent L1 ORF1 from the prosimian genome. These data suggest that L1TD1 has been recurrently selected for functional novelty, perhaps for a role in genome defense. L1TD1 loss is associated with L1 extinction in several megabat lineages, but not in sigmodontine rodents. We hypothesize that L1TD1 could have originally evolved for genome defense against L1 elements. Later, L1TD1 may have become incorporated into pluripotency maintenance in some lineages. Our study highlights the role of retroelement gene domestication in fundamental aspects of mammalian biology, and that such domesticated genes can adopt different functions in different lineages.

  10. Positive selection and multiple losses of the LINE-1-derived L1TD1 gene in mammals suggest a dual role in genome defense and pluripotency.

    Directory of Open Access Journals (Sweden)

    Richard N McLaughlin

    2014-09-01

    Full Text Available Mammalian genomes comprise many active and fossilized retroelements. The obligate requirement for retroelement integration affords host genomes an opportunity to 'domesticate' retroelement genes for their own purpose, leading to important innovations in genome defense and placentation. While many such exaptations involve retroviruses, the L1TD1 gene is the only known domesticated gene whose protein-coding sequence is almost entirely derived from a LINE-1 (L1) retroelement. Human L1TD1 has been shown to play an important role in pluripotency maintenance. To investigate how this role was acquired, we traced the origin and evolution of L1TD1. We find that L1TD1 originated in the common ancestor of eutherian mammals, but was lost or pseudogenized multiple times during mammalian evolution. We also find that L1TD1 has evolved under positive selection during primate and mouse evolution, and that one prosimian L1TD1 has 'replenished' itself with a more recent L1 ORF1 from the prosimian genome. These data suggest that L1TD1 has been recurrently selected for functional novelty, perhaps for a role in genome defense. L1TD1 loss is associated with L1 extinction in several megabat lineages, but not in sigmodontine rodents. We hypothesize that L1TD1 could have originally evolved for genome defense against L1 elements. Later, L1TD1 may have become incorporated into pluripotency maintenance in some lineages. Our study highlights the role of retroelement gene domestication in fundamental aspects of mammalian biology, and that such domesticated genes can adopt different functions in different lineages.

  11. Reconciling gene and genome duplication events: using multiple nuclear gene families to infer the phylogeny of the aquatic plant family Pontederiaceae.

    Science.gov (United States)

    Ness, Rob W; Graham, Sean W; Barrett, Spencer C H

    2011-11-01

    Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations place the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.
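
    The duplication-counting step at the heart of gene tree parsimony can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the software used in the study: trees are nested 2-tuples whose leaves are species names (one leaf per gene copy), each gene-tree node is mapped to the smallest species-tree clade containing its species (the standard LCA mapping), and a node is counted as a duplication when it maps to the same clade as one of its children. All function names are hypothetical, and the search over species trees is reduced to choosing among a supplied list of candidates.

```python
# Illustrative sketch of duplication counting for gene tree parsimony.
# Trees are nested 2-tuples; leaves are species names (one per gene copy).


def species_clades(tree):
    """Return the leaf set of every node in a species tree, root last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    lc, rc = species_clades(left), species_clades(right)
    return lc + rc + [lc[-1] | rc[-1]]


def lca(clade_list, species):
    """Smallest species-tree clade that contains every given species."""
    return min((c for c in clade_list if species <= c), key=len)


def _reconcile(gene_tree, clade_list):
    """Return (LCA mapping of this node, duplications in its subtree)."""
    if isinstance(gene_tree, str):
        return lca(clade_list, frozenset([gene_tree])), 0
    left, right = gene_tree
    lm, ld = _reconcile(left, clade_list)
    rm, rd = _reconcile(right, clade_list)
    mapping = lca(clade_list, lm | rm)
    # A node is a duplication if it maps to the same clade as a child.
    is_duplication = mapping in (lm, rm)
    return mapping, ld + rd + int(is_duplication)


def duplication_cost(gene_tree, species_tree):
    """Number of duplications needed to reconcile one gene tree."""
    return _reconcile(gene_tree, species_clades(species_tree))[1]


def best_species_tree(candidates, gene_trees):
    """Pick the candidate species tree with the lowest total duplication cost."""
    return min(candidates,
               key=lambda st: sum(duplication_cost(g, st) for g in gene_trees))


if __name__ == "__main__":
    st1, st2 = (("A", "B"), "C"), (("A", "C"), "B")
    genes = [(("A", "B"), "C"), (("A", "B"), ("C", "C"))]
    print(best_species_tree([st1, st2], genes))
```

    Scoring a fixed list of candidates keeps the example short; practical GTP implementations search tree space heuristically and may also weigh gene losses.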

  12. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    Directory of Open Access Journals (Sweden)

    van Hijum Sacha AFT

    2008-10-01

    Full Text Available Abstract Background Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion The Prosecutor software and supplementary datasets available at http://www.prosecutor.nl allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.
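
    The guilt-by-association principle mentioned above can be illustrated with a toy example. This is a minimal sketch, assuming Pearson correlation as the similarity measure and a simple nearest-neighbour rule; it is not Prosecutor's algorithm, and the gene names, expression profiles and annotations are invented for illustration.

```python
# Toy guilt-by-association sketch: transfer the annotation of the
# annotated gene whose expression profile correlates best with the query.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def predict_function(query_profile, annotated):
    """Return (function, gene, r) for the best-correlated annotated gene.

    `annotated` maps a gene name to a (profile, function) pair.
    """
    gene, (profile, function) = max(
        annotated.items(),
        key=lambda item: pearson(query_profile, item[1][0]),
    )
    return function, gene, pearson(query_profile, profile)


if __name__ == "__main__":
    # Hypothetical expression profiles across four conditions.
    annotated = {
        "geneA": ([1.0, 2.0, 3.0, 4.0], "sugar transport"),
        "geneB": ([4.0, 3.0, 2.0, 1.0], "sporulation"),
    }
    print(predict_function([2.0, 4.0, 6.0, 8.5], annotated))
```

    In practice a prediction would be supported by many co-expressed genes and a significance estimate rather than a single best match.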

  13. Multiple Changes of Gene Expression and Function Reveal Genomic and Phenotypic Complexity in SLE-like Disease.

    Directory of Open Access Journals (Sweden)

    Maria Wilbe

    2015-06-01

    Full Text Available The complexity of clinical manifestations commonly observed in autoimmune disorders poses a major challenge to genetic studies of such diseases. Systemic lupus erythematosus (SLE) affects humans as well as other mammals, and is characterized by the presence of antinuclear antibodies (ANA) in patients' sera and multiple disparate clinical features. Here we present evidence that particular sub-phenotypes of canine SLE-related disease, based on homogeneous (ANA(H)) and speckled (ANA(S)) ANA staining patterns, and also steroid-responsive meningitis-arteritis (SRMA), are associated with different but overlapping sets of genes. In addition to association to certain MHC alleles and haplotypes, we identified 11 genes (WFDC3, HOMER2, VRK1, PTPN3, WHAMM, BANK1, AP3B2, DAPP1, LAMTOR3, DDIT4L and PPP3CA) located on five chromosomes that contain multiple risk haplotypes correlated with gene expression and disease sub-phenotypes in an intricate manner. Intriguingly, the association of BANK1 with both human and canine SLE appears to lead to similar changes in gene expression levels in both species. Our results suggest that molecular definition may help unravel the mechanisms of different clinical features common between and specific to various autoimmune disease phenotypes in dogs and humans.

  14. Genome-wide identification of transcriptional targets of RORA reveals direct regulation of multiple genes associated with autism spectrum disorder.

    Science.gov (United States)

    Sarachana, Tewarit; Hu, Valerie W

    2013-05-22

    We have recently identified the nuclear hormone receptor RORA (retinoic acid-related orphan receptor-alpha) as a novel candidate gene for autism spectrum disorder (ASD). Our independent cohort studies have consistently demonstrated the reduction of RORA transcript and/or protein levels in blood-derived lymphoblasts as well as in the postmortem prefrontal cortex and cerebellum of individuals with ASD. Moreover, we have also shown that RORA has the potential to be under negative and positive regulation by androgen and estrogen, respectively, suggesting the possibility that RORA may contribute to the male bias of ASD. However, little is known about transcriptional targets of this nuclear receptor, particularly in humans. Here we identify transcriptional targets of RORA in human neuronal cells on a genome-wide level using chromatin immunoprecipitation (ChIP) with an anti-RORA antibody followed by whole-genome promoter array (chip) analysis. Selected potential targets of RORA were then validated by an independent ChIP followed by quantitative PCR analysis. To further demonstrate that reduced RORA expression results in reduced transcription of RORA targets, we determined the expression levels of the selected transcriptional targets in RORA-deficient human neuronal cells, as well as in postmortem brain tissues from individuals with ASD who exhibit reduced RORA expression. The ChIP-on-chip analysis reveals that RORA1, a major isoform of RORA protein in human brain, can be recruited to as many as 2,764 genomic locations corresponding to promoter regions of 2,544 genes across the human genome. Gene ontology analysis of this dataset of genes that are potentially directly regulated by RORA1 reveals statistically significant enrichment in biological functions negatively impacted in individuals with ASD, including neuronal differentiation, adhesion and survival, synaptogenesis, synaptic transmission and plasticity, and axonogenesis, as well as higher level functions such as

  15. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata.

    Science.gov (United States)

    Inglis, Diane O; Arnaud, Martha B; Binkley, Jonathan; Shah, Prachi; Skrzypek, Marek S; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2012-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.

  16. Multiple-integrations of HPV16 genome and altered transcription of viral oncogenes and cellular genes are associated with the development of cervical cancer.

    Directory of Open Access Journals (Sweden)

    Xulian Lu

    Full Text Available The constitutive expression of the high-risk HPV E6 and E7 viral oncogenes is the major cause of cervical cancer. To comprehensively explore the composition of HPV16 early transcripts and their genomic annotation, cervical squamous epithelial tissues from 40 HPV16-infected patients were collected for analysis of papillomavirus oncogene transcripts (APOT). We observed different transcription patterns of HPV16 oncogenes in progression of cervical lesions to cervical cancer and identified one novel transcript. Multiple-integration events in the tissues of cervical carcinoma (CxCa) are significantly more frequent than in those of low-grade squamous intraepithelial lesions (LSIL) and high-grade squamous intraepithelial lesions (HSIL). Moreover, most cellular genes within or near these integration sites are cancer-associated genes. Taken together, this study suggests that multiple integrations of the HPV genome during persistent viral infection, which thereby alter the expression patterns of viral oncogenes and integration-related cellular genes, play a crucial role in the progression of cervical lesions to cervical cancer.

  17. Integrating genome-wide association study and expression quantitative trait loci data identifies multiple genes and gene set associated with neuroticism.

    Science.gov (United States)

    Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng

    2017-08-01

    Neuroticism is a fundamental personality trait with a significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from a genome-wide association study (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism, involving a total of 170,906 subjects. The eQTL dataset, containing 927,753 eQTLs, was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted with summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism-associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single gene analysis identified 6 significant genes for neuroticism, including MSRA (p value=2.27×10^-10), MGC57346 (p value=6.92×10^-7), BLK (p value=1.01×10^-6), XKR6 (p value=1.11×10^-6), C17ORF69 (p value=1.12×10^-6) and KIAA1267 (p value=4.00×10^-6). Gene set enrichment analysis observed significant association for the Chr8p23 gene set (false discovery rate=0.033). Our results provide novel clues for the genetic mechanism studies of neuroticism. Copyright © 2017. Published by Elsevier Inc.

  18. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  19. Gene finding in novel genomes

    Directory of Open Access Journals (Sweden)

    Korf Ian

    2004-05-01

    Full Text Available Abstract Background Computational gene prediction continues to be an important problem, especially for genomes with little experimental data. Results I introduce the SNAP gene finder which has been designed to be easily adaptable to a variety of genomes. In novel genomes without an appropriate gene finder, I demonstrate that employing a foreign gene finder can produce highly inaccurate results, and that the most compatible parameters may not come from the nearest phylogenetic neighbor. I find that foreign gene finders are more usefully employed to bootstrap parameter estimation and that the resulting parameters can be highly accurate. Conclusion Since gene prediction is sensitive to species-specific parameters, every genome needs a dedicated gene finder.

  20. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Wasnick Michael

    2008-03-01

    Background: The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results: PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion: PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any

  1. Brief Guide to Genomics: DNA, Genes and Genomes

    Science.gov (United States)

    ... Breve guía de genómica A Brief Guide to Genomics DNA, Genes and Genomes Deoxyribonucleic acid (DNA) is ... genetic basis for health and disease. Implications of Genomics for Medical Science Virtually every human ailment has ...

  2. Gene discovery in the Entamoeba invadens genome.

    Science.gov (United States)

    Wang, Zheng; Samuelson, John; Clark, C Graham; Eichinger, Daniel; Paul, Jaishree; Van Dellen, Katrina; Hall, Neil; Anderson, Iain; Loftus, Brendan

    2003-06-01

    Entamoeba invadens, a parasite of reptiles, is a model for the study of encystation by the human enteric pathogen Entamoeba histolytica, because E. invadens forms cysts in axenic culture. With approximately 0.5-fold sequence coverage of the genome, we were able to gain insights into E. invadens gene and genome features. Overall, the E. invadens genome displays many of the features that are emerging from ongoing genome sequencing efforts in E. histolytica. At the nucleotide level the E. invadens genome has on average 60% sequence identity with that of E. histolytica. The presence of introns in E. invadens was predicted with consensus sequences (GTTTGT...A/TAG) similar to those identified in E. histolytica and Entamoeba dispar. Sequences highly repeated in the genome of E. histolytica (rRNAs, tRNAs, CXXC-rich proteins, and Leu-rich repeat proteins) were found to be highly repeated in the E. invadens genome. Numerous proteins homologous to those implicated in amoebic virulence (Gal/GalNAc lectins, amoebapores, and cysteine proteinases) and drug resistance (p-glycoproteins) were identified. Homologs of proteins involved in cell cycle, vesicular trafficking and signal transduction were identified, which may be involved in en/excystation and cell growth of E. invadens. Finally, multiple copies of a number of E. invadens genes coding for predicted enzymes involved in core metabolism and the targets of anti-amoebic drugs were identified.

  3. Diversification of genes encoding granule-bound starch synthase in monocots and dicots is marked by multiple genome-wide duplication events.

    Directory of Open Access Journals (Sweden)

    Jun Cheng

    Starch is one of the major components of cereals, tubers, and fruits. Genes encoding granule-bound starch synthase (GBSS), which is responsible for amylose synthesis, have been extensively studied in cereals but little is known about them in fruits. Due to their low copy gene number, GBSS genes have been used to study plant phylogenetic and evolutionary relationships. In this study, GBSS genes have been isolated and characterized in three fruit trees, including apple, peach, and orange. Moreover, a comprehensive evolutionary study of GBSS genes has also been conducted between both monocots and eudicots. Results have revealed that genomic structures of GBSS genes in plants are conserved, suggesting they all have evolved from a common ancestor. In addition, the GBSS gene in an ancestral angiosperm must have undergone genome duplication ∼251 million years ago (MYA) to generate two families, GBSSI and GBSSII. Both GBSSI and GBSSII are found in monocots; however, GBSSI is absent in eudicots. The ancestral GBSSII must have undergone further divergence when monocots and eudicots split ∼165 MYA. This is consistent with expression profiles of GBSS genes, wherein these profiles are more similar to those of GBSSII in eudicots than to those of GBSSI genes in monocots. In dicots, GBSSII must have undergone further divergence when rosids and asterids split from each other ∼126 MYA. Taken together, these findings suggest that it is GBSSII rather than GBSSI of monocots that has orthologous relationships with GBSS genes of eudicots. Moreover, diversification of GBSS genes is mainly associated with genome-wide duplication events throughout the evolutionary history of monocots and eudicots.

  4. Multiple models for Rosaceae genomics.

    Science.gov (United States)

    Shulaev, Vladimir; Korban, Schuyler S; Sosinski, Bryon; Abbott, Albert G; Aldwinckle, Herb S; Folta, Kevin M; Iezzoni, Amy; Main, Dorrie; Arús, Pere; Dandekar, Abhaya M; Lewers, Kim; Brown, Susan K; Davis, Thomas M; Gardiner, Susan E; Potter, Daniel; Veilleux, Richard E

    2008-07-01

    The plant family Rosaceae consists of over 100 genera and 3,000 species that include many important fruit, nut, ornamental, and wood crops. Members of this family provide high-value nutritional foods and contribute desirable aesthetic and industrial products. Most rosaceous crops have been enhanced by human intervention through sexual hybridization, asexual propagation, and genetic improvement since ancient times, 4,000 to 5,000 B.C. Modern breeding programs have contributed to the selection and release of numerous cultivars having significant economic impact on the U.S. and world markets. In recent years, the Rosaceae community, both in the United States and internationally, has benefited from newfound organization and collaboration that have hastened progress in developing genetic and genomic resources for representative crops such as apple (Malus spp.), peach (Prunus spp.), and strawberry (Fragaria spp.). These resources, including expressed sequence tags, bacterial artificial chromosome libraries, physical and genetic maps, and molecular markers, combined with genetic transformation protocols and bioinformatics tools, have rendered various rosaceous crops highly amenable to comparative and functional genomics studies. This report serves as a synopsis of the resources and initiatives of the Rosaceae community, recent developments in Rosaceae genomics, and plans to apply newly accumulated knowledge and resources toward breeding and crop improvement.

  5. Cancer genomics object model: an object model for multiple functional genomics data for cancer research.

    Science.gov (United States)

    Park, Yu Rang; Lee, Hye Won; Cho, Sung Bum; Kim, Ju Han

    2007-01-01

    The development of functional genomics, including transcriptomics, proteomics and metabolomics, allows us to monitor a large number of key cellular pathways simultaneously. Several technology-specific data models have been introduced for the representation of functional genomics experimental data, including the MicroArray Gene Expression-Object Model (MAGE-OM), the Proteomics Experiment Data Repository (PEDRo), and the Tissue MicroArray-Object Model (TMA-OM). Despite the increasing number of cancer studies using multiple functional genomics technologies, there is still no integrated data model for multiple functional genomics experimental and clinical data. We propose an object-oriented data model for cancer genomics research, the Cancer Genomics Object Model (CaGe-OM). We reference four data models: Functional Genomic-Object Model, MAGE-OM, TMA-OM and PEDRo. The clinical and histopathological information models are created by analyzing cancer management workflow and referencing the College of American Pathology Cancer Protocols and National Cancer Institute Common Data Elements. The CaGe-OM provides a comprehensive data model for integrated storage and analysis of clinical and multiple functional genomics data.

  6. Employment of Near Full-Length Ribosome Gene TA-Cloning and Primer-Blast to Detect Multiple Species in a Natural Complex Microbial Community Using Species-Specific Primers Designed with Their Genome Sequences.

    Science.gov (United States)

    Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou

    2016-11-01

    It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.

  7. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

    2013-03-08

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232,831 genes and ~15,011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  8. Multiple Myeloma Genomics: A Systematic Review.

    Science.gov (United States)

    Weaver, Casey J; Tariman, Joseph D

    2017-08-01

    This integrative review describes the genomic variants that have been found to be associated with poor prognosis in patients diagnosed with multiple myeloma (MM). Second, it identifies MM genetic and genomic changes using next-generation sequencing, specifically whole-genome sequencing or exome sequencing. A search for peer-reviewed articles through PubMed, EBSCOhost, and DePaul WorldCat Libraries Worldwide yielded 33 articles that were included in the final analysis. The most commonly reported genetic changes were KRAS, NRAS, TP53, FAM46C, BRAF, DIS3, ATM, and CCND1. These genetic changes play a role in the pathogenesis of MM, prognostication, and therapeutic targets for novel therapies. MM genetics and genomics are expanding rapidly; oncology nurse clinicians must have basic competencies in genetics and genomics to help patients understand the complexities of genetic and genomic alterations and be able to refer patients to appropriate genomic professionals if needed. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Multiple Genes Related to Muscle Identified through a Joint Analysis of a Two-stage Genome-wide Association Study for Racing Performance of 1,156 Thoroughbreds.

    Science.gov (United States)

    Shin, Dong-Hyun; Lee, Jin Woo; Park, Jong-Eun; Choi, Ik-Young; Oh, Hee-Seok; Kim, Hyeon Jeong; Kim, Heebal

    2015-06-01

    Thoroughbred, a relatively recent horse breed, is best known for its use in horse racing. Although myostatin (MSTN) variants have been reported to be highly associated with horse racing performance, the trait is more likely to be polygenic in nature. The purpose of this study was to identify genetic variants strongly associated with racing performance by using estimated breeding value (EBV) for race time as a phenotype. We conducted a two-stage genome-wide association study to search for genetic variants associated with the EBV. In the first stage of the genome-wide association study, a relatively large number of markers (~54,000 single-nucleotide polymorphisms, SNPs) were evaluated in a small number of samples (240 horses). In the second stage, a relatively small number of markers identified to have large effects (170 SNPs) were evaluated in a much larger number of samples (1,156 horses). We also validated the SNPs related to MSTN known to have large effects on racing performance and found significant associations in the stage two analysis, but not in stage one. We identified 28 significant SNPs related to 17 genes. Among these, six genes have a function related to myogenesis and five genes are involved in muscle maintenance. To our knowledge, these genes are newly reported for the genetic association with racing performance of Thoroughbreds. This work complements recent horse genome-wide association studies of racing performance that identified other SNPs and genes as the most significant variants. These results will help to expand our knowledge of the polygenic nature of racing performance in Thoroughbreds.

  10. Multiple Genes Related to Muscle Identified through a Joint Analysis of a Two-stage Genome-wide Association Study for Racing Performance of 1,156 Thoroughbreds

    Directory of Open Access Journals (Sweden)

    Dong-Hyun Shin

    2015-06-01

    Thoroughbred, a relatively recent horse breed, is best known for its use in horse racing. Although myostatin (MSTN) variants have been reported to be highly associated with horse racing performance, the trait is more likely to be polygenic in nature. The purpose of this study was to identify genetic variants strongly associated with racing performance by using estimated breeding value (EBV) for race time as a phenotype. We conducted a two-stage genome-wide association study to search for genetic variants associated with the EBV. In the first stage of the genome-wide association study, a relatively large number of markers (~54,000 single-nucleotide polymorphisms, SNPs) were evaluated in a small number of samples (240 horses). In the second stage, a relatively small number of markers identified to have large effects (170 SNPs) were evaluated in a much larger number of samples (1,156 horses). We also validated the SNPs related to MSTN known to have large effects on racing performance and found significant associations in the stage two analysis, but not in stage one. We identified 28 significant SNPs related to 17 genes. Among these, six genes have a function related to myogenesis and five genes are involved in muscle maintenance. To our knowledge, these genes are newly reported for the genetic association with racing performance of Thoroughbreds. This work complements recent horse genome-wide association studies of racing performance that identified other SNPs and genes as the most significant variants. These results will help to expand our knowledge of the polygenic nature of racing performance in Thoroughbreds.

  11. Comparative genomic analysis of eutherian kallikrein genes

    Directory of Open Access Journals (Sweden)

    Marko Premzl

    2017-03-01

    The present study attempted to update and revise the eutherian kallikrein genes implicated in major physiological and pathological processes and in medical molecular diagnostics. Using the eutherian comparative genomic analysis protocol and freely available genomic sequence assemblies, the tests of reliability of eutherian public genomic sequences annotated the most comprehensive curated third-party gene data set of eutherian kallikrein genes, including 121 complete coding sequences among 335 potential coding sequences. The present analysis first described 13 major gene clusters of eutherian kallikrein genes and explained their differential gene expansion patterns. An updated classification and nomenclature of eutherian kallikrein genes was proposed as a new framework for future experiments.

  12. Gene and genome duplication in Acanthamoeba polyphaga Mimivirus.

    Science.gov (United States)

    Suhre, Karsten

    2005-11-01

    Gene duplication is key to molecular evolution in all three domains of life and may be the first step in the emergence of new gene function. It is a well-recognized feature in large DNA viruses but has not been studied extensively in the largest known virus to date, the recently discovered Acanthamoeba polyphaga Mimivirus. Here, I present a systematic analysis of gene and genome duplication events in the mimivirus genome. I found that one-third of the mimivirus genes are related to at least one other gene in the mimivirus genome, either through a large segmental genome duplication event that occurred in the more remote past or through more recent gene duplication events, which often occur in tandem. This shows that gene and genome duplication played a major role in shaping the mimivirus genome. Using multiple alignments, together with remote-homology detection methods based on Hidden Markov Model comparison, I assign putative functions to some of the paralogous gene families. I suggest that a large part of the duplicated mimivirus gene families are likely to interfere with important host cell processes, such as transcription control, protein degradation, and cell regulatory processes. My findings support the view that large DNA viruses are complex evolving organisms, possibly deeply rooted within the tree of life, and oppose the paradigm that viral evolution is dominated by lateral gene acquisition, at least in regard to large DNA viruses.

  13. Uses of antimicrobial genes from microbial genome

    Science.gov (United States)

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  14. The Analysis of Multiple Genome Comparisons in Genus Escherichia and Its Application to the Discovery of Uncharacterised Metabolic Genes in Uropathogenic Escherichia coli CFT073

    Directory of Open Access Journals (Sweden)

    William A. Bryant

    2009-01-01

    A survey of a complete gene synteny comparison has been carried out between twenty fully sequenced strains from the genus Escherichia with the aim of finding yet uncharacterised genes implicated in the metabolism of uropathogenic strains of E. coli (UPEC). Several sets of adjacent colinear genes have been identified which are present in all four UPEC included in this study (CFT073, F11, UTI89, and 536), annotated with putative metabolic functions, but are not found in any other strains considered. An operon closely homologous to that encoding the L-sorbose degradation pathway in Klebsiella pneumoniae has been identified in E. coli CFT073; this operon is present in all of the UPEC considered, but only in 7 of the other 16 strains. The operon's function has been confirmed by cloning the genes into E. coli DH5α and testing for growth on L-sorbose. The functional genomic approach combining in silico and in vitro work presented here can be used as a basis for the discovery of other uncharacterised genes contributing to bacterial survival in specific environments.

  15. Genomic sequencing and analyses of Lymantria xylina multiple nucleopolyhedrovirus

    Directory of Open Access Journals (Sweden)

    Lo Chu-Fang

    2010-02-01

    Background: Outbreaks of the casuarina moth, Lymantria xylina Swinehoe (Lepidoptera: Lymantriidae), which is a very important forest pest in Taiwan, have occurred every five to 10 years. This moth has expanded its range of host plants to include more than 65 species of broadleaf trees. LyxyMNPV (L. xylina multiple nucleopolyhedrovirus) is highly virulent to the casuarina moth and has been investigated as a possible biopesticide for controlling this moth. LdMNPV-like virus has also been isolated from Lymantria xylina larvae but LyxyMNPV was more virulent than LdMNPV-like virus both in NTU-LY and IPLB-LD-652Y cell lines. To better understand LyxyMNPV, the nucleotide sequence of the LyxyMNPV DNA genome was determined and analysed. Results: The genome of LyxyMNPV consists of 156,344 bases, has a G+C content of 53.4% and contains 157 putative open reading frames (ORFs). The gene content and gene order of LyxyMNPV were similar to those of LdMNPV, with 151 ORFs identified as homologous to those reported in the LdMNPV genome. Two genes (Lyxy49 and Lyxy123) were homologous to other baculoviruses, and four unique LyxyMNPV ORFs (Lyxy11, Lyxy19, Lyxy130 and Lyxy131) were identified in the LyxyMNPV genome, including a gag-like gene that was not reported in baculoviruses. LdMNPV contains 23 ORFs that are absent in LyxyMNPV. Readily identifiable homologues of the gene host range factor-1 (hrf-1), which appears to be involved in the susceptibility of L. dispar to NPV infection, were not present in LyxyMNPV. Additionally, two putative odv-e27 homologues were identified in LyxyMNPV. The LyxyMNPV genome encoded 14 bro genes compared with 16 in LdMNPV, which occupied more than 8% of the LyxyMNPV genome. Thirteen homologous regions (hrs) were identified containing 48 repeated sequences composed of 30-bp imperfect palindromes. However, they differed in the relative positions, number of repeats and orientation in the genome compared to LdMNPV. Conclusion: The gene

  16. Gene and genome parameters of mammalian liver circadian genes (LCGs).

    Directory of Open Access Journals (Sweden)

    Gang Wu

    The mammalian circadian system controls various physiological processes and behavioral responses by regulating thousands of circadian genes with rhythmic expression. In this study, we redefined circadian-regulated genes based on published results in the mouse liver and compared them with other gene groups defined relative to circadian regulation, especially the non-circadian-regulated genes expressed in liver, at multiple molecular levels from gene position to protein expression, based on integrative analyses of different datasets from the literature. Based on the intra-tissue analysis, the liver circadian genes or LCGs show unique features when compared to other gene groups. First, LCGs in general have fewer neighboring genes and are larger in both genomic and 3'-UTR lengths but shorter in CDS (coding sequence) length. Second, LCGs have higher mRNA and protein abundance, higher temporal expression variations, and shorter mRNA half-life. Third, more than 60% of LCGs form major co-expression clusters centered in four temporal windows: dawn, day, dusk, and night. In addition, larger and smaller LCGs are found mainly expressed in the day and night temporal windows, respectively, and we believe that LCGs are well-partitioned into the gene expression regulatory network that takes advantage of gene size, expression constraint, and chromosomal architecture. Based on inter-tissue analysis, more than half of LCGs are ubiquitously expressed in multiple tissues but show rhythmical expression in only one or a limited number of tissues. LCGs show at least three-fold lower expression variations across the temporal windows than those among different tissues, and this observation suggests that temporal expression variations regulated by the circadian system are relatively subtle as compared with the tissue expression variations formed during development. Taken together, we suggest that the circadian system selects gene parameters in a cost effective way to improve tissue

  17. Cactus: Algorithms for genome multiple sequence alignment

    OpenAIRE

    Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

    2011-01-01

    Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms...

  18. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    Science.gov (United States)

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  19. Genomics of local adaptation with gene flow.

    Science.gov (United States)

    Tigano, Anna; Friesen, Vicki L

    2016-05-01

    Gene flow is a fundamental evolutionary force in adaptation that is especially important to understand as humans are rapidly changing both the natural environment and natural levels of gene flow. Theory proposes a multifaceted role for gene flow in adaptation, but it focuses mainly on the disruptive effect that gene flow has on adaptation when selection is not strong enough to prevent the loss of locally adapted alleles. The role of gene flow in adaptation is now better understood due to the recent development of both genomic models of adaptive evolution and genomic techniques, which both point to the importance of genetic architecture in the origin and maintenance of adaptation with gene flow. In this review, we discuss three main topics on the genomics of adaptation with gene flow. First, we investigate selection on migration and gene flow. Second, we discuss the three potential sources of adaptive variation in relation to the role of gene flow in the origin of adaptation. Third, we explain how local adaptation is maintained despite gene flow: we provide a synthesis of recent genomic models of adaptation, discuss the genomic mechanisms and review empirical studies on the genomics of adaptation with gene flow. Despite predictions on the disruptive effect of gene flow in adaptation, an increasing number of studies show that gene flow can promote adaptation, that local adaptations can be maintained despite high gene flow, and that genetic architecture plays a fundamental role in the origin and maintenance of local adaptation with gene flow.

  20. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

    Science.gov (United States)

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-12-19

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity and fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, the fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution.

  1. [Susceptibility gene in multiple system atrophy (MSA)].

    Science.gov (United States)

    Tsuji, Shoji

    2014-01-01

    To elucidate the molecular basis of multiple system atrophy (MSA), we first focused on recently identified MSA multiplex families. Through linkage analyses followed by whole-genome resequencing, we identified a causative gene, COQ2, for MSA. We then conducted comprehensive nucleotide sequence analysis of COQ2 in sporadic MSA cases and controls, and found that functionally deleterious COQ2 variants confer a strong risk for developing MSA. COQ2 encodes an enzyme in the biosynthetic pathway of coenzyme Q10. Decreased synthesis of coenzyme Q10 is considered to be involved in the pathogenesis of MSA through decreased electron transport in mitochondria and increased vulnerability to oxidative stress.

  2. Syntenator: Multiple gene order alignments with a gene-specific scoring function

    Directory of Open Access Journals (Sweden)

    Dieterich Christoph

    2008-11-01

    Full Text Available. Background: Identification of homologous regions or conserved syntenies across genomes is a crucial step in comparative genomics. This task is usually performed by genome alignment software such as WABA or blastz. In the case of conserved syntenies, such regions are defined as conserved gene orders. On the gene order level, homologous regions can be found even between distantly related genomes that do not align on the nucleotide sequence level. Results: We present a novel approach to identify regions of conserved synteny across multiple genomes. Syntenator represents genomes and alignments thereof as partial order graphs (POGs). These POGs are aligned by a dynamic programming approach employing a gene-specific scoring function. The scoring function reflects the level of protein sequence similarity for each possible gene pair. Our method consistently defines larger homologous regions in pairwise gene order alignments than nucleotide-level comparisons. Our method is superior to methods that work on predefined homology gene sets (as implemented in Blockfinder). Syntenator successfully reproduces 80% of the EnsEMBL man-mouse conserved syntenic blocks. The full potential of our method becomes visible by comparing remotely related genomes and multiple genomes. Gene order alignments potentially resolve up to 75% of the EnsEMBL 1:many orthology relations and 27% of the many:many orthology relations. Conclusion: We propose Syntenator as a software solution to reliably infer conserved syntenies among distantly related genomes. The software is available from http://www2.tuebingen.mpg.de/abt4/plone.
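    The dynamic programming idea in this abstract can be illustrated with a much simpler setting. The sketch below is not Syntenator itself: it aligns two linear gene orders (not partial order graphs) with a Smith-Waterman-style local alignment. The function name `align_gene_orders`, the `similarity` callback, the gap penalty and the toy scores are all assumptions made for illustration; in a real analysis the gene-pair score would be derived from protein sequence similarity.

```python
def align_gene_orders(genes_a, genes_b, similarity, gap=-1.0):
    """Local alignment of two gene orders (simplified, linear-order sketch).

    similarity(a, b) is a hypothetical gene-specific score for pairing
    gene a with gene b. Returns (best_score, aligned_pairs).
    """
    n, m = len(genes_a), len(genes_b)
    # score[i][j] = best local alignment score ending at genes_a[i-1], genes_b[j-1]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = max(
                0.0,  # start a new local alignment
                score[i - 1][j - 1] + similarity(genes_a[i - 1], genes_b[j - 1]),
                score[i - 1][j] + gap,  # gene in A left unpaired
                score[i][j - 1] + gap,  # gene in B left unpaired
            )
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best cell to recover the aligned gene pairs.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + similarity(genes_a[i - 1], genes_b[j - 1])
        if score[i][j] == diag:
            pairs.append((genes_a[i - 1], genes_b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Toy, made-up similarity scores; unlisted pairs are treated as dissimilar.
toy_scores = {("a1", "b1"): 2.0, ("a2", "b2"): 3.0, ("a4", "b3"): 2.5}


def toy_similarity(a, b):
    return toy_scores.get((a, b), -2.0)


best_score, aligned = align_gene_orders(
    ["a1", "a2", "a3", "a4"], ["b1", "b2", "b3"], toy_similarity
)
```

    In this toy case the gene a3 has no counterpart and is skipped with a gap, so a conserved block is still reported across it. Graph-based alignment over POGs, as described in the abstract, generalizes this recurrence to nodes with several predecessors.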

  3. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  4. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  5. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Full Text Available. Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between

  6. Genome classification by gene distribution: An overlapping subspace clustering approach

    Directory of Open Access Journals (Sweden)

    Halgamuge Saman K

    2008-04-01

    Full Text Available. Background: Extensive horizontal gene transfer has been observed in the genomes of lower organisms, which complicates their evolutionary study. Bacteriophage genomes are a typical example. One recent approach that addresses this problem is the unsupervised clustering of genomes based on gene order and genome position, which helps to reveal species relationships that may not be apparent from traditional phylogenetic methods. Results: We propose the use of an overlapping subspace clustering algorithm for such genome classification problems. The advantage of subspace clustering over traditional clustering is that it can associate clusters with gene arrangement patterns, preserving genomic information in the clusters produced. Additionally, overlapping capability is desirable for the discovery of multiple conserved patterns within a single genome, such as those acquired from different species via horizontal gene transfers. The proposed method involves a novel strategy to vectorize genomes based on their gene distribution. A number of existing subspace clustering and biclustering algorithms were evaluated to identify the best framework upon which to develop our algorithm; we extended a generic subspace clustering algorithm called HARP to incorporate overlapping capability. The proposed algorithm was assessed and applied on bacteriophage genomes. The phage grouping results are consistent overall with the Phage Proteomic Tree and showed common genomic characteristics among the TP901-like, Sfi21-like and sk1-like phage groups. Among 441 phage genomes, we identified four significantly conserved distribution patterns structured by the terminase, portal, integrase, holin and lysin genes. We also observed a subgroup of Sfi21-like phages comprising a distinctive divergent genome organization and identified nine new phage members to the Sfi21-like genus: Staphylococcus 71, phiPVL108, Listeria A118, 2389, Lactobacillus phi AT3, A2

  7. KEGG: kyoto encyclopedia of genes and genomes.

    Science.gov (United States)

    Kanehisa, M; Goto, S

    2000-01-01

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

  8. Evolution of genes and genomes on the Drosophila phylogeny.

    Science.gov (United States)

    Clark, Andrew G; Eisen, Michael B; Smith, Douglas R; Bergman, Casey M; Oliver, Brian; Markow, Therese A; Kaufman, Thomas C; Kellis, Manolis; Gelbart, William; Iyer, Venky N; Pollard, Daniel A; Sackton, Timothy B; Larracuente, Amanda M; Singh, Nadia D; Abad, Jose P; Abt, Dawn N; Adryan, Boris; Aguade, Montserrat; Akashi, Hiroshi; Anderson, Wyatt W; Aquadro, Charles F; Ardell, David H; Arguello, Roman; Artieri, Carlo G; Barbash, Daniel A; Barker, Daniel; Barsanti, Paolo; Batterham, Phil; Batzoglou, Serafim; Begun, Dave; Bhutkar, Arjun; Blanco, Enrico; Bosak, Stephanie A; Bradley, Robert K; Brand, Adrianne D; Brent, Michael R; Brooks, Angela N; Brown, Randall H; Butlin, Roger K; Caggese, Corrado; Calvi, Brian R; Bernardo de Carvalho, A; Caspi, Anat; Castrezana, Sergio; Celniker, Susan E; Chang, Jean L; Chapple, Charles; Chatterji, Sourav; Chinwalla, Asif; Civetta, Alberto; Clifton, Sandra W; Comeron, Josep M; Costello, James C; Coyne, Jerry A; Daub, Jennifer; David, Robert G; Delcher, Arthur L; Delehaunty, Kim; Do, Chuong B; Ebling, Heather; Edwards, Kevin; Eickbush, Thomas; Evans, Jay D; Filipski, Alan; Findeiss, Sven; Freyhult, Eva; Fulton, Lucinda; Fulton, Robert; Garcia, Ana C L; Gardiner, Anastasia; Garfield, David A; Garvin, Barry E; Gibson, Greg; Gilbert, Don; Gnerre, Sante; Godfrey, Jennifer; Good, Robert; Gotea, Valer; Gravely, Brenton; Greenberg, Anthony J; Griffiths-Jones, Sam; Gross, Samuel; Guigo, Roderic; Gustafson, Erik A; Haerty, Wilfried; Hahn, Matthew W; Halligan, Daniel L; Halpern, Aaron L; Halter, Gillian M; Han, Mira V; Heger, Andreas; Hillier, LaDeana; Hinrichs, Angie S; Holmes, Ian; Hoskins, Roger A; Hubisz, Melissa J; Hultmark, Dan; Huntley, Melanie A; Jaffe, David B; Jagadeeshan, Santosh; Jeck, William R; Johnson, Justin; Jones, Corbin D; Jordan, William C; Karpen, Gary H; Kataoka, Eiko; Keightley, Peter D; Kheradpour, Pouya; Kirkness, Ewen F; Koerich, Leonardo B; Kristiansen, Karsten; Kudrna, Dave; Kulathinal, Rob J; Kumar, Sudhir; Kwok, 
Roberta; Lander, Eric; Langley, Charles H; Lapoint, Richard; Lazzaro, Brian P; Lee, So-Jeong; Levesque, Lisa; Li, Ruiqiang; Lin, Chiao-Feng; Lin, Michael F; Lindblad-Toh, Kerstin; Llopart, Ana; Long, Manyuan; Low, Lloyd; Lozovsky, Elena; Lu, Jian; Luo, Meizhong; Machado, Carlos A; Makalowski, Wojciech; Marzo, Mar; Matsuda, Muneo; Matzkin, Luciano; McAllister, Bryant; McBride, Carolyn S; McKernan, Brendan; McKernan, Kevin; Mendez-Lago, Maria; Minx, Patrick; Mollenhauer, Michael U; Montooth, Kristi; Mount, Stephen M; Mu, Xu; Myers, Eugene; Negre, Barbara; Newfeld, Stuart; Nielsen, Rasmus; Noor, Mohamed A F; O'Grady, Patrick; Pachter, Lior; Papaceit, Montserrat; Parisi, Matthew J; Parisi, Michael; Parts, Leopold; Pedersen, Jakob S; Pesole, Graziano; Phillippy, Adam M; Ponting, Chris P; Pop, Mihai; Porcelli, Damiano; Powell, Jeffrey R; Prohaska, Sonja; Pruitt, Kim; Puig, Marta; Quesneville, Hadi; Ram, Kristipati Ravi; Rand, David; Rasmussen, Matthew D; Reed, Laura K; Reenan, Robert; Reily, Amy; Remington, Karin A; Rieger, Tania T; Ritchie, Michael G; Robin, Charles; Rogers, Yu-Hui; Rohde, Claudia; Rozas, Julio; Rubenfield, Marc J; Ruiz, Alfredo; Russo, Susan; Salzberg, Steven L; Sanchez-Gracia, Alejandro; Saranga, David J; Sato, Hajime; Schaeffer, Stephen W; Schatz, Michael C; Schlenke, Todd; Schwartz, Russell; Segarra, Carmen; Singh, Rama S; Sirot, Laura; Sirota, Marina; Sisneros, Nicholas B; Smith, Chris D; Smith, Temple F; Spieth, John; Stage, Deborah E; Stark, Alexander; Stephan, Wolfgang; Strausberg, Robert L; Strempel, Sebastian; Sturgill, David; Sutton, Granger; Sutton, Granger G; Tao, Wei; Teichmann, Sarah; Tobari, Yoshiko N; Tomimura, Yoshihiko; Tsolas, Jason M; Valente, Vera L S; Venter, Eli; Venter, J Craig; Vicario, Saverio; Vieira, Filipe G; Vilella, Albert J; Villasante, Alfredo; Walenz, Brian; Wang, Jun; Wasserman, Marvin; Watts, Thomas; Wilson, Derek; Wilson, Richard K; Wing, Rod A; Wolfner, Mariana F; Wong, Alex; Wong, Gane Ka-Shu; Wu, Chung-I; Wu, 
Gabriel; Yamamoto, Daisuke; Yang, Hsiao-Pei; Yang, Shiaw-Pyng; Yorke, James A; Yoshida, Kiyohito; Zdobnov, Evgeny; Zhang, Peili; Zhang, Yu; Zimin, Aleksey V; Baldwin, Jennifer; Abdouelleil, Amr; Abdulkadir, Jamal; Abebe, Adal; Abera, Brikti; Abreu, Justin; Acer, St Christophe; Aftuck, Lynne; Alexander, Allen; An, Peter; Anderson, Erica; Anderson, Scott; Arachi, Harindra; Azer, Marc; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Berlin, Aaron; Bessette, Daniel; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Bourzgui, Imane; Brown, Adam; Cahill, Patrick; Channer, Sheridon; Cheshatsang, Yama; Chuda, Lisa; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Costello, Maura; D'Aco, Katie; Daza, Riza; De Haan, Georgius; DeGray, Stuart; DeMaso, Christina; Dhargay, Norbu; Dooley, Kimberly; Dooley, Erin; Doricent, Missole; Dorje, Passang; Dorjee, Kunsang; Dupes, Alan; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Fisher, Sheila; Foley, Chelsea D; Franke, Alicia; Friedrich, Dennis; Gadbois, Loryn; Gearin, Gary; Gearin, Christina R; Giannoukos, Georgia; Goode, Tina; Graham, Joseph; Grandbois, Edward; Grewal, Sharleen; Gyaltsen, Kunsang; Hafez, Nabil; Hagos, Birhane; Hall, Jennifer; Henson, Charlotte; Hollinger, Andrew; Honan, Tracey; Huard, Monika D; Hughes, Leanne; Hurhula, Brian; Husby, M Erii; Kamat, Asha; Kanga, Ben; Kashin, Seva; Khazanovich, Dmitry; Kisner, Peter; Lance, Krista; Lara, Marcia; Lee, William; Lennon, Niall; Letendre, Frances; LeVine, Rosie; Lipovsky, Alex; Liu, Xiaohong; Liu, Jinlei; Liu, Shangtao; Lokyitsang, Tashi; Lokyitsang, Yeshi; Lubonja, Rakela; Lui, Annie; MacDonald, Pen; Magnisalis, Vasilia; Maru, Kebede; Matthews, Charles; McCusker, William; McDonough, Susan; Mehta, Teena; Meldrim, James; Meneus, Louis; Mihai, Oana; Mihalev, Atanas; Mihova, Tanya; Mittelman, Rachel; Mlenga, Valentine; Montmayeur, Anna; Mulrain, Leonidas; Navidi, Adam; Naylor, Jerome; Negash, Tamrat; Nguyen, 
Thu; Nguyen, Nga; Nicol, Robert; Norbu, Choe; Norbu, Nyima; Novod, Nathaniel; O'Neill, Barry; Osman, Sahal; Markiewicz, Eva; Oyono, Otero L; Patti, Christopher; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Raghuraman, Sujaa; Rege, Filip; Reyes, Rebecca; Rise, Cecil; Rogov, Peter; Ross, Keenan; Ryan, Elizabeth; Settipalli, Sampath; Shea, Terry; Sherpa, Ngawang; Shi, Lu; Shih, Diana; Sparrow, Todd; Spaulding, Jessica; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Strader, Christopher; Tesfaye, Senait; Thomson, Talene; Thoulutsang, Yama; Thoulutsang, Dawa; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Young, Geneva; Yu, Qing; Zembek, Lisa; Zhong, Danni; Zimmer, Andrew; Zwirko, Zac; Jaffe, David B; Alvarez, Pablo; Brockman, Will; Butler, Jonathan; Chin, CheeWhye; Gnerre, Sante; Grabherr, Manfred; Kleber, Michael; Mauceli, Evan; MacCallum, Iain

    2007-11-08

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

  9. Gene enrichment in plant genomic shotgun libraries.

    Science.gov (United States)

    Rabinowicz, Pablo D; McCombie, W Richard; Martienssen, Robert A

    2003-04-01

    The Arabidopsis genome (about 130 Mbp) has been completely sequenced, whereas a draft sequence of the rice genome (about 430 Mbp) is now available and sequencing of this genome will be completed in the near future. The much larger genomes of several important crop species, such as wheat (about 16,000 Mbp) or maize (about 2500 Mbp), may not be fully sequenced with current technology. Instead, sequencing-analysis strategies are being developed to obtain sequencing and mapping information selectively for the genic fraction (gene space) of complex plant genomes.

  10. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes.

    Directory of Open Access Journals (Sweden)

    Kris Popendorf

    Full Text Available. BACKGROUND: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. METHODOLOGY/PRINCIPAL FINDINGS: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy. CONCLUSIONS/SIGNIFICANCE: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with

  11. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes.

    Science.gov (United States)

    Popendorf, Kris; Tsuyoshi, Hachiya; Osana, Yasunori; Sakakibara, Yasubumi

    2010-09-24

    With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy. Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency significantly greater than existing methods. Murasaki is available
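    The notion of anchors found through spaced seeds can be made concrete with a small sketch. This is not Murasaki's implementation and it omits the adaptive hash function generation and parallel execution mentioned in the abstract; it only shows how a mismatch pattern lets two windows match while differing at "don't care" positions. The function names, the default pattern `"1101"` and the toy sequences are assumptions for illustration.

```python
from collections import defaultdict


def spaced_seed_key(seq, start, pattern):
    """Keep only the characters at the '1' positions of the pattern."""
    return "".join(seq[start + k] for k, bit in enumerate(pattern) if bit == "1")


def find_anchors(sequences, pattern="1101"):
    """Return {seed_key: [start positions in each sequence]} for seeds that
    occur in every input sequence (a simplified stand-in for anchors)."""
    width = len(pattern)
    # One list of start positions per input sequence, for every seed key.
    index = defaultdict(lambda: [[] for _ in sequences])
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq) - width + 1):
            index[spaced_seed_key(seq, start, pattern)][seq_id].append(start)
    # Keep only seeds with at least one hit in every sequence.
    return {key: hits for key, hits in index.items() if all(hits)}


# "ACGT" (position 0) and "ACTT" (position 2) differ only at the ignored
# third position of the pattern, so they share the seed key "ACT".
anchors = find_anchors(["ACGTAA", "GGACTTCC"])
```

    Each sequence is scanned once and the index is built in a single pass, which gives an intuition for why seed-based anchoring can scale better than all-against-all pairwise alignment. A production tool would hash the seed keys into a compact table instead of storing strings.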

  12. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss.

    Science.gov (United States)

    den Bakker, Henk C; Cummings, Craig A; Ferreira, Vania; Vatta, Paolo; Orsi, Renato H; Degoricija, Lovorka; Barker, Melissa; Petrauskene, Olga; Furtado, Manohar R; Wiedmann, Martin

    2010-12-02

    The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus Listeria thus provides an example of a group of

  13. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss

    Directory of Open Access Journals (Sweden)

    Barker Melissa

    2010-12-01

    Full Text Available. Background: The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. Results: To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Conclusions: Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus

  14. Gene conversion in the rice genome

    DEFF Research Database (Denmark)

    Xu, Shuqing; Clark, Terry; Zheng, Hongkun;

    2008-01-01

    BACKGROUND: Gene conversion causes a non-reciprocal transfer of genetic information between similar sequences. Gene conversion can both homogenize genes and recruit point mutations thereby shaping the evolution of multigene families. In the rice genome, the large number of duplicated genes...... is not tightly linked to natural selection in the rice genome. To assess the contribution of segmental duplication on gene conversion statistics, we determined locations of conversion partners with respect to inter-chromosomal segment duplication. The number of conversions associated with segmentation is less...

  15. Clustering of gene ontology terms in genomes.

    Science.gov (United States)

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kind of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically highly significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave the way for further studies of mechanisms that shape genome structure and evolutionary forces related to them.
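    The fixed-gene-count window described in this abstract can be sketched as follows. The abstract does not state which statistic the authors used, so the hypergeometric tail probability here is an assumption, as are the function names, the default window size and the `min_count` threshold. Windows are defined by a number of consecutive genes, not by base pairs, which mirrors the idea of analyzing a fixed number of genes irrespective of the genome span covered.

```python
from math import comb


def hypergeom_tail(k, window, annotated, total):
    """P(X >= k) when `window` genes are drawn without replacement from
    `total` genes, of which `annotated` carry the GO term."""
    upper = min(window, annotated)
    favourable = sum(
        comb(annotated, x) * comb(total - annotated, window - x)
        for x in range(k, upper + 1)
    )
    return favourable / comb(total, window)


def go_term_windows(gene_terms, term, window=5, min_count=3):
    """gene_terms: list of GO term sets, one per gene, in chromosomal order.

    Returns (start_index, count, p_value) for every window of `window`
    consecutive genes in which at least `min_count` genes carry `term`.
    """
    flags = [term in terms for terms in gene_terms]
    total, annotated = len(flags), sum(flags)
    results = []
    for start in range(total - window + 1):
        count = sum(flags[start:start + window])
        if count >= min_count:
            p = hypergeom_tail(count, window, annotated, total)
            results.append((start, count, p))
    return results


# Toy chromosome: three adjacent genes share a term, seven genes do not.
toy_genes = [{"GO:0006412"}] * 3 + [set()] * 7
hits = go_term_windows(toy_genes, "GO:0006412", window=3, min_count=3)
```

    Overlapping windows are not independent tests, so a real analysis would need multiple-testing correction or a permutation scheme before calling any cluster significant.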

  16. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying organisms, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, along with their transcriptomes. To use these large amounts of sequence data for timely gene annotation or re-annotation, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, aided by an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homologous peptides and transcript ORFs. See Methods for details. Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, whose annotation was done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  17. Pinpointing disease genes through phenomic and genomic data fusion.

    Science.gov (United States)

    Jiang, Rui; Wu, Mengmeng; Li, Lianshuo

    2015-01-01

    Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology.

  18. Genomic evidence for adaptation by gene duplication.

    Science.gov (United States)

    Qian, Wenfeng; Zhang, Jianzhi

    2014-08-01

    Gene duplication is widely believed to facilitate adaptation, but unambiguous evidence for this hypothesis has been found in only a small number of cases. Although gene duplication may increase the fitness of the involved organisms by doubling gene dosage or neofunctionalization, it may also result in a simple division of ancestral functions into daughter genes, which need not promote adaptation. Hence, the general validity of the adaptation by gene duplication hypothesis remains uncertain. Indeed, a genome-scale experiment found similar fitness effects of deleting pairs of duplicate genes and deleting individual singleton genes from the yeast genome, leading to the conclusion that duplication rarely results in adaptation. Here we contend that the above comparison is unfair because of a known duplication bias among genes with different fitness contributions. To rectify this problem, we compare homologous genes from the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. We discover that simultaneously deleting a duplicate gene pair in S. cerevisiae reduces fitness significantly more than deleting their singleton counterpart in S. pombe, revealing post-duplication adaptation. The duplicates-singleton difference in fitness effect is not attributable to a potential increase in gene dose after duplication, suggesting that the adaptation is owing to neofunctionalization, which we find to be explicable by acquisitions of binary protein-protein interactions rather than gene expression changes. These results provide genomic evidence for the role of gene duplication in organismal adaptation and are important for understanding the genetic mechanisms of evolutionary innovation.

  19. apex: phylogenetics with multiple genes.

    Science.gov (United States)

    Jombart, Thibaut; Archer, Frederick; Schliep, Klaus; Kamvar, Zhian; Harris, Rebecca; Paradis, Emmanuel; Goudet, Jérome; Lapp, Hilmar

    2017-01-01

    Genetic sequences of multiple genes are becoming increasingly common for a wide range of organisms including viruses, bacteria and eukaryotes. While such data may sometimes be treated as a single locus, in practice, a number of biological and statistical phenomena can lead to phylogenetic incongruence. In such cases, different loci should, at least as a preliminary step, be examined and analysed separately. The R software has become a popular platform for phylogenetics, with several packages implementing distance-based, parsimony and likelihood-based phylogenetic reconstruction, and an even greater number of packages implementing phylogenetic comparative methods. Unfortunately, basic data structures and tools for analysing multiple genes have so far been lacking, thereby limiting potential for investigating phylogenetic incongruence. In this study, we introduce the new R package apex to fill this gap. apex implements new object classes, which extend existing standards for storing DNA and amino acid sequences, and provides a number of convenient tools for handling, visualizing and analysing these data. We present the main features of the package and illustrate its functionalities through the analysis of a simple data set.

  20. Gene finding in the chicken genome

    Directory of Open Access Journals (Sweden)

    Antonarakis Stylianos E

    2005-05-01

    Background: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. Results: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.

  1. A transcription factor map as revealed by a genome-wide gene expression analysis of whole-blood mRNA transcriptome in multiple sclerosis.

    Directory of Open Access Journals (Sweden)

    Carlos Riveros

    BACKGROUND: Several lines of evidence suggest that transcription factors are involved in the pathogenesis of Multiple Sclerosis (MS), but complete mapping of the whole network has been elusive. One of the reasons is that there are several clinical subtypes of MS, and transcription factors that may be involved in one subtype may not be in others. We investigate the possibility that this network could be mapped using microarray technologies and contemporary bioinformatics methods on a dataset derived from whole blood in 99 untreated MS patients (36 Relapse Remitting MS, 43 Primary Progressive MS, and 20 Secondary Progressive MS) and 45 age-matched healthy controls. METHODOLOGY/PRINCIPAL FINDINGS: We have used two different analytical methodologies: a non-standard differential expression analysis and a differential co-expression analysis, which have converged on a significant number of regulatory motifs that are statistically overrepresented in genes that are either differentially expressed (or differentially co-expressed) in cases and controls (e.g., V$KROX_Q6, p-value <3.31E-6; V$CREBP1_Q2, p-value <9.93E-6; V$YY1_02, p-value <1.65E-5). CONCLUSIONS/SIGNIFICANCE: Our analysis uncovered a network of transcription factors that potentially dysregulate several genes in MS or one or more of its disease subtypes. The most significant transcription factor motifs were for the Early Growth Response EGR/KROX family, ATF2, YY1 (Yin and Yang 1), E2F-1/DP-1 and E2F-4/DP-2 heterodimers, SOX5, and CREB and ATF families. These transcription factors are involved in early T-lymphocyte specification and commitment as well as in oligodendrocyte dedifferentiation and development, both pathways that have significant biological plausibility in MS causation.

  2. Comparative genomic analysis of soybean flowering genes.

    Directory of Open Access Journals (Sweden)

    Chol-Hee Jung

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant

  3. Genomic disorders: A window into human gene and genome evolution

    Science.gov (United States)

    Carvalho, Claudia M. B.; Zhang, Feng; Lupski, James R.

    2010-01-01

    Gene duplications alter the genetic constitution of organisms and can be a driving force of molecular evolution in humans and the great apes. In this context, the study of genomic disorders has uncovered the essential role played by the genomic architecture, especially low copy repeats (LCRs) or segmental duplications (SDs). In fact, regardless of the mechanism, LCRs can mediate or stimulate rearrangements, inciting genomic instability and generating dynamic and unstable regions prone to rapid molecular evolution. In humans, copy-number variation (CNV) has been implicated in common traits such as neuropathy, hypertension, color blindness, infertility, and behavioral traits including autism and schizophrenia, as well as disease susceptibility to HIV, lupus nephritis, and psoriasis among many other clinical phenotypes. The same mechanisms implicated in the origin of genomic disorders may also play a role in the emergence of segmental duplications and the evolution of new genes by means of genomic and gene duplication and triplication, exon shuffling, exon accretion, and fusion/fission events. PMID:20080665

  4. A genome-wide association study in multiple system atrophy

    Science.gov (United States)

    Sailer, Anna; Nalls, Michael A.; Schulte, Claudia; Federoff, Monica; Price, T. Ryan; Lees, Andrew; Ross, Owen A.; Dickson, Dennis W.; Mok, Kin; Mencacci, Niccolo E.; Schottlaender, Lucia; Chelban, Viorica; Ling, Helen; O'Sullivan, Sean S.; Wood, Nicholas W.; Traynor, Bryan J.; Ferrucci, Luigi; Federoff, Howard J.; Mhyre, Timothy R.; Morris, Huw R.; Deuschl, Günther; Quinn, Niall; Widner, Hakan; Albanese, Alberto; Infante, Jon; Bhatia, Kailash P.; Poewe, Werner; Oertel, Wolfgang; Höglinger, Günter U.; Wüllner, Ullrich; Goldwurm, Stefano; Pellecchia, Maria Teresa; Ferreira, Joaquim; Tolosa, Eduardo; Bloem, Bastiaan R.; Rascol, Olivier; Meissner, Wassilios G.; Hardy, John A.; Revesz, Tamas; Holton, Janice L.; Gasser, Thomas; Wenning, Gregor K.; Singleton, Andrew B.

    2016-01-01

    Objective: To identify genetic variants that play a role in the pathogenesis of multiple system atrophy (MSA), we undertook a genome-wide association study (GWAS). Methods: We performed a GWAS with >5 million genotyped and imputed single nucleotide polymorphisms (SNPs) in 918 patients with MSA of European ancestry and 3,864 controls. MSA cases were collected from North American and European centers, one third of which were neuropathologically confirmed. Results: We found no significant loci after stringent multiple testing correction. A number of regions emerged as potentially interesting for follow-up at p < 1 × 10−6, including SNPs in the genes FBXO47, ELOVL7, EDN1, and MAPT. Contrary to previous reports, we found no association of the genes SNCA and COQ2 with MSA. Conclusions: We present a GWAS in MSA. We have identified several potentially interesting gene loci, including the MAPT locus, whose significance will have to be evaluated in a larger sample set. Common genetic variation in SNCA and COQ2 does not seem to be associated with MSA. In the future, additional samples of well-characterized patients with MSA will need to be collected to perform a larger MSA GWAS, but this initial study forms the basis for these next steps. PMID:27629089

  5. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    cells are capable of regulating their gene expression, so that each cell can only express a particular set of genes yielding limited numbers of proteins with specialized functions. Therefore a rigid control of differential gene expression is necessary for cellular diversity. On the other hand, aberrant...... gene regulation will disrupt the cell’s fundamental processes, which in turn can cause disease. Hence, understanding gene regulation is essential for deciphering the code of life. Along with the development of high throughput sequencing (HTS) technology and the subsequent large-scale data analysis......, genome-wide assays have increased our understanding of gene regulation significantly. This thesis describes the integration and analysis of HTS data across different important aspects of gene regulation. Gene expression can be regulated at different stages when the genetic information is passed from gene...

  6. Genome editing for human gene therapy.

    Science.gov (United States)

    Meissner, Torsten B; Mandal, Pankaj K; Ferreira, Leonardo M R; Rossi, Derrick J; Cowan, Chad A

    2014-01-01

    The rapid advancement of genome-editing techniques holds much promise for the field of human gene therapy. From bacteria to model organisms and human cells, genome editing tools such as zinc-finger nucleases (ZFNs), TALENs, and CRISPR/Cas9 have been successfully used to manipulate the respective genomes with unprecedented precision. With regard to human gene therapy, it is of great interest to test the feasibility of genome editing in primary human hematopoietic cells that could potentially be used to treat a variety of human genetic disorders such as hemoglobinopathies, primary immunodeficiencies, and cancer. In this chapter, we explore the use of the CRISPR/Cas9 system for the efficient ablation of genes in two clinically relevant primary human cell types, CD4+ T cells and CD34+ hematopoietic stem and progenitor cells. By using two guide RNAs directed at a single locus, we achieve highly efficient and predictable deletions that ablate gene function. The use of a Cas9-2A-GFP fusion protein allows FACS-based enrichment of the transfected cells. The ease of designing, constructing, and testing guide RNAs makes this dual guide strategy an attractive approach for the efficient deletion of clinically relevant genes in primary human hematopoietic stem and effector cells and enables the use of CRISPR/Cas9 for gene therapy.

  7. Tandemly Arrayed Genes in Vertebrate Genomes

    Directory of Open Access Journals (Sweden)

    Deng Pan

    2008-01-01

    Tandemly arrayed genes (TAGs) are duplicated genes that are linked as neighbors on a chromosome, many of which have important physiological and biochemical functions. Here we performed a survey of these genes in 11 available vertebrate genomes. TAGs account for an average of about 14% of all genes in these vertebrate genomes, and about 25% of all duplications. The majority of TAGs (72–94%) have parallel transcription orientation (i.e., they are encoded on the same strand), in contrast to the genome, which has about 50% of its genes in parallel transcription orientation. The majority of tandem arrays have only two members. In all species, the proportion of genes that belong to TAGs tends to be higher in large gene families than in small ones; together with our recent finding that tandem duplication played a more important role than retroposition in large families, this fact suggests that among all types of duplication mechanisms, tandem duplication is the predominant mechanism of duplication, especially in large families. Finally, several species have a higher proportion of large tandem arrays that are species-specific than random expectation.
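
    As a rough illustration of how tandem arrays and their transcription orientation could be tallied, the sketch below scans one chromosome's genes in order. The input layout (`(gene_id, family_id, strand)` tuples), the function names, and the rule that array members must be immediate neighbors with no intervening genes are assumptions for this example, not the survey's actual criteria.

```python
def tandem_arrays(genes):
    """Group consecutive genes that share a family into tandem arrays.

    genes: list of (gene_id, family_id, strand) tuples in chromosomal order;
    family_id is None for genes without an assigned family (hypothetical layout).
    Returns a list of arrays, each a list of two or more gene tuples.
    """
    arrays, current = [], []
    for gene in genes:
        same_family = (
            current
            and gene[1] is not None
            and gene[1] == current[-1][1]
        )
        if same_family:
            current.append(gene)
        else:
            if len(current) > 1:
                arrays.append(current)
            current = [gene]
    if len(current) > 1:
        arrays.append(current)
    return arrays

def parallel_fraction(arrays):
    """Fraction of neighboring gene pairs within arrays encoded on the same strand."""
    pairs = [(a, b) for arr in arrays for a, b in zip(arr, arr[1:])]
    if not pairs:
        return 0.0
    return sum(a[2] == b[2] for a, b in pairs) / len(pairs)
```

    Allowing a small number of spacer genes between array members would be a natural extension, at the cost of one more parameter to justify.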

  8. Bidirectional promoters of insects: genome-wide comparison, evolutionary implication and influence on gene expression.

    Science.gov (United States)

    Behura, Susanta K; Severson, David W

    2015-01-30

    Bidirectional promoters are widespread in insect genomes. By analyzing 23 insect genomes we show that the frequency of bidirectional gene pairs varies according to genome compactness and density of genes among the species. The density of bidirectional genes expected based on number of genes per megabase of genome explains the observed density, suggesting that bidirectional pairing of genes may be due to a random event. We identified specific transcription factor binding motifs that are enriched in bidirectional promoters across insect species. Furthermore, we observed that bidirectional promoters may act as transcriptional hotspots in insect genomes where protein coding genes tend to aggregate in significantly biased (p promoters. Natural selection seems to have an association with the extent of bidirectionality of genes among the species. The rate of non-synonymous-to-synonymous changes (dN/dS) shows a second-order polynomial distribution with bidirectionality between species, indicating that bidirectionality is dependent upon evolutionary pressure acting on the genomes. Analysis of genome-wide microarray expression data of multiple insect species suggested that bidirectionality has a similar association with transcriptome variation across species. Furthermore, bidirectional promoters show significant association with correlated expression of the divergent gene pairs depending upon their motif composition. Analysis of gene ontology showed that bidirectional genes tend to have a common association with functions related to "binding" (including ion binding, nucleotide binding and protein binding) across genomes. Such functional constraint of bidirectional genes may explain their widespread persistence in the genomes of diverse insect species.

  9. Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms

    Directory of Open Access Journals (Sweden)

    Meller Jaroslaw

    2007-03-01

    Background: Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionary conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements is one of the central goals in comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes. Results: We present a new tool, Cinteny, for fast identification and analysis of synteny with different sets of markers and various levels of coarse graining of syntenic blocks. Using the Hannenhalli-Pevzner approach and its extensions, Cinteny also enables interactive determination of evolutionary relationships between genomes in terms of the number of rearrangements (the reversal distance). In particular, Cinteny provides: (i) integration of synteny browsing with assessment of evolutionary distances for multiple genomes; (ii) flexibility to adjust the parameters and re-compute the results on-the-fly; (iii) ability to work with user provided data, such as orthologous genes, sequence tags or other conserved markers. In addition, Cinteny provides many annotated mammalian, invertebrate and fungal genomes that are pre-loaded and available for analysis at http://cinteny.cchmc.org. Conclusion: Cinteny allows one to automatically compare multiple genomes and perform sensitivity analysis for synteny block detection and for the subsequent computation of reversal distances
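
    To give a feel for the reversal distance mentioned in this record, the sketch below counts breakpoints in a signed permutation and derives a simple lower bound on the number of reversals. This is not the Hannenhalli-Pevzner algorithm and not Cinteny's code: it relies only on the standard observation that one reversal changes at most two adjacencies, and the function names are hypothetical.

```python
def breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n against the identity.

    perm: list of non-zero ints, e.g. [1, -3, -2, 4]; the sign encodes strand.
    The permutation is framed by 0 and n + 1, and a neighboring pair (a, b)
    is an adjacency when b - a == 1, otherwise a breakpoint.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)

def reversal_lower_bound(perm):
    """Lower bound on the reversal distance: ceil(breakpoints / 2).

    A reversal only alters the two adjacencies at the ends of the reversed
    segment, so it can remove at most two breakpoints.
    """
    return (breakpoints(perm) + 1) // 2
```

    The exact distance for signed permutations additionally depends on the cycle structure of the breakpoint graph, which this bound ignores.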

  10. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    Science.gov (United States)

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  11. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    Directory of Open Access Journals (Sweden)

    Uchiyama Ikuo

    2008-10-01

    Background: Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results: The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. Conclusion: The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  12. Genome-wide patterns of Arabidopsis gene expression in nature.

    Directory of Open Access Journals (Sweden)

    Christina L Richards

    Organisms in the wild are subject to multiple, fluctuating environmental factors, and it is in complex natural environments that genetic regulatory networks actually function and evolve. We assessed genome-wide gene expression patterns in the wild in two natural accessions of the model plant Arabidopsis thaliana and examined the nature of transcriptional variation throughout its life cycle and gene expression correlations with natural environmental fluctuations. We grew plants in a natural field environment and measured genome-wide time-series gene expression from the plant shoot every three days, spanning the seedling to reproductive stages. We find that 15,352 genes were expressed in the A. thaliana shoot in the field, and accession and flowering status (vegetative versus flowering) were strong components of transcriptional variation in this plant. We identified between ∼110 and 190 time-varying gene expression clusters in the field, many of which were significantly overrepresented by genes regulated by abiotic and biotic environmental stresses. The two main principal components of vegetative shoot gene expression (PC(veg)) correlate to temperature and precipitation occurrence in the field. The largest PC(veg) axes included thermoregulatory genes while the second major PC(veg) was associated with precipitation and contained drought-responsive genes. By exposing A. thaliana to natural environments in an open field, we provide a framework for further understanding the genetic networks that are deployed in natural environments, and we connect plant molecular genetics in the laboratory to plant organismal ecology in the wild.

  13. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Background: Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results: I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions: I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  14. 3D Genome Tuner: Compare Multiple Circular Genomes in a 3D Context

    Institute of Scientific and Technical Information of China (English)

    Qi Wang; Qun Liang; Xiuqing Zhang

    2009-01-01

    Circular genomes, being the largest proportion of sequenced genomes, play an important role in genome analysis. However, the traditional 2D circular map provides only an overview and annotations of a genome and does not offer feature-based comparison. To remedy these shortcomings, we developed 3D Genome Tuner, a hybrid of circular map and comparative map tools. Its capability of viewing comparisons between multiple circular maps in a 3D space offers great benefits to the study of comparative genomics. The program is freely available (under an LGPL licence) at http://sourceforge.net/projects/dgenometuner.

  15. Comparative genomics of Neisseria meningitidis: core genome, islands of horizontal transfer and pathogen-specific genes.

    Science.gov (United States)

    Dunning Hotopp, Julie C; Grifantini, Renata; Kumar, Nikhil; Tzeng, Yih Ling; Fouts, Derrick; Frigimelica, Elisabetta; Draghi, Monia; Giuliani, Marzia Monica; Rappuoli, Rino; Stephens, David S; Grandi, Guido; Tettelin, Hervé

    2006-12-01

    To better understand Neisseria meningitidis genomes and virulence, microarray comparative genome hybridization (mCGH) data were collected from one Neisseria cinerea, two Neisseria lactamica, two Neisseria gonorrhoeae and 48 Neisseria meningitidis isolates. For N. meningitidis, these isolates are from diverse clonal complexes, invasive and carriage strains, and all major serogroups. The microarray platform represented N. meningitidis strains MC58, Z2491 and FAM18, and N. gonorrhoeae FA1090. By comparing hybridization data to genome sequences, the core N. meningitidis genome and insertions/deletions (e.g. capsule locus, type I secretion system) related to pathogenicity were identified, including further characterization of the capsule locus, bioinformatics analysis of a type I secretion system, and identification of some metabolic pathways associated with intracellular survival in pathogens. Hybridization data clustered meningococcal isolates from similar clonal complexes that were distinguished by the differential presence of six distinct islands of horizontal transfer. Several of these islands contained prophage or other mobile elements, including a novel prophage and a transposon carrying portions of a type I secretion system. Acquisition of some genetic islands appears to have occurred in multiple lineages, including transfer between N. lactamica and N. meningitidis. However, island acquisition occurs infrequently, such that the genomic-level relationship is not obscured within clonal complexes. The N. meningitidis genome is characterized by the horizontal acquisition of multiple genetic islands; the study of these islands reveals important sets of genes varying between isolates and likely to be related to pathogenicity.

  16. Genome-wide association study in a high-risk isolate for multiple sclerosis reveals associated variants in STAT3 gene

    DEFF Research Database (Denmark)

    Jakkula, Eveliina; Leppä, Virpi; Sulonen, Anna-Maija

    2010-01-01

    Genetic risk for multiple sclerosis (MS) is thought to involve both common and rare risk alleles. Recent GWAS and subsequent meta-analysis have established the critical role of the HLA locus and identified new common variants associated with MS. These variants have small odds ratios (ORs) and expla...

  17. Genomics of the human carnitine acyltransferase genes

    NARCIS (Netherlands)

    van der Leij, FR; Huijkman, NCA; Boomsma, C; Kuipers, JRG; Bartelds, B

    2000-01-01

    Five genes in the human genome are known to encode different active forms of related carnitine acyltransferases: CPT1A for liver-type carnitine palmitoyltransferase I, CPT1B for muscle-type carnitine palmitoyltransferase I, CPT2 for carnitine palmitoyltransferase II, CROT for carnitine octanoyltrans

  18. Genome-wide association study in a high-risk isolate for multiple sclerosis reveals associated variants in STAT3 gene

    DEFF Research Database (Denmark)

    Jakkula, Eveliina; Leppä, Virpi; Sulonen, Anna-Maija

    2010-01-01

    in 711 cases and 1029 controls from Finland, and the top two findings were validated in 3859 cases and 9110 controls from more heterogeneous populations. SNP (rs744166) within the STAT3 gene was associated with MS (p = 2.75 × 10^-10, OR 0.87, confidence interval 0.83-0.91). The protective haplotype for MS...

  19. Whole genome amplification of DNA for genotyping pharmacogenetics candidate genes.

    Directory of Open Access Journals (Sweden)

    Santosh Philips

    2012-03-01

    Whole genome amplification (WGA) technologies can be used to amplify genomic DNA when only small amounts of DNA are available. Phi polymerase-based Multiple Displacement Amplification has been shown to accurately amplify DNA for a variety of genotyping assays; however, it has not been tested for genotyping many of the clinically relevant genes important for pharmacogenetic studies, such as the cytochrome P450 genes, which are typically difficult to genotype because of multiple pseudogenes, copy number variations, and high similarity to other related genes. We evaluated whole genome amplified samples for Taqman™ genotyping of SNPs in a variety of pharmacogenetic genes. In 24 DNA samples from the Coriell human diversity panel, the call rates and concordance between amplified (~200-fold amplification) and unamplified samples were 100% for two SNPs in CYP2D6 and one in ESR1. In samples from a breast cancer clinical trial (Trial 1), we compared the genotyping results in samples before and after WGA for four SNPs in CYP2D6, one SNP in CYP2C19, one SNP in CYP19A1, two SNPs in ESR1, and two SNPs in ESR2. The concordance rates were all >97%. Finally, we compared the allele frequencies of 143 SNPs determined in Trial 1 (whole genome amplified DNA) to the allele frequencies determined in unamplified DNA samples from a separate trial (Trial 2) that enrolled a similar population. The call rates and allele frequencies between the two trials were 98% and 99.7%, respectively. We conclude that whole genome amplified DNA is suitable for Taqman™ genotyping of a wide variety of pharmacogenetically relevant SNPs.

  20. Genome-Wide Associations of Gene Expression Variation in Humans.

    Directory of Open Access Journals (Sweden)

    2005-12-01

    The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.

  1. Genome-wide associations of gene expression variation in humans.

    Directory of Open Access Journals (Sweden)

    Barbara E Stranger

    2005-12-01

    The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
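
    The abstract above names Bonferroni and false discovery rate corrections without giving details. As a minimal sketch, the Python below computes adjusted p-values for both. It assumes the Benjamini-Hochberg procedure as the FDR method (the abstract does not say which one was used), the function names and toy p-values are illustrative only, and the permutation approach is not shown.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed here as the FDR procedure)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    toy = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up p-values
    print("Bonferroni:", bonferroni(toy))
    print("Benjamini-Hochberg:", benjamini_hochberg(toy))
```

    Bonferroni controls the family-wise error rate and is the stricter of the two, so each Benjamini-Hochberg adjusted value is never larger than the corresponding Bonferroni value.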

  2. The diversity of cyanobacterial metabolism: genome analysis of multiple phototrophic microorganisms

    Directory of Open Access Journals (Sweden)

    Beck Christian

    2012-02-01

    Background: Cyanobacteria are among the most abundant organisms on Earth and represent one of the oldest and most widespread clades known in modern phylogenetics. As the only known prokaryotes capable of oxygenic photosynthesis, cyanobacteria are considered to be a promising resource for renewable fuels and natural products. Our efforts to harness the sun's energy using cyanobacteria would greatly benefit from an increased understanding of the genomic diversity across multiple cyanobacterial strains. In this respect, the advent of novel sequencing techniques and the availability of several cyanobacterial genomes offer new opportunities for understanding microbial diversity and metabolic organization and evolution in diverse environments. Results: Here, we report a whole genome comparison of multiple phototrophic cyanobacteria. We describe genetic diversity found within cyanobacterial genomes, specifically with respect to metabolic functionality. Our results are based on pair-wise comparison of protein sequences and concomitant construction of clusters of likely ortholog genes. We differentiate between core, shared and unique genes and show that the majority of genes are associated with a single genome. In contrast, genes with metabolic function are strongly overrepresented within the core genome that is common to all considered strains. The analysis of metabolic diversity within core carbon metabolism reveals parts of the metabolic networks that are highly conserved, as well as highly fragmented pathways. Conclusions: Our results have direct implications for resource allocation and further sequencing projects. It can be extrapolated that the number of newly identified genes still significantly increases with increasing number of new sequenced genomes. Furthermore, genome analysis of multiple phototrophic strains allows us to obtain a detailed picture of metabolic diversity that can serve as a starting point for biotechnological
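
    The core/shared/unique distinction in the abstract above can be illustrated with a short sketch. It assumes that ortholog clusters have already been built and that each cluster is summarised by the set of genomes contributing at least one member; the cluster and strain names are hypothetical, and this is not the authors' pipeline.

```python
from typing import Dict, Set


def classify_clusters(clusters: Dict[str, Set[str]], genomes: Set[str]) -> Dict[str, str]:
    """Label each ortholog cluster by how many of the genomes it occurs in."""
    labels = {}
    for cluster_id, present_in in clusters.items():
        if genomes <= present_in:
            labels[cluster_id] = "core"    # found in every genome considered
        elif len(present_in) == 1:
            labels[cluster_id] = "unique"  # found in exactly one genome
        else:
            labels[cluster_id] = "shared"  # found in several, but not all
    return labels


if __name__ == "__main__":
    genomes = {"strainA", "strainB", "strainC"}  # hypothetical strain names
    clusters = {
        "cluster1": {"strainA", "strainB", "strainC"},
        "cluster2": {"strainA", "strainC"},
        "cluster3": {"strainB"},
    }
    print(classify_clusters(clusters, genomes))
```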

  3. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). In spite of the wealth of information provided by functional profiling methods, a limitation common to all of them is their inherently unidimensional nature. To overcome this restriction we present a multidimensional logistic model that allows the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) to be studied simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results that cannot be derived from conventional unidimensional functional profiling methods. In several examples, we report sound gene set associations that remained undetected by conventional one-dimensional gene set analysis. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.

  4. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    Directory of Open Access Journals (Sweden)

    Yunsheng Wang

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  5. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    Science.gov (United States)

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  6. Multiple genomic recombination events in the evolution of saffold cardiovirus.

    Directory of Open Access Journals (Sweden)

    Lili Ren

    BACKGROUND: Saffold cardiovirus (SAFV) is a new human cardiovirus with 11 identified genotypes. Little is known about the natural history and pathogenicity of SAFVs. METHODOLOGY/PRINCIPAL FINDINGS: We sequenced the genomes of five SAFV-1 strains which were identified from fecal samples taken from children with viral diarrhea in Beijing, China between March 2006 and November 2007, and analyzed the phylogenetic and phylodynamic properties of SAFVs using the genome sequences of all known SAFV genotypes. We identified multiple recombination events in our SAFV-1 strains, specifically recombination between SAFV-2, -3, -4, -9, -10 and the prototype SAFV-1 strain in the VP4 region and recombination between SAFV-4, -6, -8, -10, -11 and prototype SAFV-1 in the VP1/2A region. Notably, recombination in the structural gene VP4 is a rare event in Cardiovirus. The ratio of nonsynonymous substitutions to synonymous substitutions indicates purifying selection on the SAFV genome. Phylogenetic and molecular clock analysis indicates the existence of at least two subclades of SAFV-1 with different origins. Subclade 1 includes two strains isolated from Pakistan, whereas subclade 2 includes the prototype strain and strains isolated in China, Pakistan, and Afghanistan. The most recent common ancestor of all SAFV genotypes dates to the 1710s, and SAFV-1, -2, and -3 to the 1940s, 1950s, and 1960s, respectively. No obvious relationship between variation and pathogenicity exists in the critical domains of the CD and EF loops of viral capsid proteins or the multi-functional proteins L based on amino acid sequence identity comparison between SAFV genotypes. CONCLUSIONS/SIGNIFICANCE: Our findings suggest that intertypic recombination plays an important role in the diversity of SAFVs, highlighting the diversity of the five strains with the previously described SAFV-1 strains.

  7. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    Science.gov (United States)

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  8. Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data

    Science.gov (United States)

    Ko, Hyoseok; Kim, Kipoong

    2016-01-01

    In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required in order to identify disease/trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on an individual test suffer from multiple testing issues such as the control of family-wise error rate and dependent tests. Moreover, detecting only a few genes associated with a phenotype outcome among tens of thousands of genes is of main interest in genetic association studies. For this reason, regularization procedures, in which a phenotype outcome is regressed on all genomic markers and the regression coefficients are then estimated based on a penalized likelihood, have been considered a good alternative approach to the analysis of high-dimensional genomic data. However, the selection performance of regularization procedures has rarely been compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies in which commonly used group testing procedures such as principal component analysis, Hotelling's T2 test, and permutation test are compared with group lasso (least absolute shrinkage and selection operator) in terms of true positive selection. We also applied all methods considered in the simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from the Illumina Infinium HumanMethylation27K Beadchip. We found a large discrepancy in selected genes between multiple group testing procedures and group lasso.

  9. Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models

    Directory of Open Access Journals (Sweden)

    Smith Terry J

    2004-03-01

    Background: Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation. Results: This work explores a new approach to gene-prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential. Conclusions: While its raw accuracy rate can be less than other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.
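
    Relative synonymous codon usage (RSCU), the feature named in the abstract above, is commonly defined as a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a single in-frame coding sequence. It assumes the standard genetic code and is not RescueNet's own code; the function name and example sequence are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> Dict[str, float]:
    """Return RSCU per sense codon for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        for c in family:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    print(rscu("ATGGCTGCTGCTGCCTAA"))  # made-up sequence
```

    A value of 1.0 means a codon is used as often as expected under equal usage; values above or below 1.0 indicate preferred or avoided codons.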

  10. The Role of Multiple Transcription Factors In Archaeal Gene Expression

    Energy Technology Data Exchange (ETDEWEB)

    Charles J. Daniels

    2008-09-23

    Since the inception of this research program, the project has focused on two central questions: What is the relationship between the 'eukaryal-like' transcription machinery of archaeal cells and its counterparts in eukaryal cells? And, how does the archaeal cell control gene expression using its mosaic of eukaryal core transcription machinery and its bacterial-like transcription regulatory proteins? During the grant period we have addressed these questions using a variety of in vivo approaches and have sought to specifically define the roles of the multiple TATA binding protein (TBP) and TFIIB-like (TFB) proteins in controlling gene expression in Haloferax volcanii. H. volcanii was initially chosen as a model for the Archaea based on the availability of suitable genetic tools; however, later studies showed that all haloarchaea possessed multiple tbp and tfb genes, which led to the proposal that multiple TBP and TFB proteins may function in a manner similar to alternative sigma factors in bacterial cells. In vivo transcription and promoter analysis established a clear relationship between the promoter requirements of haloarchaeal genes and those of the eukaryal RNA polymerase II promoter. Studies on heat shock gene promoters, and the demonstration that specific tfb genes were induced by heat shock, provided the first indication that TFB proteins may direct expression of specific gene families. The construction of strains lacking tbp or tfb genes, coupled with the finding that many of these genes are differentially expressed under varying growth conditions, provided further support for this model. Genetic tools were also developed that led to the construction of insertion and deletion mutants, and a novel gene expression scheme was designed that allowed the controlled expression of these genes in vivo. More recent studies have used a whole genome array to examine the expression of these genes and we have established a linkage between the expression of

  11. Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome.

    Science.gov (United States)

    Kim, Woonsu; Park, Hyesun; Seo, Seongwon

    2016-01-01

    The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline for amalgamating an organism's genomic annotations from multiple sources was updated and used to create an amalgamated cattle genome database based on UMD_3.1. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxification. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle.

  12. Simultaneous clustering of multiple gene expression and physical interaction datasets.

    Directory of Open Access Journals (Sweden)

    Manikandan Narayanan

    2010-04-01

    Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.

  13. Extensive error in the number of genes inferred from draft genome assemblies.

    Directory of Open Access Journals (Sweden)

    James F Denton

    2014-12-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

  14. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome level, effectively producing a sequential genome annotation as output. The model can be used to obtain the most probable genome annotation based on a combination of (i) a gene finder score of each gene candidate and (ii) the sequence of the reading frames of gene candidates through a genome. The model, as well as a higher order variant, is developed and tested... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem, it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...

  15. Genome-Wide Architecture of Disease Resistance Genes in Lettuce.

    Science.gov (United States)

    Christopoulou, Marilena; Wo, Sebastian Reyes-Chin; Kozik, Alex; McHale, Leah K; Truco, Maria-Jose; Wroblewski, Tadeusz; Michelmore, Richard W

    2015-10-08

    Genome-wide motif searches identified 1134 genes in the lettuce reference genome of cv. Salinas that are potentially involved in pathogen recognition, of which 385 were predicted to encode nucleotide binding-leucine rich repeat receptor (NLR) proteins. Using a maximum-likelihood approach, we grouped the NLRs into 25 multigene families and 17 singletons. Forty-one percent of these NLR-encoding genes belong to three families, the largest being RGC16 with 62 genes in cv. Salinas. The majority of NLR-encoding genes are located in five major resistance clusters (MRCs) on chromosomes 1, 2, 3, 4, and 8 and cosegregate with multiple disease resistance phenotypes. Most MRCs contain primarily members of a single NLR gene family but a few are more complex. MRC2 spans 73 Mb and contains 61 NLRs of six different gene families that cosegregate with nine disease resistance phenotypes. MRC3, which is 25 Mb, contains 22 RGC21 genes and colocates with Dm13. A library of 33 transgenic RNA interference tester stocks was generated for functional analysis of NLR-encoding genes that cosegregated with disease resistance phenotypes in each of the MRCs. Members of four NLR-encoding families, RGC1, RGC2, RGC21, and RGC12, were shown to be required for 16 disease resistance phenotypes in lettuce. The general composition of MRCs is conserved across different genotypes; however, the specific repertoire of NLR-encoding genes varied, particularly for the rapidly evolving Type I genes. These tester stocks are valuable resources for future analyses of additional resistance phenotypes.

  16. Multiple roles of genome-attached bacteriophage terminal proteins

    Energy Technology Data Exchange (ETDEWEB)

    Redrejo-Rodríguez, Modesto; Salas, Margarita, E-mail: msalas@cbm.csic.es

    2014-11-15

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Although terminal proteins (TPs) lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes through their primary function of priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer.

  17. Bioinformatics Assisted Gene Discovery and Annotation of Human Genome

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    As the sequencing stage of the Human Genome Project nears its end, work has begun on discovering novel genes from genome sequences and annotating their biological functions. This review covers the major bioinformatics tools and technologies currently available for large-scale gene discovery and annotation from human genome sequences. Some ideas about possible future developments are also provided.

  18. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    IP-seq and small RNA-seq, we delineated the landscape of the promoters with bidirectional transcriptions that yield steady-state RNA in only one direction (Paper III). A subsequent motif analysis enabled us to uncover specific DNA signals – early polyA sites – that make RNA on the reverse strand sensitive...... they regulated or if the sites had global elevated usage rates by multiple TFs. Using RNA-seq, 5’end-seq in combination with depletion of 5’exonuclease as well as nonsense-mediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V......). Gene enrichment analysis on the detected NMD substrates revealed an unappreciated NMD-based regulatory mechanism of the genes hosting multiple intronic snoRNAs, which can facilitate differential expression of individual snoRNAs from a single host gene locus. Finally, supported by RNA-seq and small RNA-seq...

  19. Genomic variation in Salmonella enterica core genes for epidemiological typing

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis

    2012-01-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...

  20. Genomic variation in Salmonella enterica core genes for epidemiological typing

    Directory of Open Access Journals (Sweden)

    Leekitcharoenphon Pimlapas

    2012-03-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over time. The core genes--the genes that are conserved in all (or most) members of a genus or species--are potentially good candidates for investigating genomic variation in phylogeny and epidemiology. Results: We identify a set of 2,882 core gene clusters based on 73 publicly available Salmonella enterica genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher confidence. The core genes can be divided into two categories: a few highly variable genes and a larger set of conserved core genes, with low variance. For the most variable core genes, the variance in amino acid sequences is higher than for the corresponding nucleotide sequences, suggesting that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important especially in trend analysis.
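
    The definition of core genes given in this abstract (gene clusters conserved in all, or most, members of a species) can be illustrated with a minimal sketch. The strain names, cluster identifiers, and the `min_fraction` threshold are hypothetical and are not taken from the study.

```python
from collections import Counter

def core_genes(genomes, min_fraction=1.0):
    """Return the gene clusters present in at least min_fraction of the genomes."""
    counts = Counter()
    for genes in genomes.values():
        counts.update(set(genes))  # count each cluster once per genome
    needed = min_fraction * len(genomes)
    return {cluster for cluster, n in counts.items() if n >= needed}

# Toy presence data: genome name -> gene cluster ids (all hypothetical)
genomes = {
    "strain_A": {"c1", "c2", "c3", "c4"},
    "strain_B": {"c1", "c2", "c4"},
    "strain_C": {"c1", "c2", "c3"},
}

strict_core = core_genes(genomes)          # clusters found in every genome
relaxed_core = core_genes(genomes, 0.66)   # clusters found in at least 2 of 3
```

    Lowering `min_fraction` corresponds to the "or most" part of the definition, which tolerates a cluster being missed in a few genomes.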

  1. Putative essential and core-essential genes in Mycoplasma genomes.

    Science.gov (United States)

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first "synthetic life", has been an important species in the emerging field of synthetic biology. However, essential genes, an important concept of synthetic biology, for both M. mycoides and M. capricolum, as well as 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information on biased gene strand distribution, homology search and codon adaptation index. The algorithm, which achieved an accuracy of 80.8% and 78.9% in self-consistency and cross-validation tests, respectively, predicted 5880 essential genes in the 16 Mycoplasma genomes. The intersection set of essential genes in available Mycoplasma genomes consists of 153 core essential genes. The predicted essential genes (available from pDEG, tubic.tju.edu.cn/pdeg) and the proposed algorithm can be helpful for studying minimal Mycoplasma genomes as well as essential genes in other genomes.

  2. USING GENOMICS TO EXAMINE MULTIPLE EXPOSURE VARIABLES IN BIOINDICATORS RESEARCH

    Science.gov (United States)

    Genomics technologies provide a powerful tool for rapid assessment of differentially expressed genes in laboratory and field animals exposed to toxicants, and a means by which to link the earliest indicators of exposure to diverse effects in organisms and populations. However, a...

  3. An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies

    OpenAIRE

    Li, Jia; Tseng, George C

    2011-01-01

    Global expression analyses using microarray technologies are becoming more common in genomic research; therefore, new statistical challenges associated with combining information from multiple studies must be addressed. In this paper we will describe our proposal for an adaptively weighted (AW) statistic to combine multiple genomic studies for detecting differentially expressed genes. We will also present our results from comparisons of our proposed AW statistic to Fisher...

  4. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Background: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only

  5. Insular organization of gene space in grass genomes.

    Science.gov (United States)

    Gottlieb, Andrea; Müller, Hans-Georg; Massa, Alicia N; Wanjugi, Humphrey; Deal, Karin R; You, Frank M; Xu, Xiangyang; Gu, Yong Q; Luo, Ming-Cheng; Anderson, Olin D; Chan, Agnes P; Rabinowicz, Pablo; Devos, Katrien M; Dvorak, Jan

    2013-01-01

    Wheat and maize genes were hypothesized to be clustered into islands but the hypothesis was not statistically tested. The hypothesis is statistically tested here in four grass species differing in genome size, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, and Aegilops tauschii. Density functions obtained under a model where gene locations follow a homogeneous Poisson process and thus are not clustered are compared with a model-free situation quantified through a non-parametric density estimate. A simple homogeneous Poisson model for gene locations is not rejected for the small O. sativa and B. distachyon genomes, indicating that genes are distributed largely uniformly in those species, but is rejected for the larger S. bicolor and Ae. tauschii genomes, providing evidence for clustering of genes into islands. It is proposed to call the gene islands "gene insulae" to distinguish them from other types of gene clustering that have been proposed. An average S. bicolor and Ae. tauschii insula is estimated to contain 3.7 and 3.9 genes with an average intergenic distance within an insula of 2.1 and 16.5 kb, respectively. Inter-insular distances are greater than 8 and 81 kb and average 15.1 and 205 kb, in S. bicolor and Ae. tauschii, respectively. A greater gene density observed in the distal regions of the Ae. tauschii chromosomes is shown to be primarily caused by shortening of inter-insular distances. The comparison of the four grass genomes suggests that gene locations are largely a function of a homogeneous Poisson process in small genomes. Nonrandom insertions of LTR retroelements during genome expansion create gene insulae, which become less dense and further apart with the increase in genome size. High concordance in relative lengths of orthologous intergenic distances among the investigated genomes including the maize genome suggests functional constraints on gene distribution in the grass genomes.
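
    The contrast between a homogeneous Poisson process and clustered gene locations can be illustrated with a small simulation. This sketch does not reproduce the density-function comparison used in the study; it swaps in a simpler diagnostic, the variance-to-mean ratio of gene counts per window, which is close to 1 for a homogeneous Poisson process and larger when genes are clustered. The genome length, gene number, window size, and island parameters are all invented for illustration.

```python
import random
from statistics import mean, pvariance

def dispersion_index(positions, genome_length, window):
    """Variance-to-mean ratio of gene counts in equal-sized windows."""
    n_windows = genome_length // window
    counts = [0] * n_windows
    for p in positions:
        i = int(p // window)
        if i < n_windows:
            counts[i] += 1
    return pvariance(counts) / mean(counts)

rng = random.Random(0)
genome_length, n_genes, window = 1_000_000, 2000, 10_000

# Homogeneous Poisson process (conditioned on gene number): uniform positions
uniform = [rng.uniform(0, genome_length) for _ in range(n_genes)]

# Clustered layout: 500 hypothetical islands of 4 genes, each within 2 kb of its centre
centres = [rng.uniform(0, genome_length) for _ in range(500)]
clustered = [
    min(max(c + rng.uniform(-2000, 2000), 0), genome_length - 1)
    for c in centres
    for _ in range(4)
]

d_uniform = dispersion_index(uniform, genome_length, window)
d_clustered = dispersion_index(clustered, genome_length, window)
```

    In this toy setting the clustered layout should give a clearly larger ratio than the uniform one, which is the qualitative signal a rejection of the Poisson model reflects.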

  6. Insular organization of gene space in grass genomes.

    Directory of Open Access Journals (Sweden)

    Andrea Gottlieb

    Wheat and maize genes were hypothesized to be clustered into islands but the hypothesis was not statistically tested. The hypothesis is statistically tested here in four grass species differing in genome size, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, and Aegilops tauschii. Density functions obtained under a model where gene locations follow a homogeneous Poisson process and thus are not clustered are compared with a model-free situation quantified through a non-parametric density estimate. A simple homogeneous Poisson model for gene locations is not rejected for the small O. sativa and B. distachyon genomes, indicating that genes are distributed largely uniformly in those species, but is rejected for the larger S. bicolor and Ae. tauschii genomes, providing evidence for clustering of genes into islands. It is proposed to call the gene islands "gene insulae" to distinguish them from other types of gene clustering that have been proposed. An average S. bicolor and Ae. tauschii insula is estimated to contain 3.7 and 3.9 genes with an average intergenic distance within an insula of 2.1 and 16.5 kb, respectively. Inter-insular distances are greater than 8 and 81 kb and average 15.1 and 205 kb, in S. bicolor and Ae. tauschii, respectively. A greater gene density observed in the distal regions of the Ae. tauschii chromosomes is shown to be primarily caused by shortening of inter-insular distances. The comparison of the four grass genomes suggests that gene locations are largely a function of a homogeneous Poisson process in small genomes. Nonrandom insertions of LTR retroelements during genome expansion create gene insulae, which become less dense and further apart with the increase in genome size. High concordance in relative lengths of orthologous intergenic distances among the investigated genomes including the maize genome suggests functional constraints on gene distribution in the grass genomes.

  7. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  8. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however involves difficulty in setting the overall p-value due to the complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. A number of SNP-SNP pairs larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes the above issues is thus needed. Results: We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) for appropriate handling of the large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by the variable selection procedure of the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (General-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
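
    The screening idea behind SIS, ranking every candidate predictor by its marginal association with the outcome and keeping only the top few for further modelling, can be sketched as follows. This is not the EPISIS implementation: it assumes a plain product coding of each SNP-SNP pair and uses absolute Pearson correlation as the marginal score, whereas the abstract describes dummy coding and logistic regression. The simulated genotypes, the causal pair (SNPs 2 and 7), and all function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean

def abs_corr(x, y):
    """Absolute Pearson correlation; 0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def screen_pairs(genotypes, status, keep):
    """Rank all SNP-SNP product terms by marginal association; keep the top ones."""
    n_snps = len(genotypes[0])
    scores = {}
    for j, k in combinations(range(n_snps), 2):
        interaction = [g[j] * g[k] for g in genotypes]
        scores[(j, k)] = abs_corr(interaction, status)
    return sorted(scores, key=scores.get, reverse=True)[:keep]

rng = random.Random(1)
n_samples, n_snps = 400, 12
# Genotypes coded as minor-allele counts 0, 1, 2
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_samples)]
# Simulated case status driven purely by the interaction of SNPs 2 and 7
status = [1 if g[2] * g[7] >= 2 else 0 for g in genotypes]

top = screen_pairs(genotypes, status, keep=5)
```

    The retained pairs in `top` would then be passed to a joint model such as logistic regression, which is feasible once the number of predictors is well below the sample size.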

  9. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants.

    Science.gov (United States)

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-04-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs.

  10. Comparative genomics of the relationship between gene structure and expression

    NARCIS (Netherlands)

    Ren, X.

    2006-01-01

    The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene struc

  11. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high

  12. Polycomb target genes are silenced in multiple myeloma.

    Directory of Open Access Journals (Sweden)

    Antonia Kalushkova

    Multiple myeloma (MM) is a genetically heterogeneous disease, which to date remains fatal. Finding a common mechanism for initiation and progression of MM continues to be challenging. By means of integrative genomics, we identified an underexpressed gene signature in MM patient cells compared to normal counterpart plasma cells. This profile was enriched for previously defined H3K27-tri-methylated genes, targets of the Polycomb group (PcG) proteins in human embryonic fibroblasts. Additionally, the silenced gene signature was more pronounced in ISS stage III MM compared to stage I and II. Using chromatin immunoprecipitation (ChIP) assay on purified CD138+ cells from four MM patients and on two MM cell lines, we found enrichment of H3K27me3 at genes selected from the profile. As the data implied that the Polycomb-targeted gene profile would be highly relevant for pharmacological treatment of MM, we used two compounds to chemically revert the H3K27-tri-methylation mediated gene silencing. The S-adenosylhomocysteine hydrolase inhibitor 3-Deazaneplanocin (DZNep) and the histone deacetylase inhibitor LBH589 (Panobinostat) reactivated the expression of genes repressed by H3K27me3, depleted cells of the PRC2 component EZH2 and induced apoptosis in human MM cell lines. In the immunocompetent 5T33MM in vivo model for MM, treatment with LBH589 resulted in gene upregulation, reduced tumor load and increased overall survival. Taken together, our results reveal a common gene signature in MM, mediated by gene silencing via the Polycomb repressor complex. The importance of the underexpressed gene profile in MM tumor initiation and progression should be subjected to further studies.

  13. Weeding out the genes: the Arabidopsis genome project.

    Science.gov (United States)

    Martienssen, R A

    2000-05-01

    The Arabidopsis genome sequence is scheduled for completion at the end of this year (December 2000). It will be the first higher plant genome to be sequenced, and will allow a detailed comparison with bacterial, yeast and animal genomes. Already, two of the five chromosomes have been sequenced, and we have had our first glimpse of higher eukaryotic centromeres, and the structure of heterochromatin. The implications for understanding plant gene function, genome structure and genome organization are profound. This review covers the lessons learned for future genome projects and summarizes the initial findings in Arabidopsis.

  14. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context...

  15. A Multiple QTL-Seq Strategy Delineates Potential Genomic Loci Governing Flowering Time in Chickpea

    Science.gov (United States)

    Srivastava, Rishi; Upadhyaya, Hari D.; Kumar, Rajendra; Daware, Anurag; Basu, Udita; Shimray, Philanim W.; Tripathi, Shailesh; Bharadwaj, Chellapilla; Tyagi, Akhilesh K.; Parida, Swarup K.

    2017-01-01

    Identification of functionally relevant potential genomic loci using an economical, simpler and user-friendly genomics-assisted breeding strategy is vital for rapid genetic dissection of complex flowering time quantitative trait in chickpea. A high-throughput multiple QTL-seq strategy was employed in two inter (Cicer arietinum desi accession ICC 4958 × C. reticulatum wild accession ICC 17160)- and intra (ICC 4958 × C. arietinum kabuli accession ICC 8261)-specific RIL mapping populations to identify the major QTL genomic regions governing flowering time in chickpea. The whole genome resequencing discovered 1635117 and 592486 SNPs exhibiting differentiation between early- and late-flowering mapping parents and bulks, constituted by pooling the homozygous individuals of extreme flowering time phenotypic trait from each of two aforesaid RIL populations. The multiple QTL-seq analysis using these mined SNPs in two RIL mapping populations narrowed-down two longer (907.1 kb and 1.99 Mb) major flowering time QTL genomic regions into the high-resolution shorter (757.7 kb and 1.39 Mb) QTL intervals on chickpea chromosome 4. This essentially identified regulatory as well as coding (non-synonymous/synonymous) novel SNP allelic variants from two efl1 (early flowering 1) and GI (GIGANTEA) genes regulating flowering time in chickpea. Interestingly, strong natural allelic diversity reduction (88–91%) of two known flowering genes especially mapped at major QTL intervals as compared to that of background genomic regions (where no flowering time QTLs were mapped; 61.8%) in cultivated vis-à-vis wild Cicer gene pools was evident inferring the significant impact of evolutionary bottlenecks on these loci during chickpea domestication. Higher association potential of coding non-synonymous and regulatory SNP alleles mined from efl1 (36–49%) and GI (33–42%) flowering genes for early and late flowering time differentiation among chickpea accessions was evident. The robustness and

  16. High-Diversity Genes in the Arabidopsis Genome

    OpenAIRE

    Cork, Jennifer M.; Purugganan, Michael D.

    2005-01-01

    High-diversity genes represent an important class of loci in organismal genomes. Since elevated levels of nucleotide variation are a key component of the molecular signature for balancing selection or local adaptation, high-diversity genes may represent loci whose alleles are selectively maintained as balanced polymorphisms. Comparison of 4300 random shotgun sequence fragments of the Arabidopsis thaliana Ler ecotype genome with the whole genomic sequence of the Col-0 ecotype identified 60 gen...

  17. Regulation of methane genes and genome expression

    Energy Technology Data Exchange (ETDEWEB)

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, that encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  19. Simultaneous mapping of multiple gene loci with pooled segregants.

    Directory of Open Access Journals (Sweden)

    Jürgen Claesen

    The analysis of polygenic phenotypic characteristics such as quantitative traits or inheritable diseases remains an important challenge. It requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as performed in bulked segregant analysis (BSA), should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. The gene mapping process applied here consists of three steps. First, a controlled crossing of parents with and without a trait. Second, selection based on phenotypic screening of the offspring, followed by the mapping of short offspring sequences against the parental reference. The final step aims at detecting genetic markers such as SNPs, insertions and deletions with next generation sequencing (NGS). Markers in close proximity to genomic loci that are associated with the trait have a higher probability of being inherited together. Hence, these markers are very useful for discovering the loci and the genetic mechanism underlying the characteristic of interest. Within this context, NGS produces binomial counts along the genome, i.e., the number of sequenced reads that match the SNP of the parental reference strain, which is a proxy for the number of individuals in the offspring that share the SNP with the parent. Genomic loci associated with the trait can thus be discovered by analyzing trends in the counts along the genome. We exploit the link between smoothing splines and generalized mixed models for estimating the underlying structure present in the SNP scatterplots.
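
    The idea of reading trends out of binomial read counts can be illustrated with a minimal sketch. Note that it does not reproduce the smoothing-spline and generalized mixed model approach named in the abstract; it swaps in a much simpler sliding-window pooled proportion. The function name, argument names and window size are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def windowed_allele_frequency(
    positions: List[int],
    matching_reads: List[int],
    total_reads: List[int],
    window: int,
) -> List[Tuple[int, float]]:
    """Pool binomial counts over all SNPs within +/- `window` bp of each SNP.

    Returns (position, pooled fraction of reads matching the parental SNP).
    This is a crude stand-in for a proper smoother, not the published method.
    """
    result = []
    for center in positions:
        matched = 0
        covered = 0
        for pos, match, total in zip(positions, matching_reads, total_reads):
            if abs(pos - center) <= window:
                matched += match
                covered += total
        frequency = matched / covered if covered else float("nan")
        result.append((center, frequency))
    return result


if __name__ == "__main__":
    # Toy data (made up): three neighbouring SNPs and one distant SNP.
    for pos, freq in windowed_allele_frequency(
        [100, 200, 300, 10000], [5, 6, 7, 1], [10, 10, 10, 10], window=150
    ):
        print(pos, round(freq, 3))
```

    Regions where the pooled fraction drifts away from the value expected without linkage would be candidates for loci associated with the trait; a real analysis would also need to model sampling noise, which this sketch ignores.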

  20. SCGPred: A Score-based Method for Gene Structure Prediction by Combining Multiple Sources of Evidence

    Institute of Scientific and Technical Information of China (English)

    Xiao Li; Qingan Ren; Yang Weng; Haoyang Cai; Yunmin Zhu; Yizheng Zhang

    2008-01-01

    Predicting protein-coding genes remains a significant challenge. Although a variety of computational programs that commonly use machine learning methods have emerged, the accuracy of predictions remains low when they are applied to large genomic sequences. Moreover, computational gene finding in newly sequenced genomes is especially difficult due to the absence of a training set of abundant validated genes. Here we present a new gene-finding program, SCGPred, to improve the accuracy of prediction by combining multiple sources of evidence. SCGPred can perform both a supervised method in previously well-studied genomes and an unsupervised one in novel genomes. When tested with datasets composed of large DNA sequences from human and a novel genome of Ustilago maydis, SCGPred shows a significant improvement in comparison to the popular ab initio gene predictors. We also demonstrate that SCGPred can significantly improve prediction in novel genomes by combining several foreign gene finders with similarity alignments, which is superior to other unsupervised methods. Therefore, SCGPred can serve as an alternative gene-finding tool for newly sequenced eukaryotic genomes. The program is freely available at http://bio.scu.edu.cn/SCGPred/.

  1. Multiple reference genomes and transcriptomes for Arabidopsis thaliana

    KAUST Repository

    Gan, Xiangchao

    2011-08-28

    Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions. ©2011 Macmillan Publishers Limited. All rights reserved.

  2. Evolution of closely linked gene pairs in vertebrate genomes

    NARCIS (Netherlands)

    Franck, E.; Hulsen, T.; Huynen, M.A.; Jong, de W.W.; Lunsen, N.H.; Madsen, O.

    2008-01-01

    The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of interge

  3. Missing genes in the annotation of prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Feng Wu-chun

    2010-03-01

    Background: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations are systematically missing small genes. Results: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
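
    The first step described above, enumerating all ORFs of at least 33 aa, can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: it assumes an ORF runs from an ATG start codon to the next in-frame stop codon, scans both strands, and uses hypothetical function names. The abstract does not state the exact ORF definition used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 33):
    """List ATG-to-stop ORFs with at least `min_aa` codons (stop excluded).

    Each hit is (strand, start, end, aa_length). Coordinates are 0-based,
    end-exclusive, and for the "-" strand they refer to the
    reverse-complemented sequence, not the original one.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    aa_length = (i - start) // 3
                    if aa_length >= min_aa:
                        orfs.append((strand, start, i + 3, aa_length))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 32 + "TAA"  # made-up sequence encoding exactly 33 aa
    print(find_orfs(toy))
```

    The cross-genome comparison and the taxonomic-diversity criteria that follow in the study are not covered by this sketch.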

  4. Genome-wide association study identifies multiple susceptibility loci for multiple myeloma

    DEFF Research Database (Denmark)

    Mitchell, Jonathan S; Li, Ni; Weinhold, Niels;

    2016-01-01

    Multiple myeloma (MM) is a plasma cell malignancy with a significant heritable basis. Genome-wide association studies have transformed our understanding of MM predisposition, but individual studies have had limited power to discover risk loci. Here we perform a meta-analysis of these GWAS, add a ...

  5. Novel genomic rearrangements mediated by multiple genetic elements in Streptococcus pyogenes M23ND confer potential for evolutionary persistence.

    Science.gov (United States)

    Bao, Yun-Juan; Liang, Zhong; Mayfield, Jeffrey A; McShan, William M; Lee, Shaun W; Ploplis, Victoria A; Castellino, Francis J

    2016-08-01

    Symmetric genomic rearrangements around replication axes in genomes are commonly observed in prokaryotic genomes, including Group A Streptococcus (GAS). However, asymmetric rearrangements are rare. Our previous studies showed that the hypervirulent invasive GAS strain, M23ND, containing an inactivated transcriptional regulator system, covRS, exhibits unique extensive asymmetric rearrangements, which reconstructed a genomic structure distinct from other GAS genomes. In the current investigation, we identified the rearrangement events and examined the genetic consequences and evolutionary implications underlying the rearrangements. By comparison with a close phylogenetic relative, M18-MGAS8232, we propose a molecular model wherein a series of asymmetric rearrangements have occurred in M23ND, involving translocations, inversions and integrations mediated by multiple factors, viz., rRNA-comX (factor for late competence), transposons and phage-encoded gene segments. Assessments of the cumulative gene orientations and GC skews reveal that the asymmetric genomic rearrangements did not affect the general genomic integrity of the organism. However, functional distributions reveal re-clustering of a broad set of CovRS-regulated actively transcribed genes, including virulence factors and metabolic genes, to the same leading strand, with high confidence (p-value ~10^-10). The re-clustering of the genes suggests a potential selection advantage for the spatial proximity to the transcription complexes, which may contain the global transcriptional regulator, CovRS, and other RNA polymerases. Their proximities allow for efficient transcription of the genes required for growth, virulence and persistence. A new paradigm of survival strategies of GAS strains is provided through multiple genomic rearrangements, while, at the same time, maintaining genomic integrity.
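
    The GC skew assessment mentioned above can be illustrated with a minimal sketch using the common definition (G - C) / (G + C) computed per window and summed along the sequence. The window size and function names are assumptions for illustration; the abstract does not state the parameters or the exact procedure the authors used.

```python
from typing import List


def gc_skew(window: str) -> float:
    """GC skew of one window: (G - C) / (G + C), or 0.0 if no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def cumulative_gc_skew(seq: str, window: int = 1000) -> List[float]:
    """Running sum of per-window GC skew along non-overlapping windows."""
    seq = seq.upper()
    running_total = 0.0
    values = []
    for start in range(0, len(seq), window):
        running_total += gc_skew(seq[start:start + window])
        values.append(running_total)
    return values


if __name__ == "__main__":
    # Toy sequence (made up): a G-rich half followed by a C-rich half.
    print(cumulative_gc_skew("GGGGCCCC", window=4))
```

    In a typical bacterial chromosome the cumulative curve tends to reach its extremes near the replication origin and terminus, which is why it is often used to check whether rearrangements have disturbed the overall strand organization.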

  6. Microfluidic gene arrays for rapid genomic profiling

    Science.gov (United States)

    West, Jay A.; Hukari, Kyle W.; Hux, Gary A.; Shepodd, Timothy J.

    2004-12-01

    Genomic analysis tools have recently become indispensable for the evaluation of gene expression in a variety of experimental protocols. Two of the main drawbacks of this technology are the labor- and time-intensive process for sample preparation and the relatively long times required for target/probe hybridization. In order to overcome these two technological barriers, we have developed a microfluidic chip to perform on-chip sample purification and labeling, integrated with a high-density gene array. Sample purification was performed using a porous polymer monolithic material functionalized with an oligo dT nucleotide sequence for the isolation of high-purity mRNA. These purified mRNAs can then be rapidly labeled using a fluorescent molecule which forms a selective covalent bond at the N7 position of guanine residues. These labeled mRNAs can then be released from the polymer monolith to allow for direct hybridization with oligonucleotide probes deposited in the microfluidic channel. To allow for rapid target/probe hybridization, high-density microarrays were printed in microchannels. The channels can accommodate array densities as high as 4000 probes. When oligonucleotide deposition is complete, these channels are sealed using a polymer film which forms a pressure-tight seal to allow sample reagent flow to the arrayed probes. This process will allow for real-time target-to-probe hybridization monitoring using a top-mounted CCD fiber bundle combination. Using this process we have been able to perform a multi-step sample preparation to labeled target/probe hybridization in less than 30 minutes. These results demonstrate the capability to perform rapid genomic screening on a high-density microfluidic microarray of oligonucleotides.

  7. Horizontal acquisition of multiple mitochondrial genes from a parasitic plant followed by gene conversion with host mitochondrial genes

    Directory of Open Access Journals (Sweden)

    Hao Weilong

    2010-12-01

    Background: Horizontal gene transfer (HGT) is relatively common in plant mitochondrial genomes but the mechanisms, extent and consequences of transfer remain largely unknown. Previous results indicate that parasitic plants are often involved as either transfer donors or recipients, suggesting that direct contact between parasite and host facilitates genetic transfer among plants. Results: In order to uncover the mechanistic details of plant-to-plant HGT, the extent and evolutionary fate of transfer was investigated between two groups: the parasitic genus Cuscuta and a small clade of Plantago species. A broad polymerase chain reaction (PCR) survey of mitochondrial genes revealed that at least three genes (atp1, atp6 and matR) were recently transferred from Cuscuta to Plantago. Quantitative PCR assays show that these three genes have a mitochondrial location in the one species line of Plantago examined. Patterns of sequence evolution suggest that these foreign genes degraded into pseudogenes shortly after transfer, and reverse transcription (RT)-PCR analyses demonstrate that none are detectably transcribed. Three cases of gene conversion were detected between native and foreign copies of the atp1 gene. The identical phylogenetic distribution of the three foreign genes within Plantago and the retention of cytidines at ancestral positions of RNA editing indicate that these genes were probably acquired via a single, DNA-mediated transfer event. However, samplings of multiple individuals from two of the three species in the recipient Plantago clade revealed complex and perplexing phylogenetic discrepancies and patterns of sequence divergence for all three of the foreign genes. Conclusions: This study reports the best evidence to date that multiple mitochondrial genes can be transferred via a single HGT event and that transfer occurred via a strictly DNA-level intermediate. The discovery of gene conversion between co-resident foreign and native

  8. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns

    Science.gov (United States)

    Jansen, Robert K.; Cai, Zhengqiu; Raubeson, Linda A.; Daniell, Henry; dePamphilis, Claude W.; Leebens-Mack, James; Müller, Kai F.; Guisinger-Bellian, Mary; Haberle, Rosemarie C.; Hansen, Anne K.; Chumley, Timothy W.; Lee, Seung-Bum; Peery, Rhiannon; McNeal, Joel R.; Kuehl, Jennifer V.; Boore, Jeffrey L.

    2007-01-01

    Angiosperms are the largest and most successful clade of land plants with >250,000 species distributed in nearly every terrestrial habitat. Many phylogenetic studies have been based on DNA sequences of one to several genes, but, despite decades of intensive efforts, relationships among early diverging lineages and several of the major clades remain either incompletely resolved or weakly supported. We performed phylogenetic analyses of 81 plastid genes in 64 sequenced genomes, including 13 new genomes, to estimate relationships among the major angiosperm clades, and the resulting trees are used to examine the evolution of gene and intron content. Phylogenetic trees from multiple methods, including model-based approaches, provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales. The plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids. Resolution of relationships among the major clades of angiosperms provides the necessary framework for addressing numerous evolutionary questions regarding the rapid diversification of angiosperms. Gene and intron content are highly conserved among the early diverging angiosperms and basal eudicots, but 62 independent gene and intron losses are limited to the more derived monocot and eudicot clades. Moreover, a lineage-specific correlation was detected between rates of nucleotide substitutions, indels, and genomic rearrangements. PMID:18048330

  9. Comparative genomics of Geobacter chemotaxis genes reveals diverse signaling function

    Directory of Open Access Journals (Sweden)

    Antommattei Frances M

    2008-10-01

    Background: Geobacter species are δ-Proteobacteria and are often the predominant species in a variety of sedimentary environments where Fe(III) reduction is important. Their ability to remediate contaminated environments and produce electricity makes them attractive for further study. Cell motility, biofilm formation, and type IV pili all appear important for the growth of Geobacter in changing environments and for electricity production. Recent studies in other bacteria have demonstrated that signaling pathways homologous to the paradigm established for Escherichia coli chemotaxis can regulate type IV pili-dependent motility, the synthesis of flagella and type IV pili, the production of extracellular matrix material, and biofilm formation. The classification of these pathways by comparative genomics improves the ability to understand how Geobacter thrives in natural environments and to improve their use in microbial fuel cells. Results: The genomes of G. sulfurreducens, G. metallireducens, and G. uraniireducens contain multiple (~70) homologs of chemotaxis genes arranged in several major clusters (six, seven, and seven, respectively). Unlike the single gene cluster of E. coli, the Geobacter clusters are not all located near the flagellar genes. The probable functions of some Geobacter clusters are assignable by homology to known pathways; others appear to be unique to the Geobacter sp. and contain genes of unknown function. We identified large numbers of methyl-accepting chemotaxis protein (MCP) homologs that have diverse sensing domain architectures and generate a potential for sensing a great variety of environmental signals. We discuss mechanisms for class-specific segregation of the MCPs in the cell membrane, which serve to maintain pathway specificity and diminish crosstalk. Finally, the regulation of gene expression in Geobacter differs from E. coli. The sequences of predicted promoter elements suggest that the alternative sigma factors

  10. Identification and Categorization of Horizontally Transferred Genes in Prokaryotic Genomes

    Institute of Scientific and Technical Information of China (English)

    Shuo-Yong SHI; Xiao-Hui CAI; Da-fu DING

    2005-01-01

    Horizontal gene transfer (HGT), a process through which genomes acquire genetic materials from distantly related organisms, is believed to be one of the major forces in prokaryotic genome evolution. However, systematic investigation is still scarce to clarify two basic issues about HGT: (1) what types of genes are transferred; and (2) what influence HGT events have on the organization and evolution of biological pathways. Genome-scale investigations of these two issues will advance the systematic understanding of HGT in the context of prokaryotic genome evolution. Having investigated 82 genomes, we constructed an HGT database across broad evolutionary timescales. We identified four function categories containing a high proportion of horizontally transferred genes: cell envelope, energy metabolism, regulatory functions, and transport/binding proteins. Such a biased function distribution indicates that HGT is not completely random; instead, it is under high selective pressure, required by function restraints in organisms. Furthermore, we mapped the transferred genes onto the connectivity structure map of organism-specific pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Our results suggest that recruitment of transferred genes into pathways is also selectively constrained because of the tuned interaction between original pathway members. Pathway organization structures are still well conserved through evolution even with the recruitment of horizontally transferred genes. Interestingly, in pathways whose organization was significantly affected by HGT events, the operon-like arrangement of transferred genes was found to be prevalent. Such results suggest that the operon plays an essential and directional role in the integration of alien genes into pathways.

  11. Multiple recent horizontal transfers of a large genomic region in cheese making fungi.

    Science.gov (United States)

    Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

    2014-01-01

    While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti--called Wallaby--present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes.

  12. Moving towards system genetics through multiple trait analysis in genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Daniel Shriner

    2012-01-01

    Association studies are a staple of genotype-phenotype mapping studies, whether they are based on single markers, haplotypes, candidate genes, genome-wide genotypes, or whole genome sequences. Although genetic epidemiological studies typically contain data collected on multiple traits which themselves are often correlated, most analyses have been performed on single traits. Here, I review several methods that have been developed to perform multiple trait analysis. These methods range from traditional multivariate models for systems of equations to recently developed graphical approaches based on network theory. The application of network theory to genetics is termed systems genetics and has the potential to address long-standing questions in genetics about complex processes such as coordinate regulation, homeostasis, and pleiotropy.

  13. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database ( http://palmxplore.mpob.gov.my ), will provide important resources for studies on the genomes of oil palm and related crops. This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.

  14. A GeneTrek analysis of the maize genome.

    Science.gov (United States)

    Liu, Renyi; Vitte, Clémentine; Ma, Jianxin; Mahama, A Assibi; Dhliwayo, Thanda; Lee, Michael; Bennetzen, Jeffrey L

    2007-07-10

    Analysis of the sequences of 74 randomly selected BACs demonstrated that the maize nuclear genome contains approximately 37,000 candidate genes with homologues in other plant species. An additional approximately 5,500 predicted genes are severely truncated and probably pseudogenes. The distribution of genes is uneven, with approximately 30% of BACs containing no genes. BAC gene density varies from 0 to 7.9 per 100 kb, whereas most gene islands contain only one gene. The average number of genes per gene island is 1.7. Only 72% of these genes show collinearity with the rice genome. Particular LTR retrotransposon families (e.g., Gyma) are enriched on gene-free BACs, most of which do not come from pericentromeres or other large heterochromatic regions. Gene-containing BACs are relatively enriched in different families of LTR retrotransposons (e.g., Ji). Two major bursts of LTR retrotransposon activity in the last 2 million years are responsible for the large size of the maize genome, but only the more recent of these is well represented in gene-containing BACs, suggesting that LTR retrotransposons are more efficiently removed in these domains. The results demonstrate that sample sequencing and careful annotation of a few randomly selected BACs can provide a robust description of a complex plant genome.

  15. A Method for Identification of Selenoprotein Genes in Archaeal Genomes

    Institute of Scientific and Technical Information of China (English)

    Mingfeng Li; Yanzhao Huang; Yi Xiao

    2009-01-01

    The genetic codon UGA has a dual function: serving as a terminator and encoding selenocysteine. However, most popular gene annotation programs only take it as a stop signal, resulting in misannotation or completely missing selenoprotein genes. We developed a computational method named Asec-Prediction that is specific for the prediction of archaeal selenoprotein genes. To evaluate its effectiveness, we first applied it to 14 archaeal genomes with previously known selenoprotein genes, and Asec-Prediction identified all reported selenoprotein genes without redundant results. When we applied it to 12 archaeal genomes that had not been researched for selenoprotein genes, Asec-Prediction detected a novel selenoprotein gene in Methanosarcina acetivorans. Further evidence was also collected to support that the predicted gene should be a real selenoprotein gene. The result shows that Asec-Prediction is effective for the prediction of archaeal selenoprotein genes.

  16. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    Energy Technology Data Exchange (ETDEWEB)

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oak Ridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et al. 2004, Meilan et al. 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root-predominant promoters, a set of three promoters was tested for tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et al. 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from the Arabidopsis root-specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post transcriptional silencing in the model Arabidopsis system; these include the floral

  17. Functional and evolutionary correlates of gene constellations in the Drosophila melanogaster genome that deviate from the stereotypical gene architecture

    Directory of Open Access Journals (Sweden)

    Kohn Michael H

    2010-05-01

    Background: The biological dimensions of genes are manifold. These include genomic properties (e.g., X/autosomal linkage, recombination) and functional properties (e.g., expression level, tissue specificity). Multiple properties, each generally of subtle influence individually, may affect the evolution of genes or merely be (auto-)correlates. Results of multidimensional analyses may reveal the relative importance of these properties on the evolution of genes, and therefore help evaluate whether these properties should be considered during analyses. While numerous properties are now considered during studies, most work still assumes the stereotypical solitary gene as commonly depicted in textbooks. Here, we investigate the Drosophila melanogaster genome to determine whether deviations from the stereotypical gene architecture correlate with other properties of genes. Results: Deviations from the stereotypical gene architecture were classified as the following gene constellations: Overlapping genes were defined as those that overlap in the 5-prime, exonic, or intronic regions. Chromatin co-clustering genes were defined as genes that co-clustered within 20 kb of transcriptional territories. If this scheme is applied, the stereotypical gene emerges as a rare occurrence (7.5%; slightly varied schemes yielded between ~1% and 50%). Moreover, when following our scheme, paired-overlapping genes and chromatin co-clustering genes accounted for 50.1 and 42.4% of the genes analyzed, respectively. Gene constellation was a correlate of a number of functional and evolutionary properties of genes, but its statistical effect was ~1-2 orders of magnitude lower than the effects of recombination, chromosome linkage and protein function. Analysis of datasets on male reproductive proteins showed these were biased in their representation of gene constellations and evolutionary rate Ka/Ks estimates, but these biases did not overwhelm the biologically meaningful

  18. Animal models for human contiguous gene syndromes and other genomic disorders

    Directory of Open Access Journals (Sweden)

    Katherina Walz

    2004-01-01

    Genomic disorders refer to a group of syndromes caused by DNA rearrangements, such as deletions and duplications, which result in an alteration of normal gene dosage. The chromosomal rearrangements are usually relatively small and often difficult to detect cytogenetically. In a subset of such conditions the rearrangements comprise multiple unrelated contiguous genes that are physically linked and thus have been referred to as contiguous gene syndromes (CGS). In general, each syndrome presents a complex clinical phenotype that has generally been attributed to dosage-sensitive gene(s) present in the responsible chromosomal interval. A common mechanism for CGS resulting from interstitial deletion/duplication has recently been elucidated. The DNA rearrangements result from nonallelic homologous recombination (NAHR) utilizing flanking low-copy repeats (LCRs) as recombination substrates. The resulting rearrangements often involve the same genomic region, a common deletion or duplication, making it difficult to assign a specific phenotype or endophenotype to a single responsible gene. The human and mouse genome sequencing projects, in conjunction with the ability to engineer mouse chromosome rearrangements, have enabled the production of mouse models for CGS and genomic disorders. In this review we present an overview of different techniques utilized to generate mouse models for selected genomic disorders. These models foster novel insights into the specific genes that convey the phenotype by dosage and/or position effects and provide opportunities to explore therapeutic options.

  19. Genetics and Genomics of Single-Gene Cardiovascular Diseases: Common Hereditary Cardiomyopathies as Prototypes of Single-Gene Disorders.

    Science.gov (United States)

    Marian, Ali J; van Rooij, Eva; Roberts, Robert

    2016-12-27

    This is the first of 2 review papers on genetics and genomics appearing as part of the series on "omics." Genomics pertains to all components of an organism's genes, whereas genetics involves analysis of a specific gene or genes in the context of heredity. The paper provides introductory comments, describes the basis of human genetic diversity, and addresses the phenotypic consequences of genetic variants. Rare variants with large effect sizes are responsible for single-gene disorders, whereas complex polygenic diseases are typically due to multiple genetic variants, each exerting a modest effect size. To illustrate the clinical implications of genetic variants with large effect sizes, 3 common forms of hereditary cardiomyopathies are discussed as prototypic examples of single-gene disorders, including their genetics, clinical manifestations, pathogenesis, and treatment. The genetic basis of complex traits is discussed in a separate paper. Copyright © 2016 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.

  20. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    Science.gov (United States)

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  1. Genome-editing Technologies for Gene and Cell Therapy

    Science.gov (United States)

    Maeder, Morgan L; Gersbach, Charles A

    2016-01-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  2. Genome-editing Technologies for Gene and Cell Therapy.

    Science.gov (United States)

    Maeder, Morgan L; Gersbach, Charles A

    2016-03-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed.

  3. A unified gene catalog for the laboratory mouse reference genome.

    Science.gov (United States)

    Zhu, Y; Richardson, J E; Hale, P; Baldarelli, R M; Reed, D J; Recla, J M; Sinclair, R; Reddy, T B K; Bult, C J

    2015-08-01

    We report here a semi-automated process by which mouse genome feature predictions and curated annotations (i.e., genes, pseudogenes, functional RNAs, etc.) from Ensembl, NCBI and Vertebrate Genome Annotation database (Vega) are reconciled with the genome features in the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org) into a comprehensive and non-redundant catalog. Our gene unification method employs an algorithm (fjoin--feature join) for efficient detection of genome coordinate overlaps among features represented in two annotation data sets. Following the analysis with fjoin, genome features are binned into six possible categories (1:1, 1:0, 0:1, 1:n, n:1, n:m) based on coordinate overlaps. These categories are subsequently prioritized for assessment of annotation equivalencies and differences. The version of the unified catalog reported here contains more than 59,000 entries, including 22,599 protein-coding genes, 12,455 pseudogenes, and 24,007 other feature types (e.g., microRNAs, lincRNAs, etc.). More than 23,000 of the entries in the MGI gene catalog have equivalent gene models in the annotation files obtained from NCBI, Vega, and Ensembl. 12,719 of the features are unique to NCBI relative to Ensembl/Vega; 11,957 are unique to Ensembl/Vega relative to NCBI, and 3095 are unique to MGI. More than 4000 genome features fall into categories that require manual inspection to resolve structural differences in the gene models from different annotation sources. Using the MGI unified gene catalog, researchers can easily generate a comprehensive report of mouse genome features from a single source and compare the details of gene and transcript structure using MGI's mouse genome browser.
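
    The six overlap categories named in this abstract can be illustrated with a small sketch. This is not the fjoin algorithm itself: it uses a naive all-against-all comparison, and the function names (`overlaps`, `label`, `bin_features`) and the `(chrom, start, end)` tuple layout are assumptions made only for illustration. Features from two annotation sets are linked whenever their coordinates overlap, and each connected group is labelled by how many features it holds from each set.

```python
from collections import defaultdict

def overlaps(a, b):
    """True if two (chrom, start, end) features share at least one base (closed intervals)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def label(n_a, n_b):
    """Turn the per-set feature counts of one overlap group into a category string."""
    left = "n" if n_a > 1 else str(n_a)
    if n_b > 1:
        right = "m" if n_a > 1 else "n"
    else:
        right = str(n_b)
    return f"{left}:{right}"

def bin_features(set_a, set_b):
    """Count overlap groups per category (1:1, 1:0, 0:1, 1:n, n:1, n:m)."""
    nodes = [("A", i) for i in range(len(set_a))] + [("B", j) for j in range(len(set_b))]
    parent = {node: node for node in nodes}  # union-find forest over all features

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Naive O(|A| * |B|) comparison; a real tool would sort and sweep instead.
    for i, a in enumerate(set_a):
        for j, b in enumerate(set_b):
            if overlaps(a, b):
                parent[find(("A", i))] = find(("B", j))

    sizes = defaultdict(lambda: [0, 0])  # root -> [count from A, count from B]
    for node in nodes:
        sizes[find(node)][0 if node[0] == "A" else 1] += 1

    counts = defaultdict(int)
    for n_a, n_b in sizes.values():
        counts[label(n_a, n_b)] += 1
    return dict(counts)
```

    In this sketch a feature with no partner in the other set forms its own group (1:0 or 0:1), which matches the idea of features that are unique to one annotation source.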

  4. Evolution of genes and genomes on the Drosophila phylogeny

    OpenAIRE

    Clark, Andrew G.; Pachter, Lior

    2007-01-01

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illumin...

  5. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    Full Text Available This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D, and heat, H) for the highly heritable traits, days to heading (DTH) and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm
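
    The TRN20-TST80 random cross-validation described here can be sketched as follows. The abstract does not say how prediction accuracy was computed; the sketch assumes the common convention of a Pearson correlation between predicted and observed values in the testing set. The function names, the seed parameter, and the accession identifiers are hypothetical.

```python
import math
import random

def split_trn20_tst80(ids, seed=0):
    """Randomly assign 20% of accessions to training and the remaining 80% to testing."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # seeded so the partition is reproducible
    cut = round(0.2 * len(ids))
    return ids[:cut], ids[cut:]

def pearson(x, y):
    """Pearson correlation, used here as an assumed measure of prediction accuracy."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

    A model would be fitted on the training accessions only, and `pearson` applied to its predictions for the testing accessions; repeating the split with different seeds gives a distribution of accuracies.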

  6. Comprehensive genomic characterization defines human glioblastoma genes and core pathways

    NARCIS (Netherlands)

    Chin, L.; Meyerson, M.; Aldape, K.; Bigner, D.; Mikkelsen, T.; VandenBerg, S.; Kahn, A.; Penny, R.; Gerhard, D. S.; Getz, G.; Brennan, C.; Taylor, B. S.; Winckler, W.; Park, P.; Ladanyi, M.; Hoadley, K. A.; Verhaak, R. G. W.; Hayes, D. N.; Spellman, Paul T.; Absher, D.; Weir, B. A.; Ding, L.; Wheeler, D.; Lawrence, M. S.; Cibulskis, K.; Mardis, E.; Zhang, Jinghui; Wilson, R. K.; Donehower, L.; Wheeler, D. A.; Purdom, E.; Wallis, J.; Laird, P. W.; Herman, J. G.; Schuebel, K. E.; Weisenberger, D. J.; Baylin, S. B.; Schultz, N.; Yao, Jun; Wiedemeyer, R.; Weinstein, J.; Sander, C.; Gibbs, R. A.; Gray, J.; Kucherlapati, R.; Lander, E. S.; Myers, R. M.; Perou, C. M.; McLendon, Roger; Friedman, Allan; Van Meir, Erwin G; Brat, Daniel J; Mastrogianakis, Gena Marie; Olson, Jeffrey J; Lehman, Norman; Yung, W. K. Alfred; Bogler, Oliver; Berger, Mitchel; Prados, Michael; Muzny, Donna; Morgan, Margaret; Scherer, Steve; Sabo, Aniko; Nazareth, Lynn; Lewis, Lora; Hall, Otis; Zhu, Yiming; Ren, Yanru; Alvi, Omar; Yao, Jiqiang; Hawes, Alicia; Jhangiani, Shalini; Fowler, Gerald; San Lucas, Anthony; Kovar, Christie; Cree, Andrew; Dinh, Huyen; Santibanez, Jireh; Joshi, Vandita; Gonzalez-Garay, Manuel L.; Miller, Christopher A.; Milosavljevic, Aleksandar; Sougnez, Carrie; Fennell, Tim; Mahan, Scott; Wilkinson, Jane; Ziaugra, Liuda; Onofrio, Robert; Bloom, Toby; Nicol, Rob; Ardlie, Kristin; Baldwin, Jennifer; Gabriel, Stacey; Fulton, Robert S.; McLellan, Michael D.; Larson, David E.; Shi, Xiaoqi; Abbott, Rachel; Fulton, Lucinda; Chen, Ken; Koboldt, Daniel C.; Wendl, Michael C.; Meyer, Rick; Tang, Yuzhu; Lin, Ling; Osborne, John R.; Dunford-Shore, Brian H.; Miner, Tracie L.; Delehaunty, Kim; Markovic, Chris; Swift, Gary; Courtney, William; Pohl, Craig; Abbott, Scott; Hawkins, Amy; Leong, Shin; Haipek, Carrie; Schmidt, Heather; Wiechert, Maddy; Vickery, Tammi; Scott, Sacha; Dooling, David J.; Chinwalla, Asif; Weinstock, George M.; O'Kelly, Michael; Robinson, Jim; Alexe, Gabriele; 
Beroukhim, Rameen; Carter, Scott; Chiang, Derek; Gould, Josh; Gupta, Supriya; Korn, Josh; Mermel, Craig; Mesirov, Jill; Monti, Stefano; Nguyen, Huy; Parkin, Melissa; Reich, Michael; Stransky, Nicolas; Garraway, Levi; Golub, Todd; Protopopov, Alexei; Perna, Ilana; Aronson, Sandy; Sathiamoorthy, Narayan; Ren, Georgia; Kim, Hyunsoo; Kong, Sek Won; Xiao, Yonghong; Kohane, Isaac S.; Seidman, Jon; Cope, Leslie; Pan, Fei; Van Den Berg, David; Van Neste, Leander; Yi, Joo Mi; Li, Jun Z.; Southwick, Audrey; Brady, Shannon; Aggarwal, Amita; Chung, Tisha; Sherlock, Gavin; Brooks, James D.; Jakkula, Lakshmi R.; Lapuk, Anna V.; Marr, Henry; Dorton, Shannon; Choi, Yoon Gi; Han, Ju; Ray, Amrita; Wang, Victoria; Durinck, Steffen; Robinson, Mark; Wang, Nicholas J.; Vranizan, Karen; Peng, Vivian; Van Name, Eric; Fontenay, Gerald V.; Ngai, John; Conboy, John G.; Parvin, Bahram; Feiler, Heidi S.; Speed, Terence P.; Socci, Nicholas D.; Olshen, Adam; Lash, Alex; Reva, Boris; Antipin, Yevgeniy; Stukalov, Alexey; Gross, Benjamin; Cerami, Ethan; Wang, Wei Qing; Qin, Li-Xuan; Seshan, Venkatraman E.; Villafania, Liliana; Cavatore, Magali; Borsu, Laetitia; Viale, Agnes; Gerald, William; Topal, Michael D.; Qi, Yuan; Balu, Sai; Shi, Yan; Wu, George; Bittner, Michael; Shelton, Troy; Lenkiewicz, Elizabeth; Morris, Scott; Beasley, Debbie; Sanders, Sheri; Sfeir, Robert; Chen, Jessica; Nassau, David; Feng, Larry; Hickey, Erin; Schaefer, Carl; Madhavan, Subha; Buetow, Ken; Barker, Anna; Vockley, Joseph; Compton, Carolyn; Vaught, Jim; Fielding, Peter; Collins, Francis; Good, Peter; Guyer, Mark; Ozenberger, Brad; Peterson, Jane; Thomson, Elizabeth

    2008-01-01

    Human cancer cells typically harbour multiple chromosomal aberrations, nucleotide substitutions and epigenetic modifications that drive malignant transformation. The Cancer Genome Atlas (TCGA) pilot project aims to assess the value of large-scale multi-dimensional analysis of these molecular

  8. Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

    Science.gov (United States)

    Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

    2014-11-25

    The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. To know whether there is a set of genes not only conserved in position

  9. Cancer driver gene discovery through an integrative genomics approach in a non-parametric Bayesian framework.

    Science.gov (United States)

    Yang, Hai; Wei, Qiang; Zhong, Xue; Yang, Hushan; Li, Bingshan

    2017-02-15

    A comprehensive catalogue of genes that drive tumor initiation and progression in cancer is key to advancing diagnostics, therapeutics and treatment. Given the complexity of cancer, the catalogue is still far from complete. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple-omics in tumors. In this study, we aim to leverage complementary information encoded in each of the omics data to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA), and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling to identify driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs the background distribution that is used to assess and calibrate the confidence of driver genes identified through multi-dimensional genomic data. We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers (e.g., P < 3.40 × 10^-36 for breast cancer). We are particularly interested in novel genes and observed multiple lines of supporting evidence. Using systematic evaluation from multiple independent aspects, we identified 45 candidate driver genes that were not previously known across these 4 cancer types. The finding has the important implication that integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research. The C++ source code is freely available at https://medschool.vanderbilt.edu/cgg/. Contact: hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu. Supplementary data are available at Bioinformatics online.

  10. Genes but not genomes reveal bacterial domestication of Lactococcus lactis.

    Directory of Open Access Journals (Sweden)

    Delphine Passerini

    Full Text Available BACKGROUND: The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). METHODOLOGY/PRINCIPAL FINDINGS: The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. CONCLUSION/SIGNIFICANCE: The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between "environmental" strains, the main contributors to the genetic diversity within the subspecies, and "domesticated" strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the "domesticated" strains essentially arose through substantial genomic flux within the dispensable

  11. Genome Variability and Gene Content in Chordopoxviruses: Dependence on Microsatellites

    Science.gov (United States)

    Hatcher, Eneida L.; Wang, Chunlin; Lefkowitz, Elliot J.

    2015-01-01

    To investigate gene loss in poxviruses belonging to the Chordopoxvirinae subfamily, we assessed the gene content of representative members of the subfamily, and determined whether individual genes present in each genome were intact, truncated, or fragmented. When nonintact genes were identified, the early stop mutations (ESMs) leading to gene truncation or fragmentation were analyzed. Of all the ESMs present in these poxvirus genomes, over 65% co-localized with microsatellites—simple sequence nucleotide repeats. On average, microsatellites comprise 24% of the nucleotide sequence of these poxvirus genomes. These simple repeats have been shown to exhibit high rates of variation, and represent a target for poxvirus protein variation, gene truncation, and reductive evolution. PMID:25912716

  12. Genome engineering and gene expression control for bacterial strain development.

    Science.gov (United States)

    Song, Chan Woo; Lee, Joungmin; Lee, Sang Yup

    2015-01-01

    In recent years, a number of techniques and tools have been developed for genome engineering and gene expression control to achieve desired phenotypes of various bacteria. Here we review and discuss the recent advances in bacterial genome manipulation and gene expression control techniques, and their actual uses with accompanying examples. Genome engineering has been commonly performed based on homologous recombination. During such genome manipulation, the counterselection systems employing SacB or nucleases have mainly been used for the efficient selection of desired engineered strains. The recombineering technology enables simple and more rapid manipulation of the bacterial genome. The group II intron-mediated genome engineering technology is another option for some bacteria that are difficult to be engineered by homologous recombination. Due to the increasing demands on high-throughput screening of bacterial strains having the desired phenotypes, several multiplex genome engineering techniques have recently been developed and validated in some bacteria. Another approach to achieve desired bacterial phenotypes is the repression of target gene expression without the modification of genome sequences. This can be performed by expressing antisense RNA, small regulatory RNA, or CRISPR RNA to repress target gene expression at the transcriptional or translational level. All of these techniques allow efficient and rapid development and screening of bacterial strains having desired phenotypes, and more advanced techniques are expected to be seen.

  13. Maximum likelihood for genome phylogeny on gene content.

    Science.gov (United States)

    Zhang, Hongmei; Gu, Xun

    2004-01-01

    With the rapid growth of entire genome data, reconstructing the phylogenetic relationship among different genomes has become a hot topic in comparative genomics. Maximum likelihood approach is one of the various approaches, and has been very successful. However, there is no reported study for any applications in the genome tree-making mainly due to the lack of an analytical form of a probability model and/or the complicated calculation burden. In this paper we studied the mathematical structure of the stochastic model of genome evolution, and then developed a simplified likelihood function for observing a specific phylogenetic pattern under four genome situation using gene content information. We use the maximum likelihood approach to identify phylogenetic trees. Simulation results indicate that the proposed method works well and can identify trees with a high correction rate. Real data application provides satisfied results. The approach developed in this paper can serve as the basis for reconstructing phylogenies of more than four genomes.

  14. Recent Achievement in Gene Cloning and Functional Genomics in Soybean

    Directory of Open Access Journals (Sweden)

    Zhengjun Xia

    2013-01-01

    Full Text Available Soybean is a model plant for photoperiodism as well as for symbiotic nitrogen fixation. However, a rather low efficiency in soybean transformation hampers functional analysis of genes isolated from soybean. In comparison, rapid development and progress in flowering time and photoperiodic response have been achieved in Arabidopsis and rice. As the soybean genomic information has been released since 2008, gene cloning and functional genomic studies have been revived as indicated by successfully characterizing genes involved in maturity and nematode resistance. Here, we review some major achievements in the cloning of some important genes and some specific features at genetic or genomic levels revealed by the analysis of functional genomics of soybean.

  15. Gene coexpression as Hebbian learning in prokaryotic genomes.

    Science.gov (United States)

    Vey, Gregory

    2013-12-01

    Biological interaction networks represent a powerful tool for characterizing intracellular functional relationships, such as transcriptional regulation and protein interactions. Although artificial neural networks are routinely employed for a broad range of applications across computational biology, their underlying connectionist basis has not been extensively applied to modeling biological interaction networks. In particular, the Hopfield network offers nonlinear dynamics that represent the minimization of a system energy function through temporally distinct rewiring events. Here, a scaled energy minimization model is presented to test the feasibility of deriving a composite biological interaction network from multiple constituent data sets using the Hebbian learning principle. The performance of the scaled energy minimization model is compared against the standard Hopfield model using simulated data. Several networks are also derived from real data, compared to one another, and then combined to produce an aggregate network. The utility and limitations of the proposed model are discussed, along with possible implications for a genomic learning analogy where the fundamental Hebbian postulate is rendered into its genomic equivalent: Genes that function together junction together.
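
    The Hebbian principle and the energy function mentioned in this abstract can be illustrated with the standard Hopfield model. This is only the textbook baseline the abstract compares against, not the paper's scaled energy minimization model; the function names and the encoding of each unit as +1 or -1 (for example, a gene being co-active or not) are assumptions for illustration. Weights grow between units that are active together, and asynchronous updates never increase the energy E = -1/2 Σ w_ij s_i s_j.

```python
def hebbian_weights(patterns):
    """Hebbian rule: strengthen w[i][j] when units i and j take the same sign in a pattern."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def energy(w, s):
    """Hopfield energy E = -1/2 * sum_ij w[i][j] * s[i] * s[j]."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(w, s, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its weighted input."""
    s = list(s)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1
    return s
```

    For example, after storing the single pattern `[1, 1, 1, -1, -1, -1]`, calling `recall` on a copy with one flipped unit restores the stored pattern and lowers the energy.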

  16. Genome organization and long-range regulation of gene expression by enhancers.

    Science.gov (United States)

    Smallwood, Andrea; Ren, Bing

    2013-06-01

    It is now well accepted that cell-type specific gene regulation is under the purview of enhancers. Great strides have been made recently to characterize and identify enhancers both genetically and epigenetically for multiple cell types and species, but efforts have just begun to link enhancers to their target promoters. Mapping these interactions and understanding how the 3D landscape of the genome constrains such interactions is fundamental to our understanding of mammalian gene regulation. Here, we review recent progress in mapping long-range regulatory interactions in mammalian genomes, focusing on transcriptional enhancers and chromatin organization principles. Copyright © 2013. Published by Elsevier Ltd.

  17. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  18. Genomic location and characterisation of MIC genes in cattle.

    Science.gov (United States)

    Birch, James; De Juan Sanjuan, Cristina; Guzman, Efrain; Ellis, Shirley A

    2008-08-01

    Major histocompatibility complex (MHC) class I chain-related (MIC) genes have been previously identified and characterised in humans. They encode polymorphic class I-like molecules that are stress-inducible, and constitute one of the ligands of the activating natural killer cell receptor NKG2D. We have identified three MIC genes within the cattle genome, located close to three non-classical MHC class I genes. The genomic position relative to other genes is very similar to the arrangement reported in the pig MHC region. Analysis of MIC cDNA sequences derived from a range of cattle cell lines suggests there may be four MIC genes in total. We have investigated the presence of the genes in distinct and well-defined MHC haplotypes, and show that one gene is consistently present, while configuration of the other three genes appears variable.

  19. De novo genome assembly of the economically important weed horseweed using integrated data from multiple sequencing platforms.

    Science.gov (United States)

    Peng, Yanhui; Lai, Zhao; Lane, Thomas; Nageswara-Rao, Madhugiri; Okada, Miki; Jasieniuk, Marie; O'Geen, Henriette; Kim, Ryan W; Sammons, R Douglas; Rieseberg, Loren H; Stewart, C Neal

    2014-11-01

    Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insertion sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds with 50% of the assembly = 33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeat and single-nucleotide polymorphisms were surveyed. Genomic patterns were detected that associated with glyphosate-resistant or -susceptible biotypes. The draft genome will be useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed.

  20. Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

    Directory of Open Access Journals (Sweden)

    Yubo Hou

    The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') versus log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200 + 22.678X'), whereas non-eukaryotes a linear model, Y' = 0.045 + 0.977X', both with high significance (p < 0.001, R2 > 0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3x10^6 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245x10^6 kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

  1. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in such outcomes as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed for the pan-genome and core genome by identifying the composition and function of the genomes. The results showed that the total of 132,143 protein-coding genes in these genomes was divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44 % of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35 % of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the process of growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of intracellular survival in Brucella spp.

  2. A rare variant of the TYK2 gene is confirmed to be associated with multiple sclerosis

    DEFF Research Database (Denmark)

    Mero, Inger-Lise; Lorentzen, Aslaug R; Ban, Maria;

    2010-01-01

    A rare functional variant within the TYK2 gene (rs34536443) has been reported as protective in multiple sclerosis (MS) in recent studies. However, because of the low frequency of the minor allele (minor allele frequency=0.04), genome-wide significant association has been hard to establish. We...

  3. Statistical applications in nutrigenomics : analyzing multiple genes and proteins in relation to complex diseases in humans

    NARCIS (Netherlands)

    Heidema, A.G.

    2008-01-01

    Background The recent advances in technology provide the possibility to obtain large genomic datasets that contain information on large numbers of variables, while the sample sizes are moderate to small. This has led to statistical challenges in the analysis of multiple genes and proteins in relat

  4. Genome-wide gene-gene interaction analysis for next-generation sequencing.

    Science.gov (United States)

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-03-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of its prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the need for a paradigm change in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacting genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacting genes after applying Bonferroni correction in the EOMI study.
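    The multiple-testing argument in this abstract can be made concrete with a small sketch. It compares the number of pairwise tests at the SNP level with the number at the gene level and applies a Bonferroni threshold to gene-pair p-values. The SNP and gene counts, the gene labels and the p-values are invented for illustration, and the functional logistic regression statistic itself is not reproduced here.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def significant_pairs(pvalues, alpha=0.05):
    """Return the gene pairs whose p-value passes the Bonferroni threshold.

    pvalues maps (gene_a, gene_b) -> p-value, one entry per test performed.
    """
    threshold = bonferroni_threshold(alpha, len(pvalues))
    return sorted(pair for pair, p in pvalues.items() if p < threshold)


# Hypothetical sizes: testing gene pairs instead of SNP pairs
# shrinks the number of tests dramatically.
snp_tests = comb(1000, 2)   # all pairs among 1000 SNPs
gene_tests = comb(50, 2)    # all pairs among 50 genes

# Hypothetical p-values for the 6 pairs formed by 4 genes.
pvalues = {
    ("G1", "G2"): 0.001,
    ("G1", "G3"): 0.02,
    ("G1", "G4"): 0.5,
    ("G2", "G3"): 0.004,
    ("G2", "G4"): 0.3,
    ("G3", "G4"): 0.0005,
}
hits = significant_pairs(pvalues)
print(snp_tests, gene_tests, hits)
```

    With six tests the per-test threshold is 0.05 / 6, so the pair with p = 0.02 would pass an uncorrected 0.05 cut-off but fails after correction.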

  5. Genomic analysis reveals extensive gene duplication within the bovine TRB locus

    Directory of Open Access Journals (Sweden)

    Law Andy

    2009-04-01

    Background Diverse TR and IG repertoires are generated by V(D)J somatic recombination. Genomic studies have been pivotal in cataloguing the V, D, J and C genes present in the various TR/IG loci and describing how duplication events have expanded the number of these genes. Such studies have also provided insights into the evolution of these loci and the complex mechanisms that regulate TR/IG expression. In this study we analyze the sequence of the third bovine genome assembly to characterize the germline repertoire of bovine TRB genes and compare the organization, evolution and regulatory structure of the bovine TRB locus with that of humans and mice. Results The TRB locus in the third bovine genome assembly is distributed over 5 scaffolds, extending to ~730 Kb. The available sequence contains 134 TRBV genes, assigned to 24 subgroups, and 3 clusters of DJC genes, each comprising a single TRBD gene, 5–7 TRBJ genes and a single TRBC gene. Seventy-nine of the TRBV genes are predicted to be functional. Comparison with the human and murine TRB loci shows that the gene order, as well as the sequences of non-coding elements that regulate TRB expression, are highly conserved in the bovine. Dot-plot analyses demonstrate that expansion of the genomic TRBV repertoire has occurred via a complex and extensive series of duplications, predominantly involving DNA blocks containing multiple genes. These duplication events have resulted in massive expansion of several TRBV subgroups, most notably TRBV6, 9 and 21, which contain 40, 35 and 16 members respectively. Similarly, duplication has led to the generation of a third DJC cluster. Analyses of cDNA data confirm the diversity of the TRBV genes and, in addition, identify a substantial number of TRBV genes, predominantly from the larger subgroups, which are still absent from the genome assembly. The observed gene duplication within the bovine TRB locus has created a repertoire of phylogenetically

  6. Genomic organization and sequences of immunoglobulin light chain genes in a primitive vertebrate suggest coevolution of immunoglobulin gene organization.

    Science.gov (United States)

    Shamblott, M J; Litman, G W

    1989-01-01

    The genomic organization and sequence of immunoglobulin light chain genes in Heterodontus francisci (horned shark), a phylogenetically primitive vertebrate, have been characterized. Light chain variable (VL) and joining (JL) segments are separated by 380 nucleotides and together with the single constant region exon (CL), occupy less than 2.7 kb, the closest linkage described thus far for a rearranging gene system. The VL segment is flanked by a characteristic recombination signal sequence possessing a 12 nucleotide spacer; the recombination signal sequence flanking the JL segment is 23 nucleotides. The VL genes, unlike heavy chain genes, possess a typical upstream regulatory octamer as well as conserved enhancer core sequences in the intervening sequence separating JL and CL. Restriction mapping and genomic Southern blotting are consistent with the presence of multiple light chain gene clusters. There appear to be considerably fewer light than heavy chain genes. Heavy and light chain clusters show no evidence of genomic linkage using field inversion gel electrophoresis. The findings of major differences in the organization and functional rearrangement properties of immunoglobulin genes in species representing different levels of vertebrate evolution, but consistent similarity in the organization of heavy and light chain genes within a species, suggest that these systems may be coevolving. PMID:2511000

  7. Reproduction-related genes in the pearl oyster genome.

    Science.gov (United States)

    Matsumoto, Toshie; Masaoka, Tetsuji; Fujiwara, Atsushi; Nakamura, Yoji; Satoh, Nori; Awaji, Masahiko

    2013-10-01

    Molluscan reproduction has been a target of biological research because of the various reproductive strategies that have evolved in this phylum. It has also been studied for the development of fisheries technologies, particularly aquaculture. Although fundamental processes of reproduction in other phyla, such as vertebrates and arthropods, have been well studied, information on the molecular mechanisms of molluscan reproduction remains limited. The recently released draft genome of the pearl oyster Pinctada fucata provides a novel and powerful platform for obtaining structural information on the genes and proteins involved in bivalve reproduction. In the present study, we analyzed the pearl oyster draft genome to screen reproduction-related genes. Analysis was mainly conducted for genes reported from other molluscs for encoding orthologs of reproduction-related proteins in other phyla. The gene search in the P. fucata gene models (version 1.1) and genome assembly (version 1.0) was performed using Genome Browser and BLAST software. The obtained gene models were then BLASTP searched against a public database to confirm the best-hit sequences. As a result, more than 40 gene models were identified with high accuracy to encode reproduction-related genes reported for P. fucata and other molluscs. These include vasa, nanos, doublesex- and mab-3-related transcription factor, 5-hydroxytryptamine (5-HT) receptors, vitellogenin, estrogen receptor, and others. The set of reproduction-related genes of P. fucata identified in the present study constitutes a new tool for research on bivalve reproduction at the molecular level.

  8. Functional analysis of sirtuin genes in multiple Plasmodium falciparum strains.

    Directory of Open Access Journals (Sweden)

    Catherine J Merrick

    Plasmodium falciparum, the causative agent of severe human malaria, employs antigenic variation to avoid host immunity. Antigenic variation is achieved by transcriptional switching amongst polymorphic var genes, enforced by epigenetic modification of chromatin. The histone-modifying 'sirtuin' enzymes PfSir2a and PfSir2b have been implicated in this process. Disparate patterns of var expression have been reported in patient isolates as well as in cultured strains. We examined var expression in three commonly used laboratory strains (3D7, NF54 and FCR-3) in parallel. NF54 parasites express significantly lower levels of var genes compared to 3D7, despite the fact that 3D7 was originally a clone of the NF54 strain. To investigate whether this was linked to the expression of sirtuins, genetic disruption of both sirtuins was attempted in all three strains. No dramatic changes in var gene expression occurred in NF54 or FCR-3 following PfSir2b disruption, contrasting with previous observations in 3D7. In 3D7, complementation of the PfSir2a genetic disruption resulted in a significant decrease in previously-elevated var gene expression levels, but with the continued expression of multiple var genes. Finally, rearranged chromosomes were observed in the 3D7 PfSir2a knockout line. Our results focus on the potential for parasite genetic background to contribute to sirtuin function in regulating virulence gene expression and suggest a potential role for sirtuins in maintaining genome integrity.

  9. Functional analysis of sirtuin genes in multiple Plasmodium falciparum strains.

    Science.gov (United States)

    Merrick, Catherine J; Jiang, Rays H Y; Skillman, Kristen M; Samarakoon, Upeka; Moore, Rachel M; Dzikowski, Ron; Ferdig, Michael T; Duraisingh, Manoj T

    2015-01-01

    Plasmodium falciparum, the causative agent of severe human malaria, employs antigenic variation to avoid host immunity. Antigenic variation is achieved by transcriptional switching amongst polymorphic var genes, enforced by epigenetic modification of chromatin. The histone-modifying 'sirtuin' enzymes PfSir2a and PfSir2b have been implicated in this process. Disparate patterns of var expression have been reported in patient isolates as well as in cultured strains. We examined var expression in three commonly used laboratory strains (3D7, NF54 and FCR-3) in parallel. NF54 parasites express significantly lower levels of var genes compared to 3D7, despite the fact that 3D7 was originally a clone of the NF54 strain. To investigate whether this was linked to the expression of sirtuins, genetic disruption of both sirtuins was attempted in all three strains. No dramatic changes in var gene expression occurred in NF54 or FCR-3 following PfSir2b disruption, contrasting with previous observations in 3D7. In 3D7, complementation of the PfSir2a genetic disruption resulted in a significant decrease in previously-elevated var gene expression levels, but with the continued expression of multiple var genes. Finally, rearranged chromosomes were observed in the 3D7 PfSir2a knockout line. Our results focus on the potential for parasite genetic background to contribute to sirtuin function in regulating virulence gene expression and suggest a potential role for sirtuins in maintaining genome integrity.

  10. The cavefish genome reveals candidate genes for eye loss

    Science.gov (United States)

    McGaugh, Suzanne E.; Gross, Joshua B.; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R.; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O’Quin, Kelly E.; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M. J.; Stahl, Bethany A.; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C.

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as analogous human disease including retinal dysfunction. PMID:25329095

  11. Genome-wide gene expression analysis of anguillid herpesvirus 1

    NARCIS (Netherlands)

    Beurden, van S.J.; Peeters, B.P.H.; Rottier, P.J.M.; Davison, A.A.; Engelsma, M.Y.

    2013-01-01

    Background Whereas temporal gene expression in mammalian herpesviruses has been studied extensively, little is known about gene expression in fish herpesviruses. Here we report a genome-wide transcription analysis of a fish herpesvirus, anguillid herpesvirus 1, in cell culture, studied during the

  12. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    Josephine Erhiakporeh

    2016-07-06

    Jul 6, 2016 ... identification of a set of 75 candidate genes (42, 22 and 11 from Arabidopsis, potato and tomato, ... understanding on the genetic basis of drought tolerance by using the .... Comparative genomics and genes expression assay ... Primer code ... physiological and molecular responses to drought stress.

  13. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is tolerant of sequence errors, maintaining its capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  14. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    Energy Technology Data Exchange (ETDEWEB)

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxa. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  15. Building phylogenetic trees by using gene Nucleotide Genomic Signals.

    Science.gov (United States)

    Cristea, Paul Dan

    2012-01-01

    Nucleotide genomic signal (NuGS) methodology allows a molecular level approach to determine distances between homologous genes or between conserved equivalent non-coding genome regions in various species or individuals of the same species. Therefore, distances between the genes of species or individuals can be computed and phylogenetic trees can be built. The paper illustrates the use of the nucleotide imbalance (N) and nucleotide pair imbalance (P) signals to determine the distances between the genes of several Hominidae. The results are in accordance with those of other genetic or phylogenetic approaches to establish distances between Hominidae species.

  16. Simple and Efficient Targeting of Multiple Genes Through CRISPR-Cas9 in Physcomitrella patens

    Directory of Open Access Journals (Sweden)

    Mauricio Lopez-Obando

    2016-11-01

    Powerful genome editing technologies are needed for efficient gene function analysis. The CRISPR-Cas9 system has been adapted as an efficient gene-knock-out technology in a variety of species. However, in a number of situations, knocking out or modifying a single gene is not sufficient; this is particularly true for genes belonging to a common family, or for genes showing redundant functions. Like many plants, the model organism Physcomitrella patens has experienced multiple polyploidization events during evolution that have resulted in a number of families of duplicated genes. Here, we report a robust CRISPR-Cas9 system, based on the codelivery of a CAS9 expressing cassette, multiple sgRNA vectors, and a cassette for transient transformation selection, for gene knock-out in multiple gene families. We demonstrate that CRISPR-Cas9-mediated targeting of five different genes allows the selection of a quintuple mutant, and all possible subcombinations of mutants, in one experiment, with no mutations detected in potential off-target sequences. Furthermore, we confirmed the observation that the presence of repeats in the vicinity of the cutting region favors deletion due to the alternative end joining pathway, for which induced frameshift mutations can be potentially predicted. Because the number of multiple gene families in Physcomitrella is substantial, this tool opens new perspectives to study the role of expanded gene families in the colonization of land by plants.
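    To give a sense of what "a quintuple mutant, and all possible subcombinations" covers, the sketch below enumerates every non-empty combination of five targeted genes. The gene labels are placeholders, not the genes targeted in the study, and the enumeration says nothing about how often each genotype was recovered.

```python
from itertools import combinations


def mutant_combinations(genes):
    """List every non-empty subset of the targeted genes, smallest first."""
    return [
        combo
        for size in range(1, len(genes) + 1)
        for combo in combinations(genes, size)
    ]


# Placeholder labels for the five targeted genes (hypothetical).
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
combos = mutant_combinations(genes)

# Count genotypes by the number of mutated genes.
by_order = {
    size: sum(1 for c in combos if len(c) == size)
    for size in range(1, len(genes) + 1)
}
print(len(combos), by_order)
```

    Five targets give 2^5 - 1 = 31 possible mutant genotypes, from the five single mutants up to the one quintuple mutant.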

  17. From trees to the forest: genes to genomics.

    Science.gov (United States)

    Mullighan, Charles; Petersdorf, Effie; Davies, Stella M; DiPersio, John

    2011-01-01

    Crick, Watson, and colleagues revealed the genetic code in 1953, and since that time, remarkable progress has been made in understanding what makes each of us who we are. Identification of single genes important in disease, and the development of a mechanistic understanding of genetic elements that regulate gene function, have cast light on the pathophysiology of many heritable and acquired disorders. In 1990, the human genome project commenced, with the goal of sequencing the entire human genome, and a "first draft" was published with astonishing speed in 2001. The first draft, although an extraordinary achievement, reported essentially an imaginary haploid mix of alleles rather than a true diploid genome. In the years since 2001, technology has further improved, and efforts have been focused on filling in the gaps in the initial genome and starting the huge task of looking at normal variation in the human genome. This work is the beginning of understanding human genetics in the context of the structure of the genome as a complete entity, and as more than simply the sum of a series of genes. We present 3 studies in this review that apply genomic approaches to leukemia and to transplantation to improve and extend therapies.

  18. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives

  19. Genome engineering using a synthetic gene circuit in Bacillus subtilis.

    Science.gov (United States)

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-03-31

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR gene and the Pspac-chloramphenicol resistance gene (cat) were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression, LacI repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications.
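    The counter-selection logic described in this abstract can be sketched as a small Boolean model. This is a simplification that treats each repressor as fully on or off and ignores promoter leakiness; the function and argument names are assumptions introduced for the example, not part of the published method.

```python
def circuit_state(xylose, has_pxyl_laci, has_helper_plasmid):
    """Boolean view of the two-repressor counter-selection circuit.

    xylose             -- xylose present in the medium
    has_pxyl_laci      -- Pxyl-lacI cassette still integrated in the genome
    has_helper_plasmid -- helper plasmid carrying xylR and Pspac-cat present
    """
    # XylR comes from the helper plasmid and is inactivated by xylose.
    xylr_active = has_helper_plasmid and not xylose
    # Pxyl drives lacI unless active XylR represses it.
    laci_expressed = has_pxyl_laci and not xylr_active
    # Pspac drives cat on the helper plasmid unless LacI represses it.
    cat_expressed = has_helper_plasmid and not laci_expressed
    return {"LacI": laci_expressed, "cat": cat_expressed}


def survives_chloramphenicol(xylose, has_pxyl_laci, has_helper_plasmid):
    """A cell survives chloramphenicol only if cat is expressed."""
    return circuit_state(xylose, has_pxyl_laci, has_helper_plasmid)["cat"]


# With xylose, cells that still carry Pxyl-lacI are sensitive...
before = survives_chloramphenicol(True, True, True)
# ...while cells that lost the cassette by recombination survive.
after = survives_chloramphenicol(True, False, True)
# Without xylose, XylR keeps lacI off and cat stays on.
uninduced = survives_chloramphenicol(False, True, True)
print(before, after, uninduced)
```

    In this model, selection on xylose plus chloramphenicol enriches for cells that have excised Pxyl-lacI, which matches the survival requirement stated in the abstract.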

  20. Expression of a transferred nuclear gene in a mitochondrial genome

    Directory of Open Access Journals (Sweden)

    Yichun Qiu

    2014-08-01

    Transfer of mitochondrial genes to the nucleus, and subsequent gain of regulatory elements for expression, is an ongoing evolutionary process in plants. Many examples have been characterized, which in some cases have revealed sources of mitochondrial targeting sequences and cis-regulatory elements. In contrast, there have been no reports of a nuclear gene that has undergone intracellular transfer to the mitochondrial genome and become expressed. Here we show that the orf164 gene in the mitochondrial genome of several Brassicaceae species, including Arabidopsis, is derived from the nuclear ARF17 gene that codes for an auxin responsive protein and is present across flowering plants. Orf164 corresponds to a portion of ARF17, and the nucleotide and amino acid sequences are 79% and 81% identical, respectively. Orf164 is transcribed in several organ types of Arabidopsis thaliana, as detected by RT-PCR. In addition, orf164 is transcribed in five other Brassicaceae within the tribes Camelineae, Erysimeae and Cardamineae, but the gene is not present in Brassica or Raphanus. This study shows that nuclear genes can be transferred to the mitochondrial genome and become expressed, providing a new perspective on the movement of genes between the genomes of subcellular compartments.

  1. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    Science.gov (United States)

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups: a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approaches. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency in the physiological and genetic phylogeny based on whole genome sequence is established. These observations improve our understanding of the adaptation and diversification of these organisms under evolutionary pressure.

  2. Putative essential and core-essential genes in Mycoplasma genomes

    OpenAIRE

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first “synthetic life”, has been an important species in the emerging field, synthetic biology. However, essential genes, an important concept of synthetic biology, for both M. mycoides and M. capricolum, as well as 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information of biased gene strand distribution, homologous search and codon adaptation index. The al...

  3. Phylogeny of a genomically diverse group of Elymus (Poaceae) allopolyploids reveals multiple levels of reticulation.

    Directory of Open Access Journals (Sweden)

    Roberta J Mason-Gamer

    The grass tribe Triticeae (=Hordeeae) comprises only about 300 species, but it is well known for the economically important crop plants wheat, barley, and rye. The group is also recognized as a fascinating example of evolutionary complexity, with a history shaped by numerous events of auto- and allopolyploidy and apparent introgression involving diploids and polyploids. The genus Elymus comprises a heterogeneous collection of allopolyploid genome combinations, all of which include at least one set of homoeologs, designated St, derived from Pseudoroegneria. The current analysis includes a geographically and genomically diverse collection of 21 tetraploid Elymus species, and a single hexaploid species. Diploid and polyploid relationships were estimated using four molecular data sets, including one that combines two regions of the chloroplast genome, and three from unlinked nuclear genes: phosphoenolpyruvate carboxylase, β-amylase, and granule-bound starch synthase I. Four gene trees were generated using maximum likelihood, and the phylogenetic placement of the polyploid sequences reveals extensive reticulation beyond allopolyploidy alone. The trees were interpreted with reference to numerous phenomena known to complicate allopolyploid phylogenies, and introgression was identified as a major factor in their history. The work illustrates the interpretation of complicated phylogenetic results through the sequential consideration of numerous possible explanations, and the results highlight the value of careful inspection of multiple independent molecular phylogenetic estimates, with particular focus on the differences among them.

  4. Phylogeny of a genomically diverse group of Elymus (Poaceae) allopolyploids reveals multiple levels of reticulation.

    Science.gov (United States)

    Mason-Gamer, Roberta J

    2013-01-01

    The grass tribe Triticeae (=Hordeeae) comprises only about 300 species, but it is well known for the economically important crop plants wheat, barley, and rye. The group is also recognized as a fascinating example of evolutionary complexity, with a history shaped by numerous events of auto- and allopolyploidy and apparent introgression involving diploids and polyploids. The genus Elymus comprises a heterogeneous collection of allopolyploid genome combinations, all of which include at least one set of homoeologs, designated St, derived from Pseudoroegneria. The current analysis includes a geographically and genomically diverse collection of 21 tetraploid Elymus species, and a single hexaploid species. Diploid and polyploid relationships were estimated using four molecular data sets, including one that combines two regions of the chloroplast genome, and three from unlinked nuclear genes: phosphoenolpyruvate carboxylase, β-amylase, and granule-bound starch synthase I. Four gene trees were generated using maximum likelihood, and the phylogenetic placement of the polyploid sequences reveals extensive reticulation beyond allopolyploidy alone. The trees were interpreted with reference to numerous phenomena known to complicate allopolyploid phylogenies, and introgression was identified as a major factor in their history. The work illustrates the interpretation of complicated phylogenetic results through the sequential consideration of numerous possible explanations, and the results highlight the value of careful inspection of multiple independent molecular phylogenetic estimates, with particular focus on the differences among them.

  5. Genetic diagnosis of a Chinese multiple endocrine neoplasia type 2A family through whole genome sequencing

    Indian Academy of Sciences (India)

    ZHEN-FANG DU; PENG-FEI LI; JIAN-QIANG ZHAO; ZHI-LIE CAO; FENG LI; JU-MING MA; XIAO-PING QI

    2017-06-01

    Approximately 98% of patients with multiple endocrine neoplasia type 2A (MEN 2A) have an identifiable RET mutation. Prophylactic or early total thyroidectomy or pheochromocytoma/parathyroid removal in patients can be preventative or curative and has become standard management. The general strategy for RET screening on family members at risk is to sequence the most commonly affected exons and, if negative, to extend sequencing to additional exons. However, different families with MEN 2A due to the same RET mutation often have significant variability in the clinical exhibition of disease and aggressiveness of the MTC, which implies additional genetic loci exist beyond the RET coding region. Whole genome sequencing (WGS) greatly expands the breadth of screening from genes associated with a particular disease to the whole genome and, potentially, all the information that the genome contains about diseases or traits. This is presumably due to additive effect of disease modifying factors. In this study, we performed WGS on a typical Chinese MEN 2A proband and identified the pathogenic RET p.C634R mutation. We also identified several neutral variants within RET and pheochromocytoma-related genes. Moreover, we found several interesting structural variants including genetic deletions (RSPO1, OVCH2 and AP3S1, etc.) and fusion transcripts (FSIP1-BAZ2A, etc.).

  6. Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER

    Indian Academy of Sciences (India)

    Gautam Aggarwal; Ramakrishna Ramaswamy

    2002-02-01

    We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes which are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark, but are not identified by either of the nonconsensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have bench-marked this program as well as GLIMMER using 3 complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene-identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan as compared to GLIMMER, but in a significant number of cases, similar results are provided by the two techniques. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.
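
    The abstract says that GeneScan exploits a Fourier measure but does not give the formula. As a rough illustration of the general idea, the sketch below computes the spectral power of a DNA sequence at frequency 1/3, where protein-coding regions are known to show a peak because of codon structure. The function name, the indicator-sequence encoding and the normalisation by the squared sequence length are assumptions for this example and are not taken from GeneScan itself.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the four bases.

    Each base b is turned into a 0/1 indicator sequence u_b(n); the value
    returned is sum_b |sum_n u_b(n) * exp(-2*pi*i*n/3)|^2 / N^2.
    Characters other than A, C, G, T contribute nothing.
    """
    seq = seq.upper()
    n_total = len(seq)
    if n_total == 0:
        raise ValueError("sequence must not be empty")
    power = 0.0
    for base in "ACGT":
        # Discrete Fourier transform of the indicator sequence at f = 1/3.
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, ch in enumerate(seq) if ch == base)
        power += abs(coeff) ** 2
    return power / (n_total ** 2)


if __name__ == "__main__":
    print(period3_power("ATG" * 30))   # perfectly 3-periodic toy sequence
    print(period3_power("A" * 90))     # no 3-periodicity
```

    In practice such a score would be computed in a sliding window and compared against a threshold or a background level; both choices are left open here.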

  7. Genomic organization and sequence analysis of the vomeronasal receptor V2R genes in mouse genome

    Institute of Scientific and Technical Information of China (English)

    YANG Hui; Zhang YaPing

    2007-01-01

    Two multigene superfamilies, named V1R and V2R, encoding seven-transmembrane-domain G-protein coupled receptors (GPCRs) have been identified as pheromone receptors in mammals. Three V2R gene families have been described in mouse and rat. Here we screened the updated mouse genome sequence database and retrieved 63 putative functional V2R genes, including three newly identified genes that form an additional family. We describe the genomic organization of these genes and characterize the conservation of mouse V2R protein sequences. The genomic and sequence information described here provides evidence for inferring the functional domains of V2Rs and should aid future functional studies.

  8. Bacterial Cellular Engineering by Genome Editing and Gene Silencing

    Directory of Open Access Journals (Sweden)

    Nobutaka Nakashima

    2014-02-01

    Genome editing is an important technology for bacterial cellular engineering, which is commonly conducted by homologous recombination-based procedures, including gene knockout (disruption), knock-in (insertion), and allelic exchange. In addition, some new recombination-independent approaches have emerged that utilize catalytic RNAs, artificial nucleases, nucleic acid analogs, and peptide nucleic acids. Apart from these methods, which directly modify the genomic structure, an alternative approach is to conditionally modify the gene expression profile at the posttranscriptional level without altering the genomes. This is performed by expressing antisense RNAs to knock down (silence) target mRNAs in vivo. This review describes the features and recent advances on methods used in genomic engineering and silencing technologies that are advantageously used for bacterial cellular engineering.

  9. Genomic and expression analysis of multiple Sry loci from a single Rattus norvegicus Y chromosome

    Directory of Open Access Journals (Sweden)

    Farkas Joel

    2007-04-01

    Background: Sry is a gene known to be essential for testis determination but is also transcribed in adult male tissues. The laboratory rat, Rattus norvegicus, has multiple Y chromosome copies of Sry while most mammals have only a single copy. DNA sequence comparisons with other rodents with multiple Sry copies are inconsistent in divergence patterns and functionality of the multiple copies. To address hypotheses of divergence, gene conversion and functional constraints, we sequenced Sry loci from a single R. norvegicus Y chromosome from the Spontaneously Hypertensive Rat strain (SHR) and analyzed DNA sequences for homology among copies. Next, to determine whether all copies of Sry are expressed, we developed a modification of the fluorescent marked capillary electrophoresis method to generate three different sized amplification products to identify Sry copies. We applied this fragment analysis method to both genomic DNA and cDNA prepared from mRNA from testis and adrenal gland of adult male rats. Results: Y chromosome fragments were amplified and sequenced using primers that included the entire Sry coding region and flanking sequences. The analysis of these sequences identified six Sry loci on the Y chromosome. These are paralogous copies consistent with a single phylogeny and the divergence between any two copies is less than 2%. All copies have a conserved reading frame and amino acid sequence consistent with function. Fragment analysis of genomic DNA showed close approximations of experimental with predicted values, validating the use of this method to identify proportions of each copy. Using the fragment analysis procedure with cDNA samples showed the Sry copies expressed were significantly different from the genomic distribution (testis p ... Sry transcript expression, analyzed by real-time PCR, showed significantly higher levels of Sry in testis than adrenal gland (p < 0.001). Conclusion: The SHR Y chromosome contains at least 6 full length

  10. The genomic environment around the Aromatase gene: evolutionary insights

    Directory of Open Access Journals (Sweden)

    Reis-Henriques Maria A

    2005-08-01

    Background: The cytochrome P450 aromatase (CYP19) catalyses the aromatisation of androgens to estrogens, a key mechanism in vertebrate reproductive physiology. A current evolutionary hypothesis suggests that the CYP19 gene arose at the origin of vertebrates, given that it has not been found outside this clade. The human CYP19 gene is located in one of the proposed MHC-paralogon regions (HSA15q). At present it is unclear whether this genomic location is ancestral (which would suggest an invertebrate origin for CYP19) or derived (genomic location with no evolutionary meaning). The distinction between these possibilities should help to clarify the timing of the CYP19 emergence and which taxa should be investigated. Results: Here we determine the "genomic environment" around CYP19 in three vertebrate species: Homo sapiens, Tetraodon nigroviridis and Xenopus tropicalis. Paralogy studies and phylogenetic analysis of six gene families suggest that the CYP19 gene region was structured through "en bloc" genomic duplication (as part of the MHC-paralogon formation). Four gene families have specifically duplicated in the vertebrate lineage. Moreover, the mapping location of the different paralogues is consistent with a model of "en bloc" duplication. Furthermore, we also determine that this region has retained the same gene content since the divergence of Actinopterygii and Tetrapods. A single inversion in gene order has taken place, probably in the mammalian lineage. Finally, we describe the first invertebrate CYP19 sequence, from Branchiostoma floridae. Conclusion: Contrary to previous suggestions, our data indicate an invertebrate origin for the aromatase gene, given the striking conservation pattern in both gene order and gene content, and the presence of aromatase in amphioxus. We propose that CYP19 duplicated in the vertebrate lineage to yield four paralogues, followed by the subsequent loss of all but one gene in vertebrate evolution. Finally, we

  11. Diversity of 23S rRNA genes within individual prokaryotic genomes.

    Directory of Open Access Journals (Sweden)

    Anna Pei

    BACKGROUND: The concept of ribosomal constraints on rRNA genes is deduced primarily based on the comparison of consensus rRNA sequences between closely related species, but recent advances in whole-genome sequencing allow evaluation of this concept within organisms with multiple rRNA operons. METHODOLOGY/PRINCIPAL FINDINGS: Using the 23S rRNA gene as an example, we analyzed the diversity among individual rRNA genes within a genome. Of 184 prokaryotic species containing multiple 23S rRNA genes, diversity was observed in 113 (61.4%) genomes (mean 0.40%, range 0.01%-4.04%). Significant (1.17%-4.04%) intragenomic variation was found in 8 species. In 5 of the 8 species, the diversity in the primary structure had only minimal effect on the secondary structure (stem versus loop transition). In the remaining 3 species, the diversity significantly altered local secondary structure, but the alteration appears minimized through complex rearrangement. Intervening sequences (IVS), ranging between 9 and 1471 nt in size, were found in 7 species. IVS in Deinococcus radiodurans and Nostoc sp. encode transposases. T. tengcongensis was the only species in which intragenomic diversity >3% was observed among 4 paralogous 23S rRNA genes. CONCLUSIONS/SIGNIFICANCE: These findings indicate tight ribosomal constraints on individual 23S rRNA genes within a genome. Although classification using primary 23S rRNA sequences could be erroneous, significant diversity among paralogous 23S rRNA genes was observed only once in the 184 species analyzed, indicating little overall impact on the mainstream of 23S rRNA gene-based prokaryotic taxonomy.

  12. Convergent functional genomics of oligodendrocyte differentiation identifies multiple autoinhibitory signaling circuits.

    Science.gov (United States)

    Gobert, Rosanna Pescini; Joubert, Lara; Curchod, Marie-Laure; Salvat, Catherine; Foucault, Isabelle; Jorand-Lebrun, Catherine; Lamarine, Marc; Peixoto, Hélène; Vignaud, Chloé; Frémaux, Christèle; Jomotte, Thérèse; Françon, Bernard; Alliod, Chantal; Bernasconi, Lilia; Abderrahim, Hadi; Perrin, Dominique; Bombrun, Agnes; Zanoguera, Francisca; Rommel, Christian; Hooft van Huijsduijnen, Rob

    2009-03-01

    Inadequate remyelination of brain white matter lesions has been associated with a failure of oligodendrocyte precursors to differentiate into mature, myelin-producing cells. In order to better understand which genes play a critical role in oligodendrocyte differentiation, we performed time-dependent, genome-wide gene expression studies of mouse Oli-neu cells as they differentiate into process-forming and myelin basic protein-producing cells, following treatment with three different agents. Our data indicate that different inducers activate distinct pathways that ultimately converge into the completely differentiated state, where regulated gene sets overlap maximally. In order to also gain insight into the functional role of genes that are regulated in this process, we silenced 88 of these genes using small interfering RNA and identified multiple repressors of spontaneous differentiation of Oli-neu, most of which were confirmed in rat primary oligodendrocyte precursor cells. Among these repressors were CNP, a well-known myelin constituent, and three phosphatases, each known to negatively control mitogen-activated protein kinase cascades. We show that a novel inhibitor for one of the identified genes, dual-specificity phosphatase DUSP10/MKP5, was also capable of inducing oligodendrocyte differentiation in primary oligodendrocyte precursors. Oligodendrocytic differentiation feedback loops may therefore yield pharmacological targets to treat disease related to dysfunctional myelin deposition.

  13. A "candidate-interactome" aggregate analysis of genome-wide association data in multiple sclerosis

    DEFF Research Database (Denmark)

    Mechelli, Rosella; Umeton, Renato; Policano, Claudia

    2013-01-01

    of genes whose products are known to physically interact with environmental factors that may be relevant for disease pathogenesis) analysis of genome-wide association data in multiple sclerosis. We looked for statistical enrichment of associations among interactomes that, at the current state of knowledge...... immunity interactome for type I interferon, autoimmune regulator, vitamin D receptor, aryl hydrocarbon receptor and a panel of proteins targeted by 70 innate immune-modulating viral open reading frames from 30 viral species. Interactomes were either obtained from the literature or were manually curated...... emerges as relevant for multiple sclerosis etiology. However, in line with recent data on the coexistence of common and unique strategies used by viruses to perturb the human molecular system, also other viruses have a similar potential, though probably less relevant in epidemiological terms....

  14. Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

    Directory of Open Access Journals (Sweden)

    Walker M Andrew

    2006-09-01

    Background: The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results: There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On average, 60.33% of the SNPs were of the synonymous type while 39.67% were of the non-synonymous type. The mutation frequency, primarily in the form of external INDELs, was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The numbers of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions. Conclusion: INDELs and strain specific genes
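
    The per-base-pair SNP and INDEL frequencies quoted in this abstract come from multiple alignments of four strains, and the exact counting rules are not given here. As a simplified illustration only, the sketch below counts SNP columns and gap runs in a single pairwise gapped alignment. The function name and the conventions (a run of consecutive gaps in one sequence counts as one INDEL, and columns that are gaps in both sequences are ignored) are assumptions for this example.

```python
def count_variants(ref_aln: str, alt_aln: str) -> tuple[int, int, int]:
    """Count (snps, indel_events, compared_columns) in a pairwise alignment.

    Both inputs are gapped sequences of equal length using '-' for gaps.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = indels = columns = 0
    gap_in = None  # which sequence the current gap run is in: 'ref', 'alt' or None
    for a, b in zip(ref_aln.upper(), alt_aln.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column, does not break a gap run
        columns += 1
        if a == "-" or b == "-":
            side = "ref" if a == "-" else "alt"
            if side != gap_in:
                indels += 1  # a new gap run starts here
            gap_in = side
        else:
            gap_in = None
            if a != b:
                snps += 1
    return snps, indels, columns


if __name__ == "__main__":
    snps, indels, columns = count_variants("ACGT-ACGTAC", "ACCTTAC--AC")
    print(snps, indels, columns)
    print("SNP frequency per aligned column:", snps / columns)
```

    Dividing each count by the number of compared columns gives a frequency of the same kind as the per-base-pair values in the abstract, although the published figures depend on the authors' own alignment and counting choices.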

  15. Construction of gene targeting vectors from lambda KOS genomic libraries.

    Science.gov (United States)

    Wattler, S; Kelly, M; Nehls, M

    1999-06-01

    We describe a highly redundant murine genomic library in a new lambda phage, lambda knockout shuttle (lambda KOS) that facilitates the very rapid construction of replacement-type gene targeting vectors. The library consists of 94 individually amplified subpools, each containing an average of 40,000 independent genomic clones. The subpools are arrayed into a 96-well format that allows a PCR-based efficient recovery of independent genomic clones. The lambda KOS vector backbone permits the CRE-mediated conversion into high-copy number pKOS plasmids, wherein the genomic inserts are automatically flanked by negative-selection cassettes. The lambda KOS vector system exploits the yeast homologous recombination machinery to simplify the construction of replacement-type gene targeting vectors independent of restriction sites within the genomic insert. We outline procedures that allow the generation of simple and more sophisticated conditional gene targeting vectors within 3-4 weeks, beginning with the screening of the lambda KOS genomic library.

  16. Gene duplication in the genome of parasitic Giardia lamblia

    Directory of Open Access Journals (Sweden)

    Flores Roberto

    2010-02-01

    Background: Giardia are a group of widespread intestinal protozoan parasites in a number of vertebrates. Much evidence from G. lamblia indicated they might be the most primitive extant eukaryotes. When and how such a group of the earliest branching unicellular eukaryotes developed the ability to successfully parasitize the latest branching higher eukaryotes (vertebrates) is an intriguing question. Gene duplication has long been thought to be the most common mechanism in the production of primary resources for the origin of evolutionary novelties. In order to parse the evolutionary trajectory of the Giardia parasitic lifestyle, here we carried out a genome-wide analysis of gene duplication patterns in G. lamblia. Results: Although genomic comparison showed that in G. lamblia the contents of many fundamental biologic pathways are simplified and the whole genome is very compact, in our study 40% of its genes were identified as duplicated genes. Evolutionary distance analyses of these duplicated genes indicated that two rounds of large scale duplication events had occurred in the G. lamblia genome. Functional annotation of them further showed that the majority of recently duplicated genes are VSPs (Variant-specific Surface Proteins), which are essential for the successful parasitic life of Giardia in hosts. Based on evolutionary comparison with their hosts, it was found that the rapid expansion of VSPs in G. lamblia is consistent with the evolutionary radiation of placental mammals. Conclusions: Based on the genome-wide analysis of duplicated genes in G. lamblia, we found that gene duplication was essential for the origin and evolution of the Giardia parasitic lifestyle. The recent expansion of VSPs uniquely occurring in G. lamblia is consistent with the increment of its hosts. Therefore we proposed a hypothesis that the increment of Giardia hosts might be the driving force for the rapid expansion of VSPs.

  17. A BAC-bacterial recombination method to generate physically linked multiple gene reporter DNA constructs

    Directory of Open Access Journals (Sweden)

    Gong Shiaochin

    2009-03-01

    Background: Reporter gene mice are valuable animal models for biological research providing a gene expression readout that can contribute to cellular characterization within the context of a developmental process. With the advancement of bacterial recombination techniques to engineer reporter gene constructs from BAC genomic clones and the generation of optically distinguishable fluorescent protein reporter genes, there is an unprecedented capability to engineer more informative transgenic reporter mouse models relative to what has been traditionally available. Results: We demonstrate here our first effort on the development of a three stage bacterial recombination strategy to physically link multiple genes together with their respective fluorescent protein (FP) reporters in one DNA fragment. This strategy uses bacterial recombination techniques to: (1) subclone genes of interest into BAC linking vectors, (2) insert desired reporter genes into respective genes and (3) link different gene-reporters together. As proof of concept, we have generated a single DNA fragment containing the genes Trap, Dmp1, and Ibsp driving the expression of ECFP, mCherry, and Topaz FP reporter genes, respectively. Using this DNA construct, we have successfully generated transgenic reporter mice that retain two to three gene readouts. Conclusion: The three stage methodology to link multiple genes with their respective fluorescent protein reporter works with reasonable efficiency. Moreover, gene linkage allows for their common chromosomal integration into a single locus. However, the testing of this multi-reporter DNA construct by transgenesis does suggest that the linkage of two different genes together, despite their large size, can still create a positional effect. We believe that gene choice, genomic DNA fragment size and the presence of endogenous insulator elements are critical variables.

  18. Plant DNA barcoding: from gene to genome.

    Science.gov (United States)

    Li, Xiwen; Yang, Yang; Henry, Robert J; Rossetto, Maurizio; Wang, Yitao; Chen, Shilin

    2015-02-01

    DNA barcoding is currently a widely used and effective tool that enables rapid and accurate identification of plant species; however, none of the available loci work across all species. Because single-locus DNA barcodes lack adequate variations in closely related taxa, recent barcoding studies have placed high emphasis on the use of whole-chloroplast genome sequences which are now more readily available as a consequence of improving sequencing technologies. While chloroplast genome sequencing can already deliver a reliable barcode for accurate plant identification it is not yet resource-effective and does not yet offer the speed of analysis provided by single-locus barcodes to unspecialized laboratory facilities. Here, we review the development of candidate barcodes and discuss the feasibility of using the chloroplast genome as a super-barcode. We advocate a new approach for DNA barcoding that, for selected groups of taxa, combines the best use of single-locus barcodes and super-barcodes for efficient plant identification. Specific barcodes might enhance our ability to distinguish closely related plants at the species and population levels.

  19. Evolutionary genomics of LysM genes in land plants

    Directory of Open Access Journals (Sweden)

    Stacey Gary

    2009-08-01

    Background: The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive. Results: We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes. Conclusion: We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given the fact that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes.

  20. Evolutionary genomics of LysM genes in land plants.

    Science.gov (United States)

    Zhang, Xue-Cheng; Cannon, Steven B; Stacey, Gary

    2009-08-03

    The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive. We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes. We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given the fact that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes.

  1. Identification of neural outgrowth genes using genome-wide RNAi.

    Directory of Open Access Journals (Sweden)

    Katharine J Sepp

    2008-07-01

    Full Text Available While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that, when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new

  2. Comparative genomics of four closely related Clostridium perfringens bacteriophages reveals variable evolution among core genes with therapeutic potential

    Directory of Open Access Journals (Sweden)

    Siragusa Gregory R

    2011-06-01

    Full Text Available Abstract Background Because biotechnological uses of bacteriophage gene products as alternatives to conventional antibiotics will require a thorough understanding of their genomic context, we sequenced and analyzed the genomes of four closely related phages isolated from Clostridium perfringens, an important agricultural and human pathogen. Results Phage whole-genome tetra-nucleotide signatures and proteomic tree topologies correlated closely with host phylogeny. Comparisons of our phage genomes to 26 others revealed three shared COGs; of particular interest within this core genome was an endolysin (PF01520, an N-acetylmuramoyl-L-alanine amidase) and a holin (PF04531). Comparative analyses of the evolutionary history and genomic context of these common phage proteins revealed two important results: (1) strongly significant host-specific sequence variation within the endolysin, and (2) a protein domain architecture apparently unique to our phage genomes in which the endolysin is located upstream of its associated holin. Endolysin sequences from our phages were one of two very distinct genotypes distinguished by variability within the putative enzymatically-active domain. The shared or core genome was comprised of genes with multiple sequence types belonging to five pfam families, and genes belonging to 12 pfam families, including the holin genes, which were nearly identical. Conclusions Significant genomic diversity exists even among closely-related bacteriophages. Holins and endolysins represent conserved functions across divergent phage genomes and, as we demonstrate here, endolysins can have significant variability and host-specificity even among closely-related genomes. Endolysins in our phage genomes may be subject to different selective pressures than the rest of the genome. These findings may have important implications for potential biotechnological applications of phage gene products.

  3. Outbred genome sequencing and CRISPR/Cas9 gene editing in butterflies

    Science.gov (United States)

    Li, Xueyan; Fan, Dingding; Zhang, Wei; Liu, Guichun; Zhang, Lu; Zhao, Li; Fang, Xiaodong; Chen, Lei; Dong, Yang; Chen, Yuan; Ding, Yun; Zhao, Ruoping; Feng, Mingji; Zhu, Yabing; Feng, Yue; Jiang, Xuanting; Zhu, Deying; Xiang, Hui; Feng, Xikan; Li, Shuaicheng; Wang, Jun; Zhang, Guojie; Kronforst, Marcus R.; Wang, Wen

    2015-01-01

    Butterflies are exceptionally diverse but their potential as an experimental system has been limited by the difficulty of deciphering heterozygous genomes and a lack of genetic manipulation technology. Here we use a hybrid assembly approach to construct high-quality reference genomes for Papilio xuthus (contig and scaffold N50: 492 kb, 3.4 Mb) and Papilio machaon (contig and scaffold N50: 81 kb, 1.15 Mb), highly heterozygous species that differ in host plant affiliations, and adult and larval colour patterns. Integrating comparative genomics and analyses of gene expression yields multiple insights into butterfly evolution, including potential roles of specific genes in recent diversification. To functionally test gene function, we develop an efficient (up to 92.5%) CRISPR/Cas9 gene editing method that yields obvious phenotypes with three genes, Abdominal-B, ebony and frizzled. Our results provide valuable genomic and technological resources for butterflies and unlock their potential as a genetic model system. PMID:26354079

  4. Outbred genome sequencing and CRISPR/Cas9 gene editing in butterflies.

    Science.gov (United States)

    Li, Xueyan; Fan, Dingding; Zhang, Wei; Liu, Guichun; Zhang, Lu; Zhao, Li; Fang, Xiaodong; Chen, Lei; Dong, Yang; Chen, Yuan; Ding, Yun; Zhao, Ruoping; Feng, Mingji; Zhu, Yabing; Feng, Yue; Jiang, Xuanting; Zhu, Deying; Xiang, Hui; Feng, Xikan; Li, Shuaicheng; Wang, Jun; Zhang, Guojie; Kronforst, Marcus R; Wang, Wen

    2015-09-10

    Butterflies are exceptionally diverse but their potential as an experimental system has been limited by the difficulty of deciphering heterozygous genomes and a lack of genetic manipulation technology. Here we use a hybrid assembly approach to construct high-quality reference genomes for Papilio xuthus (contig and scaffold N50: 492 kb, 3.4 Mb) and Papilio machaon (contig and scaffold N50: 81 kb, 1.15 Mb), highly heterozygous species that differ in host plant affiliations, and adult and larval colour patterns. Integrating comparative genomics and analyses of gene expression yields multiple insights into butterfly evolution, including potential roles of specific genes in recent diversification. To functionally test gene function, we develop an efficient (up to 92.5%) CRISPR/Cas9 gene editing method that yields obvious phenotypes with three genes, Abdominal-B, ebony and frizzled. Our results provide valuable genomic and technological resources for butterflies and unlock their potential as a genetic model system.

  5. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group...
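
    The density comparison described in this record can be illustrated with a minimal sketch that counts a short motif on both strands of a gene sequence and normalises by length. This is not the authors' pipeline: the function names are hypothetical, and the motif string used in the test values is an assumption for illustration, since the abstract does not give the DUS sequence.

```python
# Minimal sketch: density of a short motif (e.g. a 9-10mer uptake sequence)
# per kilobase of sequence, counting both strands.
# All names here are hypothetical; this is not the published method.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_density_per_kb(seq: str, motif: str) -> float:
    """Occurrences of motif (either strand) per 1000 bp of seq."""
    seq, motif = seq.upper(), motif.upper()
    if not seq:
        return 0.0
    hits = count_overlapping(seq, motif)
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        hits += count_overlapping(seq, rc)
    return 1000.0 * hits / len(seq)
```

    Comparing gene groups would then amount to computing this density for the concatenated (or per-gene) sequences of each annotated group; any significance testing is left out of the sketch.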

  6. Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

    Directory of Open Access Journals (Sweden)

    Ovchinnikov Sergey

    2012-03-01

    Full Text Available Abstract Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic

  7. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    Science.gov (United States)

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
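
    The N50 statistic mentioned in this record can be computed as in the following minimal sketch. It uses the common definition (the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length); the function name is hypothetical and the contig lengths in any usage are made-up illustrations, not data from the study.

```python
# Minimal sketch of the N50 assembly statistic.
# Assumes the common definition: sort contigs from longest to shortest and
# report the length of the contig at which the cumulative length first
# reaches at least half of the total assembly length.

from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

    Because N50 depends only on the length distribution, merging short contigs into longer ones raises it even when the total assembled length is unchanged, which is why it is reported alongside other quality measures.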

  8. Mining Bacterial Genomes for Secondary Metabolite Gene Clusters.

    Science.gov (United States)

    Adamek, Martina; Spohn, Marius; Stegmann, Evi; Ziemert, Nadine

    2017-01-01

    With the emergence of bacterial resistance against frequently used antibiotics, novel antibacterial compounds are urgently needed. Traditional bioactivity-guided drug discovery strategies involve laborious screening efforts and display high rediscovery rates. With the progress in next generation sequencing methods and the knowledge that the majority of antibiotics in clinical use are produced as secondary metabolites by bacteria, mining bacterial genomes for secondary metabolites with antimicrobial activity is a promising approach, which can guide a more time and cost-effective identification of novel compounds. However, what sounds easy to accomplish, comes with several challenges. To date, several tools for the prediction of secondary metabolite gene clusters are available, some of which are based on the detection of signature genes, while others are searching for specific patterns in gene content or regulation. Apart from the mere identification of gene clusters, several other factors such as determining cluster boundaries and assessing the novelty of the detected cluster are important. For this purpose, comparison of the predicted secondary metabolite genes with different cluster and compound databases is necessary. Furthermore, it is advisable to classify detected clusters into gene cluster families. So far, there is no standardized procedure for genome mining; however, different approaches to overcome all of these challenges exist and are addressed in this chapter. We give practical guidance on the workflow for secondary metabolite gene cluster identification, which includes the determination of gene cluster boundaries, addresses problems occurring with the use of draft genomes, and gives an outlook on the different methods for gene cluster classification. Based on comprehensible examples a protocol is set, which should enable the readers to mine their own genome data for interesting secondary metabolites.

  9. [Evolution of gene orders in genomes of cyanobacteria].

    Science.gov (United States)

    Markov, A V; Zakharov, I A

    2009-08-01

    Genomes of 23 strains of cyanobacteria were comparatively analyzed using quantitative methods of estimation of gene order similarity. It has been found that reconstructions of phylogenesis of cyanobacteria based on the comparison of the orders of genes in chromosomes and nucleotide sequences appear to be similar. This confirms the applicability of quantitative measures of similarity of gene orders for phylogenetic reconstructions. In the evolution of marine unicellular plankton cyanobacteria, genome rearrangements are fixed with a low rate (about 3% of gene order changes per 1% of 16S rRNA changes), whereas in other groups of cyanobacteria the gene order can change several times more rapidly. The gene orders in genomes of cyanobacteria and chloroplasts preserve a considerable degree of similarity. The closest relatives of chloroplasts among the analyzed cyanobacteria are likely to be strains from hot springs belonging to the genus Synechococcus. Comparative analysis of gene orders and nucleotide sequences strongly suggests that Synechococcus strains from different environments (sea, fresh waters, hot springs) are not related and belong to evolutionarily distant lines.
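
    One simple way to quantify gene order similarity is the fraction of neighbouring gene pairs conserved between two genomes. The sketch below uses that measure purely as an illustration; the abstract does not specify which measures the authors used, and the function names, the treatment of chromosomes as linear, the disregard of strand, and the assumption that each gene occurs once per genome are all simplifications introduced here.

```python
# Minimal sketch: gene order similarity as the share of conserved adjacencies.
# Assumptions (not from the record): linear gene orders, strand ignored,
# each gene identifier occurs at most once per genome.

from typing import FrozenSet, Sequence, Set


def adjacency_set(order: Sequence[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of neighbouring genes in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def gene_order_similarity(order_a: Sequence[str], order_b: Sequence[str]) -> float:
    """Fraction of adjacencies shared by two genomes, over their common genes."""
    shared_genes = set(order_a) & set(order_b)
    # Restrict both orders to genes present in both genomes.
    a = [g for g in order_a if g in shared_genes]
    b = [g for g in order_b if g in shared_genes]
    adj_a, adj_b = adjacency_set(a), adjacency_set(b)
    denominator = max(len(adj_a), len(adj_b))
    if denominator == 0:
        return 0.0
    return len(adj_a & adj_b) / denominator
```

    A value of 1.0 means every neighbouring pair is preserved (including a whole-chromosome inversion, since pairs are unordered); rearrangements lower the score by breaking adjacencies.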

  10. Gene mutations of acute myeloid leukemia in the genome era.

    Science.gov (United States)

    Naoe, Tomoki; Kiyoi, Hitoshi

    2013-02-01

    Ten years ago, gene mutations found in acute myeloid leukemia (AML) were conceptually grouped into class I mutations, which cause constitutive activation of intracellular signals that contribute to growth and survival, and class II mutations, which block differentiation and/or enhance self-renewal through altered transcription factors. A cooperative model between the two classes of mutations has been suggested by murine experiments and partly supported by epidemiological findings. In the last 5 years, comprehensive genomic analysis has found new gene mutations in epigenome-associated enzymes and in molecules not previously implicated. These new mutations apparently increase the complexity and heterogeneity of AML. Although a long list of gene mutations might have been compiled, the entire picture of molecular pathogenesis in AML remains to be elucidated because gene rearrangement, gene copy number, DNA methylation and expression profiles are not fully studied in conjunction with gene mutations. Comprehensive genome research will deepen the understanding of AML to promote the development of new classification and treatment. This review focuses on gene mutations that were recently discovered by genome sequencing.

  11. Conservation of ribosomal protein gene ordering in 16 complete genomes

    Institute of Scientific and Technical Information of China (English)

    王宁; 陈润生; 王永雄

    2000-01-01

    The organization of ribosomal proteins in 16 prokaryotic genomes was studied as an example of comparative genome analyses of gene systems. Hypothetical ribosomal protein-containing operons were constructed. These operons also contained putative genes and other non-ribosomal genes. The correspondences among these genes across different organisms were clarified by sequence homology computations. In this way a cross tabulation of 70 ribosomal protein genes was constructed. On average, these were organized into 9-14 operons in each genome. There were also 25 non-ribosomal or putative genes in these mainly ribosomal protein operons. Hence the table contains 95 genes in total. It was found that: (i) the conservation of the block of about 20 r-proteins in the L3 and L4 operons across almost the entire eubacteria and archaebacteria is remarkable; (ii) some operons only belong to eubacteria or archaebacteria; (iii) although the ribosomal protein operons are highly conserved within domain, there are fine variat

  12. Conservation of ribosomal protein gene ordering in 16 complete genomes

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    The organization of ribosomal proteins in 16 prokaryotic genomes was studied as an example of comparative genome analyses of gene systems. Hypothetical ribosomal protein-containing operons were constructed. These operons also contained putative genes and other non-ribosomal genes. The correspondences among these genes across different organisms were clarified by sequence homology computations. In this way a cross tabulation of 70 ribosomal protein genes was constructed. On average, these were organized into 9-14 operons in each genome. There were also 25 non-ribosomal or putative genes in these mainly ribosomal protein operons. Hence the table contains 95 genes in total. It was found that: (i) the conservation of the block of about 20 r-proteins in the L3 and L4 operons across almost the entire eubacteria and archaebacteria is remarkable; (ii) some operons only belong to eubacteria or archaebacteria; (iii) although the ribosomal protein operons are highly conserved within domain, there are fine variations in some operons across different organisms within each domain, and these variations are informative on the evolutionary relations among the organisms. This method provides a new potential for studying the origin and evolution of old species.

  13. A non-inheritable maternal Cas9-based multiple-gene editing system in mice.

    Science.gov (United States)

    Sakurai, Takayuki; Kamiyoshi, Akiko; Kawate, Hisaka; Mori, Chie; Watanabe, Satoshi; Tanaka, Megumu; Uetake, Ryuichi; Sato, Masahiro; Shindo, Takayuki

    2016-01-28

    The CRISPR/Cas9 system is capable of editing multiple genes through one-step zygote injection. The preexisting method is largely based on the co-injection of Cas9 DNA (or mRNA) and guide RNAs (gRNAs); however, it is unclear how many genes can be simultaneously edited by this method, and a reliable means to generate transgenic (Tg) animals with multiple gene editing has yet to be developed. Here, we employed non-inheritable maternal Cas9 (maCas9) protein derived from Tg mice with systemic Cas9 overexpression (Cas9 mice). The maCas9 protein in zygotes derived from mating or in vitro fertilization of Tg/+ oocytes and +/+ sperm could successfully edit the target genome. The efficiency of such maCas9-based genome editing was comparable to that of zygote microinjection-based genome editing widely used at present. Furthermore, we demonstrated a novel approach to create "Cas9 transgene-free" gene-modified mice using non-Tg (+/+) zygotes carrying maCas9. The maCas9 protein in mouse zygotes edited nine target loci simultaneously after injection with nine different gRNAs alone. Cas9 mouse-derived zygotes have the potential to facilitate the creation of genetically modified animals carrying the Cas9 transgene, enabling repeatable genome engineering and the production of Cas9 transgene-free mice.

  14. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    Science.gov (United States)

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events).

  15. ECR Browser: A Tool For Visualizing And Accessing Data From Comparisons Of Multiple Vertebrate Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Loots, G G; Ovcharenko, I; Stubbs, L; Nobrega, M A

    2004-01-06

    The increasing number of vertebrate genomes being sequenced in draft or finished form provides a unique opportunity to study and decode the language of DNA sequence through comparative genome alignments. However, novel tools and strategies are required to accommodate this increasing volume of genomic information and to facilitate experimental annotation of genome function. Here we present the ECR Browser, a tool that provides easy and dynamic access to whole genome alignments of human, mouse, rat and fish sequences. This web-based tool (http://ecrbrowser.dcode.org) provides the starting point for discovery of novel genes, identification of distant gene regulatory elements and prediction of transcription factor binding sites. The genome alignment portal of the ECR Browser also permits fast and automated alignment of any user-submitted sequence to the genome of choice. The interconnection of the ECR Browser with other DNA sequence analysis tools creates a unique portal for studying and exploring vertebrate genomes.

  16. Floral gene resources from basal angiosperms for comparative genomics research

    Directory of Open Access Journals (Sweden)

    Zhang Xiaohong

    2005-03-01

    Full Text Available Abstract Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage

  17. Floral gene resources from basal angiosperms for comparative genomics research

    Science.gov (United States)

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  18. Multiple displacement amplification of whole genomic DNA from urediospores of Puccinia striiformis f. sp. tritici.

    Science.gov (United States)

    Zhang, R; Ma, Z H; Wu, B M

    2015-05-01

    Biotrophic fungi such as Puccinia striiformis f. sp. tritici cannot be cultured on nutrient media, so obtaining an adequate quantity of DNA for molecular genetic analysis usually requires propagating them on living hosts (wheat plants in the case of P. striiformis f. sp. tritici). The propagation process is time-, space- and labor-consuming and has been a bottleneck to molecular genetic analysis of this pathogen. In this study we evaluated multiple displacement amplification (MDA) of pathogen genomic DNA from urediospores as an alternative approach to traditional propagation of urediospores followed by DNA extraction. The quantities of pathogen genomic DNA in the products were further determined via real-time PCR with a pair of primers specific for the β-tubulin gene of P. striiformis f. sp. tritici. The amplified fragment length polymorphism (AFLP) fingerprints were also compared between the DNA products. The results demonstrated that adequate genomic DNA at fragment size larger than 23 Kb could be amplified from 20 to 30 urediospores via the MDA method. The real-time PCR results suggested that although fresh urediospores collected from diseased leaves were the best, spores picked from diseased leaves stored for a prolonged period could also be used for amplification. AFLP fingerprints exhibited no significant differences between amplified DNA and DNA extracted with the CTAB method, suggesting amplified DNA can represent the pathogen's genomic DNA very well. Therefore, MDA could be used to obtain genomic DNA from small precious samples (dozens of spores) for molecular genetic analysis of the wheat stripe rust pathogen and other fungi that are difficult to propagate.

  19. Bacterial genes in the aphid genome: absence of functional gene transfer from Buchnera to its host.

    Directory of Open Access Journals (Sweden)

    Naruo Nikoh

    2010-02-01

    Genome reduction is typical of obligate symbionts. In cellular organelles, this reduction partly reflects transfer of ancestral bacterial genes to the host genome, but little is known about gene transfer in other obligate symbioses. Aphids harbor anciently acquired obligate mutualists, Buchnera aphidicola (Gammaproteobacteria), which have highly reduced genomes (420-650 kb), raising the possibility of gene transfer from ancestral Buchnera to the aphid genome. In addition, aphids often harbor other bacteria that also are potential sources of transferred genes. Previous limited sampling of genes expressed in bacteriocytes, the specialized cells that harbor Buchnera, revealed that aphids acquired at least two genes from bacteria. The newly sequenced genome of the pea aphid, Acyrthosiphon pisum, presents the first opportunity for a complete inventory of genes transferred from bacteria to the host genome in the context of an ancient obligate symbiosis. Computational screening of the entire A. pisum genome, followed by phylogenetic and experimental analyses, provided strong support for the transfer of 12 genes or gene fragments from bacteria to the aphid genome: three LD-carboxypeptidases (LdcA1, LdcA2, psiLdcA), five rare lipoprotein As (RlpA1-5), N-acetylmuramoyl-L-alanine amidase (AmiD), 1,4-beta-N-acetylmuramidase (bLys), DNA polymerase III alpha chain (psiDnaE), and ATP synthase delta chain (psiAtpH). Buchnera was the apparent source of two highly truncated pseudogenes (psiDnaE and psiAtpH). Most other transferred genes were closely related to genes from relatives of Wolbachia (Alphaproteobacteria). At least eight of the transferred genes (LdcA1, AmiD, RlpA1-5, bLys) appear to be functional, and expression of seven (LdcA1, AmiD, RlpA1-5) is highly upregulated in bacteriocytes. The LdcAs and RlpAs appear to have been duplicated after transfer. Our results excluded the hypothesis that genome reduction in Buchnera has been accompanied by gene transfer to the

  20. Gene discovery in the Acanthamoeba castellanii genome

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain J.; Watkins, Russell F.; Samuelson, John; Spencer, David F.; Majoros, William H.; Gray, Michael W.; Loftus, Brendan J.

    2005-08-01

    Acanthamoeba castellanii is a free-living amoeba found in soil, freshwater, and marine environments and an important predator of bacteria. Acanthamoeba castellanii is also an opportunistic pathogen of clinical interest, responsible for several distinct diseases in humans. In order to provide a genomic platform for the study of this ubiquitous and important protist, we generated a sequence survey of approximately 0.5× coverage of the genome. The data predict that A. castellanii exhibits a greater biosynthetic capacity than the free-living Dictyostelium discoideum and the parasite Entamoeba histolytica, providing an explanation for the ability of A. castellanii to inhabit a diversity of environments. Alginate lyase may provide access to bacteria within biofilms by breaking down the biofilm matrix, and polyhydroxybutyrate depolymerase may facilitate utilization of the bacterial storage compound polyhydroxybutyrate as a food source. Enzymes for the synthesis and breakdown of cellulose were identified, and they likely participate in encystation and excystation as in D. discoideum. Trehalose-6-phosphate synthase is present, suggesting that trehalose plays a role in stress adaptation. Detection of and response to a number of stress conditions are likely accomplished with a large set of signal transduction histidine kinases and a set of putative receptor serine/threonine kinases similar to those found in E. histolytica. Serine, cysteine and metalloproteases were identified, some of which are likely involved in pathogenicity.

  1. Genome-wide analysis of homeobox genes from Mesobuthus martensii reveals Hox gene duplication in scorpions.

    Science.gov (United States)

    Di, Zhiyong; Yu, Yao; Wu, Yingliang; Hao, Pei; He, Yawen; Zhao, Huabin; Li, Yixue; Zhao, Guoping; Li, Xuan; Li, Wenxin; Cao, Zhijian

    2015-06-01

    Homeobox genes form a large gene group encoding the well-known DNA-binding homeodomain, which plays a key role in development and cellular differentiation during embryogenesis in animals. Here, one hundred forty-nine homeobox genes were identified from the Asian scorpion, Mesobuthus martensii (Chelicerata: Arachnida: Scorpiones: Buthidae) based on our newly assembled genome sequence with approximately 248× coverage. The identified homeobox genes were categorized into eight classes including 82 families: 67 ANTP class genes, 33 PRD genes, 11 LIM genes, five POU genes, six SINE genes, 14 TALE genes, five CUT genes, two ZF genes and six unclassified genes. Transcriptome data confirmed that more than half of the genes were expressed in adults. The homeobox gene diversity of the eight classes is similar to that of the previously analyzed Mandibulata arthropods. Interestingly, it is hypothesized that the scorpion M. martensii may have two Hox clusters. The first complete genome-wide analysis of homeobox genes in Chelicerata not only reveals the repertoire of scorpion, arachnid and chelicerate homeobox genes, but also offers insights into the evolution of arthropod homeobox genes.

  2. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we identified the conserved essential gene set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

  3. In-silico human genomics with GeneCards

    Directory of Open Access Journals (Sweden)

    Stelzer Gil

    2011-10-01

    Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on in-house and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.

  4. The genome BLASTatlas-a GeneWiz extension for visualization of whole-genome homology.

    Science.gov (United States)

    Hallin, Peter F; Binnewies, Tim T; Ussery, David W

    2008-05-01

    The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context of regions. Additional information can be added to these plots, and as an example we have added circles showing the probability of the DNA helix opening up under superhelical tension. The tool is SOAP compliant and WSDL (web services description language) files are located on our website: (http://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence enabling automation of repeated tasks. This tool can be relevant in many pangenomic as well as in metagenomic studies, by giving a quick overview of clusters of insertion sites, genomic islands and overall homology between a reference sequence and a data set.

  5. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    Energy Technology Data Exchange (ETDEWEB)

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  6. Daysleeper : from genomic parasite to indispensable gene

    NARCIS (Netherlands)

    Knip, Marijn

    2012-01-01

    In this thesis the evolutionary background, function and localization of the domesticated transposase DAYSLEEPER are described. We found that DAYSLEEPER-like genes can be found in angiosperms, but not in lower plants. We also found that DAYSLEEPER interacts with several proteins and is probably

  7. Genome Binding and Gene Regulation by Stem Cell Transcription Factors

    NARCIS (Netherlands)

    J.H. Brandsma (Johan)

    2016-01-01

    Nearly all cells of an individual organism contain the same genome. However, each cell type transcribes a different set of genes due to the presence of different sets of cell type-specific transcription factors. Such transcription factors bind to regulatory regions such as promoters

  8. Gene hunting : molecular analysis of the chicken genome

    NARCIS (Netherlands)

    Crooijmans, R.P.M.A.

    2000-01-01

    This dissertation describes the development of molecular tools to identify genes that are involved in production and health traits in poultry. To unravel the chicken genome, fluorescent molecular markers (microsatellite markers) were developed and optimized to perform high throughput screening of re

  9. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....

  10. Re-Examining the Gene in Personalized Genomics

    Science.gov (United States)

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  11. Infectious bronchitis viruses with naturally occurring genomic rearrangement and gene deletion.

    Science.gov (United States)

    Hewson, Kylie A; Ignjatovic, Jagoda; Browning, Glenn F; Devlin, Joanne M; Noormohammadi, Amir H

    2011-02-01

    Infectious bronchitis viruses (IBVs) are group III coronaviruses that infect poultry worldwide. Genetic variations, including whole-gene deletions, are key to IBV evolution. Australian subgroup 2 IBVs contain sequence insertions and multiple gene deletions that have resulted in a substantial genomic divergence from international IBVs. The genomic variations present in Australian IBVs were investigated and compared to those of another group III coronavirus, turkey coronavirus (TCoV). Open reading frames (ORFs) found throughout the genome of Australian IBVs were analogous in sequence and position to TCoV ORFs, except for ORF 4b, which appeared to be translocated to a different position in the subgroup 2 strains. Subgroup 2 strains were previously reported to lack genes 3a, 3b and 5a, with some also lacking 5b. Of these, however, genes 3b and 5b were found to be present but contained various mutations that may affect transcription. In this study, it was found that subgroup 2 IBVs have undergone more substantial genomic rearrangements than previously thought.

  12. RNA-guided genome editing for target gene mutations in wheat.

    Science.gov (United States)

    Upadhyay, Santosh Kumar; Kumar, Jitesh; Alok, Anshu; Tuli, Rakesh

    2013-12-09

    The clustered, regularly interspaced, short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system has been used as an efficient tool for genome editing. We report the application of CRISPR-Cas-mediated genome editing to wheat (Triticum aestivum), the most important food crop plant with a very large and complex genome. The mutations were targeted in the inositol oxygenase (inox) and phytoene desaturase (pds) genes using cell suspension culture of wheat and in the pds gene in leaves of Nicotiana benthamiana. The expression of chimeric guide RNAs (cgRNA) targeting single and multiple sites resulted in indel mutations in all the tested samples. The expression of Cas9 or sgRNA alone did not cause any mutation. The expression of duplex cgRNA with Cas9 targeting two sites in the same gene resulted in deletion of DNA fragment between the targeted sequences. Multiplexing the cgRNA could target two genes at one time. Target specificity analysis of cgRNA showed that mismatches at the 3' end of the target site abolished the cleavage activity completely. The mismatches at the 5' end reduced cleavage, suggesting that the off target effects can be abolished in vivo by selecting target sites with unique sequences at 3' end. This approach provides a powerful method for genome engineering in plants.
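    The target-site selection idea in the abstract above (prefer targets whose 3' end is unique) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the paper: the function names are hypothetical, the 12-nt length of the 3'-end window is an arbitrary default, only the forward strand is searched, and no PAM check is performed.

```python
from typing import List

SEED_LEN = 12  # assumed size of the 3'-end window; not a value from the abstract


def count_occurrences(genome: str, motif: str) -> int:
    """Count overlapping occurrences of motif on the forward strand only."""
    count = 0
    start = genome.find(motif)
    while start != -1:
        count += 1
        start = genome.find(motif, start + 1)
    return count


def unique_3prime_targets(genome: str, candidates: List[str],
                          seed_len: int = SEED_LEN) -> List[str]:
    """Keep candidate target sites whose 3'-end window occurs exactly once."""
    genome = genome.upper()
    selected = []
    for target in candidates:
        seed = target.upper()[-seed_len:]  # last seed_len bases = 3' end
        if count_occurrences(genome, seed) == 1:
            selected.append(target)
    return selected


if __name__ == "__main__":
    # Toy sequence: the window GGGG occurs twice, ACGT occurs once.
    toy_genome = "AAAACCCCGGGGTTTTACGTGGGG"
    print(unique_3prime_targets(toy_genome, ["CCCCGGGG", "TTTTACGT"], seed_len=4))
```

    A realistic screen would also need to search the reverse complement and tolerate mismatches; this sketch only shows the exact-match uniqueness filter.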

  13. Cyanobacterial ribosomal RNA genes with multiple, endonuclease-encoding group I introns

    Directory of Open Access Journals (Sweden)

    Turner Seán

    2007-09-01

    Background Group I introns are one of the four major classes of introns as defined by their distinct splicing mechanisms. Because they catalyze their own removal from precursor transcripts, group I introns are referred to as autocatalytic introns. Group I introns are common in fungal and protist nuclear ribosomal RNA genes and in organellar genomes. In contrast, they are rare in all other organisms and genomes, including bacteria. Results Here we report five group I introns, each containing a LAGLIDADG homing endonuclease gene (HEG), in large subunit (LSU) rRNA genes of cyanobacteria. Three of the introns are located in the LSU gene of Synechococcus sp. C9, and the other two are in the LSU gene of Synechococcus lividus strain C1. Phylogenetic analyses show that these introns and their HEGs are closely related to introns and HEGs located at homologous insertion sites in organellar and bacterial rDNA genes. We also present a compilation of group I introns with homing endonuclease genes in bacteria. Conclusion We have discovered multiple HEG-containing group I introns in a single bacterial gene. To our knowledge, these are the first cases of multiple group I introns in the same bacterial gene (multiple group I introns have been reported in at least one phage gene and one prophage gene). The HEGs each contain one copy of the LAGLIDADG motif and presumably function as homodimers. Phylogenetic analysis, in conjunction with their patchy taxonomic distribution, suggests that these intron-HEG elements have been transferred horizontally among organelles and bacteria. However, the mode of transfer and the nature of the biological connections among the intron-containing organisms are unknown.

  14. Methods for monitoring multiple gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Berka, Randy (Davis, CA); Bachkirova, Elena (Davis, CA); Rey, Michael (Davis, CA)

    2012-05-01

    The present invention relates to methods for monitoring differential expression of a plurality of genes in a first filamentous fungal cell relative to expression of the same genes in one or more second filamentous fungal cells using microarrays containing Trichoderma reesei ESTs or SSH clones, or a combination thereof. The present invention also relates to computer readable media and substrates containing such array features for monitoring expression of a plurality of genes in filamentous fungal cells.

  15. Methods for monitoring multiple gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Berka, Randy; Bachkirova, Elena; Rey, Michael

    2013-10-01

    The present invention relates to methods for monitoring differential expression of a plurality of genes in a first filamentous fungal cell relative to expression of the same genes in one or more second filamentous fungal cells using microarrays containing Trichoderma reesei ESTs or SSH clones, or a combination thereof. The present invention also relates to computer readable media and substrates containing such array features for monitoring expression of a plurality of genes in filamentous fungal cells.

  16. Methods for monitoring multiple gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Berka, Randy [Davis, CA]; Bachkirova, Elena [Davis, CA]; Rey, Michael [Davis, CA]

    2012-05-01

    The present invention relates to methods for monitoring differential expression of a plurality of genes in a first filamentous fungal cell relative to expression of the same genes in one or more second filamentous fungal cells using microarrays containing Trichoderma reesei ESTs or SSH clones, or a combination thereof. The present invention also relates to computer readable media and substrates containing such array features for monitoring expression of a plurality of genes in filamentous fungal cells.

  17. Methods for monitoring multiple gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Berka, Randy; Bachkirova, Elena; Rey, Michael

    2013-10-01

    The present invention relates to methods for monitoring differential expression of a plurality of genes in a first filamentous fungal cell relative to expression of the same genes in one or more second filamentous fungal cells using microarrays containing Trichoderma reesei ESTs or SSH clones, or a combination thereof. The present invention also relates to computer readable media and substrates containing such array features for monitoring expression of a plurality of genes in filamentous fungal cells.

  18. Two complete mitochondrial genomes from Praticolella mexicana Perez, 2011 (Polygyridae) and gene order evolution in Helicoidea (Mollusca, Gastropoda)

    Science.gov (United States)

    Minton, Russell L.; Cruz, Marco A. Martinez; Farman, Mark L.; Perez, Kathryn E.

    2016-01-01

    Helicoidea is a diverse group of land snails with a global distribution. While much is known regarding the relationships of helicoid taxa, comparatively little is known about the evolution of the mitochondrial genome in the superfamily. We sequenced two complete mitochondrial genomes from Praticolella mexicana Perez, 2011 representing the first such data from the helicoid family Polygyridae, and used them in an evolutionary analysis of mitogenomic gene order. We found the mitochondrial genome of Praticolella mexicana to be 14,008 bp in size, possessing the typical 37 metazoan genes. Multiple alternate stop codons are used, as are incomplete stop codons. Mitogenome size and nucleotide content are consistent with other helicoid species. Our analysis of gene order suggested that Helicoidea has undergone four mitochondrial rearrangements in the past. Two rearrangements were limited to tRNA genes only, and two involved protein coding genes. PMID:27833437

  19. A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis

    Directory of Open Access Journals (Sweden)

    Mezey Jason G

    2010-01-01

    Background The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability. Results V-Bay provides a novel solution to the computational scaling constraints of most multiple locus methods and can complete a simultaneous analysis of a million genetic markers in a few hours on a desktop computer. Using a range of simulated genetic and GWA experimental scenarios, we demonstrate that V-Bay is highly accurate, and reliably identifies associations that are too weak to be discovered by single-marker testing approaches. V-Bay can also outperform a multiple locus analysis method based on the lasso, which has similar scaling properties for large numbers of genetic markers. For demonstration purposes, we also use V-Bay to confirm associations with gene expression in cell lines derived from the Phase II individuals of HapMap. Conclusions V-Bay is a versatile, fast, and accurate multiple locus GWA analysis tool for the practitioner interested in identifying weaker associations without high false positive rates.

  20. The evolution of chloroplast genes and genomes in ferns.

    Science.gov (United States)

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (a similar order that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  1. Comparative genome-scale analysis of niche-based stress-responsive genes in Lactobacillus helveticus strains.

    Science.gov (United States)

    Senan, Suja; Prajapati, Jashbhai B; Joshi, Chaitanya G

    2014-04-01

    Next generation sequencing technologies with advanced bioinformatic tools present a unique opportunity to compare genomes from diverse niches. The identification of niche-specific stress-responsive genes can help in characterizing robust strains for multiple applications. In this study, we attempted to compare the stress-responsive genes of a potential probiotic strain, Lactobacillus helveticus MTCC 5463, and a cheese starter strain, Lactobacillus helveticus DPC 4571, from a gut and dairy niche, respectively. Sequencing of MTCC 5463 was done using 454 GS FLX, and contigs were assembled using GS Assembler software. Genome analysis was done using BLAST hits and the prokaryotic annotation server RAST. The MTCC 5463 genome carried multiple orthologs of genes governing stress responses, whereas the DPC 4571 genome carried fewer of the major stress-response proteins. The absence of the bile salt hydrolase gene in DPC 4571 and its presence in MTCC 5463 clearly indicated niche adaptation. Further, MTCC 5463 carried higher copy numbers of genes contributing towards heat, cold, osmotic, and oxidative stress resistance as compared with DPC 4571. Through comparative genomics, we could thus identify stress-responsive gene sets required to adapt to gut and dairy niches.

  2. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  3. The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion

    Energy Technology Data Exchange (ETDEWEB)

    Coleman, J.J.; Rounsley, S.D.; Rodriguez-Carres, M.; Kuo, A.; Wasmann, C.C.; Grimwood, J.; Schmutz, J.; Taga, M.; White, G.J.; Zhuo, S.; Schwartz, D.C.; Freitag, M.; Ma, L.-J.; Danchin, E.G.J.; Henrissat, B.; Cutinho, P.M.; Nelson, D.R.; Straney, D.; Napoli, C.A.; Baker, B.M.; Gribskov, M.; Rep, M.; Kroken, S.; Molnar, I.; Rensing, C.; Kennell, J.C.; Zamora, J.; Farman, M.L.; Selker, E.U.; Salamov, A.; Shapiro, H.; Pangilinan, J.; Lindquist, E.; Lamers, C.; Grigoriev, I.V.; Geiser, D.M.; Covert, S.F.; Temporini, S.; VanEtten, H.D.

    2009-04-20

    The ascomycetous fungus Nectria haematococca, (asexual name Fusarium solani), is a member of a group of >50 species known as the "Fusarium solani species complex". Members of this complex have diverse biological properties including the ability to cause disease on >100 genera of plants and opportunistic infections in humans. The current research analyzed the most extensively studied member of this complex, N. haematococca mating population VI (MPVI). Several genes controlling the ability of individual isolates of this species to colonize specific habitats are located on supernumerary chromosomes. Optical mapping revealed that the sequenced isolate has 17 chromosomes ranging from 530 kb to 6.52 Mb and that the physical size of the genome, 54.43 Mb, and the number of predicted genes, 15,707, are among the largest reported for ascomycetes. Two classes of genes have contributed to gene expansion: specific genes that are not found in other fungi including its closest sequenced relative, Fusarium graminearum; and genes that commonly occur as single copies in other fungi but are present as multiple copies in N. haematococca MPVI. Some of these additional genes appear to have resulted from gene duplication events, while others may have been acquired through horizontal gene transfer. The supernumerary nature of three chromosomes, 14, 15, and 17, was confirmed by their absence in pulsed field gel electrophoresis experiments of some isolates and by demonstrating that these isolates lacked chromosome-specific sequences found on the ends of these chromosomes. These supernumerary chromosomes contain more repeat sequences, are enriched in unique and duplicated genes, and have a lower G+C content in comparison to the other chromosomes. Although the origin(s) of the extra genes and the supernumerary chromosomes is not known, the gene expansion and its large genome size are consistent with this species' diverse range of habitats. Furthermore, the presence of unique genes on

  4. The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion.

    Directory of Open Access Journals (Sweden)

    Jeffrey J Coleman

    2009-08-01

    The ascomycetous fungus Nectria haematococca (asexual name Fusarium solani) is a member of a group of >50 species known as the "Fusarium solani species complex". Members of this complex have diverse biological properties including the ability to cause disease on >100 genera of plants and opportunistic infections in humans. The current research analyzed the most extensively studied member of this complex, N. haematococca mating population VI (MPVI). Several genes controlling the ability of individual isolates of this species to colonize specific habitats are located on supernumerary chromosomes. Optical mapping revealed that the sequenced isolate has 17 chromosomes ranging from 530 kb to 6.52 Mb and that the physical size of the genome, 54.43 Mb, and the number of predicted genes, 15,707, are among the largest reported for ascomycetes. Two classes of genes have contributed to gene expansion: specific genes that are not found in other fungi including its closest sequenced relative, Fusarium graminearum; and genes that commonly occur as single copies in other fungi but are present as multiple copies in N. haematococca MPVI. Some of these additional genes appear to have resulted from gene duplication events, while others may have been acquired through horizontal gene transfer. The supernumerary nature of three chromosomes, 14, 15, and 17, was confirmed by their absence in pulsed field gel electrophoresis experiments of some isolates and by demonstrating that these isolates lacked chromosome-specific sequences found on the ends of these chromosomes. These supernumerary chromosomes contain more repeat sequences, are enriched in unique and duplicated genes, and have a lower G+C content in comparison to the other chromosomes. Although the origin(s) of the extra genes and the supernumerary chromosomes is not known, the gene expansion and its large genome size are consistent with this species' diverse range of habitats. Furthermore, the presence of unique

  5. MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.

    Science.gov (United States)

    Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin

    2015-04-01

    Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
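
    The abstract names an extended Simes procedure but does not spell it out. As a rough illustration only, the sketch below shows the classical Simes combination of m p-values (the minimum over ranks i of m * p_(i) / i), which is the test that the extension builds on. The function name `simes_p` and the input values are hypothetical, and the extension used by MGAS is not reproduced here.

```python
def simes_p(p_values):
    """Combine p-values with the classical Simes test.

    Returns min over ranks i of m * p_(i) / i, capped at 1.0, where p_(i) is
    the i-th smallest of the m input p-values. This is the plain Simes test,
    not the extended procedure used by MGAS.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    m = len(p_values)
    ordered = sorted(p_values)
    combined = min(m * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)


# Hypothetical per-SNP p-values for one gene.
gene_p = simes_p([0.01, 0.04, 0.03])
```

    For these three illustrative values the smallest p-value dominates, so the combined value is 3 * 0.01 / 1 = 0.03.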

  6. An enigmatic fourth runt domain gene in the fugu genome: ancestral gene loss versus accelerated evolution

    Directory of Open Access Journals (Sweden)

    Hood Leroy

    2004-11-01

    Background: The runt domain transcription factors are key regulators of developmental processes in bilaterians, involved both in cell proliferation and differentiation, and their disruption usually leads to disease. Three runt domain genes have been described in each vertebrate genome (the RUNX gene family), but only one in other chordates. Therefore, the common ancestor of vertebrates has been thought to have had a single runt domain gene. Results: Analysis of the genome draft of the fugu pufferfish (Takifugu rubripes) reveals the existence of a fourth runt domain gene, FrRUNT, in addition to the orthologs of human RUNX1, RUNX2 and RUNX3. The tiny FrRUNT packs six exons and two putative promoters in just 3 kb of genomic sequence. The first exon is located within an intron of FrSUPT3H, the ortholog of human SUPT3H, and the first exon of FrSUPT3H resides within the first intron of FrRUNT. The two gene structures are therefore "interlocked". In the human genome, SUPT3H is instead interlocked with RUNX2. FrRUNT has no detectable ortholog in the genomes of mammals, birds or amphibians. We consider alternative explanations for an apparent contradiction between the phylogenetic data and the comparison of the genomic neighborhoods of human and fugu runt domain genes. We hypothesize that an ancient RUNT locus was lost in the tetrapod lineage, together with FrFSTL6, a member of a novel family of follistatin-like genes. Conclusions: Our results suggest that the runt domain family may have started expanding in chordates much earlier than previously thought, and exemplify the importance of detailed analysis of whole-genome draft sequence to provide new insights into gene evolution.

  7. Identification of genes that are essential to restrict genome duplication to once per cell division

    Science.gov (United States)

    Vassilev, Alex; Lee, Chrissie Y.; Vassilev, Boris; Zhu, Wenge; Ormanoglu, Pinar; Martin, Scott E.; DePamphilis, Melvin L.

    2016-01-01

    Nuclear genome duplication is normally restricted to once per cell division, but aberrant events that allow excess DNA replication (EDR) promote genomic instability and aneuploidy, both of which are characteristics of cancer development. Here we provide the first comprehensive identification of genes that are essential to restrict genome duplication to once per cell division. An siRNA library of 21,584 human genes was screened for those that prevent EDR in cancer cells with undetectable chromosomal instability. Candidates were validated by testing multiple siRNAs and chemical inhibitors on both TP53+ and TP53- cells to reveal the relevance of this ubiquitous tumor suppressor to preventing EDR, and in the presence of an apoptosis inhibitor to reveal the full extent of EDR. The results revealed 42 genes that prevented either DNA re-replication or unscheduled endoreplication. All of them participate in one or more of eight cell cycle events. Seventeen of them have not been identified previously in this capacity. Remarkably, 14 of the 42 genes have been shown to prevent aneuploidy in mice. Moreover, suppressing a gene that prevents EDR increased the ability of the chemotherapeutic drug Paclitaxel to induce EDR, suggesting new opportunities for synthetic lethalities in the treatment of human cancers. PMID:27144335

  8. Differential differences in methylation status of putative imprinted genes among cloned swine genomes.

    Directory of Open Access Journals (Sweden)

    Chih-Jie Shen

    DNA methylation is a major epigenetic modification in the mammalian genome that regulates crucial aspects of gene function. Mammalian cloning by somatic cell nuclear transfer (SCNT) often results in gestational or neonatal failure with only a small proportion of manipulated embryos producing live births. Many of the embryos that survive to term later succumb to a variety of abnormalities that are likely due to inappropriate epigenetic reprogramming. Aberrant methylation patterns of imprinted genes in cloned cattle and mice have been elucidated, but few reports have analyzed the cloned pig genome. Four surviving cloned sows that were created by ear fibroblast nuclear transfer, each with a different life span and multiple organ defects, such as heart defects and bone growth delay, were used as epigenetic study materials. First, we identified four putative differential methylation regions (DMR) of imprinted genes in the wild-type pig genome, including two maternally imprinted loci (INS and IGF2) and two paternally imprinted loci (H19 and IGF2R). Aberrant DNA methylation, either hypermethylation or hypomethylation, commonly appeared in H19 (45% of imprinted loci hypermethylated vs. 30% hypomethylated), IGF2 (40% vs. 0%), INS (50% vs. 5%), and IGF2R (15% vs. 45%) in multiple tissues from these four cloned sows compared with wild-type pigs. Our data suggest that aberrant epigenetic modifications occur frequently in the genome of cloned swine. Even with successful production of cloned swine that avoid prenatal or postnatal death, the perturbation of methylation in imprinted genes still exists, which may be one of the reasons for their adult pathologies and short life. Understanding the aberrant pattern of gene imprinting would permit improvements in future cloning techniques.

  9. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome

    DEFF Research Database (Denmark)

    Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui

    2011-01-01

    A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries...... affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations....... these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination...

  10. High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

    Directory of Open Access Journals (Sweden)

    Pappas Georgios J

    2008-06-01

    Background: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effectively the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation. Results: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection, in agreement with previous research. Conclusion: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this

  11. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    Science.gov (United States)

    Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

    2016-01-01

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, an individual source of genomic data tends to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated on experimentally verified miRNA-disease associations, achieving an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective on the molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.
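
    The abstract reports performance as an area under the ROC curve. As a reminder of what that number measures, the sketch below computes AUC from prediction scores as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The function name `auc` and the scores are hypothetical; this is not the CHNmiRD evaluation code, and the cross-validation split is left out.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is scored
    above a randomly chosen negative one (ties count as one half).
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four candidate miRNA-disease pairs (1 = known association).
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

    In a k-fold setting one would compute this on the held-out fold of each split and average the results.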

  12. Origin of multiple periodicities in the Fourier power spectra of the Plasmodium falciparum genome

    Directory of Open Access Journals (Sweden)

    Nunes Miriam CS

    2011-12-01

    Background: Fourier transforms and their associated power spectra are used for detecting periodicities and protein-coding genes and are generally regarded as a well-established technique. Many of the periodicities which have been found with this method are quite well understood, such as the periodicity of 3 nt which is associated with codon usage. But what is the origin of the peculiar frequency multiples k/21 which were reported for a tiny section of chromosome 2 in P. falciparum? Are these present in other chromosomes and perhaps in related organisms? And how should we interpret fractional periodicities in genomes? Results: We applied the binary indicator power spectrum to all chromosomes of P. falciparum, and found that the frequency overtones k/21 are present only in non-coding sections. We did not find such frequency overtones in any other related genomes. Furthermore, the frequency overtones were identified as artifacts of the way the genome is encoded into a numerical sequence, that is, they are frequency aliases. By choosing a different way to encode the sequence the overtones do not appear. In view of these results, we revisited early applications of this technique to proteins where frequency overtones were reported. Conclusions: Some authors hinted recently at the possibility of mapping artifacts and frequency aliases in power spectra. However, in the case of P. falciparum the frequency aliases are particularly strong and can mask the 1/3 frequency which is used for gene detection. This shows that, despite this being a well-known technique with a long history of application in proteins, few researchers seem to be aware of the problems represented by frequency aliases.
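
    For readers unfamiliar with the binary indicator power spectrum mentioned above, here is a minimal sketch under simple assumptions: each of the four bases is mapped to a 0/1 indicator sequence, the discrete Fourier transform of each indicator is evaluated at one frequency, and the squared magnitudes are summed. The function name `indicator_power`, the normalization by sequence length, and the toy input are illustrative choices, not taken from the paper, and the sketch does not reproduce the aliasing analysis.

```python
import cmath


def indicator_power(seq, freq):
    """Binary indicator power of a DNA string at `freq` (cycles per nucleotide).

    For each base, sums exp(-2*pi*i*freq*pos) over the positions where that
    base occurs, then adds up the squared magnitudes across A, C, G and T.
    The result is divided by the sequence length (an illustrative choice).
    """
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * freq * pos)
            for pos, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total / len(seq)


# Toy sequence with an exact 3-nt repeat: strong power at 1/3, none at 1/5.
toy = "ATG" * 30
power_at_third = indicator_power(toy, 1 / 3)
power_at_fifth = indicator_power(toy, 1 / 5)
```

    On real coding sequence the 1/3 peak is far weaker than in this toy repeat, since it reflects only a statistical bias in base usage across codon positions.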

  13. Genome-wide identification of new Wnt/β-catenin target genes in the human genome using CART method

    Directory of Open Access Journals (Sweden)

    Inestrosa Nibaldo C

    2010-06-01

    Background: The importance of in silico predictions for understanding cellular processes is now widely accepted, and a variety of algorithms useful for studying different biological features have been designed. In particular, the prediction of cis regulatory modules in non-coding human genome regions represents a major challenge for understanding gene regulation in several diseases. Recently, studies of the Wnt signaling pathway revealed a connection with neurodegenerative diseases such as Alzheimer's. In this article, we construct a classification tool that uses the transcription factor binding site motifs composition of some gene promoters to identify new Wnt/β-catenin pathway target genes potentially involved in brain diseases. Results: In this study, we propose 89 new Wnt/β-catenin pathway target genes predicted in silico by using a method based on multiple Classification and Regression Tree (CART) analysis. We used as decision variables the presence of transcription factor binding site motifs in the upstream region of each gene. This prediction was validated by RT-qPCR in a sample of 9 genes. As expected, LEF1, a member of the T-cell factor/lymphoid enhancer-binding factor family (TCF/LEF1), was relevant for the classification algorithm and, remarkably, other factors related directly or indirectly to the inflammatory response and amyloidogenic processes also appeared to be relevant for the classification. Among the 89 new Wnt/β-catenin pathway targets, we found a group expressed in brain tissue that could be involved in diverse responses to neurodegenerative diseases, like Alzheimer's disease (AD). These genes represent new candidates to protect cells against amyloid β toxicity, in agreement with the proposed neuroprotective role of the Wnt signaling pathway. Conclusions: Our multiple CART strategy proved to be an effective tool to identify new Wnt/β-catenin pathway targets based on the study of their regulatory regions in the human

  14. webMGR: an online tool for the multiple genome rearrangement problem.

    Science.gov (United States)

    Lin, Chi Ho; Zhao, Hao; Lowcay, Sean Harry; Shahab, Atif; Bourque, Guillaume

    2010-02-01

    The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results. webMGR can be accessed via http://www.gis.a-star.edu.sg/~bourque. The source code of the improved standalone version of MGR is also freely available from the web site. Supplementary data are available at Bioinformatics online.

  15. Specific amplification by PCR of rearranged genomic variable regions of immunoglobulin genes from mouse hybridoma cells.

    Science.gov (United States)

    Berdoz, J; Monath, T P; Kraehenbuhl, J P

    1995-04-01

    We have designed a novel strategy for the isolation of the rearranged genomic fragments encoding the L-VH-D-JH and L-V kappa/lambda-J kappa/lambda regions of mouse immunoglobulin genes. This strategy is based on the PCR amplification of genomic DNA from mouse hybridomas using multiple specific primers chosen in the 5'-untranslated region and in the intron downstream of the rearranged JH/J kappa/lambda sequences. Variable regions with intact coding sequences, including full-length leader peptides (L) can be obtained without previous DNA sequencing. Our strategy is based on a genomic template that produces fragments that do not need to be adapted for recombinant antibody expression, thus facilitating the generation of chimeric and isotype-switched immunoglobulins.

  16. Genome-Wide Analysis of BURP Domain-Containing Genes in Populus trichocarpa

    Institute of Scientific and Technical Information of China (English)

    Yuanhua Shao; Guo Wei; Ling Wang; Qing Dong; Yang Zhao; Beijiu Chen; Yan Xiang

    2011-01-01

    BURP domain-containing proteins have a conserved structure and are found extensively in plants. The functions of the proteins in this family are diverse, but remain unknown in Populus trichocarpa. In the present study, a complete genome of P. trichocarpa was analyzed bioinformatically. A total of 18 BURP family genes, named PtBURPs, were identified and characterized according to their physical positions on the P. trichocarpa chromosomes. A phylogenetic tree was generated from alignments of PtBURP protein sequences, while phylogenetic relationships were also examined between PtBURPs and BURP family genes in other plants, including rice, soybean, maize and sorghum. BURP genes in P. trichocarpa were classified into five classes, namely PG1β-like, BNM2-like, USP-like, RD22-like and BURP V. The multiple expectation maximization for motif elicitation (MEME) and multiple protein sequence alignments of PtBURPs were also performed. Results from the transcript level analyses of 10 PtBURP genes under different stress conditions revealed the expression patterns in poplar and led to a discussion on genome duplication and evolution, expression profiles and function of PtBURP genes.

  17. Genome-wide functional screen identifies a compendium of genes affecting sensitivity to tamoxifen.

    Science.gov (United States)

    Mendes-Pereira, Ana M; Sims, David; Dexter, Tim; Fenwick, Kerry; Assiotis, Ioannis; Kozarewa, Iwanka; Mitsopoulos, Costas; Hakas, Jarle; Zvelebil, Marketa; Lord, Christopher J; Ashworth, Alan

    2012-02-21

    Therapies that target estrogen signaling have made a very considerable contribution to reducing mortality from breast cancer. However, resistance to tamoxifen remains a major clinical problem. Here we have used a genome-wide functional profiling approach to identify multiple genes that confer resistance or sensitivity to tamoxifen. Combining whole-genome shRNA screening with massively parallel sequencing, we have profiled the impact of more than 56,670 RNA interference reagents targeting 16,487 genes on the cellular response to tamoxifen. This screen, along with subsequent validation experiments, identifies a compendium of genes whose silencing causes tamoxifen resistance (including BAP1, CLPP, GPRC5D, NAE1, NF1, NIPBL, NSD1, RAD21, RARG, SMC3, and UBA3) and also a set of genes whose silencing causes sensitivity to this endocrine agent (C10orf72, C15orf55/NUT, EDF1, ING5, KRAS, NOC3L, PPP1R15B, RRAS2, TMPRSS2, and TPM4). Multiple individual genes, including NF1, a regulator of RAS signaling, also correlate with clinical outcome after tamoxifen treatment.

  18. Prostate cancer risk locus at 8q24 as a regulatory hub by physical interactions with multiple genomic loci across the genome.

    Science.gov (United States)

    Du, Meijun; Yuan, Tiezheng; Schilter, Kala F; Dittmar, Rachel L; Mackinnon, Alexander; Huang, Xiaoyi; Tschannen, Michael; Worthey, Elizabeth; Jacob, Howard; Xia, Shu; Gao, Jianzhong; Tillmans, Lori; Lu, Yan; Liu, Pengyuan; Thibodeau, Stephen N; Wang, Liang

    2015-01-01

    The chromosome 8q24 locus contains regulatory variants that modulate genetic risk for various cancers including prostate cancer (PC). However, the biological mechanism underlying this regulation is not well understood. Here, we developed a chromosome conformation capture (3C)-based multi-target sequencing technology and systematically examined three PC risk regions at the 8q24 locus and their potential regulatory targets across the human genome in six cell lines. We observed frequent physical contacts of this risk locus with multiple genomic regions, in particular, inter-chromosomal interaction with CD96 at 3q13 and intra-chromosomal interaction with MYC at 8q24. We identified at least five interaction hot spots within the predicted functional regulatory elements at the 8q24 risk locus. We also found intra-chromosomal interaction genes PVT1, FAM84B and GSDMC and inter-chromosomal interaction gene CXorf36 in most of the six cell lines. Other gene regions appeared to be cell line-specific, such as RRP12 in LNCaP, USP14 in DU-145 and SMIN3 in lymphoblastoid cell line. We further found that the 8q24 functional domains more likely interacted with genomic regions containing genes enriched in critical pathways such as Wnt signaling and promoter motifs such as E2F1 and TCF3. This result suggests that the risk locus may function as a regulatory hub by physical interactions with multiple genes important for prostate carcinogenesis. Further understanding of the genetic effect and biological mechanism of these chromatin interactions will shed light on the newly discovered regulatory role of the risk locus in PC etiology and progression.

  19. The Complete Mitochondrial Genome of Aleurocanthus camelliae: Insights into Gene Arrangement and Genome Organization within the Family Aleyrodidae.

    Science.gov (United States)

    Chen, Shi-Chun; Wang, Xiao-Qing; Li, Pin-Wu; Hu, Xiang; Wang, Jin-Jun; Peng, Ping

    2016-11-07

    Numerous gene rearrangements and transfer RNA gene absences exist in the mitochondrial (mt) genomes of Aleyrodidae species. To understand how mt genomes evolved in the family Aleyrodidae, we have sequenced the complete mt genome of Aleurocanthus camelliae and comparatively analyzed all reported whitefly mt genomes. The mt genome of A. camelliae is 15,188 bp long, and consists of 13 protein-coding genes, two rRNA genes, 21 tRNA genes and a putative control region (GenBank: KU761949). The tRNA gene, trnI, has not been observed in this genome. The mt genome has a unique gene order and shares most gene boundaries with Tetraleurodes acaciae. Nineteen of 21 tRNA genes have the conventional cloverleaf shaped secondary structure and two (trnS₁ and trnS₂) lack the dihydrouridine (DHU) arm. Using ARWEN and homologous sequence alignment, we have identified five tRNA genes and revised the annotation for three whitefly mt genomes. This result suggests that most of the apparently absent genes exist in the genomes but have not been identified because of limitations in technology and in the sequences available for inference. The phylogenetic relationships among 11 whiteflies and Drosophila melanogaster were inferred by maximum likelihood and Bayesian inference methods. Aleurocanthus camelliae and T. acaciae form a sister group, and all three Bemisia tabaci and two Bemisia afer strains gather together. These results are identical to the relationships inferred from gene order. We inferred that gene rearrangement plays an important role in the evolution of whitefly mt genomes.

  20. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...

  1. Evolutionary maintenance of filovirus-like genes in bat genomes

    Directory of Open Access Journals (Sweden)

    Taylor Derek J

    2011-11-01

    Background: Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results: We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13.4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions: Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appears to be, or have been, functionally maintained

  2. On the representability of complete genomes by multiple competing finite-context (Markov) models.

    Directory of Open Access Journals (Sweden)

    Armando J Pinho

    A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study about the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate inverted repeat handling, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), in contrast with the statistical models underlying other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
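
    The performance measure described above (average negative log probability, in bits per base) can be sketched for a single order-k model as follows. This is a simplified illustration under stated assumptions: counts are updated as the sequence is read, probabilities use additive smoothing with a hypothetical parameter `alpha`, and the input contains only A, C, G and T. The paper's competing models of several orders and its inverted-repeat handling are not reproduced, and the name `bits_per_base` is illustrative.

```python
import math
from collections import defaultdict


def bits_per_base(seq, k, alpha=1.0):
    """Average -log2 P(symbol | previous k symbols) over a DNA string.

    Counts for each length-k context are accumulated while reading, so each
    symbol is scored using only what came before it. `alpha` is an additive
    smoothing constant (an assumption of this sketch).
    """
    alphabet = "ACGT"
    if len(seq) <= k:
        raise ValueError("sequence must be longer than the model order")
    counts = defaultdict(lambda: dict.fromkeys(alphabet, 0))
    total_bits = 0.0
    for i in range(k, len(seq)):
        context, symbol = seq[i - k:i], seq[i]
        seen = counts[context]
        prob = (seen[symbol] + alpha) / (sum(seen.values()) + alpha * len(alphabet))
        total_bits -= math.log2(prob)  # information content at position i
        seen[symbol] += 1
    return total_bits / (len(seq) - k)


# Toy periodic sequence: an order-2 context predicts it far better than order 0.
toy = "ACGT" * 50
order0 = bits_per_base(toy, 0)
order2 = bits_per_base(toy, 2)
```

    Keeping the per-position values of `-math.log2(prob)` instead of summing them would give the kind of information profile the abstract refers to.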

  3. Genomic characterization of large rearrangements of the LDLR gene in Czech patients with familial hypercholesterolemia

    Directory of Open Access Journals (Sweden)

    Fajkus Jiří

    2010-07-01

    Full Text Available Abstract Background Mutations in the LDLR gene are the most frequent cause of familial hypercholesterolemia, an autosomal dominant disease characterised by elevated concentrations of LDL in blood plasma. In many populations, large genomic rearrangements account for approximately 10% of mutations in the LDLR gene. Methods DNA diagnostics of large genomic rearrangements was based on Multiplex Ligation-dependent Probe Amplification (MLPA). Subsequent analyses of deletion and duplication breakpoints were performed using long-range PCR, PCR, and DNA sequencing. Results In a set of 1441 unrelated FH patients, large genomic rearrangements were found in 37 probands. Eight different types of rearrangements were detected, of which 6 were novel. For all rearrangements, we characterized the exact extent and breakpoint sequences. Conclusions Sequence analysis of deletion and duplication breakpoints indicates that intrachromatid non-allelic homologous recombination (NAHR) between Alu elements is involved in 6 events, while non-homologous end joining (NHEJ) is implicated in 2 rearrangements. Our study thus describes for the first time NHEJ as a mechanism involved in genomic rearrangements in the LDLR gene.

  4. Genomic Analyses of Bacterial Porin-Cytochrome Gene Clusters

    Directory of Open Access Journals (Sweden)

    Liang eShi

    2014-11-01

    Full Text Available The porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c-type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.

  5. Unique genomic arrangements in an invasive serotype M23 strain of Streptococcus pyogenes identify genes that induce hypervirulence.

    Science.gov (United States)

    Bao, Yunjuan; Liang, Zhong; Booyjzsen, Claire; Mayfield, Jeffrey A; Li, Yang; Lee, Shaun W; Ploplis, Victoria A; Song, Hui; Castellino, Francis J

    2014-12-01

    The first genome sequence of a group A Streptococcus pyogenes serotype M23 (emm23) strain (M23ND), isolated from an invasive human infection, has been completed. The genome of this opacity factor-negative (SOF(-)) strain is composed of a circular chromosome of 1,846,477 bp. Gene profiling showed that this strain contained six phage-encoded and 24 chromosomally inherited well-known virulence factors, as well as 11 pseudogenes. The bacterium has acquired four large prophage elements, ΦM23ND.1 to ΦM23ND.4, harboring genes encoding streptococcal superantigen (ssa), streptococcal pyrogenic exotoxins (speC, speH, and speI), and DNases (spd1 and spd3), with phage integrase genes being present at one flank of each phage insertion, suggesting that the phages were integrated by horizontal gene transfer. Comparative analyses revealed unique large-scale genomic rearrangements that differ from those of previously sequenced GAS strains. These rearrangements resulted in an imbalanced genomic architecture and translocations of chromosomal virulence genes. The covS sensor in M23ND was identified as a pseudogene, resulting in the attenuation of speB function and increased expression of the genes for the chromosomal virulence factors multiple-gene activator (mga), M protein (emm23), C5a peptidase (scpA), fibronectin-binding proteins (sfbI and fbp54), streptolysin O (slo), hyaluronic acid capsule (hasA), streptokinase (ska), and DNases (spd and spd3), which were verified by PCR. These genes are responsible for facilitating host epithelial cell binding and/or immune evasion, thus further contributing to the virulence of M23ND. In conclusion, strain M23ND has become highly pathogenic as the result of a combination of multiple genetic factors, particularly gene composition and mutations, prophage integrations, unique genomic rearrangements, and regulated expression of critical virulence factors.

  6. Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2006-07-01

    Full Text Available Abstract Background A complete understanding of the regulatory mechanisms of gene expression is the next important issue of genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies used yeast, which has much simpler regulatory networks than human and for which many genome-wide binding data and gene expression data under diverse conditions are available. Studies of genome-wide transcriptional networks of human genomes currently lag behind those of yeast. Results We report herein a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. The Z scores from the application of gene set analysis with gene sets of transcription factor binding sites (TFBSs) were successfully used to represent the activity of TFBSs in a given microarray data set. A significant correlation between the Z scores of gene sets of TFBSs and individual genes across multiple conditions permitted successful identification of many known human transcriptional regulatory elements of genes as well as the prediction of numerous putative TFBSs of many genes, which will constitute a good starting point for further experiments. Using Z scores of gene sets of TFBSs produced better predictions than the use of mRNA levels of a transcription factor itself, suggesting that the Z scores of gene sets of TFBSs better represent diverse mechanisms for changing the activity of transcription factors in the cell. In addition, cis-regulatory modules, combinations of co-acting TFBSs, were readily identified by our analysis. Conclusion By a strategic combination of gene set level analysis of gene expression data sets and promoter analysis, we were able to identify and predict many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding
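
    The two steps named in this abstract, a Z score per TFBS gene set per condition and a correlation of that Z-score profile with an individual gene across conditions, can be sketched as follows. The abstract does not state the Z-score formula, so the sketch assumes a common parametric form, (set mean - global mean) × sqrt(n) / global standard deviation. The function names, the toy expression values and the gene identifiers are all hypothetical.

```python
import math
from statistics import mean, pstdev


def gene_set_z(expr, gene_set):
    """Z score of one gene set in one condition.

    expr: dict mapping gene -> expression value in that condition.
    Assumed formula: (set mean - global mean) * sqrt(n) / global sd.
    """
    values = list(expr.values())
    members = [expr[g] for g in gene_set if g in expr]
    return (mean(members) - mean(values)) * math.sqrt(len(members)) / pstdev(values)


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Toy data: three conditions, four genes (made-up values for illustration).
conditions = [
    {"g1": 2.0, "g2": 1.8, "g3": -1.0, "g4": -0.9},
    {"g1": 0.1, "g2": -0.1, "g3": 0.0, "g4": 0.1},
    {"g1": -1.5, "g2": -1.7, "g3": 1.2, "g4": 1.0},
]
tfbs_targets = {"g1", "g2"}  # hypothetical genes sharing one TFBS

z_profile = [gene_set_z(c, tfbs_targets) for c in conditions]
g1_profile = [c["g1"] for c in conditions]
r = pearson(z_profile, g1_profile)  # high r suggests g1 follows the TFBS activity
```

    In this toy case the gene under test is itself a member of the set, which inflates the correlation; a real analysis would need to decide how to handle that.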

  7. Systematically fragmented genes in a multipartite mitochondrial genome

    Science.gov (United States)

    Vlcek, Cestmir; Marande, William; Teijeiro, Shona; Lukeš, Julius; Burger, Gertraud

    2011-01-01

    Arguably, the most bizarre mitochondrial DNA (mtDNA) is that of the euglenozoan eukaryote Diplonema papillatum. The genome consists of numerous small circular chromosomes, none of which appears to encode a complete gene. For instance, the cox1 coding sequence is spread out over nine different chromosomes in non-overlapping pieces (modules), which are transcribed separately and joined to a contiguous mRNA by trans-splicing. Here, we examine how many genes are encoded by Diplonema mtDNA and whether all are fragmented and their transcripts trans-spliced. Module identification is challenging due to the sequence divergence of Diplonema mitochondrial genes. By employing the most sensitive protein profile search algorithms and comparing genomic with cDNA sequences, we recognize a total of 11 typical mitochondrial genes. The 10 protein-coding genes are systematically chopped up into three to 12 modules of 60–350 bp length. The corresponding mRNAs are all trans-spliced. Identification of ribosomal RNAs is most difficult. So far, we only detect the 3′-module of the large subunit ribosomal RNA (rRNA); it does not trans-splice with other pieces. The small subunit rRNA gene remains elusive. Our results open new intriguing questions about the biochemistry and evolution of mitochondrial trans-splicing in Diplonema. PMID:20935050

  8. Correlation of microsynteny conservation and disease gene distribution in mammalian genomes

    Directory of Open Access Journals (Sweden)

    Li Xiting

    2009-11-01

    Full Text Available Abstract Background With the completion of the whole genome sequence for many organisms, investigations into genomic structure have revealed that gene distribution is variable, and that genes with similar function or expression are located within clusters. This clustering suggests that there are evolutionary constraints that determine genome architecture. However, as most of the evidence for constraints on genome evolution comes from studies on yeast, it is unclear how much of this prior work can be extrapolated to mammalian genomes. Therefore, in this work we wished to examine the constraints on regions of the mammalian genome containing conserved gene clusters. Results We first identified regions of the mouse genome with microsynteny conservation by comparing gene arrangement in the mouse genome to the human, rat, and dog genomes. We then asked if any particular gene types were found preferentially in conserved regions. We found a significant correlation between conserved microsynteny and the density of mouse orthologs of human disease genes, suggesting that disease genes are clustered in genomic regions of increased microsynteny conservation. Conclusion The correlation between microsynteny conservation and disease gene locations indicates that regions of the mouse genome with microsynteny conservation may contain undiscovered human disease genes. This study not only demonstrates that gene function constrains mammalian genome organization, but also identifies regions of the mouse genome that can be experimentally examined to produce mouse models of human disease.

  9. Genome Sequences for Multiple Clavibacter Strains from Different Subspecies.

    Science.gov (United States)

    Li, Xiang Sean; Yuan, Xiaoli Kat

    2017-09-21

    The Gram-positive genus Clavibacter harbors economically important plant pathogens infecting a variety of agricultural crops, such as potato, tomato, corn, and barley. Here, we report five new genome sequences, those of strains CFIA-Cs3N, CFIA-CsR14, LMG 3663(T), LMG 7333(T), and ATCC 33566(T), from different subspecies of Clavibacter michiganensis. All these genomic data will be used for reclassification and niche-adapted feature comparisons. © Crown copyright 2017.

  10. Genome-wide analysis of homeobox gene family in legumes: identification, gene duplication and expression profiling.

    Science.gov (United States)

    Bhattacharjee, Annapurna; Ghangal, Rajesh; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Homeobox genes encode transcription factors that are known to play a major role in different aspects of plant growth and development. In the present study, we identified homeobox genes belonging to 14 different classes in five legume species, including chickpea, soybean, Medicago, Lotus and pigeonpea. The characteristic differences within homeodomain sequences among various classes of the homeobox gene family were quite evident. Genome-wide expression analysis using publicly available datasets (RNA-seq and microarray) indicated that homeobox genes are differentially expressed in various tissues/developmental stages and under stress conditions in different legumes. We validated the differential expression of selected chickpea homeobox genes via quantitative reverse transcription polymerase chain reaction. Genome duplication analysis in soybean indicated that segmental duplication has significantly contributed to the expansion of the homeobox gene family. The Ka/Ks ratio of duplicated homeobox genes in soybean showed that several members of this family have undergone purifying selection. Moreover, expression profiling indicated that duplicated genes might have been retained due to sub-functionalization. The genome-wide identification and comprehensive gene expression profiling of homeobox gene family members in legumes will provide opportunities for functional analysis to unravel their exact role in plant growth and development.
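
    The Ka/Ks reasoning in this abstract follows the usual convention: a ratio below 1 points to purifying selection, a ratio near 1 to neutral evolution, and a ratio above 1 to positive selection. A minimal sketch, assuming Ka and Ks have already been estimated by a separate tool; the function name `selection_regime` and the tolerance `tol` are hypothetical and do not come from the study.

```python
def selection_regime(ka, ks, tol=0.05):
    """Classify a duplicated gene pair by its Ka/Ks ratio.

    ka, ks: nonsynonymous and synonymous substitution rates, assumed to
    have been estimated beforehand.
    tol: hypothetical tolerance around 1 for calling a pair neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return ratio, "purifying"
    if ratio > 1 + tol:
        return ratio, "positive"
    return ratio, "neutral"
```

    For example, `selection_regime(0.1, 0.5)` gives a ratio of 0.2 and the label "purifying".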

  11. A statistical multiprobe model for analyzing cis and trans genes in genetical genomics experiments with short-oligonucleotide arrays

    NARCIS (Netherlands)

    Alberts, Rudi; Terpstra, Peter; Bystrykh, Leonid V.; Haan, Gerald de; Jansen, Ritsert C.

    2005-01-01

    Short-oligonucleotide arrays typically contain multiple probes per gene. In genetical genomics applications a statistical model for the individual probe signals can help in separating “true” differential mRNA expression from “ghost” effects caused by polymorphisms, misdesigned probes, and batch

  12. Comparative genomics of multiple strains of Pseudomonas cannabina pv. alisalensis, a potential model pathogen of both monocots and dicots.

    Directory of Open Access Journals (Sweden)

    Panagiotis F Sarris

    Full Text Available Comparative genomics of closely related pathogens that differ in host range can provide insights into mechanisms of host-pathogen interactions and host adaptation. Furthermore, sequencing of multiple strains with the same host range reveals information concerning pathogen diversity and the molecular basis of virulence. Here we present a comparative analysis of draft genome sequences for four strains of Pseudomonas cannabina pathovar alisalensis (Pcal), which is pathogenic on a range of monocotyledonous and dicotyledonous plants. These draft genome sequences provide a foundation for understanding host range evolution across the monocot-dicot divide. Like other phytopathogenic pseudomonads, Pcal strains harboured a hrp/hrc gene cluster that codes for a type III secretion system. Phylogenetic analysis based on the hrp/hrc cluster genes/proteins suggests localized recombination and functional divergence within the hrp/hrc cluster. Despite significant conservation of overall genetic content across Pcal genomes, comparison of type III effector repertoires reinforced previous molecular data suggesting the existence of two distinct lineages within this pathovar. Furthermore, all Pcal strains analyzed harbored two distinct genomic islands predicted to code for type VI secretion systems (T6SSs). While one of these systems was orthologous to known P. syringae T6SSs, the other more closely resembled a T6SS found within P. aeruginosa. In summary, our study provides a foundation to unravel Pcal adaptation to both monocot and dicot hosts and provides genetic insights into the mechanisms underlying pathogenicity.

  13. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea).

    Science.gov (United States)

    Gao, Feng; Song, Weibo; Katz, Laura A

    2014-08-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.

  14. Alpha tubulin genes from Leishmania braziliensis: genomic organization, gene structure and insights on their expression.

    Science.gov (United States)

    Ramírez, César A; Requena, José M; Puerta, Concepción J

    2013-07-06

    Alpha tubulin is a fundamental component of the cytoskeleton which is responsible for cell shape and is involved in cell division, ciliary and flagellar motility and intracellular transport. Alpha tubulin gene expression varies according to the morphological changes undergone by Leishmania in its life cycle. However, studying the mechanisms responsible for the differential expression has proved difficult due to the complex genome organization of tubulin genes and to the non-conventional mechanisms of gene regulation operating in Leishmania. We started this work by analyzing the genomic organization of α-tubulin genes in the Leishmania braziliensis genome database. The genomic organization of L. braziliensis α-tubulin genes differs from that existing in the L. major and L. infantum genomes. Two loci containing α-tubulin genes were found on chromosomes 13 and 29, although sequence gaps prevent determining the exact number of genes at each locus. Southern blot assays showed that the α-tubulin locus on chromosome 13 contains at least 8 gene copies, which are tandemly organized with a 2.08-kb repetition unit; the locus on chromosome 29 seems to contain a single α-tubulin gene. In addition, it was found that the L. braziliensis α-tubulin locus on chromosome 13 contains two types of α-tubulin genes differing in their 3' UTR, each one presumably containing different regulatory motifs. It was also determined that the mRNA expression levels of these genes are controlled by post-transcriptional mechanisms tightly linked to the growth temperature. Moreover, the decrease in the α-tubulin mRNA abundance observed when promastigotes were cultured at 35°C was accompanied by parasite morphology alterations, similar to those occurring during the promastigote to amastigote differentiation. Information found in the genome databases indicates that α-tubulin genes have been reorganized in a drastic manner along Leishmania

  15. Identification and expression analysis of multiple FRO gene copies in Medicago truncatula.

    Science.gov (United States)

    Del C Orozco-Mosqueda, Ma; Santoyo, G; Farías-Rodríguez, R; Macías-Rodríguez, L; Valencia-Cantero, E

    2012-12-17

    Iron (Fe) is an essential element for plant growth. Commonly, this element is found in an oxidized form in soil, which is poorly available for plants. Therefore, plants have evolved ferric-chelate reductase enzymes (FRO) to reduce iron into a more soluble ferrous form. Fe scarcity in plants induces FRO enzyme activity. Although the legume Medicago truncatula has been employed as a model for FRO activity studies, only one copy of the M. truncatula MtFRO1 gene has been characterized so far. In this study, we identified multiple copies of the MtFRO gene in the genome of M. truncatula by an in silico search, using BLAST analysis in the database of the M. truncatula Genome Sequencing Project and the National Center for Biotechnology Information, and also determined whether they are functional. We identified five genes apart from MtFRO1, which had already been characterized. All of the MtFRO genes exhibited high identity with homologous FRO genes from Lycopersicon esculentum, Citrus junos and Arabidopsis thaliana. The gene copies also presented characteristic conserved FAD and NADPH motifs, transmembrane regions and oxidoreductase signature motifs. We also detected expression in five of the putative MtFRO sequences by semiquantitative RT-PCR analysis, performed with mRNA from root and shoot tissues. Iron scarcity might be a condition for an elevated expression of the MtFRO genes observed in different M. truncatula tissues.

  16. Classical Oncogenes and Tumor Suppressor Genes: A Comparative Genomics Perspective

    Directory of Open Access Journals (Sweden)

    Oxana K. Pickeral

    2000-05-01

    Full Text Available We have curated a reference set of cancer-related genes and reanalyzed their sequences in the light of molecular information and resources that have become available since they were first cloned. Homology studies were carried out for human oncogenes and tumor suppressors, compared with the complete proteome of the nematode, Caenorhabditis elegans, and partial proteomes of mouse and rat and the fruit fly, Drosophila melanogaster. Our results demonstrate that simple, semi-automated bioinformatics approaches to identifying putative functionally equivalent gene products in different organisms may often be misleading. An electronic supplement to this article provides an integrated view of our comparative genomics analysis as well as mapping data, physical cDNA resources and links to published literature and reviews, thus creating a “window” into the genomes of humans and other organisms for cancer biology.

  17. Genomic discovery of potent chromatin insulators for human gene therapy.

    Science.gov (United States)

    Liu, Mingdong; Maurano, Matthew T; Wang, Hao; Qi, Heyuan; Song, Chao-Zhong; Navas, Patrick A; Emery, David W; Stamatoyannopoulos, John A; Stamatoyannopoulos, George

    2015-02-01

    Insertional mutagenesis and genotoxicity, which usually manifest as hematopoietic malignancy, represent major barriers to realizing the promise of gene therapy. Although insulator sequences that block transcriptional enhancers could mitigate or eliminate these risks, so far no human insulators with high functional potency have been identified. Here we describe a genomic approach for the identification of compact sequence elements that function as insulators. These elements are highly occupied by the insulator protein CTCF, are DNase I hypersensitive and represent only a small minority of the CTCF recognition sequences in the human genome. We show that the elements identified acted as potent enhancer blockers and substantially decreased the risk of tumor formation in a cancer-prone animal model. The elements are small, can be efficiently accommodated by viral vectors and have no detrimental effects on viral titers. The insulators we describe here are expected to increase the safety of gene therapy for genetic diseases.

  18. A whole genome RNAi screen identifies replication stress response genes.

    Science.gov (United States)

    Kavanaugh, Gina; Ye, Fei; Mohni, Kareem N; Luzwick, Jessica W; Glick, Gloria; Cortez, David

    2015-11-01

    Proper DNA replication is critical to maintain genome stability. When the DNA replication machinery encounters obstacles to replication, replication forks stall and the replication stress response is activated. This response includes activation of cell cycle checkpoints, stabilization of the replication fork, and DNA damage repair and tolerance mechanisms. Defects in the replication stress response can result in alterations to the DNA sequence causing changes in protein function and expression, ultimately leading to disease states such as cancer. To identify additional genes that control the replication stress response, we performed a three-parameter, high content, whole genome siRNA screen measuring DNA replication before and after a challenge with replication stress as well as a marker of checkpoint kinase signalling. We identified over 200 replication stress response genes and subsequently analyzed how they influence cellular viability in response to replication stress. These data will serve as a useful resource for understanding the replication stress response.

  19. Comparison of genome-wide selection strategies to identify furfural tolerance genes in Escherichia coli.

    Science.gov (United States)

    Glebes, Tirzah Y; Sandoval, Nicholas R; Gillis, Jacob H; Gill, Ryan T

    2015-01-01

    Engineering both feedstock and product tolerance is important for transitioning towards next-generation biofuels derived from renewable sources. Tolerance to chemical inhibitors typically results in complex phenotypes, for which multiple genetic changes must often be made to confer tolerance. Here, we performed a genome-wide search for furfural-tolerant alleles using the TRackable Multiplex Recombineering (TRMR) method (Warner et al. (2010), Nature Biotechnology), which uses chromosomally integrated mutations directed towards increased or decreased expression of virtually every gene in Escherichia coli. We employed various growth selection strategies to assess the role of selection design towards growth enrichments. We also compared genes with increased fitness from our TRMR selection to those from a previously reported genome-wide identification study of furfural tolerance genes using a plasmid-based genomic library approach (Glebes et al. (2014) PLOS ONE). In several cases, growth improvements were observed for the chromosomally integrated promoter/RBS mutations but not for the plasmid-based overexpression constructs. Through this assessment, four novel tolerance genes, ahpC, yhjH, rna, and dicA, were identified and confirmed for their effect on improving growth in the presence of furfural.

  20. Predominant and substoichiometric isomers of the plastid genome coexist within Juniperus plants and have shifted multiple times during cupressophyte evolution.

    Science.gov (United States)

    Guo, Wenhu; Grewe, Felix; Cobo-Clark, Amie; Fan, Weishu; Duan, Zelin; Adams, Robert P; Schwarzbach, Andrea E; Mower, Jeffrey P

    2014-03-01

    Most land plant plastomes contain two copies of a large inverted repeat (IR) that promote high-frequency homologous recombination to generate isomeric genomic forms. Among conifer plastomes, this canonical IR is highly reduced in Pinaceae and completely lost from cupressophytes. However, both lineages have acquired short, novel IRs, some of which also exhibit recombinational activity to generate genomic structural diversity. This diversity has been shown to exist between, and occasionally within, cupressophyte species, but it is not known whether multiple genomic forms coexist within individual plants. To examine the recombinational potential of the novel cupressophyte IRs within individuals and between species, we sequenced the plastomes of four closely related species of Juniperus. The four plastomes have identical gene content and genome organization except for a large 36 kb inversion between approximately 250 bp IRs containing trnQ-UUG. Southern blotting showed that different isomeric versions of the plastome predominate among individual junipers, whereas polymerase chain reaction and high-throughput read-pair mapping revealed the substoichiometric presence of the alternative isomeric form within each individual plant. Furthermore, our comparative genomic studies demonstrate that the predominant and substoichiometric arrangements of this IR have changed several times in other cupressophytes as well. These results provide compelling evidence for substoichiometric shifting of plastomic forms during cupressophyte evolution and suggest that substoichiometric shifting activity in plastid genomes may be adaptive.

  1. Cartilage-selective genes identified in genome-scale analysis of non-cartilage and cartilage gene expression

    Directory of Open Access Journals (Sweden)

    Cohn Zachary A

    2007-06-01

    Full Text Available Abstract Background Cartilage plays a fundamental role in the development of the human skeleton. Early in embryogenesis, mesenchymal cells condense and differentiate into chondrocytes to shape the early skeleton. Subsequently, the cartilage anlagen differentiate to form the growth plates, which are responsible for linear bone growth, and the articular chondrocytes, which facilitate joint function. However, despite the multiplicity of roles of cartilage during human fetal life, surprisingly little is known about its transcriptome. To address this, a whole genome microarray expression profile was generated using RNA isolated from 18–22 week human distal femur fetal cartilage and compared with a database of control normal human tissues aggregated at UCLA, termed Celsius. Results 161 cartilage-selective genes were identified, defined as genes significantly expressed in cartilage with low expression and little variation across a panel of 34 non-cartilage tissues. Among these 161 genes were cartilage-specific genes such as cartilage collagen genes and 25 genes which have been associated with skeletal phenotypes in humans and/or mice. Many of the other cartilage-selective genes do not have established roles in cartilage or are novel, unannotated genes. Quantitative RT-PCR confirmed the unique pattern of gene expression observed by microarray analysis. Conclusion Defining the gene expression pattern for cartilage has identified new genes that may contribute to human skeletogenesis as well as provided further candidate genes for skeletal dysplasias. The data suggest that fetal cartilage is a complex and transcriptionally active tissue and demonstrate that the set of genes selectively expressed in the tissue has been greatly underestimated.

  2. Functional Genomics of Allergen Gene Families in Fruits

    Directory of Open Access Journals (Sweden)

    Fatemeh Maghuly

    2009-10-01

    Full Text Available Fruit consumption is encouraged for health reasons; however, fruits may harbour a series of allergenic proteins that may cause discomfort or even represent serious threats to certain individuals. Thus, the identification and characterization of allergens in fruits requires novel approaches involving genomic and proteomic tools. Since avoidance of fruits also negatively affects the quality of patients’ lives, biotechnological interventions are ongoing to produce low-allergenic fruits by down-regulating specific genes. In this respect, the control of proteins associated with allergenicity could be achieved by fine-tuning the spatial and temporal expression of the relevant genes.

  3. Genomic organization and evolution of the ULBP genes in cattle.

    Science.gov (United States)

    Larson, Joshua H; Marron, Brandy M; Beever, Jonathan E; Roe, Bruce A; Lewin, Harris A

    2006-09-05

    The cattle UL16-binding protein 1 (ULBP1) and ULBP2 genes encode members of the MHC Class I superfamily that have homology to the human ULBP genes. Human ULBP1 and ULBP2 interact with the NKG2D receptor to activate effector cells in the immune system. The human cytomegalovirus UL16 protein is known to disrupt the ULBP-NKG2D interaction, thereby subverting natural killer cell-mediated responses. Previous Southern blotting experiments identified evidence of increased ULBP copy number within the genomes of ruminant artiodactyls. On the basis of these observations we hypothesized that the cattle ULBPs evolved by duplication and sequence divergence to produce a sufficient number and diversity of ULBP molecules to deliver an immune activation signal in the presence of immunogenic peptides. Given the importance of the ULBPs in antiviral immunity in other species, our goal was to determine the copy number and genomic organization of the ULBP genes in the cattle genome. Sequencing of cattle bacterial artificial chromosome genomic inserts resulted in the identification of 30 cattle ULBP loci existing in two gene clusters. Evidence of extensive segmental duplication and approximately 14 Kbp of novel repetitive sequences were identified within the major cluster. Ten ULBPs are predicted to be expressed at the cell surface. Substitution analysis revealed 11 outwardly directed residues in the predicted extracellular domains that show evidence of positive Darwinian selection. These positively selected residues have only one residue that overlaps with those proposed to interact with NKG2D, thus suggesting the interaction with molecules other than NKG2D. The ULBP loci in the cattle genome apparently arose by gene duplication and subsequent sequence divergence. Substitution analysis of the ULBP proteins provided convincing evidence for positive selection on extracellular residues that may interact with peptide ligands. These results support our hypothesis that the cattle ULBPs

  4. Genomic organization and evolution of the ULBP genes in cattle

    Directory of Open Access Journals (Sweden)

    Lewin Harris A

    2006-09-01

    Full Text Available Background: The cattle UL16-binding protein 1 (ULBP1) and ULBP2 genes encode members of the MHC Class I superfamily that have homology to the human ULBP genes. Human ULBP1 and ULBP2 interact with the NKG2D receptor to activate effector cells in the immune system. The human cytomegalovirus UL16 protein is known to disrupt the ULBP-NKG2D interaction, thereby subverting natural killer cell-mediated responses. Previous Southern blotting experiments identified evidence of increased ULBP copy number within the genomes of ruminant artiodactyls. On the basis of these observations we hypothesized that the cattle ULBPs evolved by duplication and sequence divergence to produce a sufficient number and diversity of ULBP molecules to deliver an immune activation signal in the presence of immunogenic peptides. Given the importance of the ULBPs in antiviral immunity in other species, our goal was to determine the copy number and genomic organization of the ULBP genes in the cattle genome. Results: Sequencing of cattle bacterial artificial chromosome genomic inserts resulted in the identification of 30 cattle ULBP loci existing in two gene clusters. Evidence of extensive segmental duplication and approximately 14 Kbp of novel repetitive sequences were identified within the major cluster. Ten ULBPs are predicted to be expressed at the cell surface. Substitution analysis revealed 11 outwardly directed residues in the predicted extracellular domains that show evidence of positive Darwinian selection. These positively selected residues have only one residue that overlaps with those proposed to interact with NKG2D, thus suggesting the interaction with molecules other than NKG2D. Conclusion: The ULBP loci in the cattle genome apparently arose by gene duplication and subsequent sequence divergence. Substitution analysis of the ULBP proteins provided convincing evidence for positive selection on extracellular residues that may interact with peptide ligands. These

  5. Metabolic Genes within Cyanophage Genomes: Implications for Diversity and Evolution

    Directory of Open Access Journals (Sweden)

    E-Bin Gao

    2016-09-01

    Full Text Available Cyanophages, a group of viruses specifically infecting cyanobacteria, are genetically diverse and extensively abundant in water environments. As a result of selective pressure, cyanophages often acquire a range of metabolic genes from host genomes. The host-derived genes make a significant contribution to the ecological success of cyanophages. In this review, we summarize the host-derived metabolic genes, as well as their origin and roles in cyanophage evolution and important host metabolic pathways, such as the light-dependent reactions of photosynthesis, the pentose phosphate pathway, nutrient acquisition and nucleotide biosynthesis. We also discuss the suitability of the host-derived metabolic genes as potential diagnostic markers for the detection of genetic diversity of cyanophages in natural environments.

  6. Changes of multiple genes in human gastric carcinomas

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Objective: To investigate the relationships among changes in multiple genes in human gastric carcinomas (GC). Methods: Using the Statistical Package for the Social Sciences (SPSS) and the Statistical Analysis System (SAS), the relationships among the expression of oncogenes (p21, p185) and tumor suppressor genes (RB, p53, p16, nm23) in 78 GC were analyzed. Results: Expression was correlated for some gene pairs: p21 and p185, RB and p16, p16 and p53, and p16 and nm23. It was relatively uncommon for carcinogenesis of GC to involve simultaneous changes in many of these genes. Inactivation of the p16 gene was an independent predictor of lymph node metastasis, and mutation of the p53 gene and inactivation of the p16 gene were independent predictors of invasive depth. Conclusion: GC involves changes in multiple genes, including oncogene activation and tumor suppressor gene inactivation, and these changes may play an important role in carcinogenesis through mutual cooperation. Inactivation of the p16 gene is one of the most useful indices for predicting the prognosis of patients with GC.

  7. An improved canine genome and a comprehensive catalogue of coding genes and non-coding transcripts.

    Directory of Open Access Journals (Sweden)

    Marc P Hoeppner

    Full Text Available The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, negatively impacting the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 MB of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-Sequencing data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected by our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are characterized as protein coding in human and were also expressed in the dog, suggesting that those were previously not annotated in the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein coding genes, ∼7,200 intergenic multi-exon transcripts without coding potential, likely candidates for long intergenic non-coding RNAs (lincRNAs), and ∼11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. Functional analysis of two novel transcripts with shRNA in a mouse kidney cell line altered cell morphology and motility. All in all, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.

  8. Predicting Gene Structures from Multiple RT-PCR Tests

    Science.gov (United States)

    Kováč, Jakub; Vinař, Tomáš; Brejová, Broňa

    It has been demonstrated that the use of additional information such as ESTs and protein homology can significantly improve accuracy of gene prediction. However, many sources of external information are still being omitted from consideration. Here, we investigate the use of product lengths from RT-PCR experiments in gene finding. We present hardness results and practical algorithms for several variants of the problem and apply our methods to a real RT-PCR data set in the Drosophila genome. We conclude that the use of RT-PCR data can improve the sensitivity of gene prediction and locate novel splicing variants.
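    The idea of using RT-PCR product lengths as evidence for gene structure can be illustrated with a small sketch. It is not the authors' algorithm: it only assumes that each candidate transcript is a list of exon intervals, that both amplicon boundaries lie inside exons, and that a candidate is kept when its predicted spliced product length matches the observed length within a tolerance. The names `product_length`, `consistent_structures` and the `tolerance` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # half-open genomic interval [start, end)


def product_length(exons: List[Exon], fwd_start: int, rev_end: int) -> Optional[int]:
    """Spliced length of the amplicon spanning the genomic range [fwd_start, rev_end).

    Returns None when either amplicon boundary falls outside every exon,
    i.e. when a primer could not anneal to this candidate transcript.
    """
    def exonic(pos: int) -> bool:
        return any(start <= pos < end for start, end in exons)

    if fwd_start >= rev_end or not (exonic(fwd_start) and exonic(rev_end - 1)):
        return None
    # Sum the overlap of each exon with the genomic span of the amplicon.
    return sum(max(0, min(end, rev_end) - max(start, fwd_start))
               for start, end in exons)


def consistent_structures(candidates: Dict[str, List[Exon]], fwd_start: int,
                          rev_end: int, observed: int, tolerance: int = 5) -> List[str]:
    """Names of candidate structures whose predicted length is close to the observed one."""
    kept = []
    for name, exons in candidates.items():
        predicted = product_length(exons, fwd_start, rev_end)
        if predicted is not None and abs(predicted - observed) <= tolerance:
            kept.append(name)
    return kept


# Hypothetical candidates: a three-exon transcript and an exon-skipping variant.
candidates = {
    "three_exon": [(100, 200), (300, 400), (500, 600)],
    "exon_skipped": [(100, 200), (500, 600)],
}
print(consistent_structures(candidates, 150, 550, observed=100))
```

    In this toy case an observed product of about 100 bp is compatible only with the exon-skipping variant, which is how a product length can point to a splicing variant.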

  9. Regulatory Features for Odorant Receptor Genes in the Mouse Genome.

    Science.gov (United States)

    Degl'Innocenti, Andrea; D'Errico, Anna

    2017-01-01

    The odorant receptor genes, seven transmembrane receptor genes constituting the vastest mammalian gene multifamily, are expressed monogenically and monoallelically in each sensory neuron in the olfactory epithelium. This characteristic, often referred to as the one neuron-one receptor rule, is driven by mostly uncharacterized molecular dynamics, generally named odorant receptor gene choice. Much attention has been paid by the scientific community to the identification of sequences regulating the expression of odorant receptor genes within their loci, where related genes are usually arranged in genomic clusters. A number of studies identified transcription factor binding sites on odorant receptor promoter sequences. Similar binding sites were also found on a number of enhancers that regulate their transcription in cis, but have been proposed to form interchromosomal networks. Odorant receptor gene choice seems to occur via the local removal of strongly repressive epigenetic markings, put in place during the maturation of the sensory neuron on each odorant receptor locus. Here we review the fast-changing state of the art for the study of regulatory features for odorant receptor genes.

  10. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Full Text Available Background: The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results: GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions: GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  11. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    Energy Technology Data Exchange (ETDEWEB)

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.
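    The typical/atypical distinction can be illustrated with a toy comparison of two Markov models. This is only a sketch and not the GeneMark.hmm implementation: it assumes two first-order homogeneous Markov chains trained on small made-up sequence sets, with add-one smoothing, and labels a sequence by whichever model gives the higher log-likelihood. The function names (`train`, `log_likelihood`, `classify`) and the training sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Iterable

ALPHABET = "ACGT"
LogProb = Callable[[str, str], float]


def train(seqs: Iterable[str], order: int = 1, pseudo: float = 1.0) -> LogProb:
    """Estimate smoothed transition log-probabilities log P(base | context)."""
    context_counts: Counter = Counter()
    transition_counts: Counter = Counter()
    for s in seqs:
        for i in range(order, len(s)):
            context_counts[s[i - order:i]] += 1
            transition_counts[s[i - order:i + 1]] += 1

    def logp(context: str, base: str) -> float:
        num = transition_counts[context + base] + pseudo
        den = context_counts[context] + pseudo * len(ALPHABET)
        return math.log(num / den)

    return logp


def log_likelihood(seq: str, logp: LogProb, order: int = 1) -> float:
    """Log-likelihood of seq given the model, conditioned on its first `order` bases."""
    return sum(logp(seq[i - order:i], seq[i]) for i in range(order, len(seq)))


def classify(seq: str, typical: LogProb, atypical: LogProb, order: int = 1) -> str:
    """Label seq by the model under which it is more likely."""
    if log_likelihood(seq, typical, order) >= log_likelihood(seq, atypical, order):
        return "typical"
    return "atypical"


# Made-up training data: a GC-rich "typical" class and an AT-rich "atypical" class.
typical_model = train(["GCGCGCGCGC", "CGCGCGCGCG"])
atypical_model = train(["ATATATATAT", "TATATATATA"])
print(classify("GCGCGC", typical_model, atypical_model))
print(classify("ATATAT", typical_model, atypical_model))
```

    A gene whose composition fits the second model better than the first would be flagged as atypical, which is the property the abstract proposes as a signal of possible lateral transfer.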

  12. Genome-level identification, gene expression, and comparative analysis of porcine β-defensin genes

    Directory of Open Access Journals (Sweden)

    Choi Min-Kyeung

    2012-11-01

    Full Text Available Background: Beta-defensins (β-defensins) are innate immune peptides with evolutionary conservation across a wide range of species and have been suggested to play important roles in innate immune reactions against pathogens. However, the complete β-defensin repertoire in the pig has not been fully addressed. Results: A BLAST analysis was performed against the available pig genomic sequence in the NCBI database to identify β-defensin-related sequences using previously reported β-defensin sequences of pigs, humans, and cattle. The porcine β-defensin gene clusters were mapped to chromosomes 7, 14, 15 and 17. The gene expression analysis of 17 newly annotated porcine β-defensin genes across 15 tissues using semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) showed differences in their tissue distribution, with the kidney and testis having the largest pBD expression repertoire. We also analyzed single nucleotide polymorphisms (SNPs) in the mature peptide region of pBD genes from 35 pigs of 7 breeds. We found 8 cSNPs in 7 pBDs. Conclusion: We identified 29 porcine β-defensin (pBD) gene-like sequences, including 17 unreported pBDs in the porcine genome. Comparative analysis of β-defensin genes in the pig genome with those in human and cattle genomes showed structural conservation of β-defensin syntenic regions among these species.

  13. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    Directory of Open Access Journals (Sweden)

    Brahmbhatt Sonal

    2008-11-01

    Full Text Available Background: Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most widely studied groups of fish. Results: 298,304 expressed sequence tags (ESTs) from Atlantic salmon (69% of the total), 11,664 chinook, 10,813 sockeye, 10,051 brook trout, 10,975 grayling, 8,630 lake whitefish, and 3,624 northern pike ESTs were obtained in this study and have been deposited into the public databases. Contigs were built and putative full-length Atlantic salmon clones have been identified. A database containing ESTs, assemblies, consensus sequences, open reading frames, gene predictions and putative annotation is available. The overall similarity between Atlantic salmon ESTs and those of rainbow trout, chinook, sockeye, brook trout, grayling, lake whitefish, northern pike and rainbow smelt is 93.4, 94.2, 94.6, 94.4, 92.5, 91.7, 89.6, and 86.2% respectively. An analysis of 78 transcript sets shows Salmo as a sister group to Oncorhynchus and Salvelinus within Salmoninae, and Thymallinae as a sister group to Salmoninae and Coregoninae within Salmonidae. Extensive gene duplication is consistent with a genome duplication in the common ancestor of salmonids. Using all of the available EST data, a new expanded salmonid cDNA microarray of 32,000 features was created. Cross-species hybridizations to this cDNA microarray indicate that this resource will be useful for studies of all 68 salmonid species. Conclusion: An extensive collection and analysis of salmonid RNA putative transcripts indicate that Pacific salmon, Atlantic salmon and charr are 94–96% similar while the more distant whitefish, grayling, pike and smelt are 93, 92, 89 and 86% similar to salmon. The salmonid transcriptome reveals a complex history of gene duplication that is

  14. Genome-wide identification and characterization of WRKY gene family in Salix suchowensis

    Directory of Open Access Journals (Sweden)

    Changwei Bi

    2016-09-01

    Full Text Available WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which is found in the promoter regions of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. With the whole-genome sequencing of Salix suchowensis, we have the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Due to their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (groups I–III), with five subgroups (IIa–IIe) in group II. With multiple sequence alignment and manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon–intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiling of SsWRKY genes with RNA sequencing data revealed diverse expression patterns among five tissues: tender roots, young leaves, vegetative buds, non-lignified stems and barks. The analyses of the WRKY gene family in willow not only help to complete the functional and annotation information of the WRKY gene family in woody plants, but also provide important references to investigate the
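    As a rough illustration of the heptapeptide search mentioned in this abstract, the sketch below scans a protein sequence for the canonical WRKYGQK motif and the three variants the abstract lists. It is an assumption-laden example, not the authors' pipeline (which combined multiple sequence alignment with manual search); the function name `find_wrky_motifs` and the test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Canonical heptapeptide followed by the three variants named in the abstract.
MOTIFS = ("WRKYGQK", "WRKYGRK", "WKKYGQK", "WRKYGKK")

# A lookahead is used so that overlapping occurrences would also be reported.
PATTERN = re.compile("(?=(" + "|".join(MOTIFS) + "))")


def find_wrky_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based position, motif) for every motif occurrence in the sequence."""
    return [(m.start(), m.group(1)) for m in PATTERN.finditer(protein.upper())]


# Made-up sequence containing the canonical motif and one variant.
print(find_wrky_motifs("MADWRKYGQKAAAWKKYGQK"))
```

    A real analysis would run such a scan over predicted protein sequences and then confirm the hits against the domain alignment.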

  15. Chromosome mapping of dragline silk genes in the genomes of widow spiders (Araneae, Theridiidae).

    Directory of Open Access Journals (Sweden)

    Yonghui Zhao

    Full Text Available With its incredible strength and toughness, spider dragline silk is widely lauded for its impressive material properties. Dragline silk is composed of two structural proteins, MaSp1 and MaSp2, which are encoded by members of the spidroin gene family. While previous studies have characterized the genes that encode the constituent proteins of spider silks, nothing is known about the physical location of these genes. We determined karyotypes and sex chromosome organization for the widow spiders, Latrodectus hesperus and L. geometricus (Araneae, Theridiidae). We then used fluorescence in situ hybridization to map the genomic locations of the genes for the silk proteins that compose the remarkable spider dragline. These genes included three loci for the MaSp1 protein and the single locus for the MaSp2 protein. In addition, we mapped a MaSp1 pseudogene. All the MaSp1 gene copies and pseudogene localized to a single chromosomal region while MaSp2 was located on a different chromosome of L. hesperus. Using probes derived from L. hesperus, we comparatively mapped all three MaSp1 loci to a single region of a L. geometricus chromosome. As with L. hesperus, MaSp2 was found on a separate L. geometricus chromosome, thus again unlinked to the MaSp1 loci. These results indicate orthology of the corresponding chromosomal regions in the two widow genomes. Moreover, the occurrence of multiple MaSp1 loci in a conserved gene cluster across species suggests that MaSp1 proliferated by tandem duplication in a common ancestor of L. geometricus and L. hesperus. Unequal crossover events during recombination could have given rise to the gene copies and could also maintain sequence similarity among gene copies over time. Further comparative mapping with taxa of increasing divergence from Latrodectus will pinpoint when the MaSp1 duplication events occurred and the phylogenetic distribution of silk gene linkage patterns.

  16. Genomic analysis of primordial dwarfism reveals novel disease genes.

    Science.gov (United States)

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  17. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    Science.gov (United States)

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  18. Genome-wide identification of KANADI1 target genes.

    Directory of Open Access Journals (Sweden)

    Paz Merelo

    Full Text Available Plant organ development and polarity establishment is mediated by the action of several transcription factors. Among these, the KANADI (KAN) subclade of the GARP protein family plays important roles in polarity-associated processes during embryo, shoot and root patterning. In this study, we have identified a set of potential direct target genes of KAN1 through a combination of chromatin immunoprecipitation/DNA sequencing (ChIP-Seq) and genome-wide transcriptional profiling using tiling arrays. Target genes are over-represented for genes involved in the regulation of organ development as well as in the response to auxin. KAN1 affects directly the expression of several genes previously shown to be important in the establishment of polarity during lateral organ and vascular tissue development. We also show that KAN1 controls through its target genes auxin effects on organ development at different levels: transport and its regulation, and signaling. In addition, KAN1 regulates genes involved in the response to abscisic acid, jasmonic acid, brassinosteroids, ethylene, cytokinins and gibberellins. The role of KAN1 in organ polarity is antagonized by HD-ZIPIII transcription factors, including REVOLUTA (REV). A comparison of their target genes reveals that the REV/KAN1 module acts in organ patterning through opposite regulation of shared targets. Evidence of mutual repression between closely related family members is also shown.

  19. Integrase-directed recovery of functional genes from genomic libraries.

    Science.gov (United States)

    Rowe-Magnus, Dean A

    2009-09-01

    Large population sizes, rapid growth and 3.8 billion years of evolution firmly establish microorganisms as a major source of the planet's biological and genetic diversity. However, up to 99% of the microorganisms in a given environment cannot be cultured. Culture-independent methods that directly access the genetic potential of an environmental sample can unveil new proteins with diverse functions, but the sequencing of random DNA can generate enormous amounts of extraneous data. Integrons are recombination systems that accumulate open reading frames (gene cassettes), many of which code for functional proteins with enormous adaptive potential. Some integrons harbor hundreds of gene cassettes and evidence suggests that the gene cassette pool may be limitless in size. Accessing this genetic pool has been hampered since sequence-based techniques, such as hybridization or PCR, often recover only partial genes or a small subset of those present in the sample. Here, a three-plasmid genetic strategy for the sequence-independent recovery of gene cassettes from genomic libraries is described and its use by retrieving functional gene cassettes from the chromosomal integron of Vibrio vulnificus ATCC 27562 is demonstrated. By manipulating the natural activity of integrons, we can gain access to the caches of functional genes amassed by these structures.

  20. Genetics and Genomics of Single-Gene Cardiovascular Diseases : Common Hereditary Cardiomyopathies as Prototypes of Single-Gene Disorders

    NARCIS (Netherlands)

    Marian, Ali J; van Rooij, Eva; Roberts, Robert

    2016-01-01

    This is the first of 2 review papers on genetics and genomics appearing as part of the series on "omics." Genomics pertains to all components of an organism's genes, whereas genetics involves analysis of a specific gene or genes in the context of heredity. The paper provides introductory comments,


  2. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae : Implications for the microbial "pan-genome"

    NARCIS (Netherlands)

    Tettelin, H; Masignani, V; Cieslewicz, MJ; Donati, C; Medini, D; Ward, NL; Angiuoli, SV; Crabtree, J; Jones, AL; Durkin, AS; DeBoy, RT; Davidsen, TM; Mora, M; Scarselli, M; Ros, IMY; Peterson, JD; Hauser, CR; Sundaram, JP; Nelson, WC; Madupu, R; Brinkac, LM; Dodson, RJ; Rosovitz, MJ; Sullivan, SA; Daugherty, SC; Haft, DH; Selengut, J; Gwinn, ML; Zhou, LW; Zafar, N; Khouri, H; Radune, D; Dimitrov, G; Watkins, K; O'Connor, KJB; Smith, S; Utterback, TR; White, O; Rubens, CE; Grandi, G; Madoff, LC; Kasper, DL; Telford, JL; Wessels, MR; Rappuoli, R; Fraser, CM

    2005-01-01

    The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and als

  4. Genome size diversity in angiosperms and its influence on gene space.

    Science.gov (United States)

    Dodsworth, Steven; Leitch, Andrew R; Leitch, Ilia J

    2015-12-01

    Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C=5.7Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously discarded as 'junk' DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore in what ways genome size itself plays a role in influencing how repeats impact genome dynamics and gene space, including gene expression. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  5. Localizing F(ST) outliers on a QTL map reveals evidence for large genomic regions of reduced gene exchange during speciation-with-gene-flow.

    Science.gov (United States)

    Via, Sara; Conte, Gina; Mason-Foley, Casey; Mills, Kelly

    2012-11-01

    Populations that maintain phenotypic divergence in sympatry typically show a mosaic pattern of genomic divergence, requiring a corresponding mosaic of genomic isolation (reduced gene flow). However, mechanisms that could produce the genomic isolation required for divergence-with-gene-flow have barely been explored, apart from the traditional localized effects of selection and reduced recombination near centromeres or inversions. By localizing F(ST) outliers from a genome scan of wild pea aphid host races on a Quantitative Trait Locus (QTL) map of key traits, we test the hypothesis that between-population recombination and gene exchange are reduced over large 'divergence hitchhiking' (DH) regions. As expected under divergence hitchhiking, our map confirms that QTL and divergent markers cluster together in multiple large genomic regions. Under divergence hitchhiking, the nonoutlier markers within these regions should show signs of reduced gene exchange relative to nonoutlier markers in genomic regions where ongoing gene flow is expected. We use this predicted difference among nonoutliers to perform a critical test of divergence hitchhiking. Results show that nonoutlier markers within clusters of F(ST) outliers and QTL resolve the genetic population structure of the two host races nearly as well as the outliers themselves, while nonoutliers outside DH regions reveal no population structure, as expected if they experience more gene flow. These results provide clear evidence for divergence hitchhiking, a mechanism that may dramatically facilitate the process of speciation-with-gene-flow. They also show the power of integrating genome scans with genetic analyses of the phenotypic traits involved in local adaptation and population divergence. © 2012 Blackwell Publishing Ltd.

  6. A Bayesian Hierarchical Model for Relating Multiple SNPs within Multiple Genes to Disease Risk

    Directory of Open Access Journals (Sweden)

    Lewei Duan

    2013-01-01

    Full Text Available A variety of methods have been proposed for studying the association of multiple genes thought to be involved in a common pathway for a particular disease. Here, we present an extension of a Bayesian hierarchical modeling strategy that allows for multiple SNPs within each gene, with external prior information at either the SNP or gene level. The model involves variable selection at the SNP level through latent indicator variables and Bayesian shrinkage at the gene level towards a prior mean vector and covariance matrix that depend on external information. The entire model is fitted using Markov chain Monte Carlo methods. Simulation studies show that the approach is capable of recovering many of the truly causal SNPs and genes, depending upon their frequency and size of their effects. The method is applied to data on 504 SNPs in 38 candidate genes involved in DNA damage response in the WECARE study of second breast cancers in relation to radiotherapy exposure.

  7. Mapping our genes: The genome projects: How big, how fast

    Energy Technology Data Exchange (ETDEWEB)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  8. Mapping Our Genes: The Genome Projects: How Big, How Fast

    Science.gov (United States)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  9. Computational prediction of microRNA genes in silkworm genome

    Institute of Scientific and Technical Information of China (English)

    TONG Chuan-zhou; JIN Yong-feng; ZHANG Yao-zhou

    2006-01-01

    MicroRNAs (miRNAs) constitute a novel, extensive class of small RNAs (~21 nucleotides), and play important gene-regulation roles during growth and development in various organisms. Here we conducted a homology search to identify homologs of previously validated miRNAs from silkworm genome. We identified 24 potential miRNA genes, and gave each of them a name according to the common criteria. Interestingly, we found that a great number of newly identified miRNAs were conserved in silkworm and Drosophila, and family alignment revealed that miRNA families might possess single nucleotide polymorphisms. miRNA gene clusters and possible functions of complement miRNA pairs are discussed.

  10. Genomic and gene variation in Mycoplasma hominis strains

    DEFF Research Database (Denmark)

    Christiansen, Gunna; Andersen, H; Birkelund, Svend

    1987-01-01

    DNAs from 14 strains of Mycoplasma hominis isolated from various habitats, including strain PG21, were analyzed for genomic heterogeneity. DNA-DNA filter hybridization values were from 51 to 91%. Restriction endonuclease digestion patterns, analyzed by agarose gel electrophoresis, revealed no identity or cluster formation between strains. Variation within M. hominis rRNA genes was analyzed by Southern hybridization of EcoRI-cleaved DNA hybridized with a cloned fragment of the rRNA gene from the mycoplasma strain PG50. Five of the M. hominis strains showed identical hybridization patterns. These hybridization patterns were compared with those of 12 other mycoplasma species, which showed a much more complex band pattern. Cloned nonribosomal RNA gene fragments of M. hominis PG21 DNA were analyzed, and the fragments were used to demonstrate heterogeneity among the strains. A monoclonal antibody against...

  11. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hugenholtz, Philip; Skarshewski, Adam;

    2013-01-01

    Reference genomes are required to understand the diverse roles of microorganisms in ecology, evolution, human and animal health, but most species remain uncultured. Here we present a sequence composition–independent approach to recover high-quality microbial genomes from deeply sequenced metagenomes. Multiple metagenomes of the same community, which differ in relative population abundances, were used to assemble 31 bacterial genomes, including rare (genomes were assembled into complete or near-complete chromosomes. Four belong to the candidate bacterial phylum TM7 and represent the most complete genomes for this phylum to date (relative abundances, 0.06–1.58%). Reanalysis of published metagenomes reveals that differential coverage binning facilitates recovery of more complete and higher fidelity genome bins than...

  12. Microarray and comparative genomics-based identification of genes and gene regulatory regions of the mouse immune system

    Directory of Open Access Journals (Sweden)

    Katz Jonathan D

    2004-10-01

    Full Text Available. Background: In this study we have built and mined a gene expression database composed of 65 diverse mouse tissues for genes preferentially expressed in immune tissues and cell types. Using expression pattern criteria, we identified 360 genes with preferential expression in thymus, spleen, peripheral blood mononuclear cells, lymph nodes (unstimulated or stimulated), or in vitro activated T-cells. Results: Gene clusters, formed based on similarity of expression pattern across either all tissues or the immune tissues only, had highly significant associations both with immunological processes such as chemokine-mediated response, antigen processing, receptor-related signal transduction, and transcriptional regulation, and also with more general processes such as replication and cell cycle control. Within-cluster gene correlations implicated known associations of known genes, as well as immune process-related roles for poorly described genes. To characterize regulatory mechanisms and cis-elements of genes with similar patterns of expression, we used a new version of a comparative genomics-based cis-element analysis tool to identify clusters of cis-elements with compositional similarity among multiple genes. Several clusters contained genes that shared 5–6 cis-elements that included ETS and zinc-finger binding sites. cis-Elements AP2 EGRF ETSF MAZF SP1F ZF5F and AREB ETSF MZF1 PAX5 STAT were shared in a thymus-expressed set; AP4R E2FF EBOX ETSF MAZF SP1F ZF5F and CREB E2FF MAZF PCAT SP1F STAT cis-clusters occurred in activated T-cells; CEBP CREB NFKB SORY and GATA NKXH OCT1 RBIT occurred in stimulated lymph nodes. Conclusion: This study demonstrates a series of analytic approaches that have allowed the implication of genes and regulatory elements that participate in the differentiation, maintenance, and function of the immune system. Polymorphism or mutation of these could adversely impact immune system functions.

  13. Genomic and gene expression signature of the pre-invasive testicular carcinoma in situ

    DEFF Research Database (Denmark)

    Almstrup, Kristian; Ottesen, Anne Marie; Sonne, Si Brask

    2005-01-01

    on the pre-invasive CIS and its possible fetal origin by reviewing recent data originating from DNA microarrays and comparative genomic hybridisations. A comparison of gene expression and genomic aberrations reveal chromosomal "hot spots" with mutual clustering of gene expression and genomic amplification...

  14. Genome-wide scans provide evidence for positive selection of genes implicated in Lassa fever.

    Science.gov (United States)

    Andersen, Kristian G; Shylakhter, Ilya; Tabrizi, Shervin; Grossman, Sharon R; Happi, Christian T; Sabeti, Pardis C

    2012-03-19

    Rapidly evolving viruses and other pathogens can have an immense impact on human evolution as natural selection acts to increase the prevalence of genetic variants providing resistance to disease. With the emergence of large datasets of human genetic variation, we can search for signatures of natural selection in the human genome driven by such disease-causing microorganisms. Based on this approach, we have previously hypothesized that Lassa virus (LASV) may have been a driver of natural selection in West African populations where Lassa haemorrhagic fever is endemic. In this study, we provide further evidence for this notion. By applying tests for selection to genome-wide data from the International Haplotype Map Consortium and the 1000 Genomes Consortium, we demonstrate evidence for positive selection in LARGE and interleukin 21 (IL21), two genes implicated in LASV infectivity and immunity. We further localized the signals of selection, using the recently developed composite of multiple signals method, to introns and putative regulatory regions of those genes. Our results suggest that natural selection may have targeted variants giving rise to alternative splicing or differential gene expression of LARGE and IL21. Overall, our study supports the hypothesis that selective pressures imposed by LASV may have led to the emergence of particular alleles conferring resistance to Lassa fever, and opens up new avenues of research pursuit.

  15. Genome-wide analysis of glutathione reductase (GR) genes from rice and Arabidopsis.

    Science.gov (United States)

    Trivedi, Dipesh Kumar; Gill, Sarvajeet Singh; Yadav, Sandep; Tuteja, Narendra

    2013-02-01

    Plant cells and tissues are always at risk under abiotic and biotic stresses due to increased production of reactive oxygen species (ROS). Plants protect themselves against ROS-induced oxidative damage by upregulating their antioxidant machinery. Among the many components of this machinery, glutathione reductase (GR, EC 1.6.4.2) and glutathione (GSH, γ-Glu-Cys-Gly) play an important role in protecting the cell against oxidative damage. Under stress conditions, GR helps maintain the reduced glutathione pool, strengthening antioxidative processes in plants. The present study reports a genome-wide analysis of GR from rice and Arabidopsis. We were able to identify 3 rice GR genes (LOC_Os02g56850, LOC_Os03g06740, LOC_Os10g28000) and 2 Arabidopsis GR genes (AT3G54660, AT3G24170) from their respective genomes on the basis of their annotation as well as the presence of the pyridine nucleotide-disulphide oxidoreductases class-I active site. The evolutionary relationship of the GR genes from the rice and Arabidopsis genomes was analyzed using multiple sequence alignment and a phylogenetic tree. This revealed an evolutionarily conserved pyridine nucleotide-disulphide oxidoreductases class-I active site among the GR proteins of rice and Arabidopsis. This study should contribute to a better understanding of GR under normal and stress conditions in plants.

  16. The genetics of multiple sclerosis: principles, background and updated results of the United Kingdom systematic genome screen.

    Science.gov (United States)

    Chataway, J; Feakes, R; Coraddu, F; Gray, J; Deans, J; Fraser, M; Robertson, N; Broadley, S; Jones, H; Clayton, D; Goodfellow, P; Sawcer, S; Compston, A

    1998-10-01

    Genetic susceptibility to multiple sclerosis is implicated on the basis of classical family studies and phenotype analyses. The only reproducible legacy from the candidate gene approach has been the discovery of population associations with alleles of the major histocompatibility complex. Systematic genome scanning has since been applied using a panel of anonymous markers to identify areas of linkage in co-affected siblings. Here, we describe the principles of genome screening and update the UK survey of multiple sclerosis. This identified 20 regions of potential interest, but in none was there unequivocal linkage. In theory, attempting to replicate these findings in a second set of sibling pair families is the most appropriate way to distinguish true from false positives, but unfortunately the number of families required to do this reliably is prohibitively large. We used three approaches to increase the definition achieved by the screen: (i) the number of sibling pairs typed in an identified region of potential linkage was extended; (ii) the information extraction was increased in an identified region; and (iii) a search was made for missed regions of potential linkage. Each of these approaches has considerable limitations. A chromosome-by-chromosome account is given to direct future searches. Although an additional marker placed distal to the 'hit' on chromosome 14q increased linkage in this area, and typing extra sibling pairs increased linkage on chromosomes 6p and 17q, evidence for linkage was more commonly reduced and no additional regions of interest were found. A further refinement of the genome screen was undertaken by conditioning for the presence of HLA-DR15. This produced a surprising degree of segregation among the regions of interest, which divided into two distinct groups depending on DR15 sharing: the DR15-sharing cohort comprised loci on chromosomal areas 1p, 17q and X; and the DR15-non-sharing cohort was made up of loci on 1cen, 3p, 7p, 14q and

  17. Whole Genome Sequencing of the Symbiont Pseudovibrio sp. from the Intertidal Marine Sponge Polymastia penicillus Revealed a Gene Repertoire for Host-Switching Permissive Lifestyle.

    Science.gov (United States)

    Alex, Anoop; Antunes, Agostinho

    2015-10-31

    Sponges harbor a complex consortium of microbial communities living in a symbiotic relationship, benefiting each other through the integration of metabolites. The mechanisms influencing a successful microbial association with a sponge partner are yet to be fully understood. Here, we sequenced the genome of Pseudovibrio sp. strain POLY-S9, isolated from the intertidal marine sponge Polymastia penicillus sampled from the Atlantic coast of Portugal, to identify the genomic features favoring the symbiotic relationship. The draft genome revealed an exceptionally large genome size of 6.6 Mbp compared with the previously reported genomes of the genus Pseudovibrio isolated from a coral and a sponge larva. Our genomic study detected the presence of several biosynthetic gene clusters (polyketide synthase, nonribosomal peptide synthetase and siderophore), affirming the potential ability of the genus Pseudovibrio to produce a wide variety of metabolic compounds. Moreover, we identified a repertoire of genes encoding adaptive symbiosis factors (eukaryotic-like proteins), such as ankyrin repeats, tetratricopeptide repeats, and Sel1 repeats, that improve attachment to eukaryotic hosts and avoidance of the host's immune response. The genome also harbored a large number of mobile elements (∼5%) and gene transfer agents, which explains the massive genome expansion and suggests a possible mechanism of horizontal gene transfer. In conclusion, the genome of POLY-S9 exhibited an increase in size, number of mobile DNA elements, multiple metabolite gene clusters, and secretion systems, likely to influence genome diversification and evolvability.

  18. Multiple genes encode the major surface glycoprotein of Pneumocystis carinii

    DEFF Research Database (Denmark)

    Kovacs, J A; Powell, F; Edman, J C;

    1993-01-01

    this antigen is a good candidate for development as a vaccine to prevent or control P. carinii infection. We have cloned and sequenced seven related but unique genes encoding the major surface glycoprotein of rat P. carinii. Partial amino acid sequencing confirmed the identity of these genes. Based on Southern...... hydrophobic region at the carboxyl terminus. The presence of multiple related msg genes encoding the major surface glycoprotein of P. carinii suggests that antigenic variation is a possible mechanism for evading host defenses. Further characterization of this family of genes should allow the development...

  19. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum.

    Science.gov (United States)

    Rao, Soumya; Nandineni, Madhusudan R

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.

  20. Genome-enabled Discovery of Carbon Sequestration Genes

    Energy Technology Data Exchange (ETDEWEB)

    Tuskan, Gerald A [ORNL; Tschaplinski, Timothy J [ORNL; Kalluri, Udaya C [ORNL; Yin, Tongming [ORNL; Yang, Xiaohan [ORNL; Zhang, Xinye [ORNL; Engle, Nancy L [ORNL; Ranjan, Priya [ORNL; Basu, Manojit M [ORNL; Gunter, Lee E [ORNL; Jawdy, Sara [ORNL; Martin, Madhavi Z [ORNL; Campbell, Alina S [ORNL; DiFazio, Stephen P [ORNL; Davis, John M [University of Florida; Hinchee, Maud [ORNL; Pinnacchio, Christa [U.S. Department of Energy, Joint Genome Institute; Meilan, R [Purdue University; Busov, V. [Michigan Technological University; Strauss, S [Oregon State University

    2009-01-01

    The fate of carbon below ground is likely to be a major factor determining the success of carbon sequestration strategies involving plants. Despite their importance, molecular processes controlling belowground C allocation and partitioning are poorly understood. This project is leveraging the Populus trichocarpa genome sequence to discover genes important to C sequestration in plants and soils. The focus is on the identification of genes that provide key control points for the flow and chemical transformations of carbon in roots, concentrating on genes that control the synthesis of chemical forms of carbon that result in slower turnover rates of soil organic matter (i.e., increased recalcitrance). We propose to enhance carbon allocation and partitioning to roots by 1) modifying the auxin signaling pathway, and the invertase family, which controls sucrose metabolism, and by 2) increasing root proliferation through transgenesis with genes known to control fine root proliferation (e.g., ANT), 3) increasing the production of recalcitrant C metabolites by identifying genes controlling secondary C metabolism by a major mQTL-based gene discovery effort, and 4) increasing aboveground productivity by enhancing drought tolerance to achieve maximum C sequestration. This broad, integrated approach is aimed at ultimately enhancing root biomass as well as root detritus longevity, providing the best prospects for significant enhancement of belowground C sequestration.

  1. Genome Diversification Mechanism of Rodent and Lagomorpha Chemokine Genes

    Directory of Open Access Journals (Sweden)

    Kanako Shibata

    2013-01-01

    Full Text Available Chemokines are a large family of small cytokines that are involved in host defence and body homeostasis through recruitment of cells expressing their receptors. Their genes are known to undergo rapid evolution. Therefore, the number and content of chemokine genes can be quite diverse among the different species, making the orthologous relationships often ambiguous even between closely related species. Given that rodents and rabbit are useful experimental models in medicine and drug development, we have deduced the chemokine genes from the genome sequences of several rodent species and rabbit and compared them with those of human and mouse to determine the orthologous relationships. The interspecies differences should be taken into consideration when experimental results from animal models are extrapolated into humans. The chemokine gene lists and their orthologous relationships presented here will be useful for studies using these animal models. Our analysis also enables us to reconstruct possible gene duplication processes that generated the different sets of chemokine genes in these species.

  2. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

    Directory of Open Access Journals (Sweden)

    Senanu M Spring-Pearson

    Full Text Available The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.

  3. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

    Science.gov (United States)

    Spring-Pearson, Senanu M; Stone, Joshua K; Doyle, Adina; Allender, Christopher J; Okinaka, Richard T; Mayo, Mark; Broomall, Stacey M; Hill, Jessica M; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; McNew, Lauren A; Rosenzweig, C Nicole; Gibbons, Henry S; Currie, Bart J; Wagner, David M; Keim, Paul; Tuanyok, Apichai

    2015-01-01

    The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.

  4. Census of solo LuxR genes in prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Sanjarbek Hudaiberdiev

    2015-03-01

    luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram-negative bacteria. On the bacterial chromosome, luxR genes are usually found next to or near a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may either respond to internal AHL signals produced by a non-adjacent luxI in the chromosome, or respond to exogenous signals. Here we present a survey of solo luxR genes found in complete and draft bacterial genomes in the NCBI databases using HMMs. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. We also found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL-binding LuxR proteins, so there is room to speculate that the solos may be involved in sensing hitherto unknown signals. It was noted that only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of the cysteines may be involved in disulfide formation, which makes us speculate that some LuxR proteins, including some of the solos, may be involved in redox regulation.
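    The notion of a solo luxR gene (no luxI in its vicinity) can be sketched as a simple neighbourhood check. The window size, the family labels, and the toy contig are assumptions made for illustration; the abstract does not state how vicinity was defined, and the survey itself relied on HMM searches.

```python
def classify_luxr(genes, window=2):
    """genes is an ordered list of family labels along one contig.
    A luxR is called 'paired' if a luxI lies within `window` genes on either
    side, otherwise 'solo'. The window of 2 is an arbitrary assumption."""
    calls = []
    for i, family in enumerate(genes):
        if family != "luxR":
            continue
        lo = max(0, i - window)
        neighbours = genes[lo:i] + genes[i + 1:i + 1 + window]
        calls.append((i, "paired" if "luxI" in neighbours else "solo"))
    return calls

# Hypothetical contig: the first luxR sits next to a luxI, the second does not.
contig = ["other", "luxR", "luxI", "other", "other", "other", "luxR", "other"]
calls = classify_luxr(contig)

# Proportion of solos reported in the abstract: 2698 of 3550 luxR genes.
solo_fraction = 2698 / 3550
```

    On the counts reported in the abstract, solos make up 76% of the luxR genes found.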

  5. SATB1 tethers multiple gene loci to reprogram expression profile driving breast cancer metastasis

    Energy Technology Data Exchange (ETDEWEB)

    Han, Hye-Jung; Kohwi, Yoshinori; Kohwi-Shigematsu, Terumi

    2006-07-13

    Global changes in gene expression occur during tumor progression, as indicated by expression profiling of metastatic tumors. How this occurs is poorly understood. SATB1 functions as a genome organizer by folding chromatin via tethering multiple genomic loci and recruiting chromatin remodeling enzymes to regulate chromatin structure and expression of a large number of genes. Here we show that SATB1 is expressed at high levels in aggressive breast cancer cells, and is undetectable in non-malignant breast epithelial cells. Importantly, RNAi-mediated removal of SATB1 from highly aggressive MDA-MB-231 cells altered the expression levels of over 1200 genes, restored breast-like acinar polarity in three-dimensional cultures, and prevented the metastatic phenotype in vivo. Conversely, overexpression of SATB1 in the less-aggressive breast cancer cell line Hs578T altered the gene expression profile and increased metastasis dramatically in vivo. Thus, SATB1 is a global regulator of gene expression in breast cancer cells, directly regulating crucial metastasis-associated genes, including ERBB2 (HER2/NEU), TGF-β1, matrix metalloproteinase 3, and metastasin. The identification of SATB1 as a protein that re-programs chromatin organization and transcription profiles to promote breast cancer metastasis suggests a new model for metastasis and may provide means of therapeutic intervention.

  6. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    Science.gov (United States)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  7. Sampling Daphnia's expressed genes: preservation, expansion and invention of crustacean genes with reference to insect genomes

    Directory of Open Access Journals (Sweden)

    Bauer Darren J

    2007-07-01

    Background: Functional and comparative studies of insect genomes have shed light on the complement of genes which, in part, account for shared morphologies, developmental programs and life-histories. Contrasting the gene inventories of insects to those of the nematodes provides insight into the genomic changes responsible for their diversification. However, nematodes have weak relationships to insects, as each belongs to separate animal phyla. A better outgroup to distinguish lineage specific novelties would include other members of Arthropoda. For example, crustaceans are close allies to the insects (together forming Pancrustacea) and their fascinating aquatic lifestyle provides an important comparison for understanding the genetic basis of adaptations to life on land versus life in water. Results: This study reports on the first characterization of cDNA libraries and sequences for the model crustacean Daphnia pulex. We analyzed 1,546 ESTs, of which 1,414 represent approximately 787 nuclear genes, by measuring their sequence similarities with insect and nematode proteomes. The provisional annotation of genes is supported by expression data from microarray studies described in companion papers. Loci expected to be shared between crustaceans and insects because of their mutual biological features are identified, including genes for reproduction, regulation and cellular processes. We identify genes that are likely derived within Pancrustacea or lost within the nematodes. Moreover, lineage specific gene family expansions are identified, which suggest certain biological demands associated with their ecological setting. In particular, up to seven distinct ferritin loci are found in Daphnia compared to three in most insects. Finally, a substantial fraction of the sampled gene transcripts shares no sequence similarity with those from other arthropods. Genes functioning during development and reproduction are comparatively well conserved between

  8. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    Science.gov (United States)

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriales genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  9. High occurrence of functional new chimeric genes in survey of rice chromosome 3 short arm genome sequences.

    Science.gov (United States)

    Zhang, Chengjun; Wang, Jun; Marowsky, Nicholas C; Long, Manyuan; Wing, Rod A; Fan, Chuanzhu

    2013-01-01

    In an effort to identify newly evolved genes in rice, we searched the genomes of Asian-cultivated rice Oryza sativa ssp. japonica and its wild progenitors, looking for lineage-specific genes. Using genome pairwise comparison of approximately 20-Mb DNA sequences from the chromosome 3 short arm (Chr3s) in six rice species, O. sativa, O. nivara, O. rufipogon, O. glaberrima, O. barthii, and O. punctata, combined with synonymous substitution rate tests and other evidence, we were able to identify potential recently duplicated genes, which evolved within the last 1 Myr. We identified 28 functional O. sativa genes, which likely originated after O. sativa diverged from O. glaberrima. These genes account for around 1% (28/3,176) of all annotated genes on O. sativa's Chr3s. Among the 28 new genes, two recently duplicated segments contained eight genes. Fourteen of the 28 new genes consist of chimeric gene structure derived from one or multiple parental genes and flanking targeting sequences. Although the majority of these 28 new genes were formed by single or segmental DNA-based gene duplication and recombination, we found two genes that likely originated partially through exon shuffling. Sequence divergence tests between new genes and their putative progenitors indicated that new genes were most likely evolving under natural selection. We showed that all 28 new genes appeared to be functional, as suggested by Ka/Ks analysis and the presence of RNA-seq, cDNA, expressed sequence tag, massively parallel signature sequencing, and/or small RNA data. The high rate of new gene origination and of chimeric gene formation in rice may demonstrate rice's broad diversification, domestication, its environmental adaptation, and the role of new genes in rice speciation.
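    The proportions quoted in this abstract, and the usual reading of a Ka/Ks ratio, can be worked through in a short sketch. The counts come from the abstract; the `selection_regime` helper, its tolerance, and the example Ka and Ks values are hypothetical and are not the authors' test.

```python
# Counts taken from the abstract.
new_genes = 28
annotated_genes = 3176
chimeric_genes = 14

share_new = new_genes / annotated_genes      # fraction of Chr3s genes that are new
share_chimeric = chimeric_genes / new_genes  # fraction of new genes that are chimeric

def selection_regime(ka, ks, tol=0.1):
    """Conventional reading of Ka/Ks: below 1 suggests purifying selection,
    near 1 neutrality, above 1 positive selection. `tol` is an arbitrary
    illustrative band around 1, not a statistical test."""
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"

# Hypothetical substitution rates, for illustration only.
example = selection_regime(ka=0.02, ks=0.10)
```

    The first ratio works out to about 0.88%, consistent with the "around 1%" stated in the abstract, and half of the new genes are chimeric.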

  10. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting | Office of Cancer Genomics

    Science.gov (United States)

    The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.

  11. Genome-scale identification method applied to find cryptic aminoglycoside resistance genes in Pseudomonas aeruginosa.

    Directory of Open Access Journals (Sweden)

    Julie M Struble

    identified a significant number of genomic regions that increased resistance to multiple aminoglycosides. We identified genetic regions that include open reading frames that encode for products from many functional categories, including genes related to O-antigen synthesis, DNA repair, and transcriptional and translational processes.

  12. Comprehensive analysis of CCCH-type zinc finger gene family in citrus (Clementine mandarin) by genome-wide characterization.

    Science.gov (United States)

    Liu, Shengrui; Khan, Muhammad Rehman Gul; Li, Yongping; Zhang, Jinzhi; Hu, Chungen

    2014-10-01

    The CCCH-type zinc finger proteins comprise a large gene family of regulatory proteins and are widely distributed in eukaryotic organisms. The CCCH proteins have been implicated in multiple biological processes and environmental responses in plants. Little information is available, however, about CCCH genes in plants, especially in woody plants such as citrus. The release of the whole-genome sequence of citrus allowed us to perform a genome-wide analysis of CCCH genes and to compare the identified proteins with their orthologs in model plants. In this study, 62 CCCH genes and a total of 132 CCCH motifs were identified, and a comprehensive analysis including the chromosomal locations, phylogenetic relationships, functional annotations, gene structures and conserved motifs was performed. Distribution mapping revealed that 54 of the 62 CCCH genes are unevenly dispersed on the nine citrus chromosomes. Based on phylogenetic analysis and gene structural features, we constructed 5 subfamilies of 62 CCCH members and integrative subfamilies from citrus, Arabidopsis, and rice, respectively. Importantly, large numbers of SNPs and InDels in 26 CCCH genes were identified from Poncirus trifoliata and Fortunella japonica using whole-genome deep re-sequencing. Furthermore, citrus CCCH genes showed distinct temporal and spatial expression patterns in different developmental processes and in response to various stress conditions. Our comprehensive analysis of CleC3Hs is a valuable resource that further elucidates the roles of CCCH family members in plant growth and development. In addition, variants and comparative genomics analyses deepen our understanding of the evolution of the CCCH gene family and will contribute to further genetics and genomics studies of citrus and other plant species.

  13. The compact Selaginella genome identifies changes in gene content associated with the evolution of vascular plants

    Energy Technology Data Exchange (ETDEWEB)

    Grigoriev, Igor V.; Banks, Jo Ann; Nishiyama, Tomoaki; Hasebe, Mitsuyasu; Bowman, John L.; Gribskov, Michael; dePamphilis, Claude; Albert, Victor A.; Aono, Naoki; Aoyama, Tsuyoshi; Ambrose, Barbara A.; Ashton, Neil W.; Axtell, Michael J.; Barker, Elizabeth; Barker, Michael S.; Bennetzen, Jeffrey L.; Bonawitz, Nicholas D.; Chapple, Clint; Cheng, Chaoyang; Correa, Luiz Gustavo Guedes; Dacre, Michael; DeBarry, Jeremy; Dreyer, Ingo; Elias, Marek; Engstrom, Eric M.; Estelle, Mark; Feng, Liang; Finet, Cedric; Floyd, Sandra K.; Frommer, Wolf B.; Fujita, Tomomichi; Gramzow, Lydia; Gutensohn, Michael; Harholt, Jesper; Hattori, Mitsuru; Heyl, Alexander; Hirai, Tadayoshi; Hiwatashi, Yuji; Ishikawa, Masaki; Iwata, Mineko; Karol, Kenneth G.; Koehler, Barbara; Kolukisaoglu, Uener; Kubo, Minoru; Kurata, Tetsuya; Lalonde, Sylvie; Li, Kejie; Li, Ying; Litt, Amy; Lyons, Eric; Manning, Gerard; Maruyama, Takeshi; Michael, Todd P.; Mikami, Koji; Miyazaki, Saori; Morinaga, Shin-ichi; Murata, Takashi; Mueller-Roeber, Bernd; Nelson, David R.; Obara, Mari; Oguri, Yasuko; Olmstead, Richard G.; Onodera, Naoko; Petersen, Bent Larsen; Pils, Birgit; Prigge, Michael; Rensing, Stefan A.; Riano-Pachon, Diego Mauricio; Roberts, Alison W.; Sato, Yoshikatsu; Scheller, Henrik Vibe; Schulz, Burkhard; Schulz, Christian; Shakirov, Eugene V.; Shibagaki, Nakako; Shinohara, Naoki; Shippen, Dorothy E.; Sorensen, Iben; Sotooka, Ryo; Sugimoto, Nagisa; Sugita, Mamoru; Sumikawa, Naomi; Tanurdzic, Milos; Theilsen, Gunter; Ulvskov, Peter; Wakazuki, Sachiko; Weng, Jing-Ke; Willats, William W.G.T.; Wipf, Daniel; Wolf, Paul G.; Yang, Lixing; Zimmer, Andreas D.; Zhu, Qihui; Mitros, Therese; Hellsten, Uffe; Loque, Dominique; Otillar, Robert; Salamov, Asaf; Schmutz, Jeremy; Shapiro, Harris; Lindquist, Erika; Lucas, Susan; Rokhsar, Daniel

    2011-04-28

    We report the genome sequence of the nonseed vascular plant, Selaginella moellendorffii, and by comparative genomics identify genes that likely played important roles in the early evolution of vascular plants and their subsequent evolution

  14. Neuropeptide Y receptor gene y6: multiple deaths or resurrections?

    Science.gov (United States)

    Starbäck, P; Wraith, A; Eriksson, H; Larhammar, D

    2000-10-14

    The neuropeptide Y family of G-protein-coupled receptors consists of five cloned members in mammals. Four genes give rise to functional receptors in all mammals investigated. The y6 gene is a pseudogene in human and pig and is absent in rat, but generates a functional receptor in rabbit and mouse and probably in the collared peccary (Pecari tajacu), a distant relative of the pig family. We report here that the guinea pig y6 gene has a highly distorted nucleotide sequence with multiple frame-shift mutations. One evolutionary scenario may suggest that y6 was inactivated before the divergence of the mammalian orders and subsequently resurrected in some lineages. However, the pseudogene mutations seem to be distinct in human, pig, and guinea pig, arguing for separate inactivation events. In either case, the y6 gene has a quite unusual evolutionary history with multiple independent deaths or resurrections.

  15. Cytokines gene expression in newly diagnosed multiple sclerosis patients.

    OpenAIRE

    Seyed Javad Hasheminia; Sepideh Tolouei; Sayyed Hamid Zarkesh-Esfahani; Vahid Shaygannejad; Hedaiat Allah Shirzad; Reza Torabi; Morteza Hashem Zadeh Chaloshtory

    2015-01-01

    Multiple Sclerosis (MS) is characterized by multiple areas of inflammation, demyelination and neurodegeneration. Infiltrating Th1 CD4+ T cells secrete proinflammatory cytokines. They stimulate the release of some cytokines and the expression of adhesion molecules, and these cytokines may cause damage to the myelin sheath and axons. In this study, we analyzed plasma levels and gene expression of five important cytokines in newly diagnosed MS patients by ELISA and real-time PCR. PCR amplifications w...

  16. EasyCloneMulti: A Set of Vectors for Simultaneous and Multiple Genomic Integrations in Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Jérôme Maury

    Saccharomyces cerevisiae is widely used in the biotechnology industry for production of ethanol, recombinant proteins, food ingredients and other chemicals. In order to generate highly producing and stable strains, genome integration of genes encoding metabolic pathway enzymes is the preferred option. However, integration of pathway genes in single or few copies, especially those encoding rate-controlling steps, is often not sufficient to sustain high metabolic fluxes. By exploiting the sequence diversity in the long terminal repeats (LTR) of Ty retrotransposons, we developed a new set of integrative vectors, EasyCloneMulti, that enables multiple and simultaneous integration of genes in S. cerevisiae. By creating vector backbones that combine consensus sequences that aim at targeting subsets of Ty sequences and a quickly degrading selective marker, integrations at multiple genomic loci and a range of expression levels were obtained, as assessed with the green fluorescent protein (GFP) reporter system. The EasyCloneMulti vector set was applied to balance the expression of the rate-controlling step in the β-alanine pathway for biosynthesis of 3-hydroxypropionic acid (3HP). The best 3HP producing clone, with 5.45 g.L(-1) of 3HP, produced 11 times more 3HP than the lowest producing clone, which demonstrates the capability of EasyCloneMulti vectors to impact metabolic pathway enzyme activity.

  17. EasyCloneMulti: A Set of Vectors for Simultaneous and Multiple Genomic Integrations in Saccharomyces cerevisiae.

    Science.gov (United States)

    Maury, Jérôme; Germann, Susanne M; Baallal Jacobsen, Simo Abdessamad; Jensen, Niels B; Kildegaard, Kanchana R; Herrgård, Markus J; Schneider, Konstantin; Koza, Anna; Forster, Jochen; Nielsen, Jens; Borodina, Irina

    2016-01-01

    Saccharomyces cerevisiae is widely used in the biotechnology industry for production of ethanol, recombinant proteins, food ingredients and other chemicals. In order to generate highly producing and stable strains, genome integration of genes encoding metabolic pathway enzymes is the preferred option. However, integration of pathway genes in single or few copies, especially those encoding rate-controlling steps, is often not sufficient to sustain high metabolic fluxes. By exploiting the sequence diversity in the long terminal repeats (LTR) of Ty retrotransposons, we developed a new set of integrative vectors, EasyCloneMulti, that enables multiple and simultaneous integration of genes in S. cerevisiae. By creating vector backbones that combine consensus sequences that aim at targeting subsets of Ty sequences and a quickly degrading selective marker, integrations at multiple genomic loci and a range of expression levels were obtained, as assessed with the green fluorescent protein (GFP) reporter system. The EasyCloneMulti vector set was applied to balance the expression of the rate-controlling step in the β-alanine pathway for biosynthesis of 3-hydroxypropionic acid (3HP). The best 3HP producing clone, with 5.45 g.L(-1) of 3HP, produced 11 times more 3HP than the lowest producing clone, which demonstrates the capability of EasyCloneMulti vectors to impact metabolic pathway enzyme activity.

  18. CBS: an open platform that integrates predictive methods and epigenetics information to characterize conserved regulatory features in multiple Drosophila genomes

    Directory of Open Access Journals (Sweden)

    Blanco Enrique

    2012-12-01

    Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.

  19. Genome-Wide Screening of Genes Required for Glycosylphosphatidylinositol Biosynthesis.

    Directory of Open Access Journals (Sweden)

    Yao Rong

    Glycosylphosphatidylinositol (GPI) is synthesized and transferred to proteins in the endoplasmic reticulum (ER). GPI-anchored proteins are then transported from the ER to the plasma membrane through the Golgi apparatus. To date, at least 17 steps have been identified to be required for the GPI biosynthetic pathway. Here, we aimed to establish a comprehensive screening method to identify genes involved in GPI biosynthesis using mammalian haploid screens. Human haploid cells were mutagenized by the integration of gene trap vectors into the genome. Mutagenized cells were then treated with a bacterial pore-forming toxin, aerolysin, which binds to GPI-anchored proteins for targeting to the cell membrane. Cells that showed low surface expression of CD59, a GPI-anchored protein, were further enriched. Gene trap insertion sites in the non-selected population and in the enriched population were determined by deep sequencing. This screening enriched 23 gene regions among the 26 known GPI biosynthetic genes, which when mutated are expected to decrease the surface expression of GPI-anchored proteins. Our results indicate that the forward genetic approach using haploid cells is a useful and powerful technique to identify factors involved in phenotypes of interest.

  20. A genome-wide search for genes involved in type 2 diabetes in a recently genetically isolated population from the Netherlands

    NARCIS (Netherlands)

    Y.S. Aulchenko (Yurii); N. Vaessen (Norbert); P. Heutink (Peter); J. Pullen (Jan); P.J.L.M. Snijders (Pieter); A. Hofman (Albert); L.A. Sandkuijl (Lodewijk); J.J. Houwing-Duistermaat (Jeanine); S. Bennett (Simon); B.A. Oostra (Ben); C.M. van Duijn (Cock); M. Edwards (Mark)

    2003-01-01

    Multiple genes, interacting with the environment, contribute to the susceptibility to type 2 diabetes. We performed a genome-wide search to localize type 2 diabetes susceptibility genes in a recently genetically isolated population in the Netherlands. We identified 79 nuclear families wi

  1. Genome-Wide Analysis of the Sus Gene Family in Cotton

    Institute of Scientific and Technical Information of China (English)

    Changsong Zou; Cairui Lu; Haihong Shang; Xinrui Jing; Hailiang Cheng; Youping Zhang; Guoli Song

    2013-01-01

    Sucrose synthase (Sus) is a key enzyme in plant sucrose metabolism. In cotton, Sus (EC 2.4.1.13) is the main enzyme that degrades sucrose imported into cotton fibers from the phloem of the seed coat. This study demonstrated that the genomes of Gossypium arboreum L., G. raimondii Ulbr., and G. hirsutum L. contained 8, 8, and 15 Sus genes, respectively. Their structural organizations, phylogenetic relationships, and expression profiles were characterized. Comparisons of genomic and coding sequences identified multiple introns, the number and positions of which were highly conserved between diploid and allotetraploid cotton species. Most of the phylogenetic clades contained sequences from all three species, suggesting that the Sus genes of tetraploid G. hirsutum derived from those of its diploid ancestors. One Sus group (Sus I) underwent expansion during cotton evolution. Expression analyses indicated that most Sus genes were differentially expressed in various tissues and had development-dependent expression profiles in cotton fiber cells. Members of the same orthologous group had very similar expression patterns in all three species. These results provide new insights into the evolution of the cotton Sus gene family, and insight into its members' physiological functions during fiber growth and development.

  2. The first myriapod genome sequence reveals conservative arthropod gene content and genome organisation in the centipede Strigamia maritima.

    OpenAIRE

    2014-01-01

    Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologue...

  3. The first myriapod genome sequence reveals conservative arthropod gene content and genome organisation in the centipede Strigamia maritima

    OpenAIRE

    2014-01-01

    Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologue...

  4. The First Myriapod Genome Sequence Reveals Conservative Arthropod Gene Content and Genome Organisation in the Centipede Strigamia maritima

    OpenAIRE

    2014-01-01

    Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologue...

  5. Identification and distribution of the NBS-LRR gene family in the cassava genome

    Science.gov (United States)

    Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analyzing the genomic organization of resistance genes i...

  6. Genome-wide significant association between alcohol dependence and a variant in the ADH gene cluster.

    Science.gov (United States)

    Frank, Josef; Cichon, Sven; Treutlein, Jens; Ridinger, Monika; Mattheisen, Manuel; Hoffmann, Per; Herms, Stefan; Wodarz, Norbert; Soyka, Michael; Zill, Peter; Maier, Wolfgang; Mössner, Rainald; Gaebel, Wolfgang; Dahmen, Norbert; Scherbaum, Norbert; Schmäl, Christine; Steffens, Michael; Lucae, Susanne; Ising, Marcus; Müller-Myhsok, Bertram; Nöthen, Markus M; Mann, Karl; Kiefer, Falk; Rietschel, Marcella

    2012-01-01

    Alcohol dependence (AD) is an important contributory factor to the global burden of disease. The etiology of AD involves both environmental and genetic factors, and the disorder has a heritability of around 50%. The aim of the present study was to identify susceptibility genes for AD by performing a genome-wide association study (GWAS). The sample comprised 1333 male in-patients with severe AD according to the Diagnostic and Statistical Manual of Mental Disorders, 4th edition, and 2168 controls. These included 487 patients and 1358 controls from a previous GWAS study by our group. All individuals were of German descent. Single-marker tests and a polygenic score-based analysis to assess the combined contribution of multiple markers with small effects were performed. The single nucleotide polymorphism (SNP) rs1789891, which is located between the ADH1B and ADH1C genes, achieved genome-wide significance [P = 1.27E-8, odds ratio (OR) = 1.46]. Other markers from this region were also associated with AD, and conditional analyses indicated that these made a partially independent contribution. The SNP rs1789891 is in complete linkage disequilibrium with the functional Arg272Gln variant (P = 1.24E-7, OR = 1.31) of the ADH1C gene, which has been reported to modify the rate of ethanol oxidation to acetaldehyde in vitro. A polygenic score-based approach produced a significant result (P = 9.66E-9). This is the first GWAS of AD to provide genome-wide significant support for the role of the ADH gene cluster and to suggest a polygenic component to the etiology of AD. The latter result may indicate that many more AD susceptibility genes still await identification.
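    The significance claims in this abstract can be checked against the conventional genome-wide threshold of 5e-8. That threshold is an assumption here (the abstract does not state which cutoff was applied); the P values and odds ratios are taken from the abstract, and the helper function is hypothetical.

```python
# Conventional genome-wide significance threshold; assumed, not stated in the abstract.
GENOME_WIDE_ALPHA = 5e-8

# P values and odds ratios as reported in the abstract.
reported = {
    "rs1789891": {"p": 1.27e-8, "odds_ratio": 1.46},
    "ADH1C Arg272Gln": {"p": 1.24e-7, "odds_ratio": 1.31},
}

def is_genome_wide_significant(p, alpha=GENOME_WIDE_ALPHA):
    """True when the P value falls below the chosen genome-wide threshold."""
    return p < alpha

flags = {name: is_genome_wide_significant(r["p"]) for name, r in reported.items()}
```

    Under this assumed cutoff, rs1789891 passes while the P value reported for the Arg272Gln variant does not.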

  7. The genome of Chelonid herpesvirus 5 harbors atypical genes

    Science.gov (United States)

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J.; Lewis, Teresa D.; Schetle, Nelli; Work, Thierry M.; Dagenais, Julie; Balazs, George H.; Leong, Jo-Ann C.

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of “atypical” DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis.

  8. The genome of Chelonid herpesvirus 5 harbors atypical genes.

    Directory of Open Access Journals (Sweden)

    Mathias Ackermann

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of "atypical" DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis.

  9. The genome of Chelonid herpesvirus 5 harbors atypical genes.

    Science.gov (United States)

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J; Lewis, Teresa D; Schetle, Nelli; Work, Thierry M; Dagenais, Julie; Balazs, George H; Leong, Jo-Ann C

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of "atypical" DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis.

  10. Gene targeting, genome editing: from Dolly to editors.

    Science.gov (United States)

    Tan, Wenfang; Proudfoot, Chris; Lillico, Simon G; Whitelaw, C Bruce A

    2016-06-01

    One of the most powerful strategies we have as scientists to investigate biology is the ability to transfer genetic material in a controlled and deliberate manner between organisms. When applied to livestock, applications worthy of commercial venture can be devised. Although initial methods used to generate transgenic livestock resulted in random transgene insertion, the development of SCNT technology enabled homologous recombination gene targeting strategies to be used in livestock. Much has been accomplished using this approach. However, now we have the ability to change a specific base in the genome without leaving any other DNA mark, with no need for a transgene. With the advent of the genome editors this is now possible and, like other significant technological leaps, the result is an even greater diversity of possible applications. Indeed, in merely 5 years, these 'molecular scissors' have enabled the production of more than 300 differently edited pigs, cattle, sheep and goats. The advent of genome editors has brought genetic engineering of livestock to a position where industry, the public and politicians are all eager to see real use of genetically engineered livestock to address societal needs. Since the first transgenic livestock were reported just over three decades ago, the field of livestock biotechnology has come a long way, but the most exciting period is just starting.

  11. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome...... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms......-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation....

  12. Simultaneous integration of multiple genes into the Kluyveromyces marxianus chromosome.

    Science.gov (United States)

    Heo, Paul; Yang, Tae-Jun; Chung, Soon-Chun; Cheon, Yuna; Kim, Jun-Seob; Park, Jun-Bum; Koo, Hyun Min; Cho, Kwang Myung; Seo, Jin-Ho; Park, Jae Chan; Kweon, Dae-Hyuk

    2013-09-10

    While Kluyveromyces marxianus is a promising yeast strain for biotechnological applications, genetic engineering of this strain is still challenging, especially when multiple genes are to be transformed. Sequential gene integration, which takes advantage of repetitive insertion/excision of the URA3 gene as a marker, has been the best option until now, because the URA3-deletion mutant is the only precondition for this method. However, we found that the introduced gene is co-excised during the URA3 excision step for next gene introduction, resulting in a very low cumulative probability (<1.57×10⁻⁶ % for 4 genes) of integrating all genes of interest. To overcome this extremely low probability, and to reduce labor and time, all 4 genes were simultaneously transformed. Surprisingly, the infamously high 'non-homologous end joining' activity of K. marxianus enabled simultaneous integration of all 4 genes in a single step, with a probability of 7.9%. Various K. marxianus strains could also be similarly transformed. Our finding not only reduces the labor and time required for such procedures, but also removes a number of preconditions, such as pre-made vectors, selection markers and knockout mutants, which are needed to introduce many genes into K. marxianus.

  13. Aberrant gene promoter methylation associated with sporadic multiple colorectal cancer.

    Directory of Open Access Journals (Sweden)

    Victoria Gonzalo

    BACKGROUND: Colorectal cancer (CRC) multiplicity has been mainly related to polyposis and non-polyposis hereditary syndromes. In sporadic CRC, aberrant gene promoter methylation has been shown to play a key role in carcinogenesis, although little is known about its involvement in multiplicity. To assess the effect of methylation in tumor multiplicity in sporadic CRC, hypermethylation of key tumor suppressor genes was evaluated in patients with both multiple and solitary tumors, as a proof-of-concept of an underlying epigenetic defect. METHODOLOGY/PRINCIPAL FINDINGS: We examined a total of 47 synchronous/metachronous primary CRC from 41 patients, and 41 gender, age (5-year intervals) and tumor location-paired patients with solitary tumors. Exclusion criteria were polyposis syndromes, Lynch syndrome and inflammatory bowel disease. DNA methylation at the promoter region of the MGMT, CDKN2A, SFRP1, TMEFF2, HS3ST2 (3OST2), RASSF1A and GATA4 genes was evaluated by quantitative methylation specific PCR in both tumor and corresponding normal appearing colorectal mucosa samples. Overall, patients with multiple lesions exhibited a higher degree of methylation in tumor samples than those with solitary tumors regarding all evaluated genes. After adjusting for age and gender, binomial logistic regression analysis identified methylation of MGMT2 (OR, 1.48; 95% CI, 1.10 to 1.97; p = 0.008) and RASSF1A (OR, 2.04; 95% CI, 1.01 to 4.13; p = 0.047) as variables independently associated with tumor multiplicity, the risk related to methylation of either of these two genes being 4.57 (95% CI, 1.53 to 13.61; p = 0.006). Moreover, in six patients in whom both tumors were available, we found a correlation in the methylation levels of MGMT2 (r = 0.64, p = 0.17), SFRP1 (r = 0.83, p = 0.06), HPP1 (r = 0.64, p = 0.17), 3OST2 (r = 0.83, p = 0.06) and GATA4 (r = 0.6, p = 0.24). Methylation in normal appearing colorectal mucosa from patients with multiple and solitary CRC showed no relevant
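    The correlation coefficients and p-values reported for the six patients with paired tumors (for example r = 0.83 with p = 0.06, and r = 0.6 with p = 0.24) are consistent with an exact test on a Spearman rank correlation with n = 6. The abstract does not name the statistic, so that reading is an assumption. The sketch below enumerates all 720 rank permutations to obtain the exact two-sided p-value; the rank vectors are hypothetical and chosen only to reproduce those two coefficients.

```python
from itertools import permutations


def spearman_rho(x_ranks, y_ranks):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x_ranks)
    d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d2 / (n * (n * n - 1))


def exact_two_sided_p(y_ranks):
    """Share of all rank permutations with |rho| at least as large as observed."""
    n = len(y_ranks)
    base = tuple(range(1, n + 1))
    observed = abs(spearman_rho(base, y_ranks))
    hits = total = 0
    for perm in permutations(base):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(spearman_rho(base, perm)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical ranks of the second tumor, with the first tumor ranked 1..6.
ranks_083 = (2, 1, 4, 3, 6, 5)   # sum(d^2) = 6  -> rho = 0.829
ranks_060 = (2, 3, 1, 6, 5, 4)   # sum(d^2) = 14 -> rho = 0.600

p_083 = exact_two_sided_p(ranks_083)
p_060 = exact_two_sided_p(ranks_060)
print(f"rho = {spearman_rho(range(1, 7), ranks_083):.3f}, p = {p_083:.3f}")
print(f"rho = {spearman_rho(range(1, 7), ranks_060):.3f}, p = {p_060:.3f}")
```

    With only six pairs, even a coefficient above 0.8 does not reach p < 0.05 under this test, which is why the reported correlations are best read as suggestive.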

  14. Ascaris phylogeny based on multiple whole mtDNA genomes

    DEFF Research Database (Denmark)

    Nejsum, Peter; Hawash, Mohamed B F; Betson, Martha

    2016-01-01

    Ascaris lumbricoides and A. suum are two parasitic nematodes infecting humans and pigs, respectively. There has been considerable debate as to whether Ascaris in the two hosts should be considered a single or two separate species. Previous studies identified at least three major clusters (A, B...... and C) of human and pig Ascaris based on partial cox1 sequences. In the present study, we selected major haplotypes from these different clusters to characterize their whole mitochondrial genomes for phylogenetic analysis. We also undertook coalescent simulations to investigate the evolutionary history...... events: the first one occurring early in the Neolithic period which resulted in a differentiated population of Ascaris in pigs (cluster C), the second occurring more recently (~ 900 generations ago), resulting in clusters A and B which might have been spread worldwide by human activities....

  15. Volume visualization of multiple alignment of genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shah, Nameeta; Weber, Gunther H.; Dillard, Scott E.; Hamann, Bernd

    2004-05-01

    Genomes of hundreds of species have been sequenced to date and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We provide results for aligned DNA sequence data and compare it with traditional 1D line plots. Our technique, coupled with 1D line plots, results in effective multiresolution visualization of very large aligned sequence data sets.
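    The idea of arranging 1D similarity scores in 3D space can be sketched as follows. The abstract does not say which layout the authors use, so this example assumes the simplest one, a row-major fold of the score sequence into a stack of 2D slices; the function names, grid dimensions, and scores are hypothetical.

```python
def fold_to_volume(scores, nx, ny):
    """Fold a 1D list into z-slices of ny rows with nx values each (row-major)."""
    slice_size = nx * ny
    nz = -(-len(scores) // slice_size)  # ceiling division
    # Pad the tail with zeros so the last slice is complete.
    padded = list(scores) + [0.0] * (nz * slice_size - len(scores))
    return [
        [padded[z * slice_size + y * nx: z * slice_size + (y + 1) * nx]
         for y in range(ny)]
        for z in range(nz)
    ]


def voxel_of(index, nx, ny):
    """Map a 1D sequence position to its (x, y, z) voxel coordinate."""
    z, rest = divmod(index, nx * ny)
    y, x = divmod(rest, nx)
    return x, y, z


# Ten made-up similarity scores folded into 2 x 2 slices.
scores = [i / 10 for i in range(10)]
volume = fold_to_volume(scores, nx=2, ny=2)

x, y, z = voxel_of(9, nx=2, ny=2)
print(len(volume), (x, y, z), volume[z][y][x])
```

    Keeping the inverse mapping (`voxel_of`) alongside the fold is what lets a selected voxel be linked back to a sequence position, for instance to drive a conventional 1D line plot of the same region.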

  16. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

    Science.gov (United States)

    Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun

    2013-10-16

    The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining
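    The three-stage pipeline described in this abstract (gene-article matrix, normalized gene-MeSH term matrix, gene-gene dissimilarity matrix) can be sketched on toy data. This is not the GenoMesh implementation: the abstract does not give the normalization or name the optimized dissimilarity function, so the sketch assumes simple frequency normalization and uses cosine dissimilarity as a stand-in. All gene names, article identifiers, and MeSH assignments are hypothetical.

```python
import math

# Stage 1 (hypothetical data): which articles mention each gene, and which
# MeSH terms index each article.
gene_articles = {
    "geneA": {"pmid1", "pmid2"},
    "geneB": {"pmid2", "pmid3"},
    "geneC": {"pmid4"},
}
article_mesh = {
    "pmid1": {"Virulence"},
    "pmid2": {"Virulence", "Iron"},
    "pmid3": {"Iron"},
    "pmid4": {"Ribosomes"},
}


def gene_mesh_vector(gene):
    """Stage 2: MeSH term frequencies for one gene, normalized to sum to 1."""
    counts = {}
    for article in gene_articles[gene]:
        for term in article_mesh[article]:
            counts[term] = counts.get(term, 0) + 1
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine_dissimilarity(u, v):
    """Stage 3 stand-in: 1 - cosine similarity of two sparse term vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


vectors = {gene: gene_mesh_vector(gene) for gene in gene_articles}
genes = sorted(vectors)
dissimilarity = {
    (a, b): cosine_dissimilarity(vectors[a], vectors[b])
    for a in genes for b in genes
}

for a, b in [("geneA", "geneB"), ("geneA", "geneC")]:
    print(a, b, round(dissimilarity[(a, b)], 3))
```

    Here geneA and geneB never need to co-occur in every article to come out as close: sharing MeSH terms is enough, which is the sense in which such a matrix can suggest relationships not stated explicitly in any single paper.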

  17. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    Directory of Open Access Journals (Sweden)

    Khan Shafiq A

    2003-06-01

    Background: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences and direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells.

  18. Recognizing genes and other components of genomic structure

    Energy Technology Data Exchange (ETDEWEB)

    Burks, C. (Los Alamos National Lab., NM (USA)); Myers, E. (Arizona Univ., Tucson, AZ (USA). Dept. of Computer Science); Stormo, G.D. (Colorado Univ., Boulder, CO (USA). Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as "electronic blackboards" and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  19. Single cell genomics indicates horizontal gene transfer and viral infections in a deep subsurface Firmicutes population

    Directory of Open Access Journals (Sweden)

    Jessica Labonté

    2015-04-01

    A major fraction of Earth's prokaryotic biomass dwells in the deep subsurface, where cellular abundances per volume of sample are lower, metabolism is slower, and generation times are longer than those in surface terrestrial and marine environments. How these conditions impact biotic interactions and evolutionary processes is largely unknown. Here we employed single cell genomics to analyze cell-to-cell genome content variability and signatures of horizontal gene transfer (HGT) and viral infections in five cells of Candidatus Desulforudis audaxviator, which were collected from a three km-deep fracture water in the 2.9 Ga-old Witwatersrand Basin of South Africa. Between 0 and 32% of genes recovered from single cells were not present in the original, metagenomic assembly of Desulforudis, which was obtained from a neighboring subsurface fracture. We found a transposable prophage, a retron, multiple clustered regularly interspaced short palindromic repeats (CRISPRs) and restriction-modification systems, and an unusually high frequency of transposases in the analyzed single cell genomes. This indicates that recombination, HGT and viral infections are prevalent evolutionary events in the studied population of microorganisms inhabiting a highly stable deep subsurface environment.

  20. Gene interactions in the evolution of genomic imprinting.

    Science.gov (United States)

    Wolf, J B; Brandvain, Y

    2014-08-01

    Numerous evolutionary theories have been developed to explain the epigenetic phenomenon of genomic imprinting. Here, we explore a subset of theories wherein non-additive genetic interactions can favour imprinting. In the simplest genic interaction--the case of underdominance--imprinting can be favoured to hide effectively low-fitness heterozygous genotypes; however, as there is no asymmetry between maternally and paternally inherited alleles in this model, other means of enforcing monoallelic expression may be more plausible evolutionary outcomes than genomic imprinting. By contrast, more successful interaction models of imprinting rely on an asymmetry between the maternally and paternally inherited alleles at a locus that favours the silencing of one allele as a means of coordinating the expression of high-fitness allelic combinations. For example, with interactions between autosomal loci, imprinting functionally preserves high-fitness genotypes that were favoured by selection in the previous generation. In this scenario, once a focal locus becomes imprinted, selection at interacting loci favours a matching imprint. Uniparental transmission generates similar asymmetries for sex chromosomes and cytoplasmic factors interacting with autosomal loci, with selection favouring the expression of either maternal or paternally derived autosomal alleles depending on the pattern of transmission of the uniparentally inherited factor. In a final class of models, asymmetries arise when genes expressed in offspring interact with genes expressed in one of its parents. Under such a scenario, a locus evolves to have imprinted expression in offspring to coordinate the interaction with its parent's genome. We illustrate these models and explore key links and differences using a unified framework.

  1. Genomic organisation of the seven ParaHox genes of coelacanths

    OpenAIRE

    Mulley, John F; Holland, Peter WH

    2013-01-01

    Human and mouse genomes contain six ParaHox genes implicated in gut and neural patterning. In coelacanths and cartilaginous fish, an additional ParaHox gene exists—Pdx2—that dates back to the genome duplications in early vertebrate evolution. Here we examine the genomic arrangement and flanking genes of all ParaHox genes in coelacanths, to determine the full complement of these genes. We find that coelacanths have seven ParaHox genes in total, in four chromosomal locations, revealing that fiv...

  2. Strigolactone biology: genes, functional genomics, epigenetics and applications.

    Science.gov (United States)

    Makhzoum, Abdullah; Yousefzadi, Morteza; Malik, Sonia; Gantet, Pascal; Tremouillaux-Guiller, Jocelyne

    2017-03-01

    Strigolactones (SLs) represent an important new plant hormone class marked by their multifunctional role in plant and rhizosphere interactions. These compounds stimulate hyphal branching in arbuscular mycorrhizal fungi (AMF) and seed germination of root parasitic plants. In addition, they are involved in the control of plant architecture by inhibiting bud outgrowth as well as many other morphological and developmental processes together with other plant hormones such as auxins and cytokinins. The biosynthetic pathway of SLs that are derived from carotenoids was partially decrypted based on the identification of mutants from a variety of plant species. Only a few SL biosynthetic and regulated genes and related regulatory transcription factors have been identified. However, functional genomics and epigenetic studies started to give first elements on the modality of the regulation of SLs related genes. Since they control plant architecture and plant-rhizosphere interaction, SLs start to be used for agronomical and biotechnological applications. Furthermore, the genes involved in the SL biosynthetic pathway and genes regulated by SL constitute interesting targets for plant breeding. Therefore, it is necessary to decipher and better understand the genetic determinants of their regulation at different levels.

  3. Restriction genes for retroviruses influence the risk of multiple sclerosis

    DEFF Research Database (Denmark)

    Nexø, Bjørn A; Hansen, Bettina; Nissen, Kari K

    2013-01-01

    We recently described that the autoimmune, central nervous system disease, multiple sclerosis (MS), is genetically associated with the human endogenous retroviral locus, HERV-Fc1, in Scandinavians. A number of dominant human genes encoding factors that restrict retrovirus replication have been...

  4. [The application of genome editing in identification of plant gene function and crop breeding].

    Science.gov (United States)

    Xiangchun, Zhou; Yongzhong, Xing

    2016-03-01

    Plant genome can be modified via current biotechnology with high specificity and excellent efficiency. Zinc finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN) and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system are the key engineered nucleases used in the genome editing. Genome editing techniques enable gene targeted mutagenesis, gene knock-out, gene insertion or replacement at the target sites during the endogenous DNA repair process, including non-homologous end joining (NHEJ) and homologous recombination (HR), triggered by the induction of DNA double-strand break (DSB). Genome editing has been successfully applied in the genome modification of diverse plant species, such as Arabidopsis thaliana, Oryza sativa, and Nicotiana tabacum. In this review, we summarize the application of genome editing in identification of plant gene function and crop breeding. Moreover, we also discuss the improving points of genome editing in crop precision genetic improvement for further study.

  5. Genomic Physics. Multiple Laser Beam Treatment of Alzheimer's Disease

    Science.gov (United States)

    Stefan, V. Alexander

    2014-03-01

    The synapses affected by Alzheimer's disease can be rejuvenated by the multiple ultrashort wavelength laser beams.[2] The guiding lasers scan the whole area to detect the amyloid plaques based on the laser scattering technique. The scanning lasers pinpoint the areas with plaques and eliminate them. Laser interaction is highly efficient, because of the focusing capabilities and possibility for the identification of the damaging proteins by matching the protein oscillation eigen-frequency with laser frequency.[3] Supported by Nikola Tesla Labs, La Jolla, California, USA.

  6. Volume visualization of multiple alignment of large genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shah, Nameeta; Dillard, Scott E.; Weber, Gunther H.; Hamann, Bernd

    2005-07-25

    Genomes of hundreds of species have been sequenced to date, and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. As a result, tools using 1D representations are incapable of providing an informative overview for extremely large data sets. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We demonstrate our technique using multi-millions-basepair-long aligned DNA sequence data and compare it with traditional 1D line plots. The results show that our technique is superior in providing an overview of entire data sets. Our technique, coupled with 1D line plots, results in effective multi-resolution visualization of very large aligned sequence data sets.

  7. Multiple and variable NHEJ-like genes are involved in resistance to DNA damage in Streptomyces ambofaciens

    Directory of Open Access Journals (Sweden)

    Grégory Hoff

    2016-11-01

    Non-homologous end-joining (NHEJ) is a double strand break (DSB) repair pathway which does not require any homologous template and can ligate two DNA ends together. The basic bacterial NHEJ machinery involves two partners: the Ku protein, a DNA end binding protein for DSB recognition, and the multifunctional LigD protein, composed of a ligase, a nuclease and a polymerase domain, for end processing and ligation of the broken ends. In silico analyses performed in the 38 sequenced genomes of Streptomyces species revealed the existence of a large panel of NHEJ-like genes. Indeed, ku genes or ligD domain homologues are scattered throughout the genome in multiple copies and can be distinguished in two categories: the core NHEJ gene set constituted of conserved loci and the variable NHEJ gene set constituted of NHEJ-like genes present in only a part of the species. In Streptomyces ambofaciens ATCC 23877, not only the deletion of core genes but also that of variable genes led to an increased sensitivity to DNA damage induced by electron beam irradiation. Multiple mutants of ku, ligase or polymerase encoding genes showed an aggravated phenotype compared to single mutants. Biochemical assays revealed the ability of Ku-like proteins to protect and to stimulate ligation of DNA ends. RT-qPCR and GFP fusion experiments suggested that ku-like genes show a growth phase dependent expression profile consistent with their involvement in DNA repair during spore formation and/or germination.

  8. The genomic landscape underlying phenotypic integrity in the face of gene flow in crows.

    Science.gov (United States)

    Poelstra, J W; Vijay, N; Bossu, C M; Lantz, H; Ryll, B; Müller, I; Baglione, V; Unneberg, P; Wikelski, M; Grabherr, M G; Wolf, J B W

    2014-06-20

    The importance, extent, and mode of interspecific gene flow for the evolution of species has long been debated. Characterization of genomic differentiation in a classic example of hybridization between all-black carrion crows and gray-coated hooded crows identified genome-wide introgression extending far beyond the morphological hybrid zone. Gene expression divergence was concentrated in pigmentation genes expressed in gray versus black feather follicles. Only a small number of narrow genomic islands exhibited resistance to gene flow. One prominent genomic region (<2 megabases) harbored 81 of all 82 fixed differences (of 8.4 million single-nucleotide polymorphisms in total) linking genes involved in pigmentation and in visual perception, a genomic signal reflecting color-mediated prezygotic isolation. Thus, localized genomic selection can cause marked heterogeneity in introgression landscapes while maintaining phenotypic divergence. Copyright © 2014, American Association for the Advancement of Science.

  9. Identification of Chromosomes from Multiple Rice Genomes Using a Universal Molecular Cytogenetic Marker System

    Institute of Scientific and Technical Information of China (English)

    Xiaomin Tang; Weidong Bao; Wenli Zhang; Zhukuan Cheng

    2007-01-01

    Developing reliable techniques for chromosome identification is critical for cytogenetic research, especially for genomes with a large number of small chromosomes. An efficient approach using bacterial artificial chromosome (BAC) clones as molecular cytological markers has been developed for many organisms. Herein, we present a set of chromosomal arm-specific molecular cytological markers derived from the gene-enriched regions of the sequenced rice genome. All these markers are able to generate very strong signals on the pachytene chromosomes of Oryza sativa L. (AA genome) when used as fluorescence in situ hybridization (FISH) probes. We further hybridized those markers to the pachytene chromosomes of O. punctata (BB genome) and O. officinalis (CC genome) and also obtained very strong signals on the relevant pachytene chromosomes. The signal position of each marker on the related chromosomes from the three different rice genomes was largely stable, which enabled us to identify different chromosomes among various rice genomes. We also constructed the karyotype for both O. punctata and O. officinalis, with the BB and CC genomes, respectively, by analysis of 10 pachytene cells anchored by these chromosomal arm-specific markers.

  10. Complete Taiwanese Macaque (Macaca cyclopis) Mitochondrial Genome: Reference-Assisted de novo Assembly with Multiple k-mer Strategy.

    Science.gov (United States)

    Huang, Yu-Feng; Midha, Mohit; Chen, Tzu-Han; Wang, Yu-Tai; Smith, David Glenn; Pei, Kurtis Jai-Chyi; Chiu, Kuo Ping

    2015-01-01

    The Taiwanese (Formosan) macaque (Macaca cyclopis) is the only nonhuman primate endemic to Taiwan. This primate species is valuable for evolutionary studies and as a subject in medical research. However, only partial fragments of the mitochondrial genome (mitogenome) of this primate species have been sequenced, not to mention its nuclear genome. We employed next-generation sequencing to generate 2 x 90 bp paired-end reads, followed by reference-assisted de novo assembly with a multiple k-mer strategy to characterize the M. cyclopis mitogenome. We compared the assembled mitogenome with that of other macaque species for phylogenetic analysis. Our results show that the M. cyclopis mitogenome consists of 16,563 nucleotides encoding 13 protein-coding genes, 2 ribosomal RNAs and 22 transfer RNAs. Phylogenetic analysis indicates that M. cyclopis is most closely related to M. mulatta lasiota (Chinese rhesus macaque), supporting the notion of an Asia-continental origin of M. cyclopis proposed in previous studies based on partial mitochondrial sequences. Our work presents a novel approach for assembling a mitogenome that utilizes the capabilities of de novo genome assembly with the assistance of a reference genome. The availability of the complete Taiwanese macaque mitogenome will facilitate the study of primate evolution and the characterization of genetic variations for the potential usage of this species as a non-human primate model for medical research.
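
    To make the idea of a multiple k-mer strategy concrete, the sketch below counts k-mers from the same reads at several values of k, which is the raw material an assembler would work from at each k. It is not the authors' pipeline: the function names and the default k values are hypothetical, chosen only to fit reads of about 90 bp.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k contribute nothing (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def multi_k_spectra(reads, ks=(21, 31, 41)):
    """Build one k-mer spectrum per k; the k values are illustrative."""
    return {k: kmer_counts(reads, k) for k in ks}
```

    Smaller k values tolerate low coverage but collapse repeats, while larger k values resolve repeats but need deeper coverage, which is the usual motivation for combining several.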

  11. A bi-dimensional genome scan for prolificacy traits in pigs shows the existence of multiple epistatic QTL

    Directory of Open Access Journals (Sweden)

    Bidanel Jean P

    2009-12-01

    Background: Prolificacy is the most important trait influencing the reproductive efficiency of pig production systems. The low heritability and sex-limited expression of prolificacy have hindered to some extent the improvement of this trait through artificial selection. Moreover, the relative contributions of additive, dominant and epistatic QTL to the genetic variance of pig prolificacy remain to be defined. In this work, we have addressed this issue by performing one-dimensional and bi-dimensional genome scans for number of piglets born alive (NBA) and total number of piglets born (TNB) in a three-generation Iberian by Meishan F2 intercross. Results: The one-dimensional genome scan for NBA and TNB revealed the existence of two genome-wide highly significant QTL located on SSC13 and SSC17 [...]. Conclusions: The complex inheritance of prolificacy traits in pigs has been evidenced by identifying multiple additive (SSC13 and SSC17), dominant and epistatic QTL in an Iberian × Meishan F2 intercross. Our results demonstrate that a significant fraction of the phenotypic variance of swine prolificacy traits can be attributed to first-order gene-by-gene interactions, emphasizing that the phenotypic effects of alleles might be strongly modulated by the genetic background where they segregate.

  12. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, Marie; Jensen, L.J.; Brunak, Søren;

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...

  13. Human gene encoding prostacyclin synthase (PTGIS): Genomic organization, chromosomal localization, and promoter activity

    Energy Technology Data Exchange (ETDEWEB)

    Yokoyama, Chieko; Yabuki, Tomoko; Inoue, Hiroyasu [National Cardiovascular Center Research Institute, Osaka (Japan)] [and others]

    1996-09-01

    The prostacyclin synthase gene isolated from human genomic libraries (PTGIS) consists of 10 exons spanning approximately 60 kb. All the splice donor and acceptor sites conform to the GT/AG rule. Genomic Southern blot and fluorescence in situ hybridization analyses revealed that the human prostacyclin synthase gene is present as a single copy per haploid genome and is localized on chromosome 20q13.11-q13.13. The 1.5-kb sequence 5′ of the translational initiation site contained both GC-rich and pyrimidine-rich regions and consensus sequences of the transcription factor recognition sites such as Sp1, AP-2, the interferon-γ response element, GATA, NF-κB, the CACCC box, and the glucocorticoid response element. The core binding sequence (GAGACC) of the shear stress responsive element was also found in the 5′-flanking region of the gene. The major product of the primer extension analysis suggested that the transcription of the gene started from the positions around 49 bp upstream of the translational initiation codon. Transient transfection experiments using human aortic and bovine arterial endothelial cells demonstrated that the GC-rich region (positions -145 to -10) possessed a significant promoter activity. The 6-kb sequence downstream of the translational termination codon contained multiple polyadenylation signals, Alu repeat sequences, and the consensus sequence of the primate-repetitive DNA element, MER1. Two sizes of the prostacyclin synthase mRNAs (approximately 6 and 3.3 kb) were detected in the human aorta and lung. RNA blot hybridization analysis using the 3′-untranslated region as probe indicated that the sizes of the 3′-flanking regions were different in the major 6-kb and minor 3.3-kb mRNAs. 54 refs., 7 figs.

  14. The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression.

    Science.gov (United States)

    Babb, Paul L; Lahens, Nicholas F; Correa-Garhwal, Sandra M; Nicholson, David N; Kim, Eun Ji; Hogenesch, John B; Kuntner, Matjaž; Higgins, Linden; Hayashi, Cheryl Y; Agnarsson, Ingi; Voight, Benjamin F

    2017-06-01

    Spider silks are the toughest known biological materials, yet are lightweight and virtually invisible to the human immune system, and they thus have revolutionary potential for medicine and industry. Spider silks are largely composed of spidroins, a unique family of structural proteins. To investigate spidroin genes systematically, we constructed the first genome of an orb-weaving spider: the golden orb-weaver (Nephila clavipes), which builds large webs using an extensive repertoire of silks with diverse physical properties. We cataloged 28 Nephila spidroins, representing all known orb-weaver spidroin types, and identified 394 repeated coding motif variants and higher-order repetitive cassette structures unique to specific spidroins. Characterization of spidroin expression in distinct silk gland types indicates that glands can express multiple spidroin types. We find evidence of an alternatively spliced spidroin, a spidroin expressed only in venom glands, evolutionary mechanisms for spidroin diversification, and non-spidroin genes with expression patterns that suggest roles in silk production.

  15. Genomics 4.0 : syntenic gene and genome duplication drives diversification of plant secondary metabolism and innate immunity in flowering plants : advanced pattern analytics in duplicate genomes

    NARCIS (Netherlands)

    Hofberger, J.A.

    2015-01-01

    Genomics 4.0 - Syntenic Gene and Genome Duplication Drives Diversification of Plant Secondary Metabolism and Innate Immunity in Flowering Plants   Johannes A. Hofberger1, 2, 3 1 Biosystematics Group, Wageningen University & Research Center, Droevendaalsesteeg 1, 6708 PB Wageningen, The Neth

  16. Evolution of genes and genomes on the Drosophila phylogeny

    DEFF Research Database (Denmark)

    Clark, Andrew G; Eisen, Michael B; Smith, Douglas R

    2007-01-01

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the ...

  17. Multiple BiP genes of Arabidopsis thaliana are required for male gametogenesis and pollen competitiveness.

    Science.gov (United States)

    Maruyama, Daisuke; Sugiyama, Tomoyuki; Endo, Toshiya; Nishikawa, Shuh-Ichi

    2014-04-01

    Immunoglobulin-binding protein (BiP) is a molecular chaperone of the heat shock protein 70 (Hsp70) family. BiP is localized in the endoplasmic reticulum (ER) and plays key roles in protein translocation, protein folding and quality control in the ER. The genomes of flowering plants contain multiple BiP genes. Arabidopsis thaliana has three BiP genes. BIP1 and BIP2 are ubiquitously expressed. BIP3 encodes a less well conserved BiP paralog, and it is expressed only under ER stress conditions in the majority of organs. Here, we report that all BiP genes are expressed and functional in pollen and pollen tubes. Although the bip1 bip2 double mutation does not affect pollen viability, the bip1 bip2 bip3 triple mutation is lethal in pollen. This result indicates that lethality of the bip1 bip2 double mutation is rescued by BiP3 expression. A decrease in the copy number of the ubiquitously expressed BiP genes correlates well with a decrease in pollen tube growth, which leads to reduced fitness of mutant pollen during fertilization. Because an increased protein secretion activity is expected to increase the protein folding demand in the ER, the multiple BiP genes probably cooperate with each other to ensure ER homeostasis in cells with active secretion such as rapidly growing pollen tubes.

  18. EGN: a wizard for construction of gene and genome similarity networks.

    Science.gov (United States)

    Halary, Sébastien; McInerney, James O; Lopez, Philippe; Bapteste, Eric

    2013-07-11

    Increasingly, similarity networks are being used for evolutionary analyses of molecular datasets. These networks are very useful, in particular for the analysis of gene sharing, lateral gene transfer and for the detection of distant homologs. Currently, such analyses require some computer programming skills due to the limited availability of user-friendly freely distributed software. Consequently, although appealing, the construction and analyses of these networks remain less familiar to biologists than do phylogenetic approaches. In order to ease the use of similarity networks in the community of evolutionary biologists, we introduce a software program, EGN, that runs under Linux or MacOSX. EGN automates the reconstruction of gene and genome networks from nucleic and proteic sequences. EGN also implements statistics describing genetic diversity in these samples, for various user-defined thresholds of similarities. In the interest of studying the complexity of evolutionary processes affecting microbial evolution, we applied EGN to a dataset of 571,044 proteic sequences from the three domains of life and from mobile elements. We observed that, in Borrelia, plasmids play a different role than in most other eubacteria. Rather than being genetic couriers involved in lateral gene transfer, Borrelia's plasmids and their genes act as private genetic goods, that contribute to the creation of genetic diversity within their parasitic hosts. EGN can be used for constructing, analyzing, and mining molecular datasets in evolutionary studies. The program can help increase our knowledge of the processes through which genes from distinct sources and/or from multiple genomes co-evolve in lineages of cellular organisms.
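
    The thresholded network construction described above can be sketched in a few lines. This is not EGN's code or interface: the input format (pairwise hits with a percent identity), the function names and the thresholds are assumptions used only to show how a user-defined similarity threshold changes the resulting network.

```python
def build_network(hits, min_identity):
    """Build an undirected similarity network.

    hits: iterable of (seq_a, seq_b, percent_identity) tuples, e.g. parsed
    from an all-against-all sequence comparison (hypothetical input format).
    Every sequence becomes a node; an edge is kept only if the identity
    reaches min_identity.
    """
    graph = {}
    for a, b, identity in hits:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b and identity >= min_identity:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Return the connected components (candidate gene families) as sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components
```

    Lowering the threshold merges components and exposes more distant homologs, while raising it splits the network into tighter families.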

  19. The Odorant Binding Protein Gene Family from the Genome of Silkworm, Bombyx mori

    Directory of Open Access Journals (Sweden)

    Zhao Ping

    2009-07-01

    Background: Chemosensory systems play key roles in the survival and reproductive success of insects. Insect chemoreception is mediated by two large and diverse gene superfamilies, chemoreceptors and odorant binding proteins (OBPs). OBPs are believed to transport hydrophobic odorants from the environment to the olfactory receptors. Results: We identified a family of OBP-like genes in the silkworm genome and characterized their expression using oligonucleotide microarrays. A total of forty-four OBP genes were annotated, a number comparable to the 57 OBPs known from Anopheles gambiae and 51 from Drosophila melanogaster. As seen in other fully sequenced insect genomes, most silkworm OBP genes are present in large clusters. We defined six subfamilies of OBPs, each of which shows lineage-specific expansion and diversification. EST data and OBP expression profiles from multiple larval tissues of day-three fifth instars demonstrated that many OBPs are expressed in chemosensory-specific tissues, although some OBPs are expressed ubiquitously and others exclusively in non-chemosensory tissues. Some atypical OBPs are expressed throughout development. These results reveal that, although many OBPs are chemosensory-specific, others may have more general physiological roles. Conclusion: Silkworms possess a number of OBP genes similar to that of other insects. Their expression profiles suggest that many OBPs may be involved in olfaction and gustation as well as acting as general carriers of hydrophobic molecules. The expansion of OBP gene subfamilies and sequence divergence indicate that the silkworm OBP family acquired functional diversity concurrently with functional constraints. Further investigation of the OBPs of the silkworm could give insights into the roles of OBPs in chemoreception.

  20. Gene loss and horizontal gene transfer contributed to the genome evolution of the extreme acidophile Ferrovum

    Directory of Open Access Journals (Sweden)

    Sophie Roxana Ullrich

    2016-05-01

    Acid mine drainage (AMD), associated with active and abandoned mining sites, is a habitat for acidophilic microorganisms that gain energy from the oxidation of reduced sulfur compounds and ferrous iron and that thrive at pH below 4. Members of the recently proposed genus Ferrovum are the first acidophilic iron oxidizers to be described within the Betaproteobacteria. Although they have been detected as typical community members in AMD habitats worldwide, knowledge of their phylogenetic and metabolic diversity is scarce. Genomics approaches appear to be most promising in addressing this lacuna since isolation and cultivation of Ferrovum has proven to be extremely difficult and has so far only been successful for the designated type strain Ferrovum myxofaciens P3G. In this study, the genomes of two novel strains of Ferrovum (PN-J185 and Z-31) derived from water samples of a mine water treatment plant were sequenced. These genomes were compared with those of Ferrovum sp. JA12, which also originated from the mine water treatment plant, and of the type strain (P3G). Phylogenomic scrutiny suggests that the four strains represent three Ferrovum species that cluster in two groups (1 and 2). Comprehensive analysis of their predicted metabolic pathways revealed that these groups harbor characteristic metabolic profiles, notably with respect to motility, chemotaxis, nitrogen metabolism, biofilm formation and their potential strategies to cope with the acidic environment. For example, while the F. myxofaciens strains (group 1) appear to be motile and diazotrophic, the non-motile group 2 strains have the predicted potential to use a greater variety of fixed nitrogen sources. Furthermore, analysis of their genome synteny provides first insights into their genome evolution, suggesting that horizontal gene transfer and genome reduction in the group 2 strains by loss of genes encoding complete metabolic pathways or physiological features contributed to the observed

  1. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
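
    The two quantities compared in the evaluation, allelic R2 and minor allele frequency, can be illustrated for a single SNP as follows. This is a simplified sketch, not the eMERGE pipeline or the estimator that BEAGLE or IMPUTE2 report: it assumes the true genotypes are known (for example, genotypes masked before imputation) and takes allelic R2 as the squared Pearson correlation between true allele counts and imputed dosages. Both function names are hypothetical.

```python
import numpy as np

def allelic_r2(true_genotypes, imputed_dosages):
    """Squared correlation between true allele counts (0/1/2) and dosages.

    Returns NaN when either vector has no variance (monomorphic SNP).
    """
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    if t.std() == 0 or d.std() == 0:
        return float("nan")
    r = np.corrcoef(t, d)[0, 1]
    return float(r ** 2)

def minor_allele_frequency(genotypes):
    """Minor allele frequency from diploid allele counts (0/1/2)."""
    freq = float(np.mean(genotypes)) / 2.0
    return min(freq, 1.0 - freq)
```

    Computing both values per SNP and plotting one against the other is a common way to examine how imputation quality changes for rare variants.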

  2. Genome duplication and multiple evolutionary origins of complex migratory behavior in Salmonidae.

    Science.gov (United States)

    Alexandrou, Markos A; Swartz, Brian A; Matzke, Nicholas J; Oakley, Todd H

    2013-12-01

    Multiple rounds of whole genome duplication have repeatedly marked the evolution of vertebrates, and correlate strongly with morphological innovation. However, less is known about the behavioral, physiological and ecological consequences of genome duplication, and whether these events coincide with major transitions in vertebrate complexity. The complex behavior of anadromy - where adult fishes migrate up rivers from the sea to their natal site to spawn - is well known in salmonid fishes. Some hypotheses suggest that migratory behavior evolved as a consequence of an ancestral genome duplication event, which permitted salinity tolerance and osmoregulatory plasticity. Here we test whether anadromy evolved multiple times within salmonids, and whether genome duplication coincided with the evolution of anadromy. We present a method that uses ancestral character simulation data to plot the frequency of character transitions over a time calibrated phylogenetic tree to provide estimates of the absolute timing of character state transitions. Furthermore, we incorporate extinct and extant taxa to improve on previous estimates of divergence times. We present the first phylogenetic evidence indicating that anadromy evolved at least twice from freshwater salmonid ancestors. Results suggest that genome duplication did not coincide in time with changes in migratory behavior, but preceded a transition to anadromy by 55-50 million years. Our study represents the first attempt to estimate the absolute timing of a complex behavioral trait in relation to a genome duplication event.

  3. RiceGeneThresher: a web-based application for mining genes underlying QTL in rice genome.

    Science.gov (United States)

    Thongjuea, Supat; Ruanjaichon, Vinitchan; Bruskiewich, Richard; Vanavichit, Apichart

    2009-01-01

    RiceGeneThresher is a public online resource for mining genes underlying genome regions of interest or quantitative trait loci (QTL) in the rice genome. It is a compendium of rice genomic resources consisting of genetic markers, genome annotation, expressed sequence tags (ESTs), protein domains, gene ontology, plant stress-responsive genes, metabolic pathways and predictions of protein-protein interactions. The RiceGeneThresher system integrates these diverse data sources and provides powerful web-based applications and flexible tools for delivering customized sets of biological data on rice. The system supports whole-genome gene mining for QTL by querying with DNA marker intervals or genomic loci. RiceGeneThresher provides biologically supported evidence that is essential for targeting groups or networks of genes involved in controlling traits underlying QTL. Users can use it to discover and assign the most promising candidate genes in preparation for further gene function validation analysis. The web-based application is freely available at http://rice.kps.ku.ac.th.

  4. Genomic Characterization of Phenylalanine Ammonia Lyase Gene in Buckwheat.

    Directory of Open Access Journals (Sweden)

    Karthikeyan Thiyagarajan

    The phenylalanine ammonia lyase (PAL) gene, which plays a key role in the biosynthesis of the medicinally important compounds rutin/quercetin, was sequence-characterized for efficient genomics application. These compounds possess anti-diabetic and anti-cancer properties and are predominantly produced by Fagopyrum spp. In the present study, the PAL gene was sequenced from three Fagopyrum spp. (F. tataricum, F. esculentum and F. dibotrys) and showed the presence of three SNPs and four insertions/deletions at the intra- and interspecific level. Among them, the potential SNP (position 949 bp, G>C) with a parsimony-informative site was selected and successfully used to determine the zygosity/allelic variation of 16 F. tataricum varieties. Insertion mutations were identified in the coding region, which changed a stretch of 39 amino acids in the putative protein. Our study revealed that the autogamous species (F. tataricum) has a lower frequency of observed SNPs than the allogamous species (F. dibotrys and F. esculentum). The identified SNPs in F. tataricum did not result in amino acid changes, while in the other two species they caused both conservative and non-conservative variations. The consistent pattern of SNPs across the species revealed their phylogenetic importance. We found two groups of F. tataricum, one of which was closely related to F. dibotrys. The sequence characterization of the PAL gene reported in the present investigation can be used in the genetic improvement of buckwheat with reference to its medicinal value.

  5. Multiple Evolutionary Selections Involved in Synonymous Codon Usages in the Streptococcus agalactiae Genome.

    Science.gov (United States)

    Ma, Yan-Ping; Ke, Hao; Liang, Zhi-Ling; Liu, Zhen-Xing; Hao, Le; Ma, Jiang-Yao; Li, Yu-Gu

    2016-02-24

    Streptococcus agalactiae is an important human and animal pathogen. To better understand the genetic features and evolution of S. agalactiae, multiple factors influencing synonymous codon usage patterns in S. agalactiae were analyzed in this study. The overall codon usage analysis showed that A- and U-ending codons were preferred in S. agalactiae functional genes, indicating that adenine (A)/thymine (T) compositional constraints might play an important role in the synonymous codon usage pattern. The plot of GC3% against the effective number of codons (ENC) suggested that translational selection was the important factor for codon bias in the microorganism. Principal component analysis (PCA) showed that (i) mutational pressure was the most important factor in shaping codon usage of all open reading frames (ORFs) in the S. agalactiae genome; (ii) strand-specific mutational bias was not capable of influencing the codon usage bias in the leading and lagging strands; and (iii) gene length was not an important factor in the synonymous codon usage pattern in this organism. Additionally, the high correlation of the tRNA adaptation index (tAI) value with the codon adaptation index (CAI) and frequency of optimal codons (Fop) values reinforced the role of natural selection for efficient translation in S. agalactiae. Comparison of synonymous codon usage patterns between S. agalactiae and susceptible hosts (human and tilapia) showed that synonymous codon usage of S. agalactiae was independent of the synonymous codon usage of susceptible hosts. The study of codon usage in S. agalactiae may provide evidence about the molecular evolution of the bacterium and a greater understanding of evolutionary relationships between S. agalactiae and its hosts.
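
    GC3%, one of the statistics used above, is the percentage of codons whose third position is G or C. The following minimal sketch computes it for a single coding sequence, together with a tally of third-position bases; the function names are hypothetical, and ENC, CAI, Fop and tAI are deliberately left out because they need reference tables that the abstract does not give.

```python
from collections import Counter

def third_position_bases(cds):
    """Return the bases at codon position 3, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return cds[2:usable:3]

def gc3_percent(cds):
    """Percentage of codons ending in G or C (0.0 for an empty sequence)."""
    third = third_position_bases(cds)
    if not third:
        return 0.0
    return 100.0 * sum(base in "GC" for base in third) / len(third)

def ending_base_counts(cds):
    """Count how many codons end in each base."""
    return Counter(third_position_bases(cds))
```

    A genome rich in A- and U-ending codons would show low GC3% values across most genes.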

  6. The Mitochondrial Genome of Raphanus sativus and Gene Evolution of Cruciferous Mitochondrial Types

    Institute of Scientific and Technical Information of China (English)

    Shengxin Chang; Jianmei Chen; Yankun Wang; Bingchao Gu; Jianbo He; Pu Chu; Rongzhan Guan

    2013-01-01

    To explore the mitochondrial genes of the Cruciferae family, the mitochondrial genome of Raphanus sativus (sat) was sequenced and annotated. The circular mitochondrial genome of sat is 239,723 bp and includes 33 protein-coding genes, three rRNA genes and 17 tRNA genes. The mitochondrial genome also contains a pair of large repeat sequences 5.9 kb in length, which may mediate genome reorganization into two sub-genomic circles, with predicted sizes of 124.8 kb and 115.0 kb, respectively. Furthermore, gene evolution of mitochondrial genomes within the Cruciferae family was analyzed using sat mitochondrial type (mitotype), together with six other reported mitotypes. The cruciferous mitochondrial genomes have maintained almost the same set of functional genes. Compared with Cycas taitungensis (a representative gymnosperm), the mitochondrial genomes of the Cruciferae have lost nine protein-coding genes and seven mitochondrial-like tRNA genes, but acquired six chloroplast-like tRNAs. Among the Cruciferae, to maintain the same set of genes that are necessary for mitochondrial function, the exons of the genes have changed at the lowest rates, as indicated by the numbers of single nucleotide polymorphisms. The open reading frames (ORFs) of unknown function in the cruciferous genomes are not conserved. Evolutionary events, such as mutations, genome reorganizations and sequence insertions or deletions (indels), have resulted in the nonconserved ORFs in the cruciferous mitochondrial genomes, which is becoming significantly different among mitotypes. This work represents the first phylogenic explanation of the evolution of genes of known function in the Cruciferae family. It revealed significant variation in ORFs and the causes of such variation.

  7. The mitochondrial genome of Raphanus sativus and gene evolution of cruciferous mitochondrial types.

    Science.gov (United States)

    Chang, Shengxin; Chen, Jianmei; Wang, Yankun; Gu, Bingchao; He, Jianbo; Chu, Pu; Guan, Rongzhan

    2013-03-20

    To explore the mitochondrial genes of the Cruciferae family, the mitochondrial genome of Raphanus sativus (sat) was sequenced and annotated. The circular mitochondrial genome of sat is 239,723 bp and includes 33 protein-coding genes, three rRNA genes and 17 tRNA genes. The mitochondrial genome also contains a pair of large repeat sequences 5.9 kb in length, which may mediate genome reorganization into two sub-genomic circles, with predicted sizes of 124.8 kb and 115.0 kb, respectively. Furthermore, gene evolution of mitochondrial genomes within the Cruciferae family was analyzed using sat mitochondrial type (mitotype), together with six other reported mitotypes. The cruciferous mitochondrial genomes have maintained almost the same set of functional genes. Compared with Cycas taitungensis (a representative gymnosperm), the mitochondrial genomes of the Cruciferae have lost nine protein-coding genes and seven mitochondrial-like tRNA genes, but acquired six chloroplast-like tRNAs. Among the Cruciferae, to maintain the same set of genes that are necessary for mitochondrial function, the exons of the genes have changed at the lowest rates, as indicated by the numbers of single nucleotide polymorphisms. The open reading frames (ORFs) of unknown function in the cruciferous genomes are not conserved. Evolutionary events, such as mutations, genome reorganizations and sequence insertions or deletions (indels), have resulted in the non-conserved ORFs in the cruciferous mitochondrial genomes, which is becoming significantly different among mitotypes. This work represents the first phylogenic explanation of the evolution of genes of known function in the Cruciferae family. It revealed significant variation in ORFs and the causes of such variation.

  8. Discriminative accuracy of genomic profiling comparing multiplicative and additive risk models.

    Science.gov (United States)

    Moonesinghe, Ramal; Khoury, Muin J; Liu, Tiebin; Janssens, A Cecile J W

    2011-02-01

    Genetic prediction of common diseases is based on testing multiple genetic variants with weak effect sizes. Standard logistic regression and Cox Proportional Hazard models that assess the combined effect of multiple variants on disease risk assume multiplicative joint effects of the variants, but this assumption may not be correct. The risk model chosen may affect the predictive accuracy of genomic profiling. We investigated the discriminative accuracy of genomic profiling by comparing additive and multiplicative risk models. We examined genomic profiles of 40 variants with genotype frequencies varying from 0.1 to 0.4 and relative risks varying from 1.1 to 1.5 in separate scenarios assuming a disease risk of 10%. The discriminative accuracy was evaluated by the area under the receiver operating characteristic curve. Predicted risks were more extreme at the lower and higher risks for the multiplicative risk model compared with the additive model. The discriminative accuracy was consistently higher for multiplicative risk models than for additive risk models. The differences in discriminative accuracy were negligible when the effect sizes were small, but not when risk genotypes were common or when they had stronger effects. Unraveling the exact mode of biological interaction is important when effect sizes of genetic variants are moderate at the least, to prevent the incorrect estimation of risks.
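
    The contrast between the two risk models can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the authors' simulation: each variant is reduced to a carrier/non-carrier flag, risks are not capped at 1, and the function names are hypothetical. The area under the ROC curve is computed as the fraction of case/control pairs in which the case has the higher predicted risk.

```python
from math import prod


def multiplicative_risk(baseline, relative_risks, carrier):
    """Baseline risk times the product of the relative risks of carried variants."""
    return baseline * prod(rr for rr, c in zip(relative_risks, carrier) if c)


def additive_risk(baseline, relative_risks, carrier):
    """Baseline risk scaled by one plus the summed excess relative risks."""
    return baseline * (1 + sum(rr - 1 for rr, c in zip(relative_risks, carrier) if c))


def auc(scores, labels):
    """Probability that a random case outranks a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

    For a carrier of two variants with relative risk 1.5 each and a 10% baseline, the multiplicative model gives about 0.225 and the additive model about 0.2, which shows why predicted risks become more extreme under the multiplicative model as risk genotypes accumulate.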

  9. Modeling the cumulative genetic risk for multiple sclerosis from genome-wide association data.

    Science.gov (United States)

    Wang, Joanne H; Pappas, Derek; De Jager, Philip L; Pelletier, Daniel; de Bakker, Paul Iw; Kappos, Ludwig; Polman, Chris H; Chibnik, Lori B; Hafler, David A; Matthews, Paul M; Hauser, Stephen L; Baranzini, Sergio E; Oksenberg, Jorge R

    2011-01-18

    Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, the knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed. We used a discovery GWAS dataset (8,844 samples, 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to calculate and estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was implemented to compare clinical characteristics of individuals with various degrees of genetic risk. Gene ontology and pathway enrichment analysis was done using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool. In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, together with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration. In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with classification sensitivity 62.3% and specificity 75.9%. No differences in disease progression or T2-lesion volumes were observed among four levels of predicted genetic risk groups (high, medium, low, misclassified). On the other hand, a significant
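
    Sensitivity and specificity figures of the kind reported above follow from thresholding each individual's predicted risk. The sketch below shows that calculation only; the 0.5 cut-off, the function name and the sample values are assumptions, since the abstract does not state how individuals were classified.

```python
def sensitivity_specificity(predicted_risk, is_case, threshold=0.5):
    """Return (sensitivity, specificity) for risks classified at a cut-off."""
    tp = fn = tn = fp = 0
    for risk, case in zip(predicted_risk, is_case):
        called_case = risk >= threshold  # assumed decision rule
        if case:
            if called_case:
                tp += 1
            else:
                fn += 1
        else:
            if called_case:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted risks for three cases followed by three controls.
risks = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1]
status = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(risks, status)
```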

  10. An automated annotation tool for genomic DNA sequences using GeneScan and BLAST

    Indian Academy of Sciences (India)

    Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya

    2001-04-01

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.

  11. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

    NARCIS (Netherlands)

    S.L. Macrae (Sheila L.); Q. Zhang (Quanwei); C. Lemetre (Christophe); I. Seim (Inge); R.B. Calder (Robert B.); J.H.J. Hoeijmakers (Jan); Y. Suh (Yousin); V.N. Gladyshev (Vadim N.); A. Seluanov (Andrei); V. Gorbunova (Vera); J. Vijg (Jan); Z.D. Zhang (Zhengdong D.)

    2015-01-01

    Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM g

  12. Multiple aspects of gene dysregulation in Huntington’s Disease.

    Directory of Open Access Journals (Sweden)

    Lara Moumne

    2013-10-01

    Huntington’s Disease (HD) is a genetic neurodegenerative disease caused by a CAG expansion in the gene encoding Huntingtin (Htt). It is characterized by chorea, cognitive and psychiatric disorders. The most affected brain region is the striatum, and the clinical symptoms are directly correlated to the rate of striatal degeneration. The wild-type Htt is a ubiquitous protein and its deletion is lethal. Mutated (expanded) Htt produces excitotoxicity, mitochondrial dysfunctions, axonal transport deficit, altered proteasome activity, and gene dysregulation. Transcriptional dysregulation occurs at early neuropathological stages in HD patients. Multiple genes are dysregulated, with overlaps of altered transcripts between mouse models of HD and patient brains. Nuclear localization of Exp-Htt interferes with transcription factors, co-activators and proteins of the transcriptional machinery. Another key mechanism described so far is an alteration of cytoplasmic retention of the transcriptional repressor REST, which is normally associated with wild-type Htt. As such, Exp-Htt causes alteration of transcription of multiple genes involved in neuronal survival, plasticity, signaling and mitochondrial biogenesis and respiration. Besides these transcriptional dysregulations, Exp-Htt affects the chromatin structure through altered post-translational modifications (PTM) of histones and methylation of DNA. Multiple alterations of histone PTM are described, including acetylation, methylation, ubiquitylation, polyamination and phosphorylation. Exp-Htt also affects the expression and regulation of non-coding microRNAs. First, multiple neural microRNAs are controlled by REST, and dysregulated in HD, with concomitant de-repression of downstream mRNA targets. Second, Exp-Htt protein or RNA may also play a major role in the processing of miRNAs and hence pathogenesis. These pleiotropic effects of Exp-Htt on gene expression may represent seminal deleterious effects on the

  13. Nitrile Hydratase Genes Are Present in Multiple Eukaryotic Supergroups

    Science.gov (United States)

    Marron, Alan O.; Akam, Michael; Walker, Giselle

    2012-01-01

    Background: Nitrile hydratases are enzymes involved in the conversion of nitrile-containing compounds into ammonia and organic acids. Although they are widespread in prokaryotes, nitrile hydratases have only been reported in two eukaryotes: the choanoflagellate Monosiga brevicollis and the stramenopile Aureococcus anophagefferens. The nitrile hydratase gene in M. brevicollis was believed to have arisen by lateral gene transfer from a prokaryote, and is a fusion of beta and alpha nitrile hydratase subunits. Only the alpha subunit has been reported in A. anophagefferens. Methodology/Principal Findings: Here we report the detection of nitrile hydratase genes in five eukaryotic supergroups: opisthokonts, amoebozoa, archaeplastids, CCTH and SAR. Beta-alpha subunit fusion genes are found in the choanoflagellates, ichthyosporeans, apusozoans, haptophytes, rhizarians and stramenopiles, and potentially also in the amoebozoans. An individual alpha subunit is found in a dinoflagellate and an individual beta subunit is found in a haptophyte. Phylogenetic analyses recover a clade of eukaryotic-type nitrile hydratases in the Opisthokonta, Amoebozoa, SAR and CCTH; this is supported by analyses of introns and gene architecture. Two nitrile hydratase sequences from an animal and a plant resolve in the prokaryotic nitrile hydratase clade. Conclusions/Significance: The evidence presented here demonstrates that nitrile hydratase genes are present in multiple eukaryotic supergroups, suggesting that a subunit fusion gene was present in the last common ancestor of all eukaryotes. The absence of nitrile hydratase from several sequenced species indicates that subunits were lost in multiple eukaryotic taxa. The presence of nitrile hydratases in many other eukaryotic groups is unresolved due to insufficient data and taxon sampling. The retention and expression of the gene in distantly related eukaryotic species suggests that it plays an important metabolic role. The novel family of eukaryotic

  14. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy

    NARCIS (Netherlands)

    Bouwman, A.C.; Veerkamp, R.F.

    2014-01-01

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken

  15. PseudoGeneQuest – Service for identification of different pseudogene types in the human genome

    OpenAIRE

    Vihinen Mauno; Ortutay Csaba

    2008-01-01

    Background: Pseudogenes, nonfunctional copies of genes, evolve fast due to the lack of evolutionary pressures and thus appear in several different forms. PseudoGeneQuest is an online tool to search the human genome for a given query sequence and to identify different types of pseudogenes as well as novel genes and gene fragments. Description: The service can detect pseudogenes that have arisen either by retrotransposition or segmental genome duplication, many of which are not listed in t...

  16. Genomic-wide analysis of lymphatic metastasis-associated genes in human hepatocellular carcinoma

    Institute of Scientific and Technical Information of China (English)

    Chun-Feng Lee; Zhi-Qiang Ling; Ting Zhao; Shih-Hua Fang; Weng-Cheng Chang; San-Chih Lee; Kuan-Rong Lee

    2009-01-01

    AIM: To identify the genes related to lymph node metastasis in human hepatocellular carcinoma (HCC), 32 HCC patients with or without lymph node metastasis were investigated by high-throughput microarray comprising 886 genes. METHODS: Paired samples of cancerous and non-cancerous tissue were taken from 32 patients with HCC who underwent hepatectomy with lymph node dissection. Total RNA was extracted from the cells obtained by means of laser microdissection (LCM) and was amplified by the T7-based amplification system. Then, the amplified samples were applied to the cDNA microarray comprising 886 genes. RESULTS: The results demonstrated that 25 upregulated genes, such as cell membrane receptor, intracellular signaling and cell adhesion related genes, and 48 down-regulated genes, such as intracellular signaling and cell cycle regulator-related genes, were correlated with lymph node metastasis in HCC. Amongst them were included some interesting genes, such as MET, EPHA2, CCND1, MMP2, MMP13, CASP3, CDH1, and PTPN2. Expression of 16 genes (MET, CCND1, CCND2, VEGF, KRT18, RFC4, BIRC5, CDC6, MMP2, BCL2A1, CDH1, VIM, PDGFRA, PTPN2, SLC25A5 and DSP) was further confirmed by real-time quantitative reverse transcriptional polymerase chain reaction (RT-PCR). CONCLUSION: Tumor metastasis is an important biological characteristic, which involves multiple genetic changes and cumulation. This genome-wide information contributes to an improved understanding of molecular alterations during lymph node metastasis in HCC. It may help clinicians to predict metastasis of lymph nodes and assist researchers in identifying novel therapeutic targets for metastatic HCC patients.

  17. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions.

    Directory of Open Access Journals (Sweden)

    Soumya Raychaudhuri

    2009-06-01

    Translating a set of disease regions into insight about pathogenic mechanisms requires not only the ability to identify the key disease genes within them, but also the biological relationships among those key genes. Here we describe a statistical method, Gene Relationships Among Implicated Loci (GRAIL), that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts. We first evaluated GRAIL by assessing its ability to identify subsets of highly related genes in common pathways from validated lipid and height SNP associations from recent genome-wide studies. We then tested GRAIL by assessing its ability to separate true disease regions from many false positive disease regions in two separate practical applications in human genetics. First, we took 74 nominally associated Crohn's disease SNPs and applied GRAIL to identify a subset of 13 SNPs with highly related genes. Of these, ten convincingly validated in follow-up genotyping; genotyping results for the remaining three were inconclusive. Next, we applied GRAIL to 165 rare deletion events seen in schizophrenia cases (less than one-third of which are contributing to disease risk). We demonstrate that GRAIL is able to identify a subset of 16 deletions containing highly related genes; many of these genes are expressed in the central nervous system and play a role in neuronal synapses. GRAIL offers a statistically robust approach to identifying functionally related genes from across multiple disease regions, which likely represent key disease pathways. An online version of this method is available for public use (http://www.broad.mit.edu/mpg/grail/).

  18. A rare case of plastid protein-coding gene duplication in the chloroplast genome of Euglena archaeoplastidiata (Euglenophyta).

    Science.gov (United States)

    Bennett, Matthew S; Shiu, Shin-Han; Triemer, Richard E

    2017-03-12

    Gene duplication is an important evolutionary process that allows duplicate functions to diverge, or, in some cases, allows for new functional gains. However, in contrast to the nuclear genome, gene duplications within the chloroplast are extremely rare. Here, we present the chloroplast genome of the photosynthetic protist Euglena archaeoplastidiata. Upon annotation, it was found that the chloroplast genome contained a novel tandem direct duplication that encoded a portion of RuBisCO large subunit (rbcL) followed by a complete copy of ribosomal protein L32 (rpl32), as well as the associated intergenic sequences. Analyses of the duplicated rpl32 were inconclusive regarding selective pressures, although it was found that substitutions in the duplicated region, all non-synonymous, likely had a neutral functional effect. The duplicated region did not exhibit patterns consistent with previously described mechanisms for tandem direct duplications, and demonstrated an unknown mechanism of duplication. In addition, a comparison of this chloroplast genome to other previously characterized chloroplast genomes from the same family revealed characteristics that indicated E. archaeoplastidiata was probably more closely related to taxa in the genera Monomorphina, Cryptoglena, and Euglenaria than it was to other Euglena taxa. Taken together, the chloroplast genome of E. archaeoplastidiata demonstrated multiple characteristics unique to the euglenoid world, and has justified the longstanding curiosity regarding this enigmatic taxon.

  19. Quantitative Seq-LGS: Genome-Wide Identification of Genetic Drivers of Multiple Phenotypes in Malaria Parasites

    KAUST Repository

    Abkallo, Hussein M.

    2016-10-01

    Identifying the genetic determinants of phenotypes that impact on disease severity is of fundamental importance for the design of new interventions against malaria. Traditionally, such discovery has relied on labor-intensive approaches that require significant investments of time and resources. By combining Linkage Group Selection (LGS), quantitative whole genome population sequencing and a novel mathematical modeling approach (qSeq-LGS), we simultaneously identified multiple genes underlying two distinct phenotypes, identifying novel alleles for growth rate and strain specific immunity (SSI), while removing the need for traditionally required steps such as cloning, individual progeny phenotyping and marker generation. The detection of novel variants, verified by experimental phenotyping methods, demonstrates the remarkable potential of this approach for the identification of genes controlling selectable phenotypes in malaria and other apicomplexan parasites for which experimental genetic crosses are amenable.

  20. Genomic imprinting and maternal effect genes in haplodiploid sex determination.

    Science.gov (United States)

    van de Zande, L; Verhulst, E C

    2014-01-01

    The research into the Drosophila melanogaster sex-determining system has been at the basis of all further research on insect sex determination. This further research has made it clear that, for most insect species, the presence of sufficient functional Transformer (TRA) protein in the early embryonic stage is essential for female sexual development. In Hymenoptera, functional analysis of sex determination by knockdown studies of sex-determining genes has only been performed for 2 species. The first is the social insect species Apis mellifera, the honeybee, which has single-locus complementary sex determination (CSD). The other species is the parasitoid Nasonia vitripennis, the jewel wasp. Nasonia has a non-CSD sex-determining system, described as the maternal effect genomic imprinting sex determination system (MEGISD). Here, we describe the arguments that eventually led to the formulation of MEGISD and the experimental data that supported and refined this model. We evaluate the possibility that DNA methylation lies at the basis of MEGISD and briefly address the role of genomic imprinting in non-CSD sex determination in other Hymenoptera.

  1. Comparative genomics of free-living Gammaproteobacteria: pathogenesis-related genes or interaction-related genes?

    Science.gov (United States)

    Vázquez-Rosas-Landa, Mirna; Ponce-Soto, Gabriel Yaxal; Eguiarte, Luis E; Souza, V

    2017-07-31

    Bacteria have numerous strategies to interact with themselves and with their environment, but genes associated with these interactions are usually cataloged as pathogenic. To understand the role that these genes have not only in pathogenesis but also in bacterial interactions, we compared the genomes of eight bacteria from human-impacted environments with those of free-living bacteria from the Cuatro Ciénegas Basin (CCB), a relatively pristine oligotrophic site. Fifty-one genomes from CCB bacteria, including Pseudomonas, Vibrio, Photobacterium and Aeromonas, were analyzed. We found that the CCB strains had several virulence-related genes, 15 of which were common to all strains and were related to flagella and chemotaxis. We also identified the presence of Type III and VI secretion systems, which leads us to propose that these systems play an important role in interactions among bacterial communities beyond pathogenesis. None of the CCB strains had pathogenicity islands, despite having genes associated with antibiotics. Integrons were rare, while CRISPR elements were common. The idea that pathogenicity-related genes in many cases form part of a wider strategy used by bacteria to interact with other organisms could help us to understand the role of pathogenicity-related elements in an ecological and evolutionary framework leading toward a more inclusive One Health concept. © FEMS 2017.

  2. Genomic definition of multiple ex vivo regulatory T cell subphenotypes.

    Science.gov (United States)

    Feuerer, Markus; Hill, Jonathan A; Kretschmer, Karsten; von Boehmer, Harald; Mathis, Diane; Benoist, Christophe

    2010-03-30

    Regulatory T (Treg) cells that express the Foxp3 transcription factor are essential for lymphoid homeostasis and immune tolerance to self. Other nonimmunological functions of Treg cells, such as controlling metabolic function in adipose tissue, are also emerging. Treg cells originate primarily in the thymus, but can also be elicited from conventional T cells by in vivo exposure to low-dose antigen or homeostatic expansion or by activation in the presence of TGFbeta in vitro. Treg cells are characterized by a distinct transcriptional signature controlled in part, but not solely, by Foxp3. For a better perspective on transcriptional control in Treg cells, we compared gene expression profiles of a broad panel of Treg cells from various origins or anatomical locations. Treg cells generated by different means form different subphenotypes and were identifiable by particular combinations of transcripts, none of which fully encompassed the entire Treg signature. Molecules involved in Treg cell effector function, chemokine receptors, and the transcription factors that control them were differentially represented in these subphenotypes. Treg cells from the gut proved dissimilar to cells elicited by exposure to TGFbeta in vitro, but instead they resembled a CD103(+)Klrg1(+) subphenotype preferentially generated in response to lymphopenia.

  3. Detection of phytochrome-like genes from Rhazya stricta (Apocynaceae) using de novo genome assembly.

    Science.gov (United States)

    Sabir, Jamal S M; Baeshen, Nabih A; Shokry, Ahmed M; Gadalla, Nour O; Edris, Sherif; Mutwakil, Mohammed H; Ramadan, Ahmed M; Atef, Ahmed; Al-Kordy, Magdy A; Abuzinadah, Osama A; El-Domyati, Fotouh M; Jansen, Robert K; Bahieldin, Ahmed

    2013-01-01

    Phytochrome-like genes in the wild plant species Rhazya stricta Decne were characterized using a de novo genome assembly of next generation sequence data. Rhazya stricta contains more than 100 alkaloids with multiple pharmacological properties, and leaf extracts have been used to cure chronic rheumatism, to treat tumors, and in the treatment of several other diseases. Phytochromes are known to be involved in the light-regulated biosynthesis of some alkaloids. Phytochromes are soluble chromoproteins that function in the absorption of red and far-red light and the transduction of intracellular signals during light-regulated plant development. De novo assembly of the nuclear genome of R. stricta recovered 45,641 contigs greater than 1000 bp long, which were used in constructing a local database. Five sequences belonging to the Arabidopsis thaliana phytochrome gene family (i.e., AtphyABCDE) were used to identify R. stricta contigs with phytochrome-like sequences using BLAST. This led to the identification of three contigs with phytochrome-like sequences covering AtphyA-, AtphyC- and AtphyE-like full-length genes. Annotation of the three sequences showed that each contig consists of one phytochrome-like gene with three exons and two introns. BLASTn and BLASTp results indicated that RsphyA mRNA and protein sequences had homologues in Wrightia coccinea and Solanum tuberosum, respectively. RsphyC-like mRNA and protein sequences were homologous to Vitis vinifera and Vitis riparia. RsphyE-like mRNA coding and protein sequences were homologous to Ipomoea nil. Multiple-sequence alignment of phytochrome proteins indicated a homology with 30 sequences from 23 different species of flowering plants. Phylogenetic analysis confirmed that each R. stricta phytochrome gene is related to the same phytochrome gene of other flowering plants. It is proposed that the absence of the phyB gene in R. stricta is due to the RsphyA gene taking over the role of phyB.

  4. PrimerDesign-M: a multiple-alignment based multiple-primer design tool for walking across variable genomes.

    Science.gov (United States)

    Yoon, Hyejin; Leitner, Thomas

    2015-05-01

    Analyses of entire viral genomes or mtDNA require comprehensive design of many primers across their genomes. Furthermore, simultaneous optimization of several DNA primer design criteria may improve overall experimental efficiency and downstream bioinformatic processing. To achieve these goals, we developed PrimerDesign-M. It includes several options for multiple-primer design, allowing researchers to efficiently design walking primers that cover long DNA targets, such as entire HIV-1 genomes, and that are optimized simultaneously for genetic diversity in multiple alignments and for experimental design constraints given by the user. PrimerDesign-M can also design primers that include DNA barcodes and minimize primer dimerization. PrimerDesign-M finds optimal primers for highly variable DNA targets and facilitates design flexibility by suggesting alternative designs to adapt to experimental conditions. PrimerDesign-M is available as a webtool at http://www.hiv.lanl.gov/content/sequence/PRIMER_DESIGN/primer_design.html. Contact: tkl@lanl.gov or seq-info@lanl.gov. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
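
    One ingredient of alignment-informed primer design, placing a primer where the aligned sequences vary least, can be sketched as follows. This is not PrimerDesign-M's algorithm, which the abstract does not describe; the function names are hypothetical, gaps are treated as ordinary characters, and no melting-temperature, dimerization or barcode constraints are modeled.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common character at each column."""
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]


def best_primer_window(alignment, length):
    """Start index of the first window with the highest summed conservation."""
    cons = column_conservation(alignment)
    scores = [sum(cons[i:i + length]) for i in range(len(cons) - length + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy alignment: columns 1 and 2 vary, the rest are invariant.
toy_alignment = ["ACGTACGT", "ACCTACGT", "AGGTACGT"]
start = best_primer_window(toy_alignment, 4)
```

    A walking-primer design would repeat such a choice at intervals along the target while also applying the user's experimental constraints, which this sketch leaves out.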

  5. Late multiple organ surge in interferon-regulated target genes characterizes staphylococcal enterotoxin B lethality.

    Directory of Open Access Journals (Sweden)

    Gabriela A Ferreyra

    BACKGROUND: Bacterial superantigens are virulence factors that cause toxic shock syndrome. Here, the genome-wide, temporal response of mice to lethal intranasal staphylococcal enterotoxin B (SEB) challenge was investigated in six tissues. RESULTS: The earliest responses and largest number of affected genes occurred in peripheral blood mononuclear cells (PBMC), spleen, and lung tissues with the highest content of both T-cells and monocyte/macrophages, the direct cellular targets of SEB. In contrast, the response of liver, kidney, and heart was delayed and involved fewer genes, but revealed a dominant genetic program that was seen in all 6 tissues. Many of the 85 uniquely annotated transcripts participating in this shared genomic response have not been previously linked to SEB. Nine of the 85 genes were subsequently confirmed by RT-PCR in every tissue/organ at 24 h. These 85 transcripts, up-regulated in all tissues, annotated to the interferon (IFN)/antiviral-response and included genes belonging to the DNA/RNA sensing system, DNA damage repair, the immunoproteasome, and the ER/metabolic stress-response and apoptosis pathways. Overall, this shared program was identified as a type I and II interferon (IFN)-response and the promoters of these genes were highly enriched for IFN regulatory matrices. Several genes whose secreted products induce the IFN pathway were up-regulated at early time points in PBMCs, spleen, and/or lung. Furthermore, IFN regulatory factors including Irf1, Irf7 and Irf8, and Zbp1, a DNA sensor/transcription factor that can directly elicit an IFN innate immune response, participated in this host-wide SEB signature. CONCLUSION: Global gene-expression changes across multiple organs implicated a host-wide IFN-response in SEB-induced death. Therapies aimed at IFN-associated innate immunity may improve outcome in toxic shock syndromes.

  6. The complete sequence of the first Spodoptera frugiperda Betabaculovirus genome: a natural multiple recombinant virus.

    Science.gov (United States)

    Cuartas, Paola E; Barrera, Gloria P; Belaich, Mariano N; Barreto, Emiliano; Ghiringhelli, Pablo D; Villamizar, Laura F

    2015-01-20

    Spodoptera frugiperda (Lepidoptera: Noctuidae) is a major pest in maize crops in Colombia, and affects several regions in America. A granulovirus isolated from S. frugiperda (SfGV VG008) has potential as an enhancer of the insecticidal activity of a previously described nucleopolyhedrovirus from the same insect species (SfMNPV). The SfGV VG008 genome was sequenced and analyzed, showing circular double-stranded DNA of 140,913 bp encoding 146 putative ORFs that include 37 Baculoviridae core genes, 88 shared with betabaculoviruses, two shared only with betabaculoviruses from noctuid insects, two shared with alphabaculoviruses, three copies of own genes (paralogs) and the other 14 corresponding to unique genes without representation in the other baculovirus species. In particular, the genome encodes important virulence factors such as 4 chitinases and 2 enhancins. The sequence analysis revealed the existence of eight homologous regions (hrs) and also suggests processes of gene acquisition by horizontal transfer including the SfGV VG008 ORFs 046/047 (paralogs), 059, 089 and 099. The bioinformatics evidence indicates that the genome donors of the mentioned genes could be alpha- and/or betabaculovirus species. The previously reported ability of SfGV VG008 to naturally co-infect the same host with other viruses shows a possible mechanism to capture genes and thus improve its fitness.

  7. The Complete Sequence of the First Spodoptera frugiperda Betabaculovirus Genome: A Natural Multiple Recombinant Virus

    Directory of Open Access Journals (Sweden)

    Paola E. Cuartas

    2015-01