WorldWideScience

Sample records for genes genomic organization

  1. Gene organization inside replication domains in mammalian genomes

    Science.gov (United States)

    Zaghloul, Lamia; Baker, Antoine; Audit, Benjamin; Arneodo, Alain

    2012-11-01

    We investigate the large-scale organization of human genes with respect to "master" replication origins that were previously identified as bordering nucleotide compositional skew domains. We separate genes in two categories depending on their CpG enrichment at the promoter which can be considered as a marker of germline DNA methylation. Using expression data in mouse, we confirm that CpG-rich genes are highly expressed in germline whereas CpG-poor genes are in a silent state. We further show that, whether tissue-specific or broadly expressed (housekeeping genes), the CpG-rich genes are over-represented close to the replication skew domain borders suggesting some coordination of replication and transcription. We also reveal that the transcription of the longest CpG-rich genes is co-oriented with replication fork progression so that the promoter of these transcriptionally active genes be located into the accessible open chromatin environment surrounding the master replication origins that border the replication skew domains. The observation of a similar gene organization in the mouse genome confirms the interplay of replication, transcription and chromatin structure as the cornerstone of mammalian genome architecture.

  2. Genomic sequence and organization of two members of a human lectin gene family

    International Nuclear Information System (INIS)

    Gitt, M.A.; Barondes, S.H.

    1991-01-01

    The authors have isolated and sequenced the genomic DNA encoding a human dimeric soluble lactose-binding lectin. The gene has four exons, and its upstream region contains sequences that suggest control by glucocorticoids, heat (environmental) shock, metals, and other factors. They have also isolated and sequenced three exons of the gene encoding another human putative lectin, the existence of which was first indicated by isolation of its cDNA. Comparisons suggest a general pattern of genomic organization of members of this lectin gene family

  3. Genome organization and expression of the rat ACBP gene family

    DEFF Research Database (Denmark)

    Mandrup, S; Andreasen, P H; Knudsen, J

    1993-01-01

    pool former. We have molecularly cloned and characterized the rat ACBP gene family which comprises one expressed and four processed pseudogenes. One of these was shown to exist in two allelic forms. A comprehensive computer-aided analysis of the promoter region of the expressed ACBP gene revealed...

  4. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Full Text Available Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.

  5. Rodent malaria parasites : genome organization & comparative genomics

    NARCIS (Netherlands)

    Kooij, Taco W.A.

    2006-01-01

    The aim of the studies described in this thesis was to investigate the genome organization of rodent malaria parasites (RMPs) and compare the organization and gene content of the genomes of RMPs and the human malaria parasite P. falciparum. The release of the complete genome sequence of P.

  6. From genes to milk: genomic organization and epigenetic regulation of the mammary transcriptome.

    Science.gov (United States)

    Lemay, Danielle G; Pollard, Katherine S; Martin, William F; Freeman Zadrowski, Courtneay; Hernandez, Joseph; Korf, Ian; German, J Bruce; Rijnkels, Monique

    2013-01-01

    Even in genomes lacking operons, a gene's position in the genome influences its potential for expression. The mechanisms by which adjacent genes are co-expressed are still not completely understood. Using lactation and the mammary gland as a model system, we explore the hypothesis that chromatin state contributes to the co-regulation of gene neighborhoods. The mammary gland represents a unique evolutionary model, due to its recent appearance, in the context of vertebrate genomes. An understanding of how the mammary gland is regulated to produce milk is also of biomedical and agricultural importance for human lactation and dairying. Here, we integrate epigenomic and transcriptomic data to develop a comprehensive regulatory model. Neighborhoods of mammary-expressed genes were determined using expression data derived from pregnant and lactating mice and a neighborhood scoring tool, G-NEST. Regions of open and closed chromatin were identified by ChIP-Seq of histone modifications H3K36me3, H3K4me2, and H3K27me3 in the mouse mammary gland and liver tissue during lactation. We found that neighborhoods of genes in regions of uniquely active chromatin in the lactating mammary gland, compared with liver tissue, were extremely rare. Rather, genes in most neighborhoods were suppressed during lactation as reflected in their expression levels and their location in regions of silenced chromatin. Chromatin silencing was largely shared between the liver and mammary gland during lactation, and what distinguished the mammary gland was mainly a small tissue-specific repertoire of isolated, expressed genes. These findings suggest that an advantage of the neighborhood organization is in the collective repression of groups of genes via a shared mechanism of chromatin repression. Genes essential to the mammary gland's uniqueness are isolated from neighbors, and likely have less tolerance for variation in expression, properties they share with genes responsible for an organism's survival.

  7. Genomic organization of the rat alpha 2u-globulin gene cluster.

    Science.gov (United States)

    McFadyen, D A; Addison, W; Locke, J

    1999-05-01

    The alpha 2u-globulin are a group of similar proteins, belonging to the lipocalin superfamily of proteins, that are synthesized in a subset of secretory tissues in rats. The many alpha 2u-globulin isoforms are encoded by a multigene family that exhibits extensive homology. Despite a high degree of sequence identity, individual family members show diverse expression patterns involving complex hormonal, tissue-specific, and developmental regulation. Analysis suggests that there are approximately 20 alpha 2u-globulin genes in the rat genome. We have used fluorescence in situ hybridization (FISH) to show that the alpha 2u-globulin genes are clustered at a single site on rat Chromosome (Chr) 5 (5q22-24). Southern blots of rat genomic DNA separated by pulsed field gel electrophoresis indicated that the alpha 2u-globulin genes are contained on two NruI fragments with a total size of 880 kbp. Analysis of three P1 clones containing alpha 2u-globulin genes indicated that the alpha 2u-globulin genes are tandemly arranged in a head-to-tail fashion. The organization of the alpha 2u-globulin genes in the rat as a tandem array of single genes differs from the homologous major urinary protein genes in the mouse, which are organized as tandem arrays of divergently oriented gene pairs. The structure of these gene clusters may have consequences for the proposed function, as a pheromone transporter, for the protein products encoded by these genes.

  8. Uncovering the functional constraints underlying the genomic organization of the odorant-binding protein genes.

    Science.gov (United States)

    Librado, Pablo; Rozas, Julio

    2013-01-01

    Animal olfactory systems have a critical role for the survival and reproduction of individuals. In insects, the odorant-binding proteins (OBPs) are encoded by a moderately sized gene family, and mediate the first steps of the olfactory processing. Most OBPs are organized in clusters of a few paralogs, which are conserved over time. Currently, the biological mechanism explaining the close physical proximity among OBPs is not yet established. Here, we conducted a comprehensive study aiming to gain insights into the mechanisms underlying the OBP genomic organization. We found that the OBP clusters are embedded within large conserved arrangements. These organizations also include other non-OBP genes, which often encode proteins integral to plasma membrane. Moreover, the conservation degree of such large clusters is related to the following: 1) the promoter architecture of the confined genes, 2) a characteristic transcriptional environment, and 3) the chromatin conformation of the chromosomal region. Our results suggest that chromatin domains may restrict the location of OBP genes to regions having the appropriate transcriptional environment, leading to the OBP cluster structure. However, the appropriate transcriptional environment for OBP and the other neighbor genes is not dominated by reduced levels of expression noise. Indeed, the stochastic fluctuations in the OBP transcript abundance may have a critical role in the combinatorial nature of the olfactory coding process.

  9. Genomic Organization and Expression of Iron Metabolism Genes in the Emerging Pathogenic Mold Scedosporium apiospermum

    Directory of Open Access Journals (Sweden)

    Yohann Le Govic

    2018-04-01

    Full Text Available The ubiquitous mold Scedosporium apiospermum is increasingly recognized as an emerging pathogen, especially among patients with underlying disorders such as immunodeficiency or cystic fibrosis (CF. Indeed, it ranks the second among the filamentous fungi colonizing the respiratory tract of CF patients. However, our knowledge about virulence factors of this fungus is still limited. The role of iron-uptake systems may be critical for establishment of Scedosporium infections, notably in the iron-rich environment of the CF lung. Two main strategies are employed by fungi to efficiently acquire iron from their host or from their ecological niche: siderophore production and reductive iron assimilation (RIA systems. The aim of this study was to assess the existence of orthologous genes involved in iron metabolism in the recently sequenced genome of S. apiospermum. At first, a tBLASTn analysis using A. fumigatus iron-related proteins as query revealed orthologs of almost all relevant loci in the S. apiospermum genome. Whereas the genes putatively involved in RIA were randomly distributed, siderophore biosynthesis and transport genes were organized in two clusters, each containing a non-ribosomal peptide synthetase (NRPS whose orthologs in A. fumigatus have been described to catalyze hydroxamate siderophore synthesis. Nevertheless, comparative genomic analysis of siderophore-related clusters showed greater similarity between S. apiospermum and phylogenetically close molds than with Aspergillus species. The expression level of these genes was then evaluated by exposing conidia to iron starvation and iron excess. The expression of several orthologs of A. fumigatus genes involved in siderophore-based iron uptake or RIA was significantly induced during iron starvation, and conversely repressed in iron excess conditions. Altogether, these results indicate that S. apiospermum possesses the genetic information required for efficient and competitive iron uptake

  10. Genomic organization, expression, and chromosome localization of a third aurora-related kinase gene, Aie1.

    Science.gov (United States)

    Hu, H M; Chuang, C K; Lee, M J; Tseng, T C; Tang, T K

    2000-11-01

    We previously reported two novel testis-specific serine/threonine kinases, Aie1 (mouse) and AIE2 (human), that share high amino acid identities with the kinase domains of fly aurora and yeast Ipl1. Here, we report the entire intron-exon organization of the Aie1 gene and analyze the expression patterns of Aie1 mRNA during testis development. The mouse Aie1 gene spans approximately 14 kb and contains seven exons. The sequences of the exon-intron boundaries of the Aie1 gene conform to the consensus sequences (GT/AG) of the splicing donor and acceptor sites of most eukaryotic genes. Comparative genomic sequencing revealed that the gene structure is highly conserved between mouse Aie1 and human AIE2. However, much less homology was found in the sequence outside the kinase-coding domains. The Aie1 locus was mapped to mouse chromosome 7A2-A3 by fluorescent in situ hybridization. Northern blot analysis indicates that Aie1 mRNA likely is expressed at a low level on day 14 and reaches its plateau on day 21 in the developing postnatal testis. RNA in situ hybridization indicated that the expression of the Aie1 transcript was restricted to meiotically active germ cells, with the highest levels detected in spermatocytes at the late pachytene stage. These findings suggest that Aie1 plays a role in spermatogenesis.

  11. Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes

    Directory of Open Access Journals (Sweden)

    Schnitzler Christine E

    2012-12-01

    Full Text Available Abstract Background Calcium-activated photoproteins are luciferase variants found in photocyte cells of bioluminescent jellyfish (Phylum Cnidaria and comb jellies (Phylum Ctenophora. The complete genomic sequence from the ctenophore Mnemiopsis leidyi, a representative of the earliest branch of animals that emit light, provided an opportunity to examine the genome of an organism that uses this class of luciferase for bioluminescence and to look for genes involved in light reception. To determine when photoprotein genes first arose, we examined the genomic sequence from other early-branching taxa. We combined our genomic survey with gene trees, developmental expression patterns, and functional protein assays of photoproteins and opsins to provide a comprehensive view of light production and light reception in Mnemiopsis. Results The Mnemiopsis genome has 10 full-length photoprotein genes situated within two genomic clusters with high sequence conservation that are maintained due to strong purifying selection and concerted evolution. Photoprotein-like genes were also identified in the genomes of the non-luminescent sponge Amphimedon queenslandica and the non-luminescent cnidarian Nematostella vectensis, and phylogenomic analysis demonstrated that photoprotein genes arose at the base of all animals. Photoprotein gene expression in Mnemiopsis embryos begins during gastrulation in migrating precursors to photocytes and persists throughout development in the canals where photocytes reside. We identified three putative opsin genes in the Mnemiopsis genome and show that they do not group with well-known bilaterian opsin subfamilies. Interestingly, photoprotein transcripts are co-expressed with two of the putative opsins in developing photocytes. Opsin expression is also seen in the apical sensory organ. We present evidence that one opsin functions as a photopigment in vitro, absorbing light at wavelengths that overlap with peak photoprotein light

  12. The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster

    Science.gov (United States)

    Bushley, Kathryn E.; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S.; Nonogaki, Mariko; Boyd, Alexander E.; Owensby, C. Alisha; Knaus, Brian J.; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L.; Spatafora, Joseph W.

    2013-01-01

    The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes for cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, which is a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high levels of expression of the gene cluster cyclophilin. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role

  13. Promoter characterization and genomic organization of the human X11β gene APBA2.

    LENUS (Irish Health Repository)

    Hao, Yan

    2012-02-15

    Overexpression of neuronal adaptor protein X11β has been shown to decrease the production of amyloid-β, a toxic peptide deposited in Alzheimer\\'s disease brains. Therefore, manipulation of the X11β level may represent a potential therapeutic strategy for Alzheimer\\'s disease. As X11β expression can be regulated at the transcription level, we determined the genomic organization and the promoter of the human X11β gene, amyloid β A4 precursor protein-binding family A member 2 (APBA2). By RNA ligase-mediated rapid amplification of cDNA ends, a single APBA2 transcription start site and the complete sequence of exon 1 were identified. The APBA2 promoter was located upstream of exon 1 and was more active in neurons. The core promoter contains several CpG dinucleotides, and was strongly suppressed by DNA methylation. In addition, mutagenesis analysis revealed a putative Pax5-binding site within the promoter. Together, APBA2 contains a potent neuronal promoter whose activity may be regulated by DNA methylation and Pax5.

  14. Rice sHsp genes: genomic organization and expression profiling under stress and development

    Directory of Open Access Journals (Sweden)

    Grover Anil

    2009-08-01

    Full Text Available Abstract Background Heat shock proteins (Hsps constitute an important component in the heat shock response of all living systems. Among the various plant Hsps (i.e. Hsp100, Hsp90, Hsp70 and Hsp20, Hsp20 or small Hsps (sHsps are expressed in maximal amounts under high temperature stress. The characteristic feature of the sHsps is the presence of α-crystallin domain (ACD at the C-terminus. sHsps cooperate with Hsp100/Hsp70 and co-chaperones in ATP-dependent manner in preventing aggregation of cellular proteins and in their subsequent refolding. Database search was performed to investigate the sHsp gene family across rice genome sequence followed by comprehensive expression analysis of these genes. Results We identified 40 α-crystallin domain containing genes in rice. Phylogenetic analysis showed that 23 out of these 40 genes constitute sHsps. The additional 17 genes containing ACD clustered with Acd proteins of Arabidopsis. Detailed scrutiny of 23 sHsp sequences enabled us to categorize these proteins in a revised scheme of classification constituting of 16 cytoplasmic/nuclear, 2 ER, 3 mitochondrial, 1 plastid and 1 peroxisomal genes. In the new classification proposed herein nucleo-cytoplasmic class of sHsps with 9 subfamilies is more complex in rice than in Arabidopsis. Strikingly, 17 of 23 rice sHsp genes were noted to be intronless. Expression analysis based on microarray and RT-PCR showed that 19 sHsp genes were upregulated by high temperature stress. Besides heat stress, expression of sHsp genes was up or downregulated by other abiotic and biotic stresses. In addition to stress regulation, various sHsp genes were differentially upregulated at different developmental stages of the rice plant. Majority of sHsp genes were expressed in seed. Conclusion We identified twenty three sHsp genes and seventeen Acd genes in rice. Three nucleocytoplasmic sHsp genes were found only in monocots. Analysis of expression profiling of sHsp genes revealed

  15. Rapid isolation of gene homologs across taxa: Efficient identification and isolation of gene orthologs from non-model organism genomes, a technical report

    Directory of Open Access Journals (Sweden)

    Heffer Alison

    2011-03-01

    Full Text Available Abstract Background Tremendous progress has been made in the field of evo-devo through comparisons of related genes from diverse taxa. While the vast number of species in nature precludes a complete analysis of the molecular evolution of even one single gene family, this would not be necessary to understand fundamental mechanisms underlying gene evolution if experiments could be designed to systematically sample representative points along the path of established phylogenies to trace changes in regulatory and coding gene sequence. This isolation of homologous genes from phylogenetically diverse, representative species can be challenging, especially if the gene is under weak selective pressure and evolving rapidly. Results Here we present an approach - Rapid Isolation of Gene Homologs across Taxa (RIGHT - to efficiently isolate specific members of gene families. RIGHT is based upon modification and a combination of degenerate polymerase chain reaction (PCR and gene-specific amplified fragment length polymorphism (AFLP. It allows targeted isolation of specific gene family members from any organism, only requiring genomic DNA. We describe this approach and how we used it to isolate members of several different gene families from diverse arthropods spanning millions of years of evolution. Conclusions RIGHT facilitates systematic isolation of one gene from large gene families. It allows for efficient gene isolation without whole genome sequencing, RNA extraction, or culturing of non-model organisms. RIGHT will be a generally useful method for isolation of orthologs from both distant and closely related species, increasing sample size and facilitating the tracking of molecular evolution of gene families and regulatory networks across the tree of life.

  16. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Full Text Available Abstract Background Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we have used an in silico subtraction methodology, and we have focused our attention on genes that are organized in genomic clusters. Results In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain localized on chromosome 9, a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1 on chromosome 12, a cluster composed of a SPErm-associated glutamate (E-Rich (Speer protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis on chromosome 14, a cluster composed of the oocyte secreted protein-1 (Oosp-1 gene and two Oosp-related genes on chromosome 19, all three being characterized by a partial N-terminal zona pellucida-like domain, and another small cluster of two genes on chromosome 19 as well, composed of a TWIK-Related spinal cord K+ channel encoding-gene, and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, we showed by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion We have studied five clusters of genes specifically expressed in female, some of them being also expressed in male germ-cells. Moreover, contrarily to non-clustered oocyte-specific genes, those that are organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte-clusters in rodents could constitute an evolutionary advantage. Understanding the biological

  17. The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands

    Science.gov (United States)

    de Cambiaire, Jean-Charles; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2006-01-01

    Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order. Among these genomes, that of the chlorophycean green alga Chlamydomonas reinhardtii has retained the least ancestral features. The two single-copy regions, which are separated from one another by the large inverted repeat (IR), have similar sizes, rather than unequal sizes, and differ radically in both gene contents and gene organizations relative to the single-copy regions of prasinophyte and ulvophyte cpDNAs. To gain insights into the various changes that underwent the chloroplast genome during the evolution of chlorophycean green algae, we have sequenced the cpDNA of Scenedesmus obliquus, a member of a distinct chlorophycean lineage. Results The 161,452 bp IR-containing genome of Scenedesmus features single-copy regions of similar sizes, encodes 96 genes, i.e. only two additional genes (infA and rpl12) relative to its Chlamydomonas homologue and contains seven group I and two group II introns. It is clearly more compact than the four UTC algal cpDNAs that have been examined so far, displays the lowest proportion of short repeats among these algae and shows a stronger bias in clustering of genes on the same DNA strand compared to Chlamydomonas cpDNA. Like the latter genome, Scenedesmus cpDNA displays only a few ancestral gene clusters. The two chlorophycean genomes share 11 gene clusters that are not found in previously sequenced trebouxiophyte and ulvophyte cpDNAs as well as a few genes that have an unusual structure; however, their single-copy regions differ

  18. Genomic Organization, Phylogenetic Comparison and Differential Expression of the SBP-Box Family Genes in Grape

    Science.gov (United States)

    Hou, Hongmin; Li, Jun; Gao, Min; Singer, Stacy D.; Wang, Hao; Mao, Linyong; Fei, Zhangjun; Wang, Xiping

    2013-01-01

    Background The SBP-box gene family is specific to plants and encodes a class of zinc finger-containing transcription factors with a broad range of functions. Although SBP-box genes have been identified in numerous plants including green algae, moss, silver birch, snapdragon, Arabidopsis, rice and maize, there is little information concerning SBP-box genes, or the corresponding miR156/157, function in grapevine. Methodology/Principal Findings Eighteen SBP-box gene family members were identified in Vitis vinifera, twelve of which bore sequences that were complementary to miRNA156/157. Phylogenetic reconstruction demonstrated that plant SBP-domain proteins could be classified into seven subgroups, with the V. vinifera SBP-domain proteins being more closely related to SBP-domain proteins from dicotyledonous angiosperms than those from monocotyledonous angiosperms. In addition, synteny analysis between grape and Arabidopsis demonstrated that homologs of several grape SBP genes were found in corresponding syntenic blocks of Arabidopsis. Expression analysis of the grape SBP-box genes in various organs and at different stages of fruit development in V. quinquangularis ‘Shang-24’ revealed distinct spatiotemporal patterns. While the majority of the grape SBP-box genes lacking a miR156/157 target site were expressed ubiquitously and constitutively, most genes bearing a miR156/157 target site exhibited distinct expression patterns, possibly due to the inhibitory role of the microRNA. Furthermore, microarray data mining and quantitative real-time RT-PCR analysis identified several grape SBP-box genes that are potentially involved in the defense against biotic and abiotic stresses. Conclusion The results presented here provide a further understanding of SBP-box gene function in plants, and yields additional insights into the mechanism of stress management in grape, which may have important implications for the future success of this crop. PMID:23527172

  19. Genomic organization, transcript variants and comparative analysis of the human nucleoporin 155 (NUP155) gene

    DEFF Research Database (Denmark)

    Zhang, X.; Yang, J.; Yu, J.

    2002-01-01

    Nucleoporin 155 (Nup155) is a major component of the nuclear pore complex (NPC) involved in cellular nucleo-cytoplasmic transport. We have acquired the complete sequence and interpreted the genomic organization of the Nup155 orthologos from human (Homo sapiens) and pufferfish (Fugu rubripes), which...... complementary to RNAs of the Nup155 orthologs from Fugu and mouse. Comparative analysis of the Nup155 orthologs in many species, including H. sapiens, Mus musculus, Rattus norvegicus, F. rubripes, Arabidopsis thaliana, Drosophila melanogaster, and Saccharomyces cerevisiae, has revealed two paralogs in S...

  20. Genomic organization and the tissue distribution of alternatively spliced isoforms of the mouse Spatial gene

    Directory of Open Access Journals (Sweden)

    Mattei Marie-Geneviève

    2004-07-01

    Full Text Available Abstract Background The stromal component of the thymic microenvironment is critical for T lymphocyte generation. Thymocyte differentiation involves a cascade of coordinated stromal genes controlling thymocyte survival, lineage commitment and selection. The "Stromal Protein Associated with Thymii And Lymph-node" (Spatial gene encodes a putative transcription factor which may be involved in T-cell development. In the testis, the Spatial gene is also expressed by round spermatids during spermatogenesis. Results The Spatial gene maps to the B3-B4 region of murine chromosome 10 corresponding to the human syntenic region 10q22.1. The mouse Spatial genomic DNA is organised into 10 exons and is alternatively spliced to generate two short isoforms (Spatial-α and -γ and two other long isoforms (Spatial-δ and -ε comprising 5 additional exons on the 3' site. Here, we report the cloning of a new short isoform, Spatial-β, which differs from other isoforms by an additional alternative exon of 69 bases. This new exon encodes an interesting proline-rich signature that could confer to the 34 kDa Spatial-β protein a particular function. By quantitative TaqMan RT-PCR, we have shown that the short isoforms are highly expressed in the thymus while the long isoforms are highly expressed in the testis. We further examined the inter-species conservation of Spatial between several mammals and identified that the protein which is rich in proline and positive amino acids, is highly conserved. Conclusions The Spatial gene generates at least five alternative spliced variants: three short isoforms (Spatial-α, -β and -γ highly expressed in the thymus and two long isoforms (Spatial-δ and -ε highly expressed in the testis. These alternative spliced variants could have a tissue specific function.

  1. Visualizing conserved gene location across microbe genomes

    Science.gov (United States)

    Shaw, Chris D.

    2009-01-01

    This paper introduces an analysis-based zoomable visualization technique for displaying the location of genes across many related species of microbes. The purpose of this visualizatiuon is to enable a biologist to examine the layout of genes in the organism of interest with respect to the gene organization of related organisms. During the genomic annotation process, the ability to observe gene organization in common with previously annotated genomes can help a biologist better confirm the structure and function of newly analyzed microbe DNA sequences. We have developed a visualization and analysis tool that enables the biologist to observe and examine gene organization among genomes, in the context of the primary sequence of interest. This paper describes the visualization and analysis steps, and presents a case study using a number of Rickettsia genomes.

  2. Persistence drives gene clustering in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Rocha Eduardo PC

    2008-01-01

    Full Text Available Abstract Background Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion We show that the strong selective pressure acting on the function of persistent genes, in a permanent state of flux of genes in bacterial genomes, maintaining their size fairly constant, that drives persistent genes clustering. A further selective stabilization process might contribute to maintaining the clustering.

  3. Insights into the Evolution of a Snake Venom Multi-Gene Family from the Genomic Organization of Echis ocellatus SVMP Genes

    Directory of Open Access Journals (Sweden)

    Libia Sanz

    2016-07-01

    Full Text Available The molecular events underlying the evolution of the Snake Venom Metalloproteinase (SVMP family from an A Disintegrin And Metalloproteinase (ADAM ancestor remain poorly understood. Comparative genomics may provide decisive information to reconstruct the evolutionary history of this multi-locus toxin family. Here, we report the genomic organization of Echis ocellatus genes encoding SVMPs from the PII and PI classes. Comparisons between them and between these genes and the genomic structures of Anolis carolinensis ADAM28 and E. ocellatus PIII-SVMP EOC00089 suggest that insertions and deletions of intronic regions played key roles along the evolutionary pathway that shaped the current diversity within the multi-locus SVMP gene family. In particular, our data suggest that emergence of EOC00028-like PI-SVMP from an ancestral PII(e/d-type SVMP involved splicing site mutations that abolished both the 3′ splice AG acceptor site of intron 12* and the 5′ splice GT donor site of intron 13*, and resulted in the intronization of exon 13* and the consequent destruction of the structural integrity of the PII-SVMP characteristic disintegrin domain.

  4. Genome-Wide Identification, Phylogeny, and Expression Analysis of ARF Genes Involved in Vegetative Organs Development in Switchgrass

    Directory of Open Access Journals (Sweden)

    Jianli Wang

    2018-01-01

    Full Text Available Auxin response factors (ARFs have been reported to play vital roles during plant growth and development. In order to reveal specific functions related to vegetative organs in grasses, an in-depth study of the ARF gene family was carried out in switchgrass (Panicum virgatum L., a warm-season C4 perennial grass that is mostly used as bioenergy and animal feedstock. A total of 47 putative ARF genes (PvARFs were identified in the switchgrass genome (2n = 4x = 36, 42 of which were anchored to the seven pairs of chromosomes and found to be unevenly distributed. Sixteen PvARFs were predicted to be potential targets of small RNAs (microRNA160 and 167. Phylogenetically speaking, PvARFs were divided into seven distinct subgroups based on the phylogeny, exon/intron arrangement, and conserved motif distribution. Moreover, 15 pairs of PvARFs have different temporal-spatial expression profiles in vegetative organs (2nd, 3rd, and 4th internode and leaves, which implies that different PvARFs have specific functions in switchgrass growth and development. In addition, at least 14 pairs of PvARFs respond to naphthylacetic acid (NAA treatment, which might be helpful for us to study on auxin response in switchgrass. The comprehensive analysis, described here, will facilitate the future functional analysis of ARF genes in grasses.

  5. An original SERPINA3 gene cluster: Elucidation of genomic organization and gene expression in the Bos taurus 21q24 region

    Directory of Open Access Journals (Sweden)

    Ouali Ahmed

    2008-04-01

    Full Text Available Abstract Background The superfamily of serine proteinase inhibitors (serpins is involved in numerous fundamental biological processes as inflammation, blood coagulation and apoptosis. Our interest is focused on the SERPINA3 sub-family. The major human plasma protease inhibitor, α1-antichymotrypsin, encoded by the SERPINA3 gene, is homologous to genes organized in clusters in several mammalian species. However, although there is a similar genic organization with a high degree of sequence conservation, the reactive-centre-loop domains, which are responsible for the protease specificity, show significant divergences. Results We provide additional information by analyzing the situation of SERPINA3 in the bovine genome. A cluster of eight genes and one pseudogene sharing a high degree of identity and the same structural organization was characterized. Bovine SERPINA3 genes were localized by radiation hybrid mapping on 21q24 and only spanned over 235 Kilobases. For all these genes, we propose a new nomenclature from SERPINA3-1 to SERPINA3-8. They share approximately 70% of identity with the human SERPINA3 homologue. In the cluster, we described an original sub-group of six members with an unexpected high degree of conservation for the reactive-centre-loop domain, suggesting a similar peptidase inhibitory pattern. Preliminary expression analyses of these bovSERPINA3s showed different tissue-specific patterns and diverse states of glycosylation and phosphorylation. Finally, in the context of phylogenetic analyses, we improved our knowledge on mammalian SERPINAs evolution. Conclusion Our experimental results update data of the bovine genome sequencing, substantially increase the bovSERPINA3 sub-family and enrich the phylogenetic tree of serpins. We provide new opportunities for future investigations to approach the biological functions of this unusual subset of serine proteinase inhibitors.

  6. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    Science.gov (United States)

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  7. Discovery of global genomic re-organization based on comparison of two newly sequenced rice mitochondrial genomes with cytoplasmic male sterility-related genes

    Directory of Open Access Journals (Sweden)

    Yamada Mari

    2010-03-01

    Full Text Available Abstract Background Plant mitochondrial genomes are known for their complexity, and there is abundant evidence demonstrating that this organelle is important for plant sexual reproduction. Cytoplasmic male sterility (CMS is a phenomenon caused by incompatibility between the nucleus and mitochondria that has been discovered in various plant species. As the exact sequence of steps leading to CMS has not yet been revealed, efforts should be made to elucidate the factors underlying the mechanism of this important trait for crop breeding. Results Two CMS mitochondrial genomes, LD-CMS, derived from Oryza sativa L. ssp. indica (434,735 bp, and CW-CMS, derived from Oryza rufipogon Griff. (559,045 bp, were newly sequenced in this study. Compared to the previously sequenced Nipponbare (Oryza sativa L. ssp. japonica mitochondrial genome, the presence of 54 out of 56 protein-encoding genes (including pseudo-genes, 22 tRNA genes (including pseudo-tRNAs, and three rRNA genes was conserved. Two other genes were not present in the CW-CMS mitochondrial genome, and one of them was present as part of the newly identified chimeric ORF, CW-orf307. At least 12 genomic recombination events were predicted between the LD-CMS mitochondrial genome and Nipponbare, and 15 between the CW-CMS genome and Nipponbare, and novel genetic structures were formed by these genomic rearrangements in the two CMS lines. At least one of the genomic rearrangements was completely unique to each CMS line and not present in 69 rice cultivars or 9 accessions of O. rufipogon. Conclusion Our results demonstrate novel mitochondrial genomic rearrangements that are unique in CMS cytoplasm, and one of the genes that is unique in the CW mitochondrial genome, CW-orf307, appeared to be the candidate most likely responsible for the CW-CMS event. Genomic rearrangements were dynamic in the CMS lines in comparison with those of rice cultivars, suggesting that 'death' and possible 'birth' processes of the

  8. The genomic organization of four b-1,4-endoglucanase genes in plant-parasitic cyst nematodes and its evolutionary implications.

    NARCIS (Netherlands)

    Yan, Y.; Smant, G.; Stokkermans, J.P.W.G.; Qin Ling,; Baum, T.J.; Schots, A.; Davis, E.L.

    1998-01-01

    The genomic organization of genes encoding -1,4-endoglucanases (cellulases) from the plant-parasitic cyst nematodes Heterodera glycines and Globodera rostochiensis (HG-eng1, Hg-eng2, GR-eng1, and GR-eng2) was investigated. HG-eng1 and GR-eng1 both contained eight introns and structural domains of

  9. Genomic organization, tissue distribution and functional characterization of the rat Pate gene cluster.

    Directory of Open Access Journals (Sweden)

    Angireddy Rajesh

    Full Text Available The cysteine rich prostate and testis expressed (Pate proteins identified till date are thought to resemble the three fingered protein/urokinase-type plasminogen activator receptor proteins. In this study, for the first time, we report the identification, cloning and characterization of rat Pate gene cluster and also determine the expression pattern. The rat Pate genes are clustered on chromosome 8 and their predicted proteins retained the ten cysteine signature characteristic to TFP/Ly-6 protein family. PATE and PATE-F three dimensional protein structure was found to be similar to that of the toxin bucandin. Though Pate gene expression is thought to be prostate and testis specific, we observed that rat Pate genes are also expressed in seminal vesicle and epididymis and in tissues beyond the male reproductive tract. In the developing rats (20-60 day old, expression of Pate genes seem to be androgen dependent in the epididymis and testis. In the adult rat, androgen ablation resulted in down regulation of the majority of Pate genes in the epididymides. PATE and PATE-F proteins were found to be expressed abundantly in the male reproductive tract of rats and on the sperm. Recombinant PATE protein exhibited potent antibacterial activity, whereas PATE-F did not exhibit any antibacterial activity. Pate expression was induced in the epididymides when challenged with LPS. Based on our results, we conclude that rat PATE proteins may contribute to the reproductive and defense functions.

  10. The mitochondrial genome of the stingless bee Melipona bicolor (Hymenoptera, Apidae, Meliponini: sequence, gene organization and a unique tRNA translocation event conserved across the tribe Meliponini

    Directory of Open Access Journals (Sweden)

    Daniela Silvestre

    2008-01-01

    Full Text Available At present a complete mtDNA sequence has been reported for only two hymenopterans, the Old World honey bee, Apis mellifera and the sawfly Perga condei. Among the bee group, the tribe Meliponini (stingless bees has some distinction due to its Pantropical distribution, great number of species and large importance as main pollinators in several ecosystems, including the Brazilian rain forest. However few molecular studies have been conducted on this group of bees and few sequence data from mitochondrial genomes have been described. In this project, we PCR amplified and sequenced 78% of the mitochondrial genome of the stingless bee Melipona bicolor (Apidae, Meliponini. The sequenced region contains all of the 13 mitochondrial protein-coding genes, 18 of 22 tRNA genes, and both rRNA genes (one of them was partially sequenced. We also report the genome organization (gene content and order, gene translation, genetic code, and other molecular features, such as base frequencies, codon usage, gene initiation and termination. We compare these characteristics of M. bicolor to those of the mitochondrial genome of A. mellifera and other insects. A highly biased A+T content is a typical characteristic of the A. mellifera mitochondrial genome and it was even more extreme in that of M. bicolor. Length and compositional differences between M. bicolor and A. mellifera genes were detected and the gene order was compared. Eleven tRNA gene translocations were observed between these two species. This latter finding was surprising, considering the taxonomic proximity of these two bee tribes. The tRNA Lys gene translocation was investigated within Meliponini and showed high conservation across the Pantropical range of the tribe.

  11. Genomic organization of the mouse peroxisome proliferator-activated receptor beta/delta gene

    DEFF Research Database (Denmark)

    Larsen, Leif K; Amri, Ez-Zoubir; Mandrup, Susanne

    2002-01-01

    Peroxisome proliferator-activated receptor (PPAR) beta/delta is ubiquitously expressed, but the level of expression differs markedly between different cell types. In order to determine the molecular mechanisms governing PPARbeta/delta gene expression, we have isolated and characterized the mouse...

  12. Chloroplast Genome of the Folk Medicine and Vegetable Plant Talinum paniculatum (Jacq.) Gaertn.: Gene Organization, Comparative and Phylogenetic Analysis.

    Science.gov (United States)

    Liu, Xia; Li, Yuan; Yang, Hongyuan; Zhou, Boyang

    2018-04-09

    The complete chloroplast (cp) genome of Talinum paniculatum (Caryophyllale), a source of pharmaceutical efficacy similar to ginseng, and a widely distributed and planted edible vegetable, were sequenced and analyzed. The cp genome size of T. paniculatum is 156,929 bp, with a pair of inverted repeats (IRs) of 25,751 bp separated by a large single copy (LSC) region of 86,898 bp and a small single copy (SSC) region of 18,529 bp. The genome contains 83 protein-coding genes, 37 transfer RNA (tRNA) genes, eight ribosomal RNA (rRNA) genes and four pseudogenes. Fifty one (51) repeat units and ninety two (92) simple sequence repeats (SSRs) were found in the genome. The pseudogene rpl23 (Ribosomal protein L23) was insert AATT than other Caryophyllale species by sequence alignment, which located in IRs region. The gene of trnK-UUU (tRNA-Lys) and rpl16 (Ribosomal protein L16) have larger introns in T. paniculatum , and the existence of matK (maturase K) genes, which usually located in the introns of trnK-UUU , rich sequence divergence in Caryophyllale. Complete cp genome comparison with other eight Caryophyllales species indicated that the differences between T. paniculatum and P. oleracea were very slight, and the most highly divergent regions occurred in intergenic spacers. Comparisons of IR boundaries among nine Caryophyllales species showed that T. paniculatum have larger IRs region and the contraction is relatively slight. The phylogenetic analysis among 35 Caryophyllales species and two outgroup species revealed that T. paniculatum and P. oleracea do not belong to the same family. All these results give good opportunities for future identification, barcoding of Talinum species, understanding the evolutionary mode of Caryophyllale cp genome and molecular breeding of T. paniculatum with high pharmaceutical efficacy.

  13. Heat Shock Protein Genes Undergo Dynamic Alteration in Their Three-Dimensional Structure and Genome Organization in Response to Thermal Stress.

    Science.gov (United States)

    Chowdhary, Surabhi; Kainth, Amoldeep S; Gross, David S

    2017-12-15

    Three-dimensional (3D) chromatin organization is important for proper gene regulation, yet how the genome is remodeled in response to stress is largely unknown. Here, we use a highly sensitive version of chromosome conformation capture in combination with fluorescence microscopy to investigate Heat Shock Protein ( HSP ) gene conformation and 3D nuclear organization in budding yeast. In response to acute thermal stress, HSP genes undergo intense intragenic folding interactions that go well beyond 5'-3' gene looping previously described for RNA polymerase II genes. These interactions include looping between upstream activation sequence (UAS) and promoter elements, promoter and terminator regions, and regulatory and coding regions (gene "crumpling"). They are also dynamic, being prominent within 60 s, peaking within 2.5 min, and attenuating within 30 min, and correlate with HSP gene transcriptional activity. With similarly striking kinetics, activated HSP genes, both chromosomally linked and unlinked, coalesce into discrete intranuclear foci. Constitutively transcribed genes also loop and crumple yet fail to coalesce. Notably, a missense mutation in transcription factor TFIIB suppresses gene looping, yet neither crumpling nor HSP gene coalescence is affected. An inactivating promoter mutation, in contrast, obviates all three. Our results provide evidence for widespread, transcription-associated gene crumpling and demonstrate the de novo assembly and disassembly of HSP gene foci. Copyright © 2017 American Society for Microbiology.

  14. Cloning and Genomic Organization of a Rhamnogalacturonase Gene from Locally Isolated Strain of Aspergillus niger.

    Science.gov (United States)

    Damak, Naourez; Abdeljalil, Salma; Taeib, Noomen Hadj; Gargouri, Ali

    2015-08-01

    The rhg gene encoding a rhamnogalacturonase was isolated from the novel strain A1 of Aspergillus niger. It consists of an ORF of 1.505 kb encoding a putative protein of 446 amino acids with a predicted molecular mass of 47 kDa, belonging to the family 28 of glycosyl hydrolases. The nature and position of amino acids comprising the active site as well as the three-dimensional structure were well conserved between the A. niger CTM10548 and fungal rhamnogalacturonases. The coding region of the rhg gene is interrupted by three short introns of 56 (introns 1 and 3) and 52 (intron 2) bp in length. The comparison of the peptide sequence with A. niger rhg sequences revealed that the A1 rhg should be an endo-rhamnogalacturonases, more homologous to rhg A than rhg B A. niger known enzymes. The comparison of rhg nucleotide sequence from A. niger A1 with rhg A from A. niger shows several base changes. Most of these changes (59 %) are located at the third base of codons suggesting maintaining the same enzyme function. We used the rhamnogalacturonase A from Aspergillus aculeatus as a template to build a structural model of rhg A1 that adopted a right-handed parallel β-helix.

  15. The genome of Paenibacillus sabinae T27 provides insight into evolution, organization and functional elucidation of nif and nif-like genes.

    Science.gov (United States)

    Li, Xinxin; Deng, Zhiping; Liu, Zhanzhi; Yan, Yongliang; Wang, Tianshu; Xie, Jianbo; Lin, Min; Cheng, Qi; Chen, Sanfeng

    2014-08-27

    Most biological nitrogen fixation is catalyzed by the molybdenum nitrogenase. This enzyme is a complex which contains the MoFe protein encoded by nifDK and the Fe protein encoded by nifH. In addition to nifHDK, nifHDK-like genes were found in some Archaea and Firmicutes, but their function is unclear. We sequenced the genome of Paenibacillus sabinae T27. A total of 4,793 open reading frames were predicted from its 5.27 Mb genome. The genome of P. sabinae T27 contains fifteen nitrogen fixation (nif) genes, including three nifH, one nifD, one nifK, four nifB, two nifE, two nifN, one nifX and one nifV. Of the 15 nif genes, eight nif genes (nifB, nifH, nifD, nifK, nifE, nifN, nifX and nifV) and two non-nif genes (orf1 and hesA) form a complete nif gene cluster. In addition to the nif genes, there are nitrogenase-like genes, including two nifH-like genes and five pairs of nifDK-like genes. IS elements on the flanking regions of nif and nif-like genes imply that these genes might have been obtained by horizontal gene transfer. Phylogenies of the concatenated 8 nif gene (nifB, nifH, nifD, nifK, nifE, nifN, nifX and nifV) products suggest that P. sabinae T27 is closely related to Frankia. RT-PCR analysis showed that the complete nif gene cluster is organized as an operon. We demonstrated that the complete nif gene cluster under the control of σ70-dependent promoter enabled Escherichia coli JM109 to fix nitrogen. Also, here for the first time we demonstrated that unlike nif genes, the transcriptions of nifHDK-like genes were not regulated by ammonium and oxygen, and nifH-like or nifD-like gene could not restore the nitrogenase activity of Klebsiella pneumonia nifH- and nifD- mutant strains, respectively, suggesting that nifHDK-like genes were not involved in nitrogen fixation. Our data and analysis reveal the contents and distribution of nif and nif-like genes and contribute to the study of evolutionary history of nitrogen fixation in Paenibacillus. For the first time we

  16. Genomic organization and promoter cloning of the human X11α gene APBA1.

    LENUS (Irish Health Repository)

    Chai, Ka-Ho

    2012-05-01

    X11α is a brain specific multi-modular protein that interacts with the Alzheimer\\'s disease amyloid precursor protein (APP). Aggregation of amyloid-β peptide (Aβ), an APP cleavage product, is believed to be central to the pathogenesis of Alzheimer\\'s disease. Recently, overexpression of X11α has been shown to reduce Aβ generation and to ameliorate memory deficit in a transgenic mouse model of Alzheimer\\'s disease. Therefore, manipulating the expression level of X11α may provide a novel route for the treatment of Alzheimer\\'s disease. Human X11α is encoded by the gene APBA1. As evidence suggests that X11α expression can be regulated at transcription level, we have determined the gene structure and cloned the promoter of APBA1. APBA1 spans over 244 kb on chromosome 9 and is composed of 13 exons and has multiple transcription start sites. A putative APBA1 promoter has been identified upstream of exon 1 and functional analysis revealed that this is highly active in neurons. By deletion analysis, the minimal promoter was found to be located between -224 and +14, a GC-rich region that contains a functional Sp3 binding site. In neurons, overexpression of Sp3 stimulates the APBA1 promoter while an Sp3 inhibitor suppresses the promoter activity. Moreover, inhibition of Sp3 reduces endogenous X11α expression and promotes the generation of Aβ. Our findings reveal that Sp3 play an essential role in APBA1 transcription.

  17. Utility of sequenced genomes for microsatellite marker development in non-model organisms: a case study of functionally important genes in nine-spined sticklebacks (Pungitius pungitius

    Directory of Open Access Journals (Sweden)

    Shimada Yukinori

    2010-05-01

    Full Text Available Abstract Background Identification of genes involved in adaptation and speciation by targeting specific genes of interest has become a plausible strategy also for non-model organisms. We investigated the potential utility of available sequenced fish genomes to develop microsatellite (cf. simple sequence repeat, SSR markers for functionally important genes in nine-spined sticklebacks (Pungitius pungitius, as well as cross-species transferability of SSR primers from three-spined (Gasterosteus aculeatus to nine-spined sticklebacks. In addition, we examined the patterns and degree of SSR conservation between these species using their aligned sequences. Results Cross-species amplification success was lower for SSR markers located in or around functionally important genes (27 out of 158 than for those randomly derived from genomic (35 out of 101 and cDNA (35 out of 87 libraries. Polymorphism was observed at a large proportion (65% of the cross-amplified loci independently of SSR type. To develop SSR markers for functionally important genes in nine-spined sticklebacks, SSR locations were surveyed in or around 67 target genes based on the three-spined stickleback genome and these regions were sequenced with primers designed from conserved sequences in sequenced fish genomes. Out of the 81 SSRs identified in the sequenced regions (44,084 bp, 57 exhibited the same motifs at the same locations as in the three-spined stickleback. Di- and trinucleotide SSRs appeared to be highly conserved whereas mononucleotide SSRs were less so. Species-specific primers were designed to amplify 58 SSRs using the sequences of nine-spined sticklebacks. Conclusions Our results demonstrated that a large proportion of SSRs are conserved in the species that have diverged more than 10 million years ago. Therefore, the three-spined stickleback genome can be used to predict SSR locations in the nine-spined stickleback genome. While cross-species utility of SSR primers is limited due

  18. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.

  19. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    Directory of Open Access Journals (Sweden)

    Toub Omid

    2010-10-01

    Full Text Available Abstract Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS were predicted by in silico analysis of the grapevine (Vitis vinifera genome assembly 1. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the up-dated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information

  20. The complete chloroplast genome sequence of Ampelopsis: gene organization, comparative analysis and phylogenetic relationships to other angiosperms

    Directory of Open Access Journals (Sweden)

    Gurusamy eRaman

    2016-03-01

    Full Text Available Ampelopsis brevipedunculata is an economically important plant that belongs to the Vitaceae family of angiosperms. The phylogenetic placement of Vitaceae is still unresolved. Recent phylogenetic studies suggested that it should be placed in various alternative families including Caryophyllaceae, asteraceae, Saxifragaceae, Dilleniaceae, or with the rest of the rosid families. However, these analyses provided weak supportive results because they were based on only one of several genes. Accordingly, complete chloroplast genome sequences are required to resolve the phylogenetic relationships among angiosperms. Recent phylogenetic analyses based on the complete chloroplast genome sequence suggested strong support for the position of Vitaceae as the earliest diverging lineage of rosids and placed it as a sister to the remaining rosids. These studies also revealed relationships among several major lineages of angiosperms; however, they highlighted the significance of taxon sampling for obtaining accurate phylogenies. In the present study, we sequenced the complete chloroplast genome of A. brevipedunculata and used these data to assess the relationships among 32 angiosperms, including 18 taxa of rosids. The Ampelopsis chloroplast genome is 161,090 bp in length, and includes a pair of inverted repeats of 26,394 bp that are separated by small and large single copy regions of 19,036 bp and 89,266 bp, respectively. The gene content and order of Ampelopsis is identical to many other unrearranged angiosperm chloroplast genomes, including Vitis and tobacco. A phylogenetic tree constructed based on 70 protein-coding genes of 33 angiosperms showed that both Saxifragales and Vitaceae diverged from the rosid clade and formed two clades with 100% bootstrap value. The position of the Vitaceae is sister to Saxifragales, and both are the basal and earliest diverging lineages. Moreover, Saxifragales forms a sister clade to Vitaceae of rosids. Overall, the results of

  1. A Comprehensive Analysis of the Phylogeny, Genomic Organization and Expression of Immunoglobulin Light Chain Genes in Alligator sinensis, an Endangered Reptile Species

    Science.gov (United States)

    Lu, Yan; Zhang, Chenglin; Wu, Xiaobing; Han, Haitang; Zhao, Yaofeng; Ren, Liming

    2016-01-01

    Crocodilians are evolutionarily distinct reptiles that are distantly related to lizards and are thought to be the closest relatives of birds. Compared with birds and mammals, few studies have investigated the Ig light chain of crocodilians. Here, employing an Alligator sinensis genomic bacterial artificial chromosome (BAC) library and available genome data, we characterized the genomic organization of the Alligator sinensis IgL gene loci. The Alligator sinensis has two IgL isotypes, λ and κ, the same as Anolis carolinensis. The Igλ locus contains 6 Cλ genes, each preceded by a Jλ gene, and 86 potentially functional Vλ genes upstream of (Jλ-Cλ)n. The Igκ locus contains a single Cκ gene, 6 Jκs and 62 functional Vκs. All VL genes are classified into a total of 31 families: 19 Vλ families and 12 Vκ families. Based on an analysis of the chromosomal location of the light chain genes among mammals, birds, lizards and frogs, the data further confirm that there are two IgL isotypes in the Alligator sinensis: Igλ and Igκ. By analyzing the cloned Igλ/κ cDNA, we identified a biased usage pattern of V families in the expressed Vλ and Vκ. An analysis of the junctions of the recombined VJ revealed the presence of N and P nucleotides in both expressed λ and κ sequences. Phylogenetic analysis of the V genes revealed V families shared by mammals, birds, reptiles and Xenopus, suggesting that these conserved V families are orthologous and have been retained during the evolution of IgL. Our data suggest that the Alligator sinensis IgL gene repertoire is highly diverse and complex and provide insight into immunoglobulin gene evolution in vertebrates. PMID:26901135

  2. A Comprehensive Analysis of the Phylogeny, Genomic Organization and Expression of Immunoglobulin Light Chain Genes in Alligator sinensis, an Endangered Reptile Species.

    Directory of Open Access Journals (Sweden)

    Xifeng Wang

    Full Text Available Crocodilians are evolutionarily distinct reptiles that are distantly related to lizards and are thought to be the closest relatives of birds. Compared with birds and mammals, few studies have investigated the Ig light chain of crocodilians. Here, employing an Alligator sinensis genomic bacterial artificial chromosome (BAC library and available genome data, we characterized the genomic organization of the Alligator sinensis IgL gene loci. The Alligator sinensis has two IgL isotypes, λ and κ, the same as Anolis carolinensis. The Igλ locus contains 6 Cλ genes, each preceded by a Jλ gene, and 86 potentially functional Vλ genes upstream of (Jλ-Cλn. The Igκ locus contains a single Cκ gene, 6 Jκs and 62 functional Vκs. All VL genes are classified into a total of 31 families: 19 Vλ families and 12 Vκ families. Based on an analysis of the chromosomal location of the light chain genes among mammals, birds, lizards and frogs, the data further confirm that there are two IgL isotypes in the Alligator sinensis: Igλ and Igκ. By analyzing the cloned Igλ/κ cDNA, we identified a biased usage pattern of V families in the expressed Vλ and Vκ. An analysis of the junctions of the recombined VJ revealed the presence of N and P nucleotides in both expressed λ and κ sequences. Phylogenetic analysis of the V genes revealed V families shared by mammals, birds, reptiles and Xenopus, suggesting that these conserved V families are orthologous and have been retained during the evolution of IgL. Our data suggest that the Alligator sinensis IgL gene repertoire is highly diverse and complex and provide insight into immunoglobulin gene evolution in vertebrates.

  3. Genome-Wide Analysis of Soybean LATERAL ORGAN BOUNDARIES Domain-Containing Genes: A Functional Investigation of GmLBD12

    Directory of Open Access Journals (Sweden)

    Hui Yang

    2017-03-01

    Full Text Available Plant-specific ( genes play critical roles in various plant growth and development processes. However, the number and characteristics of genes in soybean [ (L. Merr.] remain unknown. Here, we identified 90 homologous genes in the soybean genome that phylogenetically clustered into two classes (I and II. The majority of the genes were evenly distributed across all 20 soybean chromosomes, and 77 (81.11% of them were detected in segmental duplicated regions. Furthermore, the exon–intron organization and motif composition for each were analyzed. A close phylogenetic relationship was identified between the soybean genes and 41 previously reported genes of different plants in the same group, providing insights into their putative functions. Expression analysis indicated that more than half of the genes were expressed, with the two gene classes showing differential tissue expression characteristics; in addition, they were differentially induced by biotic and abiotic stresses. To further explore the functions of genes in soybean, was selected for functional characterization. GmLBD12 was mainly localized to the nucleus and showed high expression in root and seed tissues. Overexpressing in (L. Heynh resulted in increases in lateral root (LR number and plant height. Quantitative real-time polymerase chain reaction (qRT-PCR analysis demonstrated that was induced by drought, salt, cold, indole acetic acid (IAA, abscisic acid (ABA, and salicylic acid SA treatments. This study provides the first comprehensive analysis of the soybean gene family and a valuable foundation for future functional studies of genes.

  4. Genomic signatures of local directional selection in a high gene flow marine organism, the Atlantic cod (Gadus morhua)

    DEFF Research Database (Denmark)

    Eg Nielsen, Einar; Hansen, Jakob Hemmer; Poulsen, Nina Aagaard

    2009-01-01

    -associated single nucleotide polymorphisms (SNPs) for evidence of selection in local populations of Atlantic cod (Gadus morhua L.) across the species distribution. Results: Our global genome scan analysis identified eight outlier gene loci with very high statistical support, likely to be subject to directional...... selection in local demes, or closely linked to loci under selection. Likewise, on a regional south/north transect of central and eastern Atlantic populations, seven loci displayed strongly elevated levels of genetic differentiation. Selection patterns among populations appeared to be relatively widespread...

  5. Genomic signatures of local directional selection in a high gene flow marine organism; the Atlantic cod (Gadus morhua

    Directory of Open Access Journals (Sweden)

    Mittelholzer Christian

    2009-12-01

    Full Text Available Abstract Background Marine fishes have been shown to display low levels of genetic structuring and associated high levels of gene flow, suggesting shallow evolutionary trajectories and, possibly, limited or lacking adaptive divergence among local populations. We investigated variation in 98 gene-associated single nucleotide polymorphisms (SNPs for evidence of selection in local populations of Atlantic cod (Gadus morhua L. across the species distribution. Results Our global genome scan analysis identified eight outlier gene loci with very high statistical support, likely to be subject to directional selection in local demes, or closely linked to loci under selection. Likewise, on a regional south/north transect of central and eastern Atlantic populations, seven loci displayed strongly elevated levels of genetic differentiation. Selection patterns among populations appeared to be relatively widespread and complex, i.e. outlier loci were generally not only associated with one of a few divergent local populations. Even on a limited geographical scale between the proximate North Sea and Baltic Sea populations four loci displayed evidence of adaptive evolution. Temporal genome scan analysis applied to DNA from archived otoliths from a Faeroese population demonstrated stability of the intra-population variation over 24 years. An exploratory landscape genetic analysis was used to elucidate potential effects of the most likely environmental factors responsible for the signatures of local adaptation. We found that genetic variation at several of the outlier loci was better correlated with temperature and/or salinity conditions at spawning grounds at spawning time than with geographic distance per se. Conclusion These findings illustrate that adaptive population divergence may indeed be prevalent despite seemingly high levels of gene flow, as found in most marine fishes. Thus, results have important implications for our understanding of the interplay of

  6. Genome-Wide Identification, Evolutionary Analysis and Expression Profiles of LATERAL ORGAN BOUNDARIES DOMAIN Gene Family in Lotus japonicus and Medicago truncatula.

    Directory of Open Access Journals (Sweden)

    Tianquan Yang

    Full Text Available The LATERAL ORGAN BOUNDARIES DOMAIN (LBD gene family has been well-studied in Arabidopsis and play crucial roles in the diverse growth and development processes including establishment and maintenance of boundary of developmental lateral organs. In this study we identified and characterized 38 LBD genes in Lotus japonicus (LjLBD and 57 LBD genes in Medicago truncatula (MtLBD, both of which are model legume plants that have some specific development features absent in Arabidopsis. The phylogenetic relationships, their locations in the genome, genes structure and conserved motifs were examined. The results revealed that all LjLBD and MtLBD genes could be distinctly divided into two classes: Class I and II. The evolutionary analysis showed that Type I functional divergence with some significantly site-specific shifts may be the main force for the divergence between Class I and Class II. In addition, the expression patterns of LjLBD genes uncovered the diverse functions in plant development. Interestingly, we found that two LjLBD proteins that were highly expressed during compound leaf and pulvinus development, can interact via yeast two-hybrid assays. Taken together, our findings provide an evolutionary and genetic foundation in further understanding the molecular basis of LBD gene family in general, specifically in L. japonicus and M. truncatula.

  7. An overview on genome organization of marine organisms.

    Science.gov (United States)

    Costantini, Maria

    2015-12-01

    In this review we will concentrate on some general genome features of marine organisms and their evolution, ranging from vertebrate to invertebrates until unicellular organisms. Before genome sequencing, the ultracentrifugation in CsCl led to high resolution of mammalian DNA (without seeing at the sequence). The analytical profile of human DNA showed that the vertebrate genome is a mosaic of isochores, typically megabase-size DNA segments that belong in a small number of families characterized by different GC levels. The recent availability of a number of fully sequenced genomes allowed mapping very precisely the isochores, based on DNA sequences. Since isochores are tightly linked to biological properties such as gene density, replication timing and recombination, the new level of detail provided by the isochore map helped the understanding of genome structure, function and evolution. This led the current level of knowledge and to further insights. Copyright © 2015. Published by Elsevier B.V.

  8. Murine homeobox-containing gene, Msx-1: analysis of genomic organization, promoter structure, and potential autoregulatory cis-acting elements.

    Science.gov (United States)

    Kuzuoka, M; Takahashi, T; Guron, C; Raghow, R

    1994-05-01

    Detailed molecular organization of the coding and upstream regulatory regions of the murine homeodomain-containing gene, Msx-1, is reported. The protein-encoding portion of the gene is contained in two exons, 590 and 1214 bp in length, separated by a 2107-bp intron; the homeodomain is located in the second exon. The two-exon organization of the murine Msx-1 gene resembles a number of other homeodomain-containing genes. The 5'-(GTAAGT) and 3'-(CCCTAG) splicing junctions and the mRNA polyadenylation signal (UAUAA) of the murine Msx-1 gene are also characteristic of other vertebrate genes. By nuclease protection and primer extension assays, the start of transcription of the Msx-1 gene was located 256 bp upstream of the first AUG. Computer analysis of the promoter proximal 1280-bp sequence revealed a number of potentially important cis-regulatory sequences; these include the recognition elements for Ap-1, Ap-2, Ap-3, Sp-1, a possible binding site for RAR:RXR, and a number of TCF-1 consensus motifs. Importantly, a perfect reverse complement of (C/G)TTAATTG, which was recently shown to be an optimal binding sequence for the homeodomain of Msx-1 protein (K.M. Catron, N. Iler, and C. Abate (1993) Mol. Cell. Biol. 13:2354-2365), was also located in the murine Msx-1 promoter. Binding of bacterially expressed Msx-1 homeodomain polypeptide to Msx-1-specific oligonucleotide was experimentally demonstrated, raising a distinct possibility of autoregulation of this developmentally regulated gene.

  9. Genomic analysis of Xenopus organizer function

    Directory of Open Access Journals (Sweden)

    Suhai Sándor

    2006-06-01

    Full Text Available Abstract Background Studies of the Xenopus organizer have laid the foundation for our understanding of the conserved signaling pathways that pattern vertebrate embryos during gastrulation. The two primary activities of the organizer, BMP and Wnt inhibition, can regulate a spectrum of genes that pattern essentially all aspects of the embryo during gastrulation. As our knowledge of organizer signaling grows, it is imperative that we begin knitting together our gene-level knowledge into genome-level signaling models. The goal of this paper was to identify complete lists of genes regulated by different aspects of organizer signaling, thereby providing a deeper understanding of the genomic mechanisms that underlie these complex and fundamental signaling events. Results To this end, we ectopically overexpress Noggin and Dkk-1, inhibitors of the BMP and Wnt pathways, respectively, within ventral tissues. After isolating embryonic ventral halves at early and late gastrulation, we analyze the transcriptional response to these molecules within the generated ectopic organizers using oligonucleotide microarrays. An efficient statistical analysis scheme, combined with a new Gene Ontology biological process annotation of the Xenopus genome, allows reliable and faithful clustering of molecules based upon their roles during gastrulation. From this data, we identify new organizer-related expression patterns for 19 genes. Moreover, our data sub-divides organizer genes into separate head and trunk organizing groups, which each show distinct responses to Noggin and Dkk-1 activity during gastrulation. Conclusion Our data provides a genomic view of the cohorts of genes that respond to Noggin and Dkk-1 activity, allowing us to separate the role of each in organizer function. These patterns demonstrate a model where BMP inhibition plays a largely inductive role during early developmental stages, thereby initiating the suites of genes needed to pattern dorsal tissues

  10. Genome organization, instabilities, stem cells, and cancer

    Directory of Open Access Journals (Sweden)

    Senthil Kumar Pazhanisamy

    2009-01-01

    Full Text Available It is now widely recognized that advances in exploring genome organization provide remarkable insights on the induction and progression of chromosome abnormalities. Much of what we know about how mutations evolve and consequently transform into genome instabilities has been characterized in the spatial organization context of chromatin. Nevertheless, many underlying concepts of impact of the chromatin organization on perpetuation of multiple mutations and on propagation of chromosomal aberrations remain to be investigated in detail. Genesis of genome instabilities from accumulation of multiple mutations that drive tumorigenesis is increasingly becoming a focal theme in cancer studies. This review focuses on structural alterations evolve to raise a variety of genome instabilities that are manifested at the nucleotide, gene or sub-chromosomal, and whole chromosome level of genome. Here we explore an underlying connection between genome instability and cancer in the light of genome architecture. This review is limited to studies directed towards spatial organizational aspects of origin and propagation of aberrations into genetically unstable tumors.

  11. Genome position and gene amplification

    Czech Academy of Sciences Publication Activity Database

    Jirsová, Pavla; Snijders, A.M.; Kwek, S.; Roydasgupta, R.; Fridlyand, J.; Tokuyasu, T.; Pinkel, D.; Albertson, D. G.

    2007-01-01

    Roč. 8, č. 6 (2007), r120 ISSN 1474-760X Institutional research plan: CEZ:AV0Z50040507; CEZ:AV0Z50040702 Keywords : gene amplification * array comparative genomic hybridization * oncogene Subject RIV: BO - Biophysics Impact factor: 6.589, year: 2007

  12. Genome-wide analysis of the potato Hsp20 gene family: identification, genomic organization and expression profiles in response to heat stress.

    Science.gov (United States)

    Zhao, Peng; Wang, Dongdong; Wang, Ruoqiu; Kong, Nana; Zhang, Chao; Yang, Chenghui; Wu, Wentao; Ma, Haoli; Chen, Qin

    2018-01-18

    Heat shock proteins (Hsps) are essential components in plant tolerance mechanism under various abiotic stresses. Hsp20 is the major family of heat shock proteins, but little of Hsp20 family is known in potato (Solanum tuberosum), which is an important vegetable crop that is thermosensitive. To reveal the mechanisms of potato Hsp20s coping with abiotic stresses, analyses of the potato Hsp20 gene family were conducted using bioinformatics-based methods. In total, 48 putative potato Hsp20 genes (StHsp20s) were identified and named according to their chromosomal locations. A sequence analysis revealed that most StHsp20 genes (89.6%) possessed no, or only one, intron. A phylogenetic analysis indicated that all of the StHsp20 genes, except 10, were grouped into 12 subfamilies. The 48 StHsp20 genes were randomly distributed on 12 chromosomes. Nineteen tandem duplicated StHsp20s and one pair of segmental duplicated genes (StHsp20-15 and StHsp20-48) were identified. A cis-element analysis inferred that StHsp20s, except for StHsp20-41, possessed at least one stress response cis-element. A heatmap of the StHsp20 gene family showed that the genes, except for StHsp20-2 and StHsp20-45, were expressed in various tissues and organs. Real-time quantitative PCR was used to detect the expression level of StHsp20 genes and demonstrated that the genes responded to multiple abiotic stresses, such as heat, salt or drought stress. The relative expression levels of 14 StHsp20 genes (StHsp20-4, 6, 7, 9, 20, 21, 33, 34, 35, 37, 41, 43, 44 and 46) were significantly up-regulated (more than 100-fold) under heat stress. These results provide valuable information for clarifying the evolutionary relationship of the StHsp20 family and in aiding functional characterization of StHsp20 genes in further research.

  13. Comparative genomics of the relationship between gene structure and expression

    NARCIS (Netherlands)

    Ren, X.

    2006-01-01

    The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene

  14. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  15. Two alternatively spliced GPR39 transcripts in seabream: molecular cloning, genomic organization, and regulation of gene expression by metabolic signals.

    Science.gov (United States)

    Zhang, Yong; Liu, Yun; Huang, Xigui; Liu, Xiaochun; Jiao, Baowei; Meng, Zining; Zhu, Pei; Li, Shuisheng; Lin, Haoran; Cheng, Christopher H K

    2008-12-01

    Two GPR39 transcripts, designated as sbGPR39-1a and sbGPR39-1b, were identified in black seabream (Acanthopagrus schlegeli). The deduced amino acid (aa) sequence of sbGPR39-1a contains 423 residues with seven putative transmembrane (TM) domains. On the other hand, sbGPR39-1b contains 284 aa residues with only five putative TM domains. Northern blot analysis confirmed the presence of two GPR39 transcripts in the seabream intestine, stomach, and liver. Apart from seabream, the presence of two GPR39 transcripts was also found to exist in a number of teleosts (zebrafish and pufferfish) and mammals (human and mouse). Analysis of the GPR39 gene structure in different species suggests that the two GPR39 transcripts are generated by alternative splicing. When the seabream receptors were expressed in cultured HEK293 cells, Zn(2)(+) could trigger sbGPR39-1a signaling through the serum response element pathway, but no such functionality could be detected for the sbGPR39-1b receptor. The two receptors were found to be differentially expressed in seabream tissues. sbGPR39-1a is predominantly expressed in the gastrointestinal tract. On the other hand, sbGPR39-1b is widely expressed in most central and peripheral tissues except muscle and ovary. The expression of sbGPR39-1a in the intestine and the expression of sbGPR39-1b in the hypothalamus were decreased significantly during food deprivation in seabream. On the contrary, the expression of the GH secretagogue receptors (sbGHSR-1a and sbGHSR-1b) was significantly increased in the hypothalamus of the food-deprived seabream. The reciprocal regulatory patterns of expression of these two genes suggest that both of them are involved in controlling the physiological response of the organism during starvation.

  16. Uses of antimicrobial genes from microbial genome

    Science.gov (United States)

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  17. Genomic organization of plant aminopropyl transferases.

    Science.gov (United States)

    Rodríguez-Kessler, Margarita; Delgado-Sánchez, Pablo; Rodríguez-Kessler, Gabriela Theresia; Moriguchi, Takaya; Jiménez-Bremont, Juan Francisco

    2010-07-01

    Aminopropyl transferases like spermidine synthase (SPDS; EC 2.5.1.16), spermine synthase and thermospermine synthase (SPMS, tSPMS; EC 2.5.1.22) belong to a class of widely distributed enzymes that use decarboxylated S-adenosylmethionine as an aminopropyl donor and putrescine or spermidine as an amino acceptor to form in that order spermidine, spermine or thermospermine. We describe the analysis of plant genomic sequences encoding SPDS, SPMS, tSPMS and PMT (putrescine N-methyltransferase; EC 2.1.1.53). Genome organization (including exon size, gain and loss, as well as intron number, size, loss, retention, placement and phase, and the presence of transposons) of plant aminopropyl transferase genes were compared between the genomic sequences of SPDS, SPMS and tSPMS from Zea mays, Oryza sativa, Malus x domestica, Populus trichocarpa, Arabidopsis thaliana and Physcomitrella patens. In addition, the genomic organization of plant PMT genes, proposed to be derived from SPDS during the evolution of alkaloid metabolism, is illustrated. Herein, a particular conservation and arrangement of exon and intron sequences between plant SPDS, SPMS and PMT genes that clearly differs with that of ACL5 genes, is shown. The possible acquisition of the plant SPMS exon II and, in particular exon XI in the monocot SPMS genes, is a remarkable feature that allows their differentiation from SPDS genes. In accordance with our in silico analysis, functional complementation experiments of the maize ZmSPMS1 enzyme (previously considered to be SPDS) in yeast demonstrated its spermine synthase activity. Another significant aspect is the conservation of intron sequences among SPDS and PMT paralogs. In addition the existence of microsynteny among some SPDS paralogs, especially in P. trichocarpa and A. thaliana, supports duplication events of plant SPDS genes. Based in our analysis, we hypothesize that SPMS genes appeared with the divergence of vascular plants by a processes of gene duplication and the

  18. The genomic organization of plant pathogenicity in Fusarium species

    NARCIS (Netherlands)

    Rep, M.; Kistler, H.C.

    2010-01-01

    Comparative genomics is a powerful tool to infer the molecular basis of fungal pathogenicity and its evolution by identifying differences in gene content and genomic organization between fungi with different hosts or modes of infection. Through comparative analysis, pathogenicity-related chromosomes

  19. Genomic organization and evolution of the Atlantic salmon hemoglobin repertoire

    Directory of Open Access Journals (Sweden)

    Phillips Ruth B

    2010-10-01

    Full Text Available Abstract Background The genomes of salmonids are considered pseudo-tetraploid undergoing reversion to a stable diploid state. Given the genome duplication and extensive biological data available for salmonids, they are excellent model organisms for studying comparative genomics, evolutionary processes, fates of duplicated genes and the genetic and physiological processes associated with complex behavioral phenotypes. The evolution of the tetrapod hemoglobin genes is well studied; however, little is known about the genomic organization and evolution of teleost hemoglobin genes, particularly those of salmonids. The Atlantic salmon serves as a representative salmonid species for genomics studies. Given the well documented role of hemoglobin in adaptation to varied environmental conditions as well as its use as a model protein for evolutionary analyses, an understanding of the genomic structure and organization of the Atlantic salmon α and β hemoglobin genes is of great interest. Results We identified four bacterial artificial chromosomes (BACs comprising two hemoglobin gene clusters spanning the entire α and β hemoglobin gene repertoire of the Atlantic salmon genome. Their chromosomal locations were established using fluorescence in situ hybridization (FISH analysis and linkage mapping, demonstrating that the two clusters are located on separate chromosomes. The BACs were sequenced and assembled into scaffolds, which were annotated for putatively functional and pseudogenized hemoglobin-like genes. This revealed that the tail-to-tail organization and alternating pattern of the α and β hemoglobin genes are well conserved in both clusters, as well as that the Atlantic salmon genome houses substantially more hemoglobin genes, including non-Bohr β globin genes, than the genomes of other teleosts that have been sequenced. Conclusions We suggest that the most parsimonious evolutionary path leading to the present organization of the Atlantic salmon

  20. Genomic organization and evolution of the Atlantic salmon hemoglobin repertoire

    Science.gov (United States)

    2010-01-01

    Background The genomes of salmonids are considered pseudo-tetraploid undergoing reversion to a stable diploid state. Given the genome duplication and extensive biological data available for salmonids, they are excellent model organisms for studying comparative genomics, evolutionary processes, fates of duplicated genes and the genetic and physiological processes associated with complex behavioral phenotypes. The evolution of the tetrapod hemoglobin genes is well studied; however, little is known about the genomic organization and evolution of teleost hemoglobin genes, particularly those of salmonids. The Atlantic salmon serves as a representative salmonid species for genomics studies. Given the well documented role of hemoglobin in adaptation to varied environmental conditions as well as its use as a model protein for evolutionary analyses, an understanding of the genomic structure and organization of the Atlantic salmon α and β hemoglobin genes is of great interest. Results We identified four bacterial artificial chromosomes (BACs) comprising two hemoglobin gene clusters spanning the entire α and β hemoglobin gene repertoire of the Atlantic salmon genome. Their chromosomal locations were established using fluorescence in situ hybridization (FISH) analysis and linkage mapping, demonstrating that the two clusters are located on separate chromosomes. The BACs were sequenced and assembled into scaffolds, which were annotated for putatively functional and pseudogenized hemoglobin-like genes. This revealed that the tail-to-tail organization and alternating pattern of the α and β hemoglobin genes are well conserved in both clusters, as well as that the Atlantic salmon genome houses substantially more hemoglobin genes, including non-Bohr β globin genes, than the genomes of other teleosts that have been sequenced. Conclusions We suggest that the most parsimonious evolutionary path leading to the present organization of the Atlantic salmon hemoglobin genes involves

  1. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  2. The genome of Paenibacillus sabinae T27 provides insight into evolution, organization and functional elucidation of nif and nif-like genes

    OpenAIRE

    Li, Xinxin; Deng, Zhiping; Liu, Zhanzhi; Yan, Yongliang; Wang, Tianshu; Xie, Jianbo; Lin, Min; Cheng, Qi; Chen, Sanfeng

    2014-01-01

    Background Most biological nitrogen fixation is catalyzed by the molybdenum nitrogenase. This enzyme is a complex which contains the MoFe protein encoded by nifDK and the Fe protein encoded by nifH. In addition to nifHDK, nifHDK-like genes were found in some Archaea and Firmicutes, but their function is unclear. Results We sequenced the genome of Paenibacillus sabinae T27. A total of 4,793 open reading frames were predicted from its 5.27 Mb genome. The genome of P. sabinae T27 contains fiftee...

  3. Gene conversion in the rice genome

    DEFF Research Database (Denmark)

    Xu, Shuqing; Clark, Terry; Zheng, Hongkun

    2008-01-01

    -chromosomal conversions distributed between chromosome 1 and 5, 2 and 6, and 3 and 5 are more frequent than genome average (Z-test, P ... is not tightly linked to natural selection in the rice genome. To assess the contribution of segmental duplication on gene conversion statistics, we determined locations of conversion partners with respect to inter-chromosomal segment duplication. The number of conversions associated with segmentation is less...... involved in conversion events. CONCLUSION: The evolution of gene families in the rice genome may have been accelerated by conversion with pseudogenes. Our analysis suggests a possible role for gene conversion in the evolution of pathogen-response genes....

  4. Genomic organization and identification of promoter regions for the BDNF gene in the pond turtle Trachemys scripta elegans.

    Science.gov (United States)

    Ambigapathy, Ganesh; Zheng, Zhaoqing; Keifer, Joyce

    2014-08-01

    Brain-derived neurotrophic factor (BDNF) is an important regulator of neuronal development and synaptic function. The BDNF gene undergoes significant activity-dependent regulation during learning. Here, we identified the BDNF promoter regions, transcription start sites, and potential regulatory sequences for BDNF exons I-III that may contribute to activity-dependent gene and protein expression in the pond turtle Trachemys scripta elegans (tBDNF). By using transfection of BDNF promoter/luciferase plasmid constructs into human neuroblastoma SHSY5Y cells and mouse embryonic fibroblast NIH3T3 cells, we identified the basal regulatory activity of promoter sequences located upstream of each tBDNF exon, designated as pBDNFI-III. Further, through chromatin immunoprecipitation (ChIP) assays, we detected CREB binding directly to exon I and exon III promoters, while BHLHB2, but not CREB, binds within the exon II promoter. Elucidation of the promoter regions and regulatory protein binding sites in the tBDNF gene is essential for understanding the regulatory mechanisms that control tBDNF gene expression.

  5. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    Science.gov (United States)

    Matus, José Tomás; Aquea, Felipe; Arce-Johnson, Patricio

    2008-01-01

    Background The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. Conclusion This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggests that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions. PMID:18647406

  6. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    Directory of Open Access Journals (Sweden)

    Arce-Johnson Patricio

    2008-07-01

    Full Text Available Abstract Background The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. Conclusion This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggests that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions.

  7. Genome-Wide Comparative Gene Family Classification

    Science.gov (United States)

    Frech, Christian; Chen, Nansheng

    2010-01-01

    Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221

  8. Expression and genomic organization of zonadhesin-like genes in three species of fish give insight into the evolutionary history of a mosaic protein

    Directory of Open Access Journals (Sweden)

    Davidson William S

    2005-11-01

    Full Text Available Abstract Background The mosaic sperm protein zonadhesin (ZAN has been characterized in mammals and is implicated in species-specific egg-sperm binding interactions. The genomic structure and testes-specific expression of zonadhesin is known for many mammalian species. All zonadhesin genes characterized to date consist of meprin A5 antigen receptor tyrosine phosphatase mu (MAM domains, mucin tandem repeats, and von Willebrand (VWD adhesion domains. Here we investigate the genomic structure and expression of zonadhesin-like genes in three species of fish. Results The cDNA and corresponding genomic locus of a zonadhesin-like gene (zlg in Atlantic salmon (Salmo salar were sequenced. Zlg is similar in adhesion domain content to mammalian zonadhesin; however, the domain order is altered. Analysis of puffer fish (Takifugu rubripes and zebrafish (Danio rerio sequence data identified zonadhesin (zan genes that share the same domain order, content, and a conserved syntenic relationship with mammalian zonadhesin. A zonadhesin-like gene in D. rerio was also identified. Unlike mammalian zonadhesin, D. rerio zan and S. salar zlg were expressed in the gut and not in the testes. Conclusion We characterized likely orthologs of zonadhesin in both T. rubripes and D. rerio and uncovered zonadhesin-like genes in S. salar and D. rerio. Each of these genes contains MAM, mucin, and VWD domains. While these domains are associated with several proteins that show prominent gut expression, their combination is unique to zonadhesin and zonadhesin-like genes in vertebrates. The expression patterns of fish zonadhesin and zonadhesin-like genes suggest that the reproductive role of zonadhesin evolved later in the mammalian lineage.

  9. Simultaneous gene finding in multiple genomes.

    Science.gov (United States)

    König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario

    2016-11-15

    As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or-if not-where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C ++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/ CONTACT: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.deSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Comparative Genomic Analysis of Soybean Flowering Genes

    Science.gov (United States)

    Jung, Chol-Hee; Wong, Chui E.; Singh, Mohan B.; Bhalla, Prem L.

    2012-01-01

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant, Arabidopsis. PMID:22679494

  11. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    ://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence......The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced...... genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including...

  12. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) progr...... is freely available on a web server at http://fgf.genomics.org.cn/...

  13. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...... in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....

  14. Genome Binding and Gene Regulation by Stem Cell Transcription Factors

    NARCIS (Netherlands)

    J.H. Brandsma (Johan)

    2016-01-01

    markdownabstractNearly all cells of an individual organism contain the same genome. However, each cell type transcribes a different set of genes due to the presence of different sets of cell type-specific transcription factors. Such transcription factors bind to regulatory regions such as promoters

  15. Expression of a transferred nuclear gene in a mitochondrial genome

    Directory of Open Access Journals (Sweden)

    Yichun Qiu

    2014-08-01

    Full Text Available Transfer of mitochondrial genes to the nucleus, and subsequent gain of regulatory elements for expression, is an ongoing evolutionary process in plants. Many examples have been characterized, which in some cases have revealed sources of mitochondrial targeting sequences and cis-regulatory elements. In contrast, there have been no reports of a nuclear gene that has undergone intracellular transfer to the mitochondrial genome and become expressed. Here we show that the orf164 gene in the mitochondrial genome of several Brassicaceae species, including Arabidopsis, is derived from the nuclear ARF17 gene that codes for an auxin responsive protein and is present across flowering plants. Orf164 corresponds to a portion of ARF17, and the nucleotide and amino acid sequences are 79% and 81% identical, respectively. Orf164 is transcribed in several organ types of Arabidopsis thaliana, as detected by RT-PCR. In addition, orf164 is transcribed in five other Brassicaceae within the tribes Camelineae, Erysimeae and Cardamineae, but the gene is not present in Brassica or Raphanus. This study shows that nuclear genes can be transferred to the mitochondrial genome and become expressed, providing a new perspective on the movement of genes between the genomes of subcellular compartments.

  16. Biology, genome organization and evolution of parvoviruses in marine shrimp

    Science.gov (United States)

    A number of parvoviruses are now know to infect marine shrimp, and these viruses alone or in combination with other viruses have the potential to cause major losses in shrimp aquaculture globally. This review provides a comprehensive overview of the biology, genome organization, gene expression, and...

  17. Structural Genomics of Minimal Organisms: Pipeline and Results

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain a near complete three-dimensional (3D) structural information of all soluble proteins of two minimal organisms, closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up from target selection, cloning, expression, purification, and ultimately structural determination. At the time of this writing, structural information of more than 93percent of all soluble proteins of M. genitalium is avail able. This chapter summarizes the approaches taken by the authors' center.

  18. Tandemly Arrayed Genes in Vertebrate Genomes

    Directory of Open Access Journals (Sweden)

    Deng Pan

    2008-01-01

    Full Text Available Tandemly arrayed genes (TAGs are duplicated genes that are linked as neighbors on a chromosome, many of which have important physiological and biochemical functions. Here we performed a survey of these genes in 11 available vertebrate genomes. TAGs account for an average of about 14% of all genes in these vertebrate genomes, and about 25% of all duplications. The majority of TAGs (72–94% have parallel transcription orientation (i.e., they are encoded on the same strand in contrast to the genome, which has about 50% of its genes in parallel transcription orientation. The majority of tandem arrays have only two members. In all species, the proportion of genes that belong to TAGs tends to be higher in large gene families than in small ones; together with our recent finding that tandem duplication played a more important role than retroposition in large families, this fact suggests that among all types of duplication mechanisms, tandem duplication is the predominant mechanism of duplication, especially in large families. Finally, several species have a higher proportion of large tandem arrays that are species-specific than random expectation.

  19. Regulation of methane genes and genome expression

    Energy Technology Data Exchange (ETDEWEB)

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaeabacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, that encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determine which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  20. Gene Transfers Between Distantly Related Organisms

    Science.gov (United States)

    Doolittle, Russell F.

    2003-01-01

    With the completion of numerous microbial genome sequences, reports of individual gene transfers between distantly related prokaryotes have become commonplace. On the other hand, transfers between prokaryotes and eukaryotes still excite the imagination. Many of these claims may be premature, but some are certainly valid. In this chapter, the kinds of supporting data needed to propose transfers between distantly related organisms and cite some interesting examples are considered.

  1. Mycoplasma hyopneumoniae Transcription Unit Organization: Genome Survey and Prediction

    Science.gov (United States)

    Siqueira, Franciele Maboni; Schrank, Augusto; Schrank, Irene Silveira

    2011-01-01

    Mycoplasma hyopneumoniae is associated with swine respiratory diseases. Although gene organization and regulation are well known in many prokaryotic organisms, knowledge on mycoplasma is limited. This study performed a comparative analysis of three strains of M. hyopneumoniae (7448, J and 232), with a focus on genome organization and gene comparison for open read frame (ORF) cluster (OC) identification. An in silico analysis of gene organization demonstrated 117 OCs and 34 single ORFs in M. hyopneumoniae 7448 and J, while 116 OCs and 36 single ORFs were identified in M. hyopneumoniae 232. Genomic comparison revealed high synteny and conservation of gene order between the OCs defined for 7448 and J strains as well as for 7448 and 232 strains. Twenty-one OCs were chosen and experimentally confirmed by reverse transcription–PCR from M. hyopneumoniae 7448 genome, validating our prediction. A subset of the ORFs within an OC could be independently transcribed due to the presence of internal promoters. Our results suggest that transcription occurs in ‘run-on’ from an upstream promoter in M. hyopneumoniae, thus forming large ORF clusters (from 2 to 29 ORFs in the same orientation) and indicating a complex transcriptional organization. PMID:22086999

  2. Genome-wide identification of KANADI1 target genes.

    Directory of Open Access Journals (Sweden)

    Paz Merelo

    Full Text Available Plant organ development and polarity establishment is mediated by the action of several transcription factors. Among these, the KANADI (KAN subclade of the GARP protein family plays important roles in polarity-associated processes during embryo, shoot and root patterning. In this study, we have identified a set of potential direct target genes of KAN1 through a combination of chromatin immunoprecipitation/DNA sequencing (ChIP-Seq and genome-wide transcriptional profiling using tiling arrays. Target genes are over-represented for genes involved in the regulation of organ development as well as in the response to auxin. KAN1 affects directly the expression of several genes previously shown to be important in the establishment of polarity during lateral organ and vascular tissue development. We also show that KAN1 controls through its target genes auxin effects on organ development at different levels: transport and its regulation, and signaling. In addition, KAN1 regulates genes involved in the response to abscisic acid, jasmonic acid, brassinosteroids, ethylene, cytokinins and gibberellins. The role of KAN1 in organ polarity is antagonized by HD-ZIPIII transcription factors, including REVOLUTA (REV. A comparison of their target genes reveals that the REV/KAN1 module acts in organ patterning through opposite regulation of shared targets. Evidence of mutual repression between closely related family members is also shown.

  3. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure.

    Science.gov (United States)

    Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P

    2017-12-19

    While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.

  4. The footprint of metabolism in the organization of mammalian genomes

    Directory of Open Access Journals (Sweden)

    Berná Luisa

    2012-05-01

    Full Text Available Abstract Background At present five evolutionary hypotheses have been proposed to explain the great variability of the genomic GC content among and within genomes: the mutational bias, the biased gene conversion, the DNA breakpoints distribution, the thermal stability and the metabolic rate. Several studies carried out on bacteria and teleostean fish pointed towards the critical role played by the environment on the metabolic rate in shaping the base composition of genomes. In mammals the debate is still open, and evidences have been produced in favor of each evolutionary hypothesis. Human genes were assigned to three large functional categories (as well as to the corresponding functional classes according to the KOG database: (i information storage and processing, (ii cellular processes and signaling, and (iii metabolism. The classification was extended to the organisms so far analyzed performing a reciprocal Blastp and selecting the best reciprocal hit. The base composition was calculated for each sequence of the whole CDS dataset. Results The GC3 level of the above functional categories was increasing from (i to (iii. This specific compositional pattern was found, as footprint, in all mammalian genomes, but not in frog and lizard ones. Comparative analysis of human versus both frog and lizard functional categories showed that genes involved in the metabolic processes underwent the highest GC3 increment. Analyzing the KOG functional classes of genes, again a well defined intra-genomic pattern was found in all mammals. Not only genes of metabolic pathways, but also genes involved in chromatin structure and dynamics, transcription, signal transduction mechanisms and cytoskeleton, showed an average GC3 level higher than that of the whole genome. In the case of the human genome, the genes of the aforementioned functional categories showed a high probability to be associated with the chromosomal bands. Conclusions In the light of different

  5. Genome Organization Drives Chromosome Fragility.

    Science.gov (United States)

    Canela, Andres; Maman, Yaakov; Jung, Seolkyoung; Wong, Nancy; Callen, Elsa; Day, Amanda; Kieffer-Kwon, Kyong-Rim; Pekowska, Aleksandra; Zhang, Hongliang; Rao, Suhas S P; Huang, Su-Chen; Mckinnon, Peter J; Aplan, Peter D; Pommier, Yves; Aiden, Erez Lieberman; Casellas, Rafael; Nussenzweig, André

    2017-07-27

    In this study, we show that evolutionarily conserved chromosome loop anchors bound by CCCTC-binding factor (CTCF) and cohesin are vulnerable to DNA double strand breaks (DSBs) mediated by topoisomerase 2B (TOP2B). Polymorphisms in the genome that redistribute CTCF/cohesin occupancy rewire DNA cleavage sites to novel loop anchors. While transcription- and replication-coupled genomic rearrangements have been well documented, we demonstrate that DSBs formed at loop anchors are largely transcription-, replication-, and cell-type-independent. DSBs are continuously formed throughout interphase, are enriched on both sides of strong topological domain borders, and frequently occur at breakpoint clusters commonly translocated in cancer. Thus, loop anchors serve as fragile sites that generate DSBs and chromosomal rearrangements. VIDEO ABSTRACT. Published by Elsevier Inc.

  6. Hsf and Hsp gene families in Populus: genome-wide identification, organization and correlated expression during development and in stress responses.

    Science.gov (United States)

    Zhang, Jin; Liu, Bobin; Li, Jianbo; Zhang, Li; Wang, Yan; Zheng, Huanquan; Lu, Mengzhu; Chen, Jun

    2015-03-14

    Heat shock proteins (Hsps) are molecular chaperones that are involved in many normal cellular processes and stress responses, and heat shock factors (Hsfs) are the transcriptional activators of Hsps. Hsfs and Hsps are widely coordinated in various biological processes. Although the roles of Hsfs and Hsps in stress responses have been well characterized in Arabidopsis, their roles in perennial woody species undergoing various environmental stresses remain unclear. Here, a comprehensive identification and analysis of Hsf and Hsp families in poplars is presented. In Populus trichocarpa, we identified 42 paralogous pairs, 66.7% resulting from a whole genome duplication. The gene structure and motif composition are relatively conserved in each subfamily. Microarray and quantitative real-time RT-PCR analyses showed that most of the Populus Hsf and Hsp genes are differentially expressed upon exposure to various stresses. A coexpression network between Populus Hsf and Hsp genes was generated based on their expression. Coordinated relationships were validated by transient overexpression and subsequent qPCR analyses. The comprehensive analysis indicates that different sets of PtHsps are downstream of particular PtHsfs and provides a basis for functional studies aimed at revealing the roles of these families in poplar development and stress responses.

  7. The chromosomal organization of horizontal gene transfer in bacteria.

    Science.gov (United States)

    Oliveira, Pedro H; Touchon, Marie; Cury, Jean; Rocha, Eduardo P C

    2017-10-10

    Bacterial adaptation is accelerated by the acquisition of novel traits through horizontal gene transfer, but the integration of these genes affects genome organization. We found that transferred genes are concentrated in only ~1% of the chromosomal regions (hotspots) in 80 bacterial species. This concentration increases with genome size and with the rate of transfer. Hotspots diversify by rapid gene turnover; their chromosomal distribution depends on local contexts (neighboring core genes), and content in mobile genetic elements. Hotspots concentrate most changes in gene repertoires, reduce the trade-off between genome diversification and organization, and should be treasure troves of strain-specific adaptive genes. Most mobile genetic elements and antibiotic resistance genes are in hotspots, but many hotspots lack recognizable mobile genetic elements and exhibit frequent homologous recombination at flanking core genes. Overrepresentation of hotspots with fewer mobile genetic elements in naturally transformable bacteria suggests that homologous recombination and horizontal gene transfer are tightly linked in genome evolution.Horizontal gene transfer (HGT) is an important mechanism for genome evolution and adaptation in bacteria. Here, Oliveira and colleagues find HGT hotspots comprising  ~ 1% of the chromosomal regions in 80 bacterial species.

  8. On Computing Breakpoint Distances for Genomes with Duplicate Genes.

    Science.gov (United States)

    Shao, Mingfu; Moret, Bernard M E

    2017-06-01

    A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of its higher level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although we provided last year a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignment, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the "any matching" formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.

  9. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Full Text Available Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we finally revealed the conserved essential genes set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacteria survival especially over long period natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

  10. Identification of candidate new cancer susceptibility genes using yeast genomics

    International Nuclear Information System (INIS)

    Brown, M.; Brown, J.A.; Game, J.C.

    2003-01-01

    A large proportion of cancer susceptibility syndromes are the result of mutations in genes in DNA repair or in cell-cycle checkpoints in response to DNA damage, such as ataxia telangiectasia (AT), Fanconi's anemia (FA), Bloom's syndrome (BS), Nijmegen breakage syndrome (NBS), and xeroderma pigmentosum (XP). Mutations in these genes often cause gross chromosomal instability leading to an increased mutation rate of all genes including those directly responsible for cancer. We have proposed that because the orthologs of these genes in budding yeast, S. cerevisiae, confer protection against killing by DNA damaging agents it should be possible to identify new cancer susceptibility genes by identifying yeast genes whose deletion causes sensitivity to DNA damage. We therefore screened the recently completed collection of individual gene deletion mutants to identify genes that affect sensitivity to DNA-damaging agents. Screening for sensitivity in this obtained up to now with the F98 glioma model othe fact that each deleted gene is replaced by a cassette containing two molecular 'barcodes', or 20-mers, that uniquely identify the strain when DNA from a pool of strains is hybridized to an oligonucleotide array containing the complementary sequences of the barcodes. We performed the screen with UV, IR, H 2 0 2 and other DNA damaging agents. In addition to identifying genes already known to confer resistance to DNA damaging agents we have identified, and individually confirmed, several genes not previously associated with resistance. Several of these are of unknown function. We have also examined the chromosomal stability of selected strains and found that IR sensitive strains often but not always exhibit genomic instability. We are presently constructing a yeast artificial chromosome to globally interrogate all the genes in the deletion pool for their involvement in genomic stability. This work shows that budding yeast is a valuable eukaryotic model organism to identify

  11. Molecular cloning, genomic organization, chromosome mapping, tissues expression pattern and identification of a novel splicing variant of porcine CIDEb gene

    International Nuclear Information System (INIS)

    Li, YanHua; Li, AiHua; Yang, Z.Q.

    2016-01-01

    Cell death-inducing DNA fragmentation factor-α-like effector b (CIDEb) is a member of the CIDE family of apoptosis-inducing factors, CIDEa and CIDEc have been reported to be Lipid droplets (LDs)-associated proteins that promote atypical LD fusion in adipocytes, and responsible for liver steatosis under fasting and obese conditions, whereas CIDEb promotes lipid storage under normal diet conditions [1], and promotes the formation of triacylglyceride-enriched VLDL particles in hepatocytes [2]. Here, we report the gene cloning, chromosome mapping, tissue distribution, genetic expression analysis, and identification of a novel splicing variant of the porcine CIDEb gene. Sequence analysis shows that the open reading frame of the normal porcine CIDEb isoform covers 660bp and encodes a 219-amino acid polypeptide, whereas its alternative splicing variant encodes a 142-amino acid polypeptide truncated at the fourth exon and comprised of the CIDE-N domain and part of the CIDE-C domain. The deduced amino acid sequence of normal porcine CIDEb shows an 85.8% similarity to the human protein and 80.0% to the mouse protein. The CIDEb genomic sequence spans approximately 6KB comprised of five exons and four introns. Radiation hybrid mapping demonstrated that porcine CIDEb is located at chromosome 7q21 and at a distance of 57cR from the most significantly linked marker, S0334, regions that are syntenic with the corresponding region in the human genome. Tissue expression analysis indicated that normal CIDEb mRNA is ubiquitously expressed in many porcine tissues. It was highly expressed in white adipose tissue and was observed at relatively high levels in the liver, lung, small intestine, lymphatic tissue and brain. The normal version of CIDEb was the predominant form in all tested tissues, whereas the splicing variant was expressed at low levels in all examined tissues except the lymphatic tissue. Furthermore, genetic expression analysis indicated that CIDEb mRNA levels were

  12. Molecular cloning, genomic organization, chromosome mapping, tissues expression pattern and identification of a novel splicing variant of porcine CIDEb gene

    Energy Technology Data Exchange (ETDEWEB)

    Li, YanHua, E-mail: liyanhua.1982@aliyun.com [Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Translational Medical Research in Cognitive Development and Learning and Memory Disorders, China International Science and Technology Cooperation base of Child development and Critical Disorders, Children’s Hospital of Chongqing Medical University, Chongqing 400014 (China); Li, AiHua [Chongqing Cancer Institute & Hospital & Cancer Center, Chongqing 404100 (China); Yang, Z.Q. [Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070 (China)

    2016-09-09

    Cell death-inducing DNA fragmentation factor-α-like effector b (CIDEb) is a member of the CIDE family of apoptosis-inducing factors, CIDEa and CIDEc have been reported to be Lipid droplets (LDs)-associated proteins that promote atypical LD fusion in adipocytes, and responsible for liver steatosis under fasting and obese conditions, whereas CIDEb promotes lipid storage under normal diet conditions [1], and promotes the formation of triacylglyceride-enriched VLDL particles in hepatocytes [2]. Here, we report the gene cloning, chromosome mapping, tissue distribution, genetic expression analysis, and identification of a novel splicing variant of the porcine CIDEb gene. Sequence analysis shows that the open reading frame of the normal porcine CIDEb isoform covers 660bp and encodes a 219-amino acid polypeptide, whereas its alternative splicing variant encodes a 142-amino acid polypeptide truncated at the fourth exon and comprised of the CIDE-N domain and part of the CIDE-C domain. The deduced amino acid sequence of normal porcine CIDEb shows an 85.8% similarity to the human protein and 80.0% to the mouse protein. The CIDEb genomic sequence spans approximately 6KB comprised of five exons and four introns. Radiation hybrid mapping demonstrated that porcine CIDEb is located at chromosome 7q21 and at a distance of 57cR from the most significantly linked marker, S0334, regions that are syntenic with the corresponding region in the human genome. Tissue expression analysis indicated that normal CIDEb mRNA is ubiquitously expressed in many porcine tissues. It was highly expressed in white adipose tissue and was observed at relatively high levels in the liver, lung, small intestine, lymphatic tissue and brain. The normal version of CIDEb was the predominant form in all tested tissues, whereas the splicing variant was expressed at low levels in all examined tissues except the lymphatic tissue. Furthermore, genetic expression analysis indicated that CIDEb mRNA levels were

  13. cDNA cloning, genomic organization and expression analysis during somatic embryogenesis of the translationally controlled tumor protein (TCTP) gene from Japanese larch (Larix leptolepis).

    Science.gov (United States)

    Zhang, Li-Feng; Li, Wan-Feng; Han, Su-Ying; Yang, Wen-Hua; Qi, Li-Wang

    2013-10-15

    A full-length cDNA and genomic sequences of a translationally controlled tumor protein (TCTP) gene were isolated from Japanese larch (Larix leptolepis) and designated LaTCTP. The length of the cDNA was 1, 043 bp and contained a 504 bp open reading frame that encodes a predicted protein of 167 amino acids, characterized by two signature sequences of the TCTP protein family. Analysis of the LaTCTP gene structure indicated four introns and five exons, and it is the largest of all currently known TCTP genes in plants. The 5'-flanking promoter region of LaTCTP was cloned using an improved TAIL-PCR technique. In this region we identified many important potential cis-acting elements, such as a Box-W1 (fungal elicitor responsive element), a CAT-box (cis-acting regulatory element related to meristem expression), a CGTCA-motif (cis-acting regulatory element involved in MeJA-responsiveness), a GT1-motif (light responsive element), a Skn-1-motif (cis-acting regulatory element required for endosperm expression) and a TGA-element (auxin-responsive element), suggesting that expression of LaTCTP is highly regulated. Expression analysis demonstrated ubiquitous localization of LaTCTP mRNA in the roots, stems and needles, high mRNA levels in the embryonal-suspensor mass (ESM), browning embryogenic cultures and mature somatic embryos, and low levels of mRNA at day five during somatic embryogenesis. We suggest that LaTCTP might participate in the regulation of somatic embryo development. These results provide a theoretical basis for understanding the molecular regulatory mechanism of LaTCTP and lay the foundation for artificial regulation of somatic embryogenesis. © 2013.

  14. Identification and genome organization of saponin pathway genes from a wild crucifer, and their use for transient production of saponins in Nicotiana benthamiana.

    Science.gov (United States)

    Khakimov, Bekzod; Kuzina, Vera; Erthmann, Pernille Ø; Fukushima, Ery Odette; Augustin, Jörg M; Olsen, Carl Erik; Scholtalbers, Jelle; Volpin, Hanne; Andersen, Sven Bode; Hauser, Thure P; Muranaka, Toshiya; Bak, Søren

    2015-11-01

    The ability to evolve novel metabolites has been instrumental for the defence of plants against antagonists. A few species in the Barbarea genus are the only crucifers known to produce saponins, some of which make plants resistant to specialist herbivores, like Plutella xylostella, the diamondback moth. Genetic mapping in Barbarea vulgaris revealed that genes for saponin biosynthesis are not clustered but are located in different linkage groups. Using co-location with quantitative trait loci (QTLs) for resistance, transcriptome and genome sequences, we identified two 2,3-oxidosqualene cyclases that form the major triterpenoid backbones. LUP2 mainly produces lupeol, and is preferentially expressed in insect-susceptible B. vulgaris plants, whereas LUP5 produces β-amyrin and α-amyrin, and is preferentially expressed in resistant plants; β-amyrin is the backbone for the resistance-conferring saponins in Barbarea. Two loci for cytochromes P450, predicted to add functional groups to the saponin backbone, were identified: CYP72As co-localized with insect resistance, whereas CYP716As did not. When B. vulgaris sapogenin biosynthesis genes were transiently expressed by CPMV-HT technology in Nicotiana benthamiana, high levels of hydroxylated and carboxylated triterpenoid structures accumulated, including oleanolic acid, which is a precursor of the major resistance-conferring saponins. When the B. vulgaris gene for sapogenin 3-O-glucosylation was co-expressed, the insect deterrent 3-O-oleanolic acid monoglucoside accumulated, as well as triterpene structures with up to six hexoses, demonstrating that N. benthamiana further decorates the monoglucosides. We argue that saponin biosynthesis in the Barbarea genus evolved by a neofunctionalized glucosyl transferase, whereas the difference between resistant and susceptible B. vulgaris chemotypes evolved by different expression of oxidosqualene cyclases (OSCs). © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons

  15. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    Directory of Open Access Journals (Sweden)

    Yunsheng Wang

    Full Text Available In this study, we identified and compared nucleotide-binding site (NBS domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China. Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes were shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  16. Whole genome DNA methylation: beyond genes silencing

    OpenAIRE

    Tirado-Magallanes, Roberto; Rebbani, Khadija; Lim, Ricky; Pradhan, Sriharsa; Benoukraf, Touati

    2016-01-01

    The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation at near base pair level resolution, far beyond that of the kilobase-long canonical CpG islands that initially revealed the biological relevance of this covalent DNA modification. The latest high-resolution studies have revealed a role for very punctual DNA methylation in chromatin plasticity, gene regulation and splicing. Here, we aim to outline the ...

  17. Gene Composer in a structural genomics environment

    International Nuclear Information System (INIS)

    Lorimer, Don; Raymond, Amy; Mixon, Mark; Burgin, Alex; Staker, Bart; Stewart, Lance

    2011-01-01

    For structural biology applications, protein-construct engineering is guided by comparative sequence analysis and structural information, which allow the researcher to better define domain boundaries for terminal deletions and nonconserved regions for surface mutants. A database software application called Gene Composer has been developed to facilitate construct design. The structural genomics effort at the Seattle Structural Genomics Center for Infectious Disease (SSGCID) requires the manipulation of large numbers of amino-acid sequences and the underlying DNA sequences which are to be cloned into expression vectors. To improve efficiency in high-throughput protein structure determination, a database software package, Gene Composer, has been developed which facilitates the information-rich design of protein constructs and their underlying gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. An example of the structure determination of H1N1 RNA-dependent RNA polymerase PB2 subunit is given

  18. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model, that model the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...

  19. Two duplicated chicken-type lysozyme genes in disc abalone Haliotis discus discus: molecular aspects in relevance to structure, genomic organization, mRNA expression and bacteriolytic function.

    Science.gov (United States)

    Umasuthan, Navaneethaiyer; Bathige, S D N K; Kasthuri, Saranya Revathy; Wan, Qiang; Whang, Ilson; Lee, Jehee

    2013-08-01

    Lysozymes are crucial antibacterial proteins that are associated with catalytic cleavage of peptidoglycan and subsequent bacteriolysis. The present study describes the identification of two lysozyme genes from disc abalone Haliotis discus discus and their characterization at sequence-, genomic-, transcriptional- and functional-levels. Two cDNAs and BAC clones bearing lysozyme genes were isolated from abalone transcriptome and BAC genomic libraries, respectively and sequences were determined. Corresponding deduced amino acid sequences harbored a chicken-type lysozyme (LysC) family profile and exhibited conserved characteristics of LysC family members including active residues (Glu and Asp) and GS(S/T)DYGIFQINS motif suggested that they are LysC counterparts in disc abalone and designated as abLysC1 and abLysC2. While abLysC1 represented the homolog recently reported in Ezo abalone [1], abLysC2 shared significant identity with LysC homologs. Unlike other vertebrate LysCs, coding sequence of abLysCs were distributed within five exons interrupted by four introns. Both abLysCs revealed a broader mRNA distribution with highest levels in mantle (abLysC1) and hepatopancreas (abLysC2) suggesting their likely main role in defense and digestion, respectively. Investigation of temporal transcriptional profiles post-LPS and -pathogen challenges revealed induced-responses of abLysCs in gills and hemocytes. The in vitro muramidase activity of purified recombinant (r) abLysCs proteins was evaluated, and findings indicated that they are active in acidic pH range (3.5-6.5) and over a broad temperature range (20-60 °C) and influenced by ionic strength. When the antibacterial spectra of (r)abLysCs were examined, they displayed differential activities against both Gram positive and Gram negative strains providing evidence for their involvement in bacteriolytic function in abalone physiology. Copyright © 2013 Elsevier Ltd. All rights reserved.

  20. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  1. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    Science.gov (United States)

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  2. Constraints on genome dynamics revealed from gene distribution among the Ralstonia solanacearum species.

    Directory of Open Access Journals (Sweden)

    Pierre Lefeuvre

    Full Text Available Because it is suspected that gene content may partly explain host adaptation and ecology of pathogenic bacteria, it is important to study factors affecting genome composition and its evolution. While recent genomic advances have revealed extremely large pan-genomes for some bacterial species, it remains difficult to predict to what extent gene pool is accessible within or transferable between populations. As genomes bear imprints of the history of the organisms, gene distribution pattern analyses should provide insights into the forces and factors at play in the shaping and maintaining of bacterial genomes. In this study, we revisited the data obtained from a previous CGH microarrays analysis in order to assess the genomic plasticity of the R. solanacearum species complex. Gene distribution analyses demonstrated the remarkably dispersed genome of R. solanacearum with more than half of the genes being accessory. From the reconstruction of the ancestral genomes compositions, we were able to infer the number of gene gain and loss events along the phylogeny. Analyses of gene movement patterns reveal that factors associated with gene function, genomic localization and ecology delineate gene flow patterns. While the chromosome displayed lower rates of movement, the megaplasmid was clearly associated with hot-spots of gene gain and loss. Gene function was also confirmed to be an essential factor in gene gain and loss dynamics with significant differences in movement patterns between different COG categories. Finally, analyses of gene distribution highlighted possible highways of horizontal gene transfer. Due to sampling and design bias, we can only speculate on factors at play in this gene movement dynamic. Further studies examining precise conditions that favor gene transfer would provide invaluable insights in the fate of bacteria, species delineation and the emergence of successful pathogens.

  3. Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome

    Science.gov (United States)

    Kim, Woonsu; Park, Hyesun; Seo, Seongwon

    2016-01-01

    The sequence of cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline of an organism for amalgamating genomic annotations from multiple sources was updated. Using this, an amalgamated cattle genome database based on UMD_3.1, was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related with specific hormone biosynthesis and detoxifications. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There has been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse implies evolutionary changes in the cattle genome and provides a useful information for further research on understanding metabolic adaptations of cattle. PMID

  4. CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2009-08-01

    Full Text Available Abstract Background Viruses and small-genome bacteria (~2 megabases and smaller comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools to analyze. "CoreGenesUniqueGenes" (CGUG is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms. Findings CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes. Conclusion CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.

  5. Genome Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio.

    Directory of Open Access Journals (Sweden)

    Chuanju Dong

    Full Text Available Aquaporins (Aqps are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species being adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, providing us the possibility for gene evolution of aqp gene family after whole genome duplication.In this study, we identified a total of 37 aqp genes from common carp genome. Phylogenetic analysis revealed that most of aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that the aqp gene family in other teleosts has been lost in their evolution history with the reason that the functions of genes are redundant and conservation. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences have been observed, revealing substantial expression divergences or functional divergences in those duplicated aqp genes post the latest WGD event.To some extent, the gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family provides an

  6. Genomic variation in Salmonella enterica core genes for epidemiological typing

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis

    2012-01-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...

  7. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    OpenAIRE

    Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V

    2007-01-01

    Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...

  8. The DNA-encoded nucleosome organization of a eukaryotic genome.

    Science.gov (United States)

    Kaplan, Noam; Moore, Irene K; Fondufe-Mittendorf, Yvonne; Gossett, Andrea J; Tillo, Desiree; Field, Yair; LeProust, Emily M; Hughes, Timothy R; Lieb, Jason D; Widom, Jonathan; Segal, Eran

    2009-03-19

    Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.

  9. The zebrafish genome: a review and msx gene case study.

    Science.gov (United States)

    Postlethwait, J H

    2006-01-01

    Zebrafish is one of several important teleost models for understanding principles of vertebrate developmental, molecular, organismal, genetic, evolutionary, and genomic biology. Efficient investigation of the molecular genetic basis of induced mutations depends on knowledge of the zebrafish genome. Principles of zebrafish genomic analysis, including gene mapping, ortholog identification, conservation of syntenies, genome duplication, and evolution of duplicate gene function are discussed here using as a case study the zebrafish msxa, msxb, msxc, msxd, and msxe genes, which together constitute zebrafish orthologs of tetrapod Msx1, Msx2, and Msx3. Genomic analysis suggests orthologs for this difficult to understand group of paralogs.

  10. Comparative genomics on Norrie disease gene.

    Science.gov (United States)

    Katoh, Masuko; Katoh, Masaru

    2005-05-01

    DAND1 (NBL1), DAND2 (CKTSF1B1 or GREM1 or GREMLIN), DAND3 (CKTSF1B2 or GREM2 or PRDC), DAND4 (CER1), DAND5 (CKTSF1B3 or GREM3 or DANTE), MUC2, MUC5AC, MUC5B, MUC6, MUC19, WISP1, WISP2, WISP3, VWF, NOV and Norrie disease (NDP or NORRIN) genes encode proteins with cysteine knot domain. Cysteine-knot superfamily proteins regulate ligand-receptor interactions for a variety of signaling pathways implicated in embryogenesis, homeostasis, and carcinogenesis. Although Ndp is unrelated to Wnt family members, Ndp is claimed to function as a ligand for Fzd4. Here, we identified and characterized rat Ndp, cow Ndp, chicken ndp and zebrafish ndp genes by using bioinformatics. Rat Ndp gene, consisting of three exons, was located within AC105563.4 genome sequence. Cow Ndp and chicken ndp complete CDS were derived from CB467544.1 EST and BX932859.2 cDNA, respectively. Zebrafish ndp gene was located within BX572627.5 genome sequence. Rat Ndp (131 aa) was a secreted protein with C-terminal cysteine knot-like (CTCK) domain. Rat Ndp showed 100, 96.9, 95.4, 87.8 and 66.4 total-amino-acid identity with mouse Ndp, cow Ndp, human NDP, chicken ndp and zebrafish ndp, respectively. Exon-intron structure of mammalian Ndp orthologs was well conserved. FOXA2, CUTL1 (CCAAT displacement protein), LMO2, CEBPA (C/EBPalpha)-binding sites and triple POU2F1 (OCT1)-binding sites were conserved among promoters of mammalian Ndp orthologs.

  11. Widespread of horizontal gene transfer in the human genome.

    Science.gov (United States)

    Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun

    2017-04-04

    A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic materials between different species. Horizontal gene transfer has been found prevalent in prokaryotes but very rare in eukaryote. In this paper, we investigate horizontal gene transfer in the human genome. From the pair-wise alignments between human genome and 53 vertebrate genomes, 1,467 human genome regions (2.6 M bases) from all chromosomes were found to be more conserved with non-mammals than with most mammals. These human genome regions involve 642 known genes, which are enriched with ion binding. Compared to known horizontal gene transfer regions in the human genome, there were few overlapping regions, which indicated horizontal gene transfer is more common than we expected in the human genome. Horizontal gene transfer impacts hundreds of human genes and this study provided insight into potential mechanisms of HGT in the human genome.

  12. Gene prediction and RFX transcriptional regulation analysis using comparative genomics

    OpenAIRE

    Chu, Jeffrey Shih Chieh

    2011-01-01

    Regulatory Factor X (RFX) is a family of transcription factors (TF) that is conserved in all metazoans, in some fungi, and in only a few single-cellular organisms. Seven members are found in mammals, nine in fishes, three in fruit flies, and a single member in nematodes and fungi. RFX is involved in many different roles in humans, but a particular function that is conserved in many metazoans is its regulation of ciliogenesis. Probing over 150 genomes for the presence of RFX and ciliary genes ...

  13. Brief Guide to Genomics: DNA, Genes and Genomes

    Science.gov (United States)

    ... clinic. Most new drugs based on genome-based research are estimated to be at least 10 to 15 years away, though recent genome-driven efforts in lipid-lowering therapy have considerably shortened that interval. According ...

  14. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes. CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1 the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2 the high

  15. RGmatch: matching genomic regions to proximal genes in omics data integration

    Directory of Open Access Journals (Sweden)

    Pedro Furió-Tarí

    2016-11-01

    Full Text Available Abstract Background The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.

  16. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    to protein: through epigenetic modifications, transcription regulators or post-transcriptional controls. The following papers concern several layers of gene regulation with questions answered by different HTS approaches. Genome-wide screening of epigenetic changes by ChIP-seq allowed us to study both spatial...... and temporal alterations of histone modifications (Papers I and II). Coupling the data with machine learning approaches, we established a prediction framework to assess the most informative histone marks as well as their most influential nucleosome positions in predicting the promoter usages. (Papers I...... they regulated or if the sites had global elevated usage rates by multiple TFs. Using RNA-seq, 5’end-seq in combination with depletion of 5’exonuclease as well as nonsensemediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V...

  17. New Genome Similarity Measures based on Conserved Gene Adjacencies.

    Science.gov (United States)

    Doerr, Daniel; Kowada, Luis Antonio B; Araujo, Eloi; Deshpande, Shachi; Dantas, Simone; Moret, Bernard M E; Stoye, Jens

    2017-06-01

    Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful-but also most complex-models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases, the computational costs to the general family-free case are the same, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that have different expression powers. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.

  18. A systematic genome-wide analysis of zebrafish protein-coding gene function

    NARCIS (Netherlands)

    Kettleborough, R.N.; Busch-Nentwich, E.M.; Harvey, S.A.; Dooley, C.M.; de Bruijn, E.; van Eeden, F.; Sealy, I.; White, R.J.; Herd, C.; Nijman, I.J.; Fenyes, F.; Mehroke, S.; Scahill, C.; Gibbons, R.; Wali, N.; Carruthers, S.; Hall, A.; Yen, J.; Cuppen, E.; Stemple, D.L.

    2013-01-01

    Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms,

  19. Evolution of closely linked gene pairs in vertebrate genomes

    NARCIS (Netherlands)

    Franck, E.; Hulsen, T.; Huynen, M.A.; Jong, de W.W.; Lunsen, N.H.; Madsen, O.

    2008-01-01

    The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of

  20. Gene content and organization of a 281-kbp contig from the genome of the extremely thermophilic archaeon, Sulfolobus solfataricus P2

    NARCIS (Netherlands)

    Charlebois, R.; Confalonieri, F.; Curtis, B.; Doolittle, W.F.; Duguet, M.; Erauso, G.; Faguy, D.; Gaasterland, T.; Garrett, R.A.; Gordon, P.; Kozera, C.; Medina, N.; Oost, van der J.; Peng, X.; Ragan, M.; She, Q.; Singh, R.K.

    2000-01-01

    The sequence of a 281-kbp contig from the crenarchaeote Sulfolobus solfataricus P2 was determined and analysed. Notable features in this region include 29 ribosomal protein genes, 12 tRNA genes (four of which contain archaeal-type introns), operons encoding enzymes of histidine biosynthesis,

  1. Normalization of Complete Genome Characteristics: Application to Evolution from Primitive Organisms to Homo sapiens.

    Science.gov (United States)

    Sorimachi, Kenji; Okayasu, Teiji; Ohhira, Shuji

    2015-04-01

    Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.

  2. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    Science.gov (United States)

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequence were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a

  3. The First Myriapod Genome Sequence Reveals Conservative Arthropod Gene Content and Genome Organisation in the Centipede Strigamia maritima

    Science.gov (United States)

    Chipman, Ariel D.; Ferrier, David E. K.; Brena, Carlo; Qu, Jiaxin; Hughes, Daniel S. T.; Schröder, Reinhard; Torres-Oliva, Montserrat; Znassi, Nadia; Jiang, Huaiyang; Almeida, Francisca C.; Alonso, Claudio R.; Apostolou, Zivkos; Aqrawi, Peshtewani; Arthur, Wallace; Barna, Jennifer C. J.; Blankenburg, Kerstin P.; Brites, Daniela; Capella-Gutiérrez, Salvador; Coyle, Marcus; Dearden, Peter K.; Du Pasquier, Louis; Duncan, Elizabeth J.; Ebert, Dieter; Eibner, Cornelius; Erikson, Galina; Evans, Peter D.; Extavour, Cassandra G.; Francisco, Liezl; Gabaldón, Toni; Gillis, William J.; Goodwin-Horn, Elizabeth A.; Green, Jack E.; Griffiths-Jones, Sam; Grimmelikhuijzen, Cornelis J. P.; Gubbala, Sai; Guigó, Roderic; Han, Yi; Hauser, Frank; Havlak, Paul; Hayden, Luke; Helbing, Sophie; Holder, Michael; Hui, Jerome H. L.; Hunn, Julia P.; Hunnekuhl, Vera S.; Jackson, LaRonda; Javaid, Mehwish; Jhangiani, Shalini N.; Jiggins, Francis M.; Jones, Tamsin E.; Kaiser, Tobias S.; Kalra, Divya; Kenny, Nathan J.; Korchina, Viktoriya; Kovar, Christie L.; Kraus, F. Bernhard; Lapraz, François; Lee, Sandra L.; Lv, Jie; Mandapat, Christigale; Manning, Gerard; Mariotti, Marco; Mata, Robert; Mathew, Tittu; Neumann, Tobias; Newsham, Irene; Ngo, Dinh N.; Ninova, Maria; Okwuonu, Geoffrey; Ongeri, Fiona; Palmer, William J.; Patil, Shobha; Patraquim, Pedro; Pham, Christopher; Pu, Ling-Ling; Putman, Nicholas H.; Rabouille, Catherine; Ramos, Olivia Mendivil; Rhodes, Adelaide C.; Robertson, Helen E.; Robertson, Hugh M.; Ronshaugen, Matthew; Rozas, Julio; Saada, Nehad; Sánchez-Gracia, Alejandro; Scherer, Steven E.; Schurko, Andrew M.; Siggens, Kenneth W.; Simmons, DeNard; Stief, Anna; Stolle, Eckart; Telford, Maximilian J.; Tessmar-Raible, Kristin; Thornton, Rebecca; van der Zee, Maurijn; von Haeseler, Arndt; Williams, James M.; Willis, Judith H.; Wu, Yuanqing; Zou, Xiaoyan; Lawson, Daniel; Muzny, Donna M.; Worley, Kim C.; Gibbs, Richard A.; Akam, Michael; Richards, Stephen

    2014-01-01

    Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific

  4. Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Mara Sangiovanni

    2013-12-01

    Full Text Available Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid lifecycle and short adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherently complex organization, since it has undergone ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the genome complexity, limiting its role as a reference. The identification of paralogs and single copy genes within a highly duplicated genome is a prerequisite to understand its organization and evolution and to improve its exploitation in comparative genomics. This is still controversial, even in the widely studied Arabidopsis genome. This is also due to the lack of a reference bioinformatics pipeline that could exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, discussing all the methodological issues that may strongly affect the results, their quality and their reliability. This approach was used to analyze the organization of Arabidopsis nuclear protein coding genes, and besides classifying computationally defined paralogs into networks and single copy genes into different classes, it unraveled further intriguing aspects concerning the genome annotation and the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community.

  5. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    Science.gov (United States)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  6. OxyGene: an innovative platform for investigating oxidative-response genes in whole prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Barloy-Hubler Frédérique

    2008-12-01

    Full Text Available Abstract Background Oxidative stress is a common stress encountered by living organisms and is due to an imbalance between intracellular reactive oxygen and nitrogen species (ROS, RNS and cellular antioxidant defence. To defend themselves against ROS/RNS, bacteria possess a subsystem of detoxification enzymes, which are classified with regard to their substrates. To identify such enzymes in prokaryotic genomes, different approaches based on similarity, enzyme profiles or patterns exist. Unfortunately, several problems persist in the annotation, classification and naming of these enzymes due mainly to some erroneous entries in databases, mistake propagation, absence of updating and disparity in function description. Description In order to improve the current annotation of oxidative stress subsystems, an innovative platform named OxyGene has been developed. It integrates an original database called OxyDB, holding thoroughly tested anchor-based signatures associated to subfamilies of oxidative stress enzymes, and a new anchor-driven annotator, for ab initio detection of ROS/RNS response genes. All complete Bacterial and Archaeal genomes have been re-annotated, and the results stored in the OxyGene repository can be interrogated via a Graphical User Interface. Conclusion OxyGene enables the exploration and comparative analysis of enzymes belonging to 37 detoxification subclasses in 664 microbial genomes. It proposes a new classification that improves both the ontology and the annotation of the detoxification subsystems in prokaryotic whole genomes, while discovering new ORFs and attributing precise function to hypothetical annotated proteins. OxyGene is freely available at: http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software

  7. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome as they were thought even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and recently reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.

  8. Whole genome duplications and expansion of the vertebrate GATA transcription factor gene family

    Directory of Open Access Journals (Sweden)

    Bowerman Bruce

    2009-08-01

    Full Text Available Abstract Background GATA transcription factors influence many developmental processes, including the specification of embryonic germ layers. The GATA gene family has significantly expanded in many animal lineages: whereas diverse cnidarians have only one GATA transcription factor, six GATA genes have been identified in many vertebrates, five in many insects, and eleven to thirteen in Caenorhabditis nematodes. All bilaterian animal genomes have at least one member each of two classes, GATA123 and GATA456. Results We have identified one GATA123 gene and one GATA456 gene from the genomic sequence of two invertebrate deuterostomes, a cephalochordate (Branchiostoma floridae and a hemichordate (Saccoglossus kowalevskii. We also have confirmed the presence of six GATA genes in all vertebrate genomes, as well as additional GATA genes in teleost fish. Analyses of conserved sequence motifs and of changes to the exon-intron structure, and molecular phylogenetic analyses of these deuterostome GATA genes support their origin from two ancestral deuterostome genes, one GATA 123 and one GATA456. Comparison of the conserved genomic organization across vertebrates identified eighteen paralogous gene families linked to multiple vertebrate GATA genes (GATA paralogons, providing the strongest evidence yet for expansion of vertebrate GATA gene families via genome duplication events. Conclusion From our analysis, we infer the evolutionary birth order and relationships among vertebrate GATA transcription factors, and define their expansion via multiple rounds of whole genome duplication events. As the genomes of four independent invertebrate deuterostome lineages contain single copy GATA123 and GATA456 genes, we infer that the 0R (pre-genome duplication invertebrate deuterostome ancestor also had two GATA genes, one of each class. Synteny analyses identify duplications of paralogous chromosomal regions (paralogons, from single ancestral vertebrate GATA123 and GATA456

  9. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    Science.gov (United States)

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  10. Genic regions of a large salamander genome contain long introns and novel genes

    Directory of Open Access Journals (Sweden)

    Bryant Susan V

    2009-01-01

    Full Text Available Abstract Background The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum: c-value = 32 × 109 bp were isolated and sequenced to characterize the structure of genic regions. Results Annotation of genes within BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered while manually annotating BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage specific genes in the axolotl genome than the human genome, but the great majority (86% of genes between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be incredibly large, approximately 2.8 gigabases! Conclusion This study shows that a large salamander genome has a correspondingly large genic component, primarily because genes have incredibly long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes that are unique to salamanders.

  11. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    to investigate locomotor activity, and applied genomic feature prediction models to identify gene ontology (GO) cate- gories predictive of this phenotype. Next, we applied the covariance association test to partition the genomic variance of the predictive GO terms to the genes within these terms. We...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated......Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...

  12. Widespread of horizontal gene transfer in the human genome

    OpenAIRE

    Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun

    2017-01-01

    Background A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic materials between different species. Horizontal gene transfer has been found prevalent in prokaryotes but very rare in eukaryote. In this paper, we investigate horizontal gene transfer in the human genome. Results From the pa...

  13. Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models.

    Science.gov (United States)

    Mahony, Shaun; McInerney, James O; Smith, Terry J; Golden, Aaron

    2004-03-05

    Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation. This work explores a new approach to gene-prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential. While its raw accuracy rate can be less than other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.

  14. Gene copy number variation throughout the Plasmodium falciparum genome

    Directory of Open Access Journals (Sweden)

    Stewart Lindsay B

    2009-08-01

    Full Text Available Abstract Background Gene copy number variation (CNV is responsible for several important phenotypes of the malaria parasite Plasmodium falciparum, including drug resistance, loss of infected erythrocyte cytoadherence and alteration of receptor usage for erythrocyte invasion. Despite the known effects of CNV, little is known about its extent throughout the genome. Results We performed a whole-genome survey of CNV genes in P. falciparum using comparative genome hybridisation of a diverse set of 16 laboratory culture-adapted isolates to a custom designed high density Affymetrix GeneChip array. Overall, 186 genes showed hybridisation signals consistent with deletion or amplification in one or more isolate. There is a strong association of CNV with gene length, genomic location, and low orthology to genes in other Plasmodium species. Sub-telomeric regions of all chromosomes are strongly associated with CNV genes independent from members of previously described multigene families. However, ~40% of CNV genes were located in more central regions of the chromosomes. Among the previously undescribed CNV genes, several that are of potential phenotypic relevance are identified. Conclusion CNV represents a major form of genetic variation within the P. falciparum genome; the distribution of gene features indicates the involvement of highly non-random mutational and selective processes. Additional studies should be directed at examining CNV in natural parasite populations to extend conclusions to clinical settings.

  15. Marine Genomics: A clearing-house for genomic and transcriptomic data of marine organisms

    Directory of Open Access Journals (Sweden)

    Trent Harold F

    2005-03-01

    Full Text Available Abstract Background The Marine Genomics project is a functional genomics initiative developed to provide a pipeline for the curation of Expressed Sequence Tags (ESTs and gene expression microarray data for marine organisms. It provides a unique clearing-house for marine specific EST and microarray data and is currently available at http://www.marinegenomics.org. Description The Marine Genomics pipeline automates the processing, maintenance, storage and analysis of EST and microarray data for an increasing number of marine species. It currently contains 19 species databases (over 46,000 EST sequences that are maintained by registered users from local and remote locations in Europe and South America in addition to the USA. A collection of analysis tools are implemented. These include a pipeline upload tool for EST FASTA file, sequence trace file and microarray data, an annotative text search, automated sequence trimming, sequence quality control (QA/QC editing, sequence BLAST capabilities and a tool for interactive submission to GenBank. Another feature of this resource is the integration with a scientific computing analysis environment implemented by MATLAB. Conclusion The conglomeration of multiple marine organisms with integrated analysis tools enables users to focus on the comprehensive descriptions of transcriptomic responses to typical marine stresses. This cross species data comparison and integration enables users to contain their research within a marine-oriented data management and analysis environment.

  16. Alkane Biosynthesis Genes in Cyanobacteria and Their Transcriptional Organization

    International Nuclear Information System (INIS)

    Klähn, Stephan; Baumgartner, Desirée; Pfreundt, Ulrike; Voigt, Karsten; Schön, Verena; Steglich, Claudia; Hess, Wolfgang R.

    2014-01-01

    In cyanobacteria, alkanes are synthesized from a fatty acyl-ACP by two enzymes, acyl–acyl carrier protein reductase and aldehyde deformylating oxygenase. Despite the great interest in the exploitation for biofuel production, nothing is known about the transcriptional organization of their genes or the physiological function of alkane synthesis. The comparison of 115 microarray datasets indicates the relatively constitutive expression of aar and ado genes. The analysis of 181 available genomes showed that in 90% of the genomes both genes are present, likely indicating their physiological relevance. In 61% of them they cluster together with genes encoding acetyl-CoA carboxyl transferase and a short-chain dehydrogenase, strengthening the link to fatty acid metabolism and in 76% of the genomes they are located in tandem, suggesting constraints on the gene arrangement. However, contrary to the expectations for an operon, we found in Synechocystis sp. PCC 6803 specific promoters for the two genes, sll0208 (ado) and sll0209 (aar), which give rise to monocistronic transcripts. Moreover, the upstream located ado gene is driven by a proximal as well as a second, distal, promoter, from which a third transcript, the ~160 nt sRNA SyR9 is transcribed. Thus, the transcriptional organization of the alkane biosynthesis genes in Synechocystis sp. PCC 6803 is of substantial complexity. We verified all three promoters to function independently from each other and show a similar promoter arrangement also in the more distant Nodularia spumigena, Trichodesmium erythraeum, Anabaena sp. PCC 7120, Prochlorococcus MIT9313, and MED4. The presence of separate regulatory elements and the dominance of monocistronic mRNAs suggest the possible autonomous regulation of ado and aar. The complex transcriptional organization of the alkane synthesis gene cluster has possible metabolic implications and should be considered when manipulating the expression of these genes in cyanobacteria.

  17. Alkane biosynthesis genes in cyanobacteria and their transcriptional organization

    Directory of Open Access Journals (Sweden)

    Stephan eKlähn

    2014-07-01

    Full Text Available In cyanobacteria, alkanes are synthesized from a fatty acyl-ACP by two enzymes, acyl-acyl carrier protein reductase (AAR and aldehyde deformylating oxygenase (ADO. Despite the great interest in the exploitation for biofuel production, nothing is known about the transcriptional organization of their genes or the physiological function of alkane synthesis. The comparison of 115 microarray datasets indicates the relatively constitutive expression of aar and ado genes. The analysis of 181 available genomes showed that in 90% of the genomes both genes are present, likely indicating their physiological relevance. In 61% of them they cluster together with genes encoding acetyl-CoA carboxyl transferase and a short chain dehydrogenase, strengthening the link to fatty acid metabolism and in 76% of the genomes they are located in tandem, suggesting constraints on the gene arrangement. However, contrary to the expectations for an operon, we found in Synechocystis sp. PCC 6803 specific promoters for the two genes, sll0208 (ado and sll0209 (aar, that give rise to monocistronic transcripts. Moreover, the upstream located ado gene is driven by a proximal as well as a second, distal, promoter, from which a third transcript, the ~160 nt sRNA SyR9 is transcribed. Thus, the transcriptional organization of the alkane biosynthesis genes in Synechocystis sp. PCC 6803 is of substantial complexity. We verified all three promoters to function independently from each other and show a similar promoter arrangement also in the more distant Nodularia spumigena, Trichodesmium erythraeum, Anabaena sp. PCC 7120, Prochlorococcus MIT9313 and MED4. The presence of separate regulatory elements and the dominance of monocistronic mRNAs suggest the possible autonomous regulation of ado and aar. The complex transcriptional organization of the alkane synthesis gene cluster has possible metabolic implications and should be considered when manipulating the expression of these genes in

  18. Alkane Biosynthesis Genes in Cyanobacteria and Their Transcriptional Organization

    Energy Technology Data Exchange (ETDEWEB)

    Klähn, Stephan; Baumgartner, Desirée; Pfreundt, Ulrike; Voigt, Karsten; Schön, Verena; Steglich, Claudia; Hess, Wolfgang R., E-mail: wolfgang.hess@biologie.uni-freiburg.de [Genetics and Experimental Bioinformatics, Institute of Biology 3, Faculty of Biology, University of Freiburg, Freiburg (Germany)

    2014-07-14

    In cyanobacteria, alkanes are synthesized from a fatty acyl-ACP by two enzymes, acyl–acyl carrier protein reductase and aldehyde deformylating oxygenase. Despite the great interest in the exploitation for biofuel production, nothing is known about the transcriptional organization of their genes or the physiological function of alkane synthesis. The comparison of 115 microarray datasets indicates the relatively constitutive expression of aar and ado genes. The analysis of 181 available genomes showed that in 90% of the genomes both genes are present, likely indicating their physiological relevance. In 61% of them they cluster together with genes encoding acetyl-CoA carboxyl transferase and a short-chain dehydrogenase, strengthening the link to fatty acid metabolism and in 76% of the genomes they are located in tandem, suggesting constraints on the gene arrangement. However, contrary to the expectations for an operon, we found in Synechocystis sp. PCC 6803 specific promoters for the two genes, sll0208 (ado) and sll0209 (aar), which give rise to monocistronic transcripts. Moreover, the upstream located ado gene is driven by a proximal as well as a second, distal, promoter, from which a third transcript, the ~160 nt sRNA SyR9 is transcribed. Thus, the transcriptional organization of the alkane biosynthesis genes in Synechocystis sp. PCC 6803 is of substantial complexity. We verified all three promoters to function independently from each other and show a similar promoter arrangement also in the more distant Nodularia spumigena, Trichodesmium erythraeum, Anabaena sp. PCC 7120, Prochlorococcus MIT9313, and MED4. The presence of separate regulatory elements and the dominance of monocistronic mRNAs suggest the possible autonomous regulation of ado and aar. The complex transcriptional organization of the alkane synthesis gene cluster has possible metabolic implications and should be considered when manipulating the expression of these genes in cyanobacteria.

  19. Composition and genomic organization of arthropod Hox clusters.

    Science.gov (United States)

    Pace, Ryan M; Grbić, Miodrag; Nagy, Lisa M

    2016-01-01

    The ancestral arthropod is believed to have had a clustered arrangement of ten Hox genes. Within arthropods, Hox gene mutations result in transformation of segment identities. Despite the fact that variation in segment number/character was common in the diversification of arthropods, few examples of Hox gene gains/losses have been correlated with morphological evolution. Furthermore, a full appreciation of the variation in the genomic arrangement of Hox genes in extant arthropods has not been recognized, as genome sequences from each major arthropod clade have not been reported until recently. Initial genomic analysis of the chelicerate Tetranychus urticae suggested that loss of Hox genes and Hox gene clustering might be more common than previously assumed. To further characterize the genomic evolution of arthropod Hox genes, we compared the genomic arrangement and general characteristics of Hox genes from representative taxa from each arthropod subphylum. In agreement with others, we find arthropods generally contain ten Hox genes arranged in a common orientation in the genome, with an increasing number of sampled species missing either Hox3 or abdominal-A orthologs. The genomic clustering of Hox genes in species we surveyed varies significantly, ranging from 0.3 to 13.6 Mb. In all species sampled, arthropod Hox genes are dispersed in the genome relative to the vertebrate Mus musculus. Differences in Hox cluster size arise from variation in the number of intervening genes, intergenic spacing, and the size of introns and UTRs. In the arthropods surveyed, Hox gene duplications are rare and four microRNAs are, in general, conserved in similar genomic positions relative to the Hox genes. The tightly clustered Hox complexes found in the vertebrates are not evident within arthropods, and differential patterns of Hox gene dispersion are found throughout the arthropods. The comparative genomic data continue to support an ancestral arthropod Hox cluster of ten genes with

  20. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    International Nuclear Information System (INIS)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-01-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society

  1. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    Energy Technology Data Exchange (ETDEWEB)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.

  2. Comparative genome analysis and resistance gene mapping in grain legumes

    International Nuclear Information System (INIS)

    Young, N.D.

    1998-01-01

    Using, DNA markers and genome organization, several important disease resistance genes have been analyzed in mungbean (Vigna radiata), cowpea (Vigna unguiculata), common bean (Phaseolus vulgaris), and soybean (Glycine max). In the process, medium-density linkage maps consisting of restriction fragment length polymorphism (RFLP) markers were constructed for both mungbean and cowpea. Comparisons between these maps, as well as the maps of soybean and common bean, indicate that there is significant conservation of DNA marker order, though the conserved blocks in soybean are much shorter than in the others. DNA mapping results also indicate that a gene for seed weight may be conserved between mungbean and cowpea. Using the linkage maps, genes that control bruchid (genus Callosobruchus) and powdery mildew (Erysiphe polygoni) resistance in mungbean, aphid resistance in cowpea (Aphis craccivora), and cyst nematode (Heterodera glycines) resistance in soybean have all been mapped and characterized. For some of these traits resistance was found to be oligogenic and DNA mapping uncovered multiple genes involved in the phenotype. (author)

  3. Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

    Science.gov (United States)

    Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

    2014-11-25

    among isolates but also functionally essential for a given species and to further evaluate the stability or flexibility of such genome structures across lineages are of importance. Based on a large number of multi-isolate pangenomic data, our analysis reveals that a subset of core genes is organized into a core-gene-defined genome organizational framework, or cGOF. Furthermore, the lineage-associated cGOFs among Gram-positive and Gram-negative bacteria behave differently: the former, composed of 2 to 4 segments, have their fragments symmetrically rearranged around the origin-terminus axis, whereas the latter show more complex segmentation and are partitioned asymmetrically into chromosomal structures. The definition of cGOFs provides new insights into prokaryotic genome organization and efficient guidance for genome assembly and analysis. Copyright © 2014 Kang et al.

  4. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    Full Text Available This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H for the highly heritable traits, days to heading (DTH, and days to maturity (DTM. Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E. Two alternative prediction strategies were studied: (1 random cross-validation of the data in 20% training (TRN and 80% testing (TST (TRN20-TST80 sets, and (2 two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm

  5. Genome Mutational and Transcriptional Hotspots Are Traps for Duplicated Genes and Sources of Adaptations.

    Science.gov (United States)

    Fares, Mario A; Sabater-Muñoz, Beatriz; Toft, Christina

    2017-05-01

    Gene duplication generates new genetic material, which has been shown to lead to major innovations in unicellular and multicellular organisms. A whole-genome duplication occurred in the ancestor of Saccharomyces yeast species but 92% of duplicates returned to single-copy genes shortly after duplication. The persisting duplicated genes in Saccharomyces led to the origin of major metabolic innovations, which have been the source of the unique biotechnological capabilities in the Baker's yeast Saccharomyces cerevisiae. What factors have determined the fate of duplicated genes remains unknown. Here, we report the first demonstration that the local genome mutation and transcription rates determine the fate of duplicates. We show, for the first time, a preferential location of duplicated genes in the mutational and transcriptional hotspots of S. cerevisiae genome. The mechanism of duplication matters, with whole-genome duplicates exhibiting different preservation trends compared to small-scale duplicates. Genome mutational and transcriptional hotspots are rich in duplicates with large repetitive promoter elements. Saccharomyces cerevisiae shows more tolerance to deleterious mutations in duplicates with repetitive promoter elements, which in turn exhibit higher transcriptional plasticity against environmental perturbations. Our data demonstrate that the genome traps duplicates through the accelerated regulatory and functional divergence of their gene copies providing a source of novel adaptations in yeast. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    DEFF Research Database (Denmark)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.

    2005-01-01

    years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences......We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each...... between the species-but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence...

  7. Genes but not genomes reveal bacterial domestication of Lactococcus lactis.

    Directory of Open Access Journals (Sweden)

    Delphine Passerini

    Full Text Available BACKGROUND: The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE. METHODOLOGY/PRINCIPAL FINDINGS: The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST differing by up to 230 kb in genome size. CONCLUSION/SIGNIFICANCE: The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between "environmental" strains, the main contributors to the genetic diversity within the subspecies, and "domesticated" strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the "domesticated" strains essentially arose through substantial genomic flux within the dispensable

  8. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    Josephine Erhiakporeh

    2016-07-06

    Jul 6, 2016 ... candidate genes for drought tolerance in sesame. (Sesamum ... Our results provided genomic resources for further functional analysis and genetic engineering .... reverse transcribed using the Reverse Transcription System.

  9. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens, that cause a contagious zoonotic disease, that can result in such outcomes as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. For deciphering the survival mechanism of Brucella spp. in vivo, 42 Brucella complete genomes from NCBI were analyzed for the pan-genome and core genome by identification of their composition and function of Brucella genomes. The results showed that the total 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44 % of the core genes were devoted to metabolism, which were mainly responsible for energy production and conversion (COG category C), and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35 % of the core genes were in positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conservation, and the energy and amino acid metabolism play a more important role in the process of growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of the intracellular survival in Brucella spp.

  10. Improvisation in evolution of genes and genomes: whose structure is it anyway?

    Science.gov (United States)

    Shakhnovich, Boris E; Shakhnovich, Eugene I

    2008-06-01

    Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.

  11. Soft rot erwiniae: from genes to genomes.

    Science.gov (United States)

    Toth, Ian K; Bell, Kenneth S; Holeva, Maria C; Birch, Paul R J

    2003-01-01

    SUMMARY The soft rot erwiniae, Erwinia carotovora ssp. atroseptica (Eca), E. carotovora ssp. carotovora (Ecc) and E. chrysanthemi (Ech) are major bacterial pathogens of potato and other crops world-wide. We currently understand much about how these bacteria attack plants and protect themselves against plant defences. However, the processes underlying the establishment of infection, differences in host range and their ability to survive when not causing disease, largely remain a mystery. This review will focus on our current knowledge of pathogenesis in these organisms and discuss how modern genomic approaches, including complete genome sequencing of Eca and Ech, may open the door to a new understanding of the potential subtlety and complexity of soft rot erwiniae and their interactions with plants. The soft rot erwiniae are members of the Enterobacteriaceae, along with other plant pathogens such as Erwinia amylovora and human pathogens such as Escherichia coli, Salmonella spp. and Yersinia spp. Although the genus name Erwinia is most often used to describe the group, an alternative genus name Pectobacterium was recently proposed for the soft rot species. Ech mainly affects crops and other plants in tropical and subtropical regions and has a wide host range that includes potato and the important model host African violet (Saintpaulia ionantha). Ecc affects crops and other plants in subtropical and temperate regions and has probably the widest host range, which also includes potato. Eca, on the other hand, has a host range limited almost exclusively to potato in temperate regions only. Disease symptoms: Soft rot erwiniae cause general tissue maceration, termed soft rot disease, through the production of plant cell wall degrading enzymes. Environmental factors such as temperature, low oxygen concentration and free water play an essential role in disease development. On potato, and possibly other plants, disease symptoms may differ, e.g. blackleg disease is associated

  12. Selective Gene Delivery for Integrating Exogenous DNA into Plastid and Mitochondrial Genomes Using Peptide-DNA Complexes.

    Science.gov (United States)

    Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji

    2018-05-14

    Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.

  13. Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

    Directory of Open Access Journals (Sweden)

    Yubo Hou

    Full Text Available The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log(10-transformed protein-coding gene number (Y' versus log(10-transformed genome size (X', genome size in kbp were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200+22.678X', whereas non-eukaryotes a linear model, Y' = 0.045+0.977X', both with high significance (p0.91. Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1% compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%. The eukaryotic regression models project that the smallest dinoflagellate genome (3x10(6 kbp contains 38,188 protein-coding (40,086 total genes and the largest (245x10(6 kbp 87,688 protein-coding (92,013 total genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

  14. Codon usage is associated with the evolutionary age of genes in metazoan genomes

    Directory of Open Access Journals (Sweden)

    Linial Nathan

    2009-12-01

    Full Text Available Abstract Background Codon usage may vary significantly between different organisms and between genes within the same organism. Several evolutionary processes have been postulated to be the predominant determinants of codon usage: selection, mutation, and genetic drift. However, the relative contribution of each of these factors in different species remains debatable. The availability of complete genomes for tens of multicellular organisms provides an opportunity to inspect the relationship between codon usage and the evolutionary age of genes. Results We assign an evolutionary age to a gene based on the relative positions of its identified homologues in a standard phylogenetic tree. This yields a classification of all genes in a genome to several evolutionary age classes. The present study starts from the observation that each age class of genes has a unique codon usage and proceeds to provide a quantitative analysis of the codon usage in these classes. This observation is made for the genomes of Homo sapiens, Mus musculus, and Drosophila melanogaster. It is even more remarkable that the differences between codon usages in different age groups exhibit similar and consistent behavior in various organisms. While we find that GC content and gene length are also associated with the evolutionary age of genes, they can provide only a partial explanation for the observed codon usage. Conclusion While factors such as GC content, mutational bias, and selection shape the codon usage in a genome, the evolutionary history of an organism over hundreds of millions of years is an overlooked property that is strongly linked to GC content, protein length, and, even more significantly, to the codon usage of metazoan genomes.

  15. Organization of plastid genomes in the freshwater red algal order Batrachospermales (Rhodophyta).

    Science.gov (United States)

    Paiano, Monica Orlandi; Del Cortona, Andrea; Costa, Joana F; Liu, Shao-Lun; Verbruggen, Heroen; De Clerck, Olivier; Necchi, Orlando

    2018-02-01

    Little is known about genome organization in members of the order Batrachospermales, and the infra-ordinal relationship remains unresolved. Plastid (cp) genomes of seven members of the freshwater red algal order Batrachospermales were sequenced, with the following aims: (i) to describe the characteristics of cp genomes and compare these with other red algal groups; (ii) to infer the phylogenetic relationships among these members to better understand the infra-ordinal classification. Cp genomes of Batrachospermales are large, with several cases of gene loss, they are gene-dense (high gene content for the genome size and short intergenic regions) and have highly conserved gene order. Phylogenetic analyses based on concatenated nucleotide genome data roughly supports the current taxonomic system for the order. Comparative analyses confirm data for members of the class Florideophyceae that cp genomes in Batrachospermales is highly conserved, with little variation in gene composition. However, relevant new features were revealed in our study: genome sizes in members of Batrachospermales are close to the lowest values reported for Florideophyceae; differences in cp genome size within the order are large in comparison with other orders (Ceramiales, Gelidiales, Gracilariales, Hildenbrandiales, and Nemaliales); and members of Batrachospermales have the lowest number of protein-coding genes among the Florideophyceae. In terms of gene loss, apcF, which encodes the allophycocyanin beta subunit, is absent in all sequenced taxa of Batrachospermales. We reinforce that the interordinal relationships between the freshwater orders Batrachospermales and Thoreales within the Nemaliophycidae is not well resolved due to limited taxon sampling. © 2017 Phycological Society of America.

  16. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and sixDrosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families?perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  17. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding wilt help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: For Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosis, Sulfolobus solfataricus, Methanococcus maripaludis and Haiobacterium salinarum NROL We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagfess complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives

  18. [A review of the genomic and gene cloning studies in trees].

    Science.gov (United States)

    Yin, Tong-Ming

    2010-07-01

    Supported by the Department of Energy (DOE) of U.S., the first tree genome, black cottonwood (Populus trichocarpa), has been completely sequenced and publicly release. This is the milestone that indicates the beginning of post-genome era for forest trees. Identification and cloning genes underlying important traits are one of the main tasks for the post-genome-era tree genomic studies. Recently, great achievements have been made in cloning genes coordinating important domestication traits in some crops, such as rice, tomato, maize and so on. Molecular breeding has been applied in the practical breeding programs for many crops. By contrast, molecular studies in trees are lagging behind. Trees possess some characteristics that make them as difficult organisms for studying on locating and cloning of genes. With the advances in techniques, given also the fast growth of tree genomic resources, great achievements are desirable in cloning unknown genes from trees, which will facilitate tree improvement programs by means of molecular breeding. In this paper, the author reviewed the progress in tree genomic and gene cloning studies, and prospected the future achievements in order to provide a useful reference for researchers working in this area.

  19. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  20. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Science.gov (United States)

    Bargsten, Joachim W; Folta, Adam; Mlynárová, Ludmila; Nap, Jan-Peter

    2013-01-01

    As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes). The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  1. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Directory of Open Access Journals (Sweden)

    Joachim W Bargsten

    Full Text Available As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes. The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  2. The invasive MED/Q Bemisia tabaci genome: a tale of gene loss and gene gain

    Science.gov (United States)

    Whiteflies are a group of invasive crop pests that impact global agriculture. An analysis was conducted to compare draft genomes of two whitefly strains, which demonstrated the relative conserved gene order, but a number of genes were either novel (added) or omitted (deleted) between genomes. This...

  3. Conserved genomic organisation of Group B Sox genes in insects.

    Directory of Open Access Journals (Sweden)

    Woerfel Gertrud

    2005-05-01

    Full Text Available Abstract Background Sox domain containing genes are important metazoan transcriptional regulators implicated in a wide rage of developmental processes. The vertebrate B subgroup contains the Sox1, Sox2 and Sox3 genes that have early functions in neural development. Previous studies show that Drosophila Group B genes have been functionally conserved since they play essential roles in early neural specification and mutations in the Drosophila Dichaete and SoxN genes can be rescued with mammalian Sox genes. Despite their importance, the extent and organisation of the Group B family in Drosophila has not been fully characterised, an important step in using Drosophila to examine conserved aspects of Group B Sox gene function. Results We have used the directed cDNA sequencing along with the output from the publicly-available genome sequencing projects to examine the structure of Group B Sox domain genes in Drosophila melanogaster, Drosophila pseudoobscura, Anopheles gambiae and Apis mellifora. All of the insect genomes contain four genes encoding Group B proteins, two of which are intronless, as is the case with vertebrate group B genes. As has been previously reported and unusually for Group B genes, two of the insect group B genes, Sox21a and Sox21b, contain introns within their DNA-binding domains. We find that the highly unusual multi-exon structure of the Sox21b gene is common to the insects. In addition, we find that three of the group B Sox genes are organised in a linked cluster in the insect genomes. By in situ hybridisation we show that the pattern of expression of each of the four group B genes during embryogenesis is conserved between D. melanogaster and D. pseudoobscura. Conclusion The DNA-binding domain sequences and genomic organisation of the group B genes have been conserved over 300 My of evolution since the last common ancestor of the Hymenoptera and the Diptera. Our analysis suggests insects have two Group B1 genes, SoxN and

  4. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2007-11-01

    Full Text Available Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs. Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. Results New Archaeal Clusters of Orthologous Genes (arCOGs were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genome consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile

  5. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Mundy, J.; Willenbrock, Hanni

    2007-01-01

    The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental facto...

  6. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    Energy Technology Data Exchange (ETDEWEB)

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxa. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genome.

  7. Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths.

    Science.gov (United States)

    Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T

    How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter and the sizes of these factions are randomly different from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to lengths of their genes, and then find association between the genome rank and variousproperties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring of overall sortedness of the data. We have demonstrated that all considered methods give consistent results and Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method.

  8. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  9. Genome-wide analysis of WRKY gene family in the sesame genome and identification of the WRKY genes involved in responses to abiotic stresses.

    Science.gov (United States)

    Li, Donghua; Liu, Pan; Yu, Jingyin; Wang, Linhai; Dossa, Komivi; Zhang, Yanxin; Zhou, Rong; Wei, Xin; Zhang, Xiurong

    2017-09-11

    Sesame (Sesamum indicum L.) is one of the world's most important oil crops. However, it is susceptible to abiotic stresses in general, and to waterlogging and drought stresses in particular. The molecular mechanisms of abiotic stress tolerance in sesame have not yet been elucidated. The WRKY domain transcription factors play significant roles in plant growth, development, and responses to stresses. However, little is known about the number, location, structure, molecular phylogenetics, and expression of the WRKY genes in sesame. We performed a comprehensive study of the WRKY gene family in sesame and identified 71 SiWRKYs. In total, 65 of these genes were mapped to 15 linkage groups within the sesame genome. A phylogenetic analysis was performed using a related species (Arabidopsis thaliana) to investigate the evolution of the sesame WRKY genes. Tissue expression profiles of the WRKY genes demonstrated that six SiWRKY genes were highly expressed in all organs, suggesting that these genes may be important for plant growth and organ development in sesame. Analysis of the SiWRKY gene expression patterns revealed that 33 and 26 SiWRKYs respond strongly to waterlogging and drought stresses, respectively. Changes in the expression of 12 SiWRKY genes were observed at different times after the waterlogging and drought treatments had begun, demonstrating that sesame gene expression patterns vary in response to abiotic stresses. In this study, we analyzed the WRKY family of transcription factors encoded by the sesame genome. Insight was gained into the classification, evolution, and function of the SiWRKY genes, revealing their putative roles in a variety of tissues. Responses to abiotic stresses in different sesame cultivars were also investigated. The results of our study provide a better understanding of the structures and functions of sesame WRKY genes and suggest that manipulating these WRKYs could enhance resistance to waterlogging and drought.

  10. From Genomics to Gene Therapy: Induced Pluripotent Stem Cells Meet Genome Editing.

    Science.gov (United States)

    Hotta, Akitsu; Yamanaka, Shinya

    2015-01-01

    The advent of induced pluripotent stem (iPS) cells has opened up numerous avenues of opportunity for cell therapy, including the initiation in September 2014 of the first human clinical trial to treat dry age-related macular degeneration. In parallel, advances in genome-editing technologies by site-specific nucleases have dramatically improved our ability to edit endogenous genomic sequences at targeted sites of interest. In fact, clinical trials have already begun to implement this technology to control HIV infection. Genome editing in iPS cells is a powerful tool and enables researchers to investigate the intricacies of the human genome in a dish. In the near future, the groundwork laid by such an approach may expand the possibilities of gene therapy for treating congenital disorders. In this review, we summarize the exciting progress being made in the utilization of genomic editing technologies in pluripotent stem cells and discuss remaining challenges toward gene therapy applications.

  11. Genome engineering using a synthetic gene circuit in Bacillus subtilis.

    Science.gov (United States)

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-03-31

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac-chloramphenicol resistant genes (cat) were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression, the LacIs repressed the Pspac promoter and the cells become chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily under the chloramphenicol absent condition. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. The genomic view of genes responsive to the antagonistic phytohormones, abscisic acid, and gibberellin.

    Science.gov (United States)

    Yazaki, Junshi; Kikuchi, Shoshi

    2005-01-01

    We now have the various genomics tools for monocot (Oryza sativa) and a dicot (Arabidopsis thaliana) plant. Plant is not only a very important agricultural resource but also a model organism for biological research. It is important that the interaction between ABA and GA is investigated for controlling the transition from embryogenesis to germination in seeds using genomics tools. These studies have investigated the relationship between dormancy and germination using genomics tools. Genomics tools identified genes that had never before been annotated as ABA- or GA-responsive genes in plant, detected new interactions between genes responsive to the two hormones, comprehensively characterized cis-elements of hormone-responsive genes, and characterized cis-elements of rice and Arabidopsis. In these research, ABA- and GA-regulated genes have been classified as functional proteins (proteins that probably function in stress or PR tolerance) and regulatory proteins (protein factors involved in further regulation of signal transduction). Comparison between ABA and/or GA-responsive genes in rice and those in Arabidopsis has shown that the cis-element has specificity in each species. cis-Elements for the dehydration-stress response have been specified in Arabidopsis but not in rice. cis-Elements for protein storage are remarkably richer in the upstream regions of the rice gene than in those of Arabidopsis.

  13. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated...

  14. Machine Learning Leveraging Genomes from Metagenomes Identifies Influential Antibiotic Resistance Genes in the Infant Gut Microbiome

    Science.gov (United States)

    Olm, Matthew R.; Morowitz, Michael J.

    2018-01-01

    ABSTRACT Antibiotic resistance in pathogens is extensively studied, and yet little is known about how antibiotic resistance genes of typical gut bacteria influence microbiome dynamics. Here, we leveraged genomes from metagenomes to investigate how genes of the premature infant gut resistome correspond to the ability of bacteria to survive under certain environmental and clinical conditions. We found that formula feeding impacts the resistome. Random forest models corroborated by statistical tests revealed that the gut resistome of formula-fed infants is enriched in class D beta-lactamase genes. Interestingly, Clostridium difficile strains harboring this gene are at higher abundance in formula-fed infants than C. difficile strains lacking this gene. Organisms with genes for major facilitator superfamily drug efflux pumps have higher replication rates under all conditions, even in the absence of antibiotic therapy. Using a machine learning approach, we identified genes that are predictive of an organism’s direction of change in relative abundance after administration of vancomycin and cephalosporin antibiotics. The most accurate results were obtained by reducing annotated genomic data to five principal components classified by boosted decision trees. Among the genes involved in predicting whether an organism increased in relative abundance after treatment are those that encode subclass B2 beta-lactamases and transcriptional regulators of vancomycin resistance. This demonstrates that machine learning applied to genome-resolved metagenomics data can identify key genes for survival after antibiotics treatment and predict how organisms in the gut microbiome will respond to antibiotic administration. IMPORTANCE The process of reconstructing genomes from environmental sequence data (genome-resolved metagenomics) allows unique insight into microbial systems. We apply this technique to investigate how the antibiotic resistance genes of bacteria affect their ability to

  15. Aldehyde Dehydrogenase Gene Superfamily in Populus: Organization and Expression Divergence between Paralogous Gene Pairs.

    Directory of Open Access Journals (Sweden)

    Feng-Xia Tian

    Full Text Available Aldehyde dehydrogenases (ALDHs constitute a superfamily of NAD(P+-dependent enzymes that catalyze the irreversible oxidation of a wide range of reactive aldehydes to their corresponding nontoxic carboxylic acids. ALDHs have been studied in many organisms from bacteria to mammals; however, no systematic analyses incorporating genome organization, gene structure, expression profiles, and cis-acting elements have been conducted in the model tree species Populus trichocarpa thus far. In this study, a comprehensive analysis of the Populus ALDH gene superfamily was performed. A total of 26 Populus ALDH genes were found to be distributed across 12 chromosomes. Genomic organization analysis indicated that purifying selection may have played a pivotal role in the retention and maintenance of PtALDH gene families. The exon-intron organizations of PtALDHs were highly conserved within the same family, suggesting that the members of the same family also may have conserved functionalities. Microarray data and qRT-PCR analysis indicated that most PtALDHs had distinct tissue-specific expression patterns. The specificity of cis-acting elements in the promoter regions of the PtALDHs and the divergence of expression patterns between nine paralogous PtALDH gene pairs suggested that gene duplications may have freed the duplicate genes from the functional constraints. The expression levels of some ALDHs were up- or down-regulated by various abiotic stresses, implying that the products of these genes may be involved in the adaptation of Populus to abiotic stresses. Overall, the data obtained from our investigation contribute to a better understanding of the complexity of the Populus ALDH gene superfamily and provide insights into the function and evolution of ALDH gene families in vascular plants.

  16. Aldehyde Dehydrogenase Gene Superfamily in Populus: Organization and Expression Divergence between Paralogous Gene Pairs.

    Science.gov (United States)

    Tian, Feng-Xia; Zang, Jian-Lei; Wang, Tan; Xie, Yu-Li; Zhang, Jin; Hu, Jian-Jun

    2015-01-01

    Aldehyde dehydrogenases (ALDHs) constitute a superfamily of NAD(P)+-dependent enzymes that catalyze the irreversible oxidation of a wide range of reactive aldehydes to their corresponding nontoxic carboxylic acids. ALDHs have been studied in many organisms from bacteria to mammals; however, no systematic analyses incorporating genome organization, gene structure, expression profiles, and cis-acting elements have been conducted in the model tree species Populus trichocarpa thus far. In this study, a comprehensive analysis of the Populus ALDH gene superfamily was performed. A total of 26 Populus ALDH genes were found to be distributed across 12 chromosomes. Genomic organization analysis indicated that purifying selection may have played a pivotal role in the retention and maintenance of PtALDH gene families. The exon-intron organizations of PtALDHs were highly conserved within the same family, suggesting that the members of the same family also may have conserved functionalities. Microarray data and qRT-PCR analysis indicated that most PtALDHs had distinct tissue-specific expression patterns. The specificity of cis-acting elements in the promoter regions of the PtALDHs and the divergence of expression patterns between nine paralogous PtALDH gene pairs suggested that gene duplications may have freed the duplicate genes from the functional constraints. The expression levels of some ALDHs were up- or down-regulated by various abiotic stresses, implying that the products of these genes may be involved in the adaptation of Populus to abiotic stresses. Overall, the data obtained from our investigation contribute to a better understanding of the complexity of the Populus ALDH gene superfamily and provide insights into the function and evolution of ALDH gene families in vascular plants.

  17. Comparative genomic and transcriptomic analysis of selected fatty acid biosynthesis genes and CNL disease resistance genes in oil palm

    Science.gov (United States)

    Rosli, Rozana; Amiruddin, Nadzirah; Ab Halim, Mohd Amin; Chan, Pek-Lan; Chan, Kuang-Lim; Azizi, Norazah; Morris, Priscilla E.; Leslie Low, Eng-Ti; Ong-Abdullah, Meilina; Sambanthamurthi, Ravigadevi; Singh, Rajinder

    2018-01-01

    Comparative genomics and transcriptomic analyses were performed on two agronomically important groups of genes from oil palm versus other major crop species and the model organism, Arabidopsis thaliana. The first analysis was of two gene families with key roles in regulation of oil quality and in particular the accumulation of oleic acid, namely stearoyl ACP desaturases (SAD) and acyl-acyl carrier protein (ACP) thioesterases (FAT). In both cases, these were found to be large gene families with complex expression profiles across a wide range of tissue types and developmental stages. The detailed classification of the oil palm SAD and FAT genes has enabled the updating of the latest version of the oil palm gene model. The second analysis focused on disease resistance (R) genes in order to elucidate possible candidates for breeding of pathogen tolerance/resistance. Ortholog analysis showed that 141 out of the 210 putative oil palm R genes had homologs in banana and rice. These genes formed 37 clusters with 634 orthologous genes. Classification of the 141 oil palm R genes showed that the genes belong to the Kinase (7), CNL (95), MLO-like (8), RLK (3) and Others (28) categories. The CNL R genes formed eight clusters. Expression data for selected R genes also identified potential candidates for breeding of disease resistance traits. Furthermore, these findings can provide information about the species evolution as well as the identification of agronomically important genes in oil palm and other major crops. PMID:29672525

  18. Comparative genomic and transcriptomic analysis of selected fatty acid biosynthesis genes and CNL disease resistance genes in oil palm.

    Science.gov (United States)

    Rosli, Rozana; Amiruddin, Nadzirah; Ab Halim, Mohd Amin; Chan, Pek-Lan; Chan, Kuang-Lim; Azizi, Norazah; Morris, Priscilla E; Leslie Low, Eng-Ti; Ong-Abdullah, Meilina; Sambanthamurthi, Ravigadevi; Singh, Rajinder; Murphy, Denis J

    2018-01-01

    Comparative genomics and transcriptomic analyses were performed on two agronomically important groups of genes from oil palm versus other major crop species and the model organism, Arabidopsis thaliana. The first analysis was of two gene families with key roles in regulation of oil quality and in particular the accumulation of oleic acid, namely stearoyl ACP desaturases (SAD) and acyl-acyl carrier protein (ACP) thioesterases (FAT). In both cases, these were found to be large gene families with complex expression profiles across a wide range of tissue types and developmental stages. The detailed classification of the oil palm SAD and FAT genes has enabled the updating of the latest version of the oil palm gene model. The second analysis focused on disease resistance (R) genes in order to elucidate possible candidates for breeding of pathogen tolerance/resistance. Ortholog analysis showed that 141 out of the 210 putative oil palm R genes had homologs in banana and rice. These genes formed 37 clusters with 634 orthologous genes. Classification of the 141 oil palm R genes showed that the genes belong to the Kinase (7), CNL (95), MLO-like (8), RLK (3) and Others (28) categories. The CNL R genes formed eight clusters. Expression data for selected R genes also identified potential candidates for breeding of disease resistance traits. Furthermore, these findings can provide information about the species evolution as well as the identification of agronomically important genes in oil palm and other major crops.

  19. Rhizome of life, catastrophes, sequence exchanges, gene creations and giant viruses: How microbial genomics challenges Darwin

    Directory of Open Access Journals (Sweden)

    Vicky eMerhej

    2012-08-01

    Full Text Available Darwin’s theory about the evolution of species has been the object of considerable dispute. In this review, we have described seven key principles in Darwin’s book The Origin of Species and tried to present how genomics challenge each of these concepts and improve our knowledge about evolution. Darwin believed that species evolution consists on a positive directional selection ensuring the survival of the fittest. The most developed state of the species is characterized by increasing complexity. Darwin proposed the theory of descent with modification according to which all species evolve from a single common ancestor through a gradual process of small modification of their vertical inheritance. Finally, the process of evolution can be depicted in the form of a tree. However, microbial genomics showed that evolution is better described as the biological changes over time." The mode of change is not unidirectional and does not necessarily favors advantageous mutations to increase fitness it is rather subject to random selection as a result of catastrophic stochastic processes. Complexity is not necessarily the completion of development: several complex organisms have gone extinct and many microbes including bacteria with intracellular lifestyle have streamlined highly effective genomes. Genomes evolve through large events of gene deletions, duplications, insertions and genomes rearrangements rather than a gradual adaptative process. Genomes are dynamic and chimeric entities with gene repertoires that result from vertical and horizontal acquisitions as well as de novo gene creation. The chimeric character of microbial genomes excludes the possibility of finding a single common ancestor for all the genes recorded currently. Genomes are collections of genes with different evolutionary histories that cannot be represented by a single tree of life. A forest, a network or a rhizome of life may be more accurate to represent evolutionary relationships

  20. The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes.

    Directory of Open Access Journals (Sweden)

    Marion Ouedraogo

    Full Text Available BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on a same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Gene Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Gene Database can be freely accessed through the DGD website at http://dgd.genouest.org.

  1. Evolution of Genome Organization and Epigenetic Machineries ...

    Indian Academy of Sciences (India)

    48

    The compact folded structure of bacterial genomic material is termed as nucleoid. The ... from pluripotent to the differentiated stage of a cell, there is a dramatic alteration both in the .... Sherratt, D.J. 2003 Bacterial Chromosome Dynamics.

  2. The genomic structure of the DMBT1 gene

    DEFF Research Database (Denmark)

    Mollenhauer, J; Holmskov, U; Wiemann, S

    1999-01-01

    Increasing evidence has accumulated for an involvement of the inactivation of tumour suppressor genes at chromosome 10q in the carcinogenesis of brain tumours, melanomas, and carcinomas of the lung, the prostate, the pancreas, and the endometrium. The gene DMBT1 (Deleted in Malignant Brain Tumours...... 1) is located at chromosome 10q25.3-q26.1, within one of the putative intervals for tumour suppressor genes. DMBT1 is a member of the scavenger-receptor cysteine-rich (SRCR) superfamily and displays homozygous deletions or lack of expression in glioblastoma multiforme, medulloblastoma......, and in gastrointestinal and lung cancers. Based on these properties, DMBT1 has been proposed to be a candidate tumour suppressor gene. We have determined the genomic sequence of DMBT1 to allow analyses of mutations. The gene has at least 54 exons that span a genomic region of about 80 kb. We have identified a putative...

  3. Transcriptional interference networks coordinate the expression of functionally related genes clustered in the same genomic loci.

    Science.gov (United States)

    Boldogköi, Zsolt

    2012-01-01

    The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too.

  4. Approaching the Sequential and Three-Dimensional Organization of Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2006-01-01

    textabstractGenomes are one of the major foundations of life due to their role in information storage, process regulation and evolution. To achieve a deeper unterstanding of the human genome the three-dimensional organization of the human cell nucleus, the structural-, scaling- and dynamic

  5. Identification of neural outgrowth genes using genome-wide RNAi.

    Directory of Open Access Journals (Sweden)

    Katharine J Sepp

    2008-07-01

    Full Text Available While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new

  6. Comparative genome sequencing of drosophila pseudoobscura: Chromosomal, gene and cis-element evolution

    Energy Technology Data Exchange (ETDEWEB)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2004-04-01

    The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome wide average consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a picture of repeat mediated chromosomal rearrangement, and high co-adaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.

  7. CTCF Mediates the Cell-Type Specific Spatial Organization of the Kcnq5 Locus and the Local Gene Regulation

    OpenAIRE

    Ren, Licheng; Wang, Yang; Shi, Minglei; Wang, Xiaoning; Yang, Zhong; Zhao, Zhihu

    2012-01-01

    Chromatin loops play important roles in the dynamic spatial organization of genes in the nucleus. Growing evidence has revealed that the multivalent functional zinc finger protein CCCTC-binding factor (CTCF) is a master regulator of genome spatial organization, and mediates the ubiquitous chromatin loops within the genome. Using circular chromosome conformation capture (4C) methodology, we discovered that CTCF may be a master organizer in mediating the spatial organization of the kcnq5 gene l...

  8. Genomic Organization of Zebrafish microRNAs

    Directory of Open Access Journals (Sweden)

    Paydar Ima

    2008-05-01

    Full Text Available Abstract Background microRNAs (miRNAs are small (~22 nt non-coding RNAs that regulate cell movement, specification, and development. Expression of miRNAs is highly regulated, both spatially and temporally. Based on direct cloning, sequence conservation, and predicted secondary structures, a large number of miRNAs have been identified in higher eukaryotic genomes but whether these RNAs are simply a subset of a much larger number of noncoding RNA families is unknown. This is especially true in zebrafish where genome sequencing and annotation is not yet complete. Results We analyzed the zebrafish genome to identify the number and location of proven and predicted miRNAs resulting in the identification of 35 new miRNAs. We then grouped all 415 zebrafish miRNAs into families based on seed sequence identity as a means to identify possible functional redundancy. Based on genomic location and expression analysis, we also identified those miRNAs that are likely to be encoded as part of polycistronic transcripts. Lastly, as a resource, we compiled existing zebrafish miRNA expression data and, where possible, listed all experimentally proven mRNA targets. Conclusion Current analysis indicates the zebrafish genome encodes 415 miRNAs which can be grouped into 44 families. The largest of these families (the miR-430 family contains 72 members largely clustered in two main locations along chromosome 4. Thus far, most zebrafish miRNAs exhibit tissue specific patterns of expression.

  9. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    Science.gov (United States)

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provide guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.

  10. The Most Developmentally Truncated Fishes Show Extensive Hox Gene Loss and Miniaturized Genomes

    Science.gov (United States)

    Malmstrøm, Martin; Britz, Ralf; Matschiner, Michael; Tørresen, Ole K; Hadiaty, Renny Kurnia; Yaakob, Norsham; Tan, Heok Hui; Jakobsen, Kjetill Sigurd; Salzburger, Walter; Rüber, Lukas

    2018-01-01

    Abstract The world’s smallest fishes belong to the genus Paedocypris. These miniature fishes are endemic to an extreme habitat: the peat swamp forests in Southeast Asia, characterized by highly acidic blackwater. This threatened habitat is home to a large array of fishes, including a number of miniaturized but also developmentally truncated species. Especially the genus Paedocypris is characterized by profound, organism-wide developmental truncation, resulting in sexually mature individuals of <8 mm in length with a larval phenotype. Here, we report on evolutionary simplification in the genomes of two species of the dwarf minnow genus Paedocypris using whole-genome sequencing. The two species feature unprecedented Hox gene loss and genome reduction in association with their massive developmental truncation. We also show how other genes involved in the development of musculature, nervous system, and skeleton have been lost in Paedocypris, mirroring its highly progenetic phenotype. Further, our analyses suggest two mechanisms responsible for the genome streamlining in Paedocypris in relation to other Cypriniformes: severe intron shortening and reduced repeat content. As the first report on the genomic sequence of a vertebrate species with organism-wide developmental truncation, the results of our work enhance our understanding of genome evolution and how genotypes are translated to phenotypes. In addition, as a naturally simplified system closely related to zebrafish, Paedocypris provides novel insights into vertebrate development. PMID:29684203

  11. The Most Developmentally Truncated Fishes Show Extensive Hox Gene Loss and Miniaturized Genomes.

    Science.gov (United States)

    Malmstrøm, Martin; Britz, Ralf; Matschiner, Michael; Tørresen, Ole K; Hadiaty, Renny Kurnia; Yaakob, Norsham; Tan, Heok Hui; Jakobsen, Kjetill Sigurd; Salzburger, Walter; Rüber, Lukas

    2018-04-01

    The world's smallest fishes belong to the genus Paedocypris. These miniature fishes are endemic to an extreme habitat: the peat swamp forests in Southeast Asia, characterized by highly acidic blackwater. This threatened habitat is home to a large array of fishes, including a number of miniaturized but also developmentally truncated species. Especially the genus Paedocypris is characterized by profound, organism-wide developmental truncation, resulting in sexually mature individuals of <8 mm in length with a larval phenotype. Here, we report on evolutionary simplification in the genomes of two species of the dwarf minnow genus Paedocypris using whole-genome sequencing. The two species feature unprecedented Hox gene loss and genome reduction in association with their massive developmental truncation. We also show how other genes involved in the development of musculature, nervous system, and skeleton have been lost in Paedocypris, mirroring its highly progenetic phenotype. Further, our analyses suggest two mechanisms responsible for the genome streamlining in Paedocypris in relation to other Cypriniformes: severe intron shortening and reduced repeat content. As the first report on the genomic sequence of a vertebrate species with organism-wide developmental truncation, the results of our work enhance our understanding of genome evolution and how genotypes are translated to phenotypes. In addition, as a naturally simplified system closely related to zebrafish, Paedocypris provides novel insights into vertebrate development.

  12. GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes.

    Science.gov (United States)

    Yuan, Lina; Yu, Yang; Zhu, Yanmin; Li, Yulai; Li, Changqing; Li, Rujiao; Ma, Qin; Siu, Gilman Kit-Hang; Yu, Jun; Jiang, Taijiao; Xiao, Jingfa; Kang, Yu

    2017-01-25

    Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, highly fragmented assemblies due to short reads from NGS are still a limiting factor in gaining insights into the genome biology. Reference-assisted tools are promising in genome assembly, but tend to result in false assembly when the assigned reference has extensive rearrangements. Herein, we present GAAP, a genome assembly pipeline for scaffolding based on core-gene-defined Genome Organizational Framework (cGOF) described in our previous study. Instead of assigning references, we use the multiple-reference-derived cGOFs as indexes to assist in order and orientation of the scaffolds and build a skeleton structure, and then use read pairs to extend scaffolds, called local scaffolding, and distinguish between true and chimeric adjacencies in the scaffolds. In our performance tests using both empirical and simulated data of 15 genomes in six species with diverse genome size, complexity, and all three categories of cGOFs, GAAP outcompetes or achieves comparable results when compared to three other reference-assisted programs, AlignGraph, Ragout and MeDuSa. GAAP uses both cGOF and pair-end reads to create assemblies in genomic scale, and performs better than the currently available reference-assisted assembly tools as it recovers more assemblies and makes fewer false locations, especially for species with extensive rearranged genomes. Our method is a promising solution for reconstruction of genome sequence from short reads of NGS.

  13. Gene prediction in the fathead minnow [Pimephales promelas] genome

    Science.gov (United States)

    The fathead minnow is a well-established model organism which has been widely used for regulatory ecotoxicity testing and research for over half century. While much information has been gathered on the organism over the years, the fathead minnow genome, a critical source of infor...

  14. Sugar Lego: gene composition of bacterial carbohydrate metabolism genomic loci.

    Science.gov (United States)

    Kaznadzey, Anna; Shelyakin, Pavel; Gelfand, Mikhail S

    2017-11-25

    Bacterial carbohydrate metabolism is extremely diverse, since carbohydrates serve as a major energy source and are involved in a variety of cellular processes. Bacterial genes belonging to same metabolic pathway are often co-localized in the chromosome, but it is not a strict rule. Gene co-localization in linked to co-evolution and co-regulation. This study focuses on a large-scale analysis of bacterial genomic loci related to the carbohydrate metabolism. We demonstrate that only 53% of 148,000 studied genes from over six hundred bacterial genomes are co-localized in bacterial genomes with other carbohydrate metabolism genes, which points to a significant role of singleton genes. Co-localized genes form cassettes, ranging in size from two to fifteen genes. Two major factors influencing the cassette-forming tendency are gene function and bacterial phylogeny. We have obtained a comprehensive picture of co-localization preferences of genes for nineteen major carbohydrate metabolism functional classes, over two hundred gene orthologous clusters, and thirty bacterial classes, and characterized the cassette variety in size and content among different species, highlighting a significant role of short cassettes. The preference towards co-localization of carbohydrate metabolism genes varies between 40 and 76% for bacterial taxa. Analysis of frequently co-localized genes yielded forty-five significant pairwise links between genes belonging to different functional classes. The number of such links per class range from zero to eight, demonstrating varying preferences of respective genes towards a specific chromosomal neighborhood. Genes from eleven functional classes tend to co-localize with genes from the same class, indicating an important role of clustering of genes with similar functions. At that, in most cases such co-localization does not originate from local duplication events. Overall, we describe a complex web formed by evolutionary relationships of bacterial

  15. Characterization and distribution of repetitive elements in association with genes in the human genome.

    Science.gov (United States)

    Liang, Kai-Chiang; Tseng, Joseph T; Tsai, Shaw-Jenq; Sun, H Sunny

    2015-08-01

    Repetitive elements constitute more than 50% of the human genome. Recent studies implied that the complexity of living organisms is not just a direct outcome of a number of coding sequences; the repetitive elements, which do not encode proteins, may also play a significant role. Though scattered studies showed that repetitive elements in the regulatory regions of a gene control gene expression, no systematic survey has been done to report the characterization and distribution of various types of these repetitive elements in the human genome. Sequences from 5' and 3' untranslated regions and upstream and downstream of a gene were downloaded from the Ensembl database. The repetitive elements in the neighboring of each gene were identified and classified using cross-matching implemented in the RepeatMasker. The annotation and distribution of distinct classes of repetitive elements associated with individual gene were collected to characterize genes in association with different types of repetitive elements using systems biology program. We identified a total of 1,068,400 repetitive elements which belong to 37-class families and 1235 subclasses that are associated with 33,761 genes and 57,365 transcripts. In addition, we found that the tandem repeats preferentially locate proximal to the transcription start site (TSS) of genes and the major function of these genes are involved in developmental processes. On the other hand, interspersed repetitive elements showed a tendency to be accumulated at distal region from the TSS and the function of interspersed repeat-containing genes took part in the catabolic/metabolic processes. Results from the distribution analysis were collected and used to construct a gene-based repetitive element database (GBRED; http://www.binfo.ncku.edu.tw/GBRED/index.html). A user-friendly web interface was designed to provide the information of repetitive elements associated with any particular gene(s). This is the first study focusing on the gene

  16. Differential retention of metabolic genes following whole-genome duplication.

    Science.gov (United States)

    Gout, Jean-François; Duret, Laurent; Kahn, Daniel

    2009-05-01

    Classical studies in Metabolic Control Theory have shown that metabolic fluxes usually exhibit little sensitivity to changes in individual enzyme activity, yet remain sensitive to global changes of all enzymes in a pathway. Therefore, little selective pressure is expected on the dosage or expression of individual metabolic genes, yet entire pathways should still be constrained. However, a direct estimate of this selective pressure had not been evaluated. Whole-genome duplications (WGDs) offer a good opportunity to address this question by analyzing the fates of metabolic genes during the massive gene losses that follow. Here, we take advantage of the successive rounds of WGD that occurred in the Paramecium lineage. We show that metabolic genes exhibit different gene retention patterns than nonmetabolic genes. Contrary to what was expected for individual genes, metabolic genes appeared more retained than other genes after the recent WGD, which was best explained by selection for gene expression operating on entire pathways. Metabolic genes also tend to be less retained when present at high copy number before WGD, contrary to other genes that show a positive correlation between gene retention and preduplication copy number. This is rationalized on the basis of the classical concave relationship relating metabolic fluxes with enzyme expression.

  17. In-silico human genomics with GeneCards

    Directory of Open Access Journals (Sweden)

    Stelzer Gil

    2011-10-01

    Full Text Available Abstract Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org. This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on inhouse and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.

  18. Genetic addiction: selfish gene's strategy for symbiosis in the genome.

    Science.gov (United States)

    Mochizuki, Atsushi; Yahara, Koji; Kobayashi, Ichizo; Iwasa, Yoh

    2006-02-01

    The evolution and maintenance of the phenomenon of postsegregational host killing or genetic addiction are paradoxical. In this phenomenon, a gene complex, once established in a genome, programs death of a host cell that has eliminated it. The intact form of the gene complex would survive in other members of the host population. It is controversial as to why these genetic elements are maintained, due to the lethal effects of host killing, or perhaps some other properties are beneficial to the host. We analyzed their population dynamics by analytical methods and computer simulations. Genetic addiction turned out to be advantageous to the gene complex in the presence of a competitor genetic element. The advantage is, however, limited in a population without spatial structure, such as that in a well-mixed liquid culture. In contrast, in a structured habitat, such as the surface of a solid medium, the addiction gene complex can increase in frequency, irrespective of its initial density. Our demonstration that genomes can evolve through acquisition of addiction genes has implications for the general question of how a genome can evolve as a community of potentially selfish genes.

  19. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    Energy Technology Data Exchange (ETDEWEB)

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  20. Gene organization in rice revealed by full-length cDNA mapping and gene expression analysis through microarray.

    Directory of Open Access Journals (Sweden)

    Kouji Satoh

    Full Text Available Rice (Oryza sativa L. is a model organism for the functional genomics of monocotyledonous plants since the genome size is considerably smaller than those of other monocotyledonous plants. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA sequences are also indispensable for comprehensive analyses of gene structure and function. We cross-referenced 28.5K individual loci in the rice genome defined by mapping of 578K FL-cDNA clones with the 56K loci predicted in the TIGR genome assembly. Based on the annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE genes, 33K annotated non-expressed (ANE genes, and 5.5K non-annotated expressed (NAE genes. We developed a 60mer oligo-array for analysis of gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes were considerably different from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcription activity of the corresponding genetic locus, although other factors may also have an effect. Comparison of the coverage of FL-cDNA among gene families suggested that FL-cDNA from genes encoding rice- or eukaryote-specific domains, and those involved in regulatory functions were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcription activity and gene structure, and that the coverage bias of FL-cDNA clones exists due to the incompatibility of certain eukaryotic genes in bacteria.

  1. Gene therapy and genome surgery in the retina.

    Science.gov (United States)

    DiCarlo, James E; Mahajan, Vinit B; Tsang, Stephen H

    2018-06-01

    Precision medicine seeks to treat disease with molecular specificity. Advances in genome sequence analysis, gene delivery, and genome surgery have allowed clinician-scientists to treat genetic conditions at the level of their pathology. As a result, progress in treating retinal disease using genetic tools has advanced tremendously over the past several decades. Breakthroughs in gene delivery vectors, both viral and nonviral, have allowed the delivery of genetic payloads in preclinical models of retinal disorders and have paved the way for numerous successful clinical trials. Moreover, the adaptation of CRISPR-Cas systems for genome engineering have enabled the correction of both recessive and dominant pathogenic alleles, expanding the disease-modifying power of gene therapies. Here, we highlight the translational progress of gene therapy and genome editing of several retinal disorders, including RPE65-, CEP290-, and GUY2D-associated Leber congenital amaurosis, as well as choroideremia, achromatopsia, Mer tyrosine kinase- (MERTK-) and RPGR X-linked retinitis pigmentosa, Usher syndrome, neovascular age-related macular degeneration, X-linked retinoschisis, Stargardt disease, and Leber hereditary optic neuropathy.

  2. Genomic dissection and prioritizing of candidate genes of QTL for ...

    Indian Academy of Sciences (India)

    of Anatomy and Neurobiology, University of Tennessee Health Science Center, Memphis, TN 38163, USA. 5Mudanjiang ..... Fragile X mental retardation gene 1,. −2.1 ... stimulus/stress and signalling associated with acute-phase response were .... This work was supported by the Center of Genomics and Bioinfor- matics and ...

  3. Re-Examining the Gene in Personalized Genomics

    Science.gov (United States)

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  4. Gene hunting: molecular analysis of the chicken genome

    NARCIS (Netherlands)

    Crooijmans, R.P.M.A.

    2000-01-01

    This dissertation describes the development of molecular tools to identify genes that are involved in production and health traits in poultry. To unravel the chicken genome, fluorescent molecular markers (microsatellite markers) were developed and optimized to perform high throughput

  5. Genomic dissection and prioritizing of candidate genes of QTL for ...

    Indian Academy of Sciences (India)

    Genomic dissection and prioritizing of candidate genes of QTL for regulating spontaneous arthritis on chromosome 1 in mice deficient for interleukin-1 receptor antagonist. Yanhong Cao, Jifei Zhang, Yan Jiao, Jian Yan, Feng Jiao, XiaoYun Liu, Robert W. Williams, Karen A. Hasty,. John M. Stuart and Weikuan Gu. J. Genet.

  6. Comparative genomics and transcriptomics of trait-gene association

    Directory of Open Access Journals (Sweden)

    Pierlé Sebastián

    2012-11-01

    Full Text Available Abstract Background The Order Rickettsiales includes important tick-borne pathogens, from Rickettsia rickettsii, which causes Rocky Mountain spotted fever, to Anaplasma marginale, the most prevalent vector-borne pathogen of cattle. Although most pathogens in this Order are transmitted by arthropod vectors, little is known about the microbial determinants of transmission. A. marginale provides unique tools for studying the determinants of transmission, with multiple strain sequences available that display distinct and reproducible transmission phenotypes. The closed core A. marginale genome suggests that any phenotypic differences are due to single nucleotide polymorphisms (SNPs. We combined DNA/RNA comparative genomic approaches using strains with different tick transmission phenotypes and identified genes that segregate with transmissibility. Results Comparison of seven strains with different transmission phenotypes generated a list of SNPs affecting 18 genes and nine promoters. Transcriptional analysis found two candidate genes downstream from promoter SNPs that were differentially transcribed. To corroborate the comparative genomics approach we used three RNA-seq platforms to analyze the transcriptomes from two A. marginale strains with different transmission phenotypes. RNA-seq analysis confirmed the comparative genomics data and found 10 additional genes whose transcription between strains with distinct transmission efficiencies was significantly different. Six regions of the genome that contained no annotation were found to be transcriptionally active, and two of these newly identified transcripts were differentially transcribed. Conclusions This approach identified 30 genes and two novel transcripts potentially involved in tick transmission. We describe the transcriptome of an obligate intracellular bacterium in depth, while employing massive parallel sequencing to dissect an important trait in bacterial pathogenesis.

  7. Sampling the genomic pool of protein tyrosine kinase genes using the polymerase chain reaction with genomic DNA.

    Science.gov (United States)

    Oates, A C; Wollberg, P; Achen, M G; Wilks, A F

    1998-08-28

    The polymerase chain reaction (PCR), with cDNA as template, has been widely used to identify members of protein families from many species. A major limitation of using cDNA in PCR is that detection of a family member is dependent on temporal and spatial patterns of gene expression. To circumvent this restriction, and in order to develop a technique that is broadly applicable we have tested the use of genomic DNA as PCR template to identify members of protein families in an expression-independent manner. This test involved amplification of DNA encoding protein tyrosine kinase (PTK) genes from the genomes of three animal species that are well known development models; namely, the mouse Mus musculus, the fruit fly Drosophila melanogaster, and the nematode worm Caenorhabditis elegans. Ten PTK genes were identified from the mouse, 13 from the fruit fly, and 13 from the nematode worm. Among these kinases were 13 members of the PTK family that had not been reported previously. Selected PTKs from this screen were shown to be expressed during development, demonstrating that the amplified fragments did not arise from pseudogenes. This approach will be useful for the identification of many novel members of gene families in organisms of agricultural, medical, developmental and evolutionary significance and for analysis of gene families from any species, or biological sample whose habitat precludes the isolation of mRNA. Furthermore, as a tool to hasten the discovery of members of gene families that are of particular interest, this method offers an opportunity to sample the genome for new members irrespective of their expression pattern.

  8. Major soybean maturity gene haplotypes revealed by SNPViz analysis of 72 sequenced soybean genomes.

    Directory of Open Access Journals (Sweden)

    Tiffany Langewisch

    Full Text Available In this Genomics Era, vast amounts of next-generation sequencing data have become publicly available for multiple genomes across hundreds of species. Analyses of these large-scale datasets can become cumbersome, especially when comparing nucleotide polymorphisms across many samples within a dataset and among different datasets or organisms. To facilitate the exploration of allelic variation and diversity, we have developed and deployed an in-house computer software to categorize and visualize these haplotypes. The SNPViz software enables users to analyze region-specific haplotypes from single nucleotide polymorphism (SNP datasets for different sequenced genomes. The examination of allelic variation and diversity of important soybean [Glycine max (L. Merr.] flowering time and maturity genes may provide additional insight into flowering time regulation and enhance researchers' ability to target soybean breeding for particular environments. For this study, we utilized two available soybean genomic datasets for a total of 72 soybean genotypes encompassing cultivars, landraces, and the wild species Glycine soja. The major soybean maturity genes E1, E2, E3, and E4 along with the Dt1 gene for plant growth architecture were analyzed in an effort to determine the number of major haplotypes for each gene, to evaluate the consistency of the haplotypes with characterized variant alleles, and to identify evidence of artificial selection. The results indicated classification of a small number of predominant haplogroups for each gene and important insights into possible allelic diversity for each gene within the context of known causative mutations. The software has both a stand-alone and web-based version and can be used to analyze other genes, examine additional soybean datasets, and view similar genome sequence and SNP datasets from other species.

  9. LEMONS - A Tool for the Identification of Splice Junctions in Transcriptomes of Organisms Lacking Reference Genomes.

    Directory of Open Access Journals (Sweden)

    Liron Levin

    Full Text Available RNA-seq is becoming a preferred tool for genomics studies of model and non-model organisms. However, DNA-based analysis of organisms lacking sequenced genomes cannot rely on RNA-seq data alone to isolate most genes of interest, as DNA codes both exons and introns. With this in mind, we designed a novel tool, LEMONS, that exploits the evolutionary conservation of both exon/intron boundary positions and splice junction recognition signals to produce high throughput splice-junction predictions in the absence of a reference genome. When tested on multiple annotated vertebrate mRNA data, LEMONS accurately identified 87% (average of the splice-junctions. LEMONS was then applied to our updated Mediterranean chameleon transcriptome, which lacks a reference genome, and predicted a total of 90,820 exon-exon junctions. We experimentally verified these splice-junction predictions by amplifying and sequencing twenty randomly selected genes from chameleon DNA templates. Exons and introns were detected in 19 of 20 of the positions predicted by LEMONS. To the best of our knowledge, LEMONS is currently the only experimentally verified tool that can accurately predict splice-junctions in organisms that lack a reference genome.

  10. Genes Important for Schizosaccharomyces pombe Meiosis Identified Through a Functional Genomics Screen

    Science.gov (United States)

    Blyth, Julie; Makrantoni, Vasso; Barton, Rachael E.; Spanos, Christos; Rappsilber, Juri; Marston, Adele L.

    2018-01-01

    Meiosis is a specialized cell division that generates gametes, such as eggs and sperm. Errors in meiosis result in miscarriages and are the leading cause of birth defects; however, the molecular origins of these defects remain unknown. Studies in model organisms are beginning to identify the genes and pathways important for meiosis, but the parts list is still poorly defined. Here we present a comprehensive catalog of genes important for meiosis in the fission yeast, Schizosaccharomyces pombe. Our genome-wide functional screen surveyed all nonessential genes for roles in chromosome segregation and spore formation. Novel genes important at distinct stages of the meiotic chromosome segregation and differentiation program were identified. Preliminary characterization implicated three of these genes in centrosome/spindle pole body, centromere, and cohesion function. Our findings represent a near-complete parts list of genes important for meiosis in fission yeast, providing a valuable resource to advance our molecular understanding of meiosis. PMID:29259000

  11. cDNA structure, genomic organization and expression patterns of ...

    African Journals Online (AJOL)

    Visfatin was a newly identified adipocytokine, which was involved in various physiologic and pathologic processes of organisms. The cDNA structure, genomic organization and expression patterns of silver Prussian carp visfatin were described in this report. The silver Prussian carp visfatin cDNA cloned from the liver was ...

  12. New genes expressed in human brains: implications for annotating evolving genomes.

    Science.gov (United States)

    Zhang, Yong E; Landback, Patrick; Vibranovski, Maria; Long, Manyuan

    2012-11-01

    New genes have frequently formed and spread to fixation in a wide variety of organisms, constituting abundant sets of lineage-specific genes. It was recently reported that an excess of primate-specific and human-specific genes were upregulated in the brains of fetuses and infants, and especially in the prefrontal cortex, which is involved in cognition. These findings reveal the prevalent addition of new genetic components to the transcriptome of the human brain. More generally, these findings suggest that genomes are continually evolving in both sequence and content, eroding the conservation endowed by common ancestry. Despite increasing recognition of the importance of new genes, we highlight here that these genes are still seriously under-characterized in functional studies and that new gene annotation is inconsistent in current practice. We propose an integrative approach to annotate new genes, taking advantage of functional and evolutionary genomic methods. We finally discuss how the refinement of new gene annotation will be important for the detection of evolutionary forces governing new gene origination. Copyright © 2012 WILEY Periodicals, Inc.

  13. Azolla--a model organism for plant genomic studies.

    Science.gov (United States)

    Qiu, Yin-Long; Yu, Jun

    2003-02-01

    The aquatic ferns of the genus Azolla are nitrogen-fixing plants that have great potentials in agricultural production and environmental conservation. Azolla in many aspects is qualified to serve as a model organism for genomic studies because of its importance in agriculture, its unique position in plant evolution, its symbiotic relationship with the N2-fixing cyanobacterium, Anabaena azollae, and its moderate-sized genome. The goals of this genome project are not only to understand the biology of the Azolla genome to promote its applications in biological research and agriculture practice but also to gain critical insights about evolution of plant genomes. Together with the strategic and technical improvement as well as cost reduction of DNA sequencing, the deciphering of their genetic code is imminent.

  14. Genome-wide identification of key modulators of gene-gene interaction networks in breast cancer.

    Science.gov (United States)

    Chiu, Yu-Chiao; Wang, Li-Ju; Hsiao, Tzu-Hung; Chuang, Eric Y; Chen, Yidong

    2017-10-03

    With the advances in high-throughput gene profiling technologies, a large volume of gene interaction maps has been constructed. A higher-level layer of gene-gene interaction, namely modulate gene interaction, is composed of gene pairs of which interaction strengths are modulated by (i.e., dependent on) the expression level of a key modulator gene. Systematic investigations into the modulation by estrogen receptor (ER), the best-known modulator gene, have revealed the functional and prognostic significance in breast cancer. However, a genome-wide identification of key modulator genes that may further unveil the landscape of modulated gene interaction is still lacking. We proposed a systematic workflow to screen for key modulators based on genome-wide gene expression profiles. We designed four modularity parameters to measure the ability of a putative modulator to perturb gene interaction networks. Applying the method to a dataset of 286 breast tumors, we comprehensively characterized the modularity parameters and identified a total of 973 key modulator genes. The modularity of these modulators was verified in three independent breast cancer datasets. ESR1, the encoding gene of ER, appeared in the list, and abundant novel modulators were illuminated. For instance, a prognostic predictor of breast cancer, SFRP1, was found the second modulator. Functional annotation analysis of the 973 modulators revealed involvements in ER-related cellular processes as well as immune- and tumor-associated functions. Here we present, as far as we know, the first comprehensive analysis of key modulator genes on a genome-wide scale. The validity of filtering parameters as well as the conservativity of modulators among cohorts were corroborated. Our data bring new insights into the modulated layer of gene-gene interaction and provide candidates for further biological investigations.

  15. Comparative genome analysis of PHB gene family reveals deep evolutionary origins and diverse gene function.

    Science.gov (United States)

    Di, Chao; Xu, Wenying; Su, Zhen; Yuan, Joshua S

    2010-10-07

    PHB (Prohibitin) gene family is involved in a variety of functions important for different biological processes. PHB genes are ubiquitously present in divergent species from prokaryotes to eukaryotes. Human PHB genes have been found to be associated with various diseases. Recent studies by our group and others have shown diverse function of PHB genes in plants for development, senescence, defence, and others. Despite the importance of the PHB gene family, no comprehensive gene family analysis has been carried to evaluate the relatedness of PHB genes across different species. In order to better guide the gene function analysis and understand the evolution of the PHB gene family, we therefore carried out the comparative genome analysis of the PHB genes across different kingdoms. The relatedness, motif distribution, and intron/exon distribution all indicated that PHB genes is a relatively conserved gene family. The PHB genes can be classified into 5 classes and each class have a very deep evolutionary origin. The PHB genes within the class maintained the same motif patterns during the evolution. With Arabidopsis as the model species, we found that PHB gene intron/exon structure and domains are also conserved during the evolution. Despite being a conserved gene family, various gene duplication events led to the expansion of the PHB genes. Both segmental and tandem gene duplication were involved in Arabidopsis PHB gene family expansion. However, segmental duplication is predominant in Arabidopsis. Moreover, most of the duplicated genes experienced neofunctionalization. The results highlighted that PHB genes might be involved in important functions so that the duplicated genes are under the evolutionary pressure to derive new function. PHB gene family is a conserved gene family and accounts for diverse but important biological functions based on the similar molecular mechanisms. The highly diverse biological function indicated that more research needs to be carried out

  16. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

    Science.gov (United States)

    Hiscock, D; Upton, C

    2000-05-01

    The Viral Genome DataBase (VGDB) contains detailed information of the genes and predicted protein sequences from 15 completely sequenced genomes of large (&100 kb) viruses (2847 genes). The data that is stored includes DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a mySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .

  17. Genomic organization of a vancomycin-resistant staphylococcus aureus

    International Nuclear Information System (INIS)

    Mirani, A.Z.; Jamil, N.

    2013-01-01

    Objective: To study the genomic organization of vancomycin resistance in a local isolate of vancomycin resistant Staphylococcus aureus (VRSA). Study Design: Experimental study. Place and Duration of Study: Department of Microbiology, University of Karachi, January 2008 through December 2010. Methodology: A vancomycin-resistant Staphylococcus aureus (VRSA-CP2) isolate (MIC 16 mu g/ml) was isolated from a local hospital of Karachi. Species identification was confirmed by Gram staining, standard biochemical tests and PCR amplification of the nuc gene. The vancomycin MIC was re-confirmed by E-test. For the genetic determination of vancomycin resistance, in-vitro amplification of vanA cassette was performed by using plasmid DNA of CP2, CP2's transformant as template on MWG Thermo-Cycler. Amplified products of vanR, vanS, vanH, vanA, vanY, orf2, orf1D, orf2E, orf-Rev and IS element genes were subjected to Sanger's electrophoresis based sequence determination using specific primers. The Basic Local Alignment Search Tool (BLAST) algorithm was used to identify sequences in GenBank with similarities to the vanA cassette genes. Results: The vancomycin-resistant isolate CP2 was found to be resistant to oxacillin, chloramphenicol, erythromycin, rifampicin, gentamicin, tetracycline and ciprofloxacin, as well. The isolate CP2 revealed four bands: one of large molecular size approx 56.4 kb and three of small size approx 6.5 kb, approx 6.1 kb and approx 1.5 kb by agarose gel electrophoresis indicating the presence of 3 plasmids. The plasmid DNA of isolate CP2 was analyzed by PCR for the presence of the van cassettes with each of the vanA , vanB and vanC specific primers. It carried vanA cassette, which comprises of vanR, vanS, vanH, vanA, vanY, and orf2. The vanA cassette of isolate CP2 also carried an insertion element (IS). However, it did not show the PCR product for orf1. Vancomycin resistance was successfully transferred from the donor CP2 to a vancomycin-sensitive recipient S

  18. Gene Conversion in Angiosperm Genomes with an Emphasis on Genes Duplicated by Polyploidization

    Directory of Open Access Journals (Sweden)

    Xi-Yin Wang

    2011-01-01

    Full Text Available Angiosperm genomes differ from those of mammals by extensive and recursive polyploidizations. The resulting gene duplication provides opportunities both for genetic innovation, and for concerted evolution. Though most genes may escape conversion by their homologs, concerted evolution of duplicated genes can last for millions of years or longer after their origin. Indeed, paralogous genes on two rice chromosomes duplicated an estimated 60–70 million years ago have experienced gene conversion in the past 400,000 years. Gene conversion preserves similarity of paralogous genes, but appears to accelerate their divergence from orthologous genes in other species. The mutagenic nature of recombination coupled with the buffering effect provided by gene redundancy, may facilitate the evolution of novel alleles that confer functional innovations while insulating biological fitness of affected plants. A mixed evolutionary model, characterized by a primary birth-and-death process and occasional homoeologous recombination and gene conversion, may best explain the evolution of multigene families.

  19. Genome-wide associations of gene expression variation in humans.

    Directory of Open Access Journals (Sweden)

    Barbara E Stranger

    2005-12-01

    Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis- to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.

  20. Genome-Wide Associations of Gene Expression Variation in Humans.

    Directory of Open Access Journals (Sweden)

    2005-12-01

    Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis- to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.

  1. Occupancy of chromatin organizers in the Epstein-Barr virus genome.

    Science.gov (United States)

    Holdorf, Meghan M; Cooper, Samantha B; Yamamoto, Keith R; Miranda, J J L

    2011-06-20

    The human CCCTC-binding factor, CTCF, regulates transcription of the double-stranded DNA genomes of herpesviruses. The architectural complex cohesin and RNA Polymerase II also contribute to this organization. We profiled the occupancy of CTCF, cohesin, and RNA Polymerase II on the episomal genome of the Epstein-Barr virus in a cell culture model of latent infection. CTCF colocalizes with cohesin but not RNA Polymerase II. CTCF and cohesin bind specific sequences throughout the genome that are found not just proximal to the regulatory elements of latent genes, but also near lytic genes. In addition to tracking with known transcripts, RNA Polymerase II appears at two unannotated positions, one of which lies within the latent origin of replication. The widespread occupancy profile of each protein reveals binding near or at a myriad of regulatory elements and suggests context-dependent functions. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Diversity of 23S rRNA genes within individual prokaryotic genomes.

    Directory of Open Access Journals (Sweden)

    Anna Pei

    Full Text Available BACKGROUND: The concept of ribosomal constraints on rRNA genes is deduced primarily based on the comparison of consensus rRNA sequences between closely related species, but recent advances in whole-genome sequencing allow evaluation of this concept within organisms with multiple rRNA operons. METHODOLOGY/PRINCIPAL FINDINGS: Using the 23S rRNA gene as an example, we analyzed the diversity among individual rRNA genes within a genome. Of 184 prokaryotic species containing multiple 23S rRNA genes, diversity was observed in 113 (61.4% genomes (mean 0.40%, range 0.01%-4.04%. Significant (1.17%-4.04% intragenomic variation was found in 8 species. In 5 of the 8 species, the diversity in the primary structure had only minimal effect on the secondary structure (stem versus loop transition. In the remaining 3 species, the diversity significantly altered local secondary structure, but the alteration appears minimized through complex rearrangement. Intervening sequences (IVS, ranging between 9 and 1471 nt in size, were found in 7 species. IVS in Deinococcus radiodurans and Nostoc sp. encode transposases. T. tengcongensis was the only species in which intragenomic diversity >3% was observed among 4 paralogous 23S rRNA genes. CONCLUSIONS/SIGNIFICANCE: These findings indicate tight ribosomal constraints on individual 23S rRNA genes within a genome. Although classification using primary 23S rRNA sequences could be erroneous, significant diversity among paralogous 23S rRNA genes was observed only once in the 184 species analyzed, indicating little overall impact on the mainstream of 23S rRNA gene-based prokaryotic taxonomy.

  3. Evolutionary maintenance of filovirus-like genes in bat genomes

    Directory of Open Access Journals (Sweden)

    Taylor Derek J

    2011-11-01

    Full Text Available Abstract Background Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats. We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13. 4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appears to be, or have been, functionally maintained

  4. Integrated genomic and gene expression profiling identifies two major genomic circuits in urothelial carcinoma.

    Directory of Open Access Journals (Sweden)

    David Lindgren

    Full Text Available Similar to other malignancies, urothelial carcinoma (UC is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21, and BCL2L1 (20q11. We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E3F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common for advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and that these circuits may reflect distinct pathways of tumor development.

  5. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.; Bergman, Nicholas; Borodovsky, Mark

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.

  6. The complete mitochondrial genome of Setaria digitata (Nematoda: Filarioidea): Mitochondrial gene content, arrangement and composition compared with other nematodes.

    Science.gov (United States)

    Yatawara, Lalani; Wickramasinghe, Susiji; Rajapakse, R P V J; Agatsuma, Takeshi

    2010-09-01

    In the present study, we determined the complete mitochondrial (mt) genome sequence (13,839bp) of parasitic nematode Setaria digitata and its structure and organization compared with Onchocerca volvulus, Dirofilaria immitis and Brugia malayi. The mt genome of S. digitata is slightly larger than the mt genomes of other filarial nematodes. S. digitata mt genome contains 36 genes (12 protein-coding genes, 22 transfer RNAs and 2 ribosomal RNAs) that are typically found in metazoans. This genome contains a high A+T (75.1%) content and low G+C content (24.9%). The mt gene order for S. digitata is the same as those for O. volvulus, D. immitis and B. malayi but it is distinctly different from other nematodes compared. The start codons inferred in the mt genome of S. digitata are TTT, ATT, TTG, ATG, GTT and ATA. Interestingly, the initiation codon TTT is unique to S. digitata mt genome and four protein-coding genes use this codon as a translation initiation codon. Five protein-coding genes use TAG as a stop codon whereas three genes use TAA and four genes use T as a termination codon. Out of 64 possible codons, only 57 are used for mitochondrial protein-coding genes of S. digitata. T-rich codons such as TTT (18.9%), GTT (7.9%), TTG (7.8%), TAT (7%), ATT (5.7%), TCT (4.8%) and TTA (4.1%) are used more frequently. This pattern of codon usage reflects the strong bias for T in the mt genome of S. digitata. In conclusion, the present investigation provides new molecular data for future studies of the comparative mitochondrial genomics and systematic of parasitic nematodes of socio-economic importance. 2010 Elsevier B.V. All rights reserved.

  7. Gene Discovery through Genomic Sequencing of Brucella abortus

    Science.gov (United States)

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and human. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10−5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  8. Genome-wide identification, evolutionary and expression analysis of the aspartic protease gene superfamily in grape

    Science.gov (United States)

    2013-01-01

    Background Aspartic proteases (APs) are a large family of proteolytic enzymes found in almost all organisms. In plants, they are involved in many biological processes, such as senescence, stress responses, programmed cell death, and reproduction. Prior to the present study, no grape AP gene(s) had been reported, and their research on woody species was very limited. Results In this study, a total of 50 AP genes (VvAP) were identified in the grape genome, among which 30 contained the complete ASP domain. Synteny analysis within grape indicated that segmental and tandem duplication events contributed to the expansion of the grape AP family. Additional analysis between grape and Arabidopsis demonstrated that several grape AP genes were found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the divergence of grape and Arabidopsis. Phylogenetic relationships of the 30 VvAPs with the complete ASP domain and their Arabidopsis orthologs, as well as their gene and protein features were analyzed and their cellular localization was predicted. Moreover, expression profiles of VvAP genes in six different tissues were determined, and their transcript abundance under various stresses and hormone treatments were measured. Twenty-seven VvAP genes were expressed in at least one of the six tissues examined; nineteen VvAPs responded to at least one abiotic stress, 12 VvAPs responded to powdery mildew infection, and most of the VvAPs responded to SA and ABA treatments. Furthermore, integrated synteny and phylogenetic analysis identified orthologous AP genes between grape and Arabidopsis, providing a unique starting point for investigating the function of grape AP genes. Conclusions The genome-wide identification, evolutionary and expression analyses of grape AP genes provide a framework for future analysis of AP genes in defining their roles during stress response. Integrated synteny and phylogenetic analyses provide novel insight into the

  9. Evolution of endogenous non-retroviral genes integrated into plant genomes

    Directory of Open Access Journals (Sweden)

    Hyosub Chu

    2014-08-01

    Full Text Available Numerous comparative genome analyses have revealed the wide extent of horizontal gene transfer (HGT in living organisms, which contributes to their evolution and genetic diversity. Viruses play important roles in HGT. Endogenous viral elements (EVEs are defined as viral DNA sequences present within the genomes of non-viral organisms. In eukaryotic cells, the majority of EVEs are derived from RNA viruses using reverse transcription. In contrast, endogenous non-retroviral elements (ENREs are poorly studied. However, the increasing availability of genomic data and the rapid development of bioinformatics tools have enabled the identification of several ENREs in various eukaryotic organisms. To date, a small number of ENREs integrated into plant genomes have been identified. Of the known non-retroviruses, most identified ENREs are derived from double-strand (ds RNA viruses, followed by single-strand (ss DNA and ssRNA viruses. At least eight virus families have been identified. Of these, viruses in the family Partitiviridae are dominant, followed by viruses of the families Chrysoviridae and Geminiviridae. The identified ENREs have been primarily identified in eudicots, followed by monocots. In this review, we briefly discuss the current view on non-retroviral sequences integrated into plant genomes that are associated with plant-virus evolution and their possible roles in antiviral resistance.

  10. Genomic islands of differentiation in two songbird species reveal candidate genes for hybrid female sterility.

    Science.gov (United States)

    Mořkovský, Libor; Janoušek, Václav; Reif, Jiří; Rídl, Jakub; Pačes, Jan; Choleva, Lukáš; Janko, Karel; Nachman, Michael W; Reifová, Radka

    2018-02-01

    Hybrid sterility is a common first step in the evolution of postzygotic reproductive isolation. According to Haldane's Rule, it affects predominantly the heterogametic sex. While the genetic basis of hybrid male sterility in organisms with heterogametic males has been studied for decades, the genetic basis of hybrid female sterility in organisms with heterogametic females has received much less attention. We investigated the genetic basis of reproductive isolation in two closely related avian species, the common nightingale (Luscinia megarhynchos) and the thrush nightingale (L. luscinia), that hybridize in a secondary contact zone and produce viable hybrid progeny. In accordance with Haldane's Rule, hybrid females are sterile, while hybrid males are fertile, allowing gene flow to occur between the species. Using transcriptomic data from multiple individuals of both nightingale species, we identified genomic islands of high differentiation (F ST ) and of high divergence (D xy ), and we analysed gene content and patterns of molecular evolution within these islands. Interestingly, we found that these islands were enriched for genes related to female meiosis and metabolism. The islands of high differentiation and divergence were also characterized by higher levels of linkage disequilibrium than the rest of the genome in both species indicating that they might be situated in genomic regions of low recombination. This study provides one of the first insights into genetic basis of hybrid female sterility in organisms with heterogametic females. © 2018 John Wiley & Sons Ltd.

  11. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Full Text Available Secondary metabolites (SMs produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic.

  12. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    Science.gov (United States)

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic. PMID:25706180

  13. Identification of DNA repair genes in the human genome

    International Nuclear Information System (INIS)

    Hoeijmakers, J.H.J.; van Duin, M.; Westerveld, A.; Yasui, A.; Bootsma, D.

    1986-01-01

    To identify human DNA repair genes we have transfected human genomic DNA ligated to a dominant marker to excision repair deficient xeroderma pigmentosum (XP) and CHO cells. This resulted in the cloning of a human gene, ERCC-1, that complements the defect of a UV- and mitomycin-C sensitive CHO mutant 43-3B. The ERCC-1 gene has a size of 15 kb, consists of 10 exons and is located in the region 19q13.2-q13.3. Its primary transcript is processed into two mRNAs by alternative splicing of an internal coding exon. One of these transcripts encodes a polypeptide of 297 aminoacids. A putative DNA binding protein domain and nuclear location signal could be identified. Significant AA-homology is found between ERCC-1 and the yeast excision repair gene RAD10. 58 references, 6 figures, 1 table

  14. Re-examining the Gene in Personalized Genomics

    Science.gov (United States)

    Bartol, Jordan

    2013-10-01

    Personalized genomics companies (PG; also called `direct-to-consumer genetics') are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept presented to customers and the relation between the information given and the science behind PG. Two quite different gene concepts are present in company rhetoric, but only one features in the science. To explain this, we must appreciate the delicate tension between PG, academic science, public expectation, and market forces.

  15. HCMI Organization | Office of Cancer Genomics

    Science.gov (United States)

    Consortium The Human Cancer Models Initiative (HCMI) was created and funded by the US National Cancer Institute, Cancer Research UK, the foundation Hubrecht Organoid Technology, and the Wellcome Sanger Institute. Together, these organizations develop policy and make programmatic decisions to contribute to the function of the HCMI. National Cancer Institute

  16. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Full Text Available Abstract Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  17. The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

    Directory of Open Access Journals (Sweden)

    Adam Alexander Thil Smith

    2012-05-01

    Full Text Available Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes, a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short. The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, of which more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.

  18. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful...... for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps...... more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...

  19. Long-Range Order and Fractality in the Structure and Organization of Eukaryotic Genomes

    Science.gov (United States)

    Polychronopoulos, Dimitris; Tsiagkas, Giannis; Athanasopoulou, Labrini; Sellis, Diamantis; Almirantis, Yannis

    2014-12-01

    The late Professor J.S. Nicolis always emphasized, both in his writings and in presentations and discussions with students and friends, the relevance of a dynamical systems approach to biology. In particular, viewing the genome as a "biological text" captures the dynamical character of both the evolution and function of the organisms in the form of correlations indicating the presence of a long-range order. This genomic structure can be expressed in forms reminiscent of natural languages and several temporal and spatial traces l by the functioning of dynamical systems: Zipf laws, self-similarity and fractality. Here we review several works of our group and recent unpublished results, focusing on the chromosomal distribution of biologically active genomic components: Genes and protein-coding segments, CpG islands, transposable elements belonging to all major classes and several types of conserved non-coding genomic elements. We report the systematic appearance of power-laws in the size distribution of the distances between elements belonging to each of these types of functional genomic elements. Moreover, fractality is also found in several cases, using box-counting and entropic scaling.We present here, for the first time in a unified way, an aggregative model of the genomic dynamics which can explain the observed patterns on the grounds of known phenomena accompanying genome evolution. Our results comply with recent findings about a "fractal globule" geometry of chromatin in the eukaryotic nucleus.

  20. Genome-wide identification, characterization and phylogenetic analysis of 50 catfish ATP-binding cassette (ABC) transporter genes.

    Science.gov (United States)

    Liu, Shikai; Li, Qi; Liu, Zhanjiang

    2013-01-01

    Although a large set of full-length transcripts was recently assembled in catfish, annotation of large gene families, especially those with duplications, is still a great challenge. Most often, complexities in annotation cause mis-identification and thereby much confusion in the scientific literature. As such, detailed phylogenetic analysis and/or orthology analysis are required for annotation of genes involved in gene families. The ATP-binding cassette (ABC) transporter gene superfamily is a large gene family that encodes membrane proteins that transport a diverse set of substrates across membranes, playing important roles in protecting organisms from diverse environment. In this work, we identified a set of 50 ABC transporters in catfish genome. Phylogenetic analysis allowed their identification and annotation into seven subfamilies, including 9 ABCA genes, 12 ABCB genes, 12 ABCC genes, 5 ABCD genes, 2 ABCE genes, 4 ABCF genes and 6 ABCG genes. Most ABC transporters are conserved among vertebrates, though cases of recent gene duplications and gene losses do exist. Gene duplications in catfish were found for ABCA1, ABCB3, ABCB6, ABCC5, ABCD3, ABCE1, ABCF2 and ABCG2. The whole set of catfish ABC transporters provide the essential genomic resources for future biochemical, toxicological and physiological studies of ABC drug efflux transporters. The establishment of orthologies should allow functional inferences with the information from model species, though the function of lineage-specific genes can be distinct because of specific living environment with different selection pressure.

  1. The complete nucleotide sequence, genome organization, and origin of human adenovirus type 11

    International Nuclear Information System (INIS)

    Stone, Daniel; Furthmann, Anne; Sandig, Volker; Lieber, Andre

    2003-01-01

    The complete DNA sequence and transcription map of human adenovirus type 11 are reported here. This is the first published sequence for a subgenera B human adenovirus and demonstrates a genome organization highly similar to those of other human adenoviruses. All of the genes from the early, intermediate, and late regions are present in the expected locations of the genome for a human adenovirus. The genome size is 34,794 bp in length and has a GC content of 48.9%. Sequence alignment with genomes of groups A (Ad12), C (Ad5), D (Ad17), E (Simian adenovirus 25), and F (Ad40) revealed homologies of 64, 54, 68, 75, and 52%, respectively. Detailed genomic analysis demonstrated that Ads 11 and 35 are highly conserved in all areas except the hexon hypervariable regions and fiber. Similarly, comparison of Ad11 with subgroup E SAV25 revealed poor homology between fibers but high homology in proteins encoded by all other areas of the genome. We propose an evolutionary model in which functional viruses can be reconstituted following fiber substitution from one serotype to another. According to this model either the Ad11 genome is a derivative of Ad35, from which the fiber was substituted with Ad7, or the Ad35 genome is the product of a fiber substitution from Ad21 into the Ad11 genome. This model also provides a possible explanation for the origin of group E Ads, which are evolutionarily derived from a group C fiber substitution into a group B genome

  2. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    Energy Technology Data Exchange (ETDEWEB)

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.

  3. Conditions for the evolution of gene clusters in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Sara Ballouz

    2010-02-01

    Full Text Available Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model, genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.

  4. Conditions for the Evolution of Gene Clusters in Bacterial Genomes

    Science.gov (United States)

    Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.

    2010-01-01

    Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992

  5. The draft genome of a termite illuminates alternative social organization

    Science.gov (United States)

    Termites have substantial economic and ecological impact worldwide. They are also the oldest organisms living in complex societies, having evolved a caste system independent of that of eusocial Hymenoptera (ants, bees and wasps). Here we provide the first genome sequence for a termite, Zootermopsis ...

  6. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    Directory of Open Access Journals (Sweden)

    Brahmbhatt Sonal

    2008-11-01

    Full Text Available Abstract Background Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most widely studied groups of fish. Results 298,304 expressed sequence tags (ESTs from Atlantic salmon (69% of the total, 11,664 chinook, 10,813 sockeye, 10,051 brook trout, 10,975 grayling, 8,630 lake whitefish, and 3,624 northern pike ESTs were obtained in this study and have been deposited into the public databases. Contigs were built and putative full-length Atlantic salmon clones have been identified. A database containing ESTs, assemblies, consensus sequences, open reading frames, gene predictions and putative annotation is available. The overall similarity between Atlantic salmon ESTs and those of rainbow trout, chinook, sockeye, brook trout, grayling, lake whitefish, northern pike and rainbow smelt is 93.4, 94.2, 94.6, 94.4, 92.5, 91.7, 89.6, and 86.2% respectively. An analysis of 78 transcript sets show Salmo as a sister group to Oncorhynchus and Salvelinus within Salmoninae, and Thymallinae as a sister group to Salmoninae and Coregoninae within Salmonidae. Extensive gene duplication is consistent with a genome duplication in the common ancestor of salmonids. Using all of the available EST data, a new expanded salmonid cDNA microarray of 32,000 features was created. Cross-species hybridizations to this cDNA microarray indicate that this resource will be useful for studies of all 68 salmonid species. Conclusion An extensive collection and analysis of salmonid RNA putative transcripts indicate that Pacific salmon, Atlantic salmon and charr are 94–96% similar while the more distant whitefish, grayling, pike and smelt are 93, 92, 89 and 86% similar to salmon. The salmonid transcriptome reveals a complex history of gene duplication that is

  7. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Science.gov (United States)

    Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  8. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Directory of Open Access Journals (Sweden)

    Jianmin Fu

    Full Text Available Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  9. Molecular analysis and genomic organization of major DNA satellites in banana (Musa spp.).

    Science.gov (United States)

    Čížková, Jana; Hřibová, Eva; Humplíková, Lenka; Christelová, Pavla; Suchánková, Pavla; Doležel, Jaroslav

    2013-01-01

    Satellite DNA sequences consist of tandemly arranged repetitive units up to thousands nucleotides long in head-to-tail orientation. The evolutionary processes by which satellites arise and evolve include unequal crossing over, gene conversion, transposition and extra chromosomal circular DNA formation. Large blocks of satellite DNA are often observed in heterochromatic regions of chromosomes and are a typical component of centromeric and telomeric regions. Satellite-rich loci may show specific banding patterns and facilitate chromosome identification and analysis of structural chromosome changes. Unlike many other genomes, nuclear genomes of banana (Musa spp.) are poor in satellite DNA and the information on this class of DNA remains limited. The banana cultivars are seed sterile clones originating mostly from natural intra-specific crosses within M. acuminata (A genome) and inter-specific crosses between M. acuminata and M. balbisiana (B genome). Previous studies revealed the closely related nature of the A and B genomes, including similarities in repetitive DNA. In this study we focused on two main banana DNA satellites, which were previously identified in silico. Their genomic organization and molecular diversity was analyzed in a set of nineteen Musa accessions, including representatives of A, B and S (M. schizocarpa) genomes and their inter-specific hybrids. The two DNA satellites showed a high level of sequence conservation within, and a high homology between Musa species. FISH with probes for the satellite DNA sequences, rRNA genes and a single-copy BAC clone 2G17 resulted in characteristic chromosome banding patterns in M. acuminata and M. balbisiana which may aid in determining genomic constitution in interspecific hybrids. In addition to improving the knowledge on Musa satellite DNA, our study increases the number of cytogenetic markers and the number of individual chromosomes, which can be identified in Musa.

  10. From Genes to Genomes Chances and boundaries of the New Biology

    CERN Document Server

    Winnaker, E L

    1997-01-01

    The goal of my lecture is to show the new dimensions of genome research. It is replacing classic recombinant DNA technologies. The search for single genes is being replaced by the analysis of gene activities of whole cells, organs or organisms. This development changes radically basic biomedical research and points to new therapeutic strategies (examples:cancer,Alzheimer's disease). I will also show the rapid changes of our understanding of gene activity. Mendel's definition of genes is now replaced by molecular terms which teach us how gene expression is regulated and controlled. Finally I will try to outline the limits of genetic analysis and how it raises ethical and moral questions. If the analysis of changes in the genetic read-out are related to diseases for which there is no therapy or if such knowledge only predisposes to genetic diseases the handling of such information requires extraordinary care. The genome projects thus have to be and are being pursued in conjunction with careful ethical analyses ...

  11. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    Science.gov (United States)

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  12. Large clusters of co-expressed genes in the Drosophila genome.

    Science.gov (United States)

    Boutanaev, Alexander M; Kalmykova, Alla I; Shevelyov, Yuri Y; Nurminsky, Dmitry I

    2002-12-12

    Clustering of co-expressed, non-homologous genes on chromosomes implies their co-regulation. In lower eukaryotes, co-expressed genes are often found in pairs. Clustering of genes that share aspects of transcriptional regulation has also been reported in higher eukaryotes. To advance our understanding of the mode of coordinated gene regulation in multicellular organisms, we performed a genome-wide analysis of the chromosomal distribution of co-expressed genes in Drosophila. We identified a total of 1,661 testes-specific genes, one-third of which are clustered on chromosomes. The number of clusters of three or more genes is much higher than expected by chance. We observed a similar trend for genes upregulated in the embryo and in the adult head, although the expression pattern of individual genes cannot be predicted on the basis of chromosomal position alone. Our data suggest that the prevalent mechanism of transcriptional co-regulation in higher eukaryotes operates with extensive chromatin domains that comprise multiple genes.

  13. Polytene Chromosomes - A Portrait of Functional Organization of the Drosophila Genome.

    Science.gov (United States)

    Zykova, Tatyana Yu; Levitsky, Victor G; Belyaeva, Elena S; Zhimulev, Igor F

    2018-04-01

    This mini-review is devoted to the problem genetic meaning of main polytene chromosome structures - bands and interbands. Generally, densely packed chromatin forms black bands, moderately condensed regions form grey loose bands, whereas decondensed regions of the genome appear as interbands. Recent progress in the annotation of the Drosophila genome and epigenome has made it possible to compare the banding pattern and the structural organization of genes, as well as their activity. This was greatly aided by our ability to establish the borders of bands and interbands on the physical map, which allowed to perform comprehensive side-by-side comparisons of cytology, genetic and epigenetic maps and to uncover the association between the morphological structures and the functional domains of the genome. These studies largely conclude that interbands 5'-ends of housekeeping genes that are active across all cell types. Interbands are enriched with proteins involved in transcription and nucleosome remodeling, as well as with active histone modifications. Notably, most of the replication origins map to interband regions. As for grey loose bands adjacent to interbands, they typically host the bodies of house-keeping genes. Thus, the bipartite structure composed of an interband and an adjacent grey band functions as a standalone genetic unit. Finally, black bands harbor tissue-specific genes with narrow temporal and tissue expression profiles. Thus, the uniform and permanent activity of interbands combined with the inactivity of genes in bands forms the basis of the universal banding pattern observed in various Drosophila tissues.

  14. Genomic analysis of primordial dwarfism reveals novel disease genes.

    Science.gov (United States)

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  15. Targeted Sequencing of Venom Genes from Cone Snail Genomes Improves Understanding of Conotoxin Molecular Evolution.

    Science.gov (United States)

    Phuong, Mark A; Mahardika, Gusti N

    2018-05-01

    To expand our capacity to discover venom sequences from the genomes of venomous organisms, we applied targeted sequencing techniques to selectively recover venom gene superfamilies and nontoxin loci from the genomes of 32 cone snail species (family, Conidae), a diverse group of marine gastropods that capture their prey using a cocktail of neurotoxic peptides (conotoxins). We were able to successfully recover conotoxin gene superfamilies across all species with high confidence (> 100× coverage) and used these data to provide new insights into conotoxin evolution. First, we found that conotoxin gene superfamilies are composed of one to six exons and are typically short in length (mean = ∼85 bp). Second, we expanded our understanding of the following genetic features of conotoxin evolution: 1) positive selection, where exons coding the mature toxin region were often three times more divergent than their adjacent noncoding regions, 2) expression regulation, with comparisons to transcriptome data showing that cone snails only express a fraction of the genes available in their genome (24-63%), and 3) extensive gene turnover, where Conidae species varied from 120 to 859 conotoxin gene copies. Finally, using comparative phylogenetic methods, we found that while diet specificity did not predict patterns of conotoxin evolution, dietary breadth was positively correlated with total conotoxin gene diversity. Overall, the targeted sequencing technique demonstrated here has the potential to radically increase the pace at which venom gene families are sequenced and studied, reshaping our ability to understand the impact of genetic changes on ecologically relevant phenotypes and subsequent diversification.

  16. Distant homology between yeast photoreactivating gene fragment and human genomic digests

    International Nuclear Information System (INIS)

    Meechan, P.J.; Milam, K.M.; Cleaver, J.E.

    1985-01-01

    Hybridization of DNA coding for the yeast DNA photolyase to human genomic DNA appears to allow one to determine whether a conserved enzyme is coded for in human cells. Under stringent conditions (68 0 C), hybridization is not found between the cloned yeast fragment (YEp13-phr1) and human or chick genomic digests. At less stringent conditions (60 0 C), hybridization is observed with chick digests, indicating evolutionary divergence even among organisms capable of photo-reactivation. At 50 0 C, weak hybridization with human digests was observed, indicating further divergence from the cloned gene. Data concerning the precise extent of homology and methods to clone the chick gene for use as another probe are discussed

  17. Genomic and gene variation in Mycoplasma hominis strains

    DEFF Research Database (Denmark)

    Christiansen, Gunna; Andersen, H; Birkelund, Svend

    1987-01-01

    DNAs from 14 strains of Mycoplasma hominis isolated from various habitats, including strain PG21, were analyzed for genomic heterogeneity. DNA-DNA filter hybridization values were from 51 to 91%. Restriction endonuclease digestion patterns, analyzed by agarose gel electrophoresis, revealed...... no identity or cluster formation between strains. Variation within M. hominis rRNA genes was analyzed by Southern hybridization of EcoRI-cleaved DNA hybridized with a cloned fragment of the rRNA gene from the mycoplasma strain PG50. Five of the M. hominis strains showed identical hybridization patterns....... These hybridization patterns were compared with those of 12 other mycoplasma species, which showed a much more complex band pattern. Cloned nonribosomal RNA gene fragments of M. hominis PG21 DNA were analyzed, and the fragments were used to demonstrate heterogeneity among the strains. A monoclonal antibody against...

  18. Lateral gene exchanges shape the genomes of amoeba-resisting microorganisms

    Directory of Open Access Journals (Sweden)

    Claire eBertelli

    2012-08-01

    Full Text Available Based on Darwin’s concept of the tree of life, vertical inheritance was thought to be dominant, and mutations, deletions and duplication were streaming the genomes of living organisms. In the current genomic era, increasing data indicated that both vertical and lateral gene inheritance interact in space and time to trigger genome evolution, particularly among microorganisms sharing a given ecological niche. As a paradigm to their diversity and their survival in a variety of cell types, intracellular microorganisms, and notably intracellular bacteria, were considered as less prone to lateral genetic exchanges. Such specialized microorganisms generally have a smaller gene repertoire because they do rely on their host’s factors for some basic regulatory and metabolic functions. Here we review events of lateral gene transfer (LGT that illustrate the genetic exchanges among intra-amoebal microorganisms or between the microorganism and its amoebal host. We tentatively investigate the functions of laterally transferred genes in the light of the interaction with their host as they should confer a selective advantage and success to the amoeba-resisting microorganisms.

  19. Combining genetical genomics and bulked segregant analysis differential expression: an approach to gene localization

    NARCIS (Netherlands)

    Chen, Xinwei; Hedley, P.E.; Morris, J.; Liu, Hui; Niks, R.E.; Waugh, R.

    2011-01-01

    Positional gene isolation in unsequenced species generally requires either a reference genome sequence or an inference of gene content based on conservation of synteny with a genomic model. In the large unsequenced genomes of the Triticeae cereals the latter, i.e. conservation of synteny with the

  20. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    Directory of Open Access Journals (Sweden)

    van Hijum Sacha AFT

    2008-10-01

    Full Text Available Abstract Background Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion The Prosecutor software and supplementary datasets available at http://www.prosecutor.nl allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.

  1. Mapping our genes: The genome projects: How big, how fast

    Energy Technology Data Exchange (ETDEWEB)

    none,

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for /open quotes/writing the rules/close quotes/ of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  2. Mapping Our Genes: The Genome Projects: How Big, How Fast

    Science.gov (United States)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for ?writing the rules? of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  3. Genome-wide identification and characterization of the bHLH gene family in tomato.

    Science.gov (United States)

    Sun, Hua; Fan, Hua-Jie; Ling, Hong-Qing

    2015-01-22

    The basic helix-loop-helix (bHLH) proteins are a large superfamily of transcription factors, and play a central role in a wide range of metabolic, physiological, and developmental processes in higher organisms. Tomato is an important vegetable crop, and its genome sequence has been published recently. However, the bHLH gene family of tomato has not been systematically identified and characterized yet. In this study, we identified 159 bHLH protein-encoding genes (SlbHLH) in tomato genome and analyzed their structures. Although bHLH domains were conserved among the bHLH proteins between tomato and Arabidopsis, the intron sequences and distribution of tomato bHLH genes were extremely different compared with Arabidopsis. The gene duplication analysis showed that 58.5% and 6.3% of SlbHLH genes belonged to low-stringency and high-stringency duplication, respectively, indicating that the SlbHLH genes are mainly generated via short low-stringency region duplication in tomato. Subsequently, we classified the SlbHLH genes into 21 subfamilies by phylogenetic tree analysis, and predicted their possible functions by comparison with their homologous genes of Arabidopsis. Moreover, the expression profile analysis of SlbHLH genes from 10 different tissues showed that 21 SlbHLH genes exhibited tissue-specific expression. Further, we identified that 11 SlbHLH genes were associated with fruit development and ripening (eight of them associated with young fruit development and three with fruit ripening). The evolutionary analysis revealed that 92% SlbHLH genes might be evolved from ancestor(s) originated from early land plant, and 8% from algae. In this work, we systematically identified SlbHLHs by analyzing the tomato genome sequence using a set of bioinformatics approaches, and characterized their chromosomal distribution, gene structures, duplication, phylogenetic relationship and expression profiles, as well predicted their possible biological functions via comparative analysis

  4. Gene interactions in the DNA damage-response pathway identified by genome-wide RNA-interference analysis of synthetic lethality

    NARCIS (Netherlands)

    van Haaften, Gijs; Vastenhouw, Nadine L; Nollen, Ellen A A; Plasterk, Ronald H A; Tijsterman, Marcel

    2004-01-01

    Here, we describe a systematic search for synthetic gene interactions in a multicellular organism, the nematode Caenorhabditis elegans. We established a high-throughput method to determine synthetic gene interactions by genome-wide RNA interference and identified genes that are required to protect

  5. A search engine to identify pathway genes from expression data on multiple organisms

    Directory of Open Access Journals (Sweden)

    Zambon Alexander C

    2007-05-01

    Full Text Available Abstract Background The completion of several genome projects showed that most genes have not yet been characterized, especially in multicellular organisms. Although most genes have unknown functions, a large collection of data is available describing their transcriptional activities under many different experimental conditions. In many cases, the coregulatation of a set of genes across a set of conditions can be used to infer roles for genes of unknown function. Results We developed a search engine, the Multiple-Species Gene Recommender (MSGR, which scans gene expression datasets from multiple organisms to identify genes that participate in a genetic pathway. The MSGR takes a query consisting of a list of genes that function together in a genetic pathway from one of six organisms: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, and Helicobacter pylori. Using a probabilistic method to merge searches, the MSGR identifies genes that are significantly coregulated with the query genes in one or more of those organisms. The MSGR achieves its highest accuracy for many human pathways when searches are combined across species. We describe specific examples in which new genes were identified to be involved in a neuromuscular signaling pathway and a cell-adhesion pathway. Conclusion The search engine can scan large collections of gene expression data for new genes that are significantly coregulated with a pathway of interest. By integrating searches across organisms, the MSGR can identify pathway members whose coregulation is either ancient or newly evolved.

  6. Census of solo LuxR genes in prokaryotic genomes.

    Science.gov (United States)

    Hudaiberdiev, Sanjarbek; Choudhary, Kumari S; Vera Alvarez, Roberto; Gelencsér, Zsolt; Ligeti, Balázs; Lamba, Doriano; Pongor, Sándor

    2015-01-01

    luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram negative bacteria. On the bacterial chromosome, luxR genes are usually found next or near to a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may either respond to internal AHL signals produced by a non-adjacent luxI in the chromosome, or can respond to exogenous signals. Here we present a survey of solo luxR genes found in complete and draft bacterial genomes in the NCBI databases using HMMs. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. We also found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL binding LuxR proteins so there is room to speculate that the solos may be involved in sensing hitherto unknown signals. It was noted that only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of the cysteines may be involved in disulfide formation, which makes us speculate that some LuxR proteins, including some of the solos may be involved in redox regulation.

  7. Gene deletion of cytosolic ATP: citrate lyase leads to altered organic acid production in Aspergillus niger

    DEFF Research Database (Denmark)

    Meijer, Susan Lisette; Nielsen, Michael Lynge; Olsson, Lisbeth

    2009-01-01

    With the availability of the genome sequence of the filamentous fungus Aspergillus niger, the use of targeted genetic modifications has become feasible. This, together with the fact that A. niger is well established industrially, makes this fungus an attractive micro-organism for creating a cell...... factory platform for production of chemicals. Using molecular biology techniques, this study focused on metabolic engineering of A. niger to manipulate its organic acid production in the direction of succinic acid. The gene target for complete gene deletion was cytosolic ATP: citrate lyase (acl), which...... the acl gene. Additionally, the total amount of organic acids produced in the deletion strain was significantly increased. Genome-scale stoichiometric metabolic model predictions can be used for identifying gene targets. Deletion of the acl led to increased succinic acid production by A. niger....

  8. Chromosome mapping of dragline silk genes in the genomes of widow spiders (Araneae, Theridiidae.

    Directory of Open Access Journals (Sweden)

    Yonghui Zhao

    Full Text Available With its incredible strength and toughness, spider dragline silk is widely lauded for its impressive material properties. Dragline silk is composed of two structural proteins, MaSp1 and MaSp2, which are encoded by members of the spidroin gene family. While previous studies have characterized the genes that encode the constituent proteins of spider silks, nothing is known about the physical location of these genes. We determined karyotypes and sex chromosome organization for the widow spiders, Latrodectus hesperus and L. geometricus (Araneae, Theridiidae. We then used fluorescence in situ hybridization to map the genomic locations of the genes for the silk proteins that compose the remarkable spider dragline. These genes included three loci for the MaSp1 protein and the single locus for the MaSp2 protein. In addition, we mapped a MaSp1 pseudogene. All the MaSp1 gene copies and pseudogene localized to a single chromosomal region while MaSp2 was located on a different chromosome of L. hesperus. Using probes derived from L. hesperus, we comparatively mapped all three MaSp1 loci to a single region of a L. geometricus chromosome. As with L. hesperus, MaSp2 was found on a separate L. geometricus chromosome, thus again unlinked to the MaSp1 loci. These results indicate orthology of the corresponding chromosomal regions in the two widow genomes. Moreover, the occurrence of multiple MaSp1 loci in a conserved gene cluster across species suggests that MaSp1 proliferated by tandem duplication in a common ancestor of L. geometricus and L. hesperus. Unequal crossover events during recombination could have given rise to the gene copies and could also maintain sequence similarity among gene copies over time. Further comparative mapping with taxa of increasing divergence from Latrodectus will pinpoint when the MaSp1 duplication events occurred and the phylogenetic distribution of silk gene linkage patterns.

  9. Genetic and epigenetic variation in 5S ribosomal RNA genes reveals genome dynamics in Arabidopsis thaliana.

    Science.gov (United States)

    Simon, Lauriane; Rabanal, Fernando A; Dubos, Tristan; Oliver, Cecilia; Lauber, Damien; Poulet, Axel; Vogt, Alexander; Mandlbauer, Ariane; Le Goff, Samuel; Sommer, Andreas; Duborjal, Hervé; Tatout, Christophe; Probst, Aline V

    2018-04-06

    Organized in tandem repeat arrays in most eukaryotes and transcribed by RNA polymerase III, expression of 5S rRNA genes is under epigenetic control. To unveil mechanisms of transcriptional regulation, we obtained here in depth sequence information on 5S rRNA genes from the Arabidopsis thaliana genome and identified differential enrichment in epigenetic marks between the three 5S rDNA loci situated on chromosomes 3, 4 and 5. We reveal the chromosome 5 locus as the major source of an atypical, long 5S rRNA transcript characteristic of an open chromatin structure. 5S rRNA genes from this locus translocated in the Landsberg erecta ecotype as shown by linkage mapping and chromosome-specific FISH analysis. These variations in 5S rDNA locus organization cause changes in the spatial arrangement of chromosomes in the nucleus. Furthermore, 5S rRNA gene arrangements are highly dynamic with alterations in chromosomal positions through translocations in certain mutants of the RNA-directed DNA methylation pathway and important copy number variations among ecotypes. Finally, variations in 5S rRNA gene sequence, chromatin organization and transcripts indicate differential usage of 5S rDNA loci in distinct ecotypes. We suggest that both the usage of existing and new 5S rDNA loci resulting from translocations may impact neighboring chromatin organization.

  10. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling

    Science.gov (United States)

    Sato, Yukuto; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-01-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis. PMID:26578810

  11. Genome-wide identification and expression analysis of MAPK and MAPKK gene family in Malus domestica.

    Science.gov (United States)

    Zhang, Shizhong; Xu, Ruirui; Luo, Xiaocui; Jiang, Zesheng; Shu, Huairui

    2013-12-01

    MAPK signal transduction modules play crucial roles in regulating many biological processes in plants, which are composed of three classes of hierarchically organized protein kinases, namely MAPKKKs, MAPKKs, and MAPKs. Although genome-wide analysis of this family has been carried out in some species, little is known about MAPK and MAPKK genes in apple (Malus domestica). In this study, a total of 26 putative apple MAPK genes (MdMPKs) and 9 putative apple MAPKK genes (MdMKKs) have been identified and located within the apple genome. Phylogenetic analysis revealed that MdMAPKs and MdMAPKKs could be divided into 4 subfamilies (groups A, B, C and D), respectively. The predicted MdMAPKs and MdMAPKKs were distributed across 13 out of 17 chromosomes with different densities. In addition, analysis of exon-intron junctions and of intron phase inside the predicted coding region of each candidate gene has revealed high levels of conservation within and between phylogenetic groups. According to the microarray and expressed sequence tag (EST) analysis, the different expression patterns indicate that they may play different roles during fruit development and rootstock-scion interaction process. Moreover, MAPK and MAPKK genes were performed expression profile analyses in different tissues (root, stem, leaf, flower and fruit), and all of the selected genes were expressed in at least one of the tissues tested, indicating that the MAPKs and MAPKKs are involved in various aspects of physiological and developmental processes of apple. To our knowledge, this is the first report of a genome-wide analysis of the apple MAPK and MAPKK gene family. This study provides valuable information for understanding the classification and putative functions of the MAPK signal in apple. © 2013.

  12. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

    Science.gov (United States)

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-02-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.

  13. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

    Directory of Open Access Journals (Sweden)

    Stajich Jason E

    2006-11-01

    Full Text Available Abstract Background To date, most fungal phylogenies have been derived from single gene comparisons, or from concatenated alignments of a small number of genes. The increase in fungal genome sequencing presents an opportunity to reconstruct evolutionary events using entire genomes. As a tool for future comparative, phylogenomic and phylogenetic studies, we used both supertrees and concatenated alignments to infer relationships between 42 species of fungi for which complete genome sequences are available. Results A dataset of 345,829 genes was extracted from 42 publicly available fungal genomes. Supertree methods were employed to derive phylogenies from 4,805 single gene families. We found that the average consensus supertree method may suffer from long-branch attraction artifacts, while matrix representation with parsimony (MRP appears to be immune from these. A genome phylogeny was also reconstructed from a concatenated alignment of 153 universally distributed orthologs. Our MRP supertree and concatenated phylogeny are highly congruent. Within the Ascomycota, the sub-phyla Pezizomycotina and Saccharomycotina were resolved. Both phylogenies infer that the Leotiomycetes are the closest sister group to the Sordariomycetes. There is some ambiguity regarding the placement of Stagonospora nodurum, the sole member of the class Dothideomycetes present in the dataset. Within the Saccharomycotina, a monophyletic clade containing organisms that translate CTG as serine instead of leucine is evident. There is also strong support for two groups within the CTG clade, one containing the fully sexual species Candida lusitaniae, Candida guilliermondii and Debaryomyces hansenii, and the second group containing Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis and Lodderomyces elongisporus. The second major clade within the Saccharomycotina contains species whose genomes have undergone a whole genome duplication (WGD, and their close

  14. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum.

    Science.gov (United States)

    Rao, Soumya; Nandineni, Madhusudan R

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.

  15. Sequencing analysis reveals a unique gene organization in the gyrB region of Mycoplasma hominis

    DEFF Research Database (Denmark)

    Ladefoged, Søren; Christiansen, Gunna

    1994-01-01

    of which showed similarity to that which encodes the LicA protein of Haemophilus influenzae. The organization of the genes in the region showed no resemblance to that in the corresponding regions of other bacteria sequenced so far. The gyrA gene was mapped 35 kb downstream from the gyrB gene.......The homolog of the gyrB gene, which has been reported to be present in the vicinity of the initiation site of replication in bacteria, was mapped on the Mycoplasma hominis genome, and the region was subsequently sequenced. Five open reading frames were identified flanking the gyrB gene, one...

  16. The compact Selaginella genome identifies changes in gene content associated with the evolution of vascular plants

    Energy Technology Data Exchange (ETDEWEB)

    Grigoriev, Igor V.; Banks, Jo Ann; Nishiyama, Tomoaki; Hasebe, Mitsuyasu; Bowman, John L.; Gribskov, Michael; dePamphilis, Claude; Albert, Victor A.; Aono, Naoki; Aoyama, Tsuyoshi; Ambrose, Barbara A.; Ashton, Neil W.; Axtell, Michael J.; Barker, Elizabeth; Barker, Michael S.; Bennetzen, Jeffrey L.; Bonawitz, Nicholas D.; Chapple, Clint; Cheng, Chaoyang; Correa, Luiz Gustavo Guedes; Dacre, Michael; DeBarry, Jeremy; Dreyer, Ingo; Elias, Marek; Engstrom, Eric M.; Estelle, Mark; Feng, Liang; Finet, Cedric; Floyd, Sandra K.; Frommer, Wolf B.; Fujita, Tomomichi; Gramzow, Lydia; Gutensohn, Michael; Harholt, Jesper; Hattori, Mitsuru; Heyl, Alexander; Hirai, Tadayoshi; Hiwatashi, Yuji; Ishikawa, Masaki; Iwata, Mineko; Karol, Kenneth G.; Koehler, Barbara; Kolukisaoglu, Uener; Kubo, Minoru; Kurata, Tetsuya; Lalonde, Sylvie; Li, Kejie; Li, Ying; Litt, Amy; Lyons, Eric; Manning, Gerard; Maruyama, Takeshi; Michael, Todd P.; Mikami, Koji; Miyazaki, Saori; Morinaga, Shin-ichi; Murata, Takashi; Mueller-Roeber, Bernd; Nelson, David R.; Obara, Mari; Oguri, Yasuko; Olmstead, Richard G.; Onodera, Naoko; Petersen, Bent Larsen; Pils, Birgit; Prigge, Michael; Rensing, Stefan A.; Riano-Pachon, Diego Mauricio; Roberts, Alison W.; Sato, Yoshikatsu; Scheller, Henrik Vibe; Schulz, Burkhard; Schulz, Christian; Shakirov, Eugene V.; Shibagaki, Nakako; Shinohara, Naoki; Shippen, Dorothy E.; Sorensen, Iben; Sotooka, Ryo; Sugimoto, Nagisa; Sugita, Mamoru; Sumikawa, Naomi; Tanurdzic, Milos; Theilsen, Gunter; Ulvskov, Peter; Wakazuki, Sachiko; Weng, Jing-Ke; Willats, William W.G.T.; Wipf, Daniel; Wolf, Paul G.; Yang, Lixing; Zimmer, Andreas D.; Zhu, Qihui; Mitros, Therese; Hellsten, Uffe; Loque, Dominique; Otillar, Robert; Salamov, Asaf; Schmutz, Jeremy; Shapiro, Harris; Lindquist, Erika; Lucas, Susan; Rokhsar, Daniel

    2011-04-28

    We report the genome sequence of the nonseed vascular plant, Selaginella moellendorffii, and by comparative genomics identify genes that likely played important roles in the early evolution of vascular plants and their subsequent evolution

  17. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting | Office of Cancer Genomics

    Science.gov (United States)

    The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.

  18. Cognitive genomics: Linking genes to behavior in the human brain

    Directory of Open Access Journals (Sweden)

    Genevieve Konopka

    2017-02-01

    Full Text Available Correlations of genetic variation in DNA with functional brain activity have already provided a starting point for delving into human cognitive mechanisms. However, these analyses do not provide the specific genes driving the associations, which are complicated by intergenic localization as well as tissue-specific epigenetics and expression. The use of brain-derived expression datasets could build upon the foundation of these initial genetic insights and yield genes and molecular pathways for testing new hypotheses regarding the molecular bases of human brain development, cognition, and disease. Thus, coupling these human brain gene expression data with measurements of brain activity may provide genes with critical roles in brain function. However, these brain gene expression datasets have their own set of caveats, most notably a reliance on postmortem tissue. In this perspective, I summarize and examine the progress that has been made in this realm to date, and discuss the various frontiers remaining, such as the inclusion of cell-type-specific information, additional physiological measurements, and genomic data from patient cohorts.

  19. Gene disruptions using P transposable elements: an integral component of the Drosophila genome project.

    OpenAIRE

    Spradling, A C; Stern, D M; Kiss, I; Roote, J; Laverty, T; Rubin, G M

    1995-01-01

    Biologists require genetic as well as molecular tools to decipher genomic information and ultimately to understand gene function. The Berkeley Drosophila Genome Project is addressing these needs with a massive gene disruption project that uses individual, genetically engineered P transposable elements to target open reading frames throughout the Drosophila genome. DNA flanking the insertions is sequenced, thereby placing an extensive series of genetic markers on the physical genomic map and a...

  20. Impact of nuclear organization and chromatin structure on DNA repair and genome stability

    International Nuclear Information System (INIS)

    Batte, Amandine

    2016-01-01

    The non-random organization of the eukaryotic cell nucleus and the folding of genome in chromatin more or less condensed can influence many functions related to DNA metabolism, including genome stability. Double-strand breaks (DSBs) are the most deleterious DNA damages for the cells. To preserve genome integrity, eukaryotic cells thus developed DSB repair mechanisms conserved from yeast to human, among which homologous recombination (HR) that uses an intact homologous sequence to repair a broken chromosome. HR can be separated in two sub-pathways: Gene Conversion (GC) transfers genetic information from one molecule to its homologous and Break Induced Replication (BIR) establishes a replication fork than can proceed until the chromosome end. My doctorate work was focused on the contribution of the chromatin context and 3D genome organization on DSB repair. In S. cerevisiae, nuclear organization and heterochromatin spreading at sub-telomeres can be modified through the overexpression of the Sir3 or sir3A2Q mutant proteins. We demonstrated that reducing the physical distance between homologous sequences increased GC rates, reinforcing the notion that homology search is a limiting step for recombination. We also showed that hetero-chromatinization of DSB site fine-tunes DSB resection, limiting the loss of the DSB ends required to perform homology search and complete HR. Finally, we noticed that the presence of heterochromatin at the donor locus decreased both GC and BIR efficiencies, probably by affecting strand invasion. This work highlights new regulatory pathways of DNA repair. (author) [fr

  1. Comparative genomics of Geobacter chemotaxis genes reveals diverse signaling function

    Directory of Open Access Journals (Sweden)

    Antommattei Frances M

    2008-10-01

    Full Text Available Abstract Background Geobacter species are δ-Proteobacteria and are often the predominant species in a variety of sedimentary environments where Fe(III reduction is important. Their ability to remediate contaminated environments and produce electricity makes them attractive for further study. Cell motility, biofilm formation, and type IV pili all appear important for the growth of Geobacter in changing environments and for electricity production. Recent studies in other bacteria have demonstrated that signaling pathways homologous to the paradigm established for Escherichia coli chemotaxis can regulate type IV pili-dependent motility, the synthesis of flagella and type IV pili, the production of extracellular matrix material, and biofilm formation. The classification of these pathways by comparative genomics improves the ability to understand how Geobacter thrives in natural environments and better their use in microbial fuel cells. Results The genomes of G. sulfurreducens, G. metallireducens, and G. uraniireducens contain multiple (~70 homologs of chemotaxis genes arranged in several major clusters (six, seven, and seven, respectively. Unlike the single gene cluster of E. coli, the Geobacter clusters are not all located near the flagellar genes. The probable functions of some Geobacter clusters are assignable by homology to known pathways; others appear to be unique to the Geobacter sp. and contain genes of unknown function. We identified large numbers of methyl-accepting chemotaxis protein (MCP homologs that have diverse sensing domain architectures and generate a potential for sensing a great variety of environmental signals. We discuss mechanisms for class-specific segregation of the MCPs in the cell membrane, which serve to maintain pathway specificity and diminish crosstalk. Finally, the regulation of gene expression in Geobacter differs from E. coli. The sequences of predicted promoter elements suggest that the alternative sigma factors

  2. Genome medicine: gene therapy for the millennium, 30 September-3 October 2001, Rome, Italy.

    Science.gov (United States)

    Gruenert, D C; Novelli, G; Dallapiccola, B; Colosimo, A

    2002-06-01

    The recent surge of DNA sequence information resulting from the efforts of agencies interested in deciphering the human genetic code has facilitated technological developments that have been critical in the identification of genes associated with numerous disease pathologies. In addition, these efforts have opened the door to the opportunity to develop novel genetic therapies to treat a broad range of inherited disorders. Through a joint effort by the University of Vermont, the University of Rome, Tor Vergata, University of Rome, La Sapienza, and the CSS Mendel Institute, Rome, an international meeting, 'Genome Medicine: Gene Therapy for the Millennium' was organized. This meeting provided a forum for the discussion of scientific and clinical advances stimulated by the explosion of sequence information generated by the Human Genome Project and the implications these advances have for gene therapy. The meeting had six sessions that focused on the functional evaluation of specific genes via biochemical analysis and through animal models, the development of novel therapeutic strategies involving gene targeting, artificial chromsomes, DNA delivery systems and non-embryonic stem cells, and on the ethical and social implications of these advances.

  3. Genome-wide identification and analysis of the SBP-box family genes in apple (Malus × domestica Borkh.).

    Science.gov (United States)

    Li, Jun; Hou, Hongmin; Li, Xiaoqin; Xiang, Jiang; Yin, Xiangjing; Gao, Hua; Zheng, Yi; Bassett, Carole L; Wang, Xiping

    2013-09-01

    SQUAMOSA promoter binding protein (SBP)-box genes encode a family of plant-specific transcription factors and play many crucial roles in plant development. In this study, 27 SBP-box gene family members were identified in the apple (Malus × domestica Borkh.) genome, 15 of which were suggested to be putative targets of MdmiR156. Plant SBPs were classified into eight groups according to the phylogenetic analysis of SBP-domain proteins. Gene structure, gene chromosomal location and synteny analyses of MdSBP genes within the apple genome demonstrated that tandem and segmental duplications, as well as whole genome duplications, have likely contributed to the expansion and evolution of the SBP-box gene family in apple. Additionally, synteny analysis between apple and Arabidopsis indicated that several paired homologs of MdSBP and AtSPL genes were located in syntenic genomic regions. Tissue-specific expression analysis of MdSBP genes in apple demonstrated their diversified spatiotemporal expression patterns. Most MdmiR156-targeted MdSBP genes, which had relatively high transcript levels in stems, leaves, apical buds and some floral organs, exhibited a more differential expression pattern than most MdmiR156-nontargeted MdSBP genes. Finally, expression analysis of MdSBP genes in leaves upon various plant hormone treatments showed that many MdSBP genes were responsive to different plant hormones, indicating that MdSBP genes may be involved in responses to hormone signaling during stress or in apple development. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  4. Colony size measurement of the yeast gene deletion strains for functional genomics

    Directory of Open Access Journals (Sweden)

    Mir-Rashed Nadereh

    2007-04-01

    Full Text Available Abstract Background Numerous functional genomics approaches have been developed to study the model organism yeast, Saccharomyces cerevisiae, with the aim of systematically understanding the biology of the cell. Some of these techniques are based on yeast growth differences under different conditions, such as those generated by gene mutations, chemicals or both. Manual inspection of the yeast colonies that are grown under different conditions is often used as a method to detect such growth differences. Results Here, we developed a computerized image analysis system called Growth Detector (GD, to automatically acquire quantitative and comparative information for yeast colony growth. GD offers great convenience and accuracy over the currently used manual growth measurement method. It distinguishes true yeast colonies in a digital image and provides an accurate coordinate oriented map of the colony areas. Some post-processing calculations are also conducted. Using GD, we successfully detected a genetic linkage between the molecular activity of the plant-derived antifungal compound berberine and gene expression components, among other cellular processes. A novel association for the yeast mek1 gene with DNA damage repair was also identified by GD and confirmed by a plasmid repair assay. The results demonstrate the usefulness of GD for yeast functional genomics research. Conclusion GD offers significant improvement over the manual inspection method to detect relative yeast colony size differences. The speed and accuracy associated with GD makes it an ideal choice for large-scale functional genomics investigations.

  5. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    Science.gov (United States)

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis

  6. The genome of Chelonid herpesvirus 5 harbors atypical genes

    Science.gov (United States)

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J.; Lewis, Teresa D.; Schetle, Nelli; Work, Thierry M.; Dagenais, Julie; Balazs, George H.; Leong, Jo-Ann C.

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within thealphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of “atypical” DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis

  7. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    Energy Technology Data Exchange (ETDEWEB)

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oakridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et. al. 2004, Meilan et. al 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root predominant promoters, a set of three promoters were tested for their tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et. al 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from Arabidopsis root specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post transcriptional silencing in the model Arabidopsis system; these include the floral

  8. The rules of gene expression in plants: Organ identity and gene body methylation are key factors for regulation of gene expression in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Gutiérrez Rodrigo A

    2008-09-01

    Full Text Available Abstract Background Microarray technology is a widely used approach for monitoring genome-wide gene expression. For Arabidopsis, there are over 1,800 microarray hybridizations representing many different experimental conditions on Affymetrix™ ATH1 gene chips alone. This huge amount of data offers a unique opportunity to infer the principles that govern the regulation of gene expression in plants. Results We used bioinformatics methods to analyze publicly available data obtained using the ATH1 chip from Affymetrix. A total of 1887 ATH1 hybridizations were normalized and filtered to eliminate low-quality hybridizations. We classified and compared control and treatment hybridizations and determined differential gene expression. The largest differences in gene expression were observed when comparing samples obtained from different organs. On average, ten-fold more genes were differentially expressed between organs as compared to any other experimental variable. We defined "gene responsiveness" as the number of comparisons in which a gene changed its expression significantly. We defined genes with the highest and lowest responsiveness levels as hypervariable and housekeeping genes, respectively. Remarkably, housekeeping genes were best distinguished from hypervariable genes by differences in methylation status in their transcribed regions. Moreover, methylation in the transcribed region was inversely correlated (R2 = 0.8 with gene responsiveness on a genome-wide scale. We provide an example of this negative relationship using genes encoding TCA cycle enzymes, by contrasting their regulatory responsiveness to nitrate and methylation status in their transcribed regions. Conclusion Our results indicate that the Arabidopsis transcriptome is largely established during development and is comparatively stable when faced with external perturbations. We suggest a novel functional role for DNA methylation in the transcribed region as a key determinant

  9. Genome-wide identification and comparative expression analysis of LEA genes in watermelon and melon genomes.

    Science.gov (United States)

    Celik Altunoglu, Yasemin; Baloglu, Mehmet Cengiz; Baloglu, Pinar; Yer, Esra Nurten; Kara, Sibel

    2017-01-01

    Late embryogenesis abundant (LEA) proteins are large and diverse group of polypeptides which were first identified during seed dehydration and then in vegetative plant tissues during different stress responses. Now, gene family members of LEA proteins have been detected in various organisms. However, there is no report for this protein family in watermelon and melon until this study. A total of 73 LEA genes from watermelon ( ClLEA ) and 61 LEA genes from melon ( CmLEA ) were identified in this comprehensive study. They were classified into four and three distinct clusters in watermelon and melon, respectively. There was a correlation between gene structure and motif composition among each LEA groups. Segmental duplication played an important role for LEA gene expansion in watermelon. Maximum gene ontology of LEA genes was observed with poplar LEA genes. For evaluation of tissue specific expression patterns of ClLEA and CmLEA genes, publicly available RNA-seq data were analyzed. The expression analysis of selected LEA genes in root and leaf tissues of drought-stressed watermelon and melon were examined using qRT-PCR. Among them, ClLEA - 12 - 17 - 46 genes were quickly induced after drought application. Therefore, they might be considered as early response genes for water limitation conditions in watermelon. In addition, CmLEA - 42 - 43 genes were found to be up-regulated in both tissues of melon under drought stress. Our results can open up new frontiers about understanding of functions of these important family members under normal developmental stages and stress conditions by bioinformatics and transcriptomic approaches.

  10. Recognizing genes and other components of genomic structure

    Energy Technology Data Exchange (ETDEWEB)

    Burks, C. (Los Alamos National Lab., NM (USA)); Myers, E. (Arizona Univ., Tucson, AZ (USA). Dept. of Computer Science); Stormo, G.D. (Colorado Univ., Boulder, CO (USA). Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as electronic blackboards'' and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  11. GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

    Science.gov (United States)

    Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

    2006-03-31

    Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.

  12. GeneBins: a database for classifying gene expression data, with application to plant genome arrays

    Directory of Open Access Journals (Sweden)

    Weiller Georg

    2007-03-01

    Full Text Available Abstract Background To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. Results We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG ontology. The GeneBins database currently supports the functional classification of expression data from four Affymetrix arrays; Arabidopsis thaliana, Oryza sativa, Glycine max and Medicago truncatula. An online analysis tool to identify relevant functions is also provided. Conclusion GeneBins provides resources to interpret gene expression results from microarray experiments. It is available at http://bioinfoserver.rsbs.anu.edu.au/utils/GeneBins/

  13. Plant ion channels: gene families, physiology, and functional genomics analyses.

    Science.gov (United States)

    Ward, John M; Mäser, Pascal; Schroeder, Julian I

    2009-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide-gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport.

  14. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, Marie; Jensen, L.J.; Brunak, Søren

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...... distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300...... genes, we show that it probably has only similar to 3800 genes, and that a similar discrepancy exists for almost all published genomes....

  15. Spatial organization of the budding yeast genome in the cell nucleus and identification of specific chromatin interactions from multi-chromosome constrained chromatin model.

    Science.gov (United States)

    Gürsoy, Gamze; Xu, Yun; Liang, Jie

    2017-07-01

    Nuclear landmarks and biochemical factors play important roles in the organization of the yeast genome. The interaction pattern of budding yeast as measured from genome-wide 3C studies are largely recapitulated by model polymer genomes subject to landmark constraints. However, the origin of inter-chromosomal interactions, specific roles of individual landmarks, and the roles of biochemical factors in yeast genome organization remain unclear. Here we describe a multi-chromosome constrained self-avoiding chromatin model (mC-SAC) to gain understanding of the budding yeast genome organization. With significantly improved sampling of genome structures, both intra- and inter-chromosomal interaction patterns from genome-wide 3C studies are accurately captured in our model at higher resolution than previous studies. We show that nuclear confinement is a key determinant of the intra-chromosomal interactions, and centromere tethering is responsible for the inter-chromosomal interactions. In addition, important genomic elements such as fragile sites and tRNA genes are found to be clustered spatially, largely due to centromere tethering. We uncovered previously unknown interactions that were not captured by genome-wide 3C studies, which are found to be enriched with tRNA genes, RNAPIII and TFIIS binding. Moreover, we identified specific high-frequency genome-wide 3C interactions that are unaccounted for by polymer effects under landmark constraints. These interactions are enriched with important genes and likely play biological roles.

  16. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+".

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  17. Genome-wide identification and comparative expression analysis of LEA genes in watermelon and melon genomes

    OpenAIRE

    Celik Altunoglu, Yasemin; Baloglu, Mehmet Cengiz; Baloglu, Pinar; Yer, Esra Nurten; Kara, Sibel

    2017-01-01

    Late embryogenesis abundant (LEA) proteins are large and diverse group of polypeptides which were first identified during seed dehydration and then in vegetative plant tissues during different stress responses. Now, gene family members of LEA proteins have been detected in various organisms. However, there is no report for this protein family in watermelon and melon until this study. A total of 73 LEA genes from watermelon (ClLEA) and 61 LEA genes from melon (CmLEA) were identified in this co...

  18. The organization structure and regulatory elements of Chlamydomonas histone genes reveal features linking plant and animal genes.

    Science.gov (United States)

    Fabry, S; Müller, K; Lindauer, A; Park, P B; Cornelius, T; Schmitt, R

    1995-09-01

    The genome of the green alga Chlamydomonas reinhardtii contains approximately 15 gene clusters of the nucleosomal (or core) histone H2A, H2B, H3 and H4 genes and at least one histone H1 gene. Seven non-allelic histone gene loci were isolated from a genomic library, physically mapped, and the nucleotide sequences of three isotypes of each core histone gene species and one linked H1 gene determined. The core histone genes are organized in clusters of H2A-H2B and H3-H4 pairs, in which each gene pair shows outwardly divergent transcription from a short (< 300 bp) intercistronic region. These intercistronic regions contain typically conserved promoter elements, namely a TATA-box and the three motifs TGGCCAG-G(G/C)-CGAG, CGTTGACC and CGGTTG. Different from the genes of higher plants, but like those of animals and the related alga Volvox, the 3' untranslated regions contain no poly A signal, but a palindromic sequence (3' palindrome) essential for mRNA processing is present. One single H1 gene was found in close linkage to a H2A-H2B pair. The H1 upstream region contains the octameric promoter element GGTTGACC (also found upstream of the core histone genes) and two specific sequence motifs that are shared only with the Volvox H1 promoters. This suggests differential transcription of the H1 and the core histone genes. The H1 gene is interrupted by two introns. Unlike Volvox H3 genes, the three sequenced H3 isoforms are intron-free. Primer-directed PCR of genomic DNA demonstrated, however, that at least 8 of the about 15 H3 genes do contain one intron at a conserved position. In synchronized C. reinhardtii cells, H4 mRNA levels (representative of all core histone mRNAs) peak during cell division, suggesting strict replication-dependent gene control. The derived peptide sequences place C. reinhardtii core histones closer to plants than to animals, except that the H2A histones are more animal-like. The peptide sequence of histone H1 is closely related to the V. carteri VH1-II

  19. Population genomics of the Arabidopsis thaliana flowering time gene network.

    Science.gov (United States)

    Flowers, Jonathan M; Hanzawa, Yoshie; Hall, Megan C; Moore, Richard C; Purugganan, Michael D

    2009-11-01

    The time to flowering is a key component of the life-history strategy of the model plant Arabidopsis thaliana that varies quantitatively among genotypes. A significant problem for evolutionary and ecological genetics is to understand how natural selection may operate on this ecologically significant trait. Here, we conduct a population genomic study of resequencing data from 52 genes in the flowering time network. McDonald-Kreitman tests of neutrality suggested a strong excess of amino acid polymorphism when pooling across loci. This excess of replacement polymorphism across the flowering time network and a skewed derived frequency spectrum toward rare alleles for both replacement and noncoding polymorphisms relative to synonymous changes is consistent with a large class of deleterious polymorphisms segregating in these genes. Assuming selective neutrality of synonymous changes, we estimate that approximately 30% of amino acid polymorphisms are deleterious. Evidence of adaptive substitution is less prominent in our analysis. The photoperiod regulatory gene, CO, and a gibberellic acid transcription factor, AtMYB33, show evidence of adaptive fixation of amino acid mutations. A test for extended haplotypes revealed no examples of flowering time alleles with haplotypes comparable in length to those associated with the null fri(Col) allele reported previously. This suggests that the FRI gene likely has a uniquely intense or recent history of selection among the flowering time genes considered here. Although there is some evidence for adaptive evolution in these life-history genes, it appears that slightly deleterious polymorphisms are a major component of natural molecular variation in the flowering time network of A. thaliana.

  20. Complete genome sequence of Brachyspira intermedia reveals unique genomic features in Brachyspira species and phage-mediated horizontal gene transfer

    Science.gov (United States)

    2011-01-01

    Background Brachyspira spp. colonize the intestines of some mammalian and avian species and show different degrees of enteropathogenicity. Brachyspira intermedia can cause production losses in chickens and strain PWS/AT now becomes the fourth genome to be completed in the genus Brachyspira. Results 15 classes of unique and shared genes were analyzed in B. intermedia, B. murdochii, B. hyodysenteriae and B. pilosicoli. The largest number of unique genes was found in B. intermedia and B. murdochii. This indicates the presence of larger pan-genomes. In general, hypothetical protein annotations are overrepresented among the unique genes. A 3.2 kb plasmid was found in B. intermedia strain PWS/AT. The plasmid was also present in the B. murdochii strain but not in nine other Brachyspira isolates. Within the Brachyspira genomes, genes had been translocated and also frequently switched between leading and lagging strands, a process that can be followed by different AT-skews in the third positions of synonymous codons. We also found evidence that bacteriophages were being remodeled and genes incorporated into them. Conclusions The accessory gene pool shapes species-specific traits. It is also influenced by reductive genome evolution and horizontal gene transfer. Gene-transfer events can cross both species and genus boundaries and bacteriophages appear to play an important role in this process. A mechanism for horizontal gene transfer appears to be gene translocations leading to remodeling of bacteriophages in combination with broad tropism. PMID:21816042

  1. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae: structural comparative analysis, gene content and microsatellite detection

    Directory of Open Access Journals (Sweden)

    Andrew W. Gichira

    2017-01-01

    Full Text Available Hagenia is an endangered monotypic genus endemic to the topical mountains of Africa. The only species, Hagenia abyssinica (Bruce J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp, with a pair of Inverted Repeats (IR 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp and a small single copy (SSC, 18,696. H. abyssinica’s chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to-date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA which had been previously reported in other chloroplast genomes was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analyses of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  2. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection.

    Science.gov (United States)

    Gichira, Andrew W; Li, Zhizhong; Saina, Josphat K; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W; Wang, Qingfeng; Chen, Jinming

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the topical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696). H. abyssinica 's chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to-date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene ( infA ) which had been previously reported in other chloroplast genomes was conspicuously missing in H. abyssinica . A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analyses of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  3. Genomewide analysis of the lateral organ boundaries domain gene ...

    Indian Academy of Sciences (India)

    of plants such as Arabidopsis, Oryza sativa, Zea mays, poplar, apple and tomato. However ... found to share a similar intron/exon structure and gene length within the same class. .... ches against the proteome and genome files downloaded.

  4. Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes

    Directory of Open Access Journals (Sweden)

    Kohane Isaac

    2005-11-01

    Full Text Available Abstract Background Recent advances in genome sequencing suggest a remarkable conservation in gene content of mammalian organisms. The similarity in gene repertoire present in different organisms has increased interest in studying regulatory mechanisms of gene expression aimed at elucidating the differences in phenotypes. In particular, a proximal promoter region contains a large number of regulatory elements that control the expression of its downstream gene. Although many studies have focused on identification of these elements, a broader picture on the complexity of transcriptional regulation of different biological processes has not been addressed in mammals. The regulatory complexity may strongly correlate with gene function, as different evolutionary forces must act on the regulatory systems under different biological conditions. We investigate this hypothesis by comparing the conservation of promoters upstream of genes classified in different functional categories. Results By conducting a rank correlation analysis between functional annotation and upstream sequence alignment scores obtained by human-mouse and human-dog comparison, we found a significantly greater conservation of the upstream sequence of genes involved in development, cell communication, neural functions and signaling processes than those involved in more basic processes shared with unicellular organisms such as metabolism and ribosomal function. This observation persists after controlling for G+C content. Considering conservation as a functional signature, we hypothesize a higher density of cis-regulatory elements upstream of genes participating in complex and adaptive processes. Conclusion We identified a class of functions that are associated with either high or low promoter conservation in mammals. We detected a significant tendency that points to complex and adaptive processes were associated with higher promoter conservation, despite the fact that they have emerged

  5. Effects of aneuploidy on genome structure, expression, and interphase organization in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    Bruno Huettel

    2008-10-01

    Full Text Available Aneuploidy refers to losses and/or gains of individual chromosomes from the normal chromosome set. The resulting gene dosage imbalance has a noticeable affect on the phenotype, as illustrated by aneuploid syndromes, including Down syndrome in humans, and by human solid tumor cells, which are highly aneuploid. Although the phenotypic manifestations of aneuploidy are usually apparent, information about the underlying alterations in structure, expression, and interphase organization of unbalanced chromosome sets is still sparse. Plants generally tolerate aneuploidy better than animals, and, through colchicine treatment and breeding strategies, it is possible to obtain inbred sibling plants with different numbers of chromosomes. This possibility, combined with the genetic and genomics tools available for Arabidopsis thaliana, provides a powerful means to assess systematically the molecular and cytological consequences of aberrant numbers of specific chromosomes. Here, we report on the generation of Arabidopsis plants in which chromosome 5 is present in triplicate. We compare the global transcript profiles of normal diploids and chromosome 5 trisomics, and assess genome integrity using array comparative genome hybridization. We use live cell imaging to determine the interphase 3D arrangement of transgene-encoded fluorescent tags on chromosome 5 in trisomic and triploid plants. The results indicate that trisomy 5 disrupts gene expression throughout the genome and supports the production and/or retention of truncated copies of chromosome 5. Although trisomy 5 does not grossly distort the interphase arrangement of fluorescent-tagged sites on chromosome 5, it may somewhat enhance associations between transgene alleles. Our analysis reveals the complex genomic changes that can occur in aneuploids and underscores the importance of using multiple experimental approaches to investigate how chromosome numerical changes condition abnormal phenotypes and

  6. Analysis of the Genome and Chromium Metabolism-Related Genes of Serratia sp. S2.

    Science.gov (United States)

    Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong

    2018-05-01

    This study is to investigate the genome sequence of Serratia sp. S2. The genomic DNA of Serratia sp. S2 was extracted and the sequencing library was constructed. The sequencing was carried out by Illumina 2000 and complete genomic sequences were obtained. Gene function annotation and bioinformatics analysis were performed by comparing with the known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. There were 5373 protein coding genes, and 3732, 3614, and 3942 genes were respectively annotated into the GO, KEGG, and COG databases. There were 12 genes related to chromium metabolism in the Serratia sp. S2 genome. The whole genome sequence of Serratia sp. S2 is submitted to the GenBank database with gene accession number of LNRP00000000. Our findings may provide theoretical basis for the subsequent development of new biotechnology to repair environmental chromium pollution.

  7. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species including primitive single cell algae to angiosperms with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. Present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches and provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in the reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will aid as an important resource for the soybean research

  8. Of genes and microbes: solving the intricacies in host genomes.

    Science.gov (United States)

    Wang, Jun; Chen, Liang; Zhao, Na; Xu, Xizhan; Xu, Yakun; Zhu, Baoli

    2018-05-01

    Microbiome research is a quickly developing field in biomedical research, and we have witnessed its potential in understanding the physiology, metabolism and immunology, its critical role in understanding the health and disease of the host, and its vast capacity in disease prediction, intervention and treatment. However, many of the fundamental questions still need to be addressed, including the shaping forces of microbial diversity between individuals and across time. Microbiome research falls into the classical nature vs. nurture scenario, such that host genetics shape part of the microbiome, while environmental influences change the original course of microbiome development. In this review, we focus on the nature, i.e., the genetic part of the equation, and summarize the recent efforts in understanding which parts of the genome, especially the human and mouse genome, play important roles in determining the composition and functions of microbial communities, primarily in the gut but also on the skin. We aim to present an overview of different approaches in studying the intricate relationships between host genetic variations and microbes, its underlying philosophy and methodology, and we aim to highlight a few key discoveries along this exploration, as well as current pitfalls. More evidence and results will surely appear in upcoming studies, and the accumulating knowledge will lead to a deeper understanding of what we could finally term a "hologenome", that is, the organized, closely interacting genome of the host and the microbiome.

  9. Gene loss and horizontal gene transfer contributed to the genome evolution of the extreme acidophile Ferrovum

    Directory of Open Access Journals (Sweden)

    Sophie Roxana Ullrich

    2016-05-01

    Full Text Available Acid mine drainage (AMD, associated with active and abandoned mining sites, is a habitat for acidophilic microorganisms that gain energy from the oxidation of reduced sulfur compounds and ferrous iron and that thrive at pH below 4. Members of the recently proposed genus Ferrovum are the first acidophilic iron oxidizers to be described within the Betaproteobacteria. Although they have been detected as typical community members in AMD habitats worldwide, knowledge of their phylogenetic and metabolic diversity is scarce. Genomics approaches appear to be most promising in addressing this lacuna since isolation and cultivation of Ferrovum has proven to be extremely difficult and has so far only been successful for the designated type strain Ferrovum myxofaciens P3G. In this study, the genomes of two novel strains of Ferrovum (PN-J185 and Z-31 derived from water samples of a mine water treatment plant were sequenced. These genomes were compared with those of Ferrovum sp. JA12 that also originated from the mine water treatment plant, and of the type strain (P3G. Phylogenomic scrutiny suggests that the four strains represent three Ferrovum species that cluster in two groups (1 and 2. Comprehensive analysis of their predicted metabolic pathways revealed that these groups harbor characteristic metabolic profiles, notably with respect to motility, chemotaxis, nitrogen metabolism, biofilm formation and their potential strategies to cope with the acidic environment. For example, while the F. myxofaciens strains (group 1 appear to be motile and diazotrophic, the non-motile group 2 strains have the predicted potential to use a greater variety of fixed nitrogen sources. Furthermore, analysis of their genome synteny provides first insights into their genome evolution, suggesting that horizontal gene transfer and genome reduction in the group 2 strains by loss of genes encoding complete metabolic pathways or physiological features contributed to the observed

  10. Genomic organization and dynamics of repetitive DNA sequences in representatives of three Fagaceae genera.

    Science.gov (United States)

    Alves, Sofia; Ribeiro, Teresa; Inácio, Vera; Rocheta, Margarida; Morais-Cecílio, Leonor

    2012-05-01

    Oaks, chestnuts, and beeches are economically important species of the Fagaceae. To understand the relationship between these members of this family, a deep knowledge of their genome composition and organization is needed. In this work, we have isolated and characterized several AFLP fragments obtained from Quercus rotundifolia Lam. through homology searches in available databases. Genomic polymorphisms involving some of these sequences were evaluated in two species of Quercus, one of Castanea, and one of Fagus with specific primers. Comparative FISH analysis with generated sequences was performed in interphase nuclei of the four species, and the co-immunolocalization of 5-methylcytosine was also studied. Some of the sequences isolated proved to be genus-specific, while others were present in all the genera. Retroelements, either gypsy-like of the Tat/Athila clade or copia-like, are well represented, and most are dispersed in euchromatic regions of these species with no DNA methylation associated, pointing to an interspersed arrangement of these retroelements with potential gene-rich regions. A particular gypsy-sequence is dispersed in oaks and chestnut nuclei, but its confinement to chromocenters in beech evidences genome restructuring events during evolution of Fagaceae. Several sequences generated in this study proved to be good tools to comparatively study Fagaceae genome organization.

  11. Lineage-specific evolution of the vertebrate Otopetrin gene family revealed by comparative genomic analyses

    Directory of Open Access Journals (Sweden)

    Ryan Joseph F

    2011-01-01

    Full Text Available Abstract Background Mutations in the Otopetrin 1 gene (Otop1 in mice and fish produce an unusual bilateral vestibular pathology that involves the absence of otoconia without hearing impairment. The encoded protein, Otop1, is the only functionally characterized member of the Otopetrin Domain Protein (ODP family; the extended sequence and structural preservation of ODP proteins in metazoans suggest a conserved functional role. Here, we use the tools of sequence- and cytogenetic-based comparative genomics to study the Otop1 and the Otop2-Otop3 genes and to establish their genomic context in 25 vertebrates. We extend our evolutionary study to include the gene mutated in Usher syndrome (USH subtype 1G (Ush1g, both because of the head-to-tail clustering of Ush1g with Otop2 and because Otop1 and Ush1g mutations result in inner ear phenotypes. Results We established that OTOP1 is the boundary gene of an inversion polymorphism on human chromosome 4p16 that originated in the common human-chimpanzee lineage more than 6 million years ago. Other lineage-specific evolutionary events included a three-fold expansion of the Otop genes in Xenopus tropicalis and of Ush1g in teleostei fish. The tight physical linkage between Otop2 and Ush1g is conserved in all vertebrates. To further understand the functional organization of the Ushg1-Otop2 locus, we deduced a putative map of binding sites for CCCTC-binding factor (CTCF, a mammalian insulator transcription factor, from genome-wide chromatin immunoprecipitation-sequencing (ChIP-seq data in mouse and human embryonic stem (ES cells combined with detection of CTCF-binding motifs. Conclusions The results presented here clarify the evolutionary history of the vertebrate Otop and Ush1g families, and establish a framework for studying the possible interaction(s of Ush1g and Otop in developmental pathways.

  12. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Science.gov (United States)

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  13. Genome-wide survey and developmental expression mapping of zebrafish SET domain-containing genes.

    Directory of Open Access Journals (Sweden)

    Xiao-Jian Sun

    Full Text Available SET domain-containing proteins represent an evolutionarily conserved family of epigenetic regulators, which are responsible for most histone lysine methylation. Since some of these genes have been revealed to be essential for embryonic development, we propose that the zebrafish, a vertebrate model organism possessing many advantages for developmental studies, can be utilized to study the biological functions of these genes and the related epigenetic mechanisms during early development. To this end, we have performed a genome-wide survey of zebrafish SET domain genes. 58 genes total have been identified. Although gene duplication events give rise to several lineage-specific paralogs, clear reciprocal orthologous relationship reveals high conservation between zebrafish and human SET domain genes. These data were further subject to an evolutionary analysis ranging from yeast to human, leading to the identification of putative clusters of orthologous groups (COGs of this gene family. By means of whole-mount mRNA in situ hybridization strategy, we have also carried out a developmental expression mapping of these genes. A group of maternal SET domain genes, which are implicated in the programming of histone modification states in early development, have been identified and predicted to be responsible for all known sites of SET domain-mediated histone methylation. Furthermore, some genes show specific expression patterns in certain tissues at certain stages, suggesting the involvement of epigenetic mechanisms in the development of these systems. These results provide a global view of zebrafish SET domain histone methyltransferases in evolutionary and developmental dimensions and pave the way for using zebrafish to systematically study the roles of these genes during development.

  14. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Full Text Available Abstract Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs is an attractive way for identification of genetic components that confers susceptibility of human complex diseases. Individual hypothesis testing for SNP-SNP pairs as in common genome-wide association study (GWAS however involves difficulty in setting overall p-value due to complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. A large number of SNP-SNP pairs than sample size, so-called the large p small n problem, precludes simultaneous analysis using multiple regression. The method that overcomes above issues is thus needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection termed the sure independence screening (SIS for appropriate handling of numerous number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose ranking strategy using promising dummy coding methods and following variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (General-purpose computing on graphics processing units technology. EPISIS can complete exhaustive search for SNP-SNP interactions in standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium data. Conclusions Based on the machine-learning principle, the proposed method gives powerful and flexible genome-wide search for various patterns of gene-gene interaction.

  15. Genome-wide analysis of the expansin gene superfamily reveals grapevine-specific structural and functional characteristics.

    Directory of Open Access Journals (Sweden)

    Silvia Dal Santo

    Full Text Available BACKGROUND: Expansins are proteins that loosen plant cell walls in a pH-dependent manner, probably by increasing the relative movement among polymers thus causing irreversible expansion. The expansin superfamily (EXP comprises four distinct families: expansin A (EXPA, expansin B (EXPB, expansin-like A (EXLA and expansin-like B (EXLB. There is experimental evidence that EXPA and EXPB proteins are required for cell expansion and developmental processes involving cell wall modification, whereas the exact functions of EXLA and EXLB remain unclear. The complete grapevine (Vitis vinifera genome sequence has allowed the characterization of many gene families, but an exhaustive genome-wide analysis of expansin gene expression has not been attempted thus far. METHODOLOGY/PRINCIPAL FINDINGS: We identified 29 EXP superfamily genes in the grapevine genome, representing all four EXP families. Members of the same EXP family shared the same exon-intron structure, and phylogenetic analysis confirmed a closer relationship between EXP genes from woody species, i.e. grapevine and poplar (Populus trichocarpa, compared to those from Arabidopsis thaliana and rice (Oryza sativa. We also identified grapevine-specific duplication events involving the EXLB family. Global gene expression analysis confirmed a strong correlation among EXP genes expressed in mature and green/vegetative samples, respectively, as reported for other gene families in the recently-published grapevine gene expression atlas. We also observed the specific co-expression of EXLB genes in woody organs, and the involvement of certain grapevine EXP genes in berry development and post-harvest withering. CONCLUSION: Our comprehensive analysis of the grapevine EXP superfamily confirmed and extended current knowledge about the structural and functional characteristics of this gene family, and also identified properties that are currently unique to grapevine expansin genes. Our data provide a model for the

  16. Mapping Determinants of Gene Expression Plasticity by Genetical Genomics in C. elegans

    NARCIS (Netherlands)

    Li, Y.; Alda Alvarez, O.; Gutteling, E.W.; Tijsterman, M.; Fu, J.; Riksen, J.A.G.; Hazendonk, E.; Prins, J.C.P.; Plasterk, R.H.A.; Jansen, R.C.; Breitling, R.; Kammenga, J.E.

    2006-01-01

    Recent genetical genomics studies have provided intimate views on gene regulatory networks. Gene expression variations between genetically different individuals have been mapped to the causal regulatory regions, termed expression quantitative trait loci. Whether the environment-induced plastic

  17. Mapping determinants of gene expression plasticity by genetical genomics in C. elegans.

    NARCIS (Netherlands)

    Li, Y.; Alvarez, O.A.; Gutteling, E.W.; Tijsterman, M.; Fu, J.; Riksen, J.A.; Hazendonk, M.G.A.; Prins, P.; Plasterk, R.H.A.; Jansen, R.C.; Breitling, R.; Kammenga, J.E.

    2006-01-01

    Recent genetical genomics studies have provided intimate views on gene regulatory networks. Gene expression variations between genetically different individuals have been mapped to the causal regulatory regions, termed expression quantitative trait loci. Whether the environment-induced plastic

  18. Genome-wide analysis of regions similar to promoters of histone genes

    KAUST Repository

    Chowdhary, Rajesh; Bajic, Vladimir B.; Dong, Difeng; Wong, Limsoon; Liu, Jun S

    2010-01-01

    of histone and histone-coregulated gene transcription initiation. While these hypotheses still remain to be verified, we believe that these form a useful resource for researchers to further explore regulation of human histone genes and human genome

  19. Hepatitis A Virus Genome Organization and Replication Strategy.

    Science.gov (United States)

    McKnight, Kevin L; Lemon, Stanley M

    2018-04-02

    Hepatitis A virus (HAV) is a positive-strand RNA virus classified in the genus Hepatovirus of the family Picornaviridae It is an ancient virus with a long evolutionary history and multiple features of its capsid structure, genome organization, and replication cycle that distinguish it from other mammalian picornaviruses. HAV proteins are produced by cap-independent translation of a single, long open reading frame under direction of an inefficient, upstream internal ribosome entry site (IRES). Genome replication occurs slowly and is noncytopathic, with transcription likely primed by a uridylated protein primer as in other picornaviruses. Newly produced quasi-enveloped virions (eHAV) are released from cells in a nonlytic fashion in a unique process mediated by interactions of capsid proteins with components of the host cell endosomal sorting complexes required for transport (ESCRT) system. Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved.

  20. Genome-wide identification and expression analysis of the CIPK gene family in cassava

    Directory of Open Access Journals (Sweden)

    Wei eHu

    2015-10-01

    Full Text Available Cassava is an important food and potential biofuel crop that is tolerant to multiple abiotic stressors. The mechanisms underlying these tolerances are currently less known. CBL-interacting protein kinases (CIPKs have been shown to play crucial roles in plant developmental processes, hormone signaling transduction, and in the response to abiotic stress. However, no data is currently available about the CPK family in cassava. In this study, a total of 25 CIPK genes were identified from cassava genome based on our previous genome sequencing data. Phylogenetic analysis suggested that 25 MeCIPKs could be classified into four subfamilies, which was supported by exon-intron organizations and the architectures of conserved protein motifs. Transcriptomic analysis of a wild subspecies and two cultivated varieties showed that most MeCIPKs had different expression patterns between wild subspecies and cultivatars in different tissues or in response to drought stress. Some orthologous genes involved in CIPK interaction networks were identified between Arabidopsis and cassava. The interaction networks and co-expression patterns of these orthologous genes revealed that the crucial pathways controlled by CIPK networks may be involved in the differential response to drought stress in different accessions of cassava. Nine MeCIPK genes were selected to investigate their transcriptional response to various stimuli and the results showed the comprehensive response of the tested MeCIPK genes to osmotic, salt, cold, oxidative stressors, and ABA signaling. The identification and expression analysis of CIPK family suggested that CIPK genes are important components of development and multiple signal transduction pathways in cassava. The findings of this study will help lay a foundation for the functional characterization of the CIPK gene family and provide an improved understanding of abiotic stress responses and signaling transduction in cassava.

  1. Genome-wide study of correlations between genomic features and their relationship with the regulation of gene expression.

    Science.gov (United States)

    Kravatsky, Yuri V; Chechetkin, Vladimir R; Tchurikov, Nikolai A; Kravatskaya, Galina I

    2015-02-01

    The broad class of tasks in genetics and epigenetics can be reduced to the study of various features that are distributed over the genome (genome tracks). The rapid and efficient processing of the huge amount of data stored in the genome-scale databases cannot be achieved without the software packages based on the analytical criteria. However, strong inhomogeneity of genome tracks hampers the development of relevant statistics. We developed the criteria for the assessment of genome track inhomogeneity and correlations between two genome tracks. We also developed a software package, Genome Track Analyzer, based on this theory. The theory and software were tested on simulated data and were applied to the study of correlations between CpG islands and transcription start sites in the Homo sapiens genome, between profiles of protein-binding sites in chromosomes of Drosophila melanogaster, and between DNA double-strand breaks and histone marks in the H. sapiens genome. Significant correlations between transcription start sites on the forward and the reverse strands were observed in genomes of D. melanogaster, Caenorhabditis elegans, Mus musculus, H. sapiens, and Danio rerio. The observed correlations may be related to the regulation of gene expression in eukaryotes. Genome Track Analyzer is freely available at http://ancorr.eimb.ru/. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  2. Genomic Characterization of Phenylalanine Ammonia Lyase Gene in Buckwheat.

    Directory of Open Access Journals (Sweden)

    Karthikeyan Thiyagarajan

    Full Text Available Phenylalanine Ammonia Lyase (PAL gene which plays a key role in bio-synthesis of medicinally important compounds, Rutin/quercetin was sequence characterized for its efficient genomics application. These compounds possessing anti-diabetic and anti-cancer properties and are predominantly produced by Fagopyrum spp. In the present study, PAL gene was sequenced from three Fagopyrum spp. (F. tataricum, F. esculentum and F. dibotrys and showed the presence of three SNPs and four insertion/deletions at intra and inter specific level. Among them, the potential SNP (position 949th bp G>C with Parsimony Informative Site was selected and successfully utilised to individuate the zygosity/allelic variation of 16 F. tataricum varieties. Insertion mutations were identified in coding region, which resulted the change of a stretch of 39 amino acids on the putative protein. Our Study revealed that autogamous species (F. tataricum has lower frequency of observed SNPs as compared to allogamous species (F. dibotrys and F. esculentum. The identified SNPs in F. tataricum didn't result to amino acid change, while in other two species it caused both conservative and non-conservative variations. Consistent pattern of SNPs across the species revealed their phylogenetic importance. We found two groups of F. tataricum and one of them was closely related with F. dibotrys. Sequence characterization information of PAL gene reported in present investigation can be utilized in genetic improvement of buckwheat in reference to its medicinal value.

  3. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

    NARCIS (Netherlands)

    S.L. Macrae (Sheila L.); Q. Zhang (Quanwei); C. Lemetre (Christophe); I. Seim (Inge); R.B. Calder (Robert B.); J.H.J. Hoeijmakers (Jan); Y. Suh (Yousin); V.N. Gladyshev (Vadim N.); A. Seluanov (Andrei); V. Gorbunova (Vera); J. Vijg (Jan); Z.D. Zhang (Zhengdong D.)

    2015-01-01

    textabstractGenome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM

  4. Theories of Population Variation in Genes and Genomes

    DEFF Research Database (Denmark)

    Christiansen, Freddy

    This textbook provides an authoritative introduction to both classical and coalescent approaches to population genetics. Written for graduate students and advanced undergraduates by one of the world’s leading authorities in the field, the book focuses on the theoretical background of population...... genetics, while emphasizing the close interplay between theory and empiricism. Traditional topics such as genetic and phenotypic variation, mutation, migration, and linkage are covered and advanced by contemporary coalescent theory, which describes the genealogy of genes in a population, ultimately...... connecting them to a single common ancestor. Effects of selection, particularly genomic effects, are discussed with reference to molecular genetic variation. The book is designed for students of population genetics, bioinformatics, evolutionary biology, molecular evolution, and theoretical biology—as well...

  5. Comparative genomic analysis of the arthropod muscle myosin heavy chain genes allows ancestral gene reconstruction and reveals a new type of 'partially' processed pseudogene

    Directory of Open Access Journals (Sweden)

    Kollmar Martin

    2008-02-01

    Full Text Available Abstract Background Alternative splicing of mutually exclusive exons is an important mechanism for increasing protein diversity in eukaryotes. The insect Mhc (myosin heavy chain gene produces all different muscle myosins as a result of alternative splicing in contrast to most other organisms of the Metazoa lineage, that have a family of muscle genes with each gene coding for a protein specialized for a functional niche. Results The muscle myosin heavy chain genes of 22 species of the Arthropoda ranging from the waterflea to wasp and Drosophila have been annotated. The analysis of the gene structures allowed the reconstruction of an ancient muscle myosin heavy chain gene and showed that during evolution of the arthropods introns have mainly been lost in these genes although intron gain might have happened in a few cases. Surprisingly, the genome of Aedes aegypti contains another and that of Culex pipiens quinquefasciatus two further muscle myosin heavy chain genes, called Mhc3 and Mhc4, that contain only one variant of the corresponding alternative exons of the Mhc1 gene. Mhc3 transcription in Aedes aegypti is documented by EST data. Mhc3 and Mhc4 inserted in the Aedes and Culex genomes either by gene duplication followed by the loss of all but one variant of the alternative exons, or by incorporation of a transcript of which all other variants have been spliced out retaining the exon-intron structure. The second and more likely possibility represents a new type of a 'partially' processed pseudogene. Conclusion Based on the comparative genomic analysis of the alternatively spliced arthropod muscle myosin heavy chain genes we propose that the splicing process operates sequentially on the transcript. The process consists of the splicing of the mutually exclusive exons until one exon out of the cluster remains while retaining surrounding intronic sequence. In a second step splicing of introns takes place. A related mechanism could be responsible for

  6. Genome wide analyses of metal responsive genes in Caenorhabditis elegans

    Directory of Open Access Journals (Sweden)

    Michael eAschner

    2012-04-01

    Full Text Available Metals are major contaminants that influence human health. Many metals have physiologic roles, but excessive levels can be harmful. Advances in technology have made toxicogenomic analyses possible to characterize the effects of metal exposure on the entire genome. Much of what is known about cellular responses to metals has come from mammalian systems; however the use of non-mammalian species is gaining wider attention. Caenorhabditis elegans (C. elegans is a small round worm whose genome has been fully sequenced and its development from egg to adult is well characterized. It is an attractive model for high throughput screens due to its short lifespan, ease of genetic mutability, low cost and high homology with humans. Research performed in C. elegans has led to insights in apoptosis, gene expression and neurodegeneration, all of which can be altered by metal exposure. Additionally, by using worms one can potentially study how the mechanisms that underline differential responses to metals in nematodes and humans, allowing for identification of novel pathways and therapeutic targets. In this review, toxicogenomic studies performed in C. elegans exposed to various metals will be discussed, highlighting how this non-mammalian system can be utilized to study cellular processes and pathways induced by metals. Recent work focusing on neurodegeneration in Parkinson’s disease will be discussed as an example of the usefulness of genetic screens in C. elegans and the novel findings that can be produced.

  7. Genome-wide identification of Pseudomonas aeruginosa virulence-related genes using a Caenorhabditis elegans infection model.

    Directory of Open Access Journals (Sweden)

    Rhonda L Feinbaum

    Full Text Available Pseudomonas aeruginosa strain PA14 is an opportunistic human pathogen capable of infecting a wide range of organisms including the nematode Caenorhabditis elegans. We used a non-redundant transposon mutant library consisting of 5,850 clones corresponding to 75% of the total and approximately 80% of the non-essential PA14 ORFs to carry out a genome-wide screen for attenuation of PA14 virulence in C. elegans. We defined a functionally diverse 180 mutant set (representing 170 unique genes necessary for normal levels of virulence that included both known and novel virulence factors. Seven previously uncharacterized virulence genes (ABC transporters PchH and PchI, aminopeptidase PepP, ATPase/molecular chaperone ClpA, cold shock domain protein PA0456, putative enoyl-CoA hydratase/isomerase PA0745, and putative transcriptional regulator PA14_27700 were characterized with respect to pigment production and motility and all but one of these mutants exhibited pleiotropic defects in addition to their avirulent phenotype. We examined the collection of genes required for normal levels of PA14 virulence with respect to occurrence in P. aeruginosa strain-specific genomic regions, location on putative and known genomic islands, and phylogenetic distribution across prokaryotes. Genes predominantly contributing to virulence in C. elegans showed neither a bias for strain-specific regions of the P. aeruginosa genome nor for putatively horizontally transferred genomic islands. Instead, within the collection of virulence-related PA14 genes, there was an overrepresentation of genes with a broad phylogenetic distribution that also occur with high frequency in many prokaryotic clades, suggesting that in aggregate the genes required for PA14 virulence in C. elegans are biased towards evolutionarily conserved genes.

  8. Genome-wide identification of Pseudomonas aeruginosa virulence-related genes using a Caenorhabditis elegans infection model.

    Science.gov (United States)

    Feinbaum, Rhonda L; Urbach, Jonathan M; Liberati, Nicole T; Djonovic, Slavica; Adonizio, Allison; Carvunis, Anne-Ruxandra; Ausubel, Frederick M

    2012-01-01

    Pseudomonas aeruginosa strain PA14 is an opportunistic human pathogen capable of infecting a wide range of organisms including the nematode Caenorhabditis elegans. We used a non-redundant transposon mutant library consisting of 5,850 clones corresponding to 75% of the total and approximately 80% of the non-essential PA14 ORFs to carry out a genome-wide screen for attenuation of PA14 virulence in C. elegans. We defined a functionally diverse 180 mutant set (representing 170 unique genes) necessary for normal levels of virulence that included both known and novel virulence factors. Seven previously uncharacterized virulence genes (ABC transporters PchH and PchI, aminopeptidase PepP, ATPase/molecular chaperone ClpA, cold shock domain protein PA0456, putative enoyl-CoA hydratase/isomerase PA0745, and putative transcriptional regulator PA14_27700) were characterized with respect to pigment production and motility and all but one of these mutants exhibited pleiotropic defects in addition to their avirulent phenotype. We examined the collection of genes required for normal levels of PA14 virulence with respect to occurrence in P. aeruginosa strain-specific genomic regions, location on putative and known genomic islands, and phylogenetic distribution across prokaryotes. Genes predominantly contributing to virulence in C. elegans showed neither a bias for strain-specific regions of the P. aeruginosa genome nor for putatively horizontally transferred genomic islands. Instead, within the collection of virulence-related PA14 genes, there was an overrepresentation of genes with a broad phylogenetic distribution that also occur with high frequency in many prokaryotic clades, suggesting that in aggregate the genes required for PA14 virulence in C. elegans are biased towards evolutionarily conserved genes.

  9. The origin and evolution of Basigin(BSG) gene: A comparative genomic and phylogenetic analysis.

    Science.gov (United States)

    Zhu, Xinyan; Wang, Shenglan; Shao, Mingjie; Yan, Jie; Liu, Fei

    2017-07-01

    Basigin (BSG), also known as extracellular matrix metalloproteinase inducer (EMMPRIN) or cluster of differentiation 147 (CD147), plays various fundamental roles in the intercellular recognition involved in immunologic phenomena, differentiation, and development. In this study, we aimed to compare the similarities and differences of BSG among organisms and explore possible evolutionary relationships based on the comparison result. We used the extensive BLAST tool to search the metazoan genomes, N-glycosylation sites, the transmembrane region and other functional sites. We then identified BSG homologs from genomic sequences and analyzed their phylogenetic relationships. We identified that BSG genes exist not only in the vertebrate metazoans but also in the invertebrate metazoans such as Amphioxus B. floridae, D. melanogaster, A. mellifera, S. japonicum, C. gigas, and T. patagoniensis. After sequence analysis, we confirmed that only vertebrate metazoans and Cephalochordate (amphioxus B. floridae) have the classic structure (a signal peptide, two Ig-like domains (IgC2 and IgI), a transmembrane region, and an intracellular domain). The invertebrate metazoans (excluding amphioxus B. floridae) lack the N-terminal signal peptides and IgC2 domain. We then generated a phylogenetic tree, genome organization comparison, and chromosomal disposition analysis based on the biological information obtained from the NCBI and Ensembl databases. Finally, we established the possible evolutionary scenario of the BSG gene, which showed the restricted exon rearrangement that has occurred during evolution, forming the present-day BSG gene. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Meta genome-wide network from functional linkages of genes in human gut microbial ecosystems.

    Science.gov (United States)

    Ji, Yan; Shi, Yixiang; Wang, Chuan; Dai, Jianliang; Li, Yixue

    2013-03-01

    The human gut microbial ecosystem (HGME) exerts an important influence on the human health. In recent researches, meta-genomics provided deep insights into the HGME in terms of gene contents, metabolic processes and genome constitutions of meta-genome. Here we present a novel methodology to investigate the HGME on the basis of a set of functionally coupled genes regardless of their genome origins when considering the co-evolution properties of genes. By analyzing these coupled genes, we showed some basic properties of HGME significantly associated with each other, and further constructed a protein interaction map of human gut meta-genome to discover some functional modules that may relate with essential metabolic processes. Compared with other studies, our method provides a new idea to extract basic function elements from meta-genome systems and investigate complex microbial environment by associating its biological traits with co-evolutionary fingerprints encoded in it.

  11. GeneDig: a web application for accessing genomic and bioinformatics knowledge.

    Science.gov (United States)

    Suciu, Radu M; Aydin, Emir; Chen, Brian E

    2015-02-28

    With the exponential increase and widespread availability of genomic, transcriptomic, and proteomic data, accessing these '-omics' data is becoming increasingly difficult. The current resources for accessing and analyzing these data have been created to perform highly specific functions intended for specialists, and thus typically emphasize functionality over user experience. We have developed a web-based application, GeneDig.org, that allows any general user access to genomic information with ease and efficiency. GeneDig allows for searching and browsing genes and genomes, while a dynamic navigator displays genomic, RNA, and protein information simultaneously for co-navigation. We demonstrate that our application allows more than five times faster and efficient access to genomic information than any currently available methods. We have developed GeneDig as a platform for bioinformatics integration focused on usability as its central design. This platform will introduce genomic navigation to broader audiences while aiding the bioinformatics analyses performed in everyday biology research.

  12. Hypothesis: Gene-rich plastid genomes in red algae may be an outcome of nuclear genome reduction.

    Science.gov (United States)

    Qiu, Huan; Lee, Jun Mo; Yoon, Hwan Su; Bhattacharya, Debashish

    2017-06-01

    Red algae (Rhodophyta) putatively diverged from the eukaryote tree of life >1.2 billion years ago and are the source of plastids in the ecologically important diatoms, haptophytes, and dinoflagellates. In general, red algae contain the largest plastid gene inventory among all such organelles derived from primary, secondary, or additional rounds of endosymbiosis. In contrast, their nuclear gene inventory is reduced when compared to their putative sister lineage, the Viridiplantae, and other photosynthetic lineages. The latter is thought to have resulted from a phase of genome reduction that occurred in the stem lineage of Rhodophyta. A recent comparative analysis of a taxonomically broad collection of red algal and Viridiplantae plastid genomes demonstrates that the red algal ancestor encoded ~1.5× more plastid genes than Viridiplantae. This difference is primarily explained by more extensive endosymbiotic gene transfer (EGT) in the stem lineage of Viridiplantae, when compared to red algae. We postulate that limited EGT in Rhodophytes resulted from the countervailing force of ancient, and likely recurrent, nuclear genome reduction. In other words, the propensity for nuclear gene loss led to the retention of red algal plastid genes that would otherwise have undergone intracellular gene transfer to the nucleus. This hypothesis recognizes the primacy of nuclear genome evolution over that of plastids, which have no inherent control of their gene inventory and can change dramatically (e.g., secondarily non-photosynthetic eukaryotes, dinoflagellates) in response to selection acting on the host lineage. © 2017 Phycological Society of America.

  13. Strategies used for genetically modifying bacterial genome: ite-directed mutagenesis, gene inactivation, and gene over-expression*

    Science.gov (United States)

    Xu, Jian-zhong; Zhang, Wei-guo

    2016-01-01

    With the availability of the whole genome sequence of Escherichia coli or Corynebacterium glutamicum, strategies for directed DNA manipulation have developed rapidly. DNA manipulation plays an important role in understanding the function of genes and in constructing novel engineering bacteria according to requirement. DNA manipulation involves modifying the autologous genes and expressing the heterogenous genes. Two alternative approaches, using electroporation linear DNA or recombinant suicide plasmid, allow a wide variety of DNA manipulation. However, the over-expression of the desired gene is generally executed via plasmid-mediation. The current review summarizes the common strategies used for genetically modifying E. coli and C. glutamicum genomes, and discusses the technical problem of multi-layered DNA manipulation. Strategies for gene over-expression via integrating into genome are proposed. This review is intended to be an accessible introduction to DNA manipulation within the bacterial genome for novices and a source of the latest experimental information for experienced investigators. PMID:26834010

  14. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum

    OpenAIRE

    Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard; Hertweck, Christian; Linde, J?rg

    2015-01-01

    Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies.

  15. Genomic evidence of bitter taste in snakes and phylogenetic analysis of bitter taste receptor genes in reptiles

    Directory of Open Access Journals (Sweden)

    Huaming Zhong

    2017-08-01

    Full Text Available As nontraditional model organisms with extreme physiological and morphological phenotypes, snakes are believed to possess an inferior taste system. However, the bitter taste sensation is essential to distinguish the nutritious and poisonous food resources and the genomic evidence of bitter taste in snakes is largely scarce. To explore the genetic basis of the bitter taste of snakes and characterize the evolution of bitter taste receptor genes (Tas2rs in reptiles, we identified Tas2r genes in 19 genomes (species corresponding to three orders of non-avian reptiles. Our results indicated contractions of Tas2r gene repertoires in snakes, however dramatic gene expansions have occurred in lizards. Phylogenetic analysis of the Tas2rs with NJ and BI methods revealed that Tas2r genes of snake species formed two clades, whereas in lizards the Tas2r genes clustered into two monophyletic clades and four large clades. Evolutionary changes (birth and death of intact Tas2r genes in reptiles were determined by reconciliation analysis. Additionally, the taste signaling pathway calcium homeostasis modulator 1 (Calhm1 gene of snakes was putatively functional, suggesting that snakes still possess bitter taste sensation. Furthermore, Phylogenetically Independent Contrasts (PIC analyses reviewed a significant correlation between the number of Tas2r genes and the amount of potential toxins in reptilian diets, suggesting that insectivores such as some lizards may require more Tas2rs genes than omnivorous and carnivorous reptiles.

  16. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    Science.gov (United States)

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study,next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs.

  17. Congruent Deep Relationships in the Grape Family (Vitaceae Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    Directory of Open Access Journals (Sweden)

    Ning Zhang

    Full Text Available Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera. The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study,next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs.

  18. Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss.

    Science.gov (United States)

    Avdeyev, Pavel; Jiang, Shuai; Aganezov, Sergey; Hu, Fei; Alekseyev, Max A

    2016-03-01

    Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice as it was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is presence of breakpoint reuses when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.

  19. Transcriptional interference networks coordinate the expression of functionally-related genes clustered in the same genomic loci

    Directory of Open Access Journals (Sweden)

    Zsolt eBoldogkoi

    2012-07-01

    Full Text Available The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organisation, transcription, various post-transcriptional processes and translation. In this study, the Transcriptional Interference Network (TIN hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighbouring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally-linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly-arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely-oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronised cascade of gene expression in functionally-linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular

  20. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    OpenAIRE

    Wright, James C.; Sugden, Deana; Francis-McIntyre, Sue; Riba Garcia, Isabel; Gaskell, Simon J.; Grigoriev, Igor V.; Baker, Scott E.; Beynon, Robert J.; Hubbard, Simon J.

    2009-01-01

    Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were ac...

  1. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes

    Science.gov (United States)

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approxima...

  2. Genomic determinants of sporulation in Bacilli and Clostridia: towards the minimal set of sporulation-specific genes.

    Science.gov (United States)

    Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J

    2012-11-01

    Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.

  3. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, M; Jensen, L J; Brunak, S

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribut......In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...... distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300...... genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes....

  4. The impact of genome triplication on tandem gene evolution in Brassica rapa

    Directory of Open Access Journals (Sweden)

    Lu eFang

    2012-11-01

    Full Text Available Whole genome duplication (WGD and tandem duplication (TD are both important modes of gene expansion. However, how whole genome duplication influences tandemly duplicated genes is not well studied. We used Brassica rapa, which has undergone an additional genome triplication (WGT and shares a common ancestor with Arabidopsis thaliana, Arabidopsis lyrata and Thellungiella parvula, to investigate the impact of genome triplication on tandem gene evolution. We identified 2,137, 1,569, 1,751 and 1,135 tandem gene arrays in B. rapa, A. thaliana, A. lyrata and T. parvula respectively. Among them, 414 conserved tandem arrays are shared by the 3 species without WGT, which were also considered as existing in the diploid ancestor of B. rapa. Thus, after genome triplication, B. rapa should have 1,242 tandem arrays according to the 414 conserved tandems. Here, we found 400 out of the 414 tandems had at least one syntenic ortholog in the genome of B. rapa. Furthermore, 294 out of the 400 shared syntenic orthologs maintain tandem arrays (more than one gene for each syntenic hit in B. rapa. For the 294 tandem arrays, we obtained 426 copies of syntenic paralogous tandems in the triplicated genome of B. rapa. In this study, we demonstrated that tandem arrays in B. rapa were dramatically fractionated after WGT when compared either to non-tandem genes in the B. rapa genome or to the tandem arrays in closely related species that have not experienced a recent whole-genome polyploidization event.

  5. Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L. Genome

    Directory of Open Access Journals (Sweden)

    Navdeep Gill

    2014-04-01

    Full Text Available Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, assess how repetitive elements in the sunflower genome are organized relative to genes, and compare the genomic distribution of these repeats to that found in other food crops and model species. We also examine the expression of transposable element-related transcripts in EST databases for sunflower to determine the representation of repeats in the transcriptome and to measure their transcriptional activity. Our data confirm previous reports in suggesting that the sunflower genome is >78% repetitive. Sunflower repeats share very little similarity to other plant repeats such as those of Arabidopsis, rice, maize and wheat; overall 28% of repeats are “novel” to sunflower. The repetitive sequences appear to be randomly distributed within the sequenced BACs. Assuming the 96 BACs are representative of the genome as a whole, then approximately 5.2% of the sunflower genome comprises non TE-related genic sequence, with an average gene density of 18kbp/gene. Expression levels of these transposable elements indicate tissue specificity and differential expression in vegetative and reproductive tissues, suggesting that expressed TEs might contribute to sunflower development. The assembled BACs will also be useful for assessing the quality of several different draft assemblies of the sunflower genome and for annotating the reference sequence.

  6. Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L.) Genome

    Science.gov (United States)

    Gill, Navdeep; Buti, Matteo; Kane, Nolan; Bellec, Arnaud; Helmstetter, Nicolas; Berges, Hélène; Rieseberg, Loren H.

    2014-01-01

    Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, assess how repetitive elements in the sunflower genome are organized relative to genes, and compare the genomic distribution of these repeats to that found in other food crops and model species. We also examine the expression of transposable element-related transcripts in EST databases for sunflower to determine the representation of repeats in the transcriptome and to measure their transcriptional activity. Our data confirm previous reports in suggesting that the sunflower genome is >78% repetitive. Sunflower repeats share very little similarity to other plant repeats such as those of Arabidopsis, rice, maize and wheat; overall 28% of repeats are “novel” to sunflower. The repetitive sequences appear to be randomly distributed within the sequenced BACs. Assuming the 96 BACs are representative of the genome as a whole, then approximately 5.2% of the sunflower genome comprises non TE-related genic sequence, with an average gene density of 18kbp/gene. Expression levels of these transposable elements indicate tissue specificity and differential expression in vegetative and reproductive tissues, suggesting that expressed TEs might contribute to sunflower development. The assembled BACs will also be useful for assessing the quality of several different draft assemblies of the sunflower genome and for annotating the reference sequence. PMID:24833511

  7. Inter-genomic DNA Exchanges and Homeologous Gene Silencing Shaped the Nascent Allopolyploid Coffee Genome (Coffea arabica L.

    Directory of Open Access Journals (Sweden)

    Philippe Lashermes

    2016-09-01

    Full Text Available Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were evidenced that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing.

  8. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    Full Text Available BACKGROUND: Podocarpus lambertii (Podocarpaceae is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR. It contains 118 unique genes and one duplicated tRNA (trnN-GUU, which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi and Araucariaceae (Agathis dammara. Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparently loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  9. Chloroplast Genome Sequence of pigeonpea (Cajanus cajan (L. Millspaugh and Cajanus scarabaeoides: Genome organization and Comparison with other legumes

    Directory of Open Access Journals (Sweden)

    Tanvi Kaila

    2016-12-01

    Full Text Available Pigeonpea (Cajanus cajan (L. Millspaugh, a diploid (2n = 22 legume crop with a genome size of 852 Mbp, serves as an important source of human dietary protein especially in South East Asian and African regions. In this study, the draft chloroplast genomes of Cajanus cajan and Cajanus scarabaeoides were sequenced. Cajanus scarabaeoides is an important species of the Cajanus gene pool and has also been used for developing promising CMS system by different groups. A male sterile genotype harbouring the Cajanus scarabaeoides cytoplasm was used for sequencing the plastid genome. The cp genome of Cajanus cajan is 152,242bp long, having a quadripartite structure with LSC of 83,455 bp and SSC of 17,871 bp separated by IRs of 25,398 bp. Similarly, the cp genome of Cajanus scarabaeoides is 152,201bp long, having a quadripartite structure in which IRs of 25,402 bp length separates 83,423 bp of LSC and 17,854 bp of SSC. The pigeonpea cp genome contains 116 unique genes, including 30 tRNA, 4 rRNA, 78 predicted protein coding genes and 5 pseudogenes. A 50kb inversion was observed in the LSC region of pigeonpea cp genome, consistent with other legumes. Comparison of cp genome with other legumes revealed the contraction of IR boundaries due to the absence of rps19 gene in the IR region. Chloroplast SSRs were mined and a total of 280 and 292 cpSSRs were identified in Cajanus scarabaeoides and Cajanus cajan respectively. RNA editing was observed at 37 sites in both Cajanus scarabaeoides and Cajanus cajan, with maximum occurrence in the ndh genes. The pigeonpea cp genome sequence would be beneficial in providing informative molecular markers which can be utilized for genetic diversity analysis and aid in understanding the plant systematics studies among major grain legumes.

  10. Convergent functional genomics in addiction research - a translational approach to study candidate genes and gene networks.

    Science.gov (United States)

    Spanagel, Rainer

    2013-01-01

    Convergent functional genomics (CFG) is a translational methodology that integrates in a Bayesian fashion multiple lines of evidence from studies in human and animal models to get a better understanding of the genetics of a disease or pathological behavior. Here the integration of data sets that derive from forward genetics in animals and genetic association studies including genome wide association studies (GWAS) in humans is described for addictive behavior. The aim of forward genetics in animals and association studies in humans is to identify mutations (e.g. SNPs) that produce a certain phenotype; i.e. "from phenotype to genotype". Most powerful in terms of forward genetics is combined quantitative trait loci (QTL) analysis and gene expression profiling in recombinant inbreed rodent lines or genetically selected animals for a specific phenotype, e.g. high vs. low drug consumption. By Bayesian scoring genomic information from forward genetics in animals is then combined with human GWAS data on a similar addiction-relevant phenotype. This integrative approach generates a robust candidate gene list that has to be functionally validated by means of reverse genetics in animals; i.e. "from genotype to phenotype". It is proposed that studying addiction relevant phenotypes and endophenotypes by this CFG approach will allow a better determination of the genetics of addictive behavior.

  11. The genome and transcriptome of Phalaenopsis yield insights into floral organ development and flowering regulation

    Directory of Open Access Journals (Sweden)

    Jian-Zhi Huang

    2016-05-01

    Full Text Available The Phalaenopsis orchid is an important potted flower of high economic value around the world. We report the 3.1 Gb draft genome assembly of an important winter flowering Phalaenopsis ‘KHM190’ cultivar. We generated 89.5 Gb RNA-seq and 113 million sRNA-seq reads to use these data to identify 41,153 protein-coding genes and 188 miRNA families. We also generated a draft genome for Phalaenopsis pulcherrima ‘B8802,’ a summer flowering species, via resequencing. Comparison of genome data between the two Phalaenopsis cultivars allowed the identification of 691,532 single-nucleotide polymorphisms. In this study, we reveal that the key role of PhAGL6b in the regulation of labellum organ development involves alternative splicing in the big lip mutant. Petal or sepal overexpressing PhAGL6b leads to the conversion into a lip-like structure. We also discovered that the gibberellin pathway that regulates the expression of flowering time genes during the reproductive phase change is induced by cool temperature. Our work thus depicted a valuable resource for the flowering control, flower architecture development, and breeding of the Phalaenopsis orchids.

  12. Complementary Information Derived from CRISPR Cas9 Mediated Gene Deletion and Suppression. | Office of Cancer Genomics

    Science.gov (United States)

    CRISPR-Cas9 provides the means to perform genome editing and facilitates loss-of-function screens. However, we and others demonstrated that expression of the Cas9 endonuclease induces a gene-independent response that correlates with the number of target sequences in the genome. An alternative approach to suppressing gene expression is to block transcription using a catalytically inactive Cas9 (dCas9). Here we directly compare genome editing by CRISPR-Cas9 (cutting, CRISPRc) and gene suppression using KRAB-dCas9 (CRISPRi) in loss-of-function screens to identify cell essential genes.

  13. Building a complete image of genome regulation in the model organism Escherichia coli.

    Science.gov (United States)

    Ishihama, Akira

    2018-01-15

    The model organism, Escherichia coli, contains a total of more than 4,500 genes, but the total number of RNA polymerase (RNAP) core enzyme or the transcriptase is only about 2,000 molecules per genome. The regulatory targets of RNAP are, however, modulated by changing its promoter selectivity through two-steps of protein-protein interplay with 7 species of the sigma factor in the first step, and then 300 species of the transcription factor (TF) in the second step. Scientists working in the field of prokaryotic transcription in Japan have made considerable contributions to the elucidation of genetic frameworks and regulatory modes of the genome transcription in E. coli K-12. This review summarizes the findings by this group, first focusing on three sigma factors, the stationary-phase sigma RpoS, the heat-shock sigma RpoH, and the flagellar-chemotaxis sigma RpoF, as examples. It also presents an overview of the current state of the systematic research being carried out to identify the regulatory functions of all TFs from a single and the same bacterium E. coli K-12, using the genomic SELEX and PS-TF screening systems. All these studies have been undertaken with the aim of understanding the genome regulation in E. coli K-12 as a whole.

  14. Expression of homing endonuclease gene and insertion-like element in sea anemone mitochondrial genomes: Lesson learned from Anemonia viridis.

    Science.gov (United States)

    Chi, Sylvia Ighem; Urbarova, Ilona; Johansen, Steinar D

    2018-04-30

    The mitochondrial genomes of sea anemones are dynamic in structure. Invasion by genetic elements, such as self-catalytic group I introns or insertion-like sequences, contribute to sea anemone mitochondrial genome expansion and complexity. By using next generation sequencing we investigated the complete mtDNAs and corresponding transcriptomes of the temperate sea anemone Anemonia viridis and its closer tropical relative Anemonia majano. Two versions of fused homing endonuclease gene (HEG) organization were observed among the Actiniidae sea anemones; in-frame gene fusion and pseudo-gene fusion. We provided support for the pseudo-gene fusion organization in Anemonia species, resulting in a repressed HEG from the COI-884 group I intron. orfA, a putative protein-coding gene with insertion-like features, was present in both Anemonia species. Interestingly, orfA and COI expression were significantly up-regulated upon long-term environmental stress corresponding to low seawater pH conditions. This study provides new insights to the dynamics of sea anemone mitochondrial genome structure and function. Copyright © 2018 Elsevier B.V. All rights reserved.

  15. Human Ro60 (SSA2) genomic organization and sequence alterations, examined in cutaneous lupus erythematosus.

    Science.gov (United States)

    Millard, T P; Ashton, G H S; Kondeatis, E; Vaughan, R W; Hughes, G R V; Khamashta, M A; Hawk, J L M; McGregor, J M; McGrath, J A

    2002-02-01

    The Ro 60 kDa protein (Ro60 or SSA2) is the major component of the Ro ribonucleoprotein (Ro RNP) complex, to which an immune response is a specific feature of several autoimmune diseases. The genomic organization and any sequence variation within the DNA encoding Ro60 are unknown. To characterize the Ro60 gene structure and to assess whether any sequence alterations might be associated with serum anti-Ro antibody in subacute cutaneous lupus erythematosus (SCLE), thus potentially providing new insight into disease pathogenesis. The cDNA sequence for Ro60 was obtained from the NCBI database and used for a BLAST search for a clone containing the entire genomic sequence. The intron-exon borders were confirmed by designing intronic primer pairs to flank each exon, which were then used to amplify genomic DNA for automated sequencing from 36 caucasian patients with SCLE (anti-Ro positive) and 49 with discoid LE (DLE, anti-Ro negative), in addition to 36 healthy caucasian controls. Heteroduplex analysis of polymerase chain reaction (PCR) products from patients and controls spanning all Ro60 exons (1-8) revealed a common bandshift in the PCR products spanning exon 7. Sequencing of the corresponding PCR products demonstrated an A > G substitution at nucleotide position 1318-7, within the consensus acceptor splice site of exon 7 (GenBank XM001901). The allele frequencies were major allele A (0.71) and minor allele G (0.29) in 72 control chromosomes, with no significant differences found between SCLE patients, DLE patients and controls. The genomic organization of the DNA encoding the Ro60 protein is described, including a common polymorphism within the consensus acceptor splice site of exon 7. Our delineation of a strategy for the genomic amplification of Ro60 forms a basis for further examination of the pathological functions of the Ro RNP in autoimmune disease.

  16. Finding the missing honey bee genes: Lessons learned from a genome upgrade

    KAUST Repository

    Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A; Patil, S.; Gubbala, S.; Aqrawi, P.; Arias, F.; Bess, C.; Blankenburg, K. B.; Brocchini, M.; Buhay, C.; Challis, D.; Chang, K.; Chen, D.; Coleman, P.; Drummond, J.; English, A.; Evani, U.; Francisco, L.; Fu, Q.; Goodspeed, R.; Haessly, T. H.; Hale, W.; Han, H.; Hu, Y.; Jackson, L.; Jakkamsetti, A.; Jayaseelan, J. C.; Kakkar, N.; Kalra, D.; Kandadi, H.; Lee, S.; Li, H.; Liu, Y.; Macmil, S.; Mandapat, C. M.; Mata, R.; Mathew, T.; Matskevitch, T.; Munidasa, M.; Nagaswamy, U.; Najjar, R.; Nguyen, N.; Niu, J.; Opheim, D.; Palculict, T.; Paul, S.; Pellon, M.; Perales, L.; Pham, C.; Pham, P.

    2014-01-01

    Background: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. 2014 Elsik et al.; licensee BioMed Central Ltd.

  17. Finding the missing honey bee genes: lessons learned from a genome upgrade.

    Science.gov (United States)

    Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A

    2014-01-30

    The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

  18. Finding the missing honey bee genes: Lessons learned from a genome upgrade

    KAUST Repository

    Elsik, Christine G

    2014-01-30

    Background: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. 2014 Elsik et al.; licensee BioMed Central Ltd.

  19. Extensive error in the number of genes inferred from draft genome assemblies.

    Directory of Open Access Journals (Sweden)

    James F Denton

    2014-12-01

    Full Text Available Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

  20. Genome organization in the nucleus: From dynamic measurements to a functional model.

    Science.gov (United States)

    Vivante, Anat; Brozgol, Eugene; Bronshtein, Irena; Garini, Yuval

    2017-07-01

    A biological system is by definition a dynamic environment encompassing kinetic processes that occur at different length scales and time ranges. To explore this type of system, spatial information needs to be acquired at different time scales. This means overcoming significant hurdles, including the need for stable and precise labeling of the required probes and the use of state of the art optical methods. However, to interpret the acquired data, biophysical models that can account for these biological mechanisms need to be developed. The structure and function of a biological system are closely related to its dynamic properties, thus further emphasizing the importance of identifying the rules governing the dynamics that cannot be directly deduced from information on the structure itself. In eukaryotic cells, tens of thousands of genes are packed in the small volume of the nucleus. The genome itself is organized in chromosomes that occupy specific volumes referred to as chromosome territories. This organization is preserved throughout the cell cycle, even though there are no sub-compartments in the nucleus itself. This organization, which is still not fully understood, is crucial for a large number of cellular functions such as gene regulation, DNA breakage repair and error-free cell division. Various techniques are in use today, including imaging, live cell imaging and molecular methods such as chromosome conformation capture (3C) methods to better understand these mechanisms. Live cell imaging methods are becoming well established. These include methods such as Single Particle Tracking (SPT), Continuous Photobleaching (CP), Fluorescence Recovery After Photobleaching (FRAP) and Fluorescence Correlation Spectroscopy (FCS) that are currently used for studying proteins, RNA, DNA, gene loci and nuclear bodies. They provide crucial information on its mobility, reorganization, interactions and binding properties. Here we describe how these dynamic methods can be used to

  1. Genome-wide search for gene-gene interactions in colorectal cancer.

    Directory of Open Access Journals (Sweden)

    Shuo Jiao

    Full Text Available Genome-wide association studies (GWAS have successfully identified a number of single-nucleotide polymorphisms (SNPs associated with colorectal cancer (CRC risk. However, these susceptibility loci known today explain only a small fraction of the genetic risk. Gene-gene interaction (GxG is considered to be one source of the missing heritability. To address this, we performed a genome-wide search for pair-wise GxG associated with CRC risk using 8,380 cases and 10,558 controls in the discovery phase and 2,527 cases and 2,658 controls in the replication phase. We developed a simple, but powerful method for testing interaction, which we term the Average Risk Due to Interaction (ARDI. With this method, we conducted a genome-wide search to identify SNPs showing evidence for GxG with previously identified CRC susceptibility loci from 14 independent regions. We also conducted a genome-wide search for GxG using the marginal association screening and examining interaction among SNPs that pass the screening threshold (p<10(-4. For the known locus rs10795668 (10p14, we found an interacting SNP rs367615 (5q21 with replication p = 0.01 and combined p = 4.19×10(-8. Among the top marginal SNPs after LD pruning (n = 163, we identified an interaction between rs1571218 (20p12.3 and rs10879357 (12q21.1 (nominal combined p = 2.51×10(-6; Bonferroni adjusted p = 0.03. Our study represents the first comprehensive search for GxG in CRC, and our results may provide new insight into the genetic etiology of CRC.

  2. Primary structure of the human follistatin precursor and its genomic organization

    International Nuclear Information System (INIS)

    Shimasaki, Shunichi; Koga, Makoto; Esch, F.

    1988-01-01

    Follistatin is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release. By use of the recently characterized porcine follistatin cDNA as a probe to screen a human testis cDNA library and a genomic library, the structure of the complete human follistatin precursor as well as its genomic organization have been determined. Three of eight cDNA clones that were sequenced predicted a precursor with 344 amino acids, whereas the remaining five cDNA clones encoded a 317 amino acid precursor, resulting from alternative splicing of the precursor mRNA. Mature follistatins contain four contiguous domains that are encoded by precisely separated exons; three of the domains are highly similar to each other, as well as to human epidermal growth factor and human pancreatic secretory trypsin inhibitor. The genomic organization of the human follistatin is similar to that of the human epidermal growth factor gene and thus supports the notion of exon shuffling during evolution

  3. Genome-wide characterization, evolution, and expression analysis of the leucine-rich repeat receptor-like protein kinase (LRR-RLK) gene family in Rosaceae genomes.

    Science.gov (United States)

    Sun, Jiangmei; Li, Leiting; Wang, Peng; Zhang, Shaoling; Wu, Juyou

    2017-10-10

    Leucine-rich repeat receptor-like protein kinase (LRR-RLK) is the largest gene family of receptor-like protein kinases (RLKs) and actively participates in regulating the growth, development, signal transduction, immunity, and stress responses of plants. However, the patterns of LRR-RLK gene family evolution in the five main Rosaceae species for which genome sequences are available have not yet been reported. In this study, we performed a comprehensive analysis of LRR-RLK genes for five Rosaceae species: Fragaria vesca (strawberry), Malus domestica (apple), Pyrus bretschneideri (Chinese white pear), Prunus mume (mei), and Prunus persica (peach), which contained 201, 244, 427, 267, and 258 LRR-RLK genes, respectively. All LRR-RLK genes were further grouped into 23 subfamilies based on the hidden Markov models approach. RLK-Pelle_LRR-XII-1, RLK-Pelle_LRR-XI-1, and RLK-Pelle_LRR-III were the three largest subfamilies. Synteny analysis indicated that there were 236 tandem duplicated genes in the five Rosaceae species, among which subfamilies XII-1 (82 genes) and XI-1 (80 genes) comprised 68.6%. Our results indicate that tandem duplication made a large contribution to the expansion of the subfamilies. The gene expression, tissue-specific expression, and subcellular localization data revealed that LRR-RLK genes were differentially expressed in various organs and tissues, and the largest subfamily XI-1 was highly expressed in all five Rosaceae species, suggesting that LRR-RLKs play important roles in each stage of plant growth and development. Taken together, our results provide an overview of the LRR-RLK family in Rosaceae genomes and the basis for further functional studies.

  4. Homology of yeast photoreactivating gene fragment with human genomic digests

    International Nuclear Information System (INIS)

    Meechan, P.J.; Milam, K.M.; Cleaver, J.E.

    1984-01-01

    Enzymatic photoreactivation of UV-induced DNA lesions has been demonstrated for a variety of prokaryotic and eukaryotic organisms. Its presence in placental mammals, however, has not been clearly established. The authors attempted to resolve this question by assaying for the presence (or absence) of sequences in human DNA complimentary to a fragment of the photoreactivating gene from S. cerevisiae that has recently been cloned. In another study, DNA from human, chick E. coli and yeast cells was digested with either HindIII of BglII, electrophoresed on a 0.5% agarose gel, transferred (Southern blot) to a nylon membrane and probed for homology against a Sau3A restriction fragment from S. cerevisiae that compliments phr/sup -/ cells. Hybridization to human DNA digests was observed only under relatively non-stringent conditions indicating the gene is not conserved in placental mammals. These results are correlated with current literature data concerning photoreactivating enzymes

  5. Selectable tolerance to herbicides by mutated acetolactate synthase genes integrated into the chloroplast genome of tobacco.

    Science.gov (United States)

    Shimizu, Masanori; Goto, Maki; Hanai, Moeko; Shimizu, Tsutomu; Izawa, Norihiko; Kanamoto, Hirosuke; Tomizawa, Ken-Ichi; Yokota, Akiho; Kobayashi, Hirokazu

    2008-08-01

    Strategies employed for the production of genetically modified (GM) crops are premised on (1) the avoidance of gene transfer in the field; (2) the use of genes derived from edible organisms such as plants; (3) preventing the appearance of herbicide-resistant weeds; and (4) maintaining transgenes without obstructing plant cell propagation. To this end, we developed a novel vector system for chloroplast transformation with acetolactate synthase (ALS). ALS catalyzes the first step in the biosynthesis of the branched amino acids, and its enzymatic activity is inhibited by certain classes of herbicides. We generated a series of Arabidopsis (Arabidopsis thaliana) mutated ALS (mALS) genes and introduced constructs with mALS and the aminoglycoside 3'-adenyltransferase gene (aadA) into the tobacco (Nicotiana tabacum) chloroplast genome by particle bombardment. Transplastomic plants were selected using their resistance to spectinomycin. The effects of herbicides on transplastomic mALS activity were examined by a colorimetric assay using the leaves of transplastomic plants. We found that transplastomic G121A, A122V, and P197S plants were specifically tolerant to pyrimidinylcarboxylate, imidazolinon, and sulfonylurea/pyrimidinylcarboxylate herbicides, respectively. Transplastomic plants possessing mALSs were able to grow in the presence of various herbicides, thus affirming the relationship between mALSs and the associated resistance to herbicides. Our results show that mALS genes integrated into the chloroplast genome are useful sustainable markers that function to exclude plants other than those that are GM while maintaining transplastomic crops. This investigation suggests that the resistance management of weeds in the field amid growing GM crops is possible using (1) a series of mALSs that confer specific resistance to herbicides and (2) a strategy that employs herbicide rotation.

  6. Reversible perturbations of gene regulation after genome editing in Drosophila cells.

    Directory of Open Access Journals (Sweden)

    Stefan Kunzelmann

    Full Text Available The prokaryotic phage defense CRISPR/cas-system has developed into a versatile toolbox for genome engineering and genetic studies in many organisms. While many efforts were spent on analyzing the consequences of off-target effects, only few studies addressed side-effects that occur due to the targeted manipulation of the genome. Here, we show that the CRISPR/cas9-mediated integration of an epitope tag in combination with a selection cassette can trigger an siRNA-mediated, epigenetic genome surveillance pathway in Drosophila melanogaster cells. After homology-directed insertion of the sequence coding for the epitope tag and the selection marker, a moderate level of siRNAs covering the inserted sequence and extending into the targeted locus was detected. This response affected protein levels less than two-fold and it persisted even after single cell cloning. However, removal of the selection cassette abolished the siRNA generation, demonstrating that this response is reversible. Consistently, marker-free genome engineering did not trigger the same surveillance mechanism. These two observations indicate that the selection cassette we employed induces an aberrant transcriptional arrangement and ultimately sets off the siRNA production. There have been prior concerns about undesirable effects induced by selection markers, but fortunately we were able to show that at least one of the epigenetic changes reverts as the marker gene is excised. Although the effects observed were rather weak (less than twofold de-repression upon ago2 or dcr-2 knock-down, we recommend that when selection markers are used during genome editing, a strategy for their subsequent removal should always be included.

  7. In silico method for modelling metabolism and gene product expression at genome scale

    Energy Technology Data Exchange (ETDEWEB)

    Lerman, Joshua A.; Hyduke, Daniel R.; Latif, Haythem; Portnoy, Vasiliy A.; Lewis, Nathan E.; Orth, Jeffrey D.; Rutledge, Alexandra C.; Smith, Richard D.; Adkins, Joshua N.; Zengler, Karsten; Palsson, Bernard O.

    2012-07-03

    Transcription and translation use raw materials and energy generated metabolically to create the macromolecular machinery responsible for all cellular functions, including metabolism. A biochemically accurate model of molecular biology and metabolism will facilitate comprehensive and quantitative computations of an organism's molecular constitution as a function of genetic and environmental parameters. Here we formulate a model of metabolism and macromolecular expression. Prototyping it using the simple microorganism Thermotoga maritima, we show our model accurately simulates variations in cellular composition and gene expression. Moreover, through in silico comparative transcriptomics, the model allows the discovery of new regulons and improving the genome and transcription unit annotations. Our method presents a framework for investigating molecular biology and cellular physiology in silico and may allow quantitative interpretation of multi-omics data sets in the context of an integrated biochemical description of an organism.

  8. A comprehensive evaluation of rodent malaria parasite genomes and gene expression

    KAUST Repository

    Otto, Thomas D

    2014-10-30

    Background: Rodent malaria parasites (RMP) are used extensively as models of human malaria. Draft RMP genomes have been published for Plasmodium yoelii, P. berghei ANKA (PbA) and P. chabaudi AS (PcAS). Although availability of these genomes made a significant impact on recent malaria research, these genomes were highly fragmented and were annotated with little manual curation. The fragmented nature of the genomes has hampered genome wide analysis of Plasmodium gene regulation and function. Results: We have greatly improved the genome assemblies of PbA and PcAS, newly sequenced the virulent parasite P. yoelii YM genome, sequenced additional RMP isolates/lines and have characterized genotypic diversity within RMP species. We have produced RNA-seq data and utilized it to improve gene-model prediction and to provide quantitative, genome-wide, data on gene expression. Comparison of the RMP genomes with the genome of the human malaria parasite P. falciparum and RNA-seq mapping permitted gene annotation at base-pair resolution. Full-length chromosomal annotation permitted a comprehensive classification of all subtelomeric multigene families including the `Plasmodium interspersed repeat genes\\' (pir). Phylogenetic classification of the pir family, combined with pir expression patterns, indicates functional diversification within this family. Conclusions: Complete RMP genomes, RNA-seq and genotypic diversity data are excellent and important resources for gene-function and post-genomic analyses and to better interrogate Plasmodium biology. Genotypic diversity between P. chabaudi isolates makes this species an excellent parasite to study genotype-phenotype relationships. The improved classification of multigene families will enhance studies on the role of (variant) exported proteins in virulence and immune evasion/modulation.

  9. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

    Science.gov (United States)

    Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A

    2017-10-15

    Genomics features with similar genome-wide distributions are generally hypothesized to be functionally related, for example, colocalization of histones and transcription start sites indicate chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. The StereoGene C ++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. favorov@sensi.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  10. Mammalian-specific genomic functions: Newly acquired traits generated by genomic imprinting and LTR retrotransposon-derived genes in mammals.

    Science.gov (United States)

    Kaneko-Ishino, Tomoko; Ishino, Fumitoshi

    2015-01-01

    Mammals, including human beings, have evolved a unique viviparous reproductive system and a highly developed central nervous system. How did these unique characteristics emerge in mammalian evolution, and what kinds of changes did occur in the mammalian genomes as evolution proceeded? A key conceptual term in approaching these issues is "mammalian-specific genomic functions", a concept covering both mammalian-specific epigenetics and genetics. Genomic imprinting and LTR retrotransposon-derived genes are reviewed as the representative, mammalian-specific genomic functions that are essential not only for the current mammalian developmental system, but also mammalian evolution itself. First, the essential roles of genomic imprinting in mammalian development, especially related to viviparous reproduction via placental function, as well as the emergence of genomic imprinting in mammalian evolution, are discussed. Second, we introduce the novel concept of "mammalian-specific traits generated by mammalian-specific genes from LTR retrotransposons", based on the finding that LTR retrotransposons served as a critical driving force in the mammalian evolution via generating mammalian-specific genes.

  11. Approaching the sequential and three-dimensional organization of Archaea, Bacteria and Eukarya genomes. Dynamic Organization of Nuclear Function

    NARCIS (Netherlands)

    T.A. Knoch (Tobias); M. Göker (Markus); R. Lohner (Rudolf); J. Langowski (Jörg)

    2002-01-01

    textabstractThe largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  12. Genome-wide analysis of regions similar to promoters of histone genes

    KAUST Repository

    Chowdhary, Rajesh

    2010-05-28

    Background: The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (shortly histone genes), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have similar structure as promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii/ identify in this way genes that have high likelihood of being coregulated with the histone genes.Results: We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with similar structure as histone promoters.Conclusions: This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with their structures similar to promoters of histone genes. These regions may be promoters of yet unidentified genes, or may represent remote control regions that

  13. Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

    Science.gov (United States)

    Evans, Teri; Johnson, Andrew D; Loose, Matthew

    2018-01-12

    Large repeat rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .

  14. Whole Genome and Global Gene Expression Analyses of the Model Mushroom Flammulina velutipes Reveal a High Capacity for Lignocellulose Degradation

    Science.gov (United States)

    Park, Young-Jin; Baek, Jeong Hun; Lee, Seonwook; Kim, Changhoon; Rhee, Hwanseok; Kim, Hyungtae; Seo, Jeong-Sun; Park, Hae-Ran; Yoon, Dae-Eun; Nam, Jae-Young; Kim, Hong-Il; Kim, Jong-Guk; Yoon, Hyeokjun; Kang, Hee-Wan; Cho, Jae-Yong; Song, Eun-Sung; Sung, Gi-Ho; Yoo, Young-Bok; Lee, Chang-Soo; Lee, Byoung-Moo; Kong, Won-Sik

    2014-01-01

    Flammulina velutipes is a fungus with health and medicinal benefits that has been used for consumption and cultivation in East Asia. F. velutipes is also known to degrade lignocellulose and produce ethanol. The overlapping interests of mushroom production and wood bioconversion make F. velutipes an attractive new model for fungal wood related studies. Here, we present the complete sequence of the F. velutipes genome. This is the first sequenced genome for a commercially produced edible mushroom that also degrades wood. The 35.6-Mb genome contained 12,218 predicted protein-encoding genes and 287 tRNA genes assembled into 11 scaffolds corresponding with the 11 chromosomes of strain KACC42780. The 88.4-kb mitochondrial genome contained 35 genes. Well-developed wood degrading machinery with strong potential for lignin degradation (69 auxiliary activities, formerly FOLymes) and carbohydrate degradation (392 CAZymes), along with 58 alcohol dehydrogenase genes were highly expressed in the mycelium, demonstrating the potential application of this organism to bioethanol production. Thus, the newly uncovered wood degrading capacity and sequential nature of this process in F. velutipes, offer interesting possibilities for more detailed studies on either lignin or (hemi-) cellulose degradation in complex wood substrates. The mutual interest in wood degradation by the mushroom industry and (ligno-)cellulose biomass related industries further increase the significance of F. velutipes as a new model. PMID:24714189

  15. Functional evolution of ADAMTS genes: Evidence from analyses of phylogeny and gene organization

    Directory of Open Access Journals (Sweden)

    Van Meir Erwin G

    2005-02-01

    Full Text Available Abstract Background The ADAMTS (A Disintegrin-like and Metalloprotease with Thrombospondin motifs proteins are a family of metalloproteases with sequence similarity to the ADAM proteases, that contain the thrombospondin type 1 sequence repeat motifs (TSRs common to extracellular matrix proteins. ADAMTS proteins have recently gained attention with the discovery of their role in a variety of diseases, including tissue and blood disorders, cancer, osteoarthritis, Alzheimer's and the genetic syndromes Weill-Marchesani syndrome (ADAMTS10, thrombotic thrombocytopenic purpura (ADAMTS13, and Ehlers-Danlos syndrome type VIIC (ADAMTS2 in humans and belted white-spotting mutation in mice (ADAMTS20. Results Phylogenetic analysis and comparison of the exon/intron organization of vertebrate (Homo, Mus, Fugu, chordate (Ciona and invertebrate (Drosophila and Caenorhabditis ADAMTS homologs has elucidated the evolutionary relationships of this important gene family, which comprises 19 members in humans. Conclusions The evolutionary history of ADAMTS genes in vertebrate genomes has been marked by rampant gene duplication, including a retrotransposition that gave rise to a distinct ADAMTS subfamily (ADAMTS1, -4, -5, -8, -15 that may have distinct aggrecanase and angiogenesis functions.

  16. Genomic organization of the canine herpesvirus US region.

    Science.gov (United States)

    Haanes, E J; Tomlinson, C C

    1998-02-01

    Canine herpesvirus (CHV) is an alpha-herpesvirus of limited pathogenicity in healthy adult dogs and infectivity of the virus appears to be largely limited to cells of canine origin. CHV's low virulence and species specificity make it an attractive candidate for a recombinant vaccine vector to protect dogs against a variety of pathogens. As part of the analysis of the CHV genome, the authors determined the complete nucleotide sequence of the CHV US region as well as portions of the flanking inverted repeats. Seven full open reading frames (ORFs) encoding proteins larger than 100 amino acids were identified within, or partially within the CHV US: cUS2, cUS3, cUS4, cUS6, cUS7, cUS8 and cUS9; which are homologs of the herpes simplex virus type-1 US2; protein kinase; gG, gD, gI, gE; and US9 genes, respectively. An eighth ORF was identified in the inverted repeat region, cIR6, a homolog of the equine herpesvirus type-1 IR6 gene. The authors identified and mapped most of the major transcripts for the predicted CHV US ORFs by Northern analysis.

  17. Genome-Wide Identification, Phylogenetic and Expression Analyses of the Ubiquitin-Conjugating Enzyme Gene Family in Maize

    Science.gov (United States)

    Jue, Dengwei; Sang, Xuelian; Lu, Shengqiao; Dong, Chen; Zhao, Qiufang; Chen, Hongliang; Jia, Liqiang

    2015-01-01

    Background Ubiquitination is a post-translation modification where ubiquitin is attached to a substrate. Ubiquitin-conjugating enzymes (E2s) play a major role in the ubiquitin transfer pathway, as well as a variety of functions in plant biological processes. To date, no genome-wide characterization of this gene family has been conducted in maize (Zea mays). Methodology/Principal Findings In the present study, a total of 75 putative ZmUBC genes have been identified and located in the maize genome. Phylogenetic analysis revealed that ZmUBC proteins could be divided into 15 subfamilies, which include 13 ubiquitin-conjugating enzymes (ZmE2s) and two independent ubiquitin-conjugating enzyme variant (UEV) groups. The predicted ZmUBC genes were distributed across 10 chromosomes at different densities. In addition, analysis of exon-intron junctions and sequence motifs in each candidate gene has revealed high levels of conservation within and between phylogenetic groups. Tissue expression analysis indicated that most ZmUBC genes were expressed in at least one of the tissues, indicating that these are involved in various physiological and developmental processes in maize. Moreover, expression profile analyses of ZmUBC genes under different stress treatments (4°C, 20% PEG6000, and 200 mM NaCl) and various expression patterns indicated that these may play crucial roles in the response of plants to stress. Conclusions Genome-wide identification, chromosome organization, gene structure, evolutionary and expression analyses of ZmUBC genes have facilitated in the characterization of this gene family, as well as determined its potential involvement in growth, development, and stress responses. This study provides valuable information for better understanding the classification and putative functions of the UBC-encoding genes of maize. PMID:26606743

  18. Organization and evolution of the rat tyrosine hydroxylase gene

    International Nuclear Information System (INIS)

    Brown, E.R.; Coker, G.T. III; O'Malley, K.L.

    1987-01-01

    This report describes the organization of the rat tyrosine hydroxylase (TH) gene and compares its structure with the human phenylalanine hydroxylase gene. Both genes are single copy and contain 13 exons separated by 12 introns. Remarkably, the positions of 10 out 12 intron/exon boundaries are identical for the two genes. These results support the idea that these hydroxylases genes are members of a gene family which has a common evolutionary origin. The authors predict that this ancestral gene would have encoded exons similar to those of TH prior to evolutionary drift to other members of this gene family

  19. Genome-wide identification, characterization, and expression profile of aquaporin gene family in flax (Linum usitatissimum).

    Science.gov (United States)

    Shivaraj, S M; Deshmukh, Rupesh K; Rai, Rhitu; Bélanger, Richard; Agrawal, Pawan K; Dash, Prasanta K

    2017-04-27

    Membrane intrinsic proteins (MIPs) form transmembrane channels and facilitate transport of myriad substrates across the cell membrane in many organisms. Majority of plant MIPs have water transporting ability and are commonly referred as aquaporins (AQPs). In the present study, we identified aquaporin coding genes in flax by genome-wide analysis, their structure, function and expression pattern by pan-genome exploration. Cross-genera phylogenetic analysis with known aquaporins from rice, arabidopsis, and poplar showed five subgroups of flax aquaporins representing 16 plasma membrane intrinsic proteins (PIPs), 17 tonoplast intrinsic proteins (TIPs), 13 NOD26-like intrinsic proteins (NIPs), 2 small basic intrinsic proteins (SIPs), and 3 uncharacterized intrinsic proteins (XIPs). Amongst aquaporins, PIPs contained hydrophilic aromatic arginine (ar/R) selective filter but TIP, NIP, SIP and XIP subfamilies mostly contained hydrophobic ar/R selective filter. Analysis of RNA-seq and microarray data revealed high expression of PIPs in multiple tissues, low expression of NIPs, and seed specific expression of TIP3 in flax. Exploration of aquaporin homologs in three closely related Linum species bienne, grandiflorum and leonii revealed presence of 49, 39 and 19 AQPs, respectively. The genome-wide identification of aquaporins, first in flax, provides insight to elucidate their physiological and developmental roles in flax.

  20. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  1. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591

  3. Genome-wide gene expression regulation as a function of genotype and age in C. elegans

    NARCIS (Netherlands)

    Viñuela Rodriguez, A.; Snoek, L.B.; Riksen, J.A.G.; Kammenga, J.E.

    2010-01-01

    Gene expression becomes more variable with age, and it is widely assumed that this is due to a decrease in expression regulation. But currently there is no understanding how gene expression regulatory patterns progress with age. Here we explored genome-wide gene expression variation and regulatory

  4. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis

    DEFF Research Database (Denmark)

    Lin, Senjie; Cheng, Shifeng; Song, Bo

    2015-01-01

    Symbiodinium-specific gene families. No whole-genome duplication was observed, but instead we found active (retro) transposition and gene family expansion, especially in processes important for successful symbiosis with corals. We also documented genes potentially governing sexual reproduction and cyst...... the molecular basis and evolution of coral symbiosis....

  5. Genomic variation and its impact on gene expression in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Andreas Massouras

    Full Text Available Understanding the relationship between genetic and phenotypic variation is one of the great outstanding challenges in biology. To meet this challenge, comprehensive genomic variation maps of human as well as of model organism populations are required. Here, we present a nucleotide resolution catalog of single-nucleotide, multi-nucleotide, and structural variants in 39 Drosophila melanogaster Genetic Reference Panel inbred lines. Using an integrative, local assembly-based approach for variant discovery, we identify more than 3.6 million distinct variants, among which were more than 800,000 unique insertions, deletions (indels, and complex variants (1 to 6,000 bp. While the SNP density is higher near other variants, we find that variants themselves are not mutagenic, nor are regions with high variant density particularly mutation-prone. Rather, our data suggest that the elevated SNP density around variants is mainly due to population-level processes. We also provide insights into the regulatory architecture of gene expression variation in adult flies by mapping cis-expression quantitative trait loci (cis-eQTLs for more than 2,000 genes. Indels comprise around 10% of all cis-eQTLs and show larger effects than SNP cis-eQTLs. In addition, we identified two-fold more gene associations in males as compared to females and found that most cis-eQTLs are sex-specific, revealing a partial decoupling of the genomic architecture between the sexes as well as the importance of genetic factors in mediating sex-biased gene expression. Finally, we performed RNA-seq-based allelic expression imbalance analyses in the offspring of crosses between sequenced lines, which revealed that the majority of strong cis-eQTLs can be validated in heterozygous individuals.

  6. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    Directory of Open Access Journals (Sweden)

    Grigoriev Igor V

    2009-02-01

    Full Text Available Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR. Results 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6% of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  7. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Full Text Available Abstract Background Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents from further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results This paper describes a new prokaryotic genefinding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs and Translation Initiation Sites (TISs. The former is based on a linguistic "Entropy Density Profile" (EDP model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion Results of extensive tests show that MED 2.0 achieves a competitive high performance in the gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of the MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match with the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  8. Network graph analysis of gene-gene interactions in genome-wide association study data.

    Science.gov (United States)

    Lee, Sungyoung; Kwon, Min-Seok; Park, Taesung

    2012-12-01

    Most common complex traits, such as obesity, hypertension, diabetes, and cancers, are known to be associated with multiple genes, environmental factors, and their epistasis. Recently, the development of advanced genotyping technologies has allowed us to perform genome-wide association studies (GWASs). For detecting the effects of multiple genes on complex traits, many approaches have been proposed for GWASs. Multifactor dimensionality reduction (MDR) is one of the powerful and efficient methods for detecting high-order gene-gene (GxG) interactions. However, the biological interpretation of GxG interactions identified by MDR analysis is not easy. In order to aid the interpretation of MDR results, we propose a network graph analysis to elucidate the meaning of identified GxG interactions. The proposed network graph analysis consists of three steps. The first step is for performing GxG interaction analysis using MDR analysis. The second step is to draw the network graph using the MDR result. The third step is to provide biological evidence of the identified GxG interaction using external biological databases. The proposed method was applied to Korean Association Resource (KARE) data, containing 8838 individuals with 327,632 single-nucleotide polymorphisms, in order to perform GxG interaction analysis of body mass index (BMI). Our network graph analysis successfully showed that many identified GxG interactions have known biological evidence related to BMI. We expect that our network graph analysis will be helpful to interpret the biological meaning of GxG interactions.

  9. No boundaries: genomes, organisms, and ecological interactions responsible for divergence and reproductive isolation.

    Science.gov (United States)

    Etges, William J

    2014-01-01

    Revealing the genetic basis of traits that cause reproductive isolation, particularly premating or sexual isolation, usually involves the same challenges as most attempts at genotype-phenotype mapping and so requires knowledge of how these traits are expressed in different individuals, populations, and environments, particularly under natural conditions. Genetic dissection of speciation phenotypes thus requires understanding of the internal and external contexts in which underlying genetic elements are expressed. Gene expression is a product of complex interacting factors internal and external to the organism including developmental programs, the genetic background including nuclear-cytotype interactions, epistatic relationships, interactions among individuals or social effects, stochasticity, and prevailing variation in ecological conditions. Understanding of genomic divergence associated with reproductive isolation will be facilitated by functional expression analysis of annotated genomes in organisms with well-studied evolutionary histories, phylogenetic affinities, and known patterns of ecological variation throughout their life cycles. I review progress and prospects for understanding the pervasive role of host plant use on genetic and phenotypic expression of reproductive isolating mechanisms in cactophilic Drosophila mojavensis and suggest how this system can be used as a model for revealing the genetic basis for species formation in organisms where speciation phenotypes are under the joint influences of genetic and environmental factors. © The American Genetic Association. 2014. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. The copepod Tigriopus: A promising marine model organism for ecotoxicology and environmental genomics

    Energy Technology Data Exchange (ETDEWEB)

    Raisuddin, Sheikh [Department of Chemistry and the National Research Lab of Marine Molecular and Environmental Bioscience, College of Natural Sciences, Hanyang University, Seoul 133-791 (Korea, Republic of); Kwok, Kevin W.H. [Swire Institute of Marine Science, Department of Ecology and Biodiversity, University of Hong Kong, Pokfulam, Hong Kong (China); Leung, Kenneth M.Y. [Swire Institute of Marine Science, Department of Ecology and Biodiversity, University of Hong Kong, Pokfulam, Hong Kong (China); Schlenk, Daniel [Department of Environmental Sciences, University of California, Riverside, CA 92521 (United States); Lee, Jae-Seong [Department of Chemistry and the National Research Lab of Marine Molecular and Environmental Bioscience, College of Natural Sciences, Hanyang University, Seoul 133-791 (Korea, Republic of)]. E-mail: jslee2@hanyang.ac.kr

    2007-07-20

    There is an increasing body of evidence to support the significant role of invertebrates in assessing impacts of environmental contaminants on marine ecosystems. Therefore, in recent years massive efforts have been directed to identify viable and ecologically relevant invertebrate toxicity testing models. Tigriopus, a harpacticoid copepod has a number of promising characteristics which make it a candidate worth consideration in such efforts. Tigriopus and other copepods are widely distributed and ecologically important organisms. Their position in marine food chains is very prominent, especially with regard to the transfer of energy. Copepods also play an important role in the transportation of aquatic pollutants across the food chains. In recent years there has been a phenomenal increase in the knowledge base of Tigriopus spp., particularly in the areas of their ecology, geophylogeny, genomics and their behavioural, biochemical and molecular responses following exposure to environmental stressors and chemicals. Sequences of a number of important marker genes have been studied in various Tigriopus spp., notably T. californicus and T. japonicus. These genes belong to normal biophysiological functions (e.g. electron transport system enzymes) as well as stress and toxic chemical exposure responses (heat shock protein 20, glutathione reductase, glutathione S-transferase). Recently, 40,740 expressed sequenced tags (ESTs) from T. japonicus, have been sequenced and of them, 5673 ESTs showed significant hits (E-value, >1.0E-05) to the red flour beetle Tribolium genome database. Metals and organic pollutants such as antifouling agents, pesticides, polycyclic aromatic hydrocarbons (PAH) and polychrlorinated biphenyls (PCB) have shown reproducible biological responses when tested in Tigriopus spp. Promising results have been obtained when Tigriopus was used for assessment of risk associated with exposure to endocrine-disrupting chemicals (EDCs). Application of environmental

  11. The copepod Tigriopus: A promising marine model organism for ecotoxicology and environmental genomics

    International Nuclear Information System (INIS)

    Raisuddin, Sheikh; Kwok, Kevin W.H.; Leung, Kenneth M.Y.; Schlenk, Daniel; Lee, Jae-Seong

    2007-01-01

    There is an increasing body of evidence to support the significant role of invertebrates in assessing impacts of environmental contaminants on marine ecosystems. Therefore, in recent years massive efforts have been directed to identify viable and ecologically relevant invertebrate toxicity testing models. Tigriopus, a harpacticoid copepod has a number of promising characteristics which make it a candidate worth consideration in such efforts. Tigriopus and other copepods are widely distributed and ecologically important organisms. Their position in marine food chains is very prominent, especially with regard to the transfer of energy. Copepods also play an important role in the transportation of aquatic pollutants across the food chains. In recent years there has been a phenomenal increase in the knowledge base of Tigriopus spp., particularly in the areas of their ecology, geophylogeny, genomics and their behavioural, biochemical and molecular responses following exposure to environmental stressors and chemicals. Sequences of a number of important marker genes have been studied in various Tigriopus spp., notably T. californicus and T. japonicus. These genes belong to normal biophysiological functions (e.g. electron transport system enzymes) as well as stress and toxic chemical exposure responses (heat shock protein 20, glutathione reductase, glutathione S-transferase). Recently, 40,740 expressed sequenced tags (ESTs) from T. japonicus, have been sequenced and of them, 5673 ESTs showed significant hits (E-value, >1.0E-05) to the red flour beetle Tribolium genome database. Metals and organic pollutants such as antifouling agents, pesticides, polycyclic aromatic hydrocarbons (PAH) and polychrlorinated biphenyls (PCB) have shown reproducible biological responses when tested in Tigriopus spp. Promising results have been obtained when Tigriopus was used for assessment of risk associated with exposure to endocrine-disrupting chemicals (EDCs). Application of environmental

  12. Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata.

    Science.gov (United States)

    Oh, Chang Jae; Kim, Ho Bang; Kim, Jitae; Kim, Won Jin; Lee, Hyoungseok; An, Chung Sun

    2012-01-01

    The nucleotide sequence of a 20.5-kb genomic region harboring nif genes was determined and analyzed. The fragment was obtained from Frankia sp. EuIK1 strain, an indigenous symbiont of Elaeagnus umbellata. A total of 20 ORFs including 12 nif genes were identified and subjected to comparative analysis with the genome sequences of 3 Frankia strains representing diverse host plant specificities. The nucleotide and deduced amino acid sequences showed highest levels of identity with orthologous genes from an Elaeagnus-infecting strain. The gene organization patterns around the nif gene clusters were well conserved among all 4 Frankia strains. However, characteristic features appeared in the location of the nifV gene for each Frankia strain, depending on the type of host plant. Sequence analysis was performed to determine the transcription units and suggested that there could be an independent operon starting from the nifW gene in the EuIK strain. Considering the organization patterns and their total extensions on the genome, we propose that the nif gene clusters remained stable despite genetic variations occurring in the Frankia genomes.

  13. Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.

    Science.gov (United States)

    Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V

    2016-01-01

    Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha-an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.

  14. Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.

    Directory of Open Access Journals (Sweden)

    Olga V Popova

    Full Text Available Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha-an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida and Pycnophyes kielensis (Allomalorhagida. Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even

  15. A survey of genomic studies supports association of circadian clock genes with bipolar disorder spectrum illnesses and lithium response.

    Directory of Open Access Journals (Sweden)

    Michael J McCarthy

    Full Text Available Circadian rhythm abnormalities in bipolar disorder (BD have led to a search for genetic abnormalities in circadian "clock genes" associated with BD. However, no significant clock gene findings have emerged from genome-wide association studies (GWAS. At least three factors could account for this discrepancy: complex traits are polygenic, the organization of the clock is more complex than previously recognized, and/or genetic risk for BD may be shared across multiple illnesses. To investigate these issues, we considered the clock gene network at three levels: essential "core" clock genes, upstream circadian clock modulators, and downstream clock controlled genes. Using relaxed thresholds for GWAS statistical significance, we determined the rates of clock vs. control genetic associations with BD, and four additional illnesses that share clinical features and/or genetic risk with BD (major depression, schizophrenia, attention deficit/hyperactivity. Then we compared the results to a set of lithium-responsive genes. Associations with BD-spectrum illnesses and lithium-responsiveness were both enriched among core clock genes but not among upstream clock modulators. Associations with BD-spectrum illnesses and lithium-responsiveness were also enriched among pervasively rhythmic clock-controlled genes but not among genes that were less pervasively rhythmic or non-rhythmic. Our analysis reveals previously unrecognized associations between clock genes and BD-spectrum illnesses, partly reconciling previously discordant results from past GWAS and candidate gene studies.

  16. Genome-wide comparative analysis of NBS-encoding genes between Brassica species and Arabidopsis thaliana.

    Science.gov (United States)

    Yu, Jingyin; Tehrim, Sadia; Zhang, Fengqi; Tong, Chaobo; Huang, Junyan; Cheng, Xiaohui; Dong, Caihua; Zhou, Yanqiu; Qin, Rui; Hua, Wei; Liu, Shengyi

    2014-01-03

    Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in offering resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare with analogues in Arabidopsis thaliana based on a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after split from A. thaliana. Here we present genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Through the employment of HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among 3 species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated the differential expression pattern of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among 3 species suggested that orthologous genes in B. rapa species have undergone stronger negative selection than those in B .oleracea species. But for TNL type, there are no significant differences in the orthologous gene pairs between the two species. This study is first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome

  17. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC 3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC 3 -rich genes (GC 3  ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC 3 -rich and intronless), as well as those associated with important functions, such as FA

  18. Using FlyBase, a Database of Drosophila Genes and Genomes.

    Science.gov (United States)

    Marygold, Steven J; Crosby, Madeline A; Goodman, Joshua L

    2016-01-01

    For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources.FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback.This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.

  19. A massive incorporation of microbial genes into the genome of Tetranychus urticae, a polyphagous arthropod herbivore.

    Science.gov (United States)

    Wybouw, N; Van Leeuwen, T; Dermauw, W

    2018-06-01

    A number of horizontal gene transfers (HGTs) have been identified in the spider mite Tetranychus urticae, a chelicerate herbivore. However, the genome of this mite species has at present not been thoroughly mined for the presence of HGT genes. Here, we performed a systematic screen for HGT genes in the T. urticae genome using the h-index metric. Our results not only validated previously identified HGT genes but also uncovered 25 novel HGT genes. In addition to HGT genes with a predicted biochemical function in carbohydrate, lipid and folate metabolism, we also identified the horizontal transfer of a ketopantoate hydroxymethyltransferase and a pantoate β-alanine ligase gene. In plants and bacteria, both genes are essential for vitamin B5 biosynthesis and their presence in the mite genome strongly suggests that spider mites, similar to Bemisia tabaci and nematodes, can synthesize their own vitamin B5. We further show that HGT genes were physically embedded within the mite genome and were expressed in different life stages. By screening chelicerate genomes and transcriptomes, we were able to estimate the evolutionary histories of these HGTs during chelicerate evolution. Our study suggests that HGT has made a significant and underestimated impact on the metabolic repertoire of plant-feeding spider mites. © 2018 The Royal Entomological Society.

  20. Genomic organization, sequence divergence, and recombination of feline immunodeficiency virus from lions in the wild

    Science.gov (United States)

    Pecon-Slattery, Jill; McCracken, Carrie L; Troyer, Jennifer L; VandeWoude, Sue; Roelke, Melody; Sondgeroth, Kerry; Winterbach, Christiaan; Winterbach, Hanlie; O'Brien, Stephen J

    2008-01-01

    Background Feline immunodeficiency virus (FIV) naturally infects multiple species of cat and is related to human immunodeficiency virus in humans. FIV infection causes AIDS-like disease and mortality in the domestic cat (Felis catus) and serves as a natural model for HIV infection in humans. In African lions (Panthera leo) and other exotic felid species, disease etiology introduced by FIV infection are less clear, but recent studies indicate that FIV causes moderate to severe CD4 depletion. Results In this study, comparative genomic methods are used to evaluate the full proviral genome of two geographically distinct FIV subtypes isolated from free-ranging lions. Genome organization of FIVPle subtype B (9891 bp) from lions in the Serengeti National Park in Tanzania and FIVPle subtype E (9899 bp) isolated from lions in the Okavango Delta in Botswana, both resemble FIV genome sequence from puma, Pallas cat and domestic cat across 5' LTR, gag, pol, vif, orfA, env, rev and 3'LTR regions. Comparative analyses of available full-length FIV consisting of subtypes A, B and C from FIVFca, Pallas cat FIVOma and two puma FIVPco subtypes A and B recapitulate the species-specific monophyly of FIV marked by high levels of genetic diversity both within and between species. Across all FIVPle gene regions except env, lion subtypes B and E are monophyletic, and marginally more similar to Pallas cat FIVOma than to other FIV. Sequence analyses indicate the SU and TM regions of env vary substantially between subtypes, with FIVPle subtype E more related to domestic cat FIVFca than to FIVPle subtype B and FIVOma likely reflecting recombination between strains in the wild. Conclusion This study demonstrates the necessity of whole-genome analysis to complement population/gene-based studies, which are of limited utility in uncovering complex events such as recombination that may lead to functional differences in virulence and pathogenicity. These full-length lion lentiviruses are integral to

  1. Genomic organization, sequence divergence, and recombination of feline immunodeficiency virus from lions in the wild

    Directory of Open Access Journals (Sweden)

    Sondgeroth Kerry

    2008-02-01

    Full Text Available Abstract Background Feline immunodeficiency virus (FIV naturally infects multiple species of cat and is related to human immunodeficiency virus in humans. FIV infection causes AIDS-like disease and mortality in the domestic cat (Felis catus and serves as a natural model for HIV infection in humans. In African lions (Panthera leo and other exotic felid species, disease etiology introduced by FIV infection are less clear, but recent studies indicate that FIV causes moderate to severe CD4 depletion. Results In this study, comparative genomic methods are used to evaluate the full proviral genome of two geographically distinct FIV subtypes isolated from free-ranging lions. Genome organization of FIVPle subtype B (9891 bp from lions in the Serengeti National Park in Tanzania and FIVPle subtype E (9899 bp isolated from lions in the Okavango Delta in Botswana, both resemble FIV genome sequence from puma, Pallas cat and domestic cat across 5' LTR, gag, pol, vif, orfA, env, rev and 3'LTR regions. Comparative analyses of available full-length FIV consisting of subtypes A, B and C from FIVFca, Pallas cat FIVOma and two puma FIVPco subtypes A and B recapitulate the species-specific monophyly of FIV marked by high levels of genetic diversity both within and between species. Across all FIVPle gene regions except env, lion subtypes B and E are monophyletic, and marginally more similar to Pallas cat FIVOma than to other FIV. Sequence analyses indicate the SU and TM regions of env vary substantially between subtypes, with FIVPle subtype E more related to domestic cat FIVFca than to FIVPle subtype B and FIVOma likely reflecting recombination between strains in the wild. Conclusion This study demonstrates the necessity of whole-genome analysis to complement population/gene-based studies, which are of limited utility in uncovering complex events such as recombination that may lead to functional differences in virulence and pathogenicity. These full-length lion

  2. A comparison of digital gene expression profiling and methyl DNA immunoprecipitation as methods for gene discovery in honeybee (Apis mellifera behavioural genomic analyses.

    Directory of Open Access Journals (Sweden)

    Cui Guan

    Full Text Available The honey bee has a well-organized system of division of labour among workers. Workers typically progress through a series of discrete behavioural castes as they age, and this has become an important case study for exploring how dynamic changes in gene expression can influence behaviour. Here we applied both digital gene expression analysis and methyl DNA immunoprecipitation analysis to nurse, forager and reverted nurse bees (nurses that have returned to the nursing state after a period spent foraging from the same colony in order to compare the outcomes of these different forms of genomic analysis. A total of 874 and 710 significantly differentially expressed genes were identified in forager/nurse and reverted nurse/forager comparisons respectively. Of these, 229 genes exhibited reversed directions of gene expression differences between the forager/nurse and reverted nurse/forager comparisons. Using methyl-DNA immunoprecipitation combined with high-throughput sequencing (MeDIP-seq we identified 366 and 442 significantly differentially methylated genes in forager/nurse and reverted nurse/forager comparisons respectively. Of these, 165 genes were identified as differentially methylated in both comparisons. However, very few genes were identified as both differentially expressed and differentially methylated in our comparisons of nurses and foragers. These findings confirm that changes in both gene expression and DNA methylation are involved in the nurse and forager behavioural castes, but the different analytical methods reveal quite distinct sets of candidate genes.

  3. Genome-wide identification of SAUR genes in watermelon (Citrullus lanatus).

    Science.gov (United States)

    Zhang, Na; Huang, Xing; Bao, Yaning; Wang, Bo; Zeng, Hongxia; Cheng, Weishun; Tang, Mi; Li, Yuhua; Ren, Jian; Sun, Yuhong

    2017-07-01

    The early auxin responsive SAUR family is an important gene family in auxin signal transduction. We here present the first report of a genome-wide identification of SAUR genes in watermelon genome. We successfully identified 65 ClaSAURs and provide a genomic framework for future study on these genes. Phylogenetic result revealed a Cucurbitaceae-specific SAUR subfamily and contribute to understanding of the evolutionary pattern of SAUR genes in plants. Quantitative RT-PCR analysis demonstrates the existed expression of 11 randomly selected SAUR genes in watermelon tissues. ClaSAUR36 was highly expressed in fruit, for which further study might bring a new prospective for watermelon fruit development. Moreover, correlation analysis revealed the similar expression profiles of SAUR genes between watermelon and Arabidopsis during shoot organogenesis. This work gives us a new support for the conserved auxin machinery in plants.

  4. Evolution of genes and genomes on the Drosophila phylogeny

    DEFF Research Database (Denmark)

    Clark, Andrew G; Eisen, Michael B; Smith, Douglas R

    2007-01-01

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the ......Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here...... tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila...

  5. Genome-wide Identification and Expression Analysis of Half-size ABCG Genes in Malus × domestica

    Directory of Open Access Journals (Sweden)

    Juanjuan MA

    2018-03-01

    Full Text Available Half-size adenosine triphosphate-binding cassette transporter subgroup G (ABCG genes play crucial roles in regulating the movements of a variety of substrates and have been well studied in several plants. However, half-size ABCGs have not been characterized in detail in apple (Malus × domestica Borkh.. Here, we performed a genome-wide identification and expression analysis of the half-size ABCG gene family in apple. A total of 46 apple half-size ABCGs were identified and divided into six clusters according to the phylogenetic analysis. A gene structural analysis showed that most half-size ABCGs in the same cluster shared a similar exon–intron organization. A gene duplication analysis showed that segmental, tandem and whole-genome duplications could account for the expansion of half-size ABCG transporters in M. domestica. Moreover, a promoter scan, digital expression analysis and RNA-seq revealed that MdABCG21 may be involved in root's cytokinin transport and that ABCG17 may be involved in the lateral bud development of M. spectabilis ‘Bly114’ by mediating cytokinin transport. The data presented here lay the foundation for further investigations into the biological and physiological processes and functions of half-size ABCG genes in apple. Keywords: apple, ABCG gene, duplication, gene expression

  6. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array

    Directory of Open Access Journals (Sweden)

    Sugnet Charles

    2006-12-01

    Full Text Available Abstract Background Alternative splicing is a mechanism for increasing protein diversity by excluding or including exons during post-transcriptional processing. Alternatively spliced proteins are particularly relevant in oncology since they may contribute to the etiology of cancer, provide selective drug targets, or serve as a marker set for cancer diagnosis. While conventional identification of splice variants generally targets individual genes, we present here a new exon-centric array (GeneChip Human Exon 1.0 ST that allows genome-wide identification of differential splice variation, and concurrently provides a flexible and inclusive analysis of gene expression. Results We analyzed 20 paired tumor-normal colon cancer samples using a microarray designed to detect over one million putative exons that can be virtually assembled into potential gene-level transcripts according to various levels of prior supporting evidence. Analysis of high confidence (empirically supported transcripts identified 160 differentially expressed genes, with 42 genes occupying a network impacting cell proliferation and another twenty nine genes with unknown functions. A more speculative analysis, including transcripts based solely on computational prediction, produced another 160 differentially expressed genes, three-fourths of which have no previous annotation. We also present a comparison of gene signal estimations from the Exon 1.0 ST and the U133 Plus 2.0 arrays. Novel splicing events were predicted by experimental algorithms that compare the relative contribution of each exon to the cognate transcript intensity in each tissue. The resulting candidate splice variants were validated with RT-PCR. We found nine genes that were differentially spliced between colon tumors and normal colon tissues, several of which have not been previously implicated in cancer. Top scoring candidates from our analysis were also found to substantially overlap with EST-based bioinformatic

  7. A gene-based linkage map for Bicyclus anynana butterflies allows for a comprehensive analysis of synteny with the lepidopteran reference genome.

    Directory of Open Access Journals (Sweden)

    Patrícia Beldade

    2009-02-01

    Full Text Available Lepidopterans (butterflies and moths are a rich and diverse order of insects, which, despite their economic impact and unusual biological properties, are relatively underrepresented in terms of genomic resources. The genome of the silkworm Bombyx mori has been fully sequenced, but comparative lepidopteran genomics has been hampered by the scarcity of information for other species. This is especially striking for butterflies, even though they have diverse and derived phenotypes (such as color vision and wing color patterns and are considered prime models for the evolutionary and developmental analysis of ecologically relevant, complex traits. We focus on Bicyclus anynana butterflies, a laboratory system for studying the diversification of novelties and serially repeated traits. With a panel of 12 small families and a biphasic mapping approach, we first assigned 508 expressed genes to segregation groups and then ordered 297 of them within individual linkage groups. We also coarsely mapped seven color pattern loci. This is the richest gene-based map available for any butterfly species and allowed for a broad-coverage analysis of synteny with the lepidopteran reference genome. Based on 462 pairs of mapped orthologous markers in Bi. anynana and Bo. mori, we observed strong conservation of gene assignment to chromosomes, but also evidence for numerous large- and small-scale chromosomal rearrangements. With gene collections growing for a variety of target organisms, the ability to place those genes in their proper genomic context is paramount. Methods to map expressed genes and to compare maps with relevant model systems are crucial to extend genomic-level analysis outside classical model species. Maps with gene-based markers are useful for comparative genomics and to resolve mapped genomic regions to a tractable number of candidate genes, especially if there is synteny with related model species. This is discussed in relation to the identification of

  8. Architectural protein subclasses shape 3-D organization of genomes during lineage commitment

    Science.gov (United States)

    Phillips-Cremins, Jennifer E.; Sauria, Michael E. G.; Sanyal, Amartya; Gerasimova, Tatiana I.; Lajoie, Bryan R.; Bell, Joshua S. K.; Ong, Chin-Tong; Hookway, Tracy A.; Guo, Changying; Sun, Yuhua; Bland, Michael J.; Wagstaff, William; Dalton, Stephen; McDevitt, Todd C.; Sen, Ranjan; Dekker, Job; Taylor, James; Corces, Victor G.

    2013-01-01

    Summary Understanding the topological configurations of chromatin may reveal valuable insights into how the genome and epigenome act in concert to control cell fate during development. Here we generate high-resolution architecture maps across seven genomic loci in embryonic stem cells and neural progenitor cells. We observe a hierarchy of 3-D interactions that undergo marked reorganization at the sub-Mb scale during differentiation. Distinct combinations of CTCF, Mediator, and cohesin show widespread enrichment in looping interactions at different length scales. CTCF/cohesin anchor long-range constitutive interactions that form the topological basis for invariant sub-domains. Conversely, Mediator/cohesin together with pioneer factors bridge shortrange enhancer-promoter interactions within and between larger sub-domains. Knockdown of Smc1 or Med12 in ES cells results in disruption of spatial architecture and down-regulation of genes found in cohesin-mediated interactions. We conclude that cell type-specific chromatin organization occurs at the sub-Mb scale and that architectural proteins shape the genome in hierarchical length scales. PMID:23706625

  9. In silico exploration of Red Sea Bacillus genomes for natural product biosynthetic gene clusters

    KAUST Repository

    Othoum, Ghofran K; Bougouffa, Salim; Razali, Rozaimi; Bokhari, Ameerah; Alamoudi, Soha; Antunes, André ; Gao, Xin; Hoehndorf, Robert; Arold, Stefan T.; Gojobori, Takashi; Hirt, Heribert; Mijakovic, Ivan; Bajic, Vladimir B.; Lafi, Feras Fawzi; Essack, Magbubah

    2018-01-01

    are better potential sources for novel antibiotics. Moreover, the genome of the Red Sea strain B. paralicheniformis Bac48 is more enriched in modular PKS genes compared to B. licheniformis strains and other B. paralicheniformis strains. This may be linked

  10. Comparative inference of duplicated genes produced by polyploidization in soybean genome.

    Science.gov (United States)

    Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

    2013-01-01

    Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. It is important to investigate soybean genome for its economic and scientific value. Polyploidy is a widespread and recursive phenomenon during plant evolution, and it could generate massive duplicated genes which is an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, duplicated genes by whole genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that the whole genome duplication event occurred more than once in the genome evolution of soybean, which is often distributed near the ends of chromosomes.

  11. Site-Specific Integration of Exogenous Genes Using Genome Editing Technologies in Zebrafish

    Directory of Open Access Journals (Sweden)

    Atsuo Kawahara

    2016-05-01

    Full Text Available The zebrafish (Danio rerio is an ideal vertebrate model to investigate the developmental molecular mechanism of organogenesis and regeneration. Recent innovation in genome editing technologies, such as zinc finger nucleases (ZFNs, transcription activator-like effector nucleases (TALENs and the clustered regularly interspaced short palindromic repeats (CRISPR/CRISPR associated protein 9 (Cas9 system, have allowed researchers to generate diverse genomic modifications in whole animals and in cultured cells. The CRISPR/Cas9 and TALEN techniques frequently induce DNA double-strand breaks (DSBs at the targeted gene, resulting in frameshift-mediated gene disruption. As a useful application of genome editing technology, several groups have recently reported efficient site-specific integration of exogenous genes into targeted genomic loci. In this review, we provide an overview of TALEN- and CRISPR/Cas9-mediated site-specific integration of exogenous genes in zebrafish.

  12. Analysis of genomic imbalances and gene expression changes in transformed follicular lymphoma (FL)

    DEFF Research Database (Denmark)

    Obel, G.; Farinha, P.; Lam, W.

    2005-01-01

    American patients with transformed FL. Methods: High-resolution BAC-array comparative genomic hybridisation (CGH) was used to detect genomic imbalances. Gene expression profiling was performed using cDNA microarrays (Affymetrix). Results: Of 9 biopsy pairs identified so far, analysis results of the first 4...

  13. CAGO: a software tool for dynamic visual comparison and correlation measurement of genome organization.

    Directory of Open Access Journals (Sweden)

    Yi-Feng Chang

    Full Text Available CAGO (Comparative Analysis of Genome Organization is developed to address two critical shortcomings of conventional genome atlas plotters: lack of dynamic exploratory functions and absence of signal analysis for genomic properties. With dynamic exploratory functions, users can directly manipulate chromosome tracks of a genome atlas and intuitively identify distinct genomic signals by visual comparison. Signal analysis of genomic properties can further detect inconspicuous patterns from noisy genomic properties and calculate correlations between genomic properties across various genomes. To implement dynamic exploratory functions, CAGO presents each genome atlas in Scalable Vector Graphics (SVG format and allows users to interact with it using a SVG viewer through JavaScript. Signal analysis functions are implemented using R statistical software and a discrete wavelet transformation package waveslim. CAGO is not only a plotter for generating complex genome atlases, but also a platform for exploring genome atlases with dynamic exploratory functions for visual comparison and with signal analysis for comparing genomic properties across multiple organisms. The web-based application of CAGO, its source code, user guides, video demos, and live examples are publicly available and can be accessed at http://cbs.ym.edu.tw/cago.

  14. Genome-wide methylation analysis identifies genes silenced in non-seminoma cell lines.

    Science.gov (United States)

    Noor, Dzul Azri Mohamed; Jeyapalan, Jennie N; Alhazmi, Safiah; Carr, Matthew; Squibb, Benjamin; Wallace, Claire; Tan, Christopher; Cusack, Martin; Hughes, Jaime; Reader, Tom; Shipley, Janet; Sheer, Denise; Scotting, Paul J

    2016-01-01

    Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours' biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina infinium HumanMethylome450 bead chip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2-deoxycytidine and reverse transcription-quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes.

  15. Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.

    Science.gov (United States)

    Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V

    2017-09-30

    Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis , we investigated nTNL orthologs in the genomes of common bean, Medicago , soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis , common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.

  16. Candidate genes revealed by a genome scan for mosquito resistance to a bacterial insecticide: sequence and gene expression variations

    Directory of Open Access Journals (Sweden)

    David Jean-Philippe

    2009-11-01

    Full Text Available Abstract Background Genome scans are becoming an increasingly popular approach to study the genetic basis of adaptation and speciation, but on their own, they are often helpless at identifying the specific gene(s or mutation(s targeted by selection. This shortcoming is hopefully bound to disappear in the near future, thanks to the wealth of new genomic resources that are currently being developed for many species. In this article, we provide a foretaste of this exciting new era by conducting a genome scan in the mosquito Aedes aegypti with the aim to look for candidate genes involved in resistance to Bacillus thuringiensis subsp. israelensis (Bti insecticidal toxins. Results The genome of a Bti-resistant and a Bti-susceptible strains was surveyed using about 500 MITE-based molecular markers, and the loci showing the highest inter-strain genetic differentiation were sequenced and mapped on the Aedes aegypti genome sequence. Several good candidate genes for Bti-resistance were identified in the vicinity of these highly differentiated markers. Two of them, coding for a cadherin and a leucine aminopeptidase, were further examined at the sequence and gene expression levels. In the resistant strain, the cadherin gene displayed patterns of nucleotide polymorphisms consistent with the action of positive selection (e.g. an excess of high compared to intermediate frequency mutations, as well as a significant under-expression compared to the susceptible strain. Conclusion Both sequence and gene expression analyses agree to suggest a role for positive selection in the evolution of this cadherin gene in the resistant strain. However, it is unlikely that resistance to Bti is conferred by this gene alone, and further investigation will be needed to characterize other genes significantly associated with Bti resistance in Ae. aegypti. Beyond these results, this article illustrates how genome scans can build on the body of new genomic information (here, full

  17. Construction of the BAC Library of Small Abalone (Haliotis diversicolor) for Gene Screening and Genome Characterization.

    Science.gov (United States)

    Jiang, Likun; You, Weiwei; Zhang, Xiaojun; Xu, Jian; Jiang, Yanliang; Wang, Kai; Zhao, Zixia; Chen, Baohua; Zhao, Yunfeng; Mahboob, Shahid; Al-Ghanim, Khalid A; Ke, Caihuan; Xu, Peng

    2016-02-01

    The small abalone (Haliotis diversicolor) is one of the most important aquaculture species in East Asia. To facilitate gene cloning and characterization, genome analysis, and genetic breeding of it, we constructed a large-insert bacterial artificial chromosome (BAC) library, which is an important genetic tool for advanced genetics and genomics research. The small abalone BAC library includes 92,610 clones with an average insert size of 120 Kb, equivalent to approximately 7.6× of the small abalone genome. We set up three-dimensional pools and super pools of 18,432 BAC clones for target gene screening using PCR method. To assess the approach, we screened 12 target genes in these 18,432 BAC clones and identified 16 positive BAC clones. Eight positive BAC clones were then sequenced and assembled with the next generation sequencing platform. The assembled contigs representing these 8 BAC clones spanned 928 Kb of the small abalone genome, providing the first batch of genome sequences for genome evaluation and characterization. The average GC content of small abalone genome was estimated as 40.33%. A total of 21 protein-coding genes, including 7 target genes, were annotated into the 8 BACs, which proved the feasibility of PCR screening approach with three-dimensional pools in small abalone BAC library. One hundred fifty microsatellite loci were also identified from the sequences for marker development in the future. The BAC library and clone pools provided valuable resources and tools for genetic breeding and conservation of H. diversicolor.

  18. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidences indicated that function annotation of human genome in molecular level and phenotype level is very important for systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the records about human genes of GeneRIFs, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of human genome. High consistency of term frequency of individual gene (Pearson correlation = 0.6401, p = 2.2e - 16) and gene frequency of individual term (Pearson correlation = 0.1298, p = 3.686e - 14) in GeneRIFs and GOA shows our annotation resource is very reliable.

  19. Regulation of gene expression in Mycoplasmas: contribution from Mycoplasma hyopneumoniae and Mycoplasma synoviae genome sequences

    Directory of Open Access Journals (Sweden)

    Humberto Maciel França Madeira

    2007-01-01

    Full Text Available This report describes the transcription apparatus of Mycoplasma hyopneumoniae (strains J and 7448 and Mycoplasma synoviae, using a comparative genomics approach to summarize the main features related to transcription and control of gene expression in mycoplasmas. Most of the transcription-related genes present in the three strains are well conserved among mycoplasmas. Some unique aspects of transcription in mycoplasmas and the scarcity of regulatory proteins in mycoplasma genomes are discussed.

  20. The Fa