WorldWideScience

Sample records for gene cluster proteins

  1. Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2008-04-01

    Full Text Available Abstract Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
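
The incremental idea in this abstract (placing new sequences into existing clusters without an all-against-all comparison) can be sketched roughly as follows. This is only an illustration: the k-mer Jaccard `similarity`, the `threshold` value, and the cluster dictionary layout are assumptions made here, not the authors' method, which relies on search tools able to detect remote homology.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (an assumed stand-in for a real
    alignment or profile score)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def add_incrementally(clusters, new_seqs, threshold=0.5):
    """Assign each new sequence to the first cluster whose representative
    is similar enough; otherwise start a new cluster.

    clusters: list of dicts {"rep": str, "members": [str, ...]}
    Only representatives are compared, so the work grows with
    (new sequences x clusters) rather than with all sequence pairs.
    """
    for seq in new_seqs:
        for cluster in clusters:
            if similarity(seq, cluster["rep"]) >= threshold:
                cluster["members"].append(seq)
                break
        else:
            # No representative matched: the sequence founds a new cluster.
            clusters.append({"rep": seq, "members": [seq]})
    return clusters


# Hypothetical toy data: one existing family and a batch of two new sequences.
existing = [{"rep": "MKTAYIAKQRQISFVKSHFSRQ",
             "members": ["MKTAYIAKQRQISFVKSHFSRQ"]}]
batch = ["MKTAYIAKQRQISFVKSHFSRA", "GSHMLEDPVDAFKLNQ"]
updated = add_incrementally(existing, batch)
```

In this toy run the first new sequence joins the existing family and the second founds a new one, mirroring the two outcomes the abstract mentions: grouping sequences into existing families and identifying novel ones.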

  2. Molecular comparison of the structural proteins encoding gene clusters of two related Lactobacillus delbrueckii bacteriophages.

    Science.gov (United States)

    Vasala, A; Dupont, L; Baumann, M; Ritzenthaler, P; Alatossava, T

    1993-01-01

    Virulent phage LL-H and temperate phage mv4 are two related bacteriophages of Lactobacillus delbrueckii. The gene clusters encoding structural proteins of these two phages have been sequenced and further analyzed. Six open reading frames (ORF-1 to ORF-6) were detected. Protein sequencing and Western immunoblotting experiments confirmed that ORF-3 (g34) encoded the main capsid protein Gp34. The presence of a putative late promoter in front of the phage LL-H g34 gene was suggested by primer extension experiments. Comparative sequence analysis between phage LL-H and phage mv4 revealed striking similarities in the structure and organization of this gene cluster, suggesting that the genes encoding phage structural proteins belong to a highly conservative module. PMID:8497043

  3. Lactobacillus plantarum gene clusters encoding putative cell-surface protein complexes for carbohydrate utilization are conserved in specific gram-positive bacteria

    Directory of Open Access Journals (Sweden)

    Muscariello Lidia

    2006-05-01

    Full Text Available Abstract Background: Genomes of gram-positive bacteria encode many putative cell-surface proteins, of which the majority have no known function. From the rapidly increasing number of available genome sequences it has become apparent that many cell-surface proteins are conserved, and frequently encoded in gene clusters or operons, suggesting common functions and interactions of multiple components. Results: A novel gene cluster encoding exclusively cell-surface proteins was identified, which is conserved in a subgroup of gram-positive bacteria. Each gene cluster generally has one copy of four new gene families called cscA, cscB, cscC and cscD. Clusters encoding these cell-surface proteins were found only in complete genomes of Lactobacillus plantarum, Lactobacillus sakei, Enterococcus faecalis, Listeria innocua, Listeria monocytogenes, Lactococcus lactis ssp lactis and Bacillus cereus and in incomplete genomes of L. lactis ssp cremoris, Lactobacillus casei, Enterococcus faecium, Pediococcus pentosaceus, Lactobacillus brevis, Oenococcus oeni, Leuconostoc mesenteroides, and Bacillus thuringiensis. These genes are present neither in the genomes of streptococci, staphylococci and clostridia, nor in the Lactobacillus acidophilus group, suggesting a niche-specific distribution, possibly relating to association with plants. All encoded proteins have a signal peptide for secretion by the Sec-dependent pathway, while some have cell-surface anchors, novel WxL domains, and putative domains for sugar binding and degradation. Transcriptome analysis in L. plantarum shows that the cscA-D genes are co-expressed, supporting their operon organization. Many gene clusters are significantly up-regulated in a glucose-grown, ccpA-mutant derivative of L. plantarum, suggesting catabolite control. This is supported by the presence of predicted CRE-sites upstream or inside the up-regulated cscA-D gene clusters. Conclusion: We propose that the CscA, CscB, CscC and Csc

  4. Protein-protein association and cellular localization of four essential gene products encoded by tellurite resistance-conferring cluster "ter" from pathogenic Escherichia coli.

    Science.gov (United States)

    Valkovicova, Lenka; Vavrova, Silvia Minarikova; Mravec, Jozef; Grones, Jozef; Turna, Jan

    2013-12-01

    Gene cluster "ter" conferring high tellurite resistance has been identified in various pathogenic bacteria including Escherichia coli O157:H7. However, the precise mechanism as well as the molecular function of the respective gene products is unclear. Here we describe protein-protein association and localization analyses of four essential Ter proteins encoded by the minimal resistance-conferring fragment (terBCDE) by means of recombinant expression. By using a two-plasmid complementation system we show that the overproduced single Ter proteins are not able to mediate tellurite resistance, but all Ter members play an irreplaceable role within the cluster. We identified several types of homotypic and heterotypic protein-protein associations among the Ter proteins by in vitro and in vivo pull-down assays and determined their cellular localization by cytosol/membrane fractionation. Our results strongly suggest that Ter protein function involves their mutual association, which probably happens at the interface of the inner plasma membrane and the cytosol.

  5. Transcriptional analysis of the jamaicamide gene cluster from the marine cyanobacterium Lyngbya majuscula and identification of possible regulatory proteins

    Directory of Open Access Journals (Sweden)

    Dorrestein Pieter C

    2009-12-01

    Full Text Available Abstract Background: The marine cyanobacterium Lyngbya majuscula is a prolific producer of bioactive secondary metabolites. Although biosynthetic gene clusters encoding several of these compounds have been identified, little is known about how these clusters of genes are transcribed or regulated, and techniques targeting genetic manipulation in Lyngbya strains have not yet been developed. We conducted transcriptional analyses of the jamaicamide gene cluster from a Jamaican strain of Lyngbya majuscula, and isolated proteins that could be involved in jamaicamide regulation. Results: An unusually long untranslated leader region of approximately 840 bp is located between the jamaicamide transcription start site (TSS) and gene cluster start codon. All of the intergenic regions between the pathway ORFs were transcribed into RNA in RT-PCR experiments; however, a promoter prediction program indicated the possible presence of promoters in multiple intergenic regions. Because the functionality of these promoters could not be verified in vivo, we used a reporter gene assay in E. coli to show that several of these intergenic regions, as well as the primary promoter preceding the TSS, are capable of driving β-galactosidase production. A protein pulldown assay was also used to isolate proteins that may regulate the jamaicamide pathway. Pulldown experiments using the intergenic region upstream of jamA as a DNA probe isolated two proteins that were identified by LC-MS/MS. By BLAST analysis, one of these had close sequence identity to a regulatory protein in another cyanobacterial species. Protein comparisons suggest a possible correlation between secondary metabolism regulation and light dependent complementary chromatic adaptation. Electromobility shift assays were used to evaluate binding of the recombinant proteins to the jamaicamide promoter region. Conclusion: Insights into natural product regulation in cyanobacteria are of significant value to drug discovery

  6. Gene cluster statistics with gene families.

    Science.gov (United States)

    Raghupathy, Narayanan; Durand, Dannie

    2009-05-01

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data

  7. Diametrical clustering for identifying anti-correlated gene clusters.

    Science.gov (United States)

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
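
The two alternating steps named in this abstract can be sketched in plain Python as below. Several details are assumptions made for illustration and are not taken from the record: profiles are mean-centred and scaled to unit length, genes are assigned by squared dot product with each prototype, prototypes are seeded with the first `k` genes, and the dominant singular vector is found by power iteration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def standardize(profile):
    """Mean-centre and unit-normalize, so dot products equal Pearson correlations."""
    mean = sum(profile) / len(profile)
    return normalize([x - mean for x in profile])


def dominant_singular_vector(rows, iters=100):
    """Power iteration for the dominant right singular vector of `rows`."""
    v = normalize(rows[0])
    for _ in range(iters):
        proj = [dot(r, v) for r in rows]               # rows @ v
        w = [sum(p * r[j] for p, r in zip(proj, rows))  # rows.T @ (rows @ v)
             for j in range(len(v))]
        v = normalize(w)
    return v


def diametrical_clustering(genes, k, iters=20):
    """Alternate (i) re-partitioning and (ii) prototype updates."""
    data = [standardize(g) for g in genes]
    prototypes = [data[i] for i in range(k)]  # assumed seeding: first k genes
    labels = [0] * len(data)
    for _ in range(iters):
        # (i) The squared dot product ignores sign, so strongly correlated and
        # strongly anti-correlated genes fall into the same cluster.
        labels = [max(range(k), key=lambda j: dot(x, prototypes[j]) ** 2)
                  for x in data]
        # (ii) Each prototype becomes the dominant singular vector of its cluster.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                prototypes[j] = dominant_singular_vector(members)
    return labels, prototypes


# Hypothetical expression profiles over four conditions.
rising, falling = [1, 2, 3, 4], [4, 3, 2, 1]   # mirror images of each other
peaked, dipped = [1, 2, 2, 1], [2, 1, 1, 2]    # mirror images of each other
genes = [rising, peaked, falling, dipped]
labels, prototypes = diametrical_clustering(genes, k=2)
```

With these toy profiles the rising and falling genes share one cluster and the peaked and dipped genes share the other, which is the behaviour that separates this scheme from clustering on positive correlation alone.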

  8. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Full Text Available Abstract Background: Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology

  9. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series). Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by comparing the partitions obtained in these experiments with gene annotation, such as protein function and series classification.
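
Comparing a clustering result with gene annotation, as this abstract describes, needs some measure of agreement between two partitions. The record does not say which measure the authors used; the sketch below uses the plain Rand index purely as an assumed example, with made-up labels.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree, i.e. the pair
    is grouped together in both or kept apart in both."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster assignments and functional annotations for six genes.
clusters = [0, 0, 1, 1, 2, 2]
annotation = ["kinase", "kinase", "kinase", "transport", "ribosome", "ribosome"]
score = rand_index(clusters, annotation)
```

Because the index only asks whether pairs are together or apart, cluster numbers and annotation names never need to be matched to each other.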

  10. Characterization of the largest effector gene cluster of Ustilago maydis.

    Directory of Open Access Journals (Sweden)

    Thomas Brefort

    2014-07-01

    Full Text Available In the genome of the biotrophic plant pathogen Ustilago maydis, many of the genes coding for secreted protein effectors modulating virulence are arranged in gene clusters. The vast majority of these genes encode novel proteins whose expression is coupled to plant colonization. The largest of these gene clusters, cluster 19A, encodes 24 secreted effectors. Deletion of the entire cluster results in severe attenuation of virulence. Here we present the functional analysis of this genomic region. We show that a 19A deletion mutant behaves like an endophyte, i.e. is still able to colonize plants and complete the infection cycle. However, tumors, the most conspicuous symptoms of maize smut disease, are only rarely formed and fungal biomass in infected tissue is significantly reduced. The generation and analysis of strains carrying sub-deletions identified several genes significantly contributing to tumor formation after seedling infection. Another of the effectors could be linked specifically to anthocyanin induction in the infected tissue. As the individual contributions of these genes to tumor formation were small, we studied the response of maize plants to the whole cluster mutant as well as to several individual mutants by array analysis. This revealed distinct plant responses, demonstrating that the respective effectors have discrete plant targets. We propose that the analysis of plant responses to effector mutant strains that lack a strong virulence phenotype may be a general way to visualize differences in effector function.

  11. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Full Text Available Abstract Background: Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we have used an in silico subtraction methodology, and we have focused our attention on genes that are organized in genomic clusters. Results: In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain localized on chromosome 9, a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1) on chromosome 12, a cluster composed of a SPErm-associated glutamate (E)-Rich (Speer) protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis on chromosome 14, a cluster composed of the oocyte secreted protein-1 (Oosp-1) gene and two Oosp-related genes on chromosome 19, all three being characterized by a partial N-terminal zona pellucida-like domain, and another small cluster of two genes on chromosome 19 as well, composed of a TWIK-Related spinal cord K+ channel encoding-gene, and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, we showed by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion: We have studied five clusters of genes specifically expressed in female, some of them being also expressed in male germ-cells. Moreover, contrary to non-clustered oocyte-specific genes, those that are organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte-clusters in rodents could constitute an evolutionary advantage. Understanding the biological

  12. Conditions for the Evolution of Gene Clusters in Bacterial Genomes

    Science.gov (United States)

    Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.

    2010-01-01

    Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992
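
A Moran process with selection for clustering and rearrangement by inversion, as mentioned in this abstract, might look roughly like the sketch below. Everything specific here is an assumption chosen for illustration rather than the authors' model: the linear genome, the fitness bonus `s` for fully adjacent pathway genes, the `inversion_rate`, and the population size.

```python
import random


def span(genome, pathway):
    """Distance between the first and last pathway gene on a linear genome."""
    positions = [i for i, g in enumerate(genome) if g in pathway]
    return max(positions) - min(positions)


def is_clustered(genome, pathway):
    """True when the pathway genes occupy adjacent positions."""
    return span(genome, pathway) == len(pathway) - 1


def fitness(genome, pathway, s=0.5):
    """Assumed fitness: baseline 1, plus a bonus s when the pathway is clustered."""
    return 1.0 + s if is_clustered(genome, pathway) else 1.0


def invert(genome, rng):
    """Reverse a randomly chosen segment (one genome inversion event)."""
    i, j = sorted(rng.sample(range(len(genome) + 1), 2))
    return genome[:i] + genome[i:j][::-1] + genome[j:]


def moran_step(population, pathway, rng, inversion_rate=0.1):
    """One birth-death event: a parent chosen in proportion to fitness
    reproduces, and the offspring replaces a uniformly chosen individual,
    so the population size never changes."""
    weights = [fitness(g, pathway) for g in population]
    parent = rng.choices(population, weights=weights, k=1)[0]
    child = invert(parent, rng) if rng.random() < inversion_rate else list(parent)
    population[rng.randrange(len(population))] = child


rng = random.Random(0)                      # fixed seed for repeatability
pathway = {0, 1, 2}
start = [0, 5, 1, 6, 2, 7, 3, 4]            # pathway genes start scattered
population = [list(start) for _ in range(50)]
for _ in range(5000):
    moran_step(population, pathway, rng)
clustered = sum(is_clustered(g, pathway) for g in population)
```

Varying `s`, `inversion_rate`, the number of pathway genes, or the population size in such a toy model is one way to explore the kind of conditions the abstract discusses; the counts it yields carry no biological weight.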

  13. Conditions for the evolution of gene clusters in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Sara Ballouz

    2010-02-01

    Full Text Available Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.

  14. Conserved syntenic clusters of protein coding genes are missing in birds.

    Science.gov (United States)

    Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V

    2014-01-01

    Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.

  15. Genomic organization of the rat alpha 2u-globulin gene cluster.

    Science.gov (United States)

    McFadyen, D A; Addison, W; Locke, J

    1999-05-01

    The alpha 2u-globulins are a group of similar proteins, belonging to the lipocalin superfamily of proteins, that are synthesized in a subset of secretory tissues in rats. The many alpha 2u-globulin isoforms are encoded by a multigene family that exhibits extensive homology. Despite a high degree of sequence identity, individual family members show diverse expression patterns involving complex hormonal, tissue-specific, and developmental regulation. Analysis suggests that there are approximately 20 alpha 2u-globulin genes in the rat genome. We have used fluorescence in situ hybridization (FISH) to show that the alpha 2u-globulin genes are clustered at a single site on rat Chromosome (Chr) 5 (5q22-24). Southern blots of rat genomic DNA separated by pulsed field gel electrophoresis indicated that the alpha 2u-globulin genes are contained on two NruI fragments with a total size of 880 kbp. Analysis of three P1 clones containing alpha 2u-globulin genes indicated that the alpha 2u-globulin genes are tandemly arranged in a head-to-tail fashion. The organization of the alpha 2u-globulin genes in the rat as a tandem array of single genes differs from the homologous major urinary protein genes in the mouse, which are organized as tandem arrays of divergently oriented gene pairs. The structure of these gene clusters may have consequences for the proposed function, as a pheromone transporter, for the protein products encoded by these genes.

  16. Regulatory role of tetR gene in a novel gene cluster of Acidovorax avenae subsp. avenae RS-1 under oxidative stress

    Directory of Open Access Journals (Sweden)

    He Liu

    2014-10-01

    Full Text Available Acidovorax avenae subsp. avenae is the causal agent of bacterial brown stripe disease in rice. In this study, we characterized a novel horizontal transfer of a gene cluster, including tetR, on the chromosome of A. avenae subsp. avenae RS-1 by genome-wide analysis. TetR acted as a repressor in this gene cluster and the oxidative stress resistance was enhanced in tetR-deletion mutant strain. Electrophoretic mobility shift assay (EMSA) demonstrated that TetR regulator bound directly to the promoter of this gene cluster. Consistently, the results of quantitative real-time PCR also showed alterations in expression of associated genes. Moreover, the proteins affected by TetR under oxidative stress were revealed by comparing proteomic profiles of wild-type and mutant strains via 1D SDS-PAGE and LC-MS/MS analyses. Taken together, our results demonstrated that tetR gene in this novel gene cluster contributed to cell survival under oxidative stress, and TetR protein played an important regulatory role in growth kinetics, biofilm-forming capability, SOD and catalase activity, and oxide detoxicating ability.

  17. Regulatory role of tetR gene in a novel gene cluster of Acidovorax avenae subsp. avenae RS-1 under oxidative stress.

    Science.gov (United States)

    Liu, He; Yang, Chun-Lan; Ge, Meng-Yu; Ibrahim, Muhammad; Li, Bin; Zhao, Wen-Jun; Chen, Gong-You; Zhu, Bo; Xie, Guan-Lin

    2014-01-01

    Acidovorax avenae subsp. avenae is the causal agent of bacterial brown stripe disease in rice. In this study, we characterized a novel horizontal transfer of a gene cluster, including tetR, on the chromosome of A. avenae subsp. avenae RS-1 by genome-wide analysis. TetR acted as a repressor in this gene cluster and the oxidative stress resistance was enhanced in tetR-deletion mutant strain. Electrophoretic mobility shift assay demonstrated that TetR regulator bound directly to the promoter of this gene cluster. Consistently, the results of quantitative real-time PCR also showed alterations in expression of associated genes. Moreover, the proteins affected by TetR under oxidative stress were revealed by comparing proteomic profiles of wild-type and mutant strains via 1D SDS-PAGE and LC-MS/MS analyses. Taken together, our results demonstrated that tetR gene in this novel gene cluster contributed to cell survival under oxidative stress, and TetR protein played an important regulatory role in growth kinetics, biofilm-forming capability, superoxide dismutase and catalase activity, and oxide detoxicating ability.

  18. Site-directed mutagenesis of Azotobacter vinelandii ferredoxin I: [Fe-S] cluster-driven protein rearrangement

    International Nuclear Information System (INIS)

    Martin, A.E.; Burgess, B.K.; Stout, C.D.; Cash, V.L.; Dean, D.R.; Jensen, G.M.; Stephens, P.J.

    1990-01-01

    Azotobacter vinelandii ferredoxin I is a small protein that contains one [4Fe-4S] cluster and one [3Fe-4S] cluster. Recently the x-ray crystal structure has been redetermined and the fdxA gene, which encodes the protein, has been cloned and sequenced. Here the authors report the site-directed mutation of Cys-20, which is a ligand of the [4Fe-4S] cluster in the native protein, to alanine and the characterization of the protein product by x-ray crystallographic and spectroscopic methods. The data show that the mutant protein again contains one [4Fe-4S] cluster and one [3Fe-4S] cluster. The new [4Fe-4S] cluster obtains its fourth ligand from Cys-24, a free cysteine in the native structure. The formation of this [4Fe-4S] cluster drives rearrangement of the protein structure

  19. Correlation of mRNA and protein levels: Cell type-specific gene expression of cluster designation antigens in the prostate

    Directory of Open Access Journals (Sweden)

    Deutsch Eric W

    2008-05-01

    Full Text Available Abstract Background: Expression levels of mRNA and protein by cell types exhibit a range of correlations for different genes. In this study, we compared levels of mRNA abundance for several cluster designation (CD) genes determined by gene arrays using magnetic sorted and laser-capture microdissected human prostate cells with levels of expression of the respective CD proteins determined by immunohistochemical staining in the major cell types of the prostate – basal epithelial, luminal epithelial, stromal fibromuscular, and endothelial – and for prostate precursor/stem cells and prostate carcinoma cells. Immunohistochemical stains of prostate tissues from more than 50 patients were scored for informative CD antigen expression and compared with cell-type specific transcriptomes. Results: Concordance between gene and protein expression findings based on 'present' vs. 'absent' calls ranged from 46 to 68%. Correlation of expression levels was poor to moderate (Pearson correlations ranged from 0 to 0.63). Divergence between the two data types was most frequently seen for genes whose array signals exceeded background (> 50) but lacked immunoreactivity by immunostaining. This could be due to multiple factors, e.g. low levels of protein expression, technological sensitivities, sample processing, probe set definition or anatomical origin of tissue and actual biological differences between transcript and protein abundance. Conclusion: Agreement between these two very different methodologies has great implications for their respective use in both molecular studies and clinical trials employing molecular biomarkers.
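
The two summary statistics reported in this abstract, concordance of present/absent calls and Pearson correlation of expression levels, can be computed as in the sketch below. All data values are invented for illustration, and the 0 to 3 staining scale is an assumption; only the background cut-off of 50 for array signals comes from the record.

```python
import math


def concordance(mrna_present, protein_present):
    """Percentage of genes whose present/absent calls agree."""
    matches = sum(m == p for m, p in zip(mrna_present, protein_present))
    return 100.0 * matches / len(mrna_present)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up values for five hypothetical CD genes in one cell type.
array_signal = [120.0, 30.0, 400.0, 75.0, 20.0]
stain_score = [1, 0, 3, 0, 0]      # assumed 0-3 immunostaining scale
background = 50.0                  # array signals above this count as present

mrna_calls = [signal > background for signal in array_signal]
protein_calls = [score > 0 for score in stain_score]
agreement = concordance(mrna_calls, protein_calls)
r = pearson(array_signal, stain_score)
```

The fourth gene in the toy data has an array signal above background but no staining, which is the most frequent kind of divergence the abstract reports.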

  20. Genomic organization, tissue distribution and functional characterization of the rat Pate gene cluster.

    Directory of Open Access Journals (Sweden)

    Angireddy Rajesh

    Full Text Available The cysteine rich prostate and testis expressed (Pate) proteins identified to date are thought to resemble the three-fingered protein/urokinase-type plasminogen activator receptor proteins. In this study, for the first time, we report the identification, cloning and characterization of the rat Pate gene cluster and also determine the expression pattern. The rat Pate genes are clustered on chromosome 8 and their predicted proteins retain the ten-cysteine signature characteristic of the TFP/Ly-6 protein family. The three-dimensional protein structure of PATE and PATE-F was found to be similar to that of the toxin bucandin. Though Pate gene expression is thought to be prostate and testis specific, we observed that rat Pate genes are also expressed in seminal vesicle and epididymis and in tissues beyond the male reproductive tract. In developing rats (20-60 days old), expression of Pate genes seems to be androgen dependent in the epididymis and testis. In the adult rat, androgen ablation resulted in down regulation of the majority of Pate genes in the epididymides. PATE and PATE-F proteins were found to be expressed abundantly in the male reproductive tract of rats and on the sperm. Recombinant PATE protein exhibited potent antibacterial activity, whereas PATE-F did not exhibit any antibacterial activity. Pate expression was induced in the epididymides when challenged with LPS. Based on our results, we conclude that rat PATE proteins may contribute to the reproductive and defense functions.

  1. Persistence drives gene clustering in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Rocha Eduardo PC

    2008-01-01

    Full Text Available Abstract Background Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion We show that it is the strong selective pressure acting on the function of persistent genes, in a permanent state of flux of genes in bacterial genomes, maintaining their size fairly constant, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering.

  2. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

    Directory of Open Access Journals (Sweden)

    Cooper James B

    2010-03-01

    Full Text Available Abstract Background Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

  3. Ensemble attribute profile clustering: discovering and characterizing groups of genes with similar patterns of biological features

    Directory of Open Access Journals (Sweden)

    Bissell MJ

    2006-03-01

    Full Text Available Abstract Background Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells. Results Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile). In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO) terms and the Conserved Domain Database (CDD) protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models, and each ensuing model was utilized to partition the profiles into clusters. The resultant partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occurred consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and, in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes. Ensemble attribute profile clustering-based analysis indicated that both lists
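
    The unanimous voting step described above can be sketched as follows. This is an illustrative reading of "profiles that co-occurred consistently in the same cluster", not the authors' implementation: the function name, the dictionary-based input format and the gene labels are all assumptions.

```python
from collections import defaultdict


def consensus_clusters(partitions):
    """Group items that share a cluster in every partition (unanimous vote).

    `partitions` is a list of dicts mapping item -> cluster label, one dict
    per clustering run. Two items end up in the same consensus cluster only
    if every run assigned them the same label as each other.
    """
    if not partitions:
        raise ValueError("need at least one partition")
    items = set(partitions[0])
    for partition in partitions[1:]:
        if set(partition) != items:
            raise ValueError("all partitions must cover the same items")

    groups = defaultdict(list)
    for item in sorted(items):
        # The tuple of labels across runs identifies the consensus cluster.
        signature = tuple(partition[item] for partition in partitions)
        groups[signature].append(item)
    return sorted(groups.values())


# Hypothetical output of two mixture-model runs on four gene profiles.
run_a = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
run_b = {"g1": "x", "g2": "x", "g3": "x", "g4": "y"}

print(consensus_clusters([run_a, run_b]))
```

    Here g1 and g2 are grouped because both runs keep them together, while g3 and g4 are separated by the second run and therefore become singletons.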

  4. Using the clustered circular layout as an informative method for visualizing protein-protein interaction networks.

    Science.gov (United States)

    Fung, David C Y; Wilkins, Marc R; Hart, David; Hong, Seok-Hee

    2010-07-01

    The force-directed layout is commonly used in computer-generated visualizations of protein-protein interaction networks. While it is good for providing a visual outline of the protein complexes and their interactions, it has two limitations when used as a visual analysis method. The first is poor reproducibility. Repeated running of the algorithm does not necessarily generate the same layout, therefore, demanding cognitive readaptation on the investigator's part. The second limitation is that it does not explicitly display complementary biological information, e.g. Gene Ontology, other than the protein names or gene symbols. Here, we present an alternative layout called the clustered circular layout. Using the human DNA replication protein-protein interaction network as a case study, we compared the two network layouts for their merits and limitations in supporting visual analysis.

  5. RRW: repeated random walks on genome-scale protein networks for local cluster discovery

    Directory of Open Access Journals (Sweden)

    Can Tolga

    2009-09-01

    Full Text Available Abstract Background We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. Results We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. Conclusion RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.
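
    The abstract does not spell out the walk itself. As a hedged illustration of how a random walk can use topology and edge weights to rank proteins near a seed, the sketch below computes a random walk with restart by power iteration. Treating this as the building block of RRW is an assumption, and the restart probability, function name and toy network are hypothetical.

```python
def random_walk_with_restart(weights, start, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `start`.

    `weights` maps node -> {neighbor: positive edge weight}; an undirected
    edge must be listed in both directions, and every neighbor must also be
    a key of `weights`.
    """
    nodes = sorted(weights)
    prob = {v: 1.0 if v == start else 0.0 for v in nodes}
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {v: (restart if v == start else 0.0) for v in nodes}
        for u in nodes:
            total = sum(weights[u].values())
            if total == 0:
                # Isolated node: return its probability mass to the seed.
                nxt[start] += (1 - restart) * prob[u]
                continue
            for v, w in weights[u].items():
                nxt[v] += (1 - restart) * prob[u] * w / total
        delta = sum(abs(nxt[v] - prob[v]) for v in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical network: two triangles joined by one weak edge (c-d).
network = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}

scores = random_walk_with_restart(network, start="a")
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

    Nodes in the seed's own triangle receive most of the probability mass, so thresholding the scores gives a candidate local cluster around the seed.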

  6. Evolution of the C-Type Lectin-Like Receptor Genes of the DECTIN-1 Cluster in the NK Gene Complex

    Directory of Open Access Journals (Sweden)

    Susanne Sattler

    2012-01-01

    Full Text Available Pattern recognition receptors are crucial in initiating and shaping innate and adaptive immune responses and often belong to families of structurally and evolutionarily related proteins. The human C-type lectin-like receptors encoded in the DECTIN-1 cluster within the NK gene complex contain prominent receptors with pattern recognition function, such as DECTIN-1 and LOX-1. All members of this cluster share significant homology and are considered to have arisen from subsequent gene duplications. Recent developments in sequencing and the availability of comprehensive sequence data comprising many species showed that the receptors of the DECTIN-1 cluster are not only homologous to each other but also highly conserved between species. Even in Caenorhabditis elegans, genes displaying homology to the mammalian C-type lectin-like receptors have been detected. In this paper, we conduct a comprehensive phylogenetic survey and give an up-to-date overview of the currently available data on the evolutionary emergence of the DECTIN-1 cluster genes.

  7. Detection of protein complex from protein-protein interaction network using Markov clustering

    International Nuclear Information System (INIS)

    Ochieng, P J; Kusuma, W A; Haryanto, T

    2017-01-01

    Detection of complexes, or groups of functionally related proteins, is an important challenge in the analysis of biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on the Markov clustering algorithm to identify protein complexes within highly interconnected protein-protein interaction networks. A protein-protein interaction network was first constructed to develop a geometrical network; the network was then partitioned using Markov clustering to detect protein complexes. The value of the proposed method was illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed and topological properties of the resultant network were analysed for detection of the protein complexes. The results indicated that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101 respectively. The comparison analysis revealed that MCL outperformed the AP, MCODE and SCPS algorithms, with high clustering coefficient (0.751), network density and modularity index (0.630). This demonstrated that MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks. (paper)
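
    The flow simulation mentioned above alternates two matrix operations, expansion and inflation. The following is a minimal, dependency-free sketch of the general Markov clustering procedure rather than the authors' pipeline; the inflation value of 2.0, the added self-loops, the convergence tolerance and the toy graph are assumptions.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-8):
    """Cluster an undirected graph given as a square adjacency matrix."""
    n = len(adjacency)
    # Self-loops are a common convention that stabilizes the iteration.
    m = [
        [float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows decay
        change = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if change < tol:
            break
    # Each row that still carries flow is an attractor; the columns it
    # attracts form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
graph = [[0] * 6 for _ in range(6)]
for u, v in edges:
    graph[u][v] = graph[v][u] = 1

print(markov_cluster(graph))
```

    Raising the inflation parameter generally yields more, smaller clusters, which is the main tuning decision when applying this procedure to dense interaction networks.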

  8. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  9. Strategies to regulate transcription factor-mediated gene positioning and interchromosomal clustering at the nuclear periphery.

    Science.gov (United States)

    Randise-Hinchliff, Carlo; Coukos, Robert; Sood, Varun; Sumner, Michael Chas; Zdraljevic, Stefan; Meldi Sholl, Lauren; Garvey Brickner, Donna; Ahmed, Sara; Watchmaker, Lauren; Brickner, Jason H

    2016-03-14

    In budding yeast, targeting of active genes to the nuclear pore complex (NPC) and interchromosomal clustering is mediated by transcription factor (TF) binding sites in the gene promoters. For example, the binding sites for the TFs Put3, Ste12, and Gcn4 are necessary and sufficient to promote positioning at the nuclear periphery and interchromosomal clustering. However, in all three cases, gene positioning and interchromosomal clustering are regulated. Under uninducing conditions, local recruitment of the Rpd3(L) histone deacetylase by transcriptional repressors blocks Put3 DNA binding. This is a general function of yeast repressors: 16 of 21 repressors blocked Put3-mediated subnuclear positioning; 11 of these required Rpd3. In contrast, Ste12-mediated gene positioning is regulated independently of DNA binding by mitogen-activated protein kinase phosphorylation of the Dig2 inhibitor, and Gcn4-dependent targeting is up-regulated by increasing Gcn4 protein levels. These different regulatory strategies provide either qualitative switch-like control or quantitative control of gene positioning over different time scales. © 2016 Randise-Hinchliff et al.

  10. Histone and ribosomal RNA repetitive gene clusters of the boll weevil are linked in a tandem array.

    Science.gov (United States)

    Roehrdanz, R; Heilmann, L; Senechal, P; Sears, S; Evenson, P

    2010-08-01

    Histones are the major protein component of chromatin structure. The histone family is made up of a quintet of proteins, four core histones (H2A, H2B, H3 & H4) and the linker histones (H1). Spacers are found between the coding regions. Among insects this quintet of genes is usually clustered and the clusters are tandemly repeated. Ribosomal DNA contains a cluster of the rRNA sequences 18S, 5.8S and 28S. The rRNA genes are separated by the spacers ITS1, ITS2 and IGS. This cluster is also tandemly repeated. We found that the ribosomal RNA repeat unit of at least two species of Anthonomine weevils, Anthonomus grandis and Anthonomus texanus (Coleoptera: Curculionidae), is interspersed with a block containing the histone gene quintet. The histone genes are situated between the rRNA 18S and 28S genes in what is known as the intergenic spacer region (IGS). The complete reiterated Anthonomus grandis histone-ribosomal sequence is 16,248 bp.

  11. Bacillus sp. CDB3 isolated from cattle dip-sites possesses two ars gene clusters

    Institute of Scientific and Technical Information of China (English)

    Somanath Bhat; Xi Luo; Zhiqiang Xu; Lixia Liu; Ren Zhang

    2011-01-01

    Contamination of soil and water by arsenic is a global problem. In Australia, the dipping of cattle in arsenic-containing solution to control cattle ticks in the last century has left many sites heavily contaminated with arsenic and other toxicants. We had previously isolated five soil bacterial strains (CDB1-5) highly resistant to arsenic. To understand the resistance mechanism, molecular studies have been carried out. Two chromosome-encoded arsenic resistance (ars) gene clusters have been cloned from CDB3 (Bacillus sp.). They both function in Escherichia coli, and cluster 1 exerts a much higher resistance to the toxic metalloid. Cluster 2 is smaller, possessing four open reading frames (ORFs), arsRorf2BC, similar to that identified in the Bacillus subtilis Skin element. Among the eight ORFs in cluster 1, five are analogs of common ars genes found in other bacteria, but they are organized in a unique order, arsRBCDA instead of arsRDABC. Three other putative genes are located directly downstream and designated as arsTIP based on the homologies of their theoretical translation sequences to thioredoxin reductases, iron-sulphur cluster proteins and protein phosphatases, respectively. The latter two are novel to any known ars operons. The arsD gene from a Bacillus species was cloned for the first time, and the predicted protein differs from the well studied E. coli ArsD by lacking two pairs of C-terminal cysteine residues. Its functional involvement in arsenic resistance has been confirmed by a deletion experiment. There also exists an inverted repeat in the intergenic region between arsC and arsD, implying some unknown transcription regulation.

  12. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    Energy Technology Data Exchange (ETDEWEB)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly-related genomes permit the identification of functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involved in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either Aα or Aβ clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.

  13. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  14. Identification of stress responsive genes by studying specific relationships between mRNA and protein abundance.

    Science.gov (United States)

    Morimoto, Shimpei; Yahara, Koji

    2018-03-01

    Protein expression is regulated by the production and degradation of mRNAs and proteins but the specifics of their relationship are controversial. Although technological advances have enabled genome-wide and time-series surveys of mRNA and protein abundance, recent studies have shown paradoxical results, with most statistical analyses being limited to linear correlation, or analysis of variance applied separately to mRNA and protein datasets. Here, using recently analyzed genome-wide time-series data, we have developed a statistical analysis framework for identifying which types of genes or biological gene groups have significant correlation between mRNA and protein abundance after accounting for potential time delays. Our framework stratifies all genes in terms of the extent of time delay, conducts gene clustering in each stratum, and performs a non-parametric statistical test of the correlation between mRNA and protein abundance in a gene cluster. Consequently, we revealed stronger correlations than previously reported between mRNA and protein abundance in two metabolic pathways. Moreover, we identified a pair of stress responsive genes (ADC17 and KIN1) that showed a highly similar time series of mRNA and protein abundance. Furthermore, we confirmed robustness of the analysis framework by applying it to another genome-wide time-series data and identifying a cytoskeleton-related gene cluster (keratin 18, keratin 17, and mitotic spindle positioning) that shows similar correlation. The significant correlation and highly similar changes of mRNA and protein abundance suggests a concerted role of these genes in cellular stress response, which we consider provides an answer to the question of the specific relationships between mRNA and protein in a cell. In addition, our framework for studying the relationship between mRNAs and proteins in a cell will provide a basis for studying specific relationships between mRNA and protein abundance after accounting for potential
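
    The idea of correlating mRNA and protein abundance "after accounting for potential time delays" can be illustrated with a small sketch. It scans candidate delays and scores each with Spearman rank correlation, a non-parametric measure. This is not the authors' framework: the choice of Spearman correlation, the maximum lag, the function names and the toy series are all assumptions.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def best_lag(mrna, protein, max_lag=3):
    """Return (lag, rho) for the delay at which protein best tracks mRNA.

    A lag of k compares mRNA at time t with protein at time t + k.
    """
    best = None
    for lag in range(max_lag + 1):
        x = mrna[: len(mrna) - lag]
        y = protein[lag:]
        if len(x) < 3:
            break  # too few overlapping time points to rank meaningfully
        rho = spearman(x, y)
        if best is None or rho > best[1]:
            best = (lag, rho)
    return best


# Hypothetical time series: the protein profile repeats the mRNA profile
# two time points later.
mrna_series = [1, 3, 7, 4, 2, 1, 1, 1]
protein_series = [0, 0, 1, 3, 7, 4, 2, 1]

lag, rho = best_lag(mrna_series, protein_series)
print("best lag:", lag, "rho:", round(rho, 3))
```

    Genes could then be stratified by their best lag before clustering, which mirrors the order of steps the abstract describes.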

  15. Do protein crystals nucleate within dense liquid clusters?

    International Nuclear Information System (INIS)

    Maes, Dominique; Vorontsova, Maria A.; Potenza, Marco A. C.; Sanvito, Tiziano; Sleutel, Mike; Giglio, Marzio; Vekilov, Peter G.

    2015-01-01

    The evolution of protein-rich clusters and nucleating crystals was characterized by dynamic light scattering (DLS), confocal depolarized dynamic light scattering (cDDLS) and depolarized oblique illumination dark-field microscopy. Newly nucleated crystals within protein-rich clusters were detected directly. These observations indicate that the protein-rich clusters are locations for crystal nucleation. Protein-dense liquid clusters are regions of high protein concentration that have been observed in solutions of several proteins. The typical cluster size varies from several tens to several hundreds of nanometres and their volume fraction remains below 10⁻³ of the solution. According to the two-step mechanism of nucleation, the protein-rich clusters serve as locations for and precursors to the nucleation of protein crystals. While the two-step mechanism explained several unusual features of protein crystal nucleation kinetics, a direct observation of its validity for protein crystals has been lacking. Here, two independent observations of crystal nucleation with the proteins lysozyme and glucose isomerase are discussed. Firstly, the evolutions of the protein-rich clusters and nucleating crystals were characterized simultaneously by dynamic light scattering (DLS) and confocal depolarized dynamic light scattering (cDDLS), respectively. It is demonstrated that protein crystals appear following a significant delay after cluster formation. The cDDLS correlation functions follow a Gaussian decay, indicative of nondiffusive motion. A possible explanation is that the crystals are contained inside large clusters and are driven by the elasticity of the cluster surface. Secondly, depolarized oblique illumination dark-field microscopy reveals the evolution from liquid clusters without crystals to newly nucleated crystals contained in the clusters to grown crystals freely diffusing in the solution. Collectively, the observations indicate that the protein-rich clusters in

  16. Predicting protein complexes from weighted protein-protein interaction graphs with a novel unsupervised methodology: Evolutionary enhanced Markov clustering.

    Science.gov (United States)

    Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina

    2015-03-01

    Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages as they require parameter tuning, some of them cannot handle weighted PPI data and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies such as their inability to handle weighted PPI networks, their constraint of assigning every protein to exactly one cluster and the difficulties they face concerning the parameter tuning. This fact was experimentally validated and moreover, new

  17. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  18. Co-evolution of secondary metabolite gene clusters and their host

    DEFF Research Database (Denmark)

    Kjærbølling, Inge; Vesth, Tammi Camilla; Frisvad, Jens Christian

    Secondary metabolite gene cluster evolution is mainly driven by two events: gene duplication and annexation and horizontal gene transfer. Here we use comparative genomics of Aspergillus species to investigate the evolution of secondary metabolite (SM) gene clusters across a wide spectrum of species. We investigate the dynamic evolutionary relationship between the cluster and the host by examining the genes within the cluster and the number of homologous genes found within the host and in closely related species.

  19. Identification of stress responsive genes by studying specific relationships between mRNA and protein abundance

    Directory of Open Access Journals (Sweden)

    Shimpei Morimoto

    2018-03-01

    Full Text Available Protein expression is regulated by the production and degradation of mRNAs and proteins but the specifics of their relationship are controversial. Although technological advances have enabled genome-wide and time-series surveys of mRNA and protein abundance, recent studies have shown paradoxical results, with most statistical analyses being limited to linear correlation, or analysis of variance applied separately to mRNA and protein datasets. Here, using recently analyzed genome-wide time-series data, we have developed a statistical analysis framework for identifying which types of genes or biological gene groups have significant correlation between mRNA and protein abundance after accounting for potential time delays. Our framework stratifies all genes in terms of the extent of time delay, conducts gene clustering in each stratum, and performs a non-parametric statistical test of the correlation between mRNA and protein abundance in a gene cluster. Consequently, we revealed stronger correlations than previously reported between mRNA and protein abundance in two metabolic pathways. Moreover, we identified a pair of stress responsive genes (ADC17 and KIN1) that showed a highly similar time series of mRNA and protein abundance. Furthermore, we confirmed robustness of the analysis framework by applying it to another genome-wide time-series data and identifying a cytoskeleton-related gene cluster (keratin 18, keratin 17, and mitotic spindle positioning) that shows similar correlation. The significant correlation and highly similar changes of mRNA and protein abundance suggests a concerted role of these genes in cellular stress response, which we consider provides an answer to the question of the specific relationships between mRNA and protein in a cell. In addition, our framework for studying the relationship between mRNAs and proteins in a cell will provide a basis for studying specific relationships between mRNA and protein abundance after

  20. Origin and distribution of epipolythiodioxopiperazine (ETP) gene clusters in filamentous ascomycetes

    Directory of Open Access Journals (Sweden)

    Gardiner Donald M

    2007-09-01

    Full Text Available Abstract Background Genes responsible for biosynthesis of fungal secondary metabolites are usually tightly clustered in the genome and co-regulated with metabolite production. Epipolythiodioxopiperazines (ETPs) are a class of secondary metabolite toxins produced by disparate ascomycete fungi and implicated in several animal and plant diseases. Gene clusters responsible for their production have previously been defined in only two fungi. Fungal genome sequence data have been surveyed for the presence of putative ETP clusters and cluster data have been generated from several fungal taxa where genome sequences are not available. Phylogenetic analysis of cluster genes has been used to investigate the assembly and heredity of these gene clusters. Results Putative ETP gene clusters are present in 14 ascomycete taxa, but absent in numerous other ascomycetes examined. These clusters are discontinuously distributed in ascomycete lineages. Gene content is not absolutely fixed; however, common genes are identified and phylogenies of six of these are separately inferred. In each phylogeny almost all cluster genes form monophyletic clades with non-cluster fungal paralogues being the nearest outgroups. This relatedness of cluster genes suggests that a progenitor ETP gene cluster assembled within an ancestral taxon. Within each of the cluster clades, the cluster genes group together in consistent subclades; however, these relationships do not always reflect the phylogeny of ascomycetes. Micro-synteny of several of the genes within the clusters provides further support for these subclades. Conclusion ETP gene clusters appear to have a single origin and have been inherited relatively intact rather than assembling independently in the different ascomycete lineages. This progenitor cluster has given rise to a small number of distinct phylogenetic classes of clusters that are represented in a discontinuous pattern throughout ascomycetes. The disjunct heredity of

  1. K-nearest uphill clustering in the protein structure space

    KAUST Repository

    Cui, Xuefeng

    2016-08-26

    The protein structure classification problem, which is to assign a protein structure to a cluster of similar proteins, is one of the most fundamental problems in the construction and application of the protein structure space. Early manually curated protein structure classifications (e.g., SCOP and CATH) are very successful, but have recently suffered from slow updating because of the increased throughput of newly solved protein structures. Thus, fully automatic methods to cluster proteins in the protein structure space have been designed and developed. In this study, we observed that the SCOP superfamilies are highly consistent with clustering trees representing hierarchical clustering procedures, but the tree cutting is very challenging and becomes the bottleneck of clustering accuracy. To overcome this challenge, we proposed a novel density-based K-nearest uphill clustering method that effectively eliminates noisy pairwise protein structure similarities and identifies density peaks as cluster centers. Specifically, the density peaks are identified based on K-nearest uphills (i.e., proteins with higher densities) and K-nearest neighbors. To our knowledge, this is the first attempt to apply and develop density-based clustering methods in the protein structure space. Our results show that our density-based clustering method outperforms the state-of-the-art clustering methods previously applied to the problem. Moreover, we observed that computational methods and human experts could produce highly similar clusters at high precision values, while computational methods also suggest splitting some large superfamilies into smaller clusters. © 2016 Elsevier B.V.

  2. Two Horizontally Transferred Xenobiotic Resistance Gene Clusters Associated with Detoxification of Benzoxazolinones by Fusarium Species

    Science.gov (United States)

    Glenn, Anthony E.; Davis, C. Britton; Gao, Minglu; Gold, Scott E.; Mitchell, Trevor R.; Proctor, Robert H.; Stewart, Jane E.; Snook, Maurice E.

    2016-01-01

    Microbes encounter a broad spectrum of antimicrobial compounds in their environments and often possess metabolic strategies to detoxify such xenobiotics. We have previously shown that Fusarium verticillioides, a fungal pathogen of maize known for its production of fumonisin mycotoxins, possesses two unlinked loci, FDB1 and FDB2, necessary for detoxification of antimicrobial compounds produced by maize, including the γ-lactam 2-benzoxazolinone (BOA). In support of these earlier studies, microarray analysis of F. verticillioides exposed to BOA identified the induction of multiple genes at FDB1 and FDB2, indicating the loci consist of gene clusters. One of the FDB1 cluster genes encoded a protein having domain homology to the metallo-β-lactamase (MBL) superfamily. Deletion of this gene (MBL1) rendered F. verticillioides incapable of metabolizing BOA and thus unable to grow on BOA-amended media. Deletion of other FDB1 cluster genes, in particular AMD1 and DLH1, did not affect BOA degradation. Phylogenetic analyses and topology testing of the FDB1 and FDB2 cluster genes suggested two horizontal transfer events among fungi, one being transfer of FDB1 from Fusarium to Colletotrichum, and the second being transfer of the FDB2 cluster from Fusarium to Aspergillus. Together, the results suggest that plant-derived xenobiotics have exerted evolutionary pressure on these fungi, leading to horizontal transfer of genes that enhance fitness or virulence. PMID:26808652

  3. A highly divergent gene cluster in honey bees encodes a novel silk family.

    Science.gov (United States)

    Sutherland, Tara D; Campbell, Peter M; Weisman, Sarah; Trueman, Holly E; Sriskantha, Alagacone; Wanjura, Wolfgang J; Haritos, Victoria S

    2006-11-01

    The pupal cocoon of the domesticated silk moth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1-4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, comprise a single exon each, and are clustered on a short genomic region where the open reading frames are GC-rich amid low GC intergenic regions. The genes encode similar proteins that are highly helical and predicted to form unusually tight coiled coils. Despite the similarity in size, structure, and composition of the encoded proteins, the genes have low primary sequence identity. We propose that the four fiber genes have arisen from gene duplication events but have subsequently diverged significantly. The silk-associated genes encode proteins likely to act as a glue (AmelSA1) and involved in silk processing (AmelSA2). Although the silks of honey bees and silkmoths both originate in larval labial glands, the silk proteins are completely different in their primary, secondary, and tertiary structures as well as the genomic arrangement of the genes encoding them. This implies independent evolutionary origins for these functionally related proteins.

  4. A functional bikaverin biosynthesis gene cluster in rare strains of Botrytis cinerea is positively controlled by VELVET.

    Directory of Open Access Journals (Sweden)

    Julia Schumacher

    Full Text Available The gene cluster responsible for the biosynthesis of the red polyketidic pigment bikaverin has only been characterized in Fusarium spp. so far. Recently, a highly homologous but incomplete and nonfunctional bikaverin cluster has been found in the genome of the unrelated phytopathogenic fungus Botrytis cinerea. In this study, we provided evidence that rare B. cinerea strains such as 1750 have a complete and functional cluster comprising the six genes orthologous to Fusarium fujikuroi ffbik1-ffbik6 and do produce bikaverin. Phylogenetic analysis confirmed that the whole cluster was acquired from Fusarium through a horizontal gene transfer (HGT). In the bikaverin-nonproducing strain B05.10, the genes encoding bikaverin biosynthesis enzymes are nonfunctional due to deleterious mutations (bcbik2-3) or missing (bcbik1), but interestingly, the genes encoding the regulatory proteins BcBIK4 and BcBIK5 do not harbor deleterious mutations, which suggests that they may still be functional. Heterologous complementation of the F. fujikuroi Δffbik4 mutant confirmed that bcbik4 of strain B05.10 is indeed fully functional. Deletion of bcvel1 in the pink strain 1750 resulted in loss of bikaverin and overproduction of melanin, indicating that the VELVET protein BcVEL1 regulates the biosynthesis of the two pigments in an opposite manner. Although strain 1750 itself expresses a truncated BcVEL1 protein (100 instead of 575 aa) that is nonfunctional with regard to sclerotia formation, virulence and oxalic acid formation, it is sufficient to regulate pigment biosynthesis (bikaverin and melanin) and fenhexamid HydR2 type of resistance. Finally, a genetic cross between strain 1750 and a bikaverin-nonproducing strain sensitive to fenhexamid revealed that the functional bikaverin cluster is genetically linked to the HydR2 locus.

  5. Topological and organizational properties of the products of house-keeping and tissue-specific genes in protein-protein interaction networks.

    Science.gov (United States)

    Lin, Wen-Hsien; Liu, Wei-Chung; Hwang, Ming-Jing

    2009-03-11

    Human cells of various tissue types differ greatly in morphology despite having the same set of genetic information. Some genes are expressed in all cell types to perform house-keeping functions, while some are selectively expressed to perform tissue-specific functions. In this study, we wished to elucidate how proteins encoded by human house-keeping genes and tissue-specific genes are organized in human protein-protein interaction networks. We constructed protein-protein interaction networks for different tissue types using two gene expression datasets and one protein-protein interaction database. We then calculated three network indices of topological importance, the degree, closeness, and betweenness centralities, to measure the network position of proteins encoded by house-keeping and tissue-specific genes, and quantified their local connectivity structure. Compared to a random selection of proteins, house-keeping gene-encoded proteins tended to have a greater number of directly interacting neighbors and occupy network positions in several shortest paths of interaction between protein pairs, whereas tissue-specific gene-encoded proteins did not. In addition, house-keeping gene-encoded proteins tended to connect with other house-keeping gene-encoded proteins in all tissue types, whereas tissue-specific gene-encoded proteins also tended to connect with other tissue-specific gene-encoded proteins, but only in approximately half of the tissue types examined. Our analysis showed that house-keeping gene-encoded proteins tend to occupy important network positions, while those encoded by tissue-specific genes do not. The biological implications of our findings were discussed and we proposed a hypothesis regarding how cells organize their protein tools in protein-protein interaction networks. Our results led us to speculate that house-keeping gene-encoded proteins might form a core in human protein-protein interaction networks, while clusters of tissue-specific gene
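    The three network indices named in this abstract (degree, closeness, and betweenness centrality) can be illustrated with a short sketch. It assumes an unweighted, undirected interaction network stored as a dictionary mapping each protein to the set of its interaction partners; the function names and the toy star-shaped network in the usage note are hypothetical and do not come from the study. Betweenness is computed with Brandes' algorithm, which is a standard choice and not necessarily what the authors used.

```python
from collections import deque


def degree_centrality(graph):
    """Number of direct interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """(Reachable nodes) / (sum of shortest-path distances) for each node."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (undirected graph)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        visited_order = []
        predecessors = {node: [] for node in graph}
        path_count = dict.fromkeys(graph, 0)
        path_count[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            visited_order.append(node)
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
                if dist[neighbor] == dist[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)
        dependency = dict.fromkeys(graph, 0.0)
        while visited_order:
            node = visited_order.pop()
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each unordered pair was counted from both endpoints.
    return {node: value / 2 for node, value in score.items()}
```

    On a toy star network such as `{"HUB": {"A", "B", "C"}, "A": {"HUB"}, "B": {"HUB"}, "C": {"HUB"}}`, the hub has the highest value on all three indices, which mirrors the abstract's picture of proteins that have many direct neighbors and lie on many shortest paths.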

  6. Structure-related clustering of gene expression fingerprints of THP-1 cells exposed to smaller polycyclic aromatic hydrocarbons.

    Science.gov (United States)

    Wan, B; Yarbrough, J W; Schultz, T W

    2008-01-01

    This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 microM for 24 hours and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to the ones outside the cluster. One-methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, significance analysis of microarrays (SAM) identified 598 genes modulated by the tested chemicals, with a variety of biological processes, such as cell cycle, metabolism, and protein binding, and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.

  7. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  8. Protein sequences clustering of herpes virus by using Tribe Markov clustering (Tribe-MCL)

    Science.gov (United States)

    Bustamam, A.; Siswantining, T.; Febriyani, N. L.; Novitasari, I. D.; Cahyaningrum, R. D.

    2017-07-01

    The herpes virus is found everywhere, and one of its important characteristics is its ability to cause acute and chronic infection at certain times, which can lead to severe complications. The herpes virus is composed of DNA containing protein and is wrapped by glycoproteins. In this work, the herpes virus family is classified and analyzed by clustering its protein sequences using the Tribe Markov Clustering (Tribe-MCL) algorithm. Tribe-MCL is an efficient clustering method, based on the theory of Markov chains, that classifies protein families from protein sequences using pre-computed sequence similarity information. We implement the Tribe-MCL algorithm in the open-source program R. We select 24 herpes virus protein sequences obtained from the NCBI database. The dataset consists of three types of glycoprotein: B, F, and H. Each type has eight herpes viruses that infect humans. Based on our simulation using different inflation factors (r = 1.5, 2, 3), we find varying numbers of clusters. The greater the inflation factor, the greater the number of clusters. Each protein is grouped together with proteins of the same type.
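    The Markov clustering iteration that Tribe-MCL builds on alternates an expansion step (matrix squaring) with an inflation step (element-wise power followed by column normalization). The sketch below is a minimal pure-Python illustration, assuming a small symmetric similarity matrix supplied as nested lists; the function names, the self-loop weight of 1.0, and the convergence tolerance are assumptions made here and do not reproduce the authors' R implementation.

```python
def normalize_columns(matrix):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def multiply(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iterations=50, tol=1e-9):
    """Cluster nodes from a symmetric similarity matrix with basic MCL.

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(similarity)
    # Self-loops (an assumed weight of 1.0) help the iteration converge.
    flow = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    flow = normalize_columns(flow)
    for _ in range(max_iterations):
        expanded = multiply(flow, flow)                      # expansion
        inflated = normalize_columns(                        # inflation
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[i][j] - flow[i][j])
                     for i in range(n) for j in range(n))
        flow = inflated
        if change < tol:
            break
    # Rows that keep flow are attractors; the columns feeding an attractor
    # row form its cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if flow[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(members) for members in clusters)
```

    In this scheme a larger `inflation` value sharpens the flow more aggressively and generally yields more, smaller clusters, which is consistent with the trend the abstract reports for r = 1.5, 2, 3.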

  9. Fast gene ontology based clustering for microarray experiments.

    Science.gov (United States)

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  10. Hox gene clusters in the Indonesian coelacanth, Latimeria menadoensis

    Science.gov (United States)

    Koh, Esther G. L.; Lam, Kevin; Christoffels, Alan; Erdmann, Mark V.; Brenner, Sydney; Venkatesh, Byrappa

    2003-01-01

    The Hox genes encode transcription factors that play a key role in specifying body plans of metazoans. They are organized into clusters that contain up to 13 paralogue group members. The complex morphology of vertebrates has been attributed to the duplication of Hox clusters during vertebrate evolution. In contrast to the single Hox cluster in the amphioxus (Branchiostoma floridae), an invertebrate-chordate, mammals have four clusters containing 39 Hox genes. Ray-finned fishes (Actinopterygii) such as zebrafish and fugu possess more than four Hox clusters. The coelacanth occupies a basal phylogenetic position among lobe-finned fishes (Sarcopterygii), which gave rise to the tetrapod lineage. The lobe fins of sarcopterygians are considered to be the evolutionary precursors of tetrapod limbs. Thus, the characterization of Hox genes in the coelacanth should provide insights into the origin of tetrapod limbs. We have cloned the complete second exon of 33 Hox genes from the Indonesian coelacanth, Latimeria menadoensis, by extensive PCR survey and genome walking. Phylogenetic analysis shows that 32 of these genes have orthologs in the four mammalian HOX clusters, including three genes (HoxA6, D1, and D8) that are absent in ray-finned fishes. The remaining coelacanth gene is an ortholog of hoxc1 found in zebrafish but absent in mammals. Our results suggest that coelacanths have four Hox clusters bearing a gene complement more similar to mammals than to ray-finned fishes, but with an additional gene, HoxC1, which has been lost during the evolution of mammals from lobe-finned fishes. PMID:12547909

  11. Archaeal Clusters of Orthologous Genes (arCOGs): An Update and Application for Analysis of Shared Features between Thermococcales, Methanococcales, and Methanobacteriales

    OpenAIRE

    Makarova, Kira; Wolf, Yuri; Koonin, Eugene

    2015-01-01

    With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for...

  12. Spectromicroscopy of self-assembled protein clusters

    Energy Technology Data Exchange (ETDEWEB)

    Schonschek, O.; Hormes, J.; Herzog, V. [Univ. of Bonn (Germany)

    1997-04-01

    The aim of this project is to use synchrotron radiation as a tool to study biomedical questions concerned with the thyroid glands. The biological background is outlined in a recent paper. In short, thyroglobulin (TG), the precursor protein of the hormone thyroxine, forms large (20 - 500 microns in diameter) clusters in the extracellular lumen of thyrocytes. The process of cluster formation is still not well understood but is thought to be a main storage mechanism of TG, and therefore thyroxine, inside the thyroid glands. For human thyroids, the interconnections of the proteins inside the clusters are mainly disulfide bonds. Normally, the formation of sulfur bridges is catalyzed by an enzyme called Protein Disulfide Bridge Isomerase (PDI). While this enzyme is thought to be absent from any extracellular space, the cluster formation of TG takes place in the lumen between the thyrocytes. A possible explanation is the autocatalysis of TG.

  13. Identification of the Regulator Gene Responsible for the Acetone-Responsive Expression of the Binuclear Iron Monooxygenase Gene Cluster in Mycobacteria

    Science.gov (United States)

    Furuya, Toshiki; Hirose, Satomi; Semba, Hisashi; Kino, Kuniki

    2011-01-01

    The mimABCD gene cluster encodes the binuclear iron monooxygenase that oxidizes propane and phenol in Mycobacterium smegmatis strain MC2 155 and Mycobacterium goodii strain 12523. Interestingly, expression of the mimABCD gene cluster is induced by acetone. In this study, we investigated the regulator gene responsible for this acetone-responsive expression. In the genome sequence of M. smegmatis strain MC2 155, the mimABCD gene cluster is preceded by a gene designated mimR, which is divergently transcribed. Sequence analysis revealed that MimR exhibits amino acid similarity with the NtrC family of transcriptional activators, including AcxR and AcoR, which are involved in acetone and acetoin metabolism, respectively. Unexpectedly, many homologs of the mimR gene were also found in the sequenced genomes of actinomycetes. A plasmid carrying a transcriptional fusion of the intergenic region between the mimR and mimA genes with a promoterless green fluorescent protein (GFP) gene was constructed and introduced into M. smegmatis strain MC2 155. Using a GFP reporter system, we confirmed by deletion and complementation analyses that the mimR gene product is the positive regulator of the mimABCD gene cluster expression that is responsive to acetone. M. goodii strain 12523 also utilized the same regulatory system as M. smegmatis strain MC2 155. Although transcriptional activators of the NtrC family generally control transcription using the σ54 factor, a gene encoding the σ54 factor was absent from the genome sequence of M. smegmatis strain MC2 155. These results suggest the presence of a novel regulatory system in actinomycetes, including mycobacteria. PMID:21856847

  14. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome, as was once thought, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
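    One way to ask whether the number of positional clusters exceeds what chance would produce is a label-permutation test. The sketch below is illustrative only: it assumes genes are listed in chromosomal order with a True/False flag for tissue specificity, and it defines a cluster as a run of at least two adjacent flagged genes. That definition, the function names, and the parameters `n_permutations` and `seed` are assumptions made here, not the procedure used in the study.

```python
import random


def count_clusters(flags, min_size=2):
    """Count runs of at least `min_size` consecutive flagged genes."""
    clusters = 0
    run_length = 0
    for flagged in flags:
        if flagged:
            run_length += 1
        else:
            if run_length >= min_size:
                clusters += 1
            run_length = 0
    if run_length >= min_size:  # a run may end at the chromosome end
        clusters += 1
    return clusters


def permutation_p_value(flags, n_permutations=1000, seed=0):
    """Compare the observed cluster count with shuffled gene orders.

    Returns (observed_count, p_value), where p_value estimates the chance
    of seeing at least as many clusters when the flags are placed at
    random positions along the chromosome.
    """
    rng = random.Random(seed)
    observed = count_clusters(flags)
    shuffled = list(flags)
    at_least_as_many = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if count_clusters(shuffled) >= observed:
            at_least_as_many += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_many + 1) / (n_permutations + 1)
```

    A real analysis would shuffle within each chromosome and would first collapse tandem duplicates, as the abstract notes, so that duplicated neighbors are not counted as independent clustering.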

  15. Fast Gene Ontology based clustering for microarray experiments

    Directory of Open Access Journals (Sweden)

    Ovaska Kristian

    2008-11-01

    Full Text Available Abstract Background Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. Results We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Conclusion Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  16. Uncovering the functional constraints underlying the genomic organization of the odorant-binding protein genes.

    Science.gov (United States)

    Librado, Pablo; Rozas, Julio

    2013-01-01

    Animal olfactory systems have a critical role for the survival and reproduction of individuals. In insects, the odorant-binding proteins (OBPs) are encoded by a moderately sized gene family, and mediate the first steps of the olfactory processing. Most OBPs are organized in clusters of a few paralogs, which are conserved over time. Currently, the biological mechanism explaining the close physical proximity among OBPs is not yet established. Here, we conducted a comprehensive study aiming to gain insights into the mechanisms underlying the OBP genomic organization. We found that the OBP clusters are embedded within large conserved arrangements. These organizations also include other non-OBP genes, which often encode proteins integral to plasma membrane. Moreover, the conservation degree of such large clusters is related to the following: 1) the promoter architecture of the confined genes, 2) a characteristic transcriptional environment, and 3) the chromatin conformation of the chromosomal region. Our results suggest that chromatin domains may restrict the location of OBP genes to regions having the appropriate transcriptional environment, leading to the OBP cluster structure. However, the appropriate transcriptional environment for OBP and the other neighbor genes is not dominated by reduced levels of expression noise. Indeed, the stochastic fluctuations in the OBP transcript abundance may have a critical role in the combinatorial nature of the olfactory coding process.

  17. Heterologous expression of the Halothiobacillus neapolitanus carboxysomal gene cluster in Corynebacterium glutamicum.

    Science.gov (United States)

    Baumgart, Meike; Huber, Isabel; Abdollahzadeh, Iman; Gensch, Thomas; Frunzke, Julia

    2017-09-20

    Compartmentalization represents a ubiquitous principle used by living organisms to optimize metabolic flux and to avoid detrimental interactions within the cytoplasm. Proteinaceous bacterial microcompartments (BMCs) have therefore created strong interest for the encapsulation of heterologous pathways in microbial model organisms. However, attempts were so far mostly restricted to Escherichia coli. Here, we introduced the carboxysomal gene cluster of Halothiobacillus neapolitanus into the biotechnological platform species Corynebacterium glutamicum. Transmission electron microscopy, fluorescence microscopy and single molecule localization microscopy suggested the formation of BMC-like structures in cells expressing the complete carboxysome operon or only the shell proteins. Purified carboxysomes consisted of the expected protein components as verified by mass spectrometry. Enzymatic assays revealed the functional production of RuBisCO in C. glutamicum both in the presence and absence of carboxysomal shell proteins. Furthermore, we could show that eYFP is targeted to the carboxysomes by fusion to the large RuBisCO subunit. Overall, this study represents the first transfer of an α-carboxysomal gene cluster into a Gram-positive model species supporting the modularity and orthogonality of these microcompartments, but also identified important challenges which need to be addressed on the way towards biotechnological application. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Genome-Wide Identification and Analysis of Genes Encoding PHD-Finger Protein in Tomato

    International Nuclear Information System (INIS)

    Hayat, S.; Cheng, Z.; Chen, X.

    2016-01-01

    The PHD-finger proteins are conserved in eukaryotic organisms and are involved in a variety of important functions in different biological processes in plants. However, the functions of PHD fingers are poorly known in tomato (Solanum lycopersicum L.). In the current study, we identified 45 putative genes encoding PHD-finger proteins in tomato, distributed across 11 chromosomes (all except chromosome 8). Some of the genes encode other conserved key domains besides the PHD finger. Phylogenetic analysis of these 45 proteins resulted in seven clusters. Most PHD-finger proteins were predicted to localize to the PML body. These PHD-finger genes displayed differential expression in various organs, at different developmental stages and under stresses in tomato. Our study provides the first systematic analysis of PHD-finger genes and proteins in tomato. This preliminary study provides useful reference information for PHD-finger proteins in tomato and will be helpful for cloning and functional study of tomato PHD-finger genes. (author)

  19. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  20. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that

  1. Gene duplication, modularity and adaptation in the evolution of the aflatoxin gene cluster

    Directory of Open Access Journals (Sweden)

    Jakobek Judy L

    2007-07-01

    Full Text Available Abstract Background The biosynthesis of aflatoxin (AF) involves over 20 enzymatic reactions in a complex polyketide pathway that converts acetate and malonate to the intermediates sterigmatocystin (ST) and O-methylsterigmatocystin (OMST), the respective penultimate and ultimate precursors of AF. Although these precursors are chemically and structurally very similar, their accumulation differs at the species level for Aspergilli. Notable examples are A. nidulans that synthesizes only ST, A. flavus that makes predominantly AF, and A. parasiticus that generally produces either AF or OMST. Whether these differences are important in the evolutionary/ecological processes of species adaptation and diversification is unknown. Equally unknown are the specific genomic mechanisms responsible for ordering and clustering of genes in the AF pathway of Aspergillus. Results To elucidate the mechanisms that have driven formation of these clusters, we performed systematic searches of aflatoxin cluster homologs across five Aspergillus genomes. We found a high level of gene duplication and identified seven modules consisting of highly correlated gene pairs (aflA/aflB, aflR/aflS, aflX/aflY, aflF/aflE, aflT/aflQ, aflC/aflW, and aflG/aflL). With the exception of A. nomius, contrasts of mean Ka/Ks values across all cluster genes showed significant differences in selective pressure between section Flavi and non-section Flavi species. A. nomius mean Ka/Ks values were more similar to partial clusters in A. fumigatus and A. terreus. Overall, mean Ka/Ks values were significantly higher for section Flavi than for non-section Flavi species. Conclusion Our results implicate several genomic mechanisms in the evolution of ST, OMST and AF cluster genes. Gene modules may arise from duplications of a single gene, whereby the function of the pre-duplication gene is retained in the copy (aflF/aflE) or the copies may partition the ancestral function (aflA/aflB). In some gene modules, the

  2. In-depth comparative analysis of malaria parasite genomes reveals protein-coding genes linked to human disease in Plasmodium falciparum genome.

    Science.gov (United States)

    Liu, Xuewu; Wang, Yuanyuan; Liang, Jiao; Wang, Luojun; Qin, Na; Zhao, Ya; Zhao, Gang

    2018-05-02

    Plasmodium falciparum is the most virulent malaria parasite capable of parasitizing human erythrocytes. The identification of genes related to this capability can enhance our understanding of the molecular mechanisms underlying human malaria and lead to the development of new therapeutic strategies for malaria control. With the availability of several malaria parasite genome sequences, performing computational analysis is now a practical strategy to identify genes contributing to this disease. Here, we developed and used a virtual genome method to assign 33,314 genes from three human malaria parasites, namely, P. falciparum, P. knowlesi and P. vivax, and three rodent malaria parasites, namely, P. berghei, P. chabaudi and P. yoelii, to 4605 clusters. Each cluster consisted of genes whose protein sequences were significantly similar and was considered as a virtual gene. Comparing the enriched values of all clusters in human malaria parasites with those in rodent malaria parasites revealed 115 P. falciparum genes putatively responsible for parasitizing human erythrocytes. These genes are mainly located in the chromosome internal regions and participate in many biological processes, including membrane protein trafficking and thiamine biosynthesis. Meanwhile, 289 P. berghei genes were included in the rodent parasite-enriched clusters. Most are located in subtelomeric regions and encode erythrocyte surface proteins. Comparing cluster values in P. falciparum with those in P. vivax and P. knowlesi revealed 493 candidate genes linked to virulence. Some of them encode proteins present on the erythrocyte surface and participate in cytoadhesion, virulence factor trafficking, or erythrocyte invasion, but many genes with unknown function were also identified. Cerebral malaria is characterized by accumulation of infected erythrocytes at trophozoite stage in brain microvascular. 
To discover cerebral malaria-related genes, fast Fourier transformation (FFT) was introduced to extract

  3. Large clusters of co-expressed genes in the Drosophila genome.

    Science.gov (United States)

    Boutanaev, Alexander M; Kalmykova, Alla I; Shevelyov, Yuri Y; Nurminsky, Dmitry I

    2002-12-12

    Clustering of co-expressed, non-homologous genes on chromosomes implies their co-regulation. In lower eukaryotes, co-expressed genes are often found in pairs. Clustering of genes that share aspects of transcriptional regulation has also been reported in higher eukaryotes. To advance our understanding of the mode of coordinated gene regulation in multicellular organisms, we performed a genome-wide analysis of the chromosomal distribution of co-expressed genes in Drosophila. We identified a total of 1,661 testes-specific genes, one-third of which are clustered on chromosomes. The number of clusters of three or more genes is much higher than expected by chance. We observed a similar trend for genes upregulated in the embryo and in the adult head, although the expression pattern of individual genes cannot be predicted on the basis of chromosomal position alone. Our data suggest that the prevalent mechanism of transcriptional co-regulation in higher eukaryotes operates with extensive chromatin domains that comprise multiple genes.
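    The claim that clusters of three or more genes occur more often than expected by chance can be illustrated with a simple permutation test. The sketch below is only an illustration under stated assumptions, not the authors' procedure: a "cluster" is taken to be a run of at least three consecutive tissue-specific genes along a chromosome, the gene order in `flags` is made-up toy data, and the names `count_clusters` and `permutation_p_value` are hypothetical.

```python
import random

def count_clusters(flags, min_size=3):
    """Count maximal runs of at least min_size consecutive flagged genes."""
    clusters = 0
    run = 0
    for flagged in flags:
        if flagged:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    if run >= min_size:  # a run may end at the last gene
        clusters += 1
    return clusters

def permutation_p_value(flags, n_perm=1000, min_size=3, seed=0):
    """Compare the observed cluster count with counts after shuffling gene order."""
    rng = random.Random(seed)
    observed = count_clusters(flags, min_size)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_clusters(shuffled, min_size) >= observed:
            hits += 1
    # +1 smoothing keeps the estimate away from an impossible p-value of zero
    return observed, (hits + 1) / (n_perm + 1)

# Toy chromosome (assumed data): 1 = tissue-specific gene, 0 = any other gene
flags = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0] + [0] * 40
observed, p_value = permutation_p_value(flags)
print(observed, p_value)
```

    A small p-value here only says that the toy arrangement is unlikely under random gene order; a genome-wide analysis would also need to handle chromosome boundaries and gene density.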

  4. Differential Retention of Gene Functions in a Secondary Metabolite Cluster.

    Science.gov (United States)

    Reynolds, Hannah T; Slot, Jason C; Divon, Hege H; Lysøe, Erik; Proctor, Robert H; Brown, Daren W

    2017-08-01

    In fungi, distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers an ability to synthesize the SM depudecin, a histone deacetylase inhibitor that contributes weakly to virulence. The DEP cluster includes genes encoding enzymes, a transporter, and a transcription regulator. We investigated the distribution and evolution of the DEP cluster in 585 fungal genomes and found a wide but sporadic distribution among Dothideomycetes, Sordariomycetes, and Eurotiomycetes. We confirmed DEP gene expression and depudecin production in one fungus, Fusarium langsethiae. Phylogenetic analyses suggested 6-10 horizontal gene transfers (HGTs) of the cluster, including a transfer that led to the presence of closely related cluster homologs in Alternaria and Fusarium. The analyses also indicated that HGTs were frequently followed by loss/pseudogenization of one or more DEP genes. Independent cluster inactivation was inferred in at least four fungal classes. Analyses of transitions among functional, pseudogenized, and absent states of DEP genes among Fusarium species suggest enzyme-encoding genes are lost at higher rates than the transporter (DEP3) and regulatory (DEP6) genes. The phenotype of an experimentally-induced DEP3 mutant of Fusarium did not support the hypothesis that selective retention of DEP3 and DEP6 protects fungi from exogenous depudecin. Together, the results suggest that HGT and gene loss have contributed significantly to DEP cluster distribution, and that some DEP genes provide a greater fitness benefit possibly due to a differential tendency to form network connections. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

  5. Functional clustering of time series gene expression data by Granger causality

    Science.gov (United States)

    2012-01-01

    Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425

  6. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Directory of Open Access Journals (Sweden)

    Zhimin Dai

    Full Text Available Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  7. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Science.gov (United States)

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417

  9. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    Science.gov (United States)

    2013-01-01

    Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

  10. A scale invariant clustering of genes on human chromosome 7

    Directory of Open Access Journals (Sweden)

    Kendal Wayne S

    2004-01-01

    Full Text Available Abstract Background Vertebrate genes often appear to cluster within the background of nontranscribed genomic DNA. Here an analysis of the physical distribution of gene structures on human chromosome 7 was performed to confirm the presence of clustering, and to elucidate possible underlying statistical and biological mechanisms. Results Clustering of genes was confirmed by virtue of a variance of the number of genes per unit physical length that exceeded the respective mean. Further evidence for clustering came from a power function relationship between the variance and mean that possessed an exponent of 1.51. This power function implied that the spatial distribution of genes on chromosome 7 was scale invariant, and that the underlying statistical distribution had a Poisson-gamma (PG) form. A PG distribution for the spatial scattering of genes was validated by stringent comparisons of both the predicted variance to mean power function and its cumulative distribution function to data derived from chromosome 7. Conclusion The PG distribution was consistent with at least two different biological models: In the microrearrangement model, the number of genes per unit length of chromosome represented the contribution of a random number of smaller chromosomal segments that had originated by random breakage and reconstruction of more primitive chromosomes. Each of these smaller segments would have necessarily contained (on average) a gamma distributed number of genes. In the gene cluster model, genes would be scattered randomly to begin with. Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth death and immigration process to yield a PG distribution. On the basis of the gene position data alone it was not possible to identify the biological model which best explained the observed clustering. However, the underlying PG statistical model implicated neutral
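    The variance-to-mean power function described in this abstract has the form variance = a * mean^b, so the exponent b is the slope of a straight line on log-log axes. The following sketch shows one common way to estimate it, by ordinary least squares on the logarithms. It is an assumption-laden illustration rather than the author's method: the function name `fit_power_law` is hypothetical, and the input values are synthetic numbers generated with an exponent of 1.5, not data from chromosome 7.

```python
import math

def fit_power_law(means, variances):
    """Fit variance = a * mean**b by least squares on log-log axes.

    Returns (b, a): the exponent and the prefactor.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(y_bar - exponent * x_bar)
    return exponent, prefactor

# Synthetic example (assumed values): mean gene counts per window at several
# window sizes, with variances constructed to follow an exact power law.
means = [1.0, 2.0, 4.0, 8.0, 16.0]
variances = [2.0 * m ** 1.5 for m in means]
exponent, prefactor = fit_power_law(means, variances)
print(round(exponent, 3), round(prefactor, 3))
```

    With real data the means and variances would come from counting genes in windows of different sizes, and an exponent between 1 and 2 would be the signature the abstract associates with a Poisson-gamma form.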

  11. Contribution of the Pmra Promoter to Expression of Genes in the Escherichia coli mra Cluster of Cell Envelope Biosynthesis and Cell Division Genes

    Science.gov (United States)

    Mengin-Lecreulx, Dominique; Ayala, Juan; Bouhss, Ahmed; van Heijenoort, Jean; Parquet, Claudine; Hara, Hiroshi

    1998-01-01

    Recently, a promoter for the essential gene ftsI, which encodes penicillin-binding protein 3 of Escherichia coli, was precisely localized 1.9 kb upstream from this gene, at the beginning of the mra cluster of cell division and cell envelope biosynthesis genes (H. Hara, S. Yasuda, K. Horiuchi, and J. T. Park, J. Bacteriol. 179:5802–5811, 1997). Disruption of this promoter (Pmra) on the chromosome and its replacement by the lac promoter (Pmra::Plac) led to isopropyl-β-d-thiogalactopyranoside (IPTG)-dependent cells that lysed in the absence of inducer, a defect which was complemented only when the whole region from Pmra to ftsW, the fifth gene downstream from ftsI, was provided in trans on a plasmid. In the present work, the levels of various proteins involved in peptidoglycan synthesis and cell division were precisely determined in cells in which Pmra::Plac promoter expression was repressed or fully induced. It was confirmed that the Pmra promoter is required for expression of the first nine genes of the mra cluster: mraZ (orfC), mraW (orfB), ftsL (mraR), ftsI, murE, murF, mraY, murD, and ftsW. Interestingly, three- to sixfold-decreased levels of MurG and MurC enzymes were observed in uninduced Pmra::Plac cells. This was correlated with an accumulation of the nucleotide precursors UDP–N-acetylglucosamine and UDP–N-acetylmuramic acid, substrates of these enzymes, and with a depletion of the pool of UDP–N-acetylmuramyl pentapeptide, resulting in decreased cell wall peptidoglycan synthesis. Moreover, the expression of ftsZ, the penultimate gene from this cluster, was significantly reduced when Pmra expression was repressed. It was concluded that the transcription of the genes located downstream from ftsW in the mra cluster, from murG to ftsZ, is also mainly (but not exclusively) dependent on the Pmra promoter. PMID:9721276

  12. Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering

    Directory of Open Access Journals (Sweden)

    Markatou Marianthi

    2011-01-01

    Full Text Available Abstract Background The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation, but the signaling mechanisms between irradiated and non-irradiated bystander cells are not fully understood. In this study, we measured a time-series of gene expression after α-particle irradiation and applied the Feature Based Partitioning around medoids Algorithm (FBPA), a new clustering method suitable for sparse time series, to identify signaling modules that act in concert in the response to direct irradiation and bystander signaling. We compared our results with those of an alternate clustering method, Short Time series Expression Miner (STEM). Results While computational evaluations of both clustering results were similar, FBPA provided more biological insight. After irradiation, gene clusters were enriched for signal transduction, cell cycle/cell death and inflammation/immunity processes; but only FBPA separated clusters by function. In bystanders, gene clusters were enriched for cell communication/motility, signal transduction and inflammation processes; but biological functions did not separate as clearly with either clustering method as they did in irradiated samples. Network analysis confirmed p53 and NF-κB transcription factor-regulated gene clusters in irradiated and bystander cells and suggested novel regulators, such as KDM5B/JARID1B (lysine (K)-specific demethylase 5B) and HDACs (histone deacetylases), which could epigenetically coordinate gene expression after irradiation. Conclusions In this study, we have shown that a new time series clustering method, FBPA, can provide new leads to the mechanisms regulating the dynamic cellular response to radiation. The findings implicate epigenetic control of gene expression in addition to transcription factor networks.

  13. A robust approach based on Weibull distribution for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Gong Binsheng

    2011-05-01

    Full Text Available Abstract Background Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on the distances, while few methods group genes according to the similarities of the distributions of the gene expression levels. Furthermore, as the biological annotation resources accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest. Results In this paper, we proposed the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data, in which the gene expressions of individual genes are considered as the random variables following unique Weibull distributions. Our WDCM is based on the concept that the genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via the Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from the lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM) using functional annotation information given by the Gene Ontology (GO). The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also utilized the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides better clustering performance than the k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets. Conclusions The results demonstrate that our WDCM produces clusters

  14. Heterologous expression of pikromycin biosynthetic gene cluster using Streptomyces artificial chromosome system.

    Science.gov (United States)

    Pyeon, Hye-Rim; Nah, Hee-Ju; Kang, Seung-Hoon; Choi, Si-Sun; Kim, Eung-Soo

    2017-05-31

    Heterologous expression of biosynthetic gene clusters of natural microbial products has become an essential strategy for titer improvement and pathway engineering of various potentially-valuable natural products. A Streptomyces artificial chromosomal conjugation vector, pSBAC, was previously successfully applied for precise cloning and tandem integration of a large polyketide tautomycetin (TMC) biosynthetic gene cluster (Nah et al. in Microb Cell Fact 14(1):1, 2015), implying that this strategy could be employed to develop a custom overexpression scheme of natural product pathway clusters present in actinomycetes. To validate the pSBAC system as a generally-applicable heterologous overexpression system for a large-sized polyketide biosynthetic gene cluster in Streptomyces, another model polyketide compound, the pikromycin biosynthetic gene cluster, was precisely cloned and heterologously expressed using the pSBAC system. A unique HindIII restriction site was precisely inserted at one of the border regions of the pikromycin biosynthetic gene cluster within the chromosome of Streptomyces venezuelae, followed by site-specific recombination of pSBAC into the flanking region of the pikromycin gene cluster. Unlike the previous cloning process, one HindIII site integration step was skipped through pSBAC modification. pPik001, a pSBAC containing the pikromycin biosynthetic gene cluster, was directly introduced into two heterologous hosts, Streptomyces lividans and Streptomyces coelicolor, resulting in the production of 10-deoxymethynolide, a major pikromycin derivative. When two entire pikromycin biosynthetic gene clusters were tandemly introduced into the S. lividans chromosome, overproduction of 10-deoxymethynolide and the presence of pikromycin, which was previously not detected, were both confirmed. Moreover, comparative qRT-PCR results confirmed that the transcription of pikromycin biosynthetic genes was significantly upregulated in S. lividans containing tandem

  15. Clustering approaches to identifying gene expression patterns from DNA microarray data.

    Science.gov (United States)

    Do, Jin Hwan; Choi, Dong-Kug

    2008-04-30

    The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and metrics for dissimilarity measures used, as well as on user-selectable parameters such as desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.

  16. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore constitutes a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E [...] Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
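    The comparison of neighboring-gene correlations against randomly selected gene pairs can be sketched as below. This is a simplified illustration under assumptions, not the published pipeline: it computes an empirical p-value per adjacent pair and omits the FDR control the abstract mentions, the expression profiles are randomly generated toy data with one artificially co-expressed pair, and all function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def neighbor_correlations(profiles):
    """Correlation of each gene with the next gene in chromosomal order."""
    return [pearson(profiles[i], profiles[i + 1]) for i in range(len(profiles) - 1)]

def random_pair_correlations(profiles, n_pairs, rng):
    """Null distribution: correlations of randomly chosen pairs of distinct genes."""
    null = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(profiles)), 2)
        null.append(pearson(profiles[i], profiles[j]))
    return null

def empirical_p(r, null):
    """Share of null correlations at least as large as r, with +1 smoothing."""
    return (1 + sum(1 for x in null if x >= r)) / (1 + len(null))

rng = random.Random(0)
# Toy data (assumed): 30 genes in chromosomal order, 20 conditions each
profiles = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(30)]
profiles[11] = [2 * v + 1 for v in profiles[10]]  # force genes 10 and 11 to co-express

adjacent = neighbor_correlations(profiles)
null = random_pair_correlations(profiles, 500, rng)
p_values = [empirical_p(r, null) for r in adjacent]
print(round(adjacent[10], 3), p_values[10])
```

    Adjacent pairs with small p-values are candidates for extension into longer runs; applying a multiple-testing correction across all pairs would be the step that brings this closer to the FDR-based procedure described above.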

  17. Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

    Science.gov (United States)

    Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

    2015-01-01

    In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
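    The random walk with restart (RWR) algorithm named in this abstract repeatedly spreads probability from seed genes to their network neighbors while returning a fixed fraction to the seeds at each step. A minimal sketch follows, using the standard update p <- (1 - r) * W * p + r * p0 with a column-normalized adjacency matrix W. It is not the authors' implementation: the four-gene network, the gene names, the restart probability of 0.3, and the function name `random_walk_with_restart` are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj:   dict mapping each node to the set of its neighbours (undirected).
    seeds: nodes already associated with the disease; the walk restarts there.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if not adj[u]:
                continue  # isolated node: nothing to pass on
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical toy network; a real analysis would use PPI or co-regulation edges
toy_network = {
    "geneA": {"geneB", "geneC"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneA", "geneB", "geneD"},
    "geneD": {"geneC"},
}
scores = random_walk_with_restart(toy_network, seeds={"geneA"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

    Genes are then prioritized by score. Integrating two networks, as the abstract describes, would amount to building `adj` from the union of their edges or weighting edges by source, which this sketch does not attempt.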

  18. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2007-11-01

    Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. Results: New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile

  19. CAR gene cluster and transcript levels of carotenogenic genes in Rhodotorula mucilaginosa.

    Science.gov (United States)

    Landolfo, Sara; Ianiri, Giuseppe; Camiolo, Salvatore; Porceddu, Andrea; Mulas, Giuliana; Chessa, Rossella; Zara, Giacomo; Mannazzu, Ilaria

    2018-01-01

    A molecular approach was applied to the study of the carotenoid biosynthetic pathway of Rhodotorula mucilaginosa. First, functional annotation of the genome of R. mucilaginosa C2.5t1 was carried out and gene ontology categories were assigned to 4033 predicted proteins. Then, a set of genes involved in different steps of carotenogenesis was identified and those coding for phytoene desaturase, phytoene synthase/lycopene cyclase and carotenoid dioxygenase (CAR genes) proved to be clustered within a region of ~10 kb. Quantitative PCR of the genes involved in carotenoid biosynthesis showed that genes coding for 3-hydroxy-3-methylglutaryl-CoA reductase and mevalonate kinase are induced during exponential phase while no clear trend of induction was observed for phytoene synthase/lycopene cyclase and phytoene dehydrogenase encoding genes. Thus, in R. mucilaginosa the induction of genes involved in the early steps of carotenoid biosynthesis is transient and accompanies the onset of carotenoid production, while that of CAR genes does not correlate with the amount of carotenoids produced. The transcript levels of genes coding for carotenoid dioxygenase, superoxide dismutase and catalase A increased during the accumulation of carotenoids, thus suggesting the activation of a mechanism aimed at the protection of cell structures from oxidative stress during carotenoid biosynthesis. The data presented herein, besides being suitable for the elucidation of the mechanisms that underlie carotenoid biosynthesis, will contribute to boosting the biotechnological potential of this yeast by improving the outcome of further research efforts aimed at also exploring other features of interest.

  20. The Serratia gene cluster encoding biosynthesis of the red antibiotic, prodigiosin, shows species- and strain-dependent genome context variation

    DEFF Research Database (Denmark)

    Harris, Abigail K P; Williamson, Neil R; Slater, Holly

    2004-01-01

    The prodigiosin biosynthesis gene cluster (pig cluster) from two strains of Serratia (S. marcescens ATCC 274 and Serratia sp. ATCC 39006) has been cloned, sequenced and expressed in heterologous hosts. Sequence analysis of the respective pig clusters revealed 14 ORFs in S. marcescens ATCC 274...... and 15 ORFs in Serratia sp. ATCC 39006. In each Serratia species, predicted gene products showed similarity to polyketide synthases (PKSs), non-ribosomal peptide synthases (NRPSs) and the Red proteins of Streptomyces coelicolor A3(2). Comparisons between the two Serratia pig clusters and the red cluster...... from Str. coelicolor A3(2) revealed some important differences. A modified scheme for the biosynthesis of prodigiosin, based on the pathway recently suggested for the synthesis of undecylprodigiosin, is proposed. The distribution of the pig cluster within several Serratia sp. isolates is demonstrated...

  1. Histone and Ribosomal RNA Repetitive Gene Clusters of the Boll Weevil are Linked in a Tandem Array

    Science.gov (United States)

    Histones are the major protein component of chromatin structure. The histone family is made up of a quintet of proteins, four core histones (H2A, H2B, H3 & H4) and the linker histones (H1). Spacers are found between the coding regions. Among insects this quintet of genes is usually clustered and ...

  2. The medaka novel immune-type receptor (NITR) gene clusters reveal an extraordinary degree of divergence in variable domains

    Directory of Open Access Journals (Sweden)

    Litman Gary W

    2008-06-01

    Background: Novel immune-type receptor (NITR) genes are members of diversified multigene families that are found in bony fish and encode type I transmembrane proteins containing one or two extracellular immunoglobulin (Ig) domains. The majority of NITRs can be classified as inhibitory receptors that possess cytoplasmic immunoreceptor tyrosine-based inhibition motifs (ITIMs). A much smaller number of NITRs can be classified as activating receptors by the lack of cytoplasmic ITIMs and presence of a positively charged residue within their transmembrane domain, which permits partnering with an activating adaptor protein. Results: Forty-four NITR genes in medaka (Oryzias latipes) are located in three gene clusters on chromosomes 10, 18 and 21 and can be organized into 24 families including inhibitory and activating forms. The particularly large dataset acquired in medaka makes direct comparison possible to another complete dataset acquired in zebrafish in which NITRs are localized in two clusters on different chromosomes. The two largest medaka NITR gene clusters share conserved synteny with the two zebrafish NITR gene clusters. Shared synteny between NITRs and CD8A/CD8B is limited but consistent with a potential common ancestry. Conclusion: Comprehensive phylogenetic analyses between the complete datasets of NITRs from medaka and zebrafish indicate multiple species-specific expansions of different families of NITRs. The patterns of sequence variation among gene family members are consistent with recent birth-and-death events. Similar effects have been observed with mammalian immunoglobulin (Ig), T cell antigen receptor (TCR) and killer cell immunoglobulin-like receptor (KIR) genes. NITRs likely diverged along an independent pathway from that of the somatically rearranging antigen binding receptors but have undergone parallel evolution of V family diversity.

  3. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

    Directory of Open Access Journals (Sweden)

    Olszewski Kellen L

    2007-07-01

    Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the
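
    The mutual-nearest-neighbor requirement described above can be sketched in a few lines. This covers only the construction of the interaction network, not the overlapping-clique step of NNN; the Euclidean distance, the value of k, the function name and the toy gene names are assumptions made for illustration.

```python
import math


def mutual_knn_graph(profiles, k=2, dist=math.dist):
    """profiles: dict mapping gene name -> expression vector.

    Returns a dict gene -> set of genes that are among its k nearest
    neighbours AND count it among their own k nearest neighbours.
    """
    genes = list(profiles)
    knn = {}
    for g in genes:
        others = sorted(
            (h for h in genes if h != g),
            key=lambda h: dist(profiles[g], profiles[h]),
        )
        knn[g] = set(others[:k])
    # Keep an edge only when the neighbour relation holds in both directions.
    return {g: {h for h in knn[g] if g in knn[h]} for g in genes}


# Three similar genes and one outlier with no sufficiently similar partner.
toy = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "z": (10, 10)}
graph = mutual_knn_graph(toy, k=2)
```

    In the toy data the outlier `z` ends up with no edges, which is how genes without close partners can remain unclustered.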

  4. Structural studies of the Enterococcus faecalis SufU [Fe-S] cluster protein

    Directory of Open Access Journals (Sweden)

    Frazzon Jeverson

    2009-02-01

    Background: Iron-sulfur clusters are ubiquitous and evolutionarily ancient inorganic prosthetic groups, the biosynthesis of which depends on complex protein machineries. Three distinct assembly systems involved in the maturation of cellular Fe-S proteins have been determined, designated the NIF, ISC and SUF systems. Although well described in several organisms, these machineries are poorly understood in Gram-positive bacteria. Within the Firmicutes phylum, Enterococcus spp. have recently assumed importance in clinical microbiology, being considered emerging pathogens for humans; among them, Enterococcus faecalis represents the major species associated with nosocomial infections. The aim of this study was to carry out a phylogenetic analysis in Enterococcus faecalis V583 and a structural and conformational characterisation of its SufU protein. Results: BLAST searches of the Enterococcus genome revealed a series of genes with sequence similarity to the Escherichia coli SUF machinery of [Fe-S] cluster biosynthesis, namely sufB, sufC, sufD and sufS. In addition, the E. coli IscU ortholog SufU was found to be the scaffold protein of Enterococcus spp., containing all features considered essential for its biological activity, including conserved amino acid residues involved in substrate and/or co-factor binding (Cys50,76,138 and Asp52), and phylogenetic analyses showed a close relationship with orthologues from other Gram-positive bacteria. Molecular dynamics for structural determinations and molecular modeling using E. faecalis SufU primary sequence protein over the PDB:1su0 crystallographic model from Streptococcus pyogenes were carried out with a subsequent 50 ns molecular dynamics trajectory. This presented a stable model, showing secondary structure modifications near the active site and conserved cysteine residues. Molecular modeling using Haemophilus influenzae IscU primary sequence over the PDB:1su0 crystal followed by a MD

  5. Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

    Energy Technology Data Exchange (ETDEWEB)

    Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew; Rowen, Lee; Nesbitt, Ryan; Bloom, Scott; Rast, Jonathan P.; Berney, Kevin; Arenas-Mena, Cesar; Martinez, Pedro; Davidson, Eric H.; Peterson, Kevin J.; Hood, Leroy

    2005-05-10

    The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is: 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.

  6. Identification of a novel prophage-like gene cluster actively expressed in both virulent and avirulent strains of Leptospira interrogans serovar Lai.

    Science.gov (United States)

    Qin, Jin-Hong; Zhang, Qing; Zhang, Zhi-Ming; Zhong, Yi; Yang, Yang; Hu, Bao-Yu; Zhao, Guo-Ping; Guo, Xiao-Kui

    2008-06-01

    DNA microarray analysis was used to compare the differential gene expression profiles between Leptospira interrogans serovar Lai type strain 56601 and its corresponding attenuated strain IPAV. A 22-kb genomic island covering a cluster of 34 genes (i.e., genes LA0186 to LA0219) was actively expressed in both strains but concomitantly upregulated in strain 56601 in contrast to that of IPAV. Reverse transcription-PCR assays proved that the gene cluster comprised five transcripts. Gene annotation of this cluster revealed characteristics of a putative prophage-like remnant with at least 8 of 34 sequences encoding prophage-like proteins, of which the LA0195 protein is probably a prophage CI-like regulator. The transcription initiation activities of putative promoter-regulatory sequences of transcripts I, II, and III, all proximal to the LA0195 gene, were further analyzed in the Escherichia coli promoter probe vector pKK232-8 by assaying the reporter chloramphenicol acetyltransferase (CAT) activities. The strong promoter activities of both transcripts I and II indicated by the E. coli CAT assay were well correlated with the in vitro sequence-specific binding of the recombinant LA0195 protein to the corresponding promoter probes detected by the electrophoresis mobility shift assay. On the other hand, the promoter activity of transcript III was very low in E. coli and failed to show active binding to the LA0195 protein in vitro. These results suggested that the LA0195 protein is likely involved in the transcription of transcripts I and II. However, the identical complete DNA sequences of this prophage remnant from these two strains strongly suggest that possible regulatory factors or signal transduction systems residing outside of this region within the genome may be responsible for the differential expression profiling in these two strains.

  7. Gene expression patterns of oxidative phosphorylation complex I subunits are organized in clusters.

    Directory of Open Access Journals (Sweden)

    Yael Garbian

    After the radiation of eukaryotes, the NUO operon, controlling the transcription of the NADH dehydrogenase complex of the oxidative phosphorylation system (OXPHOS complex I), was broken down and genes encoding this protein complex were dispersed across the nuclear genome. Seven genes, however, were retained in the genome of the mitochondrion, the ancient symbiote of eukaryotes. This division, in combination with the three-fold increase in subunit number from bacteria (N = approximately 14) to man (N = 45), renders the transcription regulation of OXPHOS complex I a challenge. Recently bioinformatics analysis of the promoter regions of all OXPHOS genes in mammals supported patterns of co-regulation, suggesting that natural selection favored a mechanism facilitating the transcriptional regulatory control of genes encoding subunits of these large protein complexes. Here, using real time PCR of mitochondrial DNA (mtDNA)- and nuclear DNA (nDNA)-encoded transcripts in a panel of 13 different human tissues, we show that the expression pattern of OXPHOS complex I genes is regulated in several clusters. Firstly, all mtDNA-encoded complex I subunits (N = 7) share a similar expression pattern, distinct from all tested nDNA-encoded subunits (N = 10). Secondly, two sub-clusters of nDNA-encoded transcripts with significantly different expression patterns were observed. Thirdly, the expression patterns of two nDNA-encoded genes, NDUFA4 and NDUFA5, notably diverged from the rest of the nDNA-encoded subunits, suggesting a certain degree of tissue specificity. Finally, the expression pattern of the mtDNA-encoded ND4L gene diverged from the rest of the tested mtDNA-encoded transcripts that are regulated by the same promoter, consistent with post-transcriptional regulation. These findings suggest, for the first time, that the regulation of complex I subunit expression in humans is complex rather than reflecting global co-regulation.

  8. A Metabolic Gene Cluster in the Wheat W1 and the Barley Cer-cqu Loci Determines β-Diketone Biosynthesis and Glaucousness.

    Science.gov (United States)

    Hen-Avivi, Shelly; Savin, Orna; Racovita, Radu C; Lee, Wing-Sham; Adamski, Nikolai M; Malitsky, Sergey; Almekias-Siegl, Efrat; Levy, Matan; Vautrin, Sonia; Bergès, Hélène; Friedlander, Gilgi; Kartvelishvily, Elena; Ben-Zvi, Gil; Alkan, Noam; Uauy, Cristobal; Kanyuka, Kostya; Jetter, Reinhard; Distelfeld, Assaf; Aharoni, Asaph

    2016-06-01

    The glaucous appearance of wheat (Triticum aestivum) and barley (Hordeum vulgare) plants, that is the light bluish-gray look of flag leaf, stem, and spike surfaces, results from deposition of cuticular β-diketone wax on their surfaces; this phenotype is associated with high yield, especially under drought conditions. Despite extensive genetic and biochemical characterization, the molecular genetic basis underlying the biosynthesis of β-diketones remains unclear. Here, we discovered that the wheat W1 locus contains a metabolic gene cluster mediating β-diketone biosynthesis. The cluster comprises genes encoding proteins of several families including type-III polyketide synthases, hydrolases, and cytochrome P450s related to known fatty acid hydroxylases. The cluster region was identified in both genetic and physical maps of glaucous and glossy tetraploid wheat, demonstrating entirely different haplotypes in these accessions. Complementary evidence obtained through gene silencing in planta and heterologous expression in bacteria supports a model for a β-diketone biosynthesis pathway involving members of these three protein families. Mutations in homologous genes were identified in the barley eceriferum mutants defective in β-diketone biosynthesis, demonstrating a gene cluster also in the β-diketone biosynthesis Cer-cqu locus in barley. Hence, our findings open new opportunities to breed major cereal crops for surface features that impact yield and stress response. © 2016 American Society of Plant Biologists. All rights reserved.

  9. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Background: Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results: We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) are used to remove genes based on their individual discriminant weights. Conclusion: SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together
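
    The elimination loop described above can be sketched independently of the classifier. In this sketch the gene clusters are assumed to be given already (K-means is not re-implemented), and `score_cluster` is a hypothetical pluggable callable standing in for the SVM-based cluster score; the `drop_fraction` and `keep` parameters and the toy weights are likewise assumptions, not settings from the paper.

```python
def recursive_cluster_elimination(clusters, score_cluster, keep=2, drop_fraction=0.3):
    """clusters: dict cluster_id -> list of gene names.
    score_cluster: callable(list of genes) -> float, higher means the
        cluster contributes more to classification (stand-in for an SVM score).

    Repeatedly removes the lowest-scoring clusters until `keep` remain.
    Returns (surviving clusters, list of surviving ids after each round).
    """
    surviving = dict(clusters)
    history = [sorted(surviving)]
    while len(surviving) > keep:
        ranked = sorted(surviving, key=lambda c: score_cluster(surviving[c]))
        n_drop = max(1, int(len(surviving) * drop_fraction))
        n_drop = min(n_drop, len(surviving) - keep)  # never go below `keep`
        for c in ranked[:n_drop]:
            del surviving[c]
        history.append(sorted(surviving))
    return surviving, history


# Toy example: per-gene weights play the role of a classifier-derived score.
weights = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.5, "g5": 0.6, "g6": 0.05}
toy_clusters = {"c1": ["g1", "g2"], "c2": ["g3"], "c3": ["g4", "g5"], "c4": ["g6"]}
kept, rounds = recursive_cluster_elimination(
    toy_clusters, lambda genes: sum(weights[g] for g in genes)
)
```

    Scoring whole clusters rather than single genes is the point of contrast with recursive feature elimination, where each gene would be ranked and removed on its own weight.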

  10. Hox gene cluster of the ascidian, Halocynthia roretzi, reveals multiple ancient steps of cluster disintegration during ascidian evolution.

    Science.gov (United States)

    Sekigami, Yuka; Kobayashi, Takuya; Omi, Ai; Nishitsuji, Koki; Ikuta, Tetsuro; Fujiyama, Asao; Satoh, Noriyuki; Saiga, Hidetoshi

    2017-01-01

    Hox gene clusters with at least 13 paralog group (PG) members are common in vertebrate genomes and in that of amphioxus. Ascidians, which belong to the subphylum Tunicata (Urochordata), are phylogenetically positioned between vertebrates and amphioxus, and traditionally divided into two groups: the Pleurogona and the Enterogona. An enterogonan ascidian, Ciona intestinalis (Ci), possesses nine Hox genes localized on two chromosomes; thus, the Hox gene cluster is disintegrated. We investigated the Hox gene cluster of a pleurogonan ascidian, Halocynthia roretzi (Hr), to determine whether Hox gene cluster disintegration is common among ascidians, and if so, how such disintegration occurred during ascidian or tunicate evolution. Our phylogenetic analysis reveals that the Hr Hox gene complement comprises nine members, including one with a relatively divergent Hox homeodomain sequence. Eight of nine Hr Hox genes were orthologous to Ci-Hox1, 2, 3, 4, 5, 10, 12 and 13. Following the phylogenetic classification into 13 PGs, we designated Hr Hox genes as Hox1, 2, 3, 4, 5, 10, 11/12/13.a, 11/12/13.b and HoxX. To address the chromosomal arrangement of the nine Hox genes, we performed two-color chromosomal fluorescent in situ hybridization, which revealed that the nine Hox genes are localized on a single chromosome in Hr, distinct from their arrangement in Ci. We further examined the order of the nine Hox genes on the chromosome by chromosome/scaffold walking. This analysis suggested a gene order of Hox1, 11/12/13.b, 11/12/13.a, 10, 5, X, followed by either Hox4, 3, 2 or Hox2, 3, 4 on the chromosome. Based on the present results and those previously reported in Ci, we discuss the establishment of the Hox gene complement and disintegration of Hox gene clusters during the course of ascidian or tunicate evolution. The Hox gene cluster and the genome must have experienced extensive reorganization during the course of evolution from the ancestral tunicate to Hr and Ci

  11. Calcitonin gene-related peptide antagonism and cluster headache

    DEFF Research Database (Denmark)

    Ashina, Håkan; Newman, Lawrence; Ashina, Sait

    2017-01-01

    Calcitonin gene-related peptide (CGRP) is a key signaling molecule involved in migraine pathophysiology. Efficacy of CGRP monoclonal antibodies and antagonists in migraine treatment has fueled an increasing interest in the prospect of treating cluster headache (CH) with CGRP antagonism. The exact...... role of CGRP and its mechanism of action in CH have not been fully clarified. A search for original studies and randomized controlled trials (RCTs) published in English was performed in PubMed and in ClinicalTrials.gov . The search term used was "cluster headache and calcitonin gene related peptide......" and "primary headaches and calcitonin gene related peptide." Reference lists of identified articles were also searched for additional relevant papers. Human experimental studies have reported elevated plasma CGRP levels during both spontaneous and glyceryl trinitrate-induced cluster attacks. CGRP may play...

  12. Characterization of an M-Cluster-Substituted Nitrogenase VFe Protein.

    Science.gov (United States)

    Rebelein, Johannes G; Lee, Chi Chung; Newcomb, Megan; Hu, Yilin; Ribbe, Markus W

    2018-03-13

    The Mo- and V-nitrogenases are two homologous members of the nitrogenase family that are distinguished mainly by the presence of different heterometals (Mo or V) at their respective cofactor sites (M- or V-cluster). However, the V-nitrogenase is ~600-fold more active than its Mo counterpart in reducing CO to hydrocarbons at ambient conditions. Here, we expressed an M-cluster-containing, hybrid V-nitrogenase in Azotobacter vinelandii and compared it to its native, V-cluster-containing counterpart in order to assess the impact of protein scaffold and cofactor species on the differential reactivities of Mo- and V-nitrogenases toward CO. Housed in the VFe protein component of V-nitrogenase, the M-cluster displayed electron paramagnetic resonance (EPR) features similar to those of the V-cluster and demonstrated an ~100-fold increase in hydrocarbon formation activity from CO reduction, suggesting a significant impact of protein environment on the overall CO-reducing activity of nitrogenase. On the other hand, the M-cluster was still ~6-fold less active than the V-cluster in the same protein scaffold, and it retained its inability to form detectable amounts of methane from CO reduction, illustrating a fine-tuning effect of the cofactor properties on this nitrogenase-catalyzed reaction. Together, these results provided important insights into the two major determinants for the enzymatic activity of CO reduction while establishing a useful framework for further elucidation of the essential catalytic elements for the CO reactivity of nitrogenase. IMPORTANCE This is the first report on the in vivo generation and in vitro characterization of an M-cluster-containing V-nitrogenase hybrid. The "normalization" of the protein scaffold to that of the V-nitrogenase permits a direct comparison between the cofactor species of the Mo- and V-nitrogenases (M- and V-clusters) in CO reduction, whereas the discrepancy between the protein scaffolds of the Mo- and V-nitrogenases (MoFe and VFe

  13. Plasmid Complement of Lactococcus lactis NCDO712 Reveals a Novel Pilus Gene Cluster.

    Science.gov (United States)

    Tarazanova, Mariya; Beerthuyzen, Marke; Siezen, Roland; Fernandez-Gutierrez, Marcela M; de Jong, Anne; van der Meulen, Sjoerd; Kok, Jan; Bachmann, Herwig

    2016-01-01

    Lactococcus lactis MG1363 is an important gram-positive model organism. It is a plasmid-free and phage-cured derivative of strain NCDO712. Plasmid-cured strains facilitate studies on molecular biological aspects, but many properties which make L. lactis an important organism in the dairy industry are plasmid encoded. We sequenced the total DNA of strain NCDO712 and, contrary to earlier reports, revealed that the strain carries 6 rather than 5 plasmids. A new 50-kb plasmid, designated pNZ712, encodes functional nisin immunity (nisCIP) and copper resistance (lcoRSABC). The copper resistance could be used as a marker for the conjugation of pNZ712 to L. lactis MG1614. A genome comparison with the plasmid-cured daughter strain MG1363 showed that the number of single nucleotide polymorphisms that accumulated in the laboratory since the strains diverged more than 30 years ago is limited to 11, of which only 5 lead to amino acid changes. The 16-kb plasmid pSH74 was found to contain a novel 8-kb pilus gene cluster spaCB-spaA-srtC1-srtC2, which is predicted to encode a pilin tip protein SpaC, a pilus basal subunit SpaB, and a pilus backbone protein SpaA. The sortases SrtC1/SrtC2 are most likely involved in pilus polymerization while the chromosomally encoded SrtA could act to anchor the pilus to peptidoglycan in the cell wall. Overexpression of the pilus gene cluster from a multi-copy plasmid in L. lactis MG1363 resulted in cell chaining, aggregation, rapid sedimentation and increased conjugation efficiency of the cells. Electron microscopy showed that the overexpression of the pilus gene cluster leads to appendages on the cell surfaces. A deletion of the gene encoding the putative basal protein spaB, by truncating spaCB, led to more pilus-like structures on the cell surface, but cell aggregation and cell chaining were no longer observed. This is consistent with the prediction that spaB is involved in the anchoring of the pili to the cell.

  14. Supported silver clusters as nanoplasmonic transducers for protein sensing

    DEFF Research Database (Denmark)

    Fojan, Peter; Hanif, Muhammad; Bartling, Stephen

    2015-01-01

    Transducers for optical sensing of proteins are prepared using cluster beam deposition on quartz substrates. Surface plasmon resonance phenomenon of the supported silver clusters is used for the detection. It is shown that surface immobilisation procedure providing adhesion of the silver clusters...... stages and protein immobilisation scheme the sensing of protein of interest can be assured using a relatively simple optical spectroscopy method....... an enhancement of the plasmon absorption band used for the detection. Atomic force microscopy study allows to suggest that immobilisation of antibodies on silver clusters has been achieved, thus giving a possibility to incubate and detect an antigen of interest. Hence, by applying the developed preparation...

  15. A phylogenomic gene cluster resource: The phylogenetically inferred groups (PhIGs) database

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir S.; Boore, Jeffrey L.

    2005-08-25

    We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.

  16. An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information

    Directory of Open Access Journals (Sweden)

    Ao Li

    2009-04-01

    Full Text Available Motivation: Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However, direct interpretation or prediction of gene regulatory mechanisms may be difficult as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus a method to integrate gene expression and gene regulation information is desirable for clustering and analysis. Methods: By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV) as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three-dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named the Automatic Boundary Searching algorithm (ABS) is introduced to automatically determine the boundary threshold. Results: Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows that genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations.

  17. Transcriptional profiling of protein expression related genes of Pichia pastoris under simulated microgravity.

    Directory of Open Access Journals (Sweden)

    Feng Qi

    Full Text Available The physiological responses and transcription profiling of Pichia pastoris GS115 under simulated microgravity (SMG) were substantially changed compared with the normal gravity (NG) control. We previously reported that the recombinant P. pastoris grew faster under SMG than NG during the methanol induction phase and that the efficiencies of recombinant enzyme production and secretion were enhanced under SMG, which was considered a consequence of changed transcriptional levels of some key genes. In this work, transcriptome profiles of P. pastoris cultured under SMG and NG conditions at exponential and stationary phases were determined using next-generation sequencing (NGS) technologies. Four categories comprising 141 genes, functioning in methanol utilization, protein chaperoning, RNA polymerase and protein transportation or secretion and classified according to Gene Ontology (GO), were chosen for analysis on the basis of the NGS results. In addition, 80 significantly changed genes were weighted and estimated with Cluster 3.0. It was found that most genes of methanol metabolism (85% of 20 genes) and protein transportation or secretion (82.2% of 45 genes) were significantly up-regulated under SMG. Furthermore, the quantity and fold change of up-regulated genes in the exponential phase of each category were higher than those in the stationary phase. The results indicate that the up-regulated genes of methanol metabolism and protein transportation or secretion mainly contribute to enhanced production and secretion of the recombinant protein under SMG.

  18. Global Analysis of miRNA Gene Clusters and Gene Families Reveals Dynamic and Coordinated Expression

    Directory of Open Access Journals (Sweden)

    Li Guo

    2014-01-01

    Full Text Available To further understand the potential expression relationships of miRNAs in miRNA gene clusters and gene families, a global analysis was performed in 4 paired tumor (breast cancer) and adjacent normal tissue samples using deep sequencing datasets. The compositions of miRNA gene clusters and families are not random, and clustered and homologous miRNAs may have close relationships with overlapped miRNA species. Members of a miRNA group always had various expression levels, and some even showed larger expression divergence. Despite the dynamic expression as well as individual differences, these miRNAs always showed consistent or similar deregulation patterns. The consistent deregulation may contribute to dynamic and coordinated interaction between different miRNAs in the regulatory network. Further, we found that those clustered or homologous miRNAs that were also identified as sense and antisense miRNAs showed larger expression divergence. miRNA gene clusters and families indicated important biological roles, and their specific distribution and expression further enrich and ensure the flexible and robust regulatory network.

  19. Interaction of the iron–sulfur cluster assembly protein IscU with the Hsc66/Hsc20 molecular chaperone system of Escherichia coli

    Science.gov (United States)

    Hoff, Kevin G.; Silberg, Jonathan J.; Vickery, Larry E.

    2000-01-01

    The iscU gene in bacteria is located in a gene cluster encoding proteins implicated in iron–sulfur cluster assembly and an hsc70-type (heat shock cognate) molecular chaperone system, iscSUA-hscBA. To investigate possible interactions between these systems, we have overproduced and purified the IscU protein from Escherichia coli and have studied its interactions with the hscA and hscB gene products Hsc66 and Hsc20. IscU and its iron–sulfur complex (IscU–Fe/S) stimulated the basal steady-state ATPase activity of Hsc66 weakly in the absence of Hsc20 but, in the presence of Hsc20, increased the ATPase activity up to 480-fold. Hsc20 also decreased the apparent Km for IscU stimulation of Hsc66 ATPase activity, and surface plasmon resonance studies revealed that Hsc20 enhances binding of IscU to Hsc66. Surface plasmon resonance and isothermal titration calorimetry further showed that IscU and Hsc20 form a complex, and Hsc20 may thereby aid in the targeting of IscU to Hsc66. These results establish a direct and specific role for the Hsc66/Hsc20 chaperone system in functioning with isc gene components for the assembly of iron–sulfur cluster proteins. PMID:10869428

  20. Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters.

    Science.gov (United States)

    Zhang, Huixian; Ravi, Vydianathan; Tay, Boon-Hui; Tohari, Sumanty; Pillai, Nisha E; Prasad, Aravind; Lin, Qiang; Brenner, Sydney; Venkatesh, Byrappa

    2017-08-22

    ParaHox genes (Gsx, Pdx, and Cdx) are an ancient family of developmental genes closely related to the Hox genes. They play critical roles in the patterning of brain and gut. The basal chordate, amphioxus, contains a single ParaHox cluster comprising one member of each family, whereas nonteleost jawed vertebrates contain four ParaHox genomic loci with six or seven ParaHox genes. Teleosts, which have experienced an additional whole-genome duplication, contain six ParaHox genomic loci with six ParaHox genes. Jawless vertebrates, represented by lampreys and hagfish, are the most ancient group of vertebrates and are crucial for understanding the origin and evolution of vertebrate gene families. We have previously shown that lampreys contain six Hox gene loci. Here we report that lampreys contain only two ParaHox gene clusters (designated as α- and β-clusters) bearing five ParaHox genes (Gsxα, Pdxα, Cdxα, Gsxβ, and Cdxβ). The order and orientation of the three genes in the α-cluster are identical to that of the single cluster in amphioxus. However, the orientation of Gsxβ in the β-cluster is inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homologs in jawed vertebrates, which are expressed mainly in the brain. The lamprey Pdxα is expressed in the pancreas similar to jawed vertebrate Pdx genes, indicating that the pancreatic expression of Pdx was acquired before the divergence of jawless and jawed vertebrate lineages. It is likely that the lamprey Pdxα plays a crucial role in pancreas specification and insulin production similar to the Pdx of jawed vertebrates.

  1. Regulation of the Apolipoprotein Gene Cluster by a Long Noncoding RNA

    Directory of Open Access Journals (Sweden)

    Paul Halley

    2014-01-01

    Full Text Available Apolipoprotein A1 (APOA1) is the major protein component of high-density lipoprotein (HDL) in plasma. We have identified an endogenously expressed long noncoding natural antisense transcript, APOA1-AS, which acts as a negative transcriptional regulator of APOA1 both in vitro and in vivo. Inhibition of APOA1-AS in cultured cells resulted in the increased expression of APOA1 and two neighboring genes in the APO cluster. Chromatin immunoprecipitation (ChIP) analyses of a ∼50 kb chromatin region flanking the APOA1 gene demonstrated that APOA1-AS can modulate distinct histone methylation patterns that mark active and/or inactive gene expression through the recruitment of histone-modifying enzymes. Targeting APOA1-AS with short antisense oligonucleotides also enhanced APOA1 expression in both human and monkey liver cells and induced an increase in hepatic RNA and protein expression in African green monkeys. Furthermore, the results presented here highlight the significant local modulatory effects of long noncoding antisense RNAs and demonstrate the therapeutic potential of manipulating the expression of these transcripts both in vitro and in vivo.

  2. Evaluation of clustering algorithms for protein-protein interaction networks

    Directory of Open Access Journals (Sweden)

    van Helden Jacques

    2006-11-01

    Full Text Available Abstract Background Protein interactions are crucial components of all cellular processes. Recently, high-throughput methods have been developed to obtain a global description of the interactome (the whole network of protein interactions for a given organism). In 2002, the yeast interactome was estimated to contain up to 80,000 potential interactions. This estimate is based on the integration of data sets obtained by various methods (mass spectrometry, two-hybrid methods, genetic studies). High-throughput methods are known, however, to yield a non-negligible rate of false positives, and to miss a fraction of existing interactions. The interactome can be represented as a graph where nodes correspond to proteins and edges to pairwise interactions. In recent years clustering methods have been developed and applied in order to extract relevant modules from such graphs. These algorithms require the specification of parameters that may drastically affect the results. In this paper we present a comparative assessment of four algorithms: Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE). Results A test graph was built on the basis of 220 complexes annotated in the MIPS database. To evaluate the robustness to false positives and false negatives, we derived 41 altered graphs by randomly removing edges from or adding edges to the test graph in various proportions. Each clustering algorithm was applied to these graphs with various parameter settings, and the clusters were compared with the annotated complexes. We analyzed the sensitivity of the algorithms to the parameters and determined their optimal parameter values. We also evaluated their robustness to alterations of the test graph. We then applied the four algorithms to six graphs obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes. Conclusion This ...
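
    The robustness test described in this abstract (deriving altered graphs by randomly removing and adding edges) can be illustrated with a minimal Python sketch. The function name `perturb_graph`, its parameters, and the toy graph are assumptions made for illustration; the abstract does not specify how the authors sampled edges, so this is one plausible reading rather than their procedure.

```python
import random

def perturb_graph(edges, nodes, remove_frac, add_frac, seed=0):
    """Return an undirected edge set with edges randomly removed and added.

    Hypothetical helper: fractions are taken relative to the number of
    original edges, which is an assumption, not the paper's definition.
    """
    rng = random.Random(seed)
    # Normalise each edge to a sorted tuple so (a, b) and (b, a) coincide.
    original = {tuple(sorted(e)) for e in edges}
    n_remove = round(remove_frac * len(original))
    n_add = round(add_frac * len(original))

    # Simulated false negatives: drop a random subset of true edges.
    removed = set(rng.sample(sorted(original), n_remove))
    kept = original - removed

    # Simulated false positives: add node pairs that were never true edges.
    # Enumerating all pairs is quadratic, fine for a toy graph only.
    node_list = sorted(nodes)
    candidates = [
        (a, b)
        for i, a in enumerate(node_list)
        for b in node_list[i + 1:]
        if (a, b) not in original
    ]
    added = set(rng.sample(candidates, min(n_add, len(candidates))))
    return kept | added

# Toy test graph: one three-protein complex and one two-protein complex.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
altered = perturb_graph(edges, nodes, remove_frac=0.5, add_frac=0.25, seed=1)
```

    Sweeping `remove_frac` and `add_frac` over a grid of values would yield a family of altered graphs on which each clustering algorithm can be rerun and compared with the annotated complexes.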

  3. Human major histocompatibility complex contains a minimum of 19 genes between the complement cluster and HLA-B

    International Nuclear Information System (INIS)

    Spies, T.; Bresnahan, M.; Strominger, J.L.

    1989-01-01

    A 600-kilobase (kb) DNA segment from the human major histocompatibility complex (MHC) class III region was isolated by extension of a previous 435-kb chromosome walk. The contiguous series of cloned overlapping cosmids contains the entire 555-kb interval between C2 in the complement gene cluster and HLA-B. This region is known to encode the tumor necrosis factors (TNFs) α and β, B144, and the major heat shock protein HSP70. Moreover, a cluster of genes, BAT1-BAT5 (HLA-B-associated transcripts), has been localized in the vicinity of the genes for TNFα and TNFβ. An additional four genes were identified by isolation of corresponding cDNA clones with cosmid DNA probes. These genes for BAT6-BAT9 were mapped near the gene for C2 within a 120-kb region that includes a HSP70 gene pair. These results, together with complementary data from a similar recent study, indicated the presence of a minimum of 19 genes within the C2-HLA-B interval of the MHC class III region. Although the functional properties of most of these genes are as yet unknown, they may be involved in some aspects of immunity. This idea is supported by the genetic mapping of the hematopoietic histocompatibility locus-1 (Hh-1) in recombinant mice between TNFα and H-2S, which is homologous to the complement gene cluster in humans.

  4. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N

    2017-01-04

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
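
    A search for clusters "with similar Pfam composition" can be pictured as a set-similarity ranking. The sketch below uses the Jaccard index as a stand-in; the abstract does not say which measure IMG-ABC uses, and the cluster names, Pfam identifiers, and function names here are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two collections of Pfam identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def rank_similar(query, clusters, min_score=0.0):
    """Rank clusters by similarity to the query, best first.

    `clusters` maps a cluster name to its set of Pfam identifiers.
    Only clusters scoring strictly above `min_score` are returned.
    """
    scored = [(name, jaccard(query, pfams)) for name, pfams in clusters.items()]
    hits = [(name, score) for name, score in scored if score > min_score]
    # Sort by descending score, then by name for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))

# Hypothetical clusters; the identifiers only follow the Pfam naming format.
clusters = {
    "bc_1": {"PF00109", "PF02801", "PF00698"},
    "bc_2": {"PF00668", "PF00501"},
    "bc_3": {"PF00109", "PF00501"},
}
hits = rank_similar({"PF00109", "PF02801"}, clusters)
```

    Here `bc_1` shares two of three distinct domains with the query and ranks first, `bc_3` shares one of three, and `bc_2` shares none and is dropped.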

  5. Genomic characterization of a new endophytic Streptomyces kebangsaanensis identifies biosynthetic pathway gene clusters for novel phenazine antibiotic production

    Directory of Open Access Journals (Sweden)

    Juwairiah Remali

    2017-11-01

    Full Text Available Background Streptomyces are well known for their capability to produce many bioactive secondary metabolites with medical and industrial importance. Here we report a novel bioactive phenazine compound, 6-((2-hydroxy-4-methoxyphenoxy)carbonyl)phenazine-1-carboxylic acid (HCPCA), extracted from Streptomyces kebangsaanensis, an endophyte isolated from the ethnomedicinal Portulaca oleracea. Methods The HCPCA chemical structure was determined using nuclear magnetic resonance spectroscopy. We conducted whole genome sequencing for the identification of the gene cluster(s) believed to be responsible for phenazine biosynthesis in order to map its corresponding pathway, in addition to bioinformatics analysis to assess the potential of S. kebangsaanensis in producing other useful secondary metabolites. Results The S. kebangsaanensis genome comprises an 8,328,719 bp linear chromosome with high GC content (71.35%) consisting of 12 rRNA operons, 81 tRNA, and 7,558 protein coding genes. We identified 24 gene clusters involved in polyketide, nonribosomal peptide, terpene, bacteriocin, and siderophore biosynthesis, as well as a gene cluster predicted to be responsible for phenazine biosynthesis. Discussion The HCPCA phenazine structure was hypothesized to derive from the combination of two biosynthetic pathways, phenazine-1,6-dicarboxylic acid and 4-methoxybenzene-1,2-diol, originating from the shikimic acid pathway. The identification of a biosynthesis pathway gene cluster for phenazine antibiotics might facilitate future genetic engineering design of new synthetic phenazine antibiotics. Additionally, these findings confirm the potential of S. kebangsaanensis for producing various antibiotics and secondary metabolites.
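
    The GC content quoted in this abstract is the percentage of G and C bases among the called bases of a sequence. A minimal sketch of that calculation follows; the function name and the choice to ignore ambiguous bases such as N are assumptions for illustration, and the example strings are toy sequences, not data from the genome.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    total = gc + at  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total

print(gc_content("ATGC"))  # 50.0
```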

  6. The hybrid-cluster protein ('prismane protein') from Escherichia coli. Characterization of the hybrid-cluster protein, redox properties of the [2Fe-2S] and [4Fe-2S-2O] clusters and identification of an associated NADH oxidoreductase containing FAD and [2Fe-2S

    NARCIS (Netherlands)

    Berg, van den W.A.M.; Hagen, W.R.; Dongen, van W.M.A.M.

    2000-01-01

    Hybrid-cluster proteins ('prismane proteins') have previously been isolated and characterized from strictly anaerobic sulfate-reducing bacteria. These proteins contain two types of Fe/S clusters unique in biological systems: a [4Fe-4S] cubane cluster with spin-admixed S = 3/2 ground-state

  7. Evaluation of gene-expression clustering via mutual information distance measure

    Directory of Open Access Journals (Sweden)

    Maimon Oded

    2007-03-01

    Full Text Available Abstract Background The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare different clustering solutions when using the Mutual Information (MI) measure versus the use of the well-known Euclidean distance and Pearson correlation coefficient. Results Relying on several public gene expression datasets, we evaluate the homogeneity and separation scores of different clustering solutions. It was found that the use of the MI measure yields a more significant differentiation among erroneous clustering solutions. The proposed measure was also used to analyze the performance of several known clustering algorithms. A comparative study of these algorithms reveals that their "best solutions" are ranked almost oppositely when using different distance measures, despite the correspondence found between these measures when analysing the averaged scores of groups of solutions. Conclusion In view of the results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data.
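
    To see why the choice of measure matters, the three quantities named in this abstract can be computed side by side on a toy pair of profiles. The sketch below is an illustration only: the equal-width binning used to estimate mutual information, the bin count, and all function names are assumptions, and the abstract does not describe the authors' estimator.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Map values to equal-width bin indices 0..bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=3):
    """Plug-in estimate of mutual information in bits from binned profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    # Sum p(a,b) * log2(p(a,b) / (p(a) p(b))) over observed bin pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy profiles: y depends on x, but symmetrically rather than linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [(v - 3.5) ** 2 for v in x]
scores = (mutual_information(x, y), pearson(x, y), euclidean(x, y))
```

    On this pair the Pearson coefficient is essentially zero because the dependence is symmetric around the midpoint, while the binned mutual information is clearly positive, which is the kind of disagreement between measures that the study examines.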

  8. The Pacific Ocean virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology.

    Directory of Open Access Journals (Sweden)

    Bonnie L Hurwitz

    Full Text Available Bacteria and their viruses (phage) are fundamental drivers of many ecosystem processes including global biogeochemistry and horizontal gene transfer. While databases and resources for studying function in uncultured bacterial communities are relatively advanced, many fewer exist for their viral counterparts. The issue is largely technical in that the majority (often 90%) of viral sequences are functionally 'unknown' making viruses a virtually untapped resource of functional and physiological information. Here, we provide a community resource that organizes this unknown sequence space into 27 K high confidence protein clusters using 32 viral metagenomes from four biogeographic regions in the Pacific Ocean that vary by season, depth, and proximity to land, and include some of the first deep pelagic ocean viral metagenomes. These protein clusters more than double currently available viral protein clusters, including those from environmental datasets. Further, a protein cluster guided analysis of functional diversity revealed that richness decreased (i) from deep to surface waters, (ii) from winter to summer, (iii) and with distance from shore in surface waters only. These data provide a framework from which to draw on for future metadata-enabled functional inquiries of the vast viral unknown.

  9. The Pacific Ocean virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology.

    Science.gov (United States)

    Hurwitz, Bonnie L; Sullivan, Matthew B

    2013-01-01

    Bacteria and their viruses (phage) are fundamental drivers of many ecosystem processes including global biogeochemistry and horizontal gene transfer. While databases and resources for studying function in uncultured bacterial communities are relatively advanced, many fewer exist for their viral counterparts. The issue is largely technical in that the majority (often 90%) of viral sequences are functionally 'unknown' making viruses a virtually untapped resource of functional and physiological information. Here, we provide a community resource that organizes this unknown sequence space into 27 K high confidence protein clusters using 32 viral metagenomes from four biogeographic regions in the Pacific Ocean that vary by season, depth, and proximity to land, and include some of the first deep pelagic ocean viral metagenomes. These protein clusters more than double currently available viral protein clusters, including those from environmental datasets. Further, a protein cluster guided analysis of functional diversity revealed that richness decreased (i) from deep to surface waters, (ii) from winter to summer, (iii) and with distance from shore in surface waters only. These data provide a framework from which to draw on for future metadata-enabled functional inquiries of the vast viral unknown.

  10. Minimum Information about a Biosynthetic Gene cluster: commentary

    NARCIS (Netherlands)

    Medema, Marnix H; Kottmann, Renzo; Yilmaz, Pelin; Cummings, Matthew; Biggins, John B; Blin, Kai; de Bruijn, Irene; Chooi, Yit Heng; Claesen, Jan; Coates, R Cameron; Cruz-Morales, Pablo; Duddela, Srikanth; Dusterhus, Stephanie; Edwards, Daniel J; Fewer, David P; Garg, Neha; Geiger, Christoph; Gomez-Escribano, Juan Pablo; Greule, Anja; Hadjithomas, Michalis; Haines, Anthony S; Helfrich, Eric J N; Hillwig, Matthew L; Ishida, Keishi; Jones, Adam C; Jones, Carla S; Jungmann, Katrin; Kegler, Carsten; Kim, Hyun Uk; Kotter, Peter; Krug, Daniel; Masschelein, Joleen; Melnik, Alexey V; Mantovani, Simone M; Monroe, Emily A; Moore, Marcus; Moss, Nathan; Nutzmann, Hans-Wilhelm; Pan, Guohui; Pati, Amrita; Petras, Daniel; Reen, F Jerry; Rosconi, Federico; Rui, Zhe; Tian, Zhenhua; Tobias, Nicholas J; Tsunematsu, Yuta; Wiemann, Philipp; Wyckoff, Elizabeth; Yan, Xiaohui; Yim, Grace; Yu, Fengan; Xie, Yunchang; Aigle, Bertrand; Apel, Alexander K; Balibar, Carl J; Balskus, Emily P; Barona-Gomez, Francisco; Bechthold, Andreas; Bode, Helge B; Borriss, Rainer; Brady, Sean F; Brakhage, Axel A; Caffrey, Patrick; Cheng, Yi-Qiang; Clardy, Jon; Cox, Russell J; De Mot, Rene; Donadio, Stefano; Donia, Mohamed S; van der Donk, Wilfred A; Dorrestein, Pieter C; Doyle, Sean; Driessen, Arnold J M; Ehling-Schulz, Monika; Entian, Karl-Dieter; Fischbach, Michael A; Gerwick, Lena; Gerwick, William H; Gross, Harald; Gust, Bertolt; Hertweck, Christian; Hofte, Monica; Jensen, Susan E; Ju, Jianhua; Katz, Leonard; Kaysser, Leonard; Klassen, Jonathan L; Keller, Nancy P; Kormanec, Jan; Kuipers, Oscar P; Kuzuyama, Tomohisa; Kyrpides, Nikos C; Kwon, Hyung-Jin; Lautru, Sylvie; Lavigne, Rob; Lee, Chia Y; Linquan, Bai; Liu, Xinyu; Liu, Wen; Luzhetskyy, Andriy; Mahmud, Taifo; Mast, Yvonne; Mendez, Carmen; Metsa-Ketela, Mikko; Micklefield, Jason; Mitchell, Douglas A; Moore, Bradley S; Moreira, Leonilde M; Muller, Rolf; Neilan, Brett A; Nett, Markus; Nielsen, Jens; O'Gara, Fergal; Oikawa, Hideaki; Osbourn, Anne; 
Osburne, Marcia S; Ostash, Bohdan; Payne, Shelley M; Pernodet, Jean-Luc; Petricek, Miroslav; Piel, Jorn; Ploux, Olivier; Raaijmakers, Jos M; Salas, Jose A; Schmitt, Esther K; Scott, Barry; Seipke, Ryan F; Shen, Ben; Sherman, David H; Sivonen, Kaarina; Smanski, Michael J; Sosio, Margherita; Stegmann, Evi; Sussmuth, Roderich D; Tahlan, Kapil; Thomas, Christopher M; Tang, Yi; Truman, Andrew W; Viaud, Muriel; Walton, Jonathan D; Walsh, Christopher T; Weber, Tilmann; van Wezel, Gilles P; Wilkinson, Barrie; Willey, Joanne M; Wohlleben, Wolfgang; Wright, Gerard D; Ziemert, Nadine; Zhang, Changsheng; Zotchev, Sergey B; Breitling, Rainer; Takano, Eriko; Glockner, Frank Oliver

    A wide variety of enzymatic pathways that produce specialized metabolites in bacteria, fungi and plants are known to be encoded in biosynthetic gene clusters. Information about these clusters, pathways and metabolites is currently dispersed throughout the literature, making it difficult to exploit.

  11. Overproduction of lactimidomycin by cross-overexpression of genes encoding Streptomyces antibiotic regulatory proteins.

    Science.gov (United States)

    Zhang, Bo; Yang, Dong; Yan, Yijun; Pan, Guohui; Xiang, Wensheng; Shen, Ben

    2016-03-01

    The glutarimide-containing polyketides represent a fascinating class of natural products that exhibit a multitude of biological activities. We have recently cloned and sequenced the biosynthetic gene clusters for three members of the glutarimide-containing polyketides: iso-migrastatin (iso-MGS) from Streptomyces platensis NRRL 18993, lactimidomycin (LTM) from Streptomyces amphibiosporus ATCC 53964, and cycloheximide (CHX) from Streptomyces sp. YIM56141. Comparative analysis of the three clusters identified mgsA and chxA, from the mgs and chx gene clusters, respectively, that were predicted to encode the PimR-like Streptomyces antibiotic regulatory proteins (SARPs) but failed to reveal any regulatory gene from the ltm gene cluster. Overexpression of mgsA or chxA in S. platensis NRRL 18993, Streptomyces sp. YIM56141 or SB11024, and a recombinant strain of Streptomyces coelicolor M145 carrying the intact mgs gene cluster has no significant effect on iso-MGS or CHX production, suggesting that MgsA or ChxA regulation may not be rate-limiting for iso-MGS and CHX production in these producers. In contrast, overexpression of mgsA or chxA in S. amphibiosporus ATCC 53964 resulted in a significant increase in LTM production, with LTM titer reaching 106 mg/L, which is five-fold higher than that of the wild-type strain. These results support MgsA and ChxA as members of the SARP family of positive regulators for the iso-MGS and CHX biosynthetic machinery and demonstrate the feasibility of improving glutarimide-containing polyketide production in Streptomyces strains by exploiting common regulators.

  12. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    Science.gov (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of the traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide expression profiles of gene sets for individuals. We developed an application called IGSA that combines gene set enrichment with sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy that groups samples according to the progress of disease from mild to severe. We focus on gastric, pancreatic and ovarian cancer data sets to assess the performance of IGSA. We also compared the results of IGSA in KEGG pathway enrichment with David, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering with different similarity measurement methods. Notably, IGSA proved to be more sensitive and specific in finding significant pathways, and can indicate changes in pathways related to the severity of disease. In addition, IGSA provides a profile of significant gene sets for each sample.

  13. Protein aggregates and novel presenilin gene variants in idiopathic dilated cardiomyopathy.

    Science.gov (United States)

    Gianni, Davide; Li, Airong; Tesco, Giuseppina; McKay, Kenneth M; Moore, John; Raygor, Kunal; Rota, Marcello; Gwathmey, Judith K; Dec, G William; Aretz, Thomas; Leri, Annarosa; Semigran, Marc J; Anversa, Piero; Macgillivray, Thomas E; Tanzi, Rudolph E; del Monte, Federica

    2010-03-16

    Heart failure is a debilitating condition resulting in severe disability and death. In a subset of cases, clustered as idiopathic dilated cardiomyopathy (iDCM), the origin of heart failure is unknown. In the brain of patients with dementia, proteinaceous aggregates and abnormal oligomeric assemblies of beta-amyloid impair cell function and lead to cell death. We have similarly characterized fibrillar and oligomeric assemblies in the hearts of iDCM patients, pointing to abnormal protein aggregation as a determinant of iDCM. We also showed that oligomers alter myocyte Ca(2+) homeostasis. Additionally, we have identified 2 new sequence variants in the presenilin-1 (PSEN1) gene promoter leading to reduced gene and protein expression. We also show that presenilin-1 coimmunoprecipitates with SERCA2a. On the basis of these findings, we propose that 2 mechanisms may link protein aggregation and cardiac function: oligomer-induced changes on Ca(2+) handling and a direct effect of PSEN1 sequence variants on excitation-contraction coupling protein function.

  14. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  15. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    OpenAIRE

    Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V

    2007-01-01

    Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...

  16. Distinct cell clusters touching islet cells induce islet cell replication in association with over-expression of Regenerating Gene (REG) protein in fulminant type 1 diabetes.

    Directory of Open Access Journals (Sweden)

    Kaoru Aida

    Full Text Available BACKGROUND: Pancreatic islet endocrine cell-supporting architectures, including islet encapsulating basement membranes (BMs), extracellular matrix (ECM), and possible cell clusters, are unclear. PROCEDURES: The architectures around islet cell clusters, including BMs, ECM, and pancreatic acinar-like cell clusters, were studied in the non-diabetic state and in the inflamed milieu of fulminant type 1 diabetes in humans. RESULT: Immunohistochemical and electron microscopy analyses demonstrated that human islet cell clusters and acinar-like cell clusters adhere directly to each other with desmosomal structures and coated-pit-like structures between the two cell clusters. The two cell clusters are encapsulated by a continuous capsule composed of common BMs/ECM. The acinar-like cell clusters have vesicles containing regenerating (REG) Iα protein. The vesicles containing REG Iα protein are directly secreted to islet cells. In the inflamed milieu of fulminant type 1 diabetes, the acinar-like cell clusters over-expressed REG Iα protein. Islet endocrine cells, including beta-cells and non-beta cells, which were packed with the acinar-like cell clusters, show self-replication with a markedly increased number of Ki67-positive cells. CONCLUSION: The acinar-like cell clusters touching islet endocrine cells are distinct, because the cell clusters are packed with pancreatic islet clusters and surrounded by common BMs/ECM. Furthermore, the acinar-like cell clusters express REG Iα protein and secrete it directly to neighboring islet endocrine cells in the non-diabetic state, and the cell clusters over-express REG Iα in the inflamed milieu of fulminant type 1 diabetes with marked self-replication of islet cells.

  17. Detection of secondary structure elements in proteins by hydrophobic cluster analysis.

    Science.gov (United States)

    Woodcock, S; Mornon, J P; Henrissat, B

    1992-10-01

    Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on alpha-helical representations of the sequences where the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied such as those governing cluster size and limits, the hydrophobic residues constituting the clusters as well as the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results have been found to support the alpha-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation which gives the best correspondence is that of an alpha-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.
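
    To make the notion of a "hydrophobic cluster" more concrete, the following is a deliberately simplified, one-dimensional sketch in Python. It is not the bidimensional alpha-helical representation that the abstract evaluates. The hydrophobic alphabet (VILFMYW), the separating run of four non-hydrophobic residues, and the treatment of proline as a cluster breaker are assumptions chosen for illustration, and the function name is hypothetical.

```python
# Simplified 1D illustration of hydrophobic clusters; NOT the 2D helical-net HCA.
HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic alphabet
MIN_GAP = 4                   # assumed run of non-hydrophobic residues that splits clusters


def hydrophobic_clusters(seq, min_gap=MIN_GAP):
    """Return (start, end) pairs (0-based, inclusive) of hydrophobic clusters."""
    clusters = []
    start = last = None  # first and most recent hydrophobic position of the open cluster
    for i, aa in enumerate(seq.upper()):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
            elif i - last - 1 >= min_gap:
                # The gap since the previous hydrophobic residue is long enough: split.
                clusters.append((start, last))
                start = i
            last = i
        elif aa == "P" and start is not None:
            # Assumption: proline always closes the current cluster.
            clusters.append((start, last))
            start = last = None
    if start is not None:
        clusters.append((start, last))
    return clusters


# Hypothetical toy sequence, not taken from the study.
example = hydrophobic_clusters("VILKKKKAVLPGW")
```

    Segment boundaries obtained this way could then be compared with annotated secondary structure elements, which is the kind of correspondence the abstract reports for the full method.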

  18. Open reading frame 176 in the photosynthesis gene cluster of Rhodobacter capsulatus encodes idi, a gene for isopentenyl diphosphate isomerase.

    OpenAIRE

    Hahn, F M; Baker, J A; Poulter, C D

    1996-01-01

    Isopentenyl diphosphate (IPP) isomerase catalyzes an essential activation step in the isoprenoid biosynthetic pathway. A database search based on probes from the highly conserved regions in three eukaryotic IPP isomerases revealed substantial similarity with ORF176 in the photosynthesis gene cluster in Rhodobacter capsulatus. The open reading frame was cloned into an Escherichia coli expression vector. The encoded 20-kDa protein, which was purified in two steps by ion exchange and hydrophobic...

  19. Two different secondary metabolism gene clusters occupied the same ancestral locus in fungal dermatophytes of the arthrodermataceae.

    Science.gov (United States)

    Zhang, Han; Rokas, Antonis; Slot, Jason C

    2012-01-01

    Dermatophyte fungi of the family Arthrodermataceae (Eurotiomycetes) colonize keratinized tissue, such as skin, frequently causing superficial mycoses in humans and other mammals, reptiles, and birds. Competition with native microflora likely underlies the propensity of these dermatophytes to produce a diversity of antibiotics and compounds for scavenging iron, which is extremely scarce, as well as the presence of an unusually large number of putative secondary metabolism gene clusters, most of which contain non-ribosomal peptide synthetases (NRPS), in their genomes. To better understand the historical origins and diversification of NRPS-containing gene clusters we examined the evolution of a variable locus (VL) that exists in one of three alternative conformations among the genomes of seven dermatophyte species. The first conformation of the VL (termed VLA) contains only 539 base pairs of sequence and lacks protein-coding genes, whereas the other two conformations (termed VLB and VLC) span 36 Kb and 27 Kb and contain 12 and 10 genes, respectively. Interestingly, both VLB and VLC appear to contain distinct secondary metabolism gene clusters; VLB contains a NRPS gene as well as four porphyrin metabolism genes never found to be physically linked in the genomes of 128 other fungal species, whereas VLC also contains a NRPS gene as well as several others typically found associated with secondary metabolism gene clusters. Phylogenetic evidence suggests that the VL locus was present in the ancestor of all seven species achieving its present distribution through subsequent differential losses or retentions of specific conformations. We propose that the existence of variable loci, similar to the one we studied, in fungal genomes could potentially explain the dramatic differences in secondary metabolic diversity between closely related species of filamentous fungi, and contribute to host adaptation and the generation of metabolic diversity.

  20. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

    Science.gov (United States)

    Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

    2004-01-01

    One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
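
    As a rough illustration of how TFIDF-weighted keywords can serve as feature vectors for gene clustering, here is a minimal Python sketch. It assumes the common tf * log(N / df) form of the weight, which may differ from the exact variant used in the study. The gene names, keyword lists, and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(gene_keywords):
    """Return {gene: {keyword: weight}} with weight = tf * log(N / df)."""
    n_genes = len(gene_keywords)
    # Document frequency: number of genes whose keyword list contains the term.
    df = Counter()
    for words in gene_keywords.values():
        df.update(set(words))
    vectors = {}
    for gene, words in gene_keywords.items():
        counts = Counter(words)
        total = len(words)
        vectors[gene] = {
            w: (c / total) * math.log(n_genes / df[w]) for w, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy keyword lists, not data from the study.
toy = {
    "geneA": ["kinase", "phosphorylation", "cell", "cycle"],
    "geneB": ["kinase", "cell", "cycle", "mitosis"],
    "geneC": ["transport", "membrane", "cell"],
}
vecs = tfidf_vectors(toy)
sim_ab = cosine(vecs["geneA"], vecs["geneB"])
sim_ac = cosine(vecs["geneA"], vecs["geneC"])
```

    A keyword shared by every gene (here "cell") receives zero weight, so it cannot drive the similarity between genes. The resulting pairwise similarities could then be passed to a hierarchical clustering routine.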

  1. GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

    Science.gov (United States)

    Schulz, Tizian; Stoye, Jens; Doerr, Daniel

    2018-05-08

    Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.

  2. Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

    Science.gov (United States)

    Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance genes if

  3. Network based approaches reveal clustering in protein point patterns

    Science.gov (United States)

    Parker, Joshua; Barr, Valarie; Aldridge, Joshua; Samelson, Lawrence E.; Losert, Wolfgang

    2014-03-01

    Recent advances in super-resolution imaging have allowed for the sub-diffraction measurement of the spatial location of proteins on the surfaces of T-cells. The challenge is to connect these complex point patterns to the internal processes and interactions, both protein-protein and protein-membrane. We begin analyzing these patterns by forming a geometric network amongst the proteins and looking at network measures, such as the degree distribution. This allows us to compare experimentally observed patterns to models. Specifically, we find that the experimental patterns differ from heterogeneous Poisson processes, highlighting an internal clustering structure. Further work will be to compare our results to simulated protein-protein interactions to determine clustering mechanisms.
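
    One simple way to picture the degree-distribution measure is sketched below in Python. The abstract does not specify how the geometric network is built, so the fixed-radius rule used here (connect two points when they are closer than a chosen radius) is an assumption, as are the coordinates and the function name.

```python
import math
from collections import Counter


def degree_distribution(points, radius):
    """Link points closer than `radius`; return Counter {degree: number of points}."""
    n = len(points)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                degrees[i] += 1
                degrees[j] += 1
    return Counter(degrees)


# Hypothetical 2D coordinates (arbitrary units), not measured data.
clustered = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
spread = [(0, 0), (2, 0), (4, 0), (0, 2), (2, 2), (4, 2)]

dist_clustered = degree_distribution(clustered, radius=0.5)
dist_spread = degree_distribution(spread, radius=0.5)
```

    At the same radius, the clustered pattern yields points with nonzero degree while the evenly spread pattern yields isolated points. Comparing such distributions against those of simulated point processes is the kind of test the abstract describes.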

  4. Analysis of hepatocellular carcinoma and metastatic hepatic carcinoma via functional modules in a protein-protein interaction network

    Directory of Open Access Journals (Sweden)

    Jun Pan

    2014-01-01

    Full Text Available Introduction: This study aims to identify protein clusters with potential functional relevance in the pathogenesis of hepatocellular carcinoma (HCC) and metastatic hepatic carcinoma using network analysis. Materials and Methods: We used human protein interaction data to build a protein-protein interaction network with Cytoscape and then derived functional clusters using MCODE. Combining the gene expression profiles, we calculated the functional scores for the clusters and selected statistically significant clusters. Meanwhile, Gene Ontology was used to assess the functionality of these clusters. Finally, a support vector machine was trained on the gold standard data sets. Results: The differentially expressed genes (DEGs) of HCC were mainly involved in metabolic and signaling processes. We acquired 13 significant modules from the gene expression profiles. The area under the curve value based on the differentially expressed modules was 98.31%, which outweighed the classification with DEGs. Conclusions: Differentially expressed modules are valuable to screen biomarkers combined with functional modules.

  5. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    Full Text Available Abstract Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that
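
    For readers unfamiliar with the adjusted Rand index used as the evaluation measure, the following Python sketch computes it from the standard pair-counting formula. The label vectors are hypothetical and the function name is chosen for illustration; it is not the authors' evaluation code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of items placed together in both partitions (contingency table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each partition separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions put everything together
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: known classes versus two candidate clusterings.
known = [0, 0, 0, 1, 1, 1]
relabeled = ["b", "b", "b", "a", "a", "a"]  # same partition, different names
oversplit = [0, 0, 1, 1, 2, 2]

ari_same = adjusted_rand_index(known, relabeled)
ari_oversplit = adjusted_rand_index(known, oversplit)
```

    The index depends only on which items are grouped together, not on the cluster names, and it is corrected so that random agreement scores about zero. This makes it suitable for comparing many clustering pipelines against known classes.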

  6. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Science.gov (United States)

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is

  7. An indigoidine biosynthetic gene cluster from Streptomyces chromofuscus ATCC 49982 contains an unusual IndB homologue.

    Science.gov (United States)

    Yu, Dayu; Xu, Fuchao; Valiente, Jonathan; Wang, Siyuan; Zhan, Jixun

    2013-01-01

    A putative indigoidine biosynthetic gene cluster was located in the genome of Streptomyces chromofuscus ATCC 49982. The silent 9.4-kb gene cluster consists of five open reading frames, named orf1, Sc-indC, Sc-indA, Sc-indB, and orf2, respectively. Sc-IndC was functionally characterized as an indigoidine synthase through heterologous expression of the enzyme in both Streptomyces coelicolor CH999 and Escherichia coli BAP1. The yield of indigoidine in E. coli BAP1 reached 2.78 g/l under the optimized conditions. The predicted protein product of Sc-indB is unusual and much larger than any other reported IndB-like protein. The N-terminal portion of this enzyme resembles IdgB and the C-terminal portion is a hypothetical protein. Sc-IndA and/or Sc-IndB were co-expressed with Sc-IndC in E. coli BAP1, which demonstrated the involvement of Sc-IndB, but not Sc-IndA, in the biosynthetic pathway of indigoidine. The yield of indigoidine was dramatically increased by 41.4 % (3.93 g/l) when Sc-IndB was co-expressed with Sc-IndC in E. coli BAP1. Indigoidine is more stable at low temperatures.

  8. Combined protein construct and synthetic gene engineering for heterologous protein expression and crystallization using Gene Composer

    Directory of Open Access Journals (Sweden)

    Walchli John

    2009-04-01

    Full Text Available Abstract Background With the goal of improving yield and success rates of heterologous protein production for structural studies we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript. Results In this report, we compare heterologous protein expression levels from native sequences to that of codon engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38α), viral polymerase (HCV NS5B), and bacterial structural protein (FtsZ) were expressed in both E. coli and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias. Conclusion The results consistently demonstrate that protein yields from codon engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene to structure work as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms, and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene to structure research.

  9. Clustering evolving proteins into homologous families.

    Science.gov (United States)

    Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

    2013-04-08

    Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better
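
    The core idea of Markov clustering (alternating expansion and inflation of a column-stochastic matrix) can be sketched in a few lines of Python. This is a toy illustration on an unweighted graph, not the MCL program evaluated in the study, which operated on alignment-derived similarities. The inflation value, the self-loop weight, the convergence thresholds, and the example graph are all assumptions.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=50, tol=1e-9):
    """Toy Markov clustering; returns clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Add self-loops (assumed weight 1) and turn columns into probabilities.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power and renormalize, favouring strong flows.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that retains mass is an attractor; its nonzero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
graph = [[0] * 6 for _ in range(6)]
for a, b in edges:
    graph[a][b] = graph[b][a] = 1

families = mcl(graph)
```

    Higher inflation values tend to produce finer clusters, which is one of the parameter settings a comparison such as the one described above would need to explore.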

  10. Cluster analysis of historical and modern hard red spring wheat cultivars based on parentage and HPLC analysis of gluten forming proteins

    Science.gov (United States)

    In this study, 30 hard red spring (HRS) wheat cultivars released between 1910 and 2013 were analyzed to determine how they cluster in terms of parentage and protein data, analyzed by reverse-phase HPLC (RP-HPLC) of gliadins, and size-exclusion HPLC (SE-HPLC) of unreduced proteins. Dwarfing genes in...

  11. Effect of mitochondrial complex I inhibition on Fe-S cluster protein activity

    Energy Technology Data Exchange (ETDEWEB)

    Mena, Natalia P. [Department of Biology, Faculty of Sciences, Universidad de Chile, Las Palmeras 3425, Santiago (Chile); Millennium Institute of Cell Dynamics and Biotechnology, Santiago (Chile); Bulteau, Anne Laure [UPMC Univ Paris 06, UMRS 975 - UMR 7725, Centre de Recherche en Neurosciences, ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, F-75005 Paris (France); Inserm, U 975, Centre de Recherche en Neurosciences, ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, F-75005 Paris (France); CNRS, UMR 7225, Centre de Recherche en Neurosciences, ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, F-75005 Paris (France); ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, Paris 75013 (France); Salazar, Julio [Millennium Institute of Cell Dynamics and Biotechnology, Santiago (Chile); Hirsch, Etienne C. [UPMC Univ Paris 06, UMRS 975 - UMR 7725, Centre de Recherche en Neurosciences, ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, F-75005 Paris (France); Inserm, U 975, Centre de Recherche en Neurosciences, ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, F-75005 Paris (France); CNRS, UMR 7225, Centre de Recherche en Neurosciences, ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, F-75005 Paris (France); ICM, Therapeutique Experimentale de la Neurodegenerescence, Hopital de la Salpetriere, Paris 75013 (France); Nunez, Marco T., E-mail: mnunez@uchile.cl [Department of Biology, Faculty of Sciences, Universidad de Chile, Las Palmeras 3425, Santiago (Chile); Millennium Institute of Cell Dynamics and Biotechnology, Santiago (Chile)

    2011-06-03

    Highlights: → Mitochondrial complex I inhibition resulted in decreased activity of Fe-S containing enzymes mitochondrial aconitase and cytoplasmic aconitase and xanthine oxidase. → Complex I inhibition resulted in the loss of Fe-S clusters in cytoplasmic aconitase and of glutamine phosphoribosyl pyrophosphate amidotransferase. → Consistent with loss of cytoplasmic aconitase activity, an increase in iron regulatory protein 1 activity was found. → Complex I inhibition resulted in an increase in the labile cytoplasmic iron pool. -- Abstract: Iron-sulfur (Fe-S) clusters are small inorganic cofactors formed by tetrahedral coordination of iron atoms with sulfur groups. Present in numerous proteins, these clusters are involved in key biological processes such as electron transfer, metabolic and regulatory processes, DNA synthesis and repair and protein structure stabilization. Fe-S clusters are synthesized mainly in the mitochondrion, where they are directly incorporated into mitochondrial Fe-S cluster-containing proteins or exported for cytoplasmic and nuclear cluster-protein assembly. In this study, we tested the hypothesis that inhibition of mitochondrial complex I by rotenone decreases Fe-S cluster synthesis and cluster content and activity of Fe-S cluster-containing enzymes. Inhibition of complex I resulted in decreased activity of three Fe-S cluster-containing enzymes: mitochondrial and cytosolic aconitases and xanthine oxidase. In addition, the Fe-S cluster content of glutamine phosphoribosyl pyrophosphate amidotransferase and mitochondrial aconitase was dramatically decreased. The reduction in cytosolic aconitase activity was associated with an increase in iron regulatory protein (IRP) mRNA binding activity and with an increase in the cytoplasmic labile iron pool. Since IRP activity post-transcriptionally regulates the expression of iron import proteins, Fe-S cluster inhibition may result in a false iron deficiency signal. Given that

  12. Effect of mitochondrial complex I inhibition on Fe-S cluster protein activity

    International Nuclear Information System (INIS)

    Mena, Natalia P.; Bulteau, Anne Laure; Salazar, Julio; Hirsch, Etienne C.; Nunez, Marco T.

    2011-01-01

    Highlights: → Mitochondrial complex I inhibition resulted in decreased activity of Fe-S containing enzymes mitochondrial aconitase and cytoplasmic aconitase and xanthine oxidase. → Complex I inhibition resulted in the loss of Fe-S clusters in cytoplasmic aconitase and of glutamine phosphoribosyl pyrophosphate amidotransferase. → Consistent with loss of cytoplasmic aconitase activity, an increase in iron regulatory protein 1 activity was found. → Complex I inhibition resulted in an increase in the labile cytoplasmic iron pool. -- Abstract: Iron-sulfur (Fe-S) clusters are small inorganic cofactors formed by tetrahedral coordination of iron atoms with sulfur groups. Present in numerous proteins, these clusters are involved in key biological processes such as electron transfer, metabolic and regulatory processes, DNA synthesis and repair and protein structure stabilization. Fe-S clusters are synthesized mainly in the mitochondrion, where they are directly incorporated into mitochondrial Fe-S cluster-containing proteins or exported for cytoplasmic and nuclear cluster-protein assembly. In this study, we tested the hypothesis that inhibition of mitochondrial complex I by rotenone decreases Fe-S cluster synthesis and cluster content and activity of Fe-S cluster-containing enzymes. Inhibition of complex I resulted in decreased activity of three Fe-S cluster-containing enzymes: mitochondrial and cytosolic aconitases and xanthine oxidase. In addition, the Fe-S cluster content of glutamine phosphoribosyl pyrophosphate amidotransferase and mitochondrial aconitase was dramatically decreased. The reduction in cytosolic aconitase activity was associated with an increase in iron regulatory protein (IRP) mRNA binding activity and with an increase in the cytoplasmic labile iron pool. Since IRP activity post-transcriptionally regulates the expression of iron import proteins, Fe-S cluster inhibition may result in a false iron deficiency signal. Given that inhibition of complex

  13. Comparative study of human mitochondrial proteome reveals extensive protein subcellular relocalization after gene duplications

    Directory of Open Access Journals (Sweden)

    Huang Yong

    2009-11-01

    Full Text Available Abstract Background Gene and genome duplication is the principal creative force in evolution. Recently, protein subcellular relocalization, or neolocalization, was proposed as one of the mechanisms responsible for the retention of duplicated genes. This hypothesis received support from the analysis of yeast genomes, but has not been tested thoroughly on animal genomes. In order to evaluate the importance of subcellular relocalizations for retention of duplicated genes in animal genomes, we systematically analyzed nuclear encoded mitochondrial proteins in the human genome by reconstructing phylogenies of mitochondrial multigene families. Results The 456 human mitochondrial proteins selected for this study were clustered into 305 gene families including 92 multigene families. Among the multigene families, 59 (64%) consisted of both mitochondrial and cytosolic (non-mitochondrial) proteins (mt-cy families) while the remaining 33 (36%) were composed of mitochondrial proteins (mt-mt families). Phylogenetic analyses of mt-cy families revealed three different scenarios of their neolocalization following gene duplication: 1) relocalization from mitochondria to cytosol, 2) from cytosol to mitochondria and 3) multiple subcellular relocalizations. The neolocalizations were most commonly enabled by the gain or loss of N-terminal mitochondrial targeting signals. The majority of detected subcellular relocalization events occurred early in animal evolution, preceding the evolution of tetrapods. Mt-mt protein families showed a somewhat different pattern, where gene duplication occurred more evenly in time. However, for both types of protein families, most duplication events appear to roughly coincide with two rounds of genome duplications early in vertebrate evolution. Finally, we evaluated the effects of inaccurate and incomplete annotation of mitochondrial proteins and found that our conclusion of the importance of subcellular relocalization after gene duplication on

  14. The Lepidoptera Odorant Binding Protein gene family: Gene gain and loss within the GOBP/PBP complex of moths and butterflies.

    Science.gov (United States)

    Vogt, Richard G; Große-Wilde, Ewald; Zhou, Jing-Jiang

    2015-07-01

    Butterflies and moths differ significantly in their daily activities: butterflies are diurnal while moths are largely nocturnal or crepuscular. This life history difference is presumably reflected in their sensory biology, and especially the balance between the use of chemical versus visual signals. Odorant Binding Proteins (OBP) are a class of insect proteins, at least some of which are thought to orchestrate the transfer of odor molecules within an olfactory sensillum (olfactory organ), between the air and odor receptor proteins (ORs) on the olfactory neurons. A Lepidoptera specific subclass of OBPs are the GOBPs and PBPs; these were the first OBPs studied and have well documented associations with olfactory sensilla. We have used the available genomes of two moths, Manduca sexta and Bombyx mori, and two butterflies, Danaus plexippus and Heliconius melpomene, to characterize the GOBP/PBP genes, attempting to identify gene orthologs and document specific gene gain and loss. First, we identified the full repertoire of OBPs in the M. sexta genome, and compared these with the full repertoire of OBPs from the other three lepidopteran genomes, the OBPs of Drosophila melanogaster and select OBPs from other Lepidoptera. We also evaluated the tissue specific expression of the M. sexta OBPs using available RNAseq databases. In the four lepidopteran species, GOBP2 and all PBPs reside in single gene clusters; in two species GOBP1 is documented to be nearby, about 100 kb from the cluster; all GOBP/PBP genes share a common gene structure indicating a common origin. As such, the GOBP/PBP genes form a gene complex. Our findings suggest that (1) the lepidopteran GOBP/PBP complex is a monophyletic lineage with origins deep within Lepidoptera phylogeny, (2) within this lineage PBP gene evolution is much more dynamic than GOBP gene evolution, and (3) butterflies may have lost a PBP gene that plays an important role in moth pheromone detection, correlating with a shift from

  15. Inactivation of human α-globin gene expression by a de novo deletion located upstream of the α-globin gene cluster

    International Nuclear Information System (INIS)

    Liebhaber, S.A.; Weiss, I.; Cash, F.E.; Griese, E.U.; Horst, J.; Ayyub, H.; Higgs, D.R.

    1990-01-01

    Synthesis of normal human hemoglobin A, α2β2, is based upon balanced expression of genes in the α-globin gene cluster on chromosome 16 and the β-globin gene cluster on chromosome 11. Full levels of erythroid-specific activation of the β-globin cluster depend on sequences located at a considerable distance 5' to the β-globin gene, referred to as the locus-activating or dominant control region. The existence of an analogous element(s) upstream of the α-globin cluster has been suggested from observations on naturally occurring deletions and experimental studies. The authors have identified an individual with α-thalassemia in whom structurally normal α-globin genes have been inactivated in cis by a discrete de novo 35-kilobase deletion located ∼30 kilobases 5' from the α-globin gene cluster. They conclude that this deletion inactivates expression of the α-globin genes by removing one or more of the previously identified upstream regulatory sequences that are critical to expression of the α-globin genes.

  16. Dominant control region of the human β-like globin gene cluster

    NARCIS (Netherlands)

    Blom van Assendelft, Margaretha van

    1989-01-01

    The structure and regulation of the human β-like globin gene cluster has been studied extensively. Genetic disorders connected with this gene cluster are responsible for human diseases associated with high levels of morbidity and mortality, such as β-thalassaemia and sickle cell anaemia. The work

  17. K-nearest uphill clustering in the protein structure space

    KAUST Repository

    Cui, Xuefeng; Gao, Xin

    2016-01-01

    The protein structure classification problem, which is to assign a protein structure to a cluster of similar proteins, is one of the most fundamental problems in the construction and application of the protein structure space. Early manually curated

  18. Protein Aggregates and Novel Presenilin Gene Variants in Idiopathic Dilated Cardiomyopathy

    Science.gov (United States)

    Gianni, Davide; Li, Airong; Tesco, Giuseppina; McKay, Kenneth M.; Moore, John; Raygor, Kunal; Rota, Marcello; Gwathmey, Judith K; Dec, G William; Aretz, Thomas; Leri, Annarosa; Semigran, Marc J; Anversa, Piero; Macgillivray, Thomas E; Tanzi, Rudolph E.; Monte, Federica del

    2010-01-01

    Background Heart failure (HF) is a debilitating condition resulting in severe disability and death. In a subset of cases, clustered as Idiopathic Dilated Cardiomyopathy (iDCM), the origin of HF is unknown. In the brain of patients with dementia, proteinaceous aggregates and abnormal oligomeric assemblies of β-amyloid impair cell function and lead to cell death. Methods and Results We have similarly characterized fibrillar and oligomeric assemblies in the hearts of iDCM patients pointing to abnormal protein aggregation as a determinant of iDCM. We also showed that oligomers alter myocyte Ca2+ homeostasis. Additionally, we have identified two new sequence variants in the presenilin-1 (PSEN1) gene promoter leading to reduced gene and protein expression. We also show that presenilin-1 co-immunoprecipitates with SERCA2a. Conclusions Based on these findings we propose that two mechanisms may link protein aggregation and cardiac function: oligomer-induced changes on Ca2+ handling and a direct effect of PSEN1 sequence variants on EC-coupling protein function. PMID:20194882

  19. Hessian regularization based non-negative matrix factorization for gene expression data clustering.

    Science.gov (United States)

    Liu, Xiao; Shi, Jun; Wang, Congzhi

    2015-01-01

    Since a key step in the analysis of gene expression data is to detect groups of genes that have similar expression patterns, clustering techniques are commonly used to analyze gene expression data. Data representation plays an important role in clustering analysis. Non-negative matrix factorization (NMF) is a widely used data representation method with great success in machine learning. Although the traditional manifold regularization method, Laplacian regularization (LR), can improve the performance of NMF, LR still suffers from weak extrapolating power. Hessian regularization (HR) is a newly developed manifold regularization method whose natural properties give it more extrapolating power, especially for small sample data. In this work, we propose the HR-based NMF (HR-NMF) algorithm, and then apply it to represent gene expression data for subsequent clustering. The clustering experiments are conducted on five commonly used gene datasets, and the results indicate that the proposed HR-NMF outperforms LR-based NMF and the original NMF, which suggests the potential application of HR-NMF to gene expression data.
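
    As a rough illustration of the unregularized baseline mentioned in this abstract, the sketch below runs plain NMF with the standard multiplicative updates on a toy matrix and assigns each row to the factor with the largest weight. It is not the HR-NMF algorithm of the paper (no Hessian or Laplacian term is included), and the function names, the toy matrix and the argmax labelling rule are assumptions made for the example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xi - yi) ** 2 for rx, ry in zip(x, wh) for xi, yi in zip(rx, ry))

def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF, X ~ WH with W, H >= 0, via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy "expression" matrix (hypothetical): rows 0-1 and rows 2-3 have distinct patterns.
x = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
w, h = nmf(x, rank=2)
# Assumed labelling rule: each row joins the factor with the largest weight in W.
labels = [max(range(2), key=lambda k: row[k]) for row in w]
print(labels)
```

    A regularized variant would add a penalty term to the objective and change both update rules; that part is deliberately left out here.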

  20. Unveiling network-based functional features through integration of gene expression into protein networks.

    Science.gov (United States)

    Jalili, Mahdi; Gebhardt, Tom; Wolkenhauer, Olaf; Salehzadeh-Yazdi, Ali

    2018-06-01

    Decoding health and disease phenotypes is one of the fundamental objectives in biomedicine. Whereas high-throughput omics approaches are available, it is evident that any single omics approach might not be adequate to capture the complexity of phenotypes. Therefore, integrated multi-omics approaches have been used to unravel genotype-phenotype relationships such as global regulatory mechanisms and complex metabolic networks in different eukaryotic organisms. Some of the progress and challenges associated with integrated omics studies have been reviewed previously in comprehensive studies. In this work, we highlight and review the progress, challenges and advantages associated with emerging approaches, integrating gene expression and protein-protein interaction networks to unravel network-based functional features. This includes identifying disease-related genes, prioritizing genes, clustering protein interactions, developing modules, and extracting active subnetworks and static or dynamic/temporal protein complexes. We also discuss how these approaches contribute to our understanding of the biology of complex traits and diseases. This article is part of a Special Issue entitled: Cardiac adaptations to obesity, diabetes and insulin resistance, edited by Professors Jan F.C. Glatz, Jason R.B. Dyck and Christine Des Rosiers. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Conservation of gene linkage in dispersed vertebrate NK homeobox clusters.

    Science.gov (United States)

    Wotton, Karl R; Weierud, Frida K; Juárez-Morales, José L; Alvares, Lúcia E; Dietrich, Susanne; Lewis, Katharine E

    2009-10-01

    Nk homeobox genes are important regulators of many different developmental processes including muscle, heart, central nervous system and sensory organ development. They are thought to have arisen as part of the ANTP megacluster, which also gave rise to Hox and ParaHox genes, and at least some NK genes remain tightly linked in all animals examined so far. The protostome-deuterostome ancestor probably contained a cluster of nine Nk genes: (Msx)-(Nk4/tinman)-(Nk3/bagpipe)-(Lbx/ladybird)-(Tlx/c15)-(Nk7)-(Nk6/hgtx)-(Nk1/slouch)-(Nk5/Hmx). Of these genes, only NKX2.6-NKX3.1, LBX1-TLX1 and LBX2-TLX2 remain tightly linked in humans. However, it is currently unclear whether this is unique to the human genome as we do not know which of these Nk genes are clustered in other vertebrates. This makes it difficult to assess whether the remaining linkages are due to selective pressures or because chance rearrangements have "missed" certain genes. In this paper, we identify all of the paralogs of these ancestrally clustered NK genes in several distinct vertebrates. We demonstrate that tight linkages of Lbx1-Tlx1, Lbx2-Tlx2 and Nkx3.1-Nkx2.6 have been widely maintained in both the ray-finned and lobe-finned fish lineages. Moreover, the recently duplicated Hmx2-Hmx3 genes are also tightly linked. Finally, we show that Lbx1-Tlx1 and Hmx2-Hmx3 are flanked by highly conserved noncoding elements, suggesting that shared regulatory regions may have resulted in evolutionary pressure to maintain these linkages. Consistent with this, these pairs of genes have overlapping expression domains. In contrast, Lbx2-Tlx2 and Nkx3.1-Nkx2.6, which do not seem to be coexpressed, are also not associated with conserved noncoding sequences, suggesting that an alternative mechanism may be responsible for the continued clustering of these genes.

  2. Cluster editing

    DEFF Research Database (Denmark)

    Böcker, S.; Baumbach, Jan

    2013-01-01

    The Cluster Editing problem asks to transform a graph into a disjoint union of cliques using a minimum number of edge modifications. Although the problem has been proven NP-complete several times, it has nevertheless attracted much research both from the theoretical and the applied side. The problem has been the inspiration for numerous algorithms in bioinformatics, aiming at clustering entities such as genes, proteins, phenotypes, or patients. In this paper, we review exact and heuristic methods that have been proposed for the Cluster Editing problem, and also applications...
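
    To make the problem statement concrete, the following sketch counts the edge insertions and deletions a given partition implies and finds an optimal partition by exhaustive enumeration. It is a minimal illustration for tiny graphs only, not one of the exact or heuristic methods the review covers; the function names and the five-node example graph are invented for this purpose.

```python
from itertools import combinations

def editing_cost(nodes, edges, clusters):
    """Edge insertions plus deletions needed to turn the graph into the
    disjoint union of cliques described by `clusters`."""
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, group in enumerate(clusters) for v in group}
    cost = 0
    for u, v in combinations(nodes, 2):
        same_cluster = label[u] == label[v]
        has_edge = frozenset((u, v)) in edge_set
        if same_cluster != has_edge:  # missing intra-cluster or extra inter-cluster edge
            cost += 1
    return cost

def partitions(items):
    """Yield every set partition of `items` (exponential; tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # `first` forms its own block
        for i in range(len(part)):                  # or joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def solve_exact(nodes, edges):
    """Brute-force optimum: returns (cost, partition)."""
    return min(((editing_cost(nodes, edges, p), p) for p in partitions(list(nodes))),
               key=lambda t: t[0])

# Hypothetical example: a triangle a-b-c attached to the path c-d-e.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
best_cost, best_partition = solve_exact(nodes, edges)
print(best_cost, best_partition)
```

    Here deleting the single edge c-d leaves a triangle and an edge, which are both cliques. The enumeration grows with the Bell numbers, which is why practical solvers rely on the kinds of exact and heuristic techniques surveyed in the paper.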

  3. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Institute for Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  4. A Cluster of Five Genes Essential for the Utilization of Dihydroxamate Xenosiderophores in Synechocystis sp. PCC 6803.

    Science.gov (United States)

    Obando S, Tobias A; Babykin, Michael M; Zinchenko, Vladislav V

    2018-05-21

    The unicellular freshwater cyanobacterium Synechocystis sp. PCC 6803 is capable of using dihydroxamate xenosiderophores, either ferric schizokinen (FeSK) or a siderophore of the filamentous cyanobacterium Anabaena variabilis ATCC 29413 (SAV), as the sole source of iron in a TonB-dependent manner. The fecCDEB1-schT gene cluster, which encodes a siderophore transport system involved in the utilization of FeSK and SAV in Synechocystis sp. PCC 6803, was identified. The gene schT encodes a TonB-dependent outer membrane transporter, whereas the remaining four genes encode the ABC-type transporter FecB1CDE formed by the periplasmic binding protein FecB1, the transmembrane permease proteins FecC and FecD, and the ATPase FecE. Inactivation of any of these genes resulted in the inability of cells to utilize FeSK and SAV. Our data strongly suggest that Synechocystis sp. PCC 6803 can readily internalize Fe-siderophores via the classic TonB-dependent transport system.

  5. Variation in the fumonisin biosynthetic gene cluster in fumonisin-producing and nonproducing black aspergilli.

    Science.gov (United States)

    Susca, Antonia; Proctor, Robert H; Butchko, Robert A E; Haidukowski, Miriam; Stea, Gaetano; Logrieco, Antonio; Moretti, Antonio

    2014-12-01

    The ability to produce fumonisin mycotoxins varies among members of the black aspergilli. Previously, analyses of selected genes in the fumonisin biosynthetic gene (fum) cluster in black aspergilli from California grapes indicated that fumonisin-nonproducing isolates of Aspergillus welwitschiae lack six fum genes, but nonproducing isolates of Aspergillus niger do not. In the current study, analyses of black aspergilli from grapes from the Mediterranean Basin indicate that the genomic context of the fum cluster is the same in isolates of A. niger and A. welwitschiae regardless of fumonisin-production ability and that full-length clusters occur in producing isolates of both species and nonproducing isolates of A. niger. In contrast, the cluster has undergone an eight-gene deletion in fumonisin-nonproducing isolates of A. welwitschiae. Phylogenetic analyses suggest each species consists of a mixed population of fumonisin-producing and nonproducing individuals, and that existence of both production phenotypes may provide a selective advantage to these species. Differences in gene content of fum cluster homologues and phylogenetic relationships of fum genes suggest that the mutation(s) responsible for the nonproduction phenotype differs, and therefore arose independently, in the two species. Partial fum cluster homologues were also identified in genome sequences of four other black Aspergillus species. Gene content of these partial clusters and phylogenetic relationships of fum sequences indicate that non-random partial deletion of the cluster has occurred multiple times among the species. This in turn suggests that an intact cluster and fumonisin production were once more widespread among black aspergilli. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. A hybrid clustering approach to recognition of protein families in 114 microbial genomes

    Directory of Open Access Journals (Sweden)

    Gogarten J Peter

    2004-04-01

    Full Text Available Abstract Background Grouping proteins into sequence-based clusters is a fundamental step in many bioinformatic analyses (e.g., homology-based prediction of structure or function). Standard clustering methods such as single-linkage clustering capture a history of cluster topologies as a function of threshold, but in practice their usefulness is limited because unrelated sequences join clusters before biologically meaningful families are fully constituted, e.g. as the result of matches to so-called promiscuous domains. Use of the Markov Cluster algorithm avoids this non-specificity, but does not preserve topological or threshold information about protein families. Results We describe a hybrid approach to sequence-based clustering of proteins that combines the advantages of standard and Markov clustering. We have implemented this hybrid approach over a relational database environment, and describe its application to clustering a large subset of PDB, and to 328577 proteins from 114 fully sequenced microbial genomes. To demonstrate utility with difficult problems, we show that hybrid clustering allows us to constitute the paralogous family of ATP synthase F1 rotary motor subunits into a single, biologically interpretable hierarchical grouping that was not accessible using either single-linkage or Markov clustering alone. We describe validation of this method by hybrid clustering of PDB and mapping SCOP families and domains onto the resulting clusters. Conclusion Hybrid (Markov followed by single-linkage) clustering combines the advantages of the Markov Cluster algorithm (avoidance of non-specific clusters resulting from matches to promiscuous domains) and single-linkage clustering (preservation of topological information as a function of threshold). Within the individual Markov clusters, single-linkage clustering is a more-precise instrument, discerning sub-clusters of biological relevance. Our hybrid approach thus provides a computationally efficient

  7. HOXA genes cluster: clinical implications of the smallest deletion

    OpenAIRE

    Pezzani, Lidia; Milani, Donatella; Manzoni, Francesca; Baccarin, Marco; Silipigni, Rosamaria; Guerneri, Silvana; Esposito, Susanna

    2015-01-01

    Background HOXA genes cluster plays a fundamental role in embryologic development. Deletion of the entire cluster is known to cause a clinically recognizable syndrome with mild developmental delay, characteristic facies, small feet with unusually short and big halluces, abnormal thumbs, and urogenital malformations. The clinical manifestations may vary with different ranges of deletions of HOXA cluster and flanking regions. Case presentation We report a girl with the smallest deletion reporte...

  8. New gene cluster from the thermophile Bacillus fordii MH602 in the conversion of DL-5-substituted hydantoins to L-amino acids.

    Science.gov (United States)

    Mei, Yan-Zhen; Wan, Yong-Min; He, Bing-Fang; Ying, Han-Jie; Ouyang, Ping-Kai

    2009-12-01

    The thermophile Bacillus fordii MH602 was screened for stereospecific hydrolysis of DL-5-substituted hydantoins to L-alpha-amino acids. Because the reaction runs at higher temperature, it benefits from enhanced substrate solubility and from racemization of DL-5-substituted hydantoins during the conversion. This paper is the first report of a hydantoin metabolism gene cluster from a thermophile. The genes involved in hydantoin utilization (hyu) were isolated on an 8.2 kb DNA fragment by Restriction Site-dependent PCR, and six ORFs were identified by DNA sequence analysis. The hyu gene cluster contained four genes with novel cluster organization characteristics: the hydantoinase gene hyuH, putative transport protein hyuP, hyperprotein hyuHP, and L-carbamoylase gene hyuC. The hyuH and hyuC genes were heterologously expressed in E. coli. The results indicated that hyuH and hyuC are involved in the conversion of DL-5-substituted hydantoins to an N-carbamyl intermediate that is subsequently converted to L-alpha-amino acids. Compared with previously reported hydantoinases and carbamoylases, the hydantoinase and carbamoylase from B. fordii MH602 showed highest identities of 71% and 39%, respectively. The novel cluster organization characteristics and the differences in the key enzymes between the thermophile B. fordii MH602 and mesophiles were presumed to be related to the evolutionary origins of the metabolism concerned.

  9. Lack of Dependence of the Sizes of the Mesoscopic Protein Clusters on Electrostatics.

    Science.gov (United States)

    Vorontsova, Maria A; Chan, Ho Yin; Lubchenko, Vassiliy; Vekilov, Peter G

    2015-11-03

    Protein-rich clusters of steady submicron size and narrow size distribution exist in protein solutions in apparent violation of the classical laws of phase equilibrium. Even though they contain a minor fraction of the total protein, evidence suggests that they may serve as essential precursors for the nucleation of ordered solids such as crystals, sickle-cell hemoglobin polymers, and amyloid fibrils. The cluster formation mechanism remains elusive. We use the highly basic protein lysozyme at nearly neutral and lower pH as a model and explore the response of the cluster population to the electrostatic forces, which govern numerous biophysical phenomena, including crystallization and fibrillization. We tune the strength of intermolecular electrostatic forces by varying the solution ionic strength I and pH and find that despite the weaker repulsion at higher I and pH, the cluster size remains constant. Cluster responses to the presence of urea and ethanol demonstrate that cluster formation is controlled by hydrophobic interactions between the peptide backbones, exposed to the solvent after partial protein unfolding that may lead to transient protein oligomers. These findings reveal that the mechanism of the mesoscopic clusters is fundamentally different from those underlying the two main classes of ordered protein solid phases, crystals and amyloid fibrils, and partial unfolding of the protein chain may play a significant role. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  10. Gene Cluster Responsible for Secretion of and Immunity to Multiple Bacteriocins, the NKR-5-3 Enterocins

    Science.gov (United States)

    Ishibashi, Naoki; Himeno, Kohei; Masuda, Yoshimitsu; Perez, Rodney Honrada; Iwatani, Shun; Wilaipun, Pongtep; Leelawatcharamas, Vichien; Nakayama, Jiro; Sonomoto, Kenji

    2014-01-01

    Enterococcus faecium NKR-5-3, isolated from Thai fermented fish, is characterized by the unique ability to produce five bacteriocins, namely, enterocins NKR-5-3A, -B, -C, -D, and -Z (Ent53A, Ent53B, Ent53C, Ent53D, and Ent53Z). Genetic analysis with a genome library revealed that the bacteriocin structural genes (enkA [ent53A], enkC [ent53C], enkD [ent53D], and enkZ [ent53Z]) that encode these peptides (except for Ent53B) are located in close proximity to each other. This NKR-5-3ACDZ (Ent53ACDZ) enterocin gene cluster (approximately 13 kb long) includes certain bacteriocin biosynthetic genes such as an ABC transporter gene (enkT), two immunity genes (enkIaz and enkIc), a response regulator (enkR), and a histidine protein kinase (enkK). Heterologous-expression studies of enkT and ΔenkT mutant strains showed that enkT is responsible for the secretion of Ent53A, Ent53C, Ent53D, and Ent53Z, suggesting that EnkT is a wide-range ABC transporter that contributes to the effective production of these bacteriocins. In addition, EnkIaz and EnkIc were found to confer self-immunity to the respective bacteriocins. Furthermore, bacteriocin induction assays performed with the ΔenkRK mutant strain showed that EnkR and EnkK are regulatory proteins responsible for bacteriocin production and that, together with Ent53D, they constitute a three-component regulatory system. Thus, the Ent53ACDZ gene cluster is essential for the biosynthesis and regulation of NKR-5-3 enterocins, and this is, to our knowledge, the first report that demonstrates the secretion of multiple bacteriocins by an ABC transporter. PMID:25149515

  11. Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species.

    Directory of Open Access Journals (Sweden)

    Michael B Walker

    Full Text Available Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets, InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.
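
    One simple way to ask whether related genes lie closer together than chance would predict is a permutation test on gene order. The sketch below is only an illustration of that idea under simplifying assumptions (a single chromosome, one family label per gene, proximity measured in gene positions); it is not the procedure used to build the published paracluster datasets, and all names and the toy gene order are hypothetical.

```python
import random

def same_family_neighbors(families, window=1):
    """Count gene pairs at most `window` positions apart that share a family label."""
    count = 0
    n = len(families)
    for i in range(n):
        for j in range(i + 1, min(i + window, n - 1) + 1):
            if families[i] == families[j]:
                count += 1
    return count

def proximity_p_value(families, window=1, permutations=200, seed=0):
    """Permutation test: how often does a shuffled gene order show at least
    as many same-family neighbor pairs as the observed order?"""
    rng = random.Random(seed)
    observed = same_family_neighbors(families, window)
    shuffled = list(families)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if same_family_neighbors(shuffled, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical chromosome: three families, each laid out as one contiguous block.
gene_order = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
observed, p = proximity_p_value(gene_order)
print(observed, p)
```

    A real analysis would also need to handle multiple chromosomes, genes belonging to several families, and physical rather than positional distance.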

  12. Clustering gene expression regulators: new approach to disease subtyping.

    Directory of Open Access Journals (Sweden)

    Mikhail Pyatnitskiy

    Full Text Available One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying disease molecular mechanisms so as to develop a more personalized approach to therapy. Here we propose a novel method for disease subtyping based on analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on the Sub-Network Enrichment Analysis (SNEA) algorithm, which identifies gene subnetworks with significant concordant changes in expression between two conditions. A subnetwork consists of a central regulator and downstream genes connected by relations extracted from a global literature-extracted regulation database. Regulators found in each patient separately are clustered together and assigned activity scores, which are used for the final grouping of patients. We show that our approach performs well compared to other related methods and at the same time provides researchers with a complementary level of understanding of the pathway-level biology behind a disease by identifying significant expression regulators. We observed a reasonable grouping of neuromuscular disorders (triggered by structural damage vs triggered by unknown mechanisms) that was not revealed using standard expression profile clustering. In another experiment we were able to suggest clusters of regulators responsible for colorectal carcinoma vs adenoma discrimination and to identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. The proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes down to dozens of clusters of regulators. The obtained clusters of regulators make it possible to generate valuable biological hypotheses about molecular mechanisms related to a clinical outcome for an individual patient.

  13. Which clustering algorithm is better for predicting protein complexes?

    Directory of Open Access Journals (Sweden)

    Moschopoulos Charalampos N

    2011-12-01

    Full Text Available Abstract Background Protein-Protein interactions (PPI) play a key role in determining the outcome of most cellular processes. The correct identification and characterization of protein interactions and the networks that they comprise is critical for understanding the molecular mechanisms within the cell. Large-scale techniques such as pull down assays and tandem affinity purification are used in order to detect protein interactions in an organism. Today, relatively new high-throughput methods like yeast two hybrid, mass spectrometry, microarrays, and phage display are also used to reveal protein interaction networks. Results In this paper we evaluated four different clustering algorithms using six different interaction datasets. We parameterized the MCL, Spectral, RNSC and Affinity Propagation algorithms and applied them to six PPI datasets produced experimentally by Yeast 2 Hybrid (Y2H) and Tandem Affinity Purification (TAP) methods. The predicted clusters, so called protein complexes, were then compared and benchmarked with already known complexes stored in published databases. Conclusions While results may differ upon parameterization, the MCL and RNSC algorithms seem to be more promising and more accurate at predicting PPI complexes. Moreover, they predict more complexes than other reviewed algorithms in absolute numbers. On the other hand the spectral clustering algorithm achieves the highest valid prediction rate in our experiments. However, it is nearly always outperformed by both RNSC and MCL in terms of the geometrical accuracy, while it generates fewer valid clusters than any other reviewed algorithm. This article demonstrates various metrics to evaluate the accuracy of such predictions as they are presented in the text below. Supplementary material can be found at: http://www.bioacademy.gr/bioinformatics/projects/ppireview.htm
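
    For readers unfamiliar with MCL, one of the four algorithms compared above, the following is a stripped-down sketch of its core loop: alternate expansion (matrix squaring) and inflation (element-wise powering followed by column normalization) on a column-stochastic matrix until flow concentrates on a few attractor nodes. It omits the pruning and convergence checks of production implementations, the added self-loops are a common convention rather than something stated in the abstract, and the toy interaction graph is invented.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, iterations=50):
    """Simplified Markov clustering of an undirected graph given as a 0/1 matrix."""
    n = len(adjacency)
    # Add self-loops, then turn the adjacency matrix into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                    # expansion
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    # Rows that still carry flow belong to attractors; their support is a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters

# Hypothetical interaction graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
pairs = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in pairs:
    adjacency[u][v] = adjacency[v][u] = 1
clusters = mcl(adjacency)
print(sorted(sorted(c) for c in clusters))
```

    The inflation exponent is the main tuning parameter: larger values tend to produce more, smaller clusters, which is consistent with the abstract's remark that results depend on parameterization.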

  14. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Full Text Available Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were identified in Aspergillus for the first time that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic.

  15. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    Science.gov (United States)

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were identified in Aspergillus for the first time that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic. PMID:25706180

  16. ICGE: an R package for detecting relevant clusters and atypical units in gene expression

    Directory of Open Access Journals (Sweden)

    Irigoien Itziar

    2012-02-01

    Full Text Available Abstract Background Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach with which to analyze genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem associated with gene classification is to discern whether the clustering process can find a relevant partition, as well as to identify new gene classes. There are two key aspects to classification: the estimation of the number of clusters, and the decision as to whether a new unit (gene, tumor sample, ...) belongs to one of these previously identified clusters or to a new group. Results ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables, usually found by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group, and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate its use. Conclusions We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.

  17. A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    P.Balaji

    2015-01-01

    Full Text Available Abstract Selecting the most suitable clustering algorithm for a given set of gene expression data is difficult because there are a huge number of candidate methods and a huge number of gene expression profiles. At present, many researchers prefer hierarchical clustering in its various forms, but this is not always optimal. Cluster ensemble research can address this problem by automatically merging multiple data partitions from a wide range of different clusterings to improve both the quality and robustness of the clustering result. However, many existing ensemble approaches use an association matrix to condense sample-cluster and co-occurrence statistics, so relations within the ensemble are captured only at a raw level, while the relations existing among clusters are completely disregarded. Finding these missing associations can greatly expand the capability of those ensemble methodologies for microarray data clustering. We propose a general K-means cluster ensemble approach for clustering general categorical data into the required number of partitions.
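
    To make the "association matrix" idea concrete, here is a minimal sketch of a pairwise co-association matrix, a standard way to condense co-occurrence statistics across an ensemble of partitions. It does not reproduce the link-based method of the paper; the function name `co_association` and the three toy partitions are assumptions for illustration only.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster.

    `partitions` is a list of label lists, one per base clustering, all of the
    same length (one label per sample).
    """
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    total = len(partitions)
    return [[value / total for value in row] for row in counts]


# Three hypothetical base clusterings of four samples.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = co_association(ensemble)
for row in matrix:
    print([round(value, 2) for value in row])
```

    Each entry only records how often two samples were grouped together; it says nothing about how similar two clusters from different partitions are, which is the gap the abstract describes.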

  18. Genes involved in degradation of para-nitrophenol are differentially arranged in form of non-contiguous gene clusters in Burkholderia sp. strain SJ98.

    Directory of Open Access Journals (Sweden)

    Surendra Vikram

    Full Text Available Biodegradation of para-nitrophenol (PNP) proceeds via two distinct pathways, having 1,2,3-benzenetriol (BT) and hydroquinone (HQ) as their respective terminal aromatic intermediates. Genes involved in these pathways have already been studied in different PNP-degrading bacteria. Burkholderia sp. strain SJ98 degrades PNP via both pathways. Earlier, we sequenced and analyzed a ~41 kb fragment from the genomic library of strain SJ98. This DNA fragment was found to harbor all the lower pathway genes; however, genes responsible for the initial transformation of PNP could not be identified within this fragment. Now, we have sequenced and annotated the whole genome of strain SJ98 and found two ORFs (viz., pnpA and pnpB) showing maximum identity at the amino acid level with p-nitrophenol 4-monooxygenase (PnpM) and p-benzoquinone reductase (BqR). Unlike the other PNP gene clusters reported earlier in different bacteria, these two ORFs in the SJ98 genome are physically separated from the other genes of the PNP degradation pathway. In order to ascertain the identity of ORFs pnpA and pnpB, we performed in vitro assays using recombinant proteins heterologously expressed and purified to homogeneity. Purified PnpA was found to be a functional PnpM and transformed PNP into benzoquinone (BQ), while PnpB was found to be a functional BqR which catalyzed the transformation of BQ into hydroquinone (HQ). Notably, PnpM from strain SJ98 could also transform a number of PNP analogues. Based on the above observations, we propose that the genes for PNP degradation in strain SJ98 are arranged differentially in the form of non-contiguous gene clusters. This is the first report of such an arrangement for gene clusters involved in PNP degradation. Therefore, we propose that PNP degradation in strain SJ98 could be an important model system for further studies on differential evolution of PNP degradation functions.

  19. Activation and clustering of a Plasmodium falciparum var gene are affected by subtelomeric sequences.

    Science.gov (United States)

    Duffy, Michael F; Tang, Jingyi; Sumardy, Fransisca; Nguyen, Hanh H T; Selvarajah, Shamista A; Josling, Gabrielle A; Day, Karen P; Petter, Michaela; Brown, Graham V

    2017-01-01

    The Plasmodium falciparum var multigene family encodes the cytoadhesive, variant antigen PfEMP1. P. falciparum antigenic variation and cytoadhesion specificity are controlled by epigenetic switching between the single, or few, simultaneously expressed var genes. Most var genes are maintained in perinuclear clusters of heterochromatic telomeres. The active var gene(s) occupy a single, perinuclear var expression site. It is unresolved whether the var expression site forms in situ at a telomeric cluster or whether it is an extant compartment to which single chromosomes travel, thus controlling var switching. Here we show that transcription of a var gene did not require decreased colocalisation with clusters of telomeres, supporting var expression site formation in situ. However following recombination within adjacent subtelomeric sequences, the same var gene was persistently activated and did colocalise less with telomeric clusters. Thus, participation in stable, heterochromatic, telomere clusters and var switching are independent but are both affected by subtelomeric sequences. The var expression site colocalised with the euchromatic mark H3K27ac to a greater extent than it did with heterochromatic H3K9me3. H3K27ac was enriched within the active var gene promoter even when the var gene was transiently repressed in mature parasites and thus H3K27ac may contribute to var gene epigenetic memory. © 2016 Federation of European Biochemical Societies.

  20. Cloned Erwinia chrysanthemi out genes enable Escherichia coli to selectively secrete a diverse family of heterologous proteins to its milieu.

    Science.gov (United States)

    He, S Y; Lindeberg, M; Chatterjee, A K; Collmer, A

    1991-02-01

    The out genes of the enterobacterial plant pathogen Erwinia chrysanthemi are responsible for the efficient extracellular secretion of multiple plant cell wall-degrading enzymes, including four isozymes of pectate lyase, exo-poly-alpha-D-galacturonosidase, pectin methylesterase, and cellulase. Out- mutants of Er. chrysanthemi are unable to export any of these proteins beyond the periplasm and are severely reduced in virulence. We have cloned out genes from Er. chrysanthemi in the stable, low-copy-number cosmid pCPP19 by complementing several transposon-induced mutations. The cloned out genes were clustered in a 12-kilobase chromosomal DNA region, complemented all existing out mutations in Er. chrysanthemi EC16, and enabled Escherichia coli strains to efficiently secrete the extracellular pectic enzymes produced from cloned Er. chrysanthemi genes, while retaining the periplasmic marker protein beta-lactamase. DNA sequencing of a 2.4-kilobase EcoRI fragment within the out cluster revealed four genes arranged colinearly and sharing substantial similarity with the Klebsiella pneumoniae genes pulH, pulI, pulJ, and pulK, which are necessary for pullulanase secretion. However, K. pneumoniae cells harboring the cloned Er. chrysanthemi pelE gene were unable to secrete the Erwinia pectate lyase. Furthermore, the Er. chrysanthemi Out system was unable to secrete an extracellular pectate lyase encoded by a gene from a closely related plant pathogen, Erwinia carotovora ssp. carotovora. The results suggest that these enterobacteria secrete polysaccharidases by a conserved mechanism whose protein-recognition capacities have diverged.

  1. Physical and genetic map of the major nif gene cluster from Azotobacter vinelandii.

    OpenAIRE

    Jacobson, M R; Brigle, K E; Bennett, L T; Setterquist, R A; Wilson, M S; Cash, V L; Beynon, J; Newton, W E; Dean, D R

    1989-01-01

    Determination of a 28,793-base-pair DNA sequence of a region from the Azotobacter vinelandii genome that includes and flanks the nitrogenase structural gene region was completed. This information was used to revise the previously proposed organization of the major nif cluster. The major nif cluster from A. vinelandii encodes 15 nif-specific genes whose products bear significant structural identity to the corresponding nif-specific gene products from Klebsiella pneumoniae. These genes include ...

  2. The unique fold and lability of the [2Fe-2S] clusters of NEET proteins mediate their key functions in health and disease.

    Science.gov (United States)

    Karmi, Ola; Marjault, Henri-Baptiste; Pesce, Luca; Carloni, Paolo; Onuchic, José N; Jennings, Patricia A; Mittler, Ron; Nechushtai, Rachel

    2018-02-12

    NEET proteins comprise a new class of [2Fe-2S] cluster proteins. In human, three genes encode for NEET proteins: cisd1 encodes mitoNEET (mNT), cisd2 encodes the Nutrient-deprivation autophagy factor-1 (NAF-1) and cisd3 encodes MiNT (Miner2). These recently discovered proteins play key roles in many processes related to normal metabolism and disease. Indeed, NEET proteins are involved in iron, Fe-S, and reactive oxygen homeostasis in cells and play an important role in regulating apoptosis and autophagy. mNT and NAF-1 are homodimeric and reside on the outer mitochondrial membrane. NAF-1 also resides in the membranes of the ER associated mitochondrial membranes (MAM) and the ER. MiNT is a monomer with distinct asymmetry in the molecular surfaces surrounding the clusters. Unlike its paralogs mNT and NAF-1, it resides within the mitochondria. NAF-1 and mNT share similar backbone folds to the plant homodimeric NEET protein (At-NEET), while MiNT's backbone fold resembles a bacterial MiNT protein. Despite the variation of amino acid composition among these proteins, all NEET proteins retained their unique CDGSH domain harboring their unique 3Cys:1His [2Fe-2S] cluster coordination through evolution. The coordinating exposed His was shown to convey the lability to the NEET proteins' [2Fe-2S] clusters. In this minireview, we discuss the NEET fold and its structural elements. Special attention is given to the unique lability of the NEETs' [2Fe-2S] cluster and the implication of the latter to the NEET proteins' cellular and systemic function in health and disease.

  3. GenClust: A genetic algorithm for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Raimondi Alessandra

    2005-12-01

    Full Text Available Abstract Background Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.

  4. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The inherent high-dimension, low-sample-size nature of gene expression data makes the classification task more challenging. Therefore, feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for a classifier not only decreases the computational time and cost, but also improves the classification performance. Many feature selection approaches exist; however, most of them suffer from several problems such as lack of robustness, validation issues, etc. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and percentages; otherwise it is eliminated. Results Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Further applying the same method with more stringent higher positive and lower negative threshold conditions reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between the two classes of leukemia.

  5. Cloning and expression of three thaumatin-like protein genes from Polyporus umbellatus

    Directory of Open Access Journals (Sweden)

    Mengmeng Liu

    2017-05-01

    Full Text Available Genes encoding thaumatin-like proteins (TLPs) are frequently found in fungal genomes. However, information on TLP genes in Polyporus umbellatus is still limited. In this study, three TLP genes were cloned from P. umbellatus. The full-length coding sequences of PuTLP1, PuTLP2 and PuTLP3 were 768, 759 and 561 bp long, respectively, encoding 256, 253 and 187 amino acids. Phylogenetic trees showed that P. umbellatus PuTLP1, PuTLP2 and PuTLP3 were clustered with sequences from Gloeophyllum trabeum, Trametes versicolor and Stereum hirsutum, respectively. The expression levels of the three TLP genes were higher in P. umbellatus with Armillaria mellea infection than in the sclerotia without A. mellea. Furthermore, over-expression of the three PuTLPs was carried out in the Escherichia coli BL21 (DE3) strain, and high quality proteins were obtained using Ni-NTA resin that can be used for preparation of specific antibodies. These results suggest that PuTLP1, PuTLP2 and PuTLP3 in P. umbellatus may be involved in the defense response to A. mellea infections.
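
    The reported lengths can be checked with simple arithmetic: three nucleotides per codon. The sketch below assumes the abstract's amino acid counts are the coding-sequence length divided by three (that is, it does not subtract a stop codon); the helper name `codon_count` is hypothetical.

```python
def codon_count(cds_length_bp):
    """Number of codons in a coding sequence whose length is given in base pairs."""
    if cds_length_bp % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return cds_length_bp // 3


# Lengths (bp) and amino acid counts as stated in the abstract.
reported = {
    "PuTLP1": (768, 256),
    "PuTLP2": (759, 253),
    "PuTLP3": (561, 187),
}

for gene, (length_bp, amino_acids) in reported.items():
    consistent = codon_count(length_bp) == amino_acids
    print(gene, length_bp, "bp /3 =", codon_count(length_bp), "consistent:", consistent)
```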

  6. Acinetobacter baumannii K27 and K44 capsular polysaccharides have the same K unit but different structures due to the presence of distinct wzy genes in otherwise closely related K gene clusters.

    Science.gov (United States)

    Shashkov, Alexander S; Kenyon, Johanna J; Senchenkova, Sof'ya N; Shneider, Mikhail M; Popova, Anastasiya V; Arbatsky, Nikolay P; Miroshnikov, Konstantin A; Volozhantsev, Nikolay V; Hall, Ruth M; Knirel, Yuriy A

    2016-05-01

    Capsular polysaccharides (CPSs) from Acinetobacter baumannii isolates 1432, 4190 and NIPH 70, which have related gene content at the K locus, were examined, and the chemical structures established using 2D ¹H and ¹³C NMR spectroscopy. The three isolates produce the same pentasaccharide repeat unit, which consists of 5-N-acetyl-7-N-[(S)-3-hydroxybutanoyl] (major) or 5,7-di-N-acetyl (minor) derivatives of 5,7-diamino-3,5,7,9-tetradeoxy-D-glycero-D-galacto-non-2-ulosonic (legionaminic) acid (Leg5Ac7R), D-galactose, N-acetyl-D-galactosamine and N-acetyl-D-glucosamine. However, the linkage between repeat units in NIPH 70 was different to that in 1432 and 4190, and this significantly alters the CPS structure. The KL27 gene cluster in 4190 and KL44 gene cluster in NIPH 70 are organized identically and contain lga genes for Leg5Ac7R synthesis, genes for the synthesis of the common sugars, as well as an itrA2 initiating transferase and four glycosyltransferase genes. They share high-level nucleotide sequence identity for corresponding genes, but differ in the wzy gene encoding the Wzy polymerase. The Wzy proteins, which have different lengths and share no similarity, would form the unrelated linkages in the K27 and K44 structures. The linkages formed by the four shared glycosyltransferases were predicted by comparison with gene clusters that synthesize related structures. These findings unambiguously identify the linkages formed by WzyK27 and WzyK44, and show that the presence of different wzy genes in otherwise closely related K gene clusters changes the structure of the CPS. This may affect its capacity as a protective barrier for A. baumannii. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Using hierarchical clustering of secreted protein families to classify and rank candidate effectors of rust fungi.

    Directory of Open Access Journals (Sweden)

    Diane G O Saunders

    Full Text Available Rust fungi are obligate biotrophic pathogens that cause considerable damage on crop plants. Puccinia graminis f. sp. tritici, the causal agent of wheat stem rust, and Melampsora larici-populina, the poplar leaf rust pathogen, have strong deleterious impacts on wheat and poplar wood production, respectively. Filamentous pathogens such as rust fungi secrete molecules called disease effectors that act as modulators of host cell physiology and can suppress or trigger host immunity. Current knowledge on effectors from other filamentous plant pathogens can be exploited for the characterisation of effectors in the genome of recently sequenced rust fungi. We designed a comprehensive in silico analysis pipeline to identify the putative effector repertoire from the genome of two plant pathogenic rust fungi. The pipeline is based on the observation that known effector proteins from filamentous pathogens have at least one of the following properties: (i) contain a secretion signal, (ii) are encoded by in planta induced genes, (iii) have similarity to haustorial proteins, (iv) are small and cysteine rich, (v) contain a known effector motif or a nuclear localization signal, (vi) are encoded by genes with long intergenic regions, (vii) contain internal repeats, and (viii) do not contain PFAM domains, except those associated with pathogenicity. We used Markov clustering and hierarchical clustering to classify protein families of rust pathogens and rank them according to their likelihood of being effectors. Using this approach, we identified eight families of candidate effectors that we consider of high value for functional characterization. This study revealed a diverse set of candidate effectors, including families of haustorial expressed secreted proteins and small cysteine-rich proteins. This comprehensive classification of candidate effectors from these devastating rust pathogens is an initial step towards probing plant germplasm for novel resistance components.

  8. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  9. The complete coenzyme B12 biosynthesis gene cluster of Lactobacillus reuteri CRL 1098

    NARCIS (Netherlands)

    Santos, dos F.; Vera, J.L.; Heijden, van der R.; Valdez, G.F.; Vos, de W.M.; Sesma, F.; Hugenholtz, J.

    2008-01-01

    The coenzyme B12 production pathway in Lactobacillus reuteri has been deduced using a combination of genetic, biochemical and bioinformatics approaches. The coenzyme B12 gene cluster of Lb. reuteri CRL1098 has the unique feature of clustering together the cbi, cob and hem genes. It consists of 29

  10. Structure of the neutral capsular polysaccharide of Acinetobacter baumannii NIPH146 that carries the KL37 capsule gene cluster.

    Science.gov (United States)

    Arbatsky, Nikolay P; Shneider, Mikhail M; Kenyon, Johanna J; Shashkov, Alexander S; Popova, Anastasiya V; Miroshnikov, Konstantin A; Volozhantsev, Nikolay V; Knirel, Yuriy A

    2015-09-02

    Capsular polysaccharide (CPS) was isolated from Acinetobacter baumannii NIPH146, and the following structure of branched pentasaccharide repeating unit was established by sugar analyses along with 1D and 2D NMR spectroscopy: In comparison to most other known capsular polysaccharides of A. baumannii, the CPS studied is neutral and lacks any specific monosaccharide component. The synthesis, assembly and export of this structure could be attributed to genes in a novel capsule biosynthesis gene cluster, designated KL37, which was found in the NIPH146 genome. The CPS of A. baumannii NIPH146 shares the α-d-Galp-(1→6)-β-d-Glcp-(1→3)-d-GalpNAc-(1→ trisaccharide fragment with the CPS units of several A. baumannii strains, including ATCC 17978 and LUH 5537 that carry the KL3 and KL22 gene clusters, respectively. KL37 contains two genes for glycosyltransferases that are related to two glycosyltransferase genes present in both KL3 and KL22, and the encoded proteins could be tentatively assigned to linkages between sugars in the CPS repeat. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Comprehensive identification and clustering of CLV3/ESR-related (CLE) genes in plants finds groups with potentially shared function.

    Science.gov (United States)

    Goad, David M; Zhu, Chuanmei; Kellogg, Elizabeth A

    2017-10-01

    CLV3/ESR (CLE) proteins are important signaling peptides in plants. The short CLE peptide (12-13 amino acids) is cleaved from a larger pre-propeptide and functions as an extracellular ligand. The CLE family is large and has resisted attempts at classification because the CLE domain is too short for reliable phylogenetic analysis and the pre-propeptide is too variable. We used a model-based search for CLE domains from 57 plant genomes and used the entire pre-propeptide for comprehensive clustering analysis. In total, 1628 CLE genes were identified in land plants, with none recognizable from green algae. These CLEs form 12 groups within which CLE domains are largely conserved and pre-propeptides can be aligned. Most clusters contain sequences from monocots, eudicots and Amborella trichopoda, with sequences from Picea abies, Selaginella moellendorffii and Physcomitrella patens scattered in some clusters. We easily identified previously known clusters involved in vascular differentiation and nodulation. In addition, we found a number of discrete groups whose function remains poorly characterized. Available data indicate that CLE proteins within a cluster are likely to share function, whereas those from different clusters play at least partially different roles. Our analysis provides a foundation for future evolutionary and functional studies. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  12. Clustering and visualizing similarity networks of membrane proteins.

    Science.gov (United States)

    Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

    2015-08-01

    We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.
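
    The threshold-based sparsification mentioned in this abstract can be sketched as follows: drop every pair whose distance exceeds a cutoff and treat the remaining connected components as clusters. This is only an illustration of that general idea, not the MSC algorithm or the authors' procedure; the function name `threshold_clusters`, the cutoff of 0.5, and the toy distance matrix are assumptions.

```python
def threshold_clusters(distances, threshold):
    """Connected components of the graph that keeps edges with distance <= threshold."""
    n = len(distances)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if distances[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical pairwise distances between four protein chains.
toy_distances = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(threshold_clusters(toy_distances, threshold=0.5))
```

    In this toy case any cutoff between 0.2 and 0.9 gives the same two groups, which mirrors the notion of a characteristic boundary distance between clusters.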

  13. Resistance gene candidates identified by PCR with degenerate oligonucleotide primers map to clusters of resistance genes in lettuce.

    Science.gov (United States)

    Shen, K A; Meyers, B C; Islam-Faridi, M N; Chin, D B; Stelly, D M; Michelmore, R W

    1998-08-01

    The recent cloning of genes for resistance against diverse pathogens from a variety of plants has revealed that many share conserved sequence motifs. This provides the possibility of isolating numerous additional resistance genes by polymerase chain reaction (PCR) with degenerate oligonucleotide primers. We amplified resistance gene candidates (RGCs) from lettuce with multiple combinations of primers with low degeneracy designed from motifs in the nucleotide binding sites (NBSs) of RPS2 of Arabidopsis thaliana and N of tobacco. Genomic DNA, cDNA, and bacterial artificial chromosome (BAC) clones were successfully used as templates. Four families of sequences were identified that had the same similarity to each other as to resistance genes from other species. The relationship of the amplified products to resistance genes was evaluated by several sequence and genetic criteria. The amplified products contained open reading frames with additional sequences characteristic of NBSs. Hybridization of RGCs to genomic DNA and to BAC clones revealed large numbers of related sequences. Genetic analysis demonstrated the existence of clustered multigene families for each of the four RGC sequences. This parallels classical genetic data on clustering of disease resistance genes. Two of the four families mapped to known clusters of resistance genes; these two families were therefore studied in greater detail. Additional evidence that these RGCs could be resistance genes was gained by the identification of leucine-rich repeat (LRR) regions in sequences adjoining the NBS similar to those in RPM1 and RPS2 of A. thaliana. Fluorescent in situ hybridization confirmed the clustered genomic distribution of these sequences. The use of PCR with degenerate oligonucleotide primers is therefore an efficient method to identify numerous RGCs in plants.

  14. Two Gene Clusters Coordinate Galactose and Lactose Metabolism in Streptococcus gordonii

    Science.gov (United States)

    Zeng, Lin; Martino, Nicole C.

    2012-01-01

    Streptococcus gordonii is an early colonizer of the human oral cavity and an abundant constituent of oral biofilms. Two tandemly arranged gene clusters, designated lac and gal, were identified in the S. gordonii DL1 genome, which encode genes of the tagatose pathway (lacABCD) and sugar phosphotransferase system (PTS) enzyme II permeases. Genes encoding a predicted phospho-β-galactosidase (LacG), a DeoR family transcriptional regulator (LacR), and a transcriptional antiterminator (LacT) were also present in the clusters. Growth and PTS assays supported that the permease designated EIILac transports lactose and galactose, whereas EIIGal transports galactose. The expression of the gene for EIIGal was markedly upregulated in cells growing on galactose. Using promoter-cat fusions, a role for LacR in the regulation of the expressions of both gene clusters was demonstrated, and the gal cluster was also shown to be sensitive to repression by CcpA. The deletion of lacT caused an inability to grow on lactose, apparently because of its role in the regulation of the expression of the genes for EIILac, but had little effect on galactose utilization. S. gordonii maintained a selective advantage over Streptococcus mutans in a mixed-species competition assay, associated with its possession of a high-affinity galactose PTS, although S. mutans could persist better at low pHs. Collectively, these results support the concept that the galactose and lactose systems of S. gordonii are subject to complex regulation and that a high-affinity galactose PTS may be advantageous when S. gordonii is competing against the caries pathogen S. mutans in oral biofilms. PMID:22660715

  15. QTL global meta-analysis: are trait determining genes clustered?

    Directory of Open Access Journals (Sweden)

    Adelson David L

    2009-04-01

    Background: A key open question in biology is whether genes are physically clustered with respect to their known functions or phenotypic effects. This is of particular interest for Quantitative Trait Loci (QTL), where a QTL region could contain a number of genes that contribute to the trait being measured. Results: We observed a significant increase in gene density within QTL regions compared to non-QTL regions and/or the entire bovine genome. By grouping QTL from the Bovine QTL Viewer database into 8 categories of non-redundant regions, we have been able to analyze gene density and gene function distribution, based on Gene Ontology (GO), in relation to their location within QTL regions, outside of QTL regions and across the entire bovine genome. We identified a number of GO terms that were significantly over-represented within particular QTL categories. Furthermore, select GO terms expected to be associated with the QTL category based on common biological knowledge have also proved to be significantly over-represented in QTL regions. Conclusion: Our analysis provides evidence of over-represented GO terms in QTL regions. This increased GO term density indicates possible clustering of gene functions within QTL regions of the bovine genome. Genes with similar functions may be grouped in specific locales and could be contributing to QTL traits. Moreover, we have identified over-represented GO terminology that, from a biological standpoint, makes sense with respect to QTL category type.
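
    The over-representation of a GO term inside a QTL category can be illustrated with a one-sided hypergeometric test. The abstract does not state which test was used, so this is only a minimal sketch under that assumption; the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    of which K are annotated with the GO term of interest."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical counts (not taken from the study): 20000 genes genome-wide,
# 400 annotated with the term, 500 genes inside one QTL category,
# 25 of which carry the term.
expected = 500 * 400 / 20000  # 10 annotated genes expected by chance
p_value = hypergeom_upper_tail(k=25, n=500, K=400, N=20000)
print(f"expected {expected:.0f}, observed 25, p = {p_value:.2e}")
```

    When many GO terms are tested per QTL category, the resulting p-values would also need a multiple-testing correction, which this sketch leaves out.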

  16. MADIBA: A web server toolkit for biological interpretation of Plasmodium and plant gene clusters

    Directory of Open Access Journals (Sweden)

    Louw Abraham I

    2008-02-01

    Background: Microarray technology makes it possible to identify changes in gene expression of an organism under various conditions. Data mining is thus essential for deducing significant biological information such as the identification of new biological mechanisms or putative drug targets. While many algorithms and software packages have been developed for analysing gene expression, the extraction of relevant information from experimental data is still a substantial challenge, requiring significant time and skill. Description: MADIBA (MicroArray Data Interface for Biological Annotation) facilitates the assignment of biological meaning to gene expression clusters by automating the post-processing stage. A relational database has been designed to store the data from gene to pathway for Plasmodium, rice and Arabidopsis. Tools within the web interface allow rapid analyses for the identification of the Gene Ontology terms relevant to each cluster; visualising the metabolic pathways where the genes are implicated, their genomic localisations, putative common transcriptional regulatory elements in the upstream sequences, and an analysis specific to the organism being studied. Conclusion: MADIBA is an integrated, online tool that will assist researchers in interpreting their results and understanding the meaning of the co-expression of a cluster of genes. Functionality of MADIBA was validated by analysing a number of gene clusters from several published experiments: expression profiling of the Plasmodium life cycle, and salt stress treatments of Arabidopsis and rice. In most of the cases, the same conclusions found by the authors were quickly and easily obtained after analysing the gene clusters with MADIBA.

  17. Statistical indicators of collective behavior and functional clusters in gene networks of yeast

    Science.gov (United States)

    Živković, J.; Tadić, B.; Wick, N.; Thurner, S.

    2006-03-01

    We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.
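
    The construction of a correlation network and the emergence of its giant component can be sketched as follows. This is an illustrative sketch, not the authors' code: the expression profiles are invented, Pearson correlation is assumed as the correlation coefficient, and the union-find helper is a hypothetical implementation detail.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def giant_component_size(profiles, threshold):
    """Link genes whose |correlation| >= threshold and return the size
    of the largest connected component (union-find over gene names)."""
    parent = {g: g for g in profiles}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(profiles, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            parent[find(a)] = find(b)

    sizes = {}
    for g in profiles:
        root = find(g)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())

# Invented time series for four hypothetical genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.1, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
for threshold in (0.9999, 0.9, 0.4):
    print(threshold, giant_component_size(profiles, threshold))
```

    Lowering the threshold adds edges, so the largest component can only grow; scanning the threshold in this way is one simple route to locating a percolation-like transition.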

  18. Fast large-scale clustering of protein structures using Gauss integrals

    DEFF Research Database (Denmark)

    Harder, Tim; Borg, Mikael; Boomsma, Wouter

    2011-01-01

    trajectories. Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors – which were introduced by Røgen and co-workers – and subsequently performing K-means clustering. Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated...

  19. Genome-wide identification of physically clustered genes suggests chromatin-level co-regulation in male reproductive development in Arabidopsis thaliana.

    Science.gov (United States)

    Reimegård, Johan; Kundu, Snehangshu; Pendle, Ali; Irish, Vivian F; Shaw, Peter; Nakayama, Naomi; Sundström, Jens F; Emanuelsson, Olof

    2017-04-07

    Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
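
    The idea of finding physical clusters of co-regulated genes can be sketched as a scan along chromosomal gene order. The abstract does not describe how the authors' in-silico tool works, so the function below, its parameters (`min_size`, `max_gap`) and the gene identifiers are all hypothetical.

```python
def physical_clusters(ordered_genes, regulated, min_size=3, max_gap=1):
    """Group co-regulated genes that lie close together on one chromosome.

    ordered_genes: gene IDs in chromosomal order.
    regulated:     set of gene IDs called co-regulated from expression data.
    Consecutive members of a cluster may be separated by at most max_gap
    other genes; clusters with fewer than min_size members are dropped.
    """
    clusters, current, gap = [], [], 0
    for gene in ordered_genes:
        if gene in regulated:
            current.append(gene)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:  # run is broken: keep it only if large enough
                if len(current) >= min_size:
                    clusters.append(current)
                current, gap = [], 0
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

# Hypothetical chromosome with ten genes, five of them co-regulated.
order = [f"g{i}" for i in range(1, 11)]
clusters = physical_clusters(order, {"g2", "g3", "g5", "g8", "g10"})
print(clusters)  # [['g2', 'g3', 'g5']]
```

    A real analysis would also need a null model, for example shuffling the co-regulated labels along the chromosome, to judge whether such runs occur more often than expected by chance.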

  20. Clustering gene expression data based on predicted differential effects of GV interaction.

    Science.gov (United States)

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarrays have become a popular technology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias the identification of groups of genes or samples. In this paper, a statistical method based on mixed model approaches is proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrate the application of our method with a gene expression dataset and demonstrate the utility of our approach using an external validation.

  1. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases

    Directory of Open Access Journals (Sweden)

    Ma'ayan Avi

    2007-10-01

    Background: In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Results: Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three-color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Conclusion: Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.

  2. Transporter’s evolution and carbohydrate metabolic clusters

    NARCIS (Netherlands)

    Plantinga, Titia H.; Does, Chris van der; Driessen, Arnold J.M.

    2004-01-01

    The yiaQRS genes of Escherichia coli K-12 are involved in carbohydrate metabolism. Clustering of homologous genes was found throughout several unrelated bacteria. Strikingly, all four bacterial transport protein classes were found, conserving transport function but not mechanism. It appears that

  3. Transcriptional regulation of gene expression clusters in motor neurons following spinal cord injury

    Directory of Open Access Journals (Sweden)

    Westerdahl Ann-Charlotte

    2010-06-01

    Background: Spinal cord injury leads to neurological dysfunctions affecting the motor, sensory as well as the autonomic systems. Increased excitability of motor neurons has been implicated in injury-induced spasticity, where the reappearance of self-sustained plateau potentials in the absence of modulatory inputs from the brain correlates with the development of spasticity. Results: Here we examine the dynamic transcriptional response of motor neurons to spinal cord injury as it evolves over time to unravel common gene expression patterns and their underlying regulatory mechanisms. For this we use a rat-tail-model with complete spinal cord transection causing injury-induced spasticity, where gene expression profiles are obtained from labeled motor neurons extracted with laser microdissection 0, 2, 7, 21 and 60 days post injury. Consensus clustering identifies 12 gene clusters with distinct time expression profiles. Analysis of these gene clusters identifies early immunological/inflammatory and late developmental responses as well as a regulation of genes relating to neuron excitability that support the development of motor neuron hyper-excitability and the reappearance of plateau potentials in the late phase of the injury response. Transcription factor motif analysis identifies differentially expressed transcription factors involved in the regulation of each gene cluster, shaping the expression of the identified biological processes and their associated genes underlying the changes in motor neuron excitability. Conclusions: This analysis provides important clues to the underlying mechanisms of transcriptional regulation responsible for the increased excitability observed in motor neurons in the late chronic phase of spinal cord injury suggesting alternative targets for treatment of spinal cord injury. Several transcription factors were identified as potential regulators of gene clusters containing elements related to motor neuron hyper

  4. Transcriptional regulation of gene expression clusters in motor neurons following spinal cord injury.

    Science.gov (United States)

    Ryge, Jesper; Winther, Ole; Wienecke, Jacob; Sandelin, Albin; Westerdahl, Ann-Charlotte; Hultborn, Hans; Kiehn, Ole

    2010-06-09

    Spinal cord injury leads to neurological dysfunctions affecting the motor, sensory as well as the autonomic systems. Increased excitability of motor neurons has been implicated in injury-induced spasticity, where the reappearance of self-sustained plateau potentials in the absence of modulatory inputs from the brain correlates with the development of spasticity. Here we examine the dynamic transcriptional response of motor neurons to spinal cord injury as it evolves over time to unravel common gene expression patterns and their underlying regulatory mechanisms. For this we use a rat-tail-model with complete spinal cord transection causing injury-induced spasticity, where gene expression profiles are obtained from labeled motor neurons extracted with laser microdissection 0, 2, 7, 21 and 60 days post injury. Consensus clustering identifies 12 gene clusters with distinct time expression profiles. Analysis of these gene clusters identifies early immunological/inflammatory and late developmental responses as well as a regulation of genes relating to neuron excitability that support the development of motor neuron hyper-excitability and the reappearance of plateau potentials in the late phase of the injury response. Transcription factor motif analysis identifies differentially expressed transcription factors involved in the regulation of each gene cluster, shaping the expression of the identified biological processes and their associated genes underlying the changes in motor neuron excitability. This analysis provides important clues to the underlying mechanisms of transcriptional regulation responsible for the increased excitability observed in motor neurons in the late chronic phase of spinal cord injury suggesting alternative targets for treatment of spinal cord injury. Several transcription factors were identified as potential regulators of gene clusters containing elements related to motor neuron hyper-excitability, the manipulation of which potentially could be

  5. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  6. Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters – Towards Identification of Novel Secondary Metabolisms from Filamentous Fungi -

    Directory of Open Access Journals (Sweden)

    Myco Umemura

    2015-05-01

    Secondary metabolites are produced mostly by clustered genes that are essential to their biosynthesis. The transcriptional expression of these genes is often cooperatively regulated by a transcription factor located inside or close to a cluster. Most of the secondary metabolism biosynthesis (SMB) gene clusters identified to date contain so-called core genes with distinctive sequence features, such as polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS). Recent efforts in sequencing fungal genomes have revealed far more SMB gene clusters than expected based on the number of core genes in the genomes. Several bioinformatics tools have been developed to survey SMB gene clusters using the sequence motif information of the core genes, including SMURF and antiSMASH. More recently, with the development of sequencing techniques that yield large-scale genomic and transcriptomic data, motif-independent prediction methods for SMB gene clusters, including MIDDAS-M, have been developed. Most of these methods detect clusters in which the genes are cooperatively regulated at the transcriptional level, thus allowing the identification of novel SMB gene clusters regardless of the presence of the core genes. Another type of method, MIPS-CG, uses the characteristics of SMB genes, which are highly enriched in non-syntenic blocks (NSBs), enabling prediction even without transcriptome data, although the results have not been evaluated in detail. Considering that a large portion of SMB gene clusters might be sufficiently expressed only in limited, uncommon conditions, it seems that prediction of SMB gene clusters by bioinformatics followed by experimental validation is the only way to efficiently uncover hidden SMB gene clusters. Here, we describe and discuss possible novel approaches for the determination of SMB gene clusters that have not been identified using conventional methods.

  7. Clusters of proteins in bio-membranes: insights into the roles of interaction potential shapes and of protein diversity

    OpenAIRE

    Meilhac, Nicolas; Destainville, Nicolas

    2011-01-01

    It has recently been proposed that proteins embedded in lipidic bio-membranes can spontaneously self-organize into stable small clusters, or membrane nano-domains, due to the competition between short-range attractive and longer-range repulsive forces between proteins, specific to these systems. In this paper, we carry on our investigation, by Monte Carlo simulations, of different aspects of cluster phases of proteins in bio-membranes. First, we compare different long-range potentials (includ...

  8. The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster

    Science.gov (United States)

    Bushley, Kathryn E.; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S.; Nonogaki, Mariko; Boyd, Alexander E.; Owensby, C. Alisha; Knaus, Brian J.; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L.; Spatafora, Joseph W.

    2013-01-01

    The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes for cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, which is a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high levels of expression of the gene cluster cyclophilin. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role

  9. A genome-wide analysis of nonribosomal peptide synthetase gene clusters and their peptides in a Planktothrix rubescens strain

    Directory of Open Access Journals (Sweden)

    Nederbragt Alexander J

    2009-08-01

    Background: Cyanobacteria often produce several different oligopeptides, with unknown biological functions, by nonribosomal peptide synthetases (NRPS). Although some cyanobacterial NRPS gene cluster types are well described, the entire NRPS genomic content within a single cyanobacterial strain has never been investigated. Here we have combined a genome-wide analysis using massive parallel pyrosequencing ("454") and mass spectrometry screening of oligopeptides produced in the strain Planktothrix rubescens NIVA CYA 98 in order to identify all putative gene clusters for oligopeptides. Results: Thirteen types of oligopeptides were uncovered by mass spectrometry (MS) analyses. Microcystin, cyanopeptolin and aeruginosin synthetases, highly similar to already characterized NRPS, were present in the genome. Two novel NRPS gene clusters were associated with production of anabaenopeptins and microginins, respectively. Sequence-depth of the genome and real-time PCR data revealed three copies of the microginin gene cluster. Since NRPS gene cluster candidates for microviridin and oscillatorin synthesis could not be found, putative (gene-encoded) precursor peptide sequences to microviridin and oscillatorin were found in the genes mdnA and oscA, respectively. The genes flanking the microviridin and oscillatorin precursor genes encode putative modifying enzymes of the precursor oligopeptides. We therefore propose ribosomal pathways involving modifications and cyclisation for microviridin and oscillatorin. The microviridin, anabaenopeptin and cyanopeptolin gene clusters are situated in close proximity to each other, constituting an oligopeptide island. Conclusion: Altogether seven nonribosomal peptide synthetase (NRPS) gene clusters and two gene clusters putatively encoding ribosomal oligopeptide biosynthetic pathways were revealed. Our results demonstrate that whole genome shotgun sequencing combined with MS-directed determination of oligopeptides successfully

  10. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data

    Directory of Open Access Journals (Sweden)

    Scherer Stephen W

    2011-05-01

    Background: Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. Results: We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. Conclusions: The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  11. Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

    Science.gov (United States)

    Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

    2009-04-21

    To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease
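
    The back-translation step described above, choosing synonymous codons from a usage table subject to a minimal usage threshold, can be sketched as follows. This is not Gene Composer's algorithm: the usage fractions are illustrative values rather than a real organism's table, the function name is hypothetical, and G:C content targets and other sequence features are ignored.

```python
# Illustrative codon usage fractions per amino acid ("*" is the stop signal).
# These numbers are assumptions for the example, not a published usage table.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "E": {"GAA": 0.68, "GAG": 0.32},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}

def back_translate(protein, usage=CODON_USAGE, min_usage=0.10):
    """For each residue, pick the most frequently used codon among those
    whose usage fraction is at least min_usage."""
    dna = []
    for residue in protein:
        allowed = {c: f for c, f in usage[residue].items() if f >= min_usage}
        if not allowed:
            raise ValueError(f"no codon for {residue!r} passes the threshold")
        dna.append(max(allowed, key=allowed.get))
    return "".join(dna)

gene = back_translate("MKLE*")
print(gene)  # ATGAAACTGGAATAA
```

    Always taking the single most used codon is the simplest possible policy; a tool that also has to hit a G:C target or avoid restriction sites would instead need to choose among all codons that pass the threshold.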

  12. Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

    Directory of Open Access Journals (Sweden)

    Mixon Mark

    2009-04-01

    Full Text Available Abstract Background To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. Results An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. 
Conclusion We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene
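
    The protein-to-gene step described above (back-translation against a codon usage table with a minimum usage threshold) can be sketched as follows. This is a simplified illustration, not Gene Composer's algorithm: the table values, the names `CODON_USAGE`, `back_translate` and `gc_percent`, and the greedy "most frequent codon" rule are all assumptions made for the example, and the sketch does not tune G:C content or other sequence features.

```python
# Hypothetical codon usage table: amino acid -> {codon: relative frequency}.
# The frequencies are illustrative placeholders, not a real organism's table.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}


def back_translate(protein, usage=CODON_USAGE, min_frequency=0.10):
    """Greedy back-translation: for each residue, choose the most frequent
    codon whose usage is at or above min_frequency."""
    codons = []
    for residue in protein:
        allowed = {c: f for c, f in usage[residue].items() if f >= min_frequency}
        if not allowed:
            raise ValueError(
                f"no codon for {residue!r} meets threshold {min_frequency}"
            )
        codons.append(max(allowed, key=allowed.get))
    return "".join(codons)


def gc_percent(dna):
    """Percentage of G and C bases in a DNA string."""
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


gene = back_translate("MKL*")
print(gene, gc_percent(gene))
```

    A fuller design tool would choose among the allowed synonymous codons to reach a target G:C content or to avoid unwanted sequence features, rather than always taking the most frequent codon.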

  13. antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth

    2015-01-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products...

  14. Gene structure and expression characteristic of a novel odorant receptor gene cluster in the parasitoid wasp Microplitis mediator (Hymenoptera: Braconidae).

    Science.gov (United States)

    Wang, S-N; Shan, S; Zheng, Y; Peng, Y; Lu, Z-Y; Yang, Y-Q; Li, R-J; Zhang, Y-J; Guo, Y-Y

    2017-08-01

    Odorant receptors (ORs) expressed in the antennae of parasitoid wasps are responsible for detection of various lipophilic airborne molecules. In the present study, 107 novel OR genes were identified from Microplitis mediator antennal transcriptome data. Phylogenetic analysis of the set of OR genes from M. mediator and Microplitis demolitor revealed that M. mediator OR (MmedOR) genes can be classified into different subfamilies, and the majority of MmedORs in each subfamily shared high sequence identities and clear orthologous relationships to M. demolitor ORs. Within a subfamily, six MmedOR genes, MmedOR98, 124, 125, 126, 131 and 155, shared a similar gene structure and were tightly linked in the genome. To evaluate whether the clustered MmedOR genes share common regulatory features, the transcription profile and expression characteristics of the six closely related OR genes were investigated in M. mediator. Rapid amplification of cDNA ends-PCR experiments revealed that the OR genes within the cluster were transcribed as single mRNAs, and a bicistronic mRNA for two adjacent genes (MmedOR124 and MmedOR98) was also detected in female antennae by reverse transcription PCR. In situ hybridization experiments indicated that each OR gene within the cluster was expressed in a different number of cells. Moreover, there was no co-expression of the two highly related OR genes, MmedOR124 and MmedOR98, which appeared to be individually expressed in a distinct population of neurons. Overall, there were distinct expression profiles of closely related MmedOR genes from the same cluster in M. mediator. These data provide a basic understanding of the olfactory coding in parasitoid wasps. © 2017 The Royal Entomological Society.

  15. Variation in sequence and location of the fumonisin mycotoxin biosynthetic gene cluster in Fusarium

    NARCIS (Netherlands)

    Proctor, R.H.; Hove, van F.; Susca, A.; Stea, A.; Busman, M.; Lee, van der T.A.J.; Waalwijk, C.; Moretti, A.

    2010-01-01

    In Fusarium, the ability to produce fumonisins is governed by a 17-gene fumonisin biosynthetic gene (FUM) cluster. Here, we examined the cluster in F. oxysporum strain O-1890 and nine other species selected to represent a wide range of the genetic diversity within the GFSC.

  16. Genome-wide identification, subcellular localization and gene expression analysis of the members of CESA gene family in common tobacco (Nicotiana tabacum L.).

    Science.gov (United States)

    Xu, Zong-Chang; Kong, Yingzhen

    2017-06-20

    Cellulose-synthase proteins (CESAs) are membrane-localized proteins that form protein complexes to produce cellulose in the plasma membrane. CESA proteins play very important roles in cell wall construction during plant growth and development. In this study, a total of 21 NtCESA gene sequences were identified by using the PF03552 conserved protein sequence and 10 AtCESA protein sequences of Arabidopsis thaliana to search the common tobacco (Nicotiana tabacum L.) genome database with the TBLASTN protocol. We analyzed the physical and chemical properties of the protein sequences using software and online analysis tools. The results showed that there were no significant differences in the physical and chemical properties of the 21 NtCESA proteins. First, phylogenetic tree analysis showed that the 21 NtCESA genes and 10 AtCESA genes clustered into five groups, and the gene structures were similar among genes clustered into the same group. Second, in all 21 NtCESA proteins, the conserved zinc finger domain was identified in the N-terminus, transmembrane domains were identified in the C-terminus, and the DDD-QXXRW conserved domains were also identified. Third, gene expression analysis indicated that most NtCESA genes were expressed in roots and leaves of seedling or mature tissues of tobacco, seeds and callus tissues. The genes that clustered into the same group share similar expression patterns. Importantly, NtCESA proteins that are involved in secondary cell wall cellulose synthesis have two extra transmembrane domains compared with those involved in primary cell wall cellulose biosynthesis. In addition, subcellular localization results showed that NtCESA9 and NtCESA14 were two plasma membrane anchored proteins. This study will lay a foundation for further functional characterization of these NtCESA genes.

  17. Combining random gene fission and rational gene fusion to discover near-infrared fluorescent protein fragments that report on protein-protein interactions.

    Science.gov (United States)

    Pandey, Naresh; Nobles, Christopher L; Zechiedrich, Lynn; Maresso, Anthony W; Silberg, Jonathan J

    2015-05-15

    Gene fission can convert monomeric proteins into two-piece catalysts, reporters, and transcription factors for systems and synthetic biology. However, some proteins can be challenging to fragment without disrupting function, such as near-infrared fluorescent protein (IFP). We describe a directed evolution strategy that can overcome this challenge by randomly fragmenting proteins and concomitantly fusing the protein fragments to pairs of proteins or peptides that associate. We used this method to create libraries that express fragmented IFP as fusions to a pair of associating peptides (IAAL-E3 and IAAL-K3) and proteins (CheA and CheY) and screened for fragmented IFP with detectable near-infrared fluorescence. Thirteen novel fragmented IFPs were identified, all of which arose from backbone fission proximal to the interdomain linker. Either the IAAL-E3 and IAAL-K3 peptides or CheA and CheY proteins could assist with IFP fragment complementation, although the IAAL-E3 and IAAL-K3 peptides consistently yielded higher fluorescence. These results demonstrate how random gene fission can be coupled to rational gene fusion to create libraries enriched in fragmented proteins with AND gate logic that is dependent upon a protein-protein interaction, and they suggest that these near-infrared fluorescent protein fragments will be suitable as reporters for pairs of promoters and protein-protein interactions within whole animals.

  18. Sequencing, physical organization and kinetic expression of the patulin biosynthetic gene cluster from Penicillium expansum

    International Nuclear Information System (INIS)

    Tannous, J.; El Khoury, R.; El Khoury, A.; Lteif, R.; Snini, S.; Lippi, Y.; Oswald, I.; Olivier, P.; Atoui, A.

    2014-01-01

    Patulin is a polyketide-derived mycotoxin produced by numerous filamentous fungi. Among them, Penicillium expansum is by far the most problematic species. This fungus is a destructive phytopathogen capable of growing on fruit, provoking the blue mold decay of apples and producing significant amounts of patulin. The biosynthetic pathway of this mycotoxin is chemically well-characterized, but its genetic bases remain largely unknown, with only a few genes characterized in less economically relevant species. The present study consisted of the identification and positional organization of the patulin gene cluster in P. expansum strain NRRL 35695. Several amplification reactions were performed with degenerate primers that were designed based on sequences from the orthologous genes available in other species. An improved genome walking approach was used to sequence the remaining adjacent genes of the cluster. RACE-PCR was also carried out from mRNAs to determine the start and stop codons of the coding sequences. The patulin gene cluster in P. expansum consists of 15 genes in the following order: patH, patG, patF, patE, patD, patC, patB, patA, patM, patN, patO, patL, patI, patJ, and patK. These genes share 60–70% identity with orthologous genes, grouped differently, within a putative patulin cluster described in a non-producing strain of Aspergillus clavatus. The kinetics of patulin cluster gene expression was studied under patulin-permissive conditions (natural apple-based medium) and patulin-restrictive conditions (Eagle's minimal essential medium), and demonstrated a significant association between gene expression and patulin production. In conclusion, the sequence of the patulin cluster in P. expansum constitutes a key step for a better understanding of the mechanisms leading to patulin production in this fungus. It will allow the role of each gene to be elucidated, and help to define strategies to reduce patulin production in apple-based products

  19. Proteins in similarity relationship with the cluster - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Gclust Server. Data name: Proteins in similarity relationship with the cluster. DOI: 10.18908/lsdba.nbdc00464-003.

  20. Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study.

    Directory of Open Access Journals (Sweden)

    Jason C Slot

    Full Text Available High affinity nitrate assimilation genes in fungi occur in a cluster (fHANT-AC) that can be coordinately regulated. The clustered genes include nrt2, which codes for a high affinity nitrate transporter; euknr, which codes for nitrate reductase; and NAD(P)H-nir, which codes for nitrite reductase. Homologs of genes in the fHANT-AC occur in other eukaryotes and prokaryotes, but they have only been found clustered in the oomycete Phytophthora (heterokonts). We performed independent and concatenated phylogenetic analyses of homologs of all three genes in the fHANT-AC. Phylogenetic analyses limited to fungal sequences suggest that the fHANT-AC has been transferred horizontally from a basidiomycete (mushrooms and smuts) to an ancestor of the ascomycetous mold Trichoderma reesei. Phylogenetic analyses of sequences from diverse eukaryotes and eubacteria, and cluster structure, are consistent with a hypothesis that the fHANT-AC was assembled in a lineage leading to the oomycetes and was subsequently transferred to the Dikarya (Ascomycota+Basidiomycota), which is a derived fungal clade that includes the vast majority of terrestrial fungi. We propose that the acquisition of high affinity nitrate assimilation contributed to the success of Dikarya on land by allowing exploitation of nitrate in aerobic soils, and the subsequent transfer of a complete assimilation cluster improved the fitness of T. reesei in a new niche. Horizontal transmission of this cluster of functionally integrated genes supports the "selfish operon" hypothesis for maintenance of gene clusters.

  1. Comparison of Expression of Secondary Metabolite Biosynthesis Cluster Genes in Aspergillus flavus, A. parasiticus, and A. oryzae

    OpenAIRE

    Ehrlich, Kenneth C.; Mack, Brian M.

    2014-01-01

    Fifty six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. In spite of this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help ...

  2. Increasing Power by Sharing Information from Genetic Background and Treatment in Clustering of Gene Expression Time Series

    OpenAIRE

    Sura Zaki Alrashid; Muhammad Arifur Rahman; Nabeel H Al-Aaraji; Neil D Lawrence; Paul R Heath

    2018-01-01

    Clustering of gene expression time series gives insight into which genes may be co-regulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different conditions or genetic background. This paper develops a new clustering method that allows each cluster to be parameterised according to whether the behaviour of the genes across conditions is correlated or anti-correlated. By specifying correlati...

  3. Increasing Power by Sharing Information from Genetic Background and Treatment in Clustering of Gene Expression Time Series

    Directory of Open Access Journals (Sweden)

    Sura Zaki Alrashid

    2018-02-01

    Full Text Available Clustering of gene expression time series gives insight into which genes may be co-regulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different conditions or genetic background. This paper develops a new clustering method that allows each cluster to be parameterised according to whether the behaviour of the genes across conditions is correlated or anti-correlated. By specifying correlation between such genes, more information is gained within the cluster about how the genes interrelate. Amyotrophic lateral sclerosis (ALS) is an irreversible neurodegenerative disorder that kills the motor neurons and results in death within 2 to 3 years from symptom onset. The speed of progression is heterogeneous across patients, with significant variability. The SOD1G93A transgenic mice from different backgrounds (129Sv and C57) showed consistent phenotypic differences in disease progression. A hierarchy of Gaussian processes is used to model condition-specific and gene-specific temporal covariances. This study identified significant gene expression profiles and clusters of associated or co-regulated genes across four groups of data (SOD1G93A and Ntg from 129Sv and C57 backgrounds). Our study shows the effectiveness of sharing information between replicates and different model conditions when modelling gene expression time series. Further gene enrichment score analysis and ontology pathway analysis of some specified clusters for a particular group may lead toward identifying features underlying the differential speed of disease progression.
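
    The idea of parameterising a cluster by whether gene behaviour across conditions is correlated or anti-correlated can be illustrated with a toy covariance construction. This is not the paper's hierarchical model: it is a minimal sketch that couples two conditions through a single assumed correlation parameter `rho` and a squared-exponential kernel over time. The function names and all parameter values are hypothetical.

```python
import math


def rbf(t1, t2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def two_condition_covariance(times, rho, variance=1.0, lengthscale=1.0):
    """Joint covariance of one expression profile observed under two conditions.

    rho > 0 models correlated behaviour across the conditions and
    rho < 0 models anti-correlated behaviour. The result is the Kronecker
    product of [[1, rho], [rho, 1]] with the temporal kernel matrix.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    coupling = [[1.0, rho], [rho, 1.0]]
    n = len(times)
    cov = [[0.0] * (2 * n) for _ in range(2 * n)]
    for a in range(2):          # condition of the row block
        for b in range(2):      # condition of the column block
            for i, ti in enumerate(times):
                for j, tj in enumerate(times):
                    cov[a * n + i][b * n + j] = (
                        coupling[a][b] * rbf(ti, tj, variance, lengthscale)
                    )
    return cov


# Anti-correlated conditions: the cross-condition block is negative.
cov = two_condition_covariance([0.0, 1.0, 2.0], rho=-0.8)
print(cov[0][3])
```

    Rows and columns 0 to 2 refer to the first condition and 3 to 5 to the second, so `cov[0][3]` is the covariance between the two conditions at the first time point.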

  4. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of form knowledge for pan-ethnic-group products depends primarily on a designer's subjective experience, without user participation. Most studies focus on detecting consumers' perceptual demands for the target product category. A form gene clustering method for pan-ethnic-group products, based on emotional semantics, is constructed. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer aided product form clustering technology. A case study of form gene clustering for typical pan-ethnic-group products indicates that the method is feasible. This paper opens up a new direction for the future development of product form design, one that improves the agility of the product design process in the era of Industry 4.0.

  5. Automatically identifying gene/protein terms in MEDLINE abstracts.

    Science.gov (United States)

    Yu, Hong; Hatzivassiloglou, Vasileios; Rzhetsky, Andrey; Wilbur, W John

    2002-01-01

    Natural language processing (NLP) techniques are used to extract information automatically from computer-readable literature. In biology, the identification of terms corresponding to biological substances (e.g., genes and proteins) is a necessary step that precedes the application of other NLP systems that extract biological information (e.g., protein-protein interactions, gene regulation events, and biochemical pathways). We have developed GPmarkup (for "gene/protein-full name mark up"), a software system that automatically identifies gene/protein terms (i.e., symbols or full names) in MEDLINE abstracts. As part of the markup process, we also automatically generated a knowledge source of paired gene/protein symbols and full names (e.g., LARD for lymphocyte associated receptor of death) from MEDLINE. We found that many of the pairs in our knowledge source do not appear in the current GenBank database. Therefore, our methods may also be used for automatic lexicon generation. GPmarkup has 73% recall and 93% precision in identifying and marking up gene/protein terms in MEDLINE abstracts. A random sample of gene/protein symbols and full names and a sample set of marked up abstracts can be viewed at http://www.cpmc.columbia.edu/homepages/yuh9001/GPmarkup/. Contact: hy52@columbia.edu. Voice: 212-939-7028; fax: 212-666-0140.
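
    For readers comparing term-recognition systems, the reported recall and precision can be combined into a single F1 score (their harmonic mean). The abstract does not report F1 itself. The short sketch below derives it from the two published figures, and the counts passed to `precision_recall` are hypothetical values chosen only to show how the two rates are computed.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall from raw term-level counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract: 93% precision, 73% recall.
reported_f1 = f1_score(precision=0.93, recall=0.73)
print(round(reported_f1, 3))

# Hypothetical counts, for illustration only.
p, r = precision_recall(true_positives=93, false_positives=7, false_negatives=34)
print(round(p, 2), round(r, 2))
```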

  6. Comparison of expression of secondary metabolite biosynthesis cluster genes in Aspergillus flavus, A. parasiticus, and A. oryzae.

    Science.gov (United States)

    Ehrlich, Kenneth C; Mack, Brian M

    2014-06-23

    Fifty six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. In spite of this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help to refine the identification of probable functional gene clusters within these species. Our results suggest that A. flavus, a prevalent contaminant of maize, cottonseed, peanuts and tree nuts, is capable of producing metabolites which, besides aflatoxin, could be an underappreciated contributor to its toxicity.

  7. A recently transferred cluster of bacterial genes in Trichomonas vaginalis - lateral gene transfer and the fate of acquired genes

    Science.gov (United States)

    2014-01-01

    Background Lateral Gene Transfer (LGT) has recently gained recognition as an important contributor to some eukaryote proteomes, but the mechanisms of acquisition and fixation in eukaryotic genomes are still uncertain. A previously defined norm for LGTs in microbial eukaryotes states that the majority are genes involved in metabolism, the LGTs are typically localized one by one, surrounded by vertically inherited genes on the chromosome, and phylogenetics shows that a broad collection of bacterial lineages have contributed to the transferome. Results A unique 34 kbp long fragment with 27 clustered genes (TvLF) of prokaryote origin was identified in the sequenced genome of the protozoan parasite Trichomonas vaginalis. Using a PCR based approach we confirmed the presence of the orthologous fragment in four additional T. vaginalis strains. Detailed sequence analyses unambiguously suggest that TvLF is the result of one single, recent LGT event. The proposed donor is a close relative to the firmicute bacterium Peptoniphilus harei. High nucleotide sequence similarity between T. vaginalis strains, as well as to P. harei, and the absence of homologs in other Trichomonas species, suggests that the transfer event took place after the radiation of the genus Trichomonas. Some genes have undergone pseudogenization and degradation, indicating that they may not be retained in the future. Functional annotations reveal that genes involved in informational processes are particularly prone to degradation. Conclusions We conclude that, although the majority of eukaryote LGTs are single gene occurrences, they may be acquired in clusters of several genes that are subsequently cleansed of evolutionarily less advantageous genes. PMID:24898731

  8. Homo-FRET imaging as a tool to quantify protein and lipid clustering.

    Science.gov (United States)

    Bader, Arjen N; Hoetzl, Sandra; Hofman, Erik G; Voortman, Jarno; van Bergen en Henegouwen, Paul M P; van Meer, Gerrit; Gerritsen, Hans C

    2011-02-25

    Homo-FRET, Förster resonance energy transfer between identical fluorophores, can be conveniently measured by observing its effect on the fluorescence anisotropy. This review aims to summarize the possibilities of fluorescence anisotropy imaging techniques to investigate clustering of identical proteins and lipids. Homo-FRET imaging has the ability to determine distances between fluorophores. In addition, it can be employed to quantify cluster sizes as well as cluster size distributions. The interpretation of homo-FRET signals is complicated by the fact that both the mutual orientations of the fluorophores and the number of fluorophores per cluster affect the fluorescence anisotropy in a similar way. The properties of the fluorescence probes are very important. Taking these properties into account is critical for the correct interpretation of homo-FRET signals in protein- and lipid-clustering studies. This is exemplified by studies on the clustering of the lipid raft markers GPI and K-ras, as well as on EGF receptor clustering in the plasma membrane. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
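
    The quantity on which homo-FRET measurements rest, the fluorescence anisotropy, is computed from the emission intensities polarized parallel and perpendicular to the excitation. The sketch below uses the standard steady-state definition. The function name and the default `g_factor` (an instrument correction for unequal detection of the two polarizations) are assumptions for illustration, and the sketch does not model how cluster size lowers the anisotropy.

```python
def anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """Steady-state fluorescence anisotropy.

    r = (I_par - G * I_perp) / (I_par + 2 * G * I_perp)
    """
    total = i_parallel + 2.0 * g_factor * i_perpendicular
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - g_factor * i_perpendicular) / total


# Strongly polarized emission: little depolarization by energy transfer.
print(anisotropy(3.0, 1.0))

# Fully depolarized emission: both channels equal, so r is zero.
print(anisotropy(1.0, 1.0))
```

    In an imaging setting the same expression would be applied per pixel to the two polarization channels, and a drop in r relative to a monomeric reference would be read as a sign of clustering.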

  9. Identification of new genes in a cell envelope-cell division gene cluster of Escherichia coli: cell envelope gene murG.

    Science.gov (United States)

    Salmond, G P; Lutkenhaus, J F; Donachie, W D

    1980-01-01

    We report the identification, cloning, and mapping of a new cell envelope gene, murG. This lies in a group of five genes of similar phenotype (in the order murE murF murG murC ddl) all concerned with peptidoglycan biosynthesis. This group is in a larger cluster of at least 10 genes, all of which are involved in some way with cell envelope growth. PMID:6998962

  10. Interaction between Nbp35 and Cfd1 proteins of cytosolic Fe-S cluster assembly reveals a stable complex formation in Entamoeba histolytica.

    Directory of Open Access Journals (Sweden)

    Shadab Anwar

    Full Text Available Iron-Sulfur (Fe-S) proteins are involved in many biological functions such as electron transport, photosynthesis, regulation of gene expression and enzymatic activities. Biosynthesis and transfer of Fe-S clusters depend on Fe-S cluster assembly processes such as the ISC, SUF, NIF, and CIA systems. Unlike other eukaryotes, which possess ISC and CIA systems, amitochondriate Entamoeba histolytica has retained NIF and CIA systems for Fe-S cluster assembly in the cytosol. In the present study, we have elucidated the interaction between two proteins of the E. histolytica CIA system, Cytosolic Fe-S cluster deficient 1 (Cfd1) protein and Nucleotide binding protein 35 (Nbp35). In silico analysis showed that structural regions spanning amino acid residues P33-K35, G131-V135 and I147-E151 of Nbp35 and G5-V6, M34-D39 and G46-A52 of Cfd1 are involved in the formation of the protein-protein complex. Furthermore, a molecular dynamics (MD) simulation study suggested that hydrophobic forces surpass hydrophilic forces between Nbp35 and Cfd1 and that van der Waals interactions play a crucial role in the formation of a stable complex. Both proteins were separately cloned, expressed as recombinant fusion proteins in E. coli and purified to homogeneity by affinity column chromatography. Physical interaction between Nbp35 and Cfd1 proteins was confirmed in vitro by co-purification of recombinant Nbp35 with thrombin digested Cfd1 and in vivo by pull down assay and immunoprecipitation. The in silico, in vitro and in vivo results prove a stable interaction between these two proteins, supporting the possibility of its involvement in Fe-S cluster transfer to target apo-proteins through CIA machinery in E. histolytica. Our study indicates that initial synthesis of a Fe-S precursor in mitochondria is not necessary for the formation of the Cfd1-Nbp35 complex. Thus, Cfd1 and Nbp35, with the help of cytosolic NifS and NifU proteins, can participate in the maturation of non-mitosomal Fe-S proteins

  11. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.

    Science.gov (United States)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H

    2015-07-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. A genome-wide analysis of the flax (Linum usitatissimum L.) dirigent protein family: from gene identification and evolution to differential regulation.

    Energy Technology Data Exchange (ETDEWEB)

    Corbin, Cyrielle; Drouet, Samantha; Markulin, Lucija; Auguin, Daniel; Laine, Eric; Davin, Laurence B.; Cort, John R.; Lewis, Norman G.; Hano, Christophe

    2018-04-30

    Identification of DIR encoding genes in flax genome. Analysis of phylogeny, gene/protein structures and evolution. Identification of new conserved motifs linked to biochemical functions. Investigation of spatio-temporal gene expression and response to stress. Dirigent proteins (DIRs) were discovered during 8-8' lignan biosynthesis studies, through identification of stereoselective coupling to afford either (+)- or (-)-pinoresinols from E-coniferyl alcohol. DIRs are also involved or potentially involved in terpenoid, allyl/propenyl phenol lignan, pterocarpan and lignin biosynthesis. DIRs have very large multigene families in different vascular plants including flax, with most still of unknown function. DIR studies typically focus on a small subset of genes and identification of biochemical/physiological functions. Herein, a genome-wide analysis and characterization of the predicted flax DIR 44-membered multigene family was performed, this species being a rich natural grain source of 8-8' linked secoisolariciresinol-derived lignan oligomers. All predicted DIR sequences, including their promoters, were analyzed together with their public gene expression datasets. Expression patterns of selected DIRs were examined using qPCR, as well as through clustering analysis of DIR gene expression. These analyses further implicated roles for specific DIRs in (-)-pinoresinol formation in seed-coats, as well as (+)-pinoresinol in vegetative organs and/or specific responses to stress. Phylogeny and gene expression analysis segregated flax DIRs into six distinct clusters with new cluster-specific motifs identified. We propose that these findings can serve as a foundation to further systematically determine functions of DIRs, i.e. other than those already known in lignan biosynthesis in flax and other species. Given the differential expression profiles and inducibility of the flax DIR family, we provisionally propose that some DIR genes of unknown function could be involved

  13. A genome-wide analysis of the flax (Linum usitatissimum L.) dirigent protein family: from gene identification and evolution to differential regulation.

    Science.gov (United States)

    Corbin, Cyrielle; Drouet, Samantha; Markulin, Lucija; Auguin, Daniel; Lainé, Éric; Davin, Laurence B; Cort, John R; Lewis, Norman G; Hano, Christophe

    2018-05-01

    Identification of DIR encoding genes in flax genome. Analysis of phylogeny, gene/protein structures and evolution. Identification of new conserved motifs linked to biochemical functions. Investigation of spatio-temporal gene expression and response to stress. Dirigent proteins (DIRs) were discovered during 8-8' lignan biosynthesis studies, through identification of stereoselective coupling to afford either (+)- or (-)-pinoresinols from E-coniferyl alcohol. DIRs are also involved or potentially involved in terpenoid, allyl/propenyl phenol lignan, pterocarpan and lignin biosynthesis. DIRs have very large multigene families in different vascular plants including flax, with most still of unknown function. DIR studies typically focus on a small subset of genes and identification of biochemical/physiological functions. Herein, a genome-wide analysis and characterization of the predicted flax DIR 44-membered multigene family was performed, this species being a rich natural grain source of 8-8' linked secoisolariciresinol-derived lignan oligomers. All predicted DIR sequences, including their promoters, were analyzed together with their public gene expression datasets. Expression patterns of selected DIRs were examined using qPCR, as well as through clustering analysis of DIR gene expression. These analyses further implicated roles for specific DIRs in (-)-pinoresinol formation in seed-coats, as well as (+)-pinoresinol in vegetative organs and/or specific responses to stress. Phylogeny and gene expression analysis segregated flax DIRs into six distinct clusters with new cluster-specific motifs identified. We propose that these findings can serve as a foundation to further systematically determine functions of DIRs, i.e. other than those already known in lignan biosynthesis in flax and other species. Given the differential expression profiles and inducibility of the flax DIR family, we provisionally propose that some DIR genes of unknown function could be involved in

  14. Evolution and Diversity of Biosynthetic Gene Clusters in Fusarium

    Directory of Open Access Journals (Sweden)

    Koen Hoogendoorn

    2018-06-01

    Full Text Available Plant pathogenic fungi in the Fusarium genus cause severe damage to crops, resulting in great financial losses and health hazards. Specialized metabolites synthesized by these fungi are known to play key roles in the infection process, and to provide survival advantages inside and outside the host. However, systematic studies of the evolution of specialized metabolite-coding potential across Fusarium have been scarce. Here, we apply a combination of bioinformatic approaches to identify biosynthetic gene clusters (BGCs) across publicly available genomes from Fusarium, to group them into annotated families and to study gain/loss events of BGC families throughout the history of the genus. Comparison with MIBiG reference BGCs allowed assignment of 29 gene cluster families (GCFs) to pathways responsible for the production of known compounds, while for 57 GCFs, the molecular products remain unknown. Comparative analysis of BGC repertoires using ancestral state reconstruction raised several new hypotheses on how BGCs contribute to Fusarium pathogenicity or host specificity, sometimes surprisingly so: for example, a gene cluster for the biosynthesis of hexadehydro-astechrome was identified in the genome of the biocontrol strain Fusarium oxysporum Fo47, while being absent in that of the tomato pathogen F. oxysporum f.sp. lycopersici. Several BGCs were also identified on supernumerary chromosomes; heterologous expression of genes for three terpene synthases encoded on the Fusarium poae supernumerary chromosome and subsequent GC/MS analysis showed that these genes are functional and encode enzymes that each are able to synthesize koraiol; this observed functional redundancy supports the hypothesis that localization of copies of BGCs on supernumerary chromosomes provides freedom for evolutionary innovations to occur, while the original function remains conserved. Altogether, this systematic overview of biosynthetic diversity in Fusarium paves the way for

  15. Transcription of two adjacent carbohydrate utilization gene clusters in Bifidobacterium breve UCC2003 is controlled by LacI- and repressor open reading frame kinase (ROK)-type regulators.

    Science.gov (United States)

    O'Connell, Kerry Joan; Motherway, Mary O'Connell; Liedtke, Andrea; Fitzgerald, Gerald F; Paul Ross, R; Stanton, Catherine; Zomer, Aldert; van Sinderen, Douwe

    2014-06-01

    Members of the genus Bifidobacterium are commonly found in the gastrointestinal tracts of mammals, including humans, where their growth is presumed to be dependent on various diet- and/or host-derived carbohydrates. To understand transcriptional control of bifidobacterial carbohydrate metabolism, we investigated two genetic carbohydrate utilization clusters dedicated to the metabolism of raffinose-type sugars and melezitose. Transcriptomic and gene inactivation approaches revealed that the raffinose utilization system is positively regulated by an activator protein, designated RafR. The gene cluster associated with melezitose metabolism was shown to be subject to direct negative control by a LacI-type transcriptional regulator, designated MelR1, in addition to apparent indirect negative control by means of a second LacI-type regulator, MelR2. In silico analysis, DNA-protein interaction, and primer extension studies revealed the MelR1 and MelR2 operator sequences, each of which is positioned just upstream of or overlapping the correspondingly regulated promoter sequences. Similar analyses identified the RafR binding operator sequence located upstream of the rafB promoter. This study indicates that transcriptional control of gene clusters involved in carbohydrate metabolism in bifidobacteria is subject to conserved regulatory systems, representing either positive or negative control.

  16. Discovering disease-associated genes in weighted protein-protein interaction networks

    Science.gov (United States)

    Cui, Ying; Cai, Meng; Stanley, H. Eugene

    2018-04-01

    Although there have been many network-based attempts to discover disease-associated genes, most of them have not taken edge weight - which quantifies their relative strength - into consideration. We use connection weights in a protein-protein interaction (PPI) network to locate disease-related genes. We analyze the topological properties of both weighted and unweighted PPI networks and design an improved random forest classifier to distinguish disease genes from non-disease genes. We use a cross-validation test to confirm that weighted networks are better able to discover disease-associated genes than unweighted networks, which indicates that including link weight in the analysis of network properties provides a better model of complex genotype-phenotype associations.
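    The contrast between weighted and unweighted network properties in the abstract above can be illustrated with a minimal sketch. It is not the authors' method or classifier: it only compares plain degree with weighted degree (node strength) on a hypothetical toy network, and the protein names, weights and the function name node_strength are assumptions made for illustration.

```python
from collections import defaultdict


def node_strength(edges):
    """Return (degree, strength) dicts for an undirected weighted edge list.

    degree counts incident edges; strength sums their weights.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in edges:
        degree[u] += 1
        degree[v] += 1
        strength[u] += w
        strength[v] += w
    return dict(degree), dict(strength)


# Hypothetical toy PPI network: names and confidence weights are made up.
toy_edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.2),
    ("P2", "P3", 0.5),
    ("P3", "P4", 0.1),
]

degree, strength = node_strength(toy_edges)
for protein in sorted(degree):
    print(protein, degree[protein], round(strength[protein], 2))
```

    In this toy network P3 has the most interaction partners but a lower summed weight than P1 or P2, which is the kind of distinction an unweighted analysis cannot see.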

  17. Acquisition and evolution of plant pathogenesis-associated gene clusters and candidate determinants of tissue-specificity in xanthomonas.

    Directory of Open Access Journals (Sweden)

    Hong Lu

    Full Text Available Xanthomonas is a large genus of plant-associated and plant-pathogenic bacteria. Collectively, members cause diseases on over 392 plant species. Individually, they exhibit marked host- and tissue-specificity. The determinants of this specificity are unknown. To assess potential contributions to host- and tissue-specificity, pathogenesis-associated gene clusters were compared across genomes of eight Xanthomonas strains representing vascular or non-vascular pathogens of rice, brassicas, pepper and tomato, and citrus. The gum cluster for extracellular polysaccharide is conserved except for gumN and sequences downstream. The xcs and xps clusters for type II secretion are conserved, except in the rice pathogens, in which xcs is missing. In the otherwise conserved hrp cluster, sequences flanking the core genes for type III secretion vary with respect to insertion sequence element and putative effector gene content. Variation at the rpf (regulation of pathogenicity factors) cluster is more pronounced, though genes with established functional relevance are conserved. A cluster for synthesis of lipopolysaccharide varies highly, suggesting multiple horizontal gene transfers and reassortments, but this variation does not correlate with host- or tissue-specificity. Phylogenetic trees based on amino acid alignments of gum, xps, xcs, hrp, and rpf cluster products generally reflect strain phylogeny. However, amino acid residues at four positions correlate with tissue specificity, revealing hpaA and xpsD as candidate determinants. Examination of genome sequences of xanthomonads Xylella fastidiosa and Stenotrophomonas maltophilia revealed that the hrp, gum, and xcs clusters are recent acquisitions in the Xanthomonas lineage. Our results provide insight into the ancestral Xanthomonas genome and indicate that differentiation with respect to host- and tissue-specificity involved not major modifications or wholesale exchange of clusters, but subtle changes in a small

  18. MeSH key terms for validation and annotation of gene expression clusters

    Energy Technology Data Exchange (ETDEWEB)

    Rechtsteiner, A. (Andreas); Rocha, L. M. (Luis Mateus)

    2004-01-01

    Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in the validation and evaluation of the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task to find the biological significance of the results and validate them has so far mostly fallen to the biological expert who had to perform this task manually. One of the most promising avenues to develop automated and integrative technology for such tasks lies in the application of modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases with biomedical publications and data. Examples of databases available for the field are bibliographic databases containing scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses the MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms have similar meaning) that would be encountered if terms would be extracted directly from the articles due to differing article contexts or author preferences and background. Further, the hierarchical organization of the MeSH terms can illustrate the conceptual/functional relationships of genes

  19. A remarkably stable TipE gene cluster: evolution of insect Para sodium channel auxiliary subunits

    Directory of Open Access Journals (Sweden)

    Li Jia

    2011-11-01

    Full Text Available Abstract Background First identified in fruit flies with temperature-sensitive paralysis phenotypes, the Drosophila melanogaster TipE locus encodes four voltage-gated sodium (NaV) channel auxiliary subunits. This cluster of TipE-like genes on chromosome 3L, and a fifth family member on chromosome 3R, are important for the optional expression and functionality of the Para NaV channel but appear quite distinct from auxiliary subunits in vertebrates. Here, we exploited available arthropod genomic resources to trace the origin of TipE-like genes by mapping their evolutionary histories and examining their genomic architectures. Results We identified a remarkably conserved synteny block of TipE-like orthologues with well-maintained local gene arrangements from 21 insect species. Homologues in the water flea, Daphnia pulex, suggest an ancestral pancrustacean repertoire of four TipE-like genes; a subsequent gene duplication may have generated functional redundancy allowing gene losses in the silk moth and mosquitoes. Intronic nesting of the insect TipE gene cluster probably occurred following the divergence from crustaceans, but in the flour beetle and silk moth genomes the clusters apparently escaped from nesting. Across Pancrustacea, TipE gene family members have experienced intronic nesting, escape from nesting, retrotransposition, translocation, and gene loss events while generally maintaining their local gene neighbourhoods. D. melanogaster TipE-like genes exhibit coordinated spatial and temporal regulation of expression distinct from their host gene but well-correlated with their regulatory target, the Para NaV channel, suggesting that functional constraints may preserve the TipE gene cluster. We identified homology between TipE-like NaV channel regulators and vertebrate Slo-beta auxiliary subunits of big-conductance calcium-activated potassium (BKCa) channels, which suggests that ion channel regulatory partners have evolved distinct lineage

  20. Diverse and Abundant Secondary Metabolism Biosynthetic Gene Clusters in the Genomes of Marine Sponge Derived Streptomyces spp. Isolates

    Directory of Open Access Journals (Sweden)

    Stephen A. Jackson

    2018-02-01

    Full Text Available The genus Streptomyces produces secondary metabolic compounds that are rich in biological activity. Many of these compounds are genetically encoded by large secondary metabolism biosynthetic gene clusters (smBGCs) such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS), which are modular and can be highly repetitive. Due to the repeats, these gene clusters can be difficult to resolve using short read next generation datasets and are often quite poorly predicted using standard approaches. We have sequenced the genomes of 13 Streptomyces spp. strains isolated from shallow water and deep-sea sponges that display antimicrobial activities against a number of clinically relevant bacterial and yeast species. Draft genomes have been assembled and smBGCs have been identified using the antiSMASH (antibiotics and Secondary Metabolite Analysis Shell) web platform. We have compared the smBGCs amongst strains in the search for novel sequences conferring the potential to produce novel bioactive secondary metabolites. The strains in this study recruit to four distinct clades within the genus Streptomyces. The marine strains host abundant smBGCs which encode polyketides, NRPS, siderophores, bacteriocins and lantipeptides. The deep-sea strains appear to be enriched with gene clusters encoding NRPS. Marine adaptations are evident in the sponge-derived strains which are enriched for genes involved in the biosynthesis and transport of compatible solutes and for heat-shock proteins. Streptomyces spp. from marine environments are a promising source of novel bioactive secondary metabolites as the abundance and diversity of smBGCs show high degrees of novelty. Sponge derived Streptomyces spp. isolates appear to display genomic adaptations to marine living when compared to terrestrial strains.

  1. Clustering Gene Expression Time Series with Coregionalization: Speed propagation of ALS

    OpenAIRE

    Rahman, Muhammad Arifur; Heath, Paul R.; Lawrence, Neil D.

    2018-01-01

    Clustering of gene expression time series gives insight into which genes may be coregulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different model conditions or genetic background. Amyotrophic lateral sclerosis (ALS), an irreversible diverse neurodegenerative disorder showed consistent phenotypic differences and the disease progression is heterogeneous with significant variability. Thi...

  2. Targeting protein-protein interaction between MLL1 and reciprocal proteins for leukemia therapy.

    Science.gov (United States)

    Wang, Zhi-Hui; Li, Dong-Dong; Chen, Wei-Lin; You, Qi-Dong; Guo, Xiao-Ke

    2018-01-15

    The mixed lineage leukemia protein-1 (MLL1), as a lysine methyltransferase, predominantly regulates the methylation of histone H3 lysine 4 (H3K4) and functions in hematopoietic stem cell (HSC) self-renewal. The MLL1 gene fuses with partner genes, resulting in the generation of MLL1 fusion proteins (MLL1-FPs), which are frequently detected in acute leukemia. In the progress of leukemogenesis, many proteins cooperate with MLL1 to form multiprotein complexes that drive the dysregulation of H3K4 methylation, the overexpression of homeobox (HOX) cluster genes, and the consequent generation of leukemia. Hence, disrupting the interactions between MLL1 and the reciprocal proteins has been considered to be a new treatment strategy for leukemia. Here, we reviewed potential protein-protein interactions (PPIs) between MLL1 and its reciprocal proteins, and summarized the inhibitors to target MLL1 PPIs. The druggability of MLL1 PPIs for leukemia was also discussed. Copyright © 2017. Published by Elsevier Ltd.

  3. Self Organizing Maps to efficiently cluster and functionally interpret protein conformational ensembles

    Directory of Open Access Journals (Sweden)

    Fabio Stella

    2013-09-01

    Full Text Available An approach that combines Self-Organizing maps, hierarchical clustering and network components is presented, aimed at comparing protein conformational ensembles obtained from multiple Molecular Dynamics simulations. As a first result, the original ensembles can be summarized by using only the representative conformations of the clusters obtained. In addition, the network components analysis makes it possible to discover and interpret the dynamic behavior of the conformations won by each neuron. The results showed the ability of this approach to efficiently derive a functional interpretation of the protein dynamics described by the original conformational ensemble, highlighting its potential as a support for protein engineering.

  4. Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome

    Science.gov (United States)

    Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.

    2018-03-01

    The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA to which ParB binds specifically, and spreads from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.

  5. Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).

    Science.gov (United States)

    Natale, D A; Shankavaram, U T; Galperin, M Y; Wolf, Y I; Aravind, L; Koonin, E V

    2000-01-01

    Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and

  6. Dynamic Change in p63 Protein Expression during Implantation of Urothelial Cancer Clusters

    Directory of Open Access Journals (Sweden)

    Takahiro Yoshida

    2015-07-01

    Full Text Available Although the dissemination of urothelial cancer cells is supposed to be a major cause of the multicentricity of urothelial tumors, the mechanism of implantation has not been well investigated. Here, we found that cancer cell clusters from the urine of patients with urothelial cancer retain the ability to survive, grow, and adhere. By using cell lines and primary cells collected from multiple patients, we demonstrate that ΔNp63α protein in cancer cell clusters was rapidly decreased through proteasomal degradation when clusters were attached to the matrix, leading to downregulation of E-cadherin and upregulation of N-cadherin. Decreased ΔNp63α protein level in urothelial cancer cell clusters was involved in the clearance of the urothelium. Our data provide the first evidence that clusters of urothelial cancer cells exhibit dynamic changes in ΔNp63α expression during attachment to the matrix, and decreased ΔNp63α protein plays a critical role in the interaction between cancer cell clusters and the urothelium. Thus, because ΔNp63α might be involved in the process of intraluminal dissemination of urothelial cancer cells, blocking the degradation of ΔNp63α could be a target of therapy to prevent the dissemination of urothelial cancer.

  7. Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.

    Science.gov (United States)

    Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2015-01-01

    Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of the functional sites at the protein level and at the level of the exon-intron structure of the coding gene remains a relevant problem. One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. We observed a tendency to cluster among the exons that code functional sites, which could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling. These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and

  8. The drug target genes show higher evolutionary conservation than non-target genes.

    Science.gov (United States)

    Lv, Wenhua; Xu, Yongdeng; Guo, Yiying; Yu, Ziqi; Feng, Guanglong; Liu, Panpan; Luan, Meiwei; Zhu, Hongjie; Liu, Guiyou; Zhang, Mingming; Lv, Hongchao; Duan, Lian; Shang, Zhenwei; Li, Jin; Jiang, Yongshuai; Zhang, Ruijie

    2016-01-26

    Although evidence indicates that drug target genes share some common evolutionary features, there have been few studies analyzing evolutionary features of drug targets from an overall level. Therefore, we conducted an analysis which aimed to investigate the evolutionary characteristics of drug target genes. We compared the evolutionary conservation between human drug target genes and non-target genes by combining both the evolutionary features and network topological properties in the human protein-protein interaction network. The evolution rate, conservation score and the percentage of orthologous genes of 21 species were included in our study. Meanwhile, four topological features including the average shortest path length, betweenness centrality, clustering coefficient and degree were considered for comparison analysis. We obtained four results: compared with non-drug target genes, 1) drug target genes had lower evolutionary rates; 2) drug target genes had higher conservation scores; 3) drug target genes had higher percentages of orthologous genes and 4) drug target genes had a tighter network structure including higher degrees, betweenness centrality, clustering coefficients and lower average shortest path lengths. These results demonstrate that drug target genes are more evolutionarily conserved than non-drug target genes. We hope that our study will provide valuable information for other researchers who are interested in evolutionary conservation of drug targets.
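    Two of the topological features named in the abstract above, degree and clustering coefficient, can be sketched on a small unweighted graph. This is a minimal illustration under stated assumptions, not the study's pipeline: the adjacency dictionary toy_adj and the function name clustering_coefficient are hypothetical, and the graph is assumed undirected with symmetric adjacency sets.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adj maps each node to the set of its neighbours. The coefficient is the
    fraction of neighbour pairs that are themselves connected.
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to connect
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy interaction graph (symmetric adjacency sets).
toy_adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

for node in sorted(toy_adj):
    print(node, len(toy_adj[node]), round(clustering_coefficient(toy_adj, node), 3))
```

    Here the degree is simply the size of each neighbour set; node A has the highest degree but a lower clustering coefficient than B or C because only one of its three neighbour pairs is connected.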

  9. Genome-wide evolutionary characterization and expression analyses of major latex protein (MLP) family genes in Vitis vinifera.

    Science.gov (United States)

    Zhang, Ningbo; Li, Ruimin; Shen, Wei; Jiao, Shuzhen; Zhang, Junxiang; Xu, Weirong

    2018-04-27

    The major latex protein/ripening-related protein (MLP/RRP) subfamily is known to be involved in a wide range of biological processes of plant development and various stress responses. However, the biological function of MLP/RRP proteins is still far from being clear and identification of them may provide important clues for understanding their roles. Here, we report a genome-wide evolutionary characterization and gene expression analysis of the MLP family in European Vitis species. A total of 14 members were found in the grape genome, all of which are located on chromosome 1, where they are predominantly arranged in tandem clusters. We have noticed, most surprisingly, promoter-sharing by several non-identical but highly similar gene members to a greater extent than expected by chance. Synteny analysis between the grape and Arabidopsis thaliana genomes suggested that 3 grape MLP genes arose before the divergence of the two species. Phylogenetic analysis provided further insights into the evolutionary relationship between the genes, as well as their putative functions, and tissue-specific expression analysis suggested distinct biological roles for different members. Our expression data suggested a couple of candidate genes involved in abiotic stresses and phytohormone responses. The present work provides new insight into the evolution and regulation of Vitis MLP genes, which represent targets for future studies and inclusion in tolerance-related molecular breeding programs.

  10. Nearly complete mitogenome of hairy sawfly, Corynis lateralis (Brullé, 1832) (Hymenoptera: Cimbicidae): rearrangements in the IQM and ARNS1EF gene clusters.

    Science.gov (United States)

    Doğan, Özgül; Korkmaz, E Mahir

    2017-10-01

    The Cimbicidae is a small family of the primitive and relatively less diverse suborder Symphyta (Hymenoptera). Here, nearly complete mitochondrial genome (mitogenome) of hairy sawfly, Corynis lateralis (Hymenoptera: Cimbicidae) was sequenced using next generation sequencing and comparatively analysed with the mitogenome of Trichiosoma anthracinum. The sequenced length of C. lateralis mitogenome was 14,899 bp with an A+T content of 80.60%. All protein coding genes (PCGs) are initiated by ATN codons and all are terminated with TAR or T- stop codon. All tRNA genes preferred usual anticodons. Compared with the inferred insect ancestral mitogenome, two tRNA rearrangements were observed in the IQM and ARNS1EF gene clusters, representing a new event not previously reported in Symphyta. An illicit priming of replication and/or intra/inter-mitochondrial recombination and TDRL seem to be responsible mechanisms for the rearrangement events in these gene clusters. Phylogenetic analyses confirmed the position of Corynis within Cimbicidae and recovered a relationship of Tenthredinoidea + (Cephoidea + Orussoidea) in Symphyta.
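    The A+T content reported in the abstract above is a simple base-composition statistic. A minimal sketch of how such a percentage can be computed follows; the function name at_content and the example sequences are assumptions for illustration, and ambiguous bases such as N are excluded from the denominator, which may differ from the convention the authors used.

```python
def at_content(sequence):
    """Return the A+T percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; case is ignored.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)


# Hypothetical short fragments, not taken from the C. lateralis mitogenome.
print(round(at_content("ATATGC"), 2))
print(round(at_content("aattN"), 2))
```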

  11. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2012-12-01

    Full Text Available Abstract Background Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea. Results The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer. Conclusions The updated collection

  12. Patterns of genetic diversity and differentiation in resistance gene clusters of two hybridizing European Populus species

    OpenAIRE

    Casey, Céline; Stölting, Kai N.; Barbará, Thelma; González-Martínez, Santiago C.; Lexer, Christian

    2015-01-01

    Resistance genes (R-genes) are essential for long-lived organisms such as forest trees, which are exposed to diverse herbivores and pathogens. In short-lived model species, R-genes have been shown to be involved in species isolation. Here, we studied more than 400 trees from two natural hybrid zones of the European Populus species Populus alba and Populus tremula for microsatellite markers located in three R-gene clusters, including one cluster situated in the incipient sex chromosome region....

  13. Mouse Nkrp1-Clr gene cluster sequence and expression analyses reveal conservation of tissue-specific MHC-independent immunosurveillance.

    Directory of Open Access Journals (Sweden)

    Qiang Zhang

    Full Text Available The Nkrp1 (Klrb1)-Clr (Clec2) genes encode a receptor-ligand system utilized by NK cells as an MHC-independent immunosurveillance strategy for innate immune responses. The related Ly49 family of MHC-I receptors displays extreme allelic polymorphism and haplotype plasticity. In contrast, previous BAC-mapping and aCGH studies in the mouse suggest the neighboring and related Nkrp1-Clr cluster is evolutionarily stable. To definitively compare the relative evolutionary rate of Nkrp1-Clr vs. Ly49 gene clusters, the Nkrp1-Clr gene clusters from two Ly49 haplotype-disparate inbred mouse strains, BALB/c and 129S6, were sequenced. Both Nkrp1-Clr gene cluster sequences are highly similar to the C57BL/6 reference sequence, displaying the same gene numbers and order, complete pseudogenes, and gene fragments. The Nkrp1-Clr clusters contain a strikingly dissimilar proportion of repetitive elements compared to the Ly49 clusters, suggesting that certain elements may be partly responsible for the highly disparate Ly49 vs. Nkrp1 evolutionary rate. Focused allelic polymorphisms were found within the Nkrp1b/d (Klrb1b), Nkrp1c (Klrb1c), and Clr-c (Clec2f) genes, suggestive of possible immune selection. Cell-type specific transcription of Nkrp1-Clr genes in a large panel of tissues/organs was determined. Clr-b (Clec2d) and Clr-g (Clec2i) showed wide expression, while other Clr genes showed more tissue-specific expression patterns. In situ hybridization revealed specific expression of various members of the Clr family in leukocytes/hematopoietic cells of immune organs, various tissue-restricted epithelial cells (including intestinal, kidney tubular, lung, and corneal progenitor epithelial cells), as well as myocytes. In summary, the Nkrp1-Clr gene cluster appears to evolve more slowly relative to the related Ly49 cluster, and likely regulates innate immunosurveillance in a tissue-specific manner.

  14. Functional characterization of diverse ring-hydroxylating oxygenases and induction of complex aromatic catabolic gene clusters in Sphingobium sp. PNB

    Directory of Open Access Journals (Sweden)

    Pratick Khara

    2014-01-01

    Full Text Available Sphingobium sp. PNB, like other sphingomonads, has multiple ring-hydroxylating oxygenase (RHO) genes. Three different fosmid clones have been sequenced to identify the putative genes responsible for the degradation of various aromatics in this bacterial strain. Comparison of the map of the catabolic genes with that of different sphingomonads revealed a similar arrangement of gene clusters that harbors seven sets of RHO terminal components and a sole set of electron transport (ET) proteins. The presence of distinctly conserved amino acid residues in ferredoxin and in silico molecular docking analyses of ferredoxin with the well characterized terminal oxygenase components indicated the structural uniqueness of the ET component in sphingomonads. The predicted substrate specificities, derived from the phylogenetic relationship of each of the RHOs, were examined based on transformation of putative substrates and their structural homologs by the recombinant strains expressing each of the oxygenases and the sole set of available ET proteins. The RHO AhdA1bA2b was functionally characterized for the first time and was found to be capable of transforming ethylbenzene, propylbenzene, cumene, p-cymene and biphenyl, in addition to a number of polycyclic aromatic hydrocarbons. Overexpression of aromatic catabolic genes in strain PNB, revealed by real-time PCR analyses, is a way forward to understand the complex regulation of degradative genes in sphingomonads.

  15. Gene expression profiles reveal key genes for early diagnosis and treatment of adamantinomatous craniopharyngioma.

    Science.gov (United States)

    Yang, Jun; Hou, Ziming; Wang, Changjiang; Wang, Hao; Zhang, Hongbing

    2018-04-23

    Adamantinomatous craniopharyngioma (ACP) is an aggressive brain tumor that occurs predominantly in the pediatric population. Conventional diagnostic methods and standard therapy cannot treat ACPs effectively. In this paper, we aimed to identify key genes for ACP early diagnosis and treatment. Datasets GSE94349 and GSE68015 were obtained from the Gene Expression Omnibus database. Consensus clustering was applied to discover the gene clusters in the expression data of GSE94349, and functional enrichment analysis was performed on the gene set in each cluster. The protein-protein interaction (PPI) network was built with the Search Tool for the Retrieval of Interacting Genes, and hubs were selected. A support vector machine (SVM) model was built based on the signature genes identified from the enrichment analysis and the PPI network. Dataset GSE94349 was used for training and testing, and GSE68015 was used for validation. In addition, RT-qPCR analysis was performed to analyze the expression of signature genes in ACP samples compared with normal controls. Seven gene clusters were discovered in the differentially expressed genes identified from the GSE94349 dataset. Enrichment analysis of each cluster identified 25 pathways that were highly associated with ACP. The PPI network was built and 46 hubs were determined. Twenty-five pathway-related genes that overlapped with the hubs in the PPI network were used as signatures to establish the SVM diagnosis model for ACP. The prediction accuracies of the SVM model for training, testing, and validation data were 94, 85, and 74%, respectively. The expression of CDH1, CCL2, ITGA2, COL8A1, COL6A2, and COL6A3 was significantly upregulated in ACP tumor samples, while CAMK2A, RIMS1, NEFL, SYT1, and STX1A were significantly downregulated, which was consistent with the differentially expressed gene analysis. The SVM model is a promising classification tool for screening and early diagnosis of ACP. The ACP-related pathways and signature genes will advance our knowledge of ACP pathogenesis

  16. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space.

    Science.gov (United States)

    Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

    2008-07-01

    UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
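    To make the memory issue described above concrete, the following is a minimal sketch of the classical in-memory UPGMA (average linkage) procedure, not the MC-UPGMA algorithm of the paper. It keeps every pairwise dissimilarity in a dictionary, which is exactly the requirement the memory-constrained variant is designed to avoid. The function name `upgma` and the toy distances are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def upgma(labels, dist):
    """Naive in-memory UPGMA (average linkage).

    labels: list of item names.
    dist:   dict mapping frozenset({a, b}) -> dissimilarity for every pair.
    Returns (tree, heights): a nested-tuple tree and the merge dissimilarities.
    """
    sizes = {name: 1 for name in labels}  # active clusters and their sizes
    d = dict(dist)                        # full dissimilarity table held in memory
    heights = []
    while len(sizes) > 1:
        # Find the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        heights.append(d[frozenset((a, b))])
        merged = (a, b)
        # Average linkage: size-weighted mean of the distances to both children.
        for c in sizes:
            if c in (a, b):
                continue
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    (tree,) = sizes
    return tree, heights


# Toy example with made-up dissimilarities.
toy_labels = ["A", "B", "C", "D"]
toy_dist = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 4, frozenset(("B", "C")): 4,
    frozenset(("A", "D")): 8, frozenset(("B", "D")): 8, frozenset(("C", "D")): 8,
}
toy_tree, toy_heights = upgma(toy_labels, toy_dist)
print(toy_tree, toy_heights)
```

    The table `d` grows quadratically with the number of items, which illustrates why this direct formulation does not scale to the full protein space.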

  17. Fine Mapping of Two Wheat Powdery Mildew Resistance Genes Located at the Pm1 Cluster

    Directory of Open Access Journals (Sweden)

    Junchao Liang

    2016-07-01

    Full Text Available Powdery mildew caused by (DC. f. sp. ( is a globally devastating foliar disease of wheat ( L.. More than a dozen genes against this disease, identified from wheat germplasms of different ploidy levels, have been mapped to the region surrounding the locus on the long arm of chromosome 7A, which forms a resistance (-gene cluster. and from einkorn wheat ( L. were two of the genes belonging to this cluster. This study was initiated to fine map these two genes toward map-based cloning. Comparative genomics study showed that macrocolinearity exists between L. chromosome 1 (Bd1 and the – region, which allowed us to develop markers based on the wheat sequences orthologous to genes contained in the Bd1 region. With these and other newly developed and published markers, high-resolution maps were constructed for both and using large F populations. Moreover, a physical map of was constructed through chromosome walking with bacterial artificial chromosome (BAC clones and comparative mapping. Eventually, and were restricted to a 0.12- and 0.86-cM interval, respectively. Based on the closely linked common markers, , , and (another powdery mildew resistance gene in the cluster were not allelic to one another. Severe recombination suppression and disruption of synteny were noted in the region encompassing . These results provided useful information for map-based cloning of the genes in the cluster and interpretation of their evolution.

  18. Functional conservation and divergence of four ginger AP1/AGL9 MADS-box genes revealed by analysis of their expression and protein-protein interaction, and ectopic expression of AhFUL gene in Arabidopsis.

    Directory of Open Access Journals (Sweden)

    Xiumei Li

    Full Text Available Members of the genus Alpinia are generally known as ginger-lilies for the showy flowers in the ginger family, Zingiberaceae, and their floral morphology diverges from typical monocotyledon flowers. However, little is known about the functions of ginger MADS-box genes in floral identity. In this study, four AP1/AGL9 MADS-box genes were cloned from Alpinia hainanensis, and protein-protein interactions (PPIs) and roles of the four genes in floral homeotic conversion and in floral evolution are surveyed for the first time. AhFUL is clustered to the AP1 lineage, AhSEP4 and AhSEP3b to the SEP lineage, and AhAGL6-like to the AGL6 lineage. The four genes showed conserved and divergent expression patterns, and their encoded proteins were localized in the nucleus. Seven combinations of PPI (AhFUL-AhSEP4, AhFUL-AhAGL6-like, AhFUL-AhSEP3b, AhSEP4-AhAGL6-like, AhSEP4-AhSEP3b, AhAGL6-like-AhSEP3b, and AhSEP3b-AhSEP3b) were detected, and the PPI patterns in the AP1/AGL9 lineage revealed that five of the 10 possible combinations are conserved and three are variable, while conclusions cannot yet be made regarding the other two. Ectopic expression of AhFUL in Arabidopsis thaliana led to early flowering and floral organ homeotic conversion to sepal-like or leaf-like. Therefore, we conclude that the four A. hainanensis AP1/AGL9 genes show functional conservation and divergence in the floral identity from other MADS-box genes.

  19. Protein Annotation from Protein Interaction Networks and Gene Ontology

    OpenAIRE

    Nguyen, Cao D.; Gardiner, Katheleen J.; Cios, Krzysztof J.

    2011-01-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precis...

  20. Gene expression data clustering and its application in differential analysis of leukemia

    Directory of Open Access Journals (Sweden)

    M. Vahedi

    2008-02-01

    Full Text Available Introduction: The DNA microarray technique is one of the most important topics in bioinformatics. By making it possible to monitor thousands of expressed genes, it has recently led to the creation of very large databases of gene expression data. Statistical analysis of such databases includes normalization, clustering, classification, etc. Materials and Methods: Golub et al. (1999) collected leukemia data using the oligonucleotide method. The data are available on the internet. In this paper, we analyzed these gene expression data. They were clustered by several methods, including multi-dimensional scaling and hierarchical and non-hierarchical clustering. The data set included 20 Acute Lymphoblastic Leukemia (ALL) patients and 14 Acute Myeloid Leukemia (AML) patients. The results of two clustering methods were compared with the real grouping (ALL & AML). R software was used for data analysis. Results: The specificity and sensitivity of divisive hierarchical clustering in diagnosing ALL patients were 75% and 92%, respectively. The specificity and sensitivity of partitioning around medoids in diagnosing ALL patients were 90% and 93%, respectively. These results showed that both clustering methods performed well. It is notable that, according to the clustering results, one of the samples was placed in the ALL group although it was in the AML group in the clinical test. Conclusion: Given the concordance of the results with the real grouping of the data, these methods can be used in cases where accurate information on the real grouping is not available. Moreover, the clustering results might distinguish subgroups of the data, which would then need to be checked for concordance with clinical outcomes, laboratory results and so on.
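    As an illustration of how sensitivity and specificity figures like those above are obtained when a clustering result is compared with a known grouping, here is a minimal sketch. The study used R; the Python function `sensitivity_specificity` and the toy labels below are assumptions made for this example and do not reproduce the study's data or numbers.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="ALL"):
    """Compare predicted cluster labels with the real grouping.

    Sensitivity = TP / (TP + FN): share of true positives that were recovered.
    Specificity = TN / (TN + FP): share of true negatives that were recovered.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 5 ALL and 4 AML samples, with two misassignments.
toy_truth = ["ALL"] * 5 + ["AML"] * 4
toy_pred = ["ALL", "ALL", "ALL", "ALL", "AML", "AML", "AML", "AML", "ALL"]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_pred)
print(toy_sens, toy_spec)
```

    Because cluster identifiers are arbitrary, each cluster has to be mapped to ALL or AML (for example by majority membership) before this comparison is meaningful.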

  1. Analysis of gene and protein name synonyms in Entrez Gene and UniProtKB resources

    KAUST Repository

    Arkasosy, Basil

    2013-01-01

    be ambiguous, referring in some cases to more than one gene or one protein, or in others, to both genes and proteins at the same time. Public biological databases give a very useful insight about genes and proteins information, including their names

  2. DiffSLC: A graph centrality method to detect essential proteins of a protein-protein interaction network.

    Science.gov (United States)

    Mistry, Divya; Wise, Roger P; Dickerson, Julie A

    2017-01-01

    Identification of central genes and proteins in biomolecular networks provides credible candidates for pathway analysis, functional analysis, and essentiality prediction. The DiffSLC centrality measure predicts central and essential genes and proteins using a protein-protein interaction network. Network centrality measures prioritize nodes and edges based on their importance to the network topology. These measures helped identify critical genes and proteins in biomolecular networks. The proposed centrality measure, DiffSLC, combines the number of interactions of a protein and the gene coexpression values of genes from which those proteins were translated, as a weighting factor to bias the identification of essential proteins in a protein interaction network. Potentially essential proteins with low node degree are promoted through eigenvector centrality. Thus, the gene coexpression values are used in conjunction with the eigenvector of the network's adjacency matrix and edge clustering coefficient to improve essentiality prediction. The outcome of this prediction is shown using three variations: (1) inclusion or exclusion of gene co-expression data, (2) impact of different coexpression measures, and (3) impact of different gene expression data sets. For a total of seven networks, DiffSLC is compared to other centrality measures using Saccharomyces cerevisiae protein interaction networks and gene expression data. Comparisons are also performed for the top ranked proteins against the known essential genes from the Saccharomyces Gene Deletion Project, which show that DiffSLC detects more essential proteins and has a higher area under the ROC curve than other compared methods. This makes DiffSLC a stronger alternative to other centrality methods for detecting essential genes using a protein-protein interaction network that obeys centrality-lethality principle. DiffSLC is implemented using the igraph package in R, and networkx package in Python. 
The python package can be

  3. A novel polyketide biosynthesis gene cluster is involved in fruiting body morphogenesis in the filamentous fungi Sordaria macrospora and Neurospora crassa.

    Science.gov (United States)

    Nowrousian, Minou

    2009-04-01

    During fungal fruiting body development, hyphae aggregate to form multicellular structures that protect and disperse the sexual spores. Analysis of microarray data revealed a gene cluster strongly upregulated during fruiting body development in the ascomycete Sordaria macrospora. Real time PCR analysis showed that the genes from the orthologous cluster in Neurospora crassa are also upregulated during development. The cluster encodes putative polyketide biosynthesis enzymes, including a reducing polyketide synthase. Analysis of knockout strains of a predicted dehydrogenase gene from the cluster showed that mutants in N. crassa and S. macrospora are delayed in fruiting body formation. In addition to the upregulated cluster, the N. crassa genome comprises another cluster containing a polyketide synthase gene, and five additional reducing polyketide synthase (rpks) genes that are not part of clusters. To study the role of these genes in sexual development, expression of the predicted rpks genes in S. macrospora (five genes) and N. crassa (six genes) was analyzed; all but one are upregulated during sexual development. Analysis of knockout strains for the N. crassa rpks genes showed that one of them is essential for fruiting body formation. These data indicate that polyketides produced by RPKSs are involved in sexual development in filamentous ascomycetes.

  4. Growing functional modules from a seed protein via integration of protein interaction and gene expression data

    Directory of Open Access Journals (Sweden)

    Dimitrakopoulou Konstantina

    2007-10-01

    Full Text Available Abstract Background Nowadays modern biology aims at unravelling the strands of complex biological structures such as the protein-protein interaction (PPI) networks. A key concept in the organization of PPI networks is the existence of dense subnetworks (functional modules) in them. In recent approaches clustering algorithms were applied to these networks and the resulting subnetworks were evaluated by estimating the coverage of well-established protein complexes they contained. However, most of these algorithms operate on an unweighted graph structure, which in turn fails to elevate those interactions that would contribute to the construction of biologically more valid and coherent functional modules. Results In the current study, we present a method that corroborates the integration of protein interaction and microarray data via the discovery of biologically valid functional modules. Initially the gene expression information is overlaid as weights onto the PPI network, and the enriched PPI graph allows us to exploit its topological aspects while simultaneously highlighting enhanced functional association in specific pairs of proteins. Then we present an algorithm that unveils the functional modules of the weighted graph by expanding a kernel protein set, which originates from a given 'seed' protein used as starting-point. Conclusion The integrated data and the concept of our approach provide reliable functional modules. We give proofs based on yeast data that our method manages to give accurate results in terms both of structural coherency, as well as functional consistency.
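    The general idea of growing a module from a seed in a weighted PPI graph can be sketched as follows. This is a simple greedy expansion written for illustration, not the authors' algorithm: the scoring rule (mean edge weight from a candidate into the current module), the stopping threshold, the function name `grow_module` and the toy weights are all assumptions.

```python
def grow_module(weights, seed, threshold):
    """Greedily expand a module around `seed` in a weighted, undirected graph.

    weights:   dict mapping (u, v) tuples to an edge weight, e.g. an
               expression-derived score overlaid on a PPI edge.
    threshold: minimum mean weight into the module for a protein to join.
    """
    # Build a symmetric adjacency map.
    neighbors = {}
    for (u, v), w in weights.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    module = {seed}
    while True:
        candidates = {n for m in module for n in neighbors.get(m, {})} - module
        best, best_score = None, None
        for c in sorted(candidates):  # sorted for deterministic tie-breaking
            # Mean weight of edges from c into the module; absent edges count as 0.
            score = sum(neighbors[c].get(m, 0.0) for m in module) / len(module)
            if best_score is None or score > best_score:
                best, best_score = c, score
        if best is None or best_score < threshold:
            return module
        module.add(best)


# Hypothetical toy network: A, B, C are tightly linked; D and E are not.
toy_weights = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7,
    ("C", "D"): 0.2, ("D", "E"): 0.9,
}
toy_module = grow_module(toy_weights, seed="A", threshold=0.5)
print(sorted(toy_module))
```

    With these toy weights the expansion stops before D, because its single weak link to C gives it a low mean weight into the module.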

  5. GEM2Net: from gene expression modeling to -omics networks, a new CATdb module to investigate Arabidopsis thaliana genes involved in stress response.

    Science.gov (United States)

    Zaag, Rim; Tamby, Jean Philippe; Guichard, Cécile; Tariq, Zakia; Rigaill, Guillem; Delannoy, Etienne; Renou, Jean-Pierre; Balzergue, Sandrine; Mary-Huard, Tristan; Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Brunaud, Véronique

    2015-01-01

    CATdb (http://urgv.evry.inra.fr/CATdb) is a database providing a public access to a large collection of transcriptomic data, mainly for Arabidopsis but also for other plants. This resource has the rare advantage to contain several thousands of microarray experiments obtained with the same technical protocol and analyzed by the same statistical pipelines. In this paper, we present GEM2Net, a new module of CATdb that takes advantage of this homogeneous dataset to mine co-expression units and decipher Arabidopsis gene functions. GEM2Net explores 387 stress conditions organized into 18 biotic and abiotic stress categories. For each one, a model-based clustering is applied on expression differences to identify clusters of co-expressed genes. To characterize functions associated with these clusters, various resources are analyzed and integrated: Gene Ontology, subcellular localization of proteins, Hormone Families, Transcription Factor Families and a refined stress-related gene list associated to publications. Exploiting protein-protein interactions and transcription factors-targets interactions enables to display gene networks. GEM2Net presents the analysis of the 18 stress categories, in which 17,264 genes are involved and organized within 681 co-expression clusters. The meta-data analyses were stored and organized to compose a dynamic Web resource. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Arabidopsis thaliana mTERF proteins: evolution and functional classification

    Directory of Open Access Journals (Sweden)

    Tatjana Kleine

    2012-10-01

    Full Text Available Organellar gene expression (OGE) is crucial for plant development, photosynthesis and respiration, but our understanding of the mechanisms that control it is still relatively poor. Thus, OGE requires various nucleus-encoded proteins that promote transcription, splicing, trimming and editing of organellar RNAs, and regulate translation. In metazoans, proteins of the mitochondrial Transcription tERmination Factor (mTERF) family interact with the mitochondrial chromosome and regulate transcriptional initiation and termination. Sequencing of the Arabidopsis thaliana genome led to the identification of a diversified MTERF gene family but, in contrast to mammalian mTERFs, knowledge about the function of these proteins in photosynthetic organisms is scarce. In this hypothesis article, I show that tandem duplications and one block duplication contributed to the large number of MTERF genes in A. thaliana, and propose that the expansion of the family is related to the evolution of land plants. The MTERF genes - especially the duplicated genes - display a number of distinct mRNA accumulation patterns, suggesting functional diversification of mTERF proteins to increase adaptability to environmental changes. Indeed, hypothetical functions for the different mTERF proteins can be predicted using co-expression analysis and gene ontology annotations. On this basis, mTERF proteins can be sorted into five groups. Members of the chloroplast and chloroplast-associated clusters are principally involved in chloroplast gene expression, embryogenesis and protein catabolism, while representatives of the mitochondrial cluster seem to participate in DNA and RNA metabolism in that organelle. Moreover, members of the mitochondrion-associated cluster and the low expression group may act in the nucleus and/or the cytosol. 
As proteins involved in OGE and presumably nuclear gene expression, mTERFs are ideal candidates for the coordination of the expression of organelle and nuclear

  7. The ergot alkaloid gene cluster: Functional analyses and evolutionary aspects

    Czech Academy of Sciences Publication Activity Database

    Lorenz, N.; Haarmann, T.; Pažoutová, Sylvie; Jung, M.; Tudzynski, P.

    2009-01-01

    Vol. 70, 15-16 (2009), pp. 1822-1832. ISSN 0031-9422. Institutional research plan: CEZ:AV0Z50200510. Keywords: Claviceps purpurea; Ergot fungus; Ergot alkaloid gene cluster. Subject RIV: EE - Microbiology, Virology. Impact factor: 3.104, year: 2009

  8. [HMGA proteins and their genes as a potential neoplastic biomarkers].

    Science.gov (United States)

    Balcerczak, Ewa; Balcerczak, Mariusz; Mirowski, Marek

    2005-01-01

    HMGA proteins and their genes are described in this article. HMGA proteins are able to bind DNA in AT-rich regions, which are characteristic of gene promoter sequences. This interaction leads to gene silencing or overexpression. In normal tissue the level of HMGA proteins is low or even undetectable. During embryogenesis their level increases. A high level of HMGA proteins is characteristic of the tumor phenotype of spontaneous and experimental malignant neoplasms. High HMGA protein expression correlates with poor prognostic factors and with metastasis formation. HMGA gene expression can be used as a marker of tumor progression. Current studies of tumor gene therapy based on inhibiting HMGA protein synthesis with viral vectors containing the genes encoding these proteins in antisense orientation, as well as of new potential anticancer drugs acting as crosslinkers between DNA and HMGA proteins, suggest their usefulness as targets in cancer therapy.

  9. Gene expression profiling in the stress control brain region hypothalamic paraventricular nucleus reveals a novel gene network including Amyloid beta Precursor Protein

    Directory of Open Access Journals (Sweden)

    Deussing Jan M

    2010-10-01

    Full Text Available Abstract Background The pivotal role of stress in the precipitation of psychiatric diseases such as depression is generally accepted. This study aims at the identification of genes that are directly or indirectly responding to stress. Inbred mouse strains that had been evidenced to differ in their stress response as well as in their response to antidepressant treatment were chosen for RNA profiling after stress exposure. Gene expression and regulation was determined by microarray analyses and further evaluated by bioinformatics tools including pathway and cluster analyses. Results Forced swimming as acute stressor was applied to C57BL/6J and DBA/2J mice and resulted in sets of regulated genes in the paraventricular nucleus of the hypothalamus (PVN), 4 h or 8 h after stress. Although the expression changes between the mouse strains were quite different, they unfolded in phases over time in both strains. Our search for connections between the regulated genes resulted in potential novel signalling pathways in stress. In particular, Guanine nucleotide binding protein, alpha inhibiting 2 (GNAi2) and Amyloid β (A4) precursor protein (APP) were detected as stress-regulated genes, and together with other genes, seem to be integrated into stress-responsive pathways and gene networks in the PVN. Conclusions This search for stress-regulated genes in the PVN revealed its impact on interesting genes (GNAi2 and APP) and a novel gene network. In particular the expression of APP in the PVN that is governing stress hormone balance, is of great interest. The reported neuroprotective role of this molecule in the CNS supports the idea that a short acute stress can elicit positive adaptational effects in the brain.

  10. Human microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application.

    Science.gov (United States)

    Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia

    2009-06-16

    microRNAs (miRNAs) are single-stranded RNA molecules of about 20-23 nucleotides in length found in a wide variety of organisms. miRNAs regulate gene expression by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from the TargetScan, miRanda, RNAhybrid and PicTar computational tools as well as the experimentally supported targets from TarBase, and also provides a full gene description and functional analysis for each target gene. The TAGGO application, on the other hand, is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can be freely downloaded from BRFAA.

  11. Differential dynamic microscopy of weakly scattering and polydisperse protein-rich clusters

    Science.gov (United States)

    Safari, Mohammad S.; Vorontsova, Maria A.; Poling-Skutvik, Ryan; Vekilov, Peter G.; Conrad, Jacinta C.

    2015-10-01

    Nanoparticle dynamics impact a wide range of biological transport processes and applications in nanomedicine and natural resource engineering. Differential dynamic microscopy (DDM) was recently developed to quantify the dynamics of submicron particles in solutions from fluctuations of intensity in optical micrographs. Differential dynamic microscopy is well established for monodisperse particle populations, but has not been applied to solutions containing weakly scattering polydisperse biological nanoparticles. Here we use bright-field DDM (BDDM) to measure the dynamics of protein-rich liquid clusters, whose size ranges from tens to hundreds of nanometers and whose total volume fraction is less than 10^-5. With solutions of two proteins, hemoglobin A and lysozyme, we evaluate the cluster diffusion coefficients from the dependence of the diffusive relaxation time on the scattering wave vector. We establish that for weakly scattering populations, an optimal thickness of the sample chamber exists at which the BDDM signal is maximized at the smallest sample volume. The average cluster diffusion coefficient measured using BDDM is consistently lower than that obtained from dynamic light scattering at a scattering angle of 90°. This apparent discrepancy is due to Mie scattering from the polydisperse cluster population, in which larger clusters preferentially scatter more light in the forward direction.
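    The step of extracting a diffusion coefficient from the wave-vector dependence of the relaxation time can be sketched as below. The sketch assumes simple Brownian diffusion, for which the relaxation time is tau(q) = 1 / (D q^2); the abstract does not state the fitting procedure, so the least-squares fit through the origin, the function name `diffusion_coefficient` and the synthetic values are illustrative assumptions.

```python
def diffusion_coefficient(q_values, tau_values):
    """Estimate D from relaxation times, assuming tau(q) = 1 / (D * q**2).

    Fits 1/tau = D * q^2 by least squares through the origin:
    D = sum(x * y) / sum(x * x) with x = q^2 and y = 1/tau.
    q in 1/m and tau in s give D in m^2/s.
    """
    x = [q * q for q in q_values]
    y = [1.0 / t for t in tau_values]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Synthetic check: generate noise-free relaxation times from a known D.
toy_d_true = 2.0e-12                       # m^2/s, made-up value
toy_q = [1.0e6, 2.0e6, 3.0e6]              # 1/m, made-up wave vectors
toy_tau = [1.0 / (toy_d_true * q * q) for q in toy_q]
toy_d_est = diffusion_coefficient(toy_q, toy_tau)
print(toy_d_est)
```

    For a polydisperse population such a single fitted D is an average whose weighting depends on how strongly each cluster size contributes to the signal, which is the kind of effect the abstract invokes to explain the difference from dynamic light scattering.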

  12. A Proteomic Approach to Investigating Gene Cluster Expression and Secondary Metabolite Functionality in Aspergillus fumigatus

    Science.gov (United States)

    Owens, Rebecca A.; Hammel, Stephen; Sheridan, Kevin J.; Jones, Gary W.; Doyle, Sean

    2014-01-01

    A combined proteomics and metabolomics approach was utilised to advance the identification and characterisation of secondary metabolites in Aspergillus fumigatus. Here, implementation of a shotgun proteomic strategy led to the identification of non-redundant mycelial proteins (n = 414) from A. fumigatus including proteins typically under-represented in 2-D proteome maps: proteins with multiple transmembrane regions, hydrophobic proteins and proteins with extremes of molecular mass and pI. Indirect identification of secondary metabolite cluster expression was also achieved, with proteins (n = 18) from LaeA-regulated clusters detected, including GliT encoded within the gliotoxin biosynthetic cluster. Biochemical analysis then revealed that gliotoxin significantly attenuates H2O2-induced oxidative stress in A. fumigatus (p>0.0001), confirming observations from proteomics data. A complementary 2-D/LC-MS/MS approach further elucidated significantly increased abundance (pproteome and experimental strategies, plus mechanistic data pertaining to gliotoxin functionality in the organism. PMID:25198175

  13. Detecting coordinated regulation of multi-protein complexes using logic analysis of gene expression

    Directory of Open Access Journals (Sweden)

    Yeates Todd O

    2009-12-01

    Background: Many of the functional units in cells are multi-protein complexes such as RNA polymerase, the ribosome, and the proteasome. For such units to work together, one might expect a high level of regulation to enable co-appearance or repression of sets of complexes at the required time. However, this type of coordinated regulation between whole complexes is difficult to detect by existing methods for analyzing mRNA co-expression. We propose a new methodology that is able to detect such higher order relationships. Results: We detect coordinated regulation of multiple protein complexes using logic analysis of gene expression data. Specifically, we identify gene triplets composed of genes whose expression profiles are found to be related by various types of logic functions. In order to focus on complexes, we associate the members of a gene triplet with the distinct protein complexes to which they belong. In this way, we identify complexes related by specific kinds of regulatory relationships. For example, we may find that the transcription of complex C is increased only if the transcription of both complex A AND complex B is repressed. We identify hundreds of examples of coordinated regulation among complexes under various stress conditions. Many of these examples involve the ribosome. Some of our examples have been previously identified in the literature, while others are novel. One notable example is the relationship between the transcription of the ribosome, RNA polymerase and mannosyltransferase II, which is involved in N-linked glycan processing in the Golgi. Conclusions: The analysis proposed here focuses on relationships among triplets of genes that are not evident when genes are examined in a pairwise fashion as in typical clustering methods. By grouping gene triplets, we are able to decipher coordinated regulation among sets of three complexes. Moreover, using all triplets that involve coordinated regulation with the ribosome
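
    The triplet idea can be made concrete with a toy example on binarized expression profiles: the profile of C is compared with several logic functions of A and B, and with A or B alone. This is a simplified sketch that scores plain agreement; the scoring used in the paper is not described in the abstract, and all names and data here are hypothetical.

```python
# Candidate logic functions of two binarized expression profiles.
LOGIC_FUNCTIONS = {
    "A AND B": lambda a, b: a and b,
    "A OR B": lambda a, b: a or b,
    "A XOR B": lambda a, b: a != b,
    "NOT A AND NOT B": lambda a, b: (not a) and (not b),
}


def agreement(predicted, observed):
    """Fraction of conditions in which two boolean profiles agree."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def best_logic(a, b, c):
    """Return (name, score) of the logic function of a and b closest to c."""
    scores = {
        name: agreement([f(x, y) for x, y in zip(a, b)], c)
        for name, f in LOGIC_FUNCTIONS.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


def pairwise_best(a, c):
    """Best agreement of c with a single gene, allowing for negation."""
    direct = agreement(a, c)
    return max(direct, 1.0 - direct)


# Made-up profiles over 8 conditions (True = expressed).
T, F = True, False
a = [T, T, F, F, T, F, F, T]
b = [T, F, T, F, F, T, F, T]
c = [F, F, F, T, F, F, T, F]  # on only when both a and b are off

name, score = best_logic(a, b, c)
print(name, score, pairwise_best(a, c), pairwise_best(b, c))
```

    In this toy data neither A nor B alone fully predicts C, but the pair does, which is the kind of relationship a pairwise clustering method would miss.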

  14. Leveraging long sequencing reads to investigate R-gene clustering and variation in sugar beet

    Science.gov (United States)

    Host-pathogen interactions are of prime importance to modern agriculture. Plants utilize various types of resistance genes to mitigate pathogen damage. Identification of the specific gene responsible for a specific resistance can be difficult due to duplication and clustering within R-gene families....

  15. Sequencing and transcriptional analysis of the Streptococcus thermophilus histamine biosynthesis gene cluster: factors that affect differential hdcA expression

    DEFF Research Database (Denmark)

    Calles-Enríquez, Marina; Hjort, Benjamin Benn; Andersen, Pia Skov

    2010-01-01

    to produce histamine. The hdc clusters of S. thermophilus CHCC1524 and CHCC6483 were sequenced, and the factors that affect histamine biosynthesis and histidine-decarboxylating gene (hdcA) expression were studied. The hdc cluster began with the hdcA gene, was followed by a transporter (hdcP), and ended...... with the hdcB gene, which is of unknown function. The three genes were orientated in the same direction. The genetic organization of the hdc cluster showed a unique organization among the lactic acid bacterial group and resembled those of Staphylococcus and Clostridium species, thus indicating possible...... acquisition through a horizontal transfer mechanism. Transcriptional analysis of the hdc cluster revealed the existence of a polycistronic mRNA covering the three genes. The histidine-decarboxylating gene (hdcA) of S. thermophilus demonstrated maximum expression during the stationary growth phase, with high...

  16. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    Science.gov (United States)

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes as per their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by the first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing a real data from an experiment aimed to link the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
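
    A stripped-down version of such a model can be sketched as a mixture of independent Poisson distributions fitted by Expectation-Maximization. The sketch below deliberately omits the first-order autoregressive dependence and the model selection step described in the abstract; the function names, starting values and counts are all hypothetical.

```python
import math


def log_poisson(k, lam):
    """Log probability of count k under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_poisson_mixture(counts, init_means, n_iter=50):
    """Fit a mixture of independent Poissons to per-gene count vectors.

    counts: one vector of read counts (one entry per time point) per gene.
    init_means: one starting mean vector per cluster.
    Returns (weights, means, labels).
    """
    k = len(init_means)
    n_times = len(counts[0])
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each gene.
        resp = []
        for x in counts:
            logp = [
                math.log(weights[j])
                + sum(log_poisson(xi, mj) for xi, mj in zip(x, means[j]))
                for j in range(k)
            ]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update mixing weights and Poisson means.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / len(counts), 1e-12)
            means[j] = [
                max(sum(r[j] * x[t] for r, x in zip(resp, counts)) / nj, 1e-9)
                for t in range(n_times)
            ]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, labels


# Made-up counts: three genes rising over time, three falling.
counts = [
    [2, 5, 10, 20], [1, 6, 12, 18], [3, 4, 9, 22],
    [20, 11, 5, 1], [18, 9, 6, 2], [22, 10, 4, 3],
]
init_means = [[5, 8, 10, 12], [12, 10, 8, 5]]

weights, means, labels = em_poisson_mixture(counts, init_means)
print(labels, weights)
```

    Choosing the number of clusters would require fitting several values of k and comparing them with an information criterion, which is left out here.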

  17. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  18. Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles

    Directory of Open Access Journals (Sweden)

    Lee Yun-Shien

    2008-03-01

    Background: The hierarchical clustering tree (HCT) with a dendrogram [1] and the singular value decomposition (SVD) with a dimension-reduced representative map [2] are popular methods for two-way sorting the gene-by-array matrix map employed in gene expression profiling. While HCT dendrograms tend to optimize local coherent clustering patterns, SVD leading eigenvectors usually identify better global grouping and transitional structures. Results: This study proposes a flipping mechanism for a conventional agglomerative HCT using a rank-two ellipse (R2E), an improved SVD algorithm for sorting purpose seriation by Chen [3], as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends. Conclusion: We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties, but also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid in the search for comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.

  19. BiP clustering facilitates protein folding in the endoplasmic reticulum.

    Directory of Open Access Journals (Sweden)

    Marc Griesemer

    2014-07-01

    The chaperone BiP participates in several regulatory processes within the endoplasmic reticulum (ER): translocation, protein folding, and ER-associated degradation. To facilitate protein folding, a cooperative mechanism known as entropic pulling has been proposed to demonstrate the molecular-level understanding of how multiple BiP molecules bind to nascent and unfolded proteins. Recently, experimental evidence revealed the spatial heterogeneity of BiP within the nuclear and peripheral ER of S. cerevisiae (commonly referred to as 'clusters'). Here, we developed a model to evaluate the potential advantages of accounting for multiple BiP molecules binding to peptides, while proposing that BiP's spatial heterogeneity may enhance protein folding and maturation. Scenarios were simulated to gauge the effectiveness of binding multiple chaperone molecules to peptides. Using two metrics, folding efficiency and chaperone cost, we determined that the single binding site model achieves a higher efficiency than models characterized by multiple binding sites, in the absence of cooperativity. Due to entropic pulling, however, multiple chaperones perform in concert to facilitate the resolubilization and ultimate yield of folded proteins. As a result of cooperativity, multiple binding site models used fewer BiP molecules and maintained a higher folding efficiency than the single binding site model. These in silico investigations reveal that clusters of BiP molecules bound to unfolded proteins may enhance folding efficiency through cooperative action via entropic pulling.

  20. Spatial expression of Hox cluster genes in the ontogeny of a sea urchin

    Science.gov (United States)

    Arenas-Mena, C.; Cameron, A. R.; Davidson, E. H.

    2000-01-01

    The Hox cluster of the sea urchin Strongylocentrotus purpuratus contains ten genes in a 500 kb span of the genome. Only two of these genes are expressed during embryogenesis, while all of eight genes tested are expressed during development of the adult body plan in the larval stage. We report the spatial expression during larval development of the five 'posterior' genes of the cluster: SpHox7, SpHox8, SpHox9/10, SpHox11/13a and SpHox11/13b. The five genes exhibit a dynamic, largely mesodermal program of expression. Only SpHox7 displays extensive expression within the pentameral rudiment itself. A spatially sequential and colinear arrangement of expression domains is found in the somatocoels, the paired posterior mesodermal structures that will become the adult perivisceral coeloms. No such sequential expression pattern is observed in endodermal, epidermal or neural tissues of either the larva or the presumptive juvenile sea urchin. The spatial expression patterns of the Hox genes illuminate the evolutionary process by which the pentameral echinoderm body plan emerged from a bilateral ancestor.

  1. Comparative genomics of Cluster O mycobacteriophages.

    Science.gov (United States)

    Cresawn, Steven G; Pope, Welkin H; Jacobs-Sera, Deborah; Bowman, Charles A; Russell, Daniel A; Dedrick, Rebekah M; Adair, Tamarah; Anders, Kirk R; Ball, Sarah; Bollivar, David; Breitenberger, Caroline; Burnett, Sandra H; Butela, Kristen; Byrnes, Deanna; Carzo, Sarah; Cornely, Kathleen A; Cross, Trevor; Daniels, Richard L; Dunbar, David; Findley, Ann M; Gissendanner, Chris R; Golebiewska, Urszula P; Hartzog, Grant A; Hatherill, J Robert; Hughes, Lee E; Jalloh, Chernoh S; De Los Santos, Carla; Ekanem, Kevin; Khambule, Sphindile L; King, Rodney A; King-Smith, Christina; Klyczek, Karen; Krukonis, Greg P; Laing, Christian; Lapin, Jonathan S; Lopez, A Javier; Mkhwanazi, Sipho M; Molloy, Sally D; Moran, Deborah; Munsamy, Vanisha; Pacey, Eddie; Plymale, Ruth; Poxleitner, Marianne; Reyna, Nathan; Schildbach, Joel F; Stukey, Joseph; Taylor, Sarah E; Ware, Vassie C; Wellmann, Amanda L; Westholm, Daniel; Wodarski, Donna; Zajko, Michelle; Zikalala, Thabiso S; Hendrix, Roger W; Hatfull, Graham F

    2015-01-01

    Mycobacteriophages--viruses of mycobacterial hosts--are genetically diverse but morphologically are all classified in the Caudovirales with double-stranded DNA and tails. We describe here a group of five closely related mycobacteriophages--Corndog, Catdawg, Dylan, Firecracker, and YungJamal--designated as Cluster O with long flexible tails but with unusual prolate capsids. Proteomic analysis of phage Corndog particles, Catdawg particles, and Corndog-infected cells confirms expression of half of the predicted gene products and indicates a non-canonical mechanism for translation of the Corndog tape measure protein. Bioinformatic analysis identifies 8-9 strongly predicted SigA promoters and all five Cluster O genomes contain more than 30 copies of a 17 bp repeat sequence with dyad symmetry located throughout the genomes. Comparison of the Cluster O phages provides insights into phage genome evolution including the processes of gene flux by horizontal genetic exchange.

  2. Some statistical properties of gene expression clustering for array data

    DEFF Research Database (Denmark)

    Abreu, G C G; Pinheiro, A; Drummond, R D

    2010-01-01

    DNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by SOM-based clustering. Comparisons between SOM and parametric clustering are presented...... for simulated as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM, that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, as well as some supplementary material, at the following address: https...
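
    The general principle of attaching a bootstrap error to a cluster node can be sketched as follows: the expression values assigned to one node are resampled with replacement and the spread of the resampled means is taken as the standard error. This is a generic bootstrap illustration, not the authors' Matlab code; the SOM training itself is not reproduced, and the function name and data are hypothetical.

```python
import random
import statistics


def bootstrap_standard_error(values, n_resamples=2000, seed=0):
    """Bootstrap standard error of the mean of `values`.

    Draws `n_resamples` samples with replacement, each the same size as
    `values`, and returns the standard deviation of their means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = [
        statistics.fmean(rng.choices(values, k=n))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)


# Made-up log-ratios of the genes assigned to one cluster node.
node_values = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0]

node_mean = statistics.fmean(node_values)
node_se = bootstrap_standard_error(node_values)
print(node_mean, node_se)
```

    A node whose bootstrap error is large relative to the distance to neighbouring nodes would be a candidate for merging or for closer inspection.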

  3. Genomic and expression analysis of the vanG-like gene cluster of Clostridium difficile.

    Science.gov (United States)

    Peltier, Johann; Courtin, Pascal; El Meouche, Imane; Catel-Ferreira, Manuella; Chapot-Chartier, Marie-Pierre; Lemée, Ludovic; Pons, Jean-Louis

    2013-07-01

    Primary antibiotic treatment of Clostridium difficile intestinal diseases requires metronidazole or vancomycin therapy. A cluster of genes homologous to enterococcal glycopeptides resistance vanG genes was found in the genome of C. difficile 630, although this strain remains sensitive to vancomycin. This vanG-like gene cluster was found to consist of five ORFs: the regulatory region consisting of vanR and vanS and the effector region consisting of vanG, vanXY and vanT. We found that 57 out of 83 C. difficile strains, representative of the main lineages of the species, harbour this vanG-like cluster. The cluster is expressed as an operon and, when present, is found at the same genomic location in all strains. The vanG, vanXY and vanT homologues in C. difficile 630 are co-transcribed and expressed to a low level throughout the growth phases in the absence of vancomycin. Conversely, the expression of these genes is strongly induced in the presence of subinhibitory concentrations of vancomycin, indicating that the vanG-like operon is functional at the transcriptional level in C. difficile. Hydrophilic interaction liquid chromatography (HILIC-HPLC) and MS analysis of cytoplasmic peptidoglycan precursors of C. difficile 630 grown without vancomycin revealed the exclusive presence of a UDP-MurNAc-pentapeptide with an alanine at the C terminus. UDP-MurNAc-pentapeptide [d-Ala] was also the only peptidoglycan precursor detected in C. difficile grown in the presence of vancomycin, corroborating the lack of vancomycin resistance. Peptidoglycan structures of a vanG-like mutant strain and of a strain lacking the vanG-like cluster did not differ from the C. difficile 630 strain, indicating that the vanG-like cluster also has no impact on cell-wall composition.

  4. Genome-Wide Identification and Functional Analysis of the Calcineurin B-like Protein and Calcineurin B-like Protein-Interacting Protein Kinase Gene Families in Turnip (Brassica rapa var. rapa)

    Directory of Open Access Journals (Sweden)

    Xin Yin

    2017-07-01

    The calcineurin B-like protein (CBL)–CBL-interacting protein kinase (CIPK) complex has been identified as a primary component in calcium sensors that perceives various stress signals. Turnip (Brassica rapa var. rapa) has been widely cultivated in the Qinghai–Tibet Plateau for a century as a food crop of worldwide economic significance. These CBL–CIPK complexes have been demonstrated to play crucial roles in plant response to various environmental stresses. However, no report is available on the genome-wide characterization of these two gene families in turnip. In the present study, 19 and 51 members of the BrrCBL and BrrCIPK genes, respectively, are first identified in turnip and phylogenetically grouped into three and two distinct clusters, respectively. The expansion of these two gene families is mainly attributable to segmental duplication. Moreover, the differences in expression patterns in quantitative real-time PCR, as well as interaction profiles in the yeast two-hybrid assay, suggest the functional divergence of paralog genes during long-term evolution in turnip. Overexpressing and complement lines in Arabidopsis reveal that BrrCBL9.2 improves, but BrrCBL9.1 does not affect, salt tolerance in Arabidopsis. Thus, the expansion of the BrrCBL and BrrCIPK gene families enables the functional differentiation and evolution of some new gene functions of paralog genes. These paralog genes then play prominent roles in turnip's adaptation to the adverse environment of the Qinghai–Tibet Plateau. Overall, the study results contribute to our understanding of the functions of the CBL–CIPK complex and provide a basis for selecting appropriate genes for in-depth functional studies of BrrCBL–BrrCIPK in turnip.

  5. Deletion of Plasmodium falciparum Histidine-Rich Protein 2 (pfhrp2) and Histidine-Rich Protein 3 (pfhrp3) Genes in Colombian Parasites.

    Science.gov (United States)

    Murillo Solano, Claribel; Akinyi Okoth, Sheila; Abdallah, Joseph F; Pava, Zuleima; Dorado, Erika; Incardona, Sandra; Huber, Curtis S; Macedo de Oliveira, Alexandre; Bell, David; Udhayakumar, Venkatachalam; Barnwell, John W

    2015-01-01

    A number of studies have analyzed the performance of malaria rapid diagnostic tests (RDTs) in Colombia with discrepancies in performance being attributed to a combination of factors such as parasite levels, interpretation of RDT results and/or the handling and storage of RDT kits. However, some of the inconsistencies observed with results from Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-based RDTs could also be explained by the deletion of the gene that encodes the protein, pfhrp2, and its structural homolog, pfhrp3, in some parasite isolates. Given that pfhrp2- and pfhrp3-negative P. falciparum isolates have been detected in the neighboring Peruvian and Brazilian Amazon regions, we hypothesized that parasites with deletions of pfhrp2 and pfhrp3 may also be present in Colombia. In this study we tested 100 historical samples collected between 1999 and 2009 from six Departments in Colombia for the presence of pfhrp2, pfhrp3 and their flanking genes. Seven neutral microsatellites were also used to determine the genetic background of these parasites. In total 18 of 100 parasite isolates were found to have deleted pfhrp2, a majority of which (14 of 18) were collected from Amazonas Department, which borders Peru and Brazil. pfhrp3 deletions were found in 52 of the 100 samples collected from all regions of the country. pfhrp2 flanking genes PF3D7_0831900 and PF3D7_0831700 were deleted in 22 of 100 and in 1 of 100 samples, respectively. pfhrp3 flanking genes PF3D7_1372100 and PF3D7_1372400 were missing in 55 of 100 and in 57 of 100 samples. Structure analysis of microsatellite data indicated that Colombian samples tested in this study belonged to four clusters and they segregated mostly based on their geographic region. Most of the pfhrp2-deleted parasites were assigned to a single cluster and originated from Amazonas Department although a few pfhrp2-negative parasites originated from the other three clusters. The presence of a high proportion of pfhrp2

  6. Burkholderia thailandensis harbors two identical rhl gene clusters responsible for the biosynthesis of rhamnolipids

    Directory of Open Access Journals (Sweden)

    Woods Donald E

    2009-12-01

    Background: Rhamnolipids are surface active molecules composed of rhamnose and β-hydroxydecanoic acid. These biosurfactants are produced mainly by Pseudomonas aeruginosa and have been thoroughly investigated since their early discovery. Recently, they have attracted renewed attention because of their involvement in various multicellular behaviors. Despite this high interest, only very few studies have focused on the production of rhamnolipids by Burkholderia species. Results: Orthologs of rhlA, rhlB and rhlC, which are responsible for the biosynthesis of rhamnolipids in P. aeruginosa, have been found in the non-infectious Burkholderia thailandensis, as well as in the genetically similar important pathogen B. pseudomallei. In contrast to P. aeruginosa, both Burkholderia species contain these three genes necessary for rhamnolipid production within a single gene cluster. Furthermore, two identical, paralogous copies of this gene cluster are found on the second chromosome of these bacteria. Both Burkholderia spp. produce rhamnolipids containing 3-hydroxy fatty acid moieties with longer side chains than those described for P. aeruginosa. Additionally, the rhamnolipids produced by B. thailandensis contain a much larger proportion of dirhamnolipids versus monorhamnolipids when compared to P. aeruginosa. The rhamnolipids produced by B. thailandensis reduce the surface tension of water to 42 mN/m while displaying a critical micelle concentration value of 225 mg/L. Separate mutations in both rhlA alleles, which are responsible for the synthesis of the rhamnolipid precursor 3-(3-hydroxyalkanoyloxy)alkanoic acid, prove that both copies of the rhl gene cluster are functional, but one contributes more to the total production than the other. Finally, a double ΔrhlA mutant that is completely devoid of rhamnolipid production is incapable of swarming motility, showing that both gene clusters contribute to this phenotype. Conclusions: Collectively, these

  7. Molecular population genetics of the β-esterase gene cluster of ...

    Indian Academy of Sciences (India)

    We suggest that the demographic history (bottleneck and admixture of genetically differentiated populations) is the major factor shaping the pattern of nucleotide polymorphism in the β-esterase gene cluster. However there are some 'footprints' of directional and balancing selection shaping specific distribution of nucleotide ...

  8. VRprofile: gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria.

    Science.gov (United States)

    Li, Jun; Tai, Cui; Deng, Zixin; Zhong, Weihong; He, Yongqun; Ou, Hong-Yu

    2017-01-10

    VRprofile is a Web server that facilitates rapid investigation of virulence and antibiotic resistance genes, as well as the genetic contexts related to the transfer of these traits, in newly sequenced pathogenic bacterial genomes. The backend database, MobilomeDB, was first built on sets of known gene cluster loci of bacterial type III/IV/VI/VII secretion systems and mobile genetic elements, including integrative and conjugative elements, prophages, class I integrons, IS elements and pathogenicity/antibiotic resistance islands. VRprofile is thus able to co-localize the homologs of these conserved gene clusters using HMMer or BLASTp searches. With the integration of the homologous gene cluster search module with a sequence composition module, VRprofile has exhibited better performance for island-like region predictions than the other widely used methods. In addition, VRprofile also provides an integrated Web interface for aligning and visualizing identified gene clusters with MobilomeDB-archived gene clusters, or with a varied set of bacterial genomes. VRprofile might help meet the increasing demand for re-annotation of bacterial variable regions, and aid in the real-time definition of disease-relevant gene clusters in pathogenic bacteria of interest. VRprofile is freely available at http://bioinfo-mml.sjtu.edu.cn/VRprofile. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. Genome-wide association study identifies the SERPINB gene cluster as a susceptibility locus for food allergy.

    Science.gov (United States)

    Marenholz, Ingo; Grosche, Sarah; Kalb, Birgit; Rüschendorf, Franz; Blümchen, Katharina; Schlags, Rupert; Harandi, Neda; Price, Mareike; Hansen, Gesine; Seidenberg, Jürgen; Röblitz, Holger; Yürek, Songül; Tschirner, Sebastian; Hong, Xiumei; Wang, Xiaobin; Homuth, Georg; Schmidt, Carsten O; Nöthen, Markus M; Hübner, Norbert; Niggemann, Bodo; Beyer, Kirsten; Lee, Young-Ae

    2017-10-20

    Genetic factors and mechanisms underlying food allergy are largely unknown. Due to heterogeneity of symptoms a reliable diagnosis is often difficult to make. Here, we report a genome-wide association study on food allergy diagnosed by oral food challenge in 497 cases and 2387 controls. We identify five loci at genome-wide significance, the clade B serpin (SERPINB) gene cluster at 18q21.3, the cytokine gene cluster at 5q31.1, the filaggrin gene, the C11orf30/LRRC32 locus, and the human leukocyte antigen (HLA) region. Stratifying the results for the causative food demonstrates that association of the HLA locus is peanut allergy-specific whereas the other four loci increase the risk for any food allergy. Variants in the SERPINB gene cluster are associated with SERPINB10 expression in leukocytes. Moreover, SERPINB genes are highly expressed in the esophagus. All identified loci are involved in immunological regulation or epithelial barrier function, emphasizing the role of both mechanisms in food allergy.

  10. Genetic clusters and sex-biased gene flow in a unicolonial Formica ant

    Directory of Open Access Journals (Sweden)

    Chapuisat Michel

    2009-03-01

    Background: Animal societies are diverse, ranging from small family-based groups to extraordinarily large social networks in which many unrelated individuals interact. At the extreme of this continuum, some ant species form unicolonial populations in which workers and queens can move among multiple interconnected nests without eliciting aggression. Although unicoloniality has been mostly studied in invasive ants, it also occurs in some native non-invasive species. Unicoloniality is commonly associated with very high queen number, which may result in levels of relatedness among nestmates being so low as to raise the question of the maintenance of altruism by kin selection in such systems. However, the actual relatedness among cooperating individuals critically depends on effective dispersal and the ensuing pattern of genetic structuring. In order to better understand the evolution of unicoloniality in native non-invasive ants, we investigated the fine-scale population genetic structure and gene flow in three unicolonial populations of the wood ant F. paralugubris. Results: The analysis of geo-referenced microsatellite genotypes and mitochondrial haplotypes revealed the presence of cryptic clusters of genetically-differentiated nests in the three populations of F. paralugubris. Because of this spatial genetic heterogeneity, members of the same clusters were moderately but significantly related. The comparison of nuclear (microsatellite) and mitochondrial differentiation indicated that effective gene flow was male-biased in all populations. Conclusion: The three unicolonial populations exhibited male-biased and mostly local gene flow. The high number of queens per nest, exchanges among neighbouring nests and restricted long-distance gene flow resulted in large clusters of genetically similar nests. The positive relatedness among clustermates suggests that kin selection may still contribute to the maintenance of altruism in unicolonial

  11. Regulatory role of tetR gene in a novel gene cluster of Acidovorax avenae subsp. avenae RS-1 under oxidative stress

    OpenAIRE

    Liu, He; Yang, Chun-Lan; Ge, Meng-Yu; Ibrahim, Muhammad; Li, Bin; Zhao, Wen-Jun; Chen, Gong-You; Zhu, Bo; Xie, Guan-Lin

    2014-01-01

    Acidovorax avenae subsp. avenae is the causal agent of bacterial brown stripe disease in rice. In this study, we characterized a novel horizontal transfer of a gene cluster, including tetR, on the chromosome of A. avenae subsp. avenae RS-1 by genome-wide analysis. TetR acted as a repressor in this gene cluster and the oxidative stress resistance was enhanced in tetR-deletion mutant strain. Electrophoretic mobility shift assay demonstrated that TetR regulator bound directly to the promoter of ...

  12. Molecular identification and characterisation of catalase and catalase-like protein genes in urease-positive thermophilic Campylobacter (UPTC).

    Science.gov (United States)

    Nakajima, T; Kuribayashi, T; Moore, J E; Millar, B C; Yamamoto, S; Matsuda, Motoo

    2016-01-01

    Thermophilic Campylobacter are important bacterial pathogens of foodborne diseases worldwide. These organisms' physiology requires a microaerophilic atmosphere. To date, little is known about the protective catalase mechanism in urease-positive thermophilic campylobacters (UPTC); hence, it was the aim of this study to identify and characterise catalase and catalase-like protein genes in these organisms. Catalase (katA) and catalase (Kat)-like protein genes from the Japanese UPTC CF89-12 strain were molecularly analysed and compared with C. lari RM2100 and other C. lari and thermophilic Campylobacter reference isolates. A possible open reading frame of 1,422 base pairs, predicted to encode a peptide of 474 amino acid residues, with calculated molecular weight of 52.7 kilo Daltons for katA, was identified within UPTC CF89-12. A probable ribosome binding site, two putative promoters and a putative ρ-independent transcription terminator were also identified within katA. A similar katA cluster also existed in the C. lari RM2100 strain, except that this strain carries no DcuB genes. However, the Kat-like protein gene or any other homologue(s) were never identified in the C. lari RM2100 strain, or in C. jejuni and C. upsaliensis. This study demonstrates the presence of catalase/catalase-like protein genes in UPTC organisms. These findings are significant in that they suggest that UPTC organisms have the protective genetic capability of helping protect the organisms from toxic oxygen stress, which may help them to survive in physiologically harsh environments, both within human and animal hosts, as well as in the natural environment.

  13. Expression of Genes Involved in Bacteriocin Production and Self-Resistance in Lactobacillus brevis 174A Is Mediated by Two Regulatory Proteins.

    Science.gov (United States)

    Noda, Masafumi; Miyauchi, Rumi; Danshiitsoodol, Narandalai; Matoba, Yasuyuki; Kumagai, Takanori; Sugiyama, Masanori

    2018-04-01

    We have previously shown that the lactic acid bacterium Lactobacillus brevis 174A, isolated from Citrus iyo fruit, produces a bacteriocin designated brevicin 174A, which comprises two antibacterial polypeptides (designated brevicins 174A-β and 174A-γ). We have also found a gene cluster, composed of eight open reading frames (ORFs), that contains genes for the biosynthesis of brevicin 174A, self-resistance to its own bacteriocin, and two transcriptional regulatory proteins. Some lactic acid bacterial strains have a system to start the production of bacteriocin at an adequate stage of growth. Generally, the system consists of a membrane-bound histidine protein kinase (HPK) that senses a specific environmental stimulus and a corresponding response regulator (RR) that mediates the cellular response. We have previously shown that although the HPK- and RR-encoding genes are not found on the brevicin 174A biosynthetic gene cluster in the 174A strain, two putative regulatory genes, designated breD and breG, are in the gene cluster. In the present study, we demonstrate that the expression of brevicin 174A production and self-resistance is positively controlled by two transcriptional regulatory proteins, designated BreD and BreG. BreD is expressed together with BreE as the self-resistance determinant of L. brevis 174A. DNase I footprinting analysis and a promoter assay demonstrated that BreD binds to the breED promoter as a positive autoregulator. The present study also demonstrates that BreG, carrying a transmembrane domain, binds to the common promoter of breB and breC, encoding brevicins 174A-β and 174A-γ, respectively, for positive regulation. IMPORTANCE The problem of the appearance of bacteria that are resistant to practical antibiotics and the increasing demand for safe foods have increased interest in replacing conventional antibiotics with bacteriocin produced by the lactic acid bacteria. This antibacterial substance can inhibit the growth of pathogenic

  14. Inference of gene-phenotype associations via protein-protein interaction and orthology.

    Directory of Open Access Journals (Sweden)

    Panwen Wang

    Full Text Available One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.

  15. Using the 2A Protein Coexpression System: Multicistronic 2A Vectors Expressing Gene(s) of Interest and Reporter Proteins.

    Science.gov (United States)

    Luke, Garry A; Ryan, Martin D

    2018-01-01

    To date, a huge range of different proteins, many with cotranslational and posttranslational subcellular localization signals, have been coexpressed together with various reporter proteins in vitro and in vivo using 2A peptides. The pros and cons of 2A co-expression technology are considered below, followed by a simple example of a "how to" protocol to concatenate multiple genes of interest, together with a reporter gene, into a single gene linked via 2As for easy identification or selection of transduced cells.

  16. Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages

    Science.gov (United States)

    Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trac...

  17. Introducing a Clustering Step in a Consensus Approach for the Scoring of Protein-Protein Docking Models

    KAUST Repository

    Chermak, Edrisse; De Donato, Renato; Lensink, Marc F.; Petta, Andrea; Serra, Luigi; Scarano, Vittorio; Cavallo, Luigi; Oliva, Romina

    2016-01-01

    Correctly scoring protein-protein docking models to single out native-like ones is an open challenge. It is also an object of assessment in CAPRI (Critical Assessment of PRedicted Interactions), the community-wide blind docking experiment. We introduced in the field the first pure consensus method, CONSRANK, which ranks models based on their ability to match the most conserved contacts in the ensemble they belong to. In CAPRI, scorers are asked to evaluate a set of available models and select the top ten ones, based on their own scoring approach. Scorers' performance is ranked based on the number of targets/interfaces for which they could provide at least one correct solution. In such terms, blind testing in CAPRI Round 30 (a joint prediction round with CASP11) has shown that critical cases for CONSRANK are represented by targets showing multiple interfaces or for which only a very small number of correct solutions are available. To address these challenging cases, CONSRANK has now been modified to include a contact-based clustering of the models as a preliminary step of the scoring process. We used an agglomerative hierarchical clustering based on the number of common inter-residue contacts within the models. Two criteria, with different thresholds, were explored in the cluster generation, setting either the number of common contacts or of total clusters. For each clustering approach, after selecting the top (most populated) ten clusters, CONSRANK was run on these clusters and the top-ranked model for each cluster was selected, in the limit of 10 models per target. We have applied our modified scoring approach, Clust-CONSRANK, to SCORE_SET, a set of CAPRI scoring models made recently available by CAPRI assessors, and to the subset of homodimeric targets in CAPRI Round 30 for which CONSRANK failed to include a correct solution within the ten selected models. Results show that, for the challenging cases, the clustering step typically enriches the ten top ranked
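    The contact-based clustering step described in the abstract above can be illustrated with a minimal Python sketch. It is not the Clust-CONSRANK implementation: the single-linkage merge rule, the `min_common` threshold value, and the toy model names and contact sets are all assumptions made for illustration. Each docking model is represented as a set of inter-residue contacts, clusters are merged agglomeratively while they share enough contacts, and the most populated clusters are kept.

```python
from itertools import combinations


def common_contacts(a, b):
    """Number of inter-residue contacts shared by two models."""
    return len(a & b)


def cluster_models(contacts, min_common):
    """Agglomerative clustering of docking models by shared contacts.

    contacts: dict mapping a model id to a frozenset of (res_i, res_j) pairs.
    min_common: hypothetical threshold; merging stops once no two clusters
    share at least this many contacts (single linkage, an assumption).
    Returns clusters sorted from most to least populated.
    """
    clusters = [[model] for model in contacts]
    while len(clusters) > 1:
        best, best_pair = -1, None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: similarity of the two closest members.
            sim = max(
                common_contacts(contacts[a], contacts[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if sim > best:
                best, best_pair = sim, (i, j)
        if best < min_common:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(clusters, key=len, reverse=True)


# Toy data (hypothetical): six models, contacts given as residue-index pairs.
models = {
    "m1": frozenset({(1, 10), (2, 11), (3, 12)}),
    "m2": frozenset({(1, 10), (2, 11), (4, 13)}),
    "m3": frozenset({(1, 10), (2, 11), (3, 12), (5, 14)}),
    "m4": frozenset({(20, 30), (21, 31)}),
    "m5": frozenset({(20, 30), (21, 31), (22, 32)}),
    "m6": frozenset({(40, 50)}),
}
ranked = cluster_models(models, min_common=2)
top_clusters = ranked[:10]  # keep at most the ten most populated clusters
```

    In the workflow the abstract describes, a consensus ranking would then be run inside each retained cluster and the top-ranked model of each cluster selected; that ranking step is not sketched here.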

  18. Introducing a Clustering Step in a Consensus Approach for the Scoring of Protein-Protein Docking Models

    KAUST Repository

    Chermak, Edrisse

    2016-11-15

    Correctly scoring protein-protein docking models to single out native-like ones is an open challenge. It is also an object of assessment in CAPRI (Critical Assessment of PRedicted Interactions), the community-wide blind docking experiment. We introduced in the field the first pure consensus method, CONSRANK, which ranks models based on their ability to match the most conserved contacts in the ensemble they belong to. In CAPRI, scorers are asked to evaluate a set of available models and select the top ten ones, based on their own scoring approach. Scorers' performance is ranked based on the number of targets/interfaces for which they could provide at least one correct solution. In such terms, blind testing in CAPRI Round 30 (a joint prediction round with CASP11) has shown that critical cases for CONSRANK are represented by targets showing multiple interfaces or for which only a very small number of correct solutions are available. To address these challenging cases, CONSRANK has now been modified to include a contact-based clustering of the models as a preliminary step of the scoring process. We used an agglomerative hierarchical clustering based on the number of common inter-residue contacts within the models. Two criteria, with different thresholds, were explored in the cluster generation, setting either the number of common contacts or of total clusters. For each clustering approach, after selecting the top (most populated) ten clusters, CONSRANK was run on these clusters and the top-ranked model for each cluster was selected, in the limit of 10 models per target. We have applied our modified scoring approach, Clust-CONSRANK, to SCORE_SET, a set of CAPRI scoring models made recently available by CAPRI assessors, and to the subset of homodimeric targets in CAPRI Round 30 for which CONSRANK failed to include a correct solution within the ten selected models. Results show that, for the challenging cases, the clustering step typically enriches the ten top ranked

  19. Analysis of gene and protein name synonyms in Entrez Gene and UniProtKB resources

    KAUST Repository

    Arkasosy, Basil

    2013-05-11

    Ambiguity in texts is a well-known problem: words can carry several meanings, and hence, can be read and interpreted differently. This is also true in the biological literature; names of biological concepts, such as genes and proteins, might be ambiguous, referring in some cases to more than one gene or one protein, or in others, to both genes and proteins at the same time. Public biological databases provide useful information about genes and proteins, including their names. In this study, we made a thorough analysis of the nomenclatures of genes and proteins in two data sources and for six different species. We developed an automated process that parses, extracts, processes and stores information available in two major biological databases: Entrez Gene and UniProtKB. We analysed gene and protein synonyms, their types, frequencies, and the ambiguities within a species, between data sources and across species. We found that at least 40% of the cross-species ambiguities are caused by names that are already ambiguous within the species. Our study shows that of the six species we analysed (Homo sapiens, Mus musculus, Arabidopsis thaliana, Oryza sativa, Bacillus subtilis and Pseudomonas fluorescens), rice (Oryza sativa) has the best naming model in the Entrez Gene database, with low ambiguities between data sources and across species.

  20. Deletion and Gene Expression Analyses Define the Paxilline Biosynthetic Gene Cluster in Penicillium paxilli

    Directory of Open Access Journals (Sweden)

    Emily J. Parker

    2013-08-01

    Full Text Available The indole-diterpene paxilline is an abundant secondary metabolite synthesized by Penicillium paxilli. In total, 21 genes have been identified at the PAX locus of which six have been previously confirmed to have a functional role in paxilline biosynthesis. A combination of bioinformatics, gene expression and targeted gene replacement analyses were used to define the boundaries of the PAX gene cluster. Targeted gene replacement identified seven genes, paxG, paxA, paxM, paxB, paxC, paxP and paxQ that were all required for paxilline production, with one additional gene, paxD, required for regular prenylation of the indole ring post paxilline synthesis. The two putative transcription factors, PP104 and PP105, were not co-regulated with the pax genes and based on targeted gene replacement, including the double knockout, did not have a role in paxilline production. The relationship of indole dimethylallyl transferases involved in prenylation of indole-diterpenes such as paxilline or lolitrem B, can be found as two disparate clades, not supported by prenylation type (e.g., regular or reverse). This paper provides insight into the P. paxilli indole-diterpene locus and reviews the recent advances identified in paxilline biosynthesis.

  1. Ubiquitin--conserved protein or selfish gene?

    Science.gov (United States)

    Catic, André; Ploegh, Hidde L

    2005-11-01

    The posttranslational modifier ubiquitin is encoded by a multigene family containing three primary members, which yield the precursor protein polyubiquitin and two ubiquitin moieties, Ub(L40) and Ub(S27), that are fused to the ribosomal proteins L40 and S27, respectively. The gene encoding polyubiquitin is highly conserved and, until now, those encoding Ub(L40) and Ub(S27) have been generally considered to be equally invariant. The evolution of the ribosomal ubiquitin moieties is, however, proving to be more dynamic. It seems that the genes encoding Ub(L40) and Ub(S27) are actively maintained by homologous recombination with the invariant polyubiquitin locus. Failure to recombine leads to deterioration of the sequence of the ribosomal ubiquitin moieties in several phyla, although this deterioration is evidently constrained by the structural requirements of the ubiquitin fold. Only a few amino acids in ubiquitin are vital for its function, and we propose that conservation of all three ubiquitin genes is driven not only by functional properties of the ubiquitin protein, but also by the propensity of the polyubiquitin locus to act as a 'selfish gene'.

  2. Genome-Wide Analysis of Secondary Metabolite Gene Clusters in Ophiostoma ulmi and Ophiostoma novo-ulmi Reveals a Fujikurin-Like Gene Cluster with a Putative Role in Infection

    Directory of Open Access Journals (Sweden)

    Nicolau Sbaraini

    2017-06-01

    Full Text Available The emergence of new microbial pathogens can result in destructive outbreaks, since their hosts have limited resistance and pathogens may be excessively aggressive. Described as the major ecological incident of the twentieth century, Dutch elm disease, caused by ascomycete fungi from the Ophiostoma genus, has caused a significant decline in elm tree populations (Ulmus sp.) in North America and Europe. Genome sequencing of the two main causative agents of Dutch elm disease (Ophiostoma ulmi and Ophiostoma novo-ulmi), along with closely related species with different lifestyles, allows for unique comparisons to be made to identify how pathogens and virulence determinants have emerged. Among several established virulence determinants, secondary metabolites (SMs) have been suggested to play significant roles during phytopathogen infection. Interestingly, the secondary metabolism of Dutch elm pathogens remains almost unexplored, and little is known about how SM biosynthetic genes are organized in these species. To better understand the metabolic potential of O. ulmi and O. novo-ulmi, we performed a deep survey and description of SM biosynthetic gene clusters (BGCs) in these species and assessed their conservation among eight species from the Ophiostomataceae family. Among 19 identified BGCs, a fujikurin-like gene cluster (OpPKS8) was unique to Dutch elm pathogens. Phylogenetic analysis revealed that orthologs for this gene cluster are widespread among phytopathogens and plant-associated fungi, suggesting that OpPKS8 may have been horizontally acquired by the Ophiostoma genus. Moreover, the detailed identification of several BGCs paves the way for future in-depth research and supports the potential impact of secondary metabolism on Ophiostoma genus’ lifestyle.

  3. Nonsynonymous substitution rate (Ka) is a relatively consistent parameter for defining fast-evolving and slow-evolving protein-coding genes

    Directory of Open Access Journals (Sweden)

    Wang Lei

    2011-02-01

    Full Text Available Abstract Background Mammalian genome sequence data are being acquired in large quantities and at enormous speeds. We now have a tremendous opportunity to better understand which genes are the most variable or conserved, and what their particular functions and evolutionary dynamics are, through comparative genomics. Results We chose human and eleven other high-coverage mammalian genome data, as well as an avian genome as an outgroup, to analyze orthologous protein-coding genes using nonsynonymous (Ka) and synonymous (Ks) substitution rates. After evaluating eight commonly-used methods of Ka and Ks calculation, we observed that these methods yielded a nearly uniform result when estimating Ka, but not Ks (or Ka/Ks). When sorting genes based on Ka, we noticed that fast-evolving and slow-evolving genes often belonged to different functional classes, with respect to species-specificity and lineage-specificity. In particular, we identified two functional classes of genes in the acquired immune system. Fast-evolving genes coded for signal-transducing proteins, such as receptors, ligands, cytokines, and CDs (cluster of differentiation), mostly surface proteins, whereas the slow-evolving genes were for function-modulating proteins, such as kinases and adaptor proteins. In addition, among slow-evolving genes that had functions related to the central nervous system, neurodegenerative disease-related pathways were enriched significantly in most mammalian species. We also confirmed that gene expression was negatively correlated with evolution rate, i.e. slow-evolving genes were expressed at higher levels than fast-evolving genes. Our results indicated that the functional specializations of the three major mammalian clades were: sensory perception and oncogenesis in primates, reproduction and hormone regulation in large mammals, and immunity and angiotensin in rodents. Conclusion Our study suggests that Ka calculation, which is less biased compared to Ks and Ka

  4. Protein annotation from protein interaction networks and Gene Ontology.

    Science.gov (United States)

    Nguyen, Cao D; Gardiner, Katheleen J; Cios, Krzysztof J

    2011-10-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively. Copyright © 2011 Elsevier Inc. All rights reserved.
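    One way to compare the three precision/recall pairs quoted in the abstract above on a single scale is the F1 score, the harmonic mean of precision and recall. The abstract itself does not report F1; the short Python sketch below simply derives it from the quoted percentages, and the dictionary labels are illustrative names rather than identifiers from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract; labels are illustrative.
reported = {
    "proposed method": (0.51, 0.60),
    "Majority": (0.45, 0.26),
    "chi-square statistics": (0.24, 0.61),
}

f1_by_method = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_method.items():
    print(f"{name}: F1 = {f1:.2f}")
```

    Under this derived measure the combined Naïve Bayes and association-rule approach scores about 0.55, against roughly 0.33 and 0.34 for the two baselines, which is consistent with the abstract's claim of higher recall without loss of precision.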

  5. Comparative genomics of Cluster O mycobacteriophages.

    Directory of Open Access Journals (Sweden)

    Steven G Cresawn

    Full Text Available Mycobacteriophages--viruses of mycobacterial hosts--are genetically diverse but morphologically are all classified in the Caudovirales with double-stranded DNA and tails. We describe here a group of five closely related mycobacteriophages--Corndog, Catdawg, Dylan, Firecracker, and YungJamal--designated as Cluster O with long flexible tails but with unusual prolate capsids. Proteomic analysis of phage Corndog particles, Catdawg particles, and Corndog-infected cells confirms expression of half of the predicted gene products and indicates a non-canonical mechanism for translation of the Corndog tape measure protein. Bioinformatic analysis identifies 8-9 strongly predicted SigA promoters and all five Cluster O genomes contain more than 30 copies of a 17 bp repeat sequence with dyad symmetry located throughout the genomes. Comparison of the Cluster O phages provides insights into phage genome evolution including the processes of gene flux by horizontal genetic exchange.

  6. Zinc fingers, zinc clusters, and zinc twists in DNA-binding protein domains

    International Nuclear Information System (INIS)

    Vallee, B.L.; Auld, D.S.; Coleman, J.E.

    1991-01-01

    The authors recognize three distinct motifs of DNA-binding zinc proteins: (i) zinc fingers, (ii) zinc clusters, and (iii) zinc twists. Until very recently, x-ray crystallographic or NMR three-dimensional structure analyses of DNA-binding zinc proteins have not been available to serve as standards of reference for the zinc binding sites of these families of proteins. Those of the DNA-binding domains of the fungal transcription factor GAL4 and the rat glucocorticoid receptor are the first to have been determined. Both proteins contain two zinc binding sites, and in both, cysteine residues are the sole zinc ligands. In GAL4, two zinc atoms are bound to six cysteine residues which form a zinc cluster akin to that of metallothionein; the distance between the two zinc atoms of GAL4 is ∼3.5 angstrom. In the glucocorticoid receptor, each zinc atom is bound to four cysteine residues; the interatomic zinc-zinc distance is ∼13 angstrom, and in this instance, a zinc twist is represented by a helical DNA recognition site located between the two zinc atoms. Zinc clusters and zinc twists are here recognized as two distinctive motifs in DNA-binding proteins containing multiple zinc atoms. For native zinc fingers, structural data do not exist as yet; consequently, the interatomic distances between zinc atoms are not known. As further structural data become available, the structural and functional significance of these different motifs in their binding to DNA and other proteins participating in the transmission of the genetic message will become apparent

  7. ATNT: an enhanced system for expression of polycistronic secondary metabolite gene clusters in Aspergillus niger.

    Science.gov (United States)

    Geib, Elena; Brock, Matthias

    2017-01-01

    Fungi are treasure chests for yet unexplored natural products. However, exploitation of their real potential remains difficult as a significant proportion of biosynthetic gene clusters appears silent under standard laboratory conditions. Therefore, elucidation of novel products requires gene activation or heterologous expression. For heterologous gene expression, we previously developed an expression platform in Aspergillus niger that is based on the transcriptional regulator TerR and its target promoter PterA. In this study, we extended this system by regulating expression of terR by the doxycycline inducible Tet-on system. Reporter genes cloned under the control of the target promoter PterA remained silent in the absence of doxycycline, but were strongly expressed when doxycycline was added. Reporter quantification revealed that the coupled system results in about five times higher expression rates compared to gene expression under direct control of the Tet-on system. As production of secondary metabolites generally requires the expression of several biosynthetic genes, the suitability of the self-cleaving viral peptide sequence P2A was tested in this optimised expression system. P2A allowed polycistronic expression of genes required for Asp-melanin formation in combination with the gene coding for the red fluorescent protein tdTomato. Gene expression and Asp-melanin formation was prevented in the absence of doxycycline and strongly induced by addition of doxycycline. Fluorescence studies confirmed the correct subcellular localisation of the respective enzymes. This tightly regulated but strongly inducible expression system enables high level production of secondary metabolites most likely even those with toxic potential. Furthermore, this system is compatible with polycistronic gene expression and, thus, suitable for the discovery of novel natural products.

  8. Heterozygous truncation mutations of the SMC1A gene cause a severe early onset epilepsy with cluster seizures in females: Detailed phenotyping of 10 new cases.

    Science.gov (United States)

    Symonds, Joseph D; Joss, Shelagh; Metcalfe, Kay A; Somarathi, Suresh; Cruden, Jamie; Devlin, Anita M; Donaldson, Alan; DiDonato, Nataliya; Fitzpatrick, David; Kaiser, Frank J; Lampe, Anne K; Lees, Melissa M; McLellan, Ailsa; Montgomery, Tara; Mundada, Vivek; Nairn, Lesley; Sarkar, Ajoy; Schallner, Jens; Pozojevic, Jelena; Parenti, Ilaria; Tan, Jeen; Turnpenny, Peter; Whitehouse, William P; Zuberi, Sameer M

    2017-04-01

    The phenotype of seizure clustering with febrile illnesses in infancy/early childhood is well recognized. To date the only genetic epilepsy consistently associated with this phenotype is PCDH19, an X-linked disorder restricted to females, and males with mosaicism. The SMC1A gene, which encodes a structural component of the cohesin complex is also located on the X chromosome. Missense variants and small in-frame deletions of SMC1A cause approximately 5% of Cornelia de Lange Syndrome (CdLS). Recently, protein truncating mutations in SMC1A have been reported in five females, all of whom have been affected by a drug-resistant epilepsy, and severe developmental impairment. Our objective was to further delineate the phenotype of SMC1A truncation. Female cases with de novo truncation mutations in SMC1A were identified from the Deciphering Developmental Disorders (DDD) study (n = 8), from postmortem testing of an affected twin (n = 1), and from clinical testing with an epilepsy gene panel (n = 1). Detailed information on the phenotype in each case was obtained. Ten cases with heterozygous de novo mutations in the SMC1A gene are presented. All 10 mutations identified are predicted to result in premature truncation of the SMC1A protein. All cases are female, and none had a clinical diagnosis of CdLS. They presented with onset of epileptic seizures between <4 weeks and 28 months of age. In the majority of cases, a marked preponderance for seizures to occur in clusters was noted. Seizure clusters were associated with developmental regression. Moderate or severe developmental impairment was apparent in all cases. Truncation mutations in SMC1A cause a severe epilepsy phenotype with cluster seizures in females. These mutations are likely to be nonviable in males. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.

  9. Polymorphisms of ST2-IL18R1-IL18RAP gene cluster: a new risk for autoimmune thyroid diseases.

    Science.gov (United States)

    Wang, X; Zhu, Y F; Li, D M; Qin, Q; Wang, Q; Muhali, F S; Jiang, W J; Zhang, J A

    2016-02-01

    The interleukin 33 (IL33)/ST2 pathway and the ST2-interleukin 18 receptor 1-interleukin 18 receptor accessory protein (ST2-IL18R1-IL18RAP) gene cluster have been implicated in many autoimmune diseases, but there are few reports on their role in autoimmune thyroid diseases (AITD). In this study, we investigated whether polymorphisms of IL33, ST2, IL18R1, and IL18RAP are associated with Graves' disease (GD) and Hashimoto's thyroiditis (HT), two major forms of AITD, among a Chinese population. A total of 11 SNPs of IL33 and the ST2-IL18R1-IL18RAP gene cluster (rs1929992, rs10975519, rs10208293, rs6543116, rs1041973, rs3732127, rs11465597, rs1035130, rs2293225, rs1035127, rs917997) were explored in a case-control study including 417 patients with GD, 250 HT patients and 301 controls. Genotyping of these SNPs was performed using the matrix-assisted laser desorption/ionization time-of-flight mass spectrometer (MALDI-TOF-MS) platform from Sequenom. The frequencies of allele A and the AA+AG genotype of rs6543116 (ST2) in HT patients were significantly increased compared with those of the controls (P = 0.029/0.021, OR = 1.31/1.62). For another SNP, rs917997, the AA+AG genotype presented an increased frequency in HT subjects compared with controls (P = 0.046, OR = 1.53). Furthermore, the haplotype GAGCCCG from the ST2-IL18R1-IL18RAP gene cluster (rs6543116, rs1041973, rs1035130, rs3732127, rs1035127, rs2293225, rs917997) was associated with increased susceptibility to GD with an OR of 2.03 (P = 0.022, 95% CI = 1.07-3.86). Some SNPs of the ST2-IL18R1-IL18RAP gene cluster might increase susceptibility to HT and GD in the Chinese Han population. © 2015 John Wiley & Sons Ltd.

  10. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  11. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.

    Science.gov (United States)

    Wang, Ying; Ding, Jun; Daniell, Henry; Hu, Haiyan; Li, Xiaoman

    2012-09-01

    Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that there are unlikely to be motifs shared by the intergenic sequences of most chloroplast genes, indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data, and gene expression data, we further showed support for the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins, which sheds light on the transcriptional regulation of chloroplast genes.

  12. Finding trans-regulatory genes and protein complexes modulating meiotic recombination hotspots of human, mouse and yeast.

    Science.gov (United States)

    Wu, Min; Kwoh, Chee-Keong; Li, Xiaoli; Zheng, Jie

    2014-09-11

    The regulatory mechanism of recombination is one of the most fundamental problems in genomics, with wide applications in genome-wide association studies (GWAS), birth-defect diseases, molecular evolution, cancer research, etc. Recombination events cluster into short genomic regions called "recombination hotspots". Recently, a zinc finger protein, PRDM9, was reported to regulate recombination hotspots in human and mouse genomes. In addition, a 13-mer motif contained in the binding sites of PRDM9 was found to be enriched in human hotspots. However, this 13-mer motif covers only a fraction of hotspots, indicating that PRDM9 is not the only regulator of recombination hotspots. Therefore, the challenge of discovering other regulators of recombination hotspots becomes significant. Furthermore, recombination is a complex process. Hence, multiple proteins acting as machinery, rather than individual proteins, are more likely to carry out this process in a precise and stable manner. Therefore, extending the prediction of individual trans-regulators to protein complexes is also highly desirable. In this paper, we introduce a pipeline to identify genes and protein complexes associated with recombination hotspots. First, we prioritize proteins associated with hotspots based on their preference for binding to hotspots and coldspots. Second, using the genes identified above as seeds, we apply the Random Walk with Restart (RWR) algorithm to propagate their influence to other proteins in protein-protein interaction (PPI) networks. Hence, many proteins without DNA-binding information will also be assigned a score to implicate their roles in recombination hotspots. Third, we construct sub-PPI networks induced by top genes ranked by RWR for various species (e.g., yeast, human and mouse) and detect protein complexes in those sub-PPI networks. The GO term analysis shows that our prioritizing methods and the RWR algorithm are capable of identifying novel genes associated with
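
    The Random Walk with Restart step named in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy network, the seed set, the restart probability of 0.3 and the function name are all assumptions made for illustration. The sketch shows only the general technique, in which a walker repeatedly steps to a random neighbour or jumps back to a seed protein, so that proteins close to the seeds receive higher steady-state scores.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node of an undirected graph by proximity to the seed nodes.

    adjacency: dict mapping node -> set of neighbouring nodes (hypothetical PPI graph).
    seeds:     set of seed nodes; the walker restarts uniformly among them.
    restart:   probability of jumping back to a seed at each step (assumed value).
    Assumes every node has at least one neighbour, so probability mass is conserved.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart component, then spread the remaining mass along edges.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            degree = len(adjacency[u])
            if degree == 0:
                continue
            share = (1.0 - restart) * p[u] / degree
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network (made up for illustration): a triangle A-B-C with a tail C-D-E.
toy_ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = random_walk_with_restart(toy_ppi, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

    In this toy graph the seed ranks first and the score decays along the tail, which is the behaviour that lets proteins without direct DNA-binding evidence still receive a score.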

  13. Identification and analysis of the paulomycin biosynthetic gene cluster and titer improvement of the paulomycins in Streptomyces paulus NRRL 8115.

    Directory of Open Access Journals (Sweden)

    Jine Li

    Full Text Available The paulomycins are a group of glycosylated compounds featuring a unique paulic acid moiety. To locate their biosynthetic gene clusters, the genomes of two paulomycin producers, Streptomyces paulus NRRL 8115 and Streptomyces sp. YN86, were sequenced. The paulomycin biosynthetic gene clusters were defined by comparative analyses of the two genomes together with the genome of the third paulomycin producer Streptomyces albus J1074. Subsequently, the identity of the paulomycin biosynthetic gene cluster was confirmed by inactivation of two genes involved in biosynthesis of the paulomycose branched chain (pau11) and the ring A moiety (pau18) in Streptomyces paulus NRRL 8115. After determining the gene cluster boundaries, a convergent biosynthetic model was proposed for paulomycin based on the deduced functions of the pau genes. Finally, a paulomycin high-producing strain was constructed by expressing an activator-encoding gene (pau13) in S. paulus, setting the stage for future investigations.

  14. Investigating the Correspondence Between Transcriptomic and Proteomic Expression Profiles Using Coupled Cluster Models

    International Nuclear Information System (INIS)

    Rogers, Simon; Girolami, Mark; Kolch, Walter; Waters, Katrina M.; Liu, Tao; Thrall, Brian D.; Wiley, H. S.

    2008-01-01

    Modern transcriptomics and proteomics enable us to survey the expression of RNAs and proteins at large scales. While these data are usually generated and analyzed separately, there is an increasing interest in comparing and co-analyzing transcriptome and proteome expression data. A major open question is whether transcriptome and proteome expression is linked and how it is coordinated. Results: Here we have developed a probabilistic clustering model that permits analysis of the links between transcriptomic and proteomic profiles in a sensible and flexible manner. Our coupled mixture model defines a prior probability distribution over the component to which a protein profile should be assigned conditioned on which component the associated mRNA profile belongs to. By providing probabilistic assignments this approach sits between the two extremes of concatenating the data on the assumption that mRNA and protein clusters would have a one-to-one relationship, and independent clustering where the mRNA profile provides no information on the protein profile and vice-versa. We apply this approach to a large dataset of quantitative transcriptomic and proteomic expression data obtained from a human breast epithelial cell line (HMEC) stimulated by epidermal growth factor (EGF) over a series of timepoints corresponding to one cell cycle. The results reveal a complex relationship between transcriptome and proteome with most mRNA clusters linked to at least two protein clusters, and vice versa. A more detailed analysis incorporating information on gene function from the gene ontology database shows that a high correlation of mRNA and protein expression is limited to the components of some molecular machines, such as the ribosome, cell adhesion complexes and the TCP-1 chaperonin involved in protein folding. 
Conclusions: The dynamic regulation of the transcriptome and proteome in mammalian cells in response to an acute mitogenic stimulus appears largely independent with very little
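
    The coupling idea described in the abstract above can be sketched numerically. The sketch is an assumption-laden illustration, not the authors' model or inference procedure: the likelihood values, the mixing weights and the coupling table are invented, and the function name is hypothetical. It shows only how a conditional prior over protein components, given the mRNA component, yields joint probabilistic assignments for one gene.

```python
def joint_responsibilities(mrna_prior, mrna_lik, coupling, protein_lik):
    """Joint posterior over (mRNA component k, protein component j) for one gene.

    mrna_prior[k]  : mixing weight of mRNA component k
    mrna_lik[k]    : likelihood of the gene's mRNA profile under component k
    coupling[k][j] : prior probability of protein component j given mRNA component k
    protein_lik[j] : likelihood of the gene's protein profile under component j
    Returns r with r[k][j] proportional to the product of the four terms.
    """
    unnormalised = [
        [mrna_prior[k] * mrna_lik[k] * coupling[k][j] * protein_lik[j]
         for j in range(len(protein_lik))]
        for k in range(len(mrna_prior))
    ]
    total = sum(sum(row) for row in unnormalised)
    return [[value / total for value in row] for row in unnormalised]


# Invented numbers for a single gene with two mRNA and two protein components.
coupling = [[0.9, 0.1],   # each row sums to 1: a distribution over protein components
            [0.2, 0.8]]
resp = joint_responsibilities(
    mrna_prior=[0.5, 0.5],
    mrna_lik=[0.8, 0.2],
    coupling=coupling,
    protein_lik=[0.3, 0.7],
)
```

    Under these assumptions, an identity coupling table would reproduce the one-to-one extreme mentioned in the abstract, and a table with identical rows would reproduce independent clustering.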

  15. Chassis organism from Corynebacterium glutamicum--a top-down approach to identify and delete irrelevant gene clusters.

    Science.gov (United States)

    Unthan, Simon; Baumgart, Meike; Radek, Andreas; Herbst, Marius; Siebert, Daniel; Brühl, Natalie; Bartsch, Anna; Bott, Michael; Wiechert, Wolfgang; Marin, Kay; Hans, Stephan; Krämer, Reinhard; Seibold, Gerd; Frunzke, Julia; Kalinowski, Jörn; Rückert, Christian; Wendisch, Volker F; Noack, Stephan

    2015-02-01

    For synthetic biology applications, a robust structural basis is required, which can be constructed either from scratch or in a top-down approach starting from any existing organism. In this study, we initiated the top-down construction of a chassis organism from Corynebacterium glutamicum ATCC 13032, aiming for the relevant gene set to maintain its fast growth on defined medium. We evaluated each native gene for its essentiality considering expression levels, phylogenetic conservation, and knockout data. Based on this classification, we determined 41 gene clusters ranging from 3.7 to 49.7 kbp as target sites for deletion. 36 deletions were successful and 10 genome-reduced strains showed impaired growth rates, indicating that genes relevant to maintaining biological fitness at wild-type level had been hit. In contrast, 26 deleted clusters were found to include exclusively genes irrelevant for growth on defined medium. A combinatory deletion of all irrelevant gene clusters would, in a prophage-free strain, decrease the size of the native genome by about 722 kbp (22%) to 2561 kbp. Finally, five combinatory deletions of irrelevant gene clusters were investigated. The study introduces the novel concept of relevant genes and demonstrates general strategies to construct a chassis suitable for biotechnological application. © 2014 The Authors. Biotechnology Journal published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. This is an open access article under the terms of the Creative Commons Attribution-Non-Commercial-NoDerivs Licence, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

  16. Gravitation field algorithm and its application in gene cluster

    Directory of Open Access Journals (Sweden)

    Zheng Ming

    2010-09-01

    Full Text Available Abstract Background Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel search optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the well-known astronomical theory of planetary formation, the Solar Nebular Disk Model (SNDM). GFA simulates the gravitation field and outperforms GA and SA on some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters the dataset from the Gene Expression Omnibus well. Conclusions The mathematical proof demonstrates that GFA converges to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.

  17. Molecular Characterization of SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) Gene Family in Betula luminifera

    Directory of Open Access Journals (Sweden)

    Xiu-Yun Li

    2018-05-01

    Full Text Available As a major family of plant-specific transcription factors, SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) genes play vital regulatory roles in plant growth, development and stress responses. In this study, 18 SPL genes were identified and cloned from Betula luminifera. Two zinc finger-like structures and a nuclear localization signal (NLS) segment were present in the SBP domains of all BlSPLs. Phylogenetic analysis showed that these genes clustered into nine groups (groups I-IX). The intron/exon structure and motif composition were highly conserved within the same group. Twelve of the 18 BlSPLs were experimentally verified as targets of miR156, and two cleavage sites were detected in these miR156-targeted BlSPL genes. Many putative cis-elements, associated with light, stress and phytohormone responses, were identified in the promoter regions of BlSPLs, suggesting that BlSPL genes are probably involved in important physiological processes and developmental events. Tissue-specific expression analysis showed that miR156-targeted BlSPLs exhibited a more differential expression pattern, while most miR156-nontargeted BlSPLs tended to be constitutively expressed, suggesting distinct roles of miR156-targeted and nontargeted BlSPLs in the development and growth of B. luminifera. Further expression analysis revealed that miR156-targeted BlSPLs were dramatically up-regulated with age, whereas the mature BlmiR156 level apparently declined with age, indicating that the miR156/SPL module plays important roles in the vegetative phase change of B. luminifera. Moreover, a yeast two-hybrid assay indicated that several miR156-targeted and nontargeted BlSPLs could interact with two DELLA proteins (BlRGA and BlRGL), which suggests that certain BlSPLs take part in GA-regulated processes through protein interaction with DELLA proteins. All these results provide an important basis for further exploring the biological functions of BlSPLs in B. luminifera.

  18. Draft genome sequence of Streptomyces coelicoflavus ZG0656 reveals the putative biosynthetic gene cluster of acarviostatin family α-amylase inhibitors.

    Science.gov (United States)

    Guo, X; Geng, P; Bai, F; Bai, G; Sun, T; Li, X; Shi, L; Zhong, Q

    2012-08-01

    The aims of this study are to obtain the draft genome sequence of Streptomyces coelicoflavus ZG0656, which produces novel acarviostatin family α-amylase inhibitors, and then to reveal the putative acarviostatin-related gene cluster and the biosynthetic pathway. The draft genome sequence of S. coelicoflavus ZG0656 was generated using a shotgun approach employing a combination of 454 and Solexa sequencing technologies. Genome analysis revealed a putative gene cluster for acarviostatin biosynthesis, termed sct-cluster. The cluster contains 13 acarviostatin synthetic genes, six transporter genes, four starch degrading or transglycosylation enzyme genes and two regulator genes. On the basis of bioinformatic analysis, we proposed a putative biosynthetic pathway of acarviostatins. The intracellular steps produce a structural core, acarviostatin I00-7-P, and the extracellular assemblies lead to diverse acarviostatin end products. The draft genome sequence of S. coelicoflavus ZG0656 revealed the putative biosynthetic gene cluster of acarviostatins and a putative pathway of acarviostatin production. To our knowledge, S. coelicoflavus ZG0656 is the first strain in this species for which a genome sequence has been reported. The analysis of sct-cluster provided important insights into the biosynthesis of acarviostatins. This work will be a platform for producing novel variants and yield improvement. © 2012 The Authors. Letters in Applied Microbiology © 2012 The Society for Applied Microbiology.

  19. Identification and functional analysis of gene cluster involvement in biosynthesis of the cyclic lipopeptide antibiotic pelgipeptin produced by Paenibacillus elgii

    Directory of Open Access Journals (Sweden)

    Qian Chao-Dong

    2012-09-01

    Full Text Available Abstract Background Pelgipeptin, a potent antibacterial and antifungal agent, is a non-ribosomally synthesised lipopeptide antibiotic. This compound consists of a β-hydroxy fatty acid and nine amino acids. To date, there is no information about its biosynthetic pathway. Results A potential pelgipeptin synthetase gene cluster (plp) was identified from Paenibacillus elgii B69 through genome analysis. The gene cluster spans 40.8 kb with eight open reading frames. Among the genes in this cluster, three large genes, plpD, plpE, and plpF, were shown to encode non-ribosomal peptide synthetases (NRPSs), with one, seven, and one module(s), respectively. Bioinformatic analysis of the substrate specificity of all nine adenylation domains indicated that the sequence of the NRPS modules is collinear with the order of amino acids in pelgipeptin. Additional biochemical analysis of four recombinant adenylation domains (PlpD A1, PlpE A1, PlpE A3, and PlpF A1) provided further evidence that the plp gene cluster is involved in pelgipeptin biosynthesis. Conclusions In this study, a gene cluster (plp) responsible for the biosynthesis of pelgipeptin was identified from the genome sequence of Paenibacillus elgii B69. The identification of the plp gene cluster provides an opportunity to develop novel lipopeptide antibiotics by genetic engineering.

  20. A complex of Cas proteins 5, 6, and 7 is required for the biogenesis and stability of clustered regularly interspaced short palindromic repeats (CRISPR)-derived RNAs (crRNAs) in Haloferax volcanii.

    Science.gov (United States)

    Brendel, Jutta; Stoll, Britta; Lange, Sita J; Sharma, Kundan; Lenz, Christof; Stachler, Aris-Edda; Maier, Lisa-Katharina; Richter, Hagen; Nickel, Lisa; Schmitz, Ruth A; Randau, Lennart; Allers, Thorsten; Urlaub, Henning; Backofen, Rolf; Marchfelder, Anita

    2014-03-07

    The clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR-Cas) system is a prokaryotic defense mechanism against foreign genetic elements. A plethora of CRISPR-Cas versions exist, with more than 40 different Cas protein families and several different molecular approaches to fight the invading DNA. One of the key players in the system is the CRISPR-derived RNA (crRNA), which directs the invader-degrading Cas protein complex to the invader. The CRISPR-Cas types I and III use the Cas6 protein to generate mature crRNAs. Here, we show that the Cas6 protein is necessary for crRNA production but that additional Cas proteins that form a CRISPR-associated complex for antiviral defense (Cascade)-like complex are needed for crRNA stability in the CRISPR-Cas type I-B system in Haloferax volcanii in vivo. Deletion of the cas6 gene results in the loss of mature crRNAs and interference. However, cells that have the complete cas gene cluster (cas1-8b) removed and are transformed with the cas6 gene are not able to produce and stably maintain mature crRNAs. crRNA production and stability is rescued only if cas5, -6, and -7 are present. Mutational analysis of the cas6 gene reveals three amino acids (His-41, Gly-256, and Gly-258) that are essential for pre-crRNA cleavage, whereas the mutation of two amino acids (Ser-115 and Ser-224) leads to an increase of crRNA amounts. This is the first systematic in vivo analysis of Cas6 protein variants. In addition, we show that the H. volcanii I-B system contains a Cascade-like complex with a Cas7, Cas5, and Cas6 core that protects the crRNA.

  1. Genetic variations and haplotype diversity of the UGT1 gene cluster in the Chinese population.

    Directory of Open Access Journals (Sweden)

    Jing Yang

    Full Text Available Vertebrates require tremendous molecular diversity to defend against numerous small hydrophobic chemicals. UDP-glucuronosyltransferases (UGTs) are a large family of detoxification enzymes that glucuronidate xenobiotics and endobiotics, facilitating their excretion from the body. The UGT1 gene cluster contains a tandem array of variable first exons, each preceded by a specific promoter, and a common set of downstream constant exons, similar to the genomic organization of the protocadherin (Pcdh), immunoglobulin, and T-cell receptor gene clusters. To assist pharmacogenomics studies in Chinese, we sequenced nine first exons, promoter and intronic regions, and five common exons of the UGT1 gene cluster in a population sample of 253 unrelated Chinese individuals. We identified 101 polymorphisms and found 15 novel SNPs. We then computed allele frequencies for each polymorphism and reconstructed their linkage disequilibrium (LD) map. The UGT1 cluster can be divided into five linkage blocks: Block 9 (UGT1A9), Block 9/7/6 (UGT1A9, UGT1A7, and UGT1A6), Block 5 (UGT1A5), Block 4/3 (UGT1A4 and UGT1A3), and Block 3' UTR. Furthermore, we inferred haplotypes and selected their tagSNPs. Finally, comparing our data with those of three other populations of the HapMap project revealed ethnic specificity of the UGT1 genetic diversity in Chinese. These findings have important implications for future molecular genetic studies of the UGT1 gene cluster as well as for personalized medical therapies in Chinese.
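
    The allele-frequency and linkage disequilibrium computation mentioned in the abstract above can be illustrated for a single pair of biallelic SNPs. The abstract does not say which LD statistic or software was used, so the choice of D, D' and r² here is an assumption, and the phased haplotypes are invented toy data rather than UGT1 genotypes.

```python
def pairwise_ld(haplotypes):
    """Return (D, D', r^2) for two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome, where a and b
    are 0/1 allele codes at the first and second SNP (toy encoding, assumed).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                        # allele frequency at SNP 1
    p_b = sum(b for _, b in haplotypes) / n                        # allele frequency at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # haplotype frequency
    d = p_ab - p_a * p_b                                           # raw disequilibrium coefficient
    variance = p_a * (1 - p_a) * p_b * (1 - p_b)
    if variance == 0:
        return d, 0.0, 0.0                                         # a monomorphic site carries no LD
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, d * d / variance


# Eight invented chromosomes: the "1" allele at SNP 2 always sits on a "1" at SNP 1.
toy_haplotypes = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 0), (1, 1), (0, 0)]
d, d_prime, r_squared = pairwise_ld(toy_haplotypes)
```

    Repeating such a calculation for every SNP pair gives a matrix from which blocks of high LD could be read off; how the blocks in the abstract were actually delimited is not stated there.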

  2. Identification and manipulation of the pleuromutilin gene cluster from Clitopilus passeckerianus for increased rapid antibiotic production

    Science.gov (United States)

    Bailey, Andy M.; Alberti, Fabrizio; Kilaru, Sreedhar; Collins, Catherine M.; de Mattos-Shipley, Kate; Hartley, Amanda J.; Hayes, Patrick; Griffin, Alison; Lazarus, Colin M.; Cox, Russell J.; Willis, Christine L.; O'Dwyer, Karen; Spence, David W.; Foster, Gary D.

    2016-05-01

    Semi-synthetic derivatives of the tricyclic diterpene antibiotic pleuromutilin from the basidiomycete Clitopilus passeckerianus are important in combatting bacterial infections in human and veterinary medicine. These compounds belong to the only new class of antibiotics for human applications, with novel mode of action and lack of cross-resistance, representing a class with great potential. Basidiomycete fungi, being dikaryotic, are not generally amenable to strain improvement. We report identification of the seven-gene pleuromutilin gene cluster and verify that using various targeted approaches aimed at increasing antibiotic production in C. passeckerianus, no improvement in yield was achieved. The seven-gene pleuromutilin cluster was reconstructed within Aspergillus oryzae giving production of pleuromutilin in an ascomycete, with a significant increase (2106%) in production. This is the first gene cluster from a basidiomycete to be successfully expressed in an ascomycete, and paves the way for the exploitation of a metabolically rich but traditionally overlooked group of fungi.

  3. Molecular evolution of the nif gene cluster carrying nifI1 and nifI2 genes in the Gram-positive phototrophic bacterium Heliobacterium chlorum.

    Science.gov (United States)

    Enkh-Amgalan, Jigjiddorj; Kawasaki, Hiroko; Seki, Tatsuji

    2006-01-01

    A major nif cluster was detected in the strictly anaerobic, Gram-positive phototrophic bacterium Heliobacterium chlorum. The cluster consisted of 11 genes arranged within a 10 kb region in the order nifI1, nifI2, nifH, nifD, nifK, nifE, nifN, nifX, fdx, nifB and nifV. The phylogenetic position of Hbt. chlorum was the same in the NifH, NifD, NifK, NifE and NifN trees; Hbt. chlorum formed a cluster with Desulfitobacterium hafniense, the closest neighbour of heliobacteria based on the 16S rRNA phylogeny, and two species of the genus Geobacter belonging to the Deltaproteobacteria. Two nifI genes, known to occur in the nif clusters of methanogenic archaea between nifH and nifD, were found upstream of the nifH gene of Hbt. chlorum. The organization of the nif operon and the phylogeny of individual and concatenated gene products showed that the Hbt. chlorum nif operon carrying nifI genes upstream of the nifH gene was an intermediate between the nif operon with nifI downstream of nifH (group II and III of the nitrogenase classification) and the nif operon lacking nifI (group I). Thus, the phylogenetic position of Hbt. chlorum nitrogenase may reflect an evolutionary stage of a divergence of the two nitrogenase groups, with group I consisting of the aerobic diazotrophs and group II consisting of strictly anaerobic prokaryotes.

  4. Ortholog-based screening and identification of genes related to intracellular survival.

    Science.gov (United States)

    Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin

    2018-04-20

    Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. The total of 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.

  5. Association of paraoxonase gene cluster polymorphisms with ALS in France, Quebec, and Sweden.

    Science.gov (United States)

    Valdmanis, P N; Kabashi, E; Dyck, A; Hince, P; Lee, J; Dion, P; D'Amour, M; Souchon, F; Bouchard, J-P; Salachas, F; Meininger, V; Andersen, P M; Camu, W; Dupré, N; Rouleau, G A

    2008-08-12

    The paraoxonase gene cluster on chromosome 7 comprising the PON1-3 genes is an attractive candidate for association in amyotrophic lateral sclerosis (ALS) given the role of paraoxonase genes during the response to oxidative stress and their contribution to the enzymatic breakdown of nerve toxins. Oxidative stress is considered one of the mechanisms involved in ALS pathogenesis. Evidence for this includes the fact that mutations of SOD1, which normally reduces the production of toxic superoxide anion, account for 12% to 23% of familial cases in ALS. In addition, PON variants were shown to be associated with susceptibility to ALS in several North American and European populations. We extended this analysis to examine 20 single nucleotide polymorphisms (SNPs) across the PON gene cluster in a set of patients from France (480 cases, 475 controls), Quebec (159 cases, 95 controls), and Sweden (558 cases, 506 controls). Although individual SNPs were not considered associated on their own, a haplotype of SNPs at the C-terminal portion of PON2 that includes the PON2 C311S amino acid change was significant in the French (p value 0.0075) and Quebec (p value 0.026) populations as well as all three populations combined (p value 1.69 x 10^-6). Stratification of the samples showed that this variation was pertinent to ALS susceptibility as a whole, and not to a particular subset of patients. These findings contribute to the increasing weight of evidence that genetic variants in the paraoxonase gene cluster are associated with amyotrophic lateral sclerosis.

  6. Crystal structure of clustered regularly interspaced short palindromic repeats (CRISPR)-associated Csn2 protein revealed Ca2+-dependent double-stranded DNA binding activity.

    Science.gov (United States)

    Nam, Ki Hyun; Kurinov, Igor; Ke, Ailong

    2011-09-02

    Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated protein genes (cas genes) are widespread in bacteria and archaea. They form a line of RNA-based immunity to eradicate invading bacteriophages and malicious plasmids. A key molecular event during this process is the acquisition of new spacers into the CRISPR loci to guide the selective degradation of the matching foreign genetic elements. Csn2 is a Nmeni subtype-specific cas gene required for new spacer acquisition. Here we characterize the Enterococcus faecalis Csn2 protein as a double-stranded (ds-) DNA-binding protein and report its 2.7 Å tetrameric ring structure. The inner circle of the Csn2 tetrameric ring is ∼26 Å wide and populated with conserved lysine residues poised for nonspecific interactions with ds-DNA. Each Csn2 protomer contains an α/β domain and an α-helical domain; significant hinge motion was observed between these two domains. Ca2+ was located at strategic positions in the oligomerization interface. We further showed that removal of Ca2+ ions altered the oligomerization state of Csn2, which in turn severely decreased its affinity for ds-DNA. In summary, our results provided the first insight into the function of the Csn2 protein in CRISPR adaptation by revealing that it is a ds-DNA-binding protein functioning at the quaternary structure level and regulated by Ca2+ ions.

  7. Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

    Science.gov (United States)

    Bhattacharya, Anindya; De, Rajat K

    2010-08-01

    Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail in certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce a better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to execution time. Like DCCA, ACCA uses the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some others in determining a group of genes having more common transcription factors and a similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. 
Two word files (included in the zip file) need to
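
    The criterion stated in the abstract above, that each gene should have its highest average correlation with the genes of its own cluster, can be illustrated with a simplified reassignment loop. This is a sketch in the spirit of that criterion and not the published ACCA: the initial labels, the toy profiles, the use of Pearson correlation and the sequential update order are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def reassign_by_average_correlation(profiles, initial_labels, max_iter=100):
    """Move each gene to the cluster whose other members it correlates with best on average."""
    genes = list(profiles)
    corr = {(g, h): pearson(profiles[g], profiles[h]) for g in genes for h in genes}
    labels = dict(initial_labels)
    for _ in range(max_iter):
        changed = False
        for g in genes:
            best_label, best_avg = labels[g], float("-inf")
            for c in sorted(set(labels.values())):
                others = [h for h in genes if h != g and labels[h] == c]
                if not others:
                    continue  # no other members to compare against
                avg = sum(corr[g, h] for h in others) / len(others)
                if avg > best_avg:
                    best_label, best_avg = c, avg
            if best_label != labels[g]:
                labels[g] = best_label
                changed = True
        if not changed:
            break
    return labels


# Toy profiles: g1, g2, g5 rise across conditions; g3, g4 fall. Magnitudes differ on purpose.
toy_profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [10, 20, 30, 40],
    "g3": [4, 3, 2, 1],
    "g4": [40, 30, 20, 11],
    "g5": [5, 6, 7, 9],
}
start = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 1}  # g5 deliberately misplaced
final = reassign_by_average_correlation(toy_profiles, start)
```

    Because correlation ignores magnitude, g2 groups with g1 despite a tenfold difference in expression values, which is the case the abstract says distance-based methods miss.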

  8. Identification of biofilm-associated cluster (bac) in Pseudomonas aeruginosa involved in biofilm formation and virulence.

    Directory of Open Access Journals (Sweden)

    Camille Macé

    Full Text Available Biofilms are prevalent in diseases caused by Pseudomonas aeruginosa, an opportunistic and nosocomial pathogen. By a proteomic approach, we previously identified a hypothetical protein of P. aeruginosa (coded by the gene pA3731) that was accumulated by biofilm cells. We report here that a Delta pA3731 mutant is highly biofilm-defective as compared with the wild-type strain. Using a mouse model of lung infection, we show that the mutation also induces a defect in bacterial growth during the acute phase of infection and an attenuation of the virulence. The pA3731 gene is found to control positively the ability to swarm and to produce extracellular rhamnolipids, and belongs to a cluster of 4 genes (pA3729-pA3732) not previously described in P. aeruginosa. Though the protein PA3731 has a predicted secondary structure similar to that of the Phage Shock Protein, some obvious differences are observed compared to already described psp systems, e.g., this unknown cluster is monocistronic and no homology is found between the other proteins constituting this locus and psp proteins. Like E. coli PspA, the protein PA3731 increases in amount after an osmotic shock; however, it is not affected by a heat shock. We consequently named this locus bac for biofilm-associated cluster.

  9. Genes involved in translation of Mycoplasma hyopneumoniae and Mycoplasma synoviae

    Directory of Open Access Journals (Sweden)

    Mônica de Oliveira Santos

    2007-01-01

    Full Text Available This is a report on the analysis of genes involved in translation in the complete genomes of Mycoplasma hyopneumoniae strains J and 7448 and Mycoplasma synoviae. In both genomes 31 ORFs encoding large ribosomal subunit proteins and 19 ORFs encoding small ribosomal subunit proteins were found. Ten ribosomal protein gene clusters encoding 42 ribosomal proteins were found in M. synoviae, while 8 clusters encoding 39 ribosomal proteins were found in both M. hyopneumoniae strains. The L33 gene of the M. hyopneumoniae strain 7448 presented two copies in different locations. The genes encoding initiation factors (IF-1, IF-2 and IF-3), elongation factors (EF-G, EF-Tu, EF-Ts and EF-P), and the genes encoding the ribosome recycling factor (frr) and one polypeptide release factor (prfA) were present in the genomes of M. hyopneumoniae and M. synoviae. Nineteen aminoacyl-tRNA synthetases had been previously identified in both mycoplasmas. In the two strains of M. hyopneumoniae, J and 7448, only one set of 5S, 16S and 23S rRNAs had been identified. Two sets of 16S and 23S rRNA genes and three sets of 5S rRNA genes had been identified in the M. synoviae genome.

  10. Genetic interrelations in the actinomycin biosynthetic gene clusters of Streptomyces antibioticus IMRU 3720 and Streptomyces chrysomallus ATCC11523, producers of actinomycin X and actinomycin C

    Science.gov (United States)

    Crnovčić, Ivana; Rückert, Christian; Semsary, Siamak; Lang, Manuel; Kalinowski, Jörn; Keller, Ullrich

    2017-01-01

    Sequencing the actinomycin (acm) biosynthetic gene cluster of Streptomyces antibioticus IMRU 3720, which produces actinomycin X (Acm X), revealed 20 genes organized into a highly similar framework as in the bi-armed acm C biosynthetic gene cluster of Streptomyces chrysomallus but without an attached additional extra arm of orthologues as in the latter. Curiously, the extra arm of the S. chrysomallus gene cluster turned out to perfectly match the single arm of the S. antibioticus gene cluster in the same order of orthologues including the presence of two pseudogenes, scacmM and scacmN, encoding a cytochrome P450 and its ferredoxin, respectively. Orthologues of the latter genes were both missing in the principal arm of the S. chrysomallus acm C gene cluster. All orthologues of the extra arm showed a G+C content different from that of their counterparts in the principal arm. Moreover, the similarities of translation products from the extra arm were all higher to the corresponding translation products of orthologue genes from the S. antibioticus acm X gene cluster than to those encoded by the principal arm of their own gene cluster. This suggests that the duplicated structure of the S. chrysomallus acm C biosynthetic gene cluster evolved from previous fusion between two one-armed acm gene clusters each from a different genetic background. However, while scacmM and scacmN in the extra arm of the S. chrysomallus acm C gene cluster are mutated and therefore are non-functional, their orthologues saacmM and saacmN in the S. antibioticus acm C gene cluster show no defects seemingly encoding active enzymes with functions specific for Acm X biosynthesis. Both acm biosynthetic gene clusters lack a kynurenine-3-monooxygenase gene necessary for biosynthesis of 3-hydroxy-4-methylanthranilic acid, the building block of the Acm chromophore, which suggests participation of a genome-encoded relevant monooxygenase during Acm biosynthesis in both S. chrysomallus and S

  11. Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans

    Directory of Open Access Journals (Sweden)

    Michelle Carey

    2016-10-01

    Full Text Available Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct based on some similarity measure. The availability of time course data has motivated researchers to develop methods, such as mixture and mixed-effects modelling approaches, that incorporate the temporal information contained in the shape of the trajectory of the data. However, there is still a need for the development of time-course clustering methods that can adequately deal with inhomogeneous clusters (some clusters are quite large and others are quite small). Here we propose two such methods, hierarchical clustering (IHC) and iterative pairwise-correlation clustering (IPC). We evaluate and compare the proposed methods to the Markov Cluster Algorithm (MCL) and the generalised mixed-effects model (GMM) using simulation studies and an application to a time course gene expression data set from a study containing human subjects who were challenged by a live influenza virus. We identify four types of temporal gene response modules to influenza infection in humans, i.e., single-gene modules (SGM), small-size modules (SSM), medium-size modules (MSM) and large-size modules (LSM). The LSM contain genes that perform various fundamental biological functions that are consistent across subjects. The SSM and SGM contain genes that perform either different or similar biological functions that have complex temporal responses to the virus and are unique to each subject. We show that the temporal responses of the genes in the LSM have either simple patterns with a single peak or trough (a consequence of the transient stimuli) or sustained or state-transitioning patterns pertaining to developmental cues, and that these modules can differentiate the severity of disease outcomes. Additionally, the size of gene response modules follows a power-law distribution with a consistent
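    The idea of grouping time-course profiles by correlation can be illustrated with a minimal sketch. This is not the IHC or IPC procedure from the record above, whose details the abstract does not give; it is a simple greedy grouping assumed here for illustration only. The function names, the 0.9 threshold, and the toy profiles are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

def correlation_modules(profiles, threshold=0.9):
    """Greedy grouping (illustrative assumption, not IHC/IPC): seed a module
    with the first unassigned gene, then add every unassigned gene whose
    correlation with the seed is at least `threshold`."""
    unassigned = list(profiles)
    modules = []
    while unassigned:
        seed = unassigned.pop(0)
        members, remaining = [seed], []
        for gene in unassigned:
            if pearson(profiles[seed], profiles[gene]) >= threshold:
                members.append(gene)
            else:
                remaining.append(gene)
        unassigned = remaining
        modules.append(members)
    return modules

# Hypothetical expression profiles over five time points.
profiles = {
    "geneA": [0.0, 1.0, 3.0, 1.0, 0.0],   # single peak
    "geneB": [0.1, 1.1, 2.9, 1.2, 0.1],   # same shape as geneA
    "geneC": [3.0, 2.0, 0.0, 2.0, 3.0],   # single trough
    "geneD": [0.0, 0.5, 1.0, 1.5, 2.0],   # sustained increase
}
modules = correlation_modules(profiles, threshold=0.9)
print(modules)
```

    With these toy data the two peak-shaped profiles form one module, while the trough and the sustained profile each end up alone, which mirrors the mix of larger and single-gene modules described in the abstract. A seed-based grouping like this depends on input order, which is one reason iterative refinement is attractive in practice.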

  12. Measurement of circulating transcripts and gene cluster analysis predicts and defines therapeutic efficacy of peptide receptor radionuclide therapy (PRRT) in neuroendocrine tumors

    International Nuclear Information System (INIS)

    Bodei, L.; Kidd, M.; Modlin, I.M.; Severi, S.; Nicolini, S.; Paganelli, G.; Drozdov, I.; Kwekkeboom, D.J.; Krenning, E.P.; Baum, R.P.

    2016-01-01

    Peptide receptor radionuclide therapy (PRRT) is an effective method for treating neuroendocrine tumors (NETs). It is limited, however, in the prediction of individual tumor response and the precise and early identification of changes in tumor size. Currently, response prediction is based on somatostatin receptor expression and efficacy by morphological imaging and/or chromogranin A (CgA) measurement. The aim of this study was to assess the accuracy of circulating NET transcripts as a measure of PRRT efficacy, and moreover to identify prognostic gene clusters in pretreatment blood that could be interpolated with relevant clinical features in order to define a biological index for the tumor and a predictive quotient for PRRT efficacy. NET patients (n = 54), M:F 37:17, median age 66, bronchial: n = 13, GEP-NET: n = 35, CUP: n = 6 were treated with 177Lu-based PRRT (cumulative activity: 6.5-27.8 GBq, median 18.5). At baseline: 47/54 low-grade (G1/G2; bronchial typical/atypical), 31/49 18FDG positive and 39/54 progressive. Disease status was assessed by RECIST1.1. Transcripts were measured by real-time quantitative reverse transcription PCR (qRT-PCR) and multianalyte algorithmic analysis (NETest); CgA by enzyme-linked immunosorbent assay (ELISA). Gene cluster (GC) derivations: regulatory network, protein:protein interactome analyses. Statistical analyses: chi-square, non-parametric measurements, multiple regression, receiver operating characteristic and Kaplan-Meier survival. The disease control rate was 72 %. Median PFS was not achieved (follow-up: 1-33 months, median: 16). Only grading was associated with response (p < 0.01). At baseline, 94 % of patients were NETest-positive, while CgA was elevated in 59 %. NETest accurately (89 %, χ² = 27.4; p = 1.2 × 10⁻⁷) correlated with treatment response, while CgA was 24 % accurate. Gene cluster expression (growth-factor signalome and metabolome) had an AUC of 0.74 ± 0.08 (z-statistic = 2.92, p < 0.004) for predicting

  13. Measurement of circulating transcripts and gene cluster analysis predicts and defines therapeutic efficacy of peptide receptor radionuclide therapy (PRRT) in neuroendocrine tumors

    Energy Technology Data Exchange (ETDEWEB)

    Bodei, L. [European Institute of Oncology, Division of Nuclear Medicine, Milan (Italy); LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Kidd, M. [Wren Laboratories, Branford, CT (United States); Modlin, I.M. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Yale School of Medicine, New Haven, CT (United States); Severi, S.; Nicolini, S.; Paganelli, G. [Istituto Scientifico Romagnolo per lo Studio e la Cura dei Tumori (IRST) IRCCS, Nuclear Medicine and Radiometabolic Units, Meldola (Italy); Drozdov, I. [Bering Limited, London (United Kingdom); Kwekkeboom, D.J.; Krenning, E.P. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Erasmus Medical Center, Nuclear Medicine Department, Rotterdam (Netherlands); Baum, R.P. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Zentralklinik Bad Berka, Theranostics Center for Molecular Radiotherapy and Imaging, Bad Berka (Germany)

    2016-05-15

    Peptide receptor radionuclide therapy (PRRT) is an effective method for treating neuroendocrine tumors (NETs). It is limited, however, in the prediction of individual tumor response and the precise and early identification of changes in tumor size. Currently, response prediction is based on somatostatin receptor expression and efficacy by morphological imaging and/or chromogranin A (CgA) measurement. The aim of this study was to assess the accuracy of circulating NET transcripts as a measure of PRRT efficacy, and moreover to identify prognostic gene clusters in pretreatment blood that could be interpolated with relevant clinical features in order to define a biological index for the tumor and a predictive quotient for PRRT efficacy. NET patients (n = 54), M:F 37:17, median age 66, bronchial: n = 13, GEP-NET: n = 35, CUP: n = 6 were treated with 177Lu-based PRRT (cumulative activity: 6.5-27.8 GBq, median 18.5). At baseline: 47/54 low-grade (G1/G2; bronchial typical/atypical), 31/49 18FDG positive and 39/54 progressive. Disease status was assessed by RECIST1.1. Transcripts were measured by real-time quantitative reverse transcription PCR (qRT-PCR) and multianalyte algorithmic analysis (NETest); CgA by enzyme-linked immunosorbent assay (ELISA). Gene cluster (GC) derivations: regulatory network, protein:protein interactome analyses. Statistical analyses: chi-square, non-parametric measurements, multiple regression, receiver operating characteristic and Kaplan-Meier survival. The disease control rate was 72 %. Median PFS was not achieved (follow-up: 1-33 months, median: 16). Only grading was associated with response (p < 0.01). At baseline, 94 % of patients were NETest-positive, while CgA was elevated in 59 %. NETest accurately (89 %, χ² = 27.4; p = 1.2 × 10⁻⁷) correlated with treatment response, while CgA was 24 % accurate. Gene cluster expression (growth-factor signalome and metabolome) had an AUC of 0.74 ± 0.08 (z-statistic = 2.92, p < 0

  14. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experiment data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noises, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchic clustering method, the k-means clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
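    Clustering "around each local maximum of the magnitude property" can be pictured with a small hill-climbing sketch. The abstract does not specify how the magnitude property or the neighbourhood is defined, so the one-dimensional points, the `radius` parameter, and the function name below are assumptions made purely for illustration, not the published LMC method.

```python
def local_maximum_clusters(points, magnitude, radius):
    """Label each point with the index of the local maximum it reaches by
    repeatedly moving to the highest-magnitude point within `radius`.
    Illustrative assumption: 1-D points and a fixed-radius neighbourhood."""
    def uphill(i):
        # Indices of all points within `radius` of point i (including i).
        neighbours = [j for j in range(len(points))
                      if abs(points[j] - points[i]) <= radius]
        # Ties resolve to the lowest index, so the climb always terminates.
        return max(neighbours, key=lambda j: magnitude[j])

    labels = []
    for i in range(len(points)):
        current = i
        while True:
            nxt = uphill(current)
            if nxt == current:      # no better neighbour: a local maximum
                break
            current = nxt
        labels.append(current)
    return labels

# Hypothetical data: two groups of points, each with one magnitude peak.
points = [0.0, 1.0, 2.0, 5.0, 6.0, 7.0]
magnitude = [1, 3, 2, 2, 5, 1]
labels = local_maximum_clusters(points, magnitude, radius=1.5)
print(labels)
```

    Here the points near 1.0 are assigned to the peak at index 1 and the points near 6.0 to the peak at index 4. In this picture the number of clusters is not fixed in advance; it follows from how many local maxima the chosen magnitude property has.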

  15. Genetic interrelations in the actinomycin biosynthetic gene clusters of Streptomyces antibioticus IMRU 3720 and Streptomyces chrysomallus ATCC11523, producers of actinomycin X and actinomycin C

    Directory of Open Access Journals (Sweden)

    Crnovčić I

    2017-04-01

    Full Text Available Ivana Crnovčić,1 Christian Rückert,2 Siamak Semsary,1 Manuel Lang,1 Jörn Kalinowski,2 Ullrich Keller1 1Institut für Chemie, Technische Universität Berlin, Berlin-Charlottenburg, 2Technology Platform Genomics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany Abstract: Sequencing the actinomycin (acm) biosynthetic gene cluster of Streptomyces antibioticus IMRU 3720, which produces actinomycin X (Acm X), revealed 20 genes organized into a highly similar framework as in the bi-armed acm C biosynthetic gene cluster of Streptomyces chrysomallus but without an attached additional extra arm of orthologues as in the latter. Curiously, the extra arm of the S. chrysomallus gene cluster turned out to perfectly match the single arm of the S. antibioticus gene cluster in the same order of orthologues including the presence of two pseudogenes, scacmM and scacmN, encoding a cytochrome P450 and its ferredoxin, respectively. Orthologues of the latter genes were both missing in the principal arm of the S. chrysomallus acm C gene cluster. All orthologues of the extra arm showed a G+C content different from that of their counterparts in the principal arm. Moreover, the similarities of translation products from the extra arm were all higher to the corresponding translation products of orthologue genes from the S. antibioticus acm X gene cluster than to those encoded by the principal arm of their own gene cluster. This suggests that the duplicated structure of the S. chrysomallus acm C biosynthetic gene cluster evolved from previous fusion between two one-armed acm gene clusters each from a different genetic background. However, while scacmM and scacmN in the extra arm of the S. chrysomallus acm C gene cluster are mutated and therefore are non-functional, their orthologues saacmM and saacmN in the S. antibioticus acm C gene cluster show no defects seemingly encoding active enzymes with functions specific for Acm X biosynthesis. Both acm

  16. Heterologous Reconstitution of the Intact Geodin Gene Cluster in Aspergillus nidulans through a Simple and Versatile PCR Based Approach

    DEFF Research Database (Denmark)

    Nielsen, Morten Thrane; Nielsen, Jakob Blæsbjerg; Anyaogu, Dianna Chinyere

    2013-01-01

    was transferred in a two step procedure to an expression platform in A. nidulans. The individual cluster fragments were generated by PCR and assembled via efficient USER fusion prior to transformation and integration via re-iterative gene targeting. A total of 13 open reading frames contained in 25 kb of DNA were...... of solid methodology for genetic manipulation of most species severely hampers pathway characterization. Here we present a simple PCR based approach for heterologous reconstitution of intact gene clusters. Specifically, the putative gene cluster responsible for geodin production from Aspergillus terreus...... successfully transferred between the two species enabling geodin synthesis in A. nidulans. Subsequently, functions of three genes in the cluster were validated by genetic and chemical analyses. Specifically, ATEG_08451 (gedC) encodes a polyketide synthase, ATEG_08453 (gedR) encodes a transcription factor...

  17. Operon Gene Order Is Optimized for Ordered Protein Complex Assembly

    Science.gov (United States)

    Wells, Jonathan N.; Bergendahl, L. Therese; Marsh, Joseph A.

    2016-01-01

    Summary The assembly of heteromeric protein complexes is an inherently stochastic process in which multiple genes are expressed separately into proteins, which must then somehow find each other within the cell. Here, we considered one of the ways by which prokaryotic organisms have attempted to maximize the efficiency of protein complex assembly: the organization of subunit-encoding genes into operons. Using structure-based assembly predictions, we show that operon gene order has been optimized to match the order in which protein subunits assemble. Exceptions to this are almost entirely highly expressed proteins for which assembly is less stochastic and for which precisely ordered translation offers less benefit. Overall, these results show that ordered protein complex assembly pathways are of significant biological importance and represent a major evolutionary constraint on operon gene organization. PMID:26804901

  18. De novo origin of human protein-coding genes.

    Directory of Open Access Journals (Sweden)

    Dong-Dong Wu

    2011-11-01

    Full Text Available The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA-seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare, thus there should be greater appreciation of the importance of the de novo origination of genes.

  19. De Novo Origin of Human Protein-Coding Genes

    Science.gov (United States)

    Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping

    2011-01-01

    The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA–seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare, thus there should be greater appreciation of the importance of the de novo origination of genes. PMID:22102831

  20. Characterization of the biosynthetic gene cluster for cryptic phthoxazolin A in Streptomyces avermitilis.

    Directory of Open Access Journals (Sweden)

    Dian Anggraini Suroto

    Full Text Available Phthoxazolin A, an oxazole-containing polyketide, has a broad spectrum of anti-oomycete activity and herbicidal activity. We recently identified phthoxazolin A as a cryptic metabolite of Streptomyces avermitilis that produces the important anthelmintic agent avermectin. Even though genome data of S. avermitilis is publicly available, no plausible biosynthetic gene cluster for phthoxazolin A is apparent in the sequence data. Here, we identified and characterized the phthoxazolin A (ptx) biosynthetic gene cluster through genome sequencing, comparative genomic analysis, and gene disruption. Sequence analysis uncovered that the putative ptx biosynthetic genes are located in an extra genomic region that is not found in the public database, and 8 open reading frames in the extra genomic region could be assigned roles in the biosynthesis of the oxazole ring, triene polyketide and carbamoyl moieties. Disruption of the ptxA gene encoding a discrete acyltransferase resulted in a complete loss of phthoxazolin A production, confirming that the trans-AT type I PKS system is responsible for the phthoxazolin A biosynthesis. Based on the predicted functional domains in the ptx assembly line, we propose the biosynthetic pathway of phthoxazolin A.

  1. Transcriptional interference networks coordinate the expression of functionally related genes clustered in the same genomic loci.

    Science.gov (United States)

    Boldogköi, Zsolt

    2012-01-01

    The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too.

  2. Transcriptional interference networks coordinate the expression of functionally-related genes clustered in the same genomic loci

    Directory of Open Access Journals (Sweden)

    Zsolt Boldogkoi

    2012-07-01

    Full Text Available The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organisation, transcription, various post-transcriptional processes and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighbouring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally-linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly-arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely-oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronised cascade of gene expression in functionally-linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular

  3. The small RNA content of human sperm reveals pseudogene-derived piRNAs complementary to protein-coding genes

    DEFF Research Database (Denmark)

    Pantano, Lorena; Jodar, Meritxell; Bak, Mads

    2015-01-01

    -specific genes. The most abundant class of small noncoding RNAs in sperm are PIWI-interacting RNAs (piRNAs). Surprisingly, we found that human sperm cells contain piRNAs processed from pseudogenes. Clusters of piRNAs from human testes contain pseudogenes transcribed in the antisense strand and processed...... into small RNAs. Several human protein-coding genes contain antisense predicted targets of pseudogene-derived piRNAs in the male germline and these piRNAs are still found in mature sperm. Our study provides the most extensive data set and annotation of human sperm small RNAs to date and is a resource...... for further functional studies on the roles of sperm small RNAs. In addition, we propose that some of the pseudogene-derived human piRNAs may regulate expression of their parent gene in the male germline....

  4. Hierarchical clustering of breast cancer methylomes revealed differentially methylated and expressed breast cancer genes.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Oncogenic transformation of normal cells often involves epigenetic alterations, including histone modification and DNA methylation. We conducted whole-genome bisulfite sequencing to determine the DNA methylomes of normal breast, fibroadenoma, invasive ductal carcinomas and MCF7. The emergence, disappearance, expansion and contraction of kilobase-sized hypomethylated regions (HMRs) and the hypomethylation of the megabase-sized partially methylated domains (PMDs) are the major forms of methylation changes observed in breast tumor samples. Hierarchical clustering of HMRs revealed tumor-specific hypermethylated clusters and differentially methylated enhancers specific to normal or breast cancer cell lines. Joint analysis of gene expression and DNA methylation data of normal breast and breast cancer cells identified differentially methylated and expressed genes associated with breast and/or ovarian cancers in cancer-specific HMR clusters. Furthermore, aberrant patterns of X-chromosome inactivation (XCI) were found in breast cancer cell lines as well as breast tumor samples in the TCGA BRCA (breast invasive carcinoma) dataset. They were characterized by a differentially hypermethylated XIST promoter, reduced expression of XIST, and over-expression of hypomethylated X-linked genes. High expression of these genes was significantly associated with lower survival rates in breast cancer patients. Comprehensive analysis of the normal and breast tumor methylomes suggests selective targeting of DNA methylation changes during breast cancer progression. The weak causal relationship between DNA methylation and gene expression observed in this study is evidence of a more complex role of DNA methylation in the regulation of gene expression in human epigenetics that deserves further investigation.

  5. A Drosophila gene encoding a protein resembling the human β-amyloid protein precursor

    International Nuclear Information System (INIS)

    Rosen, D.R.; Martin-Morris, L.; Luo, L.; White, K.

    1989-01-01

    The authors have isolated genomic and cDNA clones for a Drosophila gene resembling the human β-amyloid precursor protein (APP). This gene produces a nervous system-enriched 6.5-kilobase transcript. Sequencing of cDNAs derived from the 6.5-kilobase transcript predicts an 886-amino acid polypeptide. This polypeptide contains a putative transmembrane domain and exhibits strong sequence similarity to cytoplasmic and extracellular regions of the human β-amyloid precursor protein. There is a high probability that this Drosophila gene corresponds to the essential Drosophila locus vnd, a gene required for embryonic nervous system development

  6. A Complex of Cas Proteins 5, 6, and 7 Is Required for the Biogenesis and Stability of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-derived RNAs (crRNAs) in Haloferax volcanii

    Science.gov (United States)

    Brendel, Jutta; Stoll, Britta; Lange, Sita J.; Sharma, Kundan; Lenz, Christof; Stachler, Aris-Edda; Maier, Lisa-Katharina; Richter, Hagen; Nickel, Lisa; Schmitz, Ruth A.; Randau, Lennart; Allers, Thorsten; Urlaub, Henning; Backofen, Rolf; Marchfelder, Anita

    2014-01-01

    The clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR-Cas) system is a prokaryotic defense mechanism against foreign genetic elements. A plethora of CRISPR-Cas versions exist, with more than 40 different Cas protein families and several different molecular approaches to fight the invading DNA. One of the key players in the system is the CRISPR-derived RNA (crRNA), which directs the invader-degrading Cas protein complex to the invader. The CRISPR-Cas types I and III use the Cas6 protein to generate mature crRNAs. Here, we show that the Cas6 protein is necessary for crRNA production but that additional Cas proteins that form a CRISPR-associated complex for antiviral defense (Cascade)-like complex are needed for crRNA stability in the CRISPR-Cas type I-B system in Haloferax volcanii in vivo. Deletion of the cas6 gene results in the loss of mature crRNAs and interference. However, cells that have the complete cas gene cluster (cas1–8b) removed and are transformed with the cas6 gene are not able to produce and stably maintain mature crRNAs. crRNA production and stability is rescued only if cas5, -6, and -7 are present. Mutational analysis of the cas6 gene reveals three amino acids (His-41, Gly-256, and Gly-258) that are essential for pre-crRNA cleavage, whereas the mutation of two amino acids (Ser-115 and Ser-224) leads to an increase of crRNA amounts. This is the first systematic in vivo analysis of Cas6 protein variants. In addition, we show that the H. volcanii I-B system contains a Cascade-like complex with a Cas7, Cas5, and Cas6 core that protects the crRNA. PMID:24459147

  7. Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis

    Directory of Open Access Journals (Sweden)

    Riccardi Giovanna

    2009-03-01

    Full Text Available Abstract Background The ESAT-6 (early secreted antigenic target, 6 kDa) family collects small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In a previous work, we identified the Zur-dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR, and esxS). esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625) and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, by consequence, a differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 in different growth and stress conditions by means of relative quantitative PCR. The expression of msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2), where msmeg0615 was about 4-fold induced, while msmeg0620 was repressed. Analysis revealed that in acid stress conditions M. tuberculosis rv0282 gene was 3-fold induced too, while rv0287 induction was almost insignificant. Conclusion In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster

  8. Gene clusters for insecticidal loline alkaloids in the grass-endophytic fungus Neotyphodium uncinatum.

    Science.gov (United States)

    Spiering, Martin J; Moon, Christina D; Wilkinson, Heather H; Schardl, Christopher L

    2005-03-01

    Loline alkaloids are produced by mutualistic fungi symbiotic with grasses, and they protect the host plants from insects. Here we identify in the fungal symbiont, Neotyphodium uncinatum, two homologous gene clusters (LOL-1 and LOL-2) associated with loline-alkaloid production. Nine genes were identified in a 25-kb region of LOL-1 and designated (in order) lolF-1, lolC-1, lolD-1, lolO-1, lolA-1, lolU-1, lolP-1, lolT-1, and lolE-1. LOL-2 contained the homologs lolC-2 through lolE-2 in the same order and orientation. Also identified was lolF-2, but its possible linkage with either cluster was undetermined. Most lol genes were regulated in N. uncinatum and N. coenophialum, and all were expressed concomitantly with loline-alkaloid biosynthesis. A lolC-2 RNA-interference (RNAi) construct was introduced into N. uncinatum, and in two independent transformants, RNAi significantly decreased lolC expression (P < ...) ... lol-gene products indicate that the pathway has evolved from various different primary and secondary biosynthesis pathways.

  9. Structure and gene cluster of the O-antigen of Escherichia coli O54.

    Science.gov (United States)

    Naumenko, Olesya I; Guo, Xi; Senchenkova, Sof'ya N; Geng, Peng; Perepelov, Andrei V; Shashkov, Alexander S; Liu, Bin; Knirel, Yuriy A

    2018-06-15

    Mild acid hydrolysis of the lipopolysaccharide of Escherichia coli O54 afforded an O-polysaccharide, which was studied by sugar analysis, solvolysis with anhydrous trifluoroacetic acid, and 1H and 13C NMR spectroscopy. Solvolysis cleaved predominantly the linkage of β-D-Ribf and, to a lesser extent, that of β-D-GlcpNAc, whereas the other linkages, including the linkage of α-L-Rhap, were stable under the selected conditions (40 °C, 5 h). The following structure of the O-polysaccharide was established: →4)-α-D-GalpA-(1→2)-α-L-Rhap-(1→2)-β-D-Ribf-(1→4)-β-D-Galp-(1→3)-β-D-GlcpNAc-(1→. The O-antigen gene cluster of E. coli O54 was analyzed and found to be consistent in general with the established O-polysaccharide structure, with two exceptions: i) in the cluster, there were genes for phosphoserine phosphatase and serine transferase, which have no apparent role in the O-polysaccharide synthesis, and ii) no ribofuranosyltransferase gene was present in the cluster. Both uncommon features are shared by some other enteric bacteria. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks.

    Science.gov (United States)

    Botía, Juan A; Vandrovcova, Jana; Forabosco, Paola; Guelfi, Sebastian; D'Sa, Karishma; Hardy, John; Lewis, Cathryn M; Ryten, Mina; Weale, Michael E

    2017-04-12

    Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCN). WGCNA generates both a GCN and a derived partitioning of clusters of genes (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn). We assessed our method on networks created from UKBEC data (10 different human brain tissues), on networks created from GTEx data (42 human tissues, including 13 brain tissues), and on simulated networks derived from GTEx data. We observed substantially improved module properties, including: (1) few or zero misplaced genes; (2) increased counts of replicable clusters in alternate tissues (×3.1 on average); (3) improved enrichment of Gene Ontology terms (seen in 48/52 GCNs); (4) improved cell type enrichment signals (seen in 21/23 brain GCNs); and (5) more accurate partitions in simulated data according to a range of similarity indices. The results obtained from our investigations indicate that our k-means method, applied as an adjunct to standard WGCNA, results in better network partitions. These improved partitions enable more fruitful downstream analyses, as gene modules are more biologically meaningful.
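    The idea of refining an existing module partition with a k-means step can be illustrated with a minimal sketch. It is not the km2gcn implementation: it assumes that each module centroid is the mean expression profile of its genes (a stand-in for a module eigengene), that similarity is Pearson correlation, and that iteration stops when no gene changes module. The function names, gene names, and module colours are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def centroid(profiles):
    """Per-sample mean of a list of expression profiles."""
    return [mean(col) for col in zip(*profiles)]


def refine_modules(expr, labels, max_iter=50):
    """Reassign each gene to the module whose centroid it correlates with best.

    expr:   gene -> list of expression values (one per sample)
    labels: gene -> initial module id (e.g. from a prior network analysis)
    """
    labels = dict(labels)
    for _ in range(max_iter):
        modules = sorted(set(labels.values()))
        cents = {m: centroid([expr[g] for g in expr if labels[g] == m])
                 for m in modules}
        new = {g: max(modules, key=lambda m: pearson(expr[g], cents[m]))
               for g in expr}
        if new == labels:  # converged: no gene moved
            break
        labels = new
    return labels


# Toy data: a3 follows the rising pattern but starts in the wrong module.
expr = {
    "a1": [1, 2, 3, 4],
    "a2": [2, 3, 4, 5.5],
    "a3": [1, 2.5, 3, 4.5],
    "b1": [4, 3, 2, 1],
    "b2": [5, 4, 2.5, 1],
}
initial = {"a1": "turquoise", "a2": "turquoise", "a3": "blue",
           "b1": "blue", "b2": "blue"}
refined = refine_modules(expr, initial)
print(refined)
```

    In this toy input the misplaced gene a3 moves to the module containing a1 and a2, which is the kind of correction the abstract describes as reducing misplaced genes.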

  11. Arabidopsis mRNA polyadenylation machinery: comprehensive analysis of protein-protein interactions and gene expression profiling

    Directory of Open Access Journals (Sweden)

    Mo Min

    2008-05-01

    Full Text Available Abstract Background The polyadenylation of mRNA is one of the critical processing steps during expression of almost all eukaryotic genes. It is tightly integrated with transcription, particularly its termination, as well as other RNA processing events, i.e. capping and splicing. The poly(A) tail protects the mRNA from unregulated degradation, and it is required for nuclear export and translation initiation. In recent years, it has been demonstrated that the polyadenylation process is also involved in the regulation of gene expression. The polyadenylation process requires two components, the cis-elements on the mRNA and a group of protein factors that recognize the cis-elements and produce the poly(A) tail. Here we report a comprehensive pairwise protein-protein interaction mapping and gene expression profiling of the mRNA polyadenylation protein machinery in Arabidopsis. Results By protein sequence homology search using human and yeast polyadenylation factors, we identified 28 proteins that may be components of Arabidopsis polyadenylation machinery. To elucidate the protein network and their functions, we first tested their protein-protein interaction profiles. Out of 320 pair-wise protein-protein interaction assays done using the yeast two-hybrid system, 56 (~17%) showed positive interactions. 15 of these interactions were further tested, and all were confirmed by co-immunoprecipitation and/or in vitro co-purification. These interactions organize into three distinct hubs involving the Arabidopsis polyadenylation factors. These hubs are centered around AtCPSF100, AtCLPS, and AtFIPS. The first two are similar to complexes seen in mammals, while the third one stands out as unique to plants. When comparing the gene expression profiles extracted from publicly available microarray datasets, some of the polyadenylation related genes showed tissue-specific expression, suggestive of potential different polyadenylation complex configurations. 
Conclusion An

  12. Characterization of chicken riboflavin carrier protein gene structure ...

    Indian Academy of Sciences (India)

    The chicken riboflavin carrier protein (RCP) is an estrogen induced egg yolk and white protein. Eggs from hens which have a splice mutation in RCP gene fail to hatch, indicating an absolute requirement of RCP for the transport of riboflavin to the oocyte. In order to understand the mechanism of regulation of this gene by ...

  13. The Basic/Helix-Loop-Helix Protein Family in Gossypium: Reference Genes and Their Evolution during Tetraploidization.

    Directory of Open Access Journals (Sweden)

    Qian Yan

    Full Text Available Basic/helix-loop-helix (bHLH) proteins comprise one of the largest transcription factor families and play important roles in diverse cellular and molecular processes. Comprehensive analyses of the composition and evolution of the bHLH family in cotton are essential to elucidate their functions and the molecular basis of cotton development. By searching bHLH homologous genes in sequenced diploid cotton genomes (Gossypium raimondii and G. arboreum), a set of cotton bHLH reference genes containing 289 paralogs were identified and named as GobHLH001-289. Based on their phylogenetic relationships, these cotton bHLH proteins were clustered into 27 subfamilies. Compared to those in Arabidopsis and cacao, cotton bHLH proteins generally increased in number, but unevenly in different subfamilies. To further uncover evolutionary changes of bHLH genes during tetraploidization of cotton, all genes of S5a and S5b subfamilies in upland cotton and its diploid progenitors were cloned and compared, and their transcript profiles were determined in upland cotton. A total of 10 genes of S5a and S5b subfamilies (doubled from A- and D-genome progenitors) were maintained in tetraploid cottons. The major sequence changes in upland cotton included a 15-bp in-frame deletion in GhbHLH130D and a long terminal repeat retrotransposon inserted in GhbHLH062A, which eliminated GhbHLH062A expression in various tissues. The S5a and S5b bHLH genes of A and D genomes (except GobHLH062) showed similar transcription patterns in various tissues including roots, stems, leaves, petals, ovules, and fibers, while the A- and D-genome genes of GobHLH110 and GobHLH130 displayed clearly different transcript profiles during fiber development. In total, this study represented a genome-wide analysis of cotton bHLH family, and revealed significant changes in sequence and expression of these genes in tetraploid cottons, which paved the way for further functional analyses of bHLH genes in the cotton genus.

  14. Protein functional links in Trypanosoma brucei, identified by gene fusion analysis

    Directory of Open Access Journals (Sweden)

    Trimpalis Philip

    2011-07-01

    Full Text Available Abstract Background Domain or gene fusion analysis is a bioinformatics method for detecting gene fusions in one organism by comparing its genome to that of other organisms. The occurrence of gene fusions suggests that the two original genes that participated in the fusion are functionally linked, i.e. their gene products interact either as part of a multi-subunit protein complex, or in a metabolic pathway. Gene fusion analysis has been used to identify protein functional links in prokaryotes as well as in eukaryotic model organisms, such as yeast and Drosophila. Results In this study we have extended this approach to include a number of recently sequenced protists, four of which are pathogenic, to identify fusion-linked proteins in Trypanosoma brucei, the causative agent of African sleeping sickness. We have also examined the evolution of the gene fusion events identified, to determine whether they can be attributed to fusion or fission, by looking at the conservation of the fused genes and of the individual component genes across the major eukaryotic and prokaryotic lineages. We find relatively limited occurrence of gene fusions/fissions within the protist lineages examined. Our results point to two trypanosome-specific gene fissions, which have recently been experimentally confirmed, one fusion involving proteins involved in the same metabolic pathway, as well as two novel putative functional links between fusion-linked protein pairs. Conclusions This is the first study of protein functional links in T. brucei identified by gene fusion analysis. We have used strict thresholds and only discuss results which are highly likely to be genuine and which either have already been or can be experimentally verified. We discuss the possible impact of the identification of these novel putative protein-protein interactions on the development of new trypanosome therapeutic drugs.
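    The core rule of gene fusion analysis described above can be sketched as follows. This is an illustrative simplification, not the pipeline used in the study: it assumes that similarity hits are already available as (query protein, composite protein, start, end) tuples, and it predicts a functional link when two different query proteins match largely non-overlapping regions of the same composite protein in another genome. The `max_overlap` threshold and all identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def fusion_links(hits, max_overlap=10):
    """Return a set of (query_a, query_b, composite) predicted functional links.

    hits: iterable of (query_id, composite_id, start, end), where start/end are
          1-based coordinates of the matched region on the composite protein.
    """
    by_composite = defaultdict(list)
    for query, composite, start, end in hits:
        by_composite[composite].append((query, start, end))

    links = set()
    for composite, regions in by_composite.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Length of the shared interval; negative when the regions are disjoint.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                a, b = sorted((q1, q2))
                links.add((a, b, composite))
    return links


# Toy hits: geneA and geneB cover separate halves of fusedX; geneC overlaps both.
hits = [
    ("geneA", "fusedX", 1, 200),
    ("geneB", "fusedX", 210, 450),
    ("geneC", "fusedX", 150, 440),
    ("geneD", "otherY", 1, 300),
]
links = fusion_links(hits)
print(links)
```

    A real analysis would add score and coverage thresholds and check conservation across lineages to separate fusion from fission events, as the abstract notes.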

  15. Phenotype Clustering of Breast Epithelial Cells in Confocal Images based on Nuclear Protein Distribution Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Long, Fuhui; Peng, Hanchuan; Sudar, Damir; Levievre, Sophie A.; Knowles, David W.

    2006-09-05

    Background: The distribution of the chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that creates a hierarchical tree of the given cell phenotypes and calculates the statistical significance between them, based on the clustering analysis of nuclear protein distributions. Results: Nuclear distributions of nuclear mitotic apparatus protein were previously obtained for non-neoplastic S1 and malignant T4-2 human mammary epithelial cells cultured for up to 12 days. Cell phenotype was defined as S1 or T4-2 and the number of days in culture. A probabilistic ensemble approach was used to define a set of consensus clusters from the results of multiple traditional cluster analysis techniques applied to the nuclear distribution data. Cluster histograms were constructed to show how cells in any one phenotype were distributed across the consensus clusters. Grouping various phenotypes allowed us to build phenotype trees and calculate the statistical difference between each group. The results showed that non-neoplastic S1 cells could be distinguished from malignant T4-2 cells with 94.19 percent accuracy; that proliferating S1 cells could be distinguished from differentiated S1 cells with 92.86 percent accuracy; and showed no significant difference between the various phenotypes of T4-2 cells corresponding to increasing tumor sizes. Conclusion: This work presents a cluster analysis method that can identify significant cell phenotypes, based on the nuclear distribution of specific proteins, with high accuracy.
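    One common way to derive consensus clusters from several clustering runs is a co-association (evidence accumulation) scheme, sketched below. The abstract does not specify the authors' probabilistic ensemble method, so this is a swapped-in, simplified technique: two items are merged when they share a cluster in more than a chosen fraction of the input partitions. The function name and the `threshold` parameter are hypothetical.

```python
def consensus_clusters(partitions, threshold=0.5):
    """Merge items that co-cluster in more than `threshold` of the partitions.

    partitions: list of label lists, one per clustering run, all of equal length.
    Returns one consensus label per item, numbered in order of first appearance.
    """
    n = len(partitions[0])

    def coassoc(i, j):
        # Fraction of runs in which items i and j received the same label.
        return sum(p[i] == p[j] for p in partitions) / len(partitions)

    parent = list(range(n))  # union-find forest over items

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coassoc(i, j) > threshold:
                parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three runs over five cells; label names differ between runs, which is fine
# because only co-membership is compared.
runs = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
]
consensus = consensus_clusters(runs)
print(consensus)
```

    Histograms of how the cells of each phenotype fall across such consensus labels could then be compared, in the spirit of the cluster histograms mentioned in the abstract.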

  16. Characterization of the fumonisin B2 biosynthetic gene cluster in Aspergillus niger and A. awamori.

    Science.gov (United States)

    Aspergillus niger and A. awamori strains isolated from grapes cultivated in the Mediterranean basin were examined for fumonisin B2 (FB2) production and presence/absence of sequences within the fumonisin biosynthetic gene (fum) cluster. Presence of 13 regions in the fum cluster was evaluated by PCR assay...

  17. Sugarcane genes related to mitochondrial function

    Directory of Open Access Journals (Sweden)

    Fonseca Ghislaine V.

    2001-01-01

    Full Text Available Mitochondria function as metabolic powerhouses by generating energy through oxidative phosphorylation and have become the focus of renewed interest due to progress in understanding the subtleties of their biogenesis and the discovery of the important roles which these organelles play in senescence, cell death and the assembly of iron-sulfur (Fe/S) centers. Using proteins from the yeast Saccharomyces cerevisiae, Homo sapiens and Arabidopsis thaliana we searched the sugarcane expressed sequence tag (SUCEST) database for the presence of expressed sequence tags (ESTs) with similarity to nuclear genes related to mitochondrial functions. Starting with 869 protein sequences, we searched for sugarcane EST counterparts to these proteins using the basic local alignment search tool TBLASTN similarity searching program run against 260,781 sugarcane ESTs contained in 81,223 clusters. We were able to recover 367 clusters likely to represent sugarcane orthologues of the corresponding genes from S. cerevisiae, H. sapiens and A. thaliana with E-value <= 10^-10. Gene products belonging to all functional categories related to mitochondrial functions were found and this allowed us to produce an overview of the nuclear genes required for sugarcane mitochondrial biogenesis and function as well as providing a starting point for detailed analysis of sugarcane gene structure and physiology.

  18. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    Science.gov (United States)

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets have attracted considerable attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for a standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor a higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using a proximity measure based on an improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.

  19. Downregulation of ATM Gene and Protein Expression in Canine Mammary Tumors.

    Science.gov (United States)

    Raposo-Ferreira, T M M; Bueno, R C; Terra, E M; Avante, M L; Tinucci-Costa, M; Carvalho, M; Cassali, G D; Linde, S D; Rogatto, S R; Laufer-Amorim, R

    2016-11-01

    The ataxia telangiectasia mutated (ATM) gene encodes a protein associated with DNA damage repair and maintenance of genomic integrity. In women, ATM transcript and protein downregulation have been reported in sporadic breast carcinomas, and the absence of ATM protein expression has been associated with poor prognosis. The aim of this study was to evaluate ATM gene and protein expression in canine mammary tumors and their association with clinical outcome. ATM gene and protein expression was evaluated by reverse transcription-quantitative polymerase chain reaction and immunohistochemistry, respectively, in normal mammary gland samples (n = 10), benign mammary tumors (n = 11), nonmetastatic mammary carcinomas (n = 19), and metastatic mammary carcinomas (n = 11). Lower ATM transcript levels were detected in benign mammary tumors and carcinomas compared with normal mammary glands (P = .011). Similarly, lower ATM protein expression was observed in benign tumors (P = .0003), nonmetastatic mammary carcinomas (P < ...) ... ATM gene or protein levels were detected among benign tumors and nonmetastatic and metastatic mammary carcinomas (P > .05). The levels of ATM gene or protein expression were not significantly associated with clinical and pathological features or with survival. Similar to human breast cancer, the data in this study suggest that ATM gene and protein downregulation is involved in canine mammary gland tumorigenesis. © The Author(s) 2016.

  20. Genetic homogeneity of Clostridium botulinum type A1 strains with unique toxin gene clusters.

    Science.gov (United States)

    Raphael, Brian H; Luquez, Carolina; McCroskey, Loretta M; Joseph, Lavin A; Jacobson, Mark J; Johnson, Eric A; Maslanka, Susan E; Andreadis, Joanne D

    2008-07-01

    A group of five clonally related Clostridium botulinum type A strains isolated from different sources over a period of nearly 40 years harbored several conserved genetic properties. These strains contained a variant bont/A1 with five nucleotide polymorphisms compared to the gene in C. botulinum strain ATCC 3502. The strains also had a common toxin gene cluster composition (ha-/orfX+) similar to that associated with bont/A in type A strains containing an unexpressed bont/B [termed A(B) strains]. However, bont/B was not identified in the strains examined. Comparative genomic hybridization demonstrated identical genomic content among the strains relative to C. botulinum strain ATCC 3502. In addition, microarray data demonstrated the absence of several genes flanking the toxin gene cluster among the ha-/orfX+ A1 strains, suggesting the presence of genomic rearrangements with respect to this region compared to the C. botulinum ATCC 3502 strain. All five strains were shown to have identical flaA variable region nucleotide sequences. The pulsed-field gel electrophoresis patterns of the strains were indistinguishable when digested with SmaI, and a shift in the size of at least one band was observed in a single strain when digested with XhoI. These results demonstrate surprising genomic homogeneity among a cluster of unique C. botulinum type A strains of diverse origin.

  1. Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource.

    Science.gov (United States)

    Sharpton, Thomas J; Jospin, Guillaume; Wu, Dongying; Langille, Morgan G I; Pollard, Katherine S; Eisen, Jonathan A

    2012-10-13

    New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as "Sifting Families," or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology-based analyses. We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
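    The sifting strategy described above (assign new sequences to known families first, then cluster only the leftovers de novo) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SFam pipeline: similarity here is the fraction of identical positions between two strings, standing in for a real global alignment score, and the family names, sequences, and `cutoff` value are hypothetical.

```python
def identity(a, b):
    """Toy similarity: fraction of identical positions (stand-in for alignment)."""
    n = max(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def sift(families, new_seqs, cutoff=0.8):
    """Sift new sequences into known families, then cluster the remainder.

    families: family name -> list of members; the first member is treated as
              the family representative.
    Returns (updated known families, novel families built from leftovers).
    """
    families = {name: list(members) for name, members in families.items()}
    leftovers = []
    for seq in new_seqs:
        hit = next((name for name, members in families.items()
                    if identity(seq, members[0]) >= cutoff), None)
        if hit is not None:
            families[hit].append(seq)   # homolog of a known family: sifted out
        else:
            leftovers.append(seq)       # candidate member of a new family

    novel = {}
    for seq in leftovers:               # greedy de novo clustering of leftovers
        hit = next((name for name, members in novel.items()
                    if identity(seq, members[0]) >= cutoff), None)
        if hit is not None:
            novel[hit].append(seq)
        else:
            novel[f"novel_{len(novel) + 1}"] = [seq]
    return families, novel


known = {"famA": ["MKTAYIAKQR"]}
incoming = ["MKTAYIAKQK", "GSHMLEDPVV", "GSHMLEDPVA"]
updated, novel = sift(known, incoming)
print(updated, novel)
```

    Because only the leftovers enter the de novo step, the expensive comparison is restricted to sequences without a known family, which is the source of the reduced computational cost mentioned in the abstract.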

  2. Ancestral Variations of the PCDHG Gene Cluster Predispose to Dyslexia in a Multiplex Family

    Directory of Open Access Journals (Sweden)

    Teesta Naskar

    2018-02-01

    Full Text Available Dyslexia is a heritable neurodevelopmental disorder characterized by difficulties in reading and writing. In this study, we describe the identification of a set of 17 polymorphisms located across a 1.9 Mb region on chromosome 5q31.3, encompassing genes of the PCDHG cluster, TAF7, PCDH1 and ARHGAP26, dominantly inherited with dyslexia in a multi-incident family. Strikingly, the non-risk forms of seven variations of the PCDHG cluster are preponderant in the human lineage, while risk alleles are ancestral and conserved across Neanderthals to non-human primates. Four of these seven ancestral variations (c.460A > C [p.Ile154Leu], c.541G > A [p.Ala181Thr], c.2036G > C [p.Arg679Pro] and c.2059A > G [p.Lys687Glu]) result in amino acid alterations. p.Ile154Leu and p.Ala181Thr are present at the EC2:EC3 interacting interface of γA3-PCDH and γA4-PCDH, respectively, and might affect trans-homophilic interaction and hence neuronal connectivity. p.Arg679Pro and p.Lys687Glu are present within the linker region connecting the trans-membrane domain to the extracellular domain. Sequence analysis indicated the importance of p.Ile154, p.Arg679 and p.Lys687 in maintaining class specificity. Thus the observed association of PCDHG genes encoding neural adhesion proteins reinforces the hypothesis of aberrant neuronal connectivity in the pathophysiology of dyslexia. Additionally, the striking conservation of the identified variants indicates a role of PCDHG in the evolution of highly specialized cognitive skills critical to reading.

  3. Amelogenesis Imperfecta; Genes, Proteins, and Pathways

    Directory of Open Access Journals (Sweden)

    Claire E. L. Smith

    2017-06-01

    Full Text Available Amelogenesis imperfecta (AI) is the name given to a heterogeneous group of conditions characterized by inherited developmental enamel defects. AI enamel is abnormally thin, soft, fragile, pitted and/or badly discolored, with poor function and aesthetics, causing patients problems such as early tooth loss, severe embarrassment, eating difficulties, and pain. It was first described separately from diseases of dentine nearly 80 years ago, but the underlying genetic and mechanistic basis of the condition is only now coming to light. Mutations in the gene AMELX, encoding an extracellular matrix protein secreted by ameloblasts during enamel formation, were first identified as a cause of AI in 1991. Since then, mutations in at least eighteen genes have been shown to cause AI presenting in isolation of other health problems, with many more implicated in syndromic AI. Some of the encoded proteins have well documented roles in amelogenesis, acting as enamel matrix proteins or the proteases that degrade them, cell adhesion molecules or regulators of calcium homeostasis. However, for others, function is less clear and further research is needed to understand the pathways and processes essential for the development of healthy enamel. Here, we review the genes and mutations underlying AI presenting in isolation of other health problems, the proteins they encode and knowledge of their roles in amelogenesis, combining evidence from human phenotypes, inheritance patterns, mouse models, and in vitro studies. An LOVD resource (http://dna2.leeds.ac.uk/LOVD/) containing all published gene mutations for AI presenting in isolation of other health problems is described. We use this resource to identify trends in the genes and mutations reported to cause AI in the 270 families for which molecular diagnoses have been reported by 23rd May 2017. Finally we discuss the potential value of the translation of AI genetics to clinical care with improved patient pathways and

  4. Amelogenesis Imperfecta; Genes, Proteins, and Pathways.

    Science.gov (United States)

    Smith, Claire E L; Poulter, James A; Antanaviciute, Agne; Kirkham, Jennifer; Brookes, Steven J; Inglehearn, Chris F; Mighell, Alan J

    2017-01-01

    Amelogenesis imperfecta (AI) is the name given to a heterogeneous group of conditions characterized by inherited developmental enamel defects. AI enamel is abnormally thin, soft, fragile, pitted and/or badly discolored, with poor function and aesthetics, causing patients problems such as early tooth loss, severe embarrassment, eating difficulties, and pain. It was first described separately from diseases of dentine nearly 80 years ago, but the underlying genetic and mechanistic basis of the condition is only now coming to light. Mutations in the gene AMELX, encoding an extracellular matrix protein secreted by ameloblasts during enamel formation, were first identified as a cause of AI in 1991. Since then, mutations in at least eighteen genes have been shown to cause AI presenting in isolation of other health problems, with many more implicated in syndromic AI. Some of the encoded proteins have well documented roles in amelogenesis, acting as enamel matrix proteins or the proteases that degrade them, cell adhesion molecules or regulators of calcium homeostasis. However, for others, function is less clear and further research is needed to understand the pathways and processes essential for the development of healthy enamel. Here, we review the genes and mutations underlying AI presenting in isolation of other health problems, the proteins they encode and knowledge of their roles in amelogenesis, combining evidence from human phenotypes, inheritance patterns, mouse models, and in vitro studies. An LOVD resource (http://dna2.leeds.ac.uk/LOVD/) containing all published gene mutations for AI presenting in isolation of other health problems is described. We use this resource to identify trends in the genes and mutations reported to cause AI in the 270 families for which molecular diagnoses have been reported by 23rd May 2017. Finally we discuss the potential value of the translation of AI genetics to clinical care with improved patient pathways and speculate on the

  5. Heterologous reconstitution of the intact geodin gene cluster in Aspergillus nidulans through a simple and versatile PCR based approach.

    Directory of Open Access Journals (Sweden)

    Morten Thrane Nielsen

    Full Text Available Fungal natural products are a rich resource for bioactive molecules. To fully exploit this potential it is necessary to link genes to metabolites. Genetic information for numerous putative biosynthetic pathways has become available in recent years through genome sequencing. However, the lack of solid methodology for genetic manipulation of most species severely hampers pathway characterization. Here we present a simple PCR-based approach for heterologous reconstitution of intact gene clusters. Specifically, the putative gene cluster responsible for geodin production from Aspergillus terreus was transferred in a two-step procedure to an expression platform in A. nidulans. The individual cluster fragments were generated by PCR and assembled via efficient USER fusion prior to transformation and integration via re-iterative gene targeting. A total of 13 open reading frames contained in 25 kb of DNA were successfully transferred between the two species enabling geodin synthesis in A. nidulans. Subsequently, functions of three genes in the cluster were validated by genetic and chemical analyses. Specifically, ATEG_08451 (gedC) encodes a polyketide synthase, ATEG_08453 (gedR) encodes a transcription factor responsible for activation of the geodin gene cluster and ATEG_08460 (gedL) encodes a halogenase that catalyzes conversion of sulochrin to dihydrogeodin. We expect that our approach for transferring intact biosynthetic pathways to a fungus with a well-developed genetic toolbox will be instrumental in characterizing the many exciting pathways for secondary metabolite production that are currently being uncovered by the fungal genome sequencing projects.

  6. Influence of putative exopolysaccharide genes on Pseudomonas putida KT2440 biofilm stability

    DEFF Research Database (Denmark)

    Nilsson, Martin; Chiang, Wen-Chi; Fazli, Mustafa

    2011-01-01

    We report a study of the role of putative exopolysaccharide gene clusters in the formation and stability of Pseudomonas putida KT2440 biofilm. Two novel putative exopolysaccharide gene clusters, pea and peb, were identified, and evidence is provided that they encode products that stabilize P. putida KT2440 biofilm. The gene clusters alg and bcs, which code for proteins mediating alginate and cellulose biosynthesis, were found to play minor roles in P. putida KT2440 biofilm formation and stability under the conditions tested. A P. putida KT2440 derivative devoid of any identifiable...

  7. Ranking candidate disease genes from gene expression and protein interaction: a Katz-centrality based approach.

    Directory of Open Access Journals (Sweden)

    Jing Zhao

    Full Text Available Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed between cases and controls, or if it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.
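    The two assumptions above map naturally onto a Katz-style score, in which each gene receives a base score from its own evidence plus an attenuated share of its neighbours' scores. The sketch below is a minimal illustration and not the authors' exact formulation: it assumes the base score is a differential-expression value, the edge weight is an interaction strength, and the attenuation `alpha` is small enough for the fixed-point iteration to converge (below the reciprocal of the largest eigenvalue of the weighted adjacency matrix). Gene names and all numbers are hypothetical.

```python
def katz_scores(edges, base, alpha=0.1, iters=100, tol=1e-12):
    """Iterate x = base + alpha * W x on an undirected weighted network.

    edges: iterable of (gene_u, gene_v, weight); both genes must be keys of base.
    base:  gene -> base score (e.g. a differential-expression measure).
    """
    nbrs = {g: [] for g in base}
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))

    x = dict(base)
    for _ in range(iters):
        new = {g: base[g] + alpha * sum(w * x[n] for n, w in nbrs[g])
               for g in base}
        if max(abs(new[g] - x[g]) for g in base) < tol:
            return new
        x = new
    return x


# Toy network: g1 is differentially expressed; g2 and g3 lie one and two
# interaction steps away; g4 has no evidence and no interactions.
base = {"g1": 1.0, "g2": 0.0, "g3": 0.0, "g4": 0.0}
edges = [("g1", "g2", 1.0), ("g2", "g3", 1.0)]
scores = katz_scores(edges, base)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

    Genes closer to the differentially expressed gene rank higher even without expression evidence of their own, which is the behaviour the second assumption asks for.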

  8. The human protein disulfide isomerase gene family

    Directory of Open Access Journals (Sweden)

    Galligan James J

    2012-07-01

    Full Text Available Abstract Enzyme-mediated disulfide bond formation is a highly conserved process affecting over one-third of all eukaryotic proteins. The enzymes primarily responsible for facilitating thiol-disulfide exchange are members of an expanding family of proteins known as protein disulfide isomerases (PDIs). These proteins are part of a larger superfamily of proteins known as the thioredoxin protein family (TRX). As members of the PDI family of proteins, all proteins contain a TRX-like structural domain and are predominantly expressed in the endoplasmic reticulum. Subcellular localization and the presence of a TRX domain, however, comprise the short list of distinguishing features required for gene family classification. To date, the PDI gene family contains 21 members, varying in domain composition, molecular weight, tissue expression, and cellular processing. Given their vital role in protein-folding, loss of PDI activity has been associated with the pathogenesis of numerous disease states, most commonly related to the unfolded protein response (UPR). Over the past decade, UPR has become a very attractive therapeutic target for multiple pathologies including Alzheimer disease, Parkinson disease, alcoholic and non-alcoholic liver disease, and type-2 diabetes. Understanding the mechanisms of protein-folding, specifically thiol-disulfide exchange, may lead to development of a novel class of therapeutics that would help alleviate a wide range of diseases by targeting the UPR.

  9. Variations in CCL3L gene cluster sequence and non-specific gene copy numbers

    Directory of Open Access Journals (Sweden)

    Edberg Jeffrey C

    2010-03-01

    Full Text Available Abstract Background Copy number variations (CNVs) of the gene CC chemokine ligand 3-like1 (CCL3L1) have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2, and CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings Through pairwise and multiple alignments of these genes, we have shown that the homology between these genes ranges from 50% to 99% in complete gene sequences and from 70% to 100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. By use of MEGA 4 and BioEdit, we aligned sense primers, anti-sense primers, and probes used in several previously described assays against pre-multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously utilized RT-PCR based CNV assays are not specific for only CCL3L1. The four available assays measured median copies of 2 and 3-4 in European Americans and African Americans, respectively. The concordance between the assays ranged from 0.44 to 0.83, suggesting individual discordant calls and inconsistencies with the assays from the expected gene coverage from the known sequence. Conclusions This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogeneous results. Sequence information to determine CNV of the three genes separately would make it possible to test whether their association with the pathogenesis of a human disease or phenotype is affected by an individual gene or by a combination of these genes.

  10. Mutations in iron-sulfur cluster proteins that improve xylose utilization

    Science.gov (United States)

    Froehlich, Allan; Henningsen, Brooks; Covalla, Sean; Zelle, Rintze M.

    2018-03-20

    Provided are engineered host cells comprising (a) one or more mutations in one or more endogenous genes encoding a protein associated with iron metabolism; and (b) at least one gene encoding a polypeptide having xylose isomerase activity, and methods of use thereof.

  11. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Vassilev Boris

    2010-04-01

    Full Text Available Abstract Background A large proportion of an organism's genome encodes for membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. From these 213 motifs, 58 of them were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  12. On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally...... important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...

  13. Prediction of the Ebola Virus Infection Related Human Genes Using Protein-Protein Interaction Network.

    Science.gov (United States)

    Cao, HuanHuan; Zhang, YuHang; Zhao, Jia; Zhu, Liucun; Wang, Yi; Li, JiaRui; Feng, Yuan-Ming; Zhang, Ning

    2017-01-01

    Ebola hemorrhagic fever (EHF) is caused by Ebola virus (EBOV). Humans are reported to be susceptible to EBOV infection, which has a high fatality rate. However, the factors associating EBOV with its host remain unclear. According to the "guilt by association" (GBA) principle, proteins interacting with each other are very likely to have similar or identical functions. Based on this assumption, we tried to obtain EBOV infection-related human genes in a protein-protein interaction network using Dijkstra's algorithm. We hope it could contribute to the discovery of novel effective treatments. Finally, 15 genes were selected as potential EBOV infection-related human genes. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
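
    One common way to apply Dijkstra's algorithm under the guilt-by-association principle is to collect the genes that lie on shortest paths between pairs of known disease-related genes. The sketch below illustrates that general idea only; the abstract does not give the authors' exact procedure, and the gene names and edge costs are hypothetical (lower cost is assumed to mean a stronger interaction).

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative edge costs."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidates_on_paths(graph, known_genes):
    """Interior nodes of shortest paths between every pair of known genes."""
    found = set()
    for a, b in combinations(sorted(known_genes), 2):
        path = shortest_path(graph, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(known_genes)


# Hypothetical network: K1 and K2 are known genes; X and Y are unannotated.
toy_graph = {
    "K1": {"X": 1.0, "Y": 2.0},
    "K2": {"X": 1.0, "Y": 2.0},
    "X": {"K1": 1.0, "K2": 1.0},
    "Y": {"K1": 2.0, "K2": 2.0},
}
print(candidates_on_paths(toy_graph, {"K1", "K2"}))  # {'X'}
```

    Only X is reported, because the route through Y costs twice as much.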

  14. Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource

    Directory of Open Access Journals (Sweden)

    Sharpton Thomas J

    2012-10-01

    Full Text Available Abstract Background New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. Results We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as “Sifting Families,” or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology–based analyses. Conclusions We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
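
    The sifting strategy described here can be pictured in two passes: new sequences that resemble a known family are assigned to it, and only the leftovers are clustered de novo. The sketch below is a toy illustration under stated assumptions, not the SFam pipeline: `difflib.SequenceMatcher` stands in for global alignment similarity, the 0.8 threshold is arbitrary, and all sequences and family names are made up.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for global alignment similarity (an assumption of this sketch)."""
    return SequenceMatcher(None, a, b).ratio()


def sift(sequences, families, threshold=0.8):
    """Assign sequences to known families; return (assigned, leftover ids)."""
    assigned, leftovers = {}, []
    for seq_id, seq in sequences.items():
        best = max(families, key=lambda f: similarity(seq, families[f]), default=None)
        if best is not None and similarity(seq, families[best]) >= threshold:
            assigned[seq_id] = best
        else:
            leftovers.append(seq_id)
    return assigned, leftovers


def found_new_families(sequences, leftover_ids, threshold=0.8):
    """Greedy de novo clustering of the sequences that matched no known family."""
    reps, members = {}, {}
    for seq_id in leftover_ids:
        seq = sequences[seq_id]
        for name, rep in reps.items():
            if similarity(seq, rep) >= threshold:
                members[name].append(seq_id)
                break
        else:  # no existing new family is similar enough: found one
            name = f"new_{len(reps) + 1}"
            reps[name] = seq
            members[name] = [seq_id]
    return members


# Hypothetical data: one known family representative and three new sequences.
known = {"SFam_1": "MKTAYIAKQR"}
new_seqs = {"s1": "MKTAYIAKQK", "s2": "GGSWPLLVNC", "s3": "GGSWPLLVNA"}

assigned, leftovers = sift(new_seqs, known)
print(assigned)                                 # {'s1': 'SFam_1'}
print(found_new_families(new_seqs, leftovers))  # {'new_1': ['s2', 's3']}
```

    Because s1 is removed in the first pass, the de novo step compares only s2 and s3, which is where the reduction in computational cost comes from.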

  15. Porcine lung surfactant protein B gene (SFTPB)

    DEFF Research Database (Denmark)

    Cirera Salicio, Susanna; Fredholm, Merete

    2008-01-01

    The porcine surfactant protein B (SFTPB) is a single copy gene on chromosome 3. Three different cDNAs for the SFTPB have been isolated and sequenced. Nucleotide sequence comparison revealed six nonsynonymous single nucleotide polymorphisms (SNPs), four synonymous SNPs and an in-frame deletion of 69...... bp in the region coding for the active protein. Northern analysis showed lung-specific expression of three different isoforms of the SFTPB transcript. The expression level for the SFTPB gene is low in 50-day-old fetuses and increases during lung development. Quantitative real-time polymerase chain...

  16. Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering.

    Science.gov (United States)

    Chang, Jinyuan; Zhou, Wen; Zhou, Wen-Xin; Wang, Lan

    2017-03-01

    Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN. © 2016, The International Biometric Society.

  17. The transcriptional repressor protein NsrR senses nitric oxide directly via a [2Fe-2S] cluster.

    Directory of Open Access Journals (Sweden)

    Nicholas P Tucker

    Full Text Available The regulatory protein NsrR, a member of the Rrf2 family of transcription repressors, is specifically dedicated to sensing nitric oxide (NO) in a variety of pathogenic and non-pathogenic bacteria. It has been proposed that NO directly modulates NsrR activity by interacting with a predicted [Fe-S] cluster in the NsrR protein, but no experimental evidence has been published to support this hypothesis. Here we report the purification of NsrR from the obligate aerobe Streptomyces coelicolor. We demonstrate using UV-visible, near UV CD and EPR spectroscopy that the protein contains an NO-sensitive [2Fe-2S] cluster when purified from E. coli. Upon exposure of NsrR to NO, the cluster is nitrosylated, which results in the loss of DNA binding activity as detected by bandshift assays. Removal of the [2Fe-2S] cluster to generate apo-NsrR also resulted in loss of DNA binding activity. This is the first demonstration that NsrR contains an NO-sensitive [2Fe-2S] cluster that is required for DNA binding activity.

  18. The WRKY Transcription Factor Genes in Lotus japonicus.

    Science.gov (United States)

    Song, Hui; Wang, Pengfei; Nan, Zhibiao; Wang, Xingjun

    2014-01-01

    WRKY transcription factor genes play critical roles in plant growth and development, as well as stress responses. WRKY genes have been examined in various higher plants, but they have not been characterized in Lotus japonicus. The recent release of the L. japonicus whole genome sequence provides an opportunity for a genome wide analysis of WRKY genes in this species. In this study, we identified 61 WRKY genes in the L. japonicus genome. Based on the WRKY protein structure, L. japonicus WRKY (LjWRKY) genes can be classified into three groups (I-III). Investigations of gene copy number and gene clusters indicate that only one gene duplication event occurred on chromosome 4 and no clustered genes were detected on chromosomes 3 or 6. Researchers previously believed that group II and III WRKY domains were derived from the C-terminal WRKY domain of group I. Our results suggest that some WRKY genes in group II originated from the N-terminal domain of group I WRKY genes. Additional evidence to support this hypothesis was obtained by Medicago truncatula WRKY (MtWRKY) protein motif analysis. We found that LjWRKY and MtWRKY group III genes are under purifying selection, suggesting that WRKY genes will become increasingly structured and functionally conserved.

  19. Identifying Novel Candidate Genes Related to Apoptosis from a Protein-Protein Interaction Network

    Directory of Open Access Journals (Sweden)

    Baoman Wang

    2015-01-01

    Full Text Available Apoptosis is the process of programmed cell death (PCD) that occurs in multicellular organisms. This process of normal cell death is required to maintain the balance of homeostasis. In addition, some diseases, such as obesity, cancer, and neurodegenerative diseases, can be cured through apoptosis, which produces few side effects. An effective comprehension of the mechanisms underlying apoptosis will be helpful to prevent and treat some diseases. The identification of genes related to apoptosis is essential to uncover its underlying mechanisms. In this study, a computational method was proposed to identify novel candidate genes related to apoptosis. First, protein-protein interaction information was used to construct a weighted graph. Second, a shortest path algorithm was applied to the graph to search for new candidate genes. Finally, the obtained genes were filtered by a permutation test. As a result, 26 genes were obtained, and we discuss their likelihood of being novel apoptosis-related genes by collecting evidence from published literature.
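
    The final filtering step can be illustrated with an empirical permutation test: a candidate is kept only if its score with the real seed genes is unusually high compared with scores obtained from random seed sets of the same size. The sketch below is an assumption-laden toy, not the authors' procedure. In particular, it scores a gene by counting its seed-gene neighbors, a simple stand-in for a shortest-path statistic, and every gene name is hypothetical.

```python
import random


def empirical_p(observed, null_scores):
    """Share of null scores at least as large as the observed one (add-one corrected)."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


def seed_neighbor_count(neighbors, gene, seeds):
    """Stand-in statistic: how many seed genes interact directly with `gene`."""
    return len(neighbors[gene] & seeds)


def permutation_p(neighbors, gene, seeds, n_perm=1000, rng=None):
    """Compare the real seed set against random seed sets of the same size."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    pool = sorted(g for g in neighbors if g != gene)
    observed = seed_neighbor_count(neighbors, gene, seeds)
    null = [
        seed_neighbor_count(neighbors, gene, set(rng.sample(pool, len(seeds))))
        for _ in range(n_perm)
    ]
    return empirical_p(observed, null)


# Hypothetical network: C touches only the three seed genes (and the hub);
# H is a hub connected to every other gene.
genes = ["K1", "K2", "K3", "C", "H"] + [f"B{i}" for i in range(20)]
neighbors = {g: set() for g in genes}


def link(a, b):
    neighbors[a].add(b)
    neighbors[b].add(a)


for k in ("K1", "K2", "K3"):
    link("C", k)
for g in genes:
    if g != "H":
        link("H", g)

seeds = {"K1", "K2", "K3"}
print(permutation_p(neighbors, "H", seeds))         # 1.0
print(permutation_p(neighbors, "C", seeds) < 0.05)  # True
```

    Both C and H neighbor all three seed genes, but the hub H does so for any seed set, so the permutation test removes it while keeping C.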

  20. A proteomic approach to investigating gene cluster expression and secondary metabolite functionality in Aspergillus fumigatus.

    Directory of Open Access Journals (Sweden)

    Rebecca A Owens

    Full Text Available A combined proteomics and metabolomics approach was utilised to advance the identification and characterisation of secondary metabolites in Aspergillus fumigatus. Here, implementation of a shotgun proteomic strategy led to the identification of non-redundant mycelial proteins (n = 414) from A. fumigatus including proteins typically under-represented in 2-D proteome maps: proteins with multiple transmembrane regions, hydrophobic proteins and proteins with extremes of molecular mass and pI. Indirect identification of secondary metabolite cluster expression was also achieved, with proteins (n = 18) from LaeA-regulated clusters detected, including GliT encoded within the gliotoxin biosynthetic cluster. Biochemical analysis then revealed that gliotoxin significantly attenuates H2O2-induced oxidative stress in A. fumigatus (p>0.0001), confirming observations from proteomics data. A complementary 2-D/LC-MS/MS approach further elucidated significantly increased abundance (p<0.05) of proliferating cell nuclear antigen (PCNA), NADH-quinone oxidoreductase and the gliotoxin oxidoreductase GliT, along with significantly attenuated abundance (p<0.05) of a heat shock protein, an oxidative stress protein and an autolysis-associated chitinase, when gliotoxin and H2O2 were present, compared to H2O2 alone. Moreover, gliotoxin exposure significantly reduced the abundance of selected proteins (p<0.05) involved in de novo purine biosynthesis. Significantly elevated abundance (p<0.05) of a key enzyme, xanthine-guanine phosphoribosyl transferase Xpt1, utilised in purine salvage, was observed in the presence of H2O2 and gliotoxin. This work provides new insights into the A. fumigatus proteome and experimental strategies, plus mechanistic data pertaining to gliotoxin functionality in the organism.

  1. Natural Variation of Epstein-Barr Virus Genes, Proteins, and Primary MicroRNA.

    Science.gov (United States)

    Correia, Samantha; Palser, Anne; Elgueta Karstegl, Claudio; Middeldorp, Jaap M; Ramayanti, Octavia; Cohen, Jeffrey I; Hildesheim, Allan; Fellner, Maria Dolores; Wiels, Joelle; White, Robert E; Kellam, Paul; Farrell, Paul J

    2017-08-01

    Viral gene sequences from an enlarged set of about 200 Epstein-Barr virus (EBV) strains, including many primary isolates, have been used to investigate variation in key viral genetic regions, particularly LMP1, Zp, gp350, EBNA1, and the BART microRNA (miRNA) cluster 2. Determination of type 1 and type 2 EBV in saliva samples from people from a wide range of geographic and ethnic backgrounds demonstrates a small percentage of healthy white Caucasian British people carrying predominantly type 2 EBV. Linkage of Zp and gp350 variants to type 2 EBV is likely to be due to their genes being adjacent to the EBNA3 locus, which is one of the major determinants of the type 1/type 2 distinction. A novel classification of EBNA1 DNA binding domains, named QCIGP, results from phylogeny analysis of their protein sequences but is not linked to the type 1/type 2 classification. The BART cluster 2 miRNA region is classified into three major variants through single-nucleotide polymorphisms (SNPs) in the primary miRNA outside the mature miRNA sequences. These SNPs can result in altered levels of expression of some miRNAs from the BART variant frequently present in Chinese and Indonesian nasopharyngeal carcinoma (NPC) samples. The EBV genetic variants identified here provide a basis for future, more directed analysis of association of specific EBV variations with EBV biology and EBV-associated diseases. IMPORTANCE Incidence of diseases associated with EBV varies greatly in different parts of the world. Thus, relationships between EBV genome sequence variation and health, disease, geography, and ethnicity of the host may be important for understanding the role of EBV in diseases and for development of an effective EBV vaccine. This paper provides the most comprehensive analysis so far of variation in specific EBV genes relevant to these diseases and proposed EBV vaccines. By focusing on variation in LMP1, Zp, gp350, EBNA1, and the BART miRNA cluster 2, new relationships with the known

  2. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in outcomes such as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed to identify the composition and function of the pan-genome and core genome. The results showed that the total 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the growth and reproduction of Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of the intracellular survival in Brucella spp.
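
    The three cluster categories reported here (1710 + 1182 + 2477 = 5369) follow from how many genomes each cluster occurs in. A minimal sketch of that bookkeeping, assuming the usual pan-genome convention that core clusters occur in every genome and strain-specific clusters in exactly one; the cluster and genome identifiers are hypothetical.

```python
def classify_clusters(cluster_genomes, n_genomes):
    """Split gene clusters into core, dispensable, and strain-specific sets.

    cluster_genomes: dict cluster id -> genome ids of its member genes
                     (a genome may appear several times if it holds paralogs).
    """
    groups = {"core": [], "dispensable": [], "strain_specific": []}
    for cluster_id, genomes in sorted(cluster_genomes.items()):
        present_in = len(set(genomes))  # count each genome once
        if present_in == n_genomes:
            groups["core"].append(cluster_id)
        elif present_in == 1:
            groups["strain_specific"].append(cluster_id)
        else:
            groups["dispensable"].append(cluster_id)
    return groups


# Hypothetical example with three genomes (g1, g2, g3).
toy_clusters = {
    "c1": ["g1", "g2", "g3"],        # present everywhere
    "c2": ["g1", "g3"],              # present in some genomes
    "c3": ["g2"],                    # present in one genome
    "c4": ["g1", "g1", "g2", "g3"],  # present everywhere, with a paralog in g1
}
groups = classify_clusters(toy_clusters, n_genomes=3)
print(groups["core"])             # ['c1', 'c4']
print(groups["dispensable"])      # ['c2']
print(groups["strain_specific"])  # ['c3']
```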

  3. Crystal Structure of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated Csn2 Protein Revealed Ca2+-dependent Double-stranded DNA Binding Activity

    Energy Technology Data Exchange (ETDEWEB)

    Nam, Ki Hyun; Kurinov, Igor; Ke, Ailong (Cornell); (NWU)

    2012-05-22

    Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated protein genes (cas genes) are widespread in bacteria and archaea. They form a line of RNA-based immunity to eradicate invading bacteriophages and malicious plasmids. A key molecular event during this process is the acquisition of new spacers into the CRISPR loci to guide the selective degradation of the matching foreign genetic elements. Csn2 is a Nmeni subtype-specific cas gene required for new spacer acquisition. Here we characterize the Enterococcus faecalis Csn2 protein as a double-stranded (ds-) DNA-binding protein and report its 2.7 Å tetrameric ring structure. The inner circle of the Csn2 tetrameric ring is ~26 Å wide and populated with conserved lysine residues poised for nonspecific interactions with ds-DNA. Each Csn2 protomer contains an α/β domain and an α-helical domain; significant hinge motion was observed between these two domains. Ca2+ was located at strategic positions in the oligomerization interface. We further showed that removal of Ca2+ ions altered the oligomerization state of Csn2, which in turn severely decreased its affinity for ds-DNA. In summary, our results provided the first insight into the function of the Csn2 protein in CRISPR adaptation by revealing that it is a ds-DNA-binding protein functioning at the quaternary structure level and regulated by Ca2+ ions.

  4. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate for the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  5. Linkage of the Nit1C gene cluster to bacterial cyanide assimilation as a nitrogen source.

    Science.gov (United States)

    Jones, Lauren B; Ghosh, Pallab; Lee, Jung-Hyun; Chou, Chia-Ni; Kunz, Daniel A

    2018-05-21

    A genetic linkage between a conserved gene cluster (Nit1C) and the ability of bacteria to utilize cyanide as the sole nitrogen source was demonstrated for nine different bacterial species. These included three strains whose cyanide nutritional ability has formerly been documented (Pseudomonas fluorescens Pf11764, Pseudomonas putida BCN3 and Klebsiella pneumoniae BCN33), and six not previously known to have this ability [Burkholderia (Paraburkholderia) xenovorans LB400, Paraburkholderia phymatum STM815, Paraburkholderia phytofirmans PsJN, Cupriavidus (Ralstonia) eutropha H16, Gluconoacetobacter diazotrophicus PA1 5 and Methylobacterium extorquens AM1]. For all bacteria, growth on or exposure to cyanide led to the induction of the canonical nitrilase (NitC) linked to the gene cluster, and in the case of Pf11764 in particular, transcript levels of cluster genes (nitBCDEFGH) were raised, and a nitC knock-out mutant failed to grow. Further studies demonstrated that the highly conserved nitB gene product was also significantly elevated. Collectively, these findings provide strong evidence for a genetic linkage between Nit1C and bacterial growth on cyanide, supporting use of the term cyanotrophy in describing what may represent a new nutritional paradigm in microbiology. A broader search of Nit1C genes in presently available genomes revealed its presence in 270 different bacteria, all contained within the domain Bacteria, including Gram-positive Firmicutes and Actinobacteria, and Gram-negative Proteobacteria and Cyanobacteria. Absence of the cluster in the Archaea is congruent with events that may have led to the inception of Nit1C occurring coincidentally with the first appearance of cyanogenic species on Earth, dating back 400-500 million years.

  6. A highly divergent gene cluster in honey bees encodes a novel silk family

    OpenAIRE

    Sutherland, Tara D.; Campbell, Peter M.; Weisman, Sarah; Trueman, Holly E.; Sriskantha, Alagacone; Wanjura, Wolfgang J.; Haritos, Victoria S.

    2006-01-01

    The pupal cocoon of the domesticated silk moth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1–4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, comprise a single exon each, and are clustered on a short genomic region where the open reading frames are GC-r...

  7. Functional characterization of KanP, a methyltransferase from the kanamycin biosynthetic gene cluster of Streptomyces kanamyceticus.

    Science.gov (United States)

    Nepal, Keshav Kumar; Yoo, Jin Cheol; Sohng, Jae Kyung

    2010-09-20

    KanP, a putative methyltransferase, is located in the kanamycin biosynthetic gene cluster of Streptomyces kanamyceticus ATCC12853. Amino acid sequence analysis of KanP revealed the presence of S-adenosyl-L-methionine binding motifs, which are present in other O-methyltransferases. The kanP gene was expressed in Escherichia coli BL21 (DE3) to generate the E. coli KANP recombinant strain. The conversion of external quercetin to methylated quercetin in the culture extract of E. coli KANP proved the function of kanP as an S-adenosyl-L-methionine-dependent methyltransferase. This is the first report concerning the identification of an O-methyltransferase gene from the kanamycin gene cluster. The resistant activity assay and RT-PCR analysis demonstrated the leeway for obtaining methylated kanamycin derivatives from the wild-type strain of kanamycin producer. 2009 Elsevier GmbH. All rights reserved.

  8. Clustering gene expression time series data using an infinite Gaussian process mixture model.

    Science.gov (United States)

    McDowell, Ian C; Manandhar, Dinesh; Vockley, Christopher M; Schmid, Amy K; Reddy, Timothy E; Engelhardt, Barbara E

    2018-01-01

    Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

  9. Clustering gene expression time series data using an infinite Gaussian process mixture model.

    Directory of Open Access Journals (Sweden)

    Ian C McDowell

    2018-01-01

    Full Text Available Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

  10. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    Science.gov (United States)

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. 
While bacterial genes responsible for

  11. PlantTribes: a gene and gene family resource for comparative genomics in plants

    OpenAIRE

    Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.

    2007-01-01

    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, ca...

  12. The human TREM gene cluster at 6p21.1 encodes both activating and inhibitory single IgV domain receptors and includes NKp44.

    Science.gov (United States)

    Allcock, Richard J N; Barrow, Alexander D; Forbes, Simon; Beck, Stephan; Trowsdale, John

    2003-02-01

    We have characterized a cluster of single immunoglobulin variable (IgV) domain receptors centromeric of the major histocompatibility complex (MHC) on human chromosome 6. In addition to triggering receptor expressed on myeloid cells (TREM)-1 and TREM2, the cluster contains NKp44, a triggering receptor whose expression is limited to NK cells. We identified three new related genes and two gene fragments within a cluster of approximately 200 kb. Two of the three new genes lack charged residues in their transmembrane domain tails. Further, one of the genes contains two potential immunotyrosine inhibitory motifs in its cytoplasmic tail, suggesting that it delivers inhibitory signals. The human and mouse TREM clusters appear to have diverged such that there are unique sequences in each species. Finally, each gene in the TREM cluster was expressed in a different range of cell types.

  13. Cluster protein structures using recurrence quantification analysis on coordinates of alpha-carbon atoms of proteins

    International Nuclear Information System (INIS)

    Zhou Yu; Yu Zuguo; Anh, Vo

    2007-01-01

    The 3-dimensional coordinates of alpha-carbon atoms of proteins are used to distinguish the protein structural classes based on recurrence quantification analysis (RQA). We consider two independent variables from RQA of coordinates of alpha-carbon atoms, %determ1 and %determ2, which were defined by Webber et al. [C.L. Webber Jr., A. Giuliani, J.P. Zbilut, A. Colosimo, Proteins Struct. Funct. Genet. 44 (2001) 292]. The variable %determ2 is used to define two new variables, %determ2_1 and %determ2_2. Then the three variables %determ1, %determ2_1 and %determ2_2 are used to construct a 3-dimensional variable space. Each protein is represented by a point in this variable space. The points corresponding to proteins from the α, β, α+β and α/β structural classes fall into different areas of this variable space. In order to give a quantitative assessment of our clustering on the selected proteins, Fisher's discriminant algorithm is used. Numerical results indicate that the discriminant accuracies are very high and satisfactory.

  14. Novel Tissue Level Effects of the Staphylococcus aureus Enterotoxin Gene Cluster Are Essential for Infective Endocarditis.

    Science.gov (United States)

    Stach, Christopher S; Vu, Bao G; Merriman, Joseph A; Herrera, Alfa; Cahill, Michael P; Schlievert, Patrick M; Salgado-Pabón, Wilmara

    2016-01-01

    Superantigens are indispensable virulence factors for Staphylococcus aureus in disease causation. Superantigens stimulate massive immune cell activation, leading to toxic shock syndrome (TSS) and contributing to other illnesses. However, superantigens differ in their capacities to induce body-wide effects. For many, their production, at least as tested in vitro, is not high enough to reach the circulation, or the proteins are not efficient in crossing epithelial and endothelial barriers, thus remaining within tissues or localized on mucosal surfaces where they exert only local effects. In this study, we address the role of TSS toxin-1 (TSST-1) and most importantly the enterotoxin gene cluster (egc) in infective endocarditis and sepsis, gaining insights into the body-wide versus local effects of superantigens. We examined S. aureus TSST-1 gene (tstH) and egc deletion strains in the rabbit model of infective endocarditis and sepsis. Importantly, we also assessed the ability of commercial human intravenous immunoglobulin (IVIG) plus vancomycin to alter the course of infective endocarditis and sepsis. TSST-1 contributed to infective endocarditis vegetations and lethal sepsis, while superantigens of the egc, a cluster with uncharacterized functions in S. aureus infections, promoted vegetation formation in infective endocarditis. IVIG plus vancomycin prevented lethality and stroke development in infective endocarditis and sepsis. Our studies support the local tissue effects of egc superantigens for establishment and progression of infective endocarditis providing evidence for their role in life-threatening illnesses. In contrast, TSST-1 contributes to both infective endocarditis and lethal sepsis. IVIG may be a useful adjunct therapy for infective endocarditis and sepsis.

  15. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
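
    The general idea of generating candidate gene groups by hierarchical clustering and ranking each group with a validation index can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes average linkage, a distance of 1 minus the Pearson correlation, and a hypothetical per-group index (mean distance to genes outside the group minus mean distance inside it). All function names are invented for the example.

```python
from statistics import mean

def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 1.0 - cov / (sx * sy)

def candidate_groups(names, d):
    """Average-linkage agglomerative clustering; each merge yields one candidate group."""
    clusters = [frozenset([n]) for n in names]
    candidates = []
    while len(clusters) > 1:
        pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(pairs, key=lambda p: mean(d[x, y] for x in p[0] for y in p[1]))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        candidates.append(a | b)
    return candidates

def separation_index(group, names, d):
    """Hypothetical validation index: larger means tighter and better separated."""
    inside = [d[x, y] for x in group for y in group if x < y]
    outside = [d[x, y] for x in group for y in names if y not in group]
    if not outside:  # the group containing every gene cannot be scored
        return None
    return mean(outside) - mean(inside)

def rank_groups(profiles):
    """Return (score, sorted gene names) for every scorable candidate, best first."""
    names = sorted(profiles)
    d = {(a, b): pearson_distance(profiles[a], profiles[b]) for a in names for b in names}
    scored = []
    for group in candidate_groups(names, d):
        score = separation_index(group, names, d)
        if score is not None:
            scored.append((score, sorted(group)))
    return sorted(scored, reverse=True)

# Toy expression profiles: g1-g3 rise together, g4-g5 fall together.
example_profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [1, 2.1, 2.9, 4.2],
    "g4": [4, 3, 2, 1],
    "g5": [8, 6.2, 4, 2],
}
ranking = rank_groups(example_profiles)
```

    Scoring each group individually, rather than a whole partition, lets a tight group be reported even when the rest of the dendrogram is noisy.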

  16. Characterization and chondrocyte differentiation stage-specific expression of KRAB zinc-finger protein gene ZNF470

    International Nuclear Information System (INIS)

    Hering, Thomas M.; Kazmi, Najam H.; Huynh, Tru D.; Kollar, John; Xu, Laura; Hunyady, Aaron B.; Johnstone, Brian

    2004-01-01

    As part of a study to identify novel transcriptional regulators of chondrogenesis-related gene expression, we have cloned and characterized cDNA for zinc-finger protein 470 (ZNF470), the human ortholog of which encodes a 717 amino acid residue protein containing 17 Cys2His2 zinc-finger domains, as well as KRAB-A and KRAB-B motifs. The cDNA library used to isolate the initial ZNF470 clone was prepared from human bone marrow-derived mesenchymal progenitor cells at an intermediate stage of chondrogenic differentiation. We have determined the intron-exon structure of the human ZNF470 gene, which has been mapped to a zinc-finger cluster in a known imprinted region of human chromosome 19q13.4. ZNF470 is expressed at high levels in human testis and is expressed at low or undetectable levels in other adult tissues. Human ZNF470 expressed in mammalian cells as an EGFP fusion protein localizes predominantly to the nucleus, consistent with a role in transcriptional regulation. ZNF470, analyzed by quantitative real time PCR, was transiently expressed before the maximal expression of COL2A1 during chondrogenic differentiation in vitro. We have also characterized the bovine ortholog of human ZNF470, which encodes a 508 amino acid residue protein having 10 zinc-finger domains. A bovine ZNF470 cDNA clone was used to examine expression of ZNF470 in bovine articular chondrocytes treated with retinoic acid to stimulate dedifferentiation. Bovine ZNF470 expression was undetectable in freshly isolated bovine articular chondrocytes, but was dramatically upregulated in dedifferentiated retinoic acid-treated chondrocytes. These results, in two model systems, suggest a possible role for ZNF470 in the regulation of chondrogenesis-specific gene expression.

  17. Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata.

    Science.gov (United States)

    Oh, Chang Jae; Kim, Ho Bang; Kim, Jitae; Kim, Won Jin; Lee, Hyoungseok; An, Chung Sun

    2012-01-01

    The nucleotide sequence of a 20.5-kb genomic region harboring nif genes was determined and analyzed. The fragment was obtained from Frankia sp. EuIK1 strain, an indigenous symbiont of Elaeagnus umbellata. A total of 20 ORFs including 12 nif genes were identified and subjected to comparative analysis with the genome sequences of 3 Frankia strains representing diverse host plant specificities. The nucleotide and deduced amino acid sequences showed highest levels of identity with orthologous genes from an Elaeagnus-infecting strain. The gene organization patterns around the nif gene clusters were well conserved among all 4 Frankia strains. However, characteristic features appeared in the location of the nifV gene for each Frankia strain, depending on the type of host plant. Sequence analysis was performed to determine the transcription units and suggested that there could be an independent operon starting from the nifW gene in the EuIK strain. Considering the organization patterns and their total extensions on the genome, we propose that the nif gene clusters remained stable despite genetic variations occurring in the Frankia genomes.

  18. Genetic diversity of K-antigen gene clusters of Escherichia coli and their molecular typing using a suspension array.

    Science.gov (United States)

    Yang, Shuang; Xi, Daoyi; Jing, Fuyi; Kong, Deju; Wu, Junli; Feng, Lu; Cao, Boyang; Wang, Lei

    2018-04-01

    Capsular polysaccharides (CPSs), or K-antigens, are the major surface antigens of Escherichia coli. More than 80 serologically unique K-antigens are classified into 4 groups (Groups 1-4) of capsules. Groups 1 and 4 contain the Wzy-dependent polymerization pathway and the gene clusters are in the order galF to gnd; Groups 2 and 3 contain the ABC-transporter-dependent pathway and the gene clusters consist of 3 regions, regions 1, 2 and 3. Little is known about the variations among the gene clusters. In this study, 9 serotypes of K-antigen gene clusters (K2ab, K11, K20, K24, K38, K84, K92, K96, and K102) were sequenced and correlated with their CPS chemical structures. On the basis of sequence data, a K-antigen-specific suspension array that detects 10 distinct CPSs, including the above 9 CPSs plus K30, was developed. This is the first report to catalog the genetic features of E. coli K-antigen variations and to develop a suspension array for their molecular typing. The method has a number of advantages over traditional bacteriophage and serum agglutination methods and lays the foundation for straightforward identification and detection of additional K-antigens in the future.

  19. Analysis of ligand-protein exchange by Clustering of Ligand Diffusion Coefficient Pairs (CoLD-CoP)

    Science.gov (United States)

    Snyder, David A.; Chantova, Mihaela; Chaudhry, Saadia

    2015-06-01

    NMR spectroscopy is a powerful tool in describing protein structures and protein activity for pharmaceutical and biochemical development. This study describes a method to determine weak binding ligands in biological systems by using hierarchic diffusion coefficient clustering of multidimensional data obtained with a 400 MHz Bruker NMR. Comparison of DOSY spectra of ligands of the chemical library in the presence and absence of target proteins shows translational diffusion rates for small molecules upon interaction with macromolecules. For weak binders such as compounds found in fragment libraries, changes in diffusion rates upon macromolecular binding are on the order of the precision of DOSY diffusion measurements, and identifying such subtle shifts in diffusion requires careful statistical analysis. The "CoLD-CoP" (Clustering of Ligand Diffusion Coefficient Pairs) method presented here uses SAHN clustering to identify protein-binders in a chemical library or even a not fully characterized metabolite mixture. We will show how DOSY NMR and the "CoLD-CoP" method complement each other in identifying the most suitable candidates for lysozyme and wheat germ acid phosphatase.

  20. The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution.

    Directory of Open Access Journals (Sweden)

    Jean-François Gout

    2010-05-01

    Full Text Available The understanding of selective constraints affecting genes is a major issue in biology. It is well established that gene expression level is a major determinant of the rate of protein evolution, but the reasons for this relationship remain highly debated. Here we demonstrate that gene expression is also a major determinant of the evolution of gene dosage: the rate of gene losses after whole genome duplications in the Paramecium lineage is negatively correlated to the level of gene expression, and this relationship is not a byproduct of other factors known to affect the fate of gene duplicates. This indicates that changes in gene dosage are generally more deleterious for highly expressed genes. This rule also holds for other taxa: in yeast, we find a clear relationship between gene expression level and the fitness impact of reduction in gene dosage. To explain these observations, we propose a model based on the fact that the optimal expression level of a gene corresponds to a trade-off between the benefit and cost of its expression. This COSTEX model predicts that selective pressure against mutations changing gene expression level or affecting the encoded protein should on average be stronger in highly expressed genes and hence that both the frequency of gene loss and the rate of protein evolution should correlate negatively with gene expression. Thus, the COSTEX model provides a simple and common explanation for the general relationship observed between the level of gene expression and the different facets of gene evolution.

  1. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    Science.gov (United States)

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
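
    The key idea stated in this abstract, choosing initial centroids that lie in dense regions and are well separated, can be sketched deterministically. The code below is an illustrative assumption, not the published algorithm: it uses the inverse mean distance to a few nearest neighbours as a hypothetical density estimate, and picks each further centroid by maximizing density times distance to the nearest centroid already chosen. The name density_based_init and the parameter n_neighbors are invented for the example.

```python
import math

def density_based_init(points, k, n_neighbors=3):
    """Pick k initial centroids without randomness: dense, mutually distant points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Density proxy (assumption): inverse mean distance to the nearest neighbours.
    density = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:n_neighbors]
        density.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    # First centroid: the densest point; ties go to the lowest index.
    chosen = [max(range(n), key=lambda i: (density[i], -i))]
    while len(chosen) < k:
        # Favour points that are both dense and far from every chosen centroid.
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: (density[i] * min(dist[i][c] for c in chosen), -i),
        )
        chosen.append(best)
    return [points[i] for i in chosen]

# Two well-separated toy groups of samples in a 2-D feature space.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = density_based_init(samples, k=2)
```

    Because every step is a deterministic maximum with index-based tie-breaking, repeated runs on the same data return the same centroids, which is the property the abstract contrasts with randomly initialized K-Means.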

  2. Drosophila TDP-43 RNA-Binding Protein Facilitates Association of Sister Chromatid Cohesion Proteins with Genes, Enhancers and Polycomb Response Elements.

    Directory of Open Access Journals (Sweden)

    Amanda Swain

    2016-09-01

    Full Text Available The cohesin protein complex mediates sister chromatid cohesion and participates in transcriptional control of genes that regulate growth and development. Substantial reduction of cohesin activity alters transcription of many genes without disrupting chromosome segregation. Drosophila Nipped-B protein loads cohesin onto chromosomes, and together Nipped-B and cohesin occupy essentially all active transcriptional enhancers and a large fraction of active genes. It is unknown why some active genes bind high levels of cohesin and some do not. Here we show that the TBPH and Lark RNA-binding proteins influence association of Nipped-B and cohesin with genes and gene regulatory sequences. In vitro, TBPH and Lark proteins specifically bind RNAs produced by genes occupied by Nipped-B and cohesin. By genomic chromatin immunoprecipitation these RNA-binding proteins also bind to chromosomes at cohesin-binding genes, enhancers, and Polycomb response elements (PREs). RNAi depletion reveals that TBPH facilitates association of Nipped-B and cohesin with genes and regulatory sequences. Lark reduces binding of Nipped-B and cohesin at many promoters and aids their association with several large enhancers. Conversely, Nipped-B facilitates TBPH and Lark association with genes and regulatory sequences, and interacts with TBPH and Lark in affinity chromatography and immunoprecipitation experiments. Blocking transcription does not ablate binding of Nipped-B and the RNA-binding proteins to chromosomes, indicating transcription is not required to maintain binding once established. These findings demonstrate that RNA-binding proteins help govern association of sister chromatid cohesion proteins with genes and enhancers.

  3. Cloning of human genes encoding novel G protein-coupled receptors

    Energy Technology Data Exchange (ETDEWEB)

    Marchese, A.; Docherty, J.M.; Heiber, M. [Univ. of Toronto, (Canada)] [and others

    1994-10-01

    We report the isolation and characterization of several novel human genes encoding G protein-coupled receptors. Each of the receptors contained the familiar seven transmembrane topography and most closely resembled peptide binding receptors. Gene GPR1 encoded a receptor protein that is intronless in the coding region and that shared identity (43% in the transmembrane regions) with the opioid receptors. Northern blot analysis revealed that GPR1 transcripts were expressed in the human hippocampus, and the gene was localized to chromosome 15q21.6. Gene GPR2 encoded a protein that most closely resembled an interleukin-8 receptor (51% in the transmembrane regions), and this gene, not expressed in the six brain regions examined, was localized to chromosome 17q2.1-q21.3. A third gene, GPR3, showed identity (56% in the transmembrane regions) with a previously characterized cDNA clone from rat and was localized to chromosome 1p35-p36.1. 31 refs., 5 figs., 1 tab.

  4. Characterization of the Second LysR-Type Regulator in the Biphenyl-Catabolic Gene Cluster of Pseudomonas pseudoalcaligenes KF707

    OpenAIRE

    Watanabe, Takahito; Fujihara, Hidehiko; Furukawa, Kensuke

    2003-01-01

    Pseudomonas pseudoalcaligenes KF707 possesses a biphenyl-catabolic (bph) gene cluster consisting of bphR1A1A2-(orf3)-bphA3A4BCX0X1X2X3D. The bphR1 (formerly orf0) gene product, which belongs to the GntR family, is a positive regulator for itself and bphX0X1X2X3D. Further analysis in this study revealed that a second regulator belonging to the LysR family (designated bphR2) is involved in the regulation of the bph genes in KF707. The bphR2 gene was not located near the bph gene cluster, and it...

  5. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters

    NARCIS (Netherlands)

    Cimermancic, P.; Medema, Marnix; Claesen, J.; Kurika, K.; Wieland Brown, L.C.; Mavrommatis, K.; Pati, A.; Godfrey, P.A.; Koehrsen, M.; Clardy, J.; Birren, B. W.; Takano, Eriko; Sali, A.; Linington, R.G.; Fischbach, M.A.

    2014-01-01

    Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the

  6. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us Gclust Server Cluster based on sequence comparison of homologous proteins of 95 organism spe...cies Data detail Data name Cluster based on sequence comparison of homologous proteins of 95 organism specie...istory of This Database Site Policy | Contact Us Cluster based on sequence compariso

  7. Functional dissection of HOXD cluster genes in regulation of neuroblastoma cell proliferation and differentiation.

    Directory of Open Access Journals (Sweden)

    Yunhong Zha

    Full Text Available Retinoic acid (RA) can induce growth arrest and neuronal differentiation of neuroblastoma cells and has been used in the clinic for treatment of neuroblastoma. It has been reported that RA induces the expression of several HOXD genes in human neuroblastoma cell lines, but their roles in RA action are largely unknown. The HOXD cluster contains nine genes (HOXD1, HOXD3, HOXD4, and HOXD8-13) that are positioned sequentially from 3' to 5', with HOXD1 at the 3' end and HOXD13 at the 5' end. Here we show that all HOXD genes are induced by RA in the human neuroblastoma BE(2)-C cells, with the genes located at the 3' end being activated generally earlier than those positioned more 5' within the cluster. Individual induction of HOXD8, HOXD9, HOXD10 or HOXD12 is sufficient to induce both growth arrest and neuronal differentiation, which is associated with downregulation of cell cycle-promoting genes and upregulation of neuronal differentiation genes. However, induction of other HOXD genes either has no effect (HOXD1) or has partial effects (HOXD3, HOXD4, HOXD11 and HOXD13) on BE(2)-C cell proliferation or differentiation. We further show that knockdown of HOXD8 expression, but not that of HOXD9 expression, significantly inhibits the differentiation-inducing activity of RA. HOXD8 directly activates the transcription of HOXC9, a key effector of RA action in neuroblastoma cells. These findings highlight the distinct functions of HOXD genes in RA induction of neuroblastoma cell differentiation.

  8. Regulation of human protein S gene (PROS1) transcription

    NARCIS (Netherlands)

    Wolf, Cornelia de

    2006-01-01

    This thesis describes the investigation of the transcriptional regulation of the gene for anticoagulant plasma Protein S, PROS1. Protein S is a cofactor for Protein C in the Protein C anticoagulant pathway. The coagulation cascade is negatively regulated by this pathway through inactivation of

  9. The Cremeomycin Biosynthetic Gene Cluster Encodes a Pathway for Diazo Formation.

    Science.gov (United States)

    Waldman, Abraham J; Pechersky, Yakov; Wang, Peng; Wang, Jennifer X; Balskus, Emily P

    2015-10-12

    Diazo groups are found in a range of natural products that possess potent biological activities. Despite longstanding interest in these metabolites, diazo group biosynthesis is not well understood, in part because of difficulties in identifying specific genes linked to diazo formation. Here we describe the discovery of the gene cluster that produces the o-diazoquinone natural product cremeomycin and its heterologous expression in Streptomyces lividans. We used stable isotope feeding experiments and in vitro characterization of biosynthetic enzymes to decipher the order of events in this pathway and establish that diazo construction involves late-stage N-N bond formation. This work represents the first successful production of a diazo-containing metabolite in a heterologous host, experimentally linking a set of genes with diazo formation. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. An original SERPINA3 gene cluster: Elucidation of genomic organization and gene expression in the Bos taurus 21q24 region

    Directory of Open Access Journals (Sweden)

    Ouali Ahmed

    2008-04-01

    Full Text Available Abstract Background The superfamily of serine proteinase inhibitors (serpins) is involved in numerous fundamental biological processes such as inflammation, blood coagulation and apoptosis. Our interest is focused on the SERPINA3 sub-family. The major human plasma protease inhibitor, α1-antichymotrypsin, encoded by the SERPINA3 gene, is homologous to genes organized in clusters in several mammalian species. However, although there is a similar genic organization with a high degree of sequence conservation, the reactive-centre-loop domains, which are responsible for the protease specificity, show significant divergences. Results We provide additional information by analyzing the situation of SERPINA3 in the bovine genome. A cluster of eight genes and one pseudogene sharing a high degree of identity and the same structural organization was characterized. Bovine SERPINA3 genes were localized by radiation hybrid mapping on 21q24 and only spanned over 235 Kilobases. For all these genes, we propose a new nomenclature from SERPINA3-1 to SERPINA3-8. They share approximately 70% of identity with the human SERPINA3 homologue. In the cluster, we described an original sub-group of six members with an unexpectedly high degree of conservation for the reactive-centre-loop domain, suggesting a similar peptidase inhibitory pattern. Preliminary expression analyses of these bovSERPINA3s showed different tissue-specific patterns and diverse states of glycosylation and phosphorylation. Finally, in the context of phylogenetic analyses, we improved our knowledge of the evolution of mammalian SERPINAs. Conclusion Our experimental results update data of the bovine genome sequencing, substantially increase the bovSERPINA3 sub-family and enrich the phylogenetic tree of serpins. We provide new opportunities for future investigations to approach the biological functions of this unusual subset of serine proteinase inhibitors.

  11. Acyl-CoA-binding protein/diazepam-binding inhibitor gene and pseudogenes

    DEFF Research Database (Denmark)

    Mandrup, S; Hummel, R; Ravn, S

    1992-01-01

    Acyl-CoA-binding protein (ACBP) is a 10 kDa protein isolated from bovine liver by virtue of its ability to bind and induce the synthesis of medium-chain acyl-CoA esters. Surprisingly, it turned out to be identical to a protein named diazepam-binding inhibitor (DBI) claimed to be an endogenous mod...... have molecularly cloned and characterized the ACBP/DBI gene family in rat. The rat ACBP/DBI gene family comprises one expressed gene and four processed pseudogenes of which one was shown to exist in two allelic forms. The expressed gene is organized into four exons and three introns...

  12. The entire β-globin gene cluster is deleted in a form of γδβ-thalassemia.

    NARCIS (Netherlands)

    E.R. Fearon; H.H.Jr. Kazazian; P.G. Waber (Pamela); J.I. Lee (Joseph); S.E. Antonarakis; S.H. Orkin (Stuart); E.F. Vanin; P.S. Henthorn; F.G. Grosveld (Frank); A.F. Scott; G.R. Buchanan

    1983-01-01

    We have used restriction endonuclease mapping to study a deletion involving the beta-globin gene cluster in a Mexican-American family with gamma delta beta-thalassemia. Analysis of DNA polymorphisms demonstrated deletion of the beta-globin gene from the affected chromosome. Using a DNA

  13. Aromatic Polyketide GTRI-02 is a Previously Unidentified Product of the act Gene Cluster in Streptomyces coelicolor A3(2).

    Science.gov (United States)

    Wu, Changsheng; Ichinose, Koji; Choi, Young Hae; van Wezel, Gilles P

    2017-07-18

    The biosynthesis of aromatic polyketides derived from type II polyketide synthases (PKSs) is complex, and it is not uncommon that highly similar gene clusters give rise to diverse structural architectures. The act biosynthetic gene cluster (BGC) of the model actinomycete Streptomyces coelicolor A3(2) is an archetypal type II PKS. Here we show that the act BGC also specifies the aromatic polyketide GTRI-02 (1) and propose a mechanism for the biogenesis of its 3,4-dihydronaphthalen-1(2H)-one backbone. Polyketide 1 was also produced by Streptomyces sp. MBT76 after activation of the act-like qin gene cluster by overexpression of the pathway-specific activator. Mining of this strain also identified dehydroxy-GTRI-02 (2), which most likely originated from dehydration of 1 during the isolation process. This work shows that even extensively studied model gene clusters such as act of S. coelicolor can still produce new chemistry, offering new perspectives for drug discovery. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Development of a gene cloning system in a fast-growing and moderately thermophilic Streptomyces species and heterologous expression of Streptomyces antibiotic biosynthetic gene clusters

    Science.gov (United States)

    2011-01-01

    Background: Streptomyces species are a major source of antibiotics. They usually grow slowly at their optimal temperature and fermentation of industrial strains on a large scale often takes a long time, consuming more energy and materials than some other bacterial industrial strains (e.g., E. coli and Bacillus). Most thermophilic Streptomyces species grow fast, but no gene cloning systems have been developed in such strains. Results: We report here the isolation of 41 fast-growing (about twice the rate of S. coelicolor), moderately thermophilic (growing at both 30°C and 50°C) Streptomyces strains, detection of one linear and three circular plasmids in them, and sequencing of a 6996-bp plasmid, pTSC1, from one of them. pTSC1-derived pCWH1 could replicate in both thermophilic and mesophilic Streptomyces strains. On the other hand, several Streptomyces replicons function in thermophilic Streptomyces species. By examining ten well-sporulating strains, we found two promising cloning hosts, 2C and 4F. A gene cloning system was established by using the two strains. The actinorhodin and anthramycin biosynthetic gene clusters from mesophilic S. coelicolor A3(2) and thermophilic S. refuineus were heterologously expressed in one of the hosts. Conclusions: We have developed a gene cloning and expression system in a fast-growing and moderately thermophilic Streptomyces species. Although just a few plasmids and one antibiotic biosynthetic gene cluster from mesophilic Streptomyces were successfully expressed in thermophilic Streptomyces species, we expect that by utilizing thermophilic Streptomyces-specific promoters, more genes and especially antibiotic gene clusters of mesophilic Streptomyces should be heterologously expressed. PMID:22032628

  15. A Gene Cluster for Biosynthesis of Mannosylerythritol Lipids Consisted of 4-O-β-D-Mannopyranosyl-(2R,3S)-Erythritol as the Sugar Moiety in a Basidiomycetous Yeast Pseudozyma tsukubaensis.

    Directory of Open Access Journals (Sweden)

    Azusa Saika

    Mannosylerythritol lipids (MELs) belong to the glycolipid biosurfactants and are produced by various fungi. The basidiomycetous yeast Pseudozyma tsukubaensis produces a diastereomer type of MEL-B, which contains 4-O-β-D-mannopyranosyl-(2R,3S)-erythritol (R-form) as the sugar moiety. In this respect it differs from the conventional type of MELs, which contain 4-O-β-D-mannopyranosyl-(2S,3R)-erythritol (S-form) as the sugar moiety. While the biosynthetic gene cluster for the conventional type of MELs has been previously identified in Ustilago maydis and Pseudozyma antarctica, the genetic basis for MEL biosynthesis in P. tsukubaensis is unknown. Here, we identified a gene cluster involved in MEL biosynthesis in P. tsukubaensis. Among these genes, PtEMT1, which encodes erythritol/mannose transferase, had greater than 69% identity with homologs from strains in the genera Ustilago, Melanopsichium, Sporisorium and Pseudozyma. However, phylogenetic analysis placed PtEMT1p in a separate clade from the other proteins. To investigate the function of PtEMT1, we introduced the gene into a P. antarctica mutant strain, ΔPaEMT1, which lacks MEL biosynthesis ability owing to the deletion of PaEMT1. Using NMR spectroscopy, we identified the biosynthetic product as MEL-A with altered sugar conformation. These results indicate that PtEMT1p catalyzes the sugar conformation of MELs. This is the first report of a gene cluster for the biosynthesis of a diastereomer type of MEL.

  16. Gene ontology based transfer learning for protein subcellular localization

    Directory of Open Access Journals (Sweden)

    Zhou Shuigeng

    2011-02-01

    Background: Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, neither of which can estimate the individual discriminative abilities of the three aspects of gene ontology. Results: In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of the potential false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time and space computational complexities are greatly reduced when compared to the complicated semi-definite programming and semi-indefinite linear programming. The five kernels are then linearly merged into one single kernel for

  17. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data.

    Directory of Open Access Journals (Sweden)

    Manal Helal

    BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra
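
    The 'centroid' idea in the record above (the member of a cluster whose distances to all other members are minimized) can be illustrated with a small sketch. This is not the authors' LM algorithm; the function name `cluster_medoids`, the toy distance matrix, and the cluster labels are hypothetical and serve only to show how a most-representative sequence could be picked once pairwise distances and cluster assignments are available.

```python
from typing import Dict, List, Sequence


def cluster_medoids(dist: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> Dict[int, int]:
    """For each cluster label, return the index of the member whose summed
    distance to the members of the same cluster is smallest."""
    members: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)

    medoids: Dict[int, int] = {}
    for lab, idxs in members.items():
        # min() keeps the first minimum, so ties go to the lowest index.
        medoids[lab] = min(idxs, key=lambda i: sum(dist[i][j] for j in idxs))
    return medoids


# Hypothetical pairwise distances between five sequences (not study data).
toy_dist = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.3],
    [0.8, 0.9, 0.9, 0.3, 0.0],
]
toy_labels = [0, 0, 0, 1, 1]  # hypothetical cluster assignment

print(cluster_medoids(toy_dist, toy_labels))
```

    In the toy data, sequence 1 sits between sequences 0 and 2, so it is selected as the representative of the first cluster; the second cluster has only two members, and the tie is resolved by index.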

  18. Isolation of Hox cluster genes from insects reveals an accelerated sequence evolution rate.

    Directory of Open Access Journals (Sweden)

    Heike Hadrys

    Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda) that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera). We have designed insect specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx) from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success with insects showing the highest rate of homeobox sequence evolution.

  19. Glycosulfatase-Encoding Gene Cluster in Bifidobacterium breve UCC2003.

    Science.gov (United States)

    Egan, Muireann; Jiang, Hao; O'Connell Motherway, Mary; Oscarson, Stefan; van Sinderen, Douwe

    2016-11-15

    Bifidobacteria constitute a specific group of commensal bacteria typically found in the gastrointestinal tract (GIT) of humans and other mammals. Bifidobacterium breve strains are numerically prevalent among the gut microbiota of many healthy breastfed infants. In the present study, we investigated glycosulfatase activity in a bacterial isolate from a nursling stool sample, B. breve UCC2003. Two putative sulfatases were identified on the genome of B. breve UCC2003. The sulfated monosaccharide N-acetylglucosamine-6-sulfate (GlcNAc-6-S) was shown to support the growth of B. breve UCC2003, while N-acetylglucosamine-3-sulfate, N-acetylgalactosamine-3-sulfate, and N-acetylgalactosamine-6-sulfate did not support appreciable growth. By using a combination of transcriptomic and functional genomic approaches, a gene cluster designated ats2 was shown to be specifically required for GlcNAc-6-S metabolism. Transcription of the ats2 cluster is regulated by a repressor open reading frame kinase (ROK) family transcriptional repressor. This study represents the first description of glycosulfatase activity within the Bifidobacterium genus. Bifidobacteria are saccharolytic organisms naturally found in the digestive tract of mammals and insects. Bifidobacterium breve strains utilize a variety of plant- and host-derived carbohydrates that allow them to be present as prominent members of the infant gut microbiota as well as being present in the gastrointestinal tract of adults. In this study, we introduce a previously unexplored area of carbohydrate metabolism in bifidobacteria, namely, the metabolism of sulfated carbohydrates. B. breve UCC2003 was shown to metabolize N-acetylglucosamine-6-sulfate (GlcNAc-6-S) through one of two sulfatase-encoding gene clusters identified on its genome. GlcNAc-6-S can be found in terminal or branched positions of mucin oligosaccharides, the glycoprotein component of the mucous layer that covers the digestive tract. The results of this study provide

  20. A gene network bioinformatics analysis for pemphigoid autoimmune blistering diseases.

    Science.gov (United States)

    Barone, Antonio; Toti, Paolo; Giuca, Maria Rita; Derchi, Giacomo; Covani, Ugo

    2015-07-01

    In this theoretical study, a text mining search and clustering analysis of data related to genes potentially involved in human pemphigoid autoimmune blistering diseases (PAIBD) was performed using web tools to create a gene/protein interaction network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was employed to identify a final set of PAIBD-involved genes and to calculate the overall significant interactions among genes: for each gene, the weighted number of links, or WNL, was registered and a clustering procedure was performed using the WNL analysis. Genes were ranked in class (leader, B, C, D and so on, up to orphans). An ontological analysis was performed for the set of 'leader' genes. Using the above-mentioned data network, 115 genes represented the final set; leader genes numbered 7 (intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNG), interleukin (IL)-2, IL-4, IL-6, IL-8 and tumour necrosis factor (TNF)), class B genes were 13, whereas the orphans were 24. The ontological analysis attested that the molecular action was focused on extracellular space and cell surface, whereas the activation and regulation of the immunity system was widely involved. Despite the limited knowledge of the present pathologic phenomenon, attested by the presence of 24 genes revealing no protein-protein direct or indirect interactions, the network showed significant pathways gathered in several subgroups: cellular components, molecular functions, biological processes and the pathologic phenomenon obtained from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database. The molecular basis for PAIBD was summarised and expanded, which will perhaps give researchers promising directions for the identification of new therapeutic targets.
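
    The weighted number of links (WNL) mentioned in the record above can be sketched as follows, under the assumption that a gene's WNL is the sum of the interaction scores of all links that involve it and that orphans are genes with no links at all. The gene names, scores, and function names are hypothetical; the study's subsequent clustering of WNL values into classes (leader, B, C, ...) is not reproduced here.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, interaction score)


def weighted_number_of_links(genes: Iterable[str],
                             edges: Iterable[Edge]) -> Dict[str, float]:
    """Sum the scores of all links touching each gene (assumed WNL definition)."""
    wnl = {g: 0.0 for g in genes}
    for a, b, score in edges:
        wnl[a] += score
        wnl[b] += score
    return wnl


def orphan_genes(wnl: Dict[str, float]) -> List[str]:
    """Genes with no scored interaction to any other gene in the set."""
    return sorted(g for g, value in wnl.items() if value == 0.0)


# Hypothetical gene set and interaction scores (not data from the study).
toy_genes = ["g1", "g2", "g3", "g4"]
toy_edges = [("g1", "g2", 0.5), ("g1", "g3", 0.25)]

toy_wnl = weighted_number_of_links(toy_genes, toy_edges)
ranking = sorted(toy_wnl, key=toy_wnl.get, reverse=True)
print(toy_wnl)
print("ranked:", ranking, "orphans:", orphan_genes(toy_wnl))
```

    Ranking genes by WNL puts the most connected gene first, which is the starting point for separating highly connected candidates from orphans.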

  1. Clustering on Membranes

    DEFF Research Database (Denmark)

    Johannes, Ludger; Pezeshkian, Weria; Ipsen, John H

    2018-01-01

    Clustering of extracellular ligands and proteins on the plasma membrane is required to perform specific cellular functions, such as signaling and endocytosis. Attractive forces that originate in perturbations of the membrane's physical properties contribute to this clustering, in addition to direct...... protein-protein interactions. However, these membrane-mediated forces have not all been equally considered, despite their importance. In this review, we describe how line tension, lipid depletion, and membrane curvature contribute to membrane-mediated clustering. Additional attractive forces that arise...... from protein-induced perturbation of a membrane's fluctuations are also described. This review aims to provide a survey of the current understanding of membrane-mediated clustering and how this supports precise biological functions....

  2. Phasing of muscle gene expression with fasting-induced recovery growth in Atlantic salmon

    Directory of Open Access Journals (Sweden)

    Bower Neil I

    2009-08-01

    Background: Many fish species experience long periods of fasting in nature often associated with seasonal reductions in water temperature and prey availability or spawning migrations. During periods of nutrient restriction, changes in metabolism occur to provide cellular energy via catabolic processes. Muscle is particularly affected by prolonged fasting as myofibrillar proteins act as a major energy source. To investigate the mechanisms of metabolic reorganisation with fasting and refeeding in a saltwater stage of Atlantic salmon (Salmo salar L.) we analysed the expression of genes involved in myogenesis, growth signalling, lipid biosynthesis and myofibrillar protein degradation and synthesis pathways using qPCR. Results: Hierarchical clustering of gene expression data revealed three clusters. The first cluster comprised genes involved in lipid metabolism and triacylglycerol synthesis (ALDOB, DGAT1 and LPL) which had peak expression 3-14 d after refeeding. The second cluster comprised ADIPOQ, MLC2, IGF-I and TALDO1, with peak expression 14-32 d after refeeding. Cluster III contained genes strongly down regulated as an initial response to feeding and included the ubiquitin ligases MuRF1 and MAFbx, myogenic regulatory factors and some metabolic genes. Conclusion: Early responses to refeeding in fasted salmon included the synthesis of triacylglycerols and activation of the adipogenic differentiation program. Inhibition of MuRF1 and MAFbx respectively may result in decreased degradation and concomitant increased production of myofibrillar proteins. Both of these processes preceded any increase in expression of myogenic regulatory factors and IGF-I. These responses could be a necessary strategy for an animal adapted to long periods of food deprivation whereby energy reserves are replenished prior to the resumption of myogenesis.

  3. Construction of ontology augmented networks for protein complex prediction.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  4. SITEX 2.0: Projections of protein functional sites on eukaryotic genes. Extension with orthologous genes.

    Science.gov (United States)

    Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2017-04-01

    Functional sites define the diversity of protein functions and are the central object of research of the structural and functional organization of proteins. The mechanisms underlying the emergence of protein functional sites and their variability during evolution include duplication, shuffling, insertion and deletion of the exons in genes. The study of the correlation between site structure and exon structure serves as the basis for an in-depth understanding of site organization. In this regard, the development of software resources that allow the mutual projection of the exon structure of genes and the primary and tertiary structures of the encoded proteins remains a relevant problem. Previously, we developed the SitEx system that provides information about protein and gene sequences with mapped exon borders and amino acid positions of protein functional sites. The database included information on proteins with known 3D structure. However, data with respect to orthologs was not available. Therefore, we added the projection of site positions to the exon structures of orthologs in SitEx 2.0. We implemented a search through the database using site conservation variability and site discontinuity through exon structure. Inclusion of the information on orthologs expanded the possibilities of SitEx usage for solving problems regarding the analysis of the structural and functional organization of proteins. Database URL: http://www-bionet.sscc.ru/sitex/ .

  5. Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

    Science.gov (United States)

    Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

    2017-07-01

    Dengue virus consists of 10 different constituent proteins and is classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to cluster 30 protein sequences of dengue virus taken from the Virus Pathogen Database and Analysis Resource (VIPR) using the Regularized Markov Clustering (R-MCL) algorithm and then to analyze the result. Implemented in Python 3.4, the R-MCL algorithm produces 8 clusters, several of which have more than one centroid. The number of centroids shows the density level of interaction. Protein interactions that are connected in a tissue form a complex protein that serves as a specific biological process unit. The analysis shows that R-MCL clustering groups the dengue virus family based on the similarity of the roles of their constituent proteins, regardless of serotype.
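
    A minimal sketch of a regularize-then-inflate iteration in the spirit of R-MCL is given below. It is not the authors' Python 3.4 program. It assumes the commonly described form of the method, in which the expansion step of plain Markov clustering is replaced by multiplication with the fixed, column-normalized flow matrix of the input graph. The function names, the inflation value, the iteration count, and the toy graph are all hypothetical, and pruning and convergence checks are omitted.

```python
from typing import Dict, List

Matrix = List[List[float]]


def column_normalize(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Raise every entry to the power r, then renormalize the columns."""
    return column_normalize([[v ** r for v in row] for row in m])


def rmcl(adjacency: Matrix, inflation: float = 2.0,
         iterations: int = 20) -> Dict[int, List[int]]:
    """Group nodes by attractor after repeated regularize and inflate steps."""
    n = len(adjacency)
    # Add self-loops, then build the fixed flow matrix M_G of the input graph.
    with_loops = [[adjacency[r][c] + (1.0 if r == c else 0.0)
                   for c in range(n)] for r in range(n)]
    m_g = column_normalize(with_loops)
    m = [row[:] for row in m_g]
    for _ in range(iterations):
        m = matmul(m, m_g)         # regularize: M := M * M_G (assumed form)
        m = inflate(m, inflation)  # inflate: strengthen strong flows
    clusters: Dict[int, List[int]] = {}
    for node in range(n):
        column = [m[r][node] for r in range(n)]
        attractor = column.index(max(column))  # first maximum wins ties
        clusters.setdefault(attractor, []).append(node)
    return clusters


# Hypothetical similarity graph: two disconnected triangles.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(sorted(rmcl(toy_graph).values()))
```

    The toy graph is deliberately made of two disconnected triangles so that the expected partition is unambiguous. On real sequence-similarity graphs the outcome depends on the inflation parameter and on pruning choices that this sketch leaves out.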

  6. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Kautsar, Satria A.; Suarez Duran, Hernando G.; Blin, Kai

    2017-01-01

    exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results...

  7. Lung Cancer Signature Biomarkers: tissue specific semantic similarity based clustering of Digital Differential Display (DDD) data

    Directory of Open Access Journals (Sweden)

    Srivastava Mousami

    2012-11-01

    Background: The tissue-specific Unigene Sets derived from more than one million expressed sequence tags (ESTs) in the NCBI, GenBank database offer a platform for identifying significantly and differentially expressed tissue-specific genes by in-silico methods. Digital differential display (DDD) rapidly creates transcription profiles based on EST comparisons and numerically calculates, as a fraction of the pool of ESTs, the relative sequence abundance of known and novel genes. However, the process of identifying the most likely tissue for a specific disease in which to search for candidate genes from the pool of differentially expressed genes remains difficult. Therefore, we have used the 'Gene Ontology semantic similarity score' to measure the GO similarity between gene products of lung tissue-specific candidate genes from control (normal) and disease (cancer) sets. This semantic similarity score matrix, after hierarchical clustering, is represented in the form of a dendrogram. The dendrogram cluster stability was assessed by multiple bootstrapping. Multiple bootstrapping also computes a p-value for each cluster and corrects the bias of the bootstrap probability. Results: Subsequent hierarchical clustering by the multiple bootstrapping method (α = 0.95) identified seven clusters. The comparative, as well as subtractive, approach revealed a set of 38 biomarkers comprising four distinct lung cancer signature biomarker clusters (panel 1–4). Further gene enrichment analysis of the four panels revealed that each panel represents a set of lung cancer linked metastasis diagnostic biomarkers (panel 1), chemotherapy/drug resistance biomarkers (panel 2), hypoxia regulated biomarkers (panel 3) and lung extra cellular matrix biomarkers (panel 4). Conclusions: Expression analysis reveals that hypoxia induced lung cancer related biomarkers (panel 3), HIF and its modulating proteins (TGM2, CSNK1A1, CTNNA1, NAMPT/Visfatin, TNFRSF1A, ETS1, SRC-1, FN1, APLP2, DMBT1

  8. A CLUSTERING OF DJA STOCKS - THE APPLICATION IN FINANCE OF A METHOD FIRST USED IN GENE TRAJECTORY STUDY

    Directory of Open Access Journals (Sweden)

    Silaghi Gheorghe Cosmin

    2009-05-01

    Previously we employed the Gene Trajectory Clustering methodology to search for different associations of the stocks composing the DJA index, with the aim of finding different, logic clusters, supported by economic reasons, preferably different than the

  9. clusterMaker: a multi-algorithm clustering plugin for Cytoscape

    Directory of Open Access Journals (Sweden)

    Morris John H

    2011-11-01

    Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions: The Cytoscape plugin cluster

  10. Role of protein-glutathione contacts in defining glutaredoxin-3 [2Fe-2S] cluster chirality, ligand exchange and transfer chemistry.

    Science.gov (United States)

    Sen, Sambuddha; Cowan, J A

    2017-10-01

    Monothiol glutaredoxins (Grx) serve as intermediate cluster carriers in iron-sulfur cluster trafficking. The [2Fe-2S]-bound holo forms of Grx proteins display cysteinyl coordination from exogenous glutathione (GSH), in addition to contact from protein-derived Cys. Herein, we report mechanistic studies that investigate the role of exogenous glutathione in defining cluster chirality, ligand exchange, and the cluster transfer chemistry of Saccharomyces cerevisiae Grx3. Systematic perturbations were introduced to the glutathione-binding site by substitution of conserved charged amino acids that form crucial electrostatic contacts with the glutathione molecule. Native Grx3 could also be reconstituted in the absence of glutathione, with either DTT, BME or free L-cysteine as the source of the exogenous Fe-S ligand contact, while retaining full functional reactivity. The delivery of the [2Fe-2S] cluster to Grx3 from cluster donor proteins such as Isa, Nfu, and a [2Fe-2S](GS)4 complex, revealed that electrostatic contacts are of key importance for positioning the exogenous glutathione that in turn influences the chiral environment of the cluster. All Grx3 derivatives were reconstituted by standard chemical reconstitution protocols and found to transfer cluster to apo ferredoxin 1 (Fdx1) at rates comparable to native protein, even when using DTT, BME or free L-cysteine as a thiol source in place of GSH during reconstitution. Kinetic analysis of cluster transfer from holo derivatives to apo Fdx1 has led to a mechanistic model for cluster transfer chemistry of native holo Grx3, and identification of the likely rate-limiting step for the reaction.

  11. Transcriptome sequencing of Mycosphaerella fijiensis during association with Musa acuminata reveals candidate pathogenicity genes.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-08-30

    Mycosphaerella fijiensis, causative agent of the black Sigatoka disease of banana, is considered the most economically damaging banana disease. Despite its importance, the genetics of pathogenicity are poorly understood. Previous studies have characterized polyketide pathways with possible roles in pathogenicity. To identify additional candidate pathogenicity genes, we compared the transcriptome of this fungus during the necrotrophic phase of infection with that during saprophytic growth in medium. Transcriptome analysis was conducted, and the functions of differentially expressed genes were predicted by identifying conserved domains, Gene Ontology (GO) annotation and GO enrichment analysis, Carbohydrate-Active EnZymes (CAZy) annotation, and identification of genes encoding effector-like proteins. The analysis showed that genes commonly involved in secondary metabolism have higher expression in infected leaf tissue, including genes encoding cytochrome P450s, short-chain dehydrogenases, and oxidoreductases in the 2-oxoglutarate and Fe(II)-dependent oxygenase superfamily. Other pathogenicity-related genes with higher expression in infected leaf tissue include genes encoding salicylate hydroxylase-like proteins, hydrophobic surface binding proteins, CFEM domain-containing proteins, and genes encoding secreted cysteine-rich proteins characteristic of effectors. More genes encoding amino acid transporters, oligopeptide transporters, peptidases, proteases, proteinases, sugar transporters, and proteins containing Domain of Unknown Function (DUF) 3328 had higher expression in infected leaf tissue, while more genes encoding inhibitors of peptidases and proteinases had higher expression in medium. Sixteen gene clusters with higher expression in leaf tissue were identified including clusters for the synthesis of a non-ribosomal peptide. A cluster encoding a novel fusicoccane was also identified. Two putative dispensable scaffolds were identified with a large proportion of

  12. Gene cluster analysis for the biosynthesis of elgicins, novel lantibiotics produced by Paenibacillus elgii B69

    Directory of Open Access Journals (Sweden)

    Teng Yi

    2012-03-01

    Background: The recent increase in bacterial resistance to antibiotics has promoted the exploration of novel antibacterial materials. As a result, many researchers are undertaking work to identify new lantibiotics because of their potent antimicrobial activities. The objective of this study was to provide details of a lantibiotic-like gene cluster in Paenibacillus elgii B69 and to produce the antibacterial substances coded by this gene cluster based on culture screening. Results: Analysis of the P. elgii B69 genome sequence revealed the presence of a lantibiotic-like gene cluster composed of five open reading frames (elgT1, elgC, elgT2, elgB, and elgA). Screening of culture extracts for active substances possessing the predicted properties of the encoded product led to the isolation of four novel peptides (elgicins AI, AII, B, and C) with a broad inhibitory spectrum. The molecular weights of these peptides were 4536, 4593, 4706, and 4820 Da, respectively. The N-terminal sequence of elgicin B was Leu-Gly-Asp-Tyr, which corresponded to the partial sequence of the peptide ElgA encoded by elgA. Edman degradation suggested that the product elgicin B is derived from ElgA. By correlating the results of electrospray ionization-mass spectrometry analyses of elgicins AI, AII, and C, these peptides are deduced to have originated from the same precursor, ElgA. Conclusions: A novel lantibiotic-like gene cluster was shown to be present in P. elgii B69. Four new lantibiotics with a broad inhibitory spectrum were isolated, and these appear to be promising antibacterial agents.

  13. Analysis of NFU-1 metallocofactor binding-site substitutions-impacts on iron-sulfur cluster coordination and protein structure and function.

    Science.gov (United States)

    Wesley, Nathaniel A; Wachnowsky, Christine; Fidai, Insiya; Cowan, J A

    2017-11-01

    Iron-sulfur (Fe/S) clusters are ancient prosthetic groups found in numerous metalloproteins and are conserved across all kingdoms of life due to their diverse, yet essential functional roles. Genetic mutations to a specific subset of mitochondrial Fe/S cluster delivery proteins are broadly categorized as disease-related under multiple mitochondrial dysfunction syndrome (MMDS), with symptoms indicative of a general failure of the metabolic system. Multiple mitochondrial dysfunction syndrome 1 (MMDS1) arises as a result of the missense mutation in NFU1, an Fe/S cluster scaffold protein, which substitutes a glycine near the Fe/S cluster-binding pocket to a cysteine (p.Gly208Cys). This substitution has been shown to promote protein dimerization such that cluster delivery to NFU1 is blocked, preventing downstream cluster trafficking. However, the possibility of this additional cysteine, located adjacent to the cluster-binding site, serving as an Fe/S cluster ligand has not yet been explored. To fully understand the consequences of this Gly208Cys replacement, complementary substitutions at the Fe/S cluster-binding pocket for native and Gly208Cys NFU1 were made, along with six other variants. Herein, we report the results of an investigation on the effect of these substitutions on both cluster coordination and NFU1 structure and function. The data suggest that the G208C substitution does not contribute to cluster binding. Rather, replacement of the glycine at position 208 changes the oligomerization state as a result of global structural alterations that result in the downstream effects manifest as MMDS1, but does not perturb the coordination chemistry of the Fe-S cluster. © 2017 Federation of European Biochemical Societies.

  14. Genetic recombination as a major cause of mutagenesis in the human globin gene clusters.

    Science.gov (United States)

    Borg, Joseph; Georgitsi, Marianthi; Aleporou-Marinou, Vassiliki; Kollia, Panagoula; Patrinos, George P

    2009-12-01

    Homologous recombination is a frequent phenomenon in multigene families and as such it occurs several times in both the alpha- and beta-like globin gene families. On numerous occasions, genetic recombination has been implicated as a major mechanism that drives mutagenesis in the human globin gene clusters, either in the form of unequal crossover or gene conversion. Unequal crossover results in the increase or decrease of the human globin gene copies, accompanied in the majority of cases by minor phenotypic consequences, while gene conversion contributes either to maintaining sequence homogeneity or generating sequence diversity. The role of genetic recombination, particularly gene conversion, in the evolution of the human globin gene families has been discussed elsewhere. Here, we summarize our current knowledge and review existing experimental evidence outlining the role of genetic recombination in the mutagenic process in the human globin gene families.

  15. In silico study of protein to protein interaction analysis of AMP-activated protein kinase and mitochondrial activity in three different farm animal species

    Science.gov (United States)

    Prastowo, S.; Widyas, N.

    2018-03-01

    AMP-activated protein kinase (AMPK) is a cellular energy sensor that responds to ATP and AMP concentrations. This protein interacts with mitochondria in determining its activity to generate energy for cell metabolism. This paper compares the protein-protein interactions of AMPK and mitochondrial activity genes in the metabolism of three domesticated farm animals: cattle (Bos taurus), pig (Sus scrofa) and chicken (Gallus gallus). The in silico study was done using STRING V.10, a prominent protein interaction database, followed by a comparison of biological functions in the KEGG PATHWAY database. A set of 12 genes was used as input: PRKAA1, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, PPARGC1, ACC, CPT1B, NRF2 and SOD. The first 7 genes belong to the AMPK family, while the last 5 are mitochondrial activity genes. The protein interaction results show 11, 8 and 5 metabolic pathways in Bos taurus, Sus scrofa and Gallus gallus, respectively. The top pathway in Bos taurus is the AMPK signaling pathway (10 genes), in Sus scrofa the Adipocytokine signaling pathway (8 genes) and in Gallus gallus the FoxO signaling pathway (5 genes). Moreover, the pathways common to all 3 species are the Adipocytokine, Insulin and FoxO signaling pathways. The genes clustered in the Adipocytokine and Insulin signaling pathways are PRKAA2, PPARGC1A, PRKAB1 and PRKAG2, while those in the FoxO signaling pathway are PRKAA2, PRKAB1 and PRKAG2. Accordingly, PRKAA2, PRKAB1 and PRKAG2 are the common genes. Based on this bioinformatics analysis, protein-protein interactions reveal distinct metabolic differences between species. However, further validation is needed to give a clear explanation.
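
    The common-gene step reported in this abstract amounts to a set intersection. The following is a minimal sketch in Python that uses only the gene lists quoted in the abstract; the variable name `pathway_genes` is illustrative and does not come from the paper, and the abstract gives a single gene list for the Adipocytokine and Insulin pathways, which is repeated here for both.

```python
# Gene sets per shared pathway, copied from the abstract.
# The abstract lists one set for Adipocytokine and Insulin together.
pathway_genes = {
    "Adipocytokine signaling": {"PRKAA2", "PPARGC1A", "PRKAB1", "PRKAG2"},
    "Insulin signaling": {"PRKAA2", "PPARGC1A", "PRKAB1", "PRKAG2"},
    "FoxO signaling": {"PRKAA2", "PRKAB1", "PRKAG2"},
}

# Genes present in every pathway = intersection of all sets.
common_genes = set.intersection(*pathway_genes.values())
print(sorted(common_genes))
```

    The result is the three genes the abstract names as common to all pathways; PPARGC1A drops out because it is absent from the FoxO list.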

  16. Gene, protein and network of male sterility in rice

    Directory of Open Access Journals (Sweden)

    Wang eKun

    2013-04-01

    Full Text Available Rice is one of the most important model crop plants, and its heterosis has been well exploited in commercial hybrid seed production via a variety of male sterile lines. The hybrid rice cultivation area is steadily expanding around the world, especially in Southern Asia. Characterization of genes and proteins related to male sterility aims to explain how and why male sterility occurs, and which proteins are the key players in microspore abortion. Recently, a series of genes and proteins related to cytoplasmic male sterility, photoperiod-sensitive male sterility, self-incompatibility and other types of microspore deterioration have been characterized through genetics or proteomics. Proteomics in particular offers a powerful, high-throughput approach to discern novel proteins involved in male-sterile pathways, which may help in breeding artificial male-sterile systems. This represents an alternative tool to meet the critical challenge of further development of hybrid rice. In this paper, we review recent developments in our understanding of male sterility in hybrid rice production at the gene, protein and integrated network levels, and present a perspective on the engineering of male sterile lines for hybrid rice production.

  17. Phylogenetic Distribution of the Capsid Assembly Protein Gene (g20) of Cyanophages in Paddy Floodwaters in Northeast China

    Science.gov (United States)

    Jing, Ruiyong; Liu, Junjie; Yu, Zhenhua; Liu, Xiaobing; Wang, Guanghua

    2014-01-01

    Numerous studies have revealed the high diversity of cyanophages in marine and freshwater environments, but little is currently known about the diversity of cyanophages in paddy fields, particularly in Northeast (NE) China. To elucidate the genetic diversity of cyanophages in paddy floodwaters in NE China, viral capsid assembly protein gene (g20) sequences from five floodwater samples were amplified with the primers CPS1 and CPS8. Denaturing gradient gel electrophoresis (DGGE) was applied to distinguish different g20 clones. In total, 54 clones differing in g20 nucleotide sequences were obtained in this study. Phylogenetic analysis showed that the distribution of g20 sequences in this study was different from that in Japanese paddy fields, and all the sequences were grouped into Clusters α, β, γ and ε. Within Clusters α and β, three new small clusters (PFW-VII∼-IX) were identified. UniFrac analysis of g20 clone assemblages demonstrated that the community compositions of cyanophage varied among marine, lake and paddy field environments. In paddy floodwater, community compositions of cyanophage were also different between NE China and Japan. PMID:24533125

  18. IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to

  19. Deletion of a regulatory gene within the cpk gene cluster reveals novel antibacterial activity in Streptomyces coelicolor A3(2)

    NARCIS (Netherlands)

    Gottelt, Marco; Kol, Stefan; Gomez-Escribano, Juan Pablo; Bibb, Mervyn; Takano, Eriko

    Genome sequencing of Streptomyces coelicolor A3(2) revealed an uncharacterized type I polyketide synthase gene cluster (cpk). Here we describe the discovery of a novel antibacterial activity (abCPK) and a yellow-pigmented secondary metabolite (yCPK) after deleting a presumed pathway-specific

  20. Comparing the performance of biomedical clustering methods

    DEFF Research Database (Denmark)

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-01-01

    Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide...
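
    The scale quoted in this abstract can be checked with a short calculation. The sketch below multiplies the four figures given in the text; it yields an upper bound (every tool using the full budget of 1,000 parameter sets on every data set), which is an assumption, since the abstract says "up to 1,000" and does not report the exact number of runs.

```python
# Figures quoted in the abstract.
n_methods = 13
n_datasets = 24
max_param_sets = 1_000      # "up to 1,000" per tool and data set
n_validity_indices = 13

# Upper bound: every method run on every data set with the full
# parameter budget, each clustering scored by every validity index.
upper_bound = n_methods * n_datasets * max_param_sets * n_validity_indices
print(f"{upper_bound:,}")
```

    The bound of 4,056,000 is consistent with the "more than 4 million" calculated indices reported.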

  1. Ligand cluster-based protein network and ePlatton, a multi-target ligand finder.

    Science.gov (United States)

    Du, Yu; Shi, Tieliu

    2016-01-01

    Small molecules are information carriers that make cells aware of external changes and couple internal metabolic and signalling pathway systems with each other. In specific physiological states, natural or artificial molecules are used to interact with selective biological targets to activate or inhibit their functions and achieve the expected biological and physiological output. Millions of years of evolution have optimized biological processes and pathways, and now the endocrine and immune systems cannot work properly without some key small molecules. Over thousands of years, the human race has managed to find many medicines against diseases by trial-and-error experience. In recent decades, with the deepening understanding of life and the progress of molecular biology, researchers have spared no effort to design molecules targeting one or two key enzymes and receptors related to the corresponding diseases. But recent studies in pharmacogenomics have shown that polypharmacology may be necessary for the effects of drugs, which challenges the paradigm 'one drug, one target, one disease'. Nowadays, cheminformatics and structural biology can help us take reasonable advantage of polypharmacology to design next-generation promiscuous drugs and drug combination therapies. 234,591 protein-ligand interactions were extracted from ChEMBL. Based on 2D structure similarity, 13,769 ligand clusters emerged from 156,151 distinct ligands, which were recognized by 1477 proteins. Ligand cluster- and sequence-based protein networks (LCBN, SBN) were constructed, compared and analysed. To assist compound design, the exploration of polypharmacology and the search for possible drug combinations, we integrated the pathway, disease, drug adverse reaction and the relationship of targets and ligand clusters into the web platform ePlatton, which is available at http://www.megabionet.org/eplatton. Although there were some disagreements between the LCBN and SBN, communities in both networks were largely the same
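
    One way to picture a ligand cluster-based protein network is as a graph in which two proteins are linked when they recognize ligands from the same cluster. The sketch below illustrates that idea only: the abstract does not state how edges in the LCBN are defined, so the linking rule, the function name `ligand_cluster_network` and the toy data are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: protein -> set of ligand-cluster IDs it binds.
protein_clusters = {
    "P1": {"c1", "c2"},
    "P2": {"c2", "c3"},
    "P3": {"c4"},
    "P4": {"c3", "c4"},
}

def ligand_cluster_network(mapping):
    """Return {(protein_a, protein_b): n_shared} for pairs sharing >= 1 cluster."""
    edges = {}
    for a, b in combinations(sorted(mapping), 2):
        shared = mapping[a] & mapping[b]
        if shared:  # assumed rule: link proteins that share a ligand cluster
            edges[(a, b)] = len(shared)
    return edges

edges = ligand_cluster_network(protein_clusters)
for (a, b), weight in edges.items():
    print(a, b, weight)
```

    Weighting each edge by the number of shared clusters is a design choice made here so that community detection could later favour strongly connected proteins; an unweighted graph would work equally well for a first look.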

  2. Sequencing and Transcriptional Analysis of the Biosynthesis Gene Cluster of Putrescine-Producing Lactococcus lactis

    Science.gov (United States)

    Ladero, Victor; Rattray, Fergal P.; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A.

    2011-01-01

    Lactococcus lactis is a prokaryotic microorganism with great importance as a culture starter and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (Generally Regarded As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion sequence (IS) element interrupts the transcription of the cluster, which results in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed in order to differentiate nonproducing L. lactis strains from those with a functional AGDI cluster. The analysis of the AGDI cluster and its flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during the adaptation to the milk environment by a process of reductive genome evolution. PMID:21803900

  3. NFκB-mediated activation of the cellular FUT3, 5 and 6 gene cluster by herpes simplex virus type 1.

    Science.gov (United States)

    Nordén, Rickard; Samuelsson, Ebba; Nyström, Kristina

    2017-11-01

    Herpes simplex virus type 1 has the ability to induce expression of a human gene cluster located on chromosome 19 upon infection. This gene cluster contains three fucosyltransferases (encoded by FUT3, FUT5 and FUT6) with the ability to add a fucose to an N-acetylglucosamine residue. Little is known regarding the transcriptional activation of these three genes in human cells. Intriguingly, herpes simplex virus type 1 activates all three genes simultaneously during infection, a situation not observed in uninfected tissue, pointing towards a virus-specific mechanism for transcriptional activation. The aim of this study was to define the underlying mechanism for the herpes simplex virus type 1 activation of FUT3, FUT5 and FUT6 transcription. The transcriptional activation of the FUT gene cluster on chromosome 19 in fibroblasts was specific, not involving adjacent genes. Moreover, inhibition of NFκB signaling through panepoxydone treatment significantly decreased the induction of FUT3, FUT5 and FUT6 transcription, as did siRNA targeting of p65, in herpes simplex virus type 1-infected fibroblasts. NFκB and p65 signaling appears to play an important role in the regulation of FUT3, FUT5 and FUT6 transcriptional activation by herpes simplex virus type 1, although additional, unidentified viral factors might account for part of the mechanism, as direct interferon-mediated stimulation of NFκB was not sufficient to induce the fucosyltransferase-encoding gene cluster in uninfected cells. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. De novo deletion of HOXB gene cluster in a patient with failure to thrive, developmental delay, gastroesophageal reflux and bronchiectasis.

    Science.gov (United States)

    Pajusalu, Sander; Reimand, Tiia; Uibo, Oivi; Vasar, Maire; Talvik, Inga; Zilina, Olga; Tammur, Pille; Õunap, Katrin

    2015-01-01

    We report a female patient with a complex phenotype consisting of failure to thrive, developmental delay, congenital bronchiectasis, gastroesophageal reflux and bilateral inguinal hernias. Chromosomal microarray analysis revealed a 230 kilobase deletion in chromosomal region 17q21.32 (arr[hg19] 17q21.32(46 550 362-46 784 039)×1) encompassing only 9 genes - HOXB1 to HOXB9. The deletion was not found in her mother or father. This is the first report of a patient with a HOXB gene cluster deletion involving only the HOXB1 to HOXB9 genes. By comparing our case to five previously reported patients with larger chromosomal aberrations involving the HOXB gene cluster, we suppose that HOXB gene cluster deletions are responsible for growth retardation, developmental delay, and specific facial dysmorphic features. We also suppose that bilateral inguinal hernias, tracheo-esophageal abnormalities, and lung malformations represent features with incomplete penetrance. Interestingly, previously published knock-out mice with a targeted heterozygous deletion comparable to that of our patient did not show phenotypic alterations. Copyright © 2015 Elsevier Masson SAS. All rights reserved.
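
    The deletion size quoted in this abstract can be recovered from the array coordinates. The sketch below assumes the two hg19 positions are inclusive endpoints, which the abstract does not state explicitly.

```python
# Coordinates from the array result: arr[hg19] 17q21.32(46550362-46784039)x1
start = 46_550_362
end = 46_784_039

# Span in base pairs, assuming inclusive endpoints, then in kilobases.
span_bp = end - start + 1
span_kb = span_bp / 1000
print(span_bp, round(span_kb))
```

    The span of 233,678 bp (about 234 kb) agrees with the roughly 230 kilobase deletion reported.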

  5. Comparative analysis of the prion protein gene sequences in African lion.

    Science.gov (United States)

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of the African lion (Panthera leo) was first cloned and then screened for polymorphisms. The results suggest that the prion protein gene of the eight African lions examined is highly homogeneous. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) were found in the prion protein gene (Prnp) of the African lion, but no amino acid substitutions. Sequence analysis showed that the highest homology was to Felis catus (GenBank accession AF003087; 96.7%) and to sheep (GenBank accession M31313.1; 96.2%). With respect to all the mammalian prion protein sequences compared, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross-species transmission of prion disease.

  6. Study on Fusion Protein and Its gene in Baculovirus Specificity

    International Nuclear Information System (INIS)

    Nemr, W.A.H.

    2012-01-01

    Baculoviruses are subdivided into two groups according to the type of budded-virus envelope fusion protein: group I uses gp64 and includes most nucleopolyhedroviruses (NPVs), whereas group II uses the F protein and includes the remaining NPVs and all granuloviruses (GVs). Recent studies, based on phylogenetic analysis of fusion proteins, reported that the viral F protein coding gene is of host cellular origin and may have been acquired from the host genome during evolution. It was therefore deduced that the F protein coding gene is a species-specific nucleotide sequence related to the specific host, and that if a virus infects an unexpected host, the resulting virus may encode a different F gene. The present study used these properties of the F gene in an attempt to produce a model of a specific, more economical, wider-range granulovirus biopesticide able to infect both Spodoptera littoralis and Phthorimaea operculella larvae. Multiple sequence alignment and phylogenetic analysis were performed on six members of the group II baculoviruses, and novel universal PCR primers were manually designed from the conserved regions of the alignment to amplify the species-specific sequence of the entire F gene open reading frame (ORF), which is useful for the molecular identification of baculoviruses in unknown samples. The PCR product of SpliGV was then used to prepare a specific probe for the F gene of this virus. The results showed that S. littoralis larvae can be infected by PhopGV if it is injected into the larval haemocoel. DNA hybridization showed that the virus resulting from this infection encodes an F gene homologous to the F gene of SpliGV, which indicates that the resulting virus acquired this F gene sequence from the host genome after infection. Consequently, these results suggest that genetic aberrations in the host genome may affect baculoviral infectivity. So, this study aimed to investigate the effect of gamma radiation at

  7. Protein complex prediction in large ontology attributed protein-protein interaction networks.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Li, Yanpeng; Xu, Bo

    2013-01-01

    Protein complexes are important for unraveling the secrets of cellular organization and function. Many computational approaches have been developed to predict protein complexes in protein-protein interaction (PPI) networks. However, most existing approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology (GO) annotation information. In this paper, we constructed ontology attributed PPI networks with PPI data and GO resource. After constructing ontology attributed networks, we proposed a novel approach called CSO (clustering based on network structure and ontology attribute similarity). Structural information and GO attribute information are complementary in ontology attributed networks. CSO can effectively take advantage of the correlation between frequent GO annotation sets and the dense subgraph for protein complex prediction. Our proposed CSO approach was applied to four different yeast PPI data sets and predicted many well-known protein complexes. The experimental results showed that CSO was valuable in predicting protein complexes and achieved state-of-the-art performance.

  8. Recurrent adenylation domain replacement in the microcystin synthetase gene cluster

    Directory of Open Access Journals (Sweden)

    Laakso Kati

    2007-10-01

    Full Text Available Abstract Background Microcystins are small cyclic heptapeptide toxins produced by a range of distantly related cyanobacteria. Microcystins are synthesized on large NRPS-PKS enzyme complexes. Many structural variants of microcystins are produced simultaneously. A recombination event between the first module of mcyB (mcyB1) and mcyC in the microcystin synthetase gene cluster is linked to the simultaneous production of microcystin variants in strains of the genus Microcystis. Results Here we undertook a phylogenetic study to investigate the order and timing of recombination between the mcyB1 and mcyC genes in a diverse selection of microcystin-producing cyanobacteria. Our results provide support for complex evolutionary processes taking place at the mcyB1 and mcyC adenylation domains, which recognize and activate the amino acids found at the X and Z positions. We find evidence for recent recombination between mcyB1 and mcyC in strains of the genera Anabaena, Microcystis, and Hapalosiphon. We also find clear evidence for independent adenylation domain conversion of mcyB1 by unrelated peptide synthetase modules in strains of the genera Nostoc and Microcystis. The recombination events replace only the adenylation domain in each case, and the condensation domains of mcyB1 and mcyC are not transferred together with the adenylation domain. Our findings demonstrate that the mcyB1 and mcyC adenylation domains are recombination hotspots in the microcystin synthetase gene cluster. Conclusion Recombination is thought to be one of the main mechanisms driving the diversification of NRPSs. However, there is very little information on how recombination takes place in nature. This study demonstrates that functional peptide synthetases are created in nature through transfer of adenylation domains without the concomitant transfer of condensation domains.

  9. Diblock-copolymer-mediated self-assembly of protein-stabilized iron oxide nanoparticle clusters for magnetic resonance imaging.

    Science.gov (United States)

    Tähkä, Sari; Laiho, Ari; Kostiainen, Mauri A

    2014-03-03

    Superparamagnetic iron oxide nanoparticles (SPIONs) can be used as efficient transverse relaxivity (T2) contrast agents in magnetic resonance imaging (MRI). Organizing small (Doxide) diblock copolymer (P2QVP-b-PEO) to mediate the self-assembly of protein-cage-encapsulated iron oxide (γ-Fe2O3) nanoparticles (magnetoferritin) into stable PEO-coated clusters. This approach relies on electrostatic interactions between the cationic N-methyl-2-vinylpyridinium iodide block and the magnetoferritin protein cage surface (pI≈4.5) to form a dense core, whereas the neutral ethylene oxide block provides a stabilizing biocompatible shell. Formation of the complexes was studied in aqueous solvent medium with dynamic light scattering (DLS) and cryogenic transmission electron microscopy (cryo-TEM). DLS results indicated that the hydrodynamic diameter (Dh) of the clusters is approximately 200 nm, and cryo-TEM showed that the clusters have an anisotropic stringlike morphology. MRI studies showed that in the clusters the longitudinal relaxivity (r1) is decreased and the transverse relaxivity (r2) is increased relative to free magnetoferritin (MF), thus indicating that clusters can provide considerable contrast enhancement. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. The evolution of Dscam genes across the arthropods.

    Science.gov (United States)

    Armitage, Sophie A O; Freiburg, Rebecca Y; Kurtz, Joachim; Bravo, Ignacio G

    2012-04-13

    One way of creating phenotypic diversity is through alternative splicing of precursor mRNAs. A gene that has evolved a hypervariable form is Down syndrome cell adhesion molecule (Dscam-hv), which in Drosophila melanogaster can produce thousands of isoforms via mutually exclusive alternative splicing. The extracellular region of this protein is encoded by three variable exon clusters, each containing multiple exon variants. The protein is vital for neuronal wiring where the extreme variability at the somatic level is required for axonal guidance, and it plays a role in immunity where the variability has been hypothesised to relate to recognition of different antigens. Dscam-hv has been found across the Pancrustacea. Additionally, three paralogous non-hypervariable Dscam-like genes have also been described for D. melanogaster. Here we took a bioinformatics approach, building profile Hidden Markov Models to search across species for putative orthologs to the Dscam genes and for hypervariable alternatively spliced exons, and inferring the phylogenetic relationships among them. Our aims were to examine whether Dscam orthologs exist outside the Bilateria, whether the origin of Dscam-hv could lie outside the Pancrustacea, when the Dscam-like orthologs arose, how many alternatively spliced exons of each exon cluster were present in the most recent common ancestor, and how these clusters evolved. Our results suggest that the origin of Dscam genes may lie after the split between the Cnidaria and the Bilateria and support the hypothesis that Dscam-hv originated in the common ancestor of the Pancrustacea. Our phylogeny of Dscam gene family members shows six well-supported clades: five containing Dscam-like genes and one containing all the Dscam-hv genes; a seventh clade contains arachnid putative Dscam genes. Furthermore, the exon clusters appear to have experienced different evolutionary histories. Dscam genes have undergone independent duplication events in the insects and
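
    Mutually exclusive splicing selects exactly one variant from each exon cluster, so the number of possible isoforms is the product of the cluster sizes. The sketch below works this out for D. melanogaster; the per-cluster variant counts are the widely cited values for Dscam1 and are an assumption here, since this abstract does not give them.

```python
from math import prod

# Widely cited variant counts for D. melanogaster Dscam1
# (assumed; not stated in the abstract).
extracellular_clusters = {"exon 4": 12, "exon 6": 48, "exon 9": 33}
transmembrane_cluster = {"exon 17": 2}

# One variant is chosen per cluster, so the number of combinations
# is the product of the cluster sizes.
ectodomain_isoforms = prod(extracellular_clusters.values())
total_isoforms = ectodomain_isoforms * prod(transmembrane_cluster.values())
print(ectodomain_isoforms, total_isoforms)
```

    Under these assumed counts the three extracellular clusters alone give 19,008 combinations, and 38,016 once the transmembrane cluster is included, which fits the "thousands of isoforms" described.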

  11. The evolution of Dscam genes across the arthropods

    Directory of Open Access Journals (Sweden)

    Armitage Sophie AO

    2012-04-01

    Full Text Available Abstract Background One way of creating phenotypic diversity is through alternative splicing of precursor mRNAs. A gene that has evolved a hypervariable form is Down syndrome cell adhesion molecule (Dscam-hv), which in Drosophila melanogaster can produce thousands of isoforms via mutually exclusive alternative splicing. The extracellular region of this protein is encoded by three variable exon clusters, each containing multiple exon variants. The protein is vital for neuronal wiring where the extreme variability at the somatic level is required for axonal guidance, and it plays a role in immunity where the variability has been hypothesised to relate to recognition of different antigens. Dscam-hv has been found across the Pancrustacea. Additionally, three paralogous non-hypervariable Dscam-like genes have also been described for D. melanogaster. Here we took a bioinformatics approach, building profile Hidden Markov Models to search across species for putative orthologs to the Dscam genes and for hypervariable alternatively spliced exons, and inferring the phylogenetic relationships among them. Our aims were to examine whether Dscam orthologs exist outside the Bilateria, whether the origin of Dscam-hv could lie outside the Pancrustacea, when the Dscam-like orthologs arose, how many alternatively spliced exons of each exon cluster were present in the most recent common ancestor, and how these clusters evolved. Results Our results suggest that the origin of Dscam genes may lie after the split between the Cnidaria and the Bilateria and support the hypothesis that Dscam-hv originated in the common ancestor of the Pancrustacea. Our phylogeny of Dscam gene family members shows six well-supported clades: five containing Dscam-like genes and one containing all the Dscam-hv genes; a seventh clade contains arachnid putative Dscam genes. Furthermore, the exon clusters appear to have experienced different evolutionary histories. Conclusions Dscam genes have

  12. Rapid evolution of the sequences and gene repertoires of secreted proteins in bacteria.

    Directory of Open Access Journals (Sweden)

    Teresa Nogueira

    Full Text Available Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation.

  13. Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.

    Science.gov (United States)

    Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis

    2014-12-01

    Proteogenomics involves the use of MS to refine annotation of protein-coding genes and discover genes in a genome. We carried out comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) from publicly available proteomics data with the aim of improving annotation for methylotrophs, organisms capable of surviving on reduced carbon compounds such as methanol. Besides identifying 2482 (50%) proteins, 29 new genes were discovered and 66 annotated gene models were revised in the ME-AM1 genome. One such novel gene is identified with 75 peptides and lacks a homolog in other methylobacteria, but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating its potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and the low GC content of a few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy-related genes were also improved by the discovery of a short gene in the methylotrophy gene island and by redefining a gene important for the synthesis of pyrroloquinoline quinone, which is essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in the model methylotroph ME-AM1 genome. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. General theory for integrated analysis of growth, gene, and protein expression in biofilms.

    Science.gov (United States)

    Zhang, Tianyu; Pabst, Breana; Klapper, Isaac; Stewart, Philip S

    2013-01-01

    A theory for analysis and prediction of spatial and temporal patterns of gene and protein expression within microbial biofilms is derived. The theory integrates phenomena of solute reaction and diffusion, microbial growth, mRNA or protein synthesis, biomass advection, and gene transcript or protein turnover. Case studies illustrate the capacity of the theory to simulate heterogeneous spatial patterns and predict microbial activities in biofilms that are qualitatively different from those of planktonic cells. Specific scenarios analyzed include an inducible GFP or fluorescent protein reporter, a denitrification gene repressed by oxygen, an acid stress response gene, and a quorum sensing circuit. It is shown that the patterns of activity revealed by inducible stable fluorescent proteins or reporter unstable proteins overestimate the region of activity. This is due to advective spreading and finite protein turnover rates. In the cases of a gene induced by either limitation for a metabolic substrate or accumulation of a metabolic product, maximal expression is predicted in an internal stratum of the biofilm. A quorum sensing system that includes an oxygen-responsive negative regulator exhibits behavior that is distinct from any stage of a batch planktonic culture. Though here the analyses have been limited to simultaneous interactions of up to two substrates and two genes, the framework applies to arbitrarily large networks of genes and metabolites. Extension of reaction-diffusion modeling in biofilms to the analysis of individual genes and gene networks is an important advance that dovetails with the growing toolkit of molecular and genetic experimental techniques.

  15. Origins of gene, genetic code, protein and life

    Indian Academy of Sciences (India)

    Unknown

    have concluded that newly-born genes are products of nonstop frames (NSF) ... research to determine tertiary structures of proteins such ... the present earth, is favourable for new genes to arise, if ..... NGG) in the universal genetic code table, cannot satisfy ..... which has been proposed to explain the development of life on.

  16. Atypical haemolytic uraemic syndrome associated with a hybrid complement gene.

    Directory of Open Access Journals (Sweden)

    Julian P Venables

    2006-10-01

    Full Text Available BACKGROUND: Sequence analysis of the regulators of complement activation (RCA) cluster of genes at chromosome position 1q32 shows evidence of several large genomic duplications. These duplications have resulted in a high degree of sequence identity between the gene for factor H (CFH) and the genes for the five factor H-related proteins (CFHL1-5; aliases CFHR1-5). CFH mutations have been described in association with atypical haemolytic uraemic syndrome (aHUS). The majority of the mutations are missense changes that cluster in the C-terminal region and impair the ability of factor H to regulate surface-bound C3b. Some have arisen as a result of gene conversion between CFH and CFHL1. In this study we tested the hypothesis that nonallelic homologous recombination between low-copy repeats in the RCA cluster could result in the formation of a hybrid CFH/CFHL1 gene that predisposes to the development of aHUS. METHODS AND FINDINGS: In a family with many cases of aHUS that segregate with the RCA cluster we used cDNA analysis, gene sequencing, and Southern blotting to show that affected individuals carry a heterozygous CFH/CFHL1 hybrid gene in which exons 1-21 are derived from CFH and exons 22/23 from CFHL1. This hybrid encodes a protein product identical to a functionally significant CFH mutant (c.3572C>T, S1191L and c.3590T>C, V1197A) that has been previously described in association with aHUS. CONCLUSIONS: CFH mutation screening is recommended in all aHUS patients prior to renal transplantation because of the high risk of disease recurrence post-transplant in those known to have a CFH mutation. Because of our finding it will be necessary to implement additional screening strategies that will detect a hybrid CFH/CFHL1 gene.

  17. Ethylene-induced senescence-related gene expression requires protein synthesis

    International Nuclear Information System (INIS)

    Lawton, K.A.; Raghothama, K.G.; Woodson, W.R.

    1990-01-01

    We have investigated the effects of inhibiting protein synthesis on the ethylene-induced expression of 3 carnation senescence-related genes, pSR5, pSR8, and pSR12. Treatment of preclimacteric carnation petal discs with 1 μg/ml of cycloheximide, a cytoplasmic protein synthesis inhibitor, for 3 h inhibited protein synthesis by >80% as quantitated by the incorporation of [35S]methionine into protein. Pre-treatment of petal discs with cycloheximide prevented ethylene-induced SR transcript accumulation. Cycloheximide treatment of petal discs held in air did not result in increased levels of SR mRNA. These results indicate that ethylene does not interact with pre-formed factors but rather that the activation of SR gene expression by ethylene is mediated by labile protein factor(s) synthesized on cytoplasmic ribosomes. Experiments are currently underway to determine if cycloheximide exerts its effect at the transcriptional or post-transcriptional level.

  18. False positive reduction in protein-protein interaction predictions using gene ontology annotations

    Directory of Open Access Journals (Sweden)

    Lin Yen-Han

    2007-07-01

    Full Text Available Abstract Background Many crucial cellular operations such as metabolism, signalling, and regulations are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for the lack of solid protein-protein interaction information is poor agreement between experimental findings and computational sets that, in turn, comes from huge false positive predictions in computational approaches. Reduction of false positive predictions and enhancing true positive fraction of computationally predicted protein-protein interaction datasets based on highly confident experimental results has not been adequately investigated. Results Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism are 48.32% and 46.49% (by average of four datasets) in yeast and worm, respectively. Based on eight top-ranking keywords and co-localization of interacting proteins, a set of two knowledge rules was deduced and applied to remove false positive protein pairs. The 'strength', a measure of improvement provided by the rules, was defined based on the signal-to-noise ratio and implemented to measure the applicability of knowledge rules applying to the predicted PPI datasets. Depending on the employed PPI-predicting methods, the strength varies between two and ten-fold of randomly removing protein pairs from the datasets. Conclusion Gene Ontology annotations along with the deduced knowledge rules could be implemented to partially

  19. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    Science.gov (United States)

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
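
    As a rough illustration of the hierarchical idea in the abstract above, the sketch below builds a covariance matrix in which every observation of a gene shares a gene-level kernel, and observations from the same replicate share an additional replicate-level kernel. The RBF kernel, the parameter values, and the names `rbf` and `hierarchical_cov` are assumptions made for illustration; they are not taken from the authors' code.

```python
import math

def rbf(t1, t2, variance, lengthscale):
    """Squared-exponential (RBF) kernel between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def hierarchical_cov(obs, gene_params=(1.0, 2.0), rep_params=(0.3, 1.0)):
    """Covariance matrix for observations given as (time, replicate_id) pairs.

    Hypothetical two-level structure: every pair of observations shares the
    gene-level kernel; pairs from the same replicate also share the
    replicate-level kernel. Parameters are (variance, lengthscale) tuples.
    """
    n = len(obs)
    K = [[0.0] * n for _ in range(n)]
    for i, (ti, ri) in enumerate(obs):
        for j, (tj, rj) in enumerate(obs):
            k = rbf(ti, tj, *gene_params)
            if ri == rj:
                k += rbf(ti, tj, *rep_params)
            K[i][j] = k
    return K

# Two replicates sampled at different, irregularly spaced time points.
obs = [(0.0, "rep1"), (1.5, "rep1"), (0.7, "rep2"), (4.0, "rep2")]
K = hierarchical_cov(obs)
for row in K:
    print(["%.3f" % v for v in row])
```

    Because the kernel is evaluated at arbitrary time points, the replicates in this sketch do not need to share a sampling schedule, which mirrors the irregular-sampling advantage the abstract describes.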

  20. Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data

    Directory of Open Access Journals (Sweden)

    Sakellariou Argiris

    2012-10-01

    Full Text Available Abstract Background A feature selection method in microarray gene expression data should be independent of platform, disease and dataset size. Our hypothesis is that among the statistically significant ranked genes in a gene list, there should be clusters of genes that share similar biological functions related to the investigated disease. Thus, instead of keeping N top ranked genes, it would be more appropriate to define and keep a number of gene cluster exemplars. Results We propose a hybrid feature selection (FS) method (mAP-KL), which combines multiple hypothesis testing and the affinity propagation (AP) clustering algorithm along with the Krzanowski & Lai cluster quality index, to select a small yet informative subset of genes. We applied mAP-KL on real microarray data, as well as on simulated data, and compared its performance against 13 other feature selection approaches. Across a variety of diseases and number of samples, mAP-KL presents competitive classification results, particularly in neuromuscular diseases, where its overall AUC score was 0.91. Furthermore, mAP-KL generates concise yet biologically relevant and informative N-gene expression signatures, which can serve as a valuable tool for diagnostic and prognostic purposes, as well as a source of potential disease biomarkers in a broad range of diseases. Conclusions mAP-KL is a data-driven and classifier-independent hybrid feature selection method, which applies to any disease classification problem based on microarray data, regardless of the available samples. Combining multiple hypothesis testing and AP leads to subsets of genes, which classify unknown samples from both small and large patient cohorts with high accuracy.
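
    The multiple hypothesis testing step that precedes clustering can be sketched as follows. The abstract does not say which correction mAP-KL applies, so the Benjamini-Hochberg false discovery rate procedure is assumed here purely for illustration; the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Assumed procedure (not confirmed by the abstract): sort the p-values,
    find the largest rank k with p_(k) <= (k / m) * alpha, and reject the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical per-gene p-values from a differential expression test.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
rejected = benjamini_hochberg(pvals)
print(rejected)
```

    In a pipeline of the kind the abstract describes, only the genes that survive this filter would be passed on to the clustering step, from which cluster exemplars are then kept.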

  1. In silico analysis highlights the frequency and diversity of type 1 lantibiotic gene clusters in genome sequenced bacteria

    LENUS (Irish Health Repository)

    Marsh, Alan J

    2010-11-30

    Abstract Background Lantibiotics are lanthionine-containing, post-translationally modified antimicrobial peptides. These peptides have significant, but largely untapped, potential as preservatives and chemotherapeutic agents. Type 1 lantibiotics are those in which lanthionine residues are introduced into the structural peptide (LanA) through the activity of separate lanthionine dehydratase (LanB) and lanthionine synthetase (LanC) enzymes. Here we take advantage of the conserved nature of LanC enzymes to devise an in silico approach to identify potential lantibiotic-encoding gene clusters in genome sequenced bacteria. Results In total 49 novel type 1 lantibiotic clusters were identified which unexpectedly were associated with species, genera and even phyla of bacteria which have not previously been associated with lantibiotic production. Conclusions Multiple type 1 lantibiotic gene clusters were identified at a frequency that suggests that these antimicrobials are much more widespread than previously thought. These clusters represent a rich repository which can yield a large number of valuable novel antimicrobials and biosynthetic enzymes.

  2. Functional redundancy and/or ongoing pseudogenization among F-box protein genes expressed in Arabidopsis male gametophyte.

    Science.gov (United States)

    Ikram, Sobia; Durandet, Monique; Vesa, Simona; Pereira, Serge; Guerche, Philippe; Bonhomme, Sandrine

    2014-06-01

    The F-box protein gene family is one of the largest gene families in plants, with almost 700 predicted genes in the model plant Arabidopsis. F-box proteins are key components of the ubiquitin proteasome system that allows targeted protein degradation. Transcriptome analyses indicate that half of these F-box protein genes are found expressed in microspore and/or pollen, i.e., during male gametogenesis. To assess the role of F-box protein genes during this crucial developmental step, we selected 34 F-box protein genes recorded as highly and specifically expressed in pollen and isolated corresponding insertion mutants. We checked the expression level of each selected gene by RT-PCR and confirmed pollen expression for 25 genes, but specific expression for only 10 of the 34 F-box protein genes. In addition, we tested the expression level of selected F-box protein genes in 24 mutant lines and showed that 11 of them were null mutants. Transmission analysis of the mutations to the progeny showed that none of the single mutations was gametophytic lethal. These unaffected transmission efficiencies suggested leaky mutations or functional redundancy among F-box protein genes. Cytological observation of the gametophytes in the mutants confirmed these results. Combinations of mutations in F-box protein genes from the same subfamily did not lead to a transmission defect either, further highlighting functional redundancy and/or a high proportion of pseudogenes among these F-box protein genes.

  3. A dual origin of the Xist gene from a protein-coding gene and a set of transposable elements.

    Directory of Open Access Journals (Sweden)

    Eugeny A Elisaphenko

    2008-06-01

    Full Text Available X-chromosome inactivation, which occurs in female eutherian mammals, is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons, including those with simple tandem repeats detectable in their structure, have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA.

  4. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis

    KAUST Repository

    Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S.; Qian, Pei-Yuan

    2015-01-01

    validating this direct cloning plug-and-play approach with surfactin, we genetically interrogated the amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation

  5. Evolutionary history of the phl gene cluster in the plant-associated bacterium Pseudomonas fluorescens

    NARCIS (Netherlands)

    Moynihan, J.A.; Morrissey, J.P.; Coppoolse, E.; Stiekema, W.J.; O'Gara, F.; Boyd, E.F.

    2009-01-01

    Pseudomonas fluorescens is of agricultural and economic importance as a biological control agent largely because of its plant-association and production of secondary metabolites, in particular 2,4-diacetylphloroglucinol (2,4-DAPG). This polyketide, which is encoded by the eight-gene phl cluster,

  6. Gene co-expression analysis identifies gene clusters associated with isotropic and polarized growth in Aspergillus fumigatus conidia.

    Science.gov (United States)

    Baltussen, Tim J H; Coolen, Jordy P M; Zoll, Jan; Verweij, Paul E; Melchers, Willem J G

    2018-04-26

    Aspergillus fumigatus is a saprophytic fungus that extensively produces conidia. These microscopic asexually reproductive structures are small enough to reach the lungs. Germination of conidia followed by hyphal growth inside human lungs is a key step in the establishment of infection in immunocompromised patients. RNA-Seq was used to analyze the transcriptome of dormant and germinating A. fumigatus conidia. Construction of a gene co-expression network revealed four gene clusters (modules) correlated with a growth phase (dormant, isotropic growth, polarized growth). Transcript levels of genes encoding secondary metabolites were high in dormant conidia. During isotropic growth, transcript levels of genes involved in cell wall modifications increased. Two modules encoding growth and cell cycle/DNA processing were associated with polarized growth. In addition, the co-expression network was used to identify highly connected intermodular hub genes. These genes may have a pivotal role in the respective module and could therefore be compelling therapeutic targets. Generally, cell wall remodeling is an important process during isotropic and polarized growth, characterized by an increase of transcripts coding for hyphal growth and cell cycle/DNA processing when polarized growth is initiated. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Molecular characterization of the porcine surfactant, pulmonary-associated protein C gene

    DEFF Research Database (Denmark)

    Cirera, S.; Nygård, A.B.; Jensen, H.E.

    2006-01-01

    The surfactant, pulmonary-associated protein C (SFTPC) is a peptide secreted by the alveolar type II pneumocytes of the lung. We have characterized the porcine SFTPC gene at genomic, transcriptional, and protein levels. The porcine SFTPC is a single-copy gene on pig chromosome 14. Two transcripts...

  8. The Fdb3 transcription factor of the Fusarium Detoxification of Benzoxazolinone gene cluster is required for MBOA but not BOA degradation in Fusarium pseudograminearum.

    Science.gov (United States)

    Kettle, Andrew J; Carere, Jason; Batley, Jacqueline; Manners, John M; Kazan, Kemal; Gardiner, Donald M

    2016-03-01

    A number of cereals produce the benzoxazolinone class of phytoalexins. Fusarium species pathogenic towards these hosts can typically degrade these compounds via an aminophenol intermediate, and the ability to do so is encoded by a group of genes found in the Fusarium Detoxification of Benzoxazolinone (FDB) cluster. A zinc finger transcription factor encoded by one of the FDB cluster genes (FDB3) has been proposed to regulate the expression of other genes in the cluster and hence is potentially involved in benzoxazolinone degradation. Herein we show that Fdb3 is essential for the ability of Fusarium pseudograminearum to efficiently detoxify the predominant wheat benzoxazolinone, 6-methoxy-benzoxazolin-2-one (MBOA), but not benzoxazoline-2-one (BOA). Furthermore, additional genes thought to be part of the FDB gene cluster, based upon transcriptional response to benzoxazolinones, are regulated by Fdb3. However, deletion mutants for these latter genes remain capable of benzoxazolinone degradation, suggesting that they are not essential for this process. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.

  9. Integrative characterization of germ cell-specific genes from mouse spermatocyte UniGene library

    Directory of Open Access Journals (Sweden)

    Eddy Edward M

    2007-07-01

    Full Text Available Abstract Background The primary regulator of spermatogenesis, a highly ordered and tightly regulated developmental process, is an intrinsic genetic program involving male germ cell-specific genes. Results We analyzed the mouse spermatocyte UniGene library containing 2155 gene-oriented transcript clusters. We predict that 11% of these genes are testis-specific and systematically identified 24 authentic genes specifically and abundantly expressed in the testis via in silico and in vitro approaches. Northern blot analysis disclosed various transcript characteristics, such as expression level, size and the presence of isoforms. Expression analysis revealed developmentally regulated and stage-specific expression patterns in all of the genes. We further analyzed the genes at the protein and cellular levels. Transfection assays performed using GC-2 cells provided information on the cellular characteristics of the gene products. In addition, antibodies were generated against proteins encoded by some of the genes to facilitate their identification and characterization in spermatogenic cells and sperm. Our data suggest that a number of the gene products are implicated in transcriptional regulation, nuclear integrity, sperm structure and motility, and fertilization. In particular, we found for the first time that Mm.333010, predicted to contain a trypsin-like serine protease domain, is a sperm acrosomal protein. Conclusion We identify 24 authentic genes with spermatogenic cell-specific expression, and provide comprehensive information about the genes. Our findings establish a new basis for future investigation into molecular mechanisms underlying male reproduction.

  10. Novel algorithms reveal streptococcal transcriptomes and clues about undefined genes.

    Science.gov (United States)

    Ryan, Patricia A; Kirk, Brian W; Euler, Chad W; Schuch, Raymond; Fischetti, Vincent A

    2007-07-01

    Bacteria-host interactions are dynamic processes, and understanding transcriptional responses that directly or indirectly regulate the expression of genes involved in initial infection stages would illuminate the molecular events that result in host colonization. We used oligonucleotide microarrays to monitor (in vitro) differential gene expression in group A streptococci during pharyngeal cell adherence, the first overt infection stage. We present neighbor clustering, a new computational method for further analyzing bacterial microarray data that combines two informative characteristics of bacterial genes that share common function or regulation: (1) similar gene expression profiles (i.e., co-expression); and (2) physical proximity of genes on the chromosome. This method identifies statistically significant clusters of co-expressed gene neighbors that potentially share common function or regulation by coupling statistically analyzed gene expression profiles with the chromosomal position of genes. We applied this method to our own data and to those of others, and we show that it identified a greater number of differentially expressed genes, facilitating the reconstruction of more multimeric proteins and complete metabolic pathways than would have been possible without its application. We assessed the biological significance of two identified genes by assaying deletion mutants for adherence in vitro and show that neighbor clustering indeed provides biologically relevant data. Neighbor clustering provides a more comprehensive view of the molecular responses of streptococci during pharyngeal cell adherence.
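
    The two ingredients the abstract above combines, co-expression and chromosomal adjacency, can be illustrated with a simplified sketch. It is not the published neighbor clustering algorithm, which scores clusters for statistical significance; here a fixed Pearson correlation threshold stands in for that test, and the function names, threshold, and data are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def neighbor_clusters(genes, min_r=0.9, min_size=2):
    """Group chromosomally adjacent genes with correlated profiles.

    genes: list of (name, chromosomal_position, expression_profile).
    Returns a list of clusters, each a list of gene names.
    """
    if not genes:
        return []
    ordered = sorted(genes, key=lambda g: g[1])  # order along the chromosome
    clusters, current = [], [ordered[0]]
    for prev, gene in zip(ordered, ordered[1:]):
        if pearson(prev[2], gene[2]) >= min_r:
            current.append(gene)  # extend the current run of neighbors
        else:
            if len(current) >= min_size:
                clusters.append([g[0] for g in current])
            current = [gene]
    if len(current) >= min_size:
        clusters.append([g[0] for g in current])
    return clusters

# Hypothetical genes: name, position, expression across four conditions.
genes = [
    ("geneA", 100, [1.0, 2.0, 3.0, 4.0]),
    ("geneB", 200, [1.1, 2.1, 2.9, 4.2]),
    ("geneC", 300, [4.0, 3.0, 2.0, 1.0]),
    ("geneD", 400, [3.9, 3.2, 1.8, 1.1]),
    ("geneE", 500, [1.0, 3.0, 1.0, 3.0]),
]
print(neighbor_clusters(genes))
```

    In this toy data, geneA/geneB and geneC/geneD form two runs of adjacent, co-expressed genes, while geneE is left unclustered.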

  11. Heterogeneic dynamics of the structures of multiple gene clusters in two pathogenetically different lines originating from the same phytoplasma.

    Science.gov (United States)

    Arashida, Ryo; Kakizawa, Shigeyuki; Hoshi, Ayaka; Ishii, Yoshiko; Jung, Hee-Young; Kagiwada, Satoshi; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou

    2008-04-01

    Phytoplasmas are phloem-limited plant pathogens that are transmitted by insect vectors and are associated with diseases in hundreds of plant species. Despite their small sizes, phytoplasma genomes have repeat-rich sequences, which are due to several genes that are encoded as multiple copies. These multiple genes exist in a gene cluster, the potential mobile unit (PMU). PMUs are present at several distinct regions in the phytoplasma genome. The multicopy genes encoded by PMUs (herein named mobile unit genes [MUGs]) and similar genes elsewhere in the genome (herein named fundamental genes [FUGs]) are likely to have the same function based on their annotations. In this manuscript we show evidence that MUGs and FUGs do not cluster together within the same clade. Each MUG is in a cluster with a short branch length, suggesting that MUGs are recently diverged paralogs, whereas the origin of FUGs is different from that of MUGs. We also compared the genome structures around the lplA gene in two derivative lines of the 'Candidatus Phytoplasma asteris' OY strain, the severe-symptom line W (OY-W) and the mild-symptom line M (OY-M). The gene organizations of the nucleotide sequences upstream of the lplA genes of OY-W and OY-M were dramatically different. The tra5 insertion sequence, an element of PMUs, was found only in this region in OY-W. These results suggest that transposition of entire PMUs and PMU sections has occurred frequently in the OY phytoplasma genome. The difference in the pathogenicities of OY-W and OY-M might be caused by the duplication and transposition of PMUs, followed by genome rearrangement.

  12. Gene, protein, and network of male sterility in rice

    OpenAIRE

    Wang, Kun; Peng, Xiaojue; Ji, Yanxiao; Yang, Pingfang; Zhu, Yingguo; Li, Shaoqing

    2013-01-01

    Rice is one of the most important model crop plants whose heterosis has been well-exploited in commercial hybrid seed production via a variety of types of male-sterile lines. Hybrid rice cultivation area is steadily expanding around the world, especially in Southern Asia. Characterization of genes and proteins related to male sterility aims to understand how and why the male sterility occurs, and which proteins are the key players for microspores abortion. Recently, a series of genes and prot...

  13. Simulating evolution of protein complexes through gene duplication and co-option.

    Science.gov (United States)

    Haarsma, Loren; Nelesen, Serita; VanAndel, Ethan; Lamine, James; VandeHaar, Peter

    2016-06-21

    We present a model of the evolution of protein complexes with novel functions through gene duplication, mutation, and co-option. Under a wide variety of input parameters, digital organisms evolve complexes of 2-5 bound proteins which have novel functions but whose component proteins are not independently functional. Evolution of complexes with novel functions happens more quickly as gene duplication rates increase, point mutation rates increase, protein complex functional probability increases, protein complex functional strength increases, and protein family size decreases. Evolution of complexity is inhibited when the metabolic costs of making proteins exceed the fitness gain of having functional proteins, or when point mutation rates get so large that the functional proteins undergo deleterious mutations faster than new functional complexes can evolve. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Cloning and characterization of an insecticidal crystal protein gene ...

    Indian Academy of Sciences (India)

    Unknown

    The sequence of the cloned crystal protein gene showed almost complete homology with a mosquitocidal toxin gene from Bacillus .... diet or by topical application on food substrates as .... has very high similarity (99.74%) at DNA level with.

  15. GPI-anchored proteins are confined in subdiffraction clusters at the apical surface of polarized epithelial cells.

    Science.gov (United States)

    Paladino, Simona; Lebreton, Stéphanie; Lelek, Mickaël; Riccio, Patrizia; De Nicola, Sergio; Zimmer, Christophe; Zurzolo, Chiara

    2017-12-01

    Spatio-temporal compartmentalization of membrane proteins is critical for the regulation of diverse vital functions in eukaryotic cells. It was previously shown that, at the apical surface of polarized MDCK cells, glycosylphosphatidylinositol (GPI)-anchored proteins (GPI-APs) are organized in small cholesterol-independent clusters of single GPI-AP species (homoclusters), which are required for the formation of larger cholesterol-dependent clusters formed by multiple GPI-AP species (heteroclusters). This clustered organization is crucial for the biological activities of GPI-APs; hence, understanding the spatio-temporal properties of their membrane organization is of fundamental importance. Here, by using direct stochastic optical reconstruction microscopy coupled to pair correlation analysis (pc-STORM), we were able to visualize and measure the size of these clusters. Specifically, we show that they are non-randomly distributed and have an average size of 67 nm. We also demonstrated that polarized MDCK and non-polarized CHO cells have similar cluster distribution and size, but different sensitivity to cholesterol depletion. Finally, we derived a model that allowed a quantitative characterization of the cluster organization of GPI-APs at the apical surface of polarized MDCK cells for the first time. Experimental FRET (fluorescence resonance energy transfer)/FLIM (fluorescence-lifetime imaging microscopy) data were correlated to the theoretical predictions of the model. © 2017 The Author(s).

  16. Density parameter estimation for finding clusters of homologous proteins-tracing actinobacterial pathogenicity lifestyles

    DEFF Research Database (Denmark)

    Röttger, Richard; Kalaghatgi, Prabhav; Sun, Peng

    2013-01-01

    Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglecte...

  17. Identification of novel type 1 diabetes candidate genes by integrating genome-wide association data, protein-protein interactions, and human pancreatic islet gene expression

    DEFF Research Database (Denmark)

    Bergholdt, Regine; Brorsson, Caroline; Palleja, Albert

    2012-01-01

    Genome-wide association studies (GWAS) have heralded a new era in susceptibility locus discovery in complex diseases. For type 1 diabetes, >40 susceptibility loci have been discovered. However, GWAS do not inevitably lead to identification of the gene or genes in a given locus associated with disease, and they do not typically inform the broader context in which the disease genes operate. Here, we integrated type 1 diabetes GWAS data with protein-protein interactions to construct biological networks of relevance for disease. A total of 17 networks were identified. To prioritize......-cells. Our results provide novel insight into the mechanisms behind type 1 diabetes pathogenesis and, thus, may provide the basis for the design of novel treatment strategies.

  18. G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes

    Directory of Open Access Journals (Sweden)

    Lemay Danielle G

    2012-09-01

    Background: In previous studies, gene neighborhoods, spatial clusters of co-expressed genes in the genome, have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously. Results: Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods. Conclusions: Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains. The

  19. Insulators target active genes to transcription factories and polycomb-repressed genes to polycomb bodies.

    Directory of Open Access Journals (Sweden)

    Hua-Bing Li

    2013-04-01

    Polycomb bodies are foci of Polycomb proteins in which different Polycomb target genes are thought to co-localize in the nucleus, looping out from their chromosomal context. We have shown previously that insulators, not Polycomb response elements (PREs), mediate associations among Polycomb Group (PcG) targets to form Polycomb bodies. Here we use live imaging and 3C interactions to show that transgenes containing PREs and endogenous PcG-regulated genes are targeted by insulator proteins to different nuclear structures depending on their state of activity. When two genes are repressed, they co-localize in Polycomb bodies. When both are active, they are targeted to transcription factories in a fashion dependent on Trithorax and enhancer specificity as well as the insulator protein CTCF. In the absence of CTCF, assembly of Polycomb bodies is essentially reduced to those representing genomic clusters of Polycomb target genes. The critical role of Trithorax suggests that stable association with a specialized transcription factory underlies the cellular memory of the active state.

  20. Sequence Variation in Rhoptry Neck Protein 10 Gene among Toxoplasma gondii Isolates from Different Hosts and Geographical Locations

    Directory of Open Access Journals (Sweden)

    Yu ZHAO

    2017-09-01

    Background: Toxoplasma gondii, as a eukaryotic parasite of the phylum Apicomplexa, can infect almost all the warm-blooded animals and humans, causing toxoplasmosis. Rhoptry neck proteins (RONs) play a key role in the invasion process of T. gondii and are potential vaccine candidate molecules against toxoplasmosis. Methods: The present study examined sequence variation in the rhoptry neck protein 10 (TgRON10) gene among 10 T. gondii isolates from different hosts and geographical locations from Lanzhou province during 2014, and compared with the corresponding sequences of strains ME49 and VEG obtained from the ToxoDB database, using polymerase chain reaction (PCR) amplification, sequence analysis, and phylogenetic reconstruction by Bayesian inference (BI) and maximum parsimony (MP). Results: Analysis of all the 12 TgRON10 genomic and cDNA sequences revealed 7 exons and 6 introns in the TgRON10 gDNA. The complete genomic sequence of the TgRON10 gene ranged from 4759 bp to 4763 bp, and sequence variation was 0-0.6% among the 12 T. gondii isolates, indicating a low sequence variation in TgRON10 gene. Phylogenetic analysis of TgRON10 sequences showed that the cluster of the 12 T. gondii isolates was not completely consistent with their respective genotypes. Conclusion: TgRON10 gene is not a suitable genetic marker for the differentiation of T. gondii isolates from different hosts and geographical locations, but may represent a potential vaccine candidate against toxoplasmosis, worth further studies.

  1. Sequence Variation in Rhoptry Neck Protein 10 Gene among Toxoplasma gondii Isolates from Different Hosts and Geographical Locations.

    Science.gov (United States)

    Zhao, Yu; Zhou, Donghui; Chen, Jia; Sun, Xiaolin

    2017-01-01

    Toxoplasma gondii, as a eukaryotic parasite of the phylum Apicomplexa, can infect almost all the warm-blooded animals and humans, causing toxoplasmosis. Rhoptry neck proteins (RONs) play a key role in the invasion process of T. gondii and are potential vaccine candidate molecules against toxoplasmosis. The present study examined sequence variation in the rhoptry neck protein 10 (TgRON10) gene among 10 T. gondii isolates from different hosts and geographical locations from Lanzhou province during 2014, and compared with the corresponding sequences of strains ME49 and VEG obtained from the ToxoDB database, using polymerase chain reaction (PCR) amplification, sequence analysis, and phylogenetic reconstruction by Bayesian inference (BI) and maximum parsimony (MP). Analysis of all the 12 TgRON10 genomic and cDNA sequences revealed 7 exons and 6 introns in the TgRON10 gDNA. The complete genomic sequence of the TgRON10 gene ranged from 4759 bp to 4763 bp, and sequence variation was 0-0.6% among the 12 T. gondii isolates, indicating a low sequence variation in TgRON10 gene. Phylogenetic analysis of TgRON10 sequences showed that the cluster of the 12 T. gondii isolates was not completely consistent with their respective genotypes. TgRON10 gene is not a suitable genetic marker for the differentiation of T. gondii isolates from different hosts and geographical locations, but may represent a potential vaccine candidate against toxoplasmosis, worth further studies.

  2. The gsdf gene locus harbors evolutionary conserved and clustered genes preferentially expressed in fish previtellogenic oocytes.

    Science.gov (United States)

    Gautier, Aude; Le Gac, Florence; Lareyre, Jean-Jacques

    2011-02-01

    display a different cellular localization compared to that of the gsdf gene, indicating that the latter gene is not co-regulated. Interestingly, our study identifies new clustered genes that are specifically expressed in previtellogenic oocytes (nup54, aff1, klhl8, sdad1). Copyright © 2010 Elsevier B.V. All rights reserved.

  3. One, Two, Three: Polycomb Proteins Hit All Dimensions of Gene Regulation

    Directory of Open Access Journals (Sweden)

    Stefania del Prete

    2015-07-01

    Polycomb group (PcG) proteins contribute to the formation and maintenance of a specific repressive chromatin state that prevents the expression of genes in a particular space and time. Polycomb repressive complexes (PRCs) consist of several PcG proteins with specific regulatory or catalytic properties. PRCs are recruited to thousands of target genes, and various recruitment factors, including DNA-binding proteins and non-coding RNAs, are involved in the targeting. PcG proteins contribute to a multitude of biological processes by altering chromatin features at different scales. PcG proteins mediate both biochemical modifications of histone tails and biophysical modifications (e.g., chromatin fiber compaction and three-dimensional (3D) chromatin conformation). Here, we review the role of PcG proteins in nuclear architecture, describing their impact on the structure of the chromatin fiber, on chromatin interactions, and on the spatial organization of the genome in nuclei. Although little is known about the role of plant PcG proteins in nuclear organization, much is known in the animal field, and we highlight similarities and differences in the roles of PcG proteins in 3D gene regulation in plants and animals.

  4. Transcriptome profiling and digital gene expression analysis of genes associated with salinity resistance in peanut

    Directory of Open Access Journals (Sweden)

    Jiongming Sui

    2018-03-01

    Background: Soil salinity can significantly reduce crop production, but the molecular mechanism of salinity tolerance in peanut is poorly understood. A mutant (S1) with higher salinity resistance than its mutagenic parent HY22 (S3) was obtained. Transcriptome sequencing and digital gene expression (DGE) analysis were performed with leaves of S1 and S3 before and after plants were irrigated with 250 mM NaCl. Results: A total of 107,725 comprehensive transcripts were assembled into 67,738 unigenes using TIGR Gene Indices clustering tools (TGICL). All unigenes were searched against the euKaryotic Ortholog Groups (KOG), gene ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, and these unigenes were assigned to 26 functional KOG categories, 56 GO terms, and 32 KEGG groups, respectively. In total, 112 differentially expressed genes (DEGs) between S1 and S3 after salinity stress were screened; among them, 86 were responsive to salinity stress in S1 and/or S3. These 86 DEGs included genes that encoded the following kinds of proteins that are known to be involved in resistance to salinity stress: late embryogenesis abundant proteins (LEAs), major intrinsic proteins (MIPs) or aquaporins, metallothioneins (MTs), lipid transfer protein (LTP), calcineurin B-like protein-interacting protein kinases (CIPKs), 9-cis-epoxycarotenoid dioxygenase (NCED), and oleosins, etc. Of these 86 DEGs, 18 could not be matched with known proteins. Conclusion: The results from this study will be useful for further research on the mechanism of salinity resistance and will provide a useful gene resource for the variety breeding of salinity resistance in peanut. Keywords: Digital gene expression, Gene, Mutant, NaCl, Peanut (Arachis hypogaea L.), RNA-seq, Salinity stress, Salinity tolerance, Soil salinity, Transcripts, Unigenes

  5. Epigenetic upregulation of lncRNAs at 13q14.3 in leukemia is linked to the In Cis downregulation of a gene cluster that targets NF-kB.

    Directory of Open Access Journals (Sweden)

    Angela Garding

    2013-04-01

    Non-coding RNAs are much more common than previously thought. However, for the vast majority of non-coding RNAs, the cellular function remains enigmatic. The two long non-coding RNA (lncRNA) genes DLEU1 and DLEU2 map to a critical region at chromosomal band 13q14.3 that is recurrently deleted in solid tumors and hematopoietic malignancies like chronic lymphocytic leukemia (CLL). While no point mutations have been found in the protein coding candidate genes at 13q14.3, they are deregulated in malignant cells, suggesting an epigenetic tumor suppressor mechanism. We therefore characterized the epigenetic makeup of 13q14.3 in CLL cells and found histone modifications by chromatin-immunoprecipitation (ChIP) that are associated with activated transcription and significant DNA-demethylation at the transcriptional start sites of DLEU1 and DLEU2 using 5 different semi-quantitative and quantitative methods (aPRIMES, BioCOBRA, MCIp, MassARRAY, and bisulfite sequencing). These epigenetic aberrations were correlated with transcriptional deregulation of the neighboring candidate tumor suppressor genes, suggesting a coregulation in cis of this gene cluster. We found that the 13q14.3 genes in addition to their previously known functions regulate NF-kB activity, which we could show after overexpression, siRNA-mediated knockdown, and dominant-negative mutant genes by using Western blots with previously undescribed antibodies, by a customized ELISA as well as by reporter assays. In addition, we performed an unbiased screen of 810 human miRNAs and identified the miR-15/16 family of genes at 13q14.3 as the strongest inducers of NF-kB activity. In summary, the tumor suppressor mechanism at 13q14.3 is a cluster of genes controlled by two lncRNA genes that are regulated by DNA-methylation and histone modifications and whose members all regulate NF-kB. Therefore, the tumor suppressor mechanism in 13q14.3 underlines the role both of epigenetic aberrations and of lncRNA genes

  6. Epigenetic Upregulation of lncRNAs at 13q14.3 in Leukemia Is Linked to the In Cis Downregulation of a Gene Cluster That Targets NF-kB

    Science.gov (United States)

    Claus, Rainer; Ruppel, Melanie; Tschuch, Cordula; Filarsky, Katharina; Idler, Irina; Zucknick, Manuela; Caudron-Herger, Maïwen; Oakes, Christopher; Fleig, Verena; Keklikoglou, Ioanna; Allegra, Danilo; Serra, Leticia; Thakurela, Sudhir; Tiwari, Vijay; Weichenhan, Dieter; Benner, Axel; Radlwimmer, Bernhard; Zentgraf, Hanswalter; Wiemann, Stefan; Rippe, Karsten; Plass, Christoph; Döhner, Hartmut; Lichter, Peter; Stilgenbauer, Stephan; Mertens, Daniel

    2013-01-01

    Non-coding RNAs are much more common than previously thought. However, for the vast majority of non-coding RNAs, the cellular function remains enigmatic. The two long non-coding RNA (lncRNA) genes DLEU1 and DLEU2 map to a critical region at chromosomal band 13q14.3 that is recurrently deleted in solid tumors and hematopoietic malignancies like chronic lymphocytic leukemia (CLL). While no point mutations have been found in the protein coding candidate genes at 13q14.3, they are deregulated in malignant cells, suggesting an epigenetic tumor suppressor mechanism. We therefore characterized the epigenetic makeup of 13q14.3 in CLL cells and found histone modifications by chromatin-immunoprecipitation (ChIP) that are associated with activated transcription and significant DNA-demethylation at the transcriptional start sites of DLEU1 and DLEU2 using 5 different semi-quantitative and quantitative methods (aPRIMES, BioCOBRA, MCIp, MassARRAY, and bisulfite sequencing). These epigenetic aberrations were correlated with transcriptional deregulation of the neighboring candidate tumor suppressor genes, suggesting a coregulation in cis of this gene cluster. We found that the 13q14.3 genes in addition to their previously known functions regulate NF-kB activity, which we could show after overexpression, siRNA–mediated knockdown, and dominant-negative mutant genes by using Western blots with previously undescribed antibodies, by a customized ELISA as well as by reporter assays. In addition, we performed an unbiased screen of 810 human miRNAs and identified the miR-15/16 family of genes at 13q14.3 as the strongest inducers of NF-kB activity. In summary, the tumor suppressor mechanism at 13q14.3 is a cluster of genes controlled by two lncRNA genes that are regulated by DNA-methylation and histone modifications and whose members all regulate NF-kB. Therefore, the tumor suppressor mechanism in 13q14.3 underlines the role both of epigenetic aberrations and of lncRNA genes in human

  7. The Expression of Genes Encoding Secreted Proteins in Medicago truncatula A17 Inoculated Roots

    Directory of Open Access Journals (Sweden)

    LUCIA KUSUMAWATI

    2013-09-01

    Subtilisin-like serine protease (MtSBT), serine carboxypeptidase (MtSCP), MtN5, non-specific lipid transfer protein (MtnsLTP), early nodulin2-like protein (MtENOD2-like), FAD-binding domain containing protein (MtFAD-BP1), and rhicadhesin receptor protein (MtRHRE1) were among 34 proteins found in the supernatant of M. truncatula 2HA and sickle cell suspension cultures. This study investigated the expression of genes encoding those proteins in roots and developing nodules. Two methods were used: quantitative real time RT-PCR and gene expression analysis (with promoter:GUS fusion) in roots. Those proteins are predicted to be secreted, which is indirectly supported by the finding that promoter:GUS fusions of six of the seven genes encoding secreted proteins were strongly expressed in the vascular bundle of transgenic hairy roots. All six genes were expressed in 14-day-old nodules. The expression levels of the selected seven genes were quantified in Sinorhizobium-inoculated and control plants using quantitative real time RT-PCR. In conclusion, among the seven genes encoding secreted proteins analyzed, the expression level of only one gene, MtN5, was up-regulated significantly in inoculated root segments compared to controls. The expression of MtSBT1, MtSCP1, MtnsLTP, MtFAD-BP1, MtRHRE1 and MtN5 was higher in root tip than in other tissues examined.

  8. Expression of circadian clock genes and proteins in urothelial cancer is related to cancer-associated genes

    International Nuclear Information System (INIS)

    Litlekalsoy, Jorunn; Rostad, Kari; Kalland, Karl-Henning; Hostmark, Jens G.; Laerum, Ole Didrik

    2016-01-01

    The purpose of this study was to evaluate invasive and metastatic potential of urothelial cancer by investigating differential expression of various clock genes/proteins participating in the 24 h circadian rhythms and to compare these gene expressions with transcription of other cancer-associated genes. Twenty seven paired samples of tumour and benign tissue collected from patients who underwent cystectomy were analysed and compared to 15 samples of normal bladder tissue taken from patients who underwent cystoscopy for benign prostate hyperplasia (unrelated donors). Immunohistochemical analyses were made for clock and clock-related proteins. In addition, the gene-expression levels of 22 genes (clock genes, casein kinases, oncogenes, tumour suppressor genes and cytokeratins) were analysed by real-time quantitative PCR (qPCR). Considerable up- or down-regulation and altered cellular distribution of different clock proteins, a reduction of casein kinase1A1 (CSNK1A1) and increase of casein kinase alpha 1 E (CSNK1E) were found. The pattern was significantly correlated with simultaneous up-regulation of stimulatory tumour markers, and a down-regulation of several suppressor genes. The pattern was mainly seen in aneuploid high-grade cancers. Considerable alterations were also found in the neighbouring bladder mucosa. The close correlation between altered expression of various clock genes and common tumour markers in urothelial cancer indicates that disturbed function in the cellular clock work may be an important additional mechanism contributing to cancer progression and malignant behaviour. The online version of this article (doi:10.1186/s12885-016-2580-y) contains supplementary material, which is available to authorized users

  9. Biosynthesis of Akaeolide and Lorneic Acids and Annotation of Type I Polyketide Synthase Gene Clusters in the Genome of Streptomyces sp. NPS554

    Directory of Open Access Journals (Sweden)

    Tao Zhou

    2015-01-01

    The incorporation pattern of biosynthetic precursors into two structurally unique polyketides, akaeolide and lorneic acid A, was elucidated by feeding experiments with 13C-labeled precursors. In addition, draft genome sequencing of the producer, Streptomyces sp. NPS554, was performed and the biosynthetic gene clusters for these polyketides were identified. The putative gene clusters contain all the polyketide synthase (PKS) domains necessary for assembly of the carbon skeletons. Combined with the 13C-labeling results, gene function prediction enabled us to propose biosynthetic pathways involving unusual carbon-carbon bond formation reactions. Genome analysis also indicated the presence of at least ten orphan type I PKS gene clusters that might be responsible for the production of new polyketides.

  10. Expression-based clustering of CAZyme-encoding genes of Aspergillus niger.

    Science.gov (United States)

    Gruben, Birgit S; Mäkelä, Miia R; Kowalczyk, Joanna E; Zhou, Miaomiao; Benoit-Gelber, Isabelle; De Vries, Ronald P

    2017-11-23

    The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation, enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated in certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data focusing on the initial response of A. niger to the presence of plant biomass related carbon sources were analyzed for the wild-type strain N402 grown on a large range of carbon sources and for the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX grown on their specific inducing compounds. The cluster analysis of the expression data revealed several groups of co-regulated genes, which go beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified, based on their expression profile. Notably, in several cases the expression profile calls into question the function assignment of uncharacterized genes that was based on homology searches, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and therefore an even more complex overall regulatory network than has been reported so far. Expression profiling on a large number of substrates provides better insight into the complex regulatory systems that drive the conversion of plant biomass by fungi. In

  11. Genes and proteins of Escherichia coli K-12.

    Science.gov (United States)

    Riley, M

    1998-01-01

    GenProtEC is a database of Escherichia coli genes and their gene products, classified by type of function and physiological role and with citations to the literature for each. Also present are data on sequence similarities among E. coli proteins, representing groups of paralogous genes, with PAM values, percent identity of amino acids, length of alignment and percent aligned. GenProtEC can be accessed at the URL http://www.mbl.edu/html/ecoli.html

  12. Transcriptional organization of the DNA region controlling expression of the K99 gene cluster.

    Science.gov (United States)

    Roosendaal, B; Damoiseaux, J; Jordi, W; de Graaf, F K

    1989-01-01

    The transcriptional organization of the K99 gene cluster was investigated in two ways. First, the DNA region containing the transcriptional signals was analyzed using a transcription vector system with Escherichia coli galactokinase (GalK) as an assayable marker, and second, an in vitro transcription system was employed. A detailed analysis of the transcription signals revealed that a strong promoter PA and a moderate promoter PB are located upstream of fanA and fanB, respectively. No promoter activity was detected in the intercistronic region between fanB and fanC. Factor-dependent terminators of transcription were detected and are probably located in the intercistronic region between fanA and fanB (T1), and between fanB and fanC (T2). A third terminator (T3) was observed between fanC and fanD and has an efficiency of 90%. Analysis of the regulatory region in an in vitro transcription system confirmed the location of the respective transcription signals. A model for the transcriptional organization of the K99 cluster is presented. Indications were obtained that the trans-acting regulatory polypeptides FanA and FanB both function as anti-terminators. A model for the regulation of expression of the K99 gene cluster is postulated.

  13. Ancient expansion of the hox cluster in lepidoptera generated four homeobox genes implicated in extra-embryonic tissue formation.

    Directory of Open Access Journals (Sweden)

    Laura Ferguson

    2014-10-01

    Gene duplications within the conserved Hox cluster are rare in animal evolution, but in Lepidoptera an array of divergent Hox-related genes (Shx genes) has been reported between pb and zen. Here, we use genome sequencing of five lepidopteran species (Polygonia c-album, Pararge aegeria, Callimorpha dominula, Cameraria ohridella, Hepialus sylvina) plus a caddisfly outgroup (Glyphotaelius pellucidus) to trace the evolution of the lepidopteran Shx genes. We demonstrate that Shx genes originated by tandem duplication of zen early in the evolution of the large clade Ditrysia; Shx are not found in a caddisfly and a member of the basally diverging Hepialidae (swift moths). Four distinct Shx genes were generated early in ditrysian evolution, and were stably retained in all descendent Lepidoptera except the silkmoth, which has additional duplications. Despite extensive sequence divergence, molecular modelling indicates that all four Shx genes have the potential to encode stable homeodomains. The four Shx genes have distinct spatiotemporal expression patterns in early development of the Speckled Wood butterfly (Pararge aegeria), with ShxC demarcating the future sites of extraembryonic tissue formation via strikingly localised maternal RNA in the oocyte. All four genes are also expressed in presumptive serosal cells, prior to the onset of zen expression. Lepidopteran Shx genes represent an unusual example of Hox cluster expansion and integration of novel genes into ancient developmental regulatory networks.

  14. Radioresistance related genes screened by protein-protein interaction network analysis in nasopharyngeal carcinoma

    International Nuclear Information System (INIS)

    Zhu Xiaodong; Guo Ya; Qu Song; Li Ling; Huang Shiting; Li Danrong; Zhang Wei

    2012-01-01

    Objective: To discover radioresistance-associated molecular biomarkers and their mechanisms in nasopharyngeal carcinoma by protein-protein interaction network analysis. Methods: Whole genome expression microarray was applied to screen out differentially expressed genes in two cell lines, CNE-2R and CNE-2, with different radiosensitivity. Four differentially expressed genes were randomly selected for further verification by semi-quantitative RT-PCR analysis with self-designed primers. The common differentially expressed genes from two experiments were analyzed with the SNOW online database in order to find the central nodes related to the biomarkers of nasopharyngeal carcinoma radioresistance. The expression of STAT1 in CNE-2R and CNE-2 cells was measured by Western blot. Results: Compared with CNE-2 cells, 374 genes in CNE-2R cells were differentially expressed while 197 genes showed significant differences. Four randomly selected differentially expressed genes were verified by RT-PCR and showed the same direction of change as in the chip assay. Analysis with the SNOW database demonstrated that those 197 genes could form a complicated interaction network where STAT1 and JUN might be two key nodes. Indeed, the STAT1-α expression in CNE-2R was higher than that in CNE-2 (t=4.96, P<0.05). Conclusions: The key nodes STAT1 and JUN may be the molecular biomarkers leading to radioresistance in nasopharyngeal carcinoma, and STAT1-α might have a close relationship with radioresistance. (authors)

  15. Cyclin B1 Destruction Box-Mediated Protein Instability: The Enhanced Sensitivity of Fluorescent-Protein-Based Reporter Gene System

    Directory of Open Access Journals (Sweden)

    Chao-Hsun Yang

    2013-01-01

    Full Text Available The periodic expression and destruction of several cyclins are the most important steps for the exact regulation of the cell cycle. Cyclins are degraded by the ubiquitin-proteasome system during the cell cycle. In addition, a short sequence near the N-terminus of cyclin B called the destruction box (D-box; CDB) is also required. Fluorescent-protein-based reporter gene systems are insensitive because the fluorescent proteins are overly stable. Therefore, in this study, we used human CDB fused with both enhanced green fluorescent protein (EGFP) at the C-terminus and red fluorescent protein (RFP, DsRed) at the N-terminus in transfected human melanoma cells to examine the effects of CDB on different fluorescent proteins. Our results indicated that CDB-fused fluorescent protein can be used to examine slight gene regulation in the reporter gene system and has the potential to be a system for screening functional compounds in the future.

  16. Challenges in biotechnology at LLNL: from genes to proteins

    International Nuclear Information System (INIS)

    Albala, J S

    1999-01-01

    This effort has undertaken the task of developing a link between the genomics, DNA repair and structural biology efforts within the Biology and Biotechnology Research Program at LLNL. Through the advent of the I.M.A.G.E. (Integrated Molecular Analysis of Genomes and their Expression) Consortium, a world-wide effort to catalog the largest public collection of genes, accepted and maintained within BBRP, it is now possible to systematically express the protein complement of these to further elucidate novel gene function and structure. The work has proceeded in four phases, outlined as follows: (1) Gene and system selection; (2) Protein expression and purification; (3) Structural analysis; and (4) Biological integration. Proteins to be expressed have been those of high programmatic interest. This includes, in particular, proteins involved in the maintenance of genome integrity, particularly those involved in the repair of DNA damage, including ERCC1, ERCC4, XRCC2, XRCC3, XRCC9, HEX1, APN1, p53, RAD51B, RAD51C, and RAD51. Full-length cDNA cognates of selected genes were isolated, and cloned into baculovirus-based expression vectors. The baculoviral expression system for protein over-expression is now well-established in the Albala laboratory. Procedures have been successfully optimized for full-length cDNA cloning into expression vectors for protein expression from recombinant constructs. This includes the reagents, cell lines, and techniques necessary for expression of recombinant baculoviral constructs in Spodoptera frugiperda (Sf9) cells. The laboratory has also generated a high-throughput baculoviral expression paradigm for large-scale expression and purification of human recombinant proteins amenable to automation.

  17. Dynamic gene expression in fish muscle during recovery growth induced by a fasting-refeeding schedule

    Directory of Open Access Journals (Sweden)

    Esquerré Diane

    2007-11-01

    Full Text Available Abstract Background Recovery growth is a phase of rapid growth that is triggered by adequate refeeding of animals following a period of weight loss caused by starvation. In this study, to obtain more information on the system-wide integration of recovery growth in muscle, we undertook a time-course analysis of transcript expression in trout subjected to a food deprivation-refeeding sequence. For this purpose complex targets produced from muscle of trout fasted for one month and from muscle of trout fasted for one month and then refed for 4, 7, 11 and 36 days were hybridized to cDNA microarrays containing 9023 clones. Results Significance analysis of microarrays (SAM) and temporal expression profiling led to the segregation of differentially expressed genes into four major clusters. One cluster comprising 1020 genes with high expression in muscle from fasted animals included a large set of genes involved in protein catabolism. A second cluster that included approximately 550 genes with transient induction 4 to 11 days post-refeeding was dominated by genes involved in transcription, ribosomal biogenesis, translation, chaperone activity, mitochondrial production of ATP and cell division. A third cluster that contained 480 genes that were up-regulated 7 to 36 days post-refeeding was enriched with genes involved in reticulum and Golgi dynamics and with genes indicative of myofiber and muscle remodelling such as genes encoding sarcomeric proteins and matrix compounds. Finally, a fourth cluster of 200 genes overexpressed only in 36-day refed trout muscle contained genes with function in carbohydrate metabolism and lipid biosynthesis. Remarkably, among the genes induced were several transcriptional regulators which might be important for the gene-specific transcriptional adaptations that underlie muscle recovery. Conclusion Our study is the first demonstration of a coordinated expression of functionally related genes during muscle recovery growth

  18. Causal and synthetic associations of variants in the SERPINA gene cluster with alpha1-antitrypsin serum levels.

    Directory of Open Access Journals (Sweden)

    Gian Andri Thun

    Full Text Available Several infrequent genetic polymorphisms in the SERPINA1 gene are known to substantially reduce concentration of alpha1-antitrypsin (AAT) in the blood. Since low AAT serum levels fail to protect pulmonary tissue from enzymatic degradation, these polymorphisms also increase the risk for early onset chronic obstructive pulmonary disease (COPD). The role of more common SERPINA1 single nucleotide polymorphisms (SNPs) in respiratory health remains poorly understood. We present here an agnostic investigation of genetic determinants of circulating AAT levels in a general population sample by performing a genome-wide association study (GWAS) in 1392 individuals of the SAPALDIA cohort. Five common SNPs, defined by showing minor allele frequencies (MAFs) >5%, reached genome-wide significance, all located in the SERPINA gene cluster at 14q32.13. The top-ranking genotyped SNP rs4905179 was associated with an estimated effect of β = -0.068 g/L per minor allele (P = 1.20 × 10^-12). But denser SERPINA1 locus genotyping in 5569 participants with subsequent stepwise conditional analysis, as well as exon-sequencing in a subsample (N = 410), suggested that AAT serum level is causally determined at this locus by rare (MAF<1%) and low-frequency (MAF 1-5%) variants only, in particular by the well-documented protein inhibitor S and Z (PI S, PI Z) variants. Replication of the association of rs4905179 with AAT serum levels in the Copenhagen City Heart Study (N = 8273) was successful (P<0.0001), as was the replication of its synthetic nature (the effect disappeared after adjusting for PI S and Z, P = 0.57). Extending the analysis to lung function revealed a more complex situation. Only in individuals with severely compromised pulmonary health (N = 397), associations of common SNPs at this locus with lung function were driven by rarer PI S or Z variants. Overall, our meta-analysis of lung function in ever-smokers does not support a functional role of common SNPs in the SERPINA gene
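
    The allele-frequency classes used in this abstract (common: MAF >5%; low-frequency: MAF 1-5%; rare: MAF <1%) can be written as a small lookup. The following Python sketch is illustrative only: the function name `classify_maf` and the treatment of the exact 1% and 5% boundaries are assumptions, not details taken from the study.

```python
def classify_maf(maf: float) -> str:
    """Assign a variant to a frequency class from its minor allele frequency.

    Thresholds follow the abstract: rare < 1%, low-frequency 1-5%, common > 5%.
    Placing exactly 1% and exactly 5% in the low-frequency class is an assumption.
    """
    # A minor allele frequency cannot exceed 0.5 by definition.
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


if __name__ == "__main__":
    # Hypothetical frequencies chosen only to exercise each branch.
    for maf in (0.005, 0.03, 0.20):
        print(maf, classify_maf(maf))
```

    In the abstract's terms, the five genome-wide significant SNPs fall in the "common" class, while the PI S and PI Z variants that explain the signal fall in the other two classes.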

  19. EST2Prot: Mapping EST sequences to proteins

    Directory of Open Access Journals (Sweden)

    Lin David M

    2006-03-01

    Full Text Available Abstract Background EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs that are typically ignored since they cannot be mapped to known genes. Consequently, new discoveries are possibly overlooked. Results We describe a system (EST2Prot that uses multiple elements to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are used to map the ESTs to functional descriptors. Conclusion EST2Prot extends and significantly enriches the popular UniGene mapping by utilizing multiple relations between known biological entities. It produces a mapping between ESTs and proteins in real-time through a simple web-interface. The system is part of the Biozon database and is accessible at http://biozon.org/tools/est/.

  20. Cloning of the staurosporine biosynthetic gene cluster from Streptomyces sp. TP-A0274 and its heterologous expression in Streptomyces lividans.

    Science.gov (United States)

    Onaka, Hiroyasu; Taniguchi, Shin-ichi; Igarashi, Yasuhiro; Furumai, Tamotsu

    2002-12-01

    Staurosporine is a representative member of the indolocarbazole antibiotics. The entire staurosporine biosynthetic and regulatory gene cluster spanning 20-kb was cloned from Streptomyces sp. TP-A0274 and sequenced. The gene cluster consists of 14 ORFs, and the amino acid sequence homology search revealed that it contains three genes, staO, staD, and staP, coding for the enzymes involved in the indolocarbazole aglycone biosynthesis; two genes, staG and staN, for the bond formation between the aglycone and deoxysugar; eight genes, staA, staB, staE, staJ, staI, staK, staMA, and staMB, for the deoxysugar biosynthesis; and one gene, staR, for a transcriptional regulator. Heterologous gene expression of a 38-kb fragment containing a complete set of the biosynthetic genes for staurosporine cloned into pTOYAMAcos confirmed its role in staurosporine biosynthesis. Moreover, the distribution of the gene for chromopyrrolic acid synthase, the key enzyme for the biosynthesis of indolocarbazole aglycone, in actinomycetes was investigated, and rebD homologs were shown to exist only in the strains producing indolocarbazole antibiotics.

  1. Distinct functional domains within the acidic cluster of tegument protein pp28 required for trafficking and cytoplasmic envelopment of human cytomegalovirus.

    Science.gov (United States)

    Seo, Jun-Young; Jeon, Hyejin; Hong, Sookyung; Britt, William J

    2016-10-01

    Human cytomegalovirus UL99-encoded tegument protein pp28 contains a 16 aa acidic cluster that is required for pp28 trafficking to the assembly compartment (AC) and the virus assembly. However, functional signals within the acidic cluster of pp28 remain undefined. Here, we demonstrated that an acidic cluster rather than specific sorting signals was required for trafficking to the AC. Recombinant viruses with chimeric pp28 proteins expressing non-native acidic clusters exhibited delayed viral growth kinetics and decreased production of infectious virus, indicating that the native acidic cluster of pp28 was essential for wild-type virus assembly. These results suggested that the acidic cluster of pp28 has distinct functional domains required for trafficking and for efficient virus assembly. The first half (aa 44-50) of the acidic cluster was sufficient for pp28 trafficking, whereas the native acidic cluster consisting of aa 51-59 was required for the assembly of wild-type levels of infectious virus.

  2. Genetic clustering and polymorphism of the merozoite surface protein-3 of Plasmodium knowlesi clinical isolates from Peninsular Malaysia.

    Science.gov (United States)

    De Silva, Jeremy Ryan; Lau, Yee Ling; Fong, Mun Yik

    2017-01-03

    The simian malaria parasite Plasmodium knowlesi has been reported to cause significant numbers of human infections in South East Asia. Its merozoite surface protein-3 (MSP3) is a protein that belongs to a multi-gene family of proteins first found in Plasmodium falciparum. Several studies have evaluated the potential of P. falciparum MSP3 as a vaccine candidate. However, to date no detailed studies have been carried out on the P. knowlesi MSP3 gene (pkmsp3). The present study investigates the genetic diversity and haplotype groups of pkmsp3 in P. knowlesi clinical samples from Peninsular Malaysia. Blood samples were collected from P. knowlesi malaria patients within a period of 4 years (2008-2012). The pkmsp3 gene of the isolates was amplified via PCR, and subsequently cloned and sequenced. The full length pkmsp3 sequence was divided into Domain A and Domain B. Natural selection, genetic diversity, and haplotypes of pkmsp3 were analysed using MEGA6 and DnaSP ver. 5.10.00 programmes. From 23 samples, 48 pkmsp3 sequences were successfully obtained. At the nucleotide level, 101 synonymous and 238 non-synonymous mutations were observed. Tests of neutrality were not significant for the full length, Domain A or Domain B sequences. However, the dN/dS ratio of Domain B indicates purifying selection for this domain. Analysis of the deduced amino acid sequences revealed 42 different haplotypes. Neighbour Joining phylogenetic tree and haplotype network analyses revealed that the haplotypes clustered into two distinct groups. A moderate level of genetic diversity was observed in the pkmsp3 and only the C-terminal region (Domain B) appeared to be under purifying selection. The separation of the pkmsp3 into two haplotype groups provides further evidence of the existence of two distinct P. knowlesi types or lineages. Future studies should investigate the diversity of pkmsp3 among P. knowlesi isolates in North Borneo, where large numbers of human knowlesi malaria infection

  3. Ribosomal protein gene knockdown causes developmental defects in zebrafish.

    Directory of Open Access Journals (Sweden)

    Tamayo Uechi

    Full Text Available The ribosomal proteins (RPs) form the majority of cellular proteins and are mandatory for cellular growth. RP genes have been linked, either directly or indirectly, to various diseases in humans. Mutations in RP genes are also associated with tissue-specific phenotypes, suggesting a possible role in organ development during early embryogenesis. However, it is not yet known how mutations in a particular RP gene result in specific cellular changes, or how RP genes might contribute to human diseases. The development of animal models with defects in RP genes will be essential for studying these questions. In this study, we knocked down 21 RP genes in zebrafish by using morpholino antisense oligos to inhibit their translation. Of these 21, knockdown of 19 RPs resulted in the development of morphants with obvious deformities. Although mutations in RP genes, like other housekeeping genes, would be expected to result in nonspecific developmental defects with widespread phenotypes, we found that knockdown of some RP genes resulted in phenotypes specific to each gene, with varying degrees of abnormality in the brain, body trunk, eyes, and ears at about 25 hours post fertilization. We focused further on the organogenesis of the brain. Each knocked-down gene that affected the morphogenesis of the brain produced a different pattern of abnormality. Among the 7 RP genes whose knockdown produced severe brain phenotypes, 3 human orthologs are located within chromosomal regions that have been linked to brain-associated diseases, suggesting a possible involvement of RP genes in brain or neurological diseases. The RP gene knockdown system developed in this study could be a powerful tool for studying the roles of ribosomes in human diseases.

  4. Gene expression profiles reveal key pathways and genes associated with neuropathic pain in patients with spinal cord injury.

    Science.gov (United States)

    He, Xijing; Fan, Liying; Wu, Zhongheng; He, Jiaxuan; Cheng, Bin

    2017-04-01

    Previous gene expression profiling studies of neuropathic pain (NP) following spinal cord injury (SCI) have predominantly been performed in animal models. The present study aimed to investigate gene alterations in patients with spinal cord injury and to further examine the mechanisms underlying NP following SCI. The GSE69901 gene expression profile was downloaded from the public Gene Expression Omnibus database. Samples of peripheral blood mononuclear cells (PBMCs) derived from 12 patients with intractable NP and 13 control patients without pain were analyzed to identify the differentially expressed genes (DEGs), followed by functional enrichment analysis and protein‑protein interaction (PPI) network construction. In addition, a transcriptional regulation network was constructed and functional gene clustering was performed. A total of 70 upregulated and 61 downregulated DEGs were identified in the PBMC samples from patients with NP. The upregulated and downregulated genes were significantly involved in different Gene Ontology terms and pathways, including focal adhesion, T cell receptor signaling pathway and mitochondrial function. Glycogen synthase kinase 3 β (GSK3B) was identified as a hub protein in the PPI network. In addition, ornithine decarboxylase 1 (ODC1) and ornithine aminotransferase (OAT) were regulated by additional transcription factors in the regulation network. GSK3B, OAT and ODC1 were significantly enriched in two functional gene clusters, the function of mitochondrial membrane and DNA binding. Focal adhesion and the T cell receptor signaling pathway may be significantly linked with NP, and GSK3B, OAT and ODC1 may be potential targets for the treatment of NP.
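
    One common way to nominate a hub protein in a PPI network is to rank nodes by degree, that is, by the number of distinct interaction partners. The abstract does not state which criterion was used to identify GSK3B, so degree ranking is an assumption here, and the node labels and edges below are hypothetical placeholders rather than data from the study.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count distinct interaction partners per node in an undirected network."""
    # Normalise each edge so (a, b) and (b, a) are counted once; drop self-loops.
    unique_edges = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degrees: Counter = Counter()
    for a, b in unique_edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def top_hubs(edges: Iterable[Tuple[str, str]], n: int = 1) -> List[Tuple[str, int]]:
    """Return the n highest-degree nodes as (node, degree) pairs."""
    return node_degrees(edges).most_common(n)


if __name__ == "__main__":
    # Hypothetical toy network; "P1" .. "P4" are placeholder protein labels.
    toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
    print(top_hubs(toy_edges))
```

    Degree is only one of several centrality measures; betweenness or closeness centrality could rank the same network differently.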

  5. Gene Clusters for Insecticidal Loline Alkaloids in the Grass-Endophytic Fungus Neotyphodium uncinatum

    OpenAIRE

    Spiering, Martin J.; Moon, Christina D.; Wilkinson, Heather H.; Schardl, Christopher L.

    2005-01-01

    Loline alkaloids are produced by mutualistic fungi symbiotic with grasses, and they protect the host plants from insects. Here we identify in the fungal symbiont, Neotyphodium uncinatum, two homologous gene clusters (LOL-1 and LOL-2) associated with loline-alkaloid production. Nine genes were identified in a 25-kb region of LOL-1 and designated (in order) lolF-1, lolC-1, lolD-1, lolO-1, lolA-1, lolU-1, lolP-1, lolT-1, and lolE-1. LOL-2 contained the homologs lolC-2 through lolE-2 in the same ...

  6. The cell cycle-regulated genes of Schizosaccharomyces pombe.

    Science.gov (United States)

    Oliva, Anna; Rosebrock, Adam; Ferrezuelo, Francisco; Pyne, Saumyadipta; Chen, Haiying; Skiena, Steve; Futcher, Bruce; Leatherwood, Janet

    2005-07-01

    Many genes are regulated as an innate part of the eukaryotic cell cycle, and a complex transcriptional network helps enable the cyclic behavior of dividing cells. This transcriptional network has been studied in Saccharomyces cerevisiae (budding yeast) and elsewhere. To provide more perspective on these regulatory mechanisms, we have used microarrays to measure gene expression through the cell cycle of Schizosaccharomyces pombe (fission yeast). The 750 genes with the most significant oscillations were identified and analyzed. There were two broad waves of cell cycle transcription, one in early/mid G2 phase, and the other near the G2/M transition. The early/mid G2 wave included many genes involved in ribosome biogenesis, possibly explaining the cell cycle oscillation in protein synthesis in S. pombe. The G2/M wave included at least three distinctly regulated clusters of genes: one large cluster including mitosis, mitotic exit, and cell separation functions, one small cluster dedicated to DNA replication, and another small cluster dedicated to cytokinesis and division. S. pombe cell cycle genes have relatively long, complex promoters containing groups of multiple DNA sequence motifs, often of two, three, or more different kinds. Many of the genes, transcription factors, and regulatory mechanisms are conserved between S. pombe and S. cerevisiae. Finally, we found preliminary evidence for a nearly genome-wide oscillation in gene expression: 2,000 or more genes undergo slight oscillations in expression as a function of the cell cycle, although whether this is adaptive, or incidental to other events in the cell, such as chromatin condensation, we do not know.

  7. The Cell Cycle–Regulated Genes of Schizosaccharomyces pombe

    Science.gov (United States)

    Oliva, Anna; Rosebrock, Adam; Ferrezuelo, Francisco; Pyne, Saumyadipta; Chen, Haiying; Skiena, Steve

    2005-01-01

    Many genes are regulated as an innate part of the eukaryotic cell cycle, and a complex transcriptional network helps enable the cyclic behavior of dividing cells. This transcriptional network has been studied in Saccharomyces cerevisiae (budding yeast) and elsewhere. To provide more perspective on these regulatory mechanisms, we have used microarrays to measure gene expression through the cell cycle of Schizosaccharomyces pombe (fission yeast). The 750 genes with the most significant oscillations were identified and analyzed. There were two broad waves of cell cycle transcription, one in early/mid G2 phase, and the other near the G2/M transition. The early/mid G2 wave included many genes involved in ribosome biogenesis, possibly explaining the cell cycle oscillation in protein synthesis in S. pombe. The G2/M wave included at least three distinctly regulated clusters of genes: one large cluster including mitosis, mitotic exit, and cell separation functions, one small cluster dedicated to DNA replication, and another small cluster dedicated to cytokinesis and division. S. pombe cell cycle genes have relatively long, complex promoters containing groups of multiple DNA sequence motifs, often of two, three, or more different kinds. Many of the genes, transcription factors, and regulatory mechanisms are conserved between S. pombe and S. cerevisiae. Finally, we found preliminary evidence for a nearly genome-wide oscillation in gene expression: 2,000 or more genes undergo slight oscillations in expression as a function of the cell cycle, although whether this is adaptive, or incidental to other events in the cell, such as chromatin condensation, we do not know. PMID:15966770

  8. Using SNP genetic markers to elucidate the linkage of the Co-34/Phg-3 anthracnose and angular leaf spot resistance gene cluster with the Ur-14 resistance gene

    Science.gov (United States)

    The Ouro Negro common bean cultivar contains the Co-34/Phg-3 gene cluster that confers resistance to the anthracnose (ANT) and angular leaf spot (ALS) pathogens. These genes are tightly linked on chromosome 4. Ouro Negro also has the Ur-14 rust resistance gene, reportedly in the vicinity of Co- 34; ...

  9. Hypolipidemic effect of dietary pea proteins: Impact on genes regulating hepatic lipid metabolism.

    Science.gov (United States)

    Rigamonti, Elena; Parolini, Cinzia; Marchesi, Marta; Diani, Erika; Brambilla, Stefano; Sirtori, Cesare R; Chiesa, Giulia

    2010-05-01

    Controversial data on the lipid-lowering effect of dietary pea proteins have been provided and the mechanisms behind this effect are not completely understood. The aim of the study was to evaluate a possible hypolipidemic activity of a pea protein isolate and to determine whether pea proteins could affect the hepatic lipid metabolism through regulation of genes involved in cholesterol and fatty acid homeostasis. Rats were fed Nath's hypercholesterolemic diets for 28 days, the protein sources being casein or a pea protein isolate from Pisum sativum. After 14 and 28 days of dietary treatment, rats fed pea proteins had markedly lower plasma cholesterol and triglyceride levels than rats fed casein (pPea protein-fed rats displayed higher hepatic mRNA levels of LDL receptor versus those fed casein (ppea protein-fed rats than in rats fed casein (ppea proteins in rats. Moreover, pea proteins appear to affect cellular lipid homeostasis by upregulating genes involved in hepatic cholesterol uptake and by downregulating fatty acid synthesis genes.

  10. Secretion Trap Tagging of Secreted and Membrane-Spanning Proteins Using Arabidopsis Gene Traps

    Science.gov (United States)

    Andrew T. Groover; Joseph R. Fontana; Juana M. Arroyo; Cristina Yordan; W. Richard McCombie; Robert A. Martienssen

    2003-01-01

    Secreted and membrane-spanning proteins play fundamental roles in plant development but pose challenges for genetic identification and characterization. We describe a "secretion trap" screen for gene trap insertions in genes encoding proteins routed through the secretory pathway. The gene trap transposon encodes a ß-glucuronidase reporter enzyme...

  11. Knock-in of Enhanced Green Fluorescent Protein or/and Human Fibroblast Growth Factor 2 Gene into β-Casein Gene Locus in the Porcine Fibroblasts to Produce Therapeutic Protein.

    Science.gov (United States)

    Lee, Sang Mi; Kim, Ji Woo; Jeong, Young-Hee; Kim, Se Eun; Kim, Yeong Ji; Moon, Seung Ju; Lee, Ji-Hye; Kim, Keun-Jung; Kim, Min-Kyu; Kang, Man-Jong

    2014-11-01

    Transgenic animals have become important tools for the production of therapeutic proteins in domestic animals. Production efficiencies of transgenic animals by conventional methods such as microinjection and retrovirus vector methods are low, and the foreign gene expression levels are also low because of their random integration in the host genome. In this study, we investigated homologous recombination at the porcine β-casein gene locus using a knock-in vector for the β-casein gene locus. We developed the knock-in vector for the porcine β-casein gene locus and isolated knock-in fibroblasts for nuclear transfer. The knock-in vector consisted of the neomycin resistance gene (neo) as a positive selectable marker gene, the diphtheria toxin-A gene as a negative selection marker, and the 5' arm and 3' arm from the porcine β-casein gene. The secretion of enhanced green fluorescent protein (EGFP) was more easily detected in the cell culture media than it was by western blot analysis of cell extract of the HC11 mouse mammary epithelial cells transfected with the EGFP knock-in vector. These results indicated that a knock-in system using the β-casein gene induced high expression of the transgene by the gene regulatory sequence of the endogenous β-casein gene. These fibroblasts may be used to produce transgenic pigs for the production of therapeutic proteins via the mammary glands.

  12. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

    Directory of Open Access Journals (Sweden)

    Ujjwal Maulik

    Full Text Available Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. Similar performance improvement also is found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request: sarkar@labri.fr.

  13. Differentially expressed genes in iron-induced prion protein conversion

    International Nuclear Information System (INIS)

    Kim, Minsun; Kim, Eun-hee; Choi, Bo-Ran; Woo, Hee-Jong

    2016-01-01

    The conversion of the cellular prion protein (PrPC) to the protease-resistant isoform is the key event in chronic neurodegenerative diseases, including transmissible spongiform encephalopathies (TSEs). Increased iron in prion-related disease has been observed due to the prion protein-ferritin complex. Additionally, the accumulation and conversion of recombinant PrP (rPrP) is specifically derived from Fe(III) but not Fe(II). Fe(III)-mediated PK-resistant PrP (PrPres) conversion occurs within a complex cellular environment rather than via direct contact between rPrP and Fe(III). In this study, differentially expressed genes correlated with prion degeneration by Fe(III) were identified using Affymetrix microarrays. Following Fe(III) treatment, 97 genes were differentially expressed, including 85 upregulated genes and 12 downregulated genes (≥1.5-fold change in expression). However, Fe(II) treatment produced moderate alterations in gene expression without inducing dramatic alterations in gene expression profiles. Moreover, functional grouping of identified genes indicated that the differentially regulated genes were highly associated with cell growth, cell maintenance, and intra- and extracellular transport. These findings showed that Fe(III) may influence the expression of genes involved in PrP folding by redox mechanisms. The identification of genes with altered expression patterns in neural cells may provide insights into PrP conversion mechanisms during the development and progression of prion-related diseases. - Highlights: • Differential genes correlated with prion degeneration by Fe(III) were identified. • Genes were identified in cell proliferation and intra- and extracellular transport. • In PrP degeneration, redox related genes were suggested. • Cbr2, Rsad2, Slc40a1, Amph and Mvd were expressed significantly.
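
    The ≥1.5-fold criterion mentioned in this abstract can be illustrated with a short Python sketch. It assumes the threshold is applied symmetrically to the treated/control ratio (≥1.5 for upregulation, ≤1/1.5 for downregulation), which the abstract does not spell out; the function name `classify_fold_change` and the intensity values are hypothetical.

```python
def classify_fold_change(control: float, treated: float, threshold: float = 1.5) -> str:
    """Label a gene as 'up', 'down' or 'unchanged' from two expression values.

    Assumes linear-scale (not log-transformed) intensities and a symmetric
    fold-change cutoff; both are assumptions made for illustration.
    """
    if control <= 0 or treated <= 0:
        raise ValueError("expression values must be positive")
    ratio = treated / control
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Made-up intensities for three placeholder genes.
    examples = {"gene_a": (100.0, 150.0), "gene_b": (100.0, 50.0), "gene_c": (100.0, 120.0)}
    for name, (control, treated) in examples.items():
        print(name, classify_fold_change(control, treated))
```

    A fold-change cutoff alone says nothing about statistical significance; microarray analyses normally pair it with a test across replicates.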

  14. Identification of the chelocardin biosynthetic gene cluster from Amycolatopsis sulphurea: a platform for producing novel tetracycline antibiotics.

    Science.gov (United States)

    Lukežič, Tadeja; Lešnik, Urška; Podgoršek, Ajda; Horvat, Jaka; Polak, Tomaž; Šala, Martin; Jenko, Branko; Raspor, Peter; Herron, Paul R; Hunter, Iain S; Petković, Hrvoje

    2013-12-01

    Tetracyclines (TCs) are medically important antibiotics from the polyketide family of natural products. Chelocardin (CHD), produced by Amycolatopsis sulphurea, is a broad-spectrum tetracyclic antibiotic with potent bacteriolytic activity against a number of Gram-positive and Gram-negative multi-resistant pathogens. CHD has an unknown mode of action that is different from TCs. It has some structural features that define it as 'atypical' and, notably, is active against tetracycline-resistant pathogens. Identification and characterization of the chelocardin biosynthetic gene cluster from A. sulphurea revealed 18 putative open reading frames including a type II polyketide synthase. Compared to typical TCs, the chd cluster contains a number of features that relate to its classification as 'atypical': an additional gene for a putative two-component cyclase/aromatase that may be responsible for the different aromatization pattern, a gene for a putative aminotransferase for C-4 with the opposite stereochemistry to TCs and a gene for a putative C-9 methylase that is a unique feature of this biosynthetic cluster within the TCs. Collectively, these enzymes deliver a molecule with different aromatization of ring C that results in an unusual planar structure of the TC backbone. This is a likely contributor to its different mode of action. In addition CHD biosynthesis is primed with acetate, unlike the TCs, which are primed with malonamate, and offers a biosynthetic engineering platform that represents a unique opportunity for efficient generation of novel tetracyclic backbones using combinatorial biosynthesis.

  15. Bidirectional gene sequences with similar homology to functional proteins of alkane degrading bacterium pseudomonas fredriksbergensis DNA

    International Nuclear Information System (INIS)

    Megeed, A.A.

    2011-01-01

    The potential for two overlapping fragments of DNA from a clone of the newly isolated alkane-degrading bacterium Pseudomonas frederiksbergensis to encode sequences with homology to two parts of functional proteins is described. One strand contains a sequence with high homology to alkane monooxygenase (alkB), a member of the alkane hydroxylase family, and the other strand contains a sequence with some homology to the alcohol dehydrogenase gene (alkJ). Overlapping of genes on opposite strands has been reported in eukaryotic species, and is now reported in a bacterial species. The sequence comparisons and ORF results revealed that the regulation and organization of the genes involved in alkane oxidation, as represented in Pseudomonas frederiksbergensis, vary among the known alkane-degrading bacteria. The alk gene cluster contains homologues of the known alkane monooxygenase (alkB) and rubredoxin (alkG), which are oriented in the same direction, whereas alcohol dehydrogenase (alkJ) is oriented in the opposite direction. Such genomes encode messages on both strands of the DNA, or in overlapping but different reading frames of the same strand. The creation of novel genes from pre-existing sequences, known as overprinting, is a widespread phenomenon in small viruses. Here, the origin and evolution of the gene overlap in relation to bacteriophages belonging to the family Microviridae have been investigated. Such a phenomenon is most widely described in extremely small genomes such as those of viruses or small plasmids, yet the case described here is unique. (author)

  16. Detection of Locally Over-Represented GO Terms in Protein-Protein Interaction Networks

    Science.gov (United States)

    LAVALLÉE-ADAM, MATHIEU; COULOMBE, BENOIT; BLANCHETTE, MATHIEU

    2015-01-01

    High-throughput methods for identifying protein-protein interactions produce increasingly complex and intricate interaction networks. These networks are extremely rich in information, but extracting biologically meaningful hypotheses from them and representing them in a human-readable manner is challenging. We propose a method to identify Gene Ontology terms that are locally over-represented in a subnetwork of a given biological network. Specifically, we propose several methods to evaluate the degree of clustering of proteins associated to a particular GO term in both weighted and unweighted PPI networks, and describe efficient methods to estimate the statistical significance of the observed clustering. We show, using Monte Carlo simulations, that our best approximation methods accurately estimate the true p-value, for random scale-free graphs as well as for actual yeast and human networks. When applied to these two biological networks, our approach recovers many known complexes and pathways, but also suggests potential functions for many subnetworks. Online Supplementary Material is available at www.liebertonline.com. PMID:20377456
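
    The Monte Carlo estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes an unweighted, connected network stored as an adjacency dictionary and uses mean pairwise shortest-path distance as the clustering measure. That measure, the function names, and the parameters are illustrative assumptions, not the authors' definitions.

```python
import random
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_pairwise_distance(adj, proteins):
    """Average shortest-path distance over all pairs (assumes a connected graph)."""
    proteins = list(proteins)
    total, pairs = 0, 0
    for i, p in enumerate(proteins):
        dist = shortest_path_lengths(adj, p)
        for q in proteins[i + 1:]:
            total += dist[q]
            pairs += 1
    return total / pairs


def monte_carlo_p_value(adj, annotated, n_samples=2000, seed=0):
    """Fraction of random protein sets at least as tightly clustered as `annotated`.

    Hypothetical helper: random sets of the same size are drawn uniformly
    from all nodes, which is only one possible null model.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(adj, annotated)
    nodes = sorted(adj)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(nodes, len(annotated))
        if mean_pairwise_distance(adj, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)
```

    Exhaustive sampling of this kind becomes expensive for large networks, which is why approximation methods such as those the abstract describes are of interest.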

  17. Census of solo LuxR genes in prokaryotic genomes.

    Science.gov (United States)

    Hudaiberdiev, Sanjarbek; Choudhary, Kumari S; Vera Alvarez, Roberto; Gelencsér, Zsolt; Ligeti, Balázs; Lamba, Doriano; Pongor, Sándor

    2015-01-01

    luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram-negative bacteria. On the bacterial chromosome, luxR genes are usually found next to or near a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may respond either to internal AHL signals produced by a non-adjacent luxI in the chromosome or to exogenous signals. Here we present a survey of solo luxR genes found in complete and draft bacterial genomes in the NCBI databases using HMMs. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. We also found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL-binding LuxR proteins, so there is room to speculate that the solos may be involved in sensing hitherto unknown signals. It was noted that only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of the cysteines may be involved in disulfide formation, which makes us speculate that some LuxR proteins, including some of the solos, may be involved in redox regulation.
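
    The distinction between solo and luxI-associated luxR genes can be sketched as a simple neighbourhood test. The example assumes each contig is given as an ordered list of gene-family labels (for instance from HMM hits) and that "vicinity" means a fixed number of flanking genes. The `window` parameter and the function name are hypothetical and are not taken from the study.

```python
def find_solo_luxr(contigs, window=2):
    """Split luxR genes into solos and luxI-associated genes.

    contigs: dict mapping contig name -> list of family labels in gene order.
    window:  number of flanking genes on each side searched for luxI
             (an assumed cut-off, not the authors' criterion).
    Returns (solos, paired), each a list of (contig, gene_index) tuples.
    """
    solos, paired = [], []
    for contig, families in contigs.items():
        for i, family in enumerate(families):
            if family != "luxR":
                continue
            lo = max(0, i - window)
            neighbourhood = families[lo:i + window + 1]
            target = paired if "luxI" in neighbourhood else solos
            target.append((contig, i))
    return solos, paired
```

    A luxR gene on a short draft contig is counted as a solo by this test even if its luxI partner lies on another contig, which is one way false positives of the kind the abstract mentions could arise.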

  18. Classification of protein profiles using fuzzy clustering techniques

    DEFF Research Database (Denmark)

    Karemore, Gopal; Mullick, Jhinuk B.; Sujatha, R.

    2010-01-01

    Present study has brought out a comparison of PCA and fuzzy clustering techniques in classifying protein profiles (chromatogram) of homogenates of different tissue origins: Ovarian, Cervix, Oral cancers, which were acquired using HPLC–LIF (High Performance Liquid...... Chromatography-Laser Induced Fluorescence) method developed in our laboratory. Study includes 11 chromatogram spectra each from oral, cervical, ovarian cancers as well as healthy volunteers. Generally multivariate analysis like PCA demands clear data that is devoid of day...... PCA mapping in classifying various cancers from healthy spectra with classification rate up to 95% from 60%. Methods are validated using various clustering indexes and shows promising improvement in developing optical pathology like HPLC-LIF for early detection of various...

  19. A new essential protein discovery method based on the integration of protein-protein interaction and gene expression data

    Directory of Open Access Journals (Sweden)

    Li Min

    2012-03-01

    Full Text Available Abstract Background Identification of essential proteins is always a challenging task since it requires experimental approaches that are time-consuming and laborious. With the advances in high throughput technologies, a large number of protein-protein interactions are available, which have produced unprecedented opportunities for detecting proteins' essentialities from the network level. There have been a series of computational approaches proposed for predicting essential proteins based on network topologies. However, the network topology-based centrality measures are very sensitive to the robustness of network. Therefore, a new robust essential protein discovery method would be of great value. Results In this paper, we propose a new centrality measure, named PeC, based on the integration of protein-protein interaction and gene expression data. The performance of PeC is validated based on the protein-protein interaction network of Saccharomyces cerevisiae. The experimental results show that the predicted precision of PeC clearly exceeds that of the other fifteen previously proposed centrality measures: Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), Bottle Neck (BN), Density of Maximum Neighborhood Component (DMNC), Local Average Connectivity-based method (LAC), Sum of ECC (SoECC), Range-Limited Centrality (RL), L-index (LI), Leader Rank (LR), Normalized α-Centrality (NC), and Moduland-Centrality (MC). Especially, the improvement of PeC over the classic centrality measures (BC, CC, SC, EC, and BN) is more than 50% when predicting no more than 500 proteins. Conclusions We demonstrate that the integration of protein-protein interaction network and gene expression data can help improve the precision of predicting essential proteins. The new centrality measure, PeC, is an effective essential protein discovery method.
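
    The abstract does not state the formula for PeC. The sketch below therefore assumes one plausible way to integrate the two data sources: each interaction is weighted by the product of its edge clustering coefficient (a topological term) and the Pearson correlation of the two proteins' expression profiles, and a protein's score is the sum over its interactions. All function names are hypothetical, and the scoring rule should be read as an assumption and not as the published definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number."""
    triangles = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / denom if denom > 0 else 0.0


def pec_like_scores(adj, expression):
    """Assumed integration: sum over neighbours of (edge clustering x co-expression).

    adj:        dict protein -> set of interacting proteins (undirected network).
    expression: dict protein -> list of expression values across conditions.
    """
    return {
        u: sum(
            edge_clustering_coefficient(adj, u, v)
            * pearson(expression[u], expression[v])
            for v in adj[u]
        )
        for u in adj
    }
```

    Ranking proteins by such a score and taking the top candidates mirrors the kind of evaluation the abstract reports for up to 500 predicted proteins.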

  20. The first report of prion-related protein gene (PRNT) polymorphisms in goat.

    Science.gov (United States)

    Kim, Yong-Chan; Jeong, Byung-Hoon

    2017-06-01

    Prion protein is encoded by the prion protein gene (PRNP). Polymorphisms of several members of the prion gene family have shown association with prion diseases in several species. Recent studies on a novel member of the prion gene family in rams have shown that prion-related protein gene (PRNT) has a linkage with codon 26 of prion-like protein (PRND). In a previous study, codon 26 polymorphism of PRND has shown connection with PRNP haplotype which is strongly associated with scrapie vulnerability. In addition, the genotype of a single nucleotide polymorphism (SNP) at codon 26 of PRND is related to fertilisation capacity. These findings necessitate studies on the SNP of PRNT gene which is connected with PRND. In goat, several polymorphism studies have been performed for PRNP, PRND, and shadow of prion protein gene (SPRN). However, polymorphism on PRNT has not been reported. Hence, the objective of this study was to determine the genotype and allelic distribution of SNPs of PRNT in 238 Korean native goats and compare PRNT DNA sequences between Korean native goats and several ruminant species. A total of five SNPs, including PRNT c.-114G > T, PRNT c.-58A > G in the upstream of PRNT gene, PRNT c.71C > T (p.Ala24Val) and PRNT c.102G > A in the open reading frame (ORF) and c.321C > T in the downstream of PRNT gene, were found in this study. All five SNPs of caprine PRNT gene in Korean native goat are in complete linkage disequilibrium (LD) with a D' value of 1.0. Interestingly, comparative sequence analysis of the PRNT gene revealed five mismatches between DNA sequences of Korean native goats and those of goats deposited in the GenBank. Korean native black goats also showed 5 mismatches in PRNT ORF with cattle. To the best of our knowledge, this is the first genetic research of the PRNT gene in goat.
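
    The D' statistic reported in this abstract is the standard normalised linkage disequilibrium coefficient. A minimal sketch of the calculation for two biallelic SNPs follows, assuming haplotype and allele frequencies are already known. The function name and the example frequencies are illustrative and are not data from the study.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalised linkage disequilibrium |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a:  frequency of allele A at locus 1.
    p_b:  frequency of allele B at locus 2.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0
```

    A value of 1.0 is obtained whenever at least one of the four possible haplotypes is absent, so complete LD in this sense does not by itself mean that the two SNPs always occur together.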