WorldWideScience

Sample records for identify gene clusters

  1. Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes.

    Science.gov (United States)

    Maulik, Ujjwal; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2009-01-20

    The landscape of biological and biomedical research is being changed rapidly by the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points. Clustering algorithms have been actively used on microarray data sets to identify groups of co-expressed genes. This article poses the problem of fuzzy clustering in microarray data as a multiobjective optimization problem that simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. Each of these clustering solutions carries some information about the clustering structure of the input data. Motivated by this fact, a novel fuzzy majority voting approach is proposed to combine the clustering information from all the solutions in the resulting Pareto-optimal set. This approach first identifies the genes that most of the Pareto-optimal solutions assign to a particular cluster with a high membership degree. Using this set of genes as the training set, the remaining genes are classified by a supervised learning algorithm. In this work, we have used a Support Vector Machine (SVM) classifier for this purpose. The performance of the proposed clustering technique has been demonstrated on five publicly available benchmark microarray data sets, viz., Yeast Sporulation, Yeast Cell Cycle, Arabidopsis thaliana, Human Fibroblasts Serum and Rat Central Nervous System. Comparative studies of different SVM kernels and of several widely used microarray clustering techniques are reported. Moreover, statistical significance tests have been carried out to establish the statistical superiority of the proposed clustering approach. Finally, biological significance tests have been carried out using a web-based gene annotation tool to show that the proposed method is able to produce biologically relevant clusters of co
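The voting-then-classification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the cluster labels have already been aligned across the Pareto-optimal solutions, and it uses a hypothetical membership threshold of 0.8 with a strict-majority rule. It also swaps in a simple nearest-centroid classifier where the paper uses an SVM. The function names `majority_training_set` and `nearest_centroid_classify` are illustrative.

```python
import math
from collections import Counter


def majority_training_set(memberships, threshold=0.8, min_votes=None):
    """memberships[s][g][k]: membership of gene g in cluster k under solution s.

    Assumes cluster labels are already aligned across solutions. Returns
    {gene_index: cluster} for genes that a majority of solutions assign to the
    same cluster with membership >= threshold.
    """
    n_solutions = len(memberships)
    if min_votes is None:
        min_votes = n_solutions // 2 + 1  # strict majority
    training = {}
    for g in range(len(memberships[0])):
        votes = Counter()
        for solution in memberships:
            row = solution[g]
            best = max(range(len(row)), key=row.__getitem__)
            if row[best] >= threshold:  # only confident assignments vote
                votes[best] += 1
        if votes:
            cluster, count = votes.most_common(1)[0]
            if count >= min_votes:
                training[g] = cluster
    return training


def nearest_centroid_classify(data, training):
    """Label every gene; a dependency-free stand-in for the SVM used in the paper."""
    sums, counts = {}, Counter()
    for g, k in training.items():
        acc = sums.setdefault(k, [0.0] * len(data[g]))
        for i, x in enumerate(data[g]):
            acc[i] += x
        counts[k] += 1
    centroids = {k: [v / counts[k] for v in acc] for k, acc in sums.items()}
    labels = {}
    for g, row in enumerate(data):
        if g in training:
            labels[g] = training[g]  # keep the consensus label
        else:
            labels[g] = min(centroids, key=lambda k: math.dist(row, centroids[k]))
    return labels
```

Replacing `nearest_centroid_classify` with an SVM, for example scikit-learn's `SVC`, would follow the paper more closely. The stand-in keeps the sketch free of third-party dependencies.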

  2. Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes

    Directory of Open Access Journals (Sweden)

    Bandyopadhyay Sanghamitra

    2009-01-01

    Background: The landscape of biological and biomedical research is being changed rapidly by the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points. Clustering algorithms have been actively used on microarray data sets to identify groups of co-expressed genes. This article poses the problem of fuzzy clustering in microarray data as a multiobjective optimization problem that simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. Each of these clustering solutions carries some information about the clustering structure of the input data. Motivated by this fact, a novel fuzzy majority voting approach is proposed to combine the clustering information from all the solutions in the resulting Pareto-optimal set. This approach first identifies the genes that most of the Pareto-optimal solutions assign to a particular cluster with a high membership degree. Using this set of genes as the training set, the remaining genes are classified by a supervised learning algorithm. In this work, we have used a Support Vector Machine (SVM) classifier for this purpose. Results: The performance of the proposed clustering technique has been demonstrated on five publicly available benchmark microarray data sets, viz., Yeast Sporulation, Yeast Cell Cycle, Arabidopsis thaliana, Human Fibroblasts Serum and Rat Central Nervous System. Comparative studies of different SVM kernels and of several widely used microarray clustering techniques are reported. Moreover, statistical significance tests have been carried out to establish the statistical superiority of the proposed clustering approach. Finally, biological significance tests have been carried out using a web-based gene annotation tool to show that the proposed method is able to

  3. Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles

    Directory of Open Access Journals (Sweden)

    Lee Yun-Shien

    2008-03-01

    Background: The hierarchical clustering tree (HCT) with a dendrogram [1] and the singular value decomposition (SVD) with a dimension-reduced representative map [2] are popular methods for two-way sorting of the gene-by-array matrix map employed in gene expression profiling. While HCT dendrograms tend to optimize local coherent clustering patterns, the leading SVD eigenvectors usually identify better global grouping and transitional structures. Results: This study proposes a flipping mechanism for a conventional agglomerative HCT that uses a rank-two ellipse (R2E) seriation, an improved SVD algorithm for sorting purposes by Chen [3], as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method, giving users a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends. Conclusion: We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties but also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid the search for a comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.
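The idea of flipping a dendrogram toward an external ordering can be sketched with a simple greedy rule. At each internal node, the two children are swapped if that puts the subtree with the smaller mean reference rank first. Flipping only changes child order, so every cluster in the tree is preserved while the leaf order moves toward the reference.

This is an illustration under stated assumptions, not the GAP software's algorithm. The reference ranks `ref_rank` stand in for an R2E seriation and are supplied directly. The nested-tuple tree encoding is also a hypothetical choice.

```python
def leaves(tree):
    """Leaf order of a tree: leaves are strings, internal nodes are (left, right)."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def mean_rank(tree, ref_rank):
    """Average position of a subtree's leaves in the reference ordering."""
    names = leaves(tree)
    return sum(ref_rank[name] for name in names) / len(names)


def flip_to_reference(tree, ref_rank):
    """Swap children bottom-up so subtrees with smaller mean reference rank come first.

    Only child order changes, so every cluster of the original tree is kept.
    """
    if not isinstance(tree, tuple):
        return tree
    left = flip_to_reference(tree[0], ref_rank)
    right = flip_to_reference(tree[1], ref_rank)
    if mean_rank(left, ref_rank) > mean_rank(right, ref_rank):
        left, right = right, left
    return (left, right)
```

For example, with the reference order a, b, c, d, the tree `(("c", "a"), ("b", "d"))` is reordered to the leaf sequence a, c, b, d. The pairs {a, c} and {b, d} stay intact because flipping cannot break up a cluster.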

  4. A phase synchronization clustering algorithm for identifying interesting groups of genes from cell cycle expression data

    Directory of Open Access Journals (Sweden)

    Tcha Hong

    2008-01-01

    Background: Previous studies of genome-wide expression patterns show that a certain percentage of genes are cell cycle regulated. The expression data have been analyzed in a number of different ways to identify cell-cycle-dependent genes. In this study, we hypothesize that cell-cycle-dependent genes can be regarded as oscillating systems with a rhythm, i.e., systems producing response signals with a period and frequency. We are therefore motivated to apply the theory of multivariate phase synchronization to clustering cell-cycle-specific genome-wide expression data. Results: We propose a strategy for finding groups of genes associated with a specific biological process by analyzing cell-cycle-specific gene expression data. To evaluate the proposed method, we use the modified Kuramoto model, a phase-governing equation that describes the long-term dynamics of globally coupled oscillators. With this equation, we simulate two groups of expression signals, where the signals in each group share a common rhythm. The simulated expression data are then mixed with randomly generated expression data and used as the input data set for the algorithm. Using these simulated data, we show that the algorithm is able to identify expression signals that are involved in the same oscillating process. We also evaluate the method with yeast cell cycle expression data. The output clusters of the proposed algorithm include genes that are closely associated with each other, sharing significant Gene Ontology biological-process terms and/or having relatively many known biological interactions. The evaluation therefore indicates that the method is able to identify expression signals according to the specific biological process. Our evaluation also indicates that some portion of the output of the proposed algorithm is not obtainable by the traditional clustering algorithm with
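For readers unfamiliar with the Kuramoto model, the sketch below integrates its standard form with a forward Euler step:

dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i)

It then reports the order parameter r = |(1/N) Σ_j e^{iθ_j}|, which approaches 1 as the oscillators lock into a common rhythm. This is the textbook model, not the modified version used in the paper. The step size, coupling strength and initial phases are arbitrary choices for illustration.

```python
import cmath
import math


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r near 1 means phase-locked."""
    return abs(sum(cmath.exp(1j * th) for th in phases) / len(phases))


def simulate_kuramoto(phases, omegas, coupling, dt=0.01, steps=2000):
    """Forward-Euler integration of the standard (unmodified) Kuramoto model."""
    theta = list(phases)
    n = len(theta)
    for _ in range(steps):
        dtheta = [
            omegas[i] + coupling / n * sum(math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [t + dt * d for t, d in zip(theta, dtheta)]
    return theta
```

Starting five oscillators with identical natural frequencies at spread-out phases and coupling them with K = 2 drives r from roughly 0.8 to essentially 1. This is the kind of shared rhythm the simulated gene groups are built around.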

  5. Identifying driving gene clusters in complex diseases through critical transition theory

    Science.gov (United States)

    Wolanyk, Nathaniel; Wang, Xujing; Hessner, Martin; Gao, Shouguo; Chen, Ye; Jia, Shuang

    A novel approach of looking at the human body through critical transition theory has yielded positive results: clusters of genes that act in tandem to drive complex disease progression. Such a cluster of genes can be thought of as the first part of a large genetic force that pushes the body from a curable, but sick, point to an incurable diseased point through a catastrophic bifurcation. The data analyzed are time-course microarray blood assay data from 7 individuals at high risk for Type 1 Diabetes who progressed to clinical onset; an additional, larger study was requested to be presented at the conference. The normalized data cover 25,000 genes, which were narrowed down based on statistical metrics; finally, a machine learning algorithm using critical transition metrics found the driving network. The approach was designed to be repeatable across multiple complex diseases, needing only progression time-course data, so that it could be used to identify when an individual is at risk of developing a complex disease. Preventive measures could then be enacted, and in the longer term the approach offers a possible means of preventing all Type 1 Diabetes.
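The abstract does not name the critical transition metrics that were used. One widely cited choice in the critical-transition literature is the dynamical network biomarker (DNB) composite index. The index is high when a candidate group of genes becomes both highly variable and strongly intercorrelated while decoupling from the rest of the network:

I = SD_in × PCC_in / PCC_out

The sketch below computes that index for one time point. It assumes the DNB index is a reasonable stand-in for the authors' metrics. The name `dnb_index` and the input layout are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dnb_index(expr, group):
    """DNB-style composite index SD_in * PCC_in / PCC_out at one time point.

    expr: gene -> expression values across samples; group: >= 2 candidate genes.
    Assumes non-constant profiles and a non-zero PCC_out.
    """
    inside = list(group)
    outside = [g for g in expr if g not in inside]
    sd_in = statistics.fmean(statistics.stdev(expr[g]) for g in inside)
    pcc_in = statistics.fmean(
        abs(pearson(expr[a], expr[b]))
        for i, a in enumerate(inside) for b in inside[i + 1:]
    )
    pcc_out = statistics.fmean(
        abs(pearson(expr[a], expr[b])) for a in inside for b in outside
    )
    return sd_in * pcc_in / pcc_out
```

Tracking this index across the time course and looking for a sharp rise is how such a score would flag an approaching transition. Candidate-group search and normalization are left out of the sketch.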

  6. Genomic characterization of a new endophytic Streptomyces kebangsaanensis identifies biosynthetic pathway gene clusters for novel phenazine antibiotic production

    Directory of Open Access Journals (Sweden)

    Juwairiah Remali

    2017-11-01

    Background: Streptomyces are well known for their capability to produce many bioactive secondary metabolites of medical and industrial importance. Here we report a novel bioactive phenazine compound, 6-((2-hydroxy-4-methoxyphenoxy)carbonyl)phenazine-1-carboxylic acid (HCPCA), extracted from Streptomyces kebangsaanensis, an endophyte isolated from the ethnomedicinal plant Portulaca oleracea. Methods: The HCPCA chemical structure was determined using nuclear magnetic resonance spectroscopy. We conducted whole genome sequencing to identify the gene cluster(s) believed to be responsible for phenazine biosynthesis and to map the corresponding pathway, and we performed bioinformatics analysis to assess the potential of S. kebangsaanensis to produce other useful secondary metabolites. Results: The S. kebangsaanensis genome comprises an 8,328,719 bp linear chromosome with high GC content (71.35%), consisting of 12 rRNA operons, 81 tRNA genes, and 7,558 protein-coding genes. We identified 24 gene clusters involved in polyketide, nonribosomal peptide, terpene, bacteriocin, and siderophore biosynthesis, as well as a gene cluster predicted to be responsible for phenazine biosynthesis. Discussion: The HCPCA phenazine structure was hypothesized to derive from the combination of two biosynthetic pathways, phenazine-1,6-dicarboxylic acid and 4-methoxybenzene-1,2-diol, both originating from the shikimic acid pathway. The identification of a biosynthetic gene cluster for phenazine antibiotics might facilitate the future genetic engineering of new synthetic phenazine antibiotics. Additionally, these findings confirm the potential of S. kebangsaanensis for producing various antibiotics and secondary metabolites.
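GC content, as reported above (71.35%), is the fraction of G and C among the called bases. A minimal helper follows. It assumes that ambiguous symbols such as N should be excluded from the denominator.

```python
def gc_content(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / called
```

Applied to the reported figures, 71.35% of 8,328,719 bp corresponds to roughly 5.94 million G or C bases.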

  7. Gene cluster statistics with gene families.

    Science.gov (United States)

    Raghupathy, Narayanan; Durand, Dannie

    2009-05-01

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data
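The paper derives analytical approximations. As a complementary illustration of why gene family size matters, the Monte Carlo sketch below estimates the probability that two randomly ordered windows of w genes share at least k homologous pairs. Every family is assumed to have the same number of members in each genome, which is the equal-size simplification the abstract mentions. This is a simplified simulation with hypothetical parameters, not the authors' analytical method.

```python
import random
from collections import Counter


def homologous_pairs(window_a, window_b):
    """Number of (a, b) gene pairs, one from each window, in the same family."""
    count_a, count_b = Counter(window_a), Counter(window_b)
    return sum(n * count_b[fam] for fam, n in count_a.items())


def cluster_probability(n_genes, family_size, window, k, trials=10000, seed=0):
    """Monte Carlo estimate of P(at least k homologous pairs between two windows).

    Each genome holds n_genes // family_size families with family_size members
    each (the equal-family-size simplification), in random gene order.
    """
    rng = random.Random(seed)
    n_families = n_genes // family_size
    genome = [fam for fam in range(n_families) for _ in range(family_size)]
    hits = 0
    for _ in range(trials):
        a, b = genome[:], genome[:]
        rng.shuffle(a)
        rng.shuffle(b)
        # with random order, the first `window` genes form a random window
        if homologous_pairs(a[:window], b[:window]) >= k:
            hits += 1
    return hits / trials
```

With the number of genes held fixed, larger families produce many more chance homologous pairs. A test that ignores family size will therefore call such clusters significant too often, which is the overestimation the abstract describes.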

  8. Application of bi-clustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials

    Directory of Open Access Journals (Sweden)

    Andrew Williams

    2017-12-01

    This article contains data related to the research article ‘Application of bi-clustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials’ (Williams and Halappanavar, 2015 [1]). The presence of diverse types of nanomaterials (NMs) in commerce has grown significantly in the past decade, and as a result, human exposure to these materials in the environment is inevitable. Traditional toxicity testing approaches that rely on animals are both time- and cost-intensive; with them, it is not possible to complete the challenging task of assessing the safety of the NMs currently on the market in a timely manner. Thus, there is an urgent need for a comprehensive understanding of the biological behavior of NMs, and for efficient toxicity screening tools that will enable the development of predictive toxicology paradigms suited to rapidly assessing the human health impacts of exposure to NMs. In an effort to predict the long-term health impacts of acute exposure to NMs, in Williams and Halappanavar (2015) [1] we applied bi-clustering and gene set enrichment analysis methods to derive essential features of the altered lung transcriptome following exposure to NMs that are associated with lung-specific diseases. Several datasets from public microarray repositories describing pulmonary diseases in mouse models following exposure to a variety of substances were examined, and functionally related bi-clusters showing similar gene expression profiles were identified. The identified bi-clusters were then used to conduct a gene set enrichment analysis on lung gene expression profiles derived from mice exposed to nano-titanium dioxide, carbon black or carbon nanotubes (nano-TiO2, CB and CNTs) to determine the disease significance of these data-driven gene sets. The analysis correctly identified all NMs as inflammogenic, and only CB and CNTs as potentially fibrogenic. Here, we
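The enrichment step can be illustrated with the running-sum statistic at the heart of gene set enrichment analysis. The sketch below implements the classic unweighted (Kolmogorov-Smirnov-style) form. Genes in the set raise the running sum, all other genes lower it, and the enrichment score is the maximum deviation from zero.

Published GSEA weights each step by the gene's ranking statistic by default. This unweighted version and the function name `enrichment_score` are simplifying assumptions. Significance would still require a permutation test.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a gene set in a ranked list."""
    gene_set = set(gene_set)
    n = len(ranked_genes)
    n_hit = sum(1 for g in ranked_genes if g in gene_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must be a proper, non-empty subset of the ranked list")
    hit_step, miss_step = 1.0 / n_hit, 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += hit_step if g in gene_set else -miss_step
        if abs(running) > abs(best):  # keep the largest deviation from zero
            best = running
    return best
```

A positive score means the set concentrates near the top of the ranking. A negative score means it concentrates near the bottom.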

  9. Antibiotic discovery throughout the Small World Initiative: A molecular strategy to identify biosynthetic gene clusters involved in antagonistic activity.

    Science.gov (United States)

    Davis, Elizabeth; Sloan, Tyler; Aurelius, Krista; Barbour, Angela; Bodey, Elijah; Clark, Brigette; Dennis, Celeste; Drown, Rachel; Fleming, Megan; Humbert, Allison; Glasgo, Elizabeth; Kerns, Trent; Lingro, Kelly; McMillin, MacKenzie; Meyer, Aaron; Pope, Breanna; Stalevicz, April; Steffen, Brittney; Steindl, Austin; Williams, Carolyn; Wimberley, Carmen; Zenas, Robert; Butela, Kristen; Wildschutte, Hans

    2017-06-01

    The emergence of bacterial pathogens resistant to all known antibiotics is a global health crisis. Adding to this problem is that major pharmaceutical companies have shifted away from antibiotic discovery due to low profitability. As a result, the pipeline of new antibiotics is essentially dry and many bacteria now resist the effects of most commonly used drugs. To address this global health concern, citizen science through the Small World Initiative (SWI) was formed in 2012. As part of SWI, students isolate bacteria from their local environments, characterize the strains, and assay for antibiotic production. During the 2015 fall semester at Bowling Green State University, students isolated 77 soil-derived bacteria and genetically characterized strains using the 16S rRNA gene, identified strains exhibiting antagonistic activity, and performed an expanded SWI workflow using transposon mutagenesis to identify a biosynthetic gene cluster involved in toxigenic compound production. We identified one mutant with loss of antagonistic activity and through subsequent whole-genome sequencing and linker-mediated PCR identified a 24.9 kb biosynthetic gene locus likely involved in inhibitory activity in that mutant. Further assessment against human pathogens demonstrated the inhibition of Bacillus cereus, Listeria monocytogenes, and methicillin-resistant Staphylococcus aureus in the presence of this compound, thus supporting our molecular strategy as an effective research pipeline for SWI antibiotic discovery and genetic characterization. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  10. A genetic and genomic analysis identifies a cluster of genes associated with hematopoietic cell turnover

    NARCIS (Netherlands)

    de Haan, G; Bystrykh, LV; Weersing, E; Dontje, B; Geiger, H; Ivanova, N; Lemischka, IR; Vellenga, E; Van Zant, G

    2002-01-01

    Hematopoietic stem cells from different strains of mice vary widely with respect to their cell cycle activity. In the present study we used complementary genetic and genomic approaches to identify molecular pathways affecting this complex trait. We identified a major quantitative trait locus (QTL)

  11. A genome-wide association study of the maize hypersensitive defense response identifies genes that cluster in related pathways.

    Directory of Open Access Journals (Sweden)

    Bode A Olukolu

    2014-08-01

    Much remains unknown about the molecular events controlling the plant hypersensitive defense response (HR), a rapid localized cell death that limits pathogen spread and is mediated by resistance (R-) genes. Genetic control of the HR is hard to quantify due to its microscopic and rapid nature. Natural modifiers of the ectopic HR phenotype induced by an aberrant auto-active R-gene (Rp1-D21) were mapped in a population of 3,381 recombinant inbred lines from the maize nested association mapping population. Joint linkage analysis was conducted to identify 32 additive but no epistatic quantitative trait loci (QTL) using a linkage map based on more than 7000 single nucleotide polymorphisms (SNPs). Genome-wide association (GWA) analysis of 26.5 million SNPs was conducted after adjusting for background QTL. GWA identified associated SNPs that colocalized with 44 candidate genes. Thirty-six of these genes colocalized within 23 of the 32 QTL identified by joint linkage analysis. The candidate genes included genes predicted to be involved in programmed cell death, defense response, ubiquitination, redox homeostasis, autophagy, calcium signalling, lignin biosynthesis and cell wall modification. Twelve of the candidate genes showed significant differential expression between isogenic lines differing for the presence of Rp1-D21. Low but significant correlations between HR-related traits and several previously measured disease resistance traits suggested that the genetic control of these traits was substantially, though not entirely, independent. This study provides the first system-wide analysis of natural variation that modulates the HR in plants.
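The colocalization step, in which candidate genes fall inside QTL intervals, reduces to interval containment. A minimal sketch follows. It assumes each QTL is given as a chromosome with inclusive start and end coordinates. The gene and QTL names and positions in the check are hypothetical.

```python
def colocalize(genes, qtls):
    """genes: name -> (chromosome, position); qtls: name -> (chromosome, start, end).

    Returns gene -> list of QTL whose inclusive interval contains the gene.
    """
    hits = {}
    for gene, (chrom, pos) in genes.items():
        inside = [
            name for name, (q_chrom, start, end) in qtls.items()
            if q_chrom == chrom and start <= pos <= end
        ]
        if inside:
            hits[gene] = inside
    return hits
```

Counting the genes in the result and the distinct QTL they hit gives summaries of the form "36 genes within 23 of the 32 QTL". For genome-scale inputs, an interval tree or a sorted sweep would replace the nested scan.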

  12. A Genome-Wide Association Study of the Maize Hypersensitive Defense Response Identifies Genes That Cluster in Related Pathways

    Science.gov (United States)

    Venkata, Bala P.; Marla, Sandeep; Ji, Jiabing; Gachomo, Emma; Chu, Kevin; Negeri, Adisu; Benson, Jacqueline; Nelson, Rebecca; Bradbury, Peter; Nielsen, Dahlia; Holland, James B.; Balint-Kurti, Peter J.; Johal, Gurmukh

    2014-01-01

    Much remains unknown about the molecular events controlling the plant hypersensitive defense response (HR), a rapid localized cell death that limits pathogen spread and is mediated by resistance (R-) genes. Genetic control of the HR is hard to quantify due to its microscopic and rapid nature. Natural modifiers of the ectopic HR phenotype induced by an aberrant auto-active R-gene (Rp1-D21) were mapped in a population of 3,381 recombinant inbred lines from the maize nested association mapping population. Joint linkage analysis was conducted to identify 32 additive but no epistatic quantitative trait loci (QTL) using a linkage map based on more than 7000 single nucleotide polymorphisms (SNPs). Genome-wide association (GWA) analysis of 26.5 million SNPs was conducted after adjusting for background QTL. GWA identified associated SNPs that colocalized with 44 candidate genes. Thirty-six of these genes colocalized within 23 of the 32 QTL identified by joint linkage analysis. The candidate genes included genes predicted to be involved in programmed cell death, defense response, ubiquitination, redox homeostasis, autophagy, calcium signalling, lignin biosynthesis and cell wall modification. Twelve of the candidate genes showed significant differential expression between isogenic lines differing for the presence of Rp1-D21. Low but significant correlations between HR-related traits and several previously measured disease resistance traits suggested that the genetic control of these traits was substantially, though not entirely, independent. This study provides the first system-wide analysis of natural variation that modulates the HR in plants. PMID:25166276

  13. Neural networks and Fuzzy clustering methods for assessing the efficacy of microarray based intrinsic gene signatures in breast cancer classification and the character and relations of identified subtypes.

    Science.gov (United States)

    Samarasinghe, Sandhya; Chaiboonchoe, Amphun

    2015-01-01

    In the classification of breast cancer subtypes using microarray data, hierarchical clustering is commonly used. Although this form of clustering reveals basic cluster patterns, more needs to be done to assess the accuracy of the clusters and to extract meaningful cluster characteristics and their relations, so as to increase confidence in their use in a clinical setting. In this study, an in-depth investigation of the efficacy of three reported gene subsets in distinguishing breast cancer subtypes was performed using four advanced computational intelligence methods, namely Self-Organizing Maps (SOM), Emergent Self-Organizing Maps (ESOM), Fuzzy Clustering by Local Approximation of Memberships (FLAME), and Fuzzy C-means (FCM), which differ in the distance measures they use and in whether they produce fuzzy or crisp clusters. The gene subsets consisted of 71, 93, and 71 genes reported in the literature from three comprehensive experimental studies for distinguishing Luminal (A and B), Basal, Normal breast-like, and HER2 subtypes. Given the costly procedures involved in clinical studies, the proposed 93-gene set can be used for preliminary classification of breast cancer. Then, as a decision aid, SOM can be used to map the gene signature of a new patient and locate it with respect to all subtypes, giving a comprehensive view of the classification. These steps can be followed by a deeper investigation in light of the observations made in this study regarding overlapping subtypes. Results from the study could be used as the basis for further refining the gene signatures from later experiments and from new experiments designed to separate overlapping clusters as well as to maximally separate all clusters.
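Of the four methods compared, Fuzzy C-means is the easiest to show compactly. The sketch below follows the standard FCM updates:

- Each center is the membership-weighted mean of the points, with weights u^m.
- Each membership is u_kj = 1 / Σ_i (d_kj / d_ij)^{2/(m−1)}, where d_kj is the distance from point j to center k.

This is a plain-Python illustration with a fixed iteration count and a hypothetical default fuzzifier m = 2. It is not the configuration used in the study.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard FCM. points: list of equal-length tuples; m > 1 is the fuzzifier.

    Returns (centers, memberships) with memberships[j][k] for point j, cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # center update: membership^m weighted mean of the points
        centers = []
        for k in range(c):
            w = [u[j][k] ** m for j in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[j] * points[j][d] for j in range(n)) / total for d in range(dim)
            ))
        # membership update: u_kj = 1 / sum_i (d_kj / d_ij)^(2 / (m - 1))
        for j, p in enumerate(points):
            dists = [math.dist(p, ck) for ck in centers]
            if min(dists) == 0.0:  # point lies on a center: share membership there
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                u[j] = [z / sum(zeros) for z in zeros]
            else:
                u[j] = [
                    1.0 / sum((dists[k] / dists[i]) ** (2 / (m - 1)) for i in range(c))
                    for k in range(c)
                ]
    return centers, u
```

Unlike the crisp SOM map, every gene keeps a graded membership in each cluster. That graded membership is what makes overlapping subtypes visible.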

  14. Gene ordering in partitive clustering using microarray expressions

    Indian Academy of Sciences (India)

    Prakash Kumar

    fibroblast data and showed that our approach improves the quality of partitive clustering solutions by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summed gene expression distances, and maximizing biological gene ordering using MIPS ...
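One simple way to reduce the summed expression distance between adjacent genes inside a cluster is a nearest-neighbor ordering. The snippet above does not describe the authors' ordering procedure, so the greedy heuristic below is a swapped-in illustration. The names `greedy_order` and `total_adjacent_distance` are hypothetical.

```python
import math


def greedy_order(profiles, start=0):
    """Nearest-neighbor ordering: repeatedly append the closest unvisited gene."""
    remaining = set(range(len(profiles))) - {start}
    order = [start]
    while remaining:
        last = profiles[order[-1]]
        # ties broken by gene index so the result is deterministic
        nxt = min(remaining, key=lambda g: (math.dist(last, profiles[g]), g))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def total_adjacent_distance(profiles, order):
    """Sum of Euclidean distances between consecutive genes in the ordering."""
    return sum(math.dist(profiles[a], profiles[b]) for a, b in zip(order, order[1:]))
```

For four one-dimensional profiles at 0, 3, 1 and 2, the greedy order visits them as 0, 1, 2, 3 in value. This cuts the summed adjacent distance from 6 in the input order to 3.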

  15. FunGeneClusterS

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla; Brandl, Julian; Andersen, Mikael Rørdam

    2016-01-01

    Secondary metabolites of fungi are receiving an increasing amount of interest due to their prolific bioactivities and the fact that fungal biosynthesis of secondary metabolites often occurs from co-regulated and co-located gene clusters. This makes the gene clusters attractive for synthetic biology and industrial biotechnology applications. We have previously published a method for accurate prediction of clusters from genome and transcriptome data, which could also suggest cross-chemistry; however, this method was limited both in the number of parameters that could be adjusted and in user...
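The premise that fungal secondary metabolite genes are both co-located and co-regulated suggests a simple screen. Walk along the chromosome and keep runs of neighboring genes whose expression profiles correlate strongly. The sketch below is a hypothetical illustration of that idea, not the FunGeneClusterS algorithm. The correlation threshold and minimum run length are arbitrary parameters.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpressed_runs(genes, expr, min_r=0.8, min_len=3):
    """genes: gene IDs in chromosomal order; expr: gene -> expression profile.

    Returns runs of adjacent genes in which each neighbor pair has r >= min_r.
    """
    runs, current = [], [genes[0]]
    for prev, gene in zip(genes, genes[1:]):
        if pearson(expr[prev], expr[gene]) >= min_r:
            current.append(gene)  # extend the current co-expressed run
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [gene]  # start a new run at this gene
    if len(current) >= min_len:
        runs.append(current)
    return runs
```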

  16. Fuzzy clustering analysis of osteosarcoma related genes.

    Science.gov (United States)

    Chen, Kai; Wu, Dajiang; Bai, Yushu; Zhu, Xiaodong; Chen, Ziqiang; Wang, Chuanfeng; Zhao, Yingchuan; Li, Ming

    2014-07-01

    Osteosarcoma is the most common malignant bone tumor, with peak incidence in the second and third decades of life. We aimed to explore the influence of genetic factors on the mechanism of osteosarcoma by analyzing the interrelationships between osteosarcoma and its related genes, and thereby to provide potential genetic references for its prevention, diagnosis and treatment. To this end, we collected osteosarcoma-related gene sequences from GenBank at the National Center for Biotechnology Information (NCBI) and carried out pairwise local alignment to measure the association among related sequences. A fuzzy clustering method was then used to link unknown genes to known osteosarcoma-related genes falling in the same class. Based on the fuzzy clustering results, the osteosarcoma-related genes could be classified into two groups, and we deduced that the genes clustered into one group had similar functions. On this basis, we found more genes related to the pathogenesis of osteosarcoma that could exert functions similar to those of Runx2, a confirmed risk factor in osteosarcoma. This study may help to better understand the genetic mechanism of osteosarcoma and provide new molecular markers and therapies for it.
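The pairwise local alignment step can be illustrated with the Smith-Waterman recurrence, which scores the best-matching local region of two sequences. The abstract does not state the scoring scheme, so the match, mismatch and linear gap scores below are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores never drop below zero
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Pairwise scores of this kind could then be normalized to build a similarity matrix for the fuzzy clustering step, for example as s(a, b) / √(s(a, a) · s(b, b)). That normalization is also an assumption.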

  17. Microarray gene cluster identification and annotation through cluster ensemble and EM-based informative textual summarization.

    Science.gov (United States)

    Hu, Xiaohua; Park, E K; Zhang, Xiaodan

    2009-09-01

    Generating high-quality gene clusters and identifying the biological mechanisms underlying them are important goals of gene expression cluster analysis. To obtain high-quality clustering results, most current approaches rely on choosing the best clustering algorithm, one whose design biases and assumptions match the underlying distribution of the dataset. This approach has two problems: 1) the underlying data distribution of gene expression datasets is usually unknown, and 2) so many clustering algorithms are available that choosing the proper one is very challenging. To provide a textual summary of the gene clusters, the most explored approach is the extractive approach, which essentially builds on techniques borrowed from information retrieval, where the objective is to provide terms for query expansion rather than a stand-alone summary of the entire document set. Another drawback is that clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. In this paper, we design and develop a unified system, Gene Expression Miner, to address these challenging issues in a principled and general manner by integrating cluster ensemble, text clustering, and multidocument summarization, providing an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene clusters. In our text summarization module, given a gene cluster, our expectation-maximization based algorithm can automatically identify subtopics and extract the most probable terms for each subtopic. The top k topical terms extracted from each subtopic are then combined to form the biological explanation of each gene cluster. Experimental results demonstrate that our system can obtain high-quality clusters and provide informative key terms for the gene clusters.
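The abstract does not detail the cluster ensemble algorithm. A common generic approach, evidence accumulation, counts how often each pair of genes is placed together across several base clusterings. It then links pairs that co-occur in more than half of them. The sketch below implements that generic idea with a union-find. It is not the Gene Expression Miner method, and the 0.5 threshold is an assumption.

```python
from itertools import combinations


def consensus_clusters(labelings, threshold=0.5):
    """Evidence-accumulation consensus over several clusterings of the same genes.

    labelings: list of label lists (one per base clustering, same gene order).
    Genes are linked when they share a cluster in more than `threshold` of the
    clusterings; the consensus clusters are the connected components.
    """
    n = len(labelings[0])
    together = [[0] * n for _ in range(n)]  # co-association counts (i < j)
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if together[i][j] / len(labelings) > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because only co-membership is counted, the base clusterings do not need matching label numbers. This sidesteps the problem of choosing a single "best" algorithm that the abstract raises.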

  18. Pyrosequencing-based analysis reveals a novel capsular gene cluster in a KPC-producing Klebsiella pneumoniae clinical isolate identified in Brazil

    Directory of Open Access Journals (Sweden)

    Ramos Pablo Ivan

    2012-08-01

    Background: An important virulence factor of Klebsiella pneumoniae is the production of capsular polysaccharide (CPS), a thick mucus layer that allows evasion of the host's defenses and creates a barrier against antibacterial peptides. CPS production is driven mostly by the expression of genes located in a locus called cps, and the resulting structure is used to distinguish between different serotypes (K types). In this study, we report the unique genetic organization of the cps cluster from K. pneumoniae Kp13, a clinical isolate recovered during a large outbreak of nosocomial infections that occurred in a Brazilian teaching hospital. Results: A pyrosequencing-based approach showed that the cps region of Kp13 (cpsKp13) is 26.4 kbp in length and contains genes common, although not universal, to other strains, such as the rmlBADC operon that codes for L-rhamnose synthesis. cpsKp13 also presents some unique features, such as the inversion of the wzy gene and a unique repertoire of glycosyltransferases. In silico comparison of the cpsKp13 RFLP pattern with 102 previously published cps PCR-RFLP patterns showed that cpsKp13 is distinct from the C patterns of all other K serotypes. Furthermore, in vitro serotyping showed only a weak reaction with capsular types K9 and K34. We confirm that K9 cps shares common genes with cpsKp13, such as the rmlBADC operon, but lacks features such as uge and the Kp13-specific glycosyltransferases, while K34 capsules contain three of the five sugars that potentially form the Kp13 CPS. Conclusions: We report the first description of a cps cluster from a Brazilian clinical isolate of a KPC-producing K. pneumoniae. The gathered data, including K-serotyping, support the conclusion that Kp13’s K-antigen belongs to a novel capsular serotype. The CPS of Kp13 probably includes L-rhamnose and D-galacturonate in its structure, among other residues. Because genes involved in L-rhamnose biosynthesis are absent in humans, this pathway may represent
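The in silico RFLP comparison rests on predicting restriction fragment lengths from sequence. A minimal sketch for a linear sequence and a single enzyme follows. The abstract does not name the enzyme used in cps PCR-RFLP typing, so EcoRI's site (G^AATTC, cut after the first base) appears here only as an illustration.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths of a linear sequence cut at every occurrence of site.

    cut_offset: position of the cut within the site (e.g. 1 for G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # drop zero-length pieces, e.g. when a cut falls at the sequence end
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

Two patterns can then be compared by matching fragment sizes within a sizing tolerance. That tolerance step is left out of the sketch.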

  19. Personalized medicine for mucositis: Bayesian networks identify unique gene clusters which predict the response to gamma-D-glutamyl-L-tryptophan (SCV-07) for the attenuation of chemoradiation-induced oral mucositis.

    Science.gov (United States)

    Alterovitz, Gil; Tuthill, Cynthia; Rios, Israel; Modelska, Katharina; Sonis, Stephen

    2011-10-01

    Gamma-D-glutamyl-L-tryptophan (SCV-07) demonstrated an overall efficacy signal in ameliorating oral mucositis (OM) in a clinical trial of head and neck cancer patients. However, not all SCV-07-treated subjects responded positively. Here we determined if specific gene clusters could discriminate between subjects who responded to SCV-07 and those who did not. Microarray analyses were performed using peripheral blood RNA obtained at screening and on the last day of radiation from 28 subjects enrolled in the SCV-07 trial. An analytical technique was applied that relied on learned Bayesian networks to identify gene clusters which discriminated between individuals who received SCV-07 and those who received placebo, and which differentiated subjects for whom SCV-07 was an effective OM intervention from those for whom it was not. We identified 107 genes that discriminated SCV-07 responders from non-responders using four models and applied Akaike Information Criteria (AIC) and Bayes Factor (BF) analysis to evaluate predictive accuracy. AIC scoring was superior to BF: the accuracy of predicting placebo vs. treatment was 78% using BF, but 91% using the AIC score. Our ability to differentiate responders from non-responders using the AIC score was dramatic and ranged from 93% to 100% depending on the dataset that was evaluated. Predictive Bayesian networks were identified and functional cluster analyses were performed. A specific 10-gene cluster was a critical contributor to the predictability of the dataset. Our results demonstrate proof of concept in which the application of a genomics-based analytical paradigm was capable of discriminating responders and non-responders for an OM intervention. Copyright © 2011 Elsevier Ltd. All rights reserved.
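    The abstract compares candidate networks with AIC and Bayes factors but does not say how the scores were computed. As a rough illustration only, the sketch below uses the textbook definitions AIC = 2k - 2 ln L (lower is better) and ln BF = ln p(D | M1) - ln p(D | M2); the two model structures and all of their numbers are hypothetical and not taken from the SCV-07 study.

```python
# Hypothetical model-comparison sketch; the numbers below are invented for
# illustration and are not taken from the SCV-07 study.


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def log_bayes_factor(log_marginal_1: float, log_marginal_2: float) -> float:
    """ln BF_12 = ln p(D | M1) - ln p(D | M2); positive values favour M1."""
    return log_marginal_1 - log_marginal_2


# Two hypothetical Bayesian-network structures fitted to the same data.
models = {
    "sparse": {"loglik": -120.0, "k": 10, "log_marginal": -135.0},
    "dense": {"loglik": -110.0, "k": 25, "log_marginal": -140.0},
}

aic_scores = {name: aic(m["loglik"], m["k"]) for name, m in models.items()}
best_by_aic = min(aic_scores, key=aic_scores.get)
ln_bf = log_bayes_factor(models["sparse"]["log_marginal"], models["dense"]["log_marginal"])

if __name__ == "__main__":
    print(aic_scores, best_by_aic, ln_bf)
```

    In this toy case the denser structure fits better (higher ln L), but its 15 extra parameters cost more under the 2k penalty than the likelihood gain, so AIC prefers the sparse structure; the Bayes factor happens to agree here.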

  20. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides; however, three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however, were strongly upregulated during disease development in banana, suggesting that

  1. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides; however, three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however, were strongly upregulated during disease development in banana, suggesting that they may encode

  2. Deletion of a gene cluster for [Ni-Fe] hydrogenase maturation in the anaerobic hyperthermophilic bacterium Caldicellulosiruptor bescii identifies its role in hydrogen metabolism.

    Science.gov (United States)

    Cha, Minseok; Chung, Daehwan; Westpheling, Janet

    2016-02-01

    The anaerobic, hyperthermophilic, cellulolytic bacterium Caldicellulosiruptor bescii grows optimally at ∼80 °C and effectively degrades plant biomass without conventional pretreatment. It utilizes a variety of carbohydrate carbon sources, including both C5 and C6 sugars, released from plant biomass and produces lactate, acetate, CO2, and H2 as primary fermentation products. The C. bescii genome encodes two hydrogenases, a bifurcating [Fe-Fe] hydrogenase and a [Ni-Fe] hydrogenase. The [Ni-Fe] hydrogenase is the type most widely distributed in nature and is predicted to catalyze hydrogen production and to pump protons across the cellular membrane, creating proton motive force. Hydrogenases are the key enzymes in hydrogen metabolism, and their crystal structures reveal complexity in the organization of their prosthetic groups, suggesting extensive maturation of the primary protein. Here, we report the deletion of a cluster of genes, hypABFCDE, required for maturation of the [Ni-Fe] hydrogenase. These proteins are specific for the hydrogenases they modify and are required for hydrogenase activity. The deletion strain grew more slowly than the wild type or the parent strain and produced slightly less hydrogen overall, but more hydrogen per mole of cellobiose. Acetate yield per mole of cellobiose was increased ∼67 % and ethanol yield per mole of cellobiose was decreased ∼39 %. These data suggest that the primary role of the [Ni-Fe] hydrogenase is to generate a proton gradient in the membrane driving ATP synthesis, and that it is not the primary enzyme for hydrogen catalysis. In its absence, ATP is generated from increased acetate production, resulting in more hydrogen produced per mole of cellobiose.

  3. Organization of an echinoderm Hox gene cluster

    OpenAIRE

    Martinez, Pedro; Rast, Jonathan P.; Arenas-Mena, César; Davidson, Eric H.

    1999-01-01

    The Strongylocentrotus purpuratus genome contains a single ten-gene Hox complex >0.5 megabase in length. This complex was isolated on overlapping bacterial artificial chromosome and P1 artificial chromosome genomic recombinants by using probes for individual genes and by genomic walking. Echinoderm Hox genes of Paralog Groups (PG) 1 and 2 are reported. The cluster includes genes representing all paralog groups of vertebrate Hox clusters, except that there is a sing...

  4. Thirteen nodule-specific or nodule-enhanced genes encoding products homologous to cysteine cluster proteins or plant lipid transfer proteins are identified in Astragalus sinicus L. by suppressive subtractive hybridization.

    Science.gov (United States)

    Chou, Min-Xia; Wei, Xin-Yuan; Chen, Da-Song; Zhou, Jun-Chu

    2006-01-01

    Thirteen nodule-specific or nodule-enhanced genes have been revealed by suppressive subtractive hybridization (SSH) with two mRNA populations of infected and uninfected control roots of Astragalus sinicus. Eleven of them encode small polypeptides showing homology to cysteine cluster proteins (CCPs) that contain a putative signal peptide and conserved cysteine residues. Among these CCP-like genes, AsG257 codes for a homologue of the defensin 2 family and AsD255 contains a scorpion toxin-like domain at the C-terminus. Sequence analysis of an isolated genomic AsD255 fragment revealed that one intron separates the first exon, encoding the signal peptide, from the second exon, encoding the cysteine cluster domain of this nodulin. Another two genes, AsE246 and AsIB259, encode two different products similar to lipid transfer proteins (LTPs). Virtual northern blot and reverse transcription-polymerase chain reaction (RT-PCR) analysis indicated that all genes except AsIB259 and AsC2411 were expressed exclusively in inoculated roots and that their expression began 2-4 d later than that of the leghaemoglobin (Lb) gene during nodule development. Transcription of AsIB259 was also detected in uninfected control roots, but at a significantly lower level, and its temporal expression was similar to that of Lb. RT-PCR also detected basal expression of AsC2411 in control roots. Sequence alignment showed that the putative proteins AsE246 and AsIB259 show lower homology with LTPs from legumes than with those from other plants.

  5. Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data

    Directory of Open Access Journals (Sweden)

    S. Wen

    2007-01-01

    Full Text Available Researchers are frequently faced with the analysis of microarray data of a relatively large number of genes using a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes for which variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays of well-known data sets from colon tissue samples with and without tumors, and of tumor tissue samples from patients with leukemia. Although the two approaches have been developed from different perspectives, the results demonstrate a clear correspondence between gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. It is demonstrated, for the case of ribosomal proteins and smooth muscle genes in the colon data set, that both methods can classify genes into co-regulated families. It is further demonstrated that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Application to the leukemia tissue data produces a division of tissues corresponding closely to the external classification, acute myeloid leukemia (AML) and acute lymphoblastic leukaemia (ALL), for both methods. In addition, we identify genes specific for the subgroup of ALL-T cell samples. Overall, we find that the gene shaving method produces gene clusters at great speed; allows variable cluster sizes and can incorporate partial or full supervision; and finds clusters of genes in which the gene expression varies greatly over the tissue samples while maintaining a high level of coherence between the
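    The core shaving step behind gene shaving can be sketched in a few lines. This is a simplified illustration, not GeneClust's implementation: it assumes a genes x samples matrix, computes the leading principal component of the row-centred genes, drops the fraction `alpha` of genes with the smallest absolute inner product with it, and repeats. The gap statistic used to pick a cluster size from the nested sequence, and the orthogonalization used to find further clusters, are omitted.

```python
import numpy as np


def shave_sequence(X, alpha=0.1, min_size=2):
    """Nested gene sets from repeated shaving (simplified gene shaving).

    X is a genes x samples expression matrix. Returns a list of index arrays,
    from all genes down to `min_size` genes.
    """
    Xc = X - X.mean(axis=1, keepdims=True)  # centre each gene across samples
    idx = np.arange(Xc.shape[0])
    sequence = [idx.copy()]
    while len(idx) > min_size:
        sub = Xc[idx]
        # First right singular vector = leading principal component ("eigen-gene")
        _, _, vt = np.linalg.svd(sub, full_matrices=False)
        scores = np.abs(sub @ vt[0])  # |inner product| of each gene with the PC
        n_keep = max(min_size, int(np.ceil(len(idx) * (1 - alpha))))
        n_keep = min(n_keep, len(idx) - 1)  # always shave at least one gene
        keep = np.argsort(scores)[::-1][:n_keep]  # highest-scoring genes survive
        idx = idx[np.sort(keep)]
        sequence.append(idx.copy())
    return sequence
```

    Each entry of the returned list is a candidate cluster; on data where a few genes share a strong common pattern, the smallest sets end up consisting of those genes.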

  6. Identification of Nocobactin NA Biosynthetic Gene Clusters in Nocardia farcinica▿ §

    OpenAIRE

    Hoshino, Yasutaka; Chiba, Kazuhiro; Ishino, Keiko; Fukai, Toshio; Igarashi, Yasuhiro; Yazawa, Katsukiyo; Mikami, Yuzuru; Ishikawa, Jun

    2010-01-01

    We identified the biosynthetic gene clusters of the siderophore nocobactin NA. The nbt clusters, which were discovered as genes highly homologous to the mycobactin biosynthesis genes by the genomic sequencing of Nocardia farcinica IFM 10152, consist of 10 genes separately located at two genomic regions. The gene organization of the nbt clusters and the predicted functions of the nbt genes, particularly the cyclization and epimerization domains, were in good agreement with the chemical structu...

  7. Clustering Algorithms: Their Application to Gene Expression Data.

    Science.gov (United States)

    Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel

    2016-01-01

    Gene expression data hide vital information required to understand the biological process that takes place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a significant opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.
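    As a concrete instance of the partitional algorithms such reviews cover, the sketch below is a minimal Lloyd's k-means over expression profiles (rows = genes, columns = conditions). It is an illustrative choice, not a method singled out by the review. It assumes Euclidean distance and uses a deterministic farthest-first seeding in place of random or k-means++ initialization; in practice profiles are often z-scored first so that Euclidean distance tracks correlation.

```python
import numpy as np


def farthest_first_init(X, k, rng):
    """Seed centroids: one random profile, then repeatedly the farthest one."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    return np.array(centroids)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of X (e.g. genes x conditions)."""
    rng = np.random.default_rng(seed)
    centroids = farthest_first_init(X, k, rng)
    for _ in range(n_iter):
        # Squared Euclidean distance from every profile to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```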

  8. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  9. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Directory of Open Access Journals (Sweden)

    Zhimin Dai

    Full Text Available Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of nif gene clusters in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differed significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement that has not been reported in other diazotrophs. All of these results indicate that more novel diazotrophs survive in the AMD community.

  10. Gene cluster encoding cholate catabolism in Rhodococcus spp.

    Science.gov (United States)

    Mohn, William W; Wilbrink, Maarten H; Casabon, Israël; Stewart, Gordon R; Liu, Jie; van der Geize, Robert; Eltis, Lindsay D

    2012-12-01

    Bile acids are highly abundant steroids with important functions in vertebrate digestion. Their catabolism by bacteria is an important component of the carbon cycle, contributes to gut ecology, and has potential commercial applications. We found that Rhodococcus jostii RHA1 grows well on cholate, as well as on its conjugates, taurocholate and glycocholate. The transcriptome of RHA1 growing on cholate revealed 39 genes upregulated on cholate, occurring in a single gene cluster. Reverse transcriptase quantitative PCR confirmed that selected genes in the cluster were upregulated 10-fold on cholate versus on cholesterol. One of these genes, kshA3, encoding a putative 3-ketosteroid-9α-hydroxylase, was deleted and found essential for growth on cholate. Two coenzyme A (CoA) synthetases encoded in the cluster, CasG and CasI, were heterologously expressed. CasG was shown to transform cholate to cholyl-CoA, thus initiating side chain degradation. CasI was shown to form CoA derivatives of steroids with isopropanoyl side chains, likely occurring as degradation intermediates. Orthologous gene clusters were identified in all available Rhodococcus genomes, as well as that of Thermomonospora curvata. Moreover, Rhodococcus equi 103S, Rhodococcus ruber Chol-4 and Rhodococcus erythropolis SQ1 each grew on cholate. In contrast, several mycolic acid bacteria lacking the gene cluster were unable to grow on cholate. Our results demonstrate that the above-mentioned gene cluster encodes cholate catabolism and is distinct from a more widely occurring gene cluster encoding cholesterol catabolism.

  11. Characterization of the largest effector gene cluster of Ustilago maydis.

    Directory of Open Access Journals (Sweden)

    Thomas Brefort

    2014-07-01

    Full Text Available In the genome of the biotrophic plant pathogen Ustilago maydis, many of the genes coding for secreted protein effectors modulating virulence are arranged in gene clusters. The vast majority of these genes encode novel proteins whose expression is coupled to plant colonization. The largest of these gene clusters, cluster 19A, encodes 24 secreted effectors. Deletion of the entire cluster results in severe attenuation of virulence. Here we present the functional analysis of this genomic region. We show that a 19A deletion mutant behaves like an endophyte, i.e. is still able to colonize plants and complete the infection cycle. However, tumors, the most conspicuous symptoms of maize smut disease, are only rarely formed and fungal biomass in infected tissue is significantly reduced. The generation and analysis of strains carrying sub-deletions identified several genes significantly contributing to tumor formation after seedling infection. Another of the effectors could be linked specifically to anthocyanin induction in the infected tissue. As the individual contributions of these genes to tumor formation were small, we studied the response of maize plants to the whole cluster mutant as well as to several individual mutants by array analysis. This revealed distinct plant responses, demonstrating that the respective effectors have discrete plant targets. We propose that the analysis of plant responses to effector mutant strains that lack a strong virulence phenotype may be a general way to visualize differences in effector function.

  12. Gene clustering by latent semantic indexing of MEDLINE abstracts.

    Science.gov (United States)

    Homayouni, Ramin; Heinrich, Kevin; Wei, Lai; Berry, Michael W

    2005-01-01

    A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations. We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments. The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html.
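    The pipeline described above (term-by-document matrix, truncated SVD, distances from the angles between gene document vectors, hierarchical clustering) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses whitespace tokenization, a simple log(1 + tf) weighting (the study's actual weighting scheme is not given here), average linkage, and hypothetical gene symbols and toy "abstracts".

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def lsi_gene_clusters(docs, n_factors=2, n_clusters=2):
    """Cluster genes by the angles between their LSI document vectors.

    docs maps a gene symbol to the concatenated text of its abstracts.
    """
    genes = list(docs)
    tokens = {g: docs[g].lower().split() for g in genes}
    vocab = sorted({w for ws in tokens.values() for w in ws})
    row = {w: i for i, w in enumerate(vocab)}

    # Term-by-document count matrix (terms x genes)
    A = np.zeros((len(vocab), len(genes)))
    for j, g in enumerate(genes):
        for w in tokens[g]:
            A[row[w], j] += 1
    A = np.log1p(A)  # simple log(1 + tf) weighting (an assumption)

    # Rank-k truncated SVD: A ~ U_k S_k V_k^T; rows of V_k S_k are gene vectors
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    gene_vecs = vt[:n_factors].T * s[:n_factors]

    # Cosine distance = 1 - cos(angle between gene vectors)
    dist = pdist(gene_vecs, metric="cosine")
    Z = linkage(dist, method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return dict(zip(genes, labels.tolist()))
```

    For example, with hypothetical documents `{"GENE_A": "kinase phosphorylation signaling kinase", "GENE_B": "kinase signaling phosphorylation cascade", "GENE_C": "ribosome translation rrna ribosome", "GENE_D": "translation ribosome rrna protein"}`, the two signaling genes and the two translation genes fall into separate clusters.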

  13. NIH Researchers Identify OCD Risk Gene

    Science.gov (United States)

    Summer 2006 issue. ... gene variant that doubles an individual's risk for obsessive-compulsive disorder (OCD). The new functional variant, or allele, is ...

  14. The fimbrial gene cluster of Haemophilus influenzae type b

    NARCIS (Netherlands)

    van Ham, S. M.; van Alphen, L.; Mooi, F. R.; van Putten, J. P.

    1994-01-01

    Haemophilus influenzae infections are preceded by airway colonization, a process facilitated by fimbriae. Here, we identified the complete fimbrial gene cluster of H. influenzae type b. HifA forms the major subunit. HifB, a periplasmic chaperone, and HifC, an outer membrane usher, are typical

  15. Calcitonin gene-related peptide antagonism and cluster headache

    DEFF Research Database (Denmark)

    Ashina, Håkan; Newman, Lawrence; Ashina, Sait

    2017-01-01

    Calcitonin gene-related peptide (CGRP) is a key signaling molecule involved in migraine pathophysiology. Efficacy of CGRP monoclonal antibodies and antagonists in migraine treatment has fueled an increasing interest in the prospect of treating cluster headache (CH) with CGRP antagonism. The exact ... role of CGRP and its mechanism of action in CH have not been fully clarified. A search for original studies and randomized controlled trials (RCTs) published in English was performed in PubMed and in ClinicalTrials.gov. The search term used was "cluster headache and calcitonin gene related peptide ..." and "primary headaches and calcitonin gene related peptide." Reference lists of identified articles were also searched for additional relevant papers. Human experimental studies have reported elevated plasma CGRP levels during both spontaneous and glyceryl trinitrate-induced cluster attacks. CGRP may play...

  16. A 6-gene signature identifies four molecular subgroups of neuroblastoma

    LENUS (Irish Health Repository)

    Abel, Frida

    2011-04-14

    Abstract Background There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB): Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) have been associated with unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. Results The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by an unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes, ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 were found to correspond well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). Conclusions Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently made to further investigate this group's specific characteristics.
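    Two of the steps named in this abstract, projecting samples onto principal components and testing whether a gene's expression differs across clusters with a one-way ANOVA, can be sketched with NumPy and SciPy. The data in the usage check are synthetic and the cluster labels are assumed to be given; this does not reproduce the study's preprocessing or its 74-gene selection.

```python
import numpy as np
from scipy.stats import f_oneway


def pca_scores(X, n_components=2):
    """Project samples (rows of X, genes as columns) onto leading PCs."""
    Xc = X - X.mean(axis=0)  # centre each gene (column)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def anova_pvalue(expr, labels):
    """One-way ANOVA p-value for one gene's expression across sample groups."""
    groups = [expr[labels == g] for g in np.unique(labels)]
    return f_oneway(*groups).pvalue
```

    A gene whose mean shifts between clusters yields a small p-value, while a gene with only noise generally does not; with several genes tested, a multiple-testing correction would normally be applied.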

  17. Clustered Xenopus keratin genes: A genomic, transcriptomic, and proteomic analysis.

    Science.gov (United States)

    Suzuki, Ken-Ichi T; Suzuki, Miyuki; Shigeta, Mitsuki; Fortriede, Joshua D; Takahashi, Shuji; Mawaribuchi, Shuuji; Yamamoto, Takashi; Taira, Masanori; Fukui, Akimasa

    2017-06-15

    Keratin genes belong to the intermediate filament superfamily and their expression is altered following morphological and physiological changes in vertebrate epithelial cells. Keratin genes are divided into two groups, type I and II, and are clustered on vertebrate genomes, including those of Xenopus species. Various keratin genes have been identified and characterized by their unique expression patterns throughout ontogeny in Xenopus laevis; however, compilation of previously reported and newly identified keratin genes in two Xenopus species is required for our further understanding of keratin gene evolution, not only in amphibians but also in all terrestrial vertebrates. In this study, 120 putative type I and II keratin genes in total were identified based on the genome data from two Xenopus species. We revealed that most of these genes are highly clustered on two homeologous chromosomes, XLA9_10 and XLA2 in X. laevis, and XTR10 and XTR2 in X. tropicalis, which are orthologous to those of human, showing conserved synteny among tetrapods. RNA-Seq data from various embryonic stages and adult tissues highlighted the unique expression profiles of orthologous and homeologous keratin genes in developmental stage- and tissue-specific manners. Moreover, we identified dozens of epidermal keratin proteins from the whole embryo, larval skin, tail, and adult skin using shotgun proteomics. In light of our results, we discuss the radiation, diversification, and unique expression of the clustered keratin genes, which are closely related to epidermal development and terrestrial adaptation during amphibian evolution, including Xenopus speciation. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  19. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...
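    The consensus idea mentioned above can be illustrated with a resampling-based consensus matrix in the style of Monti et al.: cluster many random subsamples, record how often each pair of items lands in the same cluster when both are sampled, then cluster the resulting 1 - consensus distances. This sketch assumes average-linkage hierarchical clustering as the base clusterer and 80% item subsampling. It does not include the semi-supervised (prior-knowledge) component of the method in this record.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def consensus_matrix(X, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of resampling runs in which each pair of rows co-clusters."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        # Base clusterer: average-linkage hierarchical clustering cut at k clusters
        labels = fcluster(linkage(X[idx], method="average"), t=k, criterion="maxclust")
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    # Pairs never sampled together get consensus 0
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def consensus_clusters(X, k, **kwargs):
    """Final partition from average-linkage clustering of 1 - consensus."""
    D = 1.0 - consensus_matrix(X, k, **kwargs)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")
```

    Averaging over resamples damps the effect of noisy items on any single run, which is the robustness benefit the abstract attributes to consensus clustering.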

  20. A 6-gene signature identifies four molecular subgroups of neuroblastoma

    Directory of Open Access Journals (Sweden)

    Kogner Per

    2011-04-01

    Full Text Available Abstract Background There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB): Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations in or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) were associated with unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. Results The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes, ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p Conclusions Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently being made to further investigate this group's specific characteristics.

  1. QTLminer: identifying genes regulating quantitative traits

    Directory of Open Access Journals (Sweden)

    Schughart Klaus

    2010-10-01

    Full Text Available Abstract Background Quantitative trait locus (QTL) mapping identifies genomic regions that likely contain genes regulating a quantitative trait. However, QTL regions may encompass tens to hundreds of genes. To find the most promising candidate genes that regulate the trait, the biologist typically collects information from multiple resources about the genes in the QTL interval. This process is very laborious and time-consuming. Results QTLminer is a bioinformatics tool that automatically performs QTL region analysis. It is available in GeneNetwork and integrates information such as gene annotation, gene expression and sequence polymorphisms for all the genes within a given genomic interval. Conclusions QTLminer substantially speeds up discovery of the most promising candidate genes within a QTL region.

  2. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    Energy Technology Data Exchange (ETDEWEB)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly related genomes make it possible to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (spanning over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences that are likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involved in the regulation of Hox genes, while several others are new elements that might have regulatory functions. Most of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either A or A clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.

  3. Identifying probable suicide clusters in Wales using national mortality data.

    Directory of Open Access Journals (Sweden)

    Phillip Jones

    Full Text Available Up to 2% of suicides in young people may occur in clusters, i.e., close together in time and space. In early 2008 unprecedented attention was given by national and international news media to a suspected suicide cluster among young people living in Bridgend, Wales. This paper investigates the strength of statistical evidence for this apparent cluster, its size, and its temporal and geographical limits. The analysis is based on official mortality statistics for Wales for 2000-2009 provided by the UK's Office for National Statistics (ONS). Temporo-spatial analysis was performed using Space Time Permutation Scan Statistics with SaTScan v9.1 for suicide deaths aged 15 and over, with a sub-group analysis focussing on cases aged 15-34 years. These analyses were conducted for deaths coded by ONS as (i) suicide or of undetermined intent (probable suicides) and (ii) a combination of suicide, undetermined, and accidental poisoning and hanging (possible suicides). The temporo-spatial analysis did not identify any clusters of suicide or undetermined intent deaths (probable suicides). However, analysis of all deaths by suicide, undetermined intent, accidental poisoning and accidental hanging (possible suicides) identified a temporo-spatial cluster (p = 0.029) involving 10 deaths amongst 15-34 year olds centred on the County Borough of Bridgend for the period 27th December 2007 to 19th February 2008. Less than 1% of possible suicides in younger people in Wales in the ten-year period were identified as being cluster-related. There was a possible suicide cluster in young people in Bridgend between December 2007 and February 2008. This cluster was smaller, shorter in duration, and predominantly later than the phenomenon that was reported in national and international print media. Further investigation of factors leading to the onset and termination of this series of deaths, in particular the role of the media, is required.
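The space-time permutation scan statistic compares observed cases in a space-time cylinder with the count expected if location and timing were independent: for zone z and period d, the expected count is (zone total × period total) / total cases, and a cylinder with c observed and m expected cases out of C in total scores c·ln(c/m) + (C − c)·ln((C − c)/(C − m)) when c > m. A minimal sketch of that textbook formulation, assuming a zone-by-period count table and a caller-supplied list of candidate cylinders. SaTScan itself enumerates cylinders over circular zones and time windows, which is not reproduced here, and all function names are hypothetical.

```python
import math
import random

def expected_counts(counts):
    """Expected cases per cell if zone and period were independent.

    counts[z][d] is the number of cases in zone z during period d (total > 0).
    """
    total = sum(map(sum, counts))
    zone_totals = [sum(row) for row in counts]
    period_totals = [sum(col) for col in zip(*counts)]
    return [[zt * pt / total for pt in period_totals] for zt in zone_totals]

def log_glr(counts, cylinder):
    """Log likelihood ratio for one cylinder, given as a set of (zone, period) cells."""
    total = sum(map(sum, counts))
    mu = expected_counts(counts)
    c = sum(counts[z][d] for z, d in cylinder)
    m = sum(mu[z][d] for z, d in cylinder)
    if c <= m:  # only excess-risk cylinders count as candidate clusters
        return 0.0
    llr = c * math.log(c / m)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - m))
    return llr

def permutation_pvalue(counts, cylinders, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle periods among cases, keeping both margins fixed."""
    rng = random.Random(seed)
    cases = [(z, d) for z, row in enumerate(counts)
             for d, k in enumerate(row) for _ in range(k)]
    zones = [z for z, _ in cases]
    periods = [d for _, d in cases]
    observed = max(log_glr(counts, cyl) for cyl in cylinders)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(periods)
        sim = [[0] * len(counts[0]) for _ in counts]
        for z, d in zip(zones, periods):
            sim[z][d] += 1
        if max(log_glr(sim, cyl) for cyl in cylinders) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Taking the maximum over all candidate cylinders in each replicate is what corrects for testing many cylinders at once; the smallest attainable p-value is 1 / (n_perm + 1).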

  4. Coordinated evolution of co-expressed gene clusters in the Drosophila transcriptome

    Directory of Open Access Journals (Sweden)

    Jones Corbin D

    2008-01-01

    Full Text Available Abstract Background Co-expression of genes that physically cluster together is a common characteristic of eukaryotic transcriptomes. This organization of transcriptomes suggests that coordinated evolution of gene expression for clustered genes may also be common. Clusters in which the expression evolution of each gene is not independent of its neighbors are important units for understanding transcriptome evolution. Results We used a common microarray platform to measure gene expression in seven closely related species in the Drosophila melanogaster subgroup, accounting for confounding effects of sequence divergence. To summarize the correlation structure among genes in a chromosomal region, we analyzed the fraction of variation along the first principal component of the correlation matrix. We analyzed the correlation for blocks of consecutive genes to assess patterns of correlation that may be manifest at different scales of coordinated expression. We find that expression of physically clustered genes does evolve in a coordinated manner in many locations throughout the genome. Our analysis shows that relatively few of these clusters are near heterochromatin regions and that these clusters tend to be over-dispersed relative to the rest of the genome. This suggests that these clusters are not the byproduct of local gene clustering. We also analyzed the pattern of co-expression among neighboring genes within a single Drosophila species: D. simulans. For the co-expression clusters identified within this species, we find an under-representation of genes displaying a signature of recurrent adaptive amino acid evolution, consistent with previous findings. However, clusters displaying co-evolution of expression among species are enriched for adaptively evolving genes. This finding points to a tie between adaptive sequence evolution and evolution of the transcriptome. Conclusion Our results demonstrate that co-evolution of expression in gene clusters is
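The fraction of variation along the first principal component of a gene-gene correlation matrix equals its largest eigenvalue divided by the number of genes, because a correlation matrix has ones on its diagonal and therefore trace n. A minimal sketch, assuming each gene is given as an expression profile (a list of values across species or samples) with nonzero variance, and using power iteration in place of whatever eigen-solver the authors used; the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pc1_fraction(profiles, iters=500):
    """Largest eigenvalue of the gene-gene correlation matrix divided by its trace."""
    n = len(profiles)
    r = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    # Power iteration; the asymmetric start avoids being orthogonal to the top
    # eigenvector in simple cases such as two perfectly anti-correlated genes.
    v = [1.0 + i for i in range(n)]
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    # Rayleigh quotient of the converged unit vector estimates the top eigenvalue.
    lam = sum(v[i] * sum(r[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam / n

def block_scores(profiles, block):
    """Score every run of `block` consecutive genes along a chromosome."""
    return [pc1_fraction(profiles[i:i + block])
            for i in range(len(profiles) - block + 1)]
```

A score near 1 means one axis explains almost all variation in the block (strongly coordinated expression); for uncorrelated genes it falls to 1 / block.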

  5. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity, we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling the 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674
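The abstract does not give the weights or the exact formula behind the per-dimension composite index. One simple reading, offered as an assumption rather than the authors' procedure, is a weighted mean of standardized (z-scored) variables within a dimension; the variable names and weights in this sketch are hypothetical.

```python
from statistics import mean, pstdev

def composite_index(records, weights):
    """Weighted mean of z-scored variables for each patient.

    records: list of dicts mapping variable name -> value (one dict per patient).
    weights: dict mapping each variable in the dimension -> its weight (hypothetical).
    """
    stats = {}
    for var in weights:
        values = [r[var] for r in records]
        stats[var] = (mean(values), pstdev(values))
    total_weight = sum(weights.values())
    scores = []
    for r in records:
        # A constant variable carries no information, so its z-score is set to 0.
        z = {var: (r[var] - m) / s if s else 0.0 for var, (m, s) in stats.items()}
        scores.append(sum(weights[var] * z[var] for var in weights) / total_weight)
    return scores

patients = [
    {"pain_sites": 12, "fatigue": 8},
    {"pain_sites": 5, "fatigue": 3},
    {"pain_sites": 9, "fatigue": 6},
]
print(composite_index(patients, {"pain_sites": 0.6, "fatigue": 0.4}))
```

The resulting per-dimension scores could then be passed to any clustering routine to form subgroups.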

  6. Some statistical properties of gene expression clustering for array data

    DEFF Research Database (Denmark)

    Abreu, G C G; Pinheiro, A; Drummond, R D

    2010-01-01

    DNA arrays have been a rich source of data for the study of genomic expression of a wide variety of biological systems. Gene clustering is one of the paradigms commonly used to assess the significance of a gene (or group of genes). However, most of the gene clustering techniques are applied to cDNA...

  7. Evolution and differential expression of a vertebrate vitellogenin gene cluster

    Directory of Open Access Journals (Sweden)

    Kongshaug Heidi

    2009-01-01

    Full Text Available Abstract Background The multiplicity or loss of the vitellogenin (vtg) gene family in vertebrates has been argued to have broad implications for the mode of reproduction (placental or non-placental), cleavage pattern (meroblastic or holoblastic) and character of the egg (pelagic or benthic). Earlier proposals for the existence of three forms of vertebrate vtgs present conflicting models for their origin and subsequent duplication. Results By integrating phylogenetics of novel vtg transcripts from old and modern teleosts with syntenic analyses of all available genomic variants of non-metatherian vertebrates, we identify the gene orthologies between the Sarcopterygii (tetrapod branch) and Actinopterygii (fish branch). We argue that the vertebrate vtg gene cluster originated in proto-chromosome m, but that vtg genes have subsequently duplicated and rearranged following whole genome duplications. Sequencing of a novel fourth vtg transcript in labrid species, and the presence of duplicated paralogs in certain model organisms, support the notion that lineage-specific gene duplications frequently occur in teleosts. The data show that the vtg gene cluster is more conserved between acanthomorph teleosts and tetrapods than in ostariophysan teleosts such as the zebrafish. The differential expression of the labrid vtg genes is further consistent with the notion that neofunctionalized Aa-type vtgs are important determinants of the pelagic or benthic character of the eggs in acanthomorph teleosts. Conclusion The vertebrate vtg gene cluster existed prior to the separation of Sarcopterygii from Actinopterygii >450 million years ago, a period associated with the second round of whole genome duplication. The presence of higher copy numbers in a more highly expressed subcluster is particularly prevalent in teleosts. The differential expression and latent neofunctionalization of vtg genes in acanthomorph teleosts is an adaptive feature associated with oocyte hydration

  8. Identifying clinical course patterns in SMS data using cluster analysis.

    Science.gov (United States)

    Kent, Peter; Kongsted, Alice

    2012-07-02

    Recently, there has been interest in using the short message service (SMS, or text messaging) to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important subgroups in the outcomes of research studies. Two previous studies have investigated detailed clinical course patterns in SMS data obtained from people seeking care for low back pain. One used a visual analysis approach and the other performed a cluster analysis of SMS data that had first been transformed by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole group, by including all SMS time points in their original form. It was a 'proof of concept' study to explore the potential, clinical relevance, strengths and weaknesses of such an approach. This was a secondary analysis of longitudinal SMS data collected in two randomised controlled trials conducted simultaneously from a single clinical population (n = 322). Fortnightly SMS data collected over a year on 'days of problematic low back pain' and on 'days of sick leave' were analysed using Two-Step (probabilistic) Cluster Analysis. Clinical course patterns were identified that were clinically interpretable and different from those of the whole group. Similar patterns were obtained when the number of SMS time points was reduced to monthly. The advantages and disadvantages of this method were contrasted with those of first transforming SMS data by spline analysis. This study showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there

  9. Gene duplication, modularity and adaptation in the evolution of the aflatoxin gene cluster

    Directory of Open Access Journals (Sweden)

    Jakobek Judy L

    2007-07-01

    Full Text Available Abstract Background The biosynthesis of aflatoxin (AF) involves over 20 enzymatic reactions in a complex polyketide pathway that converts acetate and malonate to the intermediates sterigmatocystin (ST) and O-methylsterigmatocystin (OMST), the respective penultimate and ultimate precursors of AF. Although these precursors are chemically and structurally very similar, their accumulation differs at the species level for Aspergilli. Notable examples are A. nidulans that synthesizes only ST, A. flavus that makes predominantly AF, and A. parasiticus that generally produces either AF or OMST. Whether these differences are important in the evolutionary/ecological processes of species adaptation and diversification is unknown. Equally unknown are the specific genomic mechanisms responsible for ordering and clustering of genes in the AF pathway of Aspergillus. Results To elucidate the mechanisms that have driven formation of these clusters, we performed systematic searches of aflatoxin cluster homologs across five Aspergillus genomes. We found a high level of gene duplication and identified seven modules consisting of highly correlated gene pairs (aflA/aflB, aflR/aflS, aflX/aflY, aflF/aflE, aflT/aflQ, aflC/aflW, and aflG/aflL). With the exception of A. nomius, contrasts of mean Ka/Ks values across all cluster genes showed significant differences in selective pressure between section Flavi and non-section Flavi species. A. nomius mean Ka/Ks values were more similar to partial clusters in A. fumigatus and A. terreus. Overall, mean Ka/Ks values were significantly higher for section Flavi than for non-section Flavi species. Conclusion Our results implicate several genomic mechanisms in the evolution of ST, OMST and AF cluster genes. Gene modules may arise from duplications of a single gene, whereby the function of the pre-duplication gene is retained in the copy (aflF/aflE), or the copies may partition the ancestral function (aflA/aflB). In some gene modules, the

  10. Cataloging the Praesepe Cluster: Identifying Interlopers and Binary Systems

    Science.gov (United States)

    Lucey, Madeline R.; Gosnell, Natalie M.; Mann, Andrew; Douglas, Stephanie

    2018-01-01

    We present radial velocity measurements from an ongoing survey of the Praesepe open cluster using the WIYN 3.5m Telescope. Our target stars include 229 early-K to mid-M dwarfs with proper motion memberships that have been observed by the repurposed Kepler mission, K2. With this survey, we will provide a well-constrained membership list of the cluster. By removing interloping stars and determining the cluster binary frequency we can avoid systematic errors in our analysis of the K2 findings and more accurately determine exoplanet properties in the Praesepe cluster. Obtaining accurate exoplanet parameters in open clusters allows us to study the temporal dimension of exoplanet parameter space. We find Praesepe to have a mean radial velocity of 34.09 km/s and a velocity dispersion of 1.13 km/s, which is consistent with previous studies. We derive radial velocity membership probabilities for stars with ≥3 radial velocity measurements and compare against published membership probabilities. We also identify radial velocity variables and potential double-lined spectroscopic binaries. We plan to obtain more observations to determine the radial velocity membership of all the stars in our sample, as well as follow up on radial velocity variables to determine binary orbital solutions.

  11. Identifying node spreading influence for tunable clustering coefficient networks

    Science.gov (United States)

    Wang, Zi-Yi; Han, Jing-Ti; Zhao, Jun

    2017-11-01

    Identifying the spreading influence of nodes is significant for information and innovation diffusion. In this paper, we argue that the spreading process should be taken into account when identifying node spreading influence, and we investigate the effect of network structure, measured by the clustering coefficient, on the performance of spreading dynamics. First, we generate a series of networks with tunable clustering coefficients. Then, taking the spreading process into account, we compare the performance of the Dynamics-sensitive (DS) index with that of the degree, betweenness, closeness and eigenvector indices. Compared against the Susceptible-Infective-Removed (SIR) model, extensive results show that, for different spreading time steps and clustering coefficients, DS centrality (τ > 0.97) outperforms the degree, betweenness, closeness and eigenvector measures. Moreover, the accuracy of closeness and eigenvector centrality is similar, and both perform better in networks with a larger spreading rate (β = 0.20, τ > 0.93). As the clustering coefficient increases, the performance of all measures decreases, with DS centrality decreasing least (by at most 1.16 percent under β = 0.10) and closeness decreasing most (by 9.75 percent under β = 0.05). This work suggests that spreading influence depends not only on the network structure but, more importantly, on the spreading dynamics, and both should be taken into account simultaneously.
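The τ values above measure how well a centrality ranking agrees with the spreading influence obtained from SIR simulations. The abstract does not say which variant of Kendall's τ was used; the sketch below assumes τ-b, which handles tied scores, and the degree and spreading values in the usage lines are made up for illustration.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length score lists."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: counts toward neither
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when either ranking is constant")
    return (concordant - discordant) / denom

degree = [3, 1, 2, 2, 4]
sir_spread = [0.42, 0.10, 0.25, 0.30, 0.55]  # hypothetical mean outbreak size per seed node
print(kendall_tau_b(degree, sir_spread))
```

A value close to 1 means the centrality index orders nodes almost exactly as the simulated spreading process does.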

  12. Clustering gene expression regulators: new approach to disease subtyping.

    Directory of Open Access Journals (Sweden)

    Mikhail Pyatnitskiy

    Full Text Available One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying disease molecular mechanisms so as to develop a more personalized approach to therapy. Here we propose a novel method for disease subtyping based on analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on the Sub-Network Enrichment Analysis algorithm (SNEA), which identifies gene subnetworks with significant concordant changes in expression between two conditions. Each subnetwork consists of a central regulator and downstream genes connected by relations extracted from a global literature-derived regulation database. Regulators found in each patient separately are clustered together and assigned activity scores, which are used for the final grouping of patients. We show that our approach performs well compared to other related methods and at the same time provides researchers with a complementary level of understanding of the pathway-level biology behind a disease through identification of significant expression regulators. We observed a reasonable grouping of neuromuscular disorders (triggered by structural damage vs. triggered by unknown mechanisms) that was not revealed using standard expression profile clustering. In another experiment, we were able to suggest the clusters of regulators responsible for colorectal carcinoma vs. adenoma discrimination and to identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. The proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes down to dozens of clusters of regulators. The obtained clusters of regulators make it possible to generate valuable biological hypotheses about molecular mechanisms related to a clinical outcome for an individual patient.

  13. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) is used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together
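The elimination loop described above can be sketched in plain Python. This is not the authors' SVM-RCE code: to stay self-contained it swaps the SVM for a leave-one-out nearest-centroid classifier (a different, simpler scorer), seeds k-means deterministically, and its parameters (`n_clusters`, `drop_frac`, `min_genes`) and helper names are hypothetical. `X` is a samples × genes matrix and `y` holds the class labels.

```python
import math

def kmeans_genes(X, genes, k, iters=20):
    """Group gene indices by k-means on their expression profiles across samples."""
    k = min(k, len(genes))
    profiles = {g: [row[g] for row in X] for g in genes}
    step = len(genes) / k
    centroids = [profiles[genes[int(i * step)]] for i in range(k)]  # deterministic seeds
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for g in genes:
            dists = [math.dist(profiles[g], c) for c in centroids]
            groups[dists.index(min(dists))].append(g)
        centroids = [
            [sum(vals) / len(grp) for vals in zip(*(profiles[g] for g in grp))] if grp else c
            for grp, c in zip(groups, centroids)
        ]
    return [grp for grp in groups if grp]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    hits = 0
    for i in range(len(X)):
        by_class = {}
        for j, label in enumerate(y):
            if j != i:
                by_class.setdefault(label, []).append([X[j][g] for g in genes])
        means = {lab: [sum(v) / len(rows) for v in zip(*rows)] for lab, rows in by_class.items()}
        point = [X[i][g] for g in genes]
        pred = min(means, key=lambda lab: math.dist(point, means[lab]))
        hits += pred == y[i]
    return hits / len(X)

def rce(X, y, n_clusters=4, drop_frac=0.5, min_genes=2):
    """Recluster surviving genes, score each cluster, drop the worst, repeat."""
    genes = list(range(len(X[0])))
    while len(genes) > min_genes:
        clusters = kmeans_genes(X, genes, n_clusters)
        if len(clusters) < 2:
            break
        ranked = sorted(clusters, key=lambda grp: loo_accuracy(X, y, grp))
        n_drop = max(1, int(len(ranked) * drop_frac))
        kept = [g for grp in ranked[n_drop:] for g in grp]
        if len(kept) < min_genes:
            break
        genes = sorted(kept)
    return genes

# Toy data: genes 0 and 1 separate the classes; genes 2-5 alternate regardless of class.
X = [
    [0, 0, 5, 5, 0, 0],
    [0, 0, 0, 0, 5, 5],
    [0, 0, 5, 5, 0, 0],
    [5, 5, 0, 0, 5, 5],
    [5, 5, 5, 5, 0, 0],
    [5, 5, 0, 0, 5, 5],
]
y = [0, 0, 0, 1, 1, 1]
print(rce(X, y, n_clusters=3))  # → [0, 1]
```

Scoring whole clusters rather than single genes is the point of RCE: correlated genes are kept or discarded together, which is what the abstract credits for the accuracy gain over RFE.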

  14. Functional Analysis of the Fusarielin Biosynthetic Gene Cluster

    Directory of Open Access Journals (Sweden)

    Aida Droce

    2016-12-01

    Full Text Available Fusarielins are polyketides with a decalin core produced by various species of Aspergillus and Fusarium. Although the responsible gene cluster has been identified, the biosynthetic pathway remains to be elucidated. In the present study, members of the gene cluster were deleted individually in a Fusarium graminearum strain overexpressing the local transcription factor. The results suggest that a trans-acting enoyl reductase (FSL5 assists the polyketide synthase FSL1 in biosynthesis of a polyketide product, which is released by hydrolysis by a trans-acting thioesterase (FSL2. Deletion of the epimerase (FSL3 resulted in accumulation of an unstable compound, which could be the released product. A novel compound, named prefusarielin, accumulated in the deletion mutant of the cytochrome P450 monooxygenase FSL4. Unlike the known fusarielins from Fusarium, this compound does not contain oxygenized decalin rings, suggesting that FSL4 is responsible for the oxygenation.

  15. Identifying multiple influential spreaders by a heuristic clustering algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Bao, Zhong-Kui [School of Mathematical Science, Anhui University, Hefei 230601 (China); Liu, Jian-Guo [Data Science and Cloud Service Research Center, Shanghai University of Finance and Economics, Shanghai, 200133 (China); Zhang, Hai-Feng, E-mail: haifengzhang1978@gmail.com [School of Mathematical Science, Anhui University, Hefei 230601 (China); Department of Communication Engineering, North University of China, Taiyuan, Shan' xi 030051 (China)

    2017-03-18

    The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suited to the case where a single spreader is chosen as the spreading source. Often, a spreading process is initiated by simultaneously choosing multiple nodes as spreading sources. In this situation, choosing the top-ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in the network. Ideally, in the multiple-spreader case, the spreaders are not only influential but also dispersed across the network, but it is difficult to meet both conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters; the center nodes of the clusters are then chosen as the multiple spreaders. The HC algorithm not only ensures that the multiple spreaders are dispersed across the network but also avoids selecting “negligible” nodes. Compared with traditional methods, our experimental results on synthetic and real networks indicate that the HC method performs better on influence maximization. - Highlights: • A heuristic clustering algorithm is proposed to identify multiple influential spreaders in complex networks. • The algorithm not only guarantees that the selected spreaders are sufficiently scattered but also avoids selecting “insignificant” ones. • The performance of our algorithm is generally better than that of other methods on both real and synthetic networks.

  16. Identifying multiple influential spreaders by a heuristic clustering algorithm

    International Nuclear Information System (INIS)

    Bao, Zhong-Kui; Liu, Jian-Guo; Zhang, Hai-Feng

    2017-01-01

    The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suited to the case where a single spreader is chosen as the spreading source. Often, a spreading process is initiated by simultaneously choosing multiple nodes as spreading sources. In this situation, choosing the top-ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in the network. Ideally, in the multiple-spreader case, the spreaders are not only influential but also dispersed across the network, but it is difficult to meet both conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters; the center nodes of the clusters are then chosen as the multiple spreaders. The HC algorithm not only ensures that the multiple spreaders are dispersed across the network but also avoids selecting “negligible” nodes. Compared with traditional methods, our experimental results on synthetic and real networks indicate that the HC method performs better on influence maximization. - Highlights: • A heuristic clustering algorithm is proposed to identify multiple influential spreaders in complex networks. • The algorithm not only guarantees that the selected spreaders are sufficiently scattered but also avoids selecting “insignificant” ones. • The performance of our algorithm is generally better than that of other methods on both real and synthetic networks.

  17. Arrangement of the Clostridium baratii F7 toxin gene cluster with identification of a σ factor that recognizes the botulinum toxin gene cluster promoters.

    Science.gov (United States)

    Dover, Nir; Barash, Jason R; Burke, Julianne N; Hill, Karen K; Detter, John C; Arnon, Stephen S

    2014-01-01

    Botulinum neurotoxin (BoNT) is the most poisonous substance known, and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene, which is part of a toxin gene cluster that includes several accessory genes. We sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene, which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding-domain homologues to the DNA-binding domains both of BotR and of other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein in association with RNA polymerase core enzyme specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.

  18. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    2009-09-01

    Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way, to improve phenotyping for clinical and research purposes. Latent class cluster analysis was applied to a large database of 1467 records of people with ALS, using discrete variables that can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.

  19. The ergot alkaloid gene cluster in Claviceps purpurea: extension of the cluster sequence and intra species evolution.

    Science.gov (United States)

    Haarmann, Thomas; Machado, Caroline; Lübbe, Yvonne; Correia, Telmo; Schardl, Christopher L; Panaccione, Daniel G; Tudzynski, Paul

    2005-06-01

    The genomic region of Claviceps purpurea strain P1 containing the ergot alkaloid gene cluster [Tudzynski, P., Hölter, K., Correia, T., Arntz, C., Grammel, N., Keller, U., 1999. Evidence for an ergot alkaloid gene cluster in Claviceps purpurea. Mol. Gen. Genet. 261, 133-141] was explored by chromosome walking, and additional genes probably involved in ergot alkaloid biosynthesis were identified. The putative cluster sequence (extending over 68.5 kb) contains four different nonribosomal peptide synthetase (NRPS) genes and several putative oxidases. Northern analysis showed that most of the genes were co-regulated (repressed by high phosphate) and identified probable flanking genes by their lack of co-regulation. Comparison of the cluster sequence of strain P1, an ergotamine producer, with that of strain ECC93, an ergocristine producer, showed high conservation of most of the cluster genes but significant variation in the NRPS modules, strongly suggesting that the evolution of these chemical races of C. purpurea is determined by the evolution of NRPS module specificity.

  20. Global Analysis of miRNA Gene Clusters and Gene Families Reveals Dynamic and Coordinated Expression

    Directory of Open Access Journals (Sweden)

    Li Guo

    2014-01-01

    To further understand the potential expression relationships of miRNAs in miRNA gene clusters and gene families, a global analysis was performed on four paired tumor (breast cancer) and adjacent normal tissue samples using deep sequencing datasets. The compositions of miRNA gene clusters and families are not random, and clustered and homologous miRNAs may be closely related through overlapping miRNA species. Members of a miRNA group had varying expression levels, and some showed large expression divergence. Despite this dynamic expression and individual differences, these miRNAs consistently showed similar deregulation patterns. This consistent deregulation may contribute to dynamic and coordinated interactions between different miRNAs in the regulatory network. Further, we found that clustered or homologous miRNAs that were also identified as sense and antisense miRNAs showed larger expression divergence. miRNA gene clusters and families appear to have important biological roles, and their specific distribution and expression further enrich and help ensure a flexible and robust regulatory network.
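Record 18 above checks its latent phenotypic classes for clinical relevance with Kaplan-Meier survival analysis. As a rough illustration of that estimator only (not the authors' pipeline), here is a minimal plain-Python sketch. It assumes each patient is a hypothetical `(time, event)` pair, with `event = 1` for an observed death and `0` for a censored record; the function name and the follow-up times are invented for the example.

```python
def kaplan_meier(records):
    """Kaplan-Meier survival estimate (illustrative sketch).

    records: iterable of (time, event) pairs, where event is 1 if a death
    was observed at `time` and 0 if the patient was censored at `time`.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(records)
    at_risk = len(data)  # everyone is at risk before the first time point
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Tally every record whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients who died or were censored at t leave the risk set afterwards.
        at_risk -= deaths + censored
    return curve


# Invented follow-up times (months) for one hypothetical phenotypic class.
example = [(2, 1), (3, 0), (4, 1), (4, 1), (6, 0), (8, 1)]
for t, s in kaplan_meier(example):
    print(f"t={t}: S={s:.3f}")  # t=2: S=0.833, t=4: S=0.417, t=8: S=0.000
```

In a study like the one above, one curve would be computed per latent class; formally comparing the curves needs a separate test (for example, the log-rank test), which this sketch does not include.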

  1. AntiSMASH 4.0 - improvements in chemistry prediction and gene cluster boundary identification

    NARCIS (Netherlands)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.; Lu, Xiaowen; Schwalen, Christopher J.; Kautsar, Satria A.; Suarez Duran, Hernando G.; De Los Santos, Emmanuel L.C.; Kim, Hyun Uk; Nave, Mariana; Dickschat, Jeroen S.; Mitchell, Douglas A.; Shelest, Ekaterina; Breitling, Rainer; Takano, Eriko; Lee, Sang Yup; Weber, Tilmann; Medema, Marnix H.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production

  2. Identifying Candidate Reprogramming Genes in Mouse Induced Pluripotent Stem Cells.

    Science.gov (United States)

    Gao, Fang; Li, Jingyu; Zhang, Heng; Yang, Xu; An, Tiezhu

    2017-08-01

    Factor-based induced reprogramming approaches have tremendous potential for human regenerative medicine, but their efficiencies are still low. In this study, we analyzed the global transcriptional profiles of mouse induced pluripotent stem cells (miPSCs) and mouse embryonic stem cells (mESCs) from seven different labs and present here the first successful clustering according to cell type rather than lab of origin. We identified 2131 differentially expressed genes (DEs) as candidate pluripotency-associated genes by comparing mESCs/miPSCs with somatic cells, and 720 DEs between miPSCs and mESCs. Interestingly, there was a significant overlap between the two DE sets. We therefore defined the overlapping DEs as "consensus DEs", comprising 313 miPSC-specific genes (expressed at higher levels in miPSCs than in mESCs) and 184 mESC-specific genes, and reasoned that these may contribute to the differences in pluripotency between mESCs and miPSCs. Classifying the "consensus DEs" by their expression levels in somatic cells versus mESCs/miPSCs shows that 86% of the miPSC-specific genes are more highly expressed in somatic cells, while 73% of the mESC-specific genes are highly expressed in mESCs/miPSCs. This indicates that miPSCs have not efficiently silenced the expression pattern of the somatic cells from which they are derived and have failed to fully induce the genes that are highly expressed in mESCs. We further revealed a strong correlation between oocyte-enriched factors and insufficiently induced mESC-specific genes and identified 11 hub genes via network analysis. In light of these findings, we postulate that these key hub genes might not only drive somatic cell nuclear transfer (SCNT) reprogramming but also improve the efficiency and quality of miPSC reprogramming.

  3. Rough-fuzzy clustering for grouping functionally similar genes from microarray data.

    Science.gov (United States)

    Maji, Pradipta; Paul, Sushmita

    2013-01-01

    Gene expression data clustering is one of the important tasks of functional genomics, as it provides a powerful tool for studying the functional relationships of genes in a biological process. Identifying coexpressed groups of genes is the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm termed robust rough-fuzzy c-means is proposed, which judiciously integrates the merits of rough sets and fuzzy sets. While the lower and upper approximations of rough sets deal with uncertainty, vagueness, and incompleteness in cluster definition, the integration of the probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environments. The concepts of the possibilistic lower bound and the probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enable efficient selection of gene clusters. An efficient method is also proposed to select the initial prototypes of the gene clusters, which enables the proposed c-means algorithm to converge to an optimal or near-optimal solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.

  4. Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering

    Directory of Open Access Journals (Sweden)

    Markatou Marianthi

    2011-01-01

    Abstract. Background: The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation, but the signaling mechanisms between irradiated and non-irradiated bystander cells are not fully understood. In this study, we measured a time series of gene expression after α-particle irradiation and applied the Feature Based Partitioning around medoids Algorithm (FBPA), a new clustering method suitable for sparse time series, to identify signaling modules that act in concert in the response to direct irradiation and bystander signaling. We compared our results with those of an alternative clustering method, Short Time-series Expression Miner (STEM). Results: While computational evaluations of both clustering results were similar, FBPA provided more biological insight. After irradiation, gene clusters were enriched for signal transduction, cell cycle/cell death and inflammation/immunity processes, but only FBPA separated clusters by function. In bystanders, gene clusters were enriched for cell communication/motility, signal transduction and inflammation processes, but biological functions did not separate as clearly with either clustering method as they did in irradiated samples. Network analysis confirmed p53 and NF-κB transcription factor-regulated gene clusters in irradiated and bystander cells and suggested novel regulators, such as KDM5B/JARID1B (lysine (K)-specific demethylase 5B) and HDACs (histone deacetylases), which could epigenetically coordinate gene expression after irradiation. Conclusions: In this study, we have shown that a new time series clustering method, FBPA, can provide new leads to the mechanisms regulating the dynamic cellular response to radiation. The findings implicate epigenetic control of gene expression in addition to transcription factor networks.

  5. Gene ordering in partitive clustering using microarray expressions

    Indian Academy of Sciences (India)

    PRAKASH KUMAR

    the new hybrid approach, finds comparable or sometimes superior biological gene order in less computation time than those obtained by optimal leaf ordering in hierarchical clustering solution. Ray S S, Bandyopadhyay S and Pal S K 2007 Gene ordering in partitive clustering using microarray expressions; J. Biosci.

  6. Unique nucleotide polymorphism of ankyrin gene cluster in ...

    Indian Academy of Sciences (India)

    The ankyrin (ANK) gene cluster is a part of a multigene family encoding ANK transmembrane proteins in Arabidopsis thaliana, and plays an important role in protein–protein interactions and in signal pathways. In contrast to other regions of a genome, the ANK gene cluster exhibits an extremely high level of DNA ...

  7. An effective fuzzy kernel clustering analysis approach for gene expression data.

    Science.gov (United States)

    Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

    2015-01-01

    Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering to microarray gene expression data is the choice of parameters such as the number of clusters and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired number of clusters and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal number of clusters, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine the cluster centers. The corresponding steps of the improved SAM (ISAM) and of MDM are then given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis and that their clustering accuracy is higher than that of other related clustering algorithms.

  8. Gene expression analysis identifies global gene dosage sensitivity in cancer

    DEFF Research Database (Denmark)

    Fehrmann, Rudolf S. N.; Karjalainen, Juha M.; Krajewska, Malgorzata

    2015-01-01

    Many cancer-associated somatic copy number alterations (SCNAs) are known. Currently, one of the challenges is to identify the molecular downstream effects of these variants. Although several SCNAs are known to change gene expression levels, it is not clear whether each individual SCNA affects gen...

  9. Utility and Limitations of Using Gene Expression Data to Identify Functional Associations.

    Directory of Open Access Journals (Sweden)

    Sahra Uygun

    2016-12-01

    Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. Using Arabidopsis thaliana as an example and determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, we found that many genes in the same pathway do not have similar transcript profiles, and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly affects the ability to recover functional associations between genes. Some datasets are more informative in capturing coordinated expression profiles, and larger datasets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore multiple dataset combinations, similarity measures, clustering algorithms and parameters fairly exhaustively. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets.

  10. Clustering of Drosophila melanogaster immune genes in interplay with recombination rate.

    Directory of Open Access Journals (Sweden)

    K Mathias Wegner

    BACKGROUND: Gene order in eukaryotic chromosomes is not random and has been linked to coordination of gene expression, chromatin structure and also recombination rate. The evolution of recombination rate is especially relevant for genes involved in immunity, because host-parasite co-evolution could select for increased recombination rate (Red Queen hypothesis). To identify patterns left by the intimate interaction between hosts and parasites, I analysed the genomic parameters of the immune genes from 24 gene families/groups of Drosophila melanogaster. PRINCIPAL FINDINGS: Immune genes that directly interact with the pathogen (i.e. recognition and effector genes) clustered in regions of higher recombination rates. Of these, clustered effector genes were transcribed fastest, indicating that transcriptional control might be one major cause of cluster formation. The relative position of clusters to each other, on the other hand, cannot be explained by transcriptional control per se. Drosophila immune genes that show epistatic interactions are found at an average distance of 15.44 ± 2.98 cM, which is considerably closer than genes that do not interact (30.64 ± 1.95 cM). CONCLUSIONS: Epistatically interacting genes rarely belong to the same cluster, which supports recent models of optimal recombination rates between interacting genes in antagonistic host-parasite co-evolution. These patterns suggest that the formation of local clusters might be a result of transcriptional control, but that in the condensed genome of D. melanogaster the relative position of these clusters may be a result of selection for optimal rather than maximal recombination rates between them.

  11. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

    Directory of Open Access Journals (Sweden)

    Olszewski Kellen L

    2007-07-01

    Abstract. Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks: these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the

  12. Differential Retention of Gene Functions in a Secondary Metabolite Cluster.

    Science.gov (United States)

    Reynolds, Hannah T; Slot, Jason C; Divon, Hege H; Lysøe, Erik; Proctor, Robert H; Brown, Daren W

    2017-08-01

    In fungi, the distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers the ability to synthesize the SM depudecin, a histone deacetylase inhibitor that contributes weakly to virulence. The DEP cluster includes genes encoding enzymes, a transporter, and a transcription regulator. We investigated the distribution and evolution of the DEP cluster in 585 fungal genomes and found a wide but sporadic distribution among Dothideomycetes, Sordariomycetes, and Eurotiomycetes. We confirmed DEP gene expression and depudecin production in one fungus, Fusarium langsethiae. Phylogenetic analyses suggested 6-10 horizontal gene transfers (HGTs) of the cluster, including a transfer that led to the presence of closely related cluster homologs in Alternaria and Fusarium. The analyses also indicated that HGTs were frequently followed by loss/pseudogenization of one or more DEP genes. Independent cluster inactivation was inferred in at least four fungal classes. Analyses of transitions among functional, pseudogenized, and absent states of DEP genes among Fusarium species suggest that enzyme-encoding genes are lost at higher rates than the transporter (DEP3) and regulatory (DEP6) genes. The phenotype of an experimentally induced DEP3 mutant of Fusarium did not support the hypothesis that selective retention of DEP3 and DEP6 protects fungi from exogenous depudecin. Together, the results suggest that HGT and gene loss have contributed significantly to DEP cluster distribution, and that some DEP genes provide a greater fitness benefit, possibly due to a differential tendency to form network connections. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

  13. RNA-seq analysis identifies an intricate regulatory network controlling cluster root development in white lupin.

    Science.gov (United States)

    Secco, David; Shou, Huixia; Whelan, James; Berkowitz, Oliver

    2014-03-25

    Highly adapted plant species are able to alter their root architecture to improve nutrient uptake and thrive in environments with limited nutrient supply. Cluster roots (CRs) are specialised structures of dense lateral roots formed by several plant species for the effective mining of nutrient-rich soil patches through a combination of increased surface area and exudation of carboxylates. White lupin is becoming a model species allowing for the discovery of gene networks involved in CR development. A greater understanding of the underlying molecular mechanisms driving these developmental processes is important for generating smarter plants for a world with diminishing resources, to improve food security. RNA-seq analyses of three developmental stages of the CR formed under phosphorus-limited conditions and of two stages of non-cluster roots have been performed for white lupin. In total, 133,045,174 high-quality paired-end reads were used for a de novo assembly of the root transcriptome and merged with LAGI01 (Lupinus albus gene index) to generate an improved LAGI02 with 65,097 functionally annotated contigs. This was followed by comparative gene expression analysis. We show marked differences in the transcriptional response across the various cluster root stages to adjust to phosphate limitation by increasing uptake capacity and adjusting metabolic pathways. Several transcription factors, such as PLT, SCR, PHB, PHV or AUX/IAA, with a known role in the control of meristem activity and developmental processes show increased expression in the tip of the CR. Genes involved in hormonal responses (PIN, LAX, YUC) and cell cycle control (CYCA/B, CDK) are also differentially expressed. In addition, we identify primary transcripts of miRNAs with established functions in the root meristem. Our gene expression analysis shows an intricate network of transcription factors and plant hormones controlling CR initiation and formation. In addition, functional differences between the

  14. Identifying Coevolving Partners from Paralogous Gene Families

    Directory of Open Access Journals (Sweden)

    Chen-Hsiang Yeang

    2008-01-01

    Many methods have been developed to detect coevolution from aligned sequences. However, all existing methods require a one-to-one mapping of candidate coevolving partners (nucleotides, amino acids) a priori. When two families of sequences have distinct duplication and loss histories, finding the one-to-one mapping of coevolving partners can be computationally involved. We propose an algorithm to identify the coevolving partners from two families of sequences with distinct phylogenetic trees. The algorithm maps each gene tree to a reference species tree and builds a joint state of sequence composition and assignments of coevolving partners for each species tree node. By applying dynamic programming to the joint states, the optimal assignments can be identified. Time complexity is quadratic in the size of the species tree, and space complexity is exponential in the maximum number of gene tree nodes mapped to the same species tree node. Analysis of both simulated data and Pfam protein domain sequences demonstrates that the paralog coevolution algorithm picks up the coevolving partners with 60%–88% accuracy. This algorithm extends phylogeny-based coevolutionary models and makes them applicable to a wide range of problems, such as predicting protein-protein, protein-DNA and DNA-RNA interactions of two distinct families of sequences.

  15. Identifying Clusters with Mixture Models that Include Radial Velocity Observations

    Science.gov (United States)

    Czarnatowicz, Alexis; Ybarra, Jason E.

    2018-01-01

    The study of stellar clusters plays an integral role in the study of star formation. We present a cluster mixture model that considers radial velocity data in addition to spatial data. Maximum likelihood estimation through the Expectation-Maximization (EM) algorithm is used for parameter estimation. Our mixture model analysis can be used to distinguish adjacent or overlapping clusters and to estimate properties for each cluster. Work supported by awards from the Virginia Foundation for Independent Colleges (VFIC) Undergraduate Science Research Fellowship and The Research Experience @Bridgewater (TREB).

  16. High or low correlation between co-occuring gene clusters and 16S rRNA gene phylogeny.

    Science.gov (United States)

    Rudi, Knut; Sekelja, Monika

    2013-02-01

    Ribosomal RNA (rRNA) genes are universal to all living organisms. Yet the correspondence between genome composition and rRNA phylogeny remains poorly known. The aim of this study was to use the information in genome sequence databases to address the correlation between rRNA gene phylogeny and total gene composition in bacteria. This was done by analysing 327 genomes with TIGRFAM functional gene annotations. Our approach consisted of two steps. First, we searched for discriminatory clusters of co-occurring genes. Using a multivariate statistical approach, we identified 11 such clusters, which contain genes that co-occurred only in a subset of genomes and helped explain the gene content differences between genome subsets. Second, we mapped the discovered clusters to 16S rRNA-based phylogeny and calculated the correlation between co-occurring genes and phylogeny. Six of the 11 clusters exhibited significant correlation with 16S rRNA gene phylogeny. The most distinct phylogenetic finding was a high correlation between iron-sulfur oxidoreductases in combination with carbon-nitrogen ligases and Chlorobium. The other correlations identified covered relatively large phylogroups: Actinobacteria were positively associated with kinases, while Gammaproteobacteria were positively associated with methylases and acyltransferases. The suggested functional differences between higher phylogroups, however, need experimental verification. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  17. Gene prioritization and clustering by multi-view text mining.

    Science.gov (United States)

    Yu, Shi; Tranchevent, Leon-Charles; De Moor, Bart; Moreau, Yves

    2010-01-14

    Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Thus, incorporating more text mining models may help obtain more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model. We present a multi-view approach to retrieving biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered a view, and the resulting multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach performs significantly better than the other methods compared. In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.

  18. Progeny Clustering: A Method to Identify Biological Phenotypes

    Science.gov (United States)

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally computationally efficient, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method proved successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and the Rat CNS dataset) and two further biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods on the ten-dimensional synthetic dataset as well as on the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476

  19. An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information

    Directory of Open Access Journals (Sweden)

    Ao Li

    2009-04-01

    Motivation: Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However, direct interpretation or prediction of gene regulatory mechanisms may be difficult, as only gene expression data are used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus, a method that integrates gene expression and gene regulation information is desirable for clustering and analysis. Methods: By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV) as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three-dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach, the Automatic Boundary Searching algorithm (ABS), is introduced to automatically determine the boundary threshold. Results: Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows that genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of the defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated significant changes in TF activities during the different stages of yeast sporulation, and suggests that this approach might be a general way to study regulatory networks undergoing transformations.

  20. Salicin regulates the expression of functional 'youth gene clusters' to reflect a more youthful gene expression profile.

    Science.gov (United States)

    Gopaul, R; Knaggs, H E; Lephart, J

    2011-10-01

    There are a variety of biological mechanisms that contribute to specific characteristics of ageing skin; for example, the loss of skin structure proteins, increased susceptibility to UV-induced pigmentation and/or loss of hydration. Each of these biological processes is influenced by specific groups of genes. In this research, we have identified groups of genes associated with specific clinical signs of skin ageing and refer to these as functional 'youth gene clusters'. In this study, quantitative real-time polymerase chain reaction (qPCR) was used to investigate the effects of topical application of salicin in regulating the expression of functional 'youth gene clusters' to reflect a more youthful skin profile and reduce the appearance of attributes associated with skin ageing. Results showed that salicin significantly influences the gene expression profiles of treated human equivalent full-thickness skin, by regulating the expression of genes associated with various biological processes involving skin structure, skin hydration, pigmentation and cellular differentiation. Based on the findings from this experiment, salicin was identified as a key ingredient that may regulate functional 'youth gene clusters' to reflect a more youthful gene expression profile by increasing the expression of genes responsible for youthful skin and decreasing the expression of genes responsible for the appearance of aged skin. © 2011 The Authors. ICS © 2011 Society of Cosmetic Scientists and the Société Française de Cosmétologie.

  1. Cancer therapeutic target genes identified on chromosome 20q

    Directory of Open Access Journals (Sweden)

    Editorial Office

    2016-08-01

    Snijders and Mao explained that “when the selection pressure is removed, amplifications are not maintained and eventually disappear. Thus, amplifications focus on those genes that are important for tumor development.” Their analysis showed that, as tumor cells progress toward malignancy, DNA copy number plays a major role in raising the expression levels of the 18-gene signature on chromosome 20q. “Strong associations between the DNA copy number and gene expression were observed in the majority of tumor types,” the researchers said. “For example, the RAE1 expression was found to be significantly associated with DNA copy number in 20 tumor types,” the study reported. “Elevated DNA copy numbers of MMP9 and SULF2 were associated with increased gene expressions in only two and seven tumor types, respectively,” it added. Based on their integrated multi-omics analysis of genes on chromosome 20q, Snijders and Mao believe that the 18-gene signature could provide new molecular targets for cancer therapy. “Gene ontology analysis revealed significant enrichment of cell cycle and mitosis-related biological processes in our 18-gene [signature], suggesting that a cluster of functionally related genes localize to chromosome 20q,” they said. According to the researchers, identifying good targets such as these is a critical step in developing targeted therapies for cancer. Microarray and next-generation sequencing technologies have become invaluable tools for cataloging genomic abnormalities in human cancers and identifying new potential therapeutic targets, and the availability of large cancer genomic data sets allows unbiased approaches to identifying genes that are important in tumor progression, the study noted. “Here, we aggregated available cancer databases to identify cancer driver genes across tumor types by combining gene transcript and DNA copy number across chromosome 20q to

  2. Mini-clusters with mean probabilities for identifying effective siRNAs.

    Science.gov (United States)

    Xingang, Jia; Lu, Zuhong; Han, Qiuhong

    2012-09-18

    Distinguishing effective siRNAs from ineffective ones is in high demand for gene knockout technology. Many approaches have been proposed for designing effective siRNAs. These approaches attempt to classify siRNAs into effective and ineffective classes, but they have difficulty deciding the boundary between the two classes. Here, we split effective and ineffective siRNAs into many smaller subclasses using RMP-MiC (the relative mean probabilities of siRNAs with the mini-clusters algorithm). The relative mean probabilities of siRNAs are a modified arithmetic mean of three probabilities, which come from three Markov chains of effective siRNAs. The mini-clusters algorithm is a modified version of the micro-cluster algorithm. When RMP-MiC was applied to experimental siRNAs, all effective siRNAs were identified correctly, and no more than 9% of ineffective siRNAs were misidentified as effective. We observed that the efficiency of the misidentified ineffective siRNAs exceeded 70%, which is very close to the efficiency threshold used. From the analysis of the siRNA data, we suggest that the mini-clusters algorithm with relative mean probabilities can provide new insights into distinguishing effective siRNAs from ineffective ones.
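    The Markov-chain ingredient of this approach can be sketched in Python. The sketch below trains a first-order Markov chain on a set of RNA sequences, scores a new sequence by the arithmetic mean of its transition probabilities, and averages that score over several chains. The ACGU alphabet, the additive smoothing and the plain average across chains are assumptions; the paper's three specific chains and its "modified" arithmetic mean are not reproduced, and all function names are hypothetical.

```python
# Hypothetical sketch: alphabet and smoothing are assumptions, not the paper's settings.
ALPHABET = "ACGU"

def train_markov_chain(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current) with additive smoothing."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: c / total for b, c in row.items()}
    return model

def mean_transition_prob(seq, model):
    """Arithmetic mean of the transition probabilities observed along one sequence."""
    probs = [model[a][b] for a, b in zip(seq, seq[1:])]
    return sum(probs) / len(probs)

def mean_over_chains(seq, models):
    """Plain average of per-chain scores (the paper uses a modified mean, not shown here)."""
    return sum(mean_transition_prob(seq, m) for m in models) / len(models)
```

    For example, a chain trained with `train_markov_chain(["ACGUACGU", "ACGGACGU"])` can score a toy sequence via `mean_transition_prob("ACGU", chain)`; sequences resembling the training set receive higher scores, which is the property a clustering step could then exploit.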

  3. High presence/absence gene variability in defense-related gene clusters of Cucumis melo.

    Science.gov (United States)

    González, Víctor M; Aventín, Núria; Centeno, Emilio; Puigdomènech, Pere

    2013-11-12

    Changes in the copy number of DNA sequences are one of the main mechanisms generating genome variability in eukaryotes. These changes are often related to phenotypic effects such as genetic disorders or novel pathogen resistance. The increasing availability of genome sequences through the application of next-generation massive sequencing technologies has allowed the study of genomic polymorphisms at both the interspecific and intraspecific levels, thus helping to understand how species adapt to changing environments through genome variability. Data on gene presence/absence variation (PAV) in melon was obtained by resequencing a cultivated accession and an old-relative melon variety, and using previously obtained resequencing data from three other melon cultivars, among them DHL92, on which the current draft melon genome sequence is based. A total of 1,697 PAV events were detected, involving 4.4% of the predicted melon gene complement. In all, an average 1.5% of genes were absent from each analyzed cultivar as compared to the DHL92 reference genome. The most populated functional category among the 304 PAV genes of known function was that of stress response proteins (30% of all classified PAVs). Our results suggest that genes from multi-copy families are five times more likely to be affected by PAV than singleton genes. Also, the chance of genes present in the genome in tandem arrays being affected by PAV is double that of isolated genes, with PAV genes tending to be in longer clusters. The highest concentration of PAV events detected in the melon genome was found in a 1.1 Mb region of linkage group V, which also shows the highest density of melon stress-response genes. In particular, this region contains the longest continuous gene-containing PAV sequence so far identified in melon. The first genome-wide report of PAV variation among several melon cultivars is presented here. Multi-copy and clustered genes, especially those with putative stress-response functions

  4. A Single Gene Cluster for Chalcomycins and Aldgamycins: Genetic Basis for Bifurcation of Their Biosynthesis.

    Science.gov (United States)

    Tang, Xiao-Long; Dai, Ping; Gao, Hao; Wang, Chuan-Xi; Chen, Guo-Dong; Hong, Kui; Hu, Dan; Yao, Xin-Sheng

    2016-07-01

    Aldgamycins are 16-membered macrolide antibiotics with a rare branched-chain sugar, d-aldgarose or decarboxylated d-aldgarose, at C-5. In our efforts to clone the gene cluster for aldgamycins from a marine-derived Streptomyces sp. HK-2006-1 capable of producing both aldgamycins and chalcomycins, we found that both are biosynthesized from a single gene cluster. Whole-genome sequencing combined with gene disruption established the entire gene cluster of aldgamycins: nine new genes are incorporated with the previously identified chalcomycin gene cluster. Functional analysis of these genes revealed that almDI/almDII (encoding the α/β subunits of pyruvate dehydrogenase) trigger the biosynthesis of aldgamycins, whereas almCI (encoding an oxidoreductase) initiates chalcomycin biosynthesis. This is the first report that aldgamycins and chalcomycins are derived from a single gene cluster and of the genetic basis for bifurcation in their biosynthesis. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Intranuclear and higher-order chromatin organization of the major histone gene cluster in breast cancer.

    Science.gov (United States)

    Fritz, Andrew J; Ghule, Prachi N; Boyd, Joseph R; Tye, Coralee E; Page, Natalie A; Hong, Deli; Shirley, David J; Weinheimer, Adam S; Barutcu, Ahmet R; Gerrard, Diana L; Frietze, Seth; van Wijnen, Andre J; Zaidi, Sayyed K; Imbalzano, Anthony N; Lian, Jane B; Stein, Janet L; Stein, Gary S

    2018-02-01

    Alterations in nuclear morphology are common in cancer progression. However, the degree to which gross morphological abnormalities translate into compromised higher-order chromatin organization is poorly understood. To explore the functional links between gene expression and chromatin structure in breast cancer, we performed RNA-seq gene expression analysis on the basal breast cancer progression model based on human MCF10A cells. Positional gene enrichment identified the major histone gene cluster at chromosome 6p22 as one of the most significantly upregulated (and not amplified) clusters of genes from the normal-like MCF10A to premalignant MCF10AT1 and metastatic MCF10CA1a cells. This cluster is subdivided into three sub-clusters of histone genes that are organized into hierarchical topologically associating domains (TADs). Interestingly, the sub-clusters of histone genes are located at TAD boundaries and interact more frequently with each other than the regions in-between them, suggesting that the histone sub-clusters form an active chromatin hub. The anchor sites of loops within this hub are occupied by CTCF, a known chromatin organizer. These histone genes are transcribed and processed at a specific sub-nuclear microenvironment termed the major histone locus body (HLB). While the overall chromatin structure of the major HLB is maintained across breast cancer progression, we detected alterations in its structure that may relate to gene expression. Importantly, breast tumor specimens also exhibit a coordinate pattern of upregulation across the major histone gene cluster. Our results provide a novel insight into the connection between the higher-order chromatin organization of the major HLB and its regulation during breast cancer progression. © 2017 Wiley Periodicals, Inc.

  6. A genome-wide analysis of nonribosomal peptide synthetase gene clusters and their peptides in a Planktothrix rubescens strain

    Directory of Open Access Journals (Sweden)

    Nederbragt Alexander J

    2009-08-01

    Background: Cyanobacteria often produce several different oligopeptides, with unknown biological functions, by nonribosomal peptide synthetases (NRPS). Although some cyanobacterial NRPS gene cluster types are well described, the entire NRPS genomic content within a single cyanobacterial strain has never been investigated. Here we have combined a genome-wide analysis using massively parallel pyrosequencing ("454") and mass spectrometry screening of oligopeptides produced in the strain Planktothrix rubescens NIVA CYA 98 in order to identify all putative gene clusters for oligopeptides. Results: Thirteen types of oligopeptides were uncovered by mass spectrometry (MS) analyses. Microcystin, cyanopeptolin and aeruginosin synthetases, highly similar to already characterized NRPS, were present in the genome. Two novel NRPS gene clusters were associated with production of anabaenopeptins and microginins, respectively. Sequence depth of the genome and real-time PCR data revealed three copies of the microginin gene cluster. NRPS gene cluster candidates for microviridin and oscillatorin synthesis could not be found; instead, putative gene-encoded precursor peptide sequences for microviridin and oscillatorin were found in the genes mdnA and oscA, respectively. The genes flanking the microviridin and oscillatorin precursor genes encode putative modifying enzymes of the precursor oligopeptides. We therefore propose ribosomal pathways involving modifications and cyclisation for microviridin and oscillatorin. The microviridin, anabaenopeptin and cyanopeptolin gene clusters are situated in close proximity to each other, constituting an oligopeptide island. Conclusion: Altogether, seven nonribosomal peptide synthetase (NRPS) gene clusters and two gene clusters putatively encoding ribosomal oligopeptide biosynthetic pathways were revealed. Our results demonstrate that whole genome shotgun sequencing combined with MS-directed determination of oligopeptides successfully

  7. Combining affinity propagation clustering and mutual information network to investigate key genes in fibroid.

    Science.gov (United States)

    Chen, Qian-Song; Wang, Dan; Liu, Bao-Lian; Gao, Shu-Feng; Gao, Dan-Li; Li, Gui-Rong

    2017-07-01

    The aim of the present study was to investigate key genes in fibroids based on the multiple affinity propagation-Krzanowski and Lai (mAP-KL) method, which includes the maxT multiple hypothesis testing procedure, the Krzanowski and Lai (KL) cluster quality index, the affinity propagation (AP) clustering algorithm and a mutual information network (MIN) constructed by the context likelihood of relatedness (CLR) algorithm. To achieve this goal, mAP-KL was first implemented to identify exemplars in fibroid: the maxT function was employed to rank the genes of the training and test sets, and the top 200 genes were retained for further study. The KL cluster index was then applied to determine the number of clusters, and the AP clustering algorithm was used to identify the clusters and their exemplars. Subsequently, a support vector machine (SVM) model was used to evaluate the classification performance of mAP-KL. Finally, topological properties (degree, closeness, betweenness and transitivity) of the exemplars in the MIN constructed with the CLR algorithm were assessed to identify key genes in fibroid. The SVM model validated that mAP-KL performed well in classifying normal controls versus fibroid patients. A total of 9 clusters and exemplars were identified by mAP-KL, comprising CALCOCO2, COL4A2, COPS8, SNCG, PA2G4, C17orf70, MARK3, BTNL3 and TBC1D13. Topological analysis of the exemplars in the MIN identified SNCG and COL4A2 as the two most significant genes across the four topological measures, and they were denoted as key genes in the progression of fibroid. In conclusion, two key genes (SNCG and COL4A2) and 9 exemplars were identified, and these may be potential biomarkers for the detection and treatment of fibroid.
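    Affinity propagation, the clustering step inside mAP-KL, can be sketched in Python with NumPy as below. This is a minimal implementation of the standard responsibility/availability message passing of Frey and Dueck, assuming a precomputed similarity matrix (for example negative squared Euclidean distance between gene profiles) and a preference equal to the median similarity. It omits the maxT ranking, the KL index and convergence checks; the damping and iteration defaults are assumptions, and in practice a library implementation such as scikit-learn's `AffinityPropagation` would normally be preferred.

```python
import numpy as np

def affinity_propagation(S, preference=None, damping=0.5, max_iter=300):
    """Affinity propagation on an n x n similarity matrix S.

    Returns (exemplar indices, labels), where each label is the index of the
    point's exemplar. The preference defaults to the median off-diagonal similarity.
    """
    S = np.array(S, dtype=float)  # copy so the caller's matrix is untouched
    n = S.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    np.fill_diagonal(S, np.median(S[off_diag]) if preference is None else preference)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)));
        # a(k,k) = sum_{i' != k} max(0, r(i',k)) without the min
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag_a = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag_a)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        raise ValueError("no exemplars found; try more iterations or a higher preference")
    # Assign each point to its most similar exemplar; exemplars label themselves.
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels
```

    A design point worth noting is that AP does not take the number of clusters as input; it is governed by the preference, which is why mAP-KL pairs it with a separate cluster quality index (KL) to settle on a cluster count.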

  8. Teaching Gene Technology in an Outreach Lab: Students' Assigned Cognitive Load Clusters and the Clusters' Relationships to Learner Characteristics, Laboratory Variables, and Cognitive Achievement

    Science.gov (United States)

    Scharfenberg, Franz-Josef; Bogner, Franz X.

    2013-01-01

    This study classified students into different cognitive load (CL) groups by means of cluster analysis based on the CL they experienced in a gene technology outreach lab whose instruction was designed with regard to CL theory. The relationships of the identified student CL clusters to learner characteristics, laboratory variables, and…

  9. iBBiG: iterative binary bi-clustering of gene sets.

    Science.gov (United States)

    Gusenleitner, Daniel; Howe, Eleanor A; Bentink, Stefan; Quackenbush, John; Culhane, Aedín C

    2012-10-01

    Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but it is an emerging field with few established statistical methods. We transform gene expression profiles into binary gene set profiles by discretizing the results of gene set enrichment analyses and apply a new iterative binary bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics datasets that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to a meta-analysis of breast cancer studies, where iBBiG extracted novel gene set-phenotype associations that predicted tumor metastases within tumor subtypes. iBBiG is implemented in the Bioconductor package iBBiG. Contact: aedin@jimmy.harvard.edu.
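    The discretization step described above, which turns enrichment results into a binary gene set by phenotype matrix, can be sketched in Python with NumPy. Thresholding enrichment p-values at alpha = 0.05 and scoring a candidate bi-cluster by its density of 1s are assumptions made for illustration; iBBiG's own scoring function and search procedure are not reproduced, and both function names are hypothetical.

```python
import numpy as np

def binarize_enrichment(pvalues, alpha=0.05):
    """Gene sets x phenotypes matrix of enrichment p-values -> 1 where p < alpha, else 0."""
    return (np.asarray(pvalues, dtype=float) < alpha).astype(np.int8)

def bicluster_density(binary, rows, cols):
    """Fraction of 1s in the submatrix selected by the given gene sets (rows) and phenotypes (cols)."""
    sub = np.asarray(binary)[np.ix_(rows, cols)]
    return float(sub.mean())
```

    A bi-clustering algorithm working on such a matrix searches for groups of rows and columns whose submatrix is dense in 1s, that is, gene sets that are jointly enriched across the same group of phenotypes.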

  10. Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

    Science.gov (United States)

    Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance genes if

  11. Gene expression profiling in cluster headache: a pilot microarray study.

    Science.gov (United States)

    Sjöstrand, Christina; Duvefelt, Kristina; Steinberg, Anna; Remahl, Ingela Nilsson; Waldenlind, Elisabet; Hillert, Jan

    2006-01-01

    Cluster headache (CH) is a primary neurovascular headache disorder characterized by attacks of excruciating pain accompanied by ipsilateral autonomic symptoms. CH pathophysiology is presumed to involve activation of the hypothalamic and trigeminovascular systems, but inflammation and immunological mechanisms have also been hypothesized to be of importance. The aim of this study was to identify differentially expressed genes during different clinical phases of CH, on the assumption that changes of pathophysiological importance would also be seen in peripheral venous blood. Blood samples were drawn on 3 consecutive occasions from 3 episodic CH patients (during attacks, between attacks and in remission) and on 1 occasion from 3 matched controls. Global gene expression was analyzed with microarray technology using the Affymetrix Human Genome U133 2.0 Plus GeneChip Set, covering more than 54,000 gene transcripts corresponding to almost 22,000 genes. S100P gene expression was analyzed by quantitative RT-PCR in 6 patients and 14 controls. Overall, differences were quite small within individuals and large between individuals. However, pairwise comparisons of signal values showed upregulation of several S100 calcium-binding proteins, namely S100A8 (calgranulin A), S100A12 (calgranulin C) and S100P, during the active phase of the disease compared with remission. Annexin A3 (calcium-binding) and ICAM3 were also upregulated. BIRC1 (neuronal apoptosis inhibitory protein), CREB5, HLA-DQA1 and HLA-DQB1 were upregulated in patients compared with controls. The upregulation of S100P during attacks versus remission was confirmed by quantitative RT-PCR analysis. The S100A8 and S100A12 proteins are considered markers of non-infectious inflammatory disease, while the function of S100P is still largely unknown. Furthermore, upregulation of HLA-DQ genes in CH patients may also indicate an inflammatory response. Upregulation of these pro-inflammatory genes during the active phase of CH has not previously been reported.
Data

  12. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchical clustering method, the k-means clustering method, and the self-organizing map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
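    The idea of clustering data around local maxima of a magnitude property can be sketched in Python with NumPy. The sketch assumes that a point's neighbours are the points within a fixed Euclidean radius and that each point climbs to its highest-magnitude neighbour until it reaches a local maximum, which then names the cluster. The radius rule and the function name are assumptions rather than the published LMC definitions; the magnitude itself could be, for example, an expression level or a variance, depending on the research question.

```python
import numpy as np

def local_maximum_clustering(X, magnitude, radius):
    """Label each point by the local maximum of `magnitude` reached by hill-climbing.

    X: (n, d) coordinates; magnitude: (n,) values; radius: neighbourhood size (assumed rule).
    """
    X = np.asarray(X, dtype=float)
    magnitude = np.asarray(magnitude, dtype=float)
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each point points to its highest-magnitude neighbour (itself included).
    # Ties go to the lowest index, so pointer chains cannot cycle.
    parent = np.empty(n, dtype=int)
    for i in range(n):
        nbrs = np.flatnonzero(dists[i] <= radius)
        parent[i] = nbrs[np.argmax(magnitude[nbrs])]
    # Follow pointers until a fixed point (a local maximum) is reached.
    labels = np.empty(n, dtype=int)
    for i in range(n):
        j = i
        while parent[j] != j:
            j = parent[j]
        labels[i] = j
    return labels
```

    Because clusters form around peaks of the chosen magnitude rather than around centroids, the number of clusters is not fixed in advance; it equals the number of local maxima under the chosen neighbourhood rule.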

  13. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes

    Directory of Open Access Journals (Sweden)

    Datta Somnath

    2006-08-01

    algorithm's ability to produce biologically meaningful clusters when applied repeatedly to similar data sets. A good clustering algorithm should have high BHI and moderate to high BSI. We evaluated the performance of ten well-known clustering algorithms on two gene expression data sets and identified the optimal algorithm in each case. The first data set deals with SAGE profiles of differentially expressed tags between normal and ductal carcinoma in situ samples of breast cancer patients. The second data set contains the expression profiles over time of positively expressed genes (ORFs) during sporulation of budding yeast. Two separate choices of the functional classes were used for this data set and the results were compared for consistency. Conclusion: Functional information about annotated genes, available from various GO databases and mined using ontology tools, can be used to systematically judge the results of an unsupervised clustering algorithm applied to a gene expression data set to cluster genes. This information could be used to select the right algorithm from a class of clustering algorithms for a given data set.
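    The biological homogeneity index (BHI) mentioned above scores, for each cluster, the proportion of pairs of annotated genes that share a functional class, and then averages over clusters. The Python sketch below assumes annotations are given as a set of class labels per gene, treats any shared class as a match, and skips clusters with fewer than two annotated genes; those conventions and the function name are assumptions rather than the paper's exact definition.

```python
from itertools import combinations

def biological_homogeneity_index(labels, annotations):
    """labels: gene -> cluster id; annotations: gene -> set of functional classes."""
    clusters = {}
    for gene, cluster in labels.items():
        if annotations.get(gene):  # unannotated genes do not contribute
            clusters.setdefault(cluster, []).append(gene)
    scores = []
    for genes in clusters.values():
        if len(genes) < 2:  # no pairs to compare (assumed convention)
            continue
        pairs = list(combinations(genes, 2))
        shared = sum(1 for a, b in pairs if annotations[a] & annotations[b])
        scores.append(shared / len(pairs))
    # Average of the per-cluster homogeneity scores.
    return sum(scores) / len(scores) if scores else 0.0
```

    Higher values mean that genes placed together tend to share functional annotations, which is the sense in which a good algorithm "should have high BHI".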

  14. Structural and functional characterization of three polyketide synthase gene clusters in Bacillus amyloliquefaciens FZB 42.

    Science.gov (United States)

    Chen, Xiao-Hua; Vater, Joachim; Piel, Jörn; Franke, Peter; Scholz, Romy; Schneider, Kathrin; Koumoutsi, Alexandra; Hitzeroth, Gabriele; Grammel, Nicolas; Strittmatter, Axel W; Gottschalk, Gerhard; Süssmuth, Roderich D; Borriss, Rainer

    2006-06-01

    Although bacterial polyketides are of considerable biomedical interest, the molecular biology of polyketide biosynthesis in Bacillus spp., one of the richest bacterial sources of bioactive natural products, remains largely unexplored. Here we assign for the first time complete polyketide synthase (PKS) gene clusters to Bacillus antibiotics. Three giant modular PKS systems of the trans-acyltransferase type were identified in Bacillus amyloliquefaciens FZB 42. One of them, pks1, is an ortholog of the pksX operon with a previously unknown function in the sequenced model strain Bacillus subtilis 168, while the pks2 and pks3 clusters are novel gene clusters. Cassette mutagenesis combined with advanced mass spectrometric techniques such as matrix-assisted laser desorption ionization-time of flight mass spectrometry and liquid chromatography-electrospray ionization mass spectrometry revealed that the pks1 (bae) and pks3 (dif) gene clusters encode the biosynthesis of the polyene antibiotics bacillaene and difficidin or oxydifficidin, respectively. In addition, B. subtilis OKB105 (pheA sfp(0)), a transformant of the B. subtilis 168 derivative JH642, was shown to produce bacillaene, demonstrating that the pksX gene cluster directs the synthesis of that polyketide. The GenBank accession numbers for gene clusters pks1(bae), pks2, and pks3(dif) are AJ 634060.2, AJ 6340601.2, and AJ 6340602.2, respectively.

  15. The CHRNA5-A3-B4 Gene Cluster and Smoking: From Discovery to Therapeutics.

    Science.gov (United States)

    Lassi, Glenda; Taylor, Amy E; Timpson, Nicholas J; Kenny, Paul J; Mather, Robert J; Eisen, Tim; Munafò, Marcus R

    2016-12-01

    Genome-wide association studies (GWASs) have identified associations between the CHRNA5-CHRNA3-CHRNB4 gene cluster and smoking heaviness and nicotine dependence. Studies in rodents have described the anatomical localisation and function of the nicotinic acetylcholine receptors (nAChRs) formed by the subunits encoded by this gene cluster. Further investigations that complemented these studies highlighted the variability of individuals' smoking behaviours and their ability to adjust nicotine intake. GWASs of smoking-related health outcomes have also identified this signal in the CHRNA5-CHRNA3-CHRNB4 gene cluster. This insight underpins approaches to strengthen causal inference in observational data. Combining genetic and mechanistic studies of nicotine dependence and smoking heaviness may reveal novel targets for medication development. Validated targets can inform genetic therapeutic interventions for smoking cessation and tobacco-related diseases. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  16. Identifying molecular subtypes in human colon cancer using gene expression and DNA methylation microarray data.

    Science.gov (United States)

    Ren, Zhonglu; Wang, Wenhui; Li, Jinming

    2016-02-01

    Identifying colon cancer subtypes based on molecular signatures may allow for a more rational, patient-specific approach to therapy in the future. Classifications using gene expression data have been attempted before, with little concordance between the different studies. In this study we aimed to uncover subtypes of colon cancer that have distinct biological characteristics and to identify a set of novel biomarkers that best reflect the clinical and/or biological characteristics of each subtype. Clustering analysis and discriminant analysis were used to discover subtypes at two different molecular levels in 153 colon cancer samples from The Cancer Genome Atlas (TCGA) Data Portal. At the gene expression level, we identified two major subtypes, ECL1 (expression cluster 1) and ECL2 (expression cluster 2), and a list of signature genes. Owing to the heterogeneity of colon cancer, subtype ECL1 can be further subdivided into three nested subclasses, and HOTAIR was found to be upregulated in subclass 2. At the DNA methylation level, we uncovered three major subtypes, MCL1 (methylation cluster 1), MCL2 (methylation cluster 2) and MCL3 (methylation cluster 3). We found only three subtypes of CpG island methylator phenotype (CIMP) in colon cancer instead of the four subtypes in previous reports, and we found no sufficient evidence to subdivide MCL3 into two distinct subgroups.

  17. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics-based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, including the sterigmatocystin biosynthesis pathway commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters that were possibly acquired by horizontal gene transfer were identified in Aspergillus for the first time, including the vrt cluster responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, and far fewer with other Aspergilli such as A. niger and A. oryzae. These findings should help in understanding the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic.

  18. Identifying novel genes contributing to asthma pathogenesis

    NARCIS (Netherlands)

    Holloway, John W.; Koppelman, Gerard H.

    Purpose of review To illustrate recent examples of novel asthma genes such as those encoding G-protein-coupled receptor for asthma susceptibility, filaggrin and tenascin-C, and to describe the process that is needed to translate these findings to the clinic. Recent findings Many hundreds of studies

  19. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The inherent high-dimension, low-sample-size nature of gene expression data makes the classification task more challenging. Therefore, feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for a classifier not only decreases computational time and cost but also improves classification performance. However, most existing feature selection approaches suffer from problems such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated with four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and a percentage; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); applying our feature selection methodology yielded 1117 genes discriminating the two classes of leukemia (Fig. 15b). Applying the same method with more stringent conditions (a higher positive and a lower negative threshold) reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between the two classes of leukemia.
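    The cluster-level filtering rule described above can be sketched in Python with NumPy. The sketch assumes expression values are centred (for example log-ratios), sums them over the genes in each cluster for every sample, and keeps a cluster when the fraction of samples whose summed value lies above a positive threshold or below a negative threshold reaches a minimum percentage. The abstract does not spell out the exact conditions, so this rule and the parameter names `pos_thresh`, `neg_thresh` and `min_fraction` are hypothetical.

```python
import numpy as np

def select_gene_clusters(expr, gene_labels, pos_thresh, neg_thresh, min_fraction):
    """Return sorted indices of genes whose cluster passes the (hypothetical) threshold rule.

    expr: genes x samples matrix of centred expression values.
    gene_labels: cluster id for each gene (e.g. from a prior gene clustering step).
    """
    expr = np.asarray(expr, dtype=float)
    gene_labels = np.asarray(gene_labels)
    keep = []
    for c in np.unique(gene_labels):
        members = np.flatnonzero(gene_labels == c)
        summed = expr[members].sum(axis=0)  # one summed value per sample
        extreme = (summed > pos_thresh) | (summed < neg_thresh)
        if extreme.mean() >= min_fraction:  # share of samples with a strong cluster signal
            keep.extend(members.tolist())
    return sorted(keep)
```

    Raising `pos_thresh` and lowering `neg_thresh` makes the rule more stringent and shrinks the selected gene list, mirroring the reduction from 1117 to 58 genes reported above.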

  20. CAR gene cluster and transcript levels of carotenogenic genes in Rhodotorula mucilaginosa.

    Science.gov (United States)

    Landolfo, Sara; Ianiri, Giuseppe; Camiolo, Salvatore; Porceddu, Andrea; Mulas, Giuliana; Chessa, Rossella; Zara, Giacomo; Mannazzu, Ilaria

    2018-01-01

    A molecular approach was applied to the study of the carotenoid biosynthetic pathway of Rhodotorula mucilaginosa. First, functional annotation of the genome of R. mucilaginosa C2.5t1 was carried out and gene ontology categories were assigned to 4033 predicted proteins. Then, a set of genes involved in different steps of carotenogenesis was identified, and those coding for phytoene desaturase, phytoene synthase/lycopene cyclase and carotenoid dioxygenase (CAR genes) proved to be clustered within a region of ~10 kb. Quantitative PCR of the genes involved in carotenoid biosynthesis showed that genes coding for 3-hydroxy-3-methylglutaryl-CoA reductase and mevalonate kinase are induced during exponential phase, while no clear trend of induction was observed for the phytoene synthase/lycopene cyclase and phytoene dehydrogenase encoding genes. Thus, in R. mucilaginosa the induction of genes involved in the early steps of carotenoid biosynthesis is transient and accompanies the onset of carotenoid production, while that of CAR genes does not correlate with the amount of carotenoids produced. The transcript levels of genes coding for carotenoid dioxygenase, superoxide dismutase and catalase A increased during the accumulation of carotenoids, thus suggesting the activation of a mechanism aimed at the protection of cell structures from oxidative stress during carotenoid biosynthesis. The data presented herein, besides being suitable for the elucidation of the mechanisms that underlie carotenoid biosynthesis, will contribute to boosting the biotechnological potential of this yeast by improving the outcome of further research efforts aimed at exploring other features of interest as well.

  1. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    Science.gov (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease or to provide expression profiles of gene sets for individuals. We developed an application called IGSA that combines gene set enrichment analysis with sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy that groups samples according to the progress of disease from mild to severe. We evaluated the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared the KEGG pathway enrichment results of IGSA with those of DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering under different similarity measures. Notably, IGSA proved more sensitive and specific in finding significant pathways and can indicate pathway changes related to disease severity. In addition, IGSA provides a profile of significant gene sets for each sample.
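
    IGSA's own scoring function is not described in this abstract. As a hedged illustration of what a per-sample gene set expression score can look like, the sketch below uses one common, simple choice, which is our assumption: z-score each gene across samples, then average the z-scores of the set's genes within each sample. The function names and the gene identifiers (`geneA`, `geneB`, `geneC`) are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscore_rows(expr: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Standardize each gene across samples (population SD; constant genes become 0)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def gene_set_scores(expr: Dict[str, Sequence[float]], gene_set: List[str]) -> List[float]:
    """Average z-score of the set's measured genes in each sample."""
    z = zscore_rows(expr)
    present = [g for g in gene_set if g in z]  # genes absent from the data are skipped
    n_samples = len(z[present[0]])
    return [mean(z[g][s] for g in present) for s in range(n_samples)]


# Invented toy data: three samples ordered from "mild" to "severe".
expr = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 4.0, 6.0], "geneC": [5.0, 5.0, 5.0]}
print(gene_set_scores(expr, ["geneA", "geneB", "NOT_MEASURED"]))
```

    A score vector like this is what a sample-clustering step could then group; how IGSA actually accumulates samples into sets is not reproduced here.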

  2. Mining Association Rules among Gene Functions in Clusters of Similar Gene Expression Maps.

    Science.gov (United States)

    An, Li; Obradovic, Zoran; Smith, Desmond; Bodenreider, Olivier; Megalooikonomou, Vasileios

    2009-11-01

    Association rule mining methods have recently been applied to gene expression data analysis to reveal relationships between genes and different conditions and features. However, little effort has focused on detecting the relation between gene expression maps and related gene functions. Here we describe such an approach to mine association rules among gene functions in clusters of similar gene expression maps of the mouse brain. The experimental results show that the detected association rules are biologically meaningful. By inspecting the obtained clusters and the genes having the gene functions of frequent itemsets, we discovered interesting clues that provide valuable insight to biological scientists. Moreover, the discovered association rules can potentially be used to predict gene functions based on the similarity of gene expression maps.
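
    As a hedged illustration of the support and confidence measures behind association rule mining, the sketch below treats each cluster of similar expression maps as one "transaction" whose items are the gene functions annotated to its genes, and enumerates rules between pairs of functions. The toy clusters, function labels, thresholds and the name `mine_pair_rules` are invented; the abstract does not specify its mining algorithm or cutoffs.

```python
from itertools import combinations
from typing import List, Set, Tuple


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions (clusters) that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_pair_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Tuple[str, str, float, float]]:
    """Rules X -> Y between single functions that pass both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(X -> Y) = support(X and Y) / support(X)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


# Invented toy clusters, each reduced to the set of functions of its genes.
clusters = [
    {"synaptic transmission", "ion transport"},
    {"synaptic transmission", "ion transport", "cell adhesion"},
    {"cell adhesion"},
    {"synaptic transmission"},
]
for lhs, rhs, s, c in mine_pair_rules(clusters, min_support=0.5, min_confidence=0.6):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

    Real miners (for example Apriori-style algorithms) extend this to larger itemsets by pruning any candidate whose subsets are already infrequent.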

  3. Identification of biosynthetic gene clusters from metagenomic libraries using PPTase complementation in a Streptomyces host.

    Science.gov (United States)

    Bitok, J Kipchirchir; Lemetre, Christophe; Ternei, Melinda A; Brady, Sean F

    2017-09-01

    The majority of environmental bacteria are not readily cultured in the lab, leaving the natural products they make inaccessible using culture-dependent discovery methods. Cloning and heterologous expression of DNA extracted from environmental samples (environmental DNA, eDNA) provides a means of circumventing this discovery bottleneck. To facilitate the identification of clones containing biosynthetic gene clusters, we developed a model heterologous expression reporter strain, Streptomyces albus::bpsA ΔPPTase. This strain carries a 4′-phosphopantetheinyl transferase (PPTase)-dependent blue pigment synthase A gene, bpsA, in a PPTase deletion background. eDNA clones that express a functional PPTase restore production of the blue pigment indigoidine. As PPTase genes often occur in biosynthetic gene clusters (BGCs), indigoidine production can be used to identify eDNA clones containing BGCs. We screened a soil eDNA library hosted in S. albus::bpsA ΔPPTase and identified clones containing non-ribosomal peptide synthetase (NRPS), polyketide synthase (PKS) and mixed NRPS/PKS biosynthetic gene clusters. One NRPS gene cluster was shown to confer the production of myxochelin A to S. albus::bpsA ΔPPTase. © FEMS 2017.

  4. Shared gene structures and clusters of mutually exclusive spliced exons within the metazoan muscle myosin heavy chain genes.

    Directory of Open Access Journals (Sweden)

    Martin Kollmar

    Multicellular animals possess two to three different types of muscle tissues. Striated muscles have considerable ultrastructural similarity and contain a core set of proteins including the muscle myosin heavy chain (Mhc) protein. The ATPase activity of this myosin motor protein largely dictates muscle performance at the molecular level. Two different solutions to adjusting myosin properties to different muscle subtypes have been identified so far: vertebrates and nematodes contain many independent, differentially expressed Mhc genes, while arthropods have single Mhc genes with clusters of mutually exclusive spliced exons (MXEs). The availability of hundreds of metazoan genomes now allowed us to study whether the ancient bilateria already contained MXEs, how MXE complexity subsequently evolved, and whether additional scenarios to control contractile properties in different muscles could be proposed. By reconstructing the Mhc genes from 116 metazoans, we showed that all intron positions within the motor domain coding regions are conserved in all bilateria analysed. The last common ancestor of the bilateria already contained a cluster of MXEs coding for part of the loop-2 actin-binding sequence. Subsequently, the protostomes and later the arthropods gained many further clusters, while MXEs were completely lost independently in several branches (vertebrates and nematodes) and species (for example the annelid Helobdella robusta and the salmon louse Lepeophtheirus salmonis). Several bilateria have been found to encode multiple Mhc genes that might all or in part contain clusters of MXEs. Notable examples are a cluster of six tandemly arrayed Mhc genes, of which two contain MXEs, in the owl limpet Lottia gigantea, and four Mhc genes with three encoding MXEs in the predatory mite Metaseiulus occidentalis. Our analysis showed that similar solutions to provide different myosin isoforms (multiple genes, clusters of MXEs, or both) have independently been developed

  5. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Background: Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we used an in silico subtraction methodology and focused our attention on genes that are organized in genomic clusters. Results: In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain on chromosome 9; a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1) on chromosome 12; a cluster on chromosome 14 composed of a SPErm-associated glutamate (E)-rich (Speer) protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis; a cluster on chromosome 19 composed of the oocyte secreted protein-1 (Oosp-1) gene and two Oosp-related genes, all three characterized by a partial N-terminal zona pellucida-like domain; and another small cluster of two genes, also on chromosome 19, composed of a TWIK-related spinal cord K+ channel-encoding gene and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, we showed that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion: We have studied five clusters of genes specifically expressed in female germ cells, some of them also being expressed in male germ cells. Moreover, unlike non-clustered oocyte-specific genes, those organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte clusters in rodents could constitute an evolutionary advantage. Understanding the biological

  6. Fine Mapping of Two Wheat Powdery Mildew Resistance Genes Located at the Pm1 Cluster

    Directory of Open Access Journals (Sweden)

    Junchao Liang

    2016-07-01

    Powdery mildew, caused by Blumeria graminis (DC.) f. sp. tritici, is a globally devastating foliar disease of wheat (Triticum aestivum L.). More than a dozen genes against this disease, identified from wheat germplasms of different ploidy levels, have been mapped to the region surrounding the Pm1 locus on the long arm of chromosome 7A, which forms a resistance (R-)gene cluster. […] and […] from einkorn wheat (Triticum monococcum L.) were two of the genes belonging to this cluster. This study was initiated to fine map these two genes toward map-based cloning. A comparative genomics study showed that macrocolinearity exists between Brachypodium distachyon L. chromosome 1 (Bd1) and the […]–[…] region, which allowed us to develop markers based on the wheat sequences orthologous to genes contained in the Bd1 region. With these and other newly developed and published markers, high-resolution maps were constructed for both genes using large F[…] populations. Moreover, a physical map of […] was constructed through chromosome walking with bacterial artificial chromosome (BAC) clones and comparative mapping. Eventually, the two genes were restricted to 0.12- and 0.86-cM intervals, respectively. Based on the closely linked common markers, […], […], […] and […] (another powdery mildew resistance gene in the cluster) were not allelic to one another. Severe recombination suppression and disruption of synteny were noted in the region encompassing […]. These results provide useful information for map-based cloning of the genes in the cluster and for interpretation of their evolution.
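
    For readers less familiar with fine mapping, the sketch below shows the standard arithmetic that turns recombinant counts into a genetic distance in centimorgans (cM), using Haldane's mapping function d = -50 ln(1 - 2r). It assumes the recombination fraction r is simply recombinant gametes divided by gametes scored; the population size and recombinant count are invented, and the study's own estimation procedure may differ.

```python
import math


def recombination_fraction(recombinant_gametes: int, total_gametes: int) -> float:
    """Observed recombination fraction between two markers."""
    return recombinant_gametes / total_gametes


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM: d = -50 * ln(1 - 2r), defined for 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Invented example: 1,000 plants scored as 2,000 gametes, with 3 recombinant
# gametes between two flanking markers.
r = recombination_fraction(3, 2000)
print(f"r = {r:.4f}; Haldane = {haldane_cm(r):.4f} cM; 100*r = {100 * r:.4f} cM")
```

    At the sub-centimorgan intervals reported above, the mapping-function correction is negligible and the distance is essentially 100·r; the correction matters only for loosely linked markers.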

  7. The gentle art of gene arrangement: the meaning of gene clusters

    Science.gov (United States)

    Trowsdale, John

    2002-01-01

    Genome sequence comparisons reveal that some sets of genes are in similar linkage groups in different organisms, while other sets are dispersed. Are some linkage groups maintained by chance, or is there an advantage to such an arrangement? Some insights may come from large clusters of genes, such as the major histocompatibility complex, which includes many genes involved in immune defense. PMID:11897017

  8. The gentle art of gene arrangement: the meaning of gene clusters

    OpenAIRE

    Trowsdale, John

    2002-01-01

    Genome sequence comparisons reveal that some sets of genes are in similar linkage groups in different organisms, while other sets are dispersed. Are some linkage groups maintained by chance, or is there an advantage to such an arrangement? Some insights may come from large clusters of genes, such as the major histocompatibility complex, which includes many genes involved in immune defense.

  9. Cloning and Heterologous Expression of the Grecocycline Biosynthetic Gene Cluster.

    Directory of Open Access Journals (Sweden)

    Oksana Bilyk

    Transformation-associated recombination (TAR) in yeast is a rapid and inexpensive method for cloning and assembling large DNA fragments that relies on natural homologous recombination. Two vectors based on p15a and F-factor replicons, which can be maintained in yeast, E. coli and streptomycetes, have been constructed. These vectors have been successfully employed for assembly of the grecocycline biosynthetic gene cluster from Streptomyces sp. Acta 1362. Fragments of the cluster were obtained by PCR and transformed together with the "capture" vector into yeast cells, yielding a construct carrying the entire gene cluster. The obtained construct was heterologously expressed in S. albus J1074, yielding several grecocycline congeners. Grecocyclines have unique structural moieties such as a disaccharide side chain, an additional amino sugar at the C-5 position and a thiol group. Enzymes from this pathway may be used for the derivatization of known active angucyclines in order to improve their desired biological properties.

  10. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters

    NARCIS (Netherlands)

    Cimermancic, P.; Medema, Marnix; Claesen, J.; Kurika, K.; Wieland Brown, L.C.; Mavrommatis, K.; Pati, A.; Godfrey, P.A.; Koehrsen, M.; Clardy, J.; Birren, B. W.; Takano, Eriko; Sali, A.; Linington, R.G.; Fischbach, M.A.

    2014-01-01

    Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the

  11. Accurate prediction of secondary metabolite gene clusters in filamentous fungi

    DEFF Research Database (Denmark)

    Andersen, Mikael Rørdam; Nielsen, Jakob Blæsbjerg; Klitgaard, Andreas

    2013-01-01

    supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent...
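
    The guilt-by-association idea can be sketched as follows, with the caveat that the study's actual statistics are not given in this excerpt. Our assumptions: genes are listed in chromosomal order, each has an expression profile over the same conditions, and the predicted cluster grows outward from the key synthase while each next neighbour's Pearson correlation with the synthase stays at or above a hypothetical cutoff `min_r`. The walk stops at the first neighbour below the cutoff on each side. Profiles and names are invented.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def predict_cluster(profiles: List[Sequence[float]], anchor: int, min_r: float) -> Tuple[int, int]:
    """Return (start, end) gene indices of the predicted cluster around the anchor."""
    start = end = anchor
    while start > 0 and pearson(profiles[anchor], profiles[start - 1]) >= min_r:
        start -= 1
    while end < len(profiles) - 1 and pearson(profiles[anchor], profiles[end + 1]) >= min_r:
        end += 1
    return start, end


# Invented neighbourhood in chromosomal order; index 2 is the key synthase.
profiles = [
    [5.0, 1.0, 4.0, 2.0],  # gene 0: unrelated
    [1.1, 2.0, 3.2, 4.1],  # gene 1: co-varies with the synthase
    [1.0, 2.0, 3.0, 4.0],  # gene 2: key synthase (anchor)
    [0.9, 2.1, 2.9, 4.2],  # gene 3: co-varies with the synthase
    [4.0, 3.0, 2.0, 1.0],  # gene 4: anti-correlated
]
print(predict_cluster(profiles, anchor=2, min_r=0.9))
```

    A stricter cutoff shrinks the prediction toward the anchor alone. A real compendium-based analysis would also need to account for noise and for neighbouring genes that are co-expressed by chance.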

  12. The fimbria gene cluster of nonencapsulated Haemophilus influenzae

    NARCIS (Netherlands)

    Geluk, F.; Eijk, P. P.; van Ham, S. M.; Jansen, H. M.; van Alphen, L.

    1998-01-01

    The occurrence of fimbria gene clusters in nonencapsulated Haemophilus influenzae strains from chronic bronchitis patients (n = 58), patients with acute otitis media (n = 13), and healthy carriers (n = 12) was determined by DNA hybridization and PCR, based on sequences of fimbriate H. influenzae

  13. The ergot alkaloid gene cluster: Functional analyses and evolutionary aspects

    Czech Academy of Sciences Publication Activity Database

    Lorenz, N.; Haarmann, T.; Pažoutová, Sylvie; Jung, M.; Tudzynski, P.

    2009-01-01

    Vol. 70, no. 15-16 (2009), pp. 1822-1832. ISSN 0031-9422. Institutional research plan: CEZ:AV0Z50200510. Keywords: Claviceps purpurea; ergot fungus; ergot alkaloid gene cluster. Subject RIV: EE - Microbiology, Virology. Impact factor: 3.104, year: 2009

  14. Unique nucleotide polymorphism of ankyrin gene cluster in ...

    Indian Academy of Sciences (India)

    Genomics 19, 478–493. Krumlauf R. 1992 Evolution of the vertebrate Hox homeobox genes. BioEssays 14, 245–252. Kuittinen H. and Aguadé M. 2000 Nucleotide variation at the CHALCONE ISOMERASE locus in Arabidopsis thaliana. Genetics 155, 863–872. Lercher M. J., Urrutia A. O. and Hurst L. D. 2002 Clustering of.

  15. Gene ordering in partitive clustering using microarray expressions

    Indian Academy of Sciences (India)

    2007-06-28

    Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are hybridized individually with a self-organizing map (SOM) to show the importance of gene ordering in a partitive clustering framework. We validated our hybrid approach using yeast and ...
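
    FRAG_GALK and Concorde are not reproduced here. To show why ordering the genes within a cluster is a travelling-salesman-style problem, the sketch below substitutes a much simpler greedy nearest-neighbour heuristic, which is our choice and carries no optimality guarantee. Starting from one gene, it repeatedly appends the unvisited gene whose expression profile is closest in Euclidean distance, so similar profiles end up adjacent. The gene names and profiles are invented.

```python
import math
from typing import Dict, List, Sequence


def nearest_neighbour_order(profiles: Dict[str, Sequence[float]], start: str) -> List[str]:
    """Greedy open-path ordering: always step to the closest unvisited gene."""
    order = [start]
    unvisited = set(profiles) - {start}
    while unvisited:
        last = profiles[order[-1]]
        # Tie-break on the gene name so the result is deterministic.
        nxt = min(unvisited, key=lambda g: (math.dist(last, profiles[g]), g))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def path_length(profiles: Dict[str, Sequence[float]], order: List[str]) -> float:
    """Total distance between consecutive genes; lower means a smoother ordering."""
    return sum(math.dist(profiles[a], profiles[b]) for a, b in zip(order, order[1:]))


# Invented two-condition profiles for one cluster.
cluster = {"g1": [0.0, 0.0], "g2": [5.0, 5.0], "g3": [1.0, 1.0], "g4": [4.0, 4.0]}
order = nearest_neighbour_order(cluster, "g1")
print(order, round(path_length(cluster, order), 3))
```

    Exact or near-exact TSP solvers such as Concorde minimize the same kind of path length, but over all possible orderings rather than one greedy pass.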

  16. A remarkably stable TipE gene cluster: evolution of insect Para sodium channel auxiliary subunits

    Directory of Open Access Journals (Sweden)

    Li Jia

    2011-11-01

    Background: First identified in fruit flies with temperature-sensitive paralysis phenotypes, the Drosophila melanogaster TipE locus encodes four voltage-gated sodium (NaV) channel auxiliary subunits. This cluster of TipE-like genes on chromosome 3L, and a fifth family member on chromosome 3R, are important for the optimal expression and functionality of the Para NaV channel but appear quite distinct from auxiliary subunits in vertebrates. Here, we exploited available arthropod genomic resources to trace the origin of TipE-like genes by mapping their evolutionary histories and examining their genomic architectures. Results: We identified a remarkably conserved synteny block of TipE-like orthologues with well-maintained local gene arrangements from 21 insect species. Homologues in the water flea, Daphnia pulex, suggest an ancestral pancrustacean repertoire of four TipE-like genes; a subsequent gene duplication may have generated functional redundancy allowing gene losses in the silk moth and mosquitoes. Intronic nesting of the insect TipE gene cluster probably occurred following the divergence from crustaceans, but in the flour beetle and silk moth genomes the clusters apparently escaped from nesting. Across Pancrustacea, TipE gene family members have experienced intronic nesting, escape from nesting, retrotransposition, translocation, and gene loss events while generally maintaining their local gene neighbourhoods. D. melanogaster TipE-like genes exhibit coordinated spatial and temporal regulation of expression distinct from their host gene but well correlated with their regulatory target, the Para NaV channel, suggesting that functional constraints may preserve the TipE gene cluster. We identified homology between TipE-like NaV channel regulators and vertebrate Slo-beta auxiliary subunits of big-conductance calcium-activated potassium (BKCa) channels, which suggests that ion channel regulatory partners have evolved distinct lineage

  17. The Fusarium graminearum genome reveals more secondary metabolite gene clusters and hints of horizontal gene transfer.

    Directory of Open Access Journals (Sweden)

    Christian M K Sieber

    Fungal secondary metabolite biosynthesis genes are of major interest due to the pharmacological properties of their products (like mycotoxins and antibiotics). The genome of the plant pathogenic fungus Fusarium graminearum codes for a large number of candidate enzymes involved in secondary metabolite biosynthesis. However, the chemical nature of most enzymatic products of proteins encoded by putative secondary metabolism biosynthetic genes is largely unknown. Based on our analysis, we present 67 gene clusters with significant enrichment of predicted secondary metabolism-related enzymatic functions. Twenty gene clusters with unknown metabolites exhibit strong gene expression correlation in planta and presumably play a role in virulence. Furthermore, the identification of conserved and over-represented putative transcription factor binding sites serves as additional evidence for cluster co-regulation. An orthologous cluster search provided insight into the evolution of secondary metabolism clusters. Some clusters are characteristic of the genus Fusarium, while others show evidence of horizontal gene transfer, as orthologs can be found in representatives of the Botrytis or Cochliobolus lineage. The presented candidate clusters provide valuable targets for experimental examination.
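
    The abstract does not say which test underlies "significant enrichment". A hypergeometric (one-sided Fisher) test is a common choice for this kind of question, and the sketch below shows it under that assumption. Suppose there are N genes in the genome, K of them with a predicted secondary-metabolism function, and a candidate cluster of n genes containing k such genes. The function returns P(X >= k) under random sampling. The name `enrichment_p` and all numbers in the example are invented.

```python
from math import comb


def enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k), X ~ Hypergeom(N, K, n)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Invented example: 13,000 genes, 400 annotated as secondary-metabolism related,
# and a 10-gene window in which 5 carry such annotations.
p = enrichment_p(13_000, 400, 10, 5)
print(f"p = {p:.2e}")
```

    When many candidate windows are scanned genome-wide, the resulting p-values would also need a multiple-testing correction.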

  18. PEACE: Parallel Environment for Assembly and Clustering of Gene Expression.

    Science.gov (United States)

    Rao, D M; Moler, J C; Ozden, M; Zhang, Y; Liang, C; Karro, J E

    2010-07-01

    We present PEACE, a stand-alone tool for high-throughput ab initio clustering of transcript fragment sequences produced by Next Generation or Sanger Sequencing technologies. It is freely available from www.peace-tools.org. Installed and managed through a downloadable, user-friendly graphical user interface (GUI), PEACE can process large data sets of transcript fragments of length 50 bases or greater, grouping the fragments by gene associations with a sensitivity comparable to leading clustering tools. Once the fragments are clustered, users can employ the GUI's analysis functions, which facilitate the easy collection of statistics and allow users to single out specific clusters for more comprehensive study or assembly. Using a novel minimum spanning tree-based clustering method, PEACE is the equal of leading tools in the literature, with an interface making it accessible to any user. It produces results of quality virtually identical to those of the WCD tool when applied to Sanger sequences, significantly improved results over WCD and TGICL when applied to the products of Next Generation Sequencing Technology, and significantly improved results over Cap3 in both cases. In short, PEACE provides an intuitive GUI and a feature-rich, parallel clustering engine that proves to be a valuable addition to the leading cDNA clustering tools.
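
    PEACE's exact distance function and edge-cutting rule are not described in this abstract. The sketch below illustrates the general minimum-spanning-tree idea under our own assumptions. The distance between two fragments is a simple k-mer (Jaccard) distance. The tree is built with Kruskal's algorithm over a union-find structure. MST edges longer than a hypothetical `max_dist` are then cut, so each remaining connected component is a cluster, which is equivalent to single-linkage clustering at that cutoff. The fragments are invented.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (assumes len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 4) -> float:
    """1 - Jaccard similarity of the two k-mer sets (0.0 = same k-mer content)."""
    ka, kb = kmers(a, k), kmers(b, k)
    return 1.0 - len(ka & kb) / len(ka | kb)


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the sets of a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def mst_clusters(seqs: List[str], max_dist: float, k: int = 4) -> List[List[int]]:
    """Kruskal MST over all pairs, then cut edges longer than max_dist."""
    edges = sorted(
        (kmer_distance(seqs[i], seqs[j], k), i, j)
        for i, j in combinations(range(len(seqs)), 2)
    )
    # 1) Kruskal: keep an edge only if it joins two different trees.
    forest = UnionFind(len(seqs))
    mst = []
    for w, i, j in edges:
        if forest.union(i, j):
            mst.append((w, i, j))
    # 2) Cut long MST edges; the remaining components are the clusters.
    groups = UnionFind(len(seqs))
    for w, i, j in mst:
        if w <= max_dist:
            groups.union(i, j)
    clusters: Dict[int, List[int]] = {}
    for idx in range(len(seqs)):
        clusters.setdefault(groups.find(idx), []).append(idx)
    return list(clusters.values())


fragments = [
    "ACGTACGTTAGCCGATAGGCT",
    "ACGTACGTTAGCCGATAGGCA",  # one mismatch from fragment 0
    "TTTTGGGGCCCCAAAATTTTG",
    "TTTTGGGGCCCCAAAATTTTC",  # one mismatch from fragment 2
]
print(mst_clusters(fragments, max_dist=0.5))
```

    Computing all pairwise distances is quadratic in the number of fragments, so production tools typically add filtering heuristics to avoid comparing clearly unrelated pairs.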

  19. DNACLUST: accurate and efficient clustering of phylogenetic marker genes

    Directory of Open Access Journals (Sweden)

    Liu Bo

    2011-06-01

    Background: Clustering is a fundamental operation in the analysis of biological sequence data. New DNA sequencing technologies have dramatically increased the rate at which we can generate data, resulting in datasets that cannot be efficiently analyzed by traditional clustering methods. This is particularly true in the context of taxonomic profiling of microbial communities through direct sequencing of phylogenetic markers (e.g. 16S rRNA), the domain that motivated the work described in this paper. Many analysis approaches rely on an initial clustering step aimed at identifying sequences that belong to the same operational taxonomic unit (OTU). When defining OTUs (which have no universally accepted definition), scientists must balance a trade-off between computational efficiency and biological accuracy, as accurately estimating an environment's phylogenetic composition requires computationally intensive analyses. We propose that efficient and mathematically well-defined clustering methods can benefit existing taxonomic profiling approaches in two ways: (i) the resulting clusters can be substituted for OTUs in certain applications; and (ii) the clustering effectively reduces the size of the data sets that need to be analyzed by complex phylogenetic pipelines (e.g., only one sequence per cluster needs to be provided to downstream analyses). Results: To address the challenges outlined above, we developed DNACLUST, a fast clustering tool specifically designed for clustering highly similar DNA sequences. Given a set of sequences and a sequence similarity threshold, DNACLUST creates clusters whose radius is guaranteed not to exceed the specified threshold. Underlying DNACLUST is a greedy clustering strategy that owes its performance to novel sequence alignment and k-mer based filtering algorithms. DNACLUST can also produce multiple sequence alignments for every cluster, allowing users to manually inspect clustering results, and enabling more
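
    The greedy, radius-bounded strategy described above can be sketched as follows. This is not DNACLUST's code. It omits the k-mer filter and the specialized alignment that give the tool its speed, and it uses plain Levenshtein distance divided by the longer sequence length as a stand-in similarity measure. Processing sequences longest first is also our assumption. Each sequence joins the first existing center within the similarity threshold, otherwise it becomes a new center, so every member lies within the radius of its own center. The sequences are invented.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance using a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def greedy_cluster(seqs: List[str], similarity: float) -> Dict[str, List[str]]:
    """Assign each sequence to the first center within the similarity threshold."""
    clusters: Dict[str, List[str]] = {}
    for s in sorted(seqs, key=len, reverse=True):
        for center in clusters:
            dist = levenshtein(center, s) / max(len(center), len(s))
            if 1.0 - dist >= similarity:
                clusters[center].append(s)
                break
        else:
            clusters[s] = [s]  # no center is close enough: s starts a new cluster
    return clusters


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTGCATTGCA", "ACGTACGTA"]
print(greedy_cluster(seqs, similarity=0.85))
```

    Because only cluster centers are compared against, the cost grows with the number of clusters rather than with all sequence pairs, which is what makes greedy center-based schemes attractive for large marker-gene data sets.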

  20. Comparative genomics of natural killer cell receptor gene clusters.

    Directory of Open Access Journals (Sweden)

    James Kelley

    2005-08-01

    Many receptors on natural killer (NK) cells recognize major histocompatibility complex class I molecules in order to monitor unhealthy tissues, such as cells infected with viruses, and some tumors. Genes encoding families of NK receptors and related sequences are organized into two main clusters in humans: the natural killer complex on Chromosome 12p13.1, which encodes C-type lectin molecules, and the leukocyte receptor complex on Chromosome 19q13.4, which encodes immunoglobulin superfamily molecules. The composition of these gene clusters differs markedly between closely related species, providing evidence for rapid, lineage-specific expansions or contractions of sets of loci. The choice of NK receptor genes is polarized in the two species most studied, mouse and human. In mouse, the C-type lectin-related Ly49 gene family predominates. Conversely, the single Ly49 sequence in humans is a pseudogene, and the immunoglobulin superfamily KIR gene family is extensive. These different gene sets encode proteins that are comparable in function and genetic diversity, even though they have undergone species-specific expansions. Understanding the biological significance of this curious situation may be aided by studying which NK receptor genes are used in other vertebrates, especially in relation to species-specific differences in genes for major histocompatibility complex class I molecules.

  1. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    Science.gov (United States)

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  2. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Kautsar, Satria A.; Suarez Duran, Hernando G.; Blin, Kai

    2017-01-01

    Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied to 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental...
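
    plantiSMASH's actual coexpression statistics are not given in this excerpt. As a hedged sketch of the general idea of coexpression-based prioritization, the code below scores each candidate cluster by the mean pairwise Pearson correlation of its genes' expression profiles and ranks candidates by that score. The cluster names, gene identifiers and expression values are invented, and `coexpression_score` and `rank_candidates` are hypothetical helpers.

```python
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_score(genes: List[str], expr: Dict[str, Sequence[float]]) -> float:
    """Mean pairwise Pearson r among a candidate cluster's genes (needs >= 2 genes)."""
    return mean(pearson(expr[a], expr[b]) for a, b in combinations(genes, 2))


def rank_candidates(candidates: Dict[str, List[str]], expr: Dict[str, Sequence[float]]) -> List[str]:
    """Candidate cluster names sorted from most to least coexpressed."""
    return sorted(candidates, key=lambda c: coexpression_score(candidates[c], expr), reverse=True)


# Invented expression values across four tissues/conditions.
expr = {
    "geneA": [1.0, 3.0, 5.0, 7.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [0.5, 1.5, 2.5, 3.5],
    "geneD": [4.0, 1.0, 3.0, 2.0],
    "geneE": [1.0, 4.0, 2.0, 3.0],
}
candidates = {"BGC_1": ["geneA", "geneB", "geneC"], "BGC_2": ["geneC", "geneD", "geneE"]}
print(rank_candidates(candidates, expr))
```

    Mean pairwise correlation is only one possible summary; a tool could just as well weight correlations to a core enzyme or compare against a background of random gene groups.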

  3. A novel method incorporating gene ontology information for unsupervised clustering and feature selection.

    Directory of Open Access Journals (Sweden)

    Shireesh Srivastava

    Among the primary goals of microarray analysis is the identification of genes that could distinguish between different phenotypes (feature selection). Previous studies indicate that incorporating prior information about gene function could help identify physiologically relevant features. However, current methods that incorporate prior functional information do not provide a relative estimate of the effect of different genes on the biological processes of interest. Here, we present a method that integrates gene ontology (GO) information and expression data using Bayesian regression mixture models to perform unsupervised clustering of the samples and identify physiologically relevant discriminating features. As a model application, the method was applied to identify the genes that play a role in the cytotoxic responses of the human hepatoblastoma cell line HepG2 to saturated fatty acid (SFA) and tumor necrosis factor alpha (TNF-alpha), as compared to the non-toxic response to unsaturated free fatty acids (UFA) and TNF-alpha. Incorporation of prior knowledge led to better discrimination of the toxic phenotypes from the others. The model identified roles of lysosomal ATPases and adenylate cyclase (AC9) in the toxicity of palmitate. To validate the role of AC in palmitate-treated cells, we measured the intracellular levels of cyclic AMP (cAMP). The cAMP levels were found to be significantly reduced by palmitate treatment and not by the other FFAs, in accordance with the model's selection of AC9. A framework is presented that incorporates prior ontology information, which helped to (a) perform unsupervised clustering of the phenotypes and (b) identify the genes relevant to each cluster of phenotypes. We demonstrate the proposed framework by applying it to identify physiologically relevant feature genes that conferred differential toxicity to saturated vs. unsaturated FFAs. The framework can be applied to other problems to efficiently integrate ontology information and

  4. A method to identify differential expression profiles of time-course gene data with Fourier transformation.

    Science.gov (United States)

    Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong

    2013-10-18

    Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization.The key elements of the proposed methodology: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. 
The proposed method is general and can be
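    The Fourier-domain representation and screening steps can be illustrated with a minimal pure-Python sketch. It assumes equally spaced time points and uses a simple threshold on the energy of the non-constant Fourier coefficients as a stand-in screen; the authors' screen accounts for autocorrelation and controls the FDR, which this sketch does not attempt. The function names and the `threshold` value are hypothetical.

```python
import cmath
import math


def fourier_coefficients(profile, n_freq):
    """Return the first n_freq discrete Fourier coefficients of a profile.

    The profile is assumed to be sampled at equally spaced time points.
    Coefficient 0 is the mean level; higher indices capture oscillations.
    """
    n = len(profile)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(profile)) / n
        for k in range(n_freq)
    ]


def non_constant_energy(coeffs):
    """Sum of squared magnitudes of all non-zero-frequency coefficients."""
    return sum(abs(c) ** 2 for c in coeffs[1:])


def screen_profiles(profiles, n_freq=4, threshold=0.05):
    """Keep genes whose profile varies over time, together with their coefficients.

    Hypothetical energy-threshold screen standing in for the
    autocorrelation-aware, FDR-controlled screen described in the abstract.
    """
    kept = {}
    for gene, profile in profiles.items():
        coeffs = fourier_coefficients(profile, n_freq)
        if non_constant_energy(coeffs) > threshold:
            kept[gene] = coeffs
    return kept


if __name__ == "__main__":
    times = range(12)
    toy_profiles = {
        "cyclic": [math.sin(2 * math.pi * t / 12) for t in times],
        "flat": [1.0 for _ in times],
    }
    print(sorted(screen_profiles(toy_profiles)))  # ['cyclic']
```

    The coefficient vectors that survive the screen are what a model-based clustering step would then operate on in the Fourier domain.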

  5. Gene expression analysis to identify molecular correlates of pre- and post-conditioning derived neuroprotection.

    Science.gov (United States)

    Prasad, Shiv S; Russell, Marsha; Nowakowska, Margeryta; Williams, Andrew; Yauk, Carole

    2012-06-01

    Mild ischaemic exposures before or after severe injurious ischaemia that elicit neuroprotective responses are referred to as preconditioning and post-conditioning. The corresponding molecular mechanisms of neuroprotection are not completely understood. Identification of the genes and associated pathways of corresponding neuroprotection would provide insight into neuronal survival, potential therapeutic approaches and assessments of therapies for stroke. The objectives of this study were to use a global gene expression approach to infer the molecular mechanisms in pre- and post-conditioning-derived neuroprotection in cortical neurons following oxygen and glucose deprivation (OGD) in vitro, and then to apply these findings to predict corresponding functional pathways. To this end, microarray analysis was applied to rat cortical neurons with or without the pre- and post-conditioning treatments at 3 h post-reperfusion, and differentially expressed transcripts were subjected to statistical, hierarchical clustering and pathway analyses. The expression patterns of 3,431 genes were altered under all conditions of ischaemia (with and without pre- or post-conditioning). We identified 1,595 genes that were commonly regulated within both the pre- and post-conditioning treatments. Cluster analysis revealed that transcription profiles clustered tightly within controls, non-conditioned OGD and neuroprotected groups. Two clusters defining neuroprotective conditions associated with up- and downregulated genes were evident. The five most upregulated genes within the neuroprotective clusters were Tagln, Nes, Ptrf, Vim and Adamts9, and the five most downregulated genes were Slc7a3, Bex1, Brunol4, Nrxn3 and Cpne4.
Pathway analysis revealed that the intracellular and second messenger signalling pathways in addition to cell death were predominantly associated with downregulated pre- and post-conditioning associated genes, suggesting that modulation of cell death and signal transduction pathways

  6. Sequencing and transcriptional analysis of the biosynthesis gene cluster of abscisic acid-producing Botrytis cinerea.

    Science.gov (United States)

    Gong, Tao; Shu, Dan; Yang, Jie; Ding, Zhong-Tao; Tan, Hong

    2014-09-29

    Botrytis cinerea is a model species of great importance as a plant pathogen and is also used for the biotechnological production of ABA. The ABA cluster of B. cinerea is composed of an open reading frame without significant similarities (bcaba3), followed by the genes (bcaba1 and bcaba2) encoding P450 monooxygenases and a gene probably coding for a short-chain dehydrogenase/reductase (bcaba4). In B. cinerea ATCC58025, targeted inactivation of the genes in the cluster suggested that at least three genes are responsible for hydroxylation at carbon atoms C-1' and C-4' or oxidation at C-4' of ABA. Our group has identified an ABA-overproducing strain, B. cinerea TB-3-H8. To differentiate TB-3-H8 from other B. cinerea strains with a functional ABA cluster, the DNA sequence of the 12.11-kb region containing the cluster of B. cinerea TB-3-H8 was determined. Full-length cDNAs were also isolated for bcaba1, bcaba2, bcaba3 and bcaba4 from B. cinerea TB-3-H8. Sequence comparison of the four genes and their flanking regions, derived respectively from B. cinerea TB-3-H8, B05.10 and T4, revealed that the major variations were located in intergenic sequences. In B. cinerea TB-3-H8, the expression profiles of the four functional genes under ABA high-yield conditions were also analyzed by real-time PCR.

  7. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging) to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically importa...

  8. Characterisation of the paralytic shellfish toxin biosynthesis gene clusters in Anabaena circinalis AWQC131C and Aphanizomenon sp. NH-5

    Directory of Open Access Journals (Sweden)

    Neilan Brett A

    2009-03-01

    Background: Saxitoxin and its analogues, collectively known as the paralytic shellfish toxins (PSTs), are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared, uncommon metabolic pathway include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. Results: We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. Conclusion: The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The organizational structure and sequence similarity of the gene clusters seem to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event.
The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved

  9. Characterisation of the paralytic shellfish toxin biosynthesis gene clusters in Anabaena circinalis AWQC131C and Aphanizomenon sp. NH-5.

    Science.gov (United States)

    Mihali, Troco K; Kellmann, Ralf; Neilan, Brett A

    2009-03-30

    Saxitoxin and its analogues, collectively known as the paralytic shellfish toxins (PSTs), are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared, uncommon metabolic pathway include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The organizational structure and sequence similarity of the gene clusters seem to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event.
The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved in the biosynthesis, may also afford the identification of

  10. Genome-wide identification of physically clustered genes suggests chromatin-level co-regulation in male reproductive development in Arabidopsis thaliana.

    Science.gov (United States)

    Reimegård, Johan; Kundu, Snehangshu; Pendle, Ali; Irish, Vivian F; Shaw, Peter; Nakayama, Naomi; Sundström, Jens F; Emanuelsson, Olof

    2017-04-07

    Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
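    The idea of scanning a chromosome for runs of co-regulated neighbouring genes can be sketched as follows. This is not the authors' in-silico tool; it assumes genes on a single chromosome, a pre-computed set of co-regulated gene ids, and hypothetical `min_size` and `max_gap` parameters.

```python
def physical_clusters(genes, regulated, min_size=3, max_gap=1):
    """Find runs of co-regulated genes that are neighbours on a chromosome.

    genes: (gene_id, position) pairs, assumed to lie on one chromosome.
    regulated: set of gene ids called co-regulated in the expression data.
    min_size: minimum number of regulated genes per cluster (hypothetical).
    max_gap: consecutive non-regulated genes tolerated inside a cluster
             (hypothetical).
    """
    ordered = sorted(genes, key=lambda g: g[1])  # order along the chromosome
    clusters, current, gap = [], [], 0
    for gene_id, _position in ordered:
        if gene_id in regulated:
            current.append(gene_id)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:
                # Too many intervening genes: close the current run.
                if len(current) >= min_size:
                    clusters.append(current)
                current, gap = [], 0
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    toy_genes = [(f"g{i}", i) for i in range(1, 11)]
    print(physical_clusters(toy_genes, {"g2", "g3", "g5", "g9"}))
    # [['g2', 'g3', 'g5']]
```

    Allowing a small gap reflects the observation above that clustered genes need not be strictly adjacent; tightening `max_gap` to 0 would demand contiguous runs.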

  11. Incorporating gene ontology into fuzzy relational clustering of microarray gene expression data.

    Science.gov (United States)

    Paul, Animesh Kumar; Shill, Pintu Chandra

    2018-01-01

    The products of gene expression work together in the cell of each living organism to achieve different biological processes. Many proteins play different roles depending on the environment of the organism for the functioning of the cell. In this paper, we propose a gene ontology (GO) annotation-based semi-supervised clustering algorithm called GO fuzzy relational clustering (GO-FRC), in which a gene may be assigned to multiple clusters, reflecting the most biologically relevant behavior of genes. In the clustering process, GO-FRC uses biological knowledge available in the form of a gene ontology as prior knowledge, along with the gene expression data. The prior knowledge helps to improve the coherence of the groups with respect to the knowledge field. The proposed GO-FRC has been tested on two yeast (Saccharomyces cerevisiae) expression profile datasets (the Eisen and Dream5 yeast datasets) and compared with other state-of-the-art clustering algorithms. Experimental results imply that GO-FRC is able to produce more biologically relevant clusters using a small amount of GO annotation. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Cloning of the biosynthetic gene cluster for naphthoxanthene antibiotic FD-594 from Streptomyces sp. TA-0256.

    Science.gov (United States)

    Kudo, Fumitaka; Yonezawa, Takanori; Komatsubara, Akiko; Mizoue, Kazutoshi; Eguchi, Tadashi

    2011-01-01

    FD-594 is a unique pyrano[4',3':6,7]naphtho[1,2-b]xanthene polyketide with a trisaccharide of 2,6-dideoxysugars. In this study, we cloned the FD-594 biosynthetic gene cluster from the producer strain Streptomyces sp. TA-0256 to investigate its biosynthesis. The identified pnx gene cluster was 38,143 bp, consisting of 40 open reading frames, including a minimal PKS gene, TDP-olivose biosynthetic genes, two glycosyltransferase genes, two methyltransferase genes and many oxygenase/reductase genes. Most of the enzymes encoded in the pnx cluster were reasonably assigned to a plausible biosynthetic pathway for FD-594, in which a unique ring-opening process via Baeyer-Villiger-type oxidation, catalyzed by a putative flavin adenine dinucleotide (FAD)-dependent monooxygenase, is speculated to lead to the unique xanthene structure. To clarify the involvement of pnx genes in FD-594 biosynthesis, a glycosyltransferase, PnxGT2, and a methyltransferase, PnxMT2, were characterized enzymatically using recombinant proteins expressed in Escherichia coli. As a result, PnxGT2 catalyzed the triple olivose transfers to the FD-594 aglycon with TDP-olivose as the glycosyl donor to afford triolivoside. Surprisingly, tetraolivoside and pentaolivoside were also detected in significant amounts in the PnxGT2 enzymatic reaction, along with the expected triolivoside. To our knowledge, PnxGT2 is the first contiguous oligosaccharide-forming glycosyltransferase in secondary metabolism. Furthermore, addition of PnxMT2 and S-adenosyl-L-methionine to the PnxGT2 reaction mixture afforded natural FD-594, confirming that the PnxGT2 reaction product was the expected regiospecifically glycosylated compound. Consequently, the identified pnx gene cluster appears to be involved in FD-594 biosynthesis.

  13. Identifying Cancer Driver Genes Using Replication-Incompetent Retroviral Vectors

    Directory of Open Access Journals (Sweden)

    Victor M. Bii

    2016-10-01

    Identifying novel genes that drive tumor metastasis and drug resistance has significant potential to improve patient outcomes. High-throughput sequencing approaches have identified cancer genes, but distinguishing driver genes from passengers remains challenging. Insertional mutagenesis screens using replication-incompetent retroviral vectors have emerged as a powerful tool to identify cancer genes. Unlike replicating retroviruses and transposons, replication-incompetent retroviral vectors lack additional mutagenesis events that can complicate the identification of driver mutations from passenger mutations. They can also be used for almost any human cancer due to the broad tropism of the vectors. Replication-incompetent retroviral vectors have the ability to dysregulate nearby cancer genes via several mechanisms, including enhancer-mediated activation of gene promoters. The integrated provirus acts as a unique molecular tag for nearby candidate driver genes, which can be rapidly identified using well-established methods that utilize next-generation sequencing and bioinformatics programs. Recently, retroviral vector screens have been used to efficiently identify candidate driver genes in prostate, breast, liver and pancreatic cancers. Validated driver genes can be potential therapeutic targets and biomarkers. In this review, we describe the emergence of retroviral insertional mutagenesis screens using replication-incompetent retroviral vectors as a novel tool to identify cancer driver genes in different cancer types.

  14. Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification.

    Science.gov (United States)

    Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra; Maulik, Ujjwal

    2010-11-12

    With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which aids in the diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are simultaneously optimized. The resultant set of near-Pareto-optimal solutions contains a number of non-dominated solutions. A novel approach is proposed to combine the clustering information possessed by the non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results obtained are promising and may have an important impact in the area of unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.
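    The notion of non-dominated solutions can be made concrete with a short sketch. It assumes each candidate clustering is summarised by two scores, a compactness value to be minimised and a separation value to be maximised; the tuple layout and function names are illustrative, not taken from the paper.

```python
def dominates(a, b):
    """True if solution a dominates solution b.

    Each solution is a (compactness, separation) pair; compactness is
    minimised and separation maximised (an assumed convention).
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(solutions):
    """Return the solutions that no other solution dominates (the Pareto front)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


if __name__ == "__main__":
    candidates = [(1.0, 5.0), (2.0, 6.0), (1.5, 4.0), (3.0, 3.0)]
    print(non_dominated(candidates))  # [(1.0, 5.0), (2.0, 6.0)]
```

    No solution dominates itself, because domination requires a strict improvement on at least one objective. Each surviving solution represents a different compactness/separation trade-off, which is why the method combines them instead of selecting a single one.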

  15. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    Science.gov (United States)

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed up for over 1 year, using 12 variables selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 comprised patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  16. Analysis of gene expression in the nervous system identifies key genes and novel candidates for health and disease.

    Science.gov (United States)

    Carpanini, Sarah M; Wishart, Thomas M; Gillingwater, Thomas H; Manson, Jean C; Summers, Kim M

    2017-04-01

    The incidence of neurodegenerative diseases in the developed world has risen over the last century, concomitant with an increase in average human lifespan. A major challenge is therefore to identify genes that control neuronal health and viability with a view to enhancing neuronal health during ageing and reducing the burden of neurodegeneration. Analysis of gene expression data has recently been used to infer gene functions for a range of tissues from co-expression networks. We have now applied this approach to transcriptomic datasets from the mammalian nervous system available in the public domain. We have defined the genes critical for influencing neuronal health and disease in different neurological cell types and brain regions. The functional contribution of genes in each co-expression cluster was validated using human disease and knockout mouse phenotypes, pathways and gene ontology term annotation. Additionally a number of poorly annotated genes were implicated by this approach in nervous system function. Exploiting gene expression data available in the public domain allowed us to validate key nervous system genes and, importantly, to identify additional genes with minimal functional annotation but with the same expression pattern. These genes are thus novel candidates for a role in neurological health and disease and could now be further investigated to confirm their function and regulation during ageing and neurodegeneration.

  17. Intact cluster and chordate-like expression of ParaHox genes in a sea star.

    Science.gov (United States)

    Annunziata, Rossella; Martinez, Pedro; Arnone, Maria Ina

    2013-06-27

    The ParaHox genes are thought to be major players in patterning the gut of several bilaterian taxa. Though this is a fundamental role that these transcription factors play, their activities are not limited to the endoderm and extend to both ectodermal and mesodermal tissues. Three genes compose the ParaHox group: Gsx, Xlox and Cdx. In some taxa (mostly chordates, but to some degree also protostomes) the three genes are arranged into a genomic cluster, in a similar fashion to what has been shown for the better-known Hox genes. Sea urchins possess the full complement of ParaHox genes, but they are all dispersed throughout the genome, an arrangement that, perhaps, represented the primitive condition for all echinoderms. In order to understand the evolutionary history of this group of genes, we cloned and characterized all ParaHox genes, studied their expression patterns and identified their genomic loci in a member of an earlier-branching group of echinoderms, the asteroid Patiria miniata. We identified the three ParaHox orthologs in the genome of P. miniata. While one of them, PmGsx, is provided as a maternal message with no zygotic activation afterwards, the other two, PmLox and PmCdx, are expressed during embryogenesis within restricted domains of both endoderm and ectoderm. Screening of a Patiria bacterial artificial chromosome (BAC) library led to the identification of a clone containing the three genes. The transcriptional directions of PmGsx and PmLox are opposed to that of the PmCdx gene within the cluster. The identification of P. miniata ParaHox genes has revealed that these genes are clustered in the genome, in contrast to what has been reported for echinoids. Since the presence of an intact cluster, or at least a partial cluster, has been reported in chordates and polychaetes respectively, it becomes clear that within echinoderms, sea urchins have modified the original bilaterian arrangement.
Moreover, the sea star ParaHox domains of expression show

  18. Identification of certain cancer-mediating genes using Gaussian fuzzy cluster validity index.

    Science.gov (United States)

    Ghosh, Anupam; De, Rajat K

    2015-10-01

    In this article, we have used an index called the Gaussian fuzzy index (GFI), recently developed by the authors based on the notion of fuzzy set theory, to validate the clusters obtained by a clustering algorithm applied to cancer gene expression data. GFI is then used to identify genes whose mRNA expression patterns have altered significantly from the normal state to the carcinogenic state. The effectiveness of the methodology has been demonstrated on three gene expression cancer datasets dealing with human lung, colon and leukemia. The performance of GFI is compared with 19 existing cluster validity indices. The results are appropriately validated biologically and statistically. In this context, we have used biochemical pathways, p-value statistics of GO attributes, t-tests and z-scores for the validation of the results. It has been reported that GFI is capable of identifying high-quality enriched clusters of genes, and thereby is able to select more cancer-mediating genes.

  19. Meta-analysis of cell-specific transcriptomic data using fuzzy c-means clustering discovers versatile viral responsive genes.

    Science.gov (United States)

    Khan, Atif; Katanic, Dejan; Thakar, Juilee

    2017-06-06

    Despite advances in gene-set enrichment analysis methods, inadequate definitions of gene-sets are a major limitation in the discovery of novel biological processes from transcriptomic datasets. Typically, gene-sets are obtained from publicly available pathway databases, which contain generalized definitions frequently derived by manual curation. Recently, unsupervised clustering algorithms have been proposed to identify gene-sets from transcriptomic datasets deposited in the public domain. These data-driven definitions of the gene-sets can be context-specific, revealing novel biological mechanisms. However, the previously proposed algorithms for identification of data-driven gene-sets are based on hard clustering, which does not allow overlap across clusters, a characteristic that is predominantly observed across biological pathways. We developed a pipeline using a fuzzy c-means (FCM) soft clustering approach to identify gene-sets that recapitulate the topological characteristics of biological pathways. Specifically, we apply our pipeline to derive gene-sets from transcriptomic data measuring the response of monocyte-derived dendritic cells and A549 epithelial cells to influenza infections. Our approach applies Ward's method for the selection of initial conditions, optimizes parameters of the FCM algorithm for human cell-specific transcriptomic data, and identifies robust gene-sets along with versatile viral responsive genes. We validate our gene-sets and demonstrate that, by identifying genes associated with multiple gene-sets, the FCM clustering algorithm significantly improves interpretation of transcriptomic data, facilitating investigation of novel biological processes by leveraging transcriptomic data available in the public domain. We developed an interactive 'Fuzzy Inference of Gene-sets (FIGS)' package (GitHub: https://github.com/Thakar-Lab/FIGS ) to facilitate use of the pipeline. Future extension of FIGS across different immune cell-types will improve mechanistic
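    The core fuzzy c-means iteration behind such a pipeline can be sketched in pure Python. This is a generic textbook FCM, not the FIGS package; the initial centres are supplied by the caller (the pipeline described above derives them with Ward's method), and the fuzzifier `m`, the tolerance and the function name are assumptions.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, n_iter=100, tol=1e-6):
    """Minimal fuzzy c-means on equal-length numeric vectors.

    points: data vectors, e.g. gene expression profiles.
    centers: initial cluster centres supplied by the caller.
    m: fuzzifier (> 1); larger values give softer memberships.
    Returns (memberships, centers), where memberships[i][j] is the degree
    to which point i belongs to cluster j (each row sums to 1).
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        memberships = []
        for x in points:
            dists = [math.dist(x, c) for c in centers]
            zeros = dists.count(0.0)
            if zeros:
                # Point coincides with a centre: share membership among those centres.
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                       for dj in dists]
            memberships.append(row)
        # Centre update: weighted mean of the points with weights u_ij ** m.
        new_centers = []
        for j, old in enumerate(centers):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no support: keep the previous centre
                continue
            new_centers.append([sum(w * x[d] for w, x in zip(weights, points)) / total
                                for d in range(len(old))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships, centers


if __name__ == "__main__":
    data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
    u, c = fuzzy_c_means(data, centers=[[0.5], [4.0]])
    print([round(v[0], 2) for v in c])  # [0.1, 5.1]
```

    Unlike a hard partition, every gene keeps a non-zero membership in every cluster, so a gene can be reported in several gene-sets by thresholding its membership row.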

  20. Analysis of promoter regions of co-expressed genes identified by microarray analysis

    Directory of Open Access Journals (Sweden)

    Höglund Mattias

    2006-08-01

    Background: The use of global gene expression profiling to identify sets of genes with similar expression patterns is rapidly becoming a widespread approach for understanding biological processes. A logical and systematic approach to study co-expressed genes is to analyze their promoter sequences to identify transcription factors that may be involved in establishing specific profiles and that may be experimentally investigated. Results: We introduce promoter clustering, i.e. grouping of promoters with respect to their high-scoring motif content, and show that this approach greatly enhances the identification of common and significant transcription factor binding sites (TFBS) in co-expressed genes. We apply this method to two different datasets, one consisting of microarray data from 108 leukemias (AMLs) and a second from a time series experiment, and show that biologically relevant promoter patterns may be obtained using phylogenetic footprinting methodology. In addition, we also found that 15% of the analyzed promoter regions contained transcription start sites for additional genes transcribed in the opposite direction. Conclusion: Promoter clustering based on global promoter features greatly improves the identification of shared TFBS in co-expressed genes. We believe that the outlined approach may be a useful first step to identify transcription factors that contribute to specific features of gene expression profiles.

  1. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in outcomes such as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed for their pan-genome and core genome by identifying the composition and function of their genes. The results showed that the total of 132,143 protein-coding genes in these genomes was divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with the dispensable genome. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the growth and reproduction of Brucella spp. This study might help us to better understand the mechanisms of persistent Brucella infection and provide some clues for further exploring the gene modules of intracellular survival in Brucella spp.
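    Assuming the usual pan-genome definitions (core: present in every genome; strain-specific: present in exactly one genome; dispensable: present in some but not all), the partition of gene clusters can be sketched as below. The input format and function name are hypothetical.

```python
def partition_pan_genome(presence, genomes):
    """Classify gene clusters as core, dispensable or strain-specific.

    presence: dict mapping gene-cluster id -> set of genome ids containing it.
    genomes: set of all genome ids analysed.
    """
    core, dispensable, specific = set(), set(), set()
    for cluster, found_in in presence.items():
        if found_in >= genomes:
            core.add(cluster)         # present in every genome
        elif len(found_in) == 1:
            specific.add(cluster)     # present in a single genome
        elif found_in:
            dispensable.add(cluster)  # present in some, but not all, genomes
    return core, dispensable, specific


if __name__ == "__main__":
    toy = {"c1": {"A", "B", "C"}, "c2": {"A"}, "c3": {"A", "B"}}
    print(partition_pan_genome(toy, {"A", "B", "C"}))
    # ({'c1'}, {'c3'}, {'c2'})
```

    Because the three categories are disjoint and cover every cluster, the counts reported above add up: 1710 + 1182 + 2477 = 5369.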

  2. Evaluation of clustering algorithms for gene expression data using gene ontology annotations.

    Science.gov (United States)

    Ma, Ning; Zhang, Zheng-Guo

    2012-09-01

    Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for evaluating expression data clustering. An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the proposed criterion. The ranking of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. The proposed criterion has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing the main contributors to the criterion may offer some guidelines for finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.
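    An annotation-based external criterion of this kind can be sketched by comparing GO-term overlap for gene pairs placed in the same cluster with the overlap for pairs split across clusters. The Jaccard similarity and the within-minus-between score are assumptions chosen for illustration, the paper's exact criterion may differ, and the term ids below ("t1", "t2", ...) are toy placeholders rather than real GO identifiers.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two annotation sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def annotation_score(clusters, annotations):
    """Mean within-cluster minus mean between-cluster annotation similarity.

    clusters: list of lists of gene ids (a hard clustering).
    annotations: dict mapping gene id -> set of GO term ids.
    Higher scores mean co-clustered genes share more annotation.
    """
    label = {g: i for i, members in enumerate(clusters) for g in members}
    within, between = [], []
    for g1, g2 in combinations(sorted(label), 2):
        sim = jaccard(annotations.get(g1, set()), annotations.get(g2, set()))
        (within if label[g1] == label[g2] else between).append(sim)
    return _mean(within) - _mean(between)


if __name__ == "__main__":
    toy_terms = {"a": {"t1", "t2"}, "b": {"t1", "t2"}, "c": {"t3"}, "d": {"t3"}}
    print(annotation_score([["a", "b"], ["c", "d"]], toy_terms))  # 1.0
    print(annotation_score([["a", "c"], ["b", "d"]], toy_terms))  # -0.5
```

    Scoring each algorithm's output with the same function gives a single number per clustering, which is what allows algorithms such as single-linkage and Ward's method to be ranked against each other.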

  3. Identification of the Viridicatumtoxin and Griseofulvin Gene Clusters from Penicillium aethiopicum

    Science.gov (United States)

    Chooi, Yit-Heng; Cacho, Ralph; Tang, Yi

    2010-01-01

    Penicillium aethiopicum produces two structurally interesting and biologically active polyketides: the tetracycline-like viridicatumtoxin 1 and the classic antifungal agent griseofulvin 2. Here, we report the concurrent discovery of the two corresponding biosynthetic gene clusters (vrt and gsf) by 454 shotgun sequencing. Gene deletions confirmed that two nonreducing PKSs (NRPKSs), vrtA and gsfA, are required for the biosynthesis of 1 and 2, respectively. Both PKSs share similar domain architectures and lack a C-terminal thioesterase domain. We identified gsfI as the chlorinase involved in the biosynthesis of 2, as deletion of gsfI resulted in the accumulation of dechlorogriseofulvin 3. Comparative analysis with the P. chrysogenum genome revealed that both clusters are embedded within conserved syntenic regions of P. aethiopicum chromosomes. Discovery of the vrt and gsf clusters provided the basis for genetic and biochemical studies of the pathways. PMID:20534346

  4. Hotspots of missense mutation identify novel neurodevelopmental disorder genes and functional domains

    Science.gov (United States)

    Geisheker, Madeleine R.; Heymann, Gabriel; Wang, Tianyun; Coe, Bradley P.; Turner, Tychele N.; Stessman, Holly A.F.; Hoekzema, Kendra; Kvarnung, Malin; Shaw, Marie; Friend, Kathryn; Liebelt, Jan; Barnett, Christopher; Thompson, Elizabeth M.; Haan, Eric; Guo, Hui; Anderlid, Britt-Marie; Nordgren, Ann; Lindstrand, Anna; Vandeweyer, Geert; Alberti, Antonino; Avola, Emanuela; Vinci, Mirella; Giusto, Stefania; Pramparo, Tiziano; Pierce, Karen; Nalabolu, Srinivasa; Michaelson, Jacob J.; Sedlacek, Zdenek; Santen, Gijs W.E.; Peeters, Hilde; Hakonarson, Hakon; Courchesne, Eric; Romano, Corrado; Kooy, R. Frank; Bernier, Raphael A.; Nordenskjöld, Magnus; Gecz, Jozef; Xia, Kun; Zweifel, Larry S.; Eichler, Evan E.

    2017-01-01

    Although de novo missense mutations have been predicted to account for more cases of autism than gene-truncating mutations, most research has focused on the latter. We characterized de novo missense mutations in patients with neurodevelopmental disorders (NDDs) and highlight 35 genes with an excess of missense mutations. Additionally, 40 amino acid sites were recurrently mutated in 36 genes, and targeted sequencing of 20 sites in 17,689 NDD patients identified 21 new patients with identical missense mutations. One recurrent site (p.Ala636Thr) occurs in a glutamate receptor subunit, GRIA1. This same amino acid substitution in the homologous but distinct mouse glutamate receptor subunit Grid2 is associated with Lurcher ataxia. Phenotypic follow-up in five individuals with GRIA1 mutations shows evidence of specific learning disabilities and autism. Overall, we find significant clustering of de novo mutations in 200 genes, highlighting specific functional domains and synaptic candidate genes important in NDD pathology. PMID:28628100

  5. Identification and functional analysis of gene cluster involvement in biosynthesis of the cyclic lipopeptide antibiotic pelgipeptin produced by Paenibacillus elgii

    Directory of Open Access Journals (Sweden)

    Qian Chao-Dong

    2012-09-01

    Background: Pelgipeptin, a potent antibacterial and antifungal agent, is a non-ribosomally synthesised lipopeptide antibiotic. This compound consists of a β-hydroxy fatty acid and nine amino acids. To date, there is no information about its biosynthetic pathway. Results: A potential pelgipeptin synthetase gene cluster (plp) was identified from Paenibacillus elgii B69 through genome analysis. The gene cluster spans 40.8 kb with eight open reading frames. Among the genes in this cluster, three large genes, plpD, plpE, and plpF, were shown to encode non-ribosomal peptide synthetases (NRPSs) with one, seven, and one module(s), respectively. Bioinformatic analysis of the substrate specificity of all nine adenylation domains indicated that the order of the NRPS modules is collinear with the order of amino acids in pelgipeptin. Additional biochemical analysis of four recombinant adenylation domains (PlpD A1, PlpE A1, PlpE A3, and PlpF A1) provided further evidence that the plp gene cluster is involved in pelgipeptin biosynthesis. Conclusions: In this study, a gene cluster (plp) responsible for the biosynthesis of pelgipeptin was identified from the genome sequence of Paenibacillus elgii B69. The identification of the plp gene cluster provides an opportunity to develop novel lipopeptide antibiotics by genetic engineering.

  6. Gene Clusters for Insecticidal Loline Alkaloids in the Grass-Endophytic Fungus Neotyphodium uncinatum

    OpenAIRE

    Spiering, Martin J.; Moon, Christina D.; Wilkinson, Heather H.; Schardl, Christopher L.

    2005-01-01

    Loline alkaloids are produced by mutualistic fungi symbiotic with grasses, and they protect the host plants from insects. Here we identify in the fungal symbiont, Neotyphodium uncinatum, two homologous gene clusters (LOL-1 and LOL-2) associated with loline-alkaloid production. Nine genes were identified in a 25-kb region of LOL-1 and designated (in order) lolF-1, lolC-1, lolD-1, lolO-1, lolA-1, lolU-1, lolP-1, lolT-1, and lolE-1. LOL-2 contained the homologs lolC-2 through lolE-2 in the same ...

  7. Recursive expectation-maximization clustering: A method for identifying buffering mechanisms composed of phenomic modules

    Science.gov (United States)

    Guo, Jingyu; Tian, Dehua; McKinney, Brett A.; Hartman, John L.

    2010-06-01

    Interactions between genetic and/or environmental factors are ubiquitous, affecting the phenotypes of organisms in complex ways. Knowledge about such interactions is becoming rate-limiting for our understanding of human disease and other biological phenomena. Phenomics refers to the integrative analysis of how all genes contribute to phenotype variation, entailing genome- and organism-level information. A systems biology view of gene interactions is critical for phenomics. Unfortunately, the problem is intractable in humans; however, it can be addressed in simpler genetic model systems. Our research group has focused on the concept of genetic buffering of phenotypic variation, in studies employing the single-cell eukaryotic organism S. cerevisiae. We have developed a methodology, quantitative high throughput cellular phenotyping (Q-HTCP), for high-resolution measurements of gene-gene and gene-environment interactions on a genome-wide scale. Q-HTCP is being applied to the complete set of S. cerevisiae gene deletion strains, a unique resource for systematically mapping gene interactions. Genetic buffering is the idea that comprehensive and quantitative knowledge about how genes interact with respect to phenotypes will lead to an appreciation of how genes and pathways are functionally connected at a systems level to maintain homeostasis. However, extracting biologically useful information from Q-HTCP data is challenging, due to the multidimensional and nonlinear nature of gene interactions, together with a relative lack of prior biological information. Here we describe a new approach for mining quantitative genetic interaction data called recursive expectation-maximization clustering (REMc). We developed REMc to help discover phenomic modules, defined as sets of genes with similar patterns of interaction across a series of genetic or environmental perturbations. Such modules are reflective of buffering mechanisms, i.e., genes that play a related role in the maintenance

  8. 'Omics' approaches in tomato aimed at identifying candidate genes ...

    African Journals Online (AJOL)

    adriana

    2013-12-04

    ... that provides a virtual workbench for researchers working ... lab, we undertook two different 'omics' approaches to in- ... In our laboratory, an association mapping approach by candidate gene has been undertaken with the aim of identifying among 96 different genotypes new alleles in genes that could.

  9. Transcriptional regulation of gene expression clusters in motor neurons following spinal cord injury

    DEFF Research Database (Denmark)

    Ryge, J.; Winther, Ole; Wienecke, J.

    2010-01-01

    Background: Spinal cord injury leads to neurological dysfunctions affecting the motor, sensory and autonomic systems. Increased excitability of motor neurons has been implicated in injury-induced spasticity, where the reappearance of self-sustained plateau potentials in the absence ... of modulatory inputs from the brain correlates with the development of spasticity. Results: Here we examine the dynamic transcriptional response of motor neurons to spinal cord injury as it evolves over time to unravel common gene expression patterns and their underlying regulatory mechanisms. For this we use ... a rat tail model with complete spinal cord transection causing injury-induced spasticity, where gene expression profiles are obtained from labeled motor neurons extracted with laser microdissection at 0, 2, 7, 21 and 60 days post injury. Consensus clustering identifies 12 gene clusters with distinct time...

  10. Rice Transcriptome Analysis to Identify Possible Herbicide Quinclorac Detoxification Genes

    Directory of Open Access Journals (Sweden)

    Wenying Xu

    2015-09-01

    Quinclorac is a highly selective auxin-type herbicide that is widely used to control barnyard grass in paddy rice fields, improving the world's rice yield. The herbicidal mode of action of quinclorac has been proposed, and hormone interactions affect quinclorac signaling. Because of its widespread use, quinclorac may be transported outside rice fields with drainage water, leading to soil and water pollution and environmental health problems. In this study, we used the 57K Affymetrix rice whole-genome array to identify quinclorac signaling response genes in order to study the molecular mechanisms of action and detoxification of quinclorac in rice plants. Overall, 637 probe sets showed differential expression after either 6 or 24 h of quinclorac treatment. Auxin-related genes such as GH3 and OsIAAs responded to quinclorac treatment. Gene Ontology analysis showed that detoxification-related gene families were significantly enriched, including cytochrome P450, GST, UGT, and ABC and drug transporter genes. Moreover, real-time RT-PCR analysis showed that top candidate P450 families such as CYP81, CYP709C and CYP72A genes were universally induced by different herbicides. Some Arabidopsis genes of the same P450 families were up-regulated under quinclorac treatment. We conducted a rice whole-genome GeneChip analysis and the first global identification of quinclorac response genes. This work may provide potential markers for detoxification of quinclorac and biomonitors of environmental chemical pollution.

  11. GenClust: A genetic algorithm for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Raimondi Alessandra

    2005-12-01

    Background: Clustering is a key step in the analysis of gene expression data; many classical clustering algorithms are used for the task, and more innovative ones have been designed and validated for it. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential for finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data-driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; in many cases the observed differences between the worst and best performing algorithms may be statistically insignificant, and the algorithms could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data-driven internal validation measures and, in particular, the FOM methodology.
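
GenClust's own search-space coding is not described in this abstract, so the sketch below substitutes a plain label-vector encoding, which is an assumption: each individual assigns a cluster index to every gene. Fitness is the within-cluster sum of squares. New individuals come from one-point crossover and random mutation, with elitist truncation selection. All function names, parameter values and data points are illustrative.

```python
import random


def sse(points, labels, k):
    """Within-cluster sum of squared distances to centroids (lower is better)."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def genetic_clustering(points, k, pop_size=30, generations=100,
                       mutation_rate=0.05, seed=0):
    """Toy genetic algorithm for clustering (label-vector encoding, not GenClust's)."""
    rng = random.Random(seed)
    n = len(points)

    def fitness(individual):
        return sse(points, individual, k)

    # Each individual is a label vector: individual[i] is the cluster of point i.
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            parent1, parent2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Mutation: each gene is reassigned at random with small probability.
            child = [rng.randrange(k) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = genetic_clustering(points, k=2)
print(labels, round(sse(points, labels, 2), 3))
```

Because the best half of each generation survives unchanged, the best fitness never gets worse from one generation to the next. Like GenClust, such a search may still stop at a local optimum.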

  12. Evidence for an ergot alkaloid gene cluster in Claviceps purpurea.

    Science.gov (United States)

    Tudzynski, P; Hölter, K; Correia, T; Arntz, C; Grammel, N; Keller, U

    1999-02-01

    A gene (cpd1) coding for the dimethylallyltryptophan synthase (DMATS) that catalyzes the first specific step in the biosynthesis of ergot alkaloids, was cloned from a strain of Claviceps purpurea that produces alkaloids in axenic culture. The derived gene product (CPD1) shows only 70% similarity to the corresponding gene previously isolated from Claviceps strain ATCC 26245, which is likely to be an isolate of C. fusiformis. Therefore, the related cpd1 most probably represents the first C. purpurea gene coding for an enzymatic step of the alkaloid biosynthetic pathway to be cloned. Analysis of the 3'-flanking region of cpd1 revealed a second, closely linked ergot alkaloid biosynthetic gene named cpps1, which codes for a 356-kDa polypeptide showing significant similarity to fungal modular peptide synthetases. The protein contains three amino acid-activating modules, and in the second module a sequence is found which matches that of an internal peptide (17 amino acids in length) obtained from a tryptic digest of lysergyl peptide synthetase 1 (LPS1) of C. purpurea, thus confirming that cpps1 encodes LPS1. LPS1 activates the three amino acids of the peptide portion of ergot peptide alkaloids during D-lysergyl peptide assembly. Chromosome walking revealed the presence of additional genes upstream of cpd1 which are probably also involved in ergot alkaloid biosynthesis: cpox1 probably codes for an FAD-dependent oxidoreductase (which could represent the chanoclavine cyclase), and a second putative oxidoreductase gene, cpox2, is closely linked to it in inverse orientation. RT-PCR experiments confirm that all four genes are expressed under conditions of peptide alkaloid biosynthesis. These results strongly suggest that at least some genes of ergot alkaloid biosynthesis in C. purpurea are clustered, opening the way for a detailed molecular genetic analysis of the pathway.

  13. Identifying patterns in treatment response profiles in acute bipolar mania: a cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Houston John P

    2008-07-01

    Background: Patients with acute mania respond differentially to treatment and, in many cases, fail to obtain or sustain symptom remission. The objective of this exploratory analysis was to characterize response in bipolar disorder by identifying groups of patients with similar manic symptom response profiles. Methods: Patients (n = 222) were selected from a randomized, double-blind study of treatment with olanzapine or divalproex in bipolar I disorder, manic or mixed episode, with or without psychotic features. Hierarchical clustering based on Ward's distance was used to identify groups of patients based on Young Mania Rating Scale (YMRS) total scores at each of 5 assessments over 7 weeks. Logistic regression was used to identify baseline predictors for clusters of interest. Results: Four distinct clusters of patients were identified: Cluster 1 (n = 64): patients did not maintain a response (YMRS total score ≤ 12); Cluster 2 (n = 92): patients responded rapidly (within less than a week) and response was maintained; Cluster 3 (n = 36): patients responded rapidly but relapsed soon afterwards (YMRS ≥ 15); Cluster 4 (n = 30): patients responded slowly (≥ 2 weeks) and response was maintained. Predictive models using baseline variables found YMRS Item 10 (Appearance) and psychosis to be significant predictors for Clusters 1 and 4 vs. Clusters 2 and 3, but none of the baseline characteristics allowed discrimination between Clusters 1 and 4. Experiencing a mixed episode at baseline predicted membership in Clusters 2 and 3 vs. Clusters 1 and 4. Treatment with divalproex, a larger number of previous manic episodes, lack of disruptive-aggressive behavior, and more prominent depressive symptoms at baseline were predictors for Cluster 3 vs. 2. Conclusion: Distinct treatment response profiles can be predicted by clinical features at baseline. The presence of these features as potential risk factors for relapse in patients who have responded to treatment
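
At each step, Ward's criterion merges the two clusters whose union least increases the total within-cluster sum of squares. For clusters A and B that increase is |A||B| / (|A| + |B|) times the squared distance between their means. The minimal pure-Python sketch below shows the agglomerative procedure, applied to hypothetical YMRS total-score trajectories over five assessments. The scores are invented for illustration and are not study data. The study's actual software is not stated.

```python
def ward_cluster(vectors, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion.

    Each step merges the pair of clusters A, B that minimises
        delta(A, B) = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2,
    the increase in total within-cluster sum of squares.
    """
    # Each cluster is (list of member indices, centroid).
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                delta = na * nb / (na + nb) * dist2
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = (ma + mb, [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        del clusters[b]  # b > a, so index a stays valid
        clusters[a] = merged
    return [sorted(members) for members, _ in clusters]


trajectories = [
    [30, 12, 8, 6, 5],     # rapid response, maintained
    [28, 10, 7, 5, 4],
    [32, 30, 28, 25, 24],  # little improvement
    [29, 27, 26, 24, 22],
    [30, 25, 15, 10, 8],   # slower response
    [31, 26, 14, 9, 7],
]
print(ward_cluster(trajectories, n_clusters=3))  # [[0, 1], [2, 3], [4, 5]]
```

This version rescans every pair at each merge. That is adequate for cohorts of a few hundred patients, whereas optimized library implementations avoid the full rescan.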

  14. Identifying the number of population clusters with structure: problems and solutions.

    Science.gov (United States)

    Gilbert, Kimberly J

    2016-05-01

    The program structure has been used extensively to understand and visualize population genetic structure. It is one of the most commonly used clustering algorithms, cited over 11,500 times in Web of Science since its introduction in 2000. The method estimates ancestry proportions to assign individuals to clusters, and post hoc analyses of results may indicate the most likely number of clusters, or populations, on the landscape. However, as has been shown in this issue of Molecular Ecology Resources by Puechmaille, when sampling is uneven across populations or across hierarchical levels of population structure, these post hoc analyses can be inaccurate and identify an incorrect number of population clusters. To solve this problem, Puechmaille presents strategies for subsampling and new analysis methods that are robust to uneven sampling to improve inferences of the number of population clusters. © 2016 John Wiley & Sons Ltd.

  15. Translating biosynthetic gene clusters into fungal armor and weaponry.

    Science.gov (United States)

    Keller, Nancy P

    2015-09-01

    Filamentous fungi are renowned for the production of a diverse array of secondary metabolites (SMs), and the genetic material required for synthesis of an SM is typically arrayed in a biosynthetic gene cluster (BGC). These natural products are valued for their bioactive properties, which stem from their functions in fungal biology, chief among them protection from abiotic and biotic stress and establishment of a secure niche. The producing fungus must not only avoid self-harm from endogenous SMs but also deliver specific SMs at the right time to the right tissue, requiring biochemical aid. This review highlights functions of BGCs beyond the enzymatic assembly of SMs, considering the timing and location of SM production and other proteins in the clusters that control SM activity. Specifically, self-protection is provided by both BGC-encoded mechanisms and non-BGC subcellular containment of toxic SM precursors; delivery and timing are orchestrated through cellular trafficking patterns and stress- and developmental-responsive transcriptional programs.

  16. ENU Mutagenesis in Mice Identifies Candidate Genes For Hypogonadism

    Science.gov (United States)

    Weiss, Jeffrey; Hurley, Lisa A.; Harris, Rebecca M.; Finlayson, Courtney; Tong, Minghan; Fisher, Lisa A.; Moran, Jennifer L.; Beier, David R.; Mason, Christopher; Jameson, J. Larry

    2012-01-01

    Genome-wide mutagenesis was performed in mice to identify candidate genes for male infertility, for which the predominant causes remain idiopathic. Mice were mutagenized using N-ethyl-N-nitrosourea (ENU), bred, and screened for phenotypes associated with the male urogenital system. Fifteen heritable lines were isolated and chromosomal loci were assigned using low-density genome-wide SNP arrays. Ten of the fifteen lines were pursued further using higher-resolution SNP analysis to narrow the candidate gene regions. Exon sequencing of candidate genes identified mutations in mice with cystic kidneys (Bicc1), cryptorchidism (Rxfp2), restricted germ cell deficiency (Plk4), and severe germ cell deficiency (Prdm9). In two other lines with severe hypogonadism, candidate sequencing failed to identify mutations, suggesting defects in genes with previously undocumented roles in gonadal function. These genomic intervals were sequenced in their entirety and a candidate mutation was identified in SnrpE in one of the two lines. The line harboring the SnrpE variant retains substantial spermatogenesis despite small testis size, an unusual phenotype. In addition to the reproductive defects, heritable phenotypes were observed in mice with ataxia (Myo5a), tremors (Pmp22), growth retardation (unknown gene), and hydrocephalus (unknown gene). These results demonstrate that the ENU screen is an effective tool for identifying potential causes of male infertility. PMID:22258617

  17. CpG islands or CpG clusters: how to identify functional GC-rich regions in a genome?

    Directory of Open Access Journals (Sweden)

    Han Leng

    2009-02-01

    Background: CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located at the 5' end of genes and considered gene markers. Hackenberg et al. (2006) recently developed a new algorithm, CpGcluster, which uses a completely different mathematical approach from previous traditional algorithms. Their evaluation suggests that CpGcluster provides a much more efficient approach to detecting functional clusters or islands of CpGs. Results: We systematically compared CpGcluster with the traditional algorithm by Takai and Jones (2002). Our comparisons of (1) the number of islands versus the number of genes in a genome, (2) the distribution of islands in different genomic regions, (3) island length, (4) the distance between two neighboring islands, and (5) methylation status suggest that Takai and Jones' algorithm is overall more appropriate for identifying promoter-associated islands of CpGs in vertebrate genomes. Conclusion: The generation of genome sequence and DNA methylation data is expected to accelerate greatly. The information in this study is important for its extensive utility in gene feature analysis and epigenomics, including gene prediction and methylation chip design in different genomes.
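
Takai and Jones (2002) define a CpG island as a region that meets three thresholds: a length of at least 500 bp, a GC content of at least 55%, and an observed/expected CpG ratio of at least 0.65. The expected CpG count is (number of C × number of G) / length. The sketch below only checks one candidate region against those thresholds. The sliding-window scan and region merging of the full algorithm are omitted, the helper names are hypothetical, and the test sequences are synthetic.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count assuming independent bases is (C * G) / length.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def passes_takai_jones(seq, min_len=500, min_gc=0.55, min_obs_exp=0.65):
    """Check one candidate region against the Takai and Jones (2002) thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


print(passes_takai_jones("CG" * 300))  # True: 600 bp, GC = 1.0, obs/exp = 2.0
print(passes_takai_jones("CG" * 100))  # False: only 200 bp
```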

  18. Distinct Phenotypes of Smokers with Fixed Airflow Limitation Identified by Cluster Analysis of Severe Asthma.

    Science.gov (United States)

    Konno, Satoshi; Taniguchi, Natsuko; Makita, Hironi; Nakamaru, Yuji; Shimizu, Kaoruko; Shijubo, Noriharu; Fuke, Satoshi; Takeyabu, Kimihiro; Oguri, Mitsuru; Kimura, Hirokazu; Maeda, Yukiko; Suzuki, Masaru; Nagai, Katsura; Ito, Yoichi M; Wenzel, Sally E; Nishimura, Masaharu

    2018-01-01

    Smoking may have multifactorial effects on asthma phenotypes, particularly in severe asthma. Cluster analysis, which is not based on any a priori hypotheses, has been applied to explore novel phenotypes. Our objective was to explore novel severe asthma phenotypes by cluster analysis including smoking patients with asthma. We recruited a total of 127 subjects with severe asthma, including 59 current or ex-smokers, from our university hospital and its 29 affiliated hospitals/pulmonary clinics. Clinical variables obtained during a 2-day hospital stay were used for cluster analysis. After clustering using clinical variables, the sputum levels of 14 molecules were measured to biologically characterize the clinical clusters. Five clinical clusters, including two characterized by low forced expiratory volume in 1 second/forced vital capacity, were identified. When characteristics of smoking subjects in these two clusters were compared, there were marked differences between the two groups: one had high levels of circulating eosinophils, high immunoglobulin E levels, and a high sinus score, and the other was characterized by low levels of the same parameters. Sputum analysis revealed intriguing differences in cytokine/chemokine patterns between these two groups. The other three clusters were similar to those previously reported: young onset/atopic, nonsmoker/less eosinophilic, and female/obese. Key clinical variables were confirmed to be stable and consistent 3 years later. This study reveals two distinct phenotypes with potentially different biological pathways contributing to fixed airflow limitation in cigarette smokers with severe asthma.

  19. Gene clustering analysis in human osteoporosis disease and modifications of the jawbone.

    Science.gov (United States)

    Toti, Paolo; Sbordone, Carolina; Martuscelli, Ranieri; Califano, Luigi; Ramaglia, Luca; Sbordone, Ludovico

    2013-08-01

    The genes involved in both osteoporosis and modifications of the jawbone were analyzed through text mining of gene/protein interaction information with a web search tool. The final set of genes involved in this phenomenon was obtained by an expansion-filtering loop. Using web-available software (STRING), interactions among all genes were searched for, and a clustering procedure was performed in which only high-confidence predicted associations were considered. Two hundred forty-two genes potentially involved in osteoporosis and in modifications of the jawbone were recorded. Seven "leader genes" were identified (CTNNB1, IL1B, IL6, JUN, RUNX2, SPP1, TGFB1), while another 10 genes formed the cluster B group (BMP2, BMP7, COL1A1, ICAM1, IGF1, IL10, MMP9, NFKB1, TNFSF11, VEGFA). Ninety-eight genes had no interactions and were defined as "orphan genes". This de novo identification, based on data mining of genes involved in osteoporosis and in modification of the jawbone, has expanded knowledge of the molecular basis of osteoporotic traits. The present data verified no role for 98 genes that had previously been supposed to have one, whereas the literature reports another 81 genes, obtained from GWAS reviews and meta-analyses, as strongly associated with osteoporosis; this comparison probably attests to a lack of information on osteoporotic disease. Copyright © 2013 Elsevier Ltd. All rights reserved.
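
The "orphan gene" category above follows directly from the network step: once interactions below the confidence cutoff are discarded, any gene left without a link is an orphan. The hedged sketch below assumes STRING combined scores on a 0 to 1 scale and STRING's usual high-confidence cutoff of 0.7. The gene names and scores are hypothetical. Ranking genes by summed edge weight is an illustrative stand-in, not the authors' leader-gene scoring.

```python
from collections import defaultdict

HIGH_CONFIDENCE = 0.7  # STRING "high confidence" combined-score cutoff (0-1 scale)


def filter_network(genes, edges, min_score=HIGH_CONFIDENCE):
    """Keep high-confidence edges; return genes ranked by summed edge weight, plus orphans.

    edges: iterable of (gene_a, gene_b, combined_score).
    """
    weighted_degree = defaultdict(float)
    for gene_a, gene_b, score in edges:
        if score >= min_score:
            weighted_degree[gene_a] += score
            weighted_degree[gene_b] += score
    # Genes with no surviving interaction are the "orphans".
    orphans = sorted(g for g in genes if weighted_degree[g] == 0)
    ranked = sorted(
        ((g, weighted_degree[g]) for g in genes if weighted_degree[g] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked, orphans


genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
edges = [("GENE_A", "GENE_B", 0.9), ("GENE_A", "GENE_C", 0.75), ("GENE_C", "GENE_D", 0.4)]
ranked, orphans = filter_network(genes, edges)
print([g for g, _ in ranked])  # ['GENE_A', 'GENE_B', 'GENE_C']
print(orphans)                 # ['GENE_D']
```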

  20. Identifying Male Sexual Offender Subtypes Using Cluster Analysis and the Static-2002R.

    Science.gov (United States)

    Ennis, Liam; Buro, Karen; Jung, Sandy

    2016-08-01

    This study examines whether clinically meaningful subgroups could be identified within a large, undifferentiated group of convicted adult male sex offenders. Of eight cluster analyses, a reliable three-cluster solution emerged based on the subscores of the Static-2002R with 345 sex offenders. To establish the validity of the emergent clusters, the three groups of offenders were compared on four domains: criminal history, psychosexual development, sexual attitudes and interests, and recidivism. The findings revealed meaningful differences among the groups, and the implications of subgroup membership are discussed in terms of risk, treatment, and supervision. © The Author(s) 2014.

  1. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.

    Science.gov (United States)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H

    2015-07-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis

    Science.gov (United States)

    2017-10-13

    ... applied to the resting-state data to identify tinnitus subgroups within the patient population and pair them with specific behavioral ... and behavioral data. Specific Aim 2: Determine tinnitus subgroups using automated cluster analysis of resting-state data and associate the subgroups ... data analysis and clustering method previously developed to apply to current tinnitus data set; percentage of completion at end of Year 2 (24 months

  3. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  4. Clustering analysis of water distribution systems: identifying critical components and community impacts.

    Science.gov (United States)

    Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D

    2014-01-01

    Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. As a result, it is usually difficult to identify the key properties of the system, and hence all of its critical components, for a given design or control purpose. One approach is to visualize the network structure and the interactions between components more explicitly by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout closely resembles the community structure of the urban area served. Because WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting in a vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and the most critical clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of most critical components and gives a similar criticality prioritization of them, in less computation time.
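The abstract does not name the clustering algorithm. One common way to make "stronger internal connections than external connections" precise is Newman's modularity Q, which compares the number of edges inside each cluster with the number expected at random given the node degrees. The sketch below only scores a given partition (it does not search for the best one); it assumes pipes are unweighted edges, and the toy network and node names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each pipe listed once, no self-loops.
    community: dict mapping every node to a cluster label.
    Q = sum over clusters c of [L_c / m - (d_c / 2m)^2], where L_c is the
    number of edges inside c and d_c is the total degree of nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)  # d_c: summed node degree within cluster c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy network: two looped districts joined by a single pipe (c-d).
pipes = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_districts = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_block = {node: 0 for node in "abcdef"}

print(round(modularity(pipes, two_districts), 3))  # 0.357
print(modularity(pipes, one_block))                # 0.0
```

A higher Q means the split captures more within-cluster connectivity than chance; putting every node into one cluster always gives Q = 0. A search procedure (greedy merging, Louvain, and similar) would try candidate partitions to maximize Q.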

  5. Prognostically distinct clinical patterns of systemic lupus erythematosus identified by cluster analysis.

    Science.gov (United States)

    To, C H; Mok, C C; Tang, S S K; Ying, S K Y; Wong, R W S; Lau, C S

    2009-12-01

    The objective of this study was to evaluate the patterns of clinical manifestations, and the associated mortality, in a large cohort of Chinese patients with systemic lupus erythematosus. The cumulative clinical manifestations of a large group of Chinese systemic lupus erythematosus patients who fulfilled at least four American College of Rheumatology criteria for systemic lupus erythematosus were studied. Patients were divided into distinct groups by K-means cluster analysis. Clinical features, prevalence of proliferative lupus nephritis (World Health Organization class III, IV), autoantibody profile, and treatment data were compared, and standardized mortality ratios were calculated for each cluster of patients. There were 1082 patients included in the study (mean age at systemic lupus erythematosus diagnosis 30.5 years; mean systemic lupus erythematosus duration 10.3 years). Three distinct groups of patients were identified. Cluster 1 (n = 347) was characterized predominantly by mucocutaneous manifestations (malar rash, discoid rash, photosensitivity, oral ulcer) and arthritis, but had the lowest prevalence of serositis, hematologic manifestations (hemolytic anemia, leukopenia, and thrombocytopenia), and proliferative lupus nephritis. Patients in cluster 2 (n = 409) had mainly renal and hematological manifestations but the lowest prevalence of mucocutaneous manifestations. Pulmonary and gastrointestinal manifestations were significantly more frequent in cluster 2 than in the other clusters. Cluster 3 patients (n = 326) had the most heterogeneous features. Besides a high prevalence of mucocutaneous manifestations, serositis and hematologic manifestations, cluster 3 also had the highest prevalence of renal involvement and proliferative lupus nephritis of the three clusters. 
    Patients in cluster 2 had a much higher standardized mortality ratio [standardized mortality ratio 7.23 (6.7-7.7), p … lupus erythematosus could be clustered into prognostically distinct patterns of
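The study grouped patients with K-means cluster analysis. Below is a minimal sketch of Lloyd's algorithm for K-means, assuming each patient is coded as a 0/1 vector of cumulative manifestations. The feature order and patient vectors are hypothetical, and the deterministic "first k points" initialization is a simplification; practical analyses usually run several random starts and keep the best result.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # simplistic initialization (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical features: [mucocutaneous, arthritis, renal, hematologic]
patients = [
    [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1],
    [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1],
    [1, 0, 0, 0],
]
labels, centroids = kmeans(patients, k=3)
print(labels)  # [0, 1, 2, 0, 1, 2, 0]
```

Cluster labels are arbitrary integers. For 0/1 features, each centroid coordinate is simply the prevalence of that manifestation within the cluster, which is how a cluster can be read as, for example, "mainly mucocutaneous".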

  6. Two Gene Clusters Coordinate Galactose and Lactose Metabolism in Streptococcus gordonii

    Science.gov (United States)

    Zeng, Lin; Martino, Nicole C.

    2012-01-01

    Streptococcus gordonii is an early colonizer of the human oral cavity and an abundant constituent of oral biofilms. Two tandemly arranged gene clusters, designated lac and gal, were identified in the S. gordonii DL1 genome; they contain the tagatose pathway genes (lacABCD) and genes encoding sugar phosphotransferase system (PTS) enzyme II permeases. Genes encoding a predicted phospho-β-galactosidase (LacG), a DeoR family transcriptional regulator (LacR), and a transcriptional antiterminator (LacT) were also present in the clusters. Growth and PTS assays indicated that the permease designated EIILac transports lactose and galactose, whereas EIIGal transports galactose. Expression of the gene for EIIGal was markedly upregulated in cells growing on galactose. Using promoter-cat fusions, a role for LacR in regulating the expression of both gene clusters was demonstrated, and the gal cluster was also shown to be sensitive to repression by CcpA. Deletion of lacT caused an inability to grow on lactose, apparently because of its role in regulating expression of the genes for EIILac, but had little effect on galactose utilization. S. gordonii maintained a selective advantage over Streptococcus mutans in a mixed-species competition assay, associated with its possession of a high-affinity galactose PTS, although S. mutans could persist better at low pH. Collectively, these results support the concept that the galactose and lactose systems of S. gordonii are subject to complex regulation and that a high-affinity galactose PTS may be advantageous when S. gordonii is competing against the caries pathogen S. mutans in oral biofilms. PMID:22660715

  7. DMRT gene cluster analysis in the platypus: new insights into genomic organization and regulatory regions.

    Science.gov (United States)

    El-Mogharbel, Nisrine; Wakefield, Matthew; Deakin, Janine E; Tsend-Ayush, Enkhjargal; Grützner, Frank; Alsop, Amber; Ezaz, Tariq; Marshall Graves, Jennifer A

    2007-01-01

    We isolated and characterized a cluster of platypus DMRT genes and compared their arrangement, location, and sequence across vertebrates. The DMRT gene cluster on human 9p24.3 harbors, in order, DMRT1, DMRT3, and DMRT2, which share a DM domain. DMRT1 is highly conserved and involved in sexual development in vertebrates, and deletions in this region cause sex reversal in humans. Sequence comparisons of DMRT genes between species have been valuable in identifying exons, control regions, and conserved nongenic regions (CNGs). The addition of platypus sequences is expected to be particularly valuable, since monotremes fill a gap in the vertebrate genome coverage. We therefore isolated and fully sequenced platypus BAC clones containing DMRT3 and DMRT2 as well as DMRT1 and then generated multispecies alignments and ran prediction programs followed by experimental verification to annotate this gene cluster. We found that the three genes have 58-66% identity to their human orthologues, lie in the same order as in other vertebrates, and colocate on 1 of the 10 platypus sex chromosomes, X5. We also predict that optimal annotation of the newly sequenced platypus genome will be challenging. The analysis of platypus sequence revealed differences in structure and sequence of the DMRT gene cluster. Multispecies comparison was particularly effective for detecting CNGs, revealing several novel potential regulatory regions within DMRT3 and DMRT2 as well as DMRT1. RT-PCR indicated that platypus DMRT1 and DMRT3 are expressed specifically in the adult testis (and not ovary), but DMRT2 has a wider expression profile, as it does for other mammals. The platypus DMRT1 expression pattern, and its location on an X chromosome, suggests an involvement in monotreme sexual development.

  8. Gravitation field algorithm and its application in gene cluster

    Directory of Open Access Journals (Sweden)

    Zheng Ming

    2010-09-01

    Full Text Available Abstract. Background: Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. Simulated annealing (SA), genetic algorithms (GA), particle swarm optimization (PSO) and other similarly efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results: This paper proposes a novel search optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the well-known astronomical Solar Nebular Disk Model (SNDM) of planetary formation. GFA simulates a gravitational field and outperforms GA and SA on some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters a dataset from the Gene Expression Omnibus well. Conclusions: The mathematical proof demonstrates that GFA can converge to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search, and the inherent defects of SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.

  9. Adaptive evolution of the FADS gene cluster within Africa.

    Directory of Open Access Journals (Sweden)

    Rasika A Mathias

    Full Text Available Long-chain polyunsaturated fatty acids (LC-PUFAs) are essential for brain structure, development, and function, and adequate dietary quantities of LC-PUFAs are thought to have been necessary for both the brain expansion and the increase in brain complexity observed during modern human evolution. Previous studies conducted largely in European populations suggest that humans have a limited capacity to synthesize brain LC-PUFAs such as docosahexaenoic acid (DHA) from plant-based medium-chain (MC) PUFAs, owing to limited desaturase activity. Population-based differences in LC-PUFA levels and their product-to-substrate ratios can, in part, be explained by polymorphisms in the fatty acid desaturase (FADS) gene cluster, which have been associated with increased conversion of MC-PUFAs to LC-PUFAs. Here, we show evidence that these high-efficiency converter alleles in the FADS gene cluster were likely driven to near fixation in African populations by positive selection ∼85 kya. We hypothesize that selection at FADS variants, which increase LC-PUFA synthesis from plant-based MC-PUFAs, played an important role in allowing African populations that were obligatorily tethered to marine sources of LC-PUFAs in isolated geographic regions to rapidly expand throughout the African continent 60-80 kya.

  10. Identification of suitable genes contributes to lung adenocarcinoma clustering by multiple meta-analysis methods.

    Science.gov (United States)

    Yang, Ze-Hui; Zheng, Rui; Gao, Yuan; Zhang, Qiang

    2016-09-01

    With the widespread application of high-throughput technology, numerous meta-analysis methods have been proposed for differential expression profiling across multiple studies. We identified the differentially expressed (DE) genes best suited to lung adenocarcinoma (ADC) clustering, based on seven popular meta-analysis methods. Seven microarray expression profiles of ADC and normal controls were extracted from the ArrayExpress database. Bioconductor was used for preliminary data preprocessing. Then, DE genes across multiple studies were identified. Hierarchical clustering was applied to compare the classification performance on microarray data samples. Classification efficiency was compared based on accuracy, sensitivity and specificity. Across the seven datasets, 573 ADC cases and 222 normal controls were collected. After filtering out unexpressed and noninformative genes, 3688 genes remained for further analysis. The classification efficiency analysis showed that DE genes identified by the sum-of-ranks method separated ADC from normal controls with the best accuracy, sensitivity and specificity, of 0.953, 0.969 and 0.932, respectively. The gene set with the highest classification accuracy mainly participated in the regulation of response to external stimulus (P = 7.97E-04), cyclic nucleotide-mediated signaling (P = 0.01), regulation of cell morphogenesis (P = 0.01) and regulation of cell proliferation (P = 0.01). Evaluating the classification efficiency of DE genes identified by different meta-analysis methods provides a new perspective on choosing a suitable method for a given application. Different meta-analysis methods have different strengths, so they should be weighed comprehensively when selecting a meta-analysis method for a particular study. © 2015 John Wiley & Sons Ltd.
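The abstract reports that DE genes chosen by the sum-of-ranks method classified best, judged by accuracy, sensitivity and specificity. The sketch below shows, under simplifying assumptions, the basic idea of combining studies by summing within-study ranks, and how the three metrics follow from a confusion matrix. It is not the implementation used in the study: published rank-based meta-analysis tools also attach permutation-based significance, which is omitted here, and the gene names and scores are hypothetical.

```python
def sum_of_ranks(studies):
    """Combine per-study DE evidence by summing within-study ranks.

    studies: list of dicts mapping gene -> score (larger = stronger evidence).
    Only genes measured in every study are ranked. Rank 1 is the top gene in
    a study, so a smaller summed rank means consistently strong evidence.
    Ties within a study are broken alphabetically in this sketch.
    """
    common = set.intersection(*(set(s) for s in studies))
    totals = {gene: 0 for gene in common}
    for scores in studies:
        ordered = sorted(common, key=lambda g: (-scores[g], g))
        for rank, gene in enumerate(ordered, start=1):
            totals[gene] += rank
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


def classification_metrics(y_true, y_pred, positive="ADC"):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical per-study scores (e.g. absolute t-statistics) for three genes.
studies = [
    {"G1": 5.0, "G2": 3.0, "G3": 1.0},
    {"G1": 4.0, "G2": 0.5, "G3": 2.0},
]
print(sum_of_ranks(studies))  # [('G1', 2), ('G2', 5), ('G3', 5)]

truth = ["ADC", "ADC", "ADC", "normal", "normal"]
pred = ["ADC", "ADC", "normal", "normal", "ADC"]
print(classification_metrics(truth, pred))
```

G1 ranks first in both studies, so it leads the combined list; G2 and G3 each win one second place and tie on the summed rank.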

  11. A conserved cluster of three PRD-class homeobox genes (homeobrain, rx and orthopedia) in the Cnidaria and Protostomia

    Directory of Open Access Journals (Sweden)

    Mazza Maureen E

    2010-07-01

    Full Text Available Abstract. Background: Homeobox genes are a superclass of transcription factors with diverse developmental regulatory functions, which are found in plants, fungi and animals. In animals, several Antennapedia (ANTP)-class homeobox genes reside in extremely ancient gene clusters (for example, the Hox, ParaHox, and NKL clusters), and the evolution of these clusters has been implicated in the morphological diversification of animal body plans. By contrast, similarly ancient gene clusters have not been reported among the other classes of homeobox genes (that is, the LIM, POU, PRD and SIX classes). Results: Using a combination of in silico queries and phylogenetic analyses, we found that a cluster of three PRD-class homeobox genes (Homeobrain (hbn), Rax (rx) and Orthopedia (otp)) is present in cnidarians, insects and mollusks (a partial cluster comprising hbn and rx is present in the placozoan Trichoplax adhaerens). We failed to identify this 'HRO' cluster in deuterostomes; in fact, the Homeobrain gene appears to be missing from the chordate genomes we examined, although it is present in hemichordates and echinoderms. To illuminate the ancestral organization and function of this ancient cluster, we mapped the constituent genes against the assembled genome of a model cnidarian, the sea anemone Nematostella vectensis, and characterized their spatiotemporal expression using in situ hybridization. In N. vectensis, these genes reside within a 33-kb span, in the same gene order as previously reported in insects. Comparisons of genomic sequences and expressed sequence tags revealed the presence of alternative transcripts of Nv-otp and two highly unusual protein-coding polymorphisms in the terminal helix of the Nv-rx homeodomain. A population genetic survey revealed the Rx polymorphisms to be widespread in natural populations. 
During larval development, all three genes are expressed in the ectoderm, in non-overlapping territories along the oral-aboral axis, with distinct

  12. MeSH key terms for validation and annotation of gene expression clusters

    Energy Technology Data Exchange (ETDEWEB)

    Rechtsteiner, A. (Andreas); Rocha, L. M. (Luis Mateus)

    2004-01-01

    Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in validating and evaluating the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task of establishing and validating the biological significance of the results has so far mostly fallen to the biological expert, who has had to perform it manually. One of the most promising avenues for developing automated and integrative technology for such tasks lies in applying modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases of biomedical publications and data. Examples of databases available for the field are bibliographic databases containing scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms with similar meanings) that would be encountered if terms were extracted directly from the articles, owing to differing article contexts or author preferences and backgrounds. Further, the hierarchical organization of the MeSH terms can illustrate the conceptual/functional relationships of genes
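MeSH tree numbers encode the hierarchy as dot-separated segments, so every prefix of a tree number is an ancestor heading. The sketch below shows one simple way to exploit that hierarchy for a gene cluster: propagate each gene's annotations up to all ancestors and count how many cluster members fall under each node, so that related specific terms reinforce a shared broader term. It assumes genes have already been linked to tree numbers (for instance through the MEDLINE articles that cite them); the gene names and codes are hypothetical, and this is not the scoring method of the paper.

```python
from collections import Counter


def ancestors(tree_number):
    """Return a dotted tree number and all its ancestors, e.g.
    'C04.557.470' -> ['C04', 'C04.557', 'C04.557.470']."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def cluster_term_counts(cluster_genes, gene_tree_numbers):
    """For each hierarchy node, count the genes in the cluster annotated at
    that node or anywhere below it (each gene counted at most once per node)."""
    counts = Counter()
    for gene in cluster_genes:
        nodes = set()
        for tree_number in gene_tree_numbers.get(gene, []):
            nodes.update(ancestors(tree_number))
        counts.update(nodes)
    return counts


# Hypothetical gene-to-tree-number links (illustrative codes, not real lookups).
annotations = {
    "geneA": ["C04.557.470"],
    "geneB": ["C04.557.337"],
    "geneC": ["C04.588"],
}
counts = cluster_term_counts(["geneA", "geneB", "geneC"], annotations)
print(counts["C04"], counts["C04.557"], counts["C04.588"])  # 3 2 1
```

Here the three genes share no specific heading, yet all fall under C04, which a term-by-term comparison would miss. A fuller treatment would compare such counts against a background gene set to judge enrichment.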

  13. Indole-Diterpene Gene Cluster from Aspergillus flavus

    Science.gov (United States)

    Zhang, Shuguang; Monahan, Brendon J.; Tkacz, Jan S.; Scott, Barry

    2004-01-01

    Aflatrem is a potent tremorgenic mycotoxin produced by the soil fungus Aspergillus flavus and is a member of a large structurally diverse group of secondary metabolites known as indole-diterpenes. By using degenerate primers for conserved domains of fungal geranylgeranyl diphosphate synthases, we cloned two genes, atmG and ggsA (an apparent pseudogene), from A. flavus. Adjacent to atmG are two other genes, atmC and atmM. These three genes have 64 to 70% amino acid sequence similarity and conserved synteny with a cluster of orthologous genes, paxG, paxC, and paxM, from Penicillium paxilli which are required for indole-diterpene biosynthesis. atmG, atmC, and atmM are coordinately expressed, with transcript levels dramatically increasing at the onset of aflatrem biosynthesis. A genomic copy of atmM can complement a paxM deletion mutant of P. paxilli, demonstrating that atmM is a functional homolog of paxM. Thus, atmG, atmC, and atmM are necessary, but not sufficient, for aflatrem biosynthesis by A. flavus. This provides the first genetic evidence for the biosynthetic pathway of aflatrem in A. flavus. PMID:15528556

  14. Characterization of the biosynthetic gene cluster for cryptic phthoxazolin A in Streptomyces avermitilis.

    Directory of Open Access Journals (Sweden)

    Dian Anggraini Suroto

    Full Text Available Phthoxazolin A, an oxazole-containing polyketide, has a broad spectrum of anti-oomycete activity and herbicidal activity. We recently identified phthoxazolin A as a cryptic metabolite of Streptomyces avermitilis, which produces the important anthelmintic agent avermectin. Even though genome data of S. avermitilis are publicly available, no plausible biosynthetic gene cluster for phthoxazolin A is apparent in the sequence data. Here, we identified and characterized the phthoxazolin A (ptx) biosynthetic gene cluster through genome sequencing, comparative genomic analysis, and gene disruption. Sequence analysis revealed that the putative ptx biosynthetic genes lie in an extra genomic region that is not found in the public database, and 8 open reading frames in this region could be assigned roles in the biosynthesis of the oxazole ring, triene polyketide and carbamoyl moieties. Disruption of the ptxA gene, encoding a discrete acyltransferase, resulted in a complete loss of phthoxazolin A production, confirming that the trans-AT type I PKS system is responsible for phthoxazolin A biosynthesis. Based on the predicted functional domains in the ptx assembly line, we propose the biosynthetic pathway of phthoxazolin A.

  15. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    Science.gov (United States)

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales, isolated between April and August 2014, were analysed. Single-linkage single nucleotide polymorphism (SNP) thresholds at the 10-, 5- and 0-SNP levels were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more isolates. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following: eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
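Single-linkage clustering at a SNP threshold places two isolates in the same cluster if they are within the threshold of each other, or are connected through a chain of isolates that are. A minimal union-find sketch, assuming a precomputed table of pairwise SNP distances; the isolate names and distances are hypothetical, and this is not the authors' pipeline.

```python
def single_linkage_clusters(isolates, snp_distance, threshold):
    """Single-linkage clusters: isolates are joined whenever some pair is
    within `threshold` SNPs, directly or through a chain of such pairs.

    snp_distance: dict {(isolate_a, isolate_b): pairwise SNP count};
    pairs missing from the dict are treated as unlinked.
    """
    parent = {i: i for i in isolates}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in snp_distance.items():
        if d <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two clusters

    groups = {}
    for i in isolates:
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first; members sorted for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical isolates and pairwise SNP distances.
isolates = ["S1", "S2", "S3", "S4", "S5"]
dist = {("S1", "S2"): 3, ("S2", "S3"): 4, ("S1", "S3"): 7,
        ("S3", "S4"): 20, ("S4", "S5"): 12}
for t in (10, 5, 0):
    print(t, single_linkage_clusters(isolates, dist, t))
```

At the 5-SNP threshold, S1 and S3 end up together even though they differ by 7 SNPs, because each is within 4 SNPs of S2; this chaining is characteristic of single linkage. At 0 SNPs every isolate here is a singleton. Keeping only clusters of five or more isolates is a final filter on the returned list.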

  16. Genetic variations and haplotype diversity of the UGT1 gene cluster in the Chinese population.

    Directory of Open Access Journals (Sweden)

    Jing Yang

    Full Text Available Vertebrates require tremendous molecular diversity to defend against numerous small hydrophobic chemicals. UDP-glucuronosyltransferases (UGTs) are a large family of detoxification enzymes that glucuronidate xenobiotics and endobiotics, facilitating their excretion from the body. The UGT1 gene cluster contains a tandem array of variable first exons, each preceded by a specific promoter, and a common set of downstream constant exons, similar to the genomic organization of the protocadherin (Pcdh), immunoglobulin, and T-cell receptor gene clusters. To assist pharmacogenomics studies in Chinese populations, we sequenced nine first exons, promoter and intronic regions, and five common exons of the UGT1 gene cluster in a population sample of 253 unrelated Chinese individuals. We identified 101 polymorphisms and found 15 novel SNPs. We then computed allele frequencies for each polymorphism and reconstructed their linkage disequilibrium (LD) map. The UGT1 cluster can be divided into five linkage blocks: Block 9 (UGT1A9), Block 9/7/6 (UGT1A9, UGT1A7, and UGT1A6), Block 5 (UGT1A5), Block 4/3 (UGT1A4 and UGT1A3), and Block 3' UTR. Furthermore, we inferred haplotypes and selected their tagSNPs. Finally, comparing our data with those of three other populations of the HapMap project revealed ethnic specificity of the UGT1 genetic diversity in the Chinese population. These findings have important implications for future molecular genetic studies of the UGT1 gene cluster as well as for personalized medical therapies in Chinese patients.
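Pairwise linkage disequilibrium between two biallelic SNPs is often summarized as r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), with D = p_AB - p_A p_B. The abstract does not state which LD statistic or software was used to build the block map, so the sketch below is only an illustration: it computes allele frequencies and r² from already-phased two-SNP haplotypes, and the alleles shown are hypothetical.

```python
def allele_frequency(haplotypes, locus, allele):
    """Fraction of haplotypes carrying `allele` at position `locus`."""
    return sum(h[locus] == allele for h in haplotypes) / len(haplotypes)


def ld_r2(haplotypes, allele_a, allele_b):
    """r^2 between SNP 0 (allele_a) and SNP 1 (allele_b) from phased haplotypes."""
    p_a = allele_frequency(haplotypes, 0, allele_a)
    p_b = allele_frequency(haplotypes, 1, allele_b)
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / len(haplotypes)
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either SNP is monomorphic; this sketch returns 0.0.
    return d * d / denom if denom > 0 else 0.0


# Hypothetical two-SNP haplotypes: (allele at SNP 1, allele at SNP 2).
complete_ld = [("C", "T")] * 3 + [("G", "A")] * 3
no_ld = [("C", "T"), ("C", "A"), ("G", "T"), ("G", "A")]
print(ld_r2(complete_ld, "C", "T"))  # 1.0
print(ld_r2(no_ld, "C", "T"))        # 0.0
```

Groups of SNPs in strong mutual LD roughly correspond to haplotype blocks like those listed above, and within such a group one tag SNP can capture most of the information carried by the others, which is the idea behind tagSNP selection.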

  17. Gene clusters for insecticidal loline alkaloids in the grass-endophytic fungus Neotyphodium uncinatum.

    Science.gov (United States)

    Spiering, Martin J; Moon, Christina D; Wilkinson, Heather H; Schardl, Christopher L

    2005-03-01

    Loline alkaloids are produced by mutualistic fungi symbiotic with grasses, and they protect the host plants from insects. Here we identify in the fungal symbiont, Neotyphodium uncinatum, two homologous gene clusters (LOL-1 and LOL-2) associated with loline-alkaloid production. Nine genes were identified in a 25-kb region of LOL-1 and designated (in order) lolF-1, lolC-1, lolD-1, lolO-1, lolA-1, lolU-1, lolP-1, lolT-1, and lolE-1. LOL-2 contained the homologs lolC-2 through lolE-2 in the same order and orientation. Also identified was lolF-2, but its possible linkage with either cluster was undetermined. Most lol genes were regulated in N. uncinatum and N. coenophialum, and all were expressed concomitantly with loline-alkaloid biosynthesis. A lolC-2 RNA-interference (RNAi) construct was introduced into N. uncinatum, and in two independent transformants, RNAi significantly decreased lolC expression (P … lol-gene products indicate that the pathway has evolved from various different primary and secondary biosynthesis pathways.

  18. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature

    Directory of Open Access Journals (Sweden)

    Ning Ye

    2015-01-01

    Full Text Available The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges for extracting its biological meaning. When searching for co-expressed genes, the data mining process is strongly affected by the choice of algorithm. It is therefore highly desirable to offer multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models, namely the mean, regression, delegate, and ensemble models, to identify co-expressed genes, and enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. The utility of this analytical toolkit is then demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows users to choose among multiple algorithms for analyzing gene expression signatures.

  19. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature.

    Science.gov (United States)

    Ye, Ning; Yin, Hengfu; Liu, Jingjing; Dai, Xiaogang; Yin, Tongming

    2015-01-01

    The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges for extracting its biological meaning. When searching for co-expressed genes, the data mining process is strongly affected by the choice of algorithm. It is therefore highly desirable to offer multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models, namely the mean, regression, delegate, and ensemble models, to identify co-expressed genes, and enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. The utility of this analytical toolkit is then demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows users to choose among multiple algorithms for analyzing gene expression signatures.
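The GESearch abstract names four models (mean, regression, delegate and ensemble) but does not define them, so the sketch below reproduces none of them and is written in Python rather than MATLAB. It shows a generic baseline for the underlying task: ranking genes by Pearson correlation with a query expression profile and keeping those above a cutoff. The gene names, values and the 0.9 cutoff are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def coexpressed_genes(query, profiles, min_r=0.9):
    """Genes whose profiles correlate with `query` at r >= min_r, strongest first."""
    scored = [(gene, pearson(query, prof)) for gene, prof in profiles.items()]
    hits = [(gene, r) for gene, r in scored if r >= min_r]
    return sorted(hits, key=lambda item: -item[1])


# Hypothetical expression over four time points of a cell-cycle series.
query = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "geneA": [2.0, 4.0, 6.0, 8.0],  # same shape, larger amplitude
    "geneB": [4.0, 3.0, 2.0, 1.0],  # mirror image
    "geneC": [1.0, 3.0, 2.0, 4.0],  # partially similar
}
print(coexpressed_genes(query, profiles))  # [('geneA', 1.0)]
```

Correlation ignores amplitude, so geneA matches perfectly despite doubled values, while geneC (r = 0.8) falls just below the cutoff; the choice of similarity measure and threshold is exactly the kind of algorithmic decision the abstract says strongly affects the results.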

  20. Genomic organization, tissue distribution and functional characterization of the rat Pate gene cluster.

    Directory of Open Access Journals (Sweden)

    Angireddy Rajesh

    Full Text Available The cysteine-rich prostate and testis expressed (Pate) proteins identified to date are thought to resemble the three-fingered protein/urokinase-type plasminogen activator receptor proteins. In this study, we report for the first time the identification, cloning and characterization of the rat Pate gene cluster, and we also determine its expression pattern. The rat Pate genes are clustered on chromosome 8, and their predicted proteins retain the ten-cysteine signature characteristic of the TFP/Ly-6 protein family. The three-dimensional structures of PATE and PATE-F were found to be similar to that of the toxin bucandin. Though Pate gene expression is thought to be prostate- and testis-specific, we observed that rat Pate genes are also expressed in the seminal vesicle and epididymis and in tissues beyond the male reproductive tract. In developing rats (20-60 days old), expression of Pate genes seems to be androgen-dependent in the epididymis and testis. In the adult rat, androgen ablation resulted in downregulation of the majority of Pate genes in the epididymides. PATE and PATE-F proteins were found to be expressed abundantly in the male reproductive tract of rats and on the sperm. Recombinant PATE protein exhibited potent antibacterial activity, whereas PATE-F did not exhibit any antibacterial activity. Pate expression was induced in the epididymides when challenged with LPS. Based on our results, we conclude that rat PATE proteins may contribute to reproductive and defense functions.

  1. Gene Signature in Sessile Serrated Polyps Identifies Colon Cancer Subtype

    Science.gov (United States)

    Kanth, Priyanka; Bronner, Mary P.; Boucher, Kenneth M.; Burt, Randall W.; Neklason, Deborah W.; Hagedorn, Curt H.; Delker, Don A.

    2016-01-01

    Sessile serrated colon adenoma/polyps (SSA/Ps) are found during routine screening colonoscopy and may account for 20–30% of colon cancers. However, differentiating SSA/Ps from hyperplastic polyps (HPs), which carry little risk of cancer, is challenging, and complementary molecular markers are needed. Additionally, the molecular mechanisms of colon cancer development from SSA/Ps are poorly understood. RNA sequencing was performed on 21 SSA/Ps, 10 HPs, 10 adenomas, 21 uninvolved colon and 20 control colon specimens. Differential expression and leave-one-out cross-validation methods were used to define a unique gene signature of SSA/Ps. Our SSA/P gene signature was evaluated in colon cancer RNA-Seq data from The Cancer Genome Atlas (TCGA) to identify a subtype of colon cancers that may develop from SSA/Ps. A total of 1422 differentially expressed genes were found in SSA/Ps relative to controls. Serrated polyposis syndrome (n=12) and sporadic SSA/Ps (n=9) exhibited almost complete (96%) gene overlap. A 51-gene panel in SSA/P showed similar expression in a subset of TCGA colon cancers with high microsatellite instability (MSI-H). A smaller seven-gene panel showed high sensitivity and specificity in identifying BRAF mutant, CpG island methylator phenotype high (CIMP-H) and MLH1 silenced colon cancers. We describe a unique gene signature in SSA/Ps that identifies a subset of colon cancers likely to develop through the serrated pathway. These gene panels may be utilized for improved differentiation of SSA/Ps from HPs and provide insights into novel molecular pathways altered in colon cancer arising from the serrated pathway. PMID:27026680
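Leave-one-out cross-validation (LOOCV) holds out each specimen in turn, fits on the remaining specimens, and predicts the held-out one. The abstract does not specify the classifier, so this sketch assumes a simple nearest-centroid rule; the two-gene expression values are hypothetical. In a real signature study, any gene-selection step would also have to be repeated inside each fold, which this sketch omits.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose training centroid is closest (squared Euclidean)."""
    classes = sorted(set(train_y))
    cents = {c: centroid([v for v, lab in zip(train_x, train_y) if lab == c])
             for c in classes}
    return min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def loocv_accuracy(X, y):
    """Fraction of samples predicted correctly when each is held out once.
    Assumes every class has at least two samples, so no class vanishes."""
    correct = 0
    for i in range(len(X)):
        train_x = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_x, train_y, X[i]) == y[i]
    return correct / len(X)


# Hypothetical expression of two signature genes per specimen.
X = [[5.0, 1.0], [6.0, 1.0], [5.0, 2.0], [1.0, 5.0], [1.0, 6.0], [2.0, 5.0]]
y = ["SSA/P", "SSA/P", "SSA/P", "HP", "HP", "HP"]
print(loocv_accuracy(X, y))  # 1.0
```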

  2. Genome scan identifies a locus affecting gamma-globin expression in human beta-cluster YAC transgenic mice

    Energy Technology Data Exchange (ETDEWEB)

    Lin, S.D.; Cooper, P.; Fung, J.; Weier, H.U.G.; Rubin, E.M.

    2000-03-01

    Genetic factors affecting postnatal gamma-globin expression, a major modifier of the severity of both beta-thalassemia and sickle cell anemia, have been difficult to study. This is especially so in mice, an organism lacking a globin gene with an expression pattern equivalent to that of human gamma-globin. To model the human beta-cluster in mice, with the goal of screening for loci affecting human gamma-globin expression in vivo, we introduced a human beta-globin cluster YAC transgene into the genome of FVB mice. The beta-cluster contained a Greek hereditary persistence of fetal hemoglobin (HPFH) gamma allele, resulting in postnatal expression of human gamma-globin in transgenic mice. The level of human gamma-globin for various F1 hybrids derived from crosses between the FVB transgenics and other inbred mouse strains was assessed. The gamma-globin level of the C3HeB/FVB transgenic mice was noted to be significantly elevated. To map genes affecting postnatal gamma-globin expression, a 20-centiMorgan (cM) genome scan of a C3HeB/FVB transgenics x FVB backcross was performed, followed by high-resolution marker analysis of promising loci. From this analysis we mapped a locus within a 2.2 cM interval of mouse chromosome 1 at a LOD score of 4.2 that contributes 10.4% of variation in gamma-globin expression level. Combining transgenic modeling of the human beta-globin gene cluster with quantitative trait analysis, we have identified and mapped a murine locus that affects human gamma-globin expression in vivo.

  3. Genome scan identifies a locus affecting gamma-globin level in human beta-cluster YAC transgenic mice.

    Science.gov (United States)

    Lin, S D; Cooper, P; Fung, J; Weier, H U; Rubin, E M

    2000-11-01

    Genetic factors affecting postnatal gamma-globin expression--a major modifier of the severity of both beta-thalassemia and sickle cell anemia--have been difficult to study. This is especially so in mice, an organism lacking a globin gene with an expression pattern equivalent to that of human gamma-globin. To model the human beta-cluster in mice, with the goal of screening for loci affecting human gamma-globin expression in vivo, we introduced a human beta-globin cluster YAC transgene into the genome of FVB/N mice. The beta-cluster contained a Greek hereditary persistence of fetal hemoglobin (HPFH) gamma allele, resulting in postnatal expression of human gamma-globin in transgenic mice. The level of human gamma-globin for various F1 hybrids derived from crosses between the FVB/N transgenics and other inbred mouse strains was assessed. The gamma-globin level of the (C3HeB/FeJ x FVB/N)F1 transgenic mice was noted to be significantly elevated. To map genes affecting postnatal gamma-globin expression, we performed a 20-centiMorgan (cM) genome scan of a (C3HeB/FeJ x FVB/N)F1 transgenics x FVB/N backcross, followed by high-resolution marker analysis of promising loci. From this analysis we mapped a locus within an 18-cM interval of mouse Chromosome (Chr) 1 (LOD = 4.3) that contributes 10.9% of variation in gamma-globin level. Combining transgenic modeling of the human beta-globin gene cluster with quantitative trait analysis, we have identified and mapped a murine locus that impacts on human gamma-globin level in vivo.
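    Both records report a LOD score together with the share of trait variation the locus explains. As a hedged illustration of how those two numbers relate, the sketch below computes the classic single-marker regression LOD for a backcross, where each animal carries one of two genotypes at a marker. This is not the authors' mapping procedure, which the abstracts do not describe beyond a genome scan plus high-resolution marker analysis; the function name and the 0/1 genotype coding are assumptions for illustration.

```python
import math
from statistics import mean


def single_marker_lod(genotypes, phenotypes):
    """LOD score and variance explained for one marker in a backcross.

    Hypothetical helper. genotypes: 0/1 codes (e.g. 0 = homozygous,
    1 = heterozygous); phenotypes: a quantitative trait such as globin level.
    """
    n = len(phenotypes)
    grand = mean(phenotypes)
    # Null model: one common mean for every animal
    rss0 = sum((y - grand) ** 2 for y in phenotypes)
    # Alternative model: a separate mean for each genotype class
    group_means = {
        g: mean(y for x, y in zip(genotypes, phenotypes) if x == g)
        for g in set(genotypes)
    }
    rss1 = sum((y - group_means[g]) ** 2 for g, y in zip(genotypes, phenotypes))
    lod = (n / 2) * math.log10(rss0 / rss1)
    r2 = 1 - rss1 / rss0  # fraction of phenotypic variance explained
    return lod, r2
```

    Under this model the two outputs are linked by LOD = -(n/2) log10(1 - R²), so a fixed LOD corresponds to a smaller variance fraction as the number of backcross animals grows.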

  4. Identifying the ideal profile of French yogurts for different clusters of consumers.

    Science.gov (United States)

    Masson, M; Saint-Eve, A; Delarue, J; Blumenthal, D

    2016-05-01

    Identifying the sensory properties that affect consumer preferences for food products is an important feature of product development. Different methods, such as external preference mapping or partial least squares regression, are used to establish relationships between sensory data and consumer preferences and to identify sensory attributes that drive consumer preferences, by highlighting optimum products. Plain French yogurts were evaluated by a sensory profiling method performed by 12 trained judges. In parallel, 180 consumers were asked to score their overall liking and complete a cognitive restraint questionnaire. After hierarchical cluster analysis on the liking scores, preference mapping using a quadratic regression model was performed. Five clusters of consumers were identified as a function of different preference patterns. Contrary to our expectations, fat levels were not discriminating. For each cluster, the results of preference mapping enabled the identification of optimum products. A comparison of the 5 sensory profiles revealed numerous differences between key sensory attributes. For example, one consumer cluster had a strong preference for products perceived as very thick, grainy, but with a less flowing texture, less sticky, whey presence and color, in contrast to other clusters. In addition, each segment of consumers was characterized according to the results of the cognitive restraint questionnaire. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
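    The abstract names a quadratic regression model for preference mapping without giving its form. A minimal sketch, assuming the common full quadratic surface in two sensory dimensions (for example, product coordinates on the first two principal components of the sensory profiles), fitted by ordinary least squares to one consumer cluster's liking scores. The function names are hypothetical; the "optimum product" is taken as the stationary point of the fitted surface, which is a maximum only when the surface is concave.

```python
import numpy as np


def fit_quadratic_preference(x1, x2, liking):
    """Fit liking = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    coef, *_ = np.linalg.lstsq(X, np.asarray(liking, dtype=float), rcond=None)
    return coef


def optimum(coef):
    """Stationary point of the fitted surface (raises if the surface is degenerate)."""
    _, b1, b2, b11, b22, b12 = coef
    # Setting the gradient to zero gives H @ [x1, x2] = -[b1, b2]
    H = np.array([[2 * b11, b12], [b12, 2 * b22]])
    return np.linalg.solve(H, -np.array([b1, b2]))
```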

  5. CAsubtype: An R Package to Identify Gene Sets Predictive of Cancer Subtypes and Clinical Outcomes.

    Science.gov (United States)

    Kong, Hualei; Tong, Pan; Zhao, Xiaodong; Sun, Jielin; Li, Hua

    2018-03-01

    In the past decade, molecular classification of cancer has gained popularity owing to its high predictive power for clinical outcomes compared with traditional methods commonly used in clinical practice. In particular, using gene expression profiles, recent studies have successfully identified a number of gene sets for the delineation of cancer subtypes that are associated with distinct prognosis. However, identification of such gene sets remains a laborious task due to the lack of tools with flexibility, integration and ease of use. To reduce the burden, we have developed an R package, CAsubtype, to efficiently identify gene sets predictive of cancer subtypes and clinical outcomes. By integrating more than 13,000 annotated gene sets, CAsubtype provides a comprehensive repertoire of candidates for new cancer subtype identification. For easy data access, CAsubtype further includes the gene expression and clinical data of more than 2000 cancer patients from TCGA. CAsubtype first employs principal component analysis to identify gene sets (from user-provided or package-integrated ones) with robust principal components representing significantly large variation between cancer samples. Based on these principal components, CAsubtype visualizes the sample distribution in low-dimensional space for better understanding of the distinction between samples and classifies samples into subgroups with prevalent clustering algorithms. Finally, CAsubtype performs survival analysis to compare the clinical outcomes between the identified subgroups, assessing their clinical value as potentially novel cancer subtypes. In conclusion, CAsubtype is a flexible and well-integrated tool in the R environment to identify gene sets for cancer subtype identification and clinical outcome prediction. Its simple R commands and comprehensive data sets enable efficient examination of the clinical value of any given gene set, thus facilitating hypothesis generating and testing in biological and
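    The principal component step described above can be illustrated with a short sketch. It is not CAsubtype's code or API (the package is written in R); it assumes a genes × samples expression matrix already restricted to one gene set and reports per-sample component scores plus the fraction of variance each leading component captures. How CAsubtype judges a component "robust" or "significantly large" is not stated in the abstract, so no significance test is attempted here, and the function name is hypothetical.

```python
import numpy as np


def gene_set_pc_summary(expr, n_components=2):
    """PCA of a genes x samples matrix restricted to one gene set.

    Returns per-sample PC scores and the fraction of variance each PC explains.
    """
    X = np.asarray(expr, dtype=float).T   # samples x genes
    X = X - X.mean(axis=0)                # center each gene across samples
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    var = s**2
    explained = var / var.sum()
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]
```

    The returned scores could then be handed to any clustering algorithm to form candidate subgroups for survival comparison.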

  6. An EST-based analysis identifies new genes and reveals distinctive gene expression features of Coffea arabica and Coffea canephora

    Directory of Open Access Journals (Sweden)

    Colombo Carlos A

    2011-02-01

    Abstract Background Coffee is one of the world's most important crops; it is consumed worldwide and plays a significant role in the economy of producing countries. Coffea arabica and C. canephora are responsible for 70% and 30% of commercial production, respectively. C. arabica is an allotetraploid from a recent hybridization of the diploid species C. canephora and C. eugenioides. C. arabica has lower genetic diversity and yields a higher-quality beverage than C. canephora. Research initiatives have been launched to produce genomic and transcriptomic data about Coffea spp. as a strategy to improve breeding efficiency. Results Assembling the expressed sequence tags (ESTs) of C. arabica and C. canephora produced by the Brazilian Coffee Genome Project and the Nestlé-Cornell Consortium revealed 32,007 clusters of C. arabica and 16,665 clusters of C. canephora. We detected different GC3 profiles between these species that are related to their genome structure and mating system. BLAST analysis revealed similarities between coffee and grape (Vitis vinifera) genes. Using Ka/Ks analysis, we identified coffee genes under purifying and positive selection. Protein domain and gene ontology analyses suggested differences between the Coffea spp. data, mainly in relation to complex sugar synthases and nucleotide-binding proteins. OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Among the interesting families annotated are new cystatins, glycine-rich proteins and RALF-like peptides. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories.
Conclusion We present the first comprehensive
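    GC3, mentioned in the record above, is the GC content at the third position of codons. A minimal sketch, assuming the input is an in-frame coding sequence starting at its first base; real EST-based pipelines must first locate the reading frame, and many GC3 definitions also exclude stop codons and single-codon amino acids (Met, Trp), which this simple version does not do. The function name is hypothetical.

```python
def gc3(cds):
    """GC content at third codon positions of an in-frame coding sequence."""
    seq = cds.upper()
    n_codons = len(seq) // 3  # any trailing partial codon is ignored
    thirds = [seq[3 * i + 2] for i in range(n_codons)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

    For example, the codons ATG, GCC, AAA and TTG end in G, C, A and G, giving a GC3 of 0.75.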

  7. Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

    Science.gov (United States)

    Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

    2017-07-01

    The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.

  8. Lampreys have a single gene cluster for the fast skeletal myosin heavy chain gene family.

    Directory of Open Access Journals (Sweden)

    Daisuke Ikeda

    Muscle tissues contain the most classic sarcomeric myosin, called myosin II, which consists of 2 heavy chains (MYHs) and 4 light chains. In humans (a tetrapod), a total of 6 fast skeletal-type MYH genes (MYHs) are clustered on a single chromosome. In contrast, torafugu (a teleost) contains at least 13 fast skeletal MYHs, which are distributed in 5 genomic regions; the MYHs are clustered in 3 of these regions. In the present study, the evolutionary relationship among fast skeletal MYHs is elucidated by comparing the MYHs of teleosts and tetrapods with those of cyclostome lampreys, one of two groups of extant jawless vertebrates (agnathans). We found that lampreys contain at least 3 fast skeletal MYHs, which are clustered in a head-to-tail manner in a single genomic region. Although there was apparent synteny in the corresponding MYH cluster regions between lampreys and tetrapods, phylogenetic analysis indicated that lamprey and tetrapod MYHs have independently duplicated and diversified. Subsequent transgenic approaches showed that the 5'-flanking sequences of Japanese lamprey fast skeletal MYHs function as regulatory sequences that drive specific reporter gene expression in the fast skeletal muscle of zebrafish embryos. Although zebrafish MYH promoters showed apparent activity in directing reporter gene expression in myogenic cells derived from mice, promoters from Japanese lamprey MYHs had no activity. These results suggest that the muscle-specific regulatory mechanisms are partially conserved between teleosts and tetrapods but not between cyclostomes and tetrapods, despite the conserved synteny.

  9. Application of biclustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials

    Directory of Open Access Journals (Sweden)

    Andrew Williams

    2015-12-01

    previously defined, functionally relevant gene sets, the present study also identified two novel gene sets: a gene set associated with pulmonary fibrosis and a gene set associated with ROS, underlining the advantage of using a data-driven approach to identify novel, functionally related gene sets. The results can be used in future gene set enrichment analysis studies involving NMs or as features for clustering and classifying NMs of diverse properties.
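    One simple way to ask whether genes in a bicluster are enriched for a predefined gene set is an over-representation test based on the hypergeometric distribution. This is a different, simpler technique than rank-based gene set enrichment analysis (GSEA), and the record does not say which variant the study used; the sketch and its function name are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for the overlap between a gene list and a gene set.

    N: genes in the background, K: genes in the annotated set,
    n: genes in the query list (e.g. one bicluster), k: observed overlap.
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the hypergeometric probabilities of every overlap at least as large as k
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

    With N = 10, K = 5 and n = 5, observing all 5 set members in the list has probability 1/252.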

  10. Contemporary Approaches for Identifying Rare Bone Disease Causing Genes.

    Science.gov (United States)

    Farber, Charles R; Clemens, Thomas L

    Recent improvements in the speed and accuracy of DNA sequencing, together with increasingly sophisticated mathematical approaches for annotating gene networks, have revolutionized the field of human genetics and made these once time-consuming approaches accessible to most investigators. In the field of bone research, a particularly active area of gene discovery has occurred in patients with rare bone disorders such as osteogenesis imperfecta (OI) that are caused by mutations in single genes. In this perspective, we highlight some of these technological advances and describe how they have been used to identify the genetic determinants underlying two previously unexplained cases of OI. The widespread availability of advanced methods for DNA sequencing and bioinformatics analysis can be expected to greatly facilitate identification of novel gene networks that normally function to control bone formation and maintenance.

  11. Identifying lipid metabolism genes in pig liver after clenbuterol administration.

    Science.gov (United States)

    Liu, Qiuyue; Zhang, Jin; Guo, Wei; Zhao, Yiqiang; Hu, Xiaoxiang; Li, Ning

    2012-06-01

    Clenbuterol is a repartitioning agent (β2-adrenoceptor agonist) that can decrease fat deposition and increase skeletal muscle growth at manageable doses. To better understand the molecular mechanism of Clenbuterol's action, GeneChips and real-time PCR were used to compare the gene expression profiles of liver tissue in pigs with and without Clenbuterol administration. Metabolic effects and global gene expression profiles of liver tissue from Clenbuterol-treated and untreated pigs were assessed. Function enrichment tests showed that the differentially expressed genes are enriched in the glycoprotein, plasma membrane, fatty acid and amino acid metabolic process, cell differentiation and signal transduction groups. Pathway mining analysis revealed that physiological pathways such as MAPK, cell adhesion molecules and the insulin signaling pathway were markedly regulated when Clenbuterol was administered. A gene prioritization algorithm was used to associate a number of important differentially expressed genes with lipid metabolism in response to Clenbuterol. Genes identified as differentially expressed in this study will be candidates for further investigation of the molecular mechanisms involved in Clenbuterol's effects on adipose and skeletal muscle tissue.

  12. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    Science.gov (United States)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
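    The combination recommended above (a log-ratio transform followed by k-means) can be sketched as follows. This assumes strictly positive element concentrations with samples as rows, uses the centered log-ratio (CLR) only (not the ILR), and uses plain hard k-means rather than fuzzy c-means. The deterministic farthest-point initialization and all function names are illustrative choices, not taken from the study.

```python
import numpy as np


def clr(X):
    """Centered log-ratio transform of strictly positive compositions (rows = samples)."""
    logX = np.log(np.asarray(X, dtype=float))
    return logX - logX.mean(axis=1, keepdims=True)


def farthest_point_init(X, k):
    """Start from the first sample, then repeatedly add the sample farthest from all chosen ones."""
    idx = [0]
    for _ in range(1, k):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means on the rows of X; returns labels and centroids."""
    X = np.asarray(X, dtype=float)
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids, keeping the old one if a cluster empties
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

    Because each CLR-transformed row sums to zero, the transform removes the constant-sum constraint that distorts Euclidean distances on raw compositional data.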

  13. Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis

    Science.gov (United States)

    2016-10-01

    project activities, for the purpose of enhancing public understanding and increasing interest in learning and careers in science, technology, and the ... Unsupervised hierarchical clustering of resting state functional connectivity data to identify patients with mild tinnitus. Poster session presented ... including drafting of IRB behavioral and scanning protocols, advising on recruiting and initial data collection. She also supervised analysis of data and

  14. MethylMix 2.0: an R package for identifying DNA methylation genes.

    Science.gov (United States)

    Cedoz, Pierre-Louis; Prunello, Marcos; Brennan, Kevin; Gevaert, Olivier

    2018-04-14

    DNA methylation is an important mechanism regulating gene transcription, and its role in carcinogenesis has been extensively studied. Hyper- and hypomethylation of genes is a major mechanism of gene expression deregulation in a wide range of diseases. At the same time, high-throughput DNA methylation assays have been developed, generating vast amounts of genome-wide DNA methylation measurements. We developed MethylMix, an algorithm implemented in R to identify disease-specific hyper- and hypomethylated genes. Here we present a new version of MethylMix that automates the construction of DNA methylation and gene expression datasets from The Cancer Genome Atlas (TCGA). More precisely, MethylMix 2.0 incorporates two major updates: the automated downloading of DNA methylation and gene expression datasets from TCGA, and the automated preprocessing of such datasets (missing-value imputation, batch correction and clustering of CpG sites within each gene). The resulting datasets can subsequently be analyzed with MethylMix to identify transcriptionally predictive methylation states. We show that the Differential Methylation Values created by MethylMix can be used for cancer subtyping. Availability: MethylMix 2.0 was implemented as an R package and is available in Bioconductor (https://bioconductor.org/packages/release/bioc/manuals/MethylMix/man/MethylMix.pdf). Contact: olivier.gevaert@stanford.edu.

  15. Use of protein-engineered fabrics to identify design rules for integrin ligand clustering in biomaterials.

    Science.gov (United States)

    Benitez, Patrick L; Mascharak, Shamik; Proctor, Amy C; Heilshorn, Sarah C

    2016-01-01

    While ligand clustering is known to enhance integrin activation, this insight has been difficult to apply to the design of implantable biomaterials because the local and global ligand densities that enable clustering-enhanced integrin signaling were unpredictable. Here, two general design principles for biomaterial ligand clustering are elucidated. First, clustering ligands enhances integrin-dependent signals when the global ligand density, i.e., the ligand density across the cellular length scale, is near the ligand's effective dissociation constant (KD,eff). Second, clustering ligands enhances integrin activation when the local ligand density, i.e., the ligand density across the length scale of individual focal adhesions, is less than an overcrowding threshold. To identify these principles, we fabricated a series of elastin-like, electrospun fabrics with independent control over the local (0 to 122 000 ligands μm⁻²) and global (0 to 71 000 ligands μm⁻²) densities of an arginine-glycine-aspartate (RGD) ligand. Antibody blocking studies confirmed that human umbilical vein endothelial cell adhesion to these protein-engineered biomaterials was primarily due to αVβ3 integrin binding. Clustering ligands enhanced cell proliferation, focal adhesion number, and focal adhesion kinase expression near the ligand's KD,eff of 12 000 RGD μm⁻². Near this global ligand density, cells on ligand-clustered fabrics behaved similarly to cells grown on fabrics with significantly larger global ligand densities but without clustering. However, this enhanced ligand-clustering effect was not observed above a threshold cut-off concentration. At a local ligand density of 122 000 RGD μm⁻², cell division, focal adhesion number, and focal adhesion kinase expression were significantly reduced relative to fabrics with identical global ligand density and lower local ligand densities. Thus, when clustering results in overcrowding of ligands, integrin receptors are no longer

  16. Identifying genes for neurobehavioural traits in rodents: progress and pitfalls

    Directory of Open Access Journals (Sweden)

    Amelie Baud

    2017-04-01

    Identifying genes and pathways that contribute to differences in neurobehavioural traits is a key goal in psychiatric research. Despite considerable success in identifying quantitative trait loci (QTLs) associated with behaviour in laboratory rodents, pinpointing the causal variants and genes is more challenging. For a long time, the main obstacle was the size of QTLs, which could encompass tens if not hundreds of genes. However, recent studies have exploited mouse and rat resources that allow mapping of phenotypes to narrow intervals, encompassing only a few genes. Here, we review these studies, showcase the rodent resources they have used and highlight the insights into neurobehavioural traits provided to date. We discuss what we see as the biggest challenge in the field – translating QTLs into biological knowledge by experimentally validating and functionally characterizing candidate genes – and propose that the CRISPR/Cas genome-editing system holds the key to overcoming this obstacle. Finally, we challenge traditional views on inbred versus outbred resources in the light of recent resource and technology developments.

  17. Using Cluster Ensemble and Validation to Identify Subtypes of Pervasive Developmental Disorders

    OpenAIRE

    Shen, Jess J.; Lee, Phil Hyoun; Holden, Jeanette J.A.; Shatkay, Hagit

    2007-01-01

    Pervasive Developmental Disorders (PDD) are neurodevelopmental disorders characterized by impairments in social interaction, communication and behavior [1]. Given the diversity and varying severity of PDD, diagnostic tools attempt to identify homogeneous subtypes within PDD. Identifying subtypes can lead to targeted etiology studies and to effective type-specific intervention. Cluster analysis can suggest coherent subsets in data; however, different methods and assumptions lead to different resu...
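    Because different clustering methods produce different partitions, a cluster ensemble combines them. One common technique (evidence accumulation via a co-association matrix) is sketched below; the truncated abstract does not say which ensemble method the authors used, so this is an illustrative assumption. Each entry of the matrix is the fraction of runs in which two items share a cluster, and items linked above a threshold are grouped into connected components. Function names and the 0.5 default threshold are hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster.

    partitions: list of label lists, one per clustering run, over the same items.
    """
    n = len(partitions[0])
    M = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    M[i][j] += 1
    m = len(partitions)
    return [[v / m for v in row] for row in M]


def consensus_clusters(M, threshold=0.5):
    """Group items linked by co-association above threshold (connected components)."""
    n = len(M)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over strongly co-clustered items
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and M[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

    Note that cluster labels are arbitrary per run; the co-association matrix only records whether two items were grouped together, which is why partitions with permuted labels can be combined directly.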

  18. Risk Factors in Domestic Homicides: Identifying Common Clusters in the Canadian Context.

    Science.gov (United States)

    Dawson, Myrna; Piscitelli, Anthony

    2017-09-01

    Little research has examined risk factor combinations in intimate partner violence. A variety of risk factors have been identified in domestic homicides, and it is recognized that the risk of lethality may increase with the presence of more rather than fewer risk factors. This relationship is not necessarily linear, however. The objective of this study was to identify whether particular risk factor combinations are common in cases of domestic homicide. The study comprised 183 deaths that occurred between 2002 and 2012 and were reviewed by the Domestic Violence Death Review Committee, Office of the Chief Coroner of Ontario, Canada, with particular focus on the presence/absence of 40 empirically based risk factors. The analyses identified three distinct risk factor clusters that differed primarily by victim-perpetrator relationship and the likelihood of perpetrator suicide or attempted suicide. Cases involving perpetrators currently in legal marriages or cohabiting with their victims were most common in the Non-Depressed/Non-Violent Cluster, followed by the Depressed/Violent Cluster. In contrast, the majority of those in the Non-Depressed/Violent Cluster were estranged from their victims and were the least likely to attempt or commit suicide. The study demonstrates that particular risk factor combinations are common in cases of domestic homicide. Future research should expand the number of risk factors examined, increase the sample size to further test cluster validity, and compare lethal and non-lethal intimate partner violence and homicide to allow for an examination of the clusters more unique to lethality. Prevention initiatives should emphasize the heterogeneity of domestic homicides and target specific interventions.

  19. Hierarchical clustering of breast cancer methylomes revealed differentially methylated and expressed breast cancer genes.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Oncogenic transformation of normal cells often involves epigenetic alterations, including histone modification and DNA methylation. We conducted whole-genome bisulfite sequencing to determine the DNA methylomes of normal breast, fibroadenoma, invasive ductal carcinomas and MCF7. The emergence, disappearance, expansion and contraction of kilobase-sized hypomethylated regions (HMRs) and the hypomethylation of megabase-sized partially methylated domains (PMDs) are the major forms of methylation change observed in breast tumor samples. Hierarchical clustering of HMRs revealed tumor-specific hypermethylated clusters and differentially methylated enhancers specific to normal or breast cancer cell lines. Joint analysis of gene expression and DNA methylation data from normal breast and breast cancer cells identified differentially methylated and expressed genes associated with breast and/or ovarian cancers in cancer-specific HMR clusters. Furthermore, aberrant patterns of X-chromosome inactivation (XCI) were found in breast cancer cell lines as well as in breast tumor samples in the TCGA BRCA (breast invasive carcinoma) dataset. They were characterized by a differentially hypermethylated XIST promoter, reduced expression of XIST, and overexpression of hypomethylated X-linked genes. High expression of these genes was significantly associated with lower survival rates in breast cancer patients. Comprehensive analysis of the normal and breast tumor methylomes suggests selective targeting of DNA methylation changes during breast cancer progression. The weak causal relationship between DNA methylation and gene expression observed in this study is evidence of a more complex role of DNA methylation in the regulation of gene expression in human epigenetics that deserves further investigation.
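    The record does not specify the distance measure or linkage used to cluster HMRs. A minimal sketch of agglomerative hierarchical clustering, assuming Euclidean distance between per-sample methylation-level vectors and average linkage, is shown below; the function names and toy profiles are hypothetical, and the naive merge loop is meant for small inputs only.

```python
import math


def euclidean_matrix(profiles):
    """Pairwise Euclidean distances between methylation profiles (one vector per sample)."""
    return [[math.dist(a, b) for b in profiles] for a in profiles]


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage on a full distance matrix.

    Repeatedly merges the two clusters with the smallest mean pairwise distance
    until n_clusters remain; returns a cluster label per item.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels
```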

  20. Human major histocompatibility complex contains a minimum of 19 genes between the complement cluster and HLA-B

    International Nuclear Information System (INIS)

    Spies, T.; Bresnahan, M.; Strominger, J.L.

    1989-01-01

    A 600-kilobase (kb) DNA segment from the human major histocompatibility complex (MHC) class III region was isolated by extension of a previous 435-kb chromosome walk. The contiguous series of cloned overlapping cosmids contains the entire 555-kb interval between C2 in the complement gene cluster and HLA-B. This region is known to encode the tumor necrosis factors (TNFs) α and β, B144, and the major heat shock protein HSP70. Moreover, a cluster of genes, BAT1-BAT5 (HLA-B-associated transcripts), has been localized in the vicinity of the genes for TNFα and TNFβ. An additional four genes were identified by isolation of corresponding cDNA clones with cosmid DNA probes. These genes, BAT6-BAT9, were mapped near the gene for C2 within a 120-kb region that includes an HSP70 gene pair. These results, together with complementary data from a similar recent study, indicated the presence of a minimum of 19 genes within the C2-HLA-B interval of the MHC class III region. Although the functional properties of most of these genes are as yet unknown, they may be involved in some aspects of immunity. This idea is supported by the genetic mapping of the hematopoietic histocompatibility locus-1 (Hh-1) in recombinant mice between TNFα and H-2S, which is homologous to the complement gene cluster in humans.

  1. Variations in CCL3L gene cluster sequence and non-specific gene copy numbers

    Directory of Open Access Journals (Sweden)

    Edberg Jeffrey C

    2010-03-01

    Abstract Background Copy number variations (CNVs) of the gene CC chemokine ligand 3-like 1 (CCL3L1) have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2 and CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings Through pairwise and multiple alignments of these genes, we have shown that the homology between these genes ranges from 50% to 99% in complete gene sequences and from 70% to 100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. Using MEGA 4 and BioEdit, we aligned sense primers, antisense primers and probes used in several previously described assays against pre-multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously used RT-PCR-based CNV assays are not specific to CCL3L1. The four available assays measured median copy numbers of 2 and 3-4 in European and African American individuals, respectively. The concordance between the assays ranged from 0.44 to 0.83, suggesting individual discordant calls and inconsistencies between the assays and the gene coverage expected from the known sequence. Conclusions This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogeneous results. Sequence information to determine the CNV of the three genes separately would make it possible to test whether their association with the pathogenesis of a human disease or phenotype is attributable to an individual gene or to a combination of these genes.
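    The specificity problem described above can be illustrated by scanning one primer or probe sequence against several paralogous gene sequences and reporting every gene where it lands. This is a simplified, hypothetical stand-in for the alignment work done with MEGA 4 and BioEdit: it scans only the forward strand, allows substitutions but no gaps, and the toy sequences in the example are invented, not real CCL3-family sequences.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def primer_hits(primer, sequences, max_mismatches=0):
    """Map each sequence name to the start positions where the primer matches.

    Only the forward strand is scanned and no gaps are allowed.
    """
    primer = primer.upper()
    hits = {}
    for name, seq in sequences.items():
        seq = seq.upper()
        positions = [
            i for i in range(len(seq) - len(primer) + 1)
            if count_mismatches(primer, seq[i:i + len(primer)]) <= max_mismatches
        ]
        if positions:
            hits[name] = positions
    return hits
```

    An assay is gene-specific only if its primers and probe hit exactly one target; a result such as `{"paralog_1": [2], "paralog_2": [2]}` for the made-up sequences `"TTACGTAGG"` and `"GGACGTACC"` with primer `"ACGTA"` signals that the assay would co-amplify both genes.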

  2. Gene cluster analysis for the biosynthesis of elgicins, novel lantibiotics produced by Paenibacillus elgii B69

    Directory of Open Access Journals (Sweden)

    Teng Yi

    2012-03-01

    Abstract Background The recent increase in bacterial resistance to antibiotics has promoted the exploration of novel antibacterial materials. As a result, many researchers are working to identify new lantibiotics because of their potent antimicrobial activities. The objective of this study was to provide details of a lantibiotic-like gene cluster in Paenibacillus elgii B69 and to produce the antibacterial substances encoded by this gene cluster based on culture screening. Results Analysis of the P. elgii B69 genome sequence revealed the presence of a lantibiotic-like gene cluster composed of five open reading frames (elgT1, elgC, elgT2, elgB and elgA). Screening of culture extracts for active substances possessing the predicted properties of the encoded product led to the isolation of four novel peptides (elgicins AI, AII, B and C) with a broad inhibitory spectrum. The molecular weights of these peptides were 4536, 4593, 4706 and 4820 Da, respectively. The N-terminal sequence of elgicin B was Leu-Gly-Asp-Tyr, which corresponded to the partial sequence of the peptide ElgA encoded by elgA. Edman degradation suggested that elgicin B is derived from ElgA. By correlating the results of electrospray ionization mass spectrometry analyses of elgicins AI, AII and C, these peptides were deduced to have originated from the same precursor, ElgA. Conclusions A novel lantibiotic-like gene cluster was shown to be present in P. elgii B69. Four new lantibiotics with a broad inhibitory spectrum were isolated, and these appear to be promising antibacterial agents.

  3. Identifying influential nodes in large-scale directed networks: the role of clustering.

    Science.gov (United States)

    Chen, Duan-Bing; Gao, Hui; Lü, Linyuan; Zhou, Tao

    2013-01-01

    Identifying influential nodes in very large-scale directed networks is a big challenge relevant to disparate applications, such as accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding the hierarchical organization of social and biological networks. Known methods range from node centralities, such as degree, closeness and betweenness, to diffusion-based processes, like PageRank and LeaderRank. Some of these methods already take into account the influences of a node's neighbors but do not directly make use of the interactions among its neighbors. Local clustering is known to have negative impacts on information spreading. We further show empirically that it also plays a negative role in generating local connections. Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors' influences, but also the clustering coefficient. Under the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that ClusterRank outperforms benchmark algorithms such as PageRank and LeaderRank. Furthermore, ClusterRank can also be applied to undirected networks, where its superiority is significant compared with degree centrality and k-core decomposition. In addition, because it uses only local information, ClusterRank is much more efficient than global methods: it takes only 191 seconds for a network with about [Formula: see text] nodes, more than 15 times faster than PageRank.

  4. Identifying influential nodes in large-scale directed networks: the role of clustering.

    Directory of Open Access Journals (Sweden)

    Duan-Bing Chen

    Full Text Available Identifying influential nodes in very large-scale directed networks is a big challenge relevant to disparate applications, such as accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding the hierarchical organization of social and biological networks. Known methods range from node centralities, such as degree, closeness and betweenness, to diffusion-based processes, like PageRank and LeaderRank. Some of these methods already take into account the influences of a node's neighbors but do not directly make use of the interactions among its neighbors. Local clustering is known to have negative impacts on information spreading. We further show empirically that it also plays a negative role in generating local connections. Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors' influences, but also the clustering coefficient. Under the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that ClusterRank outperforms benchmark algorithms such as PageRank and LeaderRank. Furthermore, ClusterRank can also be applied to undirected networks, where its superiority is significant compared with degree centrality and k-core decomposition. In addition, because it uses only local information, ClusterRank is much more efficient than global methods: it takes only 191 seconds for a network with about [Formula: see text] nodes, more than 15 times faster than PageRank.

  5. Identifying Influential Nodes in Large-Scale Directed Networks: The Role of Clustering

    Science.gov (United States)

    Chen, Duan-Bing; Gao, Hui; Lü, Linyuan; Zhou, Tao

    2013-01-01

    Identifying influential nodes in very large-scale directed networks is a big challenge relevant to disparate applications, such as accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding the hierarchical organization of social and biological networks. Known methods range from node centralities, such as degree, closeness and betweenness, to diffusion-based processes, like PageRank and LeaderRank. Some of these methods already take into account the influences of a node’s neighbors but do not directly make use of the interactions among its neighbors. Local clustering is known to have negative impacts on information spreading. We further show empirically that it also plays a negative role in generating local connections. Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors’ influences, but also the clustering coefficient. Under the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that ClusterRank outperforms benchmark algorithms such as PageRank and LeaderRank. Furthermore, ClusterRank can also be applied to undirected networks, where its superiority is significant compared with degree centrality and k-core decomposition. In addition, because it uses only local information, ClusterRank is much more efficient than global methods: it takes only 191 seconds for a network with about nodes, more than 15 times faster than PageRank. PMID:24204833
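
The local ranking idea in these records can be sketched in Python. The abstract does not give the scoring formula, so this sketch assumes a form that should be checked against the full paper: s_i = 10^(-c_i) * sum over j in out(i) of (k_out(j) + 1), where out(i) is the set of nodes that i points to and c_i is the fraction of ordered pairs in out(i) joined by a directed edge. The formula, the edge-direction convention, and the function name `cluster_rank` are assumptions, not the authors' implementation.

```python
from collections import defaultdict


def cluster_rank(edges):
    """Score each node; an edge (u, v) means information can flow from u to v (assumed)."""
    out = defaultdict(set)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        if u != v:  # ignore self-loops
            out[u].add(v)

    scores = {}
    for i in nodes:
        nbrs = out[i]
        k = len(nbrs)
        if k < 2:
            c = 0.0  # clustering coefficient undefined for fewer than two neighbours
        else:
            # directed links among i's out-neighbours, over all k*(k-1) ordered pairs
            links = sum(1 for j in nbrs for m in out[j] if m in nbrs)
            c = links / (k * (k - 1))
        # penalise local clustering, reward neighbours that themselves spread further
        scores[i] = 10 ** (-c) * sum(len(out[j]) + 1 for j in nbrs)
    return scores


if __name__ == "__main__":
    toy = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("d", "f")]
    for node, s in sorted(cluster_rank(toy).items(), key=lambda kv: -kv[1]):
        print(node, round(s, 3))
```

In this toy graph, a and d both point to two nodes, and a's neighbour b even has an extra out-link, but because a's neighbours are linked to each other (c_a = 0.5) the sketch ranks d (score 2.0) above a (about 0.95).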

  6. Identifying candidate driver genes by integrative ovarian cancer genomics data

    Science.gov (United States)

    Lu, Xinguo; Lu, Jibo

    2017-08-01

    Integrative analysis of the molecular mechanisms underlying cancer can reveal interactions that cannot be detected from a single type of data, supporting appropriate diagnosis and treatment of cancer patients. Tumor samples exhibit heterogeneity in omics data, such as somatic mutations, copy number variations (CNVs), gene expression profiles and so on. In this paper we combined gene co-expression modules and mutation modulators separately in tumor patients to obtain candidate driver genes for resistant and sensitive tumors from the heterogeneous data. The final list of identified modulators includes genes well known to be involved in biological processes associated with ovarian cancer, such as CCL17, CACTIN, CCL16, CCL22, APOB, KDF1, CCL11, HNF1B, LRG1 and MED1, which can help facilitate the discovery of biomarkers, molecular diagnostics, and drugs.

  7. Integrating mean and variance heterogeneities to identify differentially expressed genes.

    Science.gov (United States)

    Ouyang, Weiwei; An, Qiang; Zhao, Jinying; Qin, Huaizhen

    2016-12-06

    In functional genomics studies, tests of mean heterogeneity have been widely employed to identify differentially expressed genes with distinct mean expression levels under different experimental conditions. Variance heterogeneity (i.e., the difference between condition-specific variances) of gene expression levels is simply neglected or calibrated for as an impediment. The mean heterogeneity in the expression level of a gene reflects one aspect of its distribution alteration, and variance heterogeneity induced by a condition change may reflect another. A change in condition may alter both the mean and some higher-order characteristics of the distributions of expression levels of susceptible genes. In this report, we put forth the concept of mean-variance differentially expressed (MVDE) genes, whose expression means and variances are sensitive to a change in experimental condition. We proved mathematically that existing mean heterogeneity tests and variance heterogeneity tests are independent under the null hypothesis. Based on this independence, we proposed an integrative mean-variance test (IMVT) to combine gene-wise mean heterogeneity and variance heterogeneity induced by condition change. The IMVT outperformed its competitors in comprehensive simulations under normal and Laplace settings. For moderate sample sizes, the IMVT controlled type I error rates well, as did the existing mean heterogeneity tests (i.e., the Welch t test (WT) and the moderated Welch t test (MWT)) and the procedure of separate tests on mean and variance heterogeneities (SMVT), whereas the likelihood ratio test (LRT) severely inflated type I error rates. In the presence of variance heterogeneity, the IMVT appeared noticeably more powerful than all the valid mean heterogeneity tests. Application to the gene profiles of peripheral circulating B cells raised solid evidence of informative variance heterogeneity. After adjusting for background data structure, the IMVT replicated previous discoveries and identified novel experiment
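
The key step in this abstract is that mean- and variance-heterogeneity tests are independent under the null, which is what justifies combining their evidence gene by gene. The exact IMVT combination rule is not stated in this record, so the Python sketch below substitutes Fisher's method; this is an assumption, not the authors' IMVT. For two independent p-values, X = -2(ln p_mean + ln p_var) follows a chi-square distribution with 4 degrees of freedom, whose survival function has the closed form exp(-X/2)(1 + X/2); this reduces to p = q(1 - ln q) with q = p_mean * p_var. The input p-values (for example from a Welch t test and a variance test) are assumed to be computed elsewhere, and the function name is hypothetical.

```python
import math


def fisher_combine_two(p_mean: float, p_var: float) -> float:
    """Combine two independent p-values with Fisher's method (chi-square, 4 df)."""
    for p in (p_mean, p_var):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    q = p_mean * p_var
    # P(chi2_4 >= -2 ln q) = q * (1 - ln q)
    return q * (1.0 - math.log(q))


if __name__ == "__main__":
    # A gene with modest evidence on both aspects (hypothetical numbers)
    print(round(fisher_combine_two(0.06, 0.06), 4))
```

Neither input p-value falls below 0.05, but the combined value is about 0.024, illustrating why a test sensitive to both mean and variance changes can gain power when variance heterogeneity is present.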

  8. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Science.gov (United States)

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses were evaluated using known classes, such as cancer types, and the adjusted Rand index. The performance of the different analyses varies between the data sets, and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on performance. In particular, gene selection is important, and it is generally necessary to include a relatively large number of genes to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is

  9. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    Full Text Available Abstract Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses were evaluated using known classes, such as cancer types, and the adjusted Rand index. The performance of the different analyses varies between the data sets, and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on performance. In particular, gene selection is important, and it is generally necessary to include a relatively large number of genes to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that
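
Both records score each pipeline by comparing its clusters with known classes through the adjusted Rand index. A minimal Python sketch of the standard Hubert-Arabie form of the index, computed from the contingency table of the two labelings, is given here; the function name and example labels are hypothetical, and returning 1.0 when both partitions are trivial (so the index is 0/0) is a convention assumed here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))  # contingency table n_ij
    rows = Counter(labels_true)                     # row sums a_i
    cols = Counter(labels_pred)                     # column sums b_j

    index = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    pairs = comb(n, 2)

    expected = sum_a * sum_b / pairs if pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # assumed convention for degenerate (trivial) partitions
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    known = ["tumourA"] * 3 + ["tumourB"] * 3  # hypothetical known classes
    found = [0, 0, 1, 1, 2, 2]                # hypothetical cluster labels
    print(round(adjusted_rand_index(known, found), 4))
    print(adjusted_rand_index(known, [5, 5, 5, 9, 9, 9]))
```

The second call shows that the index ignores label names: a clustering that reproduces the known classes under different labels scores 1.0, while chance-level agreement scores near 0.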

  10. Analysis of Pigeon (Columba) Ovary Transcriptomes to Identify Genes Involved in Blue Light Regulation.

    Directory of Open Access Journals (Sweden)

    Ying Wang

    Full Text Available Monochromatic light is widely applied to promote poultry reproductive performance, yet little is currently known regarding the mechanism by which light wavelengths affect pigeon reproduction. Recently, high-throughput sequencing technologies have been used to provide genomic information for solving this problem. In this study, we employed an Illumina HiSeq 2000 to identify differentially expressed genes in ovary tissue from pigeons under blue and white light conditions, and used de novo transcriptome assembly to construct a comprehensive sequence database containing information on the mechanisms of follicle development. A total of 157,774 unigenes (mean length: 790 bp) were obtained with the Trinity program, and 35.83% of these unigenes matched genes in a non-redundant protein database. Gene descriptions, Gene Ontology terms, and Clusters of Orthologous Groups terms were assigned to annotate the transcriptome assembly. Differentially expressed genes between blue and white light conditions included those related to oocyte maturation, hormone biosynthesis, and circadian rhythm. Furthermore, 17,574 SSRs and 533,887 potential SNPs were identified in this transcriptome assembly. This work is the first transcriptome analysis of the Columba ovary using Illumina technology, and the resulting transcriptome and differentially expressed gene data can facilitate further investigations into the molecular mechanism of the effect of blue light on follicle development and reproduction in pigeons and other bird species.

  11. Physical and genetic map of the major nif gene cluster from Azotobacter vinelandii.

    OpenAIRE

    Jacobson, M R; Brigle, K E; Bennett, L T; Setterquist, R A; Wilson, M S; Cash, V L; Beynon, J; Newton, W E; Dean, D R

    1989-01-01

    Determination of a 28,793-base-pair DNA sequence of a region from the Azotobacter vinelandii genome that includes and flanks the nitrogenase structural gene region was completed. This information was used to revise the previously proposed organization of the major nif cluster. The major nif cluster from A. vinelandii encodes 15 nif-specific genes whose products bear significant structural identity to the corresponding nif-specific gene products from Klebsiella pneumoniae. These genes include ...

  12. Comparative transcriptional profiling of the axolotl limb identifies a tripartite regeneration-specific gene program.

    Directory of Open Access Journals (Sweden)

    Dunja Knapp

    Full Text Available Understanding how the limb blastema is established after the initial wound healing response is an important aspect of regeneration research. Here we performed parallel expression profile time courses of healing lateral wounds versus amputated limbs in axolotl. This comparison between wound healing and regeneration allowed us to identify amputation-specific genes. By clustering the expression profiles of these samples, we could detect three distinguishable phases of gene expression (early wound healing, followed by a transition phase leading to establishment of the limb development program), which correspond to the three phases of limb regeneration previously defined by morphological criteria. By focusing on the transition phase, we identified 93 strictly amputation-associated genes, many of which are implicated in oxidative-stress response, chromatin modification, epithelial development or limb development. We further classified the genes based on whether or not they were significantly expressed in the developing limb bud. The specific localization of 53 selected candidates within the blastema was investigated by in situ hybridization. In summary, we identified a set of genes that are expressed specifically during regeneration and are therefore likely candidates for the regulation of blastema formation.

  13. Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression

    Directory of Open Access Journals (Sweden)

    Sakaki Yoshiyuki

    2004-02-01

    Full Text Available Abstract Background Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searches can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected, experimentally determined TFBS. A disadvantage of this approach is the large number of results it produces for genomic DNA. One strategy to identify genuine TFBS is to use local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters, and to what extent. This study aims to answer some of these questions. Results We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) on the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI. Conclusion Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in
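
PWM-based TFBS prediction and the idea of local TFBS concentration can be sketched in Python. This is a minimal illustration under stated assumptions: log-odds scores against a uniform background with a pseudocount, a fixed score threshold, and hits-per-window as a stand-in for local concentration. The study's own cluster score is not defined in this abstract and is not reproduced here; the motif counts, sequence, threshold, and function names are all hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """counts: list of dicts (one per motif position) mapping base -> observed count."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Start positions of windows whose log-odds score is at or above threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append(i)
    return hits


def densest_window(hits, seq_len, window=100):
    """(start, count) of the first window of the given length holding the most hit starts."""
    best = (0, 0)
    for start in range(max(1, seq_len - window + 1)):
        c = sum(1 for h in hits if start <= h < start + window)
        if c > best[1]:
            best = (start, c)
    return best


if __name__ == "__main__":
    # Hypothetical motif "TGAC" seen 10 times at each position
    counts = [{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}]
    pwm = log_odds_pwm(counts)
    seq = "AATGACTTTGACGGGG" + "A" * 50 + "TGAC"
    hits = scan(seq, pwm, threshold=6.0)
    print(hits, densest_window(hits, len(seq), window=20))
```

With these toy settings an exact motif scores about 7.2 bits and a single mismatch about 2.8, so only exact matches pass the threshold; the two hits near the start form the densest 20-base window, while the isolated third hit does not.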

  14. Methods for identifying an essential gene in a prokaryotic microorganism

    Energy Technology Data Exchange (ETDEWEB)

    Shizuya, Hiroaki

    2006-01-31

    Methods are provided for the rapid identification of essential or conditionally essential DNA segments in any species of haploid cell (one chromosome copy per cell) that is capable of being transformed by artificial means and of undergoing DNA recombination. This system offers an enhanced means of identifying essential function genes in haploid pathogens, such as Gram-negative and Gram-positive bacteria.

  15. A tripartite clustering analysis on microRNA, gene and disease model.

    Science.gov (United States)

    Shen, Chengcheng; Liu, Ying

    2012-02-01

    Alteration of gene expression in response to regulatory molecules or mutations could lead to different diseases. MicroRNAs (miRNAs) have been found to be involved in the regulation of gene expression and in a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes and the diseases caused by altered expression of these genes, valuable knowledge about the pathogenicity of miRNAs, the genes involved and related disease classes can be revealed by co-clustering miRNAs, target genes and diseases simultaneously. Because it considers members of multiple types, tripartite co-clustering can yield more informative results than traditional co-clustering with only two kinds of members, and it can pass hidden relational information along the relation chain. Here we report a spectral co-clustering algorithm for k-partite graphs to find clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means that members of the same cluster are more likely to share connections. Results also show that miRNAs in the same family based on the hairpin sequences tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes in the same cluster. Finally, the widely studied miR-17-92 and its paralogs are analyzed as a case study, revealing that the genes and diseases co-clustered with these miRNAs are in accordance with current research findings.
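
Spectral co-clustering of this kind can be illustrated with its simplest two-type case. The Python sketch below assumes NumPy and follows the classic bipartite spectral recipe: normalise a miRNA-by-gene matrix by its row and column degrees, take the second singular vector pair, and split rows and columns by sign. It is a simplification, not the authors' k-partite algorithm, which also places diseases in the same clusters; the matrix and function name are hypothetical.

```python
import numpy as np


def spectral_cocluster_2way(A):
    """Bipartition rows and columns of a nonnegative association matrix A together."""
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)
    d_cols = A.sum(axis=0)
    if np.any(d_rows == 0) or np.any(d_cols == 0):
        raise ValueError("every row and column needs at least one nonzero entry")
    r = 1.0 / np.sqrt(d_rows)
    c = 1.0 / np.sqrt(d_cols)
    An = r[:, None] * A * c[None, :]            # D1^{-1/2} A D2^{-1/2}
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = r * U[:, 1]                        # rescaled second left singular vector
    z_cols = c * Vt[1, :]                       # rescaled second right singular vector
    # The sign of a singular vector pair is arbitrary, so cluster ids may be swapped,
    # but row and column labels stay consistent with each other.
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


if __name__ == "__main__":
    # Hypothetical miRNA (rows) x target-gene (columns) weights with two blocks
    A = [[5, 5, 5, 0.1, 0.1],
         [5, 5, 5, 0.1, 0.1],
         [0.1, 0.1, 0.1, 5, 5],
         [0.1, 0.1, 0.1, 5, 5]]
    rows, cols = spectral_cocluster_2way(A)
    print("miRNA clusters:", rows, "gene clusters:", cols)
```

Splitting by sign works here because the second singular vector is orthogonal to the first (all-positive) one; for more than two clusters, the usual approach is to run k-means on several singular vectors instead.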

  16. Clustering of transcriptional profiles identifies changes to insulin signaling as an early event in a mouse model of Alzheimer's disease.

    Science.gov (United States)

    Jackson, Harriet M; Soto, Ileana; Graham, Leah C; Carter, Gregory W; Howell, Gareth R

    2013-11-25

    Alzheimer's disease affects more than 35 million people worldwide, but there is no known cure. Age is the strongest risk factor for Alzheimer's disease, but it is not clear how age-related changes affect the disease. Here, we used a mouse model of Alzheimer's disease to identify age-specific changes that occur prior to and at the onset of traditional Alzheimer-related phenotypes, including amyloid plaque formation. To identify these early events we used transcriptional profiling of mouse brains combined with computational approaches, including singular value decomposition and hierarchical clustering. Our study identifies three key events in the early stages of Alzheimer's disease. First, the most important drivers of Alzheimer's disease onset in these mice are age-specific changes. These include perturbations of the ribosome and oxidative phosphorylation pathways. Second, the earliest detectable disease-specific changes occur in genes commonly associated with the hypothalamic-pituitary-adrenal (HPA) axis. These include the down-regulation of genes relating to metabolism, depression and appetite. Finally, insulin signaling, in particular the down-regulation of the insulin receptor substrate 4 (Irs4) gene, may be an important event in the transition from age-related changes to Alzheimer's disease-specific changes. Combining transcriptional profiling with computational analyses has uncovered novel features relevant to Alzheimer's disease in a widely used mouse model and offers avenues for further exploration of the early stages of AD.

  17. Identifying key genes in glaucoma based on a benchmarked dataset and the gene regulatory network.

    Science.gov (United States)

    Chen, Xi; Wang, Qiao-Ling; Zhang, Meng-Hui

    2017-10-01

    The current study aimed to identify key genes in glaucoma based on a benchmarked dataset and a gene regulatory network (GRN). Local and global noise was added to the gene expression dataset to produce a benchmarked dataset. Differentially expressed genes (DEGs) between patients with glaucoma and normal controls were identified with the Linear Models for Microarray Data (Limma) package based on the benchmarked dataset. A total of 5 GRN inference methods, namely Zscore, GeneNet, the context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT) and GEne Network Inference with Ensemble of Trees (Genie3), were evaluated using receiver operating characteristic (ROC) and precision-recall (PR) curves. The inference method with the best performance was selected to construct the GRN. Subsequently, topological centrality analysis (degree, closeness and betweenness) was conducted to identify key genes in the GRN of glaucoma. Finally, the key genes were validated by reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 176 DEGs were detected from the benchmarked dataset. Analysis of the ROC and PR curves of the 5 methods showed that Genie3 had a clear advantage over the other methods; thus, Genie3 was used to construct the GRN. Following topological centrality analysis, 14 key genes for glaucoma were identified, including IL6, EPHA2 and GSTT1, and 5 of these 14 key genes were validated by RT-qPCR. Therefore, the current study identified 14 key genes in glaucoma, which may be potential biomarkers for the diagnosis of glaucoma and may aid in identifying the molecular mechanism of this disease.
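
The topological-centrality step can be sketched in plain Python. The sketch assumes an unweighted, undirected network (Genie3 produces weighted, directed regulator-to-target links, so a thresholding and symmetrising step is assumed to have happened first), computes closeness within each node's connected component, and uses Brandes' algorithm for betweenness. The function names and the toy edge list with generic node ids are hypothetical, not the study's network.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are dropped."""
    g = defaultdict(set)
    for u, v in edges:
        if u != v:
            g[u].add(v)
            g[v].add(u)
    return g


def degree(g):
    return {v: len(nbrs) for v, nbrs in g.items()}


def bfs_distances(g, s):
    dist = {s: 0}
    q = deque([s])
    while q:
        v = q.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist


def closeness(g):
    """(reachable nodes - 1) / (sum of shortest-path distances), per component."""
    out = {}
    for v in g:
        d = bfs_distances(g, v)
        total = sum(d.values())
        out[v] = (len(d) - 1) / total if total else 0.0
    return out


def betweenness(g):
    """Brandes' algorithm for unweighted graphs; each undirected pair counted once."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        stack, preds = [], defaultdict(list)
        sigma = defaultdict(int)
        sigma[s] = 1
        dist = {s: 0}
        q = deque([s])
        while q:  # BFS counting shortest paths
            v = q.popleft()
            stack.append(v)
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}


if __name__ == "__main__":
    g = build_graph([("h", "a"), ("h", "b"), ("h", "c"), ("c", "d")])
    print(degree(g), closeness(g), betweenness(g), sep="\n")
```

In this toy network the hub h ranks first on all three measures (degree 3, closeness 0.8, betweenness 5.0), which is the kind of agreement a centrality-based key-gene screen looks for.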

  18. Human paraoxonase gene cluster overexpression alleviates angiotensin II-induced cardiac hypertrophy in mice.

    Science.gov (United States)

    Pei, Jian-Fei; Yan, Yun-Fei; Tang, Xiaoqiang; Zhang, Yang; Cui, Shen-Shen; Zhang, Zhu-Qin; Chen, Hou-Zao; Liu, De-Pei

    2016-11-01

    Cardiac hypertrophy is the strongest predictor of the development of heart failure, and anti-hypertrophic treatment holds the key to improving the clinical syndrome and increasing survival rates for heart failure. The paraoxonase (PON) gene cluster (PC) protects against atherosclerosis and coronary artery diseases. However, the role of PC in the heart is largely unknown. To evaluate the roles of PC in cardiac hypertrophy, transgenic mice carrying the intact human PON1, PON2, and PON3 genes and their flanking sequences were studied. We demonstrated that the PC transgene (PC-Tg) protected mice from cardiac hypertrophy induced by Ang II; these mice had reduced heart weight/body weight ratios, decreased left ventricular wall thicknesses and increased fractional shortening compared with wild-type (WT) controls. The same protective tendency was also observed on an Apoe -/- background. Mechanistically, PC-Tg normalized the imbalance between matrix metalloproteinases (MMPs) and tissue inhibitors of MMPs (TIMPs) in hypertrophic hearts, which might contribute to the protective role of PC-Tg against cardiac fibrosis and thus protect against cardiac remodeling. Taken together, our results identify a novel anti-hypertrophic role for the PON gene cluster, suggesting a possible strategy for treating cardiac hypertrophy by elevating the levels of the PON gene family.

  19. The complete coenzyme B12 biosynthesis gene cluster of Lactobacillus reuteri CRL1098

    NARCIS (Netherlands)

    Santos, F.; Vera, J.L.; van der Heijden, R.; Valdez, G.; de Vos, W.M.; Sesma, F.; Hugenholtz, J.

    2008-01-01

    The coenzyme B12 production pathway in Lactobacillus reuteri has been deduced using a combination of genetic, biochemical and bioinformatics approaches. The coenzyme B12 gene cluster of Lb. reuteri CRL1098 has the unique feature of clustering together the cbi, cob and hem genes. It consists of 29

  20. The complete coenzyme B12 biosynthesis gene cluster of Lactobacillus reuteri CRL 1098

    NARCIS (Netherlands)

    Santos, dos F.; Vera, J.L.; Heijden, van der R.; Valdez, G.F.; Vos, de W.M.; Sesma, F.; Hugenholtz, J.

    2008-01-01

    The coenzyme B12 production pathway in Lactobacillus reuteri has been deduced using a combination of genetic, biochemical and bioinformatics approaches. The coenzyme B12 gene cluster of Lb. reuteri CRL1098 has the unique feature of clustering together the cbi, cob and hem genes. It consists of 29

  1. Dominant control region of the human β-like globin gene cluster

    NARCIS (Netherlands)

    Blom van Assendelft, Margaretha van

    1989-01-01

    The structure and regulation of the human β-like globin gene cluster have been studied extensively. Genetic disorders connected with this gene cluster are responsible for human diseases associated with high levels of morbidity and mortality, such as β-thalassaemia and sickle cell anaemia. The work

  2. A phylogenomic gene cluster resource: The phylogenetically inferred groups (PhIGs) database

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir S.; Boore, Jeffrey L.

    2005-08-25

    We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.

  3. Astronomy and Big Data: A Data Clustering Approach to Identifying Uncertain Galaxy Morphology

    CERN Document Server

    Edwards, Kieran Jay

    2014-01-01

    With the onset of massive cosmological data collection through media such as the Sloan Digital Sky Survey (SDSS), galaxy classification has been accomplished for the most part with the help of citizen science communities like Galaxy Zoo. Seeking the wisdom of the crowd for such Big Data processing has proved extremely beneficial. However, an analysis of one of the Galaxy Zoo morphological classification data sets has shown that a significant majority of all classified galaxies are labelled as “Uncertain”. This book reports on how to use data mining, more specifically clustering, to identify galaxies about which the public has shown some uncertainty as to whether they belong to one morphology type or another. The book shows the importance of transitions between different data mining techniques in an insightful workflow. It demonstrates that clustering makes it possible to identify discriminating features in the analysed data sets, adopting a novel feature selection algorithm called Incremental Feature Select...

  4. Gene-network analysis identifies susceptibility genes related to glycobiology in autism.

    Directory of Open Access Journals (Sweden)

    Bert van der Zwaag

    Full Text Available The recent identification of copy-number variation in the human genome has opened up new avenues for the discovery of positional candidate genes underlying complex genetic disorders, especially in the field of psychiatric disease. One major challenge that remains is pinpointing the susceptibility genes in the multitude of disease-associated loci. This challenge may be tackled by reconstructing functional gene networks from the genes residing in these loci. We applied this approach to autism spectrum disorder (ASD) and identified the copy-number changes in the DNA of 105 ASD patients and 267 healthy individuals with Illumina HumanHap300 BeadChips. Subsequently, we used a reconstructed human gene network, Prioritizer, to rank candidate genes in the segmental gains and losses in our autism cohort. This analysis highlighted several candidate genes already known to be mutated in cognitive and neuropsychiatric disorders, including RAI1, BRD1, and LARGE. In addition, the LARGE gene was part of a sub-network of seven genes functioning in glycobiology, present in seven copy-number changes specifically identified in autism patients with limited co-morbidity. Three of these seven copy-number changes were de novo in the patients. In autism patients with a complex phenotype and in healthy controls, no such sub-network was identified. An independent systematic analysis of 13 published autism susceptibility loci supports the involvement of genes related to glycobiology, as we also identified the same or similar genes from those loci. Our findings suggest that genomic gains and losses of genes associated with glycobiology are important contributors to the development of ASD.

  5. Organization and differential regulation of a cluster of lignin peroxidase genes of Phanerochaete chrysosporium

    Science.gov (United States)

    Philip Stewart; Daniel Cullen

    1999-06-01

    The lignin peroxidases of Phanerochaete chrysosporium are encoded by a minimum of 10 closely related genes. Physical and genetic mapping of a cluster of eight lip genes revealed six genes occurring in pairs and transcriptionally convergent, suggesting that portions of the lip family arose by gene duplication events. The completed sequence of lipG and lipJ, together...

  6. DriverFinder: A Gene Length-Based Network Method to Identify Cancer Driver Genes

    Directory of Open Access Journals (Sweden)

    Pi-Jing Wei

    2017-01-01

    Integration of multi-omics data of cancer can help researchers explore cancers comprehensively. However, with a large volume of different omics and functional data being generated, a major challenge is to distinguish functional driver genes from a sea of inconsequential passenger genes that accrue stochastically but do not contribute to cancer development. In this paper, we present a gene length-based network method, named DriverFinder, to identify driver genes by integrating somatic mutations, copy number variations, gene-gene interaction network, tumor expression, and normal expression data. To illustrate the performance of DriverFinder, we applied it to four cancer types from The Cancer Genome Atlas: breast cancer, head and neck squamous cell carcinoma, thyroid carcinoma, and kidney renal clear cell carcinoma. Compared with some conventional methods, the results demonstrate that the proposed method is effective. Moreover, it can decrease the influence of gene length in identifying driver genes and can identify some rarely mutated driver genes.

  7. GenCLiP: a software program for clustering gene lists by literature profiling and constructing gene co-occurrence networks related to custom keywords

    Directory of Open Access Journals (Sweden)

    Zhou Yi-Bo

    2008-07-01

    Background: Biomedical researchers often want to explore pathogenesis and pathways regulated by abnormally expressed genes, such as those identified by microarray analyses. Literature mining is an important way to assist in this task. Many literature mining tools are now available; however, few of them allow the user to make manual adjustments to zero in on what he/she wants to know in particular. Results: We present our software program, GenCLiP (Gene Cluster with Literature Profiles), which is based on the methods presented by Chaussabel and Sher (Genome Biol 2002, 3(10):RESEARCH0055) that search gene lists to identify functional clusters of genes based on up-to-date literature profiling. Four features were added to this previously described method: the ability to (1) manually curate keywords extracted from the literature, (2) search genes and gene co-occurrence networks related to custom keywords, (3) compare analyzed gene results with negative and positive controls generated by GenCLiP, and (4) calculate probabilities that the resulting genes and gene networks are randomly related. In this paper, we show, with a set of genes differentially expressed between keloids and normal controls, how the functions implemented in GenCLiP successfully identified keywords related to the pathogenesis of keloids and previously unknown gene pathways involved in the pathogenesis of keloids. Conclusion: With regard to the identification of disease-susceptibility genes, GenCLiP allows one to quickly acquire a primary pathogenesis profile and identify pathways involving abnormally expressed genes not previously associated with the disease.

  8. Comparative Transcriptomics to Identify Novel Genes and Pathways in Dinoflagellates

    Science.gov (United States)

    Ryan, D.

    2016-02-01

    The unarmored dinoflagellate Karenia brevis is among the most prominent harmful, bloom-forming phytoplankton species in the Gulf of Mexico. During blooms, the polyketides PbTx-1 and PbTx-2 (brevetoxins) are produced by K. brevis. Brevetoxins negatively impact human health and the Gulf shellfish harvest. However, the genes underlying brevetoxin synthesis are currently unknown. Because the K. brevis genome is extremely large (1 × 10^11 base pairs) and has a high proportion of repetitive, non-coding DNA, it has not been sequenced. In fact, large, repetitive genomes are common among the dinoflagellate group. High-throughput RNA sequencing technology enabled us to assemble Karenia transcriptomes de novo and investigate potential genes in the brevetoxin pathway through comparative transcriptomics. The brevetoxin profile varies among K. brevis clonal cultures. For example, well-documented Wilson-CCFWC268 typically produces 8-10 pg PbTx per cell, whereas SP1 produces ... polyketide synthases (PKSs), were only expressed by brevetoxin-producing K. brevis and K. papilionacea, not K. mikimotoi. Examination of gene expression between the typical- and low-toxin Wilson clones identified about 3,500 genes with significantly different expression levels, including 2 putative PKSs. One of the 2 PKSs was only found in the brevetoxin-producing Karenia species. These transcriptomes could not have been characterized without high-throughput RNA sequencing.

  9. Strategies to identify long noncoding RNAs involved in gene regulation

    Directory of Open Access Journals (Sweden)

    Lee Catherine

    2012-11-01

    Long noncoding RNAs (lncRNAs) have been detected in nearly every cell type and found to be fundamentally involved in many biological processes. The characterization of lncRNAs has immense potential to advance our comprehensive understanding of cellular processes and gene regulation, along with implications for the treatment of human disease. The recent ENCODE (Encyclopedia of DNA Elements) study reported 9,640 lncRNA loci in the human genome, which corresponds to around half the number of protein-coding genes. Because of this sheer number and their functional diversity, it is crucial to identify a pool of potentially relevant lncRNAs early on in a given study. In this review, we evaluate the methods for isolating lncRNAs by immunoprecipitation and review the advantages, disadvantages, and applications of three widely used approaches – microarray, tiling array, and RNA-seq – for identifying lncRNAs involved in gene regulation. We also look at ways in which data from publicly available databases such as ENCODE can support the study of lncRNAs.

  10. Global analysis of biosynthetic gene clusters reveals vast potential of secondary metabolite production in Penicillium species.

    Science.gov (United States)

    Nielsen, Jens Christian; Grijseels, Sietske; Prigent, Sylvain; Ji, Boyang; Dainat, Jacques; Nielsen, Kristian Fog; Frisvad, Jens Christian; Workman, Mhairi; Nielsen, Jens

    2017-04-03

    Filamentous fungi produce a wide range of bioactive compounds with important pharmaceutical applications, such as antibiotic penicillins and cholesterol-lowering statins. However, less attention has been paid to fungal secondary metabolites compared to those from bacteria. In this study, we sequenced the genomes of 9 Penicillium species and, together with 15 published genomes, we investigated the secondary metabolism of Penicillium and identified an immense, unexploited potential for producing secondary metabolites by this genus. A total of 1,317 putative biosynthetic gene clusters (BGCs) were identified, and polyketide synthase and non-ribosomal peptide synthetase based BGCs were grouped into gene cluster families and mapped to known pathways. The grouping of BGCs allowed us to study the evolutionary trajectory of pathways based on 6-methylsalicylic acid (6-MSA) synthases. Finally, we cross-referenced the predicted pathways with published data on the production of secondary metabolites and experimentally validated the production of antibiotic yanuthones in Penicillia and identified a previously undescribed compound from the yanuthone pathway. This study is the first genus-wide analysis of the genomic diversity of Penicillia and highlights the potential of these species as a source of new antibiotics and other pharmaceuticals.

  11. Identification of a gene cluster associated with triclosan catabolism.

    Science.gov (United States)

    Kagle, Jeanne M; Paxson, Clayton; Johnstone, Precious; Hay, Anthony G

    2015-06-01

    Aerobic degradation of bis-aryl ethers like the antimicrobial triclosan typically proceeds through oxygenase-dependent catabolic pathways. Although several studies have reported on bacteria capable of degrading triclosan aerobically, there are no reports describing the genes responsible for this process. In this study, a gene encoding the large subunit of a putative triclosan oxygenase, designated tcsA, was identified in a triclosan-degrading fosmid clone from a DNA library of Sphingomonas sp. RD1. Consistent with tcsA's similarity to two-part dioxygenases, a putative FMN-dependent ferredoxin reductase, designated tcsB, was found immediately downstream of tcsA. Both tcsA and tcsB were found in the midst of a putative chlorocatechol degradation operon. We show that RD1 produces hydroxytriclosan and chlorocatechols during triclosan degradation and that tcsA is induced by triclosan. This is the first study to report on the genetics of triclosan degradation.

  12. Molecular characterization of a conserved archaeal copper resistance (cop) gene cluster and its copper-responsive regulator in Sulfolobus solfataricus P2

    NARCIS (Netherlands)

    Ettema, T.J.G.; Brinkman, A.B.; Lamers, P.P.; Kornet, N.; Vos, de W.M.; Oost, van der J.

    2006-01-01

    Using a comparative genomics approach, a copper resistance gene cluster has been identified in multiple archaeal genomes. The cop cluster is predicted to encode a metallochaperone (CopM), a P-type copper-exporting ATPase (CopA) and a novel, archaea-specific transcriptional regulator (CopT) which

  13. A robust approach based on Weibull distribution for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Gong Binsheng

    2011-05-01

    Background: Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on distances, while few methods group genes according to the similarities of the distributions of their expression levels. Furthermore, as biological annotation resources have accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest. Results: In this paper, we proposed the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data, in which the expression levels of each gene are treated as a random variable following its own Weibull distribution. Our WDCM is based on the concept that genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via their Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM) using functional annotation information given by the Gene Ontology (GO). The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also used the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides better clustering performance than the k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets.
Conclusions: The results demonstrate that our WDCM produces clusters
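    As a rough illustration of the Adjusted Rand Index (ARI) used in the WDCM abstract as an external validation measure, the following Python sketch computes the ARI between two flat clusterings from their contingency table using the standard pair-counting formula. It is a generic implementation, not the authors' code; the function name and example labels are hypothetical, and the Weibull-fitting step of WDCM itself is not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree about

    # Contingency table: items sharing each (reference, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)

    # Number of item pairs grouped together in both clusterings.
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())

    # Expected index under random labelling with the same cluster sizes.
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only happens when both clusterings are all-singletons or one cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
```

    A value of 1.0 means the two partitions agree up to relabelling, and values near 0 indicate agreement no better than chance; this chance correction is what distinguishes the ARI from the unadjusted Rand index.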

  14. Cluster analysis identifies three urodynamic patterns in patients with orthotopic neobladder reconstruction.

    Directory of Open Access Journals (Sweden)

    Kwang Hyun Kim

    To classify patients with orthotopic neobladder based on urodynamic parameters using cluster analysis and to characterize the voiding function of each group. From January 2012 to November 2015, 142 patients with bladder cancer underwent radical cystectomy and Studer neobladder reconstruction at our institute. Of the 142 patients, 103 with complete urodynamic data and information on urinary functional outcomes were included in this study. K-means clustering was performed with urodynamic parameters which included maximal cystometric capacity, residual volume, maximal flow rate, compliance, and detrusor pressure at maximum flow rate. Three groups emerged from the cluster analysis. Urodynamic parameters and urinary function outcomes were compared among the three groups. Group 1 (n = 44) had ideal urodynamic parameters with a mean maximal bladder capacity of 513.3 ml and mean residual urine volume of 33.1 ml. Group 2 (n = 42) was characterized by small bladder capacity with low compliance. Patients in group 2 had higher rates of daytime incontinence and nighttime incontinence than patients in group 1. Group 3 (n = 17) was characterized by large residual urine volume with high compliance. When we examined gender differences in urodynamics and functional outcomes, residual urine volume and the rate of daytime incontinence were only marginally significant. However, females were significantly more likely to belong to group 2 or 3 (P = 0.003). In multivariate analysis to identify factors associated with group 1, which had the most ideal urodynamic pattern, age (OR 0.95, P = 0.017) and male gender (OR 7.57, P = 0.003) were identified as significant factors. While patients with ileal neobladder present with various voiding symptoms, three urodynamic patterns were identified by cluster analysis. Approximately half of patients had ideal urodynamic parameters. The other two groups were characterized by large residual urine and small capacity bladder with low compliance.
Young
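    The neobladder study grouped patients by k-means clustering of urodynamic parameters. A minimal Python sketch of that technique follows. It assumes the parameters are z-score standardized first (the abstract does not say whether they were) and uses deterministic farthest-first seeding instead of random initialization so the result is reproducible. The function names and the patient values in the example are hypothetical, not study data, and only three of the five parameters are shown.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so parameters in ml, ml/s or cmH2O carry equal weight."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point and the centroids."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical patients: [capacity ml, residual ml, compliance ml/cmH2O]
    patients = [
        [500, 30, 40], [520, 35, 42], [510, 28, 38],     # large capacity, low residual
        [200, 40, 10], [210, 45, 12], [190, 38, 9],      # small capacity, low compliance
        [600, 250, 80], [620, 240, 85], [590, 260, 78],  # large residual, high compliance
    ]
    labels, _ = kmeans(standardize(patients), k=3)
    print(labels)
```

    Farthest-first seeding is sensitive to outliers; randomized k-means++ seeding with several restarts is the more common choice in practice.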

  15. New genes associated with rheumatoid arthritis identified by gene expression profiling.

    Science.gov (United States)

    Wang, H; Guo, J; Jiang, J; Wu, W; Chang, X; Zhou, H; Li, Z; Zhao, J

    2017-06-01

    In this study, we aimed to find new genes associated with rheumatoid arthritis (RA) so that a more comprehensive set of genes could be used for monitoring and/or diagnosing patients. Illumina digital gene expression profiling was applied to two sample types, peripheral blood mononuclear cells (PBMCs) and synovial cells, to compare the gene expression pattern between 17 patients with RA and three control groups (six osteoarthritis patients, three ankylosing spondylitis patients and 17 healthy controls). Bioinformatic analyses included pathway analysis and protein-protein interaction networks. Four novel genes from PBMCs (DHRS3, TTC38, SAP30BP and LPIN2) were found to be associated with RA and further confirmed through quantitative real-time polymerase chain reaction. Five new differentially expressed genes (EPYC, LIFR, GLDN, TADA3 and ZNRF3) found in synovial cells were not confirmed. Pathway analyses revealed 10 significantly enriched pathways, and a protein-protein interaction network analysis showed that the four novel PBMC-derived genes were connected to previously reported genes by four intermediate genes. Therefore, we proposed that the four newly identified PBMC-derived genes could be integrated with previously reported RA-associated genes to monitor and/or diagnose RA. © 2017 John Wiley & Sons Ltd.

  16. Back to the sea twice: identifying candidate plant genes for molecular evolution to marine life

    Directory of Open Access Journals (Sweden)

    Reusch Thorsten BH

    2011-01-01

    Background: Seagrasses are a polyphyletic group of monocotyledonous angiosperms that have adapted to a completely submerged lifestyle in marine waters. Here, we exploit two collections of expressed sequence tags (ESTs) of two widespread and ecologically important seagrass species, the Mediterranean seagrass Posidonia oceanica (L.) Delile and the eelgrass Zostera marina L., which have independently evolved from aquatic ancestors. This replicated, yet independent evolutionary history facilitates the identification of traits that may have evolved in parallel and are possible instrumental candidates for adaptation to a marine habitat. Results: In our study, we provide the first quantitative perspective on molecular adaptations in two seagrass species. By constructing orthologous gene clusters shared between two seagrasses (Z. marina and P. oceanica) and eight distantly related terrestrial angiosperm species, 51 genes could be identified with detection of positive selection along the seagrass branches of the phylogenetic tree. Characterization of these positively selected genes using KEGG pathways and the Gene Ontology uncovered that these genes are mostly involved in translation, metabolism, and photosynthesis. Conclusions: These results provide first insights into which seagrass genes have diverged from their terrestrial counterparts via an initial aquatic stage characteristic of the order and to the derived fully-marine stage characteristic of seagrasses. We discuss how adaptive changes in these processes may have contributed to the evolution towards an aquatic and marine existence.

  17. Candidate Gene Sequence Analyses toward Identifying Rsv3-Type Resistance to Soybean Mosaic Virus

    Directory of Open Access Journals (Sweden)

    N. R. Redekar

    2016-07-01

    Rsv3 is one of three genetic loci conferring strain-specific resistance to Soybean mosaic virus (SMV). The locus has been mapped to a 154-kb region on chromosome 14, containing a cluster of five nucleotide-binding leucine-rich repeat (NB-LRR) resistance genes. High sequence similarity between the candidate genes challenges fine mapping of the locus. Among the five, Glyma14g38533 showed the highest transcript abundance within 1 to 3 h of SMV-G7 inoculation. Comparative sequence analyses were conducted with the five candidate NB-LRR genes from the susceptible (rsv3-type) soybean [Glycine max (L.) Merr.] cultivar Williams 82, the resistant (Rsv3-type) cultivar Hwangkeum, and the resistant lines L29 and RRR. Sequence comparisons revealed that Glyma14g38533 had far more polymorphisms than the other candidate genes. Interestingly, the Glyma14g38533 gene from Rsv3-type lines exhibited 150 single-nucleotide polymorphism (SNP) and six insertion–deletion (InDel) markers relative to the rsv3-type line Williams 82. Furthermore, the polymorphisms identified in the three Rsv3-type lines were highly conserved. Several polymorphisms were validated in 18 Rsv3-type resistant and six rsv3-type susceptible lines and were found to be associated with their disease response. The majority of the polymorphisms were located in the LRR domain encoding region, which is involved in pathogen recognition via protein–protein interactions. These findings associating Glyma14g38533 with Rsv3-type resistance to SMV suggest it is the most likely candidate gene for Rsv3.

  18. Effects of gene disruptions in the nisin gene cluster of Lactococcus lactis on nisin production and producer immunity

    NARCIS (Netherlands)

    Ra, Runar; Beerthuyzen, Marke M.; Vos, Willem M. de; Saris, Per E.J.; Kuipers, Oscar P.

    1999-01-01

    The lantibiotic nisin is produced by several strains of Lactococcus lactis subsp. lactis. The chromosomally located gene cluster nisABTCIPRKFEG is required for biosynthesis, development of immunity, and regulation of gene expression. In-frame deletions in the nisB and nisT genes, and disruption of

  19. Differentially expressed genes in pancreatic ductal adenocarcinomas identified through serial analysis of gene expression

    DEFF Research Database (Denmark)

    Hustinx, Steven R; Cao, Dengfeng; Maitra, Anirban

    2004-01-01

    genome and better biocomputational techniques have substantially improved the assignment of differentially expressed SAGE "tags" to human genes. These improvements have provided us with an opportunity to re-evaluate global gene expression in pancreatic cancer using existing SAGE libraries. SAGE libraries...... generated from six pancreatic cancers were compared to SAGE libraries generated from 11 non-neoplastic tissues. Compared to normal tissue libraries, we identified 453 SAGE tags as differentially expressed in pancreatic cancer, including 395 that mapped to known genes and 58 "uncharacterized" tags....... Of the 395 SAGE tags assigned to known genes, 223 were overexpressed in pancreatic cancer, and 172 were underexpressed. In order to map the 58 uncharacterized differentially expressed SAGE tags to genes, we used a newly developed resource called TAGmapper (http://tagmapper.ibioinformatics.org), to identify...

  1. Comprehensive Analysis of Gene Expression Profiles of Sepsis-Induced Multiorgan Failure Identified Its Valuable Biomarkers.

    Science.gov (United States)

    Wang, Yumei; Yin, Xiaoling; Yang, Fang

    2018-02-01

    Sepsis is an inflammation-related disease, and severe sepsis can induce multiorgan dysfunction, which is the most common cause of death of patients in noncoronary intensive care units. Novel therapeutic strategies have had little impact on the mortality of severe sepsis, and its mechanisms still remain poorly understood. In this study, we analyzed gene expression profiles of severe sepsis with failure of the lung, kidney, and liver to identify potential biomarkers. We first downloaded the gene expression profiles from the Gene Expression Omnibus, preprocessed the raw microarray data sets, and identified differentially expressed genes (DEGs) using R; then, significantly enriched functions of DEGs in lung, kidney, and liver failure sepsis samples were obtained from the Database for Annotation, Visualization, and Integrated Discovery (DAVID); finally, a protein-protein interaction network was constructed for DEGs based on the STRING database, and network modules were obtained with the MCODE clustering method. Lung failure sepsis samples had the highest number of DEGs (859), whereas kidney and liver failure sepsis samples had 178 and 175, respectively. In addition, 17 DEGs were shared among the three lists. Biological processes related to immune and inflammatory responses were significantly enriched in DEGs. Network and module analysis identified four gene clusters in which all or most genes were upregulated. The expression changes of Icam1 and Socs3 were further validated by quantitative PCR analysis. This study should shed light on the development of sepsis and provide potential therapeutic targets for sepsis-induced multiorgan failure.

  2. Linking Strengths: Identifying and Exploring Protective Factor Clusters in Academically Resilient Low-Socioeconomic Urban Students of Color

    Science.gov (United States)

    Morales, Erik E.

    2010-01-01

    Based on data from qualitative interviews with 50 high-achieving low-socioeconomic students of color, two "clusters" of important and symbiotic protective factors are identified and explored. Each cluster consists of a series of interrelated protective factors identified by the participants as crucial to their statistically exceptional academic…

  3. Identifying knowledge activism in worker health and safety representation: A cluster analysis.

    Science.gov (United States)

    Hall, Alan; Oudyk, John; King, Andrew; Naqvi, Syed; Lewchuk, Wayne

    2016-01-01

    Although worker representation in occupational health and safety (OHS) has been widely recognized as contributing to health and safety improvements at work, few studies have examined the role that worker representatives play in this process. Using a large quantitative sample, this paper seeks to confirm findings from an earlier exploratory qualitative study that worker representatives can be differentiated by the knowledge intensive tactics and strategies that they use to achieve changes in their workplace. Just under 900 worker health and safety representatives in Ontario completed surveys which asked them to report on the amount of time they devoted to different types of representation activities (i.e., technical activities such as inspections and report writing vs. political activities such as mobilizing workers to build support), the kinds of conditions or hazards they tried to address through their representation (e.g., housekeeping vs. modifications in ventilation systems), and their reported success in making positive improvements. A cluster analysis was used to determine whether the worker representatives could be distinguished in terms of the relative time devoted to different activities, and the clusters were then compared with reference to types of intervention efforts and outcomes. The cluster analysis identified three distinct groupings of representatives with significant differences in reported types of interventions and in their level of reported impact. Two of the clusters were consistent with the findings in the exploratory study: knowledge activism, with greater emphasis on knowledge-based political activity, and technical-legal representation, with greater emphasis on formalized technically oriented procedures and legal regulations. Knowledge activists were more likely to take on challenging interventions and they reported more impact across the full range of interventions. This paper provides further support for the concepts of knowledge activism and technical

  4. An Integrated Approach Identifies Nhlh1 and Insm1 as Sonic Hedgehog-regulated Genes in Developing Cerebellum and Medulloblastoma

    Directory of Open Access Journals (Sweden)

    Enrico De Smaele

    2008-01-01

    Medulloblastoma (MB) is the most common malignant brain tumor of childhood arising from deregulated cerebellar development. The Sonic Hedgehog (Shh) pathway plays a critical role in cerebellar development, and its aberrant expression has been identified in MB. Gene expression profiling of cerebella from 1- to 14-day-old mice unveiled a cluster of genes whose expression correlates with the levels of Hedgehog (HH) activity. From this cluster, we identified Insm1 and Nhlh1/NSCL1 as novel HH targets induced by Shh treatment in cultured cerebellar granule cell progenitors. The Nhlh1 promoter was found to be bound and activated by the Gli1 transcription factor. Remarkably, the expression of these genes is also upregulated in mouse and human HH-dependent MBs, suggesting that they may be either a part of the HH-induced tumorigenic process or a specific trait of HH-dependent tumor cells.

  5. Evolution of C2H2-zinc finger genes and subfamilies in mammals: Species-specific duplication and loss of clusters, genes and effector domains

    Directory of Open Access Journals (Sweden)

    Aubry Muriel

    2008-06-01

    Background: C2H2 zinc finger genes (C2H2-ZNF) constitute the largest class of transcription factors in humans and one of the largest gene families in mammals. Often arranged in clusters in the genome, these genes are thought to have undergone a massive expansion in vertebrates, primarily by tandem duplication. However, this view is based on limited datasets restricted to a single chromosome or a specific subset of genes belonging to the large KRAB domain-containing C2H2-ZNF subfamily. Results: Here, we present the first comprehensive study of the evolution of the C2H2-ZNF family in mammals. We assembled the complete repertoire of human C2H2-ZNF genes (718 in total), about 70% of which are organized into 81 clusters across all chromosomes. Based on an analysis of their N-terminal effector domains, we identified two new C2H2-ZNF subfamilies encoding genes with a SET or a HOMEO domain. We searched for the syntenic counterparts of the human clusters in other mammals for which complete gene data are available: chimpanzee, mouse, rat and dog. Cross-species comparisons show a large variation in the numbers of C2H2-ZNF genes within homologous mammalian clusters, suggesting differential patterns of evolution. Phylogenetic analysis of selected clusters reveals that the disparity in C2H2-ZNF gene repertoires across mammals originates not only from differential gene duplication but also from gene loss. Further, we discovered variations among orthologs in the number of zinc finger motifs and association of the effector domains, the latter often undergoing sequence degeneration. Combined with phylogenetic studies, physical maps and an analysis of the exon-intron organization of genes from the SCAN and KRAB domain-containing subfamilies, this result suggests that the SCAN subfamily emerged first, followed by the SCAN-KRAB and finally by the KRAB subfamily.
Conclusion: Our results are in agreement with the "birth and death hypothesis" for the evolution of

  6. Large-Scale Transposition Mutagenesis of Streptomyces coelicolor Identifies Hundreds of Genes Influencing Antibiotic Biosynthesis

    Science.gov (United States)

    Xu, Zhong; Wang, Yemin; Chater, Keith F.; Ou, Hong-Yu; Xu, H. Howard; Deng, Zixin

    2017-01-01

    Gram-positive Streptomyces bacteria produce thousands of bioactive secondary metabolites, including antibiotics. To systematically investigate genes affecting secondary metabolism, we developed a hyperactive transposase-based Tn5 transposition system and employed it to mutagenize the model species Streptomyces coelicolor, leading to the identification of 51,443 transposition insertions. These insertions were distributed randomly along the chromosome except for some preferred regions associated with relatively low GC content in the chromosomal core. The base composition of the insertion site and its flanking sequences compiled from the 51,443 insertions implied a 19-bp expanded target site surrounding the insertion site, with a slight nucleic acid base preference in some positions, suggesting a relative randomness of Tn5 transposition targeting in the high-GC Streptomyces genome. From the mutagenesis library, 724 mutants involving 365 genes had altered levels of production of the tripyrrole antibiotic undecylprodigiosin (RED), including 17 genes in the RED biosynthetic gene cluster. Genetic complementation revealed that most of the insertions (more than two-thirds) were responsible for the changed antibiotic production. Genes associated with branched-chain amino acid biosynthesis, DNA metabolism, and protein modification affected RED production, and genes involved in signaling, stress, and transcriptional regulation were overrepresented. Some insertions caused dramatic changes in RED production, identifying future targets for strain improvement. Importance: High-GC Gram-positive streptomycetes and related actinomycetes have provided more than 100 clinical drugs used as antibiotics, immunosuppressants, and antitumor drugs. Their genomes harbor biosynthetic genes for many more unknown compounds with potential as future drugs. Here we developed a useful genome-wide mutagenesis tool based on the transposon Tn5 for the study of secondary metabolism and its
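    The base-composition analysis of Tn5 insertion sites described above can be illustrated with a position frequency profile: take equal-length windows centred on each insertion site, align them, and count the bases at every position. The following Python sketch assumes the windows have already been extracted and aligned; the function names and the example sequences are hypothetical and are not the study's data or pipeline.

```python
from collections import Counter


def position_frequencies(sites):
    """Per-position A/C/G/T frequencies for equal-length, aligned site windows."""
    if not sites:
        raise ValueError("need at least one sequence")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sequences must be aligned to the same length")
    profile = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        total = sum(counts[b] for b in "ACGT")  # ignore N and other ambiguity codes
        profile.append({b: (counts[b] / total if total else 0.0) for b in "ACGT"})
    return profile


def gc_content(profile):
    """Mean G+C fraction across all positions of the profile."""
    return sum(p["G"] + p["C"] for p in profile) / len(profile)


# Hypothetical 4-bp windows around three insertion sites.
prof = position_frequencies(["GACT", "GTCA", "GACA"])
for pos, freqs in enumerate(prof):
    print(pos, freqs)
print("mean GC:", gc_content(prof))
```

    With many thousands of windows, positions whose frequencies depart from the genome's overall base composition would indicate the kind of slight positional base preference the abstract reports.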

  7. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

    DEFF Research Database (Denmark)

    Kjærbølling, Inge; Vesth, Tammi C.; Frisvad, Jens C.

    2018-01-01

    [...] to determine phylogeny and genetic diversity, showing that in each presented genome 15–27% of genes are not found in other sequenced Aspergilli. In particular, A. novofumigatus was compared with the pathogenic species A. fumigatus. This comparison suggests that A. novofumigatus can produce most of the same allergens [...], virulence, and pathogenicity factors as A. fumigatus, and thus that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences, and predictive algorithms. We thus identify putative SM [...] A. campestris, A. novofumigatus, A. ochraceoroseus, and A. steynii have been whole-genome PacBio sequenced to provide genetic references in three Aspergillus sections. A. taichungensis and A. candidus also were sequenced for SM elucidation. Thirteen Aspergillus genomes were analyzed with comparative genomics [...]

  8. The Paraoxonase Gene Cluster Protects Against Abdominal Aortic Aneurysm Formation.

    Science.gov (United States)

    Yan, Yun-Fei; Pei, Jian-Fei; Zhang, Yang; Zhang, Ran; Wang, Fang; Gao, Peng; Zhang, Zhu-Qin; Wang, Ting-Ting; She, Zhi-Gang; Chen, Hou-Zao; Liu, De-Pei

    2017-02-01

    Abdominal aortic aneurysm (AAA) is a life-threatening vascular pathology whose pathogenesis is closely related to oxidative stress. However, an effective pharmaceutical treatment is lacking because the exact cause of AAA remains unknown. Here, we aimed to delineate the role in AAA formation of the paraoxonase (PON) gene cluster (PC), which prevents atherosclerosis through the detoxification of oxidized substrates. PC transgenic (Tg) mice were crossed onto an Apoe-/- background, and an angiotensin II-induced AAA mouse model was used to analyze the effect of the PC on AAA formation. Four weeks after angiotensin II infusion, PC-Tg Apoe-/- mice had a lower AAA incidence, a smaller maximal abdominal aortic external diameter, and less medial elastin degradation than Apoe-/- mice. Importantly, PC-Tg Apoe-/- mice exhibited lower aortic reactive oxygen species production and oxidative stress than did the Apoe-/- control mice. As a consequence, the PC transgene alleviated angiotensin II-induced arterial inflammation and suppressed arterial extracellular matrix degradation. Specifically, upon angiotensin II stimulation, PC-Tg vascular smooth muscle cells exhibited lower levels of reactive oxygen species production and decreased activities and expression levels of matrix metalloproteinase-2 and matrix metalloproteinase-9. Moreover, PC-Tg serum also enhanced the resistance of vascular smooth muscle cells to oxidative stress and further decreased the expression levels of matrix metalloproteinase-2 and matrix metalloproteinase-9, indicating that circulatory and vascular smooth muscle cell PC members suppress oxidative stress in a synergistic manner. Our findings reveal, for the first time, a protective role of the PC in AAA formation and suggest PONs as promising targets for AAA prevention. © 2016 American Heart Association, Inc.

  9. Lichen Biosynthetic Gene Clusters Part II: Homology Mapping Suggests a Functional Diversity.

    Science.gov (United States)

    Bertrand, Robert L; Abdel-Hameed, Mona; Sorensen, John L

    2018-02-27

    Lichens are renowned for their diverse natural products, though little is known about the genetic programming dictating lichen natural product biosynthesis. We sequenced the genome of Cladonia uncialis and profiled its secondary metabolite biosynthetic gene clusters. Through a homology searching approach, we can now propose specific functions for gene products as well as the biosynthetic pathways that are encoded in several of these gene clusters. This analysis revealed that the lichen genome encodes the required enzymes for patulin and betaenones A-C biosynthesis, fungal toxins not known to be produced by lichens. Within several gene clusters, some (but not all) genes are genetically similar to genes devoted to secondary metabolite biosynthesis in fungi. These lichen clusters also contain accessory tailoring genes without such genetic similarity, suggesting that the encoded tailoring enzymes perform distinct chemical transformations. We hypothesize that C. uncialis gene clusters have evolved by shuffling components of ancestral fungal clusters to create new series of chemical steps, leading to the production of hitherto undiscovered derivatives of fungal secondary metabolites.

  10. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    Science.gov (United States)

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Because bacterial phylogeny is a poor marker for choline utilization, we developed a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. While bacterial genes responsible for
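A degenerate PCR primer is written in IUPAC nucleotide codes, where each symbol stands for a set of bases. Checking where such a primer could bind in a sequence therefore reduces to pattern matching. The following is a minimal in-silico sketch of that idea. The primer used in the usage note is hypothetical and is not one of the published cutC primers. The sketch treats binding as an exact match with no mismatches, which is a simplification of real annealing.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def primer_regex(primer):
    """Translate a degenerate primer into a regular expression (KeyError on non-IUPAC symbols)."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def binding_sites(primer, sequence):
    """0-based start positions of exact degenerate matches; the lookahead keeps overlapping hits."""
    pattern = re.compile("(?=(" + primer_regex(primer) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T sequence, for scanning the other strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]
```

For example, the hypothetical primer `GAYTT` matches both `GATTT` and `GACTT`. To scan the opposite strand, pass `reverse_complement(sequence)` to `binding_sites`.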

  11. TimeXNet: identifying active gene sub-networks using time-course gene expression profiles.

    Science.gov (United States)

    Patil, Ashwini; Nakai, Kenta

    2014-01-01

    Time-course gene expression profiles are frequently used to provide insight into the changes in cellular state over time and to infer the molecular pathways involved. When combined with large-scale molecular interaction networks, such data can provide information about the dynamics of cellular response to stimulus. However, few tools are currently available to predict a single active gene sub-network from time-course gene expression profiles. We introduce a tool, TimeXNet, which identifies active gene sub-networks with temporal paths using time-course gene expression profiles in the context of a weighted gene regulatory and protein-protein interaction network. TimeXNet uses a specialized form of the network flow optimization approach to identify the most probable paths connecting the genes with significant changes in expression at consecutive time intervals. TimeXNet has been extensively evaluated for its ability to predict novel regulators and their associated pathways within active gene sub-networks in the mouse innate immune response and the yeast osmotic stress response. Compared to other similar methods, TimeXNet identified up to 50% more novel regulators from independent experimental datasets. It predicted paths within a greater number of known pathways with longer overlaps (up to 7 consecutive edges) within these pathways. TimeXNet was also shown to be robust in the presence of varying amounts of noise in the molecular interaction network. TimeXNet is a reliable tool that can be used to study cellular response to stimuli through the identification of time-dependent active gene sub-networks in diverse biological systems. It is significantly better than other similar tools. TimeXNet is implemented in Java as a stand-alone application and supported on Linux, MS Windows and Macintosh. The output of TimeXNet can be directly viewed in Cytoscape. TimeXNet is freely available for non-commercial users.
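TimeXNet is described as using "a specialized form of the network flow optimization approach." The following is a sketch of the general idea only. It substitutes a plain maximum-flow computation (Edmonds-Karp) for TimeXNet's own formulation, which is not reproduced here. A super-source feeds the genes that change at one time point, and a super-sink drains the genes that change at the next. Edge weights act as capacities, so high-confidence interactions can carry more flow. The helper `timestep_flow`, the node names `_SRC`/`_SNK`, and the toy network are hypothetical. The sketch also assumes the two gene sets are disjoint.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along the shortest residual path found by BFS."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual[u][v] += c
    total = 0.0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # bottleneck capacity along the path found
        path_flow = float("inf")
        v = sink
        while parent[v] is not None:
            path_flow = min(path_flow, residual[parent[v]][v])
            v = parent[v]
        # push flow forward and open reverse residual edges
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= path_flow
            residual[v][u] += path_flow
            v = u
        total += path_flow


def timestep_flow(network, early, late):
    """Hypothetical helper: flow from genes changed at time t to genes changed at t+1."""
    big = sum(w for edges in network.values() for w in edges.values())
    capacity = {u: dict(edges) for u, edges in network.items()}
    capacity["_SRC"] = {g: big for g in early}
    for g in late:
        capacity.setdefault(g, {})["_SNK"] = big
    return max_flow(capacity, "_SRC", "_SNK")
```

Recovering which edges carry flow would require returning the residual graph as well. That step would give the "paths" that the abstract refers to.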

  12. Leveraging long sequencing reads to investigate R-gene clustering and variation in sugar beet

    Science.gov (United States)

    Host-pathogen interactions are of prime importance to modern agriculture. Plants utilize various types of resistance genes to mitigate pathogen damage. Identification of the specific gene responsible for a specific resistance can be difficult due to duplication and clustering within R-gene families....

  13. Microarray Analyses of Peripheral Blood Cells Identifies Unique Gene Expression Signature in Psoriatic Arthritis

    Science.gov (United States)

    Batliwalla, Franak M.; Li, Wentian; Ritchlin, Christopher T.; Xiao, Xiangli; Brenner, Max; Laragione, Teresina; Shao, Tianmeng; Durham, Robert; Kemshetti, Sunil; Schwarz, Edward; Coe, Rodney; Kern, Marlena; Baechler, Emily C.; Behrens, Timothy W.; Gregersen, Peter K.

    2005-01-01

    Psoriatic arthritis (PsA) is a chronic and erosive form of arthritis of unknown cause. We aimed to characterize the PsA phenotype using gene expression profiling, comparing it with healthy control subjects and patients with rheumatoid arthritis (RA). Peripheral blood cells (PBCs) of 19 patients with active PsA and 19 age- and sex-matched control subjects were used in the analyses of PsA, with blood samples collected in PaxGene tubes. A significant alteration in the pattern of expression of 313 genes was noted in the PBCs of PsA patients on Affymetrix U133A arrays: 257 genes were expressed at reduced levels in PsA, and 56 genes were expressed at increased levels, compared with controls. Downregulated genes tended to cluster in certain chromosomal regions, including those containing the psoriasis susceptibility loci PSORS1 and PSORS2. Among the genes with the most significantly reduced expression were those involved in downregulation or suppression of innate and acquired immune responses, such as SIGIRR, STAT3, SHP1, IKBKB, IL-11RA, and TCF7, suggesting inappropriate control that favors proinflammatory responses. Several members of the MAPK signaling pathway and tumor suppressor genes showed reduced expression. Three proinflammatory genes (S100A8, S100A12, and thioredoxin) showed increased expression. Logistic regression and recursive partitioning analysis determined that one gene, nucleoporin 62 kDa, could correctly classify all controls and 94.7% of the PsA patients. Using a dataset of 48 RA samples for comparison, the combination of two genes, MAP3K3 followed by CACNA1S, was enough to correctly classify all RA and PsA patients. Thus, PBC gene expression profiling identified a gene expression signature that differentiated PsA from RA, and PsA from controls. Several novel genes were differentially expressed in PsA and may prove to be diagnostic biomarkers or serve as new targets for the development of therapies. PMID:16622521
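Recursive partitioning picks, at each step, the single variable and cut-point that best separate the groups. With 19 patients, 94.7% corresponds to 18 of 19 correctly classified. The first split amounts to a one-gene threshold search. The following is a minimal sketch of that first step under simplifying assumptions: the score is plain classification accuracy (real implementations typically use Gini impurity or deviance), and the expression values in the test are invented for illustration.

```python
def best_threshold(values, labels):
    """Best single cut on one gene's expression.

    labels: 1 = patient, 0 = control.
    Returns (threshold, direction, accuracy); direction "below" means
    'predict patient when value < threshold', "above" means 'when value > threshold'.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    # candidate cuts: midpoints between consecutive distinct values
    candidates = [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1) if xs[i] != xs[i + 1]]
    best = (None, None, 0.0)
    for t in candidates:
        for direction in ("below", "above"):
            correct = sum(
                (lab == 1) == (v < t if direction == "below" else v > t)
                for v, lab in pairs
            )
            accuracy = correct / len(pairs)
            if accuracy > best[2]:
                best = (t, direction, accuracy)
    return best
```

A full tree would apply the same search recursively within each side of the split and across all genes. That recursion is how a second gene (as with MAP3K3 followed by CACNA1S) enters the classifier.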

  14. Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study.

    Directory of Open Access Journals (Sweden)

    Jason C Slot

    High affinity nitrate assimilation genes in fungi occur in a cluster (fHANT-AC) that can be coordinately regulated. The clustered genes include nrt2, which codes for a high affinity nitrate transporter; euknr, which codes for nitrate reductase; and NAD(P)H-nir, which codes for nitrite reductase. Homologs of genes in the fHANT-AC occur in other eukaryotes and prokaryotes, but they have only been found clustered in the oomycete Phytophthora (heterokonts). We performed independent and concatenated phylogenetic analyses of homologs of all three genes in the fHANT-AC. Phylogenetic analyses limited to fungal sequences suggest that the fHANT-AC has been transferred horizontally from a basidiomycete (mushrooms and smuts) to an ancestor of the ascomycetous mold Trichoderma reesei. Phylogenetic analyses of sequences from diverse eukaryotes and eubacteria, and cluster structure, are consistent with a hypothesis that the fHANT-AC was assembled in a lineage leading to the oomycetes and was subsequently transferred to the Dikarya (Ascomycota+Basidiomycota), which is a derived fungal clade that includes the vast majority of terrestrial fungi. We propose that the acquisition of high affinity nitrate assimilation contributed to the success of Dikarya on land by allowing exploitation of nitrate in aerobic soils, and the subsequent transfer of a complete assimilation cluster improved the fitness of T. reesei in a new niche. Horizontal transmission of this cluster of functionally integrated genes supports the "selfish operon" hypothesis for maintenance of gene clusters.

  15. Expression-based clustering of CAZyme-encoding genes of Aspergillus niger.

    Science.gov (United States)

    Gruben, Birgit S; Mäkelä, Miia R; Kowalczyk, Joanna E; Zhou, Miaomiao; Benoit-Gelber, Isabelle; De Vries, Ronald P

    2017-11-23

    The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation, enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated under certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data focusing on the initial response of A. niger to the presence of carbon sources related to plant biomass were analyzed for the wild-type strain N402 grown on a large range of carbon sources and for the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX grown on their specific inducing compounds. The cluster analysis of the expression data revealed several groups of co-regulated genes that go beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified based on their expression profiles. Notably, in several cases the expression profile calls into question the homology-based function assignment of uncharacterized genes, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and therefore an even more complex overall regulatory network than has been reported so far. Expression profiling on a large number of substrates provides better insight into the complex regulatory systems that drive the conversion of plant biomass by fungi. In
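Grouping genes by their expression profiles across carbon sources starts from a similarity measure between profiles. Pearson correlation is a common choice. The following is a minimal sketch of that idea. It is not the clustering procedure used in the study; it substitutes single-linkage grouping at a fixed correlation threshold, implemented with union-find. The gene names, profiles, and the `min_r` cut-off are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_groups(profiles, min_r=0.9):
    """Single-linkage groups: genes joined whenever their correlation is >= min_r."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())
```

Single linkage can chain loosely related genes together through intermediates. Hierarchical clustering with average linkage is the more usual choice for microarray data.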

  16. Figmop: a profile HMM to identify genes and bypass troublesome gene models in draft genomes.

    Science.gov (United States)

    Curran, David M; Gilleard, John S; Wasmuth, James D

    2014-11-15

    Gene models from draft genome assemblies of metazoan species are often incorrect, missing exons or entire genes, particularly for large gene families. Consequently, labour-intensive manual curation is often necessary. We present Figmop (Finding Genes using Motif Patterns) to help with the manual curation of gene families in draft genome assemblies. The program uses a pattern of short sequence motifs to identify putative genes directly from the genome sequence. Using a large gene family as a test case, Figmop was found to be more sensitive and specific than a BLAST-based approach. The visualization used allows potential genes to be validated quickly and easily, saving hours if not days of analysis. Source code of Figmop is freely available for download at https://github.com/dave-the-scientist, implemented in C and Python, and is supported on Linux, Unix and MacOSX. Contact: curran.dave.m@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
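The core idea of identifying a gene by a pattern of short motifs is to require the motifs to occur in a fixed order with bounded spacing. The following is a greatly simplified sketch of that idea. It is not Figmop's scoring method: the motifs here are plain regular expressions rather than scored motif models. The motif strings, the `max_gap` parameter, and the test sequence are hypothetical.

```python
import re


def find_motif_chains(sequence, motifs, max_gap):
    """Chains of motif hits in the given order, each starting within max_gap of the previous end.

    motifs: non-empty list of regular expressions. Returns lists of (start, end) spans.
    """
    # all hits per motif; the lookahead allows overlapping matches
    hits = [
        [(m.start(1), m.end(1)) for m in re.finditer("(?=(" + p + "))", sequence)]
        for p in motifs
    ]
    chains = []

    def extend(chain, k):
        if k == len(motifs):
            chains.append(list(chain))
            return
        prev_end = chain[-1][1]
        for start, end in hits[k]:
            if prev_end <= start <= prev_end + max_gap:
                chain.append((start, end))
                extend(chain, k + 1)
                chain.pop()

    for first in hits[0]:
        extend([first], 1)
    return chains
```

Requiring the full ordered chain is stricter than matching any single motif. This is one reason why a motif pattern can be more specific than a similarity search for a single region.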

  17. Three classes of recurrent DNA break clusters in brain progenitors identified by 3D proximity-based break joining assay.

    Science.gov (United States)

    Wei, Pei-Chi; Lee, Cheng-Sheng; Du, Zhou; Schwer, Bjoern; Zhang, Yuxiang; Kao, Jennifer; Zurita, Jeffrey; Alt, Frederick W

    2018-02-20

    We recently discovered 27 recurrent DNA double-strand break (DSB) clusters (RDCs) in mouse neural stem/progenitor cells (NSPCs). Most RDCs occurred across long, late-replicating RDC genes and were found only after mild inhibition of DNA replication. RDC genes share intriguing characteristics, including encoding surface proteins that organize brain architecture and neuronal junctions, and are genetically implicated in neuropsychiatric disorders and/or cancers. RDC identification relies on high-throughput genome-wide translocation sequencing (HTGTS), which maps recurrent DSBs based on their translocation to "bait" DSBs in specific chromosomal locations. Cellular heterogeneity in 3D genome organization allowed unequivocal identification of RDCs on 14 different chromosomes using HTGTS baits on three mouse chromosomes. Additional candidate RDCs were also implicated, however, suggesting that some RDCs were missed. To more completely identify RDCs, we exploited our finding that joining of two DSBs occurs more frequently if they lie on the same cis chromosome. Thus, we used CRISPR/Cas9 to introduce specific DSBs into each mouse chromosome in NSPCs that were used as bait for HTGTS libraries. This analysis confirmed all 27 previously identified RDCs and identified many new ones. NSPC RDCs fall into three groups based on length, organization, transcription level, and replication timing of genes within them. While mostly less robust, the largest group of newly defined RDCs share many intriguing characteristics with the original 27. Our findings also revealed RDCs in NSPCs in the absence of induced replication stress, and support the idea that the latter treatment augments an already active endogenous process.

  18. Activation and clustering of a Plasmodium falciparum var gene are affected by subtelomeric sequences.

    Science.gov (United States)

    Duffy, Michael F; Tang, Jingyi; Sumardy, Fransisca; Nguyen, Hanh H T; Selvarajah, Shamista A; Josling, Gabrielle A; Day, Karen P; Petter, Michaela; Brown, Graham V

    2017-01-01

    The Plasmodium falciparum var multigene family encodes the cytoadhesive, variant antigen PfEMP1. P. falciparum antigenic variation and cytoadhesion specificity are controlled by epigenetic switching between the single, or few, simultaneously expressed var genes. Most var genes are maintained in perinuclear clusters of heterochromatic telomeres. The active var gene(s) occupy a single, perinuclear var expression site. It is unresolved whether the var expression site forms in situ at a telomeric cluster or whether it is an extant compartment to which single chromosomes travel, thus controlling var switching. Here we show that transcription of a var gene did not require decreased colocalisation with clusters of telomeres, supporting var expression site formation in situ. However, following recombination within adjacent subtelomeric sequences, the same var gene was persistently activated and colocalised less with telomeric clusters. Thus, participation in stable, heterochromatic telomere clusters and var switching are independent but are both affected by subtelomeric sequences. The var expression site colocalised with the euchromatic mark H3K27ac to a greater extent than it did with heterochromatic H3K9me3. H3K27ac was enriched within the active var gene promoter even when the var gene was transiently repressed in mature parasites, and thus H3K27ac may contribute to var gene epigenetic memory. © 2016 Federation of European Biochemical Societies.

  19. Global analysis of biosynthetic gene clusters reveals vast potential of secondary metabolite production in Penicillium species

    DEFF Research Database (Denmark)

    Nielsen, Jens Christian; Grijseels, Sietske; Prigent, Sylvain

    2017-01-01

    [...] cross-referenced the predicted pathways with published data on the production of secondary metabolites, experimentally validated the production of antibiotic yanuthones in Penicillia, and identified a previously undescribed compound from the yanuthone pathway. This study is the first genus-wide analysis of the genomic [...] Filamentous fungi produce a wide range of bioactive compounds with important pharmaceutical applications, such as antibiotic penicillins and cholesterol-lowering statins. However, less attention has been paid to fungal secondary metabolites than to those from bacteria. In this study, we [...] sequenced the genomes of 9 Penicillium species and, together with 15 published genomes, we investigated the secondary metabolism of Penicillium and identified an immense, unexploited potential for producing secondary metabolites by this genus. A total of 1,317 putative biosynthetic gene clusters (BGCs) were [...]

  20. Single Molecule Cluster Analysis Identifies Signature Dynamic Conformations along the Splicing Pathway

    Science.gov (United States)

    Blanco, Mario R.; Martin, Joshua S.; Kahlscheuer, Matthew L.; Krishnan, Ramya; Abelson, John; Laederach, Alain; Walter, Nils G.

    2016-01-01

    The spliceosome is the dynamic RNA-protein machine responsible for faithfully splicing introns from precursor messenger RNAs (pre-mRNAs). Many of the dynamic processes required for the proper assembly, catalytic activation, and disassembly of the spliceosome as it acts on its pre-mRNA substrate remain poorly understood, a challenge that persists for many biomolecular machines. Here, we developed a fluorescence-based Single Molecule Cluster Analysis (SiMCAn) tool to dissect the manifold conformational dynamics of a pre-mRNA through the splicing cycle. By clustering common dynamic behaviors derived from selectively blocked splicing reactions, SiMCAn was able to identify signature conformations and dynamic behaviors of multiple ATP-dependent intermediates. In addition, it identified a conformation adopted late in splicing by a 3′ splice site mutant, invoking a mechanism for substrate proofreading. SiMCAn presents a novel framework for interpreting complex single molecule behaviors that should prove widely useful for the comprehensive analysis of a plethora of dynamic cellular machines. PMID:26414013

  1. CASSIS and SMIPS: promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes.

    Science.gov (United States)

    Wolf, Thomas; Shelest, Vladimir; Nath, Neetika; Shelest, Ekaterina

    2016-04-15

    Secondary metabolites (SM) are structurally diverse natural products of high pharmaceutical importance. Genes involved in their biosynthesis are often organized in clusters, i.e., are co-localized and co-expressed. In silico cluster prediction in eukaryotic genomes remains problematic, mainly due to the high variability of the clusters' content and the lack of other distinguishing sequence features. We present Cluster Assignment by Islands of Sites (CASSIS), a method for SM cluster prediction in eukaryotic genomes, and Secondary Metabolites by InterProScan (SMIPS), a tool for genome-wide detection of SM key enzymes ('anchor' genes): polyketide synthases, non-ribosomal peptide synthetases and dimethylallyl tryptophan synthases. Unlike other tools based on protein similarity, CASSIS exploits the idea of co-regulation of the cluster genes, which assumes the existence of common regulatory patterns in the cluster promoters. The method searches for 'islands' of enriched cluster-specific motifs in the vicinity of anchor genes. It was validated in a series of cross-validation experiments and showed high sensitivity and specificity. CASSIS and SMIPS are freely available at https://sbi.hki-jena.de/cassis. Contact: thomas.wolf@leibniz-hki.de or ekaterina.shelest@leibniz-hki.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
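An "island" of motif sites around an anchor gene can be pictured as a window grown outward from the anchor along the chromosome. The window keeps extending while neighbouring promoters carry the cluster-specific motif. The following is a minimal sketch of such a border rule only. It is not the CASSIS algorithm, which also predicts the motif and tests it for enrichment. It assumes each gene has already been labelled with whether its promoter contains the motif. The tolerance parameter `max_skips` is hypothetical.

```python
def motif_island(has_motif, anchor, max_skips=1):
    """Grow a window around the anchor gene over genes in chromosomal order.

    has_motif: list of booleans, one per gene (promoter contains the motif?).
    On each side, stop after more than max_skips consecutive motif-free genes.
    Returns inclusive (left, right) indices of the predicted cluster.
    """
    def grow(step):
        edge = anchor
        skips = 0
        i = anchor + step
        while 0 <= i < len(has_motif):
            if has_motif[i]:
                edge = i      # extend the border to the last motif-bearing gene
                skips = 0
            else:
                skips += 1
                if skips > max_skips:
                    break
            i += step
        return edge

    return grow(-1), grow(+1)
```

Allowing a small number of motif-free genes inside the island reflects the observation that not every cluster gene needs to share the regulatory pattern. The right tolerance would need calibration against known clusters.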

  2. Characterization of the fumonisin B2 biosynthetic gene cluster in Aspergillus niger and A. awamori.

    Science.gov (United States)

    Aspergillus niger and A. awamori strains isolated from grapes cultivated in the Mediterranean basin were examined for fumonisin B2 (FB2) production and for the presence or absence of sequences within the fumonisin biosynthetic gene (fum) cluster. The presence of 13 regions in the fum cluster was evaluated by PCR assay...

  3. Molecular population genetics of the β-esterase gene cluster of ...

    Indian Academy of Sciences (India)

    However, there are some 'footprints' of directional and balancing selection shaping the specific distribution of nucleotide polymorphism within the cluster. Intergenic epistatic selection between Est-6 and ψEst-6 may play an important role in the evolution of the β-esterase gene cluster, preserving the putative pseudogene from ...

  4. Clustering reveals limits of parameter identifiability in multi-parameter models of biochemical dynamics.

    Science.gov (United States)

    Nienałtowski, Karol; Włodarczyk, Michał; Lipniacki, Tomasz; Komorowski, Michał

    2015-09-29

    Compared to engineering or physics problems, dynamical models in quantitative biology typically depend on a relatively large number of parameters. Progress in developing mathematics to manipulate such multi-parameter models, and so enable their efficient interplay with experiments, has been slow. Existing solutions are significantly limited by model size. To simplify the analysis of multi-parameter models, a method for clustering model parameters is proposed. It is based on a derived, statistically meaningful measure of similarity between groups of parameters. The measure quantifies to what extent changes in the values of some parameters can be compensated by changes in the values of other parameters. The proposed methodology provides a natural mathematical language to precisely communicate and visualise effects resulting from compensatory changes in parameter values. As a result, relevant insight into identifiability analysis and experimental planning can be obtained. Analysis of NF-κB and MAPK pathway models shows that highly compensative parameters constitute clusters consistent with the network topology. Applied to an exceptionally rich set of published experiments on NF-κB dynamics, the method reveals that the experiments jointly ensure identifiability of only 60% of model parameters. The method indicates which further experiments should be performed in order to increase the number of identifiable parameters. We currently lack methods that simplify broadly understood analysis of multi-parameter models. The introduced tools depict mutually compensative effects between parameters to provide insight regarding the role of individual parameters, identifiability and experimental design. The method can also find applications in the related methodological areas of model simplification and parameter estimation.
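Two parameters "compensate" each other when changing one can be offset by changing the other with little effect on the model output. The following is a minimal sketch of a common proxy for this, not the statistically derived measure proposed in the paper. It computes the sensitivity of the output to each log-parameter over the observation times and takes the absolute cosine between two sensitivity vectors. A value near 1 indicates nearly collinear, and hence compensable, directions. The toy model, parameter values, and time points are hypothetical.

```python
import math


def decay_model(params, t):
    """Hypothetical toy model y(t) = a * b * exp(-c * t); a and b enter only as a product."""
    a, b, c = params
    return a * b * math.exp(-c * t)


def log_sensitivities(f, params, times, h=1e-5):
    """Column j holds d f(t) / d log(theta_j) at each time, by central differences."""
    columns = []
    for j in range(len(params)):
        up = list(params)
        up[j] *= math.exp(h)
        down = list(params)
        down[j] *= math.exp(-h)
        columns.append([(f(up, t) - f(down, t)) / (2 * h) for t in times])
    return columns


def compensation(u, v):
    """|cosine| between two sensitivity columns; 1.0 means perfectly compensable."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return abs(dot) / norm if norm else 0.0
```

In the toy model, `a` and `b` are fully compensable, because only their product is identifiable. Parameter `c` shapes the time course and is not interchangeable with either. Clustering parameters by this score would therefore group `a` with `b`.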

  5. Heterologous expression of the Halothiobacillus neapolitanus carboxysomal gene cluster in Corynebacterium glutamicum.

    Science.gov (United States)

    Baumgart, Meike; Huber, Isabel; Abdollahzadeh, Iman; Gensch, Thomas; Frunzke, Julia

    2017-09-20

    Compartmentalization represents a ubiquitous principle used by living organisms to optimize metabolic flux and to avoid detrimental interactions within the cytoplasm. Proteinaceous bacterial microcompartments (BMCs) have therefore attracted strong interest for the encapsulation of heterologous pathways in microbial model organisms. However, such attempts have so far been mostly restricted to Escherichia coli. Here, we introduced the carboxysomal gene cluster of Halothiobacillus neapolitanus into the biotechnological platform species Corynebacterium glutamicum. Transmission electron microscopy, fluorescence microscopy and single molecule localization microscopy suggested the formation of BMC-like structures in cells expressing the complete carboxysome operon or only the shell proteins. Purified carboxysomes consisted of the expected protein components, as verified by mass spectrometry. Enzymatic assays revealed the functional production of RuBisCO in C. glutamicum both in the presence and absence of carboxysomal shell proteins. Furthermore, we showed that eYFP is targeted to the carboxysomes by fusion to the large RuBisCO subunit. Overall, this study represents the first transfer of an α-carboxysomal gene cluster into a Gram-positive model species, supporting the modularity and orthogonality of these microcompartments, but it also identified important challenges that need to be addressed on the way towards biotechnological application. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Galaxy Clusters Identified from the SDSS DR6 and Their Properties

    Science.gov (United States)

    Wen, Z. L.; Han, J. L.; Liu, F. S.

    2009-08-01

    Clusters of galaxies in most of the previous catalogs have redshifts z [...] The contamination rate of member galaxies is found to be roughly 20%, and the completeness of member galaxy detection reaches ~90%. Monte Carlo simulations show that the cluster detection rate is more than 90% for massive (M₂₀₀ > 2 × 10¹⁴ M☉) clusters of z [...] candidates of X-ray clusters are found by cross-identification of our clusters with the source list of the ROSAT X-ray survey.

  7. Identifying genes related to choriogenesis in insect panoistic ovaries by Suppression Subtractive Hybridization

    Directory of Open Access Journals (Sweden)

    Bellés Xavier

    2009-04-01

    Background: Insect ovarioles are classified into two categories, panoistic and meroistic, the latter having apparently evolved from an ancestral panoistic type. Molecular data on oogenesis are practically restricted to meroistic ovaries. To study the evolutionary transition from panoistic to meroistic ovaries, data on panoistic ovaries should be gathered. To this end, we planned the construction of a Suppression Subtractive Hybridization (SSH) library to identify genes involved in panoistic choriogenesis, using the cockroach Blattella germanica as a model. Results: We constructed a post-vitellogenic ovary library by SSH to isolate genes involved in choriogenesis in B. germanica. The tester library was prepared with an ovary pool from 6- to 7-day-old females, whereas the driver library was prepared with an ovary pool from 3- to 4-day-old females. From the SSH library, we obtained 258 high-quality sequences, which clustered into 34 unique sequences grouped in 19 contigs and 15 singlets. The sequences were compared against non-redundant NCBI databases using BLAST. We found that 44% of the unique sequences had homologous sequences in known genes of other organisms, whereas 56% had no significant similarity to any of the database entries. A Gene Ontology analysis was carried out, classifying the 34 sequences into different functional categories. Seven of these gene sequences, representative of different categories and processes, were chosen to perform expression studies during the first gonadotrophic cycle by real-time PCR. Results showed that they were mainly expressed during post-vitellogenesis, which validates the SSH technique. For two of them, corresponding to novel genes, we demonstrated that they are specifically expressed in the cytoplasm of follicular cells in basal oocytes at the time of choriogenesis. Conclusion: The SSH approach has proven to be useful in identifying ovarian genes expressed after vitellogenesis in B. germanica. For
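Grouping 258 reads into 34 unique sequences means that overlapping reads are merged into contigs, while reads that overlap nothing remain singlets. The following is a crude sketch of that bookkeeping. It is not the assembler used in the study. As a stand-in for proper overlap alignment, it links any two reads that share an exact k-mer and reports connected groups. The value of `k` and the reads in the test are hypothetical, and only the forward strand is considered.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def contigs_and_singlets(reads, k=20):
    """Union reads sharing any exact k-mer; return (contigs, singlets) as lists of read indices."""
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first read containing it
    for i, read in enumerate(reads):
        for km in kmers(read.upper(), k):
            if km in first_seen:
                parent[find(i)] = find(first_seen[km])
            else:
                first_seen[km] = i
    groups = {}
    for i in range(len(reads)):
        groups.setdefault(find(i), []).append(i)
    contigs = [g for g in groups.values() if len(g) > 1]
    singlets = [g for g in groups.values() if len(g) == 1]
    return contigs, singlets
```

As a consistency check on the reported numbers, 19 contigs plus 15 singlets gives 34 unique sequences. Reported as a share of those 34, 44% and 56% correspond to about 15 and 19 sequences.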

  8. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; all authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
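    The abstract mentions evaluating the number of clusters k. A common, generic way to explore this is to run k-means for several values of k and compare the within-cluster sum of squares (the "elbow" heuristic). The Python sketch below illustrates only that idea; it is not the authors' framework, and the tiny expression matrix (one row per cell, one column per gene) and the function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, used to compare different k."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical data: 6 cells x 2 genes, forming two obvious groups.
cells = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
for k in (1, 2, 3):
    labels, centroids = kmeans(cells, k)
    print(k, round(wcss(cells, labels, centroids), 2))
```

    On this toy data the WCSS drops sharply from k = 1 to k = 2 and changes little afterwards, which is the elbow pattern; real 3D expression data would still need the visualization-guided checks the abstract describes.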

  9. A phylogenomic gene cluster resource: the Phylogenetically Inferred Groups (PhIGs) database

    Directory of Open Access Journals (Sweden)

    Boore Jeffrey L

    2006-04-01

    Full Text Available Abstract Background We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and by the lack of annotation for more rapidly evolving ones. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address the pattern and mechanism of genome evolution. The automated generation of evolutionary gene clusters, the creation of gene trees, the determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context together constitute an important resource for the scientific community. Discussion The PhIGs database currently contains 23 completely sequenced genomes of fungi and metazoans, containing 409,653 genes that have been grouped into 42,645 gene clusters. Each gene cluster is built such that the gene sequence distances are consistent with the known organismal relationships, thereby maximizing the likelihood that the clusters represent truly orthologous genes. The PhIGs website contains tools that allow the study of genes within their phylogenetic framework through keyword searches on annotations, such as GO and InterPro assignments, and sequence similarity searches by BLAST and HMM. In addition to displaying the evolutionary relationships of the genes in each cluster, the website also allows users to view the relative physical positions of homologous genes in specified sets of genomes. Summary Accurate analyses of genes and genomes can only be done within their full phylogenetic context. The PhIGs database and

  10. Nonblack patients with sickle cell disease have African βS gene cluster haplotypes

    Energy Technology Data Exchange (ETDEWEB)

    Rogers, Z.R.; Powars, D.R.; Williams, W.D. (Univ. of Southern California School of Medicine, Los Angeles (USA)); Kinney, T.R. (Duke Univ., Durham, NC (USA)); Schroeder, W.A. (California Institute of Technology, Pasadena (USA))

    1989-05-26

    Of 18 nonblack patients with sickle cell disease, 14 had sickle cell anemia, 2 had hemoglobin SC disease, and 2 had hemoglobin S-β0-thalassemia. The βS gene cluster haplotypes that were determined in 7 patients were of African origin and were identified as Central African Republic, Central African Republic minor II, Benin, and Senegal. The haplotype Central African Republic minor II was present on the β0-thalassemia chromosome in 2 patients. None of 10 patients whose α-gene status was determined had α-thalassemia-2. These data strongly support the concept that the βS gene on chromosome 11 of these individuals is of African origin and that the α-gene locus on chromosome 16 is of white or native American origin. The clinical severity of the disease in these nonblack patients is appropriate to their haplotype without α-thalassemia-2 and is comparable with that of black patients. All persons with congenital hemolytic anemia should be examined for the presence of sickle cell disease regardless of physical appearance or ethnic background.

  11. Identifying genes and gene networks involved in chromium metabolism and detoxification in Crambe abyssinica

    Energy Technology Data Exchange (ETDEWEB)

    Zulfiqar, Asma (asmazulfiqar08@yahoo.com); Paulose, Bibin (bpaulose@psis.umass.edu); Chhikara, Sudesh (sudesh@psis.umass.edu); Dhankher, Om Parkash (parkash@psis.umass.edu) (Department of Plant, Soil, and Insect Sciences, 270 Stockbridge Road, University of Massachusetts Amherst, MA 01003, United States)

    2011-10-15

    Chromium pollution is a serious environmental problem with few cost-effective remediation strategies available. Crambe abyssinica (a member of the Brassicaceae), a non-food, fast-growing, high-biomass crop, is an ideal candidate for phytoremediation of soils contaminated with heavy metals. The present study used a PCR-Select suppression subtractive hybridization approach in C. abyssinica to isolate genes differentially expressed in response to Cr exposure. A total of 72 differentially expressed subtracted cDNAs were sequenced and found to represent 43 genes. The subtracted cDNAs suggest that Cr stress significantly affects pathways related to stress/defense, ion transporters, sulfur assimilation, cell signaling, protein degradation, photosynthesis, and cell metabolism. The regulation of these genes in response to Cr exposure was further confirmed by semi-quantitative RT-PCR. Characterization of these differentially expressed genes may enable the engineering of non-food, high-biomass plants, including C. abyssinica, for phytoremediation of Cr-contaminated soils and sediments.

    Highlights:
    - The molecular mechanism of Cr uptake and detoxification in plants is not well known.
    - We identified differentially regulated genes upon Cr exposure in Crambe abyssinica.
    - 72 Cr-induced subtracted cDNAs were sequenced and found to represent 43 genes.
    - Pathways linked to stress, ion transport, and sulfur assimilation were affected.
    - This is the first Cr transcriptome study in a crop with phytoremediation potential.

    This study describes the identification and isolation of differentially expressed genes involved in chromium metabolism and detoxification in the non-food industrial oil crop Crambe abyssinica.

  12. Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata.

    Science.gov (United States)

    Oh, Chang Jae; Kim, Ho Bang; Kim, Jitae; Kim, Won Jin; Lee, Hyoungseok; An, Chung Sun

    2012-01-01

    The nucleotide sequence of a 20.5-kb genomic region harboring nif genes was determined and analyzed. The fragment was obtained from Frankia sp. EuIK1 strain, an indigenous symbiont of Elaeagnus umbellata. A total of 20 ORFs, including 12 nif genes, were identified and subjected to comparative analysis with the genome sequences of 3 Frankia strains representing diverse host plant specificities. The nucleotide and deduced amino acid sequences showed the highest levels of identity with orthologous genes from an Elaeagnus-infecting strain. The gene organization patterns around the nif gene clusters were well conserved among all 4 Frankia strains. However, the location of the nifV gene differed characteristically among the Frankia strains, depending on the type of host plant. Sequence analysis performed to determine the transcription units suggested that there could be an independent operon starting from the nifW gene in the EuIK1 strain. Considering the organization patterns and their total extent in the genome, we propose that the nif gene clusters remained stable despite genetic variations occurring in the Frankia genomes.

  13. Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

    Energy Technology Data Exchange (ETDEWEB)

    Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew; Rowen, Lee; Nesbitt, Ryan; Bloom, Scott; Rast, Jonathan P.; Berney, Kevin; Arenas-Mena, Cesar; Martinez, Pedro; Davidson, Eric H.; Peterson, Kevin J.; Hood, Leroy

    2005-05-10

    The highly consistent gene order and axial colinear expression patterns found in vertebrate Hox gene clusters are less well conserved across the rest of the bilaterians. We report the first deuterostome instance of an intact Hox cluster with a unique gene order in which the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin Strongylocentrotus purpuratus reveals a gene order wherein the anterior genes (Hox1, Hox2, and Hox3) lie nearest the posterior genes in the cluster, such that the most 3' gene is Hox5. (The gene order is 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.

  14. A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    P. Balaji

    2015-01-01

    Full Text Available Abstract Given the large number of available clustering algorithms and the size of gene expression datasets, it is difficult to select the most suitable clustering algorithm for a given set of gene expression data. At present, many researchers prefer hierarchical clustering in its various forms, although it is not always optimal. Cluster ensemble research can address this problem by automatically merging multiple data partitions from a wide range of different clusterings, of any dimension, to improve both the quality and the robustness of the clustering result. However, many existing ensemble approaches use an association matrix to summarize sample-cluster and co-occurrence statistics, so relations within the ensemble are captured only at the raw level, while relations among clusters are discarded entirely. Recovering these missing associations can greatly expand the capability of ensemble methods for microarray data clustering. We propose a general k-means cluster ensemble approach for clustering categorical data into the required number of partitions.
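    The abstract contrasts two kinds of ensemble information: pairwise co-occurrence of samples (the association matrix) and relations among the clusters themselves. The Python sketch below is a generic illustration, not the author's method. It computes a co-association matrix from a few hypothetical base partitions, derives a simple consensus by linking samples that share a cluster in at least half of the partitions (an evidence-accumulation style stand-in), and measures Jaccard overlap between clusters of different partitions, the kind of cluster-level link a link-based ensemble adds. The 0.5 threshold, the data, and all function names are assumptions.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus(ca, threshold=0.5):
    """Connected components of the graph linking pairs with co-association >= threshold."""
    n = len(ca)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] >= threshold:
            parent[find(i)] = find(j)
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def cluster_links(partitions):
    """Jaccard overlap between clusters that come from different base partitions."""
    clusters = []
    for p, labels in enumerate(partitions):
        for lab in sorted(set(labels)):
            members = {i for i, x in enumerate(labels) if x == lab}
            clusters.append(((p, lab), members))
    links = {}
    for (a, sa), (b, sb) in combinations(clusters, 2):
        if a[0] != b[0]:  # clusters of the same partition never overlap
            links[(a, b)] = len(sa & sb) / len(sa | sb)
    return links


# Hypothetical ensemble: three base clusterings of four genes.
base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
ca = co_association(base)
print(consensus(ca))                          # [0, 0, 1, 1]
print(cluster_links(base)[((0, 1), (1, 0))])  # 0.25
```

    The co-association matrix alone records only which samples co-occur; the cluster links additionally show, for example, that cluster 1 of the first partition partly overlaps cluster 0 of the second, which is the information the abstract says conventional ensembles discard.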

  15. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters

    Science.gov (United States)

    Schorn, Michelle A.; Alanjary, Mohammad M.; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R.; Ziemert, Nadine

    2016-01-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, consistently had the fewest biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic Gene Clusters database, rare marine genera show a high degree of novelty and diversity, with the Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora, and Pseudonocardia genera representing the highest gene cluster diversity. This research confirms that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied and their relatives are historically rich in secondary metabolites. PMID:27902408
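    Grouping individual biosynthetic gene clusters into gene cluster families can be pictured as building a similarity network and taking its connected components. The Python sketch below assumes pairwise similarity scores are already available (the scores, the 0.3 cutoff, and the identifiers are hypothetical) and estimates novelty as the share of families containing no reference-database cluster. It illustrates the general idea only and is not the pipeline used in the study.

```python
def gene_cluster_families(ids, similarity, cutoff):
    """Connected components of the network with an edge wherever similarity >= cutoff."""
    neighbours = {i: set() for i in ids}
    for (a, b), score in similarity.items():
        if score >= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, families = set(), []
    for start in ids:
        if start in seen:
            continue
        stack, family = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search over the network
            node = stack.pop()
            family.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        families.append(sorted(family))
    return families


def novel_fraction(families, reference_ids):
    """Share of families with no member from the reference database."""
    novel = [f for f in families if not set(f) & reference_ids]
    return len(novel) / len(families)


# Hypothetical clusters: 'ref*' are database entries, 'new*' are newly sequenced.
ids = ["ref1", "ref2", "new1", "new2", "new3"]
similarity = {("ref1", "new1"): 0.8, ("ref2", "ref1"): 0.1,
              ("new2", "new3"): 0.5, ("new3", "ref2"): 0.05}
families = gene_cluster_families(ids, similarity, cutoff=0.3)
print(families)  # [['new1', 'ref1'], ['ref2'], ['new2', 'new3']]
print(novel_fraction(families, {"ref1", "ref2"}))
```

    Here only the family made of new2 and new3 lacks a reference member, so one third of the families would count as novel; the cutoff strongly affects this figure and would need calibrating on real data.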

  16. Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters.

    Science.gov (United States)

    Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F

    2015-07-21

    Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.

  17. Clustered organization, polycistronic transcription, and evolution of modification-guide snoRNA genes in Euglena gracilis.

    Science.gov (United States)

    Moore, Ashley N; Russell, Anthony G

    2012-01-01

    Previous studies have shown that the eukaryotic microbe Euglena gracilis contains an unusually large assortment of small nucleolar RNAs (snoRNAs) and ribosomal RNA (rRNA) modification sites. However, little is known about the evolutionary mechanisms contributing to this situation. In this study, we have examined the organization and evolution of snoRNA genes in Euglena with the additional objective of determining how these properties relate to the rRNA modification pattern in this protist. We have identified and extensively characterized a clustered pattern of genes encoding previously biochemically isolated snoRNA sequences in E. gracilis. We show that polycistronic transcription is a prevalent snoRNA gene expression strategy in this organism. Further, we have identified 121 new snoRNA coding regions through sequence analysis of these clusters. We have identified an E. gracilis U14 snoRNA homolog clustered with modification-guide snoRNA genes. The U14 snoRNAs in other eukaryotic organisms examined to date typically contain both a modification and a processing domain. E. gracilis U14 lacks the modification domain but retains the processing domain. Our analysis of U14 structure and evolution in Euglena and other eukaryotes allows us to propose a model for its evolution and suggest its processing role may be its more important function, explaining its conservation in many eukaryotes. The preponderance of apparent small and larger-scale duplication events in the genomic regions we have characterized in Euglena provides a mechanism for the generation of the unusually diverse collection and abundance of snoRNAs and modified rRNA sites. Our findings provide the framework for more extensive whole genome analysis to elucidate whether these snoRNA gene clusters are spread across multiple chromosomes and/or form dense "arrays" at a limited number of chromosomal loci.

  18. Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni

    Directory of Open Access Journals (Sweden)

    Kuehl Jennifer V

    2007-09-01

    Full Text Available Abstract Background Teleost fish have seven paralogous clusters of Hox genes stemming from two complete genome duplications early in vertebrate evolution, and an additional genome duplication during the evolution of ray-finned fish, followed by the secondary loss of one cluster. Gene duplications on the one hand, and the evolution of regulatory sequences on the other, are thought to be among the most important mechanisms for the evolution of new gene functions. Cichlid fish, the largest family of vertebrates with about 2500 species, are famous examples of speciation and morphological diversity. Since this diversity could be based on regulatory changes, we chose to study the coding as well as putative regulatory regions of their Hox clusters within a comparative genomic framework. Results We sequenced and characterized all seven Hox clusters of Astatotilapia burtoni, a haplochromine cichlid fish. Comparative analyses with data from other teleost fish, such as zebrafish, two species of pufferfish, stickleback, and medaka, were performed. We traced losses of genes and microRNAs of Hox clusters; the medaka lineage seems to have lost more microRNAs than the other fish lineages. We found that each teleost genome studied so far has a unique set of Hox genes. The hoxb7a gene was lost independently several times during teleost evolution, the most recent event being within the radiation of East African cichlid fish. The conserved non-coding sequences (CNSs) encompass a surprisingly large part of the clusters, especially in the HoxAa, HoxCa, and HoxDa clusters. Across all clusters, we observe a trend toward increased CNS content toward the anterior end. Conclusion The gene content of Hox clusters in teleost fishes is more variable than expected, with each species studied so far having a different set. Although the highest loss rate of Hox genes occurred immediately after whole genome duplications, our analyses showed that gene loss continued and is

  19. Dispersed Benzoxazinone Gene Cluster: Molecular Characterization and Chromosomal Localization of Glucosyltransferase and Glucosidase Genes in Wheat and Rye

    Science.gov (United States)

    Sue, Masayuki; Nakamura, Chihiro; Nomura, Taiji

    2011-01-01

    Benzoxazinones (Bxs) are major defensive secondary metabolites in wheat (Triticum aestivum), rye (Secale cereale), and maize (Zea mays). Here, we identified full sets of homeologous and paralogous genes encoding Bx glucosyltransferase (GT) and Bx-glucoside glucosidase (Glu) in hexaploid wheat (2n = 6x = 42; AABBDD). Four GT loci (TaGTa–TaGTd) were mapped on chromosomes 7A, 7B (two loci), and 7D, whereas four glu1 loci (Taglu1a–Taglu1d) were on chromosomes 2A, 2B (two loci), and 2D. Transcript levels differed greatly among the four loci; B-genome loci of both TaGT and Taglu1 genes were preferentially transcribed. Catalytic properties of the enzyme encoded by each homeolog/paralog also differed despite high levels of identity among amino acid sequences. The predominant contribution of the B genome to GT and Glu reactions was revealed, as observed previously for the five Bx biosynthetic genes, TaBx1 to TaBx5, which are separately located on homeologous groups 4 and 5 chromosomes. In rye, where the ScBx1 to ScBx5 genes are dispersed to chromosomes 7R and 5R, ScGT and Scglu were located separately on chromosomes 4R and 2R, respectively. The dispersal of Bx-pathway loci to four distinct chromosomes in hexaploid wheat and rye suggests that the clustering of Bx-pathway genes, as found in maize, is not essential for coordinated transcription. On the other hand, barley (Hordeum vulgare) was found to lack orthologous GT and glu loci, just as it lacks the Bx1 to Bx5 loci, despite its close phylogenetic relationship with wheat and rye. These results contribute to our understanding of the evolutionary processes that the Bx-pathway loci have undergone in grasses. PMID:21875895

  20. Dispersed benzoxazinone gene cluster: molecular characterization and chromosomal localization of glucosyltransferase and glucosidase genes in wheat and rye.

    Science.gov (United States)

    Sue, Masayuki; Nakamura, Chihiro; Nomura, Taiji

    2011-11-01

    Benzoxazinones (Bxs) are major defensive secondary metabolites in wheat (Triticum aestivum), rye (Secale cereale), and maize (Zea mays). Here, we identified full sets of homeologous and paralogous genes encoding Bx glucosyltransferase (GT) and Bx-glucoside glucosidase (Glu) in hexaploid wheat (2n = 6x = 42; AABBDD). Four GT loci (TaGTa-TaGTd) were mapped on chromosomes 7A, 7B (two loci), and 7D, whereas four glu1 loci (Taglu1a-Taglu1d) were on chromosomes 2A, 2B (two loci), and 2D. Transcript levels differed greatly among the four loci; B-genome loci of both TaGT and Taglu1 genes were preferentially transcribed. Catalytic properties of the enzyme encoded by each homeolog/paralog also differed despite high levels of identity among amino acid sequences. The predominant contribution of the B genome to GT and Glu reactions was revealed, as observed previously for the five Bx biosynthetic genes, TaBx1 to TaBx5, which are separately located on homeologous groups 4 and 5 chromosomes. In rye, where the ScBx1 to ScBx5 genes are dispersed to chromosomes 7R and 5R, ScGT and Scglu were located separately on chromosomes 4R and 2R, respectively. The dispersal of Bx-pathway loci to four distinct chromosomes in hexaploid wheat and rye suggests that the clustering of Bx-pathway genes, as found in maize, is not essential for coordinated transcription. On the other hand, barley (Hordeum vulgare) was found to lack orthologous GT and glu loci, just as it lacks the Bx1 to Bx5 loci, despite its close phylogenetic relationship with wheat and rye. These results contribute to our understanding of the evolutionary processes that the Bx-pathway loci have undergone in grasses.

  1. In silico analysis highlights the frequency and diversity of type 1 lantibiotic gene clusters in genome sequenced bacteria

    LENUS (Irish Health Repository)

    Marsh, Alan J

    2010-11-30

    Abstract Background Lantibiotics are lanthionine-containing, post-translationally modified antimicrobial peptides. These peptides have significant, but largely untapped, potential as preservatives and chemotherapeutic agents. Type 1 lantibiotics are those in which lanthionine residues are introduced into the structural peptide (LanA) through the activity of separate lanthionine dehydratase (LanB) and lanthionine synthetase (LanC) enzymes. Here we take advantage of the conserved nature of LanC enzymes to devise an in silico approach to identify potential lantibiotic-encoding gene clusters in genome sequenced bacteria. Results In total 49 novel type 1 lantibiotic clusters were identified which unexpectedly were associated with species, genera and even phyla of bacteria which have not previously been associated with lantibiotic production. Conclusions Multiple type 1 lantibiotic gene clusters were identified at a frequency that suggests that these antimicrobials are much more widespread than previously thought. These clusters represent a rich repository which can yield a large number of valuable novel antimicrobials and biosynthetic enzymes.

  2. Evolution of Chromosomal Clostridium botulinum Type E Neurotoxin Gene Clusters: Evidence Provided by Their Rare Plasmid-Borne Counterparts.

    Science.gov (United States)

    Carter, Andrew T; Austin, John W; Weedmark, Kelly A; Peck, Michael W

    2016-03-02

    Analysis of more than 150 Clostridium botulinum Group II type E genomes identified a small fraction (6%) in which neurotoxin-encoding genes were located on plasmids. Seven closely related (134-144 kb) neurotoxigenic plasmids of subtypes E1, E3, and E10 were characterized; all carried genes associated with plasmid mobility via conjugation. Each plasmid contained the same 24-kb neurotoxin cluster cassette (six neurotoxin cluster genes and six flanking genes) that had split a helicase gene, rather than the more common chromosomal rarA. The neurotoxin cluster cassettes had evolved as separate genetic units, which had either exited their chromosomal rarA locus in a series of parallel events, inserting into the plasmid-borne helicase gene, or vice versa. A single intact version of the helicase gene was discovered on a nonneurotoxigenic form of this plasmid. The observed low frequency of the plasmid location may reflect one or more of the following: (1) a less efficient recombination mechanism for the helicase gene target, (2) a lack of suitable target plasmids, and (3) the loss of neurotoxigenic plasmids. Type E1 and E10 plasmids possessed a Clustered Regularly Interspaced Short Palindromic Repeats locus with spacers that recognized C. botulinum Group II plasmids, but not C. botulinum Group I plasmids, demonstrating their long-term separation. Clostridium botulinum Group II type E strains also carry nonneurotoxigenic plasmids closely related to C. botulinum Group II types B and F plasmids. Here, the absence of neurotoxin cassettes may be because recombination requires both a specific mechanism and a specific target sequence, which are rarely found together. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. Method of Selection of Bacteria Antibiotic Resistance Genes Based on Clustering of Similar Nucleotide Sequences.

    Science.gov (United States)

    Balashov, I S; Naumov, V A; Borovikov, P I; Gordeev, A B; Dubodelov, D V; Lyubasovskaya, L A; Rodchenko, Yu V; Bystritskii, A A; Aleksandrova, N V; Trofimov, D Yu; Priputnevich, T V

    2017-10-01

    A new method for selecting bacterial antibiotic resistance genes is proposed and tested as a solution to problems related to the selection of primers for PCR assays. The method involves clustering similar nucleotide sequences and selecting group primers for all genes of each cluster. Clustering of resistance genes for six groups of antibiotics (aminoglycosides, β-lactams, fluoroquinolones, glycopeptides, macrolides and lincosamides, and fusidic acid) was performed. The method was tested on 81 bacterial strains of different genera isolated from patients (K. pneumoniae, Staphylococcus spp., S. agalactiae, E. faecalis, E. coli, and G. vaginalis). The results are comparable to those obtained by selecting primers for individual genes, while reducing the number of primers needed for maximum coverage of the known antibiotic resistance genes in PCR analysis.
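    The cluster-then-primer idea can be sketched as a greedy, representative-based grouping of sequences, followed by a search for windows conserved across every member of a cluster, where a single group primer could in principle bind. The Python code below is a simplified illustration under stated assumptions, not the authors' method: identity is the fraction of matching positions without alignment, the 0.8 cutoff, primer length, sequences, and function names are hypothetical, and real primer design would also weigh melting temperature, degeneracy, and specificity.

```python
def identity(a, b):
    """Fraction of matching positions, compared without alignment (a simplification)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_clusters(seqs, cutoff=0.8):
    """Assign each sequence (longest first) to the first representative it matches."""
    clusters = []  # list of (representative name, member names)
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep, members in clusters:
            if identity(seqs[rep], seqs[name]) >= cutoff:
                members.append(name)
                break
        else:  # no representative was similar enough: start a new cluster
            clusters.append((name, [name]))
    return [members for _, members in clusters]


def conserved_windows(seqs, members, length):
    """Start positions of windows that are identical in every member of a cluster."""
    shortest = min(len(seqs[m]) for m in members)
    return [start for start in range(shortest - length + 1)
            if len({seqs[m][start:start + length] for m in members}) == 1]


# Hypothetical resistance-gene fragments.
seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTAA",
        "c": "TTTTGGGGCC", "d": "TTTTGGGGCA"}
groups = greedy_clusters(seqs)
print(groups)                                 # [['a', 'b'], ['c', 'd']]
print(conserved_windows(seqs, groups[0], 5))  # [0, 1, 2, 3, 4]
```

    Because a and b differ only at their last base, every 5-base window except the final one is shared, so one primer designed there would cover both genes; this is the reduction in primer count the abstract refers to, shown on toy data.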

  4. Mapping in an apple (Malus x domestica) F1 segregating population based on physical clustering of differentially expressed genes.

    Science.gov (United States)

    Jensen, Philip J; Fazio, Gennaro; Altman, Naomi; Praul, Craig; McNellis, Timothy W

    2014-04-04

    Apple tree breeding is slow and difficult due to long generation times, self-incompatibility, and complex genetics. The identification of molecular markers linked to traits of interest is a way to expedite the breeding process. In the present study, we aimed to identify genes whose steady-state transcript abundance was associated with inheritance of specific traits segregating in an apple (Malus × domestica) rootstock F1 breeding population, including resistance to powdery mildew (Podosphaera leucotricha) disease and woolly apple aphid (Eriosoma lanigerum). Transcription profiling was performed for 48 individual F1 apple trees from a cross of two highly heterozygous parents, using RNA isolated from healthy, actively-growing shoot tips and a custom apple DNA oligonucleotide microarray representing 26,000 unique transcripts. Genome-wide expression profiles were not clear indicators of powdery mildew or woolly apple aphid resistance phenotype. However, standard differential gene expression analysis between phenotypic groups of trees revealed relatively small sets of genes with trait-associated expression levels. For example, thirty genes were identified that were differentially expressed between trees resistant and susceptible to powdery mildew. Interestingly, the genes encoding twenty-four of these transcripts were physically clustered on chromosome 12. Similarly, seven genes were identified that were differentially expressed between trees resistant and susceptible to woolly apple aphid, and the genes encoding five of these transcripts were also clustered, this time on chromosome 17. In each case, the gene clusters were in the vicinity of previously identified major quantitative trait loci for the corresponding trait. Similar results were obtained for a series of molecular traits. Several of the differentially expressed genes were used to develop DNA polymorphism markers linked to powdery mildew disease and woolly apple aphid resistance. Gene expression profiling
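    Checking whether differentially expressed genes are physically clustered amounts to grouping their genomic coordinates by chromosome and splitting each sorted list wherever neighbouring genes lie farther apart than some gap. The Python sketch below illustrates that step under stated assumptions: the gene names, positions, 500-kb gap, and function name are hypothetical, and the abstract does not give the criteria the study actually used.

```python
from collections import defaultdict


def physical_clusters(gene_positions, max_gap):
    """Group genes into runs on the same chromosome separated by <= max_gap bp."""
    by_chrom = defaultdict(list)
    for gene, (chrom, pos) in gene_positions.items():
        by_chrom[chrom].append((pos, gene))
    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()  # order genes along the chromosome
        run = [items[0]]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[0] <= max_gap:
                run.append(cur)
            else:  # gap too large: close the current run
                clusters.append((chrom, [g for _, g in run]))
                run = [cur]
        clusters.append((chrom, [g for _, g in run]))
    # Largest clusters first.
    return sorted(clusters, key=lambda c: len(c[1]), reverse=True)


# Hypothetical differentially expressed genes: (chromosome, position in bp).
de_genes = {"g1": (12, 1_000_000), "g2": (12, 1_200_000), "g3": (12, 1_500_000),
            "g4": (12, 9_000_000), "g5": (17, 300_000)}
for chrom, genes in physical_clusters(de_genes, max_gap=500_000):
    print(chrom, genes)
```

    With these made-up coordinates, g1 to g3 form one run on chromosome 12 while g4 and g5 stand alone; comparing such runs with known QTL intervals would be a separate step.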

  5. Insights into the evolutionary origins of clostridial neurotoxins from analysis of the Clostridium botulinum strain A neurotoxin gene cluster.

    Science.gov (United States)

    Doxey, Andrew C; Lynch, Michael D J; Müller, Kirsten M; Meiering, Elizabeth M; McConkey, Brendan J

    2008-11-14

    Clostridial neurotoxins (CNTs) are the most deadly toxins known and causal agents of botulism and tetanus neuroparalytic diseases. Despite considerable progress in understanding CNT structure and function, the evolutionary origins of CNTs remain a mystery, as they are unique to Clostridium and possess a sequence and structural architecture distinct from other protein families. Uncovering the origins of CNTs would be a significant contribution to our understanding of how pathogens evolve and generate novel toxin families. The C. botulinum strain A genome was examined for potential homologues of CNTs. A key link was identified between the neurotoxin and the flagellin gene (CBO0798) located immediately upstream of the BoNT/A neurotoxin gene cluster. This flagellin sequence displayed the strongest sequence similarity to the neurotoxin and NTNH homologue out of all proteins encoded within C. botulinum strain A. The CBO0798 gene contains a unique hypervariable region, which in closely related flagellins encodes a collagenase-like domain. Remarkably, these collagenase-containing flagellins were found to possess the characteristic HEXXH zinc-protease motif responsible for the neurotoxin's endopeptidase activity. Additional links to collagenase-related sequences and functions were detected by further analysis of CNTs and surrounding genes, including sequence similarities to collagen-adhesion domains and collagenases. Furthermore, the neurotoxin's HCRn domain was found to exhibit both structural and sequence similarity to eukaryotic collagen jelly-roll domains. Multiple lines of evidence suggest that the neurotoxin and adjacent genes evolved from an ancestral collagenase-like gene cluster, linking CNTs to another major family of clostridial proteolytic toxins. Duplication, reshuffling and assembly of neighboring genes within the BoNT/A neurotoxin gene cluster may have led to the neurotoxin's unique architecture. This work provides new insights into the evolution of C
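The HEXXH zinc-protease motif mentioned above is a short sequence pattern: His, Glu, any two residues, His. Scanning a protein sequence for it is a one-line regular expression. The sketch below is illustrative only. The example sequences are made up and are not CBO0798, BoNT/A, or any real protein. A motif hit alone does not establish protease activity.

```python
import re

# HEXXH: His, Glu, any two residues, His. The lookahead lets overlapping hits be reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(protein_seq):
    """Return (1-based start position, matched text) for every HEXXH motif."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(protein_seq.upper())]


# Made-up toy sequence containing two motif instances.
print(find_hexxh("MAHEFGHKLHEAAH"))
```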

  6. Analysis of the retinal gene expression profile after hypoxic preconditioning identifies candidate genes for neuroprotection

    Directory of Open Access Journals (Sweden)

    Wenzel Andreas

    2008-02-01

    Background: Retinal degeneration is a major cause of blindness in humans. Neuroprotective therapies may be used to rescue retinal cells and preserve vision. Hypoxic preconditioning stabilizes the transcription factor HIF-1α in the retina and strongly protects photoreceptors in an animal model of light-induced retinal degeneration. To address the molecular mechanisms of the protection, we analyzed the transcriptome of the hypoxic retina using microarrays and real-time PCR. Results: Hypoxic exposure induced a marked alteration in the retinal transcriptome, with significantly different expression levels of 431 genes immediately after hypoxic exposure. The normal expression profile was restored within 16 hours of reoxygenation. Among the differentially regulated genes, several candidates for neuroprotection were identified, such as metallothionein-1 and -2, the HIF-1 target gene adrenomedullin, and the gene encoding the antioxidative and cytoprotective enzyme paraoxonase 1, which was previously not known to be a hypoxia-responsive gene in the retina. The strongly upregulated cyclin-dependent kinase inhibitor p21 was shown not to be essential for neuroprotection. Conclusion: Our data suggest that neuroprotection after hypoxic preconditioning is the result of the differential expression of a multitude of genes, which may act in concert to protect visual cells against a toxic insult.

  7. The genome of Tolypocladium inflatum: evolution, organization, and expression of the cyclosporin biosynthetic gene cluster.

    Directory of Open Access Journals (Sweden)

    Kathryn E Bushley

    2013-06-01

    The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in the diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage-specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high expression of the cluster's cyclophilin gene. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further

  8. Time-course microarray analysis for identifying candidate genes involved in obesity-associated pathological changes in the mouse colon.

    Science.gov (United States)

    Bae, Yun Jung; Kim, Sung-Eun; Hong, Seong Yeon; Park, Taesun; Lee, Sang Gyu; Choi, Myung-Sook; Sung, Mi-Kyung

    2016-01-01

    Obesity is known to increase the risk of colorectal cancer. However, the mechanisms underlying the pathogenesis of obesity-induced colorectal cancer are not completely understood. The purposes of this study were to identify differentially expressed genes in the colon of mice with diet-induced obesity and to select candidate genes as early markers of obesity-associated abnormal cell growth in the colon. C57BL/6N mice were fed a normal diet (11% fat energy) or a high-fat diet (40% fat energy) and were euthanized at different time points. Genome-wide expression profiles of the colon were determined at 2, 4, 8, and 12 weeks. Cluster analysis was performed using expression data of genes showing a log2 fold change of ≥1 or ≤-1 (twofold change), based on time-dependent expression patterns, followed by virtual network analysis. High-fat diet-fed mice showed significant increases in body weight and total visceral fat weight over 12 weeks. Time-course microarray analysis showed that 50, 47, 36, and 411 genes were differentially expressed at 2, 4, 8, and 12 weeks, respectively. Ten cluster profiles representing distinguishable patterns of genes differentially expressed over time were determined. Cluster 4, which consisted of genes showing the most significant alterations in expression in response to high-fat diet over 12 weeks, included Apoa4 (apolipoprotein A-IV), Ppap2b (phosphatidic acid phosphatase type 2B), Cel (carboxyl ester lipase), and Clps (colipase, pancreatic), which interacted strongly with surrounding genes associated with colorectal cancer or obesity. Our data indicate that Apoa4, Ppap2b, Cel, and Clps are candidate early marker genes associated with obesity-related pathological changes in the colon. Genome-wide analyses performed in the present study provide new insights into selecting novel genes that may be associated with the development of diseases of the colon.
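The gene filter described above keeps genes whose log2 fold change is at least 1 or at most -1, which corresponds to a twofold increase or decrease. The sketch below shows that selection step only. It assumes positive, non-log-transformed expression values, and the gene names and numbers are hypothetical. It is not the authors' pipeline, which also included time-course clustering and network analysis.

```python
import math


def twofold_changed(expr_control, expr_treated, threshold=1.0):
    """Return {gene: log2 fold change} for genes with |log2(treated/control)| >= threshold.

    Both inputs map gene -> positive (non-log) expression value; threshold=1.0 means twofold.
    """
    selected = {}
    for gene, control in expr_control.items():
        lfc = math.log2(expr_treated[gene] / control)
        if lfc >= threshold or lfc <= -threshold:
            selected[gene] = lfc
    return selected


# Hypothetical values: gene A doubles, B barely changes, C drops to 40%.
print(twofold_changed({"A": 10, "B": 10, "C": 10}, {"A": 20, "B": 12, "C": 4}))
```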

  9. Whole Genome Analysis of Injectional Anthrax Identifies Two Disease Clusters Spanning More Than 13 Years

    Directory of Open Access Journals (Sweden)

    Paul Keim

    2015-11-01

    Lay Person Interpretation: Injectional anthrax has been plaguing heroin drug users across Europe for more than 10 years. In order to better understand this outbreak, we assessed genomic relationships of all available injectional anthrax strains from four countries spanning a >12 year period. Very few differences were identified using genome-based analysis, but these differentiated the isolates into two distinct clusters. This strongly supports a hypothesis of at least two separate anthrax spore contamination events perhaps during the drug production processes. Identification of two events would not have been possible from standard epidemiological analysis. These comprehensive data will be invaluable for classifying future injectional anthrax isolates and for future geographic attribution.

  10. Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome

    Directory of Open Access Journals (Sweden)

    Dougan Gordon

    2009-12-01

    Background: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV), and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades, and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region. Results: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 are defensin pseudogenes. Several TATA box motifs that likely impact gene expression were found for human and mouse defensin genes. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets, Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq, reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins. Conclusions: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements.
Manually curated gene

  11. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into how certain biological functions were acquired during evolution. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficients of neighboring genes and randomly selected gene pairs, using a statistical method that takes the false discovery rate (FDR) into account, for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E … Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
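The core comparison in this abstract contrasts correlations of physically adjacent genes with correlations of randomly paired genes, then picks a cutoff at a chosen FDR. The sketch below is a simplified illustration of that idea, not the paper's exact statistical method. The empirical FDR estimator used here is an assumption: the expected number of neighbor pairs above a cutoff under the random-pair null, divided by the observed number. The function name `neighbor_cutoff` and the input format are hypothetical. The default `fdr=0.1` mirrors the threshold quoted above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither profile is constant


def neighbor_cutoff(profiles, order, fdr=0.1, n_random=10000, seed=0):
    """Smallest correlation cutoff whose estimated FDR is <= fdr, or None.

    profiles: dict gene -> list of expression values across datasets (hypothetical format).
    order: gene IDs listed in chromosomal order.
    """
    rng = random.Random(seed)
    # Correlations of physically adjacent genes.
    neighbor_r = [pearson(profiles[a], profiles[b]) for a, b in zip(order, order[1:])]
    # Null distribution: correlations of randomly chosen gene pairs.
    genes = list(profiles)
    null_r = []
    for _ in range(n_random):
        a, b = rng.sample(genes, 2)
        null_r.append(pearson(profiles[a], profiles[b]))
    for cutoff in sorted(set(neighbor_r)):
        observed = sum(r >= cutoff for r in neighbor_r)
        # Expected count of neighbor pairs >= cutoff if they behaved like random pairs.
        expected = len(neighbor_r) * sum(r >= cutoff for r in null_r) / len(null_r)
        if expected / observed <= fdr:
            return cutoff
    return None
```

Neighboring pairs above the returned cutoff would then be merged into runs of adjacent genes to form candidate clusters. That merging step is not shown.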

  12. Molecular Typing and Virulence Gene Profiles of Enterotoxin Gene Cluster (egc)-Positive Staphylococcus aureus Isolates Obtained from Various Food and Clinical Specimens.

    Science.gov (United States)

    Song, Minghui; Shi, Chunlei; Xu, Xuebing; Shi, Xianming

    2016-11-01

    The enterotoxin gene cluster (egc) has been proposed to contribute to Staphylococcus aureus colonization, which highlights the need to evaluate the genetic diversity and virulence gene profiles of the egc-positive population. Here, a total of 43 egc-positive isolates (16.2%) were identified among 266 S. aureus isolates obtained from various food and clinical specimens in Shanghai. Seven different egc profiles were found based on polymerase chain reaction (PCR) results for egc genes. These 43 egc-positive isolates were then further typed by multilocus sequence typing, pulsed-field gel electrophoresis (PFGE), multiple-locus variable-number tandem-repeat analysis (MLVA), and accessory gene regulator (agr) typing. The 43 egc-positive isolates displayed 17 sequence types, 28 PFGE patterns, 29 MLVA types, and 4 agr types. Among them, the dominant clonal lineage was CC5-agr II (48.84%). Thirty toxin and 20 adhesion-associated genes were detected by PCR in egc-positive isolates. Notably, invasive toxin genes showed a high prevalence, e.g., 76.7% for Panton-Valentine leukocidin-encoding genes, 27.9% for sec, and 23.3% for tsst-1. Most of the examined adhesion-associated genes were conserved (76.7-100%), whereas the fnbB gene was found in only 8 (18.6%) isolates. In addition, 33 toxin gene profiles and 13 adhesion gene profiles were identified. Our results imply that isolates belonging to the same clonal lineage harbored similar adhesion gene profiles but diverse toxin gene profiles. Overall, the high prevalence of invasive virulence genes increases the potential risk posed by egc-positive isolates in S. aureus infection.

  13. Identifying novel genes in C. elegans using SAGE tags

    Directory of Open Access Journals (Sweden)

    Chen Nansheng

    2010-12-01

    Background: Despite extensive efforts devoted to predicting protein-coding genes in genome sequences, many bona fide genes have not been found and many existing gene models are not accurate in all sequenced eukaryote genomes. This situation is partly explained by the fact that gene prediction programs have been developed based on our incomplete understanding of gene feature information such as splicing and promoter characteristics. Additionally, full-length cDNAs of many genes and their isoforms are hard to obtain due to their low or rare expression. In order to obtain full-length sequences of all protein-coding genes, alternative approaches are required. Results: In this project, we have developed a method, called sequence tag-based amplification of cDNA ends (STACE), for reconstructing full-length cDNA sequences from short expressed sequence tags. Expressed tags are used as anchors for retrieving full-length transcripts in two rounds of PCR amplification. We have demonstrated the application of STACE in reconstructing full-length cDNA sequences using expressed tags mined from an array of serial analysis of gene expression (SAGE) C. elegans cDNA libraries. We have successfully applied STACE to recover sequence information for 12 genes, for two of which we found isoforms. STACE was used to successfully recover full-length cDNA sequences for seven of these genes. Conclusions: The STACE method can be used to effectively reconstruct full-length cDNA sequences of genes that are under-represented in cDNA sequencing projects and have been missed by existing gene prediction methods, but whose existence has been suggested by short sequence tags such as SAGE tags.

  14. Gene Expression and the Diversity of Identified Neurons

    OpenAIRE

    Buck, L.; Stein, R.; Palazzolo, M.; Anderson, D. J.; Axel, R.

    1983-01-01

    Nervous systems consist of diverse populations of neurons that are anatomically and functionally distinct. The diversity of neurons and the precision with which they are interconnected suggest that specific genes or sets of genes are activated in some neurons but not expressed in others. Experimentally, this problem may be considered at two levels. First, what is the total number of genes expressed in the brain, and how are they distributed among the different populations of neurons? Second, ...

  15. MicroRNAs located in the Hox gene clusters are implicated in Huntington's disease pathogenesis.

    Directory of Open Access Journals (Sweden)

    Andrew G Hoss

    2014-02-01

    Transcriptional dysregulation has long been recognized as central to the pathogenesis of Huntington's disease (HD). MicroRNAs (miRNAs) represent a major system of post-transcriptional regulation, either by preventing translational initiation or by targeting transcripts for storage or degradation. Using next-generation miRNA sequencing in prefrontal cortex (Brodmann Area 9) of twelve HD and nine control brains, we identified five miRNAs (miR-10b-5p, miR-196a-5p, miR-196b-5p, miR-615-3p and miR-1247-5p) up-regulated in HD at genome-wide significance (FDR q-value < 0.05). Three of these, miR-196a-5p, miR-196b-5p and miR-615-3p, were expressed at near-zero levels in control brains. Expression was verified for all five miRNAs using reverse transcription quantitative PCR, and all but miR-1247-5p were replicated in an independent sample (8 HD/8 C). Ectopic miR-10b-5p expression in PC12 HTT-Q73 cells increased survival by MTT assay and cell viability staining, suggesting that increased expression may be a protective response. All of the miRNAs except miR-1247-5p are located in intergenic regions of Hox clusters. Total mRNA sequencing in the same samples identified fifteen of 55 genes within the Hox cluster gene regions as differentially expressed in HD, and the Hox genes immediately adjacent to the four Hox cluster miRNAs as up-regulated. Pathway analysis of mRNA targets of these miRNAs implicated functions for neuronal differentiation, neurite outgrowth, cell death and survival. In regression models among the HD brains, huntingtin CAG repeat size, onset age and age at death were independently found to be inversely related to miR-10b-5p levels. CAG repeat size and onset age were independently inversely related to miR-196a-5p, onset age was inversely related to miR-196b-5p, and age at death was inversely related to miR-615-3p expression. These results suggest that these Hox-related miRNAs may be involved in a neuroprotective response in HD. Recently, miRNAs have shown promise as

  16. Sequencing and Transcriptional Analysis of the Biosynthesis Gene Cluster of Putrescine-Producing Lactococcus lactis

    Science.gov (United States)

    Ladero, Victor; Rattray, Fergal P.; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A.

    2011-01-01

    Lactococcus lactis is a prokaryotic microorganism of great importance as a starter culture and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (Generally Recognized As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion (IS) element interrupts the transcription of the cluster, which results in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed in order to differentiate nonproducing L. lactis strains from those with a functional AGDI cluster. The analysis of the AGDI cluster and its flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during adaptation to the milk environment by a process of reductive genome evolution. PMID:21803900

  17. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification

    DEFF Research Database (Denmark)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production of such compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features … architectures. Additionally, several usability features have been updated and improved. Together, these improvements make antiSMASH up-to-date with the latest developments in natural product research and will further facilitate computational genome mining for the discovery of novel bioactive molecules.

  18. Mycobiota and identification of aflatoxin gene cluster in marketed spices in West Africa

    DEFF Research Database (Denmark)

    Gnonlonfin, G. J. B.; Adjovi, Y. C.; Tokpo, A. F.

    2013-01-01

    of Aspergillus were dominant on all marketed dried and milled spices irrespective of country. Gene characterization and amplification analysis showed that most of the Aspergillus flavus isolates possess the cluster genes for aflatoxin production. Aflatoxin B1 assessment by Thin Layer Chromatography showed...

  19. Gene expression in human hippocampus from cocaine abusers identifies genes which regulate extracellular matrix remodeling.

    Directory of Open Access Journals (Sweden)

    Deborah C Mash

    2007-11-01

    The chronic effects of cocaine abuse on brain structure and function are blamed for the inability of most addicts to remain abstinent. Part of the difficulty in preventing relapse is the persisting memory of the intense euphoria or cocaine "rush". Most abused drugs and alcohol induce neuroplastic changes in brain pathways subserving emotion and cognition. Such changes may account for the consolidation and structural reconfiguration of synaptic connections with exposure to cocaine. Adaptive hippocampal plasticity could be related to specific patterns of gene expression with chronic cocaine abuse. Here, we compare gene expression profiles in the human hippocampus from cocaine addicts and age-matched drug-free control subjects. Cocaine abusers had 151 gene transcripts upregulated, while 91 gene transcripts were downregulated. RECK topped the list of cocaine-regulated transcripts in the human hippocampus (FC = 2.0; p < 0.05). RECK is a membrane-anchored MMP inhibitor that is implicated in the coordinated regulation of extracellular matrix integrity and angiogenesis. In keeping with elevated RECK expression, active MMP9 protein levels were decreased in the hippocampus from cocaine abusers. Pathway analysis identified other genes regulated by cocaine that code for proteins involved in the remodeling of the cytomatrix and synaptic connections and the inhibition of blood vessel proliferation (PCDH8, LAMB1, ITGB6, CTGF and EphB4). The observed microarray phenotype in the human hippocampus identified RECK and other region-specific genes that may promote long-lasting structural changes with repeated cocaine abuse. Extracellular matrix remodeling in the hippocampus may be a persisting effect of chronic abuse that contributes to the compulsive and relapsing nature of cocaine addiction.

  20. Nonlinear dimensionality reduction of gene expression data for visualization and clustering analysis of cancer tissue samples.

    Science.gov (United States)

    Shi, Jinlong; Luo, Zhigang

    2010-08-01

    Gene expression data represent nonlinear interactions among genes and environmental factors. Computational analysis of these data is expected to yield knowledge of gene functions and disease mechanisms. Clustering is a classical exploratory technique for discovering similar expression patterns and functional modules. However, gene expression data are usually high-dimensional with relatively few samples, which is the main difficulty in applying clustering algorithms. Principal component analysis (PCA) is commonly used to reduce the data dimensions before clustering analysis. However, PCA estimates the similarity between expression profiles based on the Euclidean distance, which cannot reveal nonlinear connections between genes. This paper uses nonlinear dimensionality reduction (NDR) as a preprocessing strategy for feature selection and visualization, and then applies clustering algorithms to the reduced feature spaces. To estimate how effectively NDR captures biologically relevant structures, NDR and PCA are compared on five real cancer expression datasets. Results show that NDR can perform better than PCA in visualization and clustering analysis of complex gene expression data. Copyright 2010 Elsevier Ltd. All rights reserved.
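For reference, the PCA baseline this abstract compares against fits in a few lines of NumPy. The sketch rests on several assumptions: samples are rows and genes are columns, and genes are mean-centred but not scaled. The abstract does not name the specific NDR method used, so no NDR method is shown. The usage lines illustrate the abstract's point: keeping every component preserves pairwise Euclidean distances exactly, which is the linear, distance-based notion of similarity that NDR is meant to go beyond.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project samples (rows of X) onto their top principal components.

    X: array of shape (n_samples, n_genes), e.g. tumour samples by genes.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each gene (column)
    # SVD of the centred matrix; rows of Vt are the principal axes,
    # ordered by decreasing singular value (i.e. explained variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Synthetic data for illustration only: 6 samples, 20 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 20))
Z2 = pca_project(X, 2)   # 2-D coordinates for plotting or clustering
Z6 = pca_project(X, 6)   # all components: sample-to-sample distances are unchanged
print(np.linalg.norm(Z6[0] - Z6[1]), np.linalg.norm(X[0] - X[1]))
```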

  1. Identification and characterization of a biosynthetic gene cluster for tryptophan dimers in deep sea-derived Streptomyces sp. SCSIO 03032.

    Science.gov (United States)

    Ma, Liang; Zhang, Wenjun; Zhu, Yiguang; Zhang, Guangtao; Zhang, Haibo; Zhang, Qingbo; Zhang, Liping; Yuan, Chengshan; Zhang, Changsheng

    2017-08-01

    Tryptophan dimers (TDs) are an important class of natural products with diverse bioactivities and share conserved biosynthetic pathways. We report the identification of a partial gene cluster (spm) responsible for the biosynthesis of a class of unusual TDs with non-planar skeletons, including spiroindimicins (SPMs), indimicins (IDMs), and lynamicins (LNMs), from the deep-sea-derived Streptomyces sp. SCSIO 03032. Bioinformatics analysis, targeted gene disruptions, and heterologous expression studies confirmed the involvement of the spm gene cluster in the biosynthesis of SPM/IDM/LNMs, and revealed indispensable roles for the halogenase/reductase pair SpmHF, the amino acid oxidase SpmO, and the chromopyrrolic acid (CPA) synthase SpmD, as well as the positive regulator SpmR and the putative transporter SpmA. However, the spm gene cluster was unable to confer on a heterologous host the ability to produce SPM/IDM/LNMs. In addition, the P450 enzyme SpmP and the monooxygenase SpmX2 were found not to be involved in the biosynthesis of SPM/IDM/LNMs. Sequence alignment and structure modeling suggested the lack of key conserved amino acid residues in the substrate-binding pocket of SpmP. Furthermore, feeding experiments in the non-producing ΔspmO mutant revealed several biosynthetic precursors en route to SPMs, indicating that key enzymes responsible for the biosynthesis of SPMs should be encoded by genes outside of the identified spm gene cluster. Finally, biosynthetic pathways for SPM/IDM/LNMs are proposed, laying a basis for further insights into their intriguing biosynthetic machinery.

  2. Structural variation of the ribosomal gene cluster within the class Insecta

    Energy Technology Data Exchange (ETDEWEB)

    Mukha, D.V.; Sidorenko, A.P.; Lazebnaya, I.V. [Vavilov Institute of General Genetics, Moscow (Russian Federation)] [and others]

    1995-09-01

    A general estimate of ribosomal DNA variation within the class Insecta is presented. It is shown that, using blot hybridization, one can detect differences in the structure of the ribosomal gene cluster not only between genera within an order but also between species within a genus, including sibling species. The structure of the ribosomal gene cluster of the Coccinellidae family (ladybirds) is analyzed. It is shown that cloned highly conserved regions of ribosomal DNA of Tetrahymena pyriformis can be used as probes for analyzing ribosomal genes in insects. 24 refs., 4 figs.

  3. Comparative Genomics and an Insect Model Rapidly Identify Novel Virulence Genes of Burkholderia mallei

    Science.gov (United States)

    2008-04-01

    T3SS) (19, 20, 23, 60, 64, 66), and lipopolysaccharide O antigen (LPS) (12). Type II secretion, type IV pili, and flagella (10, 11, 18) have been ... factors since it contains most of the genes in the animal pathogen-like T3SS (T3SSAP), CAP, T6S, and LPS biosynthetic gene clusters, the only ... gene clusters (T3SS, T6S, and CAP) that are essential for B. mallei virulence. In contrast, orthologs for 10% of the virulome CDSs were found in the

  4. Identification and manipulation of the pleuromutilin gene cluster from Clitopilus passeckerianus for increased rapid antibiotic production

    Science.gov (United States)

    Bailey, Andy M.; Alberti, Fabrizio; Kilaru, Sreedhar; Collins, Catherine M.; de Mattos-Shipley, Kate; Hartley, Amanda J.; Hayes, Patrick; Griffin, Alison; Lazarus, Colin M.; Cox, Russell J.; Willis, Christine L.; O'Dwyer, Karen; Spence, David W.; Foster, Gary D.

    2016-05-01

    Semi-synthetic derivatives of the tricyclic diterpene antibiotic pleuromutilin from the basidiomycete Clitopilus passeckerianus are important in combatting bacterial infections in human and veterinary medicine. These compounds belong to the only new class of antibiotics for human applications, with a novel mode of action and lack of cross-resistance, representing a class with great potential. Basidiomycete fungi, being dikaryotic, are not generally amenable to strain improvement. We report identification of the seven-gene pleuromutilin gene cluster and show that various targeted approaches aimed at increasing antibiotic production in C. passeckerianus achieved no improvement in yield. The seven-gene pleuromutilin cluster was reconstructed within Aspergillus oryzae, giving production of pleuromutilin in an ascomycete with a significant increase (2106%) in production. This is the first gene cluster from a basidiomycete to be successfully expressed in an ascomycete, and it paves the way for the exploitation of a metabolically rich but traditionally overlooked group of fungi.

  5. Prevalence of the lmo0036-0043 gene cluster encoding arginine deiminase and agmatine deiminase systems in Listeria monocytogenes.

    Science.gov (United States)

    Chen, Jianshun; Chen, Fan; Cheng, Changyong; Fang, Weihuan

    2013-04-01

    Arginine deiminase and agmatine deiminase systems are involved in acid tolerance, and their encoding genes form the cluster lmo0036-0043 in Listeria monocytogenes. While lmo0042 and lmo0043 were conserved in all L. monocytogenes strains, the lmo0036-0041 region of this cluster was identified in all lineage I and II strains and in the majority (83.3%) of lineage IV strains, but was absent in all lineage III strains and in a small fraction (16.7%) of lineage IV strains, suggesting that the presence of the complete lmo0036-0043 cluster is lineage-dependent. Lineage IV strains with a complete or a deficient lmo0036-0043 cluster exhibit specific ascB-dapE profiles, which might represent two subpopulations with distinct genetic characteristics.

  6. Identification of the Biosynthetic Gene Clusters for the Lipopeptides Fusaristatin A and W493 B in Fusarium graminearum and F. pseudograminearum

    DEFF Research Database (Denmark)

    Sørensen, Jens Laurids; Sondergaard, Teis Esben; Covarelli, Lorenzo

    2014-01-01

    The closely related species Fusarium graminearum and Fusarium pseudograminearum differ in that each contains a gene cluster with a polyketide synthase (PKS) and a nonribosomal peptide synthetase (NRPS) that is not present in the other species. To identify their products, we deleted PKS6 and NRPS7...... Fusarium species. On the basis of genes in the putative gene clusters we propose a model for biosynthesis where the polyketide product is shuttled to the NRPS via a CoA ligase and a thioesterase in F. pseudograminearum. In F. graminearum the polyketide is proposed to be directly assimilated by the NRPS....

  7. A Generally Applicable Translational Strategy Identifies S100A4 as a Candidate Gene in Allergy

    DEFF Research Database (Denmark)

    Bruhn, Sören; Fang, Yu; Barrenäs, Fredrik

    2014-01-01

    The identification of diagnostic markers and therapeutic candidate genes in common diseases is complicated by the involvement of thousands of genes. We hypothesized that genes co-regulated with a key gene in allergy, IL13, would form a module that could help to identify candidate genes. We identi...

  8. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data.

    Directory of Open Access Journals (Sweden)

    Manal Helal

    Full Text Available BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference' (most representative) sequences for individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility and compare the performance of several clustering and classification algorithms for identifying the species of 364 16S rRNA gene sequences with a defined species in GenBank and 110 16S rRNA gene sequences with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species were identified as 'centroids' in their respective clusters, i.e. the sequences from which the distances to all other sequences were minimized. The 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra
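
The two computational steps named in this abstract, picking a cluster 'centroid' (the member with the smallest total distance to the others, i.e. a medoid) and kNN classification of unlabelled sequences, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' LM algorithm: it assumes a precomputed pairwise distance matrix and known species labels for the reference sequences, and the function names `medoid` and `knn_classify`, the default k=3, and the species labels in the example are hypothetical.

```python
from collections import Counter


def medoid(indices, dist):
    """Return the member of `indices` with the smallest summed distance to all members.

    `dist` is a square matrix (list of lists) of pairwise sequence distances.
    """
    return min(indices, key=lambda i: sum(dist[i][j] for j in indices))


def knn_classify(query_dists, labels, k=3):
    """Assign the majority species label among the k nearest reference sequences.

    query_dists[j] is the distance from the unclassified sequence to reference j,
    and labels[j] is the species assigned to reference j.
    """
    nearest = sorted(range(len(labels)), key=lambda j: query_dists[j])[:k]
    votes = Counter(labels[j] for j in nearest)
    return votes.most_common(1)[0][0]


# Toy example: three closely related sequences and one outlier.
dist = [[0, 1, 2, 5],
        [1, 0, 1, 5],
        [2, 1, 0, 5],
        [5, 5, 5, 0]]
print(medoid([0, 1, 2], dist))  # → 1
print(knn_classify([0.5, 0.6, 3.0, 4.0],
                   ["species_A", "species_A", "species_B", "species_B"]))  # → species_A
```

Using a medoid rather than an arithmetic mean keeps the 'centroid' an actual observed sequence, which matches the abstract's goal of naming a real representative sequence per species.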

  9. Identification of a gene cluster for biosynthesis of mannosylerythritol lipids in the basidiomycetous fungus Ustilago maydis.

    Science.gov (United States)

    Hewald, Sandra; Linne, Uwe; Scherer, Mario; Marahiel, Mohamed A; Kämper, Jörg; Bölker, Michael

    2006-08-01

    Many microorganisms produce surface-active substances that enhance the availability of water-insoluble substrates. Although many of these biosurfactants have interesting potential applications, very little is known about their biosynthesis. The basidiomycetous fungus Ustilago maydis secretes large amounts of mannosylerythritol lipids (MELs) under conditions of nitrogen starvation. We recently described a putative glycosyltransferase, Emt1, which is essential for MEL biosynthesis and whose expression is strongly induced by nitrogen limitation. We used DNA microarray analysis to identify additional genes involved in MEL biosynthesis. Here we show that emt1 is part of a gene cluster which comprises five open reading frames. Three of the newly identified proteins, Mac1, Mac2, and Mat1, contain short sequence motifs characteristic of acyl- and acetyltransferases. Mutational analysis revealed that Mac1 and Mac2 are essential for MEL production, which suggests that they are involved in the acylation of mannosylerythritol. Deletion of mat1 resulted in the secretion of completely deacetylated MELs, as determined by mass spectrometry. We overexpressed Mat1 in Escherichia coli and demonstrated that this enzyme acts as an acetyl coenzyme A-dependent acetyltransferase. Remarkably, Mat1 displays relaxed regioselectivity and is able to acetylate mannosylerythritol at both the C-4 and C-6 hydroxyl groups. Based on these results, we propose a biosynthesis pathway for the generation of mannosylerythritol lipids in U. maydis.

  10. Discovery of Unusual Biaryl Polyketides by Activation of a Silent Streptomyces venezuelae Biosynthetic Gene Cluster.

    Science.gov (United States)

    Thanapipatsiri, Anyarat; Gomez-Escribano, Juan Pablo; Song, Lijiang; Bibb, Maureen J; Al-Bassam, Mahmoud; Chandra, Govind; Thamchaipenet, Arinthip; Challis, Gregory L; Bibb, Mervyn J

    2016-11-17

    Comparative transcriptional profiling of a ΔbldM mutant of Streptomyces venezuelae with its unmodified progenitor revealed that the expression of a cryptic biosynthetic gene cluster containing both type I and type III polyketide synthase genes is activated in the mutant. The 29.5 kb gene cluster, which was predicted to encode an unusual biaryl metabolite, which we named venemycin, and potentially halogenated derivatives, contains 16 genes, including one, vemR, that encodes a transcriptional activator of the large ATP-binding LuxR-like (LAL) family. Constitutive expression of vemR in the ΔbldM mutant led to the production of sufficient venemycin for structural characterisation, confirming its unusual biaryl structure. Co-expression of the venemycin biosynthetic gene cluster and vemR in the heterologous host Streptomyces coelicolor also resulted in venemycin production. Although the gene cluster encodes two halogenases and a flavin reductase, constitutive expression of all three genes led only to the accumulation of a monohalogenated venemycin derivative, both in the native producer and the heterologous host. A competition experiment in which equimolar quantities of sodium chloride and sodium bromide were fed to the venemycin-producing strains resulted in the preferential incorporation of bromine, suggesting that bromide is the preferred substrate for one or both halogenases. © 2016 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.

  11. Expression profiling of rainbow trout testis development identifies evolutionary conserved genes involved in spermatogenesis

    Directory of Open Access Journals (Sweden)

    Esquerré Diane

    2009-11-01

    Full Text Available Abstract Background Spermatogenesis is a late developmental process that involves a coordinated expression program in germ cells and continuous communication between the testicular somatic cells and the germ line. Current knowledge regarding molecular factors driving male germ cell proliferation and differentiation in vertebrates is still limited and mainly based on existing data from rodents and humans. Fish with a marked reproductive cycle and synchronous germ cell development within cysts have proven to be valuable models for studying precise stages of spermatogenetic development and the germ cell-somatic cell communication network. In this study we used 9K cDNA microarrays to investigate the expression profiles underlying testis maturation during the male reproductive cycle of the trout, Oncorhynchus mykiss. Results Using total testis samples at various developmental stages and isolated spermatogonia, spermatocytes and spermatids, 3379 differentially expressed trout cDNAs were identified and their gene activation or repression patterns throughout the reproductive cycle were reported. We also performed a tissue-profiling analysis and highlighted many genes for which expression signals were restricted to the testes or to the gonads of both sexes. The search for orthologous genes in genome-sequenced fish species and the use of their mammalian orthologs allowed us to provide accurate annotations for trout cDNAs. Analysis of Gene Ontology terms validated and broadened our interpretation of expression clusters by highlighting enriched functions that are consistent with known sequential events during male gametogenesis. Furthermore, we compared expression profiles of trout and mouse orthologs and identified a complement of genes for which expression during spermatogenesis was maintained throughout evolution. 
Conclusion A comprehensive study of gene expression and associated functions during testis maturation and germ cell differentiation in

  12. An elm EST database for identifying leaf beetle egg-induced defense genes

    Directory of Open Access Journals (Sweden)

    Büchel Kerstin

    2012-06-01

    Full Text Available Abstract Background Plants can defend themselves against herbivorous insects prior to the onset of larval feeding by responding to the eggs laid on their leaves. In the European field elm (Ulmus minor), egg laying by the elm leaf beetle (Xanthogaleruca luteola) activates the emission of volatiles that attract specialised egg parasitoids, which in turn kill the eggs. Little is known about the transcriptional changes that insect eggs trigger in plants and how such indirect defense mechanisms are orchestrated in the context of other biological processes. Results Here we present the first large-scale study of egg-induced changes in the transcriptional profile of a tree. Five cDNA libraries were generated from leaves of (i) untreated control elms, and elms treated with (ii) egg laying and feeding by elm leaf beetles, (iii) feeding, (iv) artificial transfer of egg clutches, and (v) methyl jasmonate. A total of 361,196 expressed sequence tags (ESTs) were identified, which clustered into 52,823 unique transcripts (Unitrans) and were stored in a database with a public web interface. Among the analyzed Unitrans, 73% could be annotated by homology to known genes in the UniProt (Plant) database, particularly to those from Vitis, Ricinus, Populus and Arabidopsis. Comparative in silico analysis among the different treatments revealed differences in Gene Ontology term abundances. Defense- and stress-related gene transcripts were present in high abundance in leaves after herbivore egg laying, but transcripts involved in photosynthesis showed decreased abundance. Many pathogen-related genes and genes involved in phytohormone signaling were expressed, indicative of jasmonic acid biosynthesis and activation of jasmonic acid responsive genes. 
Cross-comparisons between different libraries based on expression profiles allowed the identification of genes with a potential relevance in egg-induced defenses, as well as other biological processes, including signal transduction

  13. An elm EST database for identifying leaf beetle egg-induced defense genes.

    Science.gov (United States)

    Büchel, Kerstin; McDowell, Eric; Nelson, Will; Descour, Anne; Gershenzon, Jonathan; Hilker, Monika; Soderlund, Carol; Gang, David R; Fenning, Trevor; Meiners, Torsten

    2012-06-15

    Plants can defend themselves against herbivorous insects prior to the onset of larval feeding by responding to the eggs laid on their leaves. In the European field elm (Ulmus minor), egg laying by the elm leaf beetle (Xanthogaleruca luteola) activates the emission of volatiles that attract specialised egg parasitoids, which in turn kill the eggs. Little is known about the transcriptional changes that insect eggs trigger in plants and how such indirect defense mechanisms are orchestrated in the context of other biological processes. Here we present the first large-scale study of egg-induced changes in the transcriptional profile of a tree. Five cDNA libraries were generated from leaves of (i) untreated control elms, and elms treated with (ii) egg laying and feeding by elm leaf beetles, (iii) feeding, (iv) artificial transfer of egg clutches, and (v) methyl jasmonate. A total of 361,196 expressed sequence tags (ESTs) were identified, which clustered into 52,823 unique transcripts (Unitrans) and were stored in a database with a public web interface. Among the analyzed Unitrans, 73% could be annotated by homology to known genes in the UniProt (Plant) database, particularly to those from Vitis, Ricinus, Populus and Arabidopsis. Comparative in silico analysis among the different treatments revealed differences in Gene Ontology term abundances. Defense- and stress-related gene transcripts were present in high abundance in leaves after herbivore egg laying, but transcripts involved in photosynthesis showed decreased abundance. Many pathogen-related genes and genes involved in phytohormone signaling were expressed, indicative of jasmonic acid biosynthesis and activation of jasmonic acid responsive genes. Cross-comparisons between different libraries based on expression profiles allowed the identification of genes with a potential relevance in egg-induced defenses, as well as other biological processes, including signal transduction, transport and primary metabolism.

  14. Identifying human disease genes through cross-species gene mapping of evolutionary conserved processes.

    Directory of Open Access Journals (Sweden)

    Martin Poot

    2011-05-01

    Full Text Available Understanding complex networks that modulate development in humans is hampered by genetic and phenotypic heterogeneity within and between populations. Here we present a method that exploits natural variation in highly diverse mouse genetic reference panels in which genetic and environmental factors can be tightly controlled. The aim of our study is to test a cross-species genetic mapping strategy, which compares gene mapping data from human patients with functional data obtained by QTL mapping in recombinant inbred mouse strains in order to prioritize human disease candidate genes. We exploit evolutionary conservation of developmental phenotypes to discover gene variants that influence brain development in humans. We studied corpus callosum volume in a recombinant inbred mouse panel (C57BL/6J×DBA/2J, BXD strains) using high-field strength MRI technology. We aligned mouse mapping results for this neuro-anatomical phenotype with genetic data from patients with abnormal corpus callosum (ACC) development. From the 61 syndromes which involve an ACC, 51 human candidate genes have been identified. Through interval mapping, we identified a single significant QTL on mouse chromosome 7 for corpus callosum volume with a QTL peak located between 25.5 and 26.7 Mb. Comparing the genes in this mouse QTL region with those associated with human syndromes (involving ACC) and those covered by copy number variations (CNV) yielded a single overlap, namely HNRPU in humans and Hnrpul1 in mice. Further analysis of corpus callosum volume in BXD strains revealed that the corpus callosum was significantly larger in BXD mice with a B genotype at the Hnrpul1 locus than in BXD mice with a D genotype at Hnrpul1 (F = 22.48, p < 9.87 × 10^-5). This approach, which exploits highly diverse mouse strains, provides an efficient and effective translational bridge to study the etiology of human developmental disorders, such as autism and schizophrenia.
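
The overlap step in this abstract (restricting mouse genes to the QTL interval, mapping them to human orthologs, and intersecting with human candidate genes) reduces to simple set operations. The sketch below is a minimal illustration assuming gene positions given in Mb and a precomputed mouse-to-human ortholog table; the function names and all gene symbols in the example are hypothetical placeholders, not data from the study.

```python
def genes_in_interval(genes, chrom, start_mb, end_mb):
    """Return mouse gene symbols on `chrom` whose position (Mb) lies in [start_mb, end_mb].

    `genes` maps symbol -> (chromosome, position_in_Mb).
    """
    return {sym for sym, (c, pos) in genes.items()
            if c == chrom and start_mb <= pos <= end_mb}


def cross_species_overlap(mouse_genes, orthologs, human_candidates, chrom, start_mb, end_mb):
    """Map mouse genes under a QTL peak to human orthologs and intersect with human candidates."""
    in_qtl = genes_in_interval(mouse_genes, chrom, start_mb, end_mb)
    mapped = {orthologs[g] for g in in_qtl if g in orthologs}
    return mapped & set(human_candidates)


# Placeholder genes: two fall inside the 25.5-26.7 Mb window on chromosome 7.
mouse_genes = {"mGene1": ("7", 25.8), "mGene2": ("7", 30.0),
               "mGene3": ("7", 26.1), "mGene4": ("2", 26.0)}
orthologs = {"mGene1": "HGENE1", "mGene2": "HGENE2",
             "mGene3": "HGENE3", "mGene4": "HGENE4"}
human_candidates = {"HGENE1", "HGENE2", "HGENE4"}
print(cross_species_overlap(mouse_genes, orthologs, human_candidates, "7", 25.5, 26.7))
# → {'HGENE1'}
```

Real analyses would take positions from a genome annotation and orthologs from a curated resource; this sketch only shows the filtering and intersection logic.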

  15. The effect of alcohol on the differential expression of cluster of differentiation 14 gene, associated pathways, and genetic network.

    Science.gov (United States)

    Zhou, Diana X; Zhao, Yinghong; Baker, Jessica A; Gu, Qingqing; Hamre, Kristin M; Yue, Junming; Jones, Byron C; Cook, Melloni N; Lu, Lu

    2017-01-01

    Alcohol consumption affects human health in part by compromising the immune system. In this study, we examined the expression of the Cd14 (cluster of differentiation 14) gene, which is involved in the immune system through a proinflammatory cascade. Expression was evaluated in BXD mice treated with saline or acute 1.8 g/kg i.p. ethanol (12.5% v/v). Hippocampal gene expression data were generated to examine differential expression and to perform systems genetics analyses. Cd14 gene expression showed significant changes among the BXD strains after ethanol treatment, and eQTL mapping revealed that Cd14 is a cis-regulated gene. We also identified eighteen ethanol-related phenotypes, reflecting either ethanol responses or ethanol consumption, that correlated with Cd14 expression. Pathway analysis was performed to identify possible biological pathways involved in the response to ethanol and Cd14. We also constructed a genetic network for Cd14 using the top 20 correlated genes and present several genes possibly involved in Cd14 and ethanol responses based on differential gene expression. In conclusion, we found Cd14, along with several other genes and pathways, to be involved in ethanol responses in the hippocampus, such as increased susceptibility to lipopolysaccharides and neuroinflammation.
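
Building a network from "the top 20 correlated genes" can be sketched as ranking all genes by the strength of their correlation with Cd14 expression across BXD strains. The abstract does not state the correlation measure, so ranking by absolute Pearson correlation is an assumption here; the function names are hypothetical, and the expression values and gene names other than Cd14 in the example are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_correlated(target, expression, n=20):
    """Return up to n genes ranked by |r| with the target gene across strains.

    `expression` maps gene symbol -> list of expression values (one per strain).
    """
    scores = {gene: pearson(expression[target], values)
              for gene, values in expression.items() if gene != target}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:n]


# Hypothetical values for four strains; gene names other than Cd14 are placeholders.
expression = {
    "Cd14": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],   # r = 1.0
    "geneB": [4.0, 3.0, 2.0, 1.0],   # r = -1.0
    "geneC": [1.0, 3.0, 2.0, 4.0],   # r = 0.8
    "geneD": [5.0, 5.0, 5.0, 5.0],   # constant, r treated as 0.0
}
print(top_correlated("Cd14", expression, n=3))  # → ['geneA', 'geneB', 'geneC']
```

Ranking by absolute value keeps strongly anti-correlated genes, which is a design choice; a study could equally restrict to positive correlations.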

  16. The effect of alcohol on the differential expression of cluster of differentiation 14 gene, associated pathways, and genetic network.

    Directory of Open Access Journals (Sweden)

    Diana X Zhou

    Full Text Available Alcohol consumption affects human health in part by compromising the immune system. In this study, we examined the expression of the Cd14 (cluster of differentiation 14) gene, which is involved in the immune system through a proinflammatory cascade. Expression was evaluated in BXD mice treated with saline or acute 1.8 g/kg i.p. ethanol (12.5% v/v). Hippocampal gene expression data were generated to examine differential expression and to perform systems genetics analyses. Cd14 gene expression showed significant changes among the BXD strains after ethanol treatment, and eQTL mapping revealed that Cd14 is a cis-regulated gene. We also identified eighteen ethanol-related phenotypes, reflecting either ethanol responses or ethanol consumption, that correlated with Cd14 expression. Pathway analysis was performed to identify possible biological pathways involved in the response to ethanol and Cd14. We also constructed a genetic network for Cd14 using the top 20 correlated genes and present several genes possibly involved in Cd14 and ethanol responses based on differential gene expression. In conclusion, we found Cd14, along with several other genes and pathways, to be involved in ethanol responses in the hippocampus, such as increased susceptibility to lipopolysaccharides and neuroinflammation.

  17. A CLUSTERING OF DJA STOCKS - THE APPLICATION IN FINANCE OF A METHOD FIRST USED IN GENE TRAJECTORY STUDY

    Directory of Open Access Journals (Sweden)

    Silaghi Gheorghe Cosmin

    2009-05-01

    Full Text Available Previously we employed the Gene Trajectory Clustering methodology to search for different associations of the stocks composing the DJA index, with the aim of finding different, logic clusters, supported by economic reasons, preferably different than the

  18. A pseudogene cluster in the leader region of the Euglena chloroplast 16S-23S rRNA genes.

    Science.gov (United States)

    Miyata, T; Kikuno, R; Ohshima, Y

    1982-01-01

    The nucleotide sequence of a region (leader region) preceding the 5'-end of the 16S-23S rRNA gene region of Euglena gracilis chloroplast DNA was compared with the homologous sequences that code for the 16S-23S rRNA operons of Euglena and E. coli. The leader region shows close sequence homology to the 16S-23S rRNA gene region of Euglena (Orozco et al. (1980) J. Biol. Chem. 255, 10997-11003) as well as to the rrnD operon of E. coli, suggesting that it was derived from the 16S-23S rRNA gene region by gene duplication. It was shown that the leader region had accumulated nucleotide substitutions at an extremely rapid rate throughout its entire length, similar to the rate of the tRNAIle pseudogene identified in the leader region. In addition, the leader region shows a unique base content which is quite distinct from those of the 16S-23S rRNA gene regions of Euglena and E. coli, but again is similar to that of the tRNAIle pseudogene. These two results strongly suggest that the leader region contains a pseudogene cluster which was derived from a gene cluster coding for the functional 16S-23S rRNA operon, possibly by imperfect duplication during evolution of Euglena chloroplast DNA. PMID:7041094

  19. Phylogeography of var gene repertoires reveals fine-scale geospatial clustering of Plasmodium falciparum populations in a highly endemic area.

    Science.gov (United States)

    Tessema, Sofonias K; Monk, Stephanie L; Schultz, Mark B; Tavul, Livingstone; Reeder, John C; Siba, Peter M; Mueller, Ivo; Barry, Alyssa E

    2015-01-01

    Plasmodium falciparum malaria is a major global health problem that is being targeted for progressive elimination. Knowledge of local disease transmission patterns in endemic countries is critical to these elimination efforts. To investigate fine-scale patterns of malaria transmission, we have compared repertoires of rapidly evolving var genes in a highly endemic area. A total of 3680 high-quality DBLα-sequences were obtained from 68 P. falciparum isolates from ten villages spread over two distinct catchment areas on the north coast of Papua New Guinea (PNG). Modelling of the extent of var gene diversity in the two parasite populations predicts more than twice as many var gene alleles circulating within each catchment (Mugil = 906; Wosera = 1094) as previously recognized in PNG (Amele = 369). In addition, there were limited levels of var gene sharing between populations, consistent with local parasite population structure. Phylogeographic analyses demonstrate that while neutrally evolving microsatellite markers identified population structure only at the catchment level, var gene repertoires reveal further fine-scale geospatial clustering of parasite isolates. The clustering of parasite isolates by village in Mugil, but not in Wosera, was consistent with the physical and cultural isolation of the human populations in the two catchments. The study highlights the microheterogeneity of P. falciparum transmission in highly endemic areas and demonstrates the potential of var genes as markers of local patterns of parasite population structure. © 2014 John Wiley & Sons Ltd.
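
The abstract does not name the statistic used to quantify var gene sharing. One option used in var gene studies is pairwise type sharing, PTS = 2·n_AB / (n_A + n_B), where n_A and n_B are the numbers of distinct DBLα types in isolates A and B and n_AB is the number of types they share. The sketch below assumes that definition; the function names and the isolate contents in the example are hypothetical.

```python
from itertools import combinations


def pairwise_type_sharing(a, b):
    """PTS = 2 * |A ∩ B| / (|A| + |B|) for two collections of DBLα types."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pts(isolates):
    """Average PTS over all pairs of isolates (0.0 if fewer than two isolates)."""
    pairs = list(combinations(isolates, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_type_sharing(x, y) for x, y in pairs) / len(pairs)


# Hypothetical repertoires: the first two isolates share one type, the third shares none.
isolates = [{"t1", "t2"}, {"t2", "t3"}, {"t4"}]
print(pairwise_type_sharing(isolates[0], isolates[1]))  # 2 * 1 / 4
print(mean_pts(isolates))                               # (0.5 + 0 + 0) / 3
```

Comparing mean PTS within villages against mean PTS between villages is one simple way to see the kind of fine-scale clustering the abstract describes.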

  20. Clustered Mutation Signatures Reveal that Error-Prone DNA Repair Targets Mutations to Active Genes.

    Science.gov (United States)

    Supek, Fran; Lehner, Ben

    2017-07-27

    Many processes can cause the same nucleotide change in a genome, making the identification of the mechanisms causing mutations a difficult challenge. Here, we show that clustered mutations provide a more precise fingerprint of mutagenic processes. Of nine clustered mutation signatures identified from >1,000 tumor genomes, three relate to variable APOBEC activity and three are associated with tobacco smoking. An additional signature matches the spectrum of translesion DNA polymerase eta (POLH). In lymphoid cells, these mutations target promoters, consistent with AID-initiated somatic hypermutation. In solid tumors, however, they are associated with UV exposure and alcohol consumption and target the H3K36me3 chromatin of active genes in a mismatch repair (MMR)-dependent manner. These regions normally have a low mutation rate because error-free MMR also targets H3K36me3 chromatin. Carcinogens and error-prone repair therefore redistribute mutations to the more important regions of the genome, contributing a substantial mutation load in many tumors, including driver mutations. Copyright © 2017 Elsevier Inc. All rights reserved.
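
As a rough illustration of what "clustered mutations" means computationally, the sketch below groups mutation positions on one chromosome into runs whose neighbouring mutations lie within a fixed distance of each other. The function name and the `max_gap` and `min_size` thresholds are hypothetical; the abstract does not give the clustering criterion the authors used, and this simple distance rule is not their method.

```python
def clustered_mutations(positions, max_gap=1000, min_size=2):
    """Group mutation positions from one chromosome into clusters.

    After sorting, consecutive mutations join the same cluster when they are at
    most `max_gap` bp apart; only clusters with at least `min_size` mutations are kept.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical positions: two clusters and one isolated mutation.
print(clustered_mutations([100, 150, 5000, 5200, 5300, 20000], max_gap=500))
# → [[100, 150], [5000, 5200, 5300]]
```

Mutations inside the returned clusters would then be tallied by substitution type to build the cluster-specific signatures the abstract refers to.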

  1. MS/MS networking guided analysis of molecule and gene cluster families

    Science.gov (United States)

    Nguyen, Don Duy; Wu, Cheng-Hsuan; Moree, Wilna J.; Lamsa, Anne; Medema, Marnix H.; Zhao, Xiling; Gavilan, Ronnie G.; Aparicio, Marystella; Atencio, Librada; Jackson, Chanaye; Ballesteros, Javier; Sanchez, Joel; Watrous, Jeramie D.; Phelan, Vanessa V.; van de Wiel, Corine; Kersten, Roland D.; Mehnaz, Samina; De Mot, René; Shank, Elizabeth A.; Charusanti, Pep; Nagarajan, Harish; Duggan, Brendan M.; Moore, Bradley S.; Bandeira, Nuno; Palsson, Bernhard Ø.; Pogliano, Kit; Gutiérrez, Marcelino; Dorrestein, Pieter C.

    2013-01-01

    The ability to correlate the production of specialized metabolites to the genetic capacity of the organism that produces such molecules has become an invaluable tool in aiding the discovery of biotechnologically applicable molecules. Here, we accomplish this task by matching molecular families with gene cluster families, making these correlations for 60 microbes at one time rather than connecting one molecule to one organism at a time, as is traditionally done. We correlate these families using nanospray desorption electrospray ionization MS/MS, an ambient pressure MS technique, in conjunction with MS/MS networking and peptidogenomics. We matched the molecular families of peptide natural products produced by 42 bacilli and 18 pseudomonads through the generation of amino acid sequence tags from MS/MS data of specific clusters found in the MS/MS network. These sequence tags were then linked to biosynthetic gene clusters in publicly accessible genomes, allowing us to link particular molecules with the genes that produced them. As an example of its use, this approach was applied to two unsequenced Pseudoalteromonas species, leading to the discovery of the gene cluster for a molecular family, the bromoalterochromides, in the previously sequenced strain P. piscicida JCM 20779T. The approach itself is not limited to 60 related strains, because spectral networking can be readily adapted to examine molecular family–gene cluster families of hundreds or more diverse organisms in a single MS/MS network. PMID:23798442
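
The sequence-tag step can be sketched as reading mass differences between consecutive fragment ions as amino acid residues, then searching predicted peptide products of gene clusters for the resulting tag in either orientation. This is a simplified illustration, not the authors' pipeline. It assumes singly charged fragments from one ion series have already been picked from a spectrum, and it uses standard monoisotopic residue masses (Leu and Ile share a mass and are both reported as L). The function names, the 0.01 Da tolerance, and the cluster IDs and predicted sequences in the example are hypothetical.

```python
# Standard monoisotopic amino acid residue masses in Da. Leu and Ile have the same
# mass, so both are reported as "L".
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def sequence_tag(fragment_masses, tol=0.01):
    """Read a residue tag from mass differences between consecutive fragment ions.

    Differences matching no standard residue within `tol` Da are written as "?"
    (for example, nonproteinogenic building blocks common in NRPS products).
    """
    masses = sorted(fragment_masses)
    tag = []
    for lighter, heavier in zip(masses, masses[1:]):
        diff = heavier - lighter
        match = next((aa for aa, m in RESIDUE_MASSES.items() if abs(diff - m) <= tol), "?")
        tag.append(match)
    return "".join(tag)


def match_tag(tag, predicted_products):
    """Return IDs of gene clusters whose predicted product contains the tag in either direction."""
    return [cluster_id for cluster_id, seq in predicted_products.items()
            if tag in seq or tag[::-1] in seq]


tag = sequence_tag([100.0, 171.03711, 270.10552, 383.18958])
print(tag)  # → AVL
print(match_tag(tag, {"cluster_1": "GAVLS", "cluster_2": "LVAG", "cluster_3": "PPP"}))
# → ['cluster_1', 'cluster_2']
```

Both orientations are checked because a tag read from an unassigned ion ladder may run N-to-C or C-to-N.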

  2. A 6-gene signature identifies four molecular subgroups of neuroblastoma

    OpenAIRE

    Abel, Frida; Dalevi, Daniel; Nethander, Maria; Jörnsten, Rebecka; De Preter, Katleen; Vermeulen, Joëlle; Stallings, Raymond; Kogner, Per; Maris, John; Nilsson, Staffan

    2011-01-01

    Abstract Background There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB): Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) were associated with unfavourable biology of sporadic NB. Also, various other genes have been linke...

  3. Regulation of Three Nitrogenase Gene Clusters in the Cyanobacterium Anabaena variabilis ATCC 29413

    Directory of Open Access Journals (Sweden)

    Teresa Thiel

    2014-12-01

    Full Text Available The filamentous cyanobacterium Anabaena variabilis ATCC 29413 fixes nitrogen under aerobic conditions in specialized cells called heterocysts that form in response to an environmental deficiency in combined nitrogen. Nitrogen fixation is mediated by the enzyme nitrogenase, which is very sensitive to oxygen. Heterocysts are microxic cells that allow nitrogenase to function in a filament composed primarily of vegetative cells that produce oxygen by photosynthesis. A. variabilis is unique among well-characterized cyanobacteria in that it has three nitrogenase gene clusters that encode different nitrogenases, which function under different environmental conditions. The nif1 genes encode a Mo-nitrogenase that functions only in heterocysts, even in filaments grown anaerobically. The nif2 genes encode a different Mo-nitrogenase that functions in vegetative cells, but only in filaments grown under anoxic conditions. An alternative V-nitrogenase is encoded by vnf genes that are expressed only in heterocysts in an environment that is deficient in Mo. Thus, these three nitrogenases are expressed differentially in response to environmental conditions. The entire nif1 gene cluster, comprising at least 15 genes, is primarily under the control of the promoter for the first gene, nifB1. Transcriptional control of many of the downstream nif1 genes occurs by a combination of weak promoters within the coding regions of some downstream genes and by RNA processing, which is associated with increased transcript stability. The vnf genes show a similar pattern of transcriptional and post-transcriptional control of expression, suggesting that the complex pattern of regulation of the nif1 cluster is conserved in other cyanobacterial nitrogenase gene clusters.

  4. Overexpression of Hoxc13 in differentiating keratinocytes results in downregulation of a novel hair keratin gene cluster and alopecia.

    Science.gov (United States)

    Tkatchenko, A V; Visconti, R P; Shang, L; Papenbrock, T; Pruett, N D; Ito, T; Ogawa, M; Awgulewitsch, A

    2001-05-01

    Studying the roles of Hox genes in normal and pathological development of skin and hair requires identification of downstream target genes in genetically defined animal models. We show that transgenic mice overexpressing Hoxc13 in differentiating keratinocytes of hair follicles develop alopecia, accompanied by a progressive pathological skin condition that resembles ichthyosis. Large-scale analysis of differential gene expression in postnatal skin of these mice identified 16 previously unknown and 13 known genes as presumptive Hoxc13 targets. The majority of these targets are downregulated and belong to a subgroup of genes that encode hair-specific keratin-associated proteins (KAPs). Genomic mapping using a mouse hamster radiation hybrid panel showed these genes to reside in a novel KAP gene cluster on mouse chromosome 16 in a region of conserved linkage with human chromosome 21q22.11. Furthermore, data obtained by Hoxc13/lacZ reporter gene analysis in mice that overexpress Hoxc13 suggest negative autoregulatory feedback control of Hoxc13 expression levels, thus providing an entry point for elucidating currently unknown mechanisms that are required for regulating quantitative levels of Hox gene expression. Combined, these results provide a framework for understanding molecular mechanisms of Hoxc13 function in hair growth and development.

  5. Gonad Transcriptome Analysis of the Pacific Oyster Crassostrea gigas Identifies Potential Genes Regulating the Sex Determination and Differentiation Process.

    Science.gov (United States)

    Yue, Chenyang; Li, Qi; Yu, Hong

    2018-04-01

    The Pacific oyster Crassostrea gigas is a commercially important bivalve in aquaculture worldwide. C. gigas has a fascinating sexual reproduction system consisting of dioecism, sex change, and occasional hermaphroditism, while knowledge of the molecular mechanisms of sex determination and differentiation is still limited. In this study, the transcriptomes of male and female gonads at different gametogenesis stages were characterized by RNA-seq. Hierarchical clustering based on differentially expressed genes revealed that 1269 genes were expressed specifically in female gonads and 817 genes showed increasing expression over the course of spermatogenesis. Furthermore, using weighted gene correlation network analysis (WGCNA), we identified two gene modules related to female gonad development and one related to male gonad development. Interestingly, GO and KEGG enrichment analysis showed that neurotransmitter-related terms were significantly enriched in genes related to ovary development, suggesting that neurotransmitters are likely to regulate female sex differentiation. In addition, two hub genes related to testis development, lncRNA LOC105321313 and Cg-Sh3kbp1, and one hub gene related to ovary development, Cg-Malrd1-like, were investigated for the first time. This study points out the role of neurotransmitter and non-coding RNA regulation during gonad development and produces lists of novel relevant candidate genes for further studies. Together, these results provide valuable information for understanding the molecular mechanisms of C. gigas sex determination and differentiation.
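    WGCNA, named above, groups genes into modules from a weighted co-expression network and picks hub genes by their connectivity inside a module. A minimal sketch of its two core quantities follows, assuming the standard unsigned formulation: a soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^beta and the topological overlap matrix (TOM). The power beta = 6, the function names and the toy data are illustrative assumptions, not values from this study, and the module-detection step (hierarchical clustering on 1 - TOM) is omitted.

```python
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)|**beta (genes x samples input)."""
    corr = np.corrcoef(expr)          # gene-by-gene Pearson correlation
    adj = np.abs(corr) ** beta        # soft threshold: weak correlations shrink towards 0
    np.fill_diagonal(adj, 0.0)        # no self-connections
    return adj


def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij = sum_u a_iu * a_uj."""
    shared = adj @ adj                # l_ij: weighted number of shared neighbours
    k = adj.sum(axis=1)               # connectivity of each gene
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom


# Toy data (hypothetical): genes 0-2 share one expression pattern, genes 3-5 are noise.
rng = np.random.default_rng(0)
base = rng.normal(size=12)
expr = np.vstack([base + 0.1 * rng.normal(size=12) for _ in range(3)]
                 + [rng.normal(size=12) for _ in range(3)])
adj = soft_threshold_adjacency(expr)
tom = topological_overlap(adj)
hub = int(adj[:3, :3].sum(axis=1).argmax())   # most connected gene within the module
```

    Soft thresholding keeps every edge but lets weak correlations fade, which is why the co-expressed genes end up with far higher topological overlap than unrelated pairs.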

  6. Identification and functional clustering of genes regulating muscle protein degradation from amongst the known C. elegans muscle mutants.

    Directory of Open Access Journals (Sweden)

    Freya Shephard

    Full Text Available Loss of muscle mass via protein degradation is an important clinical problem, but we know little of how muscle protein degradation is regulated genetically. To gain insight, our labs developed C. elegans into a model for understanding the regulation of muscle protein degradation. Past studies uncovered novel functional roles for genes affecting muscle and/or involved in signalling in other cells or tissues. Here we examine most of the genes previously identified as sites of mutations affecting muscle for novel roles in regulating degradation. We evaluate genomic (RNAi knockdown) approaches and combine them with our established genetic (mutant) and pharmacologic (drug) approaches to examine these 159 genes. We find that RNAi usually recapitulates both organismal and sub-cellular mutant phenotypes, but RNAi, unlike mutants, can frequently be used acutely to study gene function solely in differentiated muscle. In the majority of cases where RNAi does not produce organismal-level phenotypes, sub-cellular defects can be detected; disrupted proteostasis is most commonly observed. We identify 48 genes in which mutation or RNAi knockdown causes excessive protein degradation; myofibrillar and/or mitochondrial morphologies are also disrupted in 19 of these 48 cases. These 48 genes appear to act via at least three sub-networks to control bulk degradation of protein in muscle cytosol. Attachment to the extracellular matrix regulates degradation via unidentified proteases and affects myofibrillar and mitochondrial morphology. Growth factor imbalance and calcium overload promote lysosome-based degradation, whereas calcium deficit promotes proteasome-based degradation; in both cases, myofibrillar and mitochondrial morphologies are largely unaffected. Our results provide a framework for effectively using RNAi to identify and functionally cluster novel regulators of degradation. This clustering allows prioritization of candidate genes/pathways for future

  7. Ancestral Variations of the PCDHG Gene Cluster Predispose to Dyslexia in a Multiplex Family

    Directory of Open Access Journals (Sweden)

    Teesta Naskar

    2018-02-01

    Full Text Available Dyslexia is a heritable neurodevelopmental disorder characterized by difficulties in reading and writing. In this study, we describe the identification of a set of 17 polymorphisms located across a 1.9 Mb region on chromosome 5q31.3, encompassing genes of the PCDHG cluster, TAF7, PCDH1 and ARHGAP26, that are dominantly inherited with dyslexia in a multi-incident family. Strikingly, the non-risk forms of seven variations in the PCDHG cluster are preponderant in the human lineage, while the risk alleles are ancestral and conserved in Neanderthals and non-human primates. Four of these seven ancestral variations (c.460A>C [p.Ile154Leu], c.541G>A [p.Ala181Thr], c.2036G>C [p.Arg679Pro] and c.2059A>G [p.Lys687Glu]) result in amino acid alterations. p.Ile154Leu and p.Ala181Thr are located at the EC2:EC3 interacting interface of γA3-PCDH and γA4-PCDH, respectively, and might affect trans-homophilic interaction and hence neuronal connectivity. p.Arg679Pro and p.Lys687Glu are located within the linker region connecting the transmembrane and extracellular domains. Sequence analysis indicated the importance of p.Ile154, p.Arg679 and p.Lys687 in maintaining class specificity. Thus, the observed association of PCDHG genes encoding neural adhesion proteins reinforces the hypothesis of aberrant neuronal connectivity in the pathophysiology of dyslexia. Additionally, the striking conservation of the identified variants indicates a role of PCDHG in the evolution of highly specialized cognitive skills critical to reading.

  8. Ancestral Variations of the PCDHG Gene Cluster Predispose to Dyslexia in a Multiplex Family.

    Science.gov (United States)

    Naskar, Teesta; Faruq, Mohammed; Banerjee, Priyajit; Khan, Massarat; Midha, Rashi; Kumari, Renu; Devasenapathy, Subhashree; Prajapati, Bharat; Sengupta, Sanghamitra; Jain, Deepti; Mukerji, Mitali; Singh, Nandini Chatterjee; Sinha, Subrata

    2018-02-01

    Dyslexia is a heritable neurodevelopmental disorder characterized by difficulties in reading and writing. In this study, we describe the identification of a set of 17 polymorphisms located across a 1.9 Mb region on chromosome 5q31.3, encompassing genes of the PCDHG cluster, TAF7, PCDH1 and ARHGAP26, that are dominantly inherited with dyslexia in a multi-incident family. Strikingly, the non-risk forms of seven variations in the PCDHG cluster are preponderant in the human lineage, while the risk alleles are ancestral and conserved in Neanderthals and non-human primates. Four of these seven ancestral variations (c.460A>C [p.Ile154Leu], c.541G>A [p.Ala181Thr], c.2036G>C [p.Arg679Pro] and c.2059A>G [p.Lys687Glu]) result in amino acid alterations. p.Ile154Leu and p.Ala181Thr are located at the EC2:EC3 interacting interface of γA3-PCDH and γA4-PCDH, respectively, and might affect trans-homophilic interaction and hence neuronal connectivity. p.Arg679Pro and p.Lys687Glu are located within the linker region connecting the transmembrane and extracellular domains. Sequence analysis indicated the importance of p.Ile154, p.Arg679 and p.Lys687 in maintaining class specificity. Thus, the observed association of PCDHG genes encoding neural adhesion proteins reinforces the hypothesis of aberrant neuronal connectivity in the pathophysiology of dyslexia. Additionally, the striking conservation of the identified variants indicates a role of PCDHG in the evolution of highly specialized cognitive skills critical to reading. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Hessian regularization based non-negative matrix factorization for gene expression data clustering.

    Science.gov (United States)

    Liu, Xiao; Shi, Jun; Wang, Congzhi

    2015-01-01

    Since a key step in the analysis of gene expression data is to detect groups of genes that have similar expression patterns, clustering techniques are commonly used to analyze gene expression data. Data representation plays an important role in clustering analysis. Non-negative matrix factorization (NMF) is a widely used data representation method that has been applied with great success in machine learning. Although the traditional manifold regularization method, Laplacian regularization (LR), can improve the performance of NMF, LR suffers from weak extrapolating power. Hessian regularization (HR) is a newly developed manifold regularization method whose properties give it stronger extrapolating power, especially for small-sample data. In this work, we propose the HR-based NMF (HR-NMF) algorithm and apply it to represent gene expression data for subsequent clustering. The clustering experiments are conducted on five commonly used gene datasets, and the results indicate that the proposed HR-NMF outperforms LR-based NMF and the original NMF, which suggests the potential of HR-NMF for gene expression data analysis.
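    For orientation, the sketch below shows only the plain NMF baseline that HR-NMF extends, using the well-known Lee-Seung multiplicative updates for the Frobenius loss; the Hessian (or Laplacian) regularization term is deliberately left out. Assigning each gene to the factor with its largest coefficient is one common convention and is assumed here, as are the function names and toy data.

```python
import numpy as np


def nmf(X, k, n_iter=1000, eps=1e-10, seed=0):
    """Plain NMF, X ~ W @ H, using Lee-Seung multiplicative updates for the Frobenius loss.

    X has shape (conditions, genes); column j of H holds gene j's k-dimensional representation.
    No Laplacian or Hessian regularization term is included.
    """
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # eps guards against division by zero
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors


def cluster_genes(H):
    """Assign each gene (column) to the factor with the largest coefficient."""
    return np.argmax(H, axis=0)


# Hypothetical data: 6 conditions x 8 genes; genes 0-3 peak early, genes 4-7 peak late.
early = np.array([5.0, 5.0, 5.0, 0.1, 0.1, 0.1])
late = early[::-1]
X = np.column_stack([early * s for s in (1.0, 1.2, 0.8, 1.1)]
                    + [late * s for s in (1.0, 0.9, 1.3, 1.05)])
W, H, errors = nmf(X, k=2)
labels = cluster_genes(H)
```

    A regularized variant would add a graph-based penalty on H to the same loss; the update rules above are the starting point such methods modify.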

  10. Oil palm (Elaeis guineensis Jacq.) tissue culture ESTs: identifying genes associated with callogenesis and embryogenesis.

    Science.gov (United States)

    Low, Eng-Ti L; Alias, Halimah; Boon, Soo-Heong; Shariff, Elyana M; Tan, Chi-Yee A; Ooi, Leslie Cl; Cheah, Suan-Choo; Raha, Abdul-Rahim; Wan, Kiew-Lian; Singh, Rajinder

    2008-05-29

    Oil palm (Elaeis guineensis Jacq.) is one of the most important oil bearing crops in the world. However, genetic improvement of oil palm through conventional breeding is extremely slow and costly, as the breeding cycle can take up to 10 years. This has brought about interest in vegetative propagation of oil palm. Since the introduction of oil palm tissue culture in the 1970s, clonal propagation has proven to be useful, not only in producing uniform planting materials, but also in the development of the genetic engineering programme. Despite considerable progress in improving the tissue culture techniques, the callusing and embryogenesis rates from proliferating callus cultures remain very low. Thus, understanding the gene diversity and expression profiles in oil palm tissue culture is critical in increasing the efficiency of these processes. A total of 12 standard cDNA libraries, representing three main developmental stages in oil palm tissue culture, were generated in this study. Random sequencing of clones from these cDNA libraries generated 17,599 expressed sequence tags (ESTs). The ESTs were analysed, annotated and assembled to generate 9,584 putative unigenes distributed in 3,268 consensi and 6,316 singletons. These unigenes were assigned putative functions based on similarity and gene ontology annotations. Cluster analysis, which surveyed the relatedness of each library based on the abundance of ESTs in each consensus, revealed that lipid transfer proteins were highly expressed in embryogenic tissues. A glutathione S-transferase was found to be highly expressed in non-embryogenic callus. Further analysis of the unigenes identified 648 non-redundant simple sequence repeats and 211 putative full-length open reading frames. This study has provided an overview of genes expressed during oil palm tissue culture. Candidate genes whose expression is modulated during tissue culture were identified. 
However, in order to confirm whether these genes are suitable as

  11. Oil palm (Elaeis guineensis Jacq.) tissue culture ESTs: Identifying genes associated with callogenesis and embryogenesis

    Directory of Open Access Journals (Sweden)

    Ooi Leslie CL

    2008-05-01

    Full Text Available Abstract Background Oil palm (Elaeis guineensis Jacq.) is one of the most important oil bearing crops in the world. However, genetic improvement of oil palm through conventional breeding is extremely slow and costly, as the breeding cycle can take up to 10 years. This has brought about interest in vegetative propagation of oil palm. Since the introduction of oil palm tissue culture in the 1970s, clonal propagation has proven to be useful, not only in producing uniform planting materials, but also in the development of the genetic engineering programme. Despite considerable progress in improving the tissue culture techniques, the callusing and embryogenesis rates from proliferating callus cultures remain very low. Thus, understanding the gene diversity and expression profiles in oil palm tissue culture is critical in increasing the efficiency of these processes. Results A total of 12 standard cDNA libraries, representing three main developmental stages in oil palm tissue culture, were generated in this study. Random sequencing of clones from these cDNA libraries generated 17,599 expressed sequence tags (ESTs). The ESTs were analysed, annotated and assembled to generate 9,584 putative unigenes distributed in 3,268 consensi and 6,316 singletons. These unigenes were assigned putative functions based on similarity and gene ontology annotations. Cluster analysis, which surveyed the relatedness of each library based on the abundance of ESTs in each consensus, revealed that lipid transfer proteins were highly expressed in embryogenic tissues. A glutathione S-transferase was found to be highly expressed in non-embryogenic callus. Further analysis of the unigenes identified 648 non-redundant simple sequence repeats and 211 putative full-length open reading frames. Conclusion This study has provided an overview of genes expressed during oil palm tissue culture. Candidate genes whose expression is modulated during tissue culture were identified. 
However

  12. Identifying multiple outliers in linear regression: robust fit and clustering approach

    International Nuclear Information System (INIS)

    Robiah Adnan; Mohd Nor Mohamad; Halim Setan

    2001-01-01

    This research provides a clustering-based approach for determining potential candidates for outliers. It is a modification of the method proposed by Serbert et al. (1988). It is based on using the single linkage clustering algorithm to group the standardized predicted and residual values of a data set fitted by least trimmed squares (LTS). (Author)
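    The two-stage idea (robust fit, then single linkage on standardized fitted values and residuals) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: LTS is approximated with random elemental starts refined by concentration steps, the coverage h = (n + p + 1) // 2, the median/MAD standardization, the fixed cutoff of 3.0 and all function names are assumptions, and the rule the original method uses to cut the tree is not reproduced.

```python
import numpy as np


def lts_fit(X, y, h=None, n_starts=200, n_csteps=20, seed=0):
    """Approximate least trimmed squares (LTS): minimise the sum of the h smallest
    squared residuals, using random elemental starts refined by concentration steps."""
    n, p = X.shape
    h = h or (n + p + 1) // 2
    rng = np.random.default_rng(seed)
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)
        beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        for _ in range(n_csteps):
            keep = np.argsort((y - X @ beta) ** 2)[:h]   # h best-fitting observations
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta


def robust_z(v):
    """Standardise with median and MAD (an assumed choice of scale)."""
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))
    return (v - med) / (mad if mad > 0 else 1.0)


def single_linkage_groups(points, cutoff):
    """Cut a single-linkage tree at height `cutoff`; this equals the connected
    components of the graph joining points at distance <= cutoff (union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= cutoff:
                parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])


def flag_outliers(X, y, cutoff=3.0):
    """Flag observations that fall outside the largest single-linkage group."""
    beta = lts_fit(X, y)
    fitted = X @ beta
    features = np.column_stack([robust_z(fitted), robust_z(y - fitted)])
    groups = single_linkage_groups(features, cutoff)
    main = np.bincount(groups).argmax()
    return groups != main


# Hypothetical data: a straight line with small alternating noise and three clustered outliers.
x = np.arange(20.0)
y = 2.0 + 0.5 * x + 0.05 * (-1.0) ** np.arange(20)
y[[17, 18, 19]] += 15.0
X = np.column_stack([np.ones_like(x), x])
flags = flag_outliers(X, y)
```

    Because the outliers sit together, a single-case deletion diagnostic from an ordinary least-squares fit could mask them; the trimmed fit leaves their residuals large, so they form their own group.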

  13. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    Science.gov (United States)

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering of social learning networks has still not been widely explored, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  14. Identifying subtypes among offenders with antisocial personality disorder: a cluster-analytic study.

    Science.gov (United States)

    Poythress, Norman G; Edens, John F; Skeem, Jennifer L; Lilienfeld, Scott O; Douglas, Kevin S; Frick, Paul J; Patrick, Christopher J; Epstein, Monica; Wang, Tao

    2010-05-01

    The question of whether antisocial personality disorder (ASPD) and psychopathy are largely similar or fundamentally different constructs remains unresolved. In the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV; American Psychiatric Association, 1994), many of the personality features of psychopathy are cast as associated features of ASPD, although the DSM-IV offers no guidance as to how, or the extent to which, these features relate to ASPD. In a sample of 691 offenders who met DSM-IV criteria for ASPD, we used model-based clustering to identify subgroups of individuals with relatively homogeneous profiles on measures of associated features (psychopathic personality traits) and other constructs with potential etiological significance for subtypes of ASPD. Two emergent groups displayed profiles that conformed broadly to theoretical descriptions of primary psychopathy and Karpman's (1941) variant of secondary psychopathy. As expected, a third group (nonpsychopathic ASPD) lacked substantial associated features. A fourth group exhibited elevated psychopathic features as well as a highly fearful temperament, a profile not clearly predicted by extant models. Planned comparisons revealed theoretically informative differences between primary and secondary groups in multiple domains, including self-report measures, passive avoidance learning, clinical ratings, and official records. Our results inform ongoing debates about the overlap between psychopathy and ASPD and raise questions about the wisdom of placing most individuals who habitually violate social norms and laws into a single diagnostic category.
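    Model-based clustering generally means fitting a finite Gaussian mixture by expectation-maximisation (EM) and choosing the number of groups with a criterion such as BIC. The sketch below illustrates that general idea only; the diagonal covariances, the farthest-point initialisation, the function names and the two-dimensional toy data (standing in for trait measures) are assumptions, not details of the study's analysis.

```python
import numpy as np


def gmm_em(X, k, n_iter=200, reg=1e-6, seed=0):
    """Fit a k-component Gaussian mixture with diagonal covariances by EM.

    Returns mixing weights, means, variances, hard cluster labels and the
    log-likelihood evaluated at the final E-step.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation of the means (one random start, then greedy spread).
    idx = [int(rng.integers(n))]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d2.argmax()))
    means = X[idx].astype(float)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log pi_j + log N(x_i | mu_j, diag(var_j)) for every point/component pair.
        log_p = (np.log(weights)[None, :]
                 - 0.5 * np.log(2 * np.pi * variances).sum(-1)[None, :]
                 - 0.5 * (((X[:, None, :] - means[None]) ** 2) / variances[None]).sum(-1))
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)               # posterior membership probabilities
        # M-step: re-estimate weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances, resp.argmax(axis=1), float(log_norm.sum())


def bic(loglik, n, k, d):
    """BIC = -2 log L + (#params) log n for the diagonal model (smaller is better)."""
    n_params = (k - 1) + 2 * k * d
    return -2.0 * loglik + n_params * np.log(n)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(8.0, 1.0, size=(50, 2))])
fits = {k: gmm_em(X, k) for k in (1, 2, 3)}
scores = {k: bic(fits[k][4], len(X), k, X.shape[1]) for k in fits}
labels = fits[2][3]
```

    Comparing BIC across candidate numbers of components is what lets such an analysis report a specific number of subgroups rather than fixing it in advance.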

  15. Quantile regression and Bayesian cluster detection to identify radon prone areas.

    Science.gov (United States)

    Sarra, Annalina; Fontanella, Lara; Valentini, Pasquale; Palermi, Sergio

    2016-11-01

    Although the dominant source of radon in indoor environments is the geology of the territory, many studies have demonstrated that indoor radon concentrations also depend on dwelling-specific characteristics. Following a stepwise analysis, in this study we propose a combined approach to delineate radon prone areas. We first investigate the impact of various building covariates on indoor radon concentrations. To achieve a more complete picture of this association, we exploit the flexible formulation of a Bayesian spatial quantile regression, which is also equipped with parameters that control the spatial dependence across data. The quantitative knowledge of the influence of each significant building-specific factor on the measured radon levels is employed to predict the radon concentrations that would have been found if the sampled buildings had possessed standard characteristics. These normalised radon measures should reflect the geogenic radon potential of the underlying ground, which is a quantity directly related to the geological environment. The second stage of the analysis is aimed at identifying radon prone areas; to this end, we adopt a Bayesian model for spatial cluster detection, using the building with standard characteristics as the reference unit. The case study is based on a data set of more than 2000 indoor radon measurements, available for the Abruzzo region (Central Italy) and collected by the Agency of Environmental Protection of Abruzzo during several indoor radon monitoring surveys. Copyright © 2016 Elsevier Ltd. All rights reserved.
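    To make the normalisation step concrete, the sketch below replaces the study's Bayesian spatial quantile regression with an ordinary (frequentist, non-spatial) linear quantile regression, solved as a linear programme with SciPy's `linprog`. Modelling log concentrations, using the floor of the dwelling as the only covariate, treating the ground floor as the "standard" building, and the toy values are all assumptions for illustration; the spatial dependence parameters and the cluster-detection stage are not shown.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau=0.5):
    """Linear quantile regression: minimise sum_i rho_tau(y_i - x_i @ beta).

    Written as a linear programme by splitting each residual into u_plus - u_minus,
    both non-negative, with cost tau * u_plus + (1 - tau) * u_minus.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def normalise_to_standard(X, y, beta, x_standard):
    """Predict each response as if the building had the standard covariates
    `x_standard`; column 0 of X is assumed to be the intercept."""
    return y - (X[:, 1:] - x_standard) @ beta[1:]


# Hypothetical log-radon values that depend only on the floor of the dwelling.
floor = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 3.0])
log_radon = 4.0 - 0.3 * floor
X = np.column_stack([np.ones_like(floor), floor])
beta = quantile_regression(X, log_radon, tau=0.5)
normalised = normalise_to_standard(X, log_radon, beta, x_standard=np.array([0.0]))
```

    Fitting several values of `tau` rather than only the median is what lets a quantile model describe how a building factor shifts the upper tail of the radon distribution, not just its centre.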

  16. Clustering Time-Series Gene Expression Data Using Smoothing Spline Derivatives

    Directory of Open Access Journals (Sweden)

    Martin PGP

    2007-01-01

    Full Text Available Microarray data acquired during time-course experiments allow the temporal variations in gene expression to be monitored. An original postprandial fasting experiment was conducted in the mouse and the expression of 200 genes was monitored with a dedicated macroarray at 11 time points between 0 and 72 hours of fasting. The aim of this study was to provide a relevant clustering of gene expression temporal profiles. This was achieved by focusing on the shapes of the curves rather than on the absolute level of expression. Specifically, we combined spline smoothing and first derivative computation with hierarchical and partitioning clustering. A heuristic approach was proposed to tune the spline smoothing parameter using both statistical and biological considerations. Clusters are illustrated a posteriori through principal component analysis and heatmap visualization. Most results were found to be in agreement with the literature on the effects of fasting on the mouse liver and provide promising directions for future biological investigations.
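    The shape-based strategy described above (smooth each profile, differentiate, then cluster the derivatives) can be sketched with SciPy's `UnivariateSpline` and hierarchical clustering. The evenly spaced time grid, the fixed smoothing value (the study tuned it heuristically), Ward linkage, the function names and the synthetic profiles are assumptions; the partitioning variant and the PCA/heatmap displays are omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.interpolate import UnivariateSpline


def derivative_profiles(times, expr, smoothing, n_grid=50):
    """Fit a cubic smoothing spline to each gene's profile and evaluate its first
    derivative on a common grid, so that clustering sees shape rather than level."""
    grid = np.linspace(times.min(), times.max(), n_grid)
    rows = []
    for profile in expr:
        spline = UnivariateSpline(times, profile, k=3, s=smoothing)
        rows.append(spline.derivative()(grid))
    return np.array(rows)


def cluster_by_shape(times, expr, n_clusters, smoothing):
    """Hierarchical (Ward) clustering of the derivative curves."""
    tree = linkage(derivative_profiles(times, expr, smoothing), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical data: 11 evenly spaced time points over 72 h; three rising and three
# falling genes whose absolute levels overlap across the two groups.
times = np.linspace(0.0, 72.0, 11)
trend = 3.0 * (1.0 - np.exp(-times / 24.0))
expr = np.array([level + trend for level in (2.0, 10.0, 20.0)]
                + [level - trend for level in (3.0, 11.0, 21.0)])
labels = cluster_by_shape(times, expr, n_clusters=2, smoothing=0.1)
```

    Clustering the raw profiles here would pair genes by expression level (for example 10.0 with 11.0); the derivatives remove the level and separate rising from falling genes.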

  17. Clustering Time-Series Gene Expression Data Using Smoothing Spline Derivatives

    Directory of Open Access Journals (Sweden)

    S. Déjean

    2007-06-01

    Full Text Available Microarray data acquired during time-course experiments allow the temporal variations in gene expression to be monitored. An original postprandial fasting experiment was conducted in the mouse and the expression of 200 genes was monitored with a dedicated macroarray at 11 time points between 0 and 72 hours of fasting. The aim of this study was to provide a relevant clustering of gene expression temporal profiles. This was achieved by focusing on the shapes of the curves rather than on the absolute level of expression. Specifically, we combined spline smoothing and first derivative computation with hierarchical and partitioning clustering. A heuristic approach was proposed to tune the spline smoothing parameter using both statistical and biological considerations. Clusters are illustrated a posteriori through principal component analysis and heatmap visualization. Most results were found to be in agreement with the literature on the effects of fasting on the mouse liver and provide promising directions for future biological investigations.

  18. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of form knowledge for pan-ethnic-group products depends primarily on designers' subjective experience, without user participation. Most studies focus primarily on detecting consumers' perceptual demands for the target product category. A form gene clustering method for pan-ethnic-group products based on emotional semantics is constructed. Consumers' perceptual images of pan-ethnic-group products are obtained through product form gene extraction and coding and computer-aided product form clustering technology. A case study of form gene clustering for typical pan-ethnic-group products indicates that the method is feasible. This paper opens up a new direction for the future development of product form design, one that improves the agility of the product design process in the era of Industry 4.0.

  19. Evaluation of gene-expression clustering via mutual information distance measure

    Directory of Open Access Journals (Sweden)

    Maimon Oded

    2007-03-01

    Full Text Available Abstract Background The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare different clustering solutions when using the Mutual Information (MI) measure versus the well-known Euclidean distance and Pearson correlation coefficient. Results Relying on several public gene expression datasets, we evaluate the homogeneity and separation scores of different clustering solutions. We found that the MI measure yields a more significant differentiation among erroneous clustering solutions. The proposed measure was also used to analyze the performance of several known clustering algorithms. A comparative study of these algorithms reveals that their "best solutions" are ranked almost oppositely when using different distance measures, despite the correspondence found between these measures when analysing the averaged scores of groups of solutions. Conclusion In view of the results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data.
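    To show why the choice of measure matters, the sketch below estimates mutual information between two profiles with a simple equal-width 2-D histogram (a plug-in estimator). The binning scheme, the bin count and the function name are assumptions, since the study's exact MI estimator is not given here; the example pairs a profile with its square, a dependence that Pearson correlation almost misses but MI detects.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y), in nats, from a 2-D histogram of two profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, bins)
    nz = pxy > 0                            # 0 * log 0 is taken as 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
quadratic = x ** 2                           # dependent on x, but almost uncorrelated
noise = rng.uniform(-1.0, 1.0, size=200)     # independent of x
pearson = np.corrcoef(x, quadratic)[0, 1]
mi_dependent = mutual_information(x, quadratic)
mi_independent = mutual_information(x, noise)
```

    Histogram estimates are biased upwards for small samples (the independent pair does not score exactly zero), so in practice the bin count is usually chosen with the number of conditions in mind.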

  20. CLEAN: CLustering Enrichment ANalysis

    Science.gov (United States)

    Freudenberg, Johannes M; Joshi, Vineet K; Hu, Zhen; Medvedovic, Mario

    2009-01-01

    Background Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, the functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at . The package integrates routines for calculating gene-specific functional coherence scores and the open-source interactive Java-based viewer Functional TreeView (FTreeView). Conclusion Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the conclusions made about clusters of co
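    For comparison with the gene-specific CLEAN score, the traditional cluster-wide analysis is typically a one-sided hypergeometric (Fisher's exact) test of whether a functional category is over-represented in a cluster. The sketch below shows only that baseline test, not the CLEAN score itself; the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(cluster_size, hits_in_cluster, category_size, total_genes):
    """One-sided P(X >= hits_in_cluster), X ~ Hypergeometric(total_genes, category_size,
    cluster_size): chance of seeing at least this many category genes in a random cluster."""
    upper = min(cluster_size, category_size)
    favourable = sum(
        comb(category_size, i) * comb(total_genes - category_size, cluster_size - i)
        for i in range(hits_in_cluster, upper + 1)
    )
    return favourable / comb(total_genes, cluster_size)


# Hypothetical numbers: a 50-gene cluster containing 12 of the 40 genes annotated to
# one category, out of 2000 genes on the array.
p_value = hypergeometric_enrichment_p(50, 12, 40, 2000)
```

    A test like this gives one p-value per cluster and category, so every gene in the cluster inherits the same result; the abstract's point is that CLEAN instead scores each gene separately.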

  1. Identifying Two Groups of Entitled Individuals: Cluster Analysis Reveals Emotional Stability and Self-Esteem Distinction.

    Science.gov (United States)

    Crowe, Michael L; LoPilato, Alexander C; Campbell, W Keith; Miller, Joshua D

    2016-12-01

    The present study hypothesized that there exist two distinct groups of entitled individuals: grandiose-entitled, and vulnerable-entitled. Self-report scores of entitlement were collected for 916 individuals using an online platform. Model-based cluster analyses were conducted on the individuals with scores one standard deviation above the mean (n = 159) using the five-factor model dimensions as clustering variables. The results support the existence of two groups of entitled individuals categorized as emotionally stable and emotionally vulnerable. The emotionally stable cluster reported emotional stability, high self-esteem, more positive affect, and antisocial behavior. The emotionally vulnerable cluster reported low self-esteem and high levels of neuroticism, disinhibition, conventionality, psychopathy, negative affect, childhood abuse, intrusive parenting, and attachment difficulties. Compared to the control group, both clusters reported being more antagonistic, extraverted, Machiavellian, and narcissistic. These results suggest important differences are missed when simply examining the linear relationships between entitlement and various aspects of its nomological network.

  2. The Integrative Method Based on the Module-Network for Identifying Driver Genes in Cancer Subtypes

    Directory of Open Access Journals (Sweden)

    Xinguo Lu

    2018-01-01

    Full Text Available With advances in next-generation sequencing (NGS) technologies, large amounts of high-throughput genomics data of multiple types are available. A great challenge in exploring cancer progression is to identify the driver genes among the variant genes by analyzing and integrating multiple types of genomics data. Breast cancer is known to be a heterogeneous disease. The identification of subtype-specific driver genes is critical to guide the diagnosis, prognostic assessment and treatment of breast cancer. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and used a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were obtained by pairwise comparison between subtypes. To validate the specificity of the driver genes, the gene expression data of these genes were used to classify the patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes, and a classifier using these genes achieved better performance than one using genes identified by other methods.

  3. Ancestral and derived attributes of the dlx gene repertoire, cluster structure and expression patterns in an African cichlid fish

    Directory of Open Access Journals (Sweden)

    Renz Adina J

    2011-01-01

    Full Text Available Abstract Background Cichlid fishes have undergone rapid, expansive evolutionary radiations that are manifested in the diversification of their trophic morphologies, tooth patterning and coloration. Understanding the molecular mechanisms that underlie the cichlids' unique patterns of evolution requires a thorough examination of genes that pattern the neural crest, from which these diverse phenotypes are derived. Among those genes, the homeobox-containing Dlx gene family is of particular interest since it is involved in the patterning of the brain, jaws and teeth. Results In this study, we characterized the dlx genes of an African cichlid fish, Astatotilapia burtoni, to provide a baseline for later cross-species comparison within Cichlidae. We identified seven dlx paralogs (dlx1a, -2a, -4a, -3b, -4b, -5a and -6a), whose orthologies were validated with molecular phylogenetic trees. The intergenic regions of three dlx gene clusters (dlx1a-2a, dlx3b-4b, and dlx5a-6a) were amplified with long PCR. Intensive cross-species comparison revealed a number of conserved non-coding elements (CNEs) that are shared with other percomorph fishes. This analysis highlighted additional lineage-specific gains/losses of CNEs in different teleost fish lineages and a novel CNE that had not previously been identified. Our gene expression analyses revealed overlapping but distinct expression of dlx orthologs in the developing brain and pharyngeal arches. Notably, four of the seven A. burtoni dlx genes, dlx2a, dlx3b, dlx4a and dlx5a, were expressed in the developing pharyngeal teeth. Conclusion This comparative study of the dlx genes of A. burtoni has deepened our knowledge of the diversity of the Dlx gene family, in terms of gene repertoire, expression patterns and non-coding elements. 
We have identified possible cichlid lineage-specific changes, including losses of a subset of dlx expression domains in the pharyngeal teeth, which will be the targets of future functional

  4. Gene microarray data analysis using parallel point-symmetry-based clustering.

    Science.gov (United States)

    Sarkar, Anasua; Maulik, Ujjwal

    2015-01-01

    Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed, time-efficient and scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also achieves linear speedup in timing without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and an existing parallel K-Means over eight artificial and benchmark microarray data sets to demonstrate its superiority in both timing and validity. Statistical analysis is also performed to establish the significance of this message-passing-interface-based point-symmetry K-Means implementation. We also analysed the biological relevance of the clustering solutions.
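    The per-point assignment at the heart of a point-symmetry K-Means can be sketched serially as below. It assumes one published form of the point-symmetry distance: the mean distance from the reflected point 2c - x to its knear = 2 nearest data points, multiplied by the Euclidean distance from x to the centre c. The threshold-based fallback to Euclidean distance used in some variants, and the message-passing distribution of the assignment loop described in the abstract, are omitted; function names and toy data are hypothetical.

```python
import numpy as np


def point_symmetry_distance(x, center, data, knear=2):
    """d_ps(x, c) = d_sym(x, c) * ||x - c||, where d_sym is the mean distance from the
    reflected point 2c - x to its `knear` nearest data points."""
    reflected = 2.0 * center - x
    nearest = np.sort(np.linalg.norm(data - reflected, axis=1))[:knear]
    return nearest.mean() * np.linalg.norm(x - center)


def assign(data, centers, knear=2):
    """Assignment step: each point goes to the centre with the smallest d_ps.
    Points are handled independently, which is the part a parallel version distributes."""
    return np.array([
        int(np.argmin([point_symmetry_distance(x, c, data, knear) for c in centers]))
        for x in data
    ])


def ps_kmeans(data, centers, n_iter=20, knear=2):
    """Serial point-symmetry K-Means: alternate assignment and mean updates."""
    centers = np.asarray(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(data, centers, knear)
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Two symmetric toy clusters around (0, 0) and (10, 0), with deliberately offset start centres.
data = np.array([[1, 0], [-1, 0], [0, 1], [0, -1],
                 [11, 0], [9, 0], [10, 1], [10, -1]], dtype=float)
labels, centers = ps_kmeans(data, [[0.5, 0.5], [9.5, 0.5]])
```

    Each call to `point_symmetry_distance` scans the whole data set, so the assignment step costs O(n^2) per iteration; that cost is what motivates distributing it across processes for large microarray data.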

  5. (Euphausia superba) transcriptome to identify function genes and ...

    Indian Academy of Sciences (India)

    MA

    database, Superba SE, was described by Hunt et al. (2017), and KrillDB was developed (Sales et al., 2017) to give users free access to annotation information. However, the availability of molecular data concerning functional genes, microsatellites, and single nucleotide polymorphisms (SNPs) in E. superba is still ...

  6. (Euphausia superba) transcriptome to identify function genes and ...

    Indian Academy of Sciences (India)

    MA

    Further analysis produced 106,250 unigenes, of which 31,683 were annotated based on protein homology searches against protein databases. Gene Ontology (GO) analysis showed that ion binding, organic substance metabolic process, and cell part were the most abundant terms in molecular function, biological process ...

  7. Transposon-tagging identifies novel pathogenicity genes in Fusarium graminearum

    NARCIS (Netherlands)

    Dufresne, M.; Lee, van der T.A.J.; M'Barek, Ben S.; Xu, X.; Zhang, X.; Kema, G.H.J.; Daboussi, M.J.

    2008-01-01

    With the increase of sequenced fungal genomes, high-throughput methods for functional analyses of genes are needed. We assessed the potential of a new transposon mutagenesis tool deploying a Fusarium oxysporum miniature inverted-repeat transposable element mimp1, mobilized by the transposase of

  8. Allelic polymorphism of 'Makoei' sheep myostatin gene identified by ...

    African Journals Online (AJOL)

    Administrator

    2011-09-05

    Key words: Myostatin gene, polymerase chain reaction (PCR), single strand conformation polymorphism technique (SSCP), Ovis aries. ... Kambadur R, Sharmam M, Smith TPL, Bass JJ (1997). Mutations in myostatin (GDF8) in double-muscled Belgian Blue and Piemontese cattle. Genome Res., 7: ...

  9. Allelic polymorphism of Makoei sheep leptin gene identified by ...

    African Journals Online (AJOL)

    use

    2011-12-05

    ... intake, energy expenditure and whole-body energy balance in animals. In the present study, the polymorphism of the leptin gene (LEP) of Makoei sheep was investigated by polymerase chain reaction and single strand conformation polymorphism technique (PCR–SSCP). Genomic DNA was extracted.

  10. Transcriptional analysis of the jamaicamide gene cluster from the marine cyanobacterium Lyngbya majuscula and identification of possible regulatory proteins

    Directory of Open Access Journals (Sweden)

    Dorrestein Pieter C

    2009-12-01

    Abstract Background The marine cyanobacterium Lyngbya majuscula is a prolific producer of bioactive secondary metabolites. Although biosynthetic gene clusters encoding several of these compounds have been identified, little is known about how these clusters of genes are transcribed or regulated, and techniques targeting genetic manipulation in Lyngbya strains have not yet been developed. We conducted transcriptional analyses of the jamaicamide gene cluster from a Jamaican strain of Lyngbya majuscula, and isolated proteins that could be involved in jamaicamide regulation. Results An unusually long untranslated leader region of approximately 840 bp is located between the jamaicamide transcription start site (TSS) and the gene cluster start codon. All of the intergenic regions between the pathway ORFs were transcribed into RNA in RT-PCR experiments; however, a promoter prediction program indicated the possible presence of promoters in multiple intergenic regions. Because the functionality of these promoters could not be verified in vivo, we used a reporter gene assay in E. coli to show that several of these intergenic regions, as well as the primary promoter preceding the TSS, are capable of driving β-galactosidase production. A protein pulldown assay was also used to isolate proteins that may regulate the jamaicamide pathway. Pulldown experiments using the intergenic region upstream of jamA as a DNA probe isolated two proteins that were identified by LC-MS/MS. By BLAST analysis, one of these had close sequence identity to a regulatory protein in another cyanobacterial species. Protein comparisons suggest a possible correlation between secondary metabolism regulation and light-dependent complementary chromatic adaptation. Electromobility shift assays were used to evaluate binding of the recombinant proteins to the jamaicamide promoter region. 
Conclusion Insights into natural product regulation in cyanobacteria are of significant value to drug discovery

  11. Functional clustering and lineage markers: insights into cellular differentiation and gene function from large-scale microarray studies of purified primary cell populations.

    Science.gov (United States)

    Hume, David A; Summers, Kim M; Raza, Sobia; Baillie, J Kenneth; Freeman, Thomas C

    2010-06-01

    Very large microarray datasets showing gene expression across multiple tissues and cell populations provide a window on the transcriptional networks that underpin the differences in functional activity between biological systems. Clusters of co-expressed genes provide lineage markers, candidate regulators of cell function and, by applying the principle of guilt by association, candidate functions for genes of currently unknown function. We have analysed a dataset comprising pure cell populations from hemopoietic and non-hemopoietic cell types (http://biogps.gnf.org). Using a novel network visualisation and clustering approach, we demonstrate that it is possible to identify very tight expression signatures associated specifically with embryonic stem cells, mesenchymal cells and hematopoietic lineages. Selected examples validate the prediction that gene function can be inferred by co-expression. One expression cluster was enriched in phagocytes, which, alongside endosome-lysosome constituents, contains genes that may make up a 'pathway' for phagocyte differentiation. Promoters of these genes are enriched for binding sites for the ETS/PU.1 and MITF families. Another cluster was associated with the production of a specific extracellular matrix, with high levels of gene expression shared by cells of mesenchymal origin (fibroblasts, adipocytes, osteoblasts and myoblasts). We discuss the limitations placed upon such data by the presence of alternative promoters with distinct tissue specificity within many protein-coding genes. Copyright 2010 Elsevier Inc. All rights reserved.

  12. Identification of a conserved cluster of skin-specific genes encoding secreted proteins.

    Science.gov (United States)

    Moffatt, Pierre; Salois, Patrick; St-Amant, Natalie; Gaumond, Marie-Hélène; Lanctôt, Christian

    2004-06-09

    Terminal differentiation of keratinocytes results in the formation of a cornified layer composed of cross-linked intracellular and extracellular material. Using a signal trap expression screening strategy, we have identified four cDNAs encoding secreted proteins potentially involved in this process. One of the cDNAs is identical to the short isoform of suprabasin, a recently described epidermis-specific protein, which is shown here to contain a functional secretory signal. The second cDNA, sk89, encodes a protein of 493 amino acids, rich in glycine and serine residues. The third cDNA encodes a C-terminal fragment of SK89 (amino acids 410-493). It comprises exons 13 to 18 of the sk89 locus but transcription starts at an isoform-specific exon encoding a distinct secretory signal. The fourth cDNA encodes keratinocyte differentiation-associated protein (KDAP), a precursor protein of 102 amino acids. Subcellular localization by immunofluorescence and detection of the tagged proteins by Western blotting confirmed that the four proteins are secreted. Northern analysis and in situ hybridization revealed that expression of the corresponding genes was restricted to the suprabasal keratinocytes of the epidermis. These genes encoding epidermis-specific secreted products are found in a conserved cluster on human chromosome 19q13.12 and on mouse chromosome 7A3.

  13. IMA: Identifying disease-related genes using MeSH terms and association rules.

    Science.gov (United States)

    Kim, Jeongwoo; Bang, Changbae; Hwang, Hyeonseo; Kim, Doyoung; Park, Chihyun; Park, Sanghyun

    2017-12-01

    Genes play an important role in several diseases. Hence, in biology, identifying relationships between diseases and genes is important for the analysis of diseases, because mutated or dysregulated genes play an important role in pathogenesis. Here, we propose a method to identify disease-related genes using MeSH terms and association rules. We identified genes by analyzing the MeSH terms and extracted information on gene-gene interactions based on association rules. By integrating the extracted interactions, we constructed gene-gene networks and identified disease-related genes. We applied the proposed method to study five cancers, including prostate, lung, breast, stomach, and colorectal cancer, and demonstrated that the proposed method is more useful for identifying disease-related and candidate disease-related genes than previously published methods. In this study, we identified 20 genes for each disease. Among them, we presented 34 important candidate genes with evidence that supports the relationship of the candidate genes with diseases. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks

    Directory of Open Access Journals (Sweden)

    Zwinderman Aeilko H

    2009-09-01

    Abstract Background We generalized penalized canonical correlation analysis for analyzing microarray gene-expression measurements for checking completeness of known metabolic pathways and identifying candidate genes for incorporation in the pathway. We used Wold's method for calculation of the canonical variates, and we applied ridge penalization to the regression of pathway genes on canonical variates of the non-pathway genes, and the elastic net to the regression of non-pathway genes on the canonical variates of the pathway genes. Results We performed a small simulation to illustrate the model's capability to identify new candidate genes to incorporate in the pathway: in our simulations it appeared that a gene was correctly identified if the correlation with the pathway genes was 0.3 or more. We applied the methods to a gene-expression microarray data set of 12,209 genes measured in 45 patients with glioblastoma, and we considered genes to incorporate in the glioma pathway: we identified more than 25 genes that correlated > 0.9 with canonical variates of the pathway genes. Conclusion We concluded that penalized canonical correlation analysis is a powerful tool to identify candidate genes in pathway analysis.
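    The central quantity here is the canonical correlation between two blocks of genes. The sketch below is a simpler stand-in, assumed for illustration only: it computes the first canonical correlation with optional ridge terms added to the within-block covariance matrices, rather than the authors' Wold iteration with ridge and elastic-net penalized regressions. The function name ridge_cca and the penalty parameters lam_x and lam_y are hypothetical.

```python
import numpy as np


def _inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def ridge_cca(X, Y, lam_x=0.0, lam_y=0.0):
    """First canonical correlation and weight vectors, with optional ridge penalties."""
    X = np.asarray(X, dtype=float) - np.mean(X, axis=0)   # centre each gene (column)
    Y = np.asarray(Y, dtype=float) - np.mean(Y, axis=0)
    n = X.shape[0]
    cxx = X.T @ X / (n - 1) + lam_x * np.eye(X.shape[1])  # penalized within-block covariances
    cyy = Y.T @ Y / (n - 1) + lam_y * np.eye(Y.shape[1])
    cxy = X.T @ Y / (n - 1)                               # between-block covariance
    wx, wy = _inv_sqrt(cxx), _inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)               # singular values = canonical correlations
    a = wx @ u[:, 0]                                      # weights for the X canonical variate
    b = wy @ vt[0]                                        # weights for the Y canonical variate
    return s[0], a, b
```

    With no penalty and one column of Y equal to a column of X, the first canonical correlation is 1 up to rounding; a positive lam_x or lam_y can only lower the estimate. When genes far outnumber patients (as with 12,209 genes in 45 patients), the unpenalized covariance matrix is singular, so some penalty is required.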

  15. Rearrangements of the beta-globin gene cluster in apparently typical betaS haplotypes.

    Science.gov (United States)

    Zago, M A; Silva, W A; Gualandro, S; Yokomizu, I K; Araujo, A G; Tavela, M H; Gerard, N; Krishnamoorthy, R; Elion, J

    2001-02-01

    The majority of the chromosomes with the betaS gene have one of the five common haplotypes, designated as Benin, Bantu, Senegal, Cameroon, and Arab-Indian haplotypes. However, 5-10% of the chromosomes have less common haplotypes, usually referred to as atypical haplotypes. We have demonstrated that most atypical haplotypes are generated by recombinations. The present study was carried out in order to explore whether recombination also occurs in chromosomes with the common (or typical) haplotypes. We screened the HS-2 region of the beta-globin gene locus control region (LCR) in 244 sickle cell patients who had typical restriction fragment length polymorphism (RFLP)-defined haplotypes of the betaS-gene cluster. For 14 cases in which the expected and the observed LCR repeat-sequence sizes were discrepant, the analysis was extended to other unexplored polymorphic markers of the betaS-globin gene cluster, i.e.: pre-Ggamma framework, pre-Ggamma 6-bp deletion, HS-2 LCR (AT)xR(AT)y and pre-beta(AT)xTy repeats, and the intragenic beta-globin gene framework. In all 14 cases (15 chromosomes) in which the LCR repeat-sequence sizes were discrepant, a recombination involving a typical 3' segment of the betaS globin gene cluster was demonstrated. In most of the cases, the recombination site was located between the beta-globin gene and the betaLCR. Nine cases involving recombination were detected among 156 Brazilian HbS homozygotes and five among 88 African patients homozygotes for the Benin haplotype. INTERPRETATION AND CONCLUSIONS. Thus, 3.1% of apparently typical haplotypes linked to the sickle cell gene involve recombinations similar to those that generate the atypical haplotypes, a finding that reinforces the picture of the beta-globin gene cluster as highly dynamic.

  16. Tissue-wide expression profiling using cDNA subtraction and microarrays to identify tumor-specific genes.

    Science.gov (United States)

    Amatschek, Stefan; Koenig, Ulrich; Auer, Herbert; Steinlein, Peter; Pacher, Margit; Gruenfelder, Agnes; Dekan, Gerhard; Vogl, Sonja; Kubista, Ernst; Heider, Karl-Heinz; Stratowa, Christian; Schreiber, Martin; Sommergruber, Wolfgang

    2004-02-01

    With the objective of discovering novel putative intervention sites for anticancer therapy, we compared transcriptional profiles of breast cancer, lung squamous cell cancer (LSCC), lung adenocarcinoma (LAC), and renal cell cancer (RCC). Medical treatment for each of these tumor types still needs improvement. Our intention was to search for genes that are not only highly expressed in the majority of patient samples but also show very low or no expression in a comprehensive panel of 16 critical (vital) normal tissues. To achieve this goal, we combined two powerful technologies, PCR-based cDNA subtraction and cDNA microarrays. Seven subtractive libraries consisting of approximately 9250 clones were established and enriched for tumor-specific transcripts. These clones, together with approximately 1750 additional tumor-relevant genes, were used for cDNA microarray preparation. Hybridizations were performed using a pool of 16 critical normal tissues as a reference in all experiments. In total, we analyzed 20 samples of breast cancer, 11 of LSCC, 11 of LAC, and 8 of RCC. To select for genes with low or no expression in normal tissues, expression profiles of 22 different normal tissues were additionally analyzed. Importantly, this tissue-wide expression profiling allowed us to eliminate genes that also exhibit high expression in normal tissues. Similarly, expression signatures of genes derived from infiltrating cells of the immune system were eliminated as well. Cluster analysis resulted in the identification of 527 expressed sequence tags specifically up-regulated in these tumors. Gene-wise hierarchical clustering of these clones clearly separated the different tumor types, with RCC exhibiting the most homogeneous and LAC the most diverse expression profile. In addition to already known tumor-associated genes, the majority of identified genes have not yet been brought into context with tumorigenesis such as genes involved in bone matrix

  17. A critical evaluation of the use of cluster analysis to identify contaminated sediments in the Ria de Vigo

    Energy Technology Data Exchange (ETDEWEB)

    Rubio, B; Nombela, M. A; Vilas, F [Departamento de Geociencias Marinas y Ordenacion del Territorio, Vigo, Espana (Spain)]

    2001-06-01

    The indiscriminate use of cluster analysis to distinguish contaminated and non-contaminated sediments has led us to make a comparative evaluation of different cluster analysis procedures as applied to heavy metal concentrations in subtidal sediments from the Ria de Vigo, NW Spain. Applying different clustering algorithms and different transformations to the same initial data set leads to the formation of different clusters, with an inconclusive result regarding the contamination status of the sediments. The results show that this approach is better suited to identifying groups of samples that differ in sedimentological characteristics, such as grain size, than in the degree of contamination. Our main aim is to call attention to these aspects of cluster analysis and to suggest that researchers should be rigorous with this kind of analysis. Finally, the use of discriminant analysis allows us to find a discriminant function that separates the samples into two clearly differentiated groups, which should not be treated jointly.

  18. Identifying the heterogeneity of young adult rhinitis through cluster analysis in the Isle of Wight birth cohort.

    Science.gov (United States)

    Kurukulaaratchy, Ramesh J; Zhang, Hongmei; Patil, Veeresh; Raza, Abid; Karmaus, Wilfried; Ewart, Susan; Arshad, S Hasan

    2015-01-01

    Rhinitis affects many young adults and often shows comorbidity with asthma. We hypothesized that young adult rhinitis, like asthma, exhibits clinical heterogeneity identifiable by means of cluster analysis. Participants in the Isle of Wight birth cohort (n = 1456) were assessed at 1, 2, 4, 10, and 18 years of age. Cluster analysis was performed on those with rhinitis at age 18 years (n = 468) by using 13 variables defining clinical characteristics. Four clusters were identified. Patients in cluster 1 (n = 128 [27.4%]; ie, moderate childhood-onset rhinitis) had high atopy and eczema prevalence and high total IgE levels but low asthma prevalence. They showed the best lung function at 18 years of age, with normal fraction of exhaled nitric oxide (Feno), low bronchial hyperresponsiveness (BHR), and low bronchodilator reversibility (BDR) but high rhinitis symptoms and treatment. Patients in cluster 2 (n = 199 [42.5%]; ie, mild adolescence-onset female rhinitis) had the lowest prevalence of comorbid atopy, asthma, and eczema. They had normal lung function and low BHR, BDR, Feno values, and total IgE levels plus low rhinitis symptoms, severity, and treatment. Patients in cluster 3 (n = 59 [12.6%]; ie, severe earliest-onset rhinitis with asthma) had the youngest rhinitis onset plus the highest comorbid asthma (of simultaneous onset) and atopy. They showed the most obstructed lung function with high BHR, BDR, and Feno values plus high rhinitis symptoms, severity, and treatment. Patients in cluster 4 (n = 82 [17.5%]; ie, moderate childhood-onset male rhinitis with asthma) had high atopy, intermediate asthma, and low eczema. They had impaired lung function with high Feno values and total IgE levels but intermediate BHR and BDR. They had moderate rhinitis symptoms. Clinically distinctive adolescent rhinitis clusters are apparent, with varying sex and asthma associations plus differing rhinitis severity and treatment needs. 
Copyright © 2014 American Academy of Allergy, Asthma

  19. Mapping of the α4 subunit gene (GABRA4) to human chromosome 4 defines an α2-α4-β1-γ1 gene cluster: Further evidence that modern GABAA receptor gene clusters are derived from an ancestral cluster

    Energy Technology Data Exchange (ETDEWEB)

    McLean, P.J.; Farb, D.H.; Russek, S.J. [Boston Univ. School of Medicine, MA (United States)] [and others]

    1995-04-10

    We demonstrated previously that an α1-β2-γ2 gene cluster of the γ-aminobutyric acid (GABAA) receptor is located on human chromosome 5q34-q35 and that an ancestral α-β-γ gene cluster probably spawned clusters on chromosomes 4, 5, and 15. Here, we report that the α4 gene (GABRA4) maps to human chromosome 4p14-q12, defining a cluster comprising the α2, α4, β1, and γ1 genes. The existence of an α2-α4-β1-γ2 cluster on chromosome 4 and an α1-α6-β2-γ2 cluster on chromosome 5 provides further evidence that the number of ancestral GABAA receptor subunit genes has been expanded by duplication within an ancestral gene cluster. Moreover, if duplication of the α gene occurred before duplication of the ancestral gene cluster, then a heretofore undiscovered subtype of α subunit should be located on human chromosome 15q11-q13 within an α5-αx-β3-γ3 gene cluster at the locus for Angelman and Prader-Willi syndromes. 34 refs., 6 figs., 1 tab.

  20. Genome-Wide Analysis of Secondary Metabolite Gene Clusters in Ophiostoma ulmi and Ophiostoma novo-ulmi Reveals a Fujikurin-Like Gene Cluster with a Putative Role in Infection

    Directory of Open Access Journals (Sweden)

    Nicolau Sbaraini

    2017-06-01

    The emergence of new microbial pathogens can result in destructive outbreaks, since their hosts have limited resistance and pathogens may be excessively aggressive. Described as the major ecological incident of the twentieth century, Dutch elm disease, caused by ascomycete fungi from the Ophiostoma genus, has caused a significant decline in elm tree populations (Ulmus sp.) in North America and Europe. Genome sequencing of the two main causative agents of Dutch elm disease (Ophiostoma ulmi and Ophiostoma novo-ulmi), along with closely related species with different lifestyles, allows for unique comparisons to be made to identify how pathogens and virulence determinants have emerged. Among several established virulence determinants, secondary metabolites (SMs) have been suggested to play significant roles during phytopathogen infection. Interestingly, the secondary metabolism of Dutch elm pathogens remains almost unexplored, and little is known about how SM biosynthetic genes are organized in these species. To better understand the metabolic potential of O. ulmi and O. novo-ulmi, we performed a deep survey and description of SM biosynthetic gene clusters (BGCs) in these species and assessed their conservation among eight species from the Ophiostomataceae family. Among 19 identified BGCs, a fujikurin-like gene cluster (OpPKS8) was unique to Dutch elm pathogens. Phylogenetic analysis revealed that orthologs for this gene cluster are widespread among phytopathogens and plant-associated fungi, suggesting that OpPKS8 may have been horizontally acquired by the Ophiostoma genus. Moreover, the detailed identification of several BGCs paves the way for future in-depth research and supports the potential impact of secondary metabolism on the Ophiostoma genus' lifestyle.

  1. clusters

    Indian Academy of Sciences (India)

    2017-09-27

    ... while CuCoNO, Co₃NO, Cu₃CoNO, Cu₂Co₃NO, Cu₃Co₃NO and Cu₆CoNO clusters display stronger chemical stability. Magnetic and electronic properties are also discussed. The magnetic moment is affected by charge transfer and the spd hybridization. Keywords. CuₘCoₙNO (m + n = 2–7) clusters; ...

  2. Identifying spatial clustering properties of the 1997-2003 Liguria (Northern Italy) forest-fire sequence

    International Nuclear Information System (INIS)

    Telesca, Luciano; Amatulli, Giuseppe; Lasaponara, Rosa; Lovallo, Michele; Santulli, Adriano

    2007-01-01

    The spatial clustering of the forest-fire sequence (1997-2003) of the Liguria Region (Northern Italy) has been analysed using the correlation dimension D_C, calculated by means of the correlation integral method. Studying the variations of this parameter, we recognize the presence of a strong variability of the spatial clustering, modulated by seasonal cycles. Furthermore, we found that the larger fires (size >400 ha) mark the cyclic behaviour of the correlation dimension
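    The correlation integral method can be sketched as follows, assuming fire locations are given as planar coordinates: C(r) is the fraction of distinct point pairs closer than r, and the correlation dimension D_C is estimated as the slope of log C(r) against log r over a chosen range of radii. The function names and the radii used in the check below are illustrative assumptions; in practice the scaling range has to be chosen by inspecting the log-log plot, and the study's own time windowing is not reproduced here.

```python
import numpy as np


def correlation_integral(points, radii):
    """C(r): fraction of distinct point pairs whose separation is below each radius r."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(len(pts), k=1)]   # each unordered pair counted once
    return np.array([(pair_dist < r).mean() for r in radii])


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r over the supplied radii."""
    radii = np.asarray(radii, dtype=float)
    c = correlation_integral(points, radii)
    keep = c > 0                                       # log is undefined where no pairs fall within r
    slope, _intercept = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope
```

    As a sanity check, points spread evenly along a straight line should give D_C close to 1 and points filling a plane close to 2; spatially clustered fire locations fall somewhere in between, and changes in that value over time are what the analysis tracks.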

  3. UML Language Use in Identifying Tangible and Intangible Assets in a Cluster

    OpenAIRE

    Claudiu Pîrnău; Anca Ioana Vlad

    2013-01-01

    Clusters contain a group of related industries and other entities important in terms of competition and are geographic concentrations of interconnected companies and institutions belonging to a particular area. These include suppliers of specialized inputs such as components, machinery and services, and providers of specialized infrastructure. Clusters often extend downstream towards various distribution channels and customers and later to manufacturers of complementary products and the indus...

  4. Identification of natural killer cell receptor clusters in the platypus genome reveals an expansion of C-type lectin genes.

    Science.gov (United States)

    Wong, Emily S W; Sanderson, Claire E; Deakin, Janine E; Whittington, Camilla M; Papenfuss, Anthony T; Belov, Katherine

    2009-08-01

    Natural killer (NK) cell receptors belong to two unrelated, but functionally analogous gene families: the immunoglobulin superfamily, situated in the leukocyte receptor complex (LRC) and the C-type lectin superfamily, located in the natural killer complex (NKC). Here, we describe the largest NK receptor gene expansion seen to date. We identified 213 putative C-type lectin NK receptor homologs in the genome of the platypus. Many have arisen as the result of a lineage-specific expansion. Orthologs of OLR1, CD69, KLRE, CLEC12B, and CLEC16p genes were also identified. The NKC is split into at least two regions of the genome: 34 genes map to chromosome 7, two map to a small autosome, and the remainder are unanchored in the current genome assembly. No NK receptor genes from the LRC were identified. The massive C-type lectin expansion and lack of Ig-domain-containing NK receptors represents the most extreme polarization of NK receptors found to date. We have used this new data from platypus to trace the possible evolutionary history of the NK receptor clusters.

  5. Genome-wide significant localization for working and spatial memory: Identifying genes for psychosis using models of cognition.

    Science.gov (United States)

    Knowles, Emma E M; Carless, Melanie A; de Almeida, Marcio A A; Curran, Joanne E; McKay, D Reese; Sprooten, Emma; Dyer, Thomas D; Göring, Harald H; Olvera, Rene; Fox, Peter; Almasy, Laura; Duggirala, Ravi; Kent, Jack W; Blangero, John; Glahn, David C

    2014-01-01

    It is well established that risk for developing psychosis is largely mediated by the influence of genes, but identifying precisely which genes underlie that risk has been problematic. Focusing on endophenotypes, rather than illness risk, is one solution to this problem. Impaired cognition is a well-established endophenotype of psychosis. Here we aimed to characterize the genetic architecture of cognition using phenotypically detailed models as opposed to relying on general IQ or individual neuropsychological measures. In so doing we hoped to identify genes that mediate cognitive ability, which might also contribute to psychosis risk. Hierarchical factor models of genetically clustered cognitive traits were subjected to linkage analysis followed by QTL region-specific association analyses in a sample of 1,269 Mexican American individuals from extended pedigrees. We identified four genome wide significant QTLs, two for working and two for spatial memory, and a number of plausible and interesting candidate genes. The creation of detailed models of cognition seemingly enhanced the power to detect genetic effects on cognition and provided a number of possible candidate genes for psychosis. © 2013 Wiley Periodicals, Inc.

  6. Evolutionary history of the phl gene cluster in the plant-associated bacterium Pseudomonas fluorescens

    NARCIS (Netherlands)

    Moynihan, J.A.; Morrissey, J.P.; Coppoolse, E.; Stiekema, W.J.; O'Gara, F.; Boyd, E.F.

    2009-01-01

    Pseudomonas fluorescens is of agricultural and economic importance as a biological control agent largely because of its plant association and production of secondary metabolites, in particular 2,4-diacetylphloroglucinol (2,4-DAPG). This polyketide, which is encoded by the eight-gene phl cluster,

  7. Molecular population genetics of the β-esterase gene cluster of ...

    Indian Academy of Sciences (India)

    We suggest that the demographic history (bottleneck and admixture of genetically differentiated populations) is the major factor shaping the pattern of nucleotide polymorphism in the β-esterase gene cluster. However, there are some 'footprints' of directional and balancing selection shaping specific distribution of nucleotide ...

  8. Molecular population genetics of the β-esterase gene cluster of ...

    Indian Academy of Sciences (India)

    Unknown

    neutrality with recombination are significant for the β-esterase gene cluster in the non-African samples but not significant in the African one. We suggest ...... I. Viability studies. Genetics 102, 467–483. Selva E. M., New L., Crouse G. F. and Lahue R. S. 1995 Mismatch correction acts as a barrier to homologous recombina-.

  9. The impact of self-identified race on epidemiologic studies of gene expression.

    Science.gov (United States)

    Sharma, Sunita; Murphy, Amy; Howrylak, Judie; Himes, Blanca; Cho, Michael H; Chu, Jen-Hwa; Hunninghake, Gary M; Fuhlbrigge, Anne; Klanderman, Barbara; Ziniti, John; Senter-Sylvia, Jody; Liu, Andy; Szefler, Stanley J; Strunk, Robert; Castro, Mario; Hansel, Nadia N; Diette, Gregory B; Vonakis, Becky M; Adkinson, N Franklin; Carey, Vincent J; Raby, Benjamin A

    2011-02-01

    Although population differences in gene expression have been established, the impact on differential gene expression studies in large populations is not well understood. We describe the effect of self-reported race on a gene expression study of lung function in asthma. We generated gene expression profiles for 254 young adults (205 non-Hispanic whites and 49 African Americans) with asthma on whom concurrent total RNA derived from peripheral blood CD4(+) lymphocytes and lung function measurements were obtained. We identified four principal components that explained 62% of the variance in gene expression. The dominant principal component, which explained 29% of the total variance in gene expression, was strongly associated with self-identified race (Pracial differences was observed when we performed differential gene expression analysis of lung function. Using multivariate linear models, we tested whether gene expression was associated with a quantitative measure of lung function: pre-bronchodilator forced expiratory volume in one second (FEV(1)). Though unadjusted linear models of FEV(1) identified several genes strongly correlated with lung function, these correlations were due to racial differences in the distribution of both FEV(1) and gene expression, and were no longer statistically significant following adjustment for self-identified race. These results suggest that self-identified race is a critical confounding covariate in epidemiologic studies of gene expression and that, similar to genetic studies, careful consideration of self-identified race in gene expression profiling studies is needed to avoid spurious association. © 2011 Wiley-Liss, Inc.
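    The variance decomposition described above can be illustrated with a principal component analysis of a samples-by-genes expression matrix. The sketch below is a minimal example, assuming expression values are already normalized; it centres each gene, takes a singular value decomposition, and reports the fraction of total variance explained by each leading component together with per-sample scores. The function name pca_variance is hypothetical, and this is not the study's own pipeline. Scores on a leading component could then be tested for association with a covariate such as self-identified race.

```python
import numpy as np


def pca_variance(expr, n_components):
    """Explained-variance ratios and sample scores for a samples x genes matrix."""
    expr = np.asarray(expr, dtype=float)
    centred = expr - expr.mean(axis=0)                  # centre each gene across samples
    u, s, _vt = np.linalg.svd(centred, full_matrices=False)
    var = s ** 2                                        # proportional to variance along each component
    ratios = var / var.sum()                            # fraction of total variance per component
    scores = u[:, :n_components] * s[:n_components]     # sample coordinates on leading components
    return ratios[:n_components], scores
```

    Summing the first four ratios gives the cumulative share of variance, the kind of figure the abstract reports as 62% for four components.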

  10. Using BAC transgenesis in zebrafish to identify regulatory sequences of the amyloid precursor protein gene in humans

    Directory of Open Access Journals (Sweden)

    Shakes Leighcraft A

    2012-09-01

    Full Text Available Abstract Background Non-coding DNA in and around the human Amyloid Precursor Protein (APP) gene that is central to Alzheimer’s disease (AD) shares little sequence similarity with that of appb in zebrafish. Identifying DNA domains regulating expression of the gene in such situations becomes a challenge. Taking advantage of the zebrafish system that allows rapid functional analyses of gene regulatory sequences, we previously showed that two discontinuous DNA domains in zebrafish appb are important for expression of the gene in neurons: an enhancer in intron 1 and sequences 28–31 kb upstream of the gene. Here we identify the putative transcription factor binding sites responsible for this distal cis-acting regulation, and use that information to identify a regulatory region of the human APP gene. Results Functional analyses of intron 1 enhancer mutations in enhancer-trap BACs expressed as transgenes in zebrafish identified putative binding sites of two known transcription factor proteins, E4BP4/ NFIL3 and Forkhead, to be required for expression of appb. A cluster of three E4BP4 sites at −31 kb is also shown to be essential for neuron-specific expression, suggesting that the dependence of expression on upstream sequences is mediated by these E4BP4 sites. E4BP4/ NFIL3 and XFD1 sites in the intron enhancer and E4BP4/ NFIL3 sites at −31 kb specifically and efficiently bind the corresponding zebrafish proteins in vitro. These sites are statistically over-represented in both the zebrafish appb and the human APP genes, although their locations are different. Remarkably, a cluster of four E4BP4 sites in intron 4 of human APP exists in actively transcribing chromatin in a human neuroblastoma cell-line, SHSY5Y, expressing APP as shown using chromatin immunoprecipitation (ChIP) experiments. Thus although the two genes share little sequence conservation, they appear to share the same regulatory logic and are regulated by a similar set of transcription

  11. Using BAC transgenesis in zebrafish to identify regulatory sequences of the amyloid precursor protein gene in humans.

    Science.gov (United States)

    Shakes, Leighcraft A; Du, Hansen; Wolf, Hope M; Hatcher, Charles; Norford, Derek C; Precht, Patricia; Sen, Ranjan; Chatterjee, Pradeep K

    2012-09-04

    Non-coding DNA in and around the human Amyloid Precursor Protein (APP) gene that is central to Alzheimer's disease (AD) shares little sequence similarity with that of appb in zebrafish. Identifying DNA domains regulating expression of the gene in such situations becomes a challenge. Taking advantage of the zebrafish system that allows rapid functional analyses of gene regulatory sequences, we previously showed that two discontinuous DNA domains in zebrafish appb are important for expression of the gene in neurons: an enhancer in intron 1 and sequences 28-31 kb upstream of the gene. Here we identify the putative transcription factor binding sites responsible for this distal cis-acting regulation, and use that information to identify a regulatory region of the human APP gene. Functional analyses of intron 1 enhancer mutations in enhancer-trap BACs expressed as transgenes in zebrafish identified putative binding sites of two known transcription factor proteins, E4BP4/ NFIL3 and Forkhead, to be required for expression of appb. A cluster of three E4BP4 sites at -31 kb is also shown to be essential for neuron-specific expression, suggesting that the dependence of expression on upstream sequences is mediated by these E4BP4 sites. E4BP4/ NFIL3 and XFD1 sites in the intron enhancer and E4BP4/ NFIL3 sites at -31 kb specifically and efficiently bind the corresponding zebrafish proteins in vitro. These sites are statistically over-represented in both the zebrafish appb and the human APP genes, although their locations are different. Remarkably, a cluster of four E4BP4 sites in intron 4 of human APP exists in actively transcribing chromatin in a human neuroblastoma cell-line, SHSY5Y, expressing APP as shown using chromatin immunoprecipitation (ChIP) experiments. Thus although the two genes share little sequence conservation, they appear to share the same regulatory logic and are regulated by a similar set of transcription factors. The results suggest that the clock
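
    Statistical over-representation of a binding site, as reported for the E4BP4/NFIL3 sites above, is often assessed by comparing the observed number of matches with the number expected under a background model. The sketch below assumes a simple i.i.d. nucleotide background (base frequencies estimated from the sequence itself) and a one-sided binomial test. The motif string `TTACGTAA` and the test region are hypothetical stand-ins; the authors' actual site definitions and statistics are not given in this record.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(seq, motif):
    """Count positions where motif matches on either strand.

    A palindromic motif (motif == its reverse complement) is counted once.
    """
    seq, motif = seq.upper(), motif.upper()
    patterns = {motif, revcomp(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in patterns)


def site_probability(seq, motif):
    """Per-position match probability under an i.i.d. background whose base
    frequencies are estimated from seq (assumes all four bases occur)."""
    seq, motif = seq.upper(), motif.upper()
    freqs = {b: seq.count(b) / len(seq) for b in "ACGT"}

    def prob(m):
        return math.prod(freqs[b] for b in m)

    rc = revcomp(motif)
    return prob(motif) if rc == motif else prob(motif) + prob(rc)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space; needs 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def motif_enrichment(seq, motif):
    """Return (observed count, expected count, one-sided binomial p-value)."""
    n = len(seq) - len(motif) + 1
    p = site_probability(seq, motif)
    observed = count_sites(seq, motif)
    return observed, n * p, binomial_upper_tail(observed, n, p)


MOTIF = "TTACGTAA"  # hypothetical consensus, for illustration only
REGION = "ACGTTGCA" * 100 + MOTIF + "ACGTTGCA" * 100 + MOTIF * 2
print(motif_enrichment(REGION, MOTIF))
```

    Real analyses usually score position weight matrices rather than exact strings and use a background model fitted to comparable genomic sequence.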

  12. Genomic and expression analysis of the vanG-like gene cluster of Clostridium difficile.

    Science.gov (United States)

    Peltier, Johann; Courtin, Pascal; El Meouche, Imane; Catel-Ferreira, Manuella; Chapot-Chartier, Marie-Pierre; Lemée, Ludovic; Pons, Jean-Louis

    2013-07-01

    Primary antibiotic treatment of Clostridium difficile intestinal diseases requires metronidazole or vancomycin therapy. A cluster of genes homologous to enterococcal glycopeptide resistance vanG genes was found in the genome of C. difficile 630, although this strain remains sensitive to vancomycin. This vanG-like gene cluster was found to consist of five ORFs: the regulatory region consisting of vanR and vanS and the effector region consisting of vanG, vanXY and vanT. We found that 57 out of 83 C. difficile strains, representative of the main lineages of the species, harbour this vanG-like cluster. The cluster is expressed as an operon and, when present, is found at the same genomic location in all strains. The vanG, vanXY and vanT homologues in C. difficile 630 are co-transcribed and expressed at a low level throughout the growth phases in the absence of vancomycin. Conversely, the expression of these genes is strongly induced in the presence of subinhibitory concentrations of vancomycin, indicating that the vanG-like operon is functional at the transcriptional level in C. difficile. Hydrophilic interaction liquid chromatography (HILIC-HPLC) and MS analysis of cytoplasmic peptidoglycan precursors of C. difficile 630 grown without vancomycin revealed the exclusive presence of a UDP-MurNAc-pentapeptide with an alanine at the C terminus. UDP-MurNAc-pentapeptide [d-Ala] was also the only peptidoglycan precursor detected in C. difficile grown in the presence of vancomycin, corroborating the lack of vancomycin resistance. Peptidoglycan structures of a vanG-like mutant strain and of a strain lacking the vanG-like cluster did not differ from the C. difficile 630 strain, indicating that the vanG-like cluster also has no impact on cell-wall composition.

  13. Blood Pressure Loci Identified with a Gene-Centric Array

    NARCIS (Netherlands)

    Johnson, Toby; Gaunt, Tom R.; Newhouse, Stephen J.; Padmanabhan, Sandosh; Tomaszewski, Maciej; Kumari, Meena; Morris, Richard W.; Tzoulaki, Ioanna; O'Brien, Eoin T.; Poulter, Neil R.; Sever, Peter; Shields, Denis C.; Thom, Simon; Wannamethee, Sasiwarang G.; Whincup, Peter H.; Brown, Morris J.; Connell, John M.; Dobson, Richard J.; Howard, Philip J.; Mein, Charles A.; Onipinla, Abiodun; Shaw-Hawkins, Sue; Zhang, Yun; Smith, George Davey; Day, Ian N. M.; Lawlor, Debbie A.; Goodall, Alison H.; Fowkes, F. Gerald; Abecasis, Goncalo R.; Elliott, Paul; Gateva, Vesela; Braund, Peter S.; Burton, Paul R.; Nelson, Christopher P.; Tobin, Martin D.; van der Harst, Pim; Glorioso, Nicola; Neuvrith, Hani; Salvi, Erika; Staessen, Jan A.; Stucchi, Andrea; Devos, Nabila; Jeunemaitre, Xavier; Plouin, Pierre-Francois; Tichet, Jean; Juhanson, Peeter; Org, Elin; Putku, Margus; Sober, Siim; Veldre, Gudrun; Viigimaa, Margus; Levinsson, Anna; Rosengren, Annika; Thelle, Dag S.; Hastie, Claire E.; Hedner, Thomas; Lee, Wai K.; Melander, Olle; Wahlstrand, Bjoern; Hardy, Rebecca; Wong, Andrew; Cooper, Jackie A.; Palmen, Jutta; Chen, Li; Stewart, Alexandre F. R.; Wells, George A.; Westra, Harm-Jan; Wolfs, Marcel G. M.; Clarke, Robert; Franzosi, Maria Grazia; Goel, Anuj; Hamsten, Anders; Lathrop, Mark; Peden, John F.; Seedorf, Udo; Watkins, Hugh; Ouwehand, Willem H.; Sambrook, Jennifer; Stephens, Jonathan; Casas, Juan-Pablo; Drenos, Fotios; Holmes, Michael V.; Kivimaki, Mika; Shah, Sonia; Shah, Tina; Talmud, Philippa J.; Whittaker, John; Wallace, Chris; Delles, Christian; Laan, Mans; Kuh, Diana; Humphries, Steve E.; Nyberg, Fredrik; Cusi, Daniele; Roberts, Robert; Newton-Cheh, Christopher; Franke, Lude; Stanton, Alice V.; Dominiczak, Anna F.; Farrall, Martin; Hingorani, Aroon D.; Samani, Nilesh J.; Caulfield, Mark J.; Munroe, Patricia B.

    2011-01-01

    Raised blood pressure (BP) is a major risk factor for cardiovascular disease. Previous studies have identified 47 distinct genetic variants robustly associated with BP, but collectively these explain only a few percent of the heritability for BP phenotypes. To find additional BP loci, we used a

  14. Clustering of multi-parametric functional imaging to identify high-risk subvolumes in non-small cell lung cancer.

    Science.gov (United States)

    Even, Aniek J G; Reymen, Bart; La Fontaine, Matthew D; Das, Marco; Mottaghy, Felix M; Belderbos, José S A; De Ruysscher, Dirk; Lambin, Philippe; van Elmpt, Wouter

    2017-12-01

    We aimed to identify tumour subregions with characteristic phenotypes based on pre-treatment multi-parametric functional imaging and correlate these subregions to treatment outcome. The subregions were created using imaging of metabolic activity (FDG-PET/CT), hypoxia (HX4-PET/CT) and tumour vasculature (DCE-CT). 36 non-small cell lung cancer (NSCLC) patients underwent functional imaging prior to radical radiotherapy. Kinetic analysis was performed on DCE-CT scans to acquire blood flow (BF) and volume (BV) maps. HX4-PET/CT and DCE-CT scans were non-rigidly co-registered to the planning FDG-PET/CT. Two clustering steps were performed on multi-parametric images: first to segment each tumour into homogeneous subregions (i.e. supervoxels) and second to group the supervoxels of all tumours into phenotypic clusters. Patients were split based on the absolute or relative volume of supervoxels in each cluster; overall survival was compared using a log-rank test. Unsupervised clustering of supervoxels yielded four independent clusters. One cluster (high hypoxia, high FDG, intermediate BF/BV) related to a high-risk tumour type: patients assigned to this cluster had significantly worse survival compared to patients not in this cluster (p = 0.035). We designed a subregional analysis for multi-parametric imaging in NSCLC, and showed the potential of subregion classification as a biomarker for prognosis. This methodology allows for a comprehensive data-driven analysis of multi-parametric functional images. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
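
    The survival comparison above used a log-rank test. A minimal two-group implementation is sketched below with hypothetical follow-up times. It follows the textbook observed-minus-expected formulation with a hypergeometric variance; in practice a vetted survival-analysis library would be preferable.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if death observed, 0 if censored.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected deaths in group A
    var = 0.0
    for t in event_times:
        at_risk = [(ti, ei, g) for ti, ei, g in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, ei, _ in at_risk if ti == t and ei)
        d_a = sum(1 for ti, ei, g in at_risk if ti == t and ei and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Survival function of chi-square(1): P(|Z| > sqrt(chi2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical months of follow-up: patients in vs not in a high-risk cluster.
in_cluster = ([5, 8, 12, 14, 20], [1, 1, 1, 0, 1])
not_in_cluster = ([10, 18, 25, 30, 36], [1, 0, 1, 0, 0])
print(logrank_test(*in_cluster, *not_in_cluster))
```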

  15. Two Phenotypes Are Identified by Cluster Analysis in Early Inflammatory Back Pain Suggestive of Spondyloarthritis: Results From the DESIR Cohort.

    Science.gov (United States)

    Costantino, Félicie; Aegerter, Philippe; Dougados, Maxime; Breban, Maxime; D'Agostino, Maria-Antonietta

    2016-07-01

    To determine whether disease manifestations at baseline would combine according to distinguishable ordered phenotypes in patients with early inflammatory back pain (IBP) suggestive of spondyloarthritis (SpA). Baseline clinical and demographic characteristics as well as imaging features and biologic data on patients included in the French multicenter Devenir des Spondyloarthropathies Indifférenciées Récentes cohort were analyzed by multiple correspondence analysis and cluster analysis to identify subgroups of patients based on shared characteristics. Cluster analysis allowed us to classify the 679 patients with no missing data into 2 major groups: one with a predominance of isolated axial manifestations and the other with associated peripheral symptoms. The application of the same analysis to selected subsets of the cohort, such as HLA-B27-positive and -negative patients and patients fulfilling the Assessment of SpondyloArthritis international Society classification criteria for axial SpA, resulted again in an optimal division of the samples into 2 recurrent clusters of patients similar to those observed in the whole cohort. Cluster analysis of SpA manifestations among patients with early IBP highly suggestive of SpA allowed us to clearly identify at baseline 2 different clinical phenotypes: one with predominant axial manifestations and the other with predominant peripheral manifestations. Ongoing follow-up will allow us to determine whether these clusters correspond to different patterns of disease severity. © 2016, American College of Rheumatology.

  16. Utilizing Hierarchical Clustering to improve Efficiency of Self-Organizing Feature Map to Identify Hydrological Homogeneous Regions

    Science.gov (United States)

    Farsadnia, Farhad; Ghahreman, Bijan

    2016-04-01

    Hydrologic homogeneous group identification is considered both fundamental and applied research in hydrology. Clustering methods are among the conventional methods for assessing hydrologically homogeneous regions. Recently, the Self-Organizing feature Map (SOM) method has been applied in some studies. However, the main problem with this method is interpreting its output map. Therefore, the SOM output is used as input to other clustering algorithms. The aim of this study is to apply a two-level Self-Organizing feature map and Ward hierarchical clustering method to determine the hydrologic homogeneous regions in North and Razavi Khorasan provinces. First, we reduced the dimension of the SOM input matrix by principal component analysis; the SOM was then used to form a two-dimensional feature map. To determine homogeneous regions for flood frequency analysis, SOM output nodes were used as input into the Ward method. Generally, the regions identified by clustering algorithms are not statistically homogeneous. Consequently, they have to be adjusted to improve their homogeneity. After adjustment of the regions by L-moment tests, five hydrologic homogeneous regions were identified. Finally, adjusted regions were created by a two-level SOM, and then the best regional distribution function and associated parameters were selected by the L-moment approach. The results showed that combining self-organizing maps and Ward hierarchical clustering, with principal components as input, is more effective than the hierarchical method alone (with either principal components or standardized inputs) at achieving hydrologic homogeneous regions.
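
    The second level of the approach feeds SOM output nodes into Ward's hierarchical clustering. The sketch below shows only that Ward step, applied to a few hypothetical node prototype vectors; SOM training, the PCA step and the L-moment adjustment are omitted. At each step it merges the pair of clusters whose union least increases the total within-cluster sum of squares.

```python
def ward_clustering(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    Merging clusters a and b increases the within-cluster sum of squares by
        delta = (n_a * n_b / (n_a + n_b)) * ||c_a - c_b||^2
    where c_a, c_b are centroids. Returns sorted lists of point indices.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                dist2 = sum((x - y) ** 2 for x, y in zip(ca["centroid"], cb["centroid"]))
                delta = na * nb / (na + nb) * dist2
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c["members"]) for c in clusters]


# Hypothetical SOM node prototypes (e.g. two principal-component scores each).
NODES = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(ward_clustering(NODES, 3))
```

    This naive version is quadratic per merge, which is acceptable for the modest number of nodes a SOM grid usually has.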

  17. Using a Candidate Gene-Based Genetic Linkage Map to Identify QTL for Winter Survival in Perennial Ryegrass.

    Directory of Open Access Journals (Sweden)

    Cristiana Paina

    Full Text Available Important agronomical traits in perennial ryegrass (Lolium perenne) breeding programs, such as winter survival and heading date, are quantitative traits that are generally controlled by multiple loci. Individually, these loci have relatively small effects. The aim of this study was to develop a candidate-gene-based Illumina GoldenGate 1,536-plex assay, containing single nucleotide polymorphism markers designed from transcripts involved in response to cold acclimation, vernalization, and induction of flowering. The assay was used to genotype a mapping population that we have also phenotyped for winter survival to complement the heading date trait previously mapped in this population. A positive correlation was observed between strong vernalization requirement and winter survival, and some QTL for winter survival and heading date overlapped on the genetic map. Candidate genes were located in clusters along the genetic map, some of which co-localized with QTL for winter survival and heading date. These clusters of candidate genes may be used in candidate-gene-based association studies to identify alleles associated with winter survival and heading date.

  18. Mandibulofacial dysostosis in a patient with a de novo 2;17 translocation that disrupts the HOXD gene cluster.

    Science.gov (United States)

    Stevenson, David A; Bleyl, Steven B; Maxwell, Teresa; Brothman, Arthur R; South, Sarah T

    2007-05-15

    Treacher Collins syndrome (TCS) is the prototypical mandibulofacial dysostosis syndrome, but other mandibulofacial dysostosis syndromes have been described. We report an infant with mandibulofacial dysostosis and an apparently balanced de novo 2;17 translocation. She presented with severe lower eyelid colobomas requiring skin grafting, malar and mandibular hypoplasia, bilateral microtia with external auditory canal atresia, dysplastic ossicles, hearing loss, bilateral choanal stenosis, cleft palate without cleft lip, several oral frenula of the upper lip/gum, and micrognathia requiring tracheostomy. Her limbs were normal. Chromosome analysis at the 600-band level showed a 46,XX,t(2;17)(q24.3;q23) karyotype. Sequencing of the entire TCOF1 coding region did not show evidence of a sequence variation. High-resolution genomic microarray analysis did not identify a cryptic imbalance. FISH mapping refined the breakpoints to 2q31.1 and 17q24.3-25.1 and showed the 2q31.1 breakpoint likely affects the HOXD gene cluster. Several atypical findings and lack of an identifiable TCOF1 mutation suggest that this child has a provisionally unique mandibulofacial dysostosis syndrome. The apparently balanced de novo translocation provides candidate loci for atypical and TCOF1 mutation negative cases of TCS. Based on the agreement of our findings with one previous case of mandibulofacial dysostosis with a 2q31.1 translocation, we hypothesize that misexpression of genes in the HOXD gene cluster produced the described phenotype in this patient.

  19. Multiplexed CRISPR/Cas9- and TAR-Mediated Promoter Engineering of Natural Product Biosynthetic Gene Clusters in Yeast.

    Science.gov (United States)

    Kang, Hahk-Soo; Charlop-Powers, Zachary; Brady, Sean F

    2016-09-16

    The use of DNA sequencing to guide the discovery of natural products has emerged as a new paradigm for revealing chemistries encoded in bacterial genomes. A major obstacle to implementing this approach to natural product discovery is the transcriptional silence of biosynthetic gene clusters under laboratory growth conditions. Here we describe an improved yeast-based promoter engineering platform (mCRISTAR) that combines CRISPR/Cas9 and TAR to enable single-marker multiplexed promoter engineering of large gene clusters. mCRISTAR highlights the first application of the CRISPR/Cas9 system to multiplexed promoter engineering of natural product biosynthetic gene clusters. In this method, CRISPR/Cas9 is used to induce DNA double-strand breaks in promoter regions of biosynthetic gene clusters, and the resulting operon fragments are reassembled by TAR using synthetic gene-cluster-specific promoter cassettes. mCRISTAR uses a CRISPR array to simplify the construction of a CRISPR plasmid for multiplex CRISPR and a single auxotrophic selection to overcome the inefficiency of using a CRISPR array for multiplex gene cluster refactoring. mCRISTAR is a simple and generic method for multiplexed replacement of promoters in biosynthetic gene clusters that will facilitate the discovery of natural products from the rapidly growing collection of gene clusters found in microbial genome and metagenome sequencing projects.

  20. Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages

    Science.gov (United States)

    Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trac...

  1. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis

    Directory of Open Access Journals (Sweden)

    Saurav Mallik

    2017-12-01

    Full Text Available For transcriptomic analysis, there are numerous microarray-based genomic datasets, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through a rule-based methodology; it is therefore well suited to finding cause-effect relationships between transcripts. In this work, we introduce two new rule-based similarity measures, weighted rank-based Jaccard and Cosine measures, and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the resulting list of condensed markers, which includes both singular and complex markers, depends on the condensed gene sets in the antecedents or consequents of the rules in the resulting modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
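
    The record does not define the weighted rank-based Jaccard and Cosine measures, so the sketch below shows only the standard, unweighted building blocks: rule support and confidence over samples encoded as item sets, and Jaccard and cosine similarity between the gene sets of two rules. Item names such as `G1_up` are hypothetical, and these plain measures are not the paper's weighted versions.

```python
import math


def support(itemset, transactions):
    """Fraction of transactions (samples) containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def jaccard(genes_a, genes_b):
    """|A intersect B| / |A union B| for two gene sets."""
    return len(genes_a & genes_b) / len(genes_a | genes_b)


def cosine(genes_a, genes_b):
    """Cosine similarity of the binary indicator vectors of two gene sets."""
    return len(genes_a & genes_b) / math.sqrt(len(genes_a) * len(genes_b))


# Hypothetical samples: each is the set of (gene, direction) items it shows.
SAMPLES = [
    {"G1_up", "G2_up", "G3_down"},
    {"G1_up", "G2_up"},
    {"G1_up", "G3_down"},
    {"G2_up", "G3_down"},
]
print(support({"G1_up", "G2_up"}, SAMPLES))       # 0.5
print(confidence({"G1_up"}, {"G2_up"}, SAMPLES))  # about 0.667

# Gene sets of two rules, e.g. G1 -> G2 and G2 -> G3.
RULE1_GENES, RULE2_GENES = {"G1", "G2"}, {"G2", "G3"}
print(jaccard(RULE1_GENES, RULE2_GENES), cosine(RULE1_GENES, RULE2_GENES))
```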

  2. [Sequence analysis of 16S rDNA and pmoCAB gene cluster of trichloroethylene-degrading methanotroph].

    Science.gov (United States)

    Zhang, Yunru; Chen, Huaqing; Gao, Yanhui; Xing, Zhilin; Zhao, Tiantao

    2014-12-01

    Methanotrophs can degrade methane and various chlorinated hydrocarbons. Analysis of the methane monooxygenase gene cluster sequence would help in understanding its catalytic mechanism and enhance its application in pollutant biodegradation. Methanotrophs were enriched and isolated with methane as the sole carbon source in nitrate mineral salt medium. Then, five chlorinated hydrocarbons were selected as cometabolic substrates to study biodegradation. A phylogenetic tree of 16S rDNA was constructed using MEGA 5.05 software to identify the methanotroph strain. The pmoCAB gene cluster encoding particulate methane monooxygenase (pMMO) was amplified in segments by semi-nested PCR. ExPASy was used to calculate the theoretical molecular weights of the three pMMO subunits. As a result, a methanotroph strain was isolated. Phylogenetic analysis indicated that the strain belongs to a species of Methylocystis, and it was named Methylocystis sp. JTC3. The degradation rate of trichloroethylene (TCE) reached 93.79% after 5 days at an initial concentration of 15.64 μmol/L. By semi-nested PCR, T-A cloning and sequencing, we obtained the 3,227 bp pmoCAB gene cluster, comprising the 771 bp pmoC gene, the 759 bp pmoA gene, the 1,260 bp pmoB gene and two noncoding sequences between them. The theoretical molecular weights of the corresponding gamma, beta and alpha subunits, calculated with ExPASy, were 29.1 kDa, 28.6 kDa and 45.6 kDa, respectively. BLAST analysis showed that the pmoCAB gene cluster of JTC3 was highly identical to that of Methylocystis sp. strain M, and that the pmoA sequence is more conserved than pmoC and pmoB. Finally, Methylocystis sp. JTC3 could degrade TCE efficiently, and the detailed analysis of its pmoCAB cluster lays a solid foundation for further study of its active-site features and its selectivity toward chlorinated hydrocarbons.
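
    The subunit molecular weights above were computed with ExPASy. The sketch below shows the usual way a theoretical average molecular weight is obtained: sum the average residue masses of the chain and add one water molecule for the free termini. The table holds standard average residue masses; a given tool may use slightly different constants, and the pmo protein sequences themselves are not reproduced in this record. The helper `residues_from_orf` is hypothetical and assumes the reported ORF length includes the stop codon.

```python
# Standard average residue masses in Da (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the N- and C-termini


def average_mw(protein):
    """Theoretical average molecular weight (Da) of an unmodified chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def residues_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


print(average_mw("G"))       # glycine, about 75.07 Da
print(residues_from_orf(771))  # 256 residues if the last codon is a stop
```

    Dividing the result by 1,000 gives kDa, the unit used for the 29.1, 28.6 and 45.6 kDa values quoted above.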

  3. A search engine to identify pathway genes from expression data on multiple organisms

    Directory of Open Access Journals (Sweden)

    Zambon Alexander C

    2007-05-01

    Full Text Available Abstract Background The completion of several genome projects showed that most genes have not yet been characterized, especially in multicellular organisms. Although most genes have unknown functions, a large collection of data is available describing their transcriptional activities under many different experimental conditions. In many cases, the coregulation of a set of genes across a set of conditions can be used to infer roles for genes of unknown function. Results We developed a search engine, the Multiple-Species Gene Recommender (MSGR), which scans gene expression datasets from multiple organisms to identify genes that participate in a genetic pathway. The MSGR takes a query consisting of a list of genes that function together in a genetic pathway from one of six organisms: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, and Helicobacter pylori. Using a probabilistic method to merge searches, the MSGR identifies genes that are significantly coregulated with the query genes in one or more of those organisms. The MSGR achieves its highest accuracy for many human pathways when searches are combined across species. We describe specific examples in which new genes were identified to be involved in a neuromuscular signaling pathway and a cell-adhesion pathway. Conclusion The search engine can scan large collections of gene expression data for new genes that are significantly coregulated with a pathway of interest. By integrating searches across organisms, the MSGR can identify pathway members whose coregulation is either ancient or newly evolved.
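
    The record does not spell out MSGR's probabilistic merging method. As a stand-in, the sketch below scores coregulation as the mean Pearson correlation between a candidate gene's expression profile and the query genes' profiles, and combines per-species p-values with Fisher's method, a different, named technique chosen only for illustration. How each per-species p-value is obtained (for example by permutation) is left open, and the profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coregulation_score(candidate, query_profiles):
    """Mean correlation of a candidate gene with the query (pathway) genes."""
    return sum(pearson(candidate, q) for q in query_profiles) / len(query_profiles)


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(log p) ~ chi-square(2k) under H0.

    Uses the closed-form chi-square survival function for even df:
    P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical query genes and candidate measured over four conditions.
QUERY = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.1]]
CANDIDATE = [1.1, 2.0, 2.9, 4.2]
print(coregulation_score(CANDIDATE, QUERY))
# Hypothetical per-species p-values for the same candidate.
print(fisher_combine([0.04, 0.20, 0.01]))
```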

  4. A transcription map of the 6p22.3 reading disability locus identifying candidate genes.

    Science.gov (United States)

    Londin, Eric R; Meng, Haiying; Gruen, Jeffrey R

    2003-06-30

    Reading disability (RD) is a common syndrome with a large genetic component. Chromosome 6 has been identified in several linkage studies as playing a significant role. A more recent study identified a peak of transmission disequilibrium to marker JA04 (G72384) on chromosome 6p22.3, suggesting that a gene is located near this marker. In silico cloning was used to identify possible candidate genes located near the JA04 marker. The 2 million base pairs of sequence surrounding JA04 was downloaded and searched against the dbEST database to identify ESTs. In total, 623 ESTs from 80 different tissues were identified and assembled into 153 putative coding regions from 19 genes and 2 pseudogenes encoded near JA04. The identified genes were tested for their tissue specific expression by RT-PCR. In total, five possible candidate genes for RD and other diseases mapping to this region were identified.

  5. A transcription map of the 6p22.3 reading disability locus identifying candidate genes

    Directory of Open Access Journals (Sweden)

    Gruen Jeffrey R

    2003-06-01

    Full Text Available Abstract Background Reading disability (RD) is a common syndrome with a large genetic component. Chromosome 6 has been identified in several linkage studies as playing a significant role. A more recent study identified a peak of transmission disequilibrium to marker JA04 (G72384) on chromosome 6p22.3, suggesting that a gene is located near this marker. Results In silico cloning was used to identify possible candidate genes located near the JA04 marker. The 2 million base pairs of sequence surrounding JA04 was downloaded and searched against the dbEST database to identify ESTs. In total, 623 ESTs from 80 different tissues were identified and assembled into 153 putative coding regions from 19 genes and 2 pseudogenes encoded near JA04. The identified genes were tested for their tissue specific expression by RT-PCR. Conclusions In total, five possible candidate genes for RD and other diseases mapping to this region were identified.

  6. A Metabolic Gene Cluster in the Wheat W1 and the Barley Cer-cqu Loci Determines β-Diketone Biosynthesis and Glaucousness.

    Science.gov (United States)

    Hen-Avivi, Shelly; Savin, Orna; Racovita, Radu C; Lee, Wing-Sham; Adamski, Nikolai M; Malitsky, Sergey; Almekias-Siegl, Efrat; Levy, Matan; Vautrin, Sonia; Bergès, Hélène; Friedlander, Gilgi; Kartvelishvily, Elena; Ben-Zvi, Gil; Alkan, Noam; Uauy, Cristobal; Kanyuka, Kostya; Jetter, Reinhard; Distelfeld, Assaf; Aharoni, Asaph

    2016-06-01

    The glaucous appearance of wheat (Triticum aestivum) and barley (Hordeum vulgare) plants, that is, the light bluish-gray look of flag leaf, stem, and spike surfaces, results from deposition of cuticular β-diketone wax on their surfaces; this phenotype is associated with high yield, especially under drought conditions. Despite extensive genetic and biochemical characterization, the molecular genetic basis underlying the biosynthesis of β-diketones remains unclear. Here, we discovered that the wheat W1 locus contains a metabolic gene cluster mediating β-diketone biosynthesis. The cluster comprises genes encoding proteins of several families including type-III polyketide synthases, hydrolases, and cytochrome P450s related to known fatty acid hydroxylases. The cluster region was identified in both genetic and physical maps of glaucous and glossy tetraploid wheat, demonstrating entirely different haplotypes in these accessions. Complementary evidence obtained through gene silencing in planta and heterologous expression in bacteria supports a model for a β-diketone biosynthesis pathway involving members of these three protein families. Mutations in homologous genes were identified in the barley eceriferum mutants defective in β-diketone biosynthesis, demonstrating a gene cluster also in the β-diketone biosynthesis Cer-cqu locus in barley. Hence, our findings open new opportunities to breed major cereal crops for surface features that impact yield and stress response. © 2016 American Society of Plant Biologists. All rights reserved.

  7. cluster

    Indian Academy of Sciences (India)

    has been investigated electrochemically in positive and negative microenvironments, both in solution and in film. Charge nature around the active centre ... in plants, bacteria and also in mammals. This cluster is also an important constituent of a ... selection of non-cysteine amino acid in the active centre of Rieske proteins.

  8. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data

    Directory of Open Access Journals (Sweden)

    Scherer Stephen W

    2011-05-01

    Full Text Available Abstract Background Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. Results We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. Conclusions The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
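
    A simplified version of the idea, scanning a pathway graph for a connected group of genes with excess case CNVs, is sketched below. It is not the authors' statistic. Candidate clusters are assumed to be graph balls of radius 0 or 1 around each gene, the score is a Kulldorff-style Bernoulli log-likelihood ratio over CNV events, and significance comes from permuting case/control labels and recording the maximum score over all clusters, which keeps the overall false-positive rate in check despite overlapping clusters. The pathway and counts are hypothetical.

```python
import math
import random
from collections import deque


def ball(graph, center, radius):
    """Genes within `radius` edges of `center` in the pathway graph (BFS)."""
    seen = {center: 0}
    queue = deque([center])
    while queue:
        g = queue.popleft()
        if seen[g] == radius:
            continue
        for nb in graph.get(g, ()):
            if nb not in seen:
                seen[nb] = seen[g] + 1
                queue.append(nb)
    return frozenset(seen)


def xlogy(x, y):
    return x * math.log(y) if x > 0 else 0.0


def bernoulli_llr(c_in, n_in, c_tot, n_tot):
    """Log-likelihood ratio for a higher case fraction inside the cluster."""
    c_out, n_out = c_tot - c_in, n_tot - n_in
    if n_in == 0 or n_out == 0 or c_in / n_in <= c_out / n_out:
        return 0.0
    l1 = (xlogy(c_in, c_in / n_in) + xlogy(n_in - c_in, (n_in - c_in) / n_in)
          + xlogy(c_out, c_out / n_out) + xlogy(n_out - c_out, (n_out - c_out) / n_out))
    l0 = xlogy(c_tot, c_tot / n_tot) + xlogy(n_tot - c_tot, (n_tot - c_tot) / n_tot)
    return l1 - l0


def best_cluster(graph, events, max_radius=1):
    """events: list of (gene hit by a CNV, is_case). Returns (max LLR, cluster)."""
    clusters = {ball(graph, g, r) for g in graph for r in range(max_radius + 1)}
    c_tot = sum(1 for _, case in events if case)
    best = (0.0, frozenset())
    for cl in clusters:
        inside = [case for gene, case in events if gene in cl]
        llr = bernoulli_llr(sum(inside), len(inside), c_tot, len(events))
        if llr > best[0]:
            best = (llr, cl)
    return best


def scan_test(graph, events, max_radius=1, n_perm=199, seed=0):
    """Most likely cluster, its LLR and a permutation p-value (max statistic)."""
    obs_llr, obs_cluster = best_cluster(graph, events, max_radius)
    rng = random.Random(seed)
    genes = [g for g, _ in events]
    labels = [case for _, case in events]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        perm_llr, _ = best_cluster(graph, list(zip(genes, labels)), max_radius)
        exceed += perm_llr >= obs_llr
    return obs_cluster, obs_llr, (exceed + 1) / (n_perm + 1)


# Hypothetical linear pathway A-B-C-D-E-F with case-enriched CNVs in A and B.
PATHWAY = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
           "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]}
EVENTS = ([("A", True)] * 5 + [("B", True)] * 5
          + [(g, True) for g in "CDEF" for _ in range(2)]
          + [(g, False) for g in "CDEF" for _ in range(3)])
print(scan_test(PATHWAY, EVENTS, n_perm=99, seed=1))
```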

  9. An epigenetic switch involving overlapping fur and DNA methylation optimizes expression of a type VI secretion gene cluster.

    Directory of Open Access Journals (Sweden)

    Yannick R Brunet

    2011-07-01

    Type VI secretion systems (T6SS) are macromolecular machines of the cell envelope of Gram-negative bacteria responsible for bacterial killing and/or virulence towards different host cells. Here, we characterized the regulatory mechanism underlying expression of the enteroaggregative Escherichia coli sci1 T6SS gene cluster. We identified Fur as the main regulator of the sci1 cluster. A detailed analysis of the promoter region showed the presence of three GATC motifs, which are targets of the DNA adenine methylase Dam. Using a combination of reporter fusion, gel shift, and in vivo and in vitro Dam methylation assays, we dissected the regulatory roles of Fur and Dam-dependent methylation. We showed that sci1 gene cluster expression is under the control of an epigenetic switch that depends on methylation: Fur binding prevents methylation of a GATC motif, whereas methylation at this specific site decreases the affinity of Fur for its binding box. We propose a model in which the sci1 promoter is regulated by iron availability, adenine methylation, and DNA replication.

  10. Identifying resistance gene analogs associated with resistances to different pathogens in common bean.

    Science.gov (United States)

    López, Camilo E; Acosta, Iván F; Jara, Carlos; Pedraza, Fabio; Gaitán-Solís, Eliana; Gallego, Gerardo; Beebe, Steve; Tohme, Joe

    2003-01-01

    A polymerase chain reaction approach using degenerate primers that targeted the conserved domains of cloned plant disease resistance genes (R genes) was used to isolate a set of 15 resistance gene analogs (RGAs) from common bean (Phaseolus vulgaris). Eight different classes of RGAs were obtained from nucleotide binding site (NBS)-based primers and seven from previously undescribed Toll/Interleukin-1 receptor-like (TIR)-based primers. The putative amino acid sequences of the RGAs were significantly similar to those of R genes and contained additional conserved motifs. The NBS-type RGAs were classified into two subgroups according to the expected final residue in the kinase-2 motif. Eleven RGAs were mapped at 19 loci on eight linkage groups of the common bean genetic map constructed at Centro Internacional de Agricultura Tropical. Genetic linkage was shown between eight RGAs and partial resistance to anthracnose, angular leaf spot (ALS) and Bean golden yellow mosaic virus (BGYMV). RGA1 and RGA2 were associated with resistance loci for anthracnose and BGYMV and were part of two previously described clusters of R genes. A new major cluster was detected by RGA7; it explained up to 63.9% of resistance to ALS and may contribute to anthracnose resistance. These results show the usefulness of RGAs as candidate genes to detect and eventually isolate numerous R genes in common bean.
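
    Degenerate primers encode a pool of sequences through IUPAC ambiguity codes. The short Python sketch below expands such a primer and computes its degeneracy, meaning the number of distinct oligonucleotides in the pool. The actual primers used in the study are not given in the abstract, so the example strings are made up for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n

def expand(primer):
    """All concrete sequences in the degenerate primer pool."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]
```

    For example, `expand("RY")` yields `AC`, `AT`, `GC` and `GT`. Degeneracy grows multiplicatively with each ambiguous position, which is why degenerate primers are usually designed over short, highly conserved motifs.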

  11. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    Science.gov (United States)

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as K-Means limits their applicability in areas such as cancer subtype prediction using gene expression data, and makes it hard to compare their results sensibly with those of other algorithms. The non-determinism of K-Means stems from its random selection of data points as initial centroids. We propose an improved, density-based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select, as initial centroids, data points that belong to dense regions and are adequately separated in feature space. We compared the proposed algorithm with eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm used for cancer data classification, on ten cancer gene expression datasets. The proposed algorithm showed better overall performance than the others. There is a pressing need in the biomedical domain for simple, easy-to-use and more accurate machine learning tools for cancer subtype prediction. The proposed algorithm is simple, easy to use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
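
    The idea of deterministic, density-aware seeding can be sketched with NumPy. This is an assumption-laden illustration, not the published algorithm, because the abstract does not give the exact density or separation measures. Here, density is taken as the inverse mean distance to the `n_neighbors` nearest points. Each further seed maximizes density multiplied by the distance to the nearest seed already chosen. Standard Lloyd iterations follow. Function names are hypothetical.

```python
import numpy as np

def density_seeds(X, k, n_neighbors=5):
    """Deterministically pick k initial centroids from dense, well-separated points."""
    # Pairwise Euclidean distances (O(n^2) memory; acceptable for a sketch)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Density proxy: inverse mean distance to the n_neighbors nearest other points
    knn = np.sort(D, axis=1)[:, 1:n_neighbors + 1]
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    seeds = [int(np.argmax(density))]        # densest point first
    for _ in range(1, k):
        sep = D[:, seeds].min(axis=1)         # distance to nearest chosen seed
        score = density * sep                 # dense AND far from existing seeds
        score[seeds] = -np.inf
        seeds.append(int(np.argmax(score)))
    return X[seeds].astype(float)

def kmeans(X, k, n_iter=100):
    """Lloyd iterations started from the deterministic seeds (no randomness)."""
    C = density_seeds(X, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C
```

    Because no step draws random numbers, repeated runs on the same data return identical labels. That property is what makes results comparable across executions.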

  12. Cluster Analysis of an International Pressure Pain Threshold Database Identifies 4 Meaningful Subgroups of Adults With Mechanical Neck Pain

    DEFF Research Database (Denmark)

    Walton, David M; Kwok, Timothy S H; Mehta, Swati

    2017-01-01

    values taken at both a local and distal region (total N=1176). Minor systematic differences in mean PPDT values across cohorts necessitated z-transformation before analysis, and each cohort was split by sex. Latent profile analysis (LPA) using the k-means approach was undertaken...... predictor variables were evaluated for intracluster and cross-cluster significance. The low-low cluster was the most affected, as indicated by pain intensity, disability, and catastrophization scores all significantly above the cohort-specific and sex-specific mean, and active range of motion scores significantly...... to identify the most parsimonious set of PPDT-based phenotypes that were both statistically and clinically meaningful. RESULTS: LPA revealed 4 distinct clusters named according to PPDT levels at the local and distal zones: low-low PPDT (67%), mod-mod (25%), mod-high (4%), and high-high (4%). Secondary...
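
    The cohort- and sex-specific z-transformation described above can be sketched in Python. This version assumes that each value is standardized against the mean and population standard deviation of its own cohort-by-sex group. The study may have used the sample standard deviation instead. The field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def z_within_groups(rows, value_key, group_keys):
    """Standardize `value_key` separately within each combination of `group_keys`."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in group_keys)].append(r[value_key])
    # Population SD is an assumption; the study may have used the sample SD
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}
    z = []
    for r in rows:
        m, s = stats[tuple(r[k] for k in group_keys)]
        z.append((r[value_key] - m) / s if s > 0 else 0.0)
    return z
```

    After this step, a threshold that is "low" in one cohort is expressed on the same scale as in every other cohort. The pooled values can then be clustered together.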

  13. Gene expression profiling and candidate gene resequencing identifies pathways and mutations important for malignant transformation caused by leukemogenic fusion genes.

    Science.gov (United States)

    Novak, Rachel L; Harper, David P; Caudell, David; Slape, Christopher; Beachy, Sarah H; Aplan, Peter D

    2012-12-01

    NUP98-HOXD13 (NHD13) and CALM-AF10 (CA10) are oncogenic fusion proteins produced by recurrent chromosomal translocations in patients with acute myeloid leukemia (AML). Transgenic mice that express these fusions develop AML with a long latency and incomplete penetrance, suggesting that collaborating genetic events are required for leukemic transformation. We employed genetic techniques to identify both preleukemic abnormalities in healthy transgenic mice as well as collaborating events leading to leukemic transformation. Candidate gene resequencing revealed that 6 of 27 (22%) CA10 AMLs spontaneously acquired a Ras pathway mutation and 8 of 27 (30%) acquired an Flt3 mutation. Two CA10 AMLs acquired an Flt3 internal-tandem duplication, demonstrating that these mutations can be acquired in murine as well as human AML. Gene expression profiles revealed a marked upregulation of Hox genes, particularly Hoxa5, Hoxa9, and Hoxa10 in both NHD13 and CA10 mice. Furthermore, mir196b, which is embedded within the Hoxa locus, was overexpressed in both CA10 and NHD13 samples. In contrast, the Hox cofactors Meis1 and Pbx3 were differentially expressed; Meis1 was increased in CA10 AMLs but not NHD13 AMLs, whereas Pbx3 was consistently increased in NHD13 but not CA10 AMLs. Silencing of Pbx3 in NHD13 cells led to decreased proliferation, increased apoptosis, and decreased colony formation in vitro, suggesting a previously unexpected role for Pbx3 in leukemic transformation. Published by Elsevier Inc.

  14. NJ cluster analysis of the SnRK2, PYR/PYL/RCAR, and ABF genes in Tibetan hulless barley.

    Science.gov (United States)

    Yuan, H J; Wang, Y L; Wei, Z X; Xu, Q J; Zeng, X Q; Tang, Y W; Nyima, T S

    2016-11-03

    The abscisic acid (ABA) signaling pathway is known as one of the most important signaling pathways in plants and is mediated by multiple regulators. The SnRK2, PYR/PYL/RCAR, and ABF genes are relevant to both ABA-dependent and -independent signaling pathways. To elucidate the profile of these genes in Tibetan hulless barley (Hordeum vulgare L. var. nudum Hook. f.), we collected available sequences from RNA-Seq data, together with NCBI data from five other model plant species (Arabidopsis thaliana, Brachypodium distachyon, Oryza sativa, Populus trichocarpa, and Sorghum bicolor). Gene trees of SnRK2, PYR/PYL/RCAR, and ABF were constructed using the neighbor-joining (NJ) method. For all genes, we identified a dominant group in which all six species were represented. Three, four, and five groups were found in the NJ trees of SnRK2, PYR/PYL/RCAR, and ABF, respectively. For each gene, the Tibetan hulless barley sequences were divided into three groups. Our analyses indicated that Tibetan hulless barley was associated with B. distachyon. The NJ cluster analysis also suggested that Tibetan hulless barley was affiliated with S. bicolor (SnRK2), A. thaliana (PYR/PYL/RCAR), and O. sativa (ABF). These results illustrate diverse expression of the SnRK2, PYR/PYL/RCAR, and ABF genes and suggest a relationship among the six species studied. Collectively, our characterization of the three components of the ABA signaling pathway may contribute to improving stress tolerance in Tibetan hulless barley.
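
    The neighbor-joining step used to build these gene trees can be sketched as the textbook algorithm on a precomputed distance matrix. This is not the authors' pipeline. How sequence distances were computed is not stated in the abstract, so the matrix is simply an input here. The function returns the sequence of joins with their branch lengths.

```python
def neighbor_joining(dist, labels):
    """Textbook neighbor joining on a symmetric distance matrix.

    Returns (joins, final_edge); each join is (node_a, node_b, len_a, len_b),
    meaning node_a and node_b are merged into a new internal node."""
    D = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes, joins = list(labels), []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}
        # Q criterion: join the pair minimizing (n-2)*d(a,b) - r(a) - r(b)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        len_a = D[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = D[a][b] - len_a
        u = f"node{len(joins)}"
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                # Distance from the new internal node to every remaining node
                D[u][k] = D[k][u] = (D[a][k] + D[b][k] - D[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, len_a, len_b))
    a, b = nodes
    return joins, (a, b, D[a][b])
```

    Each iteration removes one node, so n sequences need n - 3 joins before the last three nodes are resolved. That gives O(n^3) time overall for this straightforward version.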

  15. Functional characterization of diverse ring-hydroxylating oxygenases and induction of complex aromatic catabolic gene clusters in Sphingobium sp. PNB

    Directory of Open Access Journals (Sweden)

    Pratick Khara

    2014-01-01

    Sphingobium sp. PNB, like other sphingomonads, has multiple ring-hydroxylating oxygenase (RHO) genes. Three different fosmid clones have been sequenced to identify the putative genes responsible for the degradation of various aromatics in this bacterial strain. Comparison of the map of the catabolic genes with that of different sphingomonads revealed a similar arrangement of gene clusters that harbors seven sets of RHO terminal components and a sole set of electron transport (ET) proteins. The presence of distinctly conserved amino acid residues in ferredoxin, and in silico molecular docking analyses of ferredoxin with the well-characterized terminal oxygenase components, indicated the structural uniqueness of the ET component in sphingomonads. The predicted substrate specificities, derived from the phylogenetic relationship of each of the RHOs, were examined based on the transformation of putative substrates and their structural homologs by recombinant strains expressing each of the oxygenases together with the sole set of available ET proteins. The RHO AhdA1bA2b was functionally characterized for the first time and was found to be capable of transforming ethylbenzene, propylbenzene, cumene, p-cymene and biphenyl, in addition to a number of polycyclic aromatic hydrocarbons. The overexpression of aromatic catabolic genes in strain PNB, revealed by real-time PCR analyses, is a step toward understanding the complex regulation of degradative genes in sphingomonads.

  16. Identification of an extensive gene cluster among a family of PPOs in Trifolium pratense L. (red clover) using a large insert BAC library.

    Science.gov (United States)

    Winters, Ana; Heywood, Sue; Farrar, Kerrie; Donnison, Iain; Thomas, Ann; Webb, K Judith

    2009-07-20

    Polyphenol oxidase (PPO) activity in plants is a trait with potential economic, agricultural and environmental impact. In the food industry, PPO-induced browning causes unacceptable discolouration in fruit and vegetables. From an agricultural perspective, PPO can protect plants against pathogens and environmental stress, and can improve ruminant growth by increasing nitrogen absorption and decreasing nitrogen loss to the environment through the animal's urine. The high-PPO legume red clover has a significant economic and environmental role in sustaining low-input organic and conventional farms. Molecular markers for a range of important agricultural traits are being developed for red clover, and improved knowledge of PPO genes and their structure will facilitate molecular breeding. A bacterial artificial chromosome (BAC) library comprising 26,016 BAC clones with an average insert size of 135 kb was constructed from Trifolium pratense L. (red clover), a diploid legume with a haploid genome size of 440-637 Mb. Library coverage of 6-8 genome equivalents ensured good representation of genes, and the library was screened for polyphenol oxidase (PPO) genes. Two single-copy PPO genes, PPO4 and PPO5, were identified, adding to a family of three previously reported paralogous genes (PPO1-PPO3). Multiple PPO1 copies were identified and characterised, revealing a subfamily comprising three variants: PPO1/2, PPO1/4 and PPO1/5. Six PPO genes clustered within the genome: four separate BAC clones could be assembled onto a single predicted BAC contig of 190-510 kb. The PPO gene family in red clover resides as a cluster of at least 6 genes. Three of these genes have high homology, suggesting a more recent evolutionary event. This PPO cluster covers a longer region of the genome than clusters detected in rice or previously reported in tomato. Full-length coding sequences from PPO4, PPO5, PPO1/5 and PPO1/4 will facilitate functional studies and provide genetic markers for plant breeding.
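
    The reported coverage can be checked from the library size and insert size quoted above. This back-of-the-envelope calculation assumes that every clone carries exactly the average 135 kb insert and that clones sample the genome uniformly at random. Coverage c is the total cloned DNA divided by the genome size G. The Clarke-Carbon formula then gives the probability P that a given locus is present in at least one of the N clones:

```latex
c = \frac{N \cdot I}{G} = \frac{26{,}016 \times 135\ \text{kb}}{G} \approx \frac{3{,}512\ \text{Mb}}{G},
\qquad
c_{G = 637\,\text{Mb}} \approx 5.5, \quad c_{G = 440\,\text{Mb}} \approx 8.0

P = 1 - \left(1 - \frac{I}{G}\right)^{N} \approx 1 - e^{-c}
\;\Rightarrow\; P \approx 0.996 \ (c = 5.5), \qquad P \approx 0.9997 \ (c = 8.0)
```

    The range of roughly 5.5 to 8.0 genome equivalents is consistent with the 6-8 reported above. It also implies that almost any single-copy gene, PPO genes included, should be represented in the library.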

  17. Diverse and Abundant Secondary Metabolism Biosynthetic Gene Clusters in the Genomes of Marine Sponge Derived Streptomyces spp. Isolates

    Directory of Open Access Journals (Sweden)

    Stephen A. Jackson

    2018-02-01

    The genus Streptomyces produces secondary metabolic compounds that are rich in biological activity. Many of these compounds are genetically encoded by large secondary metabolism biosynthetic gene clusters (smBGCs), such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS), which are modular and can be highly repetitive. Because of these repeats, such gene clusters can be difficult to resolve using short-read next-generation sequencing datasets and are often poorly predicted using standard approaches. We have sequenced the genomes of 13 Streptomyces spp. strains isolated from shallow-water and deep-sea sponges that display antimicrobial activities against a number of clinically relevant bacterial and yeast species. Draft genomes have been assembled, and smBGCs have been identified using the antiSMASH (antibiotics and Secondary Metabolite Analysis Shell) web platform. We have compared the smBGCs among strains in search of novel sequences conferring the potential to produce novel bioactive secondary metabolites. The strains in this study fall into four distinct clades within the genus Streptomyces. The marine strains host abundant smBGCs that encode polyketides, NRPS, siderophores, bacteriocins and lantipeptides. The deep-sea strains appear to be enriched with gene clusters encoding NRPS. Marine adaptations are evident in the sponge-derived strains, which are enriched for genes involved in the biosynthesis and transport of compatible solutes and for heat-shock proteins. Streptomyces spp. from marine environments are a promising source of novel bioactive secondary metabolites, as their smBGCs are abundant, diverse and show a high degree of novelty. Sponge-derived Streptomyces spp. isolates appear to display genomic adaptations to marine living when compared with terrestrial strains.

  18. Frequent long-range epigenetic silencing of protocadherin gene clusters on chromosome 5q31 in Wilms' tumor.

    Directory of Open Access Journals (Sweden)

    Anthony R Dallosso

    2009-11-01

    Wilms' tumor (WT) is a pediatric tumor of the kidney that arises via failure of the fetal developmental program. The absence of identifiable mutations in the majority of WTs suggests the frequent involvement of epigenetic aberrations in WT. We therefore conducted a genome-wide analysis of promoter hypermethylation in WTs and identified hypermethylation at chromosome 5q31 spanning 800 kilobases (kb) and more than 50 genes. The methylated genes all belong to the alpha-, beta-, and gamma-protocadherin (PCDH) gene clusters (Human Genome Organization nomenclature PCDHA@, PCDHB@, and PCDHG@, respectively). This demonstrates that long-range epigenetic silencing (LRES) occurs in developmental tumors as well as in adult tumors. Bisulfite polymerase chain reaction analysis showed that PCDH hypermethylation is a frequent event found in all Wilms' tumor subtypes. Hypermethylation is concordant with reduced PCDH expression in tumors. WT precursor lesions showed no PCDH hypermethylation, suggesting that de novo PCDH hypermethylation occurs during malignant progression. Discrete boundaries of the PCDH domain are delimited by abrupt changes in histone modifications; unmethylated genes flanking the LRES are associated with permissive marks, which are absent from methylated genes within the domain. Silenced genes are marked with non-permissive histone 3 lysine 9 dimethylation. Expression analysis of embryonic murine kidney and differentiating rat metanephric mesenchymal cells demonstrates that Pcdh expression is developmentally regulated and that Pcdhg@ genes are expressed in blastemal cells. Importantly, we show that PCDHs negatively regulate canonical Wnt signalling, as short-interfering RNA-induced reduction of PCDHG@-encoded proteins leads to elevated beta-catenin protein, increased beta-catenin/T-cell factor (TCF) reporter activity, and induction of Wnt target genes. Conversely, over-expression of PCDHs suppresses beta-catenin/TCF-reporter activity and also inhibits

  19. Sequencing and transcriptional analysis of the Streptococcus thermophilus histamine biosynthesis gene cluster: factors that affect differential hdcA expression

    DEFF Research Database (Denmark)

    Calles-Enríquez, Marina; Hjort, Benjamin Benn; Andersen, Pia Skov

    2010-01-01

    to produce histamine. The hdc clusters of S. thermophilus CHCC1524 and CHCC6483 were sequenced, and the factors that affect histamine biosynthesis and histidine-decarboxylating gene (hdcA) expression were studied. The hdc cluster began with the hdcA gene, was followed by a transporter (hdcP), and ended...... with the hdcB gene, which is of unknown function. The three genes were orientated in the same direction. The genetic organization of the hdc cluster showed a unique organization among the lactic acid bacterial group and resembled those of Staphylococcus and Clostridium species, thus indicating possible...... acquisition through a horizontal transfer mechanism. Transcriptional analysis of the hdc cluster revealed the existence of a polycistronic mRNA covering the three genes. The histidine-decarboxylating gene (hdcA) of S. thermophilus demonstrated maximum expression during the stationary growth phase, with high...

  20. Identifying the optimal gene and gene set in hepatocellular carcinoma based on differential expression and differential co-expression algorithm.

    Science.gov (United States)

    Dong, Li-Yang; Zhou, Wei-Zhong; Ni, Jun-Wei; Xiang, Wei; Hu, Wen-Hao; Yu, Chang; Li, Hai-Yan

    2017-02-01

    The objective of this study was to identify the optimal gene and gene set for hepatocellular carcinoma (HCC) using the differential expression and differential co-expression (DEDC) algorithm. The DEDC algorithm consists of four parts:
    1. calculating differential expression (DE) as the absolute t-value from t-statistics;
    2. computing differential co-expression (DC) based on a Z-test;
    3. determining optimal thresholds by chi-squared (χ2) maximization, with the corresponding gene taken as the optimal gene;
    4. evaluating the functional relevance of genes categorized into different partitions to determine the optimal gene set, namely the one with the highest mean minimum functional information (FI) gain (Δ*G).
    The optimal thresholds divided genes into four partitions: high DE and high DC (HDE-HDC), high DE and low DC (HDE-LDC), low DE and high DC (LDE-HDC), and low DE and low DC (LDE-LDC). In addition, the optimal gene was validated by a reverse transcription-polymerase chain reaction (RT-PCR) assay. The optimal thresholds for DC and DE were 1.032 and 1.911, respectively. Using the optimal gene, the genes were divided into four partitions: HDE-HDC (2,053 genes), HDE-LDC (2,822 genes), LDE-HDC (2,622 genes), and LDE-LDC (6,169 genes). The optimal gene was microtubule-associated protein RP/EB family member 1 (MAPRE1), and the RT-PCR assay validated its significant difference between the HCC and normal states. The optimal gene set was nucleoside metabolic process (GO:0009116), with Δ*G = 18.681 and 24 HDE-HDC partitions in total. In conclusion, we identified the optimal gene, MAPRE1, and the optimal gene set, nucleoside metabolic process, which may be potential biomarkers for targeted therapy and provide significant insight into the pathological mechanism underlying HCC.
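
    The first two DEDC steps and the four-way partition can be sketched with NumPy under explicit assumptions. The abstract does not say which t-test or which Z-test was used. This sketch therefore assumes Welch's t statistic for DE. For DC, it assumes a Fisher r-to-z comparison of each gene's correlations with every other gene between the two conditions, averaged per gene. The χ2 threshold search and the functional-information step are not reproduced. Because the thresholds passed to `partition` are simply parameters here, the study's values of 1.911 and 1.032 only carry meaning on the study's own DE and DC scales. Function names are hypothetical.

```python
import numpy as np

def de_scores(case, ctrl):
    """|t| per gene from Welch's two-sample t statistic (rows = genes, cols = samples)."""
    m1, m2 = case.mean(axis=1), ctrl.mean(axis=1)
    v1, v2 = case.var(axis=1, ddof=1), ctrl.var(axis=1, ddof=1)
    t = (m1 - m2) / np.sqrt(v1 / case.shape[1] + v2 / ctrl.shape[1])
    return np.abs(t)

def dc_scores(case, ctrl):
    """Mean |Z| per gene, comparing its correlations across conditions (Fisher r-to-z)."""
    eps = 1e-6  # keep arctanh finite on the diagonal (r = 1)
    z1 = np.arctanh(np.clip(np.corrcoef(case), -1 + eps, 1 - eps))
    z2 = np.arctanh(np.clip(np.corrcoef(ctrl), -1 + eps, 1 - eps))
    se = np.sqrt(1.0 / (case.shape[1] - 3) + 1.0 / (ctrl.shape[1] - 3))
    z = np.abs(z1 - z2) / se
    np.fill_diagonal(z, 0.0)
    return z.sum(axis=1) / (z.shape[0] - 1)

def partition(de, dc, de_thr, dc_thr):
    """Assign each gene to HDE-HDC, HDE-LDC, LDE-HDC or LDE-LDC."""
    return [("HDE" if d > de_thr else "LDE") + "-" + ("HDC" if c > dc_thr else "LDC")
            for d, c in zip(de, dc)]
```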

  1. QTL Mapping and CRISPR/Cas9 Editing to Identify a Drug Resistance Gene in Toxoplasma gondii.

    Science.gov (United States)

    Shen, Bang; Powell, Robin H; Behnke, Michael S

    2017-06-22

    Scientific knowledge is intrinsically linked to available technologies and methods. This article presents two methods that allowed for the identification and verification of a drug resistance gene in the Apicomplexan parasite Toxoplasma gondii: Quantitative Trait Locus (QTL) mapping using a Whole Genome Sequence (WGS)-based genetic map, and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9-based gene editing. QTL mapping tests whether there is a correlation between one or more genomic regions and a phenotype. Two datasets are required to run a QTL scan: a genetic map based on the progeny of a recombinant cross, and a quantifiable phenotype assessed in each of the progeny of that cross. These datasets are then formatted to be compatible with the R/qtl software, which generates a QTL scan to identify significant loci correlated with the phenotype. Although this can greatly narrow the search window of possible candidates, QTLs span regions containing a number of genes, from which the causal gene still needs to be identified. Having WGS of the progeny was critical to identify the causal drug resistance mutation at the gene level. Once identified, the candidate mutation can be verified by genetic manipulation of drug-sensitive parasites. The most facile and efficient method to genetically modify T. gondii is the CRISPR/Cas9 system. This system comprises just two components, both encoded on a single plasmid: a single guide RNA (gRNA) containing a 20 bp sequence complementary to the genomic target, and the Cas9 endonuclease, which generates a double-strand DNA break (DSB) at the target. Repair of the break allows for insertion or deletion of sequences around the break site. This article provides detailed protocols for using CRISPR/Cas9-based genome editing tools to verify the gene responsible for sinefungin resistance and to construct transgenic parasites.
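
    Choosing the 20 bp gRNA target can be sketched as a scan of both DNA strands for 20-nt protospacers followed by a PAM. This sketch assumes the widely used Streptococcus pyogenes Cas9, which requires an NGG PAM immediately 3' of the protospacer. The abstract does not state the Cas9 variant. Off-target scoring, which real guide-design tools perform, is omitted, and function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def find_protospacers(seq, length=20):
    """Return (strand, start, protospacer) for every `length`-nt site followed by NGG.

    `start` is the 0-based position of the protospacer on the forward strand."""
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            start = m.start() if strand == "+" else len(s) - m.start() - length
            hits.append((strand, start, m.group(1)))
    return hits
```

    Each reported protospacer would be the 20-nt sequence cloned into the gRNA. Cas9 then cuts about 3 bp upstream of the PAM, near where the insertion or deletion would be introduced.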

  2. LGscore: A method to identify disease-related genes using biological literature and Google data.

    Science.gov (United States)

    Kim, Jeongwoo; Kim, Hyunjin; Yoon, Youngmi; Park, Sanghyun

    2015-04-01

    Since the genome project in the 1990s, many studies of genes have been conducted, and researchers have confirmed that genes are involved in disease. For this reason, identifying the relationships between diseases and genes is important in biology. We propose a method called LGscore, which identifies disease-related genes using Google data and literature data. To implement this method, we first construct a disease-related gene network using text-mining results. We then extract gene-gene interactions based on co-occurrences in abstract data obtained from PubMed, and calculate the weights of edges in the gene network by means of Z-scoring. The weights are based on two values: the co-occurrence frequency, extracted from the literature data, and the number of Google search results. We assign a score to each gene through a network analysis, assuming that genes with many links, numerous Google search results and high frequency values are more likely to be involved in disease. For validation, we investigated the top 20 inferred genes for five different diseases using answer sets comprising six databases that contain information on disease-gene relationships. We identified a significant number of disease-related genes as well as candidate genes for Alzheimer's disease, diabetes, colon cancer, lung cancer, and prostate cancer. Our method was up to 40% more accurate than existing methods. Copyright © 2015 Elsevier Inc. All rights reserved.
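
    The scoring idea can be sketched in Python. Everything below is a hypothetical illustration, not the published LGscore formula. The abstract does not say how the two Z-scores are combined or which network measure is used. This sketch therefore assumes that each edge weight averages the standard-normal CDF of the frequency Z-score and of the search-hit Z-score. A gene's score is the sum of its edge weights (weighted degree). Search-hit counts are supplied as a plain dict rather than queried from Google.

```python
from collections import Counter, defaultdict
from itertools import combinations
from statistics import NormalDist, mean, pstdev

def cooccurrence(abstract_genes):
    """Count how many abstracts mention each unordered (sorted) gene pair."""
    counts = Counter()
    for genes in abstract_genes:
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    return counts

def zscores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def gene_scores(abstract_genes, search_hits):
    """Weighted degree; each edge weight lies in (0, 1), so more links raise the score."""
    freq = cooccurrence(abstract_genes)
    edges = list(freq)
    zf = zscores([freq[e] for e in edges])
    zg = zscores([search_hits.get(e, 0) for e in edges])
    cdf = NormalDist().cdf
    scores = defaultdict(float)
    for (a, b), f, g in zip(edges, zf, zg):
        w = (cdf(f) + cdf(g)) / 2  # hypothetical combination of the two Z-scores
        scores[a] += w
        scores[b] += w
    return dict(scores)
```

    Mapping Z-scores through the normal CDF keeps every edge weight positive. As a result, a gene never loses score by gaining a link, which matches the stated assumption that well-connected genes are more likely to be disease-related.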

  3. Using cluster analysis to identify patterns in students’ responses to contextually different conceptual problems

    Directory of Open Access Journals (Sweden)

    John Stewart

    2012-10-01

    This study examined the evolution of student responses to seven contextually different versions of two Force Concept Inventory questions in an introductory physics course at the University of Arkansas. The consistency in answering the closely related questions evolved little over the seven-question exam. A model for the state of student knowledge involving the probability of selecting one of the multiple-choice answers was developed. Criteria for using clustering algorithms to extract model parameters were explored, and the overlap between the probability distributions of the model vectors was found to be an important parameter in characterizing the cluster models. The course data were then clustered, and the extracted model showed that students largely fit into two groups both pre- and postinstruction: one that answered all questions correctly with high probability and one that selected the distracter representing the same misconception with high probability. For the course studied, 14% of the students were left with persistent misconceptions postinstruction on a static force problem and 30% on a dynamic Newton’s third law problem. These students selected the answer representing the predominant misconception slightly more consistently postinstruction, indicating that the course studied had been ineffective at moving this subgroup of students nearer a Newtonian force concept and had instead moved them slightly farther away from a correct conceptual understanding of these two problems. The consistency in answering pairs of problems with varied physical contexts is shown to be an important supplementary statistic to the score on the problems and suggests that including such problem pairs in future conceptual inventories would be efficacious. Multiple, contextually varied questions further probe the structure of students’ knowledge. To allow working instructors to make use of the additional insight gained from cluster analysis, it

  4. Rearranged Biosynthetic Gene Cluster and Synthesis of Hassallidin E in Planktothrix serta PCC 8927.

    Science.gov (United States)

    Pancrace, Claire; Jokela, Jouni; Sassoon, Nathalie; Ganneau, Christelle; Desnos-Ollivier, Marie; Wahlsten, Matti; Humisto, Anu; Calteau, Alexandra; Bay, Sylvie; Fewer, David P; Sivonen, Kaarina; Gugger, Muriel

    2017-07-21

    Cyanobacteria produce a wide range of natural products with antifungal bioactivity. The cyclic glycosylated lipopeptides of the hassallidin family have potent antifungal activity and display a great degree of chemical diversity. Here, we report the discovery of a hassallidin biosynthetic gene cluster in the filamentous cyanobacterium Planktothrix serta PCC 8927. The hassallidin gene cluster showed heavy rearrangement and marks of genomic plasticity. Nucleotide bias, differences in GC content, and phylogenetic incongruence suggested that Planktothrix serta PCC 8927 acquired the hassallidin biosynthetic gene cluster by horizontal gene transfer. Chemical analyses by liquid chromatography and mass spectrometry demonstrated that this strain produced hassallidin E, a new glycosylated hassallidin variant. Hassallidin E was the only structural variant produced by Planktothrix serta PCC 8927 under all tested conditions. When further evaluated against human pathogenic fungi, hassallidin E showed antifungal activity. Hassallidin production levels correlated with nitrogen availability in this strain, the only nitrogen-fixing Planktothrix described so far. Our results provide insights into the distribution and chemical diversity of cyanobacterial antifungal compounds and raise questions about their ecological relevance.

  5. Using cluster analysis of anxiety-depression to identify subgroups of prostate cancer patients for targeted treatment planning.

    Science.gov (United States)

    Sharpley, Christopher F; Bitsika, Vicki; Warren, Amelia K; Christie, David R H

    2017-11-01

    To explore any possible subgroupings of prostate cancer (PCa) patients based upon their combined anxiety-depression symptoms for the purposes of informing targeted treatments. A sample of 119 PCa patients completed the GAD7 (anxiety) and PHQ9 (depression), plus a background questionnaire, by mail survey. Data on the GAD7 and PHQ9 were used in a cluster analysis procedure to identify and define any cohesive subgroupings of patients within the sample. Three distinct clusters of patients were identified and were found to be significantly different in the severity of their GAD7 and PHQ9 responses, and also by the profile of symptoms that they exhibited. The presence of these 3 clusters of PCa patients indicates that there is a need to extend assessment of anxiety and depression in these men beyond simple total score results. By applying the clustering profiles to samples of PCa patients, more focussed treatment might be provided to them, hopefully improving outcome efficacy. Copyright © 2017 John Wiley & Sons, Ltd.

  6. Haplotype diversity of VvTFL1A gene and association with cluster traits in grapevine (V. vinifera).

    Science.gov (United States)

    Fernandez, Lucie; Le Cunff, Loïc; Tello, Javier; Lacombe, Thierry; Boursiquot, Jean Michel; Fournier-Level, Alexandre; Bravo, Gema; Lalet, Sandrine; Torregrosa, Laurent; This, Patrice; Martinez-Zapater, José Miguel

    2014-08-05

    Interaction between TERMINAL FLOWER 1 (TFL1) and LEAFY (LFY) seems to determine inflorescence architecture in Arabidopsis. Similarly, overexpression of VvTFL1A, a grapevine TFL1 homolog, causes delayed flowering and production of a ramose cluster in the reiterated reproductive meristem (RRM) somatic variant of cultivar Carignan. To analyze the possible contribution of this gene to cluster phenotypic variation in a diversity panel of cultivated grapevine (Vitis vinifera L. subsp. vinifera), its nucleotide diversity was characterized. Association analyses were also carried out between the detected sequence polymorphisms and phenology and cluster traits. A total of 3.6 kb of the VvTFL1A gene, including its promoter, was sequenced in a core collection of 140 individuals designed to maximize phenotypic variation in agronomically relevant traits. Nucleotide variation for VvTFL1A within this collection was higher in the promoter and intron sequences than in the exon regions, where few polymorphisms were located, in agreement with high conservation of the coding sequence. Characterization of the VvTFL1A haplotype network identified three major haplogroups, consistent with the geographic origins and uses of the cultivars. Based on the existence of mutations in linkage disequilibrium (LD), these haplogroups could correspond to three major ancestral alleles or evolutionary branches. Genetic association studies with cluster traits revealed the presence of major INDEL polymorphisms that also structure the three haplogroups. These polymorphisms explain 16%, 13% and 25% of the variation in flowering time, cluster width and berry weight, respectively. At least three major VvTFL1A haplogroups are present in cultivated grapevines; they are defined by three main polymorphism LD blocks and associated with characteristic phenotypic values for flowering time, cluster width and berry size. Phenotypic differences between haplogroups are consistent with differences observed between Eastern and Western grapevine cultivars and

  7. Conserved gene clusters in bacterial genomes provide further support for the primacy of RNA

    Science.gov (United States)

    Siefert, J. L.; Martin, K. A.; Abdi, F.; Widger, W. R.; Fox, G. E.

    1997-01-01

    Five complete bacterial genome sequences have been released to the scientific community. These include four (eu)Bacteria, Haemophilus influenzae, Mycoplasma genitalium, M. pneumoniae, and Synechocystis PCC 6803, as well as one Archaeon, Methanococcus jannaschii. Features of organization shared by these genomes are likely to have arisen very early in the history of the bacteria and thus can be expected to provide further insight into the nature of early ancestors. Results of a genome comparison of these five organisms confirm earlier observations that gene order is remarkably unpreserved. There are, nevertheless, at least 16 clusters of two or more genes whose order remains the same among the four (eu)Bacteria and these are presumed to reflect conserved elements of coordinated gene expression that require gene proximity. Eight of these gene orders are essentially conserved in the Archaea as well. Many of these clusters are known to be regulated by RNA-level mechanisms in Escherichia coli, which supports the earlier suggestion that this type of regulation of gene expression may have arisen very early. We conclude that although the last common ancestor may have had a DNA genome, it likely was preceded by progenotes with an RNA genome.
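    The core comparison here, finding genes that remain neighbours in every genome even though overall gene order is poorly preserved, can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to an ordered list of orthologous gene labels, treats genomes as linear (bacterial chromosomes are usually circular) and ignores strand. The function names and the gene orders are hypothetical toy data, not the authors' procedure.

```python
def adjacent_pairs(genome):
    """Unordered pairs of neighbouring genes in one linear gene-order list."""
    return {frozenset(pair) for pair in zip(genome, genome[1:]) if pair[0] != pair[1]}


def conserved_adjacencies(genomes):
    """Gene pairs that are adjacent in every genome, in either orientation."""
    shared = set.intersection(*(adjacent_pairs(g) for g in genomes))
    return sorted(tuple(sorted(pair)) for pair in shared)


# Toy gene orders (hypothetical, for illustration only).
genomes = [
    ["rpsJ", "rplC", "rplD", "rplW", "ftsZ", "murG"],
    ["ftsZ", "murG", "recA", "rpsJ", "rplC", "rplD"],
    ["rplD", "rplC", "rpsJ", "gyrB", "murG", "ftsZ"],
]

print(conserved_adjacencies(genomes))
# [('ftsZ', 'murG'), ('rplC', 'rplD'), ('rplC', 'rpsJ')]
```

    Chaining overlapping pairs such as rplC-rplD and rplC-rpsJ yields a three-gene block, which is the kind of multi-gene cluster whose conserved order the abstract counts.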

  8. QServer: a biclustering server for prediction and assessment of co-expressed gene clusters.

    Directory of Open Access Journals (Sweden)

    Fengfeng Zhou

    BACKGROUND: Biclustering is a powerful technique for identification of co-expressed gene groups under any (unspecified) substantial subset of given experimental conditions, which can be used for elucidation of transcriptionally co-regulated genes. RESULTS: We have previously developed a biclustering algorithm, QUBIC, which can solve more general biclustering problems than previous biclustering algorithms. To fully utilize the analysis power the algorithm provides, we have developed a web server, QServer, for prediction, computational validation and analyses of co-expressed gene clusters. Specifically, the QServer has the following capabilities in addition to biclustering by QUBIC: (i) prediction and assessment of conserved cis regulatory motifs in promoter sequences of the predicted co-expressed genes; (ii) functional enrichment analyses of the predicted co-expressed gene clusters using Gene Ontology (GO) terms, and (iii) visualization capabilities in support of interactive biclustering analyses. QServer supports the biclustering and functional analysis for a wide range of organisms, including human, mouse, Arabidopsis, bacteria and archaea, whose underlying genome database will be continuously updated. CONCLUSION: We believe that QServer provides an easy-to-use and highly effective platform useful for hypothesis formulation and testing related to transcription co-regulation.

  9. QServer: a biclustering server for prediction and assessment of co-expressed gene clusters.

    Science.gov (United States)

    Zhou, Fengfeng; Ma, Qin; Li, Guojun; Xu, Ying

    2012-01-01

    Biclustering is a powerful technique for identification of co-expressed gene groups under any (unspecified) substantial subset of given experimental conditions, which can be used for elucidation of transcriptionally co-regulated genes. We have previously developed a biclustering algorithm, QUBIC, which can solve more general biclustering problems than previous biclustering algorithms. To fully utilize the analysis power the algorithm provides, we have developed a web server, QServer, for prediction, computational validation and analyses of co-expressed gene clusters. Specifically, the QServer has the following capabilities in addition to biclustering by QUBIC: (i) prediction and assessment of conserved cis regulatory motifs in promoter sequences of the predicted co-expressed genes; (ii) functional enrichment analyses of the predicted co-expressed gene clusters using Gene Ontology (GO) terms, and (iii) visualization capabilities in support of interactive biclustering analyses. QServer supports the biclustering and functional analysis for a wide range of organisms, including human, mouse, Arabidopsis, bacteria and archaea, whose underlying genome database will be continuously updated. We believe that QServer provides an easy-to-use and highly effective platform useful for hypothesis formulation and testing related to transcription co-regulation.

  10. Spatial expression of Hox cluster genes in the ontogeny of a sea urchin

    Science.gov (United States)

    Arenas-Mena, C.; Cameron, A. R.; Davidson, E. H.

    2000-01-01

    The Hox cluster of the sea urchin Strongylocentrotus purpuratus contains ten genes in a 500 kb span of the genome. Only two of these genes are expressed during embryogenesis, while all eight genes tested are expressed during development of the adult body plan in the larval stage. We report the spatial expression during larval development of the five 'posterior' genes of the cluster: SpHox7, SpHox8, SpHox9/10, SpHox11/13a and SpHox11/13b. The five genes exhibit a dynamic, largely mesodermal program of expression. Only SpHox7 displays extensive expression within the pentameral rudiment itself. A spatially sequential and colinear arrangement of expression domains is found in the somatocoels, the paired posterior mesodermal structures that will become the adult perivisceral coeloms. No such sequential expression pattern is observed in endodermal, epidermal or neural tissues of either the larva or the presumptive juvenile sea urchin. The spatial expression patterns of the Hox genes illuminate the evolutionary process by which the pentameral echinoderm body plan emerged from a bilateral ancestor.

  11. Network Diffusion-Based Prioritization of Autism Risk Genes Identifies Significantly Connected Gene Modules

    Directory of Open Access Journals (Sweden)

    Ettore Mosca

    2017-09-01

    Autism spectrum disorder (ASD) is marked by strong genetic heterogeneity, which is underscored by the low overlap between ASD risk gene lists proposed in different studies. In this context, molecular networks can be used to analyze the results of several genome-wide studies in order to highlight those network regions harboring genetic variations associated with ASD, the so-called “disease modules.” In this work, we used a recent network diffusion-based approach to jointly analyze multiple ASD risk gene lists. We defined genome-scale prioritizations of human genes in relation to ASD genes from multiple studies, found significantly connected gene modules associated with ASD and predicted genes functionally related to ASD risk genes. Most of them play a role in synaptic and neuronal development and function; many are related to syndromes that can be comorbid with ASD, and the remaining ones are involved in epigenetics, cell cycle, cell adhesion and cancer.
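    Network diffusion in this sense spreads scores from known risk genes ("seeds") to their network neighbours. The minimal Python/NumPy sketch below shows one common formulation: iterative propagation over a symmetrically normalised adjacency matrix with a restart weight `alpha`. This is an assumed generic variant, not necessarily the exact algorithm used in the study, and the five-gene path network is a toy example.

```python
import numpy as np


def network_diffusion(adj, seeds, alpha=0.7, tol=1e-10, max_iter=1000):
    """Propagate seed scores: x <- (1 - alpha) * x0 + alpha * W @ x.

    W = D^-1/2 A D^-1/2 is the symmetrically normalised adjacency matrix.
    Because alpha < 1 and W has spectral radius <= 1, the iteration converges.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    w = inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    x0 = np.asarray(seeds, dtype=float)
    x = x0.copy()
    for _ in range(max_iter):
        x_new = (1 - alpha) * x0 + alpha * (w @ x)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x


# Toy path network 0-1-2-3-4 with gene 0 as the only seed.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1
scores = network_diffusion(adj, seeds=[1, 0, 0, 0, 0])
print(np.round(scores, 3))
```

    In this toy network the scores fall off with distance from the seed; genes that accumulate high scores from several seed lists would be candidates for the connected "disease modules" the abstract describes.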

  12. IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-standing void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to

  13. De Novo assembly of the Japanese flounder (Paralichthys olivaceus) spleen transcriptome to identify putative genes involved in immunity.

    Directory of Open Access Journals (Sweden)

    Lin Huang

    Japanese flounder (Paralichthys olivaceus) is an economically important marine fish in Asia and has suffered from disease outbreaks caused by various pathogens, which calls for more information on its immune-relevant genes at the genomic level. However, genomic and transcriptomic data for Japanese flounder remain scarce, which limits studies on the immune system of this species. In this study, we characterized the Japanese flounder spleen transcriptome using an Illumina paired-end sequencing platform to identify putative genes involved in immunity. A cDNA library from the spleen of P. olivaceus was constructed and randomly sequenced using an Illumina technique. The removal of low-quality reads generated 12,196,968 trimmed reads, which assembled into 96,627 unigenes. A total of 21,391 unigenes (22.14%) were annotated in the NCBI Nr database, and only 1.1% of the BLASTx top hits matched P. olivaceus protein sequences. Approximately 12,503 (58.45%) unigenes were categorized into three Gene Ontology groups, 19,547 (91.38%) were classified into 26 Cluster of Orthologous Groups categories, and 10,649 (49.78%) were assigned to six Kyoto Encyclopedia of Genes and Genomes pathways. Furthermore, 40,928 putative simple sequence repeats (SSRs) and 47,362 putative single nucleotide polymorphisms (SNPs) were identified. Importantly, we identified 1,563 putative immune-associated unigenes that mapped to 15 immune signaling pathways. The P. olivaceus transcriptome data provide a rich source to discover and identify new genes, and the immune-relevant sequences identified here will facilitate our understanding of the mechanisms involved in the immune response. Furthermore, the plentiful potential SSRs and SNPs found in this study are important resources for the future development of a linkage map or marker-assisted breeding programs for the flounder.
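    The annotation percentages are easy to misread: the 22.14% figure is relative to all 96,627 unigenes, whereas the Gene Ontology, COG and KEGG percentages are relative to the 21,391 Nr-annotated unigenes. A short Python check of the reported counts makes the denominators explicit; the `pct` helper and the constant names are illustrative only.

```python
TOTAL_UNIGENES = 96_627
NR_ANNOTATED = 21_391
CATEGORY_COUNTS = {"GO": 12_503, "COG": 19_547, "KEGG": 10_649}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


print("Nr-annotated:", pct(NR_ANNOTATED, TOTAL_UNIGENES))  # Nr-annotated: 22.14
for name, count in CATEGORY_COUNTS.items():
    # The reported figures match only when the denominator is the annotated subset.
    print(name, pct(count, NR_ANNOTATED))
```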

  14. Gametogenesis in the Pacific Oyster Crassostrea gigas: A Microarrays-Based Analysis Identifies Sex and Stage Specific Genes

    Science.gov (United States)

    Dheilly, Nolwenn M.; Lelong, Christophe; Huvet, Arnaud; Kellner, Kristell; Dubos, Marie-Pierre; Riviere, Guillaume; Boudry, Pierre; Favrel, Pascal

    2012-01-01

    Background: The Pacific oyster Crassostrea gigas (Mollusca, Lophotrochozoa) is an alternative and irregular protandrous hermaphrodite: most individuals mature first as males and then change sex several times. Little is known about the genetic and phenotypic basis of sex differentiation in oysters or about the molecular pathways regulating reproduction. We have recently developed and validated a microarray containing 31,918 oligomers (Dheilly et al., 2011) representing the oyster transcriptome. The application of this microarray to the study of mollusk gametogenesis should provide a better understanding of the key factors involved in sex differentiation and the regulation of oyster reproduction. Methodology/Principal Findings: Gene expression was studied in gonads of oysters cultured over a yearly reproductive cycle. Principal component analysis and hierarchical clustering showed a significant divergence in gene expression patterns of males and females coinciding with the start of gonial mitosis. ANOVA of the data revealed 2,482 genes differentially expressed during the course of male and/or female gametogenesis. The expression of 434 genes could be localized in either germ cells or somatic cells of the gonad by comparing the transcriptome of female gonads to the transcriptome of stripped oocytes and somatic tissues. Analysis of the annotated genes revealed conserved molecular mechanisms between mollusks and mammals: genes involved in chromatin condensation, DNA replication and repair, mitosis and meiosis regulation, transcription, translation and apoptosis were expressed in both male and female gonads. Most interestingly, early-expressed male-specific genes included bindin and a dpy-30 homolog, and female-specific genes included foxL2, nanos homolog 3, a pancreatic lipase-related protein, cd63 and vitellogenin. Further functional analyses are now required to investigate their role in sex differentiation in oysters. Conclusions

  15. A strategy to identify housekeeping genes suitable for analysis in breast cancer diseases.

    Science.gov (United States)

    Tilli, Tatiana M; Castro, Cláudio da Silva; Tuszynski, Jack A; Carels, Nicolas

    2016-08-15

    The selection of suitable internal control genes is crucial for proper interpretation of real-time PCR data. Here we outline a strategy to identify housekeeping genes that could serve as suitable internal controls for comparative analyses of gene expression data in breast cancer cell lines and tissues obtained by high-throughput sequencing and quantitative real-time PCR (qRT-PCR). The proposed strategy includes large-scale screening of potential candidate reference genes from RNA-seq data, their validation by qRT-PCR, and careful examination of reference data from the International Cancer Genome Consortium, The Cancer Genome Atlas and Gene Expression Omnibus repositories. The identified set of reference genes (termed novel housekeeping genes), which includes CCSER2, SYMPK, ANKRD17 and PUM1, proved to be less variable, and thus potentially more accurate for research and clinical analyses of breast cell lines and tissue samples, than the traditional housekeeping genes used for this purpose. These results highlight the importance of a large-scale evaluation of housekeeping genes for their relevance as internal controls for optimized intra- and inter-assay comparison of gene expression. We developed a strategy to identify and evaluate the significance of housekeeping genes as internal controls for the intra- and inter-assay comparison of gene expression in breast cancer that could be applied to other tumor types and diseases.
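    The large-scale screening step amounts to ranking candidate genes by how little their expression varies across samples. The Python/NumPy sketch below uses the coefficient of variation (standard deviation divided by mean) with a minimum-expression filter. This stability metric, the threshold, the gene labels and the toy matrix are assumptions for illustration, not necessarily the criteria applied in the study.

```python
import numpy as np


def rank_by_stability(expr, gene_names, min_mean=1.0):
    """Rank genes (rows of `expr`) from most to least stable across samples.

    Genes whose mean expression is below `min_mean` are dropped, because a
    coefficient of variation is unreliable for barely expressed genes.
    """
    expr = np.asarray(expr, dtype=float)
    means = expr.mean(axis=1)
    sds = expr.std(axis=1, ddof=1)
    keep = means >= min_mean
    cv = np.full(len(gene_names), np.inf)
    cv[keep] = sds[keep] / means[keep]
    return [(gene_names[i], float(cv[i])) for i in np.argsort(cv) if keep[i]]


# Toy expression values (e.g. TPM) for three hypothetical candidates, 4 samples.
genes = ["candidate_A", "candidate_B", "candidate_C"]
expr = [
    [10.0, 10.5, 9.5, 10.0],  # steady across samples
    [2.0, 20.0, 5.0, 15.0],   # highly variable
    [0.1, 0.2, 0.1, 0.3],     # below the expression filter
]
for name, cv in rank_by_stability(expr, genes):
    print(f"{name}: CV = {cv:.3f}")
```

    Candidates that rank well on such a screen would still need the qRT-PCR validation and cross-repository checks the abstract describes.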

  16. Zebrafish embryo screen for mycobacterial genes involved in the initiation of granuloma formation reveals a newly identified ESX-1 component.

    Science.gov (United States)

    Stoop, Esther J M; Schipper, Tim; Rosendahl Huber, Sietske K; Nezhinsky, Alexander E; Verbeek, Fons J; Gurcha, Sudagar S; Besra, Gurdyal S; Vandenbroucke-Grauls, Christina M J E; Bitter, Wilbert; van der Sar, Astrid M

    2011-07-01

    The hallmark of tuberculosis (TB) is the formation of granulomas, which are clusters of infected macrophages surrounded by additional macrophages, neutrophils and lymphocytes. Although it has long been thought that granulomas are beneficial for the host, there is evidence that mycobacteria also promote the formation of these structures. In this study, we aimed to identify new mycobacterial factors involved in the initial stages of granuloma formation. We exploited the zebrafish embryo Mycobacterium marinum infection model to study initiation of granuloma formation and developed an in vivo screen to select for random M. marinum mutants that were unable to induce granuloma formation efficiently. Upon screening 200 mutants, three mutants repeatedly initiated reduced granuloma formation. One of the mutants was found to be defective in the espL gene, which is located in the ESX-1 cluster. The ESX-1 cluster is disrupted in the Mycobacterium bovis BCG vaccine strain and encodes a specialized secretion system known to be important for granuloma formation and virulence. Although espL has not been implicated in protein secretion before, we observed a strong effect on the secretion of the ESX-1 substrates ESAT-6 and EspE. We conclude that our zebrafish embryo M. marinum screen is a useful tool to identify mycobacterial genes involved in the initial stages of granuloma formation and that we have identified a new component of the ESX-1 secretion system. We are confident that our approach will contribute to the knowledge of mycobacterial virulence and could be helpful for the development of new TB vaccines.

  17. Zebrafish embryo screen for mycobacterial genes involved in the initiation of granuloma formation reveals a newly identified ESX-1 component

    Directory of Open Access Journals (Sweden)

    Esther J. M. Stoop

    2011-07-01

    The hallmark of tuberculosis (TB) is the formation of granulomas, which are clusters of infected macrophages surrounded by additional macrophages, neutrophils and lymphocytes. Although it has long been thought that granulomas are beneficial for the host, there is evidence that mycobacteria also promote the formation of these structures. In this study, we aimed to identify new mycobacterial factors involved in the initial stages of granuloma formation. We exploited the zebrafish embryo Mycobacterium marinum infection model to study initiation of granuloma formation and developed an in vivo screen to select for random M. marinum mutants that were unable to induce granuloma formation efficiently. Upon screening 200 mutants, three mutants repeatedly initiated reduced granuloma formation. One of the mutants was found to be defective in the espL gene, which is located in the ESX-1 cluster. The ESX-1 cluster is disrupted in the Mycobacterium bovis BCG vaccine strain and encodes a specialized secretion system known to be important for granuloma formation and virulence. Although espL has not been implicated in protein secretion before, we observed a strong effect on the secretion of the ESX-1 substrates ESAT-6 and EspE. We conclude that our zebrafish embryo M. marinum screen is a useful tool to identify mycobacterial genes involved in the initial stages of granuloma formation and that we have identified a new component of the ESX-1 secretion system. We are confident that our approach will contribute to the knowledge of mycobacterial virulence and could be helpful for the development of new TB vaccines.

  18. Identifying gene-gene interactions that are highly associated with Body Mass Index using Quantitative Multifactor Dimensionality Reduction (QMDR)

    NARCIS (Netherlands)

    De, Rishika; Verma, Shefali S; Drenos, Fotios; Holzinger, Emily R; Holmes, Michael V; Hall, Molly A; Crosslin, David R; Carrell, David S; Hakonarson, Hakon; Jarvik, Gail; Larson, Eric; Pacheco, Jennifer A; Rasmussen-Torvik, Laura J; Moore, Carrie B; Asselbergs, Folkert W; Moore, Jason H; Ritchie, Marylyn D; Keating, Brendan J; Gilbert-Diamond, Diane

    2015-01-01

    BACKGROUND: Despite heritability estimates of 40-70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI)-associated loci identified so far. Epistasis, or gene-gene interaction, is a plausible source to explain portions of the missing heritability of

  19. Natural product proteomining, a quantitative proteomics platform, allows rapid discovery of biosynthetic gene clusters for different classes of natural products.

    Science.gov (United States)

    Gubbens, Jacob; Zhu, Hua; Girard, Geneviève; Song, Lijiang; Florea, Bogdan I; Aston, Philip; Ichinose, Koji; Filippov, Dmitri V; Choi, Young H; Overkleeft, Herman S; Challis, Gregory L; van Wezel, Gilles P

    2014-06-19

    Information on gene clusters for natural product biosynthesis is accumulating rapidly because of the current boom of available genome sequencing data. However, linking a natural product to a specific gene cluster remains challenging. Here, we present a widely applicable strategy for the identification of gene clusters for specific natural products, which we name natural product proteomining. The method is based on using fluctuating growth conditions that ensure differential biosynthesis of the bioactivity of interest. Subsequent combination of metabolomics and quantitative proteomics establishes correlations between abundance of natural products and concomitant changes in the protein pool, which allows identification of the relevant biosynthetic gene cluster. We used this approach to elucidate gene clusters for different natural products in Bacillus and Streptomyces, including a novel juglomycin-type antibiotic. Natural product proteomining does not require prior knowledge of the gene cluster or secondary metabolite and therefore represents a general strategy for identification of all types of gene clusters. Copyright © 2014 Elsevier Ltd. All rights reserved.
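    The correlation step, linking changes in natural-product abundance across growth conditions to concomitant changes in individual proteins, can be sketched in Python/NumPy as a ranking by Pearson correlation. The choice of Pearson correlation, the protein identifiers and all abundance values are illustrative assumptions; the published workflow rests on real quantitative proteomics and metabolomics measurements, not on this toy table.

```python
import numpy as np


def rank_proteins_by_correlation(product_levels, protein_levels):
    """Sort proteins by Pearson r between their abundance and the product's.

    `product_levels` holds the natural-product abundance per growth condition;
    `protein_levels` maps protein IDs to abundances in the same conditions.
    Proteins with constant abundance would give an undefined r (NaN), so real
    data should be filtered first.
    """
    product = np.asarray(product_levels, dtype=float)
    scores = {
        pid: float(np.corrcoef(product, np.asarray(levels, dtype=float))[0, 1])
        for pid, levels in protein_levels.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical abundances across four growth conditions.
product = [1.0, 5.0, 2.0, 8.0]
proteins = {
    "cluster_orf1": [0.9, 4.8, 2.1, 7.5],  # tracks the product closely
    "unrelated_1": [3.0, 3.1, 2.9, 3.05],  # nearly flat
    "anti_1": [8.0, 2.0, 5.0, 1.0],        # anti-correlated
}
for pid, r in rank_proteins_by_correlation(product, proteins):
    print(f"{pid}: r = {r:+.2f}")
```

    Proteins that co-vary strongly with the bioactivity of interest, like `cluster_orf1` here, point towards the biosynthetic gene cluster that encodes them.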

  20. Lichen Biosynthetic Gene Clusters. Part I. Genome Sequencing Reveals a Rich Biosynthetic Potential.

    Science.gov (United States)

    Bertrand, Robert L; Abdel-Hameed, Mona; Sorensen, John L

    2018-02-27

    Lichens are symbionts of fungi and algae that produce diverse secondary metabolites with useful properties. Little is known of lichen natural product biosynthesis because of the challenges of working with lichenizing fungi. We describe the first attempt to comprehensively profile the genetic secondary metabolome of a lichenizing fungus. An Illumina platform combined with the Antibiotics and Secondary Metabolites Analysis Shell (FungiSMASH, version 4.0) was used to sequence and annotate assembled contigs of the fungal partner of Cladonia uncialis. Up to 48 putative gene clusters are described comprising type I and type III polyketide synthases (PKS), nonribosomal peptide synthetases (NRPS), hybrid PKS-NRPS, and terpene synthases. The number of gene clusters revealed by this work dwarfs the number of known secondary metabolites from C. uncialis, suggesting that lichenizing fungi have an unexplored biosynthetic potential.

  1. Molecular analysis of an inactive aflatoxin biosynthesis gene cluster in Aspergillus oryzae RIB strains.

    Science.gov (United States)

    Tominaga, Mihoko; Lee, Yun-Hae; Hayashi, Risa; Suzuki, Yuji; Yamada, Osamu; Sakamoto, Kazutoshi; Gotoh, Kuniyasu; Akita, Osamu

    2006-01-01

    To help assess the potential for aflatoxin production by Aspergillus oryzae, the structure of an aflatoxin biosynthesis gene homolog cluster in A. oryzae RIB 40 was analyzed. Although most genes in the corresponding cluster exhibited from 97 to 99% similarity to those of Aspergillus flavus, three genes shared 93% similarity or less. A 257-bp deletion in the aflT region, a frameshift mutation in norA, and a base pair substitution in verA were found in A. oryzae RIB 40. In the aflR promoter, two substitutions were found in one of the three putative AreA binding sites and in the FacB binding site. PCR primers were designed to amplify homologs of aflT, nor-1, aflR, norA, avnA, verB, and vbs and were used to detect these genes in 210 A. oryzae strains. Based on the PCR results, the A. oryzae RIB strains were classified into three groups, although most of them fell into two of the groups. Group 1, in which amplification of all seven genes was confirmed, contained 122 RIB strains (58.1% of examined strains), including RIB 40. Seventy-seven strains (36.7%) belonged to group 2, characterized by having only vbs, verB, and avnA in half of the cluster. Although slight expression of aflR was detected by reverse transcription-PCR in some group 1 strains, including RIB 40, other genes (avnA, vbs, verB, and omtA) related to aflatoxin production were not detected. aflR was not detected in group 2 strains by Southern analysis.

  2. A Global Clustering Algorithm to Identify Long Intergenic Non-Coding RNA - with Applications in Mouse Macrophages

    OpenAIRE

    Garmire, Lana X.; Garmire, David G.; Huang, Wendy; Yao, Joyee; Glass, Christopher K.; Subramaniam, Shankar

    2011-01-01

    Identification of diffuse signals from the chromatin immunoprecipitation and high-throughput massively parallel sequencing (ChIP-Seq) technology poses significant computational challenges, and few methods are currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global cl...

  3. A novel unsupervised method to identify genes important in the anti-viral response: application to interferon/ribavirin in hepatitis C patients.

    Directory of Open Access Journals (Sweden)

    Leonid I Brodsky

    2007-07-01

    Treating hepatitis C with interferon/ribavirin results in a varied response in terms of decrease in viral titer and ultimate outcome. Marked responders have a sharp decline in viral titer within a few days of treatment initiation, whereas in other patients there is no effect on the virus (poor responders). Previous studies have shown that combination therapy modifies expression of hundreds of genes in vitro and in vivo. However, identifying which, if any, of these genes have a role in viral clearance remains challenging. The goal of this paper is to link viral levels with gene expression and thereby identify genes that may be responsible for the early decrease in viral titer. Microarray analyses were performed on RNA isolated from peripheral blood mononuclear cells (PBMC) of patients undergoing interferon/ribavirin therapy. Samples were collected before treatment (day 0) and 1, 2, 7, 14 and 28 days after initiating treatment. A novel method was applied to identify genes that are linked to a decrease in viral titer during interferon/ribavirin treatment. The method uses the relationship between inter-patient proximities based on gene expression and inter-patient proximities based on viral titer to define the association between the microarray expression measurements of each gene and the viral-titer measurements. We detected 36 unique genes whose expression provides a clustering of patients that resembles the viral titer-based clustering of patients. These genes include IRF7, MX1, OASL and OAS2, viperin and many ISGs of unknown function. The genes identified by this method appear to play a major role in the reduction of hepatitis C virus during the early phase of treatment. The method has broad utility and can be used to analyze response to any group of factors influencing biological outcome, such as antiviral drugs or anti-cancer agents, where microarray data are available.
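    The key idea, scoring each gene by how well patient-to-patient distances computed from its expression agree with patient-to-patient distances computed from viral titer, can be sketched in Python/NumPy. Euclidean distances over time points, Pearson correlation of the two distance matrices (a Mantel-style statistic without the permutation test) and the toy data are all assumptions; the published method may define proximities and their association differently.

```python
import numpy as np


def pairwise_distances(profiles):
    """Euclidean distances between patients (rows) over time points (columns)."""
    p = np.asarray(profiles, dtype=float)
    diff = p[:, None, :] - p[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def proximity_agreement(gene_profiles, titer_profiles):
    """Pearson r between gene-based and titer-based patient distances."""
    d_gene = pairwise_distances(gene_profiles)
    d_titer = pairwise_distances(titer_profiles)
    upper = np.triu_indices_from(d_gene, k=1)  # each patient pair counted once
    return float(np.corrcoef(d_gene[upper], d_titer[upper])[0, 1])


# Hypothetical log10 viral titers for 4 patients at 3 time points:
# patients 0-1 respond quickly, patients 2-3 respond poorly.
titer = np.array([
    [6.0, 3.0, 2.0],
    [6.2, 3.5, 2.5],
    [6.0, 5.8, 5.9],
    [5.9, 6.0, 6.1],
])
tracking_gene = 10.0 - titer  # mirrors titer, so distances match exactly
unrelated_gene = np.array([[1, 2, 3], [3, 2, 1], [1, 2, 3], [3, 2, 1]])

print(proximity_agreement(tracking_gene, titer))
print(proximity_agreement(unrelated_gene, titer))
```

    A gene whose expression-based grouping of patients mirrors the titer-based grouping, like `tracking_gene` here, scores close to 1; ranking all genes by this score is one way to shortlist candidates such as the 36 genes reported in the abstract.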

  4. Acinetobacter baumannii K27 and K44 capsular polysaccharides have the same K unit but different structures due to the presence of distinct wzy genes in otherwise closely related K gene clusters.

    Science.gov (United States)

    Shashkov, Alexander S; Kenyon, Johanna J; Senchenkova, Sof'ya N; Shneider, Mikhail M; Popova, Anastasiya V; Arbatsky, Nikolay P; Miroshnikov, Konstantin A; Volozhantsev, Nikolay V; Hall, Ruth M; Knirel, Yuriy A

    2016-05-01

    Capsular polysaccharides (CPSs) from Acinetobacter baumannii isolates 1432, 4190 and NIPH 70, which have related gene content at the K locus, were examined, and the chemical structures were established using 2D ¹H and ¹³C NMR spectroscopy. The three isolates produce the same pentasaccharide repeat unit, which consists of 5-N-acetyl-7-N-[(S)-3-hydroxybutanoyl] (major) or 5,7-di-N-acetyl (minor) derivatives of 5,7-diamino-3,5,7,9-tetradeoxy-D-glycero-D-galacto-non-2-ulosonic (legionaminic) acid (Leg5Ac7R), D-galactose, N-acetyl-D-galactosamine and N-acetyl-D-glucosamine. However, the linkage between repeat units in NIPH 70 was different from that in 1432 and 4190, and this significantly alters the CPS structure. The KL27 gene cluster in 4190 and the KL44 gene cluster in NIPH 70 are organized identically and contain lga genes for Leg5Ac7R synthesis, genes for the synthesis of the common sugars, as well as an itrA2 initiating transferase gene and four glycosyltransferase genes. They share high-level nucleotide sequence identity for corresponding genes, but differ in the wzy gene encoding the Wzy polymerase. The Wzy proteins, which have different lengths and share no similarity, would form the unrelated linkages in the K27 and K44 structures. The linkages formed by the four shared glycosyltransferases were predicted by comparison with gene clusters that synthesize related structures. These findings unambiguously identify the linkages formed by WzyK27 and WzyK44, and show that the presence of different wzy genes in otherwise closely related K gene clusters changes the structure of the CPS. This may affect its capacity as a protective barrier for A. baumannii. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  5. Molecular typing using polymorphisms of the polyketide synthase gene (PKS1) of strains in Japan morphologically identified as Fonsecaea pedrosoi.

    Science.gov (United States)

    Ushigami, Tsuyoshi; Anzawa, Kazushi; Mochizuki, Takashi

    2017-01-01

    Fonsecaea pedrosoi sensu lato is a major causative agent of dematiaceous fungal infection in Japan. Recent sequence analysis of the internal transcribed spacer (ITS) regions of the ribosomal RNA gene has shown that this species can be separated into three species: F. pedrosoi sensu stricto, F. monophora and F. nubica. The cell walls of dematiaceous fungi including the genus Fonsecaea contain melanin, which is important for their virulence. Polyketide synthase (PKS1) is an enzyme required for melanin synthesis. This study analyzed the phylogeny of strains of F. pedrosoi sensu lato isolated in Japan by sequencing the PKS1 gene and ITS regions and identifying molecular polymorphism. Sixty strains morphologically identified as F. pedrosoi isolated worldwide, including 37 strains isolated in Japan, were analyzed. ITS regions of the ribosomal RNA gene and part of the PKS1 gene region were amplified, yielding sequences of approximately 600 and 450 bp, respectively. Polymerase chain reaction products were sequenced, and cluster analysis was performed. The proposed phylogenetic tree based on PKS1 sequences closely matched that based on the ITS regions. Sequencing of both regions showed that the isolates from Japan belonged to the clade of F. monophora. Molecular variations of these Japanese strains were evaluated by assessing both ITS and PKS1 sequences. The 37 isolates could be divided into at least seven molecular subtypes. The combination of these two molecular markers provides a most robust method for intraspecies subtyping and further epidemiological study of F. monophora. © 2016 Japanese Dermatological Association.

  6. Genome Comparison of Erythromycin Resistant Campylobacter from Turkeys Identifies Hosts and Pathways for Horizontal Spread of erm(B) Genes

    Directory of Open Access Journals (Sweden)

    Diego Florez-Cuadrado

    2017-11-01

    Full Text Available Pathogens in the genus Campylobacter are the most common cause of food-borne bacterial gastroenteritis. Campylobacteriosis, caused principally by Campylobacter jejuni and Campylobacter coli, is transmitted to humans by food of animal origin, especially poultry. As for many pathogens, antimicrobial resistance in Campylobacter is increasing at an alarming rate. Erythromycin is the treatment of choice for clinical cases requiring antimicrobial therapy, but its effectiveness is compromised by mobility of the erythromycin resistance gene erm(B) between strains. Here, we evaluate resistance to six antimicrobials in 170 Campylobacter isolates (133 C. coli and 37 C. jejuni) from turkeys. Erythromycin-resistant isolates (n = 85; 81 C. coli and 4 C. jejuni) were screened for the presence of the erm(B) gene, which had not previously been identified in isolates from turkeys. The genomes of two positive C. coli isolates were sequenced, and in both isolates the erm(B) gene clustered with resistance determinants against aminoglycosides and tetracycline, including the aad9, aadE, aph(2″)-IIIa, aph(3′)-IIIa and tet(O) genes. Comparative genomic analysis identified identical erm(B) sequences among Campylobacter from turkeys, Streptococcus suis from pigs, and Enterococcus faecium and Clostridium difficile from humans. This is consistent with multiple horizontal transfer events among different bacterial species colonizing turkeys. This example highlights the potential for antimicrobial resistance to spread across bacterial species boundaries, which may compromise the effectiveness of antimicrobial therapy.

  7. Genetic clusters and sex-biased gene flow in a unicolonial Formica ant

    Directory of Open Access Journals (Sweden)

    Chapuisat Michel

    2009-03-01

    Full Text Available Abstract. Background: Animal societies are diverse, ranging from small family-based groups to extraordinarily large social networks in which many unrelated individuals interact. At the extreme of this continuum, some ant species form unicolonial populations in which workers and queens can move among multiple interconnected nests without eliciting aggression. Although unicoloniality has been studied mostly in invasive ants, it also occurs in some native non-invasive species. Unicoloniality is commonly associated with very high queen numbers, which may reduce relatedness among nestmates so far as to raise the question of how altruism is maintained by kin selection in such systems. However, the actual relatedness among cooperating individuals critically depends on effective dispersal and the ensuing pattern of genetic structuring. To better understand the evolution of unicoloniality in native non-invasive ants, we investigated the fine-scale population genetic structure and gene flow in three unicolonial populations of the wood ant F. paralugubris. Results: The analysis of geo-referenced microsatellite genotypes and mitochondrial haplotypes revealed the presence of cryptic clusters of genetically differentiated nests in the three populations of F. paralugubris. Because of this spatial genetic heterogeneity, members of the same clusters were moderately but significantly related. The comparison of nuclear (microsatellite) and mitochondrial differentiation indicated that effective gene flow was male-biased in all populations. Conclusion: The three unicolonial populations exhibited male-biased and mostly local gene flow. The high number of queens per nest, exchanges among neighbouring nests and restricted long-distance gene flow resulted in large clusters of genetically similar nests. The positive relatedness among clustermates suggests that kin selection may still contribute to the maintenance of altruism in unicolonial

  8. Comparison of Expression of Secondary Metabolite Biosynthesis Cluster Genes in Aspergillus flavus, A. parasiticus, and A. oryzae

    Directory of Open Access Journals (Sweden)

    Kenneth C. Ehrlich

    2014-06-01

    Full Text Available Fifty-six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. Despite this, the biosynthesis of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, has been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in