WorldWideScience

Sample records for gene set enrichment

  1. A general modular framework for gene set enrichment analysis

    Directory of Open Access Journals (Sweden)

    Strimmer Korbinian

    2009-02-01

    Background: Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear. Results: We conduct an extensive survey of statistical approaches for gene set analysis and identify a common modular structure underlying most published methods. Based on this finding we propose a general framework for detecting gene set enrichment. This framework provides a meta-theory of gene set analysis that not only helps to gain a better understanding of the relative merits of each embedded approach but also facilitates a principled comparison and offers insights into the relative interplay of the methods. Conclusion: We use this framework to conduct a computer simulation comparing 261 different variants of gene set enrichment procedures and to analyze two experimental data sets. Based on the results we offer recommendations for best practices regarding the choice of effective procedures for gene set enrichment analysis.

  2. ChIP-Enrich: gene set enrichment testing for ChIP-seq data.

    Science.gov (United States)

    Welch, Ryan P; Lee, Chee; Imbriano, Paul M; Patil, Snehal; Weymouth, Terry E; Smith, R Alex; Scott, Laura J; Sartor, Maureen A

    2014-07-01

    Gene set enrichment testing can enhance the biological interpretation of ChIP-seq data. Here, we develop a method, ChIP-Enrich, for this analysis which empirically adjusts for gene locus length (the length of the gene body and its surrounding non-coding sequence). Adjustment for gene locus length is necessary because it is often positively associated with the presence of one or more peaks and because many biologically defined gene sets have an excess of genes with longer or shorter gene locus lengths. Unlike alternative methods, ChIP-Enrich can account for the wide range of gene locus length-to-peak presence relationships (observed in ENCODE ChIP-seq data sets). We show that ChIP-Enrich has a well-calibrated type I error rate using permuted ENCODE ChIP-seq data sets; in contrast, two commonly used gene set enrichment methods, Fisher's exact test and the binomial test implemented in Genomic Regions Enrichment of Annotations Tool (GREAT), can have highly inflated type I error rates and biases in ranking. We identify DNA-binding proteins, including CTCF, JunD and glucocorticoid receptor α (GRα), that show different enrichment patterns for peaks closer to versus further from transcription start sites. We also identify known and potential new biological functions of GRα. ChIP-Enrich is available as a web interface (http://chip-enrich.med.umich.edu) and Bioconductor package. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. The limitations of simple gene set enrichment analysis assuming gene independence.

    Science.gov (United States)

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently, a simplified approach that uses a one-sample t-test score to assess enrichment and ignores gene-gene correlations was proposed by Irizarry et al. (2009) as a serious contender. Their argument criticizes the nonparametric nature of Gene Set Enrichment Analysis and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, because of the significant variance inflation they produce in the enrichment scores, and that they should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
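
    The variance inflation mentioned in this abstract can be illustrated with a standard identity for the mean of correlated scores. The symbols below (m genes in the set, gene-level scores t_i with common variance \sigma^2, average pairwise correlation \bar{\rho}) are notation assumed here for illustration and are not taken from the paper.

```latex
% Mean score of a gene set of m genes, each score t_i having variance \sigma^2
\bar{t} = \frac{1}{m}\sum_{i=1}^{m} t_i ,
\qquad
\operatorname{Var}(\bar{t})
  = \frac{1}{m^{2}}\Bigl(\sum_{i=1}^{m}\operatorname{Var}(t_i)
      + \sum_{i \neq j}\operatorname{Cov}(t_i,t_j)\Bigr)
  = \frac{\sigma^{2}}{m}\,\bigl[\,1 + (m-1)\,\bar{\rho}\,\bigr].

% Worked example: m = 100 genes and \bar{\rho} = 0.05 give
% 1 + 99 \times 0.05 = 5.95, so the variance is about six times
% the value \sigma^2 / m obtained when independence is assumed.
```

    Under this identity, even a small average correlation makes a score calibrated under independence overstate significance, and the effect grows with gene set size.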

  4. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. 
Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression

  5. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data.

    Science.gov (United States)

    Lewin, Alex; Grieve, Ian C

    2006-10-03

    Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.
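
    The per-term Fisher's exact test described above can be sketched as a hypergeometric upper-tail probability. This is a minimal illustration using only the Python standard library; the function name and argument names are hypothetical, and it does not reproduce the POSOC grouping step.

```python
from math import comb


def fisher_enrichment_pvalue(n_universe, n_term, n_de, n_overlap):
    """One-sided Fisher's exact p-value for over-representation of one GO term.

    n_universe: number of genes measured on the array
    n_term:     genes annotated with the GO term
    n_de:       genes declared differentially expressed
    n_overlap:  differentially expressed genes annotated with the term
    """
    max_overlap = min(n_term, n_de)
    # Number of ways to choose the differentially expressed genes overall.
    total = comb(n_universe, n_de)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only: 40 of 200 DE genes fall in a 300-gene term
    # out of 10,000 measured genes (about 6 would be expected by chance).
    print(fisher_enrichment_pvalue(10_000, 300, 200, 40))
```

    Because GO terms share many genes, running this test once per term yields strongly dependent p-values, which is the difficulty the abstract addresses by testing groups of terms.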

  6. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data

    Directory of Open Access Journals (Sweden)

    Grieve Ian C

    2006-10-01

    Background: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. Results: We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Conclusion: Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.

  7. Gene set of nuclear-encoded mitochondrial regulators is enriched for common inherited variation in obesity.

    Directory of Open Access Journals (Sweden)

    Nadja Knoll

    There are hints of an altered mitochondrial function in obesity. Nuclear-encoded genes are relevant for mitochondrial function (3 gene sets of known relevant pathways: (1) 16 nuclear regulators of mitochondrial genes, (2) 91 genes for oxidative phosphorylation and (3) 966 nuclear-encoded mitochondrial genes). Gene set enrichment analysis (GSEA) showed no association with type 2 diabetes mellitus in these gene sets. Here we performed a GSEA for the same gene sets for obesity. Genome wide association study (GWAS) data from a case-control approach on 453 extremely obese children and adolescents and 435 lean adult controls were used for GSEA. For independent confirmation, we analyzed 705 obesity GWAS trios (extremely obese child and both biological parents) and a population-based GWAS sample (KORA F4, n = 1,743). A meta-analysis was performed on all three samples. In each sample, the distribution of significance levels between the respective gene set and those of all genes was compared using the leading-edge-fraction-comparison test (cut-offs between the 50(th) and 95(th) percentile of the set of all gene-wise corrected p-values) as implemented in the MAGENTA software. In the case-control sample, significant enrichment of associations with obesity was observed above the 50(th) percentile for the set of the 16 nuclear regulators of mitochondrial genes (p(GSEA,50) = 0.0103). This finding was not confirmed in the trios (p(GSEA,50) = 0.5991), but in KORA (p(GSEA,50) = 0.0398). The meta-analysis again indicated a trend for enrichment (p(MAGENTA,50) = 0.1052, p(MAGENTA,75) = 0.0251). The GSEA revealed that weak association signals for obesity might be enriched in the gene set of 16 nuclear regulators of mitochondrial genes.

  8. Gene set of nuclear-encoded mitochondrial regulators is enriched for common inherited variation in obesity.

    Science.gov (United States)

    Knoll, Nadja; Jarick, Ivonne; Volckmar, Anna-Lena; Klingenspor, Martin; Illig, Thomas; Grallert, Harald; Gieger, Christian; Wichmann, Heinz-Erich; Peters, Annette; Hebebrand, Johannes; Scherag, André; Hinney, Anke

    2013-01-01

    There are hints of an altered mitochondrial function in obesity. Nuclear-encoded genes are relevant for mitochondrial function (3 gene sets of known relevant pathways: (1) 16 nuclear regulators of mitochondrial genes, (2) 91 genes for oxidative phosphorylation and (3) 966 nuclear-encoded mitochondrial genes). Gene set enrichment analysis (GSEA) showed no association with type 2 diabetes mellitus in these gene sets. Here we performed a GSEA for the same gene sets for obesity. Genome wide association study (GWAS) data from a case-control approach on 453 extremely obese children and adolescents and 435 lean adult controls were used for GSEA. For independent confirmation, we analyzed 705 obesity GWAS trios (extremely obese child and both biological parents) and a population-based GWAS sample (KORA F4, n = 1,743). A meta-analysis was performed on all three samples. In each sample, the distribution of significance levels between the respective gene set and those of all genes was compared using the leading-edge-fraction-comparison test (cut-offs between the 50(th) and 95(th) percentile of the set of all gene-wise corrected p-values) as implemented in the MAGENTA software. In the case-control sample, significant enrichment of associations with obesity was observed above the 50(th) percentile for the set of the 16 nuclear regulators of mitochondrial genes (p(GSEA,50) = 0.0103). This finding was not confirmed in the trios (p(GSEA,50) = 0.5991), but in KORA (p(GSEA,50) = 0.0398). The meta-analysis again indicated a trend for enrichment (p(MAGENTA,50) = 0.1052, p(MAGENTA,75) = 0.0251). The GSEA revealed that weak association signals for obesity might be enriched in the gene set of 16 nuclear regulators of mitochondrial genes.

  9. Gene set enrichment analysis for non-monotone association and multiple experimental categories

    OpenAIRE

    Heinloth Alexandra N; Irwin Richard D; Dai Shuangshuang; Lin Rongheng; Boorman Gary A; Li Leping

    2008-01-01

    Abstract Background Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statis...

  10. Gene-based analysis of regionally enriched cortical genes in GWAS data sets of cognitive traits and psychiatric disorders.

    Directory of Open Access Journals (Sweden)

    Kari M Ersland

    BACKGROUND: Despite its estimated high heritability, the genetic architecture leading to differences in cognitive performance remains poorly understood. Different cortical regions play important roles in normal cognitive functioning and impairment. Recently, we reported on sets of regionally enriched genes in three different cortical areas (frontomedial, temporal and occipital cortices) of the adult rat brain. It has been suggested that genes preferentially, or specifically, expressed in one region or organ reflect functional specialisation. Employing a gene-based approach to the analysis, we used the regionally enriched cortical genes to mine a genome-wide association study (GWAS) of the Norwegian Cognitive NeuroGenetics (NCNG) sample of healthy adults for association to nine psychometric tests measures. In addition, we explored GWAS data sets for the serious psychiatric disorders schizophrenia (SCZ) (n = 3 samples) and bipolar affective disorder (BP) (n = 3 samples), to which cognitive impairment is linked. PRINCIPAL FINDINGS: At the single gene level, the temporal cortex enriched gene RAR-related orphan receptor B (RORB) showed the strongest overall association, namely to a test of verbal intelligence (Vocabulary, P = 7.7E-04). We also applied gene set enrichment analysis (GSEA) to test the candidate genes, as gene sets, for enrichment of association signal in the NCNG GWAS and in GWASs of BP and of SCZ. We found that genes differentially expressed in the temporal cortex showed a significant enrichment of association signal in a test measure of non-verbal intelligence (Reasoning) in the NCNG sample. CONCLUSION: Our gene-based approach suggests that RORB could be involved in verbal intelligence differences, while the genes enriched in the temporal cortex might be important to intellectual functions as measured by a test of reasoning in the healthy population. These findings warrant further replication in independent samples on cognitive traits.

  11. The Core Mouse Response to Infection by Neospora Caninum Defined by Gene Set Enrichment Analyses

    Science.gov (United States)

    Ellis, John; Goodswen, Stephen; Kennedy, Paul J; Bush, Stephen

    2012-01-01

    In this study, the BALB/c and Qs mouse responses to infection by the parasite Neospora caninum were investigated in order to identify host response mechanisms. Investigation was done using gene set (enrichment) analyses of microarray data. GSEA, MANOVA, Romer, subGSE and SAM-GS were used to study the contrasts Neospora strain type, Mouse type (BALB/c and Qs) and time post infection (6 hours post infection and 10 days post infection). The analyses show that the major signal in the core mouse response to infection is from time post infection and can be defined by gene ontology terms Protein Kinase Activity, Cell Proliferation and Transcription Initiation. Several terms linked to signaling, morphogenesis, response and fat metabolism were also identified. At 10 days post infection, genes associated with fatty acid metabolism were identified as up regulated in expression. The value of gene set (enrichment) analyses in the analysis of microarray data is discussed. PMID:23012496

  12. Associations between DNA methylation and schizophrenia-related intermediate phenotypes - a gene set enrichment analysis.

    Science.gov (United States)

    Hass, Johanna; Walton, Esther; Wright, Carrie; Beyer, Andreas; Scholz, Markus; Turner, Jessica; Liu, Jingyu; Smolka, Michael N; Roessner, Veit; Sponheim, Scott R; Gollub, Randy L; Calhoun, Vince D; Ehrlich, Stefan

    2015-06-03

    Multiple genetic approaches have identified microRNAs as key effectors in psychiatric disorders as they post-transcriptionally regulate expression of thousands of target genes. However, their role in specific psychiatric diseases remains poorly understood. In addition, epigenetic mechanisms such as DNA methylation, which affect the expression of both microRNAs and coding genes, are critical for our understanding of molecular mechanisms in schizophrenia. Using clinical, imaging, genetic, and epigenetic data of 103 patients with schizophrenia and 111 healthy controls of the Mind Clinical Imaging Consortium (MCIC) study of schizophrenia, we conducted gene set enrichment analysis to identify markers for schizophrenia-associated intermediate phenotypes. Genes were ranked based on the correlation between DNA methylation patterns and each phenotype, and then searched for enrichment in 221 predicted microRNA target gene sets. We found the predicted hsa-miR-219a-5p target gene set to be significantly enriched for genes (EPHA4, PKNOX1, ESR1, among others) whose methylation status is correlated with hippocampal volume independent of disease status. Our results were strengthened by significant associations between hsa-miR-219a-5p target gene methylation patterns and hippocampus-related neuropsychological variables. IPA pathway analysis of the respective predicted hsa-miR-219a-5p target genes revealed associated network functions in behavior and developmental disorders. Altered methylation patterns of predicted hsa-miR-219a-5p target genes are associated with a structural aberration of the brain that has been proposed as a possible biomarker for schizophrenia. The (dys)regulation of microRNA target genes by epigenetic mechanisms may confer additional risk for developing psychiatric symptoms. Further study is needed to understand possible interactions between microRNAs and epigenetic changes and their impact on risk for brain-based disorders such as schizophrenia.

  13. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies.

    Science.gov (United States)

    Kofler, Robert; Schlötterer, Christian

    2012-08-01

    An analysis of gene set [e.g. Gene Ontology (GO)] enrichment assumes that all genes are sampled independently from each other with the same probability. These assumptions are violated in genome-wide association (GWA) studies since (i) longer genes typically have more single-nucleotide polymorphisms resulting in a higher probability of being sampled and (ii) overlapping genes are sampled in clusters. Herein, we introduce Gowinda, a software specifically designed to test for enrichment of gene sets in GWA studies. We show that GO tests on GWA data could result in a substantial number of false-positive GO terms. Permutation tests implemented in Gowinda eliminate these biases, but maintain sufficient power to detect enrichment of GO terms. Since sufficient resolution for large datasets requires millions of permutations, we use multi-threading to keep computation times reasonable. Availability: Gowinda is implemented in Java (v1.6) and freely available on http://code.google.com/p/gowinda/. Contact: christian.schloetterer@vetmeduni.ac.at. Manual: http://code.google.com/p/gowinda/wiki/Manual. Test data and tutorial: http://code.google.com/p/gowinda/wiki/Tutorial. Validation: http://code.google.com/p/gowinda/wiki/VALIDATION.
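
    The permutation idea described above (sampling SNPs rather than genes, so that long genes and overlapping genes are drawn as often as they would be in real data) can be sketched as follows. This is a simplified illustration and not Gowinda's implementation, which is a Java program; the function name, the input layout and the toy data are assumptions made for the example.

```python
import random


def snp_permutation_pvalue(snp_to_genes, candidate_snps, gene_set,
                           n_sim=10_000, seed=0):
    """Empirical enrichment p-value obtained by resampling SNPs.

    snp_to_genes:   dict mapping each SNP id to the genes it falls in
                    (several genes when genes overlap)
    candidate_snps: SNP ids reported as associated
    gene_set:       set of gene ids belonging to the category under test
    """
    rng = random.Random(seed)
    all_snps = sorted(snp_to_genes)

    def genes_hit(snps):
        # Each gene counts once, however many of its SNPs were drawn.
        genes = set()
        for snp in snps:
            genes.update(snp_to_genes[snp])
        return genes

    observed = len(genes_hit(candidate_snps) & gene_set)
    at_least = 0
    for _ in range(n_sim):
        # Draw as many random SNPs as there are candidates.
        sampled = rng.sample(all_snps, len(candidate_snps))
        if len(genes_hit(sampled) & gene_set) >= observed:
            at_least += 1
    return observed, at_least / n_sim


if __name__ == "__main__":
    # Toy data: gene A is long (three SNPs), gene B is short (one SNP).
    toy = {"s1": ["A"], "s2": ["A"], "s3": ["A"], "s4": ["B"]}
    print(snp_permutation_pvalue(toy, ["s1"], {"A"}, n_sim=2000))
    print(snp_permutation_pvalue(toy, ["s4"], {"B"}, n_sim=2000))
```

    In the toy data a hit in the long gene A is unsurprising (p near 0.75) while a hit in the short gene B is rarer (p near 0.25), a distinction that a gene-level test assuming equal sampling probability would miss. The smallest nonzero p-value this estimator can report is 1/n_sim, which is why large analyses need many simulations.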

  14. Identification of a set of genes showing regionally enriched expression in the mouse brain

    Directory of Open Access Journals (Sweden)

    Marra Marco A

    2008-07-01

    Background: The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters ( Results: We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion: Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.

  15. Gene-Based Analysis of Regionally Enriched Cortical Genes in GWAS Data Sets of Cognitive Traits and Psychiatric Disorders

    DEFF Research Database (Denmark)

    Ersland, Kari M; Christoforou, Andrea; Stansberg, Christine;

    2012-01-01

    Despite its estimated high heritability, the genetic architecture leading to differences in cognitive performance remains poorly understood. Different cortical regions play important roles in normal cognitive functioning and impairment. Recently, we reported on sets of regionally enriched genes in three different cortical areas (frontomedial, temporal and occipital cortices) of the adult rat brain. It has been suggested that genes preferentially, or specifically, expressed in one region or organ reflect functional specialisation. Employing a gene-based approach to the analysis, we used the regionally enriched cortical genes to mine a genome-wide association study (GWAS) of the Norwegian Cognitive NeuroGenetics (NCNG) sample of healthy adults for association to nine psychometric tests measures. In addition, we explored GWAS data sets for the serious psychiatric disorders schizophrenia (SCZ) (n...

  16. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis.

    Science.gov (United States)

    Glez-Peña, Daniel; Gómez-López, Gonzalo; Pisano, David G; Fdez-Riverola, Florentino

    2009-07-01

    WhichGenes is a web-based interactive gene set building tool offering a very simple interface to extract always-updated gene lists from multiple databases and unstructured biological data sources. While the user can specify new gene sets of interest by following a simple four-step wizard, the tool is able to run several queries in parallel. Every time a new set is generated, it is automatically added to the private gene-set cart and the user is notified by an e-mail containing a direct link to the new set stored in the server. WhichGenes provides functionalities to edit, delete and rename existing sets as well as the capability of generating new ones by combining previous existing sets (intersection, union and difference operators). The user can export his sets configuring the output format and selecting among multiple gene identifiers. In addition to the user-friendly environment, WhichGenes allows programmers to access its functionalities in a programmatic way through a Representational State Transfer web service. WhichGenes front-end is freely available at http://www.whichgenes.org/, WhichGenes API is accessible at http://www.whichgenes.org/api/.
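
    The set-combination operators mentioned above (intersection, union and difference) behave like ordinary set algebra. The sketch below uses plain Python sets with made-up gene identifiers; it does not call the WhichGenes web service, and the helper name is hypothetical.

```python
def combine_gene_sets(first, second, operator):
    """Combine two gene sets with one of the three named operators."""
    if operator == "intersection":
        return first & second   # genes present in both sets
    if operator == "union":
        return first | second   # genes present in either set
    if operator == "difference":
        return first - second   # genes in the first set only
    raise ValueError(f"unknown operator: {operator}")


if __name__ == "__main__":
    # Placeholder identifiers, not real annotations.
    set_a = {"GENE1", "GENE2", "GENE3"}
    set_b = {"GENE3", "GENE4"}
    for op in ("intersection", "union", "difference"):
        print(op, sorted(combine_gene_sets(set_a, set_b, op)))
```

    Note that difference is the only one of the three operators whose result depends on the order of the two input sets.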

  17. The Schizophrenia-Associated BRD1 Gene Regulates Behavior, Neurotransmission, and Expression of Schizophrenia Risk Enriched Gene Sets in Mice

    DEFF Research Database (Denmark)

    Qvist, Per; Christensen, Jane Hvarregaard; Vardya, Irina;

    2016-01-01

    BACKGROUND: The schizophrenia-associated BRD1 gene encodes a transcriptional regulator whose comprehensive chromatin interactome is enriched with schizophrenia risk genes. However, the biology underlying the disease association of BRD1 remains speculative. METHODS: This study assessed the transcriptional drive of a schizophrenia-associated BRD1 risk variant in vitro. Accordingly, to examine the effects of reduced Brd1 expression, we generated a genetically modified Brd1(+/-) mouse and subjected it to behavioral, electrophysiological, molecular, and integrative genomic analyses with focus... ...-inhibition imbalances involving loss of parvalbumin immunoreactive interneurons. RNA-sequencing analyses of cortical and striatal micropunches from Brd1(+/-) and wild-type mice revealed differential expression of genes enriched for schizophrenia risk, including several schizophrenia genome-wide association study risk...

  18. Genetic network and gene set enrichment analysis to identify biomarkers related to cigarette smoking and lung cancer.

    Science.gov (United States)

    Fang, Xiaocong; Netzer, Michael; Baumgartner, Christian; Bai, Chunxue; Wang, Xiangdong

    2013-02-01

    Cigarette smoking is the best-established risk factor for the development of lung cancer, but the related genetic mechanisms are still unclear. The preprocessed microarray expression dataset was downloaded from the Gene Expression Omnibus database. Samples were classified according to disease state, stage and smoking state. A new computational strategy was applied for the identification and biological interpretation of new candidate genes in lung cancer and smoking by coupling a network-based approach with gene set enrichment analysis. Network analysis was performed by pair-wise comparison according to the disease states (tumor or normal), smoking states (current smokers or nonsmokers or former smokers), or the disease stage (stages I-IV). The most activated metabolic pathways were identified by gene set enrichment analysis. Panels of top ranked gene candidates in smoking or cancer development were identified, including genes involved in cell proliferation and drug metabolism like cytochrome P450 and WW domain containing transcription regulator 1. Semaphorin 5A and protein phosphatase 1F are the common genes represented as major hubs in both the smoking and cancer related networks. Six pathways (cell cycle, DNA replication, RNA transport, protein processing in endoplasmic reticulum, vascular smooth muscle contraction and endocytosis) were commonly involved in smoking and lung cancer when comparing the top ten selected pathways. This new bioinformatics approach to biomarker identification and validation can probe deep genetic relationships between cigarette smoking and lung cancer. Our studies indicate that disease-specific network biomarkers, interactions between genes/proteins, or cross-talk between pathways provide more specific value for the development of precision therapies for lung cancer. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.

    Science.gov (United States)

    Hundt, Christian; Hildebrandt, Andreas; Schmidt, Bertil

    2016-09-23

    Gene Set Enrichment Analysis (GSEA) is a popular method to reveal significant dependencies between predefined sets of gene symbols and observed phenotypes by evaluating the deviation of gene expression values between cases and controls. An established measure of inter-class deviation, the enrichment score, is usually computed using a weighted running sum statistic over the whole set of gene symbols. Due to the lack of analytic expressions the significance of enrichment scores is determined using a non-parametric estimation of their null distribution by permuting the phenotype labels of the probed patients. Accordingly, GSEA is a time-consuming task due to the large number of required permutations to accurately estimate the nominal p-value - a circumstance that is even more pronounced during multiple hypothesis testing since its estimate is lower-bounded by the inverse number of samples in permutation space. We present rapidGSEA - a software suite consisting of two tools for facilitating permutation-based GSEA: cudaGSEA and ompGSEA. cudaGSEA is a CUDA-accelerated tool using fine-grained parallelization schemes on massively parallel architectures while ompGSEA is a coarse-grained multi-threaded tool for multi-core CPUs. Nominal p-value estimation of 4,725 gene sets on a data set consisting of 20,639 unique gene symbols and 200 patients (183 cases + 17 controls) each probing one million permutations takes 19 hours on a Xeon CPU and less than one hour on a GeForce Titan X GPU while the established GSEA tool from the Broad Institute (broadGSEA) takes roughly 13 days. cudaGSEA outperforms broadGSEA by around two orders-of-magnitude on a single Tesla K40c or GeForce Titan X GPU. ompGSEA provides around one order-of-magnitude speedup to broadGSEA on a standard Xeon CPU. The rapidGSEA suite is open-source software and can be downloaded at https://github.com/gravitino/cudaGSEA as standalone application or package for the R framework.
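
    The weighted running-sum enrichment score and its label-permutation null distribution, as described above, can be sketched in plain Python. This is a small single-threaded illustration under simplifying assumptions, not the rapidGSEA or Broad Institute code: the ranking metric (difference of class means), the two-sided add-one p-value estimator and all function names are choices made for the example.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation of the weighted running sum from zero."""
    hits = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if h else 0.0
                   for s, h in zip(scores, hits)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(hits)
    if total_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        # Step up on a set member (weighted), step down otherwise.
        running += w / total_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def rank_by_mean_difference(expression, labels):
    """Rank genes by mean(cases) - mean(controls); labels are 1 or 0."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    diff = {}
    for gene, values in expression.items():
        case = sum(v for v, l in zip(values, labels) if l == 1) / n_case
        ctrl = sum(v for v, l in zip(values, labels) if l == 0) / n_ctrl
        diff[gene] = case - ctrl
    ranked = sorted(diff, key=diff.get, reverse=True)
    return ranked, [diff[g] for g in ranked]


def permutation_pvalue(expression, labels, gene_set, n_perm=1000, seed=0):
    """Estimate a nominal p-value by permuting the phenotype labels."""
    rng = random.Random(seed)
    observed = enrichment_score(
        *rank_by_mean_difference(expression, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_es = enrichment_score(
            *rank_by_mean_difference(expression, shuffled), gene_set)
        if abs(null_es) >= abs(observed):
            extreme += 1
    # Add-one estimator: the result can never drop below 1 / (n_perm + 1).
    return observed, (extreme + 1) / (n_perm + 1)
```

    Every permutation re-ranks all genes and rescans every gene set, so the cost grows with the product of permutations, gene sets and genes. Because the estimate cannot fall below roughly the inverse of the number of permutations, small p-values require many permutations, which is the workload the abstract describes parallelizing.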

  20. Gene set enrichment analysis and ingenuity pathway analysis of metastatic clear cell renal cell carcinoma cell line.

    Science.gov (United States)

    Khan, Mohammed I; Dębski, Konrad J; Dabrowski, Michał; Czarnecka, Anna M; Szczylik, Cezary

    2016-08-01

    In recent years, genome-wide RNA expression analysis has become a routine tool that offers a great opportunity to study and understand the key role of genes that contribute to carcinogenesis. Various microarray platforms and statistical approaches can be used to identify genes that might serve as prognostic biomarkers and be developed as antitumor therapies in the future. Metastatic renal cell carcinoma (mRCC) is a serious, life-threatening disease, and there are few treatment options for patients. In this study, we performed one-color microarray gene expression (4×44K) analysis of the mRCC cell line Caki-1 and the healthy kidney cell line ASE-5063. A total of 1,921 genes were differentially expressed in the Caki-1 cell line (1,023 upregulated and 898 downregulated). Gene Set Enrichment Analysis (GSEA) and Ingenuity Pathway Analysis (IPA) approaches were used to analyze the differential-expression data. The objective of this research was to identify complex biological changes that occur during metastatic development using Caki-1 as a model mRCC cell line. Our data suggest that there are multiple deregulated pathways associated with metastatic clear cell renal cell carcinoma (mccRCC), including integrin-linked kinase (ILK) signaling, leukocyte extravasation signaling, IGF-I signaling, CXCR4 signaling, and phosphoinositol 3-kinase/AKT/mammalian target of rapamycin signaling. The IPA upstream analysis predicted top transcriptional regulators that are either activated or inhibited, such as estrogen receptors, TP53, KDM5B, SPDEF, and CDKN1A. The GSEA approach was used to further confirm enriched pathway data following IPA.

  1. Phospholipase C isozymes are deregulated in colorectal cancer--insights gained from gene set enrichment analysis of the transcriptome.

    Directory of Open Access Journals (Sweden)

    Stine A Danielsen

    Full Text Available Colorectal cancer (CRC) is one of the most common cancer types in developed countries. To identify molecular networks and biological processes that are deregulated in CRC compared to normal colonic mucosa, we applied Gene Set Enrichment Analysis to two independent transcriptome datasets, including a total of 137 CRC and ten normal colonic mucosa samples. Eighty-two gene sets as described by the Kyoto Encyclopedia of Genes and Genomes database had significantly altered gene expression in both datasets. These included networks associated with cell division, DNA maintenance, and metabolism. Among signaling pathways with known changes in key genes, the "Phosphatidylinositol signaling network", comprising part of the PI3K pathway, was found deregulated. The downregulated genes in this pathway included several members of the Phospholipase C protein family, and the reduced expression of two of these, PLCD1 and PLCE1, was successfully validated in CRC biopsies (n = 70) and cell lines (n = 19) by quantitative analyses. The repression of both genes was found associated with KRAS mutations (P = 0.005 and 0.006, respectively), and we observed that microsatellite stable carcinomas with reduced PLCD1 expression more frequently had TP53 mutations (P = 0.002). Promoter methylation analyses of PLCD1 and PLCE1 performed in cell lines and tumor biopsies revealed that methylation of PLCD1 can contribute to reduced expression in 40% of the microsatellite instable carcinomas. In conclusion, we have identified significantly deregulated pathways in CRC, and validated repression of PLCD1 and PLCE1 expression. This illustrates that the GSEA approach may guide discovery of novel biomarkers in cancer.

  2. ABAEnrichment: an R package to test for gene set expression enrichment in the adult and developing human brain.

    Science.gov (United States)

    Grote, Steffi; Prüfer, Kay; Kelso, Janet; Dannemann, Michael

    2016-10-15

    We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages. ABAEnrichment was implemented as an R package and is available under GPL (≥ 2) from the Bioconductor website (http://bioconductor.org/packages/3.3/bioc/html/ABAEnrichment.html). Contact: steffi_grote@eva.mpg.de, kelso@eva.mpg.de or michael_dannemann@eva.mpg.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  3. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome

    Directory of Open Access Journals (Sweden)

    Gaora Peadar Ó

    2010-10-01

    Full Text Available Abstract Background Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets. Results Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p Conclusion Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of

  4. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    Science.gov (United States)

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Abstract Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association
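
    One way to picture threshold-based redundancy control is a greedy merge of gene sets whose overlap exceeds a user-defined cutoff. The sketch below is not the ReCiPa algorithm, whose details this abstract does not give. The overlap measure (shared genes divided by the size of the smaller set), the merge rule, and the names overlap_fraction and merge_redundant are all assumptions made for illustration.

```python
def overlap_fraction(a, b):
    """Shared genes as a fraction of the smaller set (an assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_redundant(pathways, threshold=0.75):
    """Greedily merge pathway pairs whose overlap meets the threshold.

    pathways: dict mapping pathway name -> iterable of gene identifiers.
    Returns a new dict; merged pathways are named "X+Y".
    """
    merged = {name: set(genes) for name, genes in pathways.items()}
    changed = True
    while changed:
        changed = False
        names = sorted(merged)
        for i, x in enumerate(names):
            for y in names[i + 1:]:
                if overlap_fraction(merged[x], merged[y]) >= threshold:
                    # Replace the redundant pair with its union, then rescan.
                    merged[x + "+" + y] = merged.pop(x) | merged.pop(y)
                    changed = True
                    break
            if changed:
                break
    return merged
```

    For example, merge_redundant({"A": {1, 2, 3, 4}, "B": {2, 3, 4}, "C": {9, 10}}) merges A and B into one set and leaves C untouched.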

  5. Whole-Transcriptome RNA-seq, Gene Set Enrichment Pathway Analysis, and Exon Coverage Analysis of Two Plastid RNA Editing Mutants.

    Science.gov (United States)

    Hackett, Justin B; Lu, Yan

    2017-04-07

    In land plants, plastid and mitochondrial RNAs are subject to post-transcriptional C-to-U RNA editing. T-DNA insertions in the ORGANELLE RNA RECOGNITION MOTIF PROTEIN6 gene resulted in reduced photosystem II (PSII) activity and smaller plant and leaf sizes. Exon coverage analysis of the ORRM6 gene showed that orrm6-1 and orrm6-2 are loss-of-function mutants. Compared to other ORRM proteins, ORRM6 affects a relatively small number of RNA editing sites. Sanger sequencing of reverse transcription-PCR products of plastid transcripts revealed two plastid RNA editing sites that are substantially affected in the orrm6 mutants: psbF-C77 and accD-C794. The psbF gene encodes the beta subunit of cytochrome b559, an essential component of PSII. The accD gene encodes the beta subunit of acetyl-CoA carboxylase, a protein required in plastid fatty acid biosynthesis. Whole-transcriptome RNA-seq demonstrated that editing at psbF-C77 is nearly absent and that the editing extent at accD-C794 is significantly reduced. Gene set enrichment pathway analysis showed that expression of multiple gene sets involved in photosynthesis, especially photosynthetic electron transport, is significantly up-regulated in both orrm6 mutants. The up-regulation could be a mechanism to compensate for the reduced PSII electron transport rate in the orrm6 mutants. These results further demonstrated that Organelle RNA Recognition Motif protein ORRM6 is required in editing of specific RNAs in the Arabidopsis (Arabidopsis thaliana) plastid.

  6. Integrative set enrichment testing for multiple omics platforms

    Directory of Open Access Journals (Sweden)

    Poisson Laila M

    2011-11-01

    Full Text Available Abstract Background Enrichment testing assesses the overall evidence of differential expression behavior of the elements within a defined set. When we have measured many molecular aspects, e.g. gene expression, metabolites, proteins, it is desirable to assess their differential tendencies jointly across platforms using an integrated set enrichment test. In this work we explore the properties of several methods for performing a combined enrichment test using gene expression and metabolomics as the motivating platforms. Results Using two simulation models we explored the properties of several enrichment methods including two novel methods: the logistic regression 2-degree of freedom Wald test and the 2-dimensional permutation p-value for the sum-of-squared statistics test. In relation to their univariate counterparts we find that the joint tests can improve our ability to detect results that are marginal univariately. We also find that joint tests improve the ranking of associated pathways compared to their univariate counterparts. However, there is a risk of Type I error inflation with some methods and self-contained methods lose specificity when the sets are not representative of underlying association. Conclusions In this work we show that consideration of data from multiple platforms, in conjunction with summarization via a priori pathway information, leads to increased power in detection of genomic associations with phenotypes.

  7. Witnessing stressful events induces glutamatergic synapse pathway alterations and gene set enrichment of positive EPSP regulation within the VTA of adult mice: An ontology based approach

    Science.gov (United States)

    Brewer, Jacob S.

    It is well known that exposure to severe stress increases the risk for developing mood disorders. Currently, the neurobiological and genetic mechanisms underlying the functional effects of psychological stress are poorly understood. A major obstacle to the study of psychological stress is the inability of current animal models of stress to distinguish between physical and psychological stressors. A novel paradigm recently developed by Warren et al. is able to tease apart the effects of physical and psychological stress in adult mice by allowing these mice to "witness" the social defeat of another mouse, thus removing confounding variables associated with physical stressors. Using this "witness" model of stress and RNA-Seq technology, the current study aims to study the genetic effects of psychological stress. After witnessing the social defeat of another mouse, VTA tissue was extracted, sequenced, and analyzed for differential expression. Since genes often work together in complex networks, a pathway and gene ontology (GO) analysis was performed using data from the differential expression analysis. The pathway and GO analyses revealed a perturbation of the glutamatergic synapse pathway and an enrichment of positive excitatory post-synaptic potential regulation. This is consistent with the excitatory synapse theory of depression. Together these findings demonstrate a dysregulation of the mesolimbic reward pathway at the gene level as a result of psychological stress, potentially contributing to depressive-like behaviors.

  8. Gene set analysis using variance component tests

    Science.gov (United States)

    2013-01-01

    Background Gene set analyses have become increasingly important in genomic research, as many complex diseases result jointly from alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. Results We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). Conclusion We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and diabetes microarray data. PMID:23806107

  9. Reduced retinal microvascular density, improved forepaw reach, comparative microarray and gene set enrichment analysis with c-jun targeting DNA enzyme.

    Directory of Open Access Journals (Sweden)

    Cecilia W S Chan

    Full Text Available Retinal neovascularization is a critical component in the pathogenesis of common ocular disorders that cause blindness, and treatment options are limited. We evaluated the therapeutic effect of a DNA enzyme targeting c-jun mRNA in mice with pre-existing retinal neovascularization. A single injection of Dz13 in a lipid formulation containing N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methyl-sulfate and 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine inhibited c-Jun expression and reduced retinal microvascular density. The DNAzyme inhibited retinal microvascular density as effectively as VEGF-A antibodies. Comparative microarray and gene expression analysis determined that Dz13 suppressed not only c-jun but a range of growth factors and matrix-degrading enzymes. Dz13 in this formulation inhibited microvascular endothelial cell proliferation, migration and tubule formation in vitro. Moreover, animals treated with Dz13 sensed the top of the cage in a modified forepaw reach model, unlike mice given a DNAzyme with scrambled RNA-binding arms that did not affect c-Jun expression. These findings demonstrate reduction of microvascular density and improvement in forepaw reach in mice administered catalytic DNA.

  10. Gene ontology and KEGG enrichment analyses of genes related to age-related macular degeneration.

    Science.gov (United States)

    Zhang, Jian; Xing, ZhiHao; Ma, Mingming; Wang, Ning; Cai, Yu-Dong; Chen, Lei; Xu, Xun

    2014-01-01

    Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  11. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool

    National Research Council Canada - National Science Library

    Chen, Edward Y; Tan, Christopher M; Kou, Yan; Duan, Qiaonan; Wang, Zichen; Meirelles, Gabriela Vaz; Clark, Neil R; Ma'ayan, Avi

    2013-01-01

    .... Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive...

  12. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

    Directory of Open Access Journals (Sweden)

    Alexander Kaever

    Full Text Available A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.

  13. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

    Science.gov (United States)

    Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter

    2014-01-01

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.
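
    To illustrate how dependence between data sets changes a combined p-value, the sketch below uses Stouffer's Z-score method with a correlation matrix. This is a stand-in technique chosen for brevity, not necessarily the combination rule used by the authors, and the correlation matrix is simply passed in rather than estimated from nonsignificant pathways as the abstract describes. The function name combine_p_values is hypothetical, and inputs must lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def combine_p_values(p_values, corr=None):
    """Combine one-sided p-values with Stouffer's method.

    corr is the correlation matrix of the underlying z-scores;
    None means the data sets are treated as independent.
    """
    nd = NormalDist()
    z = [nd.inv_cdf(1.0 - p) for p in p_values]
    k = len(z)
    if corr is None:
        corr = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    # The variance of the summed z-scores is the sum of all matrix entries.
    variance = sum(corr[i][j] for i in range(k) for j in range(k))
    z_combined = sum(z) / math.sqrt(variance)
    return 1.0 - nd.cdf(z_combined)
```

    With two p-values of 0.05, the independent case gives a combined value near 0.01, whereas perfect correlation returns 0.05: fully dependent evidence adds nothing, which is why ignoring dependence overstates significance.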

  14. Gene set analysis for GWAS

    DEFF Research Database (Denmark)

    Debrabant, Birgit; Soerensen, Mette

    2014-01-01

    Abstract We discuss the use of modified Kolmogorov-Smirnov (KS) statistics in the context of gene set analysis and review corresponding null and alternative hypotheses. Especially, we show that, when enhancing the impact of highly significant genes in the calculation of the test statistic...... parameter and the genesis and distribution of the gene-level statistics, and illustrate the effects of differential weighting in a real-life example....

  15. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

    Directory of Open Access Journals (Sweden)

    Steinfeld Israel

    2009-02-01

    Full Text Available Abstract Background Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. Results GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms in the order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. Conclusion GOrilla is an efficient GO analysis tool with unique features that make it a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation. GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
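
    The flexible-threshold idea can be sketched as a minimum hypergeometric (mHG) score: scan every cutoff of the ranked list and keep the smallest hypergeometric tail probability. The code below computes only this score. It is not GOrilla's implementation, and it omits the exact p-value that corrects for having tried many cutoffs, so the returned score must not be read as a p-value. The function names are hypothetical.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n genes are drawn from N, of which B are annotated."""
    total = comb(N, n)
    upper = min(n, B)
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1)) / total


def mhg_score(labels):
    """Minimum hypergeometric tail over all cutoffs of a ranked 0/1 list.

    labels[i] is 1 if the gene at rank i carries the GO term, else 0.
    Returns (score, cutoff) where cutoff is the number of top genes used.
    """
    N, B = len(labels), sum(labels)
    best, best_n, hits = 1.0, 0, 0
    for n, lab in enumerate(labels, start=1):
        hits += lab
        tail = hypergeom_tail(N, B, n, hits)
        if tail < best:
            best, best_n = tail, n
    return best, best_n
```

    For the ranked list [1, 1, 0, 0] the best cutoff is the top two genes, with a tail probability of 1/6.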

  16. Multi-edge gene set networks reveal novel insights into global relationships between biological themes.

    Directory of Open Access Journals (Sweden)

    Jignesh R Parikh

    Full Text Available Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.

  17. Multi-edge gene set networks reveal novel insights into global relationships between biological themes.

    Science.gov (United States)

    Parikh, Jignesh R; Xia, Yu; Marto, Jarrod A

    2012-01-01

    Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.

  18. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool.

    Science.gov (United States)

    Chen, Edward Y; Tan, Christopher M; Kou, Yan; Duan, Qiaonan; Wang, Zichen; Meirelles, Gabriela Vaz; Clark, Neil R; Ma'ayan, Avi

    2013-04-15

    System-wide profiling of genes and proteins in mammalian cells produces lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set library databases have been developed, there is still room for improvement. Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of upregulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interleukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Enrichr is an easy-to-use, intuitive, web-based enrichment analysis tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr.

  19. Effects of environmental enrichment on gene expression in the brain

    OpenAIRE

    Rampon, Claire; Jiang, Cecilia H.; Dong, Helin; Tang, Ya-Ping; Lockhart, David J; Schultz, Peter G.; Joe Z Tsien; Hu, Yinghe

    2000-01-01

    An enriched environment is known to promote structural changes in the brain and to enhance learning and memory performance in rodents [Hebb, D. O. (1947) Am. Psychol. 2, 306–307]. To better understand the molecular mechanisms underlying these experience-dependent cognitive changes, we have used high-density oligonucleotide microarrays to analyze gene expression in the brain. Expression of a large number of genes changes in response to enrichment training, many of w...

  20. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  1. Separate enrichment analysis of pathways for up- and downregulated genes.

    Science.gov (United States)

    Hong, Guini; Zhang, Wenjing; Li, Hongdong; Shen, Xiaopei; Guo, Zheng

    2014-03-06

    Two strategies are often adopted for enrichment analysis of pathways: the analysis of all differentially expressed (DE) genes together or the analysis of up- and downregulated genes separately. However, few studies have examined the rationales of these enrichment analysis strategies. Using both microarray and RNA-seq data, we show that gene pairs with functional links in pathways tended to have positively correlated expression levels, which could result in an imbalance between the up- and downregulated genes in particular pathways. We then show that the imbalance could greatly reduce the statistical power for finding disease-associated pathways through the analysis of all-DE genes. Further, using gene expression profiles from five types of tumours, we illustrate that the separate analysis of up- and downregulated genes could identify more pathways that are really pertinent to phenotypic difference. In conclusion, analysing up- and downregulated genes separately is more powerful than analysing all of the DE genes together.
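    The contrast between the two strategies can be illustrated with a hypergeometric over-representation test. The sketch below is not the authors' code: the gene counts are hypothetical, and the hypergeometric upper tail is only one common choice of enrichment statistic. It shows how a pathway whose differentially expressed members are all upregulated can look less significant when up- and downregulated genes are pooled.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """Upper-tail probability P(X >= k) for a hypergeometric variable."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Hypothetical numbers, chosen only for illustration.
N_GENES = 10_000          # background genes measured
N_UP, N_DOWN = 500, 500   # differentially expressed (DE) genes by direction
PATHWAY_SIZE = 40         # genes annotated to the pathway
UP_IN_PATHWAY = 12        # pathway members that are upregulated
DOWN_IN_PATHWAY = 0       # pathway members that are downregulated

# Strategy 1: pool all DE genes, ignoring direction.
p_all = hypergeom_tail(
    UP_IN_PATHWAY + DOWN_IN_PATHWAY, N_GENES, N_UP + N_DOWN, PATHWAY_SIZE
)

# Strategy 2: test the upregulated genes on their own.
p_up = hypergeom_tail(UP_IN_PATHWAY, N_GENES, N_UP, PATHWAY_SIZE)

print(f"all DE genes: p = {p_all:.3g}")
print(f"up only:      p = {p_up:.3g}")
```

    With these toy counts the pooled test dilutes the signal because the 500 downregulated genes enlarge the DE list without adding any pathway members.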

  2. Pathways targeted by antidiabetes drugs are enriched for multiple genes associated with type 2 diabetes risk.

    Science.gov (United States)

    Segrè, Ayellet V; Wei, Nancy; Altshuler, David; Florez, Jose C

    2015-04-01

    Genome-wide association studies (GWAS) have uncovered >65 common variants associated with type 2 diabetes (T2D); however, their relevance for drug development is not yet clear. Of note, the first two T2D-associated loci (PPARG and KCNJ11/ABCC8) encode known targets of antidiabetes medications. We therefore tested whether other genes/pathways targeted by antidiabetes drugs are associated with T2D. We compiled a list of 102 genes in pathways targeted by marketed antidiabetic medications and applied Gene Set Enrichment Analysis (MAGENTA [Meta-Analysis Gene-set Enrichment of variaNT Associations]) to this gene set, using available GWAS meta-analyses for T2D and seven quantitative glycemic traits. We detected a strong enrichment of drug target genes associated with T2D (P = 2 × 10⁻⁵; 14 potential new associations), primarily driven by insulin and thiazolidinedione (TZD) targets, which was replicated in an independent meta-analysis (Metabochip). The glycemic traits yielded no enrichment. The T2D enrichment signal was largely due to multiple genes of modest effects (P = 4 × 10⁻⁴, after removing known loci), highlighting new associations for follow-up (ACSL1, NFKB1, SLC2A2, incretin targets). Furthermore, we found that TZD targets were enriched for LDL cholesterol associations, illustrating the utility of this approach in identifying potential side effects. These results highlight the potential biomedical relevance of genes revealed by GWAS and may provide new avenues for tailored therapy and T2D treatment design. © 2015 by the American Diabetes Association. Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered.

  3. Gene set analyses for interpreting microarray experiments on prokaryotic organisms.

    Energy Technology Data Exchange (ETDEWEB)

    Tintle, Nathan; Best, Aaron; Dejongh, Matthew; VanBruggen, Dirk; Heffron, Fred; Porwollik, Steffen; Taylor, Ronald C.

    2008-11-05

    Background: Recent advances in microarray technology have brought with them the need for enhanced methods of biologically interpreting gene expression data. Recently, methods like Gene Set Enrichment Analysis (GSEA) and variants of Fisher’s exact test have been proposed which utilize a priori biological information. Typically, these methods are demonstrated with a priori biological information from the Gene Ontology. Results: Alternative gene set definitions are presented based on gene sets inferred from the SEED: open-source software environment for comparative genome annotation and analysis of microbial organisms. Many of these gene sets are then shown to provide consistent expression across a series of experiments involving Salmonella Typhimurium. Implementation of the gene sets in an analysis of microarray data is then presented for the Salmonella Typhimurium data. Conclusions: SEED inferred gene sets can be naturally defined based on subsystems in the SEED. The consistent expression values of these SEED inferred gene sets suggest their utility for statistical analyses of gene expression data based on a priori biological information

  4. Gene enrichment in plant genomic shotgun libraries.

    Science.gov (United States)

    Rabinowicz, Pablo D; McCombie, W Richard; Martienssen, Robert A

    2003-04-01

    The Arabidopsis genome (about 130 Mbp) has been completely sequenced; whereas a draft sequence of the rice genome (about 430 Mbp) is now available and the sequencing of this genome will be completed in the near future. The much larger genomes of several important crop species, such as wheat (about 16,000 Mbp) or maize (about 2500 Mbp), may not be fully sequenced with current technology. Instead, sequencing-analysis strategies are being developed to obtain sequencing and mapping information selectively for the genic fraction (gene space) of complex plant genomes.

  5. Evidence-based prioritisation and enrichment of genes interacting with metformin in type 2 diabetes.

    Science.gov (United States)

    Dawed, Adem Y; Ali, Ashfaq; Zhou, Kaixin; Pearson, Ewan R; Franks, Paul W

    2017-08-25

    There is an extensive body of literature suggesting the involvement of multiple loci in regulating the action of metformin; most findings lack replication, without which distinguishing true-positive from false-positive findings is difficult. To address this, we undertook evidence-based, multiple data integration to determine the validity of published evidence. We (1) built a database of published data on gene-metformin interactions using an automated text-mining approach (n = 5963 publications), (2) generated evidence scores for each reported locus, (3) from which a rank-ordered gene set was generated, and (4) determined the extent to which this gene set was enriched for glycaemic response through replication analyses in a well-powered independent genome-wide association study (GWAS) dataset from the Genetics of Diabetes and Audit Research Tayside Study (GoDARTS). From the literature search, seven genes were identified that are related to the clinical outcomes of metformin. Fifteen genes were linked with either metformin pharmacokinetics or pharmacodynamics, and the expression profiles of a further 51 genes were found to be responsive to metformin. Gene-set enrichment analysis consisting of the three sets and two more composite sets derived from the above three showed no significant enrichment in four of the gene sets. However, we detected significant enrichment of genes in the least prioritised category (a gene set in which their expression is affected by metformin) with glycaemic response to metformin (p = 0.03). This gene set includes novel candidate genes such as SLC2A4 (p = 3.24 × 10⁻⁴) and G6PC (p = 4.77 × 10⁻⁴). We have described a semi-automated text-mining and evidence-scoring algorithm that facilitates the organisation and extraction of useful information about gene-drug interactions. We further validated the output of this algorithm in a drug-response GWAS dataset, providing novel candidate loci for gene-metformin interactions.

  6. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

     Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...... analyses to the chicken expression data led to different ranking of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses dependent on the statistical method used. Methods for predicting the possible...

  7. Applying gene set enrichment analysis and meta-analysis to screen key genes controlling the development and progression of hepatic carcinoma

    Institute of Scientific and Technical Information of China (English)

    曹骥; 卢晓旭; 胡艳玲; 李瑗; 朱伶群; 杨春; 欧超; 唐艳萍

    2012-01-01

    AIM: To analyze vast amounts of hepatic carcinoma-related microarray data and identify crucial genes that control the development and progression of hepatocellular carcinoma (HCC). METHODS: Cross-species comparison could be used to explore the similarities between HCC-related gene expression profiles of human beings and other species. In order to screen genes that are involved in hepatocarcinogenesis, gene set enrichment analysis (GSEA) and meta-analysis were performed to study five gene expression data sets of independent species. RESULTS: Among the five gene expression data sets, three up-regulated and two down-regulated pathways were found to be consistent by gene set enrichment analysis. The up-regulated pathways are amino sugar and nucleotide sugar metabolism, cell cycle, and thyroid cancer, while the down-regulated pathways are linoleic acid metabolism and arachidonic acid metabolism. A total of 1 708 genes with a P < 0.05 were found in meta-analysis for five datasets, of which 720 could be assigned to functional pathways by DAVID and KEGG. These pathways include cell cycle, oocyte meiosis, and DNA replication. Cell cycle is the overlapping significant pathway between the two methods. Twenty-five genes with a P < 0.05 were identified in meta-analysis of cell cycle pathway. Five significant genes may be involved in the occurrence and progression of HCC. CONCLUSION: Cell cycle may be the crucial pathway to affect signal transduction in hepatocarcinogenesis.

  8. Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI

    Science.gov (United States)

    Vimaleswaran, Karani S.; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N.; Dudbridge, Frank; Loos, Ruth J.F.

    2012-01-01

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the Genetic Investigation of Anthropometric Traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10⁻⁷. Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits. PMID:22791748

  9. Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci.

    Science.gov (United States)

    Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate

    2017-02-14

    Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10⁻⁵ (including six with P<5 × 10⁻⁸). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC.
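    The idea behind a rank-based gene set enrichment score can be sketched as follows. This is a simplified illustration, not the procedure used in the study above: it assumes an unweighted Kolmogorov-Smirnov-style running sum and a gene-set permutation null, whereas published GSEA implementations typically weight hits by the ranking statistic and offer other permutation schemes. The gene names are hypothetical placeholders.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits_total = sum(1 for g in ranked_genes if g in gene_set)
    misses_total = len(ranked_genes) - hits_total
    if hits_total == 0 or misses_total == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise.
        if gene in gene_set:
            running += 1.0 / hits_total
        else:
            running -= 1.0 / misses_total
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Two-sided p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    size = sum(1 for g in ranked_genes if g in gene_set)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical ranking: g0 is the most strongly associated gene.
ranked = [f"g{i}" for i in range(100)]
top_set = {"g0", "g1", "g2", "g3", "g4"}
bottom_set = {"g95", "g96", "g97", "g98", "g99"}

es_top = enrichment_score(ranked, top_set)
es_bottom = enrichment_score(ranked, bottom_set)
p_top = permutation_p_value(ranked, top_set)
print(es_top, es_bottom, p_top)
```

    A set concentrated at the top of the ranking yields a positive score, one concentrated at the bottom a negative score.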

  10. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain

    Science.gov (United States)

    Krienen, Fenna M.; Yeo, B. T. Thomas; Ge, Tian; Buckner, Randy L.; Sherwood, Chet C.

    2016-01-01

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute’s human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections. PMID:26739559

  11. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain.

    Science.gov (United States)

    Krienen, Fenna M; Yeo, B T Thomas; Ge, Tian; Buckner, Randy L; Sherwood, Chet C

    2016-01-26

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute's human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections.

  12. Switch-like genes populate cell communication pathways and are enriched for extracellular proteins

    Directory of Open Access Journals (Sweden)

    Tozeren Aydin

    2008-01-01

    Full Text Available Abstract Background Recent studies have placed gene expression in the context of distribution profiles including housekeeping, graded, and bimodal (switch-like). Single-gene studies have shown bimodal expression results from healthy cell signaling and complex diseases such as cancer; however, developing a comprehensive list of human bimodal genes has remained a major challenge due to inherent noise in human microarray data. This study presents a two-component mixture analysis of mouse gene expression data for genes on the Affymetrix MG-U74Av2 array for the detection and annotation of switch-like genes. Two-component normal mixtures were fit to the data to identify bimodal genes and their potential roles in cell signaling and disease progression. Results Seventeen percent of the genes on the MG-U74Av2 array (1519 out of 9091) were identified as bimodal or switch-like. KEGG pathways significantly enriched for bimodal genes included ECM-receptor interaction, cell communication, and focal adhesion. Similarly, the GO biological process "cell adhesion" and cellular component "extracellular matrix" were significantly enriched. Switch-like genes were found to be associated with such diseases as congestive heart failure, Alzheimer's disease, arteriosclerosis, breast neoplasms, hypertension, myocardial infarction, obesity, rheumatoid arthritis, and type I and type II diabetes. In diabetes alone, over two hundred bimodal genes were in a different mode of expression compared to normal tissue. Conclusion This research identified and annotated bimodal or switch-like genes in the mouse genome using a large collection of microarray data. Genes with bimodal expression were enriched within the cell membrane and extracellular environment. Hundreds of bimodal genes demonstrated alternate modes of expression in diabetic muscle, pancreas, liver, heart, and adipose tissue. Bimodal genes comprise a candidate set of biomarkers for a large number of disease states because
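    Fitting a two-component normal mixture to one gene's expression values can be sketched with a plain expectation-maximization (EM) loop. The abstract does not say how bimodality was decided, so the BIC comparison against a single normal below is an assumption made for illustration, as are the function names and the simulated data.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200, min_sigma=1e-3):
    """Fit a two-component normal mixture by EM.

    Returns (weight_low, mu_low, sigma_low, mu_high, sigma_high).
    """
    mu1, mu2 = min(values), max(values)
    s1 = s2 = max((mu2 - mu1) / 4.0, min_sigma)
    w, n = 0.5, len(values)
    for _ in range(n_iter):
        # E-step: responsibility of the low component for each value.
        resp = []
        for x in values:
            a = w * normal_pdf(x, mu1, s1)
            b = (1.0 - w) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b) if a + b > 0.0 else 0.5)
        n1 = sum(resp)
        n2 = n - n1
        if n1 < 1e-9 or n2 < 1e-9:
            break  # one component has emptied; keep the last estimates
        # M-step: weighted means, standard deviations and mixing weight.
        mu1 = sum(r * x for r, x in zip(resp, values)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, values)) / n2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, values)) / n1
        var2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, values)) / n2
        s1 = max(math.sqrt(var1), min_sigma)
        s2 = max(math.sqrt(var2), min_sigma)
        w = n1 / n
    return w, mu1, s1, mu2, s2


def bic_prefers_mixture(values):
    """Assumed criterion: call a gene bimodal if the mixture has the lower BIC."""
    n = len(values)
    mean = sum(values) / n
    sd = max(math.sqrt(sum((x - mean) ** 2 for x in values) / n), 1e-3)
    ll_one = sum(math.log(max(normal_pdf(x, mean, sd), 1e-300)) for x in values)
    w, mu1, s1, mu2, s2 = fit_two_component_mixture(values)
    ll_two = sum(
        math.log(max(w * normal_pdf(x, mu1, s1) + (1.0 - w) * normal_pdf(x, mu2, s2), 1e-300))
        for x in values
    )
    bic_one = 2 * math.log(n) - 2.0 * ll_one   # mean and sd
    bic_two = 5 * math.log(n) - 2.0 * ll_two   # weight, two means, two sds
    return bic_two < bic_one


rng = random.Random(1)  # fixed seed so the simulated data are reproducible
expression = [rng.gauss(2.0, 0.5) for _ in range(150)]
expression += [rng.gauss(8.0, 0.5) for _ in range(150)]

weight, mu_low, sd_low, mu_high, sd_high = fit_two_component_mixture(expression)
print(f"low mode ~ {mu_low:.2f}, high mode ~ {mu_high:.2f}, weight ~ {weight:.2f}")
print("bimodal by BIC:", bic_prefers_mixture(expression))
```

    Initializing the two means at the minimum and maximum keeps the components ordered for well-separated modes; noisy real data would likely need multiple starts.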

  13. Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders.

    LENUS (Irish Health Repository)

    Anney, Richard J L

    2012-02-01

    Recent genome-wide association studies (GWAS) have implicated a range of genes from discrete biological pathways in the aetiology of autism. However, despite the strong influence of genetic factors, association studies have yet to identify statistically robust, replicated major effect genes or SNPs. We apply the principle of the SNP ratio test methodology described by O'Dushlaine et al. to over 2100 families from the Autism Genome Project (AGP). Using a two-stage design we examine association enrichment in 5955 unique gene-ontology classifications across four groupings based on two phenotypic and two ancestral classifications. Based on estimates from simulation we identify excess of association enrichment across all analyses. We observe enrichment in association for sets of genes involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell-signalling and cell-cycle regulation. Both genes and processes that show enrichment have previously been examined in autistic disorders and offer biological plausibility to these findings.

  14. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data.

    Directory of Open Access Journals (Sweden)

    Boris P Hejblum

    2015-06-01

    Full Text Available Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets finally found to be linked with the influenza vaccine too, although they were found to be associated with the pneumococcal vaccine only in previous analyses. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package.

  15. Transcriptome profiling of Set5 and Set1 methyltransferases: Tools for visualization of gene expression

    Directory of Open Access Journals (Sweden)

    Glòria Mas Martín

    2014-12-01

    Full Text Available Cells regulate transcription by coordinating the activities of multiple histone modifying complexes. We recently identified the yeast histone H4 methyltransferase Set5 and discovered functional overlap with the histone H3 methyltransferase Set1 in gene expression. Specifically, using next-generation RNA sequencing (RNA-Seq), we found that Set5 and Set1 function synergistically to regulate specific transcriptional programs at subtelomeres and transposable elements. Here we provide a comprehensive description of the methodology and analysis tools corresponding to the data deposited in NCBI's Gene Expression Omnibus (GEO) under the accession number GSE52086. This data complements the experimental methods described in Mas Martín G et al. (2014) and provides the means to explore the cooperative functions of histone H3 and H4 methyltransferases in the regulation of transcription. Furthermore, a fully annotated R code is included to enable researchers to use the following computational tools: comparison of significant differential expression (SDE) profiles; gene ontology enrichment of SDE; and enrichment of SDE relative to chromosomal features, such as centromeres, telomeres, and transposable elements. Overall, we present a bioinformatics platform that can be generally implemented for similar analyses with different datasets and in different organisms.

  16. GeneCodis3: a non-redundant and modular enrichment analysis tool for functional genomics.

    Science.gov (United States)

    Tabas-Madrid, Daniel; Nogales-Cadenas, Ruben; Pascual-Montano, Alberto

    2012-07-01

    Since its first release in 2007, GeneCodis has become a valuable tool to functionally interpret results from experimental techniques in genomics. This web-based application integrates different sources of information to find groups of genes with similar biological meaning. This process, known as enrichment analysis, is essential in the interpretation of high-throughput experiments. Frequent user feedback and the natural evolution of genomics and bioinformatics have allowed the growth of the tool and the development of this third release. In this version, a special effort has been made to remove noisy and redundant output from the enrichment results with the inclusion of a recently reported algorithm that summarizes significantly enriched terms and generates functionally coherent modules of genes and terms. A new comparative analysis has been added to allow the differential analysis of gene sets. To expand the scope of the application, new sources of biological information have been included, such as genetic diseases, drug-gene interactions and PubMed information, among others. Finally, the graphic section has been renewed with the inclusion of new interactive graphics and filtering options. The application is freely available at http://genecodis.cnb.csic.es.

  17. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences between benchmarking sets and real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its application to three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs.

  18. goSTAG: gene ontology subtrees to tag and annotate genes within a set.

    Science.gov (United States)

    Bennett, Brian D; Bushel, Pierre R

    2017-01-01

    Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories. We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. goSTAG converts gene lists from genomic analyses into biological themes
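    The tagging step described above, choosing the term with the most paths to the root, can be sketched on a toy ontology. This is not goSTAG's implementation: the term names and parent links are hypothetical, and for brevity the whole toy graph stands in for the subtree that goSTAG would build per cluster.

```python
from functools import lru_cache

# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS = {
    "root": [],
    "signaling": ["root"],
    "regulation": ["root"],
    "kinase_regulation": ["signaling", "regulation"],
    "neg_reg_MAPK": ["kinase_regulation", "regulation"],
}


@lru_cache(maxsize=None)
def paths_to_root(term):
    """Count distinct upward paths from a term to the root of the graph."""
    parents = PARENTS[term]
    if not parents:
        return 1  # the root itself counts as one (empty) path
    return sum(paths_to_root(parent) for parent in parents)


def tag_cluster(terms):
    """Pick the term with the most paths to the root; ties broken by name."""
    return max(terms, key=lambda term: (paths_to_root(term), term))


cluster = ["signaling", "kinase_regulation", "neg_reg_MAPK"]
for term in cluster:
    print(term, paths_to_root(term))
print("cluster tag:", tag_cluster(cluster))
```

    Memoizing the path counts keeps the recursion linear in the number of parent links, which matters because terms in a real ontology can have many ancestors.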

  19. Epigenomic elements enriched in the promoters of autoimmunity susceptibility genes.

    Science.gov (United States)

    Dozmorov, Mikhail G; Wren, Jonathan D; Alarcón-Riquelme, Marta E

    2014-02-01

    Genome-wide association studies have identified a number of autoimmune disease-susceptibility genes. Whether or not these loci share any regulatory or functional elements, however, is an open question. Finding such common regulators is of considerable research interest in order to define systemic therapeutic targets. The growing amount of experimental genomic annotations, particularly those from the ENCODE project, provides a wealth of opportunities to search for such commonalities. We hypothesized that regulatory commonalities might not only delineate a regulatory landscape predisposing to autoimmune diseases, but also define functional elements distinguishing specific diseases. We further investigated if, and how, disease-specific epigenomic elements can identify novel genes yet to be associated with the diseases. We evaluated transcription factors, histone modifications, and chromatin state data obtained from the ENCODE project for statistically significant over- or under-representation in the promoters of genes associated with Systemic Lupus Erythematosus (SLE), Rheumatoid Arthritis (RA), and Systemic Sclerosis (SSc). We identified BATF, BCL11A, IRF4, NFkB, PAX5, and PU.1 as transcription factors over-represented in SLE- and RA-susceptibility gene promoters. H3K4me1 and H3K4me2 epigenomic marks were associated with SLE susceptibility genes, and H3K9me3 was common to both SLE and RA. In contrast to a transcriptionally active signature in SLE and RA, SSc-susceptibility genes were depleted in activating epigenomic elements. Using epigenomic elements enriched in SLE and RA, we identified additional immune and B cell signaling-related genes with the same elements in their promoters. Our analysis suggests common and disease-specific epigenomic elements that may define novel therapeutic targets for controlling aberrant activation of autoimmune susceptibility genes.

  20. Gene set analysis for longitudinal gene expression data

    Directory of Open Access Journals (Sweden)

    Piepho Hans-Peter

    2011-07-01

    Full Text Available Abstract Background Gene set analysis (GSA) has become a successful tool to interpret gene expression profiles in terms of biological functions, molecular pathways, or genomic locations. GSA performs statistical tests for independent microarray samples at the level of gene sets rather than individual genes. Nowadays, an increasing number of microarray studies are conducted to explore the dynamic changes of gene expression in a variety of species and biological scenarios. In these longitudinal studies, gene expression is repeatedly measured over time such that a GSA needs to take into account the within-gene correlations in addition to possible between-gene correlations. Results We provide a robust nonparametric approach to compare the expressions of longitudinally measured sets of genes under multiple treatments or experimental conditions. The limiting distributions of our statistics are derived when the number of genes goes to infinity while the number of replications can be small. When the number of genes in a gene set is small, we recommend permutation tests based on our nonparametric test statistics to achieve reliable type I error and better power while incorporating unknown correlations between and within genes. Simulation results demonstrate that the proposed method has greater power than other methods for various data distributions and heteroscedastic correlation structures. This method was used for an IL-2 stimulation study and significantly altered gene sets were identified. Conclusions The simulation study and the real data application showed that the proposed gene set analysis provides a promising tool for longitudinal microarray analysis. R scripts for simulating longitudinal data and calculating the nonparametric statistics are posted on the North Dakota INBRE website http://ndinbre.org/programs/bioinformatics.php. Raw microarray data is available in Gene Expression Omnibus (National Center for Biotechnology Information with

  1. Alcohol-related genes show an enrichment of associations with a persistent externalizing factor.

    Science.gov (United States)

    Ashenhurst, James R; Harden, K Paige; Corbin, William R; Fromme, Kim

    2016-10-01

    Research using twins has found that much of the variability in externalizing phenotypes (including alcohol and drug use, impulsive personality traits, risky sex, and property crime) is explained by genetic factors. Nevertheless, identification of specific genes and variants associated with these traits has proven to be difficult, likely because individual differences in externalizing are explained by many genes of small individual effect. Moreover, twin research indicates that heritable variance in externalizing behaviors is mostly shared across the externalizing spectrum rather than specific to any behavior. We use a longitudinal, "deep phenotyping" approach to model a general externalizing factor reflecting persistent engagement in a variety of socially problematic behaviors measured at 11 assessment occasions spanning early adulthood (ages 18 to 28). In an ancestrally homogeneous sample of non-Hispanic Whites (N = 337), we then tested for enrichment of associations between the persistent externalizing factor and a set of 3,281 polymorphisms within 104 genes that were previously identified as associated with alcohol-use behaviors. Next, we tested for enrichment among domain-specific factors (e.g., property crime) composed of residual variance not accounted for by the common factor. Significance was determined relative to bootstrapped empirical thresholds derived from permutations of phenotypic data. Results indicated significant enrichment of genetic associations for persistent externalizing, but not for domain-specific factors. Consistent with twin research findings, these results suggest that genetic variants are broadly associated with externalizing behaviors rather than unique to specific behaviors. (PsycINFO Database Record

  2. Gene set analysis for interpreting genetic studies

    DEFF Research Database (Denmark)

    Pers, Tune H

    2016-01-01

    Interpretation of genome-wide association study (GWAS) results is lagging behind the discovery of new genetic associations. Consequently, there is an urgent need for data-driven methods for interpreting genetic association studies. Gene set analysis (GSA) can identify aetiologic pathways and func...

  3. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Full Text Available Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). In spite of the wealth of information provided by functional profiling methods, a limitation common to all of them is their inherent unidimensional nature. In order to overcome this restriction we present a multidimensional logistic model that allows studying the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results that cannot be derived from the conventional unidimensional functional profiling methods. We report sound results of gene set associations that remained undetected by the conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.

  4. GSMA: Gene Set Matrix Analysis, An Automated Method for Rapid Hypothesis Testing of Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Chris Cheadle

    2007-01-01

    Full Text Available Background: Microarray technology has become highly valuable for identifying complex global changes in gene expression patterns. The assignment of functional information to these complex patterns remains a challenging task in effectively interpreting data and correlating results from across experiments, projects and laboratories. Methods which allow the rapid and robust evaluation of multiple functional hypotheses increase the power of individual researchers to data mine gene expression data more efficiently. Results: We have developed gene set matrix analysis (GSMA) as a useful method for the rapid testing of group-wise up- or downregulation of gene expression simultaneously for multiple lists of genes (gene sets) against entire distributions of gene expression changes (datasets) for single or multiple experiments. The utility of GSMA lies in its flexibility to rapidly poll gene sets related by known biological function or as designated solely by the end-user against large numbers of datasets simultaneously. Conclusions: GSMA provides a simple and straightforward method for hypothesis testing in which genes are tested by groups across multiple datasets for patterns of expression enrichment.

  5. Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership

    Directory of Open Access Journals (Sweden)

    Ernesto Iacucci

    2012-02-01

    Full Text Available High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes, for example genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed up in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover gold standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
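
The idea of an exact p-value for an integer additive statistic can be illustrated with a small dynamic program. The following is a minimal sketch, not the authors' algorithm: it assumes non-negative integer gene weights, a null model in which the gene set is a uniformly random subset of fixed size, and a hypothetical function name `exact_weighted_p_value`.

```python
from math import comb


def exact_weighted_p_value(weights, m, observed):
    """P(sum of weights of a uniformly random m-gene subset >= observed).

    Assumption for this sketch: weights are non-negative integers.
    """
    max_sum = sum(weights)
    # counts[j][s] = number of j-gene subsets whose weights sum to s
    counts = [[0] * (max_sum + 1) for _ in range(m + 1)]
    counts[0][0] = 1
    for w in weights:
        # Go through subset sizes in descending order so that each gene
        # is added to a subset at most once.
        for j in range(m, 0, -1):
            smaller, current = counts[j - 1], counts[j]
            for s in range(max_sum, w - 1, -1):
                current[s] += smaller[s - w]
    favourable = sum(counts[m][max(observed, 0):])
    return favourable / comb(len(weights), m)


# With 0/1 weights the statistic is a plain membership count, so the
# result coincides with the one-sided hypergeometric tail.
print(exact_weighted_p_value([1, 1, 0, 0], m=2, observed=2))
```

The table has (m + 1) × (total weight + 1) entries, so the cost grows with the sum of the weights rather than with the number of possible subsets.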

  6. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    Science.gov (United States)

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify gene sets relevant to autism that could not be found by Fisher.
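
For orientation, the unweighted baseline that this abstract contrasts LEGO with (ORA by a one-sided Fisher's exact test, i.e. a hypergeometric tail) can be sketched as follows. This is not LEGO itself; the function name `ora_p_value` and its parameter names are illustrative assumptions.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes in the background
    n_in_set:   number of background genes annotated to the gene set
    n_selected: size of the list of interesting genes
    n_overlap:  number of interesting genes that are in the gene set
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 in the pathway, 3 selected, 2 overlap.
print(ora_p_value(10, 4, 3, 2))
```

Every gene contributes equally to this count, which is the simplification that network-based gene weights are meant to relax.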

  7. Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer

    Directory of Open Access Journals (Sweden)

    Pingzhao Hu

    2006-01-01

    Full Text Available Background: Microarray technology has been previously used to identify genes that are differentially expressed between tumour and normal samples in a single study, as well as in syntheses involving multiple studies. When integrating results from several Affymetrix microarray datasets, previous studies summarized probeset-level data, which may potentially lead to a loss of information available at the probe-level. In this paper, we present an approach for integrating results across studies while taking probe-level data into account. Additionally, we follow a new direction in the analysis of microarray expression data, namely to focus on the variation of expression phenotypes in predefined gene sets, such as pathways. This targeted approach can be helpful for revealing information that is not easily visible from the changes in the individual genes. Results: We used a recently developed method to integrate Affymetrix expression data across studies. The idea is based on a probe-level based test statistic developed for testing for differentially expressed genes in individual studies. We incorporated this test statistic into a classic random-effects model for integrating data across studies. Subsequently, we used a gene set enrichment test to evaluate the significance of enriched biological pathways in the differentially expressed genes identified from the integrative analysis. We compared statistical and biological significance of the prognostic gene expression signatures and pathways identified in the probe-level model (PLM) with those in the probeset-level model (PSLM). Our integrative analysis of Affymetrix microarray data from 110 prostate cancer samples obtained from three studies reveals thousands of genes significantly correlated with tumour cell differentiation. The bioinformatics analysis, mapping these genes to the publicly available KEGG database, reveals evidence that tumour cell differentiation is significantly associated with many

  8. Definition, conservation and epigenetics of housekeeping and tissue-enriched genes

    Directory of Open Access Journals (Sweden)

    Johnson Jason M

    2009-06-01

    Full Text Available Abstract Background Housekeeping genes (HKG) are constitutively expressed in all tissues while tissue-enriched genes (TEG) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions. Results Here, we generate gene expression profiles of 42 normal human tissues on custom high-density microarrays to systematically identify 1,522 HKGs and 975 TEGs and compile a small subset of 20 housekeeping genes which are highly expressed in all tissues with lower variance than many commonly used HKGs. Cross-species comparison shows that both the functions and expression patterns of HKGs are conserved. TEGs are enriched with respect to both segmental duplication and copy number variation, while no such enrichment is observed for HKGs, suggesting the high expression of HKGs is not due to high copy numbers. Analysis of genomic and epigenetic features of HKGs and TEGs reveals that the high expression of HKGs across different tissues is associated with decreased nucleosome occupancy at the transcription start site as indicated by enhanced DNase hypersensitivity. Additionally, we systematically and quantitatively demonstrate the enrichment of CpG islands at HKG transcription start sites (TSS) and their depletion at TEG TSS. Histone methylation patterns differ significantly between HKGs and TEGs, suggesting that methylation contributes to the differential expression patterns as well. Conclusion We have compiled a set of high quality HKGs that should provide higher and more consistent expression when used as references in laboratory experiments than currently used

  9. A method for developing regulatory gene set networks to characterize complex biological systems.

    Science.gov (United States)

    Suphavilai, Chayaporn; Zhu, Liugen; Chen, Jake Y

    2015-01-01

    Traditional approaches to studying molecular networks are based on linking genes or proteins. Higher-level networks linking gene sets or pathways have been proposed recently. Several types of gene set networks have been used to study complex molecular networks such as co-membership gene set networks (M-GSNs) and co-enrichment gene set networks (E-GSNs). Gene set networks are useful for studying biological mechanisms of diseases and drug perturbations. In this study, we proposed a new approach for constructing directed, regulatory gene set networks (R-GSNs) to reveal novel relationships among gene sets or pathways. We collected several gene set collections and high-quality gene regulation data in order to construct R-GSNs in a comparative study with M-GSNs. We described a method for constructing both global and disease-specific R-GSNs and determining their significance. To demonstrate the potential applications to disease biology studies, we constructed and analysed an R-GSN specifically built for Alzheimer's disease. R-GSNs can provide new biological insights complementary to those derived at the protein regulatory network level or from M-GSNs. When integrated properly with functional genomics data, R-GSNs can help enable future research on systems biology and translational bioinformatics.

  10. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

    Directory of Open Access Journals (Sweden)

    Lijing Xu

    Full Text Available High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in the GO biological process category and 90% of the gene sets in the GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.

  11. Gene expression profiles and physiological data from mice fed resveratrol-enriched rice DJ526

    Science.gov (United States)

    Chung, Hea-Jong; Lee, Heui-Kwan; Kim, Hyeon-Jin; Baek, So-Hyeon; Hong, Seong-Tshool

    2016-01-01

    The molecular mechanism underlying lifespan extension by resveratrol remains widely discussed. To help study this mechanism, we previously created resveratrol-enriched rice, DJ526, by transferring the resveratrol biosynthesis gene into Dongjin rice. DJ526 accumulates 1.4–1.9 μg g−1 of resveratrol in its grain and can ameliorate age-related deterioration in mice, as compared to control animals, based on assessments of motor coordination, physical strength and cutaneous tissue aging. Here, we present raw data sets, deposited in public repositories, from microarray analysis and physiological data of mice fed with DJ526 and Dongjin rice and treated with resveratrol. We also provide a method to analyze blood serum at micron levels. These data sets may help other researchers find new clues regarding the etiology of the anti-aging process and signaling pathways induced by resveratrol, rice, or DJ526. PMID:27996975

  12. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits.

    Directory of Open Access Journals (Sweden)

    Ayellet V Segrè

    2010-08-01

    Full Text Available Mitochondrial dysfunction has been observed in skeletal muscle of people with diabetes and insulin-resistant individuals. Furthermore, inherited mutations in mitochondrial DNA can cause a rare form of diabetes. However, it is unclear whether mitochondrial dysfunction is a primary cause of the common form of diabetes. To date, common genetic variants robustly associated with type 2 diabetes (T2D) are not known to affect mitochondrial function. One possibility is that multiple mitochondrial genes contain modest genetic effects that collectively influence T2D risk. To test this hypothesis we developed a method named Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA; http://www.broadinstitute.org/mpg/magenta). MAGENTA, in analogy to Gene Set Enrichment Analysis, tests whether sets of functionally related genes are enriched for associations with a polygenic disease or trait. MAGENTA was specifically designed to exploit the statistical power of large genome-wide association (GWA) study meta-analyses whose individual genotypes are not available. This is achieved by combining variant association p-values into gene scores and then correcting for confounders, such as gene size, variant number, and linkage disequilibrium properties. Using simulations, we determined the range of parameters for which MAGENTA can detect associations likely missed by single-marker analysis. We verified MAGENTA's performance on empirical data by identifying known relevant pathways in lipid and lipoprotein GWA meta-analyses. We then tested our mitochondrial hypothesis by applying MAGENTA to three gene sets: nuclear regulators of mitochondrial genes, oxidative phosphorylation genes, and approximately 1,000 nuclear-encoded mitochondrial genes. The analysis was performed using the most recent T2D GWA meta-analysis of 47,117 people and meta-analyses of seven diabetes-related glycemic traits (up to 46,186 non-diabetic individuals). This well-powered analysis found no
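
The general flow described above (variant p-values combined into gene scores, followed by a gene-set enrichment test) can be sketched in simplified form. This is not the MAGENTA implementation: it assumes the gene score is simply the smallest variant p-value, omits the confounder correction for gene size, variant number and linkage disequilibrium, and uses a hypothetical significance cutoff with random gene sets of matching size as the null. All function and parameter names are illustrative.

```python
import random


def gene_scores(variant_p):
    """Map each gene to its smallest variant p-value (uncorrected)."""
    return {gene: min(ps) for gene, ps in variant_p.items()}


def enrichment_p(scores, gene_set, cutoff, n_perm=10000, seed=0):
    """Empirical p-value for an excess of genes scoring at or below cutoff.

    The null distribution comes from random gene sets of the same size
    drawn from all scored genes.
    """
    rng = random.Random(seed)
    genes = sorted(scores)
    members = [g for g in genes if g in gene_set]
    observed = sum(scores[g] <= cutoff for g in members)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        if sum(scores[g] <= cutoff for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy input: gene -> list of variant association p-values (made-up values).
toy = {"A": [0.5, 0.01], "B": [0.2], "C": [0.9, 0.7], "D": [0.8]}
scores = gene_scores(toy)
print(enrichment_p(scores, {"A"}, cutoff=0.05, n_perm=2000))
```

Because it works from summary p-values alone, a procedure of this shape needs no individual-level genotypes, which matches the setting the abstract describes.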

  13. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Wang, ShaoPeng; Zhang, YunHua; Huang, Tao; Cai, Yu-Dong

    2017-01-01

    Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways with these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes predicted to be essential by our prediction model are provided in this study. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
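
The Matthews correlation coefficient quoted above is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the function name `mcc` and the example counts are illustrative and are not taken from the study.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts: a perfect classifier and a coin-flip-like one.
print(mcc(50, 50, 0, 0))
print(mcc(1, 1, 1, 1))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), so a value of 0.951 indicates very few misclassified genes in either class.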

  14. Groundwater fluoride enrichment in an active rift setting: Central Kenya Rift case study

    Energy Technology Data Exchange (ETDEWEB)

    Olaka, Lydia A., E-mail: lydiaolaka@gmail.com [Department of Geology, University of Nairobi, P.O Box 30197, Nairobi (Kenya); Wilke, Franziska D.H. [Geoforschungs Zentrum, Telegrafenberg, 14473 Potsdam (Germany); Olago, Daniel O.; Odada, Eric O. [Department of Geology, University of Nairobi, P.O Box 30197, Nairobi (Kenya); Mulch, Andreas [Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325 Frankfurt (Germany); Institut für Geowissenschaften, Goethe Universität Frankfurt, Altenhöferallee 1, 60438 Frankfurt (Germany); Musolff, Andreas [UFZ-Helmholtz-Centre for Environmental Research, Department of Hydrogeology, Permoserstr. 15, 04318 Leipzig (Germany)

    2016-03-01

    Groundwater is used extensively in the Central Kenya Rift for domestic and agricultural demands. In these active rift settings groundwater can exhibit high fluoride levels. In order to address water security and reduce human exposure to high fluoride in drinking water, knowledge of the source and geochemical processes of enrichment is required. A study was therefore carried out within the Naivasha catchment (Kenya) to understand the genesis, enrichment and seasonal variations of fluoride in the groundwater. Rocks, rain, surface and groundwater sources were sampled for hydrogeochemical and isotopic investigations, and the data were statistically and geospatially analyzed. Water sources have variable fluoride concentrations between 0.02 and 75 mg/L; 73% exceed the health limit (1.5 mg/L) in both dry and wet seasons. F⁻ concentrations in rivers are lower (0.2–9.2 mg/L) than in groundwater (0.09 to 43.6 mg/L), while saline lake waters have the highest concentrations (0.27–75 mg/L). The higher values are confined to elevations below 2000 masl. Oxygen (δ¹⁸O) and hydrogen (δD) isotopic values range from −6.2 to +5.8‰ and −31.3 to +33.3‰, respectively; they are also highly variable in the rift floor, where they attain maximum values. Fluoride base levels in the precursor vitreous volcanic rocks are higher (3750–6000 ppm) in minerals such as cordierite and muscovite, while secondary minerals like illite and kaolinite have lower remnant fluoride (< 1000 ppm). Thus, geochemical F⁻ enrichment in regional groundwater is mainly due to a) rock alteration, i.e. through long residence times and natural discharge and/or enhanced leakages of deep seated geothermal water reservoirs, b) secondary concentration fortification of natural reservoirs through evaporation, through reduced recharge and/or enhanced abstraction and c) through additional enrichment of fluoride after volcanic emissions. The findings are useful to help improve water management

  15. Symmetry-enriched string nets: Exactly solvable models for SET phases

    Science.gov (United States)

    Heinrich, Chris; Burnell, Fiona; Fidkowski, Lukasz; Levin, Michael

    2016-12-01

    We construct exactly solvable models for a wide class of symmetry-enriched topological (SET) phases. Our construction applies to two-dimensional (2D) bosonic SET phases with finite unitary on-site symmetry group G and we conjecture that our models realize every phase in this class that can be described by a commuting projector Hamiltonian. Our models are designed so that they have a special property: If we couple them to a dynamical lattice gauge field with gauge group G, the resulting gauge theories are equivalent to string-net models. This property is what allows us to analyze our models in generality. As an example, we present a model for a phase with the same anyon excitations as the toric code and with a Z2 symmetry which exchanges the e and m type anyons. We further illustrate our construction with a number of additional examples.

  16. Systematic enrichment analysis of gene expression profiling studies identifies consensus pathways implicated in colorectal cancer development

    Directory of Open Access Journals (Sweden)

    Jesús Lascorz

    2011-01-01

    Full Text Available Background: A large number of gene expression profiling (GEP) studies on colorectal carcinogenesis have been performed but no reliable gene signature has been identified so far due to the lack of reproducibility in the reported genes. There is growing evidence that functionally related genes, rather than individual genes, contribute to the etiology of complex traits. We used, as a novel approach, pathway enrichment tools to define functionally related genes that are consistently up- or down-regulated in colorectal carcinogenesis. Materials and Methods: We started the analysis with 242 unique annotated genes that had been reported by any of three recent meta-analyses covering GEP studies on genes differentially expressed in carcinoma vs normal mucosa. Most of these genes (218, 91.9%) had been reported in at least three GEP studies. These 242 genes were submitted to bioinformatic analysis using a total of nine tools to detect enrichment of Gene Ontology (GO) categories or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. As a final consistency criterion the pathway categories had to be enriched by several tools to be taken into consideration. Results: Our pathway-based enrichment analysis identified the categories of ribosomal protein constituents, extracellular matrix receptor interaction, carbonic anhydrase isozymes, and a general category related to inflammation and cellular response as significantly and consistently overrepresented entities. Conclusions: We triaged the genes covered by the published GEP literature on colorectal carcinogenesis and subjected them to multiple enrichment tools in order to identify the consistently enriched gene categories. These turned out to have known functional relationships to cancer development and thus deserve further investigation.

  17. Gene expression profiles of Beta-cell enriched tissue obtained by laser capture microdissection from subjects with type 2 diabetes.

    Directory of Open Access Journals (Sweden)

    Lorella Marselli

    Full Text Available BACKGROUND: Changes in gene expression in pancreatic beta-cells from type 2 diabetes (T2D) should provide insights into their abnormal insulin secretion and turnover. METHODOLOGY/PRINCIPAL FINDINGS: Frozen sections were obtained from cadaver pancreases of 10 control and 10 T2D human subjects. Beta-cell enriched samples were obtained by laser capture microdissection (LCM). RNA was extracted, amplified and subjected to microarray analysis. Further analysis was performed with DNA-Chip Analyzer (dChip) and Gene Set Enrichment Analysis (GSEA) software. There were changes in expression of genes linked to glucotoxicity. Evidence of oxidative stress was provided by upregulation of several metallothionein genes. There were few changes in the major genes associated with cell cycle, apoptosis or endoplasmic reticulum stress. There was differential expression of genes associated with pancreatic regeneration, most notably upregulation of members of the regenerating islet gene (REG) family and metalloproteinase 7 (MMP7). Some of the genes found in GWAS studies to be related to T2D were also found to be differentially expressed. IGF2BP2, TSPAN8, and HNF1B (TCF2) were upregulated while JAZF1 and SLC30A8 were downregulated. CONCLUSIONS/SIGNIFICANCE: This study made possible by LCM has identified many novel changes in gene expression that enhance understanding of the pathogenesis of T2D.

  18. Putting the “E” in SPIDER: Evolving Trends in the Evaluation of Environmental Enrichment Efficacy in Zoological Settings

    Directory of Open Access Journals (Sweden)

    Christina Alligood

    2015-08-01

    Full Text Available In their seminal paper on environmental enrichment, Mellen and MacPhee (2001) proposed a set of broad goals for enrichment in zoological settings, as well as a framework for enrichment programs. Since that time, the philosophy and practice of environmental enrichment in zoos has continued to grow. Here we review evaluations of enrichment efficacy in the literature since 2001, looking for trends in species, target behaviors, enrichment strategies, and analytic techniques and discussing progress toward the SPIDER vision and outstanding needs in the field. We selected 94 peer-reviewed and 121 non-peer-reviewed articles for review, representing enrichment strategies across a wide range of species. The number of peer-reviewed articles published per year was relatively stable, such that the cumulative number of articles has continued to rise over the thirteen-year review period. We echo the call issued by a number of authors for continued and improved evaluation of enrichment efficacy, and add a recommendation for further exploration of single-subject experimental designs. We also call for focus on a broader array of species and on specific areas of application including reintroduction.

  19. Gene set analyses for interpreting microarray experiments on prokaryotic organisms

    OpenAIRE

    Heffron Fred; Van Bruggen Dirk; DeJongh Matthew; Best Aaron A; Tintle Nathan L; Porwollik Steffen; Taylor Ronald C

    2008-01-01

    Abstract Background Despite the widespread usage of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed which use biologically defined sets of genes in interpretation, instead of examining results gene-by-gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, ...
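
For readers unfamiliar with the Fisher's exact test approach mentioned above, a common form is a one-sided over-representation test: given a list of differentially expressed genes, how surprising is its overlap with a gene set? The sketch below computes that hypergeometric tail probability with the standard library only. It is a generic illustration with made-up counts, not the specific procedure evaluated in the paper.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_set, n_selected, n_overlap):
    """One-sided Fisher's exact (hypergeometric) p-value, P(X >= n_overlap).

    n_genes    : genes measured in the experiment
    n_in_set   : of those, genes belonging to the gene set
    n_selected : genes called differentially expressed
    n_overlap  : selected genes that also belong to the gene set
    """
    largest = min(n_in_set, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_in_set, i) * comb(n_genes - n_in_set, n_selected - i)
        for i in range(n_overlap, largest + 1)
    )
    return tail / total


# Made-up counts: 2000 genes, a 40-gene set, 100 selected, 8 in the overlap.
print(enrichment_p_value(2000, 40, 100, 8))
```

The test depends on a hard cutoff for calling genes differentially expressed, so results can shift when that cutoff changes.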

  20. Studying the Complex Expression Dependences between Sets of Coexpressed Genes

    Directory of Open Access Journals (Sweden)

    Mario Huerta

    2014-01-01

    Full Text Available Organisms simplify the orchestration of gene expression by coregulating genes whose products function together in the cell. The use of clustering methods to obtain sets of coexpressed genes from expression arrays is very common; nevertheless, there are no appropriate tools to study the expression networks among these sets of coexpressed genes. The aim of the developed tools is to allow studying the complex expression dependences that exist between sets of coexpressed genes. For this purpose, we start by detecting the nonlinear expression relationships between pairs of genes, plus the coexpressed genes. Next, we form networks among sets of coexpressed genes that maintain nonlinear expression dependences between all of them. The expression relationship between the sets of coexpressed genes is defined by the expression relationship between the skeletons of these sets, where this skeleton represents the coexpressed genes with a well-defined nonlinear expression relationship with the skeleton of the other sets. As a result, we can study the nonlinear expression relationships between a target gene and other sets of coexpressed genes, or start the study from the skeleton of the sets, to study the complex relationships of activation and deactivation between the sets of coexpressed genes that carry out the different cellular processes present in the expression experiments.

  1. Self-Contained Statistical Analysis of Gene Sets

    Science.gov (United States)

    Cannon, Judy L.; Ricoy, Ulises M.; Johnson, Christopher

    2016-01-01

    Microarrays are a powerful tool for studying differential gene expression. However, lists of many differentially expressed genes are often generated, and unraveling meaningful biological processes from the lists can be challenging. For this reason, investigators have sought to quantify the statistical probability of compiled gene sets rather than individual genes. The gene sets typically are organized around a biological theme or pathway. We compute correlations between different gene set tests and elect to use Fisher’s self-contained method for gene set analysis. We improve Fisher’s differential expression analysis of a gene set by limiting the p-value of an individual gene within the gene set to prevent a small percentage of genes from determining the statistical significance of the entire set. In addition, we also compute dependencies among genes within the set to determine which genes are statistically linked. The method is applied to T-ALL (T-lineage Acute Lymphoblastic Leukemia) to identify differentially expressed gene sets between T-ALL and normal patients and T-ALL and AML (Acute Myeloid Leukemia) patients. PMID:27711232
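
Fisher's combined probability method sums -2 ln(p) over the genes in a set and compares the total with a chi-square distribution on 2k degrees of freedom, where k is the number of genes. The sketch below adds a floor on each per-gene p-value so that a few extreme genes cannot dominate the set-level result. The floor value of 0.01 is a hypothetical choice; the abstract does not state the limit the authors used, and the method also assumes independent p-values.

```python
import math


def fisher_combined(p_values, floor=None):
    """Combine per-gene p-values with Fisher's method.

    If `floor` is given, each p-value is raised to at least `floor` first
    (an assumed form of the limiting step described in the abstract).
    Returns (statistic, combined p-value).
    """
    if floor is not None:
        p_values = [max(p, floor) for p in p_values]
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # Chi-square survival function for even degrees of freedom 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    k = len(p_values)
    half = statistic / 2.0
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, math.exp(-half) * series


# One extreme gene among otherwise unremarkable genes (made-up values).
raw = [1e-12, 0.40, 0.55, 0.70, 0.62]
print(fisher_combined(raw))
print(fisher_combined(raw, floor=0.01))
```

With the floor applied, the combined p-value rises, which reflects that the set-level signal came from a single gene.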

  2. Diversity of reductive dehalogenase genes from environmental samples and enrichment cultures identified with degenerate primer PCR screens.

    Directory of Open Access Journals (Sweden)

    Laura Audrey Hug

    2013-11-01

    Full Text Available Reductive dehalogenases are the critical enzymes for anaerobic organohalide respiration, a microbial metabolic process that has been harnessed for bioremediation efforts to resolve chlorinated solvent contamination in groundwater and is implicated in the global halogen cycle. Reductive dehalogenase sequence diversity is informative for the dechlorination potential of the site or enrichment culture. A suite of degenerate PCR primers targeting a comprehensive curated set of reductive dehalogenase genes was designed and applied to twelve DNA samples extracted from contaminated and pristine sites, as well as six enrichment cultures capable of reducing chlorinated compounds to non-toxic end-products. The amplified gene products from four environmental sites and two enrichment cultures were sequenced using Illumina HiSeq, and the reductive dehalogenase complement of each sample determined. The results indicate that the diversity of the reductive dehalogenase gene family is much deeper than is currently accounted for: one-third of the translated proteins have less than 70% pairwise amino acid identity to database sequences. Approximately 60% of the sequenced reductive dehalogenase genes were broadly distributed, being identified in four or more samples, and often in previously sequenced genomes as well. In contrast, 17% of the sequenced reductive dehalogenases were unique, present in only a single sample and bearing less than 90% pairwise amino acid identity to any previously identified proteins. Many of the broadly distributed reductive dehalogenases are uncharacterized in terms of their substrate specificity, making these intriguing targets for further biochemical experimentation. Finally, comparison of samples from a contaminated site and an enrichment culture derived from the same site eight years prior allowed examination of the effect of the enrichment process.

  3. Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data.

    Science.gov (United States)

    Gupta, Ravi; Wikramasinghe, Priyankara; Bhattacharyya, Anirban; Perez, Francisco A; Pal, Sharmistha; Davuluri, Ramana V

    2010-01-18

    Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massive parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context. We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied on seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters. We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The

  4. MGMT enrichment and second gene co-expression in hematopoietic progenitor cells using separate or dual-gene lentiviral vectors.

    Science.gov (United States)

    Roth, Justin C; Alberti, Michael O; Ismail, Mourad; Lingas, Karen T; Reese, Jane S; Gerson, Stanton L

    2015-01-22

    The DNA repair gene O(6)-methylguanine-DNA methyltransferase (MGMT) allows efficient in vivo enrichment of transduced hematopoietic stem cells (HSC). Thus, linking this selection strategy to therapeutic gene expression offers the potential to reconstitute diseased hematopoietic tissue with gene-corrected cells. However, different dual-gene expression vector strategies are limited by poor expression of one or both transgenes. To evaluate different co-expression strategies in the context of MGMT-mediated HSC enrichment, we compared selection and expression efficacies in cells cotransduced with separate single-gene MGMT and GFP lentivectors to those obtained with dual-gene vectors employing either encephalomyocarditis virus (EMCV) internal ribosome entry site (IRES) or foot and mouth disease virus (FMDV) 2A elements for co-expression strategies. Each strategy was evaluated in vitro and in vivo using equivalent multiplicities of infection (MOI) to transduce 5-fluorouracil (5-FU) or Lin(-)Sca-1(+)c-kit(+) (LSK)-enriched murine bone marrow cells (BMCs). The highest dual-gene expression (MGMT(+)GFP(+)) percentages were obtained with the FMDV-2A dual-gene vector, but half of the resulting gene products existed as fusion proteins. Following selection, dual-gene expression percentages in single-gene vector cotransduced and dual-gene vector transduced populations were similar. Equivalent MGMT expression levels were obtained with each strategy, but GFP expression levels derived from the IRES dual-gene vector were significantly lower. In mice, vector-insertion averages were similar among cells enriched after dual-gene vectors and those cotransduced with single-gene vectors. These data demonstrate the limitations and advantages of each strategy in the context of MGMT-mediated selection, and may provide insights into vector design with respect to a particular therapeutic gene or hematologic defect.

  5. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data.

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer's disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation related functional groups in AD pathology as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology.

  6. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L.

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer’s disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation related functional groups in AD pathology as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology. PMID:28199395

  7. GO-based Functional Dissimilarity of Gene Sets

    Directory of Open Access Journals (Sweden)

    Aguilar-Ruiz Jesús S

    2011-09-01

    Full Text Available Abstract Background The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and a quantification of functional coherence can help to clarify the role of a group of genes working together. Results To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Conclusions Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.

  8. Geo-Enrichment and Semantic Enhancement of Metadata Sets to Augment Discovery in Geoportals

    Directory of Open Access Journals (Sweden)

    Bernhard Vockner

    2014-03-01

    Full Text Available Geoportals are established to function as main gateways to find, evaluate, and start “using” geographic information. Still, current geoportal implementations face problems in optimizing the discovery process due to semantic heterogeneity issues, which lead to low recall and low precision in performing text-based searches. Therefore, we propose an enhanced semantic discovery approach that supports multilingualism and information domain context. Thus, we present a workflow that enriches existing structured metadata with synonyms, toponyms, and translated terms derived from user-defined keywords based on multilingual thesauri and ontologies. To make the results easier to understand, we also provide automated translation capabilities for the resource metadata to support the user in conceiving the thematic content of the descriptive metadata, even if it has been documented using a language the user is not familiar with. In addition, to text-enable spatial filtering capabilities, we add additional location name keywords to metadata sets. These are based on the existing bounding box and shall tweak discovery scores when performing single text line queries. In order to improve the user’s search experience, we tailor faceted search strategies presenting an enhanced query interface for geo-metadata discovery that transparently leverages the underlying thesauri and ontologies.

  9. Digital gene expression profiling of flax (Linum usitatissimum L.) stem peel identifies genes enriched in fiber-bearing phloem tissue.

    Science.gov (United States)

    Guo, Yuan; Qiu, Caisheng; Long, Songhua; Chen, Ping; Hao, Dongmei; Preisner, Marta; Wang, Hui; Wang, Yufu

    2017-08-30

    To better understand the molecular mechanisms and gene expression characteristics associated with development of bast fiber cells within flax stem phloem, the gene expression profiles of flax stem peels and leaves were screened using Illumina's Digital Gene Expression (DGE) analysis. Four DGE libraries (2 for stem peel and 2 for leaf), ranging from 6.7 to 9.2 million clean reads, were obtained, which produced 7.0 million and 6.8 million mapped reads for flax stem peel and leaf, respectively. By differential gene expression analysis, a total of 975 genes, of which 708 (73%) genes have protein-coding annotation, were identified as phloem enriched genes putatively involved in the processes of polysaccharide and cell wall metabolism. Differentially expressed genes (DEGs) were validated using quantitative RT-PCR; the expression pattern of all nine genes determined by qRT-PCR fitted well with that obtained by sequencing analysis. Cluster and Gene Ontology (GO) analysis revealed that a large number of genes related to metabolic process, catalytic activity and binding category were expressed predominantly in the stem peels. The Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of the phloem enriched genes suggested approximately 111 biological pathways. The large number of genes and pathways produced from DGE sequencing will expand our understanding of the complex molecular and cellular events in flax bast fiber development and provide a foundation for future studies on fiber development in other bast fiber crops. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation

    Directory of Open Access Journals (Sweden)

    Jiying Wang

    2016-05-01

    Full Text Available Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiling using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs.

  11. AMD-associated genes encoding stress-activated MAPK pathway constituents are identified by interval-based enrichment analysis.

    Directory of Open Access Journals (Sweden)

    John Paul SanGiovanni

    Full Text Available PURPOSE: To determine whether common DNA sequence variants within groups of genes encoding elements of stress-activated mitogen-activated protein kinase (MAPK) signaling pathways are, in aggregate, associated with advanced AMD (AAMD). METHODS: We used meta-regression and exact testing methods to identify AAMD-associated SNPs in 1177 people with AAMD and 1024 AMD-free elderly peers from 3 large-scale genotyping projects on the molecular genetics of AMD. SNPs spanning independent AAMD-associated genomic intervals were examined with a multi-locus-testing method (INRICH) for enrichment within five sets of genes encoding constituents of stress-activated MAPK signaling cascades. RESULTS: Four-of-five pathway gene sets showed enrichment with AAMD-associated SNPs; findings persisted after adjustment for multiple testing in two. Strongest enrichment signals (P = 0.006) existed in a c-Jun N-terminal kinase (JNK)/MAPK cascade (Science Signaling, STKE CMP_10827). In this pathway, seven independent AAMD-associated regions were resident in 6 of 25 genes examined. These included sequence variants in: 1) three MAP kinase kinase kinases (MAP3K4, MAP3K5, MAP3K9) that phosphorylate and activate the MAP kinase kinases MAP2K4 and MAP2K7 (molecules that phosphorylate threonine and tyrosine residues within the activation loop of JNK); 2) a target of MAP2K7 (JNK3A1) that activates complexes involved in transcriptional regulation of stress related genes influencing cell proliferation, apoptosis, motility, metabolism and DNA repair; and 3) NR2C2, a transcription factor activated by JNK1A1 (a druggable molecule influencing retinal cell viability in model systems). We also observed AAMD-related sequence variants resident in genes encoding PPP3CA (a druggable molecule that inactivates MAP3K5), and two genes (TGFB2, TGFBR2) encoding factors involved in MAPK sensing of growth factors/cytokines. CONCLUSIONS: Linkage disequilibrium (LD)-independent genomic enrichment analysis yielded

  12. Transcriptome outlier analysis implicates schizophrenia susceptibility genes and enriches putatively functional rare genetic variants.

    Science.gov (United States)

    Duan, Jubao; Sanders, Alan R; Moy, Winton; Drigalenko, Eugene I; Brown, Eric C; Freda, Jessica; Leites, Catherine; Göring, Harald H H; Gejman, Pablo V

    2015-08-15

    We searched a gene expression dataset comprised of 634 schizophrenia (SZ) cases and 713 controls for expression outliers (i.e., extreme tails of the distribution of transcript expression values) with SZ cases overrepresented compared with controls. These outlier genes were enriched for brain expression and for genes known to be associated with neurodevelopmental disorders. SZ cases showed higher outlier burden (i.e., total outlier events per subject) than controls for genes within copy number variants (CNVs) associated with SZ or neurodevelopmental disorders. Outlier genes were enriched for CNVs and for rare putative regulatory variants, but this only explained a small proportion of the outlier subjects, highlighting the underlying presence of additional genetic and potentially, epigenetic mechanisms.
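
The outlier burden defined above (total outlier events per subject) can be sketched as a count, per subject, of genes whose expression falls in an extreme tail. The abstract does not give the tail definition, so the z-score cutoff below is an assumption, and the gene names and values are made up for illustration.

```python
from statistics import mean, stdev


def outlier_burden(expression, z_cutoff=2.5):
    """Count expression outlier events per subject.

    expression maps a gene name to a list of values, one per subject, with
    subjects in the same order for every gene. A value is an outlier when
    its absolute z-score exceeds `z_cutoff` (an assumed tail definition).
    """
    if not expression:
        return []
    n_subjects = len(next(iter(expression.values())))
    burden = [0] * n_subjects
    for values in expression.values():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a constant gene cannot have outliers
        for subject, value in enumerate(values):
            if abs(value - mu) / sd > z_cutoff:
                burden[subject] += 1
    return burden


# Made-up data: 10 subjects, 3 genes.
toy = {
    "gene_a": [0] * 9 + [100],
    "gene_b": [5] * 10,
    "gene_c": [100] + [0] * 9,
}
print(outlier_burden(toy))
```

Comparing the resulting per-subject counts between cases and controls would be the next step; a rank-based or median-based tail definition would be less sensitive to the outliers themselves.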

  13. A brain region-specific predictive gene map for autism derived by profiling a reference gene set.

    Directory of Open Access Journals (Sweden)

    Ajay Kumar

    Full Text Available Molecular underpinnings of complex psychiatric disorders such as autism spectrum disorders (ASD) remain largely unresolved. Increasingly, structural variations in discrete chromosomal loci are implicated in ASD, expanding the search space for its disease etiology. We exploited the high genetic heterogeneity of ASD to derive a predictive map of candidate genes by an integrated bioinformatics approach. Using a reference set of 84 Rare and Syndromic candidate ASD genes (AutRef84), we built a composite reference profile based on both functional and expression analyses. First, we created a functional profile of AutRef84 by performing Gene Ontology (GO) enrichment analysis, which encompassed three main areas: 1) neurogenesis/projection, 2) cell adhesion, and 3) ion channel activity. Second, we constructed an expression profile of AutRef84 by conducting DAVID analysis, which found enrichment in brain regions critical for sensory information processing (olfactory bulb, occipital lobe), executive function (prefrontal cortex), and hormone secretion (pituitary). Disease specificity of this dual AutRef84 profile was demonstrated by comparative analysis with control, diabetes, and non-specific gene sets. We then screened the human genome with the dual AutRef84 profile to derive a set of 460 potential ASD candidate genes. Importantly, the power of our predictive gene map was demonstrated by capturing 18 existing ASD-associated genes which were not part of the AutRef84 input dataset. The remaining 442 genes are entirely novel putative ASD risk genes. Together, we used a composite ASD reference profile to generate a predictive map of novel ASD candidate genes which should be prioritized for future research.

  14. The mouse X chromosome is enriched for multicopy testis genes showing postmeiotic expression.

    Science.gov (United States)

    Mueller, Jacob L; Mahadevaiah, Shantha K; Park, Peter J; Warburton, Peter E; Page, David C; Turner, James M A

    2008-06-01

    According to the prevailing view, mammalian X chromosomes are enriched in spermatogenesis genes expressed before meiosis and deficient in spermatogenesis genes expressed after meiosis. The paucity of postmeiotic genes on the X chromosome has been interpreted as a consequence of meiotic sex chromosome inactivation (MSCI)--the complete silencing of genes on the XY bivalent at meiotic prophase. Recent studies have concluded that MSCI-initiated silencing persists beyond meiosis and that most genes on the X chromosome remain repressed in round spermatids. Here, we report that 33 multicopy gene families, representing approximately 273 mouse X-linked genes, are expressed in the testis and that this expression is predominantly in postmeiotic cells. RNA FISH and microarray analysis show that the maintenance of X chromosome postmeiotic repression is incomplete. Furthermore, X-linked multicopy genes exhibit a similar degree of expression as autosomal genes. Thus, not only is the mouse X chromosome enriched for spermatogenesis genes functioning before meiosis, but in addition, approximately 18% of mouse X-linked genes are expressed in postmeiotic cells.

  15. Using targeted enrichment of nuclear genes to increase phylogenetic resolution in the neotropical rain forest genus Inga (Leguminosae: Mimosoideae)

    Directory of Open Access Journals (Sweden)

    James A Nicholls

    2015-09-01

    Full Text Available Evolutionary radiations are prominent and pervasive across many plant lineages in diverse geographical and ecological settings; in neotropical rainforests there is growing evidence suggesting that a significant fraction of species richness is the result of recent radiations. Understanding the evolutionary trajectories and mechanisms underlying these radiations demands much greater phylogenetic resolution than is currently available for these groups. The neotropical tree genus Inga (Leguminosae) is a good example, with ~300 extant species and a crown age of 2-10 MY, yet over 6kb of plastid and nuclear DNA sequence data gives only poor phylogenetic resolution among species. Here we explore the use of larger-scale nuclear gene data obtained through targeted enrichment to increase phylogenetic resolution within Inga. Transcriptome data from three Inga species were used to select 264 nuclear loci for targeted enrichment and sequencing. Following quality control to remove probable paralogs from these sequence data, the final dataset comprised 259,313 bases from 194 loci for 24 accessions representing 22 Inga species and an outgroup (Zygia). Bayesian phylogenies reconstructed using either all loci concatenated or a subset of 60 loci in a gene-tree/species-tree approach yielded highly resolved phylogenies. We used coalescent approaches to show that the same targeted enrichment data also have significant power to discriminate among alternative within-species population histories in the widespread species I. umbellifera. In either application, targeted enrichment simplifies the informatics challenge of identifying orthologous loci associated with de novo genome sequencing. We conclude that targeted enrichment provides the large volumes of phylogenetically-informative sequence data required to resolve relationships within recent plant species radiations, both at the species level and for within-species phylogeographic studies.

  16. Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci

    DEFF Research Database (Denmark)

    Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan

    2017-01-01

    BACKGROUND: Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription facto...

  17. A comparison of gene set analysis methods in terms of sensitivity, prioritization and specificity.

    Directory of Open Access Journals (Sweden)

    Adi L Tarca

    Full Text Available Identification of functional sets of genes associated with conditions of interest from omics data was first reported in 1999, and since then a plethora of enrichment methods have been published for systematic analysis of gene set collections, including Gene Ontology and biological pathways. Despite their widespread use in reducing the complexity of omics experiment results, their performance is poorly understood. Leveraging the existence of disease-specific gene sets in KEGG and Metacore® databases, we compared the performance of sixteen methods under relaxed assumptions while using 42 real datasets (over 1,400 samples). Most of the methods ranked high the gene sets designed for specific diseases whenever samples from affected individuals were compared against controls via microarrays. The top methods for gene set prioritization were different from the top ones in terms of sensitivity, and four of the sixteen methods had large false positive rates, as assessed by permuting the phenotype of the samples. The best overall methods among those that generated reasonably low false positive rates when permuting phenotypes were PLAGE, GLOBALTEST, and PADOG. The best method in the category that generated higher than expected false positives was MRGSE.
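
    The phenotype-permutation idea mentioned in this abstract can be illustrated with a small sketch. Shuffling the case/control labels breaks any real association, so scores computed on shuffled labels approximate a null distribution. The score used here (absolute mean case-control difference across the genes in a set), the function names, and the toy data are all assumptions made for illustration; this is not the evaluation pipeline of the cited study nor any of the sixteen methods it compares.

```python
import random
from statistics import mean


def set_score(expr, labels, gene_set):
    """Absolute mean (case - control) difference over the genes in the set.

    expr: dict gene -> list of expression values, one per sample
    labels: list of 0/1 phenotype labels (1 = case), one per sample
    """
    diffs = []
    for gene in gene_set:
        cases = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        ctrls = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        diffs.append(mean(cases) - mean(ctrls))
    return abs(mean(diffs))


def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling phenotype labels (hypothetical helper)."""
    rng = random.Random(seed)
    observed = set_score(expr, labels, gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_score(expr, shuffled, gene_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): three cases followed by three controls.
toy_expr = {"g1": [5, 6, 7, 1, 2, 3], "g2": [4, 5, 6, 1, 1, 2]}
toy_labels = [1, 1, 1, 0, 0, 0]
toy_score = set_score(toy_expr, toy_labels, ["g1", "g2"])
toy_p = permutation_p_value(toy_expr, toy_labels, ["g1", "g2"])
```

    Running a method on many label-shuffled datasets and counting how often it calls a gene set significant gives an empirical false positive rate, which is the kind of check the abstract describes.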

  18. Differential Gene Expression Profiling of Enriched Human Spermatogonia after Short- and Long-Term Culture

    Directory of Open Access Journals (Sweden)

    Sabine Conrad

    2014-01-01

    Full Text Available This study aimed to provide a molecular signature for enriched adult human stem/progenitor spermatogonia during short-term (<2 weeks) and long-term (up to more than 14 months) culture in comparison to human testicular fibroblasts and human embryonic stem cells. Human spermatogonia were isolated by CD49f magnetic activated cell sorting and collagen−/laminin+ matrix binding from primary testis cultures obtained from ten adult men. For transcriptomic analysis, single spermatogonia-like cells were collected from the enriched germ cell cultures based on their morphology and dimensions using a micromanipulation system. Immunocytochemical, RT-PCR and microarray analyses revealed that the analyzed populations of cells were distinct at the molecular level. The germ- and pluripotency-associated genes and genes of the differentiation/spermatogenesis pathway were highly expressed in enriched short-term cultured spermatogonia. After long-term culture, a proportion of cells retained and aggravated the “spermatogonial” gene expression profile with the expression of germ and pluripotency-associated genes, while in the majority of long-term cultured cells this molecular profile, typical for the differentiation pathway, was reduced and more genes related to the extracellular matrix production and attachment were expressed. The approach we provide here to study the molecular status of in vitro cultured spermatogonia may be important to optimize the culture conditions and to evaluate the germ cell plasticity in the future.

  19. Differential gene expression profiling of enriched human spermatogonia after short- and long-term culture.

    Science.gov (United States)

    Conrad, Sabine; Azizi, Hossein; Hatami, Maryam; Kubista, Mikael; Bonin, Michael; Hennenlotter, Jörg; Renninger, Markus; Skutella, Thomas

    2014-01-01

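    This study aimed to provide a molecular signature for enriched adult human stem/progenitor spermatogonia during short-term (<2 weeks) and long-term (up to more than 14 months) culture in comparison to human testicular fibroblasts and human embryonic stem cells. Human spermatogonia were isolated by CD49f magnetic activated cell sorting and collagen-/laminin+ matrix binding from primary testis cultures obtained from ten adult men. For transcriptomic analysis, single spermatogonia-like cells were collected from the enriched germ cell cultures based on their morphology and dimensions using a micromanipulation system. Immunocytochemical, RT-PCR and microarray analyses revealed that the analyzed populations of cells were distinct at the molecular level. The germ- and pluripotency-associated genes and genes of the differentiation/spermatogenesis pathway were highly expressed in enriched short-term cultured spermatogonia. After long-term culture, a proportion of cells retained and aggravated the "spermatogonial" gene expression profile with the expression of germ and pluripotency-associated genes, while in the majority of long-term cultured cells this molecular profile, typical for the differentiation pathway, was reduced and more genes related to the extracellular matrix production and attachment were expressed. The approach we provide here to study the molecular status of in vitro cultured spermatogonia may be important to optimize the culture conditions and to evaluate the germ cell plasticity in the future.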

  20. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    Science.gov (United States)

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards--the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  1. Dorsal horn-enriched genes identified by DNA microarray, in situ hybridization and immunohistochemistry

    Directory of Open Access Journals (Sweden)

    Koblan Kenneth S

    2002-08-01

    Full Text Available Abstract Background Neurons in the dorsal spinal cord play important roles in nociception and pain. These neurons receive input from peripheral sensory neurons and then transmit the signals to the brain, as well as receive and integrate descending control signals from the brain. Many molecules important for pain transmission have been demonstrated to be localized to the dorsal horn of the spinal cord. Further understanding of the molecular interactions and signaling pathways in the dorsal horn neurons will require a better knowledge of the molecular neuroanatomy in the dorsal spinal cord. Results A large-scale screening was conducted for genes with enriched expression in the dorsal spinal cord using DNA microarray and quantitative real-time PCR. In addition to genes known to be specifically expressed in the dorsal spinal cord, other neuropeptides, receptors, ion channels, and signaling molecules were also found enriched in the dorsal spinal cord. In situ hybridization and immunohistochemistry revealed the cellular expression of a subset of these genes. The regulation of a subset of the genes was also studied in the spinal nerve ligation (SNL) neuropathic pain model. In general, we found that the genes that are enriched in the dorsal spinal cord were not among those found to be up-regulated in the spinal nerve ligation model of neuropathic pain. This study also provides a level of validation of the use of DNA microarrays in conjunction with our novel analysis algorithm (SAFER) for the identification of differences in gene expression. Conclusion This study identified molecules that are enriched in the dorsal horn of the spinal cord and provided a molecular neuroanatomy in the spinal cord, which will aid in the understanding of the molecular mechanisms important in nociception and pain.

  2. Semantic particularity measure for functional characterization of gene sets using gene ontology.

    Science.gov (United States)

    Bettembourg, Charles; Diot, Christian; Dameron, Olivier

    2014-01-01

    Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity. We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure. Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.
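
    The corpus-frequency notion of informativeness mentioned in this abstract is commonly written as IC(t) = -log p(t), where p(t) is the fraction of annotated genes carrying term t. The sketch below assumes that common definition and a made-up toy corpus; the function name and the gene-to-term assignments are hypothetical, and it does not reproduce the particularity measure proposed in the paper. In practice, annotations are usually propagated to ancestor terms before counting, which this sketch skips.

```python
import math
from collections import Counter


def information_content(annotations):
    """Return {term: -log(p(term))}, with p(term) the share of genes annotated.

    annotations: dict gene -> iterable of GO term identifiers
    """
    # Count each term at most once per gene.
    counts = Counter(
        term for terms in annotations.values() for term in set(terms)
    )
    total_genes = len(annotations)
    return {
        term: -math.log(n / total_genes) for term, n in counts.items()
    }


# Toy corpus (assignments are invented for illustration only).
toy_corpus = {
    "geneA": ["GO:0008150", "GO:0006412"],
    "geneB": ["GO:0008150"],
    "geneC": ["GO:0008150", "GO:0006412"],
    "geneD": ["GO:0008150", "GO:0006915"],
}
toy_ic = information_content(toy_corpus)
```

    A term attached to every gene carries no information (IC of zero), while rarer terms score higher, which is why frequency-based IC is one way to weight how specific a GO term is.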

  3. Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

    Science.gov (United States)

    Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance genes if

  4. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    Science.gov (United States)

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Abstract Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®—the human gene database; the MalaCards—the human diseases database; and the PathCards—the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®—the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  5. Composting-Like Conditions Are More Efficient for Enrichment and Diversity of Organisms Containing Cellulase-Encoding Genes than Submerged Cultures.

    Science.gov (United States)

    Heiss-Blanquet, Senta; Fayolle-Guichard, Françoise; Lombard, Vincent; Hébert, Agnès; Coutinho, Pedro M; Groppi, Alexis; Barre, Aurélien; Henrissat, Bernard

    2016-01-01

    Cost-effective biofuel production from lignocellulosic biomass depends on efficient degradation of the plant cell wall. One of the major obstacles for the development of a cost-efficient process is the lack of resistance of currently used fungal enzymes to harsh conditions such as high temperature. Adapted, thermophilic microbial communities provide a huge reservoir of potentially interesting lignocellulose-degrading enzymes for improvement of the cellulose hydrolysis step. In order to identify such enzymes, a leaf and wood chip compost was enriched on a mixture of thermo-chemically pretreated wheat straw, poplar and Miscanthus under thermophile conditions, but in two different set-ups. Unexpectedly, metagenome sequencing revealed that incubation of the lignocellulosic substrate with compost as inoculum in a suspension culture resulted in an impoverishment of putative cellulase- and hemicellulase-encoding genes. However, mimicking composting conditions without liquid phase yielded a high number and diversity of glycoside hydrolase genes and an enrichment of genes encoding cellulose binding domains. These identified genes were most closely related to species from Actinobacteria, which seem to constitute important players of lignocellulose degradation under the applied conditions. The study highlights that subtle changes in an enrichment set-up can have an important impact on composition and functions of the microcosm. Composting-like conditions were found to be the most successful method for enrichment in species with high biomass degrading capacity.

  6. A Bayesian variable selection procedure to rank overlapping gene sets

    Directory of Open Access Journals (Sweden)

    Skarman Axel

    2012-05-01

    Full Text Available Abstract Background Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations. Conclusions Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize.

  7. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes.

    Science.gov (United States)

    de Jong, Simone; Boks, Marco P M; Fuller, Tova F; Strengman, Eric; Janson, Esther; de Kovel, Carolien G F; Ori, Anil P S; Vi, Nancy; Mulder, Flip; Blom, Jan Dirk; Glenthøj, Birte; Schubart, Chris D; Cahn, Wiepke; Kahn, René S; Horvath, Steve; Ophoff, Roel A

    2012-01-01

    Despite large-scale genome-wide association studies (GWAS), the underlying genes for schizophrenia are largely unknown. Additional approaches are therefore required to identify the genetic background of this disorder. Here we report findings from a large gene expression study in peripheral blood of schizophrenia patients and controls. We applied a systems biology approach to genome-wide expression data from whole blood of 92 medicated and 29 antipsychotic-free schizophrenia patients and 118 healthy controls. We show that gene expression profiling in whole blood can identify twelve large gene co-expression modules associated with schizophrenia. Several of these disease related modules are likely to reflect expression changes due to antipsychotic medication. However, two of the disease modules could be replicated in an independent second data set involving antipsychotic-free patients and controls. One of these robustly defined disease modules is significantly enriched with brain-expressed genes and with genetic variants that were implicated in a GWAS study, which could imply a causal role in schizophrenia etiology. The most highly connected intramodular hub gene in this module (ABCF1), is located in, and regulated by the major histocompatibility (MHC) complex, which is intriguing in light of the fact that common allelic variants from the MHC region have been implicated in schizophrenia. This suggests that the MHC increases schizophrenia susceptibility via altered gene expression of regulatory genes in this network.

  8. EWS and FUS bind a subset of transcribed genes encoding proteins enriched in RNA regulatory functions

    DEFF Research Database (Denmark)

    Luo, Yonglun; Friis, Jenny Blechingberg; Fernandes, Ana Miguel;

    2015-01-01

    IP-seq). Our results show that FUS and EWS bind to a subset of actively transcribed genes, that binding often is downstream the poly(A)-signal, and that binding overlaps with RNA polymerase II. Functional examinations of selected target genes identified that FUS and EWS can regulate gene expression...... at different levels. Gene Ontology analyses showed that FUS and EWS target genes preferentially encode proteins involved in regulatory processes at the RNA level. Conclusions The presented results yield new insights into gene interactions of EWS and FUS and have identified a set of FUS and EWS target genes...

  9. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data

    Directory of Open Access Journals (Sweden)

    Tintle Nathan L

    2012-08-01

    Full Text Available Abstract Background Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. Results We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Conclusions Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.

  10. Microbial gene functions enriched in the Deepwater Horizon deep-sea oil plume

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Z.; Deng, Y.; Nostrand, J.D. Van; He, Z.; Voordeckers, J.; Zhou, A.; Lee, Y.-J.; Mason, O.U.; Dubinsky, E.; Chavarria, K.; Tom, L.; Fortney, J.; Lamendella, R.; Jansson, J.K.; D'haeseleer, P.; Hazen, T.C.; Zhou, J.

    2011-06-15

    The Deepwater Horizon oil spill in the Gulf of Mexico is the deepest and largest offshore spill in U.S. history and its impacts on marine ecosystems are largely unknown. Here, we showed that the microbial community functional composition and structure were dramatically altered in a deep-sea oil plume resulting from the spill. A variety of metabolic genes involved in both aerobic and anaerobic hydrocarbon degradation were highly enriched in the plume compared to outside the plume, indicating a great potential for intrinsic bioremediation or natural attenuation in the deep-sea. Various other microbial functional genes relevant to carbon, nitrogen, phosphorus, sulfur and iron cycling, metal resistance, and bacteriophage replication were also enriched in the plume. Together, these results suggest that the indigenous marine microbial communities could play a significant role in biodegradation of oil spills in deep-sea environments.

  11. Systematic analysis of a novel human renal glomerulus-enriched gene expression dataset.

    Directory of Open Access Journals (Sweden)

    Maja T Lindenmeyer

    Full Text Available Glomerular diseases account for the majority of cases with chronic renal failure. Several genes have been identified with key relevance for glomerular function. Quite a few of these genes show a specific or preferential mRNA expression in the renal glomerulus. To identify additional candidate genes involved in glomerular function in humans we generated a human renal glomerulus-enriched gene expression dataset (REGGED) by comparing gene expression profiles from human glomeruli and tubulointerstitium obtained from six transplant living donors using Affymetrix HG-U133A arrays. This analysis resulted in 677 genes with prominent overrepresentation in the glomerulus. Genes with 'a priori' known prominent glomerular expression served for validation and were all found in the novel dataset (e.g. CDKN1, DAG1, DDN, EHD3, MYH9, NES, NPHS1, NPHS2, PDPN, PLA2R1, PLCE1, PODXL, PTPRO, SYNPO, TCF21, TJP1, WT1). The mRNA expression of several novel glomerulus-enriched genes in REGGED was validated by qRT-PCR. Gene ontology and pathway analysis identified biological processes previously not reported to be of relevance in glomeruli of healthy human adult kidneys, including, among others, axon guidance. This finding was further validated by assessing the expression of the axon guidance molecules neuritin (NRN1) and roundabout receptor ROBO1 and -2. In diabetic nephropathy, a prevalent glomerulopathy, differential regulation of glomerular ROBO2 mRNA was found. In summary, novel transcripts with predominant expression in the human glomerulus could be identified using a comparative strategy on microdissected nephrons. A systematic analysis of this glomerulus-specific gene expression dataset allows the detection of target molecules and biological processes involved in glomerular biology and renal disease.

  12. The effect of peptidoglycan enriched diets on antimicrobial peptide gene expression in rainbow trout (Oncorhynchus mykiss).

    Science.gov (United States)

    Casadei, Elisa; Bird, Steve; Vecino, Jose L González; Wadsworth, Simon; Secombes, Christopher J

    2013-02-01

    The aim of this study was to investigate the effect of feeding rainbow trout (Oncorhynchus mykiss) peptidoglycan (PG) enriched diets on antimicrobial peptide (AMP) gene expression. Fish were divided into 5 groups and fed diets containing 0, 5, 10, 50 and 100 mg PG/Kg, and sampled 1, 7 and 14 days later. The expression of eight AMP genes (four defensins, two cathelicidins and two liver expressed AMPs) was determined in skin, gill, gut and liver, tissues important for first lines of defence or production of acute phase proteins. Up-regulation of many AMPs was found after feeding the PG enriched diets, with sequential expression seen over the time course studied, where defensins were typically expressed early and cathelicidins and LEAPs later on. A number of clear differences in AMP responsiveness between the tissues examined were also apparent. Of the four PG concentrations used, 5 mg PG/Kg did not always elicit AMP gene induction or to the same degree as seen with the other diets. The three higher dose groups generally showed similar trends although differences in fold change were more pronounced in the 50 and 100 mg PG/Kg groups. Curiously several AMPs were down-regulated after 14 days of feeding in gills, gut and liver. Nevertheless, overall the PG enriched diets had a positive effect on AMP expression. Further investigations now need to be undertaken to confirm whether this higher AMP gene expression correlates with protection against common bacterial diseases and if PG enriched diets have value as a means to temporarily boost the piscine immune system.

  13. Involvement of astrocyte and oligodendrocyte gene sets in migraine.

    Science.gov (United States)

    Eising, Else; de Leeuw, Christiaan; Min, Josine L; Anttila, Verneri; Verheijen, Mark Hg; Terwindt, Gisela M; Dichgans, Martin; Freilinger, Tobias; Kubisch, Christian; Ferrari, Michel D; Smit, August B; de Vries, Boukje; Palotie, Aarno; van den Maagdenberg, Arn Mjm; Posthuma, Danielle

    2016-06-01

    Migraine is a common episodic brain disorder characterized by recurrent attacks of severe unilateral headache and additional neurological symptoms. Two main migraine types can be distinguished based on the presence of aura symptoms that can accompany the headache: migraine with aura and migraine without aura. Multiple genetic and environmental factors confer disease susceptibility. Recent genome-wide association studies (GWAS) indicate that migraine susceptibility genes are involved in various pathways, including neurotransmission, which have already been implicated in genetic studies of monogenic familial hemiplegic migraine, a subtype of migraine with aura. To further explore the genetic background of migraine, we performed a gene set analysis of migraine GWAS data of 4954 clinic-based patients with migraine, as well as 13,390 controls. Curated sets of synaptic genes and sets of genes predominantly expressed in three glial cell types (astrocytes, microglia and oligodendrocytes) were investigated. Our results show that gene sets containing astrocyte- and oligodendrocyte-related genes are associated with migraine, which is especially true for gene sets involved in protein modification and signal transduction. Observed differences between migraine with aura and migraine without aura indicate that both migraine types, at least in part, seem to have a different genetic background. © International Headache Society 2015.

  14. Using Goal-Setting Strategies To Enrich the Practicum and Internship Experiences of Beginning Counselors.

    Science.gov (United States)

    Curtis, Russell C.

    2000-01-01

    Goal setting can be an effective way to help beginning counselors focus on important developmental issues. This article argues that counselors and supervisors must consider issues related to goal-setting theory and understand the process by which goals are set so that optimal learning experiences are created. (Author/MKA)

  15. Core and region-enriched networks of behaviorally regulated genes and the singing genome

    Science.gov (United States)

    Whitney, Osceola; Pfenning, Andreas R.; Howard, Jason T.; Blatti, Charles A; Liu, Fang; Ward, James M.; Wang, Rui; Audet, Jean-Nicolas; Kellis, Manolis; Mukherjee, Sayan; Sinha, Saurabh; Hartemink, Alexander J.; West, Anne E.; Jarvis, Erich D.

    2015-01-01

    Songbirds represent an important model organism for elucidating molecular mechanisms that link genes with complex behaviors, in part because they have discrete vocal learning circuits that have parallels with those that mediate human speech. We found that ~10% of the genes in the avian genome were regulated by singing, and we found a striking regional diversity of both basal and singing-induced programs in the four key song nuclei of the zebra finch, a vocal learning songbird. The region-enriched patterns were a result of distinct combinations of region-enriched transcription factors (TFs), their binding motifs, and presinging acetylation of histone 3 at lysine 27 (H3K27ac) enhancer activity in the regulatory regions of the associated genes. RNA interference manipulations validated the role of the calcium-response transcription factor (CaRF) in regulating genes preferentially expressed in specific song nuclei in response to singing. Thus, differential combinatorial binding of a small group of activity-regulated TFs and predefined epigenetic enhancer activity influences the anatomical diversity of behaviorally regulated gene networks. PMID:25504732

  16. Constructing gene-enriched plant genomic libraries using methylation filtration technology.

    Science.gov (United States)

    Rabinowicz, Pablo D

    2003-01-01

    Full genome sequencing in higher plants is a very difficult task, because their genomes are often very large and repetitive. For this reason, gene targeted partial genomic sequencing becomes a realistic option. The method reported here is a simple approach to generate gene-enriched plant genomic libraries called methylation filtration. This technique takes advantage of the fact that repetitive DNA is heavily methylated and genes are hypomethylated. Then, by simply using an Escherichia coli host strain harboring a wild-type modified cytosine restriction (McrBC) system, which cuts DNA containing methylcytosine, repetitive DNA is eliminated from these genomic libraries, while low copy DNA (i.e., genes) is recovered. To prevent cloning significant proportions of organelle DNA, a crude nuclear preparation must be performed prior to purifying genomic DNA. Adaptor-mediated cloning and DNA size fractionation are necessary for optimal results.

  17. A Bayesian variable selection procedure for ranking overlapping gene sets

    DEFF Research Database (Denmark)

    Skarman, Axel; Mahdi Shariati, Mohammad; Janss, Luc

    2012-01-01

    described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian...... variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our...... data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability...

  18. Enrichment of short interspersed transposable elements to embryonic stem cell-specific hypomethylated gene regions.

    Science.gov (United States)

    Muramoto, Hiroki; Yagi, Shintaro; Hirabayashi, Keiji; Sato, Shinya; Ohgane, Jun; Tanaka, Satoshi; Shiota, Kunio

    2010-08-01

    Embryonic stem cells (ESCs) have a distinctive epigenome, which includes their genome-wide DNA methylation modification status, as represented by the ESC-specific hypomethylation of tissue-dependent and differentially methylated regions (T-DMRs) of Pou5f1 and Nanog. Here, we conducted a genome-wide investigation of sequence characteristics associated with T-DMRs that were differentially methylated between ESCs and somatic cells, by focusing on transposable elements including short interspersed elements (SINEs), long interspersed elements (LINEs) and long terminal repeats (LTRs). We found that hypomethylated T-DMRs were predominantly present in SINE-rich/LINE-poor genomic loci. The enrichment for SINEs spread over 300 kb in cis and there existed SINE-rich genomic domains spreading continuously over 1 Mb, which contained multiple hypomethylated T-DMRs. The characterization of sequence information showed that the enriched SINEs were relatively CpG rich and belonged to specific subfamilies. A subset of the enriched SINEs were hypomethylated T-DMRs in ESCs at Dppa3 gene locus, although SINEs are overall methylated in both ESCs and the liver. In conclusion, we propose that SINE enrichment is the genomic property of regions harboring hypomethylated T-DMRs in ESCs, which is a novel aspect of the ESC-specific epigenomic information.

  19. Genes2GO: A web application for querying gene sets for specific GO terms.

    Science.gov (United States)

    Chawla, Konika; Kuiper, Martin

    2016-01-01

    Gene ontology annotations have become an essential resource for biological interpretations of experimental findings. The process of gathering basic annotation information in tables that link gene sets with specific gene ontology terms can be cumbersome, in particular if it requires above average computer skills or bioinformatics expertise. We have therefore developed Genes2GO, an intuitive R-based web application. Genes2GO uses the biomaRt package of Bioconductor in order to retrieve custom sets of gene ontology annotations for any list of genes from organisms covered by the Ensembl database. Genes2GO produces a binary matrix file, indicating for each gene the presence or absence of specific annotations for a gene. It should be noted that other GO tools do not offer this user-friendly access to annotations. Genes2GO is freely available and listed under http://www.semantic-systems-biology.org/tools/externaltools/.
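
    The binary matrix described above can be illustrated with a minimal sketch. It does not reproduce Genes2GO itself (an R application built on biomaRt); the gene names, the GO identifiers, and the function `binary_matrix` are hypothetical placeholders, and real annotations would be retrieved from Ensembl.

```python
# Hypothetical gene-to-GO annotations; real ones would be retrieved from Ensembl.
annotations = {
    "geneA": {"GO:0006915", "GO:0008283"},
    "geneB": {"GO:0008283"},
    "geneC": set(),
}

# GO terms the user wants to query (placeholder selection).
query_terms = ["GO:0006915", "GO:0008283", "GO:0016049"]


def binary_matrix(annotations, terms):
    """Return {gene: [0/1 per term]} marking presence (1) or absence (0)."""
    return {
        gene: [1 if term in gene_terms else 0 for term in terms]
        for gene, gene_terms in annotations.items()
    }


matrix = binary_matrix(annotations, query_terms)

# Print a tab-separated table with one row per gene and one column per term.
print("\t".join(["gene"] + query_terms))
for gene, row in matrix.items():
    print("\t".join([gene] + [str(value) for value in row]))
```

    Each row answers, for one gene, whether each queried term is annotated, which is the shape of table the abstract describes.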

  20. Screening Key Genes Associated with the Development and Progression of Non-small Cell Lung Cancer Based on Gene-enrichment Analysis and Meta-analysis

    Directory of Open Access Journals (Sweden)

    Wenwu HE

    2012-07-01

    Full Text Available Background and objective Non-small cell lung cancer (NSCLC) is one of the most common malignant tumors; however, its causes are still not completely understood. This study was designed to screen the key genes and pathways related to NSCLC occurrence and development and to establish the scientific foundation for the genetic mechanisms and targeted therapy of NSCLC. Methods Both gene set-enrichment analysis (GSEA) and meta-analysis (meta) were used to screen the critical pathways and genes that might be correlated with the development and progression of lung cancer at the transcription level. Results Using the GSEA and meta methods, focal adhesion and regulation of actin cytoskeleton were determined to be the more prominent overlapping significant pathways. In the focal adhesion pathway, 31 genes were statistically significant (P<0.05), whereas in the regulation of actin cytoskeleton pathway, 32 genes were statistically significant (P<0.05). Conclusion The focal adhesion and the regulation of actin cytoskeleton pathways might play important roles in the occurrence and development of NSCLC. Further studies are needed to determine the biological function of the positive genes.

  1. An Immune Response Enriched 72-Gene Prognostic Profile for Early-Stage Non-Small-Cell Lung Cancer

    NARCIS (Netherlands)

    Roepman, P.; Jassem, J; Smit, E.F.; Muley, T.; Niklinski, J.; Vel, van de T.; Witteveen, A.T.; Rzyman, W.; Floore, A.; Burgers, S.; Giaccone, G.; Meister, M.; Dienemann, H.; Skrzypski, M.; Kozlowski, M.; Mooi, W.J.; Zandwijk, van N.

    2009-01-01

    0.01; n = 69). Genes in our prognostic signature were strongly enriched for genes associated with immune response. Conclusions: Our 72-gene signature is closely associated with recurrence-free and overall survival in early-stage NSCLC patients and may become a tool for patient selection for adjuvant

  2. Ribosomal protein genes are highly enriched among genes with allele-specific expression in the interspecific F1 hybrid catfish.

    Science.gov (United States)

    Chen, Ailu; Wang, Ruijia; Liu, Shikai; Peatman, Eric; Sun, Luyang; Bao, Lisui; Jiang, Chen; Li, Chao; Li, Yun; Zeng, Qifan; Liu, Zhanjiang

    2016-06-01

    Interspecific hybrids provide a rich source for the analysis of allele-specific expression (ASE). In this work, we analyzed ASE in F1 hybrid catfish using RNA-Seq datasets. While the vast majority of genes were expressed with both alleles, 7-8% of SNPs exhibited significant differences in allele ratios of expression. Of the 66,251 and 177,841 SNPs identified from the datasets of the liver and gill, 5420 (8.2%) and 13,390 (7.5%) SNPs were identified as significant ASE-SNPs, respectively. With these SNPs, a total of 1519 and 3075 ASE-genes were identified. Gene Ontology analysis revealed that genes encoding cytoplasmic ribosomal proteins (RP) were highly enriched among ASE genes. Parent-of-origin was determined for 27 and 30 ASE RP genes in the liver and gill, respectively. The results indicated that genes from both channel catfish and blue catfish were involved in ASE. However, each RP gene appeared to be almost exclusively expressed from only one parent, indicating that ribosomes in the hybrid catfish were in the "hybrid" form. Overall representation of RP transcripts among the transcriptome appeared lower in the F1 hybrid catfish than in channel catfish or blue catfish, suggesting that the "hybrid" ribosomes may work more efficiently for translation in the F1 hybrid catfish.
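
    The reported ASE-SNP percentages follow directly from the counts in the abstract. The short check below uses only those counts; the names `counts`, `percent`, and `ase_percent` are illustrative.

```python
# Counts taken from the abstract: (significant ASE-SNPs, total SNPs) per tissue.
counts = {"liver": (5420, 66251), "gill": (13390, 177841)}


def percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


ase_percent = {
    tissue: percent(significant, total)
    for tissue, (significant, total) in counts.items()
}
print(ase_percent)  # {'liver': 8.2, 'gill': 7.5}
```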

  3. EWS and FUS bind a subset of transcribed genes encoding proteins enriched in RNA regulatory functions

    DEFF Research Database (Denmark)

    Luo, Yonglun; Friis, Jenny Blechingberg; Fernandes, Ana Miguel

    2015-01-01

    at different levels. Gene Ontology analyses showed that FUS and EWS target genes preferentially encode proteins involved in regulatory processes at the RNA level. Conclusions The presented results yield new insights into gene interactions of EWS and FUS and have identified a set of FUS and EWS target genes...... and involved in the human neurological diseases amyotrophic lateral sclerosis and fronto-temporal lobar degeneration. Results To determine the gene regulatory functions of FUS and EWS at the level of chromatin, we have performed chromatin immunoprecipitation followed by next generation sequencing (Ch......IP-seq). Our results show that FUS and EWS bind to a subset of actively transcribed genes, that binding often is downstream the poly(A)-signal, and that binding overlaps with RNA polymerase II. Functional examinations of selected target genes identified that FUS and EWS can regulate gene expression...

  4. Different EV enrichment methods suitable for clinical settings yield different subpopulations of urinary extracellular vesicles from human samples

    Directory of Open Access Journals (Sweden)

    Felix Royo

    2016-02-01

    Full Text Available Urine sample analysis is irreplaceable as a non-invasive method for disease diagnosis and follow-up. However, in urine samples, non-degraded protein and RNA may be found only in urinary extracellular vesicles (uEVs). In recent years, various methods of uEV enrichment using low volumes of urine and unsophisticated equipment have been developed, with variable success. We compared the results of the differential ultracentrifugation procedure with 4 of these methods. The methods tested were a lectin-based purification, Exoquick (System Biosciences), Total Exosome Isolation from Invitrogen and an in-house modified procedure employing the Exosomal RNA Kit from Norgen Biotek Corp. The analysis of selected gene transcripts and protein markers of extracellular vesicles (EVs) revealed that each method isolates a different mixture of uEV protein markers. In our conditions, the extraction with Norgen's reagent achieved the best performance in terms of gene transcript and protein detection and reproducibility. By using this method, we were able to detect alterations of EV protein markers in urine samples from prostate cancer adenoma patients. Taken together, our results show that the isolation of uEVs is feasible from small volumes of urine without ultracentrifugation, which makes analysis in a clinical facility easier. However, caution should be taken in the selection of the enrichment method, since the methods have differential affinities for uEV protein markers and, by extension, for different subpopulations of EVs.

  5. Polycyclic aromatic hydrocarbons (PAHs) enriching antibiotic resistance genes (ARGs) in the soils.

    Science.gov (United States)

    Chen, Baowei; He, Rong; Yuan, Ke; Chen, Enzhong; Lin, Lan; Chen, Xin; Sha, Sha; Zhong, Jianan; Lin, Li; Yang, Lihua; Yang, Ying; Wang, Xiaowei; Zou, Shichun; Luan, Tiangang

    2017-01-01

    The prevalence of antibiotic resistance genes (ARGs) in the modern environment raises an emerging global health concern. In this study, soil samples were collected from three sites in a petrochemical plant that represented different pollution levels of polycyclic aromatic hydrocarbons (PAHs). Metagenomic profiling of these soils demonstrated that ARGs in the PAHs-contaminated soils were approximately 15 times more abundant than those in the less-contaminated ones, with Proteobacteria being the preponderant phylum. The resistance profile of ARGs in the PAHs-polluted soils was characterized by the dominance of efflux pump-encoding ARGs associated with aromatic antibiotics (e.g., fluoroquinolones and acriflavine), which accounted for more than 70% of the total ARGs. This profile was significantly different from representative sources of ARG pollution due to wide use of antibiotics. Most of the ARGs enriched in the PAHs-contaminated soils were not carried by plasmids, indicating a low likelihood of their being transferred between bacteria. A significant correlation was observed between the total abundance of ARGs and that of Proteobacteria in the soils. Selection of Proteobacteria by PAHs led to simultaneous enrichment of the ARGs they carry in the soils. Our results suggested that PAHs could serve as one of the selective stresses that greatly enrich ARGs in the human-impacted environment.

  6. Evidence for intron length conservation in a set of mammalian genes associated with embryonic development

    LENUS (Irish Health Repository)

    2011-10-05

    Abstract Background We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples of genes that showed the most extreme conservation of total intron content in mammals. Results Gene sets annotated as being involved in pattern specification in the early embryo or containing the homeobox DNA-binding domain, were significantly enriched among genes with highly conserved intron content. We used ancestral sequences reconstructed with probabilistic models that account for insertion and deletion mutations to distinguish insertion and deletion events on lineages leading to human and mouse from their last common ancestor. Using a randomization procedure, we show that genes containing the homeobox domain show less change in intron content than expected, given the number of insertion and deletion events within their introns. Conclusions Our results suggest selection for gene expression precision or the existence of additional development-associated genes for which transcriptional delay is functionally significant.

  7. Evidence for intron length conservation in a set of mammalian genes associated with embryonic development

    Directory of Open Access Journals (Sweden)

    Korir Paul K

    2011-10-01

    Full Text Available Abstract Background We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples of genes that showed the most extreme conservation of total intron content in mammals. Results Gene sets annotated as being involved in pattern specification in the early embryo or containing the homeobox DNA-binding domain, were significantly enriched among genes with highly conserved intron content. We used ancestral sequences reconstructed with probabilistic models that account for insertion and deletion mutations to distinguish insertion and deletion events on lineages leading to human and mouse from their last common ancestor. Using a randomization procedure, we show that genes containing the homeobox domain show less change in intron content than expected, given the number of insertion and deletion events within their introns. Conclusions Our results suggest selection for gene expression precision or the existence of additional development-associated genes for which transcriptional delay is functionally significant.

  8. Meta-Analysis of Pathway Enrichment: Combining Independent and Dependent Omics Data Sets

    OpenAIRE

    Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter

    2014-01-01

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics techn...

  9. Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data

    Directory of Open Access Journals (Sweden)

    Simpson David

    2006-03-01

    Full Text Available Abstract Background Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regulation of the many genes specifically or highly expressed in these cells. Over the last decades, different experimental approaches have been developed to identify photoreceptor enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor enriched genes based on expression data. Results Based on the analysis of publicly-available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE), this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. In order to understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was firstly performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model) performed significantly better than relatively more complex algorithms, e.g. neural networks. To deal with the problem of functional class imbalance occurring in the dataset, two data re

  10. Integrating genome-wide association study and expression quantitative trait loci data identifies multiple genes and gene set associated with neuroticism.

    Science.gov (United States)

    Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng

    2017-08-01

    Neuroticism is a fundamental personality trait with a significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from a genome-wide association study (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism, involving a total of 170,906 subjects. An eQTL dataset containing 927,753 eQTLs was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted with the summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism-associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single gene analysis identified 6 significant genes for neuroticism, including MSRA (p value=2.27×10⁻¹⁰), MGC57346 (p value=6.92×10⁻⁷), BLK (p value=1.01×10⁻⁶), XKR6 (p value=1.11×10⁻⁶), C17ORF69 (p value=1.12×10⁻⁶) and KIAA1267 (p value=4.00×10⁻⁶). Gene set enrichment analysis detected a significant association for the Chr8p23 gene set (false discovery rate=0.033). Our results provide novel clues for the genetic mechanism studies of neuroticism. Copyright © 2017. Published by Elsevier Inc.

  11. Analysis of gene set using shrinkage covariance matrix approach

    Science.gov (United States)

    Karjanto, Suryaefiza; Aripin, Rasimah

    2013-09-01

    Microarray methodology has been exploited for different applications such as gene discovery and disease diagnosis. This technology is also used for quantitative and highly parallel measurements of gene expression. Recently, microarrays have been one of the main interests of statisticians because they provide a perfect example of the paradigms of modern statistics. In this study, an alternative approach to estimating the covariance matrix is proposed to solve the high dimensionality problem in microarrays. An extension of the traditional Hotelling's T² statistic is constructed for determining the significant gene sets across experimental conditions using a shrinkage approach. Real data sets were used as illustrations to compare the performance of the proposed methods with other methods. The results across the methods are consistent, implying that this approach provides an alternative to existing techniques.
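
    A shrinkage-regularised Hotelling's T² can be sketched for a toy gene set of two genes. The abstract does not state the shrinkage target or how the intensity is chosen, so this sketch assumes a fixed intensity `lam` and shrinks the pooled covariance toward its own diagonal; all function names and the data are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(group1, group2):
    """Pooled sample covariance matrix of two groups (samples in rows)."""
    p = len(group1[0])
    dof = len(group1) + len(group2) - 2
    cov = [[0.0] * p for _ in range(p)]
    for rows in (group1, group2):
        m = mean_vector(rows)
        for r in rows:
            for i in range(p):
                for j in range(p):
                    cov[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    return [[v / dof for v in row] for row in cov]


def shrink_to_diagonal(cov, lam):
    """lam * diag(cov) + (1 - lam) * cov: only off-diagonal entries shrink."""
    p = len(cov)
    return [
        [cov[i][j] if i == j else (1.0 - lam) * cov[i][j] for j in range(p)]
        for i in range(p)
    ]


def hotelling_t2_two_genes(group1, group2, lam=0.5):
    """Two-sample T^2 for exactly two genes, using the shrunken covariance.

    The assumed fixed `lam` stands in for a data-driven intensity.
    """
    n1, n2 = len(group1), len(group2)
    d = [a - b for a, b in zip(mean_vector(group1), mean_vector(group2))]
    (a, b), (c, e) = shrink_to_diagonal(pooled_covariance(group1, group2), lam)
    det = a * e - b * c  # determinant of the 2x2 shrunken covariance
    # Quadratic form d' S^-1 d written out with the explicit 2x2 inverse.
    quad = (e * d[0] * d[0] - (b + c) * d[0] * d[1] + a * d[1] * d[1]) / det
    return (n1 * n2) / (n1 + n2) * quad


# Hypothetical expression values: rows are samples, columns are two genes.
control = [[1.0, 1.0], [3.0, 3.0]]
treated = [[4.0, 2.0], [6.0, 6.0]]
print(hotelling_t2_two_genes(control, treated, lam=0.5))
```

    With `lam=0` the statistic reduces to the ordinary pooled-covariance T², and with `lam=1` the genes are treated as uncorrelated. For larger gene sets a general linear solver would replace the explicit 2x2 inverse.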

  12. Enriched Environment-induced Maternal Weight Loss Reprograms Metabolic Gene Expression in Mouse Offspring

    Science.gov (United States)

    Wei, Yanchang; Yang, Cai-Rong; Wei, Yan-Ping; Ge, Zhao-Jia; Zhao, Zhen-Ao; Zhang, Bing; Hou, Yi; Schatten, Heide; Sun, Qing-Yuan

    2015-01-01

    The global prevalence of weight loss is increasing, especially in young women. However, the extent and mechanisms by which maternal weight loss affects the offspring is still poorly understood. Here, using an enriched environment (EE)-induced weight loss model, we show that maternal weight loss improves general health and reprograms metabolic gene expression in mouse offspring, and the epigenetic alterations can be inherited for at least two generations. EE in mothers induced weight loss and its associated physiological and metabolic changes such as decreased adiposity and improved glucose tolerance and insulin sensitivity. Relative to controls, their offspring exhibited improved general health such as reduced fat accumulation, decreased plasma and hepatic lipid levels, and improved glucose tolerance and insulin sensitivity. Maternal weight loss altered gene expression patterns in the liver of offspring with coherent down-regulation of genes involved in lipid and cholesterol biosynthesis. Epigenomic profiling of offspring livers revealed numerous changes in cytosine methylation depending on maternal weight loss, including reproducible changes in promoter methylation over several key lipid biosynthesis genes, correlated with their expression patterns. Embryo transfer studies indicated that oocyte alteration in response to maternal metabolic conditions is a strong factor in determining metabolic and epigenetic changes in offspring. Several important lipid metabolism-related genes have been identified to partially inherit methylated alleles from oocytes. Our study reveals a molecular and mechanistic basis of how maternal lifestyle modification affects metabolic changes in the offspring. PMID:25555918

  13. ROUGH SET BASED CLUSTERING OF GENE EXPRESSION DATA: A SURVEY

    Directory of Open Access Journals (Sweden)

    J.JEBA EMILYN

    2010-12-01

    Full Text Available Microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. However, the high dimensionality of gene expression data makes it difficult to analyze. Many clustering algorithms are available. In this paper we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. We then introduce rough clustering and explore its advantage over strict and fuzzy clustering. We also explain why rough clustering is preferred over other conventional methods by presenting a survey of a few clustering algorithms based on rough set theory for gene expression data. We conclude by stating that this area is a promising research field for the research community.

  14. A Rough Set based Gene Expression Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    J. J. Emilyn

    2011-01-01

    Full Text Available Problem statement: Microarray technology helps in monitoring the expression levels of thousands of genes across collections of related samples. Approach: The main goal in the analysis of large and heterogeneous gene expression datasets was to identify groups of genes that get expressed in a set of experimental conditions. Results: Several clustering techniques have been proposed for identifying gene signatures and understanding their role, and many of them have been applied to gene expression data, but with partial success. The main aim of this work was to develop a clustering algorithm that would successfully identify gene patterns. The proposed novel clustering technique (RCGED) provides an efficient way of finding the hidden and unique gene expression patterns. It overcomes the restriction of one object being placed in only one cluster. Conclusion/Recommendations: The proposed algorithm is termed intelligent because it automatically determines the optimum number of clusters. The proposed algorithm was tested on a colon cancer dataset and the results were compared with those of the Rough Fuzzy K Means algorithm.
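
    The idea that one object may sit in more than one cluster can be illustrated with a generic rough-set assignment step. The sketch below is not the RCGED algorithm, whose details the abstract does not give; the function name `rough_assign`, the ratio threshold of 1.3, and the toy expression profiles are all assumptions made for illustration.

```python
import math


def rough_assign(points, centroids, threshold=1.3):
    """Assign each point to lower/upper approximations of clusters.

    A point whose nearest centroid is clearly closest joins that cluster's
    lower approximation (certain member) and its upper approximation.
    A point that is almost as close to other centroids (distance within
    `threshold` times the minimum) joins only the upper approximations of
    all those clusters, so it is a possible member of several clusters.
    """
    k = len(centroids)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for idx, point in enumerate(points):
        dists = [math.dist(point, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        close = [
            j for j in range(k)
            if j != nearest and dists[j] <= threshold * dists[nearest]
        ]
        if close:
            for j in [nearest] + close:
                upper[j].append(idx)
        else:
            lower[nearest].append(idx)
            upper[nearest].append(idx)
    return lower, upper


# Hypothetical gene expression profiles measured under two conditions.
profiles = [(1.0, 0.0), (9.0, 1.0), (5.0, 0.0)]
centroids = [(0.0, 0.0), (10.0, 0.0)]
lower, upper = rough_assign(profiles, centroids)
print("lower:", lower)  # lower: [[0], [1]]
print("upper:", upper)  # upper: [[0, 2], [1, 2]]
```

    The third profile lies midway between the two centroids, so it appears in both upper approximations and in neither lower approximation, which a strict partition could not express.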

  15. Targeted gene enrichment and high-throughput sequencing for environmental biomonitoring: a case study using freshwater macroinvertebrates.

    Science.gov (United States)

    Dowle, Eddy J; Pochon, Xavier; C Banks, Jonathan; Shearer, Karen; Wood, Susanna A

    2016-09-01

    Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome C oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS undertaken using the Illumina MiSeq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated that primer biases occurred during amplicon metabarcoding, with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step. © 2015 John Wiley & Sons Ltd.
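
    The Spearman rank correlation used above to compare sequence reads with abundance can be computed as the Pearson correlation of ranks. The sketch below is a plain-Python illustration; the read counts and abundances are invented, and the function names are illustrative rather than taken from the study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-taxon values for one sample.
reads = [1200, 300, 4500, 50, 800]
abundance = [30, 12, 95, 2, 40]
rho = spearman(reads, abundance)
print(round(rho, 3))  # 0.9
```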

  16. Textrous!: extracting semantic textual meaning from gene sets.

    Directory of Open Access Journals (Sweden)

    Hongyu Chen

    Full Text Available The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily appreciable scientific textual 'tokens' from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.

  17. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

    Directory of Open Access Journals (Sweden)

    Burgun Anita

    2006-05-01

    Full Text Available Abstract Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Finally, GO term networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions.

  18. A conserved BDNF, glutamate- and GABA-enriched gene module related to human depression identified by coexpression meta-analysis and DNA variant genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Lun-Ching Chang

    Full Text Available Large scale gene expression (transcriptome) analysis and genome-wide association studies (GWAS) for single nucleotide polymorphisms have generated a considerable amount of gene- and disease-related information, but heterogeneity and various sources of noise have limited the discovery of disease mechanisms. As systematic dataset integration is becoming essential, we developed methods and performed meta-clustering of gene coexpression links in 11 transcriptome studies from postmortem brains of human subjects with major depressive disorder (MDD) and non-psychiatric control subjects. We next sought enrichment in the top 50 meta-analyzed coexpression modules for genes otherwise identified by GWAS for various sets of disorders. One coexpression module of 88 genes was consistently and significantly associated with GWAS for MDD, other neuropsychiatric disorders and brain functions, and for medical illnesses with elevated clinical risk of depression, but not for other diseases. In support of the superior discriminative power of this novel approach, we observed no significant enrichment for GWAS-related genes in coexpression modules extracted from single studies or in meta-modules using gene expression data from non-psychiatric control subjects. Genes in the identified module encode proteins implicated in neuronal signaling and structure, including glutamate metabotropic receptors (GRM1, GRM7), GABA receptors (GABRA2, GABRA4), and neurotrophic and development-related proteins [BDNF, reelin (RELN), Ephrin receptors (EPHA3, EPHA5)]. These results are consistent with the current understanding of molecular mechanisms of MDD and provide a set of putative interacting molecular partners, potentially reflecting components of a functional module across cells and biological pathways that are synchronously recruited in MDD, other brain disorders and MDD-related illnesses. Collectively, this study demonstrates the importance of integrating transcriptome data, gene

  19. A conserved BDNF, glutamate- and GABA-enriched gene module related to human depression identified by coexpression meta-analysis and DNA variant genome-wide association studies.

    Science.gov (United States)

    Chang, Lun-Ching; Jamain, Stephane; Lin, Chien-Wei; Rujescu, Dan; Tseng, George C; Sibille, Etienne

    2014-01-01

    Large scale gene expression (transcriptome) analysis and genome-wide association studies (GWAS) for single nucleotide polymorphisms have generated a considerable amount of gene- and disease-related information, but heterogeneity and various sources of noise have limited the discovery of disease mechanisms. As systematic dataset integration is becoming essential, we developed methods and performed meta-clustering of gene coexpression links in 11 transcriptome studies from postmortem brains of human subjects with major depressive disorder (MDD) and non-psychiatric control subjects. We next sought enrichment in the top 50 meta-analyzed coexpression modules for genes otherwise identified by GWAS for various sets of disorders. One coexpression module of 88 genes was consistently and significantly associated with GWAS for MDD, other neuropsychiatric disorders and brain functions, and for medical illnesses with elevated clinical risk of depression, but not for other diseases. In support of the superior discriminative power of this novel approach, we observed no significant enrichment for GWAS-related genes in coexpression modules extracted from single studies or in meta-modules using gene expression data from non-psychiatric control subjects. Genes in the identified module encode proteins implicated in neuronal signaling and structure, including glutamate metabotropic receptors (GRM1, GRM7), GABA receptors (GABRA2, GABRA4), and neurotrophic and development-related proteins [BDNF, reelin (RELN), Ephrin receptors (EPHA3, EPHA5)]. These results are consistent with the current understanding of molecular mechanisms of MDD and provide a set of putative interacting molecular partners, potentially reflecting components of a functional module across cells and biological pathways that are synchronously recruited in MDD, other brain disorders and MDD-related illnesses. 
Collectively, this study demonstrates the importance of integrating transcriptome data, gene coexpression modules
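
    The abstract above describes seeking enrichment of GWAS-identified genes within coexpression modules but does not state which test was used. As a minimal sketch only, assuming a one-sided hypergeometric (Fisher-type) over-representation test, the calculation could look like the following. The function name and all counts other than the module size of 88 are hypothetical placeholders, not values from the study.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, annotated: int, module: int, overlap: int) -> float:
    """One-sided over-representation p-value, P(X >= overlap).

    universe  : total number of genes considered (background)
    annotated : genes in the background that belong to the gene set (e.g. GWAS hits)
    module    : size of the coexpression module being tested
    overlap   : module genes that are also in the gene set
    """
    upper = min(annotated, module)
    total = comb(universe, module)  # ways to draw the module from the background
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, module - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts for illustration: only the module size (88) comes from the abstract.
    p = hypergeom_enrichment_p(universe=20000, annotated=300, module=88, overlap=8)
    print(f"enrichment p-value: {p:.3g}")
```

    When many modules and gene sets are tested, as in the study's top 50 modules, the resulting p-values would normally need a multiple-testing correction, which this sketch leaves out.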

  20. Enrichment of pathogenic alleles in the brittle cornea gene, ZNF469, in keratoconus

    Science.gov (United States)

    Lechner, Judith; Porter, Louise F.; Rice, Aine; Vitart, Veronique; Armstrong, David J.; Schorderet, Daniel F.; Munier, Francis L.; Wright, Alan F.; Inglehearn, Chris F.; Black, Graeme C.; Simpson, David A.; Manson, Forbes; Willoughby, Colin E.

    2014-01-01

    Keratoconus, a common inherited ocular disorder resulting in progressive corneal thinning, is the leading indication for corneal transplantation in the developed world. Genome-wide association studies have identified common SNPs 100 kb upstream of ZNF469 strongly associated with corneal thickness. Homozygous mutations in ZNF469 and PR domain-containing protein 5 (PRDM5) genes result in brittle cornea syndrome (BCS) Types 1 and 2, respectively. BCS is an autosomal recessive generalized connective tissue disorder associated with extreme corneal thinning and a high risk of corneal rupture. Some individuals with heterozygous PRDM5 mutations demonstrate a carrier ocular phenotype, which includes a mildly reduced corneal thickness, keratoconus and blue sclera. We hypothesized that heterozygous variants in PRDM5 and ZNF469 predispose to the development of isolated keratoconus. We found a significant enrichment of potentially pathologic heterozygous alleles in ZNF469 associated with the development of keratoconus (P = 0.00102) resulting in a relative risk of 12.0. This enrichment of rare potentially pathogenic alleles in ZNF469 in 12.5% of keratoconus patients represents a significant mutational load and highlights ZNF469 as the most significant genetic factor responsible for keratoconus identified to date. PMID:24895405

  1. Enrichment of pathogenic alleles in the brittle cornea gene, ZNF469, in keratoconus.

    Science.gov (United States)

    Lechner, Judith; Porter, Louise F; Rice, Aine; Vitart, Veronique; Armstrong, David J; Schorderet, Daniel F; Munier, Francis L; Wright, Alan F; Inglehearn, Chris F; Black, Graeme C; Simpson, David A; Manson, Forbes; Willoughby, Colin E

    2014-10-15

    Keratoconus, a common inherited ocular disorder resulting in progressive corneal thinning, is the leading indication for corneal transplantation in the developed world. Genome-wide association studies have identified common SNPs 100 kb upstream of ZNF469 strongly associated with corneal thickness. Homozygous mutations in ZNF469 and PR domain-containing protein 5 (PRDM5) genes result in brittle cornea syndrome (BCS) Types 1 and 2, respectively. BCS is an autosomal recessive generalized connective tissue disorder associated with extreme corneal thinning and a high risk of corneal rupture. Some individuals with heterozygous PRDM5 mutations demonstrate a carrier ocular phenotype, which includes a mildly reduced corneal thickness, keratoconus and blue sclera. We hypothesized that heterozygous variants in PRDM5 and ZNF469 predispose to the development of isolated keratoconus. We found a significant enrichment of potentially pathologic heterozygous alleles in ZNF469 associated with the development of keratoconus (P = 0.00102) resulting in a relative risk of 12.0. This enrichment of rare potentially pathogenic alleles in ZNF469 in 12.5% of keratoconus patients represents a significant mutational load and highlights ZNF469 as the most significant genetic factor responsible for keratoconus identified to date.

  2. Identification and characterization of Argonaute gene family and meiosis-enriched Argonaute during sporogenesis in maize

    Institute of Scientific and Technical Information of China (English)

    Zuxin Zhang

    2014-01-01

    Argonaute (AGO) proteins play a key role in regulation of gene expression through small RNA-directed RNA cleavage and translational repression, and are essential for multiple developmental processes. In the present study, 17 AGO genes of maize (Zea mays L., ZmAGOs) were identified using a Hidden Markov Model and validated by rapid amplification of cDNA ends assay. Subsequently, quantitative PCR revealed that expressions of these genes were higher in reproductive than in vegetative tissues. AGOs presented five temporal and spatial expression patterns, which were likely modulated by DNA methylation, 5′-untranslated exons and microRNA-mediated feedback loops. Intriguingly, ZmAGO18b was highly expressed in tassels during meiosis. Furthermore, in situ hybridization and immunofluorescence showed that ZmAGO18b was enriched in the tapetum and germ cells in meiotic anthers. We hypothesized that ZmAGOs are highly expressed in reproductive tissues, and that ZmAGO18b is a tapetum and germ cell-specific member of the AGO family in maize.

  3. Enrichment of human hematopoietic stem/progenitor cells facilitates transduction for stem cell gene therapy.

    Science.gov (United States)

    Baldwin, Kismet; Urbinati, Fabrizia; Romero, Zulema; Campo-Fernandez, Beatriz; Kaufman, Michael L; Cooper, Aaron R; Masiuk, Katelyn; Hollis, Roger P; Kohn, Donald B

    2015-05-01

    Autologous hematopoietic stem cell (HSC) gene therapy for sickle cell disease has the potential to treat this illness without the major immunological complications associated with allogeneic transplantation. However, transduction efficiency by β-globin lentiviral vectors using CD34-enriched cell populations is suboptimal and large vector production batches may be needed for clinical trials. Transducing a cell population more enriched for HSC could greatly reduce vector needs and, potentially, increase transduction efficiency. CD34(+) /CD38(-) cells, comprising ∼1%-3% of all CD34(+) cells, were isolated from healthy cord blood CD34(+) cells by fluorescence-activated cell sorting and transduced with a lentiviral vector expressing an antisickling form of beta-globin (CCL-β(AS3) -FB). Isolated CD34(+) /CD38(-) cells were able to generate progeny over an extended period of long-term culture (LTC) compared to the CD34(+) cells and required up to 40-fold less vector for transduction compared to bulk CD34(+) preparations containing an equivalent number of CD34(+) /CD38(-) cells. Transduction of isolated CD34(+) /CD38(-) cells was comparable to CD34(+) cells measured by quantitative PCR at day 14 with reduced vector needs, and average vector copy/cell remained higher over time for LTC initiated from CD34(+) /38(-) cells. Following in vitro erythroid differentiation, HBBAS3 mRNA expression was similar in cultures derived from CD34(+) /CD38(-) cells or unfractionated CD34(+) cells. In vivo studies showed equivalent engraftment of transduced CD34(+) /CD38(-) cells when transplanted in competition with 100-fold more CD34(+) /CD38(+) cells. This work provides initial evidence for the beneficial effects from isolating human CD34(+) /CD38(-) cells to use significantly less vector and potentially improve transduction for HSC gene therapy.

  4. Genome engineering uncovers 54 evolutionarily conserved and testis-enriched genes that are not required for male fertility in mice.

    Science.gov (United States)

    Miyata, Haruhiko; Castaneda, Julio M; Fujihara, Yoshitaka; Yu, Zhifeng; Archambeault, Denise R; Isotani, Ayako; Kiyozumi, Daiji; Kriseman, Maya L; Mashiko, Daisuke; Matsumura, Takafumi; Matzuk, Ryan M; Mori, Masashi; Noda, Taichi; Oji, Asami; Okabe, Masaru; Prunskaite-Hyyrylainen, Renata; Ramirez-Solis, Ramiro; Satouh, Yuhkoh; Zhang, Qian; Ikawa, Masahito; Matzuk, Martin M

    2016-07-12

    Gene-expression analysis studies from Schultz et al. estimate that more than 2,300 genes in the mouse genome are expressed predominantly in the male germ line. As of their 2003 publication [Schultz N, Hamra FK, Garbers DL (2003) Proc Natl Acad Sci USA 100(21):12201-12206], the functions of the majority of these testis-enriched genes during spermatogenesis and fertilization were largely unknown. Since the study by Schultz et al., functional analysis of hundreds of reproductive-tract-enriched genes have been performed, but there remain many testis-enriched genes for which their relevance to reproduction remain unexplored or unreported. Historically, a gene knockout is the "gold standard" to determine whether a gene's function is essential in vivo. Although knockout mice without apparent phenotypes are rarely published, these knockout mouse lines and their phenotypic information need to be shared to prevent redundant experiments. Herein, we used bioinformatic and experimental approaches to uncover mouse testis-enriched genes that are evolutionarily conserved in humans. We then used gene-disruption approaches, including Knockout Mouse Project resources (targeting vectors and mice) and CRISPR/Cas9, to mutate and quickly analyze the fertility of these mutant mice. We discovered that 54 mutant mouse lines were fertile. Thus, despite evolutionary conservation of these genes in vertebrates and in some cases in all eukaryotes, our results indicate that these genes are not individually essential for male mouse fertility. Our phenotypic data are highly relevant in this fiscally tight funding period and postgenomic age when large numbers of genomes are being analyzed for disease association, and will prevent unnecessary expenditures and duplications of effort by others.

  5. Host response transcriptional profiling reveals extracellular components and ABC (ATP-binding cassette) transporters gene enrichment in typhoid fever-infected Nigerian children

    Directory of Open Access Journals (Sweden)

    Resau James H

    2011-09-01

    Full Text Available Abstract Background Salmonella enterica serovar Typhi (S. Typhi) is a human-specific pathogen that causes typhoid fever, and remains a global health problem especially in developing countries. Its pathogenesis is complex and host response is poorly understood. In Africa, typhoid fever can be a major cause of morbidity in young infected children. The onset of the illness is insidious and clinical diagnosis is often unreliable. Gold standard blood culture diagnostic services are limited, thus a rapid, sensitive, and affordable diagnostic test is essential in poor-resourced clinical settings. Routine typhoid fever vaccination is highly recommended but currently licensed vaccines provide only 55-75% protection. Recent epidemiological studies also show the rapid emergence of multi-drug resistant S. Typhi strains. High-throughput molecular technologies, such as microarrays, can dissect the molecular mechanisms of host responses which are S. Typhi-specific to provide a comprehensive genomic component of immunological responses and suggest new insights for diagnosis and treatment. Methods Global transcriptional profiles of S. Typhi-infected young Nigerian children were obtained from their peripheral blood and compared with that of other bacteremic infections using Agilent gene expression microarrays. The host-response profiles of the same patients in acute vs. convalescent phases were also determined. The top 96-100 differentially-expressed genes were identified and four genes were validated by quantitative real-time PCR. Gene clusters were obtained and functional pathways were predicted by DAVID (Database for Annotation, Visualization and Integrated Discovery). Results Transcriptional profiles from S. Typhi-infected children could be distinguished from those of other bacteremic infections. Enriched gene clusters included genes associated with extracellular peptides/components such as lipocalin (LCN2) and systemic immune response which is atypical in

  6. The mouse X chromosome is enriched for sex-biased genes not subject to selection by meiotic sex chromosome inactivation.

    Science.gov (United States)

    Khil, Pavel P; Smirnova, Natalya A; Romanienko, Peter J; Camerini-Otero, R Daniel

    2004-06-01

    Sex chromosomes are subject to sex-specific selective evolutionary forces. One model predicts that genes with sex-biased expression should be enriched on the X chromosome. In agreement with Rice's hypothesis, spermatogonial genes are over-represented on the X chromosome of mice and sex- and reproduction-related genes are over-represented on the human X chromosome. Male-biased genes are under-represented on the X chromosome in worms and flies, however. Here we show that mouse spermatogenesis genes are relatively under-represented on the X chromosome and female-biased genes are enriched on it. We used Spo11(-/-) mice blocked in spermatogenesis early in meiosis to evaluate the temporal pattern of gene expression in sperm development. Genes expressed before the Spo11 block are enriched on the X chromosome, whereas those expressed later in spermatogenesis are depleted. Inactivation of the X chromosome in male meiosis may be a universal driving force for X-chromosome demasculinization.

  7. Omega-3 Fatty Acid Enriched Chevon (Goat Meat) Lowers Plasma Cholesterol Levels and Alters Gene Expressions in Rats

    Directory of Open Access Journals (Sweden)

    Mahdi Ebrahimi

    2014-01-01

    Full Text Available In this study, control chevon (goat meat) and omega-3 fatty acid enriched chevon were obtained from goats fed a 50% oil palm frond diet and commercial goat concentrate for 100 days, respectively. Goats fed the 50% oil palm frond diet contained high amounts of α-linolenic acid (ALA) in their meat compared to goats fed the control diet. The chevon was then used to prepare two types of pellets (control or enriched chevon) that were then fed to twenty male four-month-old Sprague-Dawley rats (n=10 in each group) for 12 weeks to evaluate their effects on plasma cholesterol levels, tissue fatty acids, and gene expression. There was a significant increase in ALA and docosahexaenoic acid (DHA) in the muscle tissues and liver of the rats fed the enriched chevon compared with the control group. Plasma cholesterol also decreased (P<0.05) in rats fed the enriched chevon compared to the control group. The rat pellets containing enriched chevon significantly upregulated the key transcription factor PPAR-γ and downregulated SREBP-1c expression relative to the control group. The results showed that the omega-3 fatty acid enriched chevon increased the omega-3 fatty acids in the rat tissues and altered PPAR-γ and SREBP-1c genes expression.

  8. Genome engineering uncovers 54 evolutionarily conserved and testis-enriched genes that are not required for male fertility in mice

    Science.gov (United States)

    Miyata, Haruhiko; Castaneda, Julio M.; Fujihara, Yoshitaka; Yu, Zhifeng; Archambeault, Denise R.; Isotani, Ayako; Kiyozumi, Daiji; Kriseman, Maya L.; Mashiko, Daisuke; Matsumura, Takafumi; Matzuk, Ryan M.; Mori, Masashi; Noda, Taichi; Oji, Asami; Okabe, Masaru; Prunskaite-Hyyrylainen, Renata; Ramirez-Solis, Ramiro; Satouh, Yuhkoh; Zhang, Qian; Ikawa, Masahito; Matzuk, Martin M.

    2016-01-01

    Gene-expression analysis studies from Schultz et al. estimate that more than 2,300 genes in the mouse genome are expressed predominantly in the male germ line. As of their 2003 publication [Schultz N, Hamra FK, Garbers DL (2003) Proc Natl Acad Sci USA 100(21):12201–12206], the functions of the majority of these testis-enriched genes during spermatogenesis and fertilization were largely unknown. Since the study by Schultz et al., functional analysis of hundreds of reproductive-tract–enriched genes have been performed, but there remain many testis-enriched genes for which their relevance to reproduction remain unexplored or unreported. Historically, a gene knockout is the “gold standard” to determine whether a gene’s function is essential in vivo. Although knockout mice without apparent phenotypes are rarely published, these knockout mouse lines and their phenotypic information need to be shared to prevent redundant experiments. Herein, we used bioinformatic and experimental approaches to uncover mouse testis-enriched genes that are evolutionarily conserved in humans. We then used gene-disruption approaches, including Knockout Mouse Project resources (targeting vectors and mice) and CRISPR/Cas9, to mutate and quickly analyze the fertility of these mutant mice. We discovered that 54 mutant mouse lines were fertile. Thus, despite evolutionary conservation of these genes in vertebrates and in some cases in all eukaryotes, our results indicate that these genes are not individually essential for male mouse fertility. Our phenotypic data are highly relevant in this fiscally tight funding period and postgenomic age when large numbers of genomes are being analyzed for disease association, and will prevent unnecessary expenditures and duplications of effort by others. PMID:27357688

  9. Algorithm for Finding Optimal Gene Sets in Microarray Prediction

    CERN Document Server

    Deutsch, J M

    2001-01-01

    Motivation: Microarray data has recently been shown to be efficacious in distinguishing closely related cell types that often appear in the diagnosis of cancer. It is useful to determine the minimum number of genes needed to do such a diagnosis both for clinical use and to determine the importance of specific genes for cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, all using different combinations of genes to generate a set of optimal predictors. Results: We apply this method to the leukemia data of the Whitehead/MIT group that attempts to differentially diagnose two kinds of leukemia, and also to data of Khan et al. to distinguish four different kinds of childhood cancers. In the latter case we were able to reduce the number of genes needed from 96 down to 15, while at the same time being able to perfectly classify all of their test data. Availability: http://stravinsky.ucsc.edu/josh/gesses/ Contact: josh@physics.ucsc.edu

  10. Neonicotinoid Insecticides Alter the Gene Expression Profile of Neuron-Enriched Cultures from Neonatal Rat Cerebellum

    Directory of Open Access Journals (Sweden)

    Junko Kimura-Kuroda

    2016-10-01

    Full Text Available Neonicotinoids are considered safe because of their low affinities to mammalian nicotinic acetylcholine receptors (nAChRs) relative to insect nAChRs. However, because of the importance of nAChRs in mammalian brain development, there remains a need to establish the safety of chronic neonicotinoid exposures with regards to children’s health. Here we examined the effects of long-term (14 days) and low-dose (1 μM) exposure of neuron-enriched cultures from neonatal rat cerebellum to nicotine and two neonicotinoids: acetamiprid and imidacloprid. Immunocytochemistry revealed no differences in the number or morphology of immature neurons or glial cells in any group versus untreated control cultures. However, a slight disturbance in Purkinje cell dendritic arborization was observed in the exposed cultures. Next we performed transcriptome analysis on total RNAs using microarrays, and identified significant differential expression (p < 0.05, q < 0.05, ≥1.5 fold) between control cultures versus nicotine-, acetamiprid-, or imidacloprid-exposed cultures in 34, 48, and 67 genes, respectively. Common to all exposed groups were nine genes essential for neurodevelopment, suggesting that chronic neonicotinoid exposure alters the transcriptome of the developing mammalian brain in a similar way to nicotine exposure. Our results highlight the need for further careful investigations into the effects of neonicotinoids in the developing mammalian brain.

  11. Core set approach to reduce uncertainty of gene trees

    Directory of Open Access Journals (Sweden)

    Okuhara Yoshiyasu

    2006-05-01

    Full Text Available Abstract Background A genealogy based on gene sequences within a species plays an essential role in the estimation of the character, structure, and evolutionary history of that species. Because intraspecific sequences are more closely related than interspecific ones, detailed information on the evolutionary process may be available by determining all the node sequences of trees and provide insight into functional constraints and adaptations. However, strong evolutionary correlations on a few lineages make this determination difficult as a whole, and the maximum parsimony (MP) method frequently allows a number of topologies with the same total branching length. Results Kitazoe et al. developed multidimensional vector-space representation of phylogeny. It converts additivity of evolutionary distances to orthogonality among the vectors expressing branches, and provides a unified index to measure deviations from the orthogonality. In this paper, this index is used to detect and exclude sequences with large deviations from orthogonality, and then to select a maximum subset ("core set") of sequences for which MP generates a single solution. Once the core set tree, for which all the node sequences are given, is formed, the excluded sequences are found to have basically two phylogenetic positions on this tree, respectively. Fortunately, since multiple substitutions are rare in intra-species sequences, the variance of nucleotide transitions is confined to a small range. By applying the core set approach to 38 partial env sequences of HIV-1 in a single patient and also 198 mitochondrial COI and COII DNA sequences of Anopheles dirus, we demonstrate how consistently this approach constructs the tree. Conclusion In the HIV dataset, we confirmed that the obtained core set tree is the unique maximum set for which MP proposes a single tree. In the mosquito data set, the fluctuation of nucleotide transitions caused by the sequences excluded from the core set was very small

  12. Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective.

    Science.gov (United States)

    Quesada-Martínez, Manuel; Mikroyannidi, Eleni; Fernández-Breis, Jesualdo Tomás; Stevens, Robert

    2015-09-01

    The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO). In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we showed that the labels can present a high degree of regularity. In this work, we extend our method with a cross-products extension (CPE) metric, which estimates the potential interest of a specific regularity for axiomatic enrichment in the lexical analysis, using information on exact matches in external ontologies. The GO consortium recently enriched the GO by using so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class with classes from the GO or other biomedical ontologies. We apply our method to the GO and study how its lexical analysis can identify and reconstruct the cross-products that are defined by the GO consortium. The labels of the classes of the GO are highly regular in lexical terms, and the exact matches with labels of external ontologies affect 80% of the GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in two external ontologies that are selected for our experiment, namely, the Cell Ontology and the Chemical Entities of Biological Interest ontology, and 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric permits our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study is completed with an analysis of false positives to explain this precision value. We think that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and that it can provide new insights into the engineering of
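
    The recall and precision figures quoted in the abstract above compare detected cross-product extensions against those defined by the GO consortium. As a small illustration of how such figures are conventionally computed from a predicted set and a reference set, a sketch follows. The function name and the toy identifiers are hypothetical and are not drawn from the study's data.

```python
def precision_recall(predicted: set, reference: set) -> tuple[float, float]:
    """Return (precision, recall) of a predicted set against a reference set.

    precision = true positives / number of predictions
    recall    = true positives / number of reference items
    An empty denominator yields 0.0 rather than raising an error.
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy identifiers standing in for detected and curated cross-products.
    detected = {"cp1", "cp2", "cp3", "cp4"}
    curated = {"cp1", "cp5"}
    precision, recall = precision_recall(detected, curated)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

    A low precision alongside a moderate recall, as reported in the abstract, means many detections fall outside the curated reference, which is why the authors analyse the false positives separately.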

  13. The Dynamics of Visual Art Dialogues: Experiences to Be Used in Hospital Settings with Visual Art Enrichment

    Directory of Open Access Journals (Sweden)

    Britt-Maj Wikström

    2011-01-01

    Full Text Available Objectives. Given that hospitals have environmental enrichment with paintings and visual art arrangement, it would be meaningful to develop and document how hospital art could be used by health professionals. Methods. The study was undertaken at an art site in Sweden. During 1-hour sessions, participants (n=20) got together in an art gallery every second week, five times. Results. According to the participants a new value was perceived. From qualitative analyses, three themes appear: raise association, mentally present, and door-opener. In addition 72% of the participants reported makes me happy and gives energy and inspiration, and 52% reported that dialogues increase inspiration, make you involved, and stimulate curiosity. Conclusion. The present study supported the view that visual art dialogue could be used by health care professionals in a structured manner and that meaningful art stimulation, related to a person’s experiences, could be of importance for the patients. Implementing art dialogues in hospital settings could be a fruitful working tool for nurses, a complementary manner of patient communication.

  14. A gene pattern mining algorithm using interchangeable gene sets for prokaryotes

    Directory of Open Access Journals (Sweden)

    Kim Sun

    2008-02-01

    Full Text Available Abstract Background Mining gene patterns that are common to multiple genomes is an important biological problem, which can lead us to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms use the optimization problem formulation which is solved using the dynamic programming technique. Unfortunately, extending these algorithms to multiple genome cases is not trivial due to the rapid increase in time and space complexity. Results In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community to handle the situation where family classification information is not available using interchangeable sets. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene pattern can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiment shows that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection. Conclusion The discovered gene patterns can be used for detecting orthologs and genes that collaborate for a common biological function.
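
    The bi-directional best hit (BBH) technique used as the baseline in the abstract above pairs two genes as putative orthologs when each is the other's top-scoring match across genomes. A minimal sketch of that baseline only (not of the authors' pattern mining algorithm) follows. The nested-dictionary input format, the function name, and the toy scores are assumptions made for illustration.

```python
def bidirectional_best_hits(scores_ab: dict, scores_ba: dict) -> list:
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    scores_ab[a][b] : similarity score of gene a (genome A) searched against gene b (genome B)
    scores_ba[b][a] : similarity score of gene b (genome B) searched against gene a (genome A)
    """
    # Best-scoring target for every query gene, in each search direction.
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    # Keep only the reciprocal pairs.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Toy similarity scores (e.g. alignment bit scores); gene names are made up.
    a_vs_b = {"a1": {"b1": 90, "b2": 40}, "a2": {"b1": 85, "b2": 30}}
    b_vs_a = {"b1": {"a1": 88, "a2": 80}, "b2": {"a2": 35, "a1": 20}}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

    In the toy data, a2 and b2 are not paired because a2's best hit is b1, which illustrates how a strict reciprocal criterion can miss candidate orthologs and leave room for the recall gain the abstract reports.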

  15. Generation of an algorithm based on minimal gene sets to clinically subtype triple negative breast cancer patients.

    Science.gov (United States)

    Ring, Brian Z; Hout, David R; Morris, Stephan W; Lawrence, Kasey; Schweitzer, Brock L; Bailey, Daniel B; Lehmann, Brian D; Pietenpol, Jennifer A; Seitz, Robert S

    2016-02-23

    Recently, a gene expression algorithm, TNBCtype, was developed that can divide triple-negative breast cancer (TNBC) into molecularly-defined subtypes. The algorithm has potential to provide predictive value for TNBC subtype-specific response to various treatments. TNBCtype used in a retrospective analysis of neoadjuvant clinical trial data of TNBC patients demonstrated that TNBC subtype and pathological complete response to neoadjuvant chemotherapy were significantly associated. Herein we describe an expression algorithm reduced to 101 genes with the power to subtype TNBC tumors similar to the original 2188-gene expression algorithm and predict patient outcomes. The new classification model was built using the same expression data sets used for the original TNBCtype algorithm. Gene set enrichment followed by shrunken centroid analysis were used for feature reduction, then elastic-net regularized linear modeling was used to identify genes for a centroid model classifying all subtypes, comprised of 101 genes. The predictive capability of both this new "lean" algorithm and the original 2188-gene model were applied to an independent clinical trial cohort of 139 TNBC patients treated initially with neoadjuvant doxorubicin/cyclophosphamide and then randomized to receive either paclitaxel or ixabepilone to determine association of pathologic complete response within the subtypes. The new 101-gene expression model reproduced the classification provided by the 2188-gene algorithm and was highly concordant in the same set of seven TNBC cohorts used to generate the TNBCtype algorithm (87%), as well as in the independent clinical trial cohort (88%), when cases with significant correlations to multiple subtypes were excluded. 
Clinical responses to both neoadjuvant treatment arms, found BL2 to be significantly associated with poor response (Odds Ratio (OR) =0.12, p=0.03 for the 2188-gene model; OR = 0.23, p sets can recapitulate the TNBC subtypes identified by the original 2188

  16. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer.

    Science.gov (United States)

    Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Case-control admixture mapping in Latino populations enriches for known asthma-associated genes

    Science.gov (United States)

    Torgerson, Dara G.; Gignoux, Christopher R.; Galanter, Joshua M.; Drake, Katherine A.; Roth, Lindsey A.; Eng, Celeste; Huntsman, Scott; Torres, Raul; Avila, Pedro C.; Chapela, Rocio; Ford, Jean G.; Rodríguez-Santana, José R.; Rodríguez-Cintrón, William; Hernandez, Ryan D.; Burchard, Esteban G.

    2012-01-01

    Background Polymorphisms in more than 100 genes have been associated with asthma susceptibility, yet much of the heritability remains to be explained. Asthma disproportionately affects different racial and ethnic groups in the United States, suggesting that admixture mapping is a useful strategy to identify novel asthma-associated loci. Objective We sought to identify novel asthma-associated loci in Latino populations using case-control admixture mapping. Methods We performed genome-wide admixture mapping by comparing levels of local Native American, European, and African ancestry between children with asthma and nonasthmatic control subjects in Puerto Rican and Mexican populations. Within candidate peaks, we performed allelic tests of association, controlling for differences in local ancestry. Results Between the 2 populations, we identified a total of 62 admixture mapping peaks at a P value of less than 10−3 that were significantly enriched for previously identified asthma-associated genes (P = .0051). One of the peaks was statistically significant based on 100 permutations in the Mexican sample (6q15); however, it was not significant in Puerto Rican subjects. Another peak was identified at nominal significance in both populations (8q12); however, the association was observed with different ancestries. Conclusion Case-control admixture mapping is a promising strategy for identifying novel asthma-associated loci in Latino populations and implicates genetic variation at 6q15 and 8q12 regions with asthma susceptibility. This approach might be useful for identifying regions that contribute to both shared and population-specific differences in asthma susceptibility. PMID:22502797

  18. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer

    Energy Technology Data Exchange (ETDEWEB)

    Pandi, Narayanan Sathiya, E-mail: sathiyapandi@gmail.com; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Highlights: •Identified stomach lineage specific gene set (SLSGS) was found to be under expressed in gastric tumors. •Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer. •In silico pathway scanning identified estrogen-α signaling is a putative regulator of SLSGS in gastric cancer. •Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. -- Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.

  19. Microarray analysis identifies a common set of cellular genes modulated by different HCV replicon clones

    Directory of Open Access Journals (Sweden)

    Gerosolimo Germano

    2008-06-01

    Full Text Available Abstract Background Hepatitis C virus (HCV) RNA synthesis and protein expression affect cell homeostasis by modulation of gene expression. The impact of HCV replication on global cell transcription has not been fully evaluated. Thus, we analysed the expression profiles of different clones of human hepatoma-derived Huh-7 cells carrying a self-replicating HCV RNA which express all viral proteins (HCV replicon system). Results First, we compared the expression profile of HCV replicon clone 21-5 with both the Huh-7 parental cells and the 21-5 cured (21-5c) cells. In the latter, the HCV RNA has been eliminated by IFN-α treatment. To confirm the data, we also analyzed microarray results from both the 21-5 and two other HCV replicon clones, 22-6 and 21-7, compared to the Huh-7 cells. The study was carried out by using the Applied Biosystems (AB) Human Genome Survey Microarray v1.0, which provides 31,700 probes that correspond to 27,868 human genes. Microarray analysis revealed a specific transcriptional program induced by HCV in replicon cells with respect to both IFN-α-cured and Huh-7 cells. From the original datasets of differentially expressed genes, we selected by Venn diagrams a final list of 38 genes modulated by HCV in all clones. Most of the 38 genes have never been described before and showed high fold-change associated with significant p-value, strongly supporting data reliability. Classification of the 38 genes by Panther System identified functional categories that were significantly enriched in this gene set, such as histones and ribosomal proteins as well as extracellular matrix and intracellular protein traffic. The dataset also included new genes involved in lipid metabolism, extracellular matrix and cytoskeletal network, which may be critical for HCV replication and pathogenesis. Conclusion Our data provide a comprehensive analysis of alterations in gene expression induced by HCV replication and reveal modulation of new genes potentially useful

  20. Jetset: selecting the optimal microarray probe set to represent a gene

    DEFF Research Database (Denmark)

    Li, Qiyuan; Birkbak, Nicolai Juul; Gyorffy, Balazs

    2011-01-01

    Background: Interpretation of gene expression microarrays requires a mapping from probe set to gene. On many Affymetrix gene expression microarrays, a given gene may be detected by multiple probe sets, which may deliver inconsistent or even contradictory measurements. Therefore, obtaining...... an unambiguous expression estimate of a pre-specified gene can be a nontrivial but essential task. Results: We developed scoring methods to assess each probe set for specificity, splice isoform coverage, and robustness against transcript degradation. We used these scores to select a single representative probe...... set for each gene, thus creating a simple one-to-one mapping between gene and probe set. To test this method, we evaluated concordance between protein measurements and gene expression values, and between sets of genes whose expression is known to be correlated. For both test cases, we identified genes...

  1. Between-species differences in gene copy number are enriched among functions critical for adaptive evolution in Arabidopsis halleri.

    Science.gov (United States)

    Suryawanshi, Vasantika; Talke, Ina N; Weber, Michael; Eils, Roland; Brors, Benedikt; Clemens, Stephan; Krämer, Ute

    2016-12-22

    Gene copy number divergence between species is a form of genetic polymorphism that contributes significantly to both genome size and phenotypic variation. In plants, copy number expansions of single genes were implicated in cultivar- or species-specific tolerance of high levels of soil boron, aluminium or calamine-type heavy metals, respectively. Arabidopsis halleri is a zinc- and cadmium-hyperaccumulating extremophile species capable of growing on heavy-metal contaminated, toxic soils. In contrast, its non-accumulating sister species A. lyrata and the closely related reference model species A. thaliana exhibit merely basal metal tolerance. For a genome-wide assessment of the role of copy number divergence (CND) in lineage-specific environmental adaptation, we conducted cross-species array comparative genome hybridizations of three plant species and developed a global signal scaling procedure to adjust for sequence divergence. In A. halleri, transition metal homeostasis functions are enriched twofold among the genes detected as copy number expanded. Moreover, biotic stress functions including mostly disease Resistance (R) gene-related genes are enriched twofold among genes detected as copy number reduced, when compared to the abundance of these functions among all genes. Our results provide genome-wide support for a link between evolutionary adaptation and CND in A. halleri as shown previously for Heavy metal ATPase4. Moreover our results support the hypothesis that elemental defences, which result from the hyperaccumulation of toxic metals, allow the reduction of classical defences against biotic stress as a trade-off.

  2. A novel CpG island set identifies tissue-specific methylation at developmental gene loci.

    Directory of Open Access Journals (Sweden)

    Robert Illingworth

    2008-01-01

    Full Text Available CpG islands (CGIs) are dense clusters of CpG sequences that punctuate the CpG-deficient human genome and associate with many gene promoters. As CGIs also differ from bulk chromosomal DNA by their frequent lack of cytosine methylation, we devised a CGI enrichment method based on nonmethylated CpG affinity chromatography. The resulting library was sequenced to define a novel human blood CGI set that includes many that are not detected by current algorithms. Approximately half of CGIs were associated with annotated gene transcription start sites, the remainder being intra- or intergenic. Using an array representing over 17,000 CGIs, we established that 6%-8% of CGIs are methylated in genomic DNA of human blood, brain, muscle, and spleen. Inter- and intragenic CGIs are preferentially susceptible to methylation. CGIs showing tissue-specific methylation were overrepresented at numerous genetic loci that are essential for development, including HOX and PAX family members. The findings enable a comprehensive analysis of the roles played by CGI methylation in normal and diseased human tissues.

  3. META-GSA: Combining Findings from Gene-Set Analyses across Several Genome-Wide Association Studies.

    Directory of Open Access Journals (Sweden)

    Albert Rosenberger

    Full Text Available Gene-set analysis (GSA) methods are used as complementary approaches to genome-wide association studies (GWASs). The single marker association estimates of a predefined set of genes are either contrasted with those of all remaining genes or with a null non-associated background. To pool the p-values from several GSAs, it is important to take into account the concordance of the observed patterns resulting from single marker association point estimates across any given gene set. Here we propose META-GSA, an enhanced version of Fisher's inverse χ2-method that weights each study to account for imperfect correlation between association patterns. We investigated the performance of META-GSA by simulating GWASs with 500 cases and 500 controls at 100 diallelic markers in 20 different scenarios, simulating different relative risks between 1 and 1.5 in gene sets of 10 genes. Wilcoxon's rank sum test was applied as GSA for each study. We found that META-GSA has greater power to discover truly associated gene sets than simple pooling of the p-values, by e.g. 59% versus 37%, when the true relative risk for 5 of 10 genes was assumed to be 1.5. Under the null hypothesis of no difference in the true association pattern between the gene set of interest and the set of remaining genes, the results of both approaches are almost uncorrelated. We recommend not relying on p-values alone when combining the results of independent GSAs. We applied META-GSA to pool the results of four case-control GWASs of lung cancer risk (Central European Study and Toronto/Lunenfeld-Tanenbaum Research Institute Study; German Lung Cancer Study and MD Anderson Cancer Center Study), which had already been analyzed separately with four different GSA methods (EASE; SLAT, mSUMSTAT and GenGen). This application revealed the pathway GO0015291 "transmembrane transporter activity" as significantly enriched with associated genes (GSA-method: EASE, p = 0.0315 corrected for multiple testing). Similar
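
    For orientation, the classical unweighted form of Fisher's inverse χ2-method, which this abstract uses as the "simple pooling" baseline, can be sketched as follows. This is not META-GSA itself: the study weighting described above is not reproduced, and the function names are illustrative. The sketch relies on the closed-form survival function of the χ2 distribution for even degrees of freedom, so it needs only the standard library.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df.

    For df = 2k the closed form is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's (unweighted) method.

    Returns the statistic -2 * sum(ln p) and its p-value under a
    chi-square distribution with 2 * len(p_values) degrees of freedom.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, 2 * len(p_values))


# Two hypothetical per-study gene-set p-values.
statistic, combined = fisher_combined_p([0.1, 0.1])
print(round(statistic, 3), round(combined, 4))
```

    With a single input the combined p-value equals the input, which is a quick sanity check on the implementation.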

  4. Inter-species Inference of Gene Set Enrichment in Lung Epithelial Cells from Proteomic and Large Transcriptomic Data Sets

    NARCIS (Netherlands)

    Hormoz, Sahand; Bhanot, Gyan; Biehl, Michael; Bilal, Erhan; Meyer, Pablo; Norel, Raquel; Rhrissorrakrai, Kahn; Dayarian, Adel

    2014-01-01

    MOTIVATION: Translating findings in rodent models to humans has been a corner-stone of modern biology and drug development. However, in many cases a naive 'extrapolation' between the two species has not succeeded. As a result, clinical trials of new drugs sometimes fail even after considerable succe

  5. Beyond main effects of gene-sets: harsh parenting moderates the association between a dopamine gene-set and child externalizing behavior

    NARCIS (Netherlands)

    J. Windhorst (Judith); V. Mileva-Seitz; R.C.A. Rippe (Ralph C.A.); H.W. Tiemeier (Henning); V.W.V. Jaddoe (Vincent); F.C. Verhulst (Frank); M.H. van IJzendoorn (Marinus); M.J. Bakermans-Kranenburg (Marian)

    2016-01-01

    Background: In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene-based and gene-se

  6. Beyond main effects of gene-sets: Harsh parenting moderates the association between a dopamine gene-set and child externalizing behavior

    OpenAIRE

    Windhorst, D.A.; Mileva, V.R.; Rippe, R.C.A.; Tiemeier, H; Jaddoe, V. W. V.; Verhulst, F. C.; IJzendoorn, van, M.H.; Bakermans, M.J.

    2016-01-01

    Abstract Background In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene‐based and gene‐set approaches in tests of Gene by Environment (G × E) effects on complex behavior. This approach can offer an important alternative or complement to candidate gene and genome‐wide environmental interact...

  7. Science Teaching Experiences in Informal Settings: One Way to Enrich the Preparation Program for Preservice Science Teachers

    Science.gov (United States)

    Hsu, Pei-Ling

    2016-01-01

    The high attrition rate of new science teachers demonstrates the urgent need to incorporate effective practices in teacher preparation programs to better equip preservice science teachers. The purpose of the study is to demonstrate a way to enrich preservice science teachers' preparation by incorporating informal science teaching practice into…

  8. Evaluating the effect of using different sets of enrichment for FAs on fuel management optimization using CA

    Energy Technology Data Exchange (ETDEWEB)

    Moghaddam, Nader Maleki [Faculty of Nuclear Engineering and Physics, Amirkabir University of Technology (Tehran Polytechnique), Hafez Street, Tehran (Iran, Islamic Republic of); Fadaei, Amir Hosein, E-mail: Fadaei_amir@aut.ac.ir [Faculty of Nuclear Engineering and Physics, Amirkabir University of Technology (Tehran Polytechnique), Hafez Street, Tehran (Iran, Islamic Republic of); Zahedi, Ehsan [Department of Mechanical Engineering, Sharif University of Technology, Azadi Str., Tehran (Iran, Islamic Republic of)

    2011-04-15

    In nuclear reactor core design, achieving the optimized arrangement of fuel assemblies (FAs) is the most important step towards satisfying safety and economic requirements. In most studies, nuclear fuel optimizations have been performed by using a finite number of different types of FAs. However, the effect of FA numbers with different enrichments and the difference between their maximum and minimum enrichment values can be important and should be evaluated in the optimization process. This research is aimed at evaluating the effect of using different enrichment values for FAs. This issue has been investigated by focusing on two parameters, namely, the initially selected enrichment and the difference between the minimum and maximum enrichments applied in the core design. In the previous studies of nuclear fuel management, these parameters have been kept as fixed quantities and considered as initial assumptions in the optimization process. Therefore, to achieve an optimized arrangement of the core, the proper values of these parameters have to be determined. For this purpose a parameter (δ) served through the optimization process to show the effect of the difference between the enrichment values of FAs. Another parameter, named ε_0, shows the minimum enrichment of FAs. These parameters are defined based on a factor named Fuel Quality Factor (FQF) as a characteristic of fuel composition. FQF, denoted Z(r), is also used through the optimization process for achieving the smooth distribution of power. The values of Z(r) are calculated using the MCNP code. This methodology is applied to a VVER-1000 nuclear reactor core in order to minimize the local power peaking factor (P_q). For finding the best configuration of FAs in the core, Cellular Automata (CA) is applied as a powerful and reliable tool. The computer codes WIMS and CITATION are used for core calculations.
The results provide a comprehensive view of VVER-1000 reactor core configuration for

  9. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes

    DEFF Research Database (Denmark)

    de Jong, Simone; Boks, Marco P M; Fuller, Tova F;

    2012-01-01

    Despite large-scale genome-wide association studies (GWAS), the underlying genes for schizophrenia are largely unknown. Additional approaches are therefore required to identify the genetic background of this disorder. Here we report findings from a large gene expression study in peripheral blood...... of schizophrenia patients and controls. We applied a systems biology approach to genome-wide expression data from whole blood of 92 medicated and 29 antipsychotic-free schizophrenia patients and 118 healthy controls. We show that gene expression profiling in whole blood can identify twelve large gene co...... of these robustly defined disease modules is significantly enriched with brain-expressed genes and with genetic variants that were implicated in a GWAS study, which could imply a causal role in schizophrenia etiology. The most highly connected intramodular hub gene in this module (ABCF1), is located in...

  10. Human Effector / Initiator Gene Sets That Regulate Myometrial Contractility During Term and Preterm Labor

    Science.gov (United States)

    WEINER, Carl P.; MASON, Clifford W.; DONG, Yafeng; BUHIMSCHI, Irina A.; SWAAN, Peter W.; BUHIMSCHI, Catalin S.

    2010-01-01

    Objective Distinct processes govern transition from quiescence to activation during term (TL) and preterm labor (PTL). We sought gene sets responsible for TL and PTL, along with the effector genes necessary for labor independent of gestation and underlying trigger. Methods Expression was analyzed in term and preterm +/− labor (n =6 subjects/group). Gene sets were generated using logic operations. Results 34 genes were similarly expressed in PTL/TL but absent from nonlabor samples (Effector Set). 49 genes were specific to PTL (Preterm Initiator Set) and 174 to TL (Term Initiator Set). The gene ontogeny processes comprising Term Initiator and Effector Sets were diverse, though inflammation was represented in 4 of the top 10; inflammation dominated the Preterm Initiator Set. Comments TL and PTL differ dramatically in initiator profiles. Though inflammation is part of the Term Initiator and the Effector Sets, it is an overwhelming part of PTL associated with intraamniotic inflammation. PMID:20452493
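
    The "logic operations" used to build the gene sets can be pictured with ordinary set algebra. The sketch below uses made-up gene identifiers, and it assumes that "specific to" means present in one labor group and absent from both the other labor group and the nonlabor samples; the abstract does not spell out the exact rule.

```python
# Hypothetical genes detected as expressed in each group.
preterm_labor = {"G1", "G2", "G3", "G4"}
term_labor = {"G1", "G2", "G5", "G6"}
non_labor = {"G2", "G6", "G7"}

# Effector Set: shared by both labor groups but absent from nonlabor samples.
effector_set = (preterm_labor & term_labor) - non_labor

# Initiator Sets: seen in one labor group only (assumed definition).
preterm_initiator_set = preterm_labor - term_labor - non_labor
term_initiator_set = term_labor - preterm_labor - non_labor

print(sorted(effector_set))
print(sorted(preterm_initiator_set))
print(sorted(term_initiator_set))
```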

  11. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains

    Directory of Open Access Journals (Sweden)

    Bharti Arvind K

    2008-12-01

    Full Text Available Abstract Background Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. Although the types and natures of

  12. JAG: A Computational Tool to Evaluate the Role of Gene-Sets in Complex Traits.

    Science.gov (United States)

    Lips, Esther S; Kooyman, Maarten; de Leeuw, Christiaan; Posthuma, Danielle

    2015-05-14

    Gene-set analysis has been proposed as a powerful tool to deal with the highly polygenic architecture of complex traits, as well as with the small effect sizes typically found in GWAS studies for complex traits. We developed a tool, Joint Association of Genetic variants (JAG), which can be applied to Genome Wide Association (GWA) data and tests for the joint effect of all single nucleotide polymorphisms (SNPs) located in a user-specified set of genes or biological pathway. JAG assigns SNPs to genes and incorporates self-contained and/or competitive tests for gene-set analysis. JAG uses permutation to evaluate gene-set significance, which implicitly controls for linkage disequilibrium, sample size, gene size, the number of SNPs per gene and the number of genes in the gene-set. We conducted a power analysis using the Wellcome Trust Case Control Consortium (WTCCC) Crohn's disease data set and show that JAG correctly identifies validated gene-sets for Crohn's disease and has more power than currently available tools for gene-set analysis. JAG is a powerful, novel tool for gene-set analysis, and can be freely downloaded from the CTG Lab website.
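
    The permutation idea described in this abstract can be sketched in a few lines. This is an illustrative self-contained test, not JAG's implementation: the set statistic (sum of absolute case/control differences in mean allele count), the function names and the toy data are all assumptions. Shuffling only the phenotype labels leaves each individual's genotype row intact, which is why correlation between SNPs is preserved under the null.

```python
import random


def set_statistic(genotypes, phenotypes, snp_indices):
    """Sum over the set's SNPs of |mean allele count in cases - controls|."""
    cases = [g for g, y in zip(genotypes, phenotypes) if y == 1]
    controls = [g for g, y in zip(genotypes, phenotypes) if y == 0]
    total = 0.0
    for j in snp_indices:
        mean_cases = sum(g[j] for g in cases) / len(cases)
        mean_controls = sum(g[j] for g in controls) / len(controls)
        total += abs(mean_cases - mean_controls)
    return total


def permutation_p_value(genotypes, phenotypes, snp_indices,
                        n_perm=999, seed=0):
    """Empirical p-value obtained by permuting phenotype labels."""
    rng = random.Random(seed)
    observed = set_statistic(genotypes, phenotypes, snp_indices)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the number of cases and controls fixed
        if set_statistic(genotypes, shuffled, snp_indices) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 8 cases carry two copies at SNPs 0 and 1, 8 controls carry none.
genotypes = [[2, 2, 0]] * 8 + [[0, 0, 0]] * 8
phenotypes = [1] * 8 + [0] * 8
gene_set_snps = [0, 1]  # hypothetical SNPs assigned to the gene set

p_value = permutation_p_value(genotypes, phenotypes, gene_set_snps)
print(p_value)
```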

  13. JAG: A Computational Tool to Evaluate the Role of Gene-Sets in Complex Traits

    Directory of Open Access Journals (Sweden)

    Esther S. Lips

    2015-05-01

    Full Text Available Gene-set analysis has been proposed as a powerful tool to deal with the highly polygenic architecture of complex traits, as well as with the small effect sizes typically found in GWAS studies for complex traits. We developed a tool, Joint Association of Genetic variants (JAG), which can be applied to Genome Wide Association (GWA) data and tests for the joint effect of all single nucleotide polymorphisms (SNPs) located in a user-specified set of genes or biological pathway. JAG assigns SNPs to genes and incorporates self-contained and/or competitive tests for gene-set analysis. JAG uses permutation to evaluate gene-set significance, which implicitly controls for linkage disequilibrium, sample size, gene size, the number of SNPs per gene and the number of genes in the gene-set. We conducted a power analysis using the Wellcome Trust Case Control Consortium (WTCCC) Crohn’s disease data set and show that JAG correctly identifies validated gene-sets for Crohn’s disease and has more power than currently available tools for gene-set analysis. JAG is a powerful, novel tool for gene-set analysis, and can be freely downloaded from the CTG Lab website.

  14. Detection of Salmonella invA gene in shrimp enrichment culture by polymerase chain reaction.

    Science.gov (United States)

    Upadhyay, Bishnu Prasad; Utrarachkij, Fuangfa; Thongshoob, Jarinee; Mahakunkijcharoen, Yuvadee; Wongchinda, Niracha; Suthienkul, Orasa; Khusmith, Srisin

    2010-03-01

    Contamination of seafood with salmonellae is a major public health concern. Detection of Salmonella by standard culture methods is time consuming. In this study, an enrichment culture step prior to polymerase chain reaction (PCR) was applied to detect a 284 bp fragment of Salmonella invA in comparison with the conventional culture method in 100 shrimp samples collected from four different shrimp farms and fresh food markets around Bangkok. Samples were pre-enriched in non-selective lactose broth (LB) and selective tetrathionate broth (TTB). PCR detection limit was 10 pg and 10^4 cfu/ml of viable salmonellae with 100% specificity. PCR assay detected 19 different Salmonella serovars belonging to 8 serogroups (B, C1, C2-C3, D1, E1, E4 and K) commonly found in clinical and environmental samples in Thailand. The detection rate of PCR following TTB enrichment (24%) was higher than conventional culture method (19%). PCR following TTB, but not in LB enrichment allowed salmonella detection with 84% sensitivity, 90% specificity and 89% accuracy. Shrimp samples collected from fresh food markets had higher levels of contaminated salmonellae than those from shrimp farms. The results indicated that incorporation of an enrichment step prior to PCR has the potential to be applied for detection of naturally contaminated salmonellae in food, environment and clinical samples.
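
    The sensitivity, specificity and accuracy figures follow from a 2x2 comparison of PCR against a reference method. The abstract does not report the underlying counts; the table below is an assumption, chosen as one set of counts that is consistent with 100 samples, 19 culture-positive, 24 PCR-positive and the reported percentages, with conventional culture treated as the reference.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity and accuracy from 2x2 counts."""
    sensitivity = tp / (tp + fn)            # positives correctly detected
    specificity = tn / (tn + fp)            # negatives correctly excluded
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy


# Assumed counts (not from the abstract): PCR after TTB enrichment versus
# conventional culture on 100 shrimp samples.
tp, fp, fn, tn = 16, 8, 3, 73

sensitivity, specificity, accuracy = diagnostic_metrics(tp, fp, fn, tn)
percentages = [round(100 * value) for value in
               (sensitivity, specificity, accuracy)]
print(percentages)
```

    Under these assumed counts, 16 + 3 = 19 samples are culture-positive and 16 + 8 = 24 are PCR-positive, matching the detection rates quoted above.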

  15. New cyt b gene universal primer set for forensic analysis.

    Science.gov (United States)

    Lopez-Oceja, A; Gamarra, D; Borragan, S; Jiménez-Moreno, S; de Pancorbo, M M

    2016-07-01

    Analysis of mitochondrial DNA, and in particular the cytochrome b gene (cyt b), has become an essential tool for species identification in routine forensic practice. In cases of degraded samples, where the DNA is fractionated, universal primers that are highly efficient for the amplification of the target region are necessary. Therefore, in the present study a new universal cyt b primer set with high species identification capabilities, even in samples with highly degraded DNA, has been developed. In order to achieve this objective, the primers were designed following the alignment of complete sequences of the cyt b from 751 species from the Class of Mammalia listed in GenBank. A highly variable region of 148bp flanked by highly conserved sequences was chosen for placing the primers. The effectiveness of the new pair of primers was examined in 63 animal species belonging to 38 Families from 14 Orders and 5 Classes (Mammalia, Aves, Reptilia, Actinopterygii, and Malacostraca). Species determination was possible in all cases, which shows that the fragment analyzed provided a high capability for species identification. Furthermore, to ensure the efficiency of the 148bp fragment, the intraspecific variability was analyzed by calculating the concordance between individuals with the BLAST tool from the NCBI (National Center for Biotechnological Information). The intraspecific concordance levels were superior to 97% in all species. Likewise, the phylogenetic information from the selected fragment was confirmed by obtaining the phylogenetic tree from the sequences of the species analyzed. Evidence of the high power of phylogenetic discrimination of the analyzed fragment of the cyt b was obtained, as 93.75% of the species were grouped within their corresponding Orders. Finally, the analysis of 40 degraded samples with small-size DNA fragments showed that the new pair of primers permits identifying the species, even when the DNA is highly degraded as it is very common in

  16. Metagenomic survey of methanesulfonic acid (MSA) catabolic genes in an Atlantic Ocean surface water sample and in a partial enrichment

    Science.gov (United States)

    Henriques, Ana C.; Azevedo, Rui M.S.

    2016-01-01

    Methanesulfonic acid (MSA) is a relevant intermediate of the biogeochemical cycle of sulfur and environmental microorganisms assume an important role in the mineralization of this compound. Several methylotrophic bacterial strains able to grow on MSA have been isolated from soil or marine water and two conserved operons, msmABCD coding for MSA monooxygenase and msmEFGH coding for a transport system, have been repeatedly encountered in most of these strains. Homologous sequences have also been amplified directly from the environment or observed in marine metagenomic data, but these showed a base composition (G + C content) very different from their counterparts from cultivated bacteria. The aim of this study was to understand which microorganisms within the coastal surface oceanic microflora responded to MSA as a nutrient and how the community evolved in the early phases of an enrichment by means of metagenome and gene-targeted amplicon sequencing. From the phylogenetic point of view, the community shifted significantly with the disappearance of all signals related to the Archaea, the Pelagibacteraceae and phylum SAR406, and the increase in methylotroph-harboring taxa, accompanied by other groups so far not known to comprise methylotrophs such as the Hyphomonadaceae. At the functional level, the abundance of several genes related to sulfur metabolism and methylotrophy increased during the enrichment and the allelic distribution of gene msmA diagnostic for MSA monooxygenase altered considerably. Even more dramatic was the disappearance of MSA import-related gene msmE, which suggests that alternative transporters must be present in the enriched community and illustrate the inadequacy of msmE as an ecofunctional marker for MSA degradation at sea. PMID:27761315

  17. Metagenomic survey of methanesulfonic acid (MSA) catabolic genes in an Atlantic Ocean surface water sample and in a partial enrichment.

    Science.gov (United States)

    Henriques, Ana C; Azevedo, Rui M S; De Marco, Paolo

    2016-01-01

    Methanesulfonic acid (MSA) is a relevant intermediate of the biogeochemical cycle of sulfur and environmental microorganisms assume an important role in the mineralization of this compound. Several methylotrophic bacterial strains able to grow on MSA have been isolated from soil or marine water and two conserved operons, msmABCD coding for MSA monooxygenase and msmEFGH coding for a transport system, have been repeatedly encountered in most of these strains. Homologous sequences have also been amplified directly from the environment or observed in marine metagenomic data, but these showed a base composition (G + C content) very different from their counterparts from cultivated bacteria. The aim of this study was to understand which microorganisms within the coastal surface oceanic microflora responded to MSA as a nutrient and how the community evolved in the early phases of an enrichment by means of metagenome and gene-targeted amplicon sequencing. From the phylogenetic point of view, the community shifted significantly with the disappearance of all signals related to the Archaea, the Pelagibacteraceae and phylum SAR406, and the increase in methylotroph-harboring taxa, accompanied by other groups so far not known to comprise methylotrophs such as the Hyphomonadaceae. At the functional level, the abundance of several genes related to sulfur metabolism and methylotrophy increased during the enrichment and the allelic distribution of gene msmA diagnostic for MSA monooxygenase altered considerably. Even more dramatic was the disappearance of MSA import-related gene msmE, which suggests that alternative transporters must be present in the enriched community and illustrate the inadequacy of msmE as an ecofunctional marker for MSA degradation at sea.

  18. Identifying the genetic variation of gene expression using gene sets: application of novel gene Set eQTL approach to PharmGKB and KEGG.

    Directory of Open Access Journals (Sweden)

    Ryan Abo

    Full Text Available Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Consideration of different approaches to account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set-expression quantitative trait loci analysis (GS-eQTL). The method uses GSs to initially group SNPs and mRNA expression, followed by the application of principal components analysis (PCA) to collapse the variation and reduce the dimensionality within the GSs. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system using PharmGKB and KEGG defined GSs. We observed a large number of significant GS-eQTL associations, in which the most significant associations arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations.

  19. A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction.

    Science.gov (United States)

    Jung, Hye-Young; Leem, Sangseob; Lee, Sungyoung; Park, Taesung

    2016-12-01

    Gene-gene interaction (GGI) is one of the most popular approaches for finding the missing heritability of common complex traits in genetic association studies. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGIs. In order to identify the best interaction model associated with disease susceptibility, MDR compares all possible genotype combinations in terms of their predictability of disease status from a simple binary high (H) and low (L) risk classification. However, this simple binary classification does not reflect the uncertainty of H/L classification. We regard classifying H/L as equivalent to defining the degree of membership of two risk groups H/L. By adopting the fuzzy set theory, we propose Fuzzy MDR, which takes into account the uncertainty of H/L classification. Fuzzy MDR allows the possibility of partial membership of H/L through a membership function which transforms the degree of uncertainty into a [0,1] scale. The best genotype combination is selected as the one that maximizes a new fuzzy-set-based accuracy measure. Two simulation studies are conducted to compare the power of the proposed Fuzzy MDR with that of MDR. Our results show that Fuzzy MDR has higher power than MDR. We illustrate the proposed Fuzzy MDR by analysing the bipolar disorder (BD) trait of the WTCCC dataset to detect GGI associated with BD. We propose a novel Fuzzy MDR method to detect gene-gene interaction by taking into account the uncertainty of H/L classification and show that it has higher power than MDR. Fuzzy MDR can be easily extended to handle continuous phenotypes as well. The program written in R for the proposed Fuzzy MDR is available at https://statgen.snu.ac.kr/software/FuzzyMDR. Copyright © 2016 Elsevier Ltd. All rights reserved.
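
    The contrast between a crisp H/L label and a partial membership can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the membership function, so the case fraction used here, the function names `crisp_label` and `fuzzy_membership_high`, and the genotype counts are all hypothetical and are not taken from the authors' R program.

```python
def crisp_label(cases, controls, threshold=1.0):
    """Classic MDR-style label: 'H' if the case:control ratio exceeds threshold.

    threshold=1.0 assumes a balanced case/control design (an assumption).
    """
    if controls == 0:
        return "H" if cases > 0 else "L"
    return "H" if cases / controls > threshold else "L"


def fuzzy_membership_high(cases, controls):
    """Hypothetical membership of the high-risk group on a [0, 1] scale.

    Uses the case fraction in the cell; the published membership function
    may differ. Empty cells get 0.5 (maximal uncertainty).
    """
    total = cases + controls
    if total == 0:
        return 0.5
    return cases / total


# Made-up counts (cases, controls) for three two-locus genotype combinations.
cells = {
    ("AA", "BB"): (30, 10),
    ("Aa", "Bb"): (21, 19),
    ("aa", "bb"): (5, 25),
}

crisp = [crisp_label(*counts) for counts in cells.values()]
fuzzy = [fuzzy_membership_high(*counts) for counts in cells.values()]
```

    In this toy data the second cell is labelled H by the crisp rule although its cases barely outnumber its controls; a membership value close to 0.5 keeps that uncertainty visible.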

  20. Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

    Institute of Scientific and Technical Information of China (English)

    Heng Li; Tao Liu; Hai-Hong Li; Yan Li; Li-Jun Fang; Hui-Min Xie; Wei-Mou Zheng; Bai-Lin Hao; Jin-Song Liu; Zhao Xu; Jiao Jin; Lin Fang; Lei Gao; Yu-Dong Li; Zi-Xing Xing; Shao-Gen Gao

    2005-01-01

    With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENESH and BGF. The predictions were compared on nucleotide, exon and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
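
    A nucleotide-level comparison of this kind can be illustrated with a short sketch. It assumes the usual gene-finding convention in which sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP); the abstract does not list its exact measures, and the exon coordinates and function names below are made up for illustration.

```python
def coding_positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    Specificity follows the gene-prediction convention TP / (TP + FP),
    i.e. the fraction of predicted coding bases that are truly coding.
    """
    truth = coding_positions(true_exons)
    predicted = coding_positions(predicted_exons)
    tp = len(truth & predicted)
    fn = len(truth - predicted)
    fp = len(predicted - truth)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


# Hypothetical annotation and prediction for a single gene.
annotated = [(1, 100), (201, 300)]
predicted = [(51, 100), (201, 350)]
sn, sp = nucleotide_accuracy(annotated, predicted)
```

    Exon-level and whole-gene measures follow the same pattern, but count an exon or gene as correct only when its boundaries match exactly.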

  1. Comparison of gene sets for expression profiling: prediction of metastasis from low-malignant breast cancer

    DEFF Research Database (Denmark)

    Thomassen, Mads; Tan, Qihua; Eiriksdottir, Freyja;

    2007-01-01

    -six tumors from low-risk patients and 34 low-malignant T2 tumors from patients with slightly higher risk have been examined by genome-wide gene expression analysis. Nine prognostic gene sets were tested in this data set. RESULTS: A 32-gene profile (HUMAC32) that accurately predicts metastasis has previously...... sets, mainly developed in high-risk cancers, predict metastasis from low-malignant cancer....

  2. Characterization of gene mutations and copy number changes in acute myeloid leukemia using a rapid target enrichment protocol.

    Science.gov (United States)

    Bolli, Niccolò; Manes, Nicla; McKerrell, Thomas; Chi, Jianxiang; Park, Naomi; Gundem, Gunes; Quail, Michael A; Sathiaseelan, Vijitha; Herman, Bram; Crawley, Charles; Craig, Jenny I O; Conte, Natalie; Grove, Carolyn; Papaemmanuil, Elli; Campbell, Peter J; Varela, Ignacio; Costeas, Paul; Vassiliou, George S

    2015-02-01

    Prognostic stratification is critical for making therapeutic decisions and maximizing survival of patients with acute myeloid leukemia. Advances in the genomics of acute myeloid leukemia have identified several recurrent gene mutations whose prognostic impact is being deciphered. We used HaloPlex target enrichment and Illumina-based next generation sequencing to study 24 recurrently mutated genes in 42 samples of acute myeloid leukemia with a normal karyotype. Read depth varied between and within genes for the same sample, but was predictable and highly consistent across samples. Consequently, we were able to detect copy number changes, such as an interstitial deletion of BCOR, three MLL partial tandem duplications, and a novel KRAS amplification. With regard to coding mutations, we identified likely oncogenic variants in 41 of 42 samples. NPM1 mutations were the most frequent, followed by FLT3, DNMT3A and TET2. NPM1 and FLT3 indels were reported with good efficiency. We also showed that DNMT3A mutations can persist post-chemotherapy and, in 2 cases studied at diagnosis and relapse, we were able to delineate the dynamics of tumor evolution and give insights into the order of acquisition of variants. HaloPlex is a quick and reliable target enrichment method that can aid diagnosis and prognostic stratification of acute myeloid leukemia patients.
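
    Because read depth is described as consistent across samples, copy number changes can in principle be flagged by comparing each sample's normalized depth with a cross-sample reference. The sketch below shows one simple way to do this; it is an assumption rather than the authors' pipeline, and the region names, read counts and the function `log2_depth_ratios` are hypothetical.

```python
import math
from statistics import median


def log2_depth_ratios(depths):
    """Compute per-region log2 ratios of normalized depth versus the median.

    `depths` maps sample -> {region: read count}. Each sample is scaled by
    its total read count, then compared with the median of all samples.
    Regions with zero reads are not handled in this sketch.
    """
    normalized = {}
    for sample, regions in depths.items():
        total = sum(regions.values())
        normalized[sample] = {r: count / total for r, count in regions.items()}

    region_names = list(next(iter(depths.values())))
    reference = {
        r: median(normalized[s][r] for s in normalized) for r in region_names
    }
    return {
        s: {r: math.log2(normalized[s][r] / reference[r]) for r in region_names}
        for s in normalized
    }


# Made-up read counts: sample s3 has roughly half the depth in region C.
counts = {
    "s1": {"A": 100, "B": 200, "C": 100},
    "s2": {"A": 100, "B": 200, "C": 100},
    "s3": {"A": 100, "B": 200, "C": 50},
}
ratios = log2_depth_ratios(counts)
```

    A clearly negative log2 ratio, as for region C in sample s3, suggests a loss; a clearly positive one suggests a gain. Real data would need thresholds calibrated against the observed noise.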

  3. Identifying the optimal gene and gene set in hepatocellular carcinoma based on differential expression and differential co-expression algorithm.

    Science.gov (United States)

    Dong, Li-Yang; Zhou, Wei-Zhong; Ni, Jun-Wei; Xiang, Wei; Hu, Wen-Hao; Yu, Chang; Li, Hai-Yan

    2017-02-01

    The objective of this study was to identify the optimal gene and gene set for hepatocellular carcinoma (HCC) utilizing differential expression and differential co-expression (DEDC) algorithm. The DEDC algorithm consisted of four parts: calculating differential expression (DE) by absolute t-value in t-statistics; computing differential co-expression (DC) based on Z-test; determining optimal thresholds on the basis of Chi-squared (χ2) maximization and the corresponding gene was the optimal gene; and evaluating functional relevance of genes categorized into different partitions to determine the optimal gene set with highest mean minimum functional information (FI) gain (Δ*G). The optimal thresholds divided genes into four partitions, high DE and high DC (HDE-HDC), high DE and low DC (HDE-LDC), low DE and high DC (LDE-HDC), and low DE and low DC (LDE-LDC). In addition, the optimal gene was validated by conducting reverse transcription-polymerase chain reaction (RT-PCR) assay. The optimal thresholds for DC and DE were 1.032 and 1.911, respectively. Using the optimal gene, the genes were divided into four partitions including: HDE-HDC (2,053 genes), HDE-LDC (2,822 genes), LDE-HDC (2,622 genes), and LDE-LDC (6,169 genes). The optimal gene was microtubule-associated protein RP/EB family member 1 (MAPRE1), and RT-PCR assay validated the significant difference between the HCC and normal state. The optimal gene set was nucleoside metabolic process (GO:0009116) with Δ*G = 18.681 and 24 HDE-HDC partitions in total. In conclusion, we successfully investigated the optimal gene, MAPRE1, and gene set, nucleoside metabolic process, which may be potential biomarkers for targeted therapy and provide significant insight for revealing the pathological mechanism underlying HCC.
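
    The partitioning step lends itself to a small sketch. It assumes that each gene already has a DE score (absolute t-value) and a DC score; the two thresholds are the values reported in the abstract, whereas the gene names, scores, the inclusive boundary and the function `partition_genes` are hypothetical.

```python
from collections import defaultdict

DE_THRESHOLD = 1.911  # optimal DE threshold reported in the abstract
DC_THRESHOLD = 1.032  # optimal DC threshold reported in the abstract


def partition_genes(scores, de_thr=DE_THRESHOLD, dc_thr=DC_THRESHOLD):
    """Assign genes to HDE-HDC, HDE-LDC, LDE-HDC or LDE-LDC.

    `scores` maps gene name -> (de_score, dc_score). Treating a score equal
    to the threshold as "high" is an assumption; the abstract does not say.
    """
    partitions = defaultdict(list)
    for gene, (de, dc) in scores.items():
        de_label = "HDE" if de >= de_thr else "LDE"
        dc_label = "HDC" if dc >= dc_thr else "LDC"
        partitions[f"{de_label}-{dc_label}"].append(gene)
    return dict(partitions)


# Made-up scores for illustration only.
example_scores = {
    "geneA": (2.50, 1.40),
    "geneB": (2.10, 0.30),
    "geneC": (0.80, 1.20),
    "geneD": (0.50, 0.10),
}
partitions = partition_genes(example_scores)
```

    The four partition sizes reported above (2,053 + 2,822 + 2,622 + 6,169) sum to 13,666 genes, which is the total such a function would distribute across the four labels.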

  4. Toxoplasmosis and Polygenic Disease Susceptibility Genes: Extensive Toxoplasma gondii Host/Pathogen Interactome Enrichment in Nine Psychiatric or Neurological Disorders

    Directory of Open Access Journals (Sweden)

    C. J. Carter

    2013-01-01

    Full Text Available Toxoplasma gondii is not only implicated in schizophrenia and related disorders, but also in Alzheimer's or Parkinson's disease, cancer, cardiac myopathies, and autoimmune disorders. During its life cycle, the pathogen interacts with ~3000 host genes or proteins. Susceptibility genes for multiple sclerosis, Alzheimer's disease, schizophrenia, bipolar disorder, depression, childhood obesity, Parkinson's disease, attention deficit hyperactivity disorder, and autism, but not anorexia or chronic fatigue, are highly enriched in the human arm of this interactome, and 18% (ADHD) to 33% (MS) of the susceptibility genes relate to it. The signalling pathways involved in the susceptibility gene/interactome overlaps are relatively specific and relevant to each disease, suggesting a means whereby susceptibility genes could orient the attentions of a single pathogen towards disruption of the specific pathways that together contribute (positively or negatively) to the endophenotypes of different diseases. Conditional protein knockdown, orchestrated by T. gondii proteins or antibodies binding to those of the host (pathogen derived autoimmunity), and metabolite exchange may contribute to this disruption. Susceptibility genes may thus be related to the causes and influencers of disease, rather than (and as well as) to the disease itself.
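
    Enrichment of a susceptibility gene list within an interactome is often assessed with an over-representation test. The sketch below uses a one-sided hypergeometric test; the abstract does not state which test was applied, so this choice is an assumption. Apart from the approximate interactome size of 3000 genes, all counts in the example (background size, list size, overlap) are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, list_size, set_size, background):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of genes in the universe
    set_size:   genes in the reference set (e.g. the interactome)
    list_size:  genes in the query list (e.g. susceptibility genes)
    overlap:    genes present in both
    """
    upper = min(list_size, set_size)
    total = comb(background, list_size)
    tail = sum(
        comb(set_size, i) * comb(background - set_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Small case that can be checked by hand: (36 + 4) / 120 = 1/3.
p_small = enrichment_p_value(overlap=2, list_size=3, set_size=4, background=10)

# Hypothetical scenario: 33 of 100 susceptibility genes fall in a
# 3000-gene interactome drawn from an assumed 20000-gene background.
p_example = enrichment_p_value(overlap=33, list_size=100,
                               set_size=3000, background=20000)
```

    With an assumed background of 20000 genes, about 15 of 100 random genes would be expected in a 3000-gene set, so an overlap of 33 gives a small p-value. The result depends strongly on the choice of background.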

  5. Assessing the Association of Mitochondrial Genetic Variation With Primary Open-Angle Glaucoma Using Gene-Set Analyses.

    Science.gov (United States)

    Khawaja, Anthony P; Cooke Bailey, Jessica N; Kang, Jae Hee; Allingham, R Rand; Hauser, Michael A; Brilliant, Murray; Budenz, Donald L; Christen, William G; Fingert, John; Gaasterland, Douglas; Gaasterland, Terry; Kraft, Peter; Lee, Richard K; Lichter, Paul R; Liu, Yutao; Medeiros, Felipe; Moroi, Syoko E; Richards, Julia E; Realini, Tony; Ritch, Robert; Schuman, Joel S; Scott, William K; Singh, Kuldev; Sit, Arthur J; Vollrath, Douglas; Wollstein, Gadi; Zack, Donald J; Zhang, Kang; Pericak-Vance, Margaret; Weinreb, Robert N; Haines, Jonathan L; Pasquale, Louis R; Wiggs, Janey L

    2016-09-01

    Recent studies indicate that mitochondrial proteins may contribute to the pathogenesis of primary open-angle glaucoma (POAG). In this study, we examined the association between POAG and common variations in gene-encoding mitochondrial proteins. We examined genetic data from 3430 POAG cases and 3108 controls derived from the combination of the GLAUGEN and NEIGHBOR studies. We constructed biological-system coherent mitochondrial nuclear-encoded protein gene-sets by intersecting the MitoCarta database with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We examined the mitochondrial gene-sets for association with POAG and with normal-tension glaucoma (NTG) and high-tension glaucoma (HTG) subsets using Pathway Analysis by Randomization Incorporating Structure. We identified 22 KEGG pathways with significant mitochondrial protein-encoding gene enrichment, belonging to six general biological classes. Among the pathway classes, mitochondrial lipid metabolism was associated with POAG overall (P = 0.013) and with NTG (P = 0.0006), and mitochondrial carbohydrate metabolism was associated with NTG (P = 0.030). Examining the individual KEGG pathway mitochondrial gene-sets, fatty acid elongation and synthesis and degradation of ketone bodies, both lipid metabolism pathways, were significantly associated with POAG (P = 0.005 and P = 0.002, respectively) and NTG (P = 0.0004 and P < 0.0001, respectively). Butanoate metabolism, a carbohydrate metabolism pathway, was significantly associated with POAG (P = 0.004), NTG (P = 0.001), and HTG (P = 0.010). We present an effective approach for assessing the contributions of mitochondrial genetic variation to open-angle glaucoma. Our findings support a role for mitochondria in POAG pathogenesis and specifically point to lipid and carbohydrate metabolism pathways as being important.

  6. A reference gene set for chemosensory receptor genes of Manduca sexta.

    Science.gov (United States)

    Koenig, Christopher; Hirsh, Ariana; Bucks, Sascha; Klinner, Christian; Vogel, Heiko; Shukla, Aditi; Mansfield, Jennifer H; Morton, Brian; Hansson, Bill S; Grosse-Wilde, Ewald

    2015-11-01

    The order of Lepidoptera has historically been crucial for chemosensory research, with many important advances coming from the analysis of species like Bombyx mori or the tobacco hornworm, Manduca sexta. Specifically M. sexta has long been a major model species in the field, especially regarding the importance of olfaction in an ecological context, mainly the interaction with its host plants. In recent years transcriptomic data has led to the discovery of members of all major chemosensory receptor families in the species, but the data was fragmentary and incomplete. Here we present the analysis of the newly available high-quality genome data for the species, supplemented by additional transcriptome data to generate a high quality reference gene set for the three major chemosensory receptor gene families, the gustatory (GR), olfactory (OR) and antennal ionotropic receptors (IR). Coupled with gene expression analysis our approach allows association of specific receptor types and behaviors, like pheromone and host detection. The dataset will provide valuable support for future analysis of these essential chemosensory modalities in this species and in Lepidoptera in general.

  7. Construction and expression of SET gene and siRNA recombinant adenovirus vectors

    Institute of Scientific and Technical Information of China (English)

    Xu Bo-qun; Lu Pin-hong; Li Ying; Xue Kai; Li Mei; Ma Xiang; Diao Fei-yan; Cui Yu-gui; Liu Jia-yin

    2010-01-01

    Objective: To construct a SET gene recombinant adenovirus vector and a SET gene small interfering RNA (siRNA) recombinant adenovirus vector for over-expression or knock-down of SET levels. Methods: The cDNA sequence of SET was cloned by reverse transcription polymerase chain reaction (RT-PCR) and the SET gene fragment was subcloned into the adenovirus shuttle plasmid pAdTrack-CMV to construct the shuttle plasmid pAdTrack-SET. The shuttle plasmid pAdTrack-SET was transformed into BJ5183 cells with the adenoviral backbone pAdEasy-1 to obtain the homologous recombinant Ad-CMV-SET, and the recombinant Ad-CMV-SET was packaged and amplified in AD293 cells. The expression of SET in AD293 cells was detected by Western blot. In addition, we constructed a SET gene siRNA recombinant adenovirus vector (Ad-H1-SiRNA/SET), and its efficacy in knocking down SET protein was detected in infected GC-2spd(ts) cells by Western blot. Results: Both recombinant adenovirus vectors, the SET gene vector Ad-CMV-SET and the SET gene siRNA vector Ad-H1-SiRNA/SET, were shown by endonuclease digestion and sequencing to have been constructed successfully. AD293 cells infected with either Ad-CMV-SET or Ad-H1-SiRNA/SET were observed to express GFP. The expression of SET protein was up-regulated significantly in AD293 cells infected with the SET gene recombinant adenovirus vector. In contrast, SET protein was significantly down-regulated in the GC-2spd(ts) cells infected with Ad-H1-SiRNA/SET (P<0.05), and the knockdown efficiency was approximately 50%-70%. Conclusion: The recombinant adenovirus vectors Ad-CMV-SET and Ad-H1-SiRNA/SET were successfully constructed and effectively expressed in germ cells and somatic cells. They provide an experimental tool for further study of the SET gene in the physiological and pathophysiological mechanisms of reproduction-related diseases.

  8. Gene-Set Local Hierarchical Clustering (GSLHC)--A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups.

    Directory of Open Access Journals (Sweden)

    Feng-Hsiang Chung

    Full Text Available Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights in the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases.

  9. Gene and miRNA expression signature of Lewis lung carcinoma LLC1 cells in extracellular matrix enriched microenvironment.

    Science.gov (United States)

    Stankevicius, Vaidotas; Vasauskas, Gintautas; Bulotiene, Danute; Butkyte, Stase; Jarmalaite, Sonata; Rotomskis, Ricardas; Suziedelis, Kestutis

    2016-10-11

    The extracellular matrix (ECM), one of the key components of tumor microenvironment, has a tremendous impact on cancer development and highly influences tumor cell features. ECM affects vital cellular functions such as cell differentiation, migration, survival and proliferation. Gene and protein expression levels are regulated in cell-ECM interaction dependent manner as well. The rate of unsuccessful clinical trials, based on cell culture research models lacking the ECM microenvironment, indicates the need for alternative models and determines the shift to three-dimensional (3D) laminin rich ECM models, better simulating tissue organization. Recognized advantages of 3D models suggest the development of new anticancer treatment strategies. This is among the most promising directions of 3D cell cultures application. However, detailed analysis at the molecular level of 2D/3D cell cultures and tumors in vivo is still needed to elucidate cellular pathways most promising for the development of targeted therapies. In order to elucidate which biological pathways are altered during microenvironmental shift we have analyzed whole genome mRNA and miRNA expression differences in LLC1 cells cultured in 2D or 3D culture conditions. In our study we used DNA microarrays for whole genome analysis of mRNA and miRNA expression differences in LLC1 cells cultivated in 2D or 3D culture conditions. Next, we indicated the most common enriched functional categories using KEGG pathway enrichment analysis. Finally, we validated the microarray data by quantitative PCR in LLC1 cells cultured under 2D or 3D conditions or LLC1 tumors implanted in experimental animals. Microarray gene expression analysis revealed that 1884 genes and 77 miRNAs were significantly altered in LLC1 cells after 48 h cell growth under 2D and ECM based 3D cell growth conditions. Pathway enrichment results indicated metabolic pathway, MAP kinase, cell adhesion and immune response as the most significantly altered

  10. GOMA: functional enrichment analysis tool based on GO modules

    Institute of Scientific and Technical Information of China (English)

    Qiang Huang; Ling-Yun Wu; Yong Wang; Xiang-Sun Zhang

    2013-01-01

    Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results.

  11. GOMA: functional enrichment analysis tool based on GO modules

    Science.gov (United States)

    Huang, Qiang; Wu, Ling-Yun; Wang, Yong; Zhang, Xiang-Sun

    2013-01-01

    Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results. PMID:23237213

  12. Novel histone biotinylation marks are enriched in repeat regions and participate in repression of transcriptionally competent genes.

    Science.gov (United States)

    Pestinger, Valerie; Wijeratne, Subhashinee S K; Rodriguez-Melendez, Rocio; Zempleni, Janos

    2011-04-01

    Covalent histone modifications play crucial roles in chromatin structure and genome stability. We previously reported biotinylation of lysine (K) residues in histones H2A, H3 and H4 by holocarboxylase synthetase and demonstrated that K12-biotinylated histone H4 (H4K12bio) is enriched in repeat regions and participates in gene repression. The biological functions of biotinylation marks other than H4K12bio are poorly understood. Here, novel biotinylation site-specific antibodies against H3K9bio, H3K18bio and H4K8bio were used in chromatin immunoprecipitation studies to obtain first insights into possible biological functions of these marks. Chromatin immunoprecipitation assays were conducted in human primary fibroblasts and Jurkat lymphoblastoma cells, and revealed that H3K9bio, H3K18bio and H4K8bio are enriched in repeat regions such as pericentromeric alpha satellite repeats and long-terminal repeats while being depleted in transcriptionally active promoters in euchromatin. Transcriptional stimulation of the repressed interleukin-2 promoter triggered a rapid depletion of histone biotinylation marks at this locus in Jurkat cells, which was paralleled by an increase in interleukin-2 mRNA. Importantly, the enrichment of H3K9bio, H3K18bio and H4K8bio at genomic loci depended on the concentration of biotin in culture media at nutritionally relevant levels, suggesting a novel mechanism of gene regulation by biotin. Copyright © 2011 Elsevier Inc. All rights reserved.

  13. Flavanol-Enriched Cocoa Powder Alters the Intestinal Microbiota, Tissue and Fluid Metabolite Profiles, and Intestinal Gene Expression in Pigs.

    Science.gov (United States)

    Jang, Saebyeol; Sun, Jianghao; Chen, Pei; Lakshman, Sukla; Molokin, Aleksey; Harnly, James M; Vinyard, Bryan T; Urban, Joseph F; Davis, Cindy D; Solano-Aguilar, Gloria

    2016-04-01

    Consumption of cocoa-derived polyphenols has been associated with several health benefits; however, their effects on the intestinal microbiome and related features of host intestinal health are not adequately understood. The objective of this study was to determine the effects of eating flavanol-enriched cocoa powder on the composition of the gut microbiota, tissue metabolite profiles, and intestinal immune status. Male pigs (5 mo old, 28 kg mean body weight) were supplemented with 0, 2.5, 10, or 20 g flavanol-enriched cocoa powder/d for 27 d. Metabolites in serum, urine, the proximal colon contents, liver, and adipose tissue; bacterial abundance in the intestinal contents and feces; and intestinal tissue gene expression of inflammatory markers and Toll-like receptors (TLRs) were then determined. O-methyl-epicatechin-glucuronide conjugates dose-dependently increased (P [...] cocoa powder. The concentration of 3-hydroxyphenylpropionic acid isomers in urine decreased as the dose of cocoa powder fed to pigs increased (75-85%, P [...] cocoa powder/d, respectively. Moreover, consumption of cocoa powder reduced TLR9 gene expression in ileal Peyer's patches (67-80%, P [...] cocoa powder/d compared with pigs not supplemented with cocoa powder. This study demonstrates that consumption of cocoa powder by pigs can contribute to gut health by enhancing the abundance of Lactobacillus and Bifidobacterium species and modulating markers of localized intestinal immunity. © 2016 American Society for Nutrition.

  14. Transcriptome and Gene Ontology (GO) Enrichment Analysis Reveals Genes Involved in Biotin Metabolism That Affect L-Lysine Production in Corynebacterium glutamicum.

    Science.gov (United States)

    Kim, Hong-Il; Kim, Jong-Hyeon; Park, Young-Jin

    2016-03-09

    Corynebacterium glutamicum is widely used for amino acid production. In the present study, 543 genes showed a significant change in mRNA expression levels in L-lysine-producing C. glutamicum ATCC21300 compared with the wild-type C. glutamicum ATCC13032. Among these 543 differentially expressed genes (DEGs), 28 genes were up- or downregulated. In addition, 454 DEGs were functionally enriched and categorized based on BLAST sequence homologies and gene ontology (GO) annotations using the Blast2GO software. Interestingly, NCgl0071 (bioB, encoding biotin synthase) was expressed at levels ~20-fold higher in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain. Five other genes involved in biotin metabolism or transport--NCgl2515 (bioA, encoding adenosylmethionine-8-amino-7-oxononanoate aminotransferase), NCgl2516 (bioD, encoding dethiobiotin synthetase), NCgl1883, NCgl1884, and NCgl1885--were also expressed at significantly higher levels in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain, as determined using both next-generation RNA sequencing and quantitative real-time PCR analysis. When we disrupted the bioB gene in C. glutamicum ATCC21300, L-lysine production decreased by approximately 76%, and the three genes involved in biotin transport (NCgl1883, NCgl1884, and NCgl1885) were significantly downregulated. These results will help improve our understanding of C. glutamicum for industrial amino acid production.

  15. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The inherent high-dimension, low-sample-size nature of gene expression data makes the classification task challenging. Therefore, feature (gene) selection becomes a clear need. Selecting meaningful and relevant genes for a classifier not only decreases computational time and cost but also improves classification performance. Among the different approaches to feature selection, however, most suffer from problems such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and a percentage; otherwise it is eliminated. Results Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Further applying the same method with more stringent conditions (a higher positive and a lower negative threshold) reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between the two classes of leukemia.
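
    The abstract gives the cluster-scoring idea only in outline, so the following is a minimal sketch under stated assumptions rather than the author's method. It assumes standardized expression values, precomputed gene-to-cluster assignments, and a hypothetical keep rule: a cluster is retained when at least `min_frac` of one class has a summed expression at or above `pos_thr` while at least `min_frac` of the other class is at or below `neg_thr`. All names, thresholds, and the toy data are invented for illustration.

```python
def cluster_sums(expr, gene_cluster):
    """Sum expression over the genes of each cluster, sample by sample.

    expr: {gene: [value per sample]}, gene_cluster: {gene: cluster id}.
    """
    sums = {}
    for gene, values in expr.items():
        cluster = gene_cluster[gene]
        if cluster not in sums:
            sums[cluster] = [0.0] * len(values)
        for i, value in enumerate(values):
            sums[cluster][i] += value
    return sums


def fraction(values, predicate):
    """Fraction of values for which predicate is true."""
    return sum(1 for v in values if predicate(v)) / len(values)


def select_clusters(expr, gene_cluster, labels, pos_thr, neg_thr, min_frac):
    """Return cluster ids that separate the two classes (hypothetical rule)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    first, second = classes
    kept = []
    for cluster, sums in cluster_sums(expr, gene_cluster).items():
        a = [s for s, y in zip(sums, labels) if y == first]
        b = [s for s, y in zip(sums, labels) if y == second]
        a_high_b_low = (fraction(a, lambda v: v >= pos_thr) >= min_frac
                        and fraction(b, lambda v: v <= neg_thr) >= min_frac)
        b_high_a_low = (fraction(b, lambda v: v >= pos_thr) >= min_frac
                        and fraction(a, lambda v: v <= neg_thr) >= min_frac)
        if a_high_b_low or b_high_a_low:
            kept.append(cluster)
    return sorted(kept)


# Toy, invented data: four genes, four samples, two clusters.
toy_expr = {
    "g1": [1.0, 1.2, -1.1, -0.9],
    "g2": [0.8, 1.1, -1.0, -1.2],
    "g3": [0.1, -0.2, 0.2, -0.1],
    "g4": [-0.1, 0.3, -0.2, 0.0],
}
toy_clusters = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
toy_labels = ["ALL", "ALL", "AML", "AML"]

kept_clusters = select_clusters(toy_expr, toy_clusters, toy_labels,
                                pos_thr=1.0, neg_thr=-1.0, min_frac=0.75)
selected_genes = sorted(g for g, c in toy_clusters.items() if c in kept_clusters)
print(kept_clusters, selected_genes)
```

    Raising `pos_thr` and lowering `neg_thr` in this sketch makes the rule stricter, which mirrors the more stringent second pass described in the abstract.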

  16. Pathways enrichment analysis for differentially expressed genes in squamous lung cancer.

    Science.gov (United States)

    Qian, Liqiang; Luo, Qingquan; Zhao, Xiaojing; Huang, Jia

    2014-01-01

    Squamous lung cancer (SQLC) is a common type of lung cancer, but the mechanism of its oncogenesis is unclear. The aim of this study was to screen for pathways potentially altered in SQLC and to elucidate its mechanism. Published microarray data of the GSE3268 series were downloaded from Gene Expression Omnibus (GEO). Significance analysis of microarrays was performed in R, and differentially expressed genes (DEGs) were identified. The functions and pathways of the DEGs were mapped in the Gene Ontology and KEGG pathway databases, respectively. A total of 2961 genes were filtered as DEGs between normal and SQLC cells. Cell cycle and metabolism were the main altered functions of SQLC cells. Meanwhile, genes such as MCM, RFC, FEN1, and POLD may induce SQLC through the DNA replication pathway, and genes such as PTTG1, CCNB1, CDC6, and PCNA may be involved in SQLC through the cell cycle pathway. These results demonstrate that pathway analysis is useful for identifying target genes in SQLC.

  17. Isolation and characterization of rice cesium transporter genes from a rice-transporter-enriched yeast expression library.

    Science.gov (United States)

    Yamaki, Tomohiro; Otani, Masahiro; Ono, Kohei; Mimura, Takuro; Oda, Koshiro; Minamii, Takeshi; Matsumoto, Shingo; Matsuo, Yuzy; Kawamukai, Makoto; Akihiro, Takashi

    2017-08-01

    A considerable portion of agricultural land in central-east Japan has been contaminated by radioactive material, particularly radioactive Cs, due to the industrial accident at the Fukushima Daiichi nuclear power plant. Understanding the mechanism of absorption, translocation and accumulation of Cs(+) in plants will greatly assist in developing approaches to help reduce the radioactive contamination of agricultural products. At present, however, little is known regarding the Cs(+) transporters in rice. A transporter-enriched yeast expression library was constructed and the library was screened for Cs(+) transporter genes. The 1452 full length cDNAs encoding transporter genes were obtained from the Rice Genome Resource Center and 1358 clones of these transporter genes were successively subcloned into yeast expression vectors, which were then transferred into yeast. Using this library, both positive and negative selection screens can be performed, which has not been previously possible. The constructed library is an excellent tool for the isolation of novel transporter genes. This library was screened for clones that were sensitive to Cs(+) using a SD-Gal medium containing either 30 or 70 mM CsCl, resulting in the isolation of 13 Cs(+) sensitive clones. (137)Cs absorption experiments were conducted and confirmed that all of the identified clones were able to absorb (137)Cs. A total of 3 potassium transporters, 2 ABC transporters and 1 NRAMP transporter were among the 13 identified clones. © 2017 Scandinavian Plant Physiology Society.

  18. liver-enriched gene 1a and 1b encode novel secretory proteins essential for normal liver development in zebrafish.

    Directory of Open Access Journals (Sweden)

    Changqing Chang

    Full Text Available liver-enriched gene 1 (leg1) is a liver-enriched gene in zebrafish and encodes a novel protein. Our preliminary data suggested that Leg1 is probably involved in early liver development. However, no detailed characterization of Leg1 has been reported thus far. We undertook both bioinformatic and experimental approaches to study leg1 gene structure and its role in early liver development. We found that Leg1 identifies a new conserved protein superfamily characterized by the presence of domain of unknown function 781 (DUF781). There are two copies of leg1 in zebrafish, namely leg1a and leg1b. Both leg1a and leg1b are expressed in the larval and adult liver, with leg1a being the predominant form. Knockdown of Leg1a or Leg1b by their respective morpholinos specifically targeting their 5'-UTR each resulted in a small liver phenotype, demonstrating that both Leg1a and Leg1b are important for early liver development. Meanwhile, we found that injection of leg1-ATG(MO), a morpholino which can simultaneously block the translation of Leg1a and Leg1b, caused not only a small liver phenotype but hypoplastic exocrine pancreas and intestinal tube as well. Further examination of leg1-ATG(MO) morphants with early endoderm markers and early hepatic markers revealed that although depletion of total Leg1 does not alter the hepatic and pancreatic fate of the endoderm cells, it leads to cell cycle arrest that results in growth retardation of liver, exocrine pancreas and intestine. Finally, we proved that Leg1 is a secretory protein. This led us to propose that Leg1 might act as a novel secreted regulator that is essential for liver and other digestive organ development in zebrafish.

  19. Effect of biochar amendment on the control of soil sulfonamides, antibiotic-resistant bacteria, and gene enrichment in lettuce tissues

    Energy Technology Data Exchange (ETDEWEB)

    Ye, Mao [State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008 (China); Sun, Mingming [Soil Ecology Lab, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing 210095 (China); Feng, Yanfang, E-mail: fengyanfang@163.com [Institute of Agricultural Resources and Environment, Jiangsu Academy of Agricultural Sciences, Nanjing 210014 (China); Wan, Jinzhong [Nanjing Institute of Environmental Science, Ministry of Environmental Protection of China, Nanjing 210042 (China); Xie, Shanni; Tian, Da [Soil Ecology Lab, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing 210095 (China); Zhao, Yu [Collaborative Innovation Center of Advanced Microstructures, Jiangsu Provincial Key Laboratory of Photonic and Electronic Materials, School of Electronic Science and Engineering, Nanjing University, Nanjing 210093 (China); Wu, Jun; Hu, Feng; Li, Huixin [Soil Ecology Lab, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing 210095 (China); Jiang, Xin, E-mail: Jiangxin@issas.ac.cn [State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008 (China)

    2016-05-15

    Highlights: • Biochar can prevent soil sulfonamides from accumulating in lettuce tissues. • ARB enrichment in lettuce tissues decreased significantly after biochar amendment. • Impedance effect of biochar addition on soil ARGs was also quite effective. • Biochar application can be a practical strategy to protect vegetable safety. - Abstract: Vegetables grown in antibiotic-polluted soil with a high abundance of antibiotic-resistant genes (ARGs) pose a potential threat to human health through the food chain, so it is urgent to develop novel control technology to ensure vegetable safety. In the present work, pot experiments were conducted in lettuce cultivation to assess the impedance effect of biochar amendment on soil sulfonamides (SAs), antibiotic-resistant bacteria (ARB), and ARG enrichment in lettuce tissues. After 100 days of cultivation, lettuce cultivation with biochar amendment exhibited the greatest soil SA dissipation as well as a significant improvement of lettuce growth indices, with residual soil SAs mainly existing as the tightly bound fraction. Moreover, the SA contents in roots and new/old leaves were reduced by one to two orders of magnitude compared to those without biochar amendment. In addition, isolate counts for SA-resistant bacterial endophytes in old leaves and sul gene abundances in roots and old leaves also decreased significantly after biochar application. However, neither SA-resistant bacteria nor sul genes were detected in new leaves. This is the first study to demonstrate that biochar amendment can be a practical strategy to protect the safety of lettuce grown in SA-polluted soil rich in ARB and ARGs.

  20. Adipose tissue transcriptional response of lipid metabolism genes in growing Iberian pigs fed oleic acid v. carbohydrate enriched diets.

    Science.gov (United States)

    Benítez, R; Núñez, Y; Fernández, A; Isabel, B; Rodríguez, C; Daza, A; López-Bote, C; Silió, L; Óvilo, C

    2016-06-01

    Diet influences animal body and tissue composition through direct deposition and through the effects of nutrients on metabolism. The influence of specific nutrients on the molecular regulation of lipogenesis is not well characterized and is known to be influenced by many factors including timing and physiological status. A trial was performed to study the effects of different dietary energy sources on lipogenic gene transcription in ham adipose tissue of Iberian pigs, at different growth periods and in feeding/fasting situations. A total of 27 Iberian male pigs of 28 kg BW were allocated to two separate groups and fed different isocaloric feeding regimens: a standard diet with carbohydrates as energy source (CH) or a diet enriched with high oleic sunflower oil (HO). Ham subcutaneous adipose tissue was sampled by biopsy at growing (44 kg mean BW) and finishing (100 kg mean BW) periods. The first sampling was performed on fasted animals, while the last sampling was performed twice, with animals fasted overnight and 3 h after refeeding. Effects of diet, growth period and feeding/fasting status on gene expression were explored by quantifying the expression of a panel of key genes implicated in lipogenesis and lipid metabolism processes. Quantitative PCR revealed several differentially expressed genes according to diet, with similar results at both timings: RXRG, LEP and FABP5 genes were upregulated in the HO group while ME1, FASN, ACACA and ELOVL6 were upregulated in CH. The diet effect on ME1 gene expression was conditional on feeding/fasting status, with higher ME1 gene expression in the CH than the HO group observed only in fasting samples. Results are compatible with a higher de novo endogenous synthesis of fatty acids (FA) in the carbohydrate-supplemented group and a higher FA transport in the oleic acid-supplemented group. 
Growth period significantly affected the expression of most of the studied genes, with all but PPARG showing higher expression in finishing pigs according to

  1. Improved detection of Burkholderia pseudomallei from non-blood clinical specimens using enrichment culture and PCR: narrowing diagnostic gap in resource-constrained settings.

    Science.gov (United States)

    Tellapragada, Chaitanya; Shaw, Tushar; D'Souza, Annet; Eshwara, Vandana Kalwaje; Mukhopadhyay, Chiranjay

    2017-07-01

    To evaluate the diagnostic utility of enrichment culture and PCR for improved case detection rates of the non-bacteraemic form of melioidosis in limited resource settings. Clinical specimens (n = 525) obtained from patients presenting at a tertiary care hospital of South India with clinical symptoms suggestive of community-acquired pneumonia, lower respiratory tract infections, superficial or internal abscesses, chronic skin ulcers and bone or joint infections were tested for the presence of Burkholderia pseudomallei using conventional culture (CC), enrichment culture (EC) and PCR. Sensitivity, specificity, positive and negative predictive values of CC and PCR were initially deduced using EC as the gold standard method. Further, diagnostic accuracies of all three methods were analysed using Bayesian latent class modelling (BLCM). Detection rates of B. pseudomallei using CC, EC and PCR were 3.8%, 5.3% and 6%, respectively. With EC as the gold standard test, sensitivity and specificity were 71.4% and 98.4% for CC and 100% and 99.4% for PCR. With Bayesian latent class modelling, EC and PCR demonstrated sensitivities of 98.7 and 99.3%, respectively, while CC showed a sensitivity of 70.3% for detection of B. pseudomallei. An increase of 1.6% (95% CI: 1.08-4.32%) in the case detection rate of melioidosis was observed in the study population when EC and/or PCR were used as an adjunct to the conventional culture technique. Our study findings underscore the diagnostic superiority of enrichment culture and/or PCR over conventional microbiological culture for improved case detection of melioidosis from non-blood clinical specimens. © 2017 John Wiley & Sons Ltd.
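
    The accuracy measures named in this abstract follow from a 2x2 table of index-test results against the reference standard. The sketch below shows the arithmetic only; the counts are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures of an index test against a reference standard.

    tp/fn: reference-positive specimens the index test called positive/negative.
    fp/tn: reference-negative specimens the index test called positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # probability a positive call is a case
        "npv": tn / (tn + fn),          # probability a negative call is a non-case
    }


# Invented counts, NOT taken from the study above.
example = diagnostic_metrics(tp=20, fp=2, fn=8, tn=170)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

    Because a culture-based reference standard is itself imperfect, figures computed this way can differ from latent class estimates, which is the reason the abstract reports both.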

  2. Enrichment of brain-related genes on the mammalian X chromosome is ancient and predates the divergence of synapsid and sauropsid lineages.

    Science.gov (United States)

    Kemkemer, Claus; Kohn, Matthias; Kehrer-Sawatzki, Hildegard; Fundele, Reinald H; Hameister, Horst

    2009-01-01

    Previous studies have revealed an enrichment of reproduction- and brain-related genes on the human X chromosome. In the present study, we investigated the evolutionary history that underlies this functional specialization. To do so, we analyzed the orthologous building blocks of the mammalian X chromosome in the chicken genome. We used Affymetrix chicken genome microarrays to determine tissue-selective gene expression in several tissues of the chicken, including testis and brain. Subsequently, chromosomal distribution of genes with tissue-selective expression was determined. These analyses provided several new findings. Firstly, they showed that chicken chromosomes orthologous to the mammalian X chromosome exhibited an increased concentration of genes expressed selectively in brain. More specifically, the highest concentration of brain-selectively expressed genes was found on chicken chromosome GGA12, which shows orthology to the X chromosomal regions with the highest enrichment of non-syndromic X-linked mental retardation (MRX) genes. Secondly, and in contrast to the first finding, no enrichment of testis-selective genes could be detected on these chicken chromosomes. These findings indicate that the accumulation of brain-related genes on the prospective mammalian X chromosome antedates the divergence of sauropsid and synapsid lineages 315 million years ago, whereas the accumulation of testis-related genes on the mammalian X chromosome is more recent and due to adaptational changes.

  3. Identification of a core set of genes that signifies pathways underlying cardiac hypertrophy

    DEFF Research Database (Denmark)

    Strom, C.C.; Kruhoffer, M.; Knudsen, Steen

    2004-01-01

    Although the molecular signals underlying cardiac hypertrophy have been the subject of intense investigation, the extent of common and distinct gene regulation between different forms of cardiac hypertrophy remains unclear. We hypothesized that a general and comparative analysis of hypertrophic...... gene expression, using microarray technology in multiple models of cardiac hypertrophy, including aortic banding, myocardial infarction, an arteriovenous shunt and pharmacologically induced hypertrophy, would uncover networks of conserved hypertrophy-specific genes and identify novel genes involved...... in hypertrophic signalling. From gene expression analyses (8740 probe sets, n = 46) of rat ventricular RNA, we identified a core set of 139 genes with consistent differential expression in all hypertrophy models as compared to their controls, including 78 genes not previously associated with hypertrophy and 61...

  4. Identification of a core set of genes that signifies pathways underlying cardiac hypertrophy

    DEFF Research Database (Denmark)

    Strøm, Claes C; Kruhøffer, Mogens; Knudsen, Steen

    2004-01-01

    Although the molecular signals underlying cardiac hypertrophy have been the subject of intense investigation, the extent of common and distinct gene regulation between different forms of cardiac hypertrophy remains unclear. We hypothesized that a general and comparative analysis of hypertrophic...... gene expression, using microarray technology in multiple models of cardiac hypertrophy, including aortic banding, myocardial infarction, an arteriovenous shunt and pharmacologically induced hypertrophy, would uncover networks of conserved hypertrophy-specific genes and identify novel genes involved...... in hypertrophic signalling. From gene expression analyses (8740 probe sets, n = 46) of rat ventricular RNA, we identified a core set of 139 genes with consistent differential expression in all hypertrophy models as compared to their controls, including 78 genes not previously associated with hypertrophy and 61...

  5. Mechanical Unloading of Mouse Bone in Microgravity Significantly Alters Cell Cycle Gene Set Expression

    Science.gov (United States)

    Blaber, Elizabeth; Dvorochkin, Natalya; Almeida, Eduardo; Kaplan, Warren; Burns, Brendan

    2012-07-01

    unloading in spaceflight, we conducted genome-wide microarray analysis of total RNA isolated from the mouse pelvis. Specifically, 16-week-old mice were subjected to 15 days of spaceflight onboard NASA's STS-131 space shuttle mission. The pelvis of each mouse was dissected, the bone marrow was flushed and the bones were briefly stored in RNAlater. The pelves were then homogenized, and RNA was isolated using TRIzol. RNA concentration and quality were measured using a Nanodrop spectrometer and 0.8% agarose gel electrophoresis. Samples of cDNA were analyzed using an Affymetrix GeneChip Gene 1.0 ST (Sense Target) Array System for Mouse and GenePattern Software. We normalized the ST gene arrays using Robust Multichip Average (RMA) normalization, which summarizes perfectly matched spots on the array through the median polish algorithm, rather than normalizing according to mismatched spots. We also used Limma for statistical analysis, using the Bioconductor Limma library by Gordon Smyth, and differential expression analysis to identify genes with significant changes in expression between the two experimental conditions. Finally, we used GSEAPreranked for Gene Set Enrichment Analysis (GSEA), with Kolmogorov-Smirnov style statistics, to identify groups of genes that are regulated together using the t-statistics derived from Limma. Preliminary results show that 6,603 genes expressed in pelvic bone had statistically significant alterations in spaceflight compared to ground controls. These prominently included cell cycle arrest molecules p21 and p18, cell survival molecule Crbp1, and cell cycle molecules cyclin D1 and Cdk1. Additionally, GSEA results indicated alterations in molecular targets of cyclin D1 and Cdk4, senescence pathways resulting from abnormal laminin maturation, cell-cell contacts via E-cadherin, and several pathways relating to protein translation and metabolism. In total 111 gene sets out of 2,488, about 4%, showed statistically significant set alterations. 
These
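
    The Kolmogorov-Smirnov style statistic mentioned in this abstract can be illustrated with a running-sum enrichment score over a list of genes ranked by t-statistic. This is a minimal sketch of the weighted running sum commonly described for GSEA, not the GSEAPreranked implementation; the gene names and t-statistics are invented, and significance testing by permutation is omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked: list of (gene, statistic) sorted by statistic, descending.
    p: weight exponent; p=0 gives the unweighted KS-like statistic.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    n_hits = sum(1 for gene, _ in ranked if gene in gene_set)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    total_weight = sum(abs(stat) ** p for gene, stat in ranked if gene in gene_set)
    if total_weight == 0:
        raise ValueError("all gene-set statistics are zero")

    running, best = 0.0, 0.0
    for gene, stat in ranked:
        if gene in gene_set:
            running += abs(stat) ** p / total_weight  # step up at a set member
        else:
            running -= 1.0 / n_miss                   # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Invented t-statistics for six hypothetical genes.
t_stats = {"g1": 4.0, "g2": 3.0, "g3": 1.0, "g4": -0.5, "g5": -2.0, "g6": -3.0}
ranked_genes = sorted(t_stats.items(), key=lambda kv: kv[1], reverse=True)

es_top = enrichment_score(ranked_genes, {"g1", "g2"})     # set at the top of the list
es_bottom = enrichment_score(ranked_genes, {"g5", "g6"})  # set at the bottom
print(es_top, es_bottom)
```

    A positive score indicates a set concentrated among the most upregulated genes and a negative score a set concentrated among the most downregulated ones.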

  6. Deletion of the Novel Oocyte-Enriched Gene, Gpr149, Leads to Increased Fertility in Mice

    Science.gov (United States)

    Edson, Mark A.; Lin, Yi-Nan; Matzuk, Martin M.

    2010-01-01

    Through in silico subtraction and microarray analysis, we identified mouse Gpr149, a novel, oocyte-enriched transcript that encodes a predicted orphan G-protein-coupled receptor (GPR). Phylogenetic analysis of GPR149 from fish to mammals suggests that it is widely conserved in vertebrates. By multitissue RT-PCR analysis, we found that Gpr149 is highly expressed in the ovary and also in the brain and the digestive tract at low levels. Gpr149 levels are low in newborn ovaries but increase throughout folliculogenesis. In the ovary, we found that granulosa cells did not express Gpr149, whereas germinal vesicle and meiosis II stage oocytes showed high levels of Gpr149 expression. After fertilization, Gpr149 expression declined, becoming undetectable by the two-cell stage. To study the function of GPR149 in oocyte growth and maturation, we generated Gpr149 null mice. Surprisingly, Gpr149 null mice are viable and have normal folliculogenesis, but demonstrate increased fertility, enhanced ovulation, increased oocyte Gdf9 mRNA levels, and increased levels of FSH receptor and cyclin D2 mRNA levels in granulosa cells. Thus, Gpr149 null mice are one of the few models with enhanced fertility, and GPR149 could be a target for small molecules to enhance fertility in the assisted reproductive technology clinic. PMID:19887567

  7. miEAA: microRNA enrichment analysis and annotation.

    Science.gov (United States)

    Backes, Christina; Khaleeq, Qurratulain T; Meese, Eckart; Keller, Andreas

    2016-07-08

    Similar to the development of gene set enrichment and gene regulatory network analysis tools over a decade ago, microRNA enrichment tools are currently gaining importance. Building on our experience with the gene set analysis toolkit GeneTrail, we implemented the miRNA Enrichment Analysis and Annotation tool (miEAA). MiEAA is a web-based application that offers a variety of commonly applied statistical tests such as over-representation analysis and miRNA set enrichment analysis, which is similar to Gene Set Enrichment Analysis. Besides the different statistical tests, miEAA also provides rich functionality in terms of miRNA categories. Altogether, over 14 000 miRNA sets have been added, including pathways, diseases, organs and target genes. Importantly, our tool can be applied for miRNA precursors as well as mature miRNAs. To make the tool as useful as possible we additionally implemented supporting tools such as converters between different miRBase versions and converters from miRNA names to precursor names. We evaluated the performance of miEAA on two sets of miRNAs that are affected in lung adenocarcinomas and have been detected by array analysis. The web-based application is freely accessible at: http://www.ccb.uni-saarland.de/mieaa_tool/.
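
    Over-representation analysis, one of the tests named in this abstract, is usually framed as a one-sided hypergeometric test. The sketch below shows that standard calculation with invented numbers; it is not miEAA's code, and the function and parameter names are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe, n_category, n_query, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe: all miRNAs (or genes) considered.
    n_category: members of the category in the universe.
    n_query:    size of the submitted list.
    n_overlap:  submitted items that belong to the category.
    """
    upper = min(n_category, n_query)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_query)


# Invented example: 3 of 4 submitted miRNAs fall in a 5-member category
# drawn from a universe of 20.
p_value = ora_pvalue(n_universe=20, n_category=5, n_query=4, n_overlap=3)
print(f"{p_value:.4f}")
```

    When many categories are tested at once, the resulting p-values need a multiple-testing correction before interpretation.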

  8. Reduced expression of brain-enriched microRNAs in glioblastomas permits targeted regulation of a cell death gene.

    Directory of Open Access Journals (Sweden)

    Rebecca L Skalsky

    Full Text Available Glioblastoma is a highly aggressive malignant tumor involving glial cells in the human brain. We used high-throughput sequencing to comprehensively profile the small RNAs expressed in glioblastoma and non-tumor brain tissues. MicroRNAs (miRNAs) made up the large majority of small RNAs, and we identified over 400 different cellular pre-miRNAs. No known viral miRNAs were detected in any of the samples analyzed. Cluster analysis revealed several miRNAs that were significantly down-regulated in glioblastomas, including miR-128, miR-124, miR-7, miR-139, miR-95, and miR-873. Post-transcriptional editing was observed for several miRNAs, including the miR-376 family, miR-411, miR-381, and miR-379. Using the deep sequencing information, we designed a lentiviral vector expressing a cell suicide gene, the herpes simplex virus thymidine kinase (HSV-TK) gene, under the regulation of a miRNA, miR-128, that was found to be enriched in non-tumor brain tissue yet down-regulated in glioblastomas. Glioblastoma cells transduced with this vector were selectively killed when cultured in the presence of ganciclovir. Using an in vitro model to recapitulate expression of brain-enriched miRNAs, we demonstrated that neuronally differentiated SH-SY5Y cells transduced with the miRNA-regulated HSV-TK vector are protected from killing by expression of endogenous miR-128. Together, these results provide an in-depth analysis of miRNA dysregulation in glioblastoma and demonstrate the potential utility of these data in the design of miRNA-regulated therapies for the treatment of brain cancers.

  9. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, van D.A.M.; Goeman, J.J.; Jong, de E.; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set

  10. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, D.A. van; Goeman, J.J.; Jong, Esther de; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set

  11. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, D.A. van; Goeman, J.J.; Jong, Esther de; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set anal

  12. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, van D.A.M.; Goeman, J.J.; Jong, de E.; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set anal

  13. Ontological Enrichment of the Genes-to-Systems Breast Cancer Database

    Science.gov (United States)

    Viti, Federica; Mosca, Ettore; Merelli, Ivan; Calabria, Andrea; Alfieri, Roberta; Milanesi, Luciano

    Breast cancer research needs the development of specific and suitable tools to appropriately manage biomolecular knowledge. The presented work deals with the integrative storage of breast cancer related biological data, in order to promote a systems biology approach to this network disease. To increase data standardization and resource integration, annotations maintained in the Genes-to-Systems Breast Cancer (G2SBC) database are associated with ontological terms, which provide a hierarchical structure to organize data, enabling more effective queries, statistical analysis and semantic web searching. The exploited ontologies, which cover all levels of the molecular environment, from genes to systems, are among the best known and most widely used bioinformatics resources. In the G2SBC database, ontology terms both provide a semantic layer to improve data storage, accessibility and analysis and represent a user-friendly instrument to identify relations among biological components.

  14. Improving functional modules discovery by enriching interaction networks with gene profiles

    KAUST Repository

    Salem, Saeed

    2013-05-01

    Recent advances in proteomic and transcriptomic technologies have resulted in the accumulation of vast amounts of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional module discovery, where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most of the existing module discovery methods use only the interaction data, in this work we propose CLARM, which discovers biological modules by incorporating gene profile data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on yeast and human interaction datasets, and gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive with well-established functional module discovery methods.

  15. Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords.

    Science.gov (United States)

    Luque-Baena, R M; Urda, D; Gonzalo Claros, M; Franco, L; Jerez, J M

    2014-06-01

    Genetic algorithms are widely used in the estimation of expression profiles from microarray data. However, these techniques are unable to produce stable and robust solutions suitable for use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection combining the genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out over public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses only use features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increased the consistency, robustness and accuracy of a blind discrimination between relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnosis of cancer in the near future. Additionally, it could also be used for biological knowledge discovery about the studied disease.

  16. An SCD gene from the Mollusca and its upregulation in carotenoid-enriched scallops.

    Science.gov (United States)

    Li, Xue; Ning, Xianhui; Dou, Jinzhuang; Yu, Qian; Wang, Shuyue; Zhang, Lingling; Wang, Shi; Hu, Xiaoli; Bao, Zhenmin

    2015-06-10

    Carotenoids are a diverse group of red, orange, and yellow pigments that act as vitamin A precursors and antioxidants. Animals can only obtain carotenoids through their diets. Amongst the carotenoids identified in nature, over one third are of marine origin, but current research on carotenoid absorption in marine species is limited. Bivalves possess an adductor muscle, which is normally white in scallops. However, a new variety of Yesso scallop (Patinopecten yessoensis), the 'Haida golden scallop', can be distinguished by its adductor muscle's orange colour, which is caused by carotenoid accumulation. Studying the genes related to carotenoid accumulation in this scallop could benefit our understanding of the mechanisms underlying carotenoid absorption in marine organisms, and it could further improve scallop breeding for carotenoid content. Stearoyl-CoA desaturase (SCD) is the rate-limiting enzyme in the production of monounsaturated fatty acids, which enhance carotenoid absorption. Here, the full-length cDNA and genomic DNA sequences of the SCD gene from the Yesso scallop (PySCD) were obtained. The PySCD gene consisted of four exons and three introns, and it contained a 990-bp open reading frame encoding 329 amino acids. It was ubiquitously expressed in adult tissues, embryos and larvae of both white Yesso scallops and 'Haida golden' scallops. Although the expression pattern of PySCD in both types of scallops was similar, significantly more PySCD transcripts were detected in the 'Haida golden' scallops than in the white scallops. Elevated PySCD expression was found in tissues including the adductor muscle, digestive gland, and gonad, as well as in veliger larvae. This study represents the first characterisation of an SCD gene from the Mollusca. Our data imply that PySCD functions in multiple biological processes, and it might be involved in carotenoid accumulation.
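    The relationship between the 990-bp open reading frame and the 329 encoded amino acids reported above can be checked with a few lines of arithmetic. The sketch below assumes that the stated ORF length includes the terminating stop codon, which is a common convention but is not stated in the abstract; the function name is hypothetical.

```python
def protein_length_from_orf(orf_length_bp: int, includes_stop_codon: bool = True) -> int:
    """Return the number of amino acids encoded by an open reading frame.

    Assumption (not stated in the abstract): the ORF length counts the
    stop codon, so one codon is subtracted from the total.
    """
    if orf_length_bp <= 0 or orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_length_bp // 3  # each codon is three nucleotides
    return codons - 1 if includes_stop_codon else codons


# 990 bp -> 330 codons -> 329 amino acids once the stop codon is removed
print(protein_length_from_orf(990))
```

    Under that assumption the figures in the abstract are mutually consistent.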

  17. ErmineJ: Tool for functional analysis of gene expression data sets

    Directory of Open Access Journals (Sweden)

    Braynen William

    2005-11-01

    Full Text Available Abstract Background It is common for the results of a microarray study to be analyzed in the context of biologically-motivated groups of genes such as pathways or Gene Ontology categories. The most common method for such analysis uses the hypergeometric distribution (or a related technique) to look for "over-representation" of groups among genes selected as being differentially expressed or otherwise of interest based on a gene-by-gene analysis. However, this method suffers from some limitations, and biologist-friendly tools that implement alternatives have not been reported. Results We introduce ErmineJ, a multiplatform user-friendly stand-alone software tool for the analysis of functionally-relevant sets of genes in the context of microarray gene expression data. ErmineJ implements multiple algorithms for gene set analysis, including over-representation and resampling-based methods that focus on gene scores or correlation of gene expression profiles. In addition to a graphical user interface, ErmineJ has a command line interface and an application programming interface that can be used to automate analyses. The graphical user interface includes tools for creating and modifying gene sets, visualizing the Gene Ontology as a table or tree, and visualizing gene expression data. ErmineJ comes with a complete user manual, and is open-source software licensed under the GNU Public License. Conclusion The availability of multiple analysis algorithms, together with a rich feature set and simple graphical interface, should make ErmineJ a useful addition to the biologist's informatics toolbox. ErmineJ is available from http://microarray.cu.genome.org.
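    The hypergeometric over-representation test mentioned in this abstract can be sketched in a few lines. The following is a minimal illustration of the general technique, not ErmineJ's implementation; the function name and the example counts are hypothetical and chosen only for demonstration.

```python
from math import comb


def overrepresentation_p_value(population: int, annotated: int,
                               selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of genes on the array (the background)
    annotated:  number of background genes belonging to the gene set
    selected:   number of genes called differentially expressed
    overlap:    number of selected genes that belong to the gene set
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie within the population")
    upper = min(annotated, selected)
    if not (0 <= overlap <= upper):
        raise ValueError("overlap is larger than the gene set or the selection")
    total = comb(population, selected)
    # Sum the probabilities of seeing `overlap` or more gene-set members
    # among the selected genes when sampling without replacement.
    tail = sum(comb(annotated, i) * comb(population - annotated, selected - i)
               for i in range(overlap, upper + 1))
    return tail / total


# Made-up counts: 20000 background genes, a 200-gene set,
# 500 selected genes, 15 of which fall in the set.
print(overrepresentation_p_value(20000, 200, 500, 15))
```

    A small p-value suggests that the gene set is represented among the selected genes more often than expected by chance. In practice the p-values from many gene sets would also need a multiple-testing correction.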

  18. Exploration of data partitioning in an eight-gene data set

    DEFF Research Database (Denmark)

    Rota, Jadranka; Wahlberg, Niklas

    2012-01-01

    Molecular data sets for phylogenetic inference continue to increase in size, especially with respect to the number of genes sampled. As more and more genes are included in analyses, the importance of partitioning the data to avoid problems that can arise from underparameterization becomes more apparent. With an eight-gene data set from 38 metalmark moth species (12 genera represented) and three outgroups, we explored different data partitioning strategies and their influence on convergence and mixing of Markov chain Monte Carlo in a Bayesian setting. We found that in larger data sets, with an increase in the number of partitions that are made a priori (e.g. by gene and codon position), convergence and mixing become poor. This problem can be overcome by using a recently published algorithm in which homologous sites are grouped into blocks with similar evolutionary rates that can then be modelled...

  19. Molecular cloning and expression analysis of the retinoid X receptor (RXR) gene in golden pompano Trachinotus ovatus fed Artemia nauplii with different enrichments.

    Science.gov (United States)

    Yang, Qibin; Zheng, Panlong; Ma, Zhenhua; Li, Tao; Jiang, Shigui; Qin, Jian G

    2015-12-01

    The retinoid X receptors (RXRs) are involved in skeletal development and other biological processes such as blood vessel formation and metabolism. Partial sequences of RXRα and β genes were obtained, and their expression was quantified in golden pompano Trachinotus ovatus at 28 days post hatching (DPH) to explore the molecular response to nutritional manipulation in fish larvae. As live food, Artemia nauplii were enriched separately with Nannochloropsis or Algamac 3080, and non-enriched Artemia nauplii served as the control. The expression of RXRs was detected in the embryos and fish larvae at early stages, suggesting that skeletal development in golden pompano is initiated before yolk resorption is complete. Fish fed non-enriched Artemia nauplii ended up with higher jaw malformation. The highest specific growth rate was obtained when fish were fed Artemia nauplii enriched with Algamac 3080, and the lowest growth rate was observed when fish were fed unenriched Artemia nauplii. The highest survival was obtained when fish were fed non-enriched or Nannochloropsis-enriched Artemia nauplii. This study indicates that the use of enriched formula for Artemia nauplii can significantly affect the expression levels of RXRs and jaw malformation of golden pompano larvae, but there is no clear correlation between RXR expression and malformation rates when fish are subjected to nutrient challenge.

  20. High-throughput sequencing of mGluR signaling pathway genes reveals enrichment of rare variants in autism.

    Directory of Open Access Journals (Sweden)

    Raymond J Kelleher

    Full Text Available Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism.

  1. Identification of Multiple Dehalogenase Genes Involved in Tetrachloroethene-to-Ethene Dechlorination in a Dehalococcoides-Dominated Enrichment Culture

    Directory of Open Access Journals (Sweden)

    Mohamed Ismaeil

    2017-01-01

    Full Text Available Chloroethenes (CEs) are widespread groundwater toxicants that are reductively dechlorinated to nontoxic ethene (ETH) by members of Dehalococcoides. This study established a Dehalococcoides-dominated enrichment culture (designated “YN3”) that dechlorinates tetrachloroethene (PCE) to ETH with high dechlorination activity, that is, complete dechlorination of 800 μM PCE to ETH within 14 days in the presence of Dehalococcoides species at 5.7±1.9×10^7 copies of 16S rRNA gene/mL. The metagenome of YN3 harbored 18 rdhA genes (designated YN3rdhA1–18) encoding the catalytic subunit of reductive dehalogenase (RdhA), four of which were suggested to be involved in PCE-to-ETH dechlorination based on significant increases in their transcription in response to CE addition. The predicted proteins for two of these four genes, YN3RdhA8 and YN3RdhA16, showed 94% and 97% amino acid similarity with PceA and VcrA, which are well known to dechlorinate PCE to trichloroethene (TCE) and TCE to ETH, respectively. The other two rdhAs, YN3rdhA6 and YN3rdhA12, which had never been shown to be rdhA genes for CEs, showed particularly high transcription upon addition of vinyl chloride (VC), with 75±38 and 16±8.6 mRNA copies per gene, respectively, suggesting their possible functions as novel VC-reductive dehalogenases. Moreover, metagenome data indicated the presence of three coexisting bacterial species, including novel species of the genus Bacteroides, which might promote CE dechlorination by Dehalococcoides.

  2. Environmental enrichment attenuates cognitive deficits, but does not alter neurotrophin gene expression in the hippocampus following lateral fluid percussion brain injury.

    Science.gov (United States)

    Hicks, R R; Zhang, L; Atkinson, A; Stevenon, M; Veneracion, M; Seroogy, K B

    2002-01-01

    Environmental enrichment attenuates neurological deficits associated with experimental brain injury. The molecular events that mediate these environmentally induced improvements in function after injury are largely unknown, but neurotrophins have been hypothesized to be a neural substrate because of their role in cell survival and neural plasticity. Furthermore, exposure to complex environments in normal animals increases neurotrophin gene expression. However, following an ischemic injury, environmental enrichment decreases neurotrophin mRNA levels. Whether these contrasting findings are attributable to differences between injured and uninjured animals or are dependent upon the specific type of brain injury has not been determined. We examined the effects of 14 days of environmental enrichment following a lateral fluid percussion brain injury on behavior and gene expression of brain-derived neurotrophic factor, its high-affinity receptor, TrkB, and neurotrophin-3 in the rat hippocampus. Environmental enrichment attenuated learning deficits in the injured animals, but neither the injury nor housing conditions influenced neurotrophin/receptor mRNA levels. From these data we suggest that following brain trauma, improvements in learning associated with environmental enrichment are not mediated by alterations in brain-derived neurotrophic factor, TrkB or neurotrophin-3 gene expression.

  3. Identification and functional validation of a unique set of drought induced genes preferentially expressed in response to gradual water stress in peanut.

    Science.gov (United States)

    Govind, Geetha; Harshavardhan, Vokkaliga ThammeGowda; ThammeGowda, Harshavardhan Vokkaliga; Patricia, Jayaker Kalaiarasi; Kalaiarasi, Patricia Jayaker; Dhanalakshmi, Ramachandra; Iyer, Dhanalakshmi Ramchandra; Senthil Kumar, Muthappa; Muthappa, Senthil Kumar; Sreenivasulu, Nese; Nese, Sreenivasulu; Udayakumar, Makarla; Makarla, Udaya Kumar

    2009-06-01

    Peanut, a relatively drought-tolerant crop, was chosen to characterize the genes expressed under gradual water deficit stress. Nearly 700 genes were found to be enriched in a subtractive cDNA library from the gradual process of drought stress adaptation. Further, expression of the drought-inducible genes related to various signaling components and gene sets involved in protecting cellular function has been described based on dot blot experiments. Fifty genes (25 regulators and 25 functional related genes) selected based on dot blot experiments were tested for their stress responsiveness using northern blot analysis, which confirmed their differential regulation under drought stress treatments at different field capacities. ESTs generated from this subtracted cDNA library offered a rich source of stress-related genes including signaling components. An additional 50% of sequences that remain uncharacterized are noteworthy. Insights gained from this study would provide the foundation for further studies to understand how peanut plants are able to adapt to naturally occurring harsh drought conditions. At present, functional validation is not feasible in peanut; hence, as a proof of concept, seven orthologues of drought-induced peanut genes were silenced in the heterologous N. benthamiana system using virus-induced gene silencing. These results point to the functional importance of the HSP70 gene and key regulators such as Jumonji in the drought stress response.

  4. Combining distance matrices on identical taxon sets for multi-gene analysis with singular value decomposition.

    Directory of Open Access Journals (Sweden)

    Melanie Abeysundera

    Full Text Available We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
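    The idea of extracting a common signal from several per-gene distance matrices can be illustrated with a small sketch. This is not the authors' code: it assumes each symmetric distance matrix is flattened to its upper triangle and stacked as one row per gene, it obtains the leading right singular vector by power iteration rather than a full SVD, and it rescales the result to the mean input distance, which is a choice made here purely for readability. All function names are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten a symmetric distance matrix to its strict upper triangle."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def first_right_singular_vector(rows, iterations=200, tol=1e-12):
    """Leading right singular vector of A (one row per gene) by power iteration on A^T A."""
    n_cols = len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v: one entry per gene
        u = [sum(r[j] * v[j] for j in range(n_cols)) for r in rows]
        # w = A^T u: one entry per taxon pair
        w = [sum(rows[i][j] * u[i] for i in range(len(rows))) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("all distances are zero")
        w = [x / norm for x in w]
        delta = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if delta < tol:
            break
    return v


def combine_distance_matrices(matrices):
    """Return combined pairwise distances (upper-triangle order)."""
    rows = [upper_triangle(m) for m in matrices]
    v = first_right_singular_vector(rows)
    # Assumption: rescale so the combined distances have the same mean
    # as the input distances; the singular vector itself has unit length.
    target_mean = sum(sum(r) for r in rows) / (len(rows) * len(rows[0]))
    scale = target_mean / (sum(v) / len(v))
    return [x * scale for x in v]


# Two toy genes on three taxa; the second gene evolves twice as fast.
gene_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
gene_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(combine_distance_matrices([gene_a, gene_b]))
```

    Because the two toy matrices are proportional, the combined distances keep the same 1:2:3 ratio as each input. The combined vector could then be reshaped into a matrix and passed to any distance-based tree-building method.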

  5. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils

    NARCIS (Netherlands)

    Hannula, S.E.; van Veen, J.A.

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in

  6. Enriching regulatory networks by bootstrap learning using optimised GO-based gene similarity and gene links mined from PubMed abstracts

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.; Baddeley, Robert L.; Riensche, Roderick M.; Jensen, Russell S.; Verhagen, Marc; Pustejovsky, James

    2011-02-18

    Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant links between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.

  7. GeneBrowser 2: an application to explore and identify common biological traits in a set of genes

    Directory of Open Access Journals (Sweden)

    Oliveira José

    2010-07-01

    Full Text Available Abstract Background The development of high-throughput laboratory techniques created a demand for computer-assisted result analysis tools. Many of these techniques return lists of genes whose interpretation requires finding relevant biological roles for the problem at hand. The required information is typically available in public databases, and usually, this information must be manually retrieved to complement the analysis. This process is a very time-consuming task that should be automated as much as possible. Results GeneBrowser is a web-based tool that, for a given list of genes, combines data from several public databases with visualisation and analysis methods to help identify the most relevant and common biological characteristics. The functionalities provided include the following: a central point with the most relevant biological information for each inserted gene; a list of the most related papers in PubMed and gene expression studies in ArrayExpress; and an extended approach to functional analysis applied to Gene Ontology, homologies, gene chromosomal localisation and pathways. Conclusions GeneBrowser provides a unique entry point to several visualisation and analysis methods, providing fast and easy analysis of a set of genes. GeneBrowser fills the gap between Web portals that analyse one gene at a time and functional analysis tools that are limited in scope and usually desktop-based.

  8. Construction and screening of metagenomic libraries derived from enrichment cultures: generation of a gene bank for genes conferring alcohol oxidoreductase activity on Escherichia coli.

    Science.gov (United States)

    Knietsch, Anja; Waschkowitz, Tanja; Bowien, Susanne; Henne, Anke; Daniel, Rolf

    2003-03-01

    Enrichment of microorganisms with special traits and the construction of metagenomic libraries by direct cloning of environmental DNA have great potential for identifying genes and gene products for biotechnological purposes. We have combined these techniques to isolate novel genes conferring oxidation of short-chain (C(2) to C(4)) polyols or reduction of the corresponding carbonyls. In order to favor the growth of microorganisms containing the targeted genes, samples collected from four different environments were incubated in the presence of glycerol and 1,2-propanediol. Subsequently, the DNA was extracted from the four samples and used to construct complex plasmid libraries. Approximately 100,000 Escherichia coli strains of each library per test substrate were screened for the production of carbonyls from polyols on indicator agar. Twenty-four positive E. coli clones were obtained during the initial screen. Sixteen of them contained a plasmid (pAK101 to pAK116) which conferred a stable carbonyl-forming phenotype. Eight of the positive clones exhibited NAD(H)-dependent alcohol oxidoreductase activity with polyols or carbonyls as the substrates in crude extracts. Sequencing revealed that the inserts of pAK101 to pAK116 encoded 36 complete and 17 incomplete presumptive protein-encoding genes. Fifty of these genes showed similarity to sequenced genes from a broad collection of different microorganisms. The genes responsible for the carbonyl formation of E. coli were identified for nine of the plasmids (pAK101, pAK102, pAK105, pAK107 to pAK110, pAK115, and pAK116). Analyses of the amino acid sequences deduced from these genes revealed that three (orf12, orf14, and orf22) encoded novel alcohol dehydrogenases of different types, four (orf5, sucB, fdhD, and yabF) encoded novel putative oxidoreductases belonging to groups distinct from alcohol dehydrogenases, one (glpK) encoded a putative glycerol kinase, and one (orf1) encoded a protein which showed no similarity to any

  9. Dimethylated H3K27 Is a Repressive Epigenetic Histone Mark in the Protist Entamoeba histolytica and Is Significantly Enriched in Genes Silenced via the RNAi Pathway

    Science.gov (United States)

    Foda, Bardees M.; Singh, Upinder

    2015-01-01

    RNA interference (RNAi) is a fundamental biological process that plays a crucial role in regulation of gene expression in many organisms. Transcriptional gene silencing (TGS) is one of the important nuclear roles of RNAi. Our previous data show that Entamoeba histolytica has a robust RNAi pathway that links to TGS via Argonaute 2-2 (Ago2-2) associated 27-nucleotide small RNAs with 5′-polyphosphate termini. Here, we report the first repressive histone mark to be identified in E. histolytica, dimethylation of H3K27 (H3K27Me2), and demonstrate that it is enriched at genes that are silenced by RNAi-mediated TGS. An RNAi-silencing trigger can induce H3K27Me2 deposits at both episomal and chromosomal loci, mediating gene silencing. Our data support two phases of RNAi-mediated TGS: an active silencing phase where the RNAi trigger is present and both H3K27Me2 and Ago2-2 concurrently enrich at chromosomal loci; and an established silencing phase in which the RNAi trigger is removed, but gene silencing with H3K27Me2 enrichment persists independently of Ago2-2 deposition. Importantly, some genes display resistance to chromosomal silencing despite induction of functional small RNAs. In those situations, the RNAi-triggering plasmid that is maintained episomally gets partially silenced and has H3K27Me2 enrichment, but the chromosomal copy displays no repressive histone enrichment. Our data are consistent with a model in which H3K27Me2 is a repressive histone modification, which is strongly associated with transcriptional repression. This is the first example of an epigenetic histone modification that functions to mediate RNAi-mediated TGS in the deep-branching eukaryote E. histolytica. PMID:26149683

  10. Dimethylated H3K27 Is a Repressive Epigenetic Histone Mark in the Protist Entamoeba histolytica and Is Significantly Enriched in Genes Silenced via the RNAi Pathway.

    Science.gov (United States)

    Foda, Bardees M; Singh, Upinder

    2015-08-21

    RNA interference (RNAi) is a fundamental biological process that plays a crucial role in regulation of gene expression in many organisms. Transcriptional gene silencing (TGS) is one of the important nuclear roles of RNAi. Our previous data show that Entamoeba histolytica has a robust RNAi pathway that links to TGS via Argonaute 2-2 (Ago2-2) associated 27-nucleotide small RNAs with 5'-polyphosphate termini. Here, we report the first repressive histone mark to be identified in E. histolytica, dimethylation of H3K27 (H3K27Me2), and demonstrate that it is enriched at genes that are silenced by RNAi-mediated TGS. An RNAi-silencing trigger can induce H3K27Me2 deposits at both episomal and chromosomal loci, mediating gene silencing. Our data support two phases of RNAi-mediated TGS: an active silencing phase where the RNAi trigger is present and both H3K27Me2 and Ago2-2 concurrently enrich at chromosomal loci; and an established silencing phase in which the RNAi trigger is removed, but gene silencing with H3K27Me2 enrichment persists independently of Ago2-2 deposition. Importantly, some genes display resistance to chromosomal silencing despite induction of functional small RNAs. In those situations, the RNAi-triggering plasmid that is maintained episomally gets partially silenced and has H3K27Me2 enrichment, but the chromosomal copy displays no repressive histone enrichment. Our data are consistent with a model in which H3K27Me2 is a repressive histone modification, which is strongly associated with transcriptional repression. This is the first example of an epigenetic histone modification that functions to mediate RNAi-mediated TGS in the deep-branching eukaryote E. histolytica. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  11. MeSH-Informed Enrichment Analysis and MeSH-Guided Semantic Similarity Among Functional Terms and Gene Products in Chicken.

    Science.gov (United States)

    Morota, Gota; Beissinger, Timothy M; Peñagaricano, Francisco

    2016-01-01

    Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO), and more recently by Medical Subject Headings (MeSH). Here, we report a suite of MeSH packages for chicken in Bioconductor, and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and MeSH-guided semantic similarity among terms and gene products, using two lists of chicken genes available in public repositories. The two published datasets that were employed represent (i) differentially expressed genes, and (ii) candidate genes under selective sweep or epistatic selection. The comparison of MeSH with GO overrepresentation analyses suggested not only that MeSH supports the findings obtained from GO analysis, but also that MeSH is able to further enrich the representation of biological knowledge and often provide more interpretable results. Based on the hierarchical structures of MeSH and GO, we computed semantic similarities among vocabularies, as well as semantic similarities among selected genes. These yielded the similarity levels between significant functional terms, and the annotation of each gene yielded the measures of gene similarity. Our findings show the benefits of using MeSH as an alternative choice of annotation in order to draw biological inferences from a list of genes of interest. We argue that the use of MeSH in conjunction with GO will be instrumental in facilitating the understanding of the genetic basis of complex traits.

  12. In vitro modulation of inflammatory target gene expression by a polyphenol-enriched fraction of rose oil distillation waste water.

    Science.gov (United States)

    Wedler, Jonas; Weston, Anna; Rausenberger, Julia; Butterweck, Veronika

    2016-10-01

    Classical production of rose oil is based on water steam distillation from the flowers of Rosa damascena. During this process, large quantities of waste water accrue which are discharged to the environment, causing severe pollution of both groundwater and surface water due to a high content of polyphenols. We recently developed a strategy to purify the waste water into a polyphenol-depleted and a polyphenol-enriched fraction RF20-(SP-207). RF20-(SP-207) and sub-fraction F(IV) significantly inhibited cell proliferation and migration of HaCaT cells. Since there is a close interplay between these actions and inflammatory processes, here we focused on the fractions' influence on pro-inflammatory biomarkers. HaCaT keratinocytes were treated with RF20-(SP-207), F(IV) (both at 50μg/mL) and ellagic acid (10μM) for 24h under TNF-α (20ng/mL) stimulated and non-stimulated conditions. Gene expression of IL-1β, IL-6, IL-8, RANTES and MCP-1 was analyzed by reverse transcriptase polymerase chain reaction (RT-PCR) and cellular protein secretion of IL-8, RANTES and MCP-1 was determined by ELISA-based assays. RF20-(SP-207) and F(IV) significantly decreased the expression and cellular protein secretion of IL-1β, IL-6, IL-8, RANTES and MCP-1. The diminishing effects on inflammatory target gene expression were slightly less pronounced under TNF-α stimulated conditions. In conclusion, the recovered polyphenol fraction RF20-(SP-207) from rose oil distillation waste water markedly modified inflammatory target gene expression in vitro, and, therefore, could be further developed as an alternative treatment of acute and chronic inflammation. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Deficiency of Prdm13, a dorsomedial hypothalamus-enriched gene, mimics age-associated changes in sleep quality and adiposity.

    Science.gov (United States)

    Satoh, Akiko; Brace, Cynthia S; Rensing, Nick; Imai, Shin-Ichiro

    2015-04-01

    The dorsomedial hypothalamus (DMH) controls a number of essential physiological responses. We have demonstrated that the DMH plays an important role in the regulation of mammalian aging and longevity. To further dissect the molecular basis of the DMH function, we conducted microarray-based gene expression profiling with total RNA from laser-microdissected hypothalamic nuclei and tried to find the genes highly and selectively expressed in the DMH. We found neuropeptide VF precursor (Npvf), PR domain containing 13 (Prdm13), and SK1 family transcriptional corepressor (Skor1) as DMH-enriched genes. Particularly, Prdm13, a member of the Prdm family of transcription regulators, was specifically expressed in the compact region of the DMH (DMC), where Nk2 homeobox 1 (Nkx2-1) is predominantly expressed. The expression of Prdm13 in the hypothalamus increased under diet restriction, whereas it decreased during aging. Prdm13 expression also showed diurnal oscillation and was significantly upregulated in the DMH of long-lived BRASTO mice. The transcriptional activity of the Prdm13 promoter was upregulated by Nkx2-1, and knockdown of Nkx2-1 suppressed Prdm13 expression in primary hypothalamic neurons. Interestingly, DMH-specific Prdm13-knockdown mice showed significantly reduced wake time during the dark period and decreased sleep quality, which was defined by the quantity of electroencephalogram delta activity during NREM sleep. DMH-specific Prdm13-knockdown mice also exhibited progressive increases in body weight and adiposity. Our findings indicate that Prdm13/Nkx2-1-mediated signaling in the DMC declines with advanced age, leading to decreased sleep quality and increased adiposity, which mimic age-associated pathophysiology, and provides a potential link to DMH-mediated aging and longevity control in mammals.

  14. DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

    Science.gov (United States)

    Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...

  15. Allele diversity for abiotic stress responsive candidate genes in chickpea reference set using gene based SNP markers

    Directory of Open Access Journals (Sweden)

    Manish Roorkiwal

    2014-06-01

    Full Text Available Chickpea is an important food legume crop for the semi-arid regions; however, its productivity is adversely affected by various biotic and abiotic stresses. Identification of candidate genes associated with abiotic stress response will help breeding efforts aiming to enhance its productivity. With this objective, 10 abiotic stress responsive candidate genes were selected on the basis of prior knowledge of this complex trait. These 10 genes were subjected to allele specific sequencing across a chickpea reference set comprising 300 genotypes including 211 accessions of chickpea mini core collection. A total of 1.3 Mbp sequence data were generated. Multiple sequence alignment revealed 79 SNPs and 41 indels in nine genes while the CAP2 gene was found to be conserved across all the genotypes. Among ten candidate genes, the maximum number of SNPs (34) was observed in the abscisic acid stress and ripening (ASR) gene, including 22 transitions, 11 transversions and one tri-allelic SNP. Nucleotide diversity varied from 0.0004 to 0.0029 while PIC values ranged from 0.01 (AKIN gene) to 0.43 (CAP2 promoter). Haplotype analysis revealed that alleles were represented by more than two haplotype blocks, except alleles of the CAP2 and sucrose synthase (SuSy) gene, where only one haplotype was identified. These genes can be used for association analysis and, if validated, may be useful for enhancing abiotic stress tolerance, including drought tolerance, through molecular breeding.

  16. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth

    Science.gov (United States)

    Chaillou, Thomas; Jackson, Janna R.; England, Jonathan H.; Kirby, Tyler J.; Richards-White, Jena; Esser, Karyn A.; Dupont-Versteegden, Esther E.

    2014-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. PMID:25554798

  17. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth.

    Science.gov (United States)

    Chaillou, Thomas; Jackson, Janna R; England, Jonathan H; Kirby, Tyler J; Richards-White, Jena; Esser, Karyn A; Dupont-Versteegden, Esther E; McCarthy, John J

    2015-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. Copyright © 2015 the American Physiological Society.

  18. MicroPattern: a web-based tool for microbe set enrichment analysis and disease similarity calculation based on a list of microbes

    Science.gov (United States)

    Ma, Wei; Huang, Chuanbo; Zhou, Yuan; Li, Jianwei; Cui, Qinghua

    2017-01-01

    The microbiota colonized on human body is renowned as “a forgotten organ” due to its big impacts on human health and disease. Recently, microbiome studies have identified a large number of microbes differentially regulated in a variety of conditions, such as disease and diet. However, methods for discovering biological patterns in the differentially regulated microbes are still limited. For this purpose, here, we developed a web-based tool named MicroPattern to discover biological patterns for a list of microbes. In addition, MicroPattern implemented and integrated an algorithm we previously presented for the calculation of disease similarity based on disease-microbe association data. MicroPattern first grouped microbes into different sets based on the associated diseases and the colonized positions. Then, for a given list of microbes, MicroPattern performed enrichment analysis of the given microbes on all of the microbe sets. Moreover, using MicroPattern, we can also calculate disease similarity based on the shared microbe associations. Finally, we confirmed the accuracy and usefulness of MicroPattern by applying it to the changed microbes under the animal-based diet condition. MicroPattern is freely available at http://www.cuilab.cn/micropattern. PMID:28071710

  19. Genome-wide Anaplasma phagocytophilum AnkA-DNA interactions are enriched in intergenic regions and gene promoters and correlate with infection-induced differential gene expression.

    Directory of Open Access Journals (Sweden)

    J Stephen Dumler

    2016-09-01

    Full Text Available Anaplasma phagocytophilum, an obligate intracellular prokaryote, infects neutrophils and alters cardinal functions via reprogrammed transcription. Large contiguous regions of neutrophil chromosomes are differentially expressed during infection. Secreted A. phagocytophilum effector AnkA transits into the neutrophil or granulocyte nucleus to complex with DNA in heterochromatin across all chromosomes. AnkA binds to gene promoters to dampen cis-transcription and also has features of matrix attachment region (MAR)-binding proteins that regulate three-dimensional chromatin architecture and coordinate transcriptional programs encoded in topologically-associated chromatin domains. We hypothesize that identification of additional AnkA binding sites will better delineate how A. phagocytophilum infection results in reprogramming of the neutrophil genome. Using AnkA-binding ChIP-seq, we showed that AnkA binds broadly throughout all chromosomes in a reproducible pattern, especially at: (i) intergenic regions predicted to be matrix attachment regions (MARs); (ii) within predicted lamina-associated domains; and (iii) at promoters ≤3,000 bp upstream of transcriptional start sites. These findings provide genome-wide support for AnkA as a regulator of cis-gene transcription. Moreover, the dominant mark of AnkA in distal intergenic regions known to be AT-enriched, coupled with frequent enrichment in the nuclear lamina, provides strong support for its role as a MAR-binding protein and genome re-organizer. AnkA must be considered a prime candidate to promote neutrophil reprogramming and subsequent functional changes that belie improved microbial fitness and pathogenicity.

  20. Integrated analysis of DNA copy number and gene expression microarray data using gene sets

    NARCIS (Netherlands)

    R.X. de Menezes (Renee); M. Boetzer (Marten); M. Sieswerda (Melle); G.J.B. van Ommen; J.M. Boer (Judith)

    2009-01-01

    Background: Genes that play an important role in tumorigenesis are expected to show association between DNA copy number and RNA expression. Optimal power to find such associations can only be achieved if analysing copy number and gene expression jointly. Furthermore, some copy number

  1. Constructing Minimal Spanning Tree Based on Rough Set Theory for Gene Selection

    Directory of Open Access Journals (Sweden)

    Soumen Kumar Pati

    2012-11-01

    Full Text Available Microarray gene dataset often contains high dimensionalities which cause difficulty in clustering and classification. Datasets containing huge number of genes lead to increased complexity and therefore, degradation of dataset handling performance. Often, all the measured features of these high-dimensional datasets are not relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique, based on directed minimal spanning tree and rough set theory is done, for unsupervised learning. The method, firstly, computes a similarity factor between each pair of attributes using indiscernibility relation, a concept of rough set theory. Based on the similarity factors, an attribute similarity set is formed from which a directed weighted graph with vertices as attributes and edge weights as the inverse of the similarity factor is constructed. Then, all possible minimal spanning trees of the graph are generated. From each tree, iteratively, the most important vertex is included in the reduct set and all its out-going edges are removed. The process stops when the edge set is empty, thus producing multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied on several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.

  2. CONSTRUCTING MINIMAL SPANNING TREE BASED ON ROUGH SET THEORY FOR GENE SELECTION

    Directory of Open Access Journals (Sweden)

    Soumen Kumar Pati

    2013-01-01

    Full Text Available Microarray gene dataset often contains high dimensionalities which cause difficulty in clustering and classification. Datasets containing huge number of genes lead to increased complexity and therefore, degradation of dataset handling performance. Often, all the measured features of these high-dimensional datasets are not relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique, based on directed minimal spanning tree and rough set theory is done, for unsupervised learning. The method, firstly, computes a similarity factor between each pair of attributes using indiscernibility relation, a concept of rough set theory. Based on the similarity factors, an attribute similarity set is formed from which a directed weighted graph with vertices as attributes and edge weights as the inverse of the similarity factor is constructed. Then, all possible minimal spanning trees of the graph are generated. From each tree, iteratively, the most important vertex is included in the reduct set and all its out-going edges are removed. The process stops when the edge set is empty, thus producing multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied on several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.

  3. Targeted enrichment of the black cottonwood (Populus trichocarpa) gene space using sequence capture

    Directory of Open Access Journals (Sweden)

    Zhou Lecong

    2012-12-01

    Full Text Available Abstract Background High-throughput re-sequencing is rapidly becoming the method of choice for studies of neutral and adaptive processes in natural populations across taxa. As re-sequencing the genome of large numbers of samples is still cost-prohibitive in many cases, methods for genome complexity reduction have been developed in attempts to capture most ecologically-relevant genetic variation. One of these approaches is sequence capture, in which oligonucleotide baits specific to genomic regions of interest are synthesized and used to retrieve and sequence those regions. Results We used sequence capture to re-sequence most predicted exons, their upstream regulatory regions, as well as numerous random genomic intervals in a panel of 48 genotypes of the angiosperm tree Populus trichocarpa (black cottonwood, or ‘poplar’). A total of 20.76Mb (5%) of the poplar genome was targeted, corresponding to 173,040 baits. With 12 indexed samples run in each of four lanes on an Illumina HiSeq instrument (2x100 paired-end), 86.8% of the bait regions were on average sequenced at a depth ≥10X. Few off-target regions (>250bp away from any bait) were present in the data, but on average ~80bp on either side of the baits were captured and sequenced to an acceptable depth (≥10X) to call heterozygous SNPs. Nucleotide diversity estimates within and adjacent to protein-coding genes were similar to those previously reported in Populus spp., while intergenic regions had higher values consistent with a relaxation of selection. Conclusions Our results illustrate the efficiency and utility of sequence capture for re-sequencing highly heterozygous tree genomes, and suggest design considerations to optimize the use of baits in future studies.

  4. Low enzymatic activity haplotypes of the human catechol-O-methyltransferase gene: enrichment for marker SNPs.

    Directory of Open Access Journals (Sweden)

    Andrea G Nackley

    Full Text Available Catechol-O-methyltransferase (COMT) is an enzyme that plays a key role in the modulation of catechol-dependent functions such as cognition, cardiovascular function, and pain processing. Three common haplotypes of the human COMT gene, divergent in two synonymous and one nonsynonymous (val(158)met) position, designated as low (LPS), average (APS), and high pain sensitive (HPS), are associated with experimental pain sensitivity and risk of developing chronic musculoskeletal pain conditions. APS and HPS haplotypes produce significant functional effects, coding for 3- and 20-fold reductions in COMT enzymatic activity, respectively. In the present study, we investigated whether additional minor single nucleotide polymorphisms (SNPs), accruing in 1 to 5% of the population, situated in the COMT transcript region contribute to haplotype-dependent enzymatic activity. Computer analysis of COMT ESTs showed that one synonymous minor SNP (rs769224) is linked to the APS haplotype and three minor SNPs (two synonymous: rs6267, rs740602 and one nonsynonymous: rs8192488) are linked to the HPS haplotype. Results from in silico and in vitro experiments revealed that inclusion of allelic variants of these minor SNPs in APS or HPS haplotypes did not modify COMT function at the level of mRNA folding, RNA transcription, protein translation, or enzymatic activity. These data suggest that neutral variants are carried with APS and HPS haplotypes, while the high activity LPS haplotype displays less linked variation. Thus, both minor synonymous and nonsynonymous SNPs in the coding region are markers of functional APS and HPS haplotypes rather than independent contributors to COMT activity.

  5. Different gene sets contribute to different symptom dimensions of depression and anxiety.

    Science.gov (United States)

    van Veen, Tineke; Goeman, Jelle J; Monajemi, Ramin; Wardenaar, Klaas J; Hartman, Catharina A; Snieder, Harold; Nolte, Ilja M; Penninx, Brenda W J H; Zitman, Frans G

    2012-07-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom dimensions provide a more homogeneous phenotype. Furthermore, as effects of individual genes are small, analysis of genetic data at the pathway level provides more power to detect associations and yields valuable biological insight. In 1,398 individuals with a Major Depressive Disorder, the symptom dimensions of the tripartite model of anxiety and depression, General Distress, Anhedonic Depression, and Anxious Arousal, were measured with the Mood and Anxiety Symptoms Questionnaire (30-item Dutch adaptation; MASQ-D30). Association of these symptom dimensions with candidate gene sets and gene sets from two public pathway databases was tested using the Global test. One pathway was associated with General Distress, and concerned molecules expressed in the endoplasmic reticulum lumen. Seven pathways were associated with Anhedonic Depression. Important themes were neurodevelopment, neurodegeneration, and cytoskeleton. Furthermore, three gene sets associated with Anxious Arousal regarded development, morphology, and genetic recombination. The individual pathways explained up to 1.7% of the variance. These data demonstrate mechanisms that influence the specific dimensions. Moreover, they show the value of using dimensional phenotypes on one hand and gene sets on the other hand.

  6. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Directory of Open Access Journals (Sweden)

    Nilotpal Chowdhury

    Full Text Available Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate, adjusted for expression of cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations.
This methodology seems to yield new and
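
    The step described in record 6, ranking per-gene Cox coefficients and submitting the ranking to GSEA, can be illustrated with a minimal sketch. It assumes an unweighted Kolmogorov-Smirnov-style running sum; the standard GSEA implementation additionally weights hits by the ranking metric and estimates significance by permutation, neither of which is shown here. The function name and the toy gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (a simplification of GSEA).

    ranked_genes: genes ordered by a ranking metric (e.g. a Cox coefficient),
                  most positive first.
    gene_set:     collection of genes to test.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; both walks sum to zero.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking: genes "a".."h", ordered from most positive to most negative metric.
ranking = list("abcdefgh")
top_heavy = enrichment_score(ranking, {"a", "b"})     # set sits at the top
bottom_heavy = enrichment_score(ranking, {"g", "h"})  # set sits at the bottom
```

    A positive score indicates that the set is concentrated among genes with high coefficients, a negative score among genes with low coefficients.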

  7. Constructing Minimal Spanning Tree Based on Rough Set Theory for Gene Selection

    Directory of Open Access Journals (Sweden)

    Soumen Kumar Pati

    2013-02-01

    Full Text Available Microarray gene dataset often contains high dimensionalities which cause difficulty in clustering and classification. Datasets containing huge number of genes lead to increased complexity and therefore, degradation of dataset handling performance. Often, all the measured features of these high-dimensional datasets are not relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique, based on directed minimal spanning tree and rough set theory is done, for unsupervised learning. The method, firstly, computes a similarity factor between each pair of attributes using indiscernibility relation, a concept of rough set theory. Based on the similarity factors, an attribute similarity set is formed from which a directed weighted graph with vertices as attributes and edge weights as the inverse of the similarity factor is constructed. Then, all possible minimal spanning trees of the graph are generated. From each tree, iteratively, the most important vertex is included in the reduct set and all its out-going edges are removed. The process stops when the edge set is empty, thus producing multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied on several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.

  8. Global adaptive rank truncated product method for gene-set analysis in association studies.

    Science.gov (United States)

    Vilor-Tejedor, Natalia; Calle, M Luz

    2014-09-01

    Gene set analysis (GSA) aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. We present a new implementation of the Adaptive Rank Truncated Product method (ARTP) for analyzing the association of a set of Single Nucleotide Polymorphisms (SNPs) in a gene or pathway. The new implementation, referred to as globalARTP, improves the original one by allowing the different SNPs in the set to have different modes of inheritance. We perform a simulation study for exploring the power of the proposed methodology in a set of scenarios with different numbers of causal SNPs with different effect sizes. Moreover, we show the advantage of using the gene set approach in the context of an Alzheimer's disease case-control study where we explore the endocytosis pathway. The new method is implemented in the R function globalARTP of the globalGSA package available at http://cran.r-project.org. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
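
    The core of the Adaptive Rank Truncated Product idea mentioned in record 8 can be sketched as follows. This is an illustrative outline under stated assumptions, not the globalARTP implementation: it assumes the per-SNP p-values have already been computed on the observed phenotype and on each permuted phenotype, it estimates significance from ranks among those rows, and it does not model the per-SNP modes of inheritance that globalARTP adds. The function name and inputs are hypothetical.

```python
import math
import random


def artp_p_value(p_rows, truncation_points):
    """Adaptive rank truncated product p-value (illustrative sketch).

    p_rows[0]  : per-SNP p-values on the observed phenotype.
    p_rows[1:] : per-SNP p-values recomputed on permuted phenotypes.
    truncation_points: candidate numbers K of smallest p-values to combine.
    """
    n_rows = len(p_rows)

    # Statistic per row and truncation point: sum of the K smallest log
    # p-values, i.e. the log of the rank truncated product.
    stats = []
    for row in p_rows:
        logs = sorted(math.log(p) for p in row)
        stats.append([sum(logs[:k]) for k in truncation_points])

    # Estimate a p-value for each statistic from its rank among all rows,
    # then keep the best (smallest) one across truncation points.
    min_p = []
    for b in range(n_rows):
        per_point = [
            sum(stats[other][j] <= stats[b][j] for other in range(n_rows)) / n_rows
            for j in range(len(truncation_points))
        ]
        min_p.append(min(per_point))

    # Correct for having optimised over truncation points by comparing the
    # observed minimum with the minima obtained from the permuted rows.
    exceed = sum(m <= min_p[0] for m in min_p[1:])
    return (exceed + 1) / n_rows


# Toy data: five SNPs with a strong observed signal and 99 null permutations.
rng = random.Random(0)
observed = [1e-6] * 5
permuted = [[rng.uniform(0.01, 1.0) for _ in range(5)] for _ in range(99)]
strong_signal_p = artp_p_value([observed] + permuted, truncation_points=[1, 3, 5])
```

    With B permutations the smallest attainable p-value is 1/(B+1), so the number of permutations bounds the resolution of the result.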

  9. Genome-wide association of bipolar disorder suggests an enrichment of replicable associations in regions near genes.

    Directory of Open Access Journals (Sweden)

    Erin N Smith

    2011-06-01

    Full Text Available Although a highly heritable and disabling disease, bipolar disorder's (BD) genetic variants have been challenging to identify. We present new genotype data for 1,190 cases and 401 controls and perform a genome-wide association study including additional samples for a total of 2,191 cases and 1,434 controls. We do not detect genome-wide significant associations for individual loci; however, across all SNPs, we show an association between the power to detect effects calculated from a previous genome-wide association study and evidence for replication (P = 1.5×10^-7). To demonstrate that this result is not likely to be a false positive, we analyze replication rates in a large meta-analysis of height and show that, in a large enough study, associations replicate as a function of power, approaching a linear relationship. Within BD, SNPs near exons exhibit a greater probability of replication, supporting an enrichment of reproducible associations near functional regions of genes. These results indicate that there is likely common genetic variation associated with BD near exons (±10 kb) that could be identified in larger studies and, further, provide a framework for assessing the potential for replication when combining results from multiple studies.

  10. Genome-wide association of bipolar disorder suggests an enrichment of replicable associations in regions near genes.

    Directory of Open Access Journals (Sweden)

    Erin N Smith

    2011-06-01

    Full Text Available Although a highly heritable and disabling disease, bipolar disorder's (BD) genetic variants have been challenging to identify. We present new genotype data for 1,190 cases and 401 controls and perform a genome-wide association study including additional samples for a total of 2,191 cases and 1,434 controls. We do not detect genome-wide significant associations for individual loci; however, across all SNPs, we show an association between the power to detect effects calculated from a previous genome-wide association study and evidence for replication (P = 1.5×10^-7). To demonstrate that this result is not likely to be a false positive, we analyze replication rates in a large meta-analysis of height and show that, in a large enough study, associations replicate as a function of power, approaching a linear relationship. Within BD, SNPs near exons exhibit a greater probability of replication, supporting an enrichment of reproducible associations near functional regions of genes. These results indicate that there is likely common genetic variation associated with BD near exons (±10 kb) that could be identified in larger studies and, further, provide a framework for assessing the potential for replication when combining results from multiple studies.

  11. Genome-Wide Association Studies Suggest Limited Immune Gene Enrichment in Schizophrenia Compared to 5 Autoimmune Diseases.

    Science.gov (United States)

    Pouget, Jennie G; Gonçalves, Vanessa F; Spain, Sarah L; Finucane, Hilary K; Raychaudhuri, Soumya; Kennedy, James L; Knight, Jo

    2016-09-01

    There has been intense debate over the immunological basis of schizophrenia, and the potential utility of adjunct immunotherapies. The major histocompatibility complex is consistently the most powerful region of association in genome-wide association studies (GWASs) of schizophrenia and has been interpreted as strong genetic evidence supporting the immune hypothesis. However, global pathway analyses provide inconsistent evidence of immune involvement in schizophrenia, and it remains unclear whether genetic data support an immune etiology per se. Here we empirically test the hypothesis that variation in immune genes contributes to schizophrenia. We show that there is no enrichment of immune loci outside of the MHC region in the largest genetic study of schizophrenia conducted to date, in contrast to 5 diseases of known immune origin. Among 108 regions of the genome previously associated with schizophrenia, we identify 6 immune candidates (DPP4, HSPD1, EGR1, CLU, ESAM, NFATC3) encoding proteins with alternative, nonimmune roles in the brain. While our findings do not refute evidence that has accumulated in support of the immune hypothesis, they suggest that genetically mediated alterations in immune function may not play a major role in schizophrenia susceptibility. Instead, there may be a role for pleiotropic effects of a small number of immune genes that also regulate brain development and plasticity. Whether immune alterations drive schizophrenia progression is an important question to be addressed by future research, especially in light of the growing interest in applying immunotherapies in schizophrenia. © The Author 2016. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center.

  12. Network-based functional enrichment

    Directory of Open Access Journals (Sweden)

    Poirel Christopher L

    2011-11-01

    Full Text Available Abstract Background Many methods have been developed to infer and reason about molecular interaction networks. These approaches often yield networks with hundreds or thousands of nodes and up to an order of magnitude more edges. It is often desirable to summarize the biological information in such networks. A very common approach is to use gene function enrichment analysis for this task. A major drawback of this method is that it ignores information about the edges in the network being analyzed, i.e., it treats the network simply as a set of genes. In this paper, we introduce a novel method for functional enrichment that explicitly takes network interactions into account. Results Our approach naturally generalizes Fisher’s exact test, a gene set-based technique. Given a function of interest, we compute the subgraph of the network induced by genes annotated to this function. We use the sequence of sizes of the connected components of this sub-network to estimate its connectivity. We estimate the statistical significance of the connectivity empirically by a permutation test. We present three applications of our method: (i) determine which functions are enriched in a given network, (ii) given a network and an interesting sub-network of genes within that network, determine which functions are enriched in the sub-network, and (iii) given two networks, determine the functions for which the connectivity improves when we merge the second network into the first. Through these applications, we show that our approach is a natural alternative to network clustering algorithms. Conclusions We presented a novel approach to functional enrichment that takes into account the pairwise relationships among genes annotated by a particular function. Each of the three applications discovers highly relevant functions. We used our methods to study biological data from three different organisms. Our results demonstrate the wide applicability of our methods.
Our algorithms are
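
    The procedure outlined in record 12 (induce the subgraph on genes annotated to a function, summarize its connected-component sizes, and assess significance by permutation) can be sketched as below. The abstract does not state which connectivity statistic the authors derive from the component sizes; this sketch assumes the size of the largest component and a null model that samples random gene sets of the same size, both of which are illustrative choices. All function names and the toy network are hypothetical.

```python
import random
from collections import defaultdict


def component_sizes(edges, nodes):
    """Connected-component sizes of the subgraph induced by `nodes`, largest first."""
    nodes = set(nodes)
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:  # keep only edges inside the induced subgraph
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def connectivity_p_value(edges, all_genes, annotated, n_perm=1000, seed=0):
    """Permutation p-value for the largest induced component (assumed statistic)."""
    rng = random.Random(seed)
    universe = sorted(all_genes)
    set_size = len(set(annotated))
    observed = component_sizes(edges, annotated)[0]

    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, set_size)  # random gene set of equal size
        if component_sizes(edges, sample)[0] >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy network: a path a-b-c, an edge d-e, and three isolated genes.
toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
toy_genes = list("abcdefgh")
sizes = component_sizes(toy_edges, ["a", "b", "c", "f"])
largest, p_value = connectivity_p_value(toy_edges, toy_genes, ["a", "b", "c"], n_perm=200)
```

    Sampling gene sets uniformly ignores node degree; a degree-preserving null model would be a stricter alternative.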

  13. Dissection of the oncogenic MYCN transcriptional network reveals a large set of clinically relevant cell cycle genes as drivers of neuroblastoma tumorigenesis.

    Science.gov (United States)

    Murphy, Derek M; Buckley, Patrick G; Bryan, Kenneth; Watters, Karen M; Koster, Jan; van Sluis, Peter; Molenaar, Jan; Versteeg, Rogier; Stallings, Raymond L

    2011-06-01

    Amplification of the oncogenic transcription factor MYCN plays a major role in the pathogenesis of several pediatric cancers, including neuroblastoma, medulloblastoma, and rhabdomyosarcoma. For neuroblastoma, MYCN amplification is the most powerful genetic predictor of poor patient survival, yet the mechanism by which MYCN drives tumorigenesis is only partially understood. To gain an insight into the distribution of MYCN binding and to identify clinically relevant MYCN target genes, we performed an integrated analysis of MYCN ChIP-chip and mRNA expression using the MYCN repressible SHEP-21N neuroblastoma cell line. We hypothesized that genes exclusively MYCN bound in SHEP-21N cells over-expressing MYCN would be enriched for direct targets which contribute to the process of disease progression. Integrated analysis revealed that MYCN drives tumorigenesis predominantly as a positive regulator of target gene transcription. A high proportion of genes (24%) that are MYCN bound and up-regulated in the SHEP-21N model are significantly associated with poor overall patient survival (OS) in a set of 88 tumors. In contrast, the proportion of genes down-regulated when bound by MYCN in the SHEP-21N model and which are significantly associated with poor overall patient survival when under-expressed in primary tumors was significantly lower (5%). Gene ontology analysis determined a highly statistically significant enrichment for cell cycle related genes within the over-expressed MYCN target group which were also associated with poor OS. We conclude that the over-expression of MYCN leads to aberrant binding and over-expression of genes associated with cell cycle regulation which are significantly correlated with poor OS and MYCN amplification.

  14. MoSET1 (Histone H3K4 Methyltransferase) in Magnaporthe oryzae Regulates Global Gene Expression during Infection-Related Morphogenesis.

    Directory of Open Access Journals (Sweden)

    Kieu Thi Minh Pham

    2015-07-01

    Full Text Available Here we report the genetic analyses of histone lysine methyltransferase (KMT) genes in the phytopathogenic fungus Magnaporthe oryzae. Eight putative M. oryzae KMT genes were targeted for gene disruption by homologous recombination. Phenotypic assays revealed that the eight KMTs were involved in various infection processes to varying degrees. Moset1 disruptants (Δmoset1) impaired in histone H3 lysine 4 methylation (H3K4me) showed the most severe defects in infection-related morphogenesis, including conidiation and appressorium formation. Consequently, Δmoset1 lost pathogenicity on wheat host plants, thus indicating that H3K4me is an important epigenetic mark for infection-related gene expression in M. oryzae. Interestingly, appressorium formation was greatly restored in the Δmoset1 mutants by exogenous addition of cAMP or of the cutin monomer, 16-hydroxypalmitic acid. The Δmoset1 mutants were still infectious on the super-susceptible barley cultivar Nigrate. These results suggested that MoSET1 plays roles in various aspects of infection, including signal perception and overcoming host-specific resistance. However, since Δmoset1 was also impaired in vegetative growth, the impact of MoSET1 on gene regulation was not infection specific. ChIP-seq analysis of H3K4 di- and tri-methylation (H3K4me2/me3) and MoSET1 protein during infection-related morphogenesis, together with RNA-seq analysis of the Δmoset1 mutant, led to the following conclusions: (1) Approximately 5% of M. oryzae genes showed significant changes in H3K4-me2 or -me3 abundance during infection-related morphogenesis. (2) In general, H3K4-me2 and -me3 abundance was positively associated with active transcription. (3) Lack of MoSET1 methyltransferase, however, resulted in up-regulation of a significant portion of the M. oryzae genes in the vegetative mycelia (1,491 genes) and during infection-related morphogenesis (1,385 genes), indicating that MoSET1 has a role in gene repression either

  15. Glutamatergic and GABAergic gene sets in attention-deficit/hyperactivity disorder

    DEFF Research Database (Denmark)

    Naaijen, J; Bralten, J; Poelmans, G

    2017-01-01

    Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are main excitatory and inhibitory neurotransmitters in the brain; their balance...... within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analysis for ADHD symptom severity, divided into inattention and hyperactivity/impulsivity symptoms......, autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome...

  16. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets.

    Science.gov (United States)

    Khan, Aziz; Mathelier, Anthony

    2017-05-31

    A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited. To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets. Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene , with the web application available at https://asntech.shinyapps.io/intervene .
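
    The pairwise comparison of gene lists described in this abstract can be illustrated with plain Python sets. This is a minimal sketch of the general idea only; it does not use or reproduce Intervene's own interface, and the function name and the choice of the Jaccard index as the similarity measure are assumptions for illustration.

```python
from itertools import combinations


def pairwise_overlaps(named_sets):
    """Return {(name_a, name_b): (shared_count, jaccard_index)} for every pair of sets."""
    results = {}
    for (name_a, a), (name_b, b) in combinations(named_sets.items(), 2):
        a, b = set(a), set(b)
        shared = a & b
        union = a | b
        # Jaccard index: shared items divided by all distinct items in the pair.
        jaccard = len(shared) / len(union) if union else 0.0
        results[(name_a, name_b)] = (len(shared), jaccard)
    return results


# Example with three hypothetical gene lists.
gene_lists = {
    "A": ["g1", "g2", "g3"],
    "B": ["g2", "g3", "g4"],
    "C": ["g5"],
}
overlaps = pairwise_overlaps(gene_lists)
```

    The resulting pairwise values are the kind of matrix that could then be clustered and drawn as a heat map.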

  17. Investigating the effect of paralogs on microarray gene-set analysis

    LENUS (Irish Health Repository)

    Faure, Andre J

    2011-01-24

    Abstract Background In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. Results We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. Conclusions The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.
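
    Reducing a gene list so that no two retained genes are paralogs above a similarity threshold can be sketched with a simple greedy filter. This is an illustrative assumption about how such a reduction might be done, not a description of Indygene's actual algorithm; the function name, the similarity lookup format, and the use of "at or above the threshold" as the cut-off are all hypothetical.

```python
def remove_paralogs(genes, similarity, threshold):
    """Greedily keep genes that are not paralogous (>= threshold) to any gene already kept.

    `similarity` maps frozenset({gene_a, gene_b}) to a sequence similarity in [0, 1];
    pairs missing from the mapping are treated as unrelated (similarity 0).
    """
    kept = []
    for gene in genes:
        if all(similarity.get(frozenset((gene, other)), 0.0) < threshold for other in kept):
            kept.append(gene)
    return kept


# Hypothetical input: HBA1 and HBA2 are near-identical paralogs.
genes = ["HBA1", "HBA2", "TP53"]
similarity = {frozenset(("HBA1", "HBA2")): 0.98}
reduced = remove_paralogs(genes, similarity, threshold=0.9)
```

    A greedy pass depends on the input order, so which member of a paralogous pair is kept is decided by its position in the list.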

  18. Application of BiNGO and DAVID in Biological Enrichment Analysis of miR-155 Target Genes

    Institute of Scientific and Technical Information of China (English)

    杨蓉; 蔡琳

    2012-01-01

    Objective To explore and compare the use of DAVID (Database for Annotation, Visualization and Integrated Discovery) and BiNGO (Biological Networks Gene Ontology tool) in Gene Ontology (GO) and biological pathway enrichment analysis, through bioinformatics analysis of predicted miR-155 target genes, in order to provide theoretical guidance for the experimental validation of miR-155 target genes and the study of their biological functions. Methods The miR-155 target genes predicted by TargetScan 6.0 were used as the gene set for analysis. GO enrichment analysis and biological pathway enrichment analysis of this gene set were performed with BiNGO and DAVID, and the enrichment results of the two programs were compared. Results The predicted miR-155 target gene set was enriched in molecular functions such as transcription regulator activity and protein kinase activity, and in biological processes such as metabolic regulation, transcriptional regulation, macromolecule biosynthesis, regulation of gene expression and signal transduction (P<0.01). Further analysis showed that, in the KEGG pathway database, the gene set was significantly enriched in 7 signaling pathways, including the T cell receptor, B cell receptor, MAPK and ErbB signaling pathways, and in 7 disease pathways, including colorectal cancer and acute and chronic myeloid leukemia (P<0.05). Conclusion BiNGO and DAVID each have their own advantages in the enrichment analysis of miRNA target genes. The results of the two programs can be combined to describe miRNA target gene sets biologically and to provide bioinformatics guidance for further functional studies.
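
    Enrichment P values of the kind reported above are commonly obtained from an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test using only the Python standard library; it is a generic illustration, not the exact procedure run by either program, and the function and parameter names are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement from a
    population of `population` genes, `annotated` of which carry the term of interest."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probabilities of observing `overlap` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in total, 3 annotated, 3 selected, all 3 selected are annotated.
p_toy = hypergeom_enrichment_p(population=10, annotated=3, selected=3, overlap=3)
```

    In practice the resulting P values for many terms would also need a multiple-testing correction.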

  19. Accurate Gene Expression-Based Biodosimetry Using a Minimal Set of Human Gene Transcripts

    Energy Technology Data Exchange (ETDEWEB)

    Tucker, James D., E-mail: jtucker@biology.biosci.wayne.edu [Department of Biological Sciences, Wayne State University, Detroit, Michigan (United States); Joiner, Michael C. [Department of Radiation Oncology, Wayne State University, Detroit, Michigan (United States); Thomas, Robert A.; Grever, William E.; Bakhmutsky, Marina V. [Department of Biological Sciences, Wayne State University, Detroit, Michigan (United States); Chinkhota, Chantelle N.; Smolinski, Joseph M. [Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan (United States); Divine, George W. [Department of Public Health Sciences, Henry Ford Hospital, Detroit, Michigan (United States); Auner, Gregory W. [Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan (United States)

    2014-03-15

    Purpose: Rapid and reliable methods for conducting biological dosimetry are a necessity in the event of a large-scale nuclear event. Conventional biodosimetry methods lack the speed, portability, ease of use, and low cost required for triaging numerous victims. Here we address this need by showing that polymerase chain reaction (PCR) on a small number of gene transcripts can provide accurate and rapid dosimetry. The low cost and relative ease of PCR compared with existing dosimetry methods suggest that this approach may be useful in mass-casualty triage situations. Methods and Materials: Human peripheral blood from 60 adult donors was acutely exposed to cobalt-60 gamma rays at doses of 0 (control) to 10 Gy. mRNA expression levels of 121 selected genes were obtained 0.5, 1, and 2 days after exposure by reverse-transcriptase real-time PCR. Optimal dosimetry at each time point was obtained by stepwise regression of dose received against individual gene transcript expression levels. Results: Only 3 to 4 different gene transcripts, ASTN2, CDKN1A, GDF15, and ATM, are needed to explain ≥0.87 of the variance (R²). Receiver-operator characteristics, a measure of sensitivity and specificity, of 0.98 for these statistical models were achieved at each time point. Conclusions: The actual and predicted radiation doses agree very closely up to 6 Gy. Dosimetry at 8 and 10 Gy shows some effect of saturation, thereby slightly diminishing the ability to quantify higher exposures. Analyses of these gene transcripts may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations or in clinical radiation emergencies.

  20. Gene regulatory network inference using fused LASSO on multiple data sets.

    Science.gov (United States)

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M O; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-02-11

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions.
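
    For orientation, a generic fused LASSO objective over several data sets can be written as below. This is a textbook-style form given under stated assumptions, not necessarily the authors' exact formulation: the symbols $y^{(k)}$ (expression of the target gene in data set $k$), $X^{(k)}$ (expression of candidate transcription factor genes), $\beta^{(k)}$ (regulatory coefficients), $K$, $\lambda_1$ and $\lambda_2$ are all introduced here for illustration.

```latex
\min_{\beta^{(1)},\dots,\beta^{(K)}}
  \sum_{k=1}^{K} \bigl\lVert y^{(k)} - X^{(k)} \beta^{(k)} \bigr\rVert_2^2
  \;+\; \lambda_1 \sum_{k=1}^{K} \bigl\lVert \beta^{(k)} \bigr\rVert_1
  \;+\; \lambda_2 \sum_{1 \le k < l \le K} \bigl\lVert \beta^{(k)} - \beta^{(l)} \bigr\rVert_1
```

    In this sketch the $\lambda_1$ term encourages each gene to be explained by few regulators (constraint 1) and the $\lambda_2$ term encourages similar networks across data sets (constraint 2); constraint 3 is not represented and would require an additional term.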

  1. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils

    Science.gov (United States)

    Hannula, S. Emilia; van Veen, Johannes A.

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter, and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose. PMID:27965632

  2. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils.

    Science.gov (United States)

    Hannula, S Emilia; van Veen, Johannes A

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter, and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose.

  3. Primer sets developed for fungal functional genes reveal shifts in functionality of fungal community in soils

    Directory of Open Access Journals (Sweden)

    Emilia Silja Hannula

    2016-11-01

    Full Text Available Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose.

  4. Gene and protein analysis of brain derived neurotrophic factor expression in relation to neurological recovery induced by an enriched environment in a rat stroke model.

    Science.gov (United States)

    Hirata, Kenji; Kuge, Yuji; Yokota, Chiaki; Harada, Akina; Kokame, Koichi; Inoue, Hiroyasu; Kawashima, Hidekazu; Hanzawa, Hiroko; Shono, Yuji; Saji, Hideo; Minematsu, Kazuo; Tamaki, Nagara

    2011-05-20

    Although an enriched environment enhances functional recovery after ischemic stroke, the mechanism underlying this effect remains unclear. We previously reported that brain derived neurotrophic factor (BDNF) gene expression decreased in rats housed in an enriched environment for 4 weeks compared to those housed in a standard cage for the same period. To further clarify the relationship between the decrease in BDNF and functional recovery, we investigated the effects of differential 2-week housing conditions on the mRNA of BDNF and protein levels of proBDNF and mature BDNF (matBDNF). After transient occlusion of the right middle cerebral artery of male Sprague-Dawley rats, we divided the rats into two groups: (1) an enriched group housed multiply in large cages equipped with toys, and (2) a standard group housed alone in small cages without toys. Behavioral tests before and after 2-week differential housing showed better neurological recovery in the enriched group than in the standard group. Synaptophysin immunostaining demonstrated that the density of synapses in the peri-infarct area was increased in the enriched group compared to the standard group, while infarct volumes were not significantly different. Real-time reverse transcription polymerase chain reaction, Western blotting and immunostaining all revealed no significant difference between the groups. The present results suggest that functional recovery cannot be ascribed to an increase in matBDNF or a decrease in proBDNF but rather to other underlying mechanisms.

  5. Transcriptome analysis of cortical tissue reveals shared sets of downregulated genes in autism and schizophrenia

    Science.gov (United States)

    Ellis, S E; Panitch, R; West, A B; Arking, D E

    2016-01-01

    Autism (AUT), schizophrenia (SCZ) and bipolar disorder (BPD) are three highly heritable neuropsychiatric conditions. Clinical similarities and genetic overlap between the three disorders have been reported; however, the causes and the downstream effects of this overlap remain elusive. By analyzing transcriptomic RNA-sequencing data generated from post-mortem cortical brain tissues from AUT, SCZ, BPD and control subjects, we have begun to characterize the extent of gene expression overlap between these disorders. We report that the AUT and SCZ transcriptomes are significantly correlated (P<0.001), whereas the other two cross-disorder comparisons (AUT–BPD and SCZ–BPD) are not. Among AUT and SCZ, we find that the genes differentially expressed across disorders are involved in neurotransmission and synapse regulation. Despite the lack of global transcriptomic overlap across all three disorders, we highlight two genes, IQSEC3 and COPS7A, which are significantly downregulated compared with controls across all three disorders, suggesting either shared etiology or compensatory changes across these neuropsychiatric conditions. Finally, we tested for enrichment of genes differentially expressed across disorders in genetic association signals in AUT, SCZ or BPD, reporting lack of signal in any of the previously published genome-wide association study (GWAS). Together, these studies highlight the importance of examining gene expression from the primary tissue involved in neuropsychiatric conditions—the cortical brain. We identify a shared role for altered neurotransmission and synapse regulation in AUT and SCZ, in addition to two genes that may more generally contribute to neurodevelopmental and neuropsychiatric conditions. PMID:27219343

  6. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    Directory of Open Access Journals (Sweden)

    Hettne Kristina M

    2013-01-01

    Full Text Available Abstract Background Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values ... Results Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.

  7. Immunity related genes in dipterans share common enrichment of AT-rich motifs in their 5' regulatory regions that are potentially involved in nucleosome formation

    Directory of Open Access Journals (Sweden)

    Rodriguez Mario H

    2008-07-01

    Full Text Available Abstract Background Understanding the transcriptional regulation mechanisms in response to environmental challenges is of fundamental importance in biology. Transcription factors associated with response elements and chromatin structure have been shown to play important roles in gene expression regulation. We have analyzed promoter regions of dipteran genes induced in response to immune challenge, in search of particular sequence patterns involved in their transcriptional regulation. Results 5' upstream regions of D. melanogaster and A. gambiae immunity-induced genes and their corresponding orthologous genes in 11 non-melanogaster drosophilid species and Ae. aegypti share enrichment in AT-rich short motifs. AT-rich motifs are associated with nucleosome formation as predicted by two different algorithms. In A. gambiae and D. melanogaster, many immunity genes' 5' upstream sequences also showed NFκB response elements, located within 500 bp from the transcription start site. In A. gambiae, the frequency of the ATAA motif near the NFκB response elements was increased, suggesting a functional link between nucleosome formation/remodelling and NFκB regulation of transcription. Conclusion AT-rich motif enrichment in 5' upstream sequences in A. gambiae, Ae. aegypti and the Drosophila genus immunity genes suggests a particular pattern of nucleosome formation/chromatin organization. The co-occurrence of such motifs with the NFκB response elements suggests that these sequence signatures may be functionally involved in transcriptional activation during dipteran immune response. AT-rich motif enrichment in regulatory regions in this group of co-regulated genes could represent an evolutionarily constrained signature in dipterans and perhaps other distantly related species.

  8. A small set of extra-embryonic genes defines a new landmark for bovine embryo staging.

    Science.gov (United States)

    Degrelle, Séverine A; Lê Cao, Kim-Anh; Heyman, Yvan; Everts, Robin E; Campion, Evelyne; Richard, Christophe; Ducroix-Crépy, Céline; Tian, X Cindy; Lewin, Harris A; Renard, Jean-Paul; Robert-Granié, Christèle; Hue, Isabelle

    2011-01-01

    Axis specification in mouse is determined by a sequence of reciprocal interactions between embryonic and extra-embryonic tissues so that a few extra-embryonic genes appear as 'patterning' the embryo. Considering these interactions as essential, but lacking in most mammals the genetically driven approaches used in mouse and the corresponding patterning mutants, we examined whether a molecular signature originating from extra-embryonic tissues could relate to the developmental stage of the embryo proper and predict it. To this end, we have profiled bovine extra-embryonic tissues at peri-implantation stages, when gastrulation and early neurulation occur, and analysed the subsequent expression profiles through the use of predictive methods as previously reported for tumour classification. A set of six genes (CALM1, CPA3, CITED1, DLD, HNRNPDL, and TGFB3), half of which had not been previously associated with any extra-embryonic feature, appeared significantly discriminative and mainly dependent on embryonic tissues for its faithful expression. The predictive value of this set of genes for gastrulation and early neurulation stages, as assessed on naive samples, was remarkably high (93%). In silico connected to the bovine orthologues of the mouse patterning genes, this gene set is proposed as a new trait for embryo staging. As such, this will allow saving the bovine embryo proper for molecular or cellular studies. To us, it offers as well new perspectives for developmental phenotyping and modelling of embryonic/extra-embryonic co-differentiation.

  9. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Science.gov (United States)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  10. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    K.M. Hettne (Kristina); J. Boorsma (Jeffrey); D.A.M. van Dartel (Dorien A M); J.J. Goeman (Jelle); E.C. de Jong (Esther); A.H. Piersma (Aldert); R.H. Stierum (Rob); J. Kleinjans (Jos); J.A. Kors (Jan)

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with

  11. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    K.M. Hettne (Kristina); J. Boorsma (Jeffrey); D.A.M. van Dartel (Dorien A M); J.J. Goeman (Jelle); E.C. de Jong (Esther); A.H. Piersma (Aldert); R.H. Stierum (Rob); J. Kleinjans (Jos); J.A. Kors (Jan)

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with g

  12. Core gene set as the basis of multilocus sequence analysis of the subclass Actinobacteridae.

    Directory of Open Access Journals (Sweden)

    Toïdi Adékambi

    Full Text Available Comparative genomic sequencing is shedding new light on bacterial identification, taxonomy and phylogeny. An in silico assessment of a core gene set necessary for cellular functioning was made to determine a consensus set of genes that would be useful for the identification, taxonomy and phylogeny of the species belonging to the subclass Actinobacteridae, which contained two orders, Actinomycetales and Bifidobacteriales. The subclass Actinobacteridae comprised about 85% of the actinobacteria families. The following recommended criteria were used to establish a comprehensive gene set; the gene should (i) be long enough to contain phylogenetically useful information, (ii) not be subject to horizontal gene transfer, (iii) be a single copy, (iv) have at least two regions sufficiently conserved that allow the design of amplification and sequencing primers and (v) predict whole-genome relationships. We applied these constraints to 50 different Actinobacteridae genomes and made 1,224 pairwise comparisons of the genome conserved regions and gene fragments obtained by using the Sequence VARiability Analysis Program (SVARAP), which allows the primers to be designed. Following a comparative statistical modeling phase, 3 gene fragments were selected, ychF, rpoB, and secY, with R²>0.85. Selected sets of broad range primers were tested from the 3 gene fragments and were demonstrated to be useful for amplification and sequencing of 25 species belonging to 9 genera of Actinobacteridae. The intraspecies similarities were 96.3-100% for ychF, 97.8-100% for rpoB and 96.9-100% for secY among 73 strains belonging to 15 species of the subclass Actinobacteridae, compared to 99.4-100% for 16S rRNA. The phylogenetic topology obtained from the combined datasets ychF+rpoB+secY was globally similar to that inferred from the 16S rRNA but with higher confidence. It was concluded that multi-locus sequence analysis using a core gene set might represent the first consensus and valid approach for

  13. General approach for in vivo recovery of cell type-specific effector gene sets.

    Science.gov (United States)

    Barsi, Julius C; Tu, Qiang; Davidson, Eric H

    2014-05-01

    Differentially expressed, cell type-specific effector gene sets hold the key to multiple important problems in biology, from theoretical aspects of developmental gene regulatory networks (GRNs) to various practical applications. Although individual cell types of interest have been recovered by various methods and analyzed, systematic recovery of multiple cell type-specific gene sets from whole developing organisms has remained problematic. Here we describe a general methodology using the sea urchin embryo, a material of choice because of the large-scale GRNs already solved for this model system. This method utilizes the regulatory states expressed by given cells of the embryo to define cell type and includes a fluorescence activated cell sorting (FACS) procedure that results in no perturbation of transcript representation. We have extensively validated the method by spatial and qualitative analyses of the transcriptome expressed in isolated embryonic skeletogenic cells and as a consequence, generated a prototypical cell type-specific transcriptome database.

  14. Chronic vitamin A-enriched diet feeding regulates hypercholesterolaemia through transcriptional regulation of reverse cholesterol transport pathway genes in obese rat model of WNIN/GR-Ob strain

    Directory of Open Access Journals (Sweden)

    Shanmugam M Jeyakumar

    2016-01-01

    Background & objectives: Hepatic scavenger receptor class B1 (SR-B1), a high-density lipoprotein (HDL) receptor, is involved in the selective uptake of HDL-associated esterified cholesterol (EC), thereby regulating cholesterol homoeostasis and improving reverse cholesterol transport. Previously, we reported in euglycaemic obese rats (WNIN/Ob strain) that feeding of a vitamin A-enriched diet normalized hypercholesterolaemia, possibly through a hepatic SR-B1-mediated pathway. This study aimed to test whether it would be possible to normalize hypercholesterolaemia in a glucose-intolerant obese rat model (WNIN/GR-Ob) through a similar mechanism by feeding an identical vitamin A-enriched diet. Methods: In this study, 30 wk old male lean and obese rats of the WNIN/GR-Ob strain were divided into two groups and received either stock diet or vitamin A-enriched diet (2.6 mg or 129 mg vitamin A/kg diet) for 14 wk. Blood and other tissues were collected for various biochemical analyses. Results: Chronic vitamin A-enriched diet feeding decreased hypercholesterolaemia and normalized abnormally elevated plasma HDL-cholesterol (HDL-C) levels in obese rats as compared to stock diet-fed obese groups. Further, decreased free cholesterol (FC) and increased esterified cholesterol (EC) contents of plasma cholesterol were observed, which were reflected in a higher EC to FC ratio in vitamin A-enriched diet-fed obese rats. However, neither lecithin-cholesterol acyltransferase (LCAT) activity of plasma nor its expression (both gene and protein) in the liver was altered. On the contrary, hepatic cholesterol levels significantly increased in vitamin A-enriched diet-fed obese rats. Hepatic SR-B1 expression (both mRNA and protein) remained unaltered among groups. Vitamin A-enriched diet-fed obese rats showed a significant increase in hepatic low-density lipoprotein receptor mRNA levels, while the expression of genes involved in HDL synthesis, namely, ATP-binding cassette protein 1 (ABCA1 and

  15. Comprehensive set of integrative plasmid vectors for copper-inducible gene expression in Myxococcus xanthus.

    Science.gov (United States)

    Gómez-Santos, Nuria; Treuner-Lange, Anke; Moraleda-Muñoz, Aurelio; García-Bravo, Elena; García-Hernández, Raquel; Martínez-Cayuela, Marina; Pérez, Juana; Søgaard-Andersen, Lotte; Muñoz-Dorado, José

    2012-04-01

    Myxococcus xanthus is widely used as a model system for studying gliding motility, multicellular development, and cellular differentiation. Moreover, M. xanthus is a rich source of novel secondary metabolites. The analysis of these processes has been hampered by the limited set of tools for inducible gene expression. Here we report the construction of a set of plasmid vectors to allow copper-inducible gene expression in M. xanthus. Analysis of the effect of copper on strain DK1622 revealed that copper concentrations of up to 500 μM during growth and 60 μM during development do not affect physiological processes such as cell viability, motility, or aggregation into fruiting bodies. Of the copper-responsive promoters in M. xanthus reported so far, the multicopper oxidase cuoA promoter was used to construct expression vectors, because no basal expression is observed in the absence of copper and induction linearly depends on the copper concentration in the culture medium. Four different plasmid vectors have been constructed, with different marker selection genes and sites of integration in the M. xanthus chromosome. The vectors have been tested and gene expression quantified using the lacZ gene. Moreover, we demonstrate the functional complementation of the motility defect caused by lack of PilB by the copper-induced expression of the pilB gene. These versatile vectors are likely to deepen our understanding of the biology of M. xanthus and may also have biotechnological applications.

  16. MeSH-Informed Enrichment Analysis and MeSH-Guided Semantic Similarity Among Functional Terms and Gene Products in Chicken

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2016-08-01

    Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO), and more recently by Medical Subject Headings (MeSH). Here, we report a suite of MeSH packages for chicken in Bioconductor, and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and MeSH-guided semantic similarity among terms and gene products, using two lists of chicken genes available in public repositories. The two published datasets that were employed represent (i) differentially expressed genes, and (ii) candidate genes under selective sweep or epistatic selection. The comparison of MeSH with GO overrepresentation analyses suggested not only that MeSH supports the findings obtained from GO analysis, but also that MeSH is able to further enrich the representation of biological knowledge and often provide more interpretable results. Based on the hierarchical structures of MeSH and GO, we computed semantic similarities among vocabularies, as well as semantic similarities among selected genes. These yielded the similarity levels between significant functional terms, and the annotation of each gene yielded the measures of gene similarity. Our findings show the benefits of using MeSH as an alternative choice of annotation in order to draw biological inferences from a list of genes of interest. We argue that the use of MeSH in conjunction with GO will be instrumental in facilitating the understanding of the genetic basis of complex traits.
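
    Overrepresentation analyses of the kind compared in this abstract are commonly based on an upper-tail hypergeometric test. The sketch below assumes that test; it is not taken from the MeSH packages, and the function name and all counts are hypothetical.

```python
from math import comb

def overrepresentation_p(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe  : number of background genes
    n_annotated : background genes annotated to the term (GO or MeSH)
    n_selected  : size of the gene list of interest
    n_overlap   : genes in the list that carry the annotation
    """
    max_overlap = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 20 background genes, 5 annotated to a term,
# 4 genes selected, 3 of which are annotated.
p_value = overrepresentation_p(20, 5, 4, 3)
print(f"overrepresentation p-value: {p_value:.4f}")
```

    In practice the p-values from all tested terms would also be corrected for multiple testing, which this sketch leaves out.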

  17. PKA phosphorylation redirects ERα to promoters of a unique gene set to induce tamoxifen resistance.

    Science.gov (United States)

    de Leeuw, R; Flach, K; Bentin Toaldo, C; Alexi, X; Canisius, S; Neefjes, J; Michalides, R; Zwart, W

    2013-07-25

    Protein kinase A (PKA)-induced estrogen receptor alpha (ERα) phosphorylation at serine residue 305 (ERαS305-P) can induce tamoxifen (TAM) resistance in breast cancer. How this phospho-modification affects ERα specificity and translates into TAM resistance is unclear. Here, we show that S305-P modification of ERα reprograms the receptor, redirecting it to new transcriptional start sites, thus modulating the transcriptome. By altering the chromatin-binding pattern, Ser305 phosphorylation of ERα translates into a 26-gene expression classifier that identifies breast cancer patients with a poor disease outcome after TAM treatment. MYC-target genes and networks were significantly enriched in this gene classifier that includes a number of selective targets for ERαS305-P. The enhanced expression of MYC increased cell proliferation in the presence of TAM. We demonstrate that activation of the PKA signaling pathway alters the transcriptome by redirecting ERα to new transcriptional start sites, resulting in altered transcription and TAM resistance.

  18. Can survival prediction be improved by merging gene expression data sets?

    Directory of Open Access Journals (Sweden)

    Haleh Yasrebi

    BACKGROUND: High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used for characterization of tumor biopsies for clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial study limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not clear at all whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This apparently was due to the fact that a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation of time to death or relapse, and (e) the reduced number of genes in the merged data

  19. Protein-Protein Interaction and Pathway Analyses of Top Schizophrenia Genes Reveal Schizophrenia Susceptibility Genes Converge on Common Molecular Networks and Enrichment of Nucleosome (Chromatin) Assembly Genes in Schizophrenia Susceptibility Loci

    OpenAIRE

    Luo, Xiongjian; Huang, Liang; Jia, Peilin; Li, Ming; SU, Bing; Zhao, Zhongming; Gan, Lin

    2013-01-01

    Recent genome-wide association studies have identified many promising schizophrenia candidate genes and demonstrated that common polygenic variation contributes to schizophrenia risk. However, whether these genes represent perturbations to a common but limited set of underlying molecular processes (pathways) that modulate risk to schizophrenia remains elusive, and it is not known whether these genes converge on common biological pathways (networks) or represent different pathways. In addition...

  20. Meta-analysis of differentiating mouse embryonic stem cell gene expression kinetics reveals early change of a small gene set.

    Directory of Open Access Journals (Sweden)

    Clive H Glover

    2006-11-01

    Stem cell differentiation involves critical changes in gene expression. Identification of these should provide endpoints useful for optimizing stem cell propagation as well as potential clues about mechanisms governing stem cell maintenance. Here we describe the results of a new meta-analysis methodology applied to multiple gene expression datasets from three mouse embryonic stem cell (ESC) lines obtained at specific time points during the course of their differentiation into various lineages. We developed methods to identify genes with expression changes that correlated with the altered frequency of functionally defined, undifferentiated ESC in culture. In each dataset, we computed a novel statistical confidence measure for every gene which captured the certainty that a particular gene exhibited an expression pattern of interest within that dataset. This permitted a joint analysis of the datasets, despite the different experimental designs. Using a ranking scheme that favored genes exhibiting patterns of interest, we focused on the top 88 genes whose expression was consistently changed when ESC were induced to differentiate. Seven of these (103728_at, 8430410A17Rik, Klf2, Nr0b1, Sox2, Tcl1, and Zfp42) showed a rapid decrease in expression concurrent with a decrease in frequency of undifferentiated cells and remained predictive when evaluated in additional maintenance and differentiating protocols. Through a novel meta-analysis, this study identifies a small set of genes whose expression is useful for identifying changes in stem cell frequencies in cultures of mouse ESC. The methods and findings have broader applicability to understanding the regulation of self-renewal of other stem cell types.

  1. Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms

    Directory of Open Access Journals (Sweden)

    Hilu Khidir W

    2009-03-01

    Background: Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al. Results: We performed maximum likelihood bootstrap analyses of the complete, 3-gene 567-taxon data matrix and the incomplete, 5-gene 567-taxon data matrix. Although the 5-gene matrix has more missing data (27.5%) than the 3-gene data matrix (2.9%), the 5-gene analysis resulted in higher levels of bootstrap support. Within the 567-taxon tree, the increase in support is most evident for relationships among the 170 taxa for which both matK and 26S rDNA sequences were added, and there is little gain in support for relationships among the 119 taxa having neither matK nor 26S rDNA sequences. The 5-gene analysis also places the enigmatic Hydrostachys in Lamiales (BS = 97%) rather than in Cornales (BS = 100% in the 3-gene analysis). The placement of Hydrostachys in Lamiales is unprecedented in molecular analyses, but it is consistent with embryological and morphological data. Conclusion: Adding available, and often incomplete, sets of sequences to existing data sets can be a fast and inexpensive way to increase support for phylogenetic relationships and produce novel and credible new phylogenetic hypotheses.

  2. Comparison of the Dictyostelium rasD and ecmA genes reveals two distinct mechanisms whereby an mRNA may become enriched in prestalk cells.

    Science.gov (United States)

    Jermyn, K; Williams, J

    1995-04-01

    The Dictyostelium ras gene, rasD, encodes an mRNA that is more abundant in prestalk than prespore cells in the migratory slug. Its expression is inducible by extracellular cAMP but is not inducible by the prestalk and stalk cell morphogen differentiation inducing factor (DIF). We show that a rasD-lacZ fusion gene is first expressed in approximately one half of the cells in the aggregate, including some cells that also express a prespore-specific marker. The amount of rasD-lacZ fusion protein in prespore cells then diminishes as the slug is formed. Analysis of a rasD-lacZ fusion protein with an N terminal substitution that reduces protein stability within the cell provides strong confirmatory evidence that the ras gene product becomes enriched in prestalk cells by selective repression of gene expression in prespore cells. In contrast, the DIF-inducible ecmA gene is expressed only in those cells that will become prestalk cells in the migratory slug. These results show that there are two different ways in which an mRNA may become enriched in prestalk cells and support the view that DIF is the inducer of prestalk cell differentiation.

  3. Comparative Analysis of 16S rRNA and amoA Genes from Archaea Selected with Organic and Inorganic Amendments in Enrichment Culture

    Science.gov (United States)

    Xu, Mouzhong; Schnorr, Jon; Keibler, Brandon

    2012-01-01

    We took advantage of a plant-root enrichment culture system to characterize mesophilic soil archaea selected through the use of organic and inorganic amendments. Comparative analysis of 16S rRNA and amoA genes indicated that specific archaeal clades were selected under different conditions. Three amoA sequence clades were identified, while for a fourth group, identified by 16S rRNA gene analysis alone and referred to as the “root” clade, we detected no corresponding amoA gene. The amoA-containing archaea were present in media with either organic or inorganic amendments, whereas archaea representing the root clade were present only when organic amendment was used. Analysis of amoA gene abundance and expression, together with nitrification-coupled growth assays, indicated potential growth by autotrophic ammonia oxidation for members of two group 1.1b clades. Increased abundance of one of these clades, however, also occurred upon the addition of organic amendment. Finally, although amoA-containing group 1.1a archaea were present in enrichments, we detected neither expression of amoA genes nor evidence for nitrification-coupled growth of these organisms. These data support a model of a diverse metabolic community in mesophilic soil archaea that is just beginning to be characterized. PMID:22267662

  4. Gene Sets for Utilization of Primary and Secondary Nutrition Supplies in the Distal Gut of Endangered Iberian Lynx

    Science.gov (United States)

    Alcaide, María; Messina, Enzo; Richter, Michael; Bargiela, Rafael; Peplies, Jörg; Huws, Sharon A.; Newbold, Charles J.; Golyshin, Peter N.; Simón, Miguel A.; López, Guillermo; Yakimov, Michail M.; Ferrer, Manuel

    2012-01-01

    Recent studies have indicated the existence of an extensive trans-genomic trans-mural co-metabolism between gut microbes and animal hosts that is diet-, host phylogeny- and provenance-influenced. Here, we analyzed the biodiversity at the level of small subunit rRNA gene sequence and the metabolic composition of 18 Mbp of consensus metagenome sequences and activity characteristics of bacterial intra-cellular extracts, in wild Iberian lynx (Lynx pardinus) fecal samples. Bacterial signatures (14.43% of all of the Firmicutes reads and 6.36% of total reads) related to the uncultured anaerobic commensals Anaeroplasma spp., which are typically found in ovine and bovine rumen, were first identified. The lynx gut was further characterized by an over-representation of ‘presumptive’ aquaporin aqpZ genes and genes encoding ‘active’ lysosomal-like digestive enzymes that are possibly needed to acquire glycerol, sugars and amino acids from glycoproteins, glyco(amino)lipids, glyco(amino)glycans and nucleoside diphosphate sugars. The lynx gut was highly enriched (28% of the total glycosidases) in genes encoding α-amylase and related enzymes, although it exhibited a low rate of enzymatic activity indicative of starch degradation. The preponderance of β-xylosidase activity in protein extracts further suggests that lynx gut microbes are most active in the metabolism of β-xylose-containing plant N-glycans, although β-xylosidase sequences constituted only 1.5% of total glycosidases. These collective and unique bacterial, genetic and enzymatic activity signatures suggest that the wild lynx gut microbiota harbors gene sets underpinning not only sugar uptake from primary animal tissues (with the monotypic dietary profile of the wild lynx consisting of 80–100% wild rabbits) but also the hydrolysis of prey-derived plant biomass. Although the present investigation corresponds to a single sample and some of the statements should be considered qualitative, the data most likely

  5. Gene sets for utilization of primary and secondary nutrition supplies in the distal gut of endangered Iberian lynx.

    Directory of Open Access Journals (Sweden)

    María Alcaide

    Recent studies have indicated the existence of an extensive trans-genomic trans-mural co-metabolism between gut microbes and animal hosts that is diet-, host phylogeny- and provenance-influenced. Here, we analyzed the biodiversity at the level of small subunit rRNA gene sequence and the metabolic composition of 18 Mbp of consensus metagenome sequences and activity characteristics of bacterial intra-cellular extracts, in wild Iberian lynx (Lynx pardinus) fecal samples. Bacterial signatures (14.43% of all of the Firmicutes reads and 6.36% of total reads) related to the uncultured anaerobic commensals Anaeroplasma spp., which are typically found in ovine and bovine rumen, were first identified. The lynx gut was further characterized by an over-representation of 'presumptive' aquaporin aqpZ genes and genes encoding 'active' lysosomal-like digestive enzymes that are possibly needed to acquire glycerol, sugars and amino acids from glycoproteins, glyco(amino)lipids, glyco(amino)glycans and nucleoside diphosphate sugars. The lynx gut was highly enriched (28% of the total glycosidases) in genes encoding α-amylase and related enzymes, although it exhibited a low rate of enzymatic activity indicative of starch degradation. The preponderance of β-xylosidase activity in protein extracts further suggests that lynx gut microbes are most active in the metabolism of β-xylose-containing plant N-glycans, although β-xylosidase sequences constituted only 1.5% of total glycosidases. These collective and unique bacterial, genetic and enzymatic activity signatures suggest that the wild lynx gut microbiota harbors gene sets underpinning not only sugar uptake from primary animal tissues (with the monotypic dietary profile of the wild lynx consisting of 80-100% wild rabbits) but also the hydrolysis of prey-derived plant biomass. Although the present investigation corresponds to a single sample and some of the statements should be considered qualitative, the data most likely

  6. A rough set based rational clustering framework for determining correlated genes.

    Science.gov (United States)

    Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy

    2016-06-01

    Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and to understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique, which will efficiently remove the irrelevant dimensions in a high-dimensional space and obtain appropriate meaningful clusters. This paper proposes a novel biclustering technique that is based on rough set theory. The proposed algorithm uses correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix and mean squared residue to generate the initial biclusters. Furthermore, the biclusters are refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using mean squared residue. The algorithm is illustrated with yeast gene expression data and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selection of initial clusters and also the restriction of one object belonging to only one cluster by allowing overlapping of biclusters.
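
    The mean squared residue mentioned in this abstract can be illustrated with a short sketch. It assumes the usual Cheng and Church definition (squared deviation of each entry from its row mean, column mean and overall mean within the bicluster); the abstract does not state the exact form used, and the function name and toy matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster defined by `rows` and `cols`.

    A score of 0 means the submatrix is perfectly additive (every row is
    a shifted copy of every other row); larger scores mean less coherence.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    residue = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return residue / (n_rows * n_cols)

# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
]
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
noisy = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2, 3])
print(coherent, noisy)
```

    A biclustering procedure would typically add or remove rows and columns to keep this score below a chosen threshold; the rough-set refinement into lower and upper boundaries is not shown here.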

  7. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    Science.gov (United States)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T² statistic is not positive definite and becomes singular, so it cannot be inverted. In this research, Hotelling's T² statistic is combined with a shrinkage approach as an alternative way to estimate the covariance matrix in order to detect significant gene sets. The use of a shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator of the covariance matrix into an improved biased one. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
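
    The singularity problem and its shrinkage remedy can be sketched on toy data. The code below is an illustration under stated assumptions, not the authors' estimator: it uses the one-sample form of Hotelling's statistic for brevity, shrinks toward the diagonal of the sample covariance with a fixed, hypothetical intensity `lam`, and leaves out the robust trimmed mean step.

```python
def covariance(samples):
    """Column means and unbiased sample covariance; rows are samples."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in samples)
            / (n - 1) for b in range(p)] for a in range(p)]
    return means, cov

def shrink_to_diagonal(cov, lam):
    """lam * diag(cov) + (1 - lam) * cov: only off-diagonals are damped."""
    p = len(cov)
    return [[cov[a][b] if a == b else (1.0 - lam) * cov[a][b]
             for b in range(p)] for a in range(p)]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def hotelling_t2(samples, mu0, lam):
    """One-sample T^2 = n * d' S*^-1 d with a shrunken covariance S*."""
    n = len(samples)
    means, cov = covariance(samples)
    diff = [m - m0 for m, m0 in zip(means, mu0)]
    weights = solve(shrink_to_diagonal(cov, lam), diff)
    return n * sum(d * w for d, w in zip(diff, weights))

# Hypothetical gene set: 3 samples, 4 genes, so the raw covariance is singular.
toy = [[1.0, 2.0, 0.0, 3.0],
       [2.0, 1.0, 3.0, 2.0],
       [3.0, 6.0, 0.0, 7.0]]
t2_shrunk = hotelling_t2(toy, [0.0, 0.0, 0.0, 0.0], lam=0.5)
print(t2_shrunk)
```

    Calling `hotelling_t2` with `lam=0` on the same data raises the singular-matrix error, which is the failure the shrinkage estimator is meant to avoid. Published shrinkage methods usually estimate the intensity from the data rather than fixing it.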

  8. Selection and validation of a set of reliable reference genes for quantitative sod gene expression analysis in C. elegans

    Directory of Open Access Journals (Sweden)

    Vandesompele Jo

    2008-01-01

    Background: In the nematode Caenorhabditis elegans the conserved Ins/IGF-1 signaling pathway regulates many biological processes including life span, stress response, dauer diapause and metabolism. Detection of differentially expressed genes may contribute to a better understanding of the mechanism by which the Ins/IGF-1 signaling pathway regulates these processes. Appropriate normalization is an essential prerequisite for obtaining accurate and reproducible quantification of gene expression levels. The aim of this study was to establish a reliable set of reference genes for gene expression analysis in C. elegans. Results: Real-time quantitative PCR was used to evaluate the expression stability of 12 candidate reference genes (act-1, ama-1, cdc-42, csq-1, eif-3.C, mdh-1, gpd-2, pmp-3, tba-1, Y45F10D.4, rgs-6 and unc-16) in wild-type, three Ins/IGF-1 pathway mutants, dauers and L3 stage larvae. After geNorm analysis, cdc-42, pmp-3 and Y45F10D.4 showed the most stable expression pattern and were used to normalize 5 sod expression levels. Significant differences in mRNA levels were observed for sod-1 and sod-3 in daf-2 relative to wild-type animals, whereas in dauers sod-1, sod-3, sod-4 and sod-5 are differentially expressed relative to third stage larvae. Conclusion: Our findings emphasize the importance of accurate normalization using stably expressed reference genes. The methodology used in this study is generally applicable to reliably quantify gene expression levels in the nematode C. elegans using quantitative PCR.
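
    Normalization against several reference genes is usually done by dividing the target quantity by the geometric mean of the reference quantities, which is the normalization factor used in the geNorm approach. The sketch below shows that arithmetic only; the function names and all relative quantities are hypothetical, not values from the study.

```python
from math import prod

def normalization_factor(reference_quantities):
    """Geometric mean of the reference genes' relative quantities."""
    return prod(reference_quantities) ** (1.0 / len(reference_quantities))

def normalize(target_quantity, reference_quantities):
    """Target gene quantity divided by the sample's normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)

# Hypothetical relative quantities in one sample for the three reference
# genes named in the abstract (cdc-42, pmp-3, Y45F10D.4) and for sod-3.
reference = [2.0, 4.0, 1.0]
normalized_sod3 = normalize(6.0, reference)
print(normalized_sod3)
```

    The geometric mean is preferred over the arithmetic mean here because it is less sensitive to one reference gene with an unusually high quantity.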

  9. Coverage and characteristics of the Affymetrix GeneChip Human Mapping 100K SNP set.

    Directory of Open Access Journals (Sweden)

    2006-05-01

    Improvements in technology have made it possible to conduct genome-wide association mapping at costs within reach of academic investigators, and experiments are currently being conducted with a variety of high-throughput platforms. To provide an appropriate context for interpreting results of such studies, we summarize here results of an investigation of one of the first of these technologies to be publicly available, the Affymetrix GeneChip Human Mapping 100K set of single nucleotide polymorphisms (SNPs). In a systematic analysis of the pattern and distribution of SNPs in the Mapping 100K set, we find that SNPs in this set are undersampled from coding regions (both nonsynonymous and synonymous) and oversampled from regions outside genes, relative to SNPs in the overall HapMap database. In addition, we utilize a novel multilocus linkage disequilibrium (LD) coefficient based on information content (analogous to the information content scores commonly used for linkage mapping) that is equivalent to the familiar measure r² in the special case of two loci. Using this approach, we are able to summarize for any subset of markers, such as the Affymetrix Mapping 100K set, the information available for association mapping in that subset, relative to the information available in the full set of markers included in the HapMap, and highlight circumstances in which this multilocus measure of LD provides substantial additional insight about the haplotype structure in a region over pairwise measures of LD.
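
    For reference, the familiar two-locus measure that the multilocus coefficient reduces to can be computed from haplotype and allele frequencies. This sketch covers only that pairwise special case, not the information-content coefficient described in the abstract; the function name and frequencies are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise LD r^2 from the AB haplotype frequency and allele frequencies.

    D = p_ab - p_a * p_b is the raw disequilibrium; r^2 scales D^2 by the
    product of the allele-frequency variances at the two loci.
    """
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

# Hypothetical frequencies: alleles A and B always occur together.
perfect = ld_r2(p_ab=0.5, p_a=0.5, p_b=0.5)
# Hypothetical frequencies: the two loci are statistically independent.
independent = ld_r2(p_ab=0.25, p_a=0.5, p_b=0.5)
print(perfect, independent)
```

    A tag SNP with r² close to 1 captures nearly all the association information of the untyped SNP, which is why coverage of a marker set is often summarized through r².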

  10. Gene set-based module discovery in the breast cancer transcriptome

    Directory of Open Access Journals (Sweden)

    Zhang Michael Q

    2009-02-01

    Background: Although microarray-based studies have revealed a global view of gene expression in cancer cells, we still have little knowledge about the regulatory mechanisms underlying the transcriptome. Several computational methods applied to yeast data have recently succeeded in identifying expression modules, which are defined as co-expressed gene sets under common regulatory mechanisms. However, such module discovery methods have not been applied to cancer transcriptome data. Results: In order to decode oncogenic regulatory programs in cancer cells, we developed a novel module discovery method termed EEM by extending a previously reported module discovery method, and applied it to breast cancer expression data. Starting from seed gene sets prepared based on cis-regulatory elements, ChIP-chip data, and gene locus information, EEM identified 10 principal expression modules in breast cancer based on their expression coherence. Moreover, EEM depicted their activity profiles, which predict regulatory programs in each subtype of breast tumors. For example, our analysis revealed that the expression module regulated by the Polycomb repressive complex 2 (PRC2) is downregulated in triple negative breast cancers, suggesting similarity of transcriptional programs between stem cells and aggressive breast cancer cells. We also found that the activity of the PRC2 expression module is negatively correlated with the expression of EZH2, a component of PRC2 which belongs to the E2F expression module. E2F-driven EZH2 overexpression may be responsible for the repression of the PRC2 expression module in triple negative tumors. Furthermore, our network analysis predicts regulatory circuits in breast cancer cells. Conclusion: These results demonstrate that the gene set-based module discovery approach is a powerful tool to decode regulatory programs in cancer cells.

  11. Enriched environment inhibits mouse pancreatic cancer growth and down-regulates the expression of mitochondria-related genes in cancer cells.

    Science.gov (United States)

    Li, Guohua; Gan, Yu; Fan, Yingchao; Wu, Yufeng; Lin, Hechun; Song, Yanfang; Cai, Xiaojin; Yu, Xiang; Pan, Weihong; Yao, Ming; Gu, Jianren; Tu, Hong

    2015-01-19

    Psycho-social stress has been suggested to influence the development of cancer, but its role remains poorly defined with regard to pancreatic cancer, a lethal malignancy with few effective treatment modalities. In this study, we sought to investigate the impacts of enriched environment (EE) housing, a rodent model of "eustress", on the growth of mouse pancreatic cancer, and to explore the potential underlying mechanisms through gene expression profiling. The EE mice showed significantly reduced tumor weights in both subcutaneous (53%) and orthotopic (41%) models, while no single component of EE (inanimate stimulation, social stimulation or physical exercise) was sufficient to achieve anti-tumor effects comparable to those of EE. The integrative transcriptomic and proteomic analysis revealed that in response to EE, a total of 129 genes in the tumors showed differential expression at both the mRNA and protein levels. The differentially expressed genes were mostly localized to the mitochondria and enriched in the citrate cycle and oxidative phosphorylation pathways. Interestingly, nearly all of the mitochondria-related genes were down-regulated by EE. Our data have provided experimental evidence in favor of the application of positive stress or of benign environmental stimulation in pancreatic cancer therapy.

  12. A genetic map of melon highly enriched with fruit quality QTLs and EST markers, including sugar and carotenoid metabolism genes

    Science.gov (United States)

    A genetic map of melon enriched for fruit traits was constructed, using a recombinant inbred (RI) population developed from a cross between representatives of the two subspecies of Cucumis melo L.: PI 414723 (subspecies agrestis) and 'Dulce' (subspecies melo). Phenotyping of 99 RI lines was conducte...

  13. Identification of the Core Set of Carbon-Associated Genes in a Bioenergy Grassland Soil

    Science.gov (United States)

    Howe, Adina; Yang, Fan; Williams, Ryan J.; Meyer, Folker; Hofmockel, Kirsten S.

    2016-01-01

    Despite the central role of soil microbial communities in global carbon (C) cycling, little is known about soil microbial community structure and even less about their metabolic pathways. Efforts to characterize soil communities often focus on identifying differences in gene content across environmental gradients, but an alternative question is what genes are similar in soils. These genes may indicate critical species or potential functions that are required in all soils. Here we identified the “core” set of C cycling sequences widely present in multiple soil metagenomes from a fertilized prairie (FP). Of 226,887 sequences associated with known enzymes involved in the synthesis, metabolism, and transport of carbohydrates, 843 were identified to be consistently prevalent across four replicate soil metagenomes. This core metagenome was functionally and taxonomically diverse, representing five enzyme classes and 99 enzyme families within the CAZy database. Though it only comprised 0.4% of all CAZy-associated genes identified in FP metagenomes, the core was found to be comprised of functions similar to those within cumulative soils. The FP CAZy-associated core sequences were present in multiple publicly available soil metagenomes and most similar to soils sharing geographic proximity. In soil ecosystems, where high diversity remains a key challenge for metagenomic investigations, these core genes represent a subset of critical functions necessary for carbohydrate metabolism, which can be targeted to evaluate important C fluxes in these and other similar soils. PMID:27855202
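
    As an illustration of the "core" idea in the abstract above, the following minimal Python sketch treats each replicate metagenome as a set of sequence identifiers and takes the core to be the identifiers found in every replicate. The function names, the toy identifiers, and the strict-intersection rule are assumptions made for illustration; the abstract does not state the exact prevalence criterion the authors applied. The last lines recompute the reported proportion (843 of 226,887 sequences).

```python
def core_sequences(replicates):
    """Return the identifiers that occur in every replicate (strict intersection)."""
    sets = [set(r) for r in replicates]
    return set.intersection(*sets) if sets else set()


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Hypothetical toy data: four replicates of made-up sequence identifiers.
replicates = [
    {"seq1", "seq2", "seq3"},
    {"seq1", "seq3", "seq4"},
    {"seq1", "seq3", "seq5"},
    {"seq1", "seq2", "seq3"},
]
core = core_sequences(replicates)  # identifiers shared by all four replicates

# Proportion reported in the abstract: 843 core sequences out of 226,887.
share = percent(843, 226887)
print(sorted(core), round(share, 1))
```

    Rounded to one decimal place, the share agrees with the 0.4% quoted in the abstract.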

  14. Protein-protein interaction and pathway analyses of top schizophrenia genes reveal schizophrenia susceptibility genes converge on common molecular networks and enrichment of nucleosome (chromatin) assembly genes in schizophrenia susceptibility loci.

    Science.gov (United States)

    Luo, Xiongjian; Huang, Liang; Jia, Peilin; Li, Ming; Su, Bing; Zhao, Zhongming; Gan, Lin

    2014-01-01

    Recent genome-wide association studies have identified many promising schizophrenia candidate genes and demonstrated that common polygenic variation contributes to schizophrenia risk. However, whether these genes represent perturbations to a common but limited set of underlying molecular processes (pathways) that modulate risk to schizophrenia remains elusive, and it is not known whether these genes converge on common biological pathways (networks) or represent different pathways. In addition, the theoretical and genetic mechanisms underlying the strong genetic heterogeneity of schizophrenia remain largely unknown. Using 4 well-defined data sets that contain top schizophrenia susceptibility genes and applying protein-protein interaction (PPI) network analysis, we investigated the interactions among proteins encoded by top schizophrenia susceptibility genes. We found that proteins encoded by top schizophrenia susceptibility genes formed a highly significant interconnected network, and, compared with random networks, these PPI networks are statistically highly significant for both direct connectivity and indirect connectivity. We further validated these results using empirical functional data (transcriptome data from a clinical sample). These highly significant findings indicate that top schizophrenia susceptibility genes encode proteins that interact directly and form a densely interconnected network, suggesting perturbations of common underlying molecular processes or pathways that modulate risk to schizophrenia. Our finding that schizophrenia susceptibility genes encode a highly interconnected protein network may also provide a novel explanation for the observed genetic heterogeneity of schizophrenia, i.e., mutation in any member of this molecular network will lead to the same functional consequences that eventually contribute to risk of schizophrenia.

  15. NEAT: an efficient network enrichment analysis test

    OpenAIRE

    Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C

    2016-01-01

    Background Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow, and are based on normality assumptions. Results We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises ...
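
    Because the abstract above is truncated, the exact NEAT statistic is not reproduced here. The sketch below only illustrates the generic upper-tail probability of a hypergeometric distribution, the kind of calculation that hypergeometric enrichment tests rest on. The function name and the parameter names (k, N, K, n) are assumptions chosen for this example and are not taken from the NEAT software.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: number of 'marked' items in the population,
    n: number of items drawn without replacement, k: observed marked items.
    """
    lower = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1)
    )
    return favourable / total


# Toy numbers (illustrative only): 10 items, 3 marked, 4 drawn, all 3 marked observed.
p_value = hypergeom_upper_tail(3, 10, 3, 4)
print(p_value)
```

    A small p-value indicates that observing at least k marked items would be unlikely under random sampling. How NEAT maps network links onto these quantities is described in the full paper, not in the truncated abstract.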

  16. Gene-set meta-analysis of lung cancer identifies pathway related to systemic lupus erythematosus.

    Science.gov (United States)

    Rosenberger, Albert; Sohns, Melanie; Friedrichs, Stefanie; Hung, Rayjean J; Fehringer, Gord; McLaughlin, John; Amos, Christopher I; Brennan, Paul; Risch, Angela; Brüske, Irene; Caporaso, Neil E; Landi, Maria Teresa; Christiani, David C; Wei, Yongyue; Bickeböller, Heike

    2017-01-01

    Gene-set analysis (GSA) is an approach using the results of single-marker genome-wide association studies when investigating pathways as a whole with respect to the genetic basis of a disease. We performed a meta-analysis of seven GSAs for lung cancer, applying the method META-GSA. Overall, the information taken from 11,365 cases and 22,505 controls from within the TRICL/ILCCO consortia was used to investigate a total of 234 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. META-GSA reveals the systemic lupus erythematosus KEGG pathway hsa05322, driven by the gene region 6p21-22, as also implicated in lung cancer (p = 0.0306). This gene region is known to be associated with squamous cell lung carcinoma. The most important genes driving the significance of this pathway belong to the genomic areas HIST1-H4L, -1BN, -2BN, -H2AK, -H4K and C2/C4A/C4B. Within these areas, the markers most significantly associated with LC are rs13194781 (located within HIST12BN) and rs1270942 (located between C2 and C4A). We have discovered a pathway currently marked as specific to systemic lupus erythematosus as being significantly implicated in lung cancer. The gene region 6p21-22 in this pathway appears to be more extensively associated with lung cancer than previously assumed. Given wide-stretched linkage disequilibrium to the area APOM/BAG6/MSH5, there is currently simply not enough information or evidence to conclude whether the potential pleiotropy of lung cancer and systemic lupus erythematosus is spurious, biological, or mediated. Further research into this pathway and gene region will be necessary.

  17. Genome-Wide Temporal Expression Profiling in Caenorhabditis elegans Identifies a Core Gene Set Related to Long-Term Memory.

    Science.gov (United States)

    Freytag, Virginie; Probst, Sabine; Hadziselimovic, Nils; Boglari, Csaba; Hauser, Yannick; Peter, Fabian; Gabor Fenyves, Bank; Milnik, Annette; Demougin, Philippe; Vukojevic, Vanja; de Quervain, Dominique J-F; Papassotiropoulos, Andreas; Stetak, Attila

    2017-07-12

    The identification of genes related to encoding, storage, and retrieval of memories is a major interest in neuroscience. In the current study, we analyzed the temporal gene expression changes in a neuronal mRNA pool during an olfactory long-term associative memory (LTAM) in Caenorhabditis elegans hermaphrodites. Here, we identified a core set of 712 (538 upregulated and 174 downregulated) genes that follows three distinct temporal peaks demonstrating multiple gene regulation waves in LTAM. Compared with the previously published positive LTAM gene set (Lakhina et al., 2015), 50% of the identified upregulated genes here overlap with the previous dataset, possibly representing stimulus-independent memory-related genes. On the other hand, the remaining genes were not previously identified in positive associative memory and may specifically regulate aversive LTAM. Our results suggest a multistep gene activation process during the formation and retrieval of long-term memory and define general memory-implicated genes as well as conditioning-type-dependent gene sets. SIGNIFICANCE STATEMENT The identification of genes regulating different steps of memory is of major interest in neuroscience. Identification of common memory genes across different learning paradigms and the temporal activation of the genes are poorly studied. Here, we investigated the temporal aspects of Caenorhabditis elegans gene expression changes using aversive olfactory associative long-term memory (LTAM) and identified three major gene activation waves. As in previous studies, aversive LTAM is also CREB dependent, and CREB activity is necessary immediately after training. Finally, we define a list of memory paradigm-independent core gene sets as well as conditioning-dependent genes.

  18. Gene set analyses of genome-wide association studies on 49 quantitative traits measured in a single genetic epidemiology dataset.

    Science.gov (United States)

    Kim, Jihye; Kwon, Ji-Sun; Kim, Sangsoo

    2013-09-01

    Gene set analysis is a powerful tool for interpreting a genome-wide association study result and is gaining popularity. Comparison of the gene sets obtained for a variety of traits measured from a single genetic epidemiology dataset may give insights into the biological mechanisms underlying these traits. Based on the previously published single nucleotide polymorphism (SNP) genotype data on 8,842 individuals enrolled in the Korea Association Resource project, we performed a series of systematic genome-wide association analyses for 49 quantitative traits of basic epidemiological, anthropometric, or blood chemistry parameters. Each analysis result was subjected to subsequent gene set analyses based on Gene Ontology (GO) terms using gene set analysis software, GSA-SNP, identifying a set of GO terms significantly associated with each trait (pcorr neuronal or nerve systems.

  19. Enriching Glucoraphanin in Brassica rapa Through Replacement of BrAOP2.2/BrAOP2.3 with Non-functional Genes

    Directory of Open Access Journals (Sweden)

    Zhiyuan Liu

    2017-08-01

    Full Text Available Sulforaphane, the hydrolytic product of glucoraphanin glucosinolate, is a potent anticarcinogen that reduces the risk of several human cancers. However, in most B. rapa vegetables, glucoraphanin is undetectable or only present in trace amounts, since the glucoraphanin that is present is converted to gluconapin by three functional BrAOP2 genes. In this study, to enrich beneficial glucoraphanin content in B. rapa, the functional BrAOP2 alleles were replaced by non-functional counterparts through marker-assisted backcrossing (MAB). We identified non-functional mutations of two BrAOP2 genes from B. rapa. The backcross progenies with introgression of both non-functional braop2.2 and braop2.3 alleles significantly increased the glucoraphanin content by 18 times relative to the recurrent parent. In contrast, replacement or introgression of a single non-functional braop2.2 or braop2.3 locus did not change glucoraphanin content. Our results suggest that replacement of these two functional BrAOP2 genes with non-functional alleles has the potential for producing improved Brassica crops with enriched beneficial glucoraphanin content.

  20. Development of a Pacific oyster (Crassostrea gigas) 31,918-feature microarray: identification of reference genes and tissue-enriched expression patterns

    Science.gov (United States)

    2011-01-01

    Background Research using the Pacific oyster Crassostrea gigas as a model organism has experienced rapid growth in recent years due to the development of high-throughput molecular technologies. As many as 56,268 EST sequences have been sequenced to date, representing a genome-wide resource that can be used for transcriptomic investigations. Results In this paper, we developed a Pacific oyster microarray containing oligonucleotides representing 31,918 transcribed sequences selected from the publicly accessible GigasDatabase. This newly designed microarray was used to study the transcriptome of male and female gonads, mantle, gills, posterior adductor muscle, visceral ganglia, hemocytes, labial palps and digestive gland. Statistical analyses identified genes differentially expressed among tissues and clusters of tissue-enriched genes. These genes reflect major tissue-specific functions at the molecular level, such as tissue formation in the mantle, filtering in the gills and labial palps, and reproduction in the gonads. Hierarchical clustering predicted the involvement of unannotated genes in specific functional pathways such as the insulin/NPY pathway, an important pathway under study in our model species. Microarray data also accurately identified reference genes whose mRNA level appeared stable across all the analyzed tissues. Adp-ribosylation factor 1 (arf1) appeared to be the most robust reference for normalizing gene expression data across different tissues and is therefore proposed as a relevant reference gene for further gene expression analysis in the Pacific oyster. Conclusions This study provides a new transcriptomic tool for studies of oyster biology, which will help in the annotation of its genome and which identifies candidate reference genes for gene expression analysis. PMID:21951653

  1. Hydrothermal, biogenic, and seawater components in metalliferous black shales of the Brooks Range, Alaska: Synsedimentary metal enrichment in a carbonate ramp setting

    Science.gov (United States)

    Slack, John F.; Selby, David; Dumoulin, Julie A.

    2015-01-01

    Trace element and Os isotope data for Lisburne Group metalliferous black shales of Middle Mississippian (early Chesterian) age in the Brooks Range of northern Alaska suggest that metals were sourced chiefly from local seawater (including biogenic detritus) but also from externally derived hydrothermal fluids. These black shales are interbedded with phosphorites and limestones in sequences 3 to 35 m thick; deposition occurred mainly on a carbonate ramp during intermittent upwelling under varying redox conditions, from suboxic to anoxic to sulfidic. Deposition of the black shales at ~335 Ma was broadly contemporaneous with sulfide mineralization in the Red Dog and Drenchwater Zn-Pb-Ag deposits, which formed in a distal marginal basin. Relative to the composition of average black shale, the metalliferous black shales (n = 29) display large average enrichment factors (>10) for Zn (10.1), Cd (11.0), and Ag (20.1). Small enrichments (>2–metals. Average authigenic (detrital-free) contents of Mo, V, U, Ni, Cu, Cd, Pb, Ge, Re, Se, As, Sb, Tl, Pd, and Au show enrichment factors of 4.3 × 10^3 to 1.2 × 10^6 relative to modern seawater. Such moderate enrichments, which are common in other metalliferous black shales, suggest wholly marine sources (seawater and biogenic material) for these metals, given similar trends for enrichment factors in organic-rich sediments of modern upwelling zones on the Namibian, Peruvian, and Chilean shelves. The largest enrichment factors for Zn and Ag are much higher (1.4 × 10^7 and 2.9 × 10^7, respectively), consistent with an appreciable hydrothermal component. Other metals such as Cu, Pb, and Tl that are concentrated in several black shale samples, and are locally abundant in the Red Dog and Drenchwater Zn-Pb-Ag deposits, may have a partly hydrothermal origin but this cannot be fully established with the available data. Enrichments in Cr (up to 7.8 × 10^6) are attributed to marine and not hydrothermal processes. The presence in some samples

  2. Gene set based association analyses for the WSSV resistance of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Yu, Yang; Liu, Jingwen; Li, Fuhua; Zhang, Xiaojun; Zhang, Chengsong; Xiang, Jianhai

    2017-01-01

    White Spot Syndrome Virus (WSSV) is regarded as the virus most pathogenic to shrimp. For threshold traits such as disease resistance, marker-assisted selection (MAS) is considered to be a more effective approach. In the present study, association analyses of single nucleotide polymorphisms (SNPs) located in a set of immune-related genes were conducted to identify markers associated with WSSV resistance. SNPs were detected by bioinformatics analysis of RNA sequencing data generated on the Illumina sequencing platform and with Roche 454 sequencing technology. A total of 681 SNPs located in the exons of immune-related genes were selected as candidate SNPs. Among these SNPs, 77 loci were genotyped in a WSSV-susceptible group and a resistant group. Association analysis was performed using logistic regression under an additive and dominance model in the GenABEL package. As a result, five SNPs showed associations with WSSV resistance at a significance level of 0.05. In addition, SNP-SNP interaction analysis was conducted. The combination of SNP loci in TRAF6, Cu/Zn SOD and nLvALF2 exhibited a significant effect on the WSSV resistance of shrimp. Gene expression analysis revealed that these SNPs might influence the expression of these immune-related genes. This study provides a useful method for performing MAS in shrimp. PMID:28094323
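
    To make the "additive and dominance model" mentioned above more concrete, the sketch below shows one common way of turning a biallelic SNP genotype into two regression covariates: an additive term counting copies of the minor allele and a dominance term flagging heterozygotes. This coding is an assumption for illustration; the abstract does not say how genotypes were coded, and the function is a stand-alone Python helper, not part of the GenABEL package.

```python
def encode_genotype(genotype, minor_allele):
    """Return (additive, dominance) covariates for a two-letter genotype.

    additive:  number of minor-allele copies (0, 1 or 2)
    dominance: 1 for heterozygotes, 0 for either homozygote
    """
    if len(genotype) != 2:
        raise ValueError("expected a two-letter diploid genotype such as 'AG'")
    additive = genotype.upper().count(minor_allele.upper())
    dominance = 1 if additive == 1 else 0
    return additive, dominance


# Hypothetical genotypes at a single SNP whose minor allele is G.
samples = ["AA", "AG", "GG", "GA"]
design_rows = [encode_genotype(g, "G") for g in samples]
print(design_rows)
```

    The two columns could then serve as predictors of resistance status in a logistic regression fitted with any standard statistics library.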

  3. Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression

    Institute of Scientific and Technical Information of China (English)

    Amichai Painsky; Saharon Rosset

    2014-01-01

    The availability of large microarray data sets has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression, in which each row can only be a member of a single bicluster while columns can participate in multiple clusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters in the spirit of the optimal set cover problem. We present our algorithmic solution as a combination of existing biclustering algorithms and combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span, non-overlapping row submatrices, while considering their unique nature.

  4. Gene expression risk signatures maintain prognostic power in multiple myeloma despite microarray probe set translation

    DEFF Research Database (Denmark)

    Hermansen, N E U; Borup, R; Andersen, M K

    2016-01-01

    INTRODUCTION: Gene expression profiling (GEP) risk models in multiple myeloma are based on 3'-end microarrays. We hypothesized that GEP risk signatures could retain prognostic power despite being translated and applied to whole-transcript microarray data. METHODS: We studied CD138-positive bone...... marrow plasma cells in a prospective cohort of 59 samples from newly diagnosed patients eligible for high-dose therapy (HDT) and 67 samples from previous HDT patients with progressive disease. We used Affymetrix Human Gene 1.1 ST microarrays for GEP. Nine GEP risk signatures were translated by probe set......-87). Various translated GEP risk signatures or combinations hereof were significantly correlated with survival: among newly diagnosed patients mainly in combination with cytogenetic high-risk markers and among relapsed patients mainly in combination with ISS stage III. CONCLUSION: Translated GEP risk...

  5. Functional analysis of genes differentially expressed in the Drosophila wing disc: role of transcripts enriched in the wing region.

    Science.gov (United States)

    Jacobsen, Thomas L; Cain, Donna; Paul, Litty; Justiniano, Steven; Alli, Anwar; Mullins, Jeremi S; Wang, Chun Ping; Butchar, Jon P; Simcox, Amanda

    2006-12-01

    Differential gene expression is the major mechanism underlying the development of specific body regions. Here we assessed the role of genes differentially expressed in the Drosophila wing imaginal disc, which gives rise to two distinct adult structures: the body wall and the wing. Reverse genetics was used to test the function of uncharacterized genes first identified in a microarray screen as having high levels of expression in the presumptive wing. Such genes could participate in elaborating the specific morphological characteristics of the wing. The activity of the genes was modulated using misexpression and RNAi-mediated silencing. Misexpression of eight of nine genes tested caused phenotypes. Of 12 genes tested, 10 showed effective silencing with RNAi transgenes, but only 3 of these had resulting phenotypes. The wing phenotypes resulting from RNAi suggest that CG8780 is involved in patterning the veins in the proximal region of the wing blade and that CG17278 and CG30069 are required for adhesion of wing surfaces. Venation and apposition of the wing surfaces are processes specific to wing development providing a correlation between the expression and function of these genes. The results show that a combination of expression profiling and tissue-specific gene silencing has the potential to identify new genes involved in wing development and hence to contribute to our understanding of this process. However, there are both technical and biological limitations to this approach, including the efficacy of RNAi and the role that gene redundancy may play in masking phenotypes.

  6. Enrichment of Conserved Synaptic Activity-Responsive Element in Neuronal Genes Predicts a Coordinated Response of MEF2, CREB and SRF

    Science.gov (United States)

    Rodríguez-Tornos, Fernanda M.; San Aniceto, Iñigo; Cubelos, Beatriz; Nieto, Marta

    2013-01-01

    A unique synaptic activity-responsive element (SARE) sequence, composed of the consensus binding sites for SRF, MEF2 and CREB, is necessary for control of transcriptional upregulation of the Arc gene in response to synaptic activity. We hypothesize that this sequence is a broad mechanism that regulates gene expression in response to synaptic activation and during plasticity; and that analysis of SARE-containing genes could identify molecular mechanisms involved in brain disorders. To search for conserved SARE sequences in the mammalian genome, we used the SynoR in silico tool, and found the SARE cluster predominantly in the regulatory regions of genes expressed specifically in the nervous system; most were related to neural development and homeostatic maintenance. Two of these SARE sequences were tested in luciferase assays and proved to promote transcription in response to neuronal activation. Supporting the predictive capacity of our candidate list, up-regulation of several SARE containing genes in response to neuronal activity was validated using external data and also experimentally using primary cortical neurons and quantitative real time RT-PCR. The list of SARE-containing genes includes several linked to mental retardation and cognitive disorders, and is significantly enriched in genes that encode mRNA targeted by FMRP (fragile X mental retardation protein). Our study thus supports the idea that SARE sequences are relevant transcriptional regulatory elements that participate in plasticity. In addition, it offers a comprehensive view of how activity-responsive transcription factors coordinate their actions and increase the selectivity of their targets. Our data suggest that analysis of SARE-containing genes will reveal yet-undescribed pathways of synaptic plasticity and additional candidate genes disrupted in mental disease. PMID:23382855

  7. Enrichment of conserved synaptic activity-responsive element in neuronal genes predicts a coordinated response of MEF2, CREB and SRF.

    Directory of Open Access Journals (Sweden)

    Fernanda M Rodríguez-Tornos

    Full Text Available A unique synaptic activity-responsive element (SARE) sequence, composed of the consensus binding sites for SRF, MEF2 and CREB, is necessary for control of transcriptional upregulation of the Arc gene in response to synaptic activity. We hypothesize that this sequence is a broad mechanism that regulates gene expression in response to synaptic activation and during plasticity; and that analysis of SARE-containing genes could identify molecular mechanisms involved in brain disorders. To search for conserved SARE sequences in the mammalian genome, we used the SynoR in silico tool, and found the SARE cluster predominantly in the regulatory regions of genes expressed specifically in the nervous system; most were related to neural development and homeostatic maintenance. Two of these SARE sequences were tested in luciferase assays and proved to promote transcription in response to neuronal activation. Supporting the predictive capacity of our candidate list, up-regulation of several SARE containing genes in response to neuronal activity was validated using external data and also experimentally using primary cortical neurons and quantitative real time RT-PCR. The list of SARE-containing genes includes several linked to mental retardation and cognitive disorders, and is significantly enriched in genes that encode mRNA targeted by FMRP (fragile X mental retardation protein). Our study thus supports the idea that SARE sequences are relevant transcriptional regulatory elements that participate in plasticity. In addition, it offers a comprehensive view of how activity-responsive transcription factors coordinate their actions and increase the selectivity of their targets. Our data suggest that analysis of SARE-containing genes will reveal yet-undescribed pathways of synaptic plasticity and additional candidate genes disrupted in mental disease.

  8. Joint genetic analysis using variant sets reveals polygenic gene-context interactions.

    Directory of Open Access Journals (Sweden)

    Francesco Paolo Casale

    2017-04-01

    Full Text Available Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations to validate iSet before applying the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for sub-group specific genetic effects in human lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.

  9. Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters.

    Science.gov (United States)

    Yang, Chuhu; Bolotin, Eugene; Jiang, Tao; Sladek, Frances M; Martinez, Ernest

    2007-03-01

    The core promoter of eukaryotic genes is the minimal DNA region that recruits the basal transcription machinery to direct efficient and accurate transcription initiation. The fraction of human and yeast genes that contain specific core promoter elements such as the TATA box and the initiator (INR) remains unclear and core promoter motifs specific for TATA-less genes remain to be identified. Here, we present genome-scale computational analyses indicating that approximately 76% of human core promoters lack TATA-like elements, have a high GC content, and are enriched in Sp1-binding sites. We further identify two motifs - M3 (SCGGAAGY) and M22 (TGCGCANK) - that occur preferentially in human TATA-less core promoters. About 24% of human genes have a TATA-like element and their promoters are generally AT-rich; however, only approximately 10% of these TATA-containing promoters have the canonical TATA box (TATAWAWR). In contrast, approximately 46% of human core promoters contain the consensus INR (YYANWYY) and approximately 30% are INR-containing TATA-less genes. Significantly, approximately 46% of human promoters lack both TATA-like and consensus INR elements. Surprisingly, mammalian-type INR sequences are present - and tend to cluster - in the transcription start site (TSS) region of approximately 40% of yeast core promoters and the frequency of specific core promoter types appears to be conserved in yeast and human genomes. Gene Ontology analyses reveal that TATA-less genes in humans, as in yeast, are frequently involved in basic "housekeeping" processes, while TATA-containing genes are more often highly regulated, such as by biotic or stress stimuli. These results reveal unexpected similarities in the occurrence of specific core promoter types and in their associated biological processes in yeast and humans and point to novel vertebrate-specific DNA motifs that might play a selective role in TATA-independent transcription.
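
    The consensus motifs quoted above (TATAWAWR, YYANWYY, SCGGAAGY, TGCGCANK) are written with IUPAC nucleotide ambiguity codes, where for example W stands for A or T and Y for C or T. The following minimal Python sketch shows how such a consensus can be expanded into a regular expression and located in a DNA string. The function names and the example sequences are hypothetical, and the sketch scans one strand only; it is not the genome-scale pipeline used in the study.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Expand an IUPAC consensus string into a regular-expression body."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets re.finditer report overlapping occurrences as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Made-up example sequences, chosen only to exercise the matcher.
print(find_motif("TATAWAWR", "GGTATAAAAGCC"))  # canonical TATA box consensus
print(find_motif("YYANWYY", "AACCATTCCGG"))    # consensus INR
print(find_motif("SCGGAAGY", "TTGCGGAAGTAA"))  # motif M3
```

    A complete promoter scan would normally also search the reverse complement and restrict matches to a window around the transcription start site; both are left out here for brevity.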

  10. Flavanol-Enriched Cocoa Powder Alters the Intestinal Microbiota, Tissue and Fluid Metabolite Profiles, and Intestinal Gene Expression in Pigs

    Science.gov (United States)

    Jang, Saebyeol; Sun, Jianghao; Chen, Pei; Lakshman, Sukla; Molokin, Aleksey; Harnly, James M; Vinyard, Bryan T; Urban, Joseph F; Davis, Cindy D; Solano-Aguilar, Gloria

    2016-01-01

    Background: Consumption of cocoa-derived polyphenols has been associated with several health benefits; however, their effects on the intestinal microbiome and related features of host intestinal health are not adequately understood. Objective: The objective of this study was to determine the effects of eating flavanol-enriched cocoa powder on the composition of the gut microbiota, tissue metabolite profiles, and intestinal immune status. Methods: Male pigs (5 mo old, 28 kg mean body weight) were supplemented with 0, 2.5, 10, or 20 g flavanol-enriched cocoa powder/d for 27 d. Metabolites in serum, urine, the proximal colon contents, liver, and adipose tissue; bacterial abundance in the intestinal contents and feces; and intestinal tissue gene expression of inflammatory markers and Toll-like receptors (TLRs) were then determined. Results: O-methyl-epicatechin-glucuronide conjugates dose-dependently increased (P cocoa powder. The concentration of 3-hydroxyphenylpropionic acid isomers in urine decreased as the dose of cocoa powder fed to pigs increased (75–85%, P cocoa powder/d, respectively. Moreover, consumption of cocoa powder reduced TLR9 gene expression in ileal Peyer’s patches (67–80%, P cocoa powder/d compared with pigs not supplemented with cocoa powder. Conclusion: This study demonstrates that consumption of cocoa powder by pigs can contribute to gut health by enhancing the abundance of Lactobacillus and Bifidobacterium species and modulating markers of localized intestinal immunity. PMID:26936136

  11. Genes conserved in bilaterians but jointly lost with Myc during nematode evolution are enriched in cell proliferation and cell migration functions.

    Science.gov (United States)

    Erives, Albert J

    2015-09-01

    Animals use a stereotypical set of developmental genes to build body architectures of varying sizes and organizational complexity. Some genes are critical to developmental patterning, while other genes are important to physiological control of growth. However, growth regulator genes may not be as important in small-bodied "micro-metazoans" such as nematodes. Nematodes use a simplified developmental strategy of lineage-based cell fate specifications to produce an adult bilaterian body composed of a few hundreds of cells. Nematodes also lost the MYC proto-oncogenic regulator of cell proliferation. To identify additional regulators of cell proliferation that were lost with MYC, we computationally screened and determined 839 high-confidence genes that are conserved in bilaterians/lost in nematodes (CIBLIN genes). We find that 30 % of all CIBLIN genes encode transcriptional regulators of cell proliferation, epithelial-to-mesenchyme transitions, and other processes. Over 50 % of CIBLIN genes are unnamed genes in Drosophila, suggesting that there are many understudied genes. Interestingly, CIBLIN genes include many Myc synthetic lethal (MycSL) hits from recent screens. CIBLIN genes include key regulators of heparan sulfate proteoglycan (HSPG) sulfation patterns, and lysyl oxidases involved in cross-linking and modification of the extracellular matrix (ECM). These genes and others suggest the CIBLIN repertoire services critical functions in ECM remodeling and cell migration in large-bodied bilaterians. Correspondingly, CIBLIN genes are co-expressed with Myc in cancer transcriptomes, and include a preponderance of known determinants of cancer progression and tumor aggression. We propose that CIBLIN gene research can improve our understanding of regulatory control of cellular growth in metazoans.

  12. Gene Selection Integrated with Biological Knowledge for Plant Stress Response Using Neighborhood System and Rough Set Theory.

    Science.gov (United States)

    Meng, Jun; Zhang, Jing; Luan, Yushi

    2015-01-01

    Mining knowledge from gene expression data is an active research topic in bioinformatics. Gene selection and sample classification are significant research trends, because gene expression data contain a large number of genes but only a small number of samples. Rough set theory has been successfully applied to gene selection, as it can select attributes without redundancy. To improve the interpretability of the selected genes, some researchers have introduced biological knowledge. In this paper, we first employ a neighborhood system to deal directly with the new information table formed by integrating gene expression data with biological knowledge, which can present the information from multiple perspectives simultaneously and does not weaken the information of individual genes for selection and classification. Then, we give a novel framework for gene selection and propose a significant gene selection method based on this framework by employing a reduction algorithm from rough set theory. The proposed method is applied to the analysis of plant stress response. Experimental results on three data sets show that the proposed method is effective, as it can select significant gene subsets without redundancy and achieve high classification accuracy. Biological analysis of the results shows that the selected genes are readily interpretable.

  13. Refining ensembles of predicted gene regulatory networks based on characteristic interaction sets.

    Directory of Open Access Journals (Sweden)

    Lukas Windhager

    Full Text Available Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate

  14. Earthquake forecast enrichment scores

    Directory of Open Access Journals (Sweden)

    Christine Smyth

    2012-03-01

    Full Text Available The Collaboratory for the Study of Earthquake Predictability (CSEP) is a global project aimed at testing earthquake forecast models in a fair environment. Various metrics are currently used to evaluate the submitted forecasts. However, the CSEP still lacks easily understandable metrics with which to rank the universal performance of the forecast models. In this research, we modify a well-known and respected metric from another statistical field, bioinformatics, to make it suitable for evaluating earthquake forecasts, such as those submitted to the CSEP initiative. The metric, originally called a gene-set enrichment score, is based on a Kolmogorov-Smirnov statistic. Our modified metric assesses if, over a certain time period, the forecast values at locations where earthquakes have occurred are significantly increased compared to the values for all locations where earthquakes did not occur. Permutation testing allows for a significance value to be placed upon the score. Unlike the metrics currently employed by the CSEP, the score places no assumption on the distribution of earthquake occurrence nor requires an arbitrary reference forecast. In this research, we apply the modified metric to simulated data and real forecast data to show it is a powerful and robust technique, capable of ranking competing earthquake forecasts.
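    The abstract describes a Kolmogorov-Smirnov-type enrichment score with a permutation-based significance value. The sketch below shows one simple, unweighted form of that idea, assuming each location has a forecast value and a flag for whether an earthquake occurred there. The function names, the step sizes, and the one-sided p-value are assumptions chosen for illustration; the modified metric in the paper may differ in its details.

```python
import random


def enrichment_score(values, is_hit):
    """Signed maximum deviation of a KS-style running sum.

    Locations are ranked by forecast value (highest first). The running
    sum steps up by 1/n_hit at a location with an earthquake and down by
    1/n_miss otherwise; the score is the largest deviation from zero.
    """
    n_hit = sum(1 for h in is_hit if h)
    n_miss = len(values) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    running = 0.0
    best = 0.0
    for i in order:
        running += 1.0 / n_hit if is_hit[i] else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(values, is_hit, n_perm=1000, seed=0):
    """Return (observed score, one-sided p-value) by shuffling hit labels."""
    rng = random.Random(seed)
    observed = enrichment_score(values, is_hit)
    labels = list(is_hit)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if enrichment_score(values, labels) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    forecast = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]  # made-up forecast values
    quake = [True, True, False, False, False, False]
    print(permutation_p_value(forecast, quake))
```

    A score near +1 means earthquakes are concentrated at the highest forecast values; a score near -1 means they fall at the lowest. Because significance comes from shuffling the labels, no distributional assumption or reference forecast is needed, which matches the motivation given in the abstract.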

  15. GLANET: genomic loci annotation and enrichment tool.

    Science.gov (United States)

    Otlu, Burçak; Firtina, Can; Keles, Sündüz; Tastan, Oznur

    2017-09-15

    Genomic studies identify genomic loci representing genetic variations, transcription factor (TF) occupancy, or histone modification through next generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them with known genomic and epigenomic annotations. We present GLANET as a comprehensive annotation and enrichment analysis tool which implements a sampling-based enrichment test that accounts for GC content and/or mappability biases, jointly or separately. GLANET annotates and performs enrichment analysis on these loci with a rich library. We introduce and perform novel data-driven computational experiments for assessing the power and Type-I error of its enrichment procedure which show that GLANET has attained high statistical power and well-controlled Type-I error rate. As a key feature, users can easily extend its library with new gene sets and genomic intervals. Other key features include assessment of impact of single nucleotide variants (SNPs) on TF binding sites and regulation based pathway enrichment analysis. GLANET can be run using its GUI or on command line. GLANET's source code is available at https://github.com/burcakotlu/GLANET. Tutorials are provided at https://glanet.readthedocs.org. Contact: burcak@ceng.metu.edu.tr or oznur.tastan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online.

  16. An ancient dental gene set governs development and continuous regeneration of teeth in sharks.

    Science.gov (United States)

    Rasch, Liam J; Martin, Kyle J; Cooper, Rory L; Metscher, Brian D; Underwood, Charlie J; Fraser, Gareth J

    2016-07-15

    The evolution of oral teeth is considered a major contributor to the overall success of jawed vertebrates. This is especially apparent in cartilaginous fishes including sharks and rays, which develop elaborate arrays of highly specialized teeth, organized in rows and retain the capacity for life-long regeneration. Perpetual regeneration of oral teeth has been either lost or highly reduced in many other lineages including important developmental model species, so cartilaginous fishes are uniquely suited for deep comparative analyses of tooth development and regeneration. Additionally, sharks and rays can offer crucial insights into the characters of the dentition in the ancestor of all jawed vertebrates. Despite this, tooth development and regeneration in chondrichthyans is poorly understood and remains virtually uncharacterized from a developmental genetic standpoint. Using the emerging chondrichthyan model, the catshark (Scyliorhinus spp.), we characterized the expression of genes homologous to those known to be expressed during stages of early dental competence, tooth initiation, morphogenesis, and regeneration in bony vertebrates. We have found that expression patterns of several genes from Hh, Wnt/β-catenin, Bmp and Fgf signalling pathways indicate deep conservation over ~450 million years of tooth development and regeneration. We describe how these genes participate in the initial emergence of the shark dentition and how they are redeployed during regeneration of successive tooth generations. We suggest that at the dawn of the vertebrate lineage, teeth (i) were most likely continuously regenerative structures, and (ii) utilised a core set of genes from members of key developmental signalling pathways that were instrumental in creating a dental legacy redeployed throughout vertebrate evolution. These data lay the foundation for further experimental investigations utilizing the unique regenerative capacity of chondrichthyan models to answer evolutionary

  17. Transcriptional shift identifies a set of genes driving breast cancer chemoresistance.

    Directory of Open Access Journals (Sweden)

    Laura Vera-Ramirez

    Full Text Available BACKGROUND: Distant recurrences after antineoplastic treatment remain a serious problem for breast cancer clinical management, which threatens patients' lives. Systemic therapy is administered to eradicate cancer cells from the organism, both at the site of the primary tumor and at any other potential location. Despite this intervention, a significant proportion of breast cancer patients relapse even many years after their primary tumor has been successfully treated according to current clinical standards, evidencing the existence of a chemoresistant cell subpopulation originating from the primary tumor. METHODS/FINDINGS: To identify key molecules and signaling pathways which drive breast cancer chemoresistance we performed gene expression analysis before and after anthracycline and taxane-based chemotherapy and compared the results between different histopathological response groups (good-, mid- and bad-response), established according to the Miller & Payne grading system. Two cohorts of 33 and 73 breast cancer patients receiving neoadjuvant chemotherapy were recruited for whole-genome expression analysis and validation assay, respectively. Identified genes were subjected to a bioinformatic analysis in order to ascertain the molecular function of the proteins they encode and the signaling in which they participate. High throughput technologies identified 65 gene sequences which were over-expressed in all groups (P ≤ 0.05, Bonferroni test). Notably we found that, after chemotherapy, a significant proportion of these genes were over-expressed in the good responders group, making their tumors indistinguishable from those of the bad responders in their expression profile (P ≤ 0.05, Benjamini-Hochberg's method). CONCLUSIONS: These data identify a set of key molecular pathways selectively up-regulated in post-chemotherapy cancer cells, which may become appropriate targets for the development of future directed therapies against breast cancer.

  18. Transcriptional Shift Identifies a Set of Genes Driving Breast Cancer Chemoresistance

    Science.gov (United States)

    Vera-Ramirez, Laura; Sanchez-Rovira, Pedro; Ramirez-Tortosa, Cesar L.; Quiles, Jose L.; Ramirez-Tortosa, MCarmen; Lorente, Jose A.

    2013-01-01

    Background: Distant recurrences after antineoplastic treatment remain a serious problem for breast cancer clinical management, which threatens patients’ lives. Systemic therapy is administered to eradicate cancer cells from the organism, both at the site of the primary tumor and at any other potential location. Despite this intervention, a significant proportion of breast cancer patients relapse even many years after their primary tumor has been successfully treated according to current clinical standards, evidencing the existence of a chemoresistant cell subpopulation originating from the primary tumor. Methods/Findings: To identify key molecules and signaling pathways which drive breast cancer chemoresistance we performed gene expression analysis before and after anthracycline and taxane-based chemotherapy and compared the results between different histopathological response groups (good-, mid- and bad-response), established according to the Miller & Payne grading system. Two cohorts of 33 and 73 breast cancer patients receiving neoadjuvant chemotherapy were recruited for whole-genome expression analysis and validation assay, respectively. Identified genes were subjected to a bioinformatic analysis in order to ascertain the molecular function of the proteins they encode and the signaling in which they participate. High throughput technologies identified 65 gene sequences which were over-expressed in all groups (P ≤ 0.05, Bonferroni test). Notably we found that, after chemotherapy, a significant proportion of these genes were over-expressed in the good responders group, making their tumors indistinguishable from those of the bad responders in their expression profile (P ≤ 0.05, Benjamini-Hochberg's method). Conclusions: These data identify a set of key molecular pathways selectively up-regulated in post-chemotherapy cancer cells, which may become appropriate targets for the development of future directed therapies against breast cancer. PMID:23326553

  19. Discharge of KPC-2 genes from the WWTPs contributed to their enriched abundance in the receiving river.

    Science.gov (United States)

    Yang, Fengxia; Huang, Liang; Li, Linyun; Yang, Yang; Mao, Daqing; Luo, Yi

    2017-03-01

    At present, very little is known about the persistence and spread pathway of KPC-2 genes in the environment. Our previous study reported the prevalence and persistence of KPC-2 genes in wastewater treatment plants (WWTPs). In the present work, we investigated the occurrence and fate of KPC-2 genes in a WWTP discharge-receiving river and studied the effect of WWTP discharges on the prevalence of KPC-2 genes and host bacteria in the receiving river. It is observed that a considerable level of KPC-2 genes occurred in the receiving river, and a significant increase of blaKPC-2 abundance in the downstream following WWTP discharge was observed compared to the upstream. Furthermore, opportunistic pathogens with 100% identical blaKPC-2 sequence, like Escherichia coli and Kluyvera georgiana, were isolated from both WWTP and its receiving water, whereas no blaKPC-2 carrying bacteria was detected in the upstream. These findings indicated that the treated wastewater discharges have a considerable influence on blaKPC-2 levels in the receiving river. Interestingly, there is no correlation between concentrations of antibiotics and blaKPC-2 concentrations, demonstrating that the increase of KPC-2 genes in the receiving river is mainly due to WWTP release. This finding is important because it illustrates a significant pathway for KPC-2 gene proliferation to the environment.

  20. BiNChE: A web tool and library for chemical enrichment analysis based on the ChEBI ontology

    OpenAIRE

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-01-01

    Background Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology,...

  1. Evaluation of endogenous control genes for gene expression studies across multiple tissues and in the specific sets of fat- and muscle-type samples of the pig.

    Science.gov (United States)

    Gu, Y R; Li, M Z; Zhang, K; Chen, L; Jiang, A A; Wang, J Y; Li, X W

    2011-08-01

    To normalize a set of quantitative real-time PCR (q-PCR) data, it is essential to determine an optimal number/set of housekeeping genes, as the abundance of housekeeping genes can vary across tissues or cells during different developmental stages, or even under certain environmental conditions. In this study, of the 20 commonly used endogenous control genes, 13, 18 and 17 genes exhibited credible stability in 56 different tissues, 10 types of adipose tissue and five types of muscle tissue, respectively. Our analysis clearly showed that three optimal housekeeping genes are adequate for an accurate normalization, which correlated well with the theoretical optimal number (r ≥ 0.94). In terms of economical and experimental feasibility, we recommend the use of the three most stable housekeeping genes for calculating the normalization factor. Based on our results, the three most stable housekeeping genes in all analysed samples (TOP2B, HSPCB and YWHAZ) are recommended for accurate normalization of q-PCR data. We also suggest that two different sets of housekeeping genes are appropriate for 10 types of adipose tissue (the HSPCB, ALDOA and GAPDH genes) and five types of muscle tissue (the TOP2B, HSPCB and YWHAZ genes), respectively. Our report will serve as a valuable reference for other studies aimed at measuring tissue-specific mRNA abundance in porcine samples.
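    The abstract recommends calculating a normalization factor from the three most stable housekeeping genes but does not give the formula. A common convention for multi-gene q-PCR normalization is the geometric mean of the reference genes' relative quantities, and the sketch below assumes that convention. The function names and the numeric quantities are hypothetical; only the gene names come from the abstract.

```python
import math


def normalization_factor(reference_quantities):
    """Geometric mean of the reference genes' relative quantities.

    Assumption: the geometric mean is used, as is conventional for
    multi-gene q-PCR normalization; the abstract does not state this.
    """
    if not reference_quantities:
        raise ValueError("at least one reference gene is required")
    if any(q <= 0 for q in reference_quantities):
        raise ValueError("relative quantities must be positive")
    log_sum = sum(math.log(q) for q in reference_quantities)
    return math.exp(log_sum / len(reference_quantities))


def normalize(target_quantity, reference_quantities):
    """Scale a target gene's relative quantity by the normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)


if __name__ == "__main__":
    # Made-up relative quantities for one sample, keyed by the three
    # genes the abstract recommends for all analysed samples.
    references = {"TOP2B": 1.0, "HSPCB": 2.0, "YWHAZ": 4.0}
    print(normalize(6.0, list(references.values())))
```

    The geometric mean is less sensitive than the arithmetic mean to a single reference gene with an outlying abundance, which is one reason several reference genes are combined rather than relying on one.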

  2. A set of genes critical to development is epigenetically poised in mouse germ cells from fetal stages through completion of meiosis.

    Science.gov (United States)

    Lesch, Bluma J; Dokshin, Gregoriy A; Young, Richard A; McCarrey, John R; Page, David C

    2013-10-01

    In multicellular organisms, germ cells carry the hereditary material from one generation to the next. Developing germ cells are unipotent gamete precursors, and mature gametes are highly differentiated, specialized cells. However, upon gamete union at fertilization, their genomes drive a totipotent program, giving rise to a complete embryo as well as extraembryonic tissues. The biochemical basis for the ability to transition from differentiated cell to totipotent zygote is unknown. Here we report that a set of developmentally critical genes is maintained in an epigenetically poised (bivalent) state from embryonic stages through the end of meiosis. We performed ChIP-seq and RNA-seq analysis on flow-sorted male and female germ cells during embryogenesis at three time points surrounding sexual differentiation and female meiotic initiation, and then extended our analysis to meiotic and postmeiotic male germ cells. We identified a set of genes that is highly enriched for regulators of differentiation and retains a poised state (high H3K4me3, high H3K27me3, and lack of expression) across sexes and across developmental stages, including in haploid postmeiotic cells. The existence of such a state in embryonic stem cells has been well described. We now demonstrate that a subset of genes is maintained in a poised state in the germ line from the initiation of sexual differentiation during fetal development and into postmeiotic stages. We propose that the epigenetically poised condition of these developmental genes is a fundamental property of the mammalian germ-line nucleus, allowing differentiated gametes to unleash a totipotent program following fertilization.

  3. Risk alleles of genes with monoallelic expression are enriched in gain-of-function variants and depleted in loss-of-function variants for neurodevelopmental disorders.

    Science.gov (United States)

    Savova, V; Vinogradova, S; Pruss, D; Gimelbrant, A A; Weiss, L A

    2017-03-07

    Over 3000 human genes can be expressed from a single allele in one cell, and from the other allele-or both-in neighboring cells. Little is known about the consequences of this epigenetic phenomenon, monoallelic expression (MAE). We hypothesized that MAE increases expression variability, with a potential impact on human disease. Here, we use a chromatin signature to infer MAE for genes in lymphoblastoid cell lines and human fetal brain tissue. We confirm that across clones MAE status correlates with expression level, and that in human tissue data sets, MAE genes show increased expression variability. We then compare mono- and biallelic genes at three distinct scales. In the human population, we observe that genes with polymorphisms influencing expression variance are more likely to be MAE (PMolecular Psychiatry advance online publication, 7 March 2017; doi:10.1038/mp.2017.13.

  4. Atrazine biodegradation efficiency, metabolite detection, and trzD gene expression by enrichment bacterial cultures from agricultural soil.

    Science.gov (United States)

    Solomon, Robinson David Jebakumar; Kumar, Amit; Satheeja Santhi, Velayudhan

    2013-12-01

    Atrazine is a selective herbicide used in agricultural fields to control the emergence of broadleaf and grassy weeds. The persistence of this herbicide is influenced by the metabolic action of habituated native microorganisms. This study provides information on the occurrence of atrazine mineralizing bacterial strains with faster metabolizing ability. The enrichment cultures were tested for the biodegradation of atrazine by high-performance liquid chromatography (HPLC) and mass spectrometry. Nine cultures JS01.Deg01 to JS09.Deg01 were identified as the degrader of atrazine in the enrichment culture. The three isolates JS04.Deg01, JS07.Deg01, and JS08.Deg01 were identified as efficient atrazine metabolizers. Isolates JS04.Deg01 and JS07.Deg01 produced hydroxyatrazine (HA) N-isopropylammelide and cyanuric acid by dealkylation reaction. The isolate JS08.Deg01 generated deethylatrazine (DEA), deisopropylatrazine (DIA), and cyanuric acid by N-dealkylation in the upper degradation pathway and later it incorporated cyanuric acid in their biomass by the lower degradation pathway. The optimum pH for degrading atrazine by JS08.Deg01 was 7.0 and 16S rDNA phylogenetic typing identified it as Enterobacter cloacae strain JS08.Deg01. The highest atrazine mineralization was observed in case of isolate JS08.Deg01, where an ample amount of trzD mRNA was quantified at 72 h of incubation with atrazine. Atrazine bioremediating isolate E. cloacae strain JS08.Deg01 could be the better environmental remediator of agricultural soils and the crop fields contaminated with atrazine could be the source of the efficient biodegrading microbial strains for the environmental cleanup process.

  5. Atrazine biodegradation efficiency, metabolite detection, and trzD gene expression by enrichment bacterial cultures from agricultural soil

    Institute of Scientific and Technical Information of China (English)

    Robinson David Jebakumar SOLOMON; Amit KUMAR; Velayudhan SATHEEJA SANTHI

    2013-01-01

    Atrazine is a selective herbicide used in agricultural fields to control the emergence of broadleaf and grassy weeds. The persistence of this herbicide is influenced by the metabolic action of habituated native microorganisms. This study provides information on the occurrence of atrazine mineralizing bacterial strains with faster metabolizing ability. The enrichment cultures were tested for the biodegradation of atrazine by high-performance liquid chromatography (HPLC) and mass spectrometry. Nine cultures JS01.Deg01 to JS09.Deg01 were identified as the degrader of atrazine in the enrichment culture. The three isolates JS04.Deg01, JS07.Deg01, and JS08.Deg01 were identified as efficient atrazine metabolizers. Isolates JS04.Deg01 and JS07.Deg01 produced hydroxyatrazine (HA) N-isopropylammelide and cyanuric acid by dealkylation reaction. The isolate JS08.Deg01 generated deethylatrazine (DEA), deisopropylatrazine (DIA), and cyanuric acid by N-dealkylation in the upper degradation pathway and later it incorporated cyanuric acid in their biomass by the lower degradation pathway. The optimum pH for degrading atrazine by JS08.Deg01 was 7.0 and 16S rDNA phylogenetic typing identified it as Enterobacter cloacae strain JS08.Deg01. The highest atrazine mineralization was observed in case of isolate JS08.Deg01, where an ample amount of trzD mRNA was quantified at 72 h of incubation with atrazine. Atrazine bioremediating isolate E. cloacae strain JS08.Deg01 could be the better environmental remediator of agricultural soils and the crop fields contaminated with atrazine could be the source of the efficient biodegrading microbial strains for the environmental cleanup process.

  6. Atrazine biodegradation efficiency, metabolite detection, and trzD gene expression by enrichment bacterial cultures from agricultural soil

    Science.gov (United States)

    Solomon, Robinson David Jebakumar; Kumar, Amit; Satheeja Santhi, Velayudhan

    2013-01-01

    Atrazine is a selective herbicide used in agricultural fields to control the emergence of broadleaf and grassy weeds. The persistence of this herbicide is influenced by the metabolic action of habituated native microorganisms. This study provides information on the occurrence of atrazine mineralizing bacterial strains with faster metabolizing ability. The enrichment cultures were tested for the biodegradation of atrazine by high-performance liquid chromatography (HPLC) and mass spectrometry. Nine cultures JS01.Deg01 to JS09.Deg01 were identified as the degrader of atrazine in the enrichment culture. The three isolates JS04.Deg01, JS07.Deg01, and JS08.Deg01 were identified as efficient atrazine metabolizers. Isolates JS04.Deg01 and JS07.Deg01 produced hydroxyatrazine (HA) N-isopropylammelide and cyanuric acid by dealkylation reaction. The isolate JS08.Deg01 generated deethylatrazine (DEA), deisopropylatrazine (DIA), and cyanuric acid by N-dealkylation in the upper degradation pathway and later it incorporated cyanuric acid in their biomass by the lower degradation pathway. The optimum pH for degrading atrazine by JS08.Deg01 was 7.0 and 16S rDNA phylogenetic typing identified it as Enterobacter cloacae strain JS08.Deg01. The highest atrazine mineralization was observed in case of isolate JS08.Deg01, where an ample amount of trzD mRNA was quantified at 72 h of incubation with atrazine. Atrazine bioremediating isolate E. cloacae strain JS08.Deg01 could be the better environmental remediator of agricultural soils and the crop fields contaminated with atrazine could be the source of the efficient biodegrading microbial strains for the environmental cleanup process. PMID:24302716

  7. Module network inference from a cancer gene expression data set identifies microRNA regulated modules.

    Directory of Open Access Journals (Sweden)

    Eric Bonnet

    Full Text Available BACKGROUND: MicroRNAs (miRNAs) are small RNAs that recognize and regulate mRNA target genes. Multiple lines of evidence indicate that they are key regulators of numerous critical functions in development and disease, including cancer. However, defining the place and function of miRNAs in complex regulatory networks is not straightforward. Systems approaches, like the inference of a module network from expression data, can help to achieve this goal. METHODOLOGY/PRINCIPAL FINDINGS: During the last decade, much progress has been made in the development of robust and powerful module network inference algorithms. In this study, we analyze and assess experimentally a module network inferred from both miRNA and mRNA expression data, using our recently developed module network inference algorithm based on probabilistic optimization techniques. We show that several miRNAs are predicted as statistically significant regulators for various modules of tightly co-expressed genes. A detailed analysis of three of those modules demonstrates that the specific assignment of miRNAs is functionally coherent and supported by literature. We further designed a set of experiments to test the assignment of miR-200a as the top regulator of a small module of nine genes. The results strongly suggest that miR-200a is regulating the module genes via the transcription factor ZEB1. Interestingly, this module is most likely involved in epithelial homeostasis and its dysregulation might contribute to the malignant process in cancer cells. CONCLUSIONS/SIGNIFICANCE: Our results show that a robust module network analysis of expression data can provide novel insights of miRNA function in important cellular processes. Such a computational approach, starting from expression data alone, can be helpful in the process of identifying the function of miRNAs by suggesting modules of co-expressed genes in which they play a regulatory role. As shown in this study, those modules can then be

  8. Gene Expression Differences between Enriched Normal and Chronic Myelogenous Leukemia Quiescent Stem/Progenitor Cells and Correlations with Biological Abnormalities

    Directory of Open Access Journals (Sweden)

    M. Affer

    2011-01-01

    Full Text Available In comparing gene expression of normal and CML CD34+ quiescent (G0) cells, 292 genes were downregulated and 192 genes upregulated in the CML G0 cells. The differentially expressed genes were grouped according to their reported functions, and correlations were sought with biological differences previously observed between the same groups. The most relevant findings include the following. (i) CML G0 cells are in a more advanced stage of development and more poised to proliferate than normal G0 cells. (ii) When CML G0 cells are stimulated to proliferate, they differentiate and mature more rapidly than their normal counterparts. (iii) Whereas normal G0 cells form only granulocyte/monocyte colonies when stimulated by cytokines, CML G0 cells form a combination of the above and erythroid clusters and colonies. (iv) Prominin-1 is the gene most downregulated in CML G0 cells, and this appears to be associated with the spontaneous formation of erythroid colonies by CML progenitors without EPO.

  9. Multiplex Real-Time PCR Assays for Screening of Shiga Toxin 1 and 2 Genes, Including All Known Subtypes, and Escherichia coli O26-, O111-, and O157-Specific Genes in Beef and Sprout Enrichment Cultures.

    Science.gov (United States)

    Harada, Tetsuya; Iguchi, Atsushi; Iyoda, Sunao; Seto, Kazuko; Taguchi, Masumi; Kumeda, Yuko

    2015-10-01

    Shiga toxin family members have recently been classified using a new nomenclature into three Stx1 subtypes (Stx1a, Stx1c, and Stx1d) and seven Stx2 subtypes (Stx2a, Stx2b, Stx2c, Stx2d, Stx2e, Stx2f, and Stx2g). To develop screening methods for Stx genes, including all of these subtype genes, and Escherichia coli O26-, O111-, and O157-specific genes in laboratory investigations of Shiga toxin-producing E. coli (STEC) foodborne cases, we developed multiplex real-time PCR assays and evaluated their specificity and quantitative accuracy using STEC and non-STEC isolates, recombinant plasmids, and food enrichment cultures and by performing STEC spiking experiments with beef and sprout enrichment cultures. In addition, we evaluated the relationship between the recovery rates of the target strains by direct plating and immunomagnetic separation and the cycle threshold (CT) values of the real-time PCR assays for the Stx subtypes and STEC O26, O111, and O157 serogroups. All three stx1- and seven stx2-subtype genes were detected by real-time PCR with high sensitivity and specificity, and the quantitative accuracy of this assay was confirmed using control plasmids and STEC spiking experiments. The results of the STEC spiking experiments suggest that it is not routinely possible to isolate STEC from enrichment cultures with real-time PCR CT values greater than 30 by direct plating on MacConkey agar, although highly selective media and immunomagnetic beads were able to isolate the inoculated strains from the enrichment cultures. These data suggest that CT values obtained from the highly quantitative real-time PCR assays developed in this study provide useful information to develop effective isolation strategies for STEC from food samples. The real-time PCR assays developed here are expected to aid in investigations of infections or outbreaks caused by STEC harboring any of the stx-subtype genes in the new Stx nomenclature, as well as STEC O26, O111, and O157.

  10. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    Science.gov (United States)

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.

  11. A transcriptomic approach to identify regulatory genes involved in fruit set of wild-type and parthenocarpic tomato genotypes.

    Science.gov (United States)

    Ruiu, Fabrizio; Picarella, Maurizio Enea; Imanishi, Shunsuke; Mazzucato, Andrea

    2015-10-01

    The tomato parthenocarpic fruit (pat) mutation associates a strong competence for parthenocarpy with homeotic transformation of anthers and aberrancy of ovules. To dissect this complex floral phenotype, genes involved in the pollination-independent fruit set of the pat mutant were investigated by microarray analysis using wild-type and mutant ovaries. Normalized expression data were subjected to one-way ANOVA, and 2499 differentially expressed genes (DEGs) displaying a >1.5 log-fold change in at least one of the pairwise comparisons analyzed were detected. DEGs were categorized into 20 clusters, and the clusters were classified into five groups representing transcripts with similar expression dynamics. The "regulatory function" group (685 DEGs) contained putative negative or positive fruit set regulators, the "pollination-dependent" group (411 DEGs) included genes activated by pollination, and the "fruit growth-related" group (815 DEGs) included genes activated at early fruit growth. The last two groups listed genes with different or similar expression patterns at all stages in the two genotypes. qRT-PCR validation of 20 DEGs plus four other selected genes confirmed the high reliability of the microarray expression data; the average correlation coefficient for the 20 DEGs was 0.90. All groups included relevant transcription factors encoding proteins that regulate meristem differentiation and floral organ development, genes involved in hormone metabolism, transport, and response, and genes involved in cell division and in primary and secondary metabolism. Among pathways related to secondary metabolites, genes related to the synthesis of flavonoids emerged, supporting the recent evidence that these compounds are important at the fruit set phase. Selected genes showing a de-regulated expression pattern in pat were studied in four other parthenocarpic genotypes, either genetically anonymous or carrying lesions in known gene sequences. This comparative approach offered novel insights for improving the present

  12. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to the lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with the objective of isolating as many functional genes as possible from J. curcas by large-scale sequencing of expressed sequence tags (ESTs). Results A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced to an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23, and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes, which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes, which did not show similarity with any genes in the public database, may represent unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
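
    The counts reported in the record above can be cross-checked with simple arithmetic. The sketch below only restates numbers from the abstract; the variable names are illustrative and not taken from the paper.

```python
# Counts as reported in the abstract (variable names are illustrative).
total_ests = 12084
contigs = 2258
singletons = 4751
ests_in_contigs = 7333

# A unigene is either a contig or a singleton.
unigenes = contigs + singletons

# Every sequenced EST is either assembled into a contig or left as a singleton.
ests_accounted_for = ests_in_contigs + singletons

# BLASTX annotation classes reported for the unigenes.
known = 3982
unknown_hypothetical_putative = 2836
no_similarity = 191
annotation_total = known + unknown_hypothetical_putative + no_similarity

# Share of unigenes identified as potential full-length genes.
full_length = 6233
full_length_percent = 100 * full_length / unigenes

print(unigenes, ests_accounted_for, annotation_total)
print(f"{full_length_percent:.1f}% potential full-length")
```

    The three totals agree with the figures in the abstract (7009 unigenes from 12,084 ESTs), so the reported breakdown is internally consistent.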

  13. Identification of enriched driver gene alterations in subgroups of non-small cell lung cancer patients based on histology and smoking status.

    Directory of Open Access Journals (Sweden)

    She-Juan An

    Full Text Available BACKGROUND: Appropriate patient selection is needed for targeted therapies that are efficacious only in patients with specific genetic alterations. We aimed to define subgroups of patients with candidate driver genes in patients with non-small cell lung cancer. METHODS: Patients with primary lung cancer who underwent clinical genetic tests at Guangdong General Hospital were enrolled. Driver genes were detected by sequencing, high-resolution melt analysis, qPCR, or multiple PCR and RACE methods. RESULTS: 524 patients were enrolled in this study, and the differences in driver gene alterations among subgroups were analyzed based on histology and smoking status. In a subgroup of non-smokers with adenocarcinoma, EGFR was the most frequently altered gene, with a mutation rate of 49.8%, followed by EML4-ALK (9.3%), PTEN (9.1%), PIK3CA (5.2%), c-Met (4.8%), KRAS (4.5%), STK11 (2.7%), and BRAF (1.9%). The three most frequently altered genes in a subgroup of smokers with adenocarcinoma were EGFR (22.0%), STK11 (19.0%), and KRAS (12.0%). We only found EGFR (8.0%), c-Met (2.8%), and PIK3CA (2.6%) alterations in the non-smoker with squamous cell carcinoma (SCC) subgroup. PTEN (16.1%), STK11 (8.3%), and PIK3CA (7.2%) were the three most frequently enriched genes in smokers with SCC. DDR2 and FGFR2 only presented in smokers with SCC (4.4% and 2.2%, respectively). Among these four subgroups, the differences in EGFR, KRAS, and PTEN mutations were statistically significant. CONCLUSION: The distinct features of driver gene alterations in different subgroups based on histology and smoking status were helpful in defining patients for future clinical trials that target these genes. This study also suggests that we may consider patients with infrequent alterations of driver genes as having rare or orphan diseases that should be managed with special molecularly targeted therapies.

  14. A new set of reference genes for RT-qPCR assays in the yeast Dekkera bruxellensis.

    Science.gov (United States)

    de Barros Pita, Will; Leite, Fernanda Cristina Bezerra; de Souza Liberal, Anna Theresa; Pereira, Luciana Filgueira; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante; de Morais, Marcos Antonio

    2012-12-01

    The yeast Dekkera bruxellensis has recently been regarded as an important microorganism for bioethanol production owing to its ability to convert glucose, sucrose, and cellobiose to ethanol. The aim of this work was to validate a new set of reference genes for gene expression analysis by quantitative real-time PCR in D. bruxellensis and to assess how the method chosen for quantifying mRNA levels influences the reliability of our data. Three candidate reference genes, DbEFA1, DbEFB1, and DbYNA1, were used in a quantitative analysis of 4 genes of interest, DbYNR1, DbTPS1, DbADH7, and DbUBA4, based on an approach for calculating the normalization factors by means of the geNorm applet. Each reference gene was also individually used for a 2^(-ΔΔCq) (comparative Cq method) calculation of the relative expression of genes of interest. Our results showed that the 3 reference genes provided enough stability and were complementary to the normalization factors method in different culture conditions. This work was able to confirm the usefulness of a previously reported reference gene, EFA1/TEF1, and increased the set of possible reference genes in D. bruxellensis to 4. Moreover, this can improve the reliability of the analysis of the regulation of gene expression in the industrial yeast D. bruxellensis.
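
    The two quantification approaches named in the record above can be sketched as follows. This is a minimal illustration, not the authors' code: the Cq values and function names are hypothetical, the comparative Cq formula assumes roughly 100% amplification efficiency, and the normalization factor is taken as the geometric mean of the reference genes' relative quantities, which is the usual geNorm-style definition.

```python
from statistics import geometric_mean


def relative_expression(cq_target_treated, cq_ref_treated,
                        cq_target_control, cq_ref_control):
    """Comparative Cq method: fold change = 2^(-ddCq) against one reference gene."""
    delta_treated = cq_target_treated - cq_ref_treated
    delta_control = cq_target_control - cq_ref_control
    delta_delta_cq = delta_treated - delta_control
    return 2.0 ** (-delta_delta_cq)


def normalization_factor(reference_quantities):
    """geNorm-style factor: geometric mean of the reference genes' relative quantities."""
    return geometric_mean(reference_quantities)


# Hypothetical Cq values, for illustration only.
fold_change = relative_expression(
    cq_target_treated=22.0, cq_ref_treated=18.0,
    cq_target_control=24.0, cq_ref_control=18.0,
)
factor = normalization_factor([1.0, 4.0])
print(fold_change, factor)
```

    With a single reference gene, any instability in that gene feeds directly into the fold change; averaging several reference genes through the normalization factor dampens that effect, which is why the two approaches are compared.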

  15. A meta-analysis of multiple matched copy number and transcriptomics data sets for inferring gene regulatory relationships.

    Directory of Open Access Journals (Sweden)

    Richard Newton

    Full Text Available Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention are often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments.
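
    The correlation signal that the record above relies on can be illustrated with a plain Pearson correlation between one gene's copy number and a gene's expression across matched samples. This is only a sketch under stated assumptions: the abstract does not specify the statistic used, the function name is illustrative, and the sample values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of measurements."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented matched measurements across five tumour samples.
copy_number_gene_a = [1.0, 2.0, 2.0, 3.0, 4.0]   # copy number of gene A
expression_gene_b = [0.8, 1.9, 2.2, 3.1, 3.9]    # expression of a different gene B

r = pearson_r(copy_number_gene_a, expression_gene_b)
print(f"r = {r:.3f}")
```

    Correlating the copy number of one gene with the expression of a different gene is what makes a candidate relationship trans-acting; a meta-analysis would repeat this per data set and then combine the evidence.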

  16. Higher primates, but not New World monkeys, have a duplicate set of enhancers flanking their apoC-I genes.

    Science.gov (United States)

    Puppione, Donald L

    2014-09-01

    Previous studies have demonstrated that the apoC-I gene and its pseudogene on human chromosome 19 are flanked by a duplicate set of enhancers. Multienhancers, ME.1 and ME.2, are located upstream from the genes and the hepatic control region enhancers, HCR.1 and HCR.2, are located downstream. The duplication of the enhancers has been thought to have occurred when the apoC-I gene was duplicated during primate evolution. Currently, the only primate data are for the human enhancers. Examining the genome of other primates (great and lesser apes, Old and New World monkeys), it was possible to locate the duplicate set of enhancers in apes and Old World monkeys. However, only a single set was found in New World monkeys. These observations provide additional evidence that the apoC-I gene and the flanking enhancers underwent duplication after the divergence of Old and New World monkeys.

  17. ChIP-enriched in silico targets (ChEST), a ChIP-on-chip approach applied to analyzing skeletal muscle genes.

    Science.gov (United States)

    Junion, Guillaume; Jagla, Krzysztof

    2012-01-01

    Mapping the cis-regulatory modules (CRMs) to which myogenic transcription factors bind is an obligatory step towards understanding the gene regulatory networks governing muscle development and function. This can be achieved in silico or by chromatin immunoprecipitation (ChIP) approaches. We have developed a ChIP-enriched in silico targets (ChEST) strategy designed for mapping the CRMs by combining in silico and ChIP methods. ChEST involves a software-assisted prediction of transcription factor (TF)-specific CRMs, which are spotted to produce a computed genomic CRM microarray. In parallel, the in vivo pool of targets of a given TF is isolated by ChIP and used as a probe for hybridization with the array generated. Here we describe the ChEST strategy applied to identify direct targets of Myogenic Enhancer Factor, Dmef2, in Drosophila embryos.

  18. Transposable elements are enriched within or in close proximity to xenobiotic-metabolizing cytochrome P450 genes

    Directory of Open Access Journals (Sweden)

    Li Xianchun

    2007-03-01

    Full Text Available Abstract Background Transposons, i.e. transposable elements (TEs), are the major internal spontaneous mutation agents for the variability of eukaryotic genomes. To address the general issue of whether transposons mediate genomic changes in environment-adaptation genes, we scanned two alleles of each of the six xenobiotic-metabolizing Helicoverpa zea cytochrome P450 loci, including CYP6B8, CYP6B27, CYP321A1, CYP321A2, CYP9A12v3 and CYP9A14, for the presence of transposon insertions by genome walking and sequence analysis. We also scanned thirteen Drosophila melanogaster P450 genes for TE insertions by in silico mapping and literature search. Results Twelve novel transposons, including LINEs (long interspersed nuclear elements), SINEs (short interspersed nuclear elements), MITEs (miniature inverted-repeat transposable elements), one full-length transib-like transposon, and one full-length Tc1-like DNA transposon, are identified from the alleles of the six H. zea P450 genes. The twelve transposons are inserted into the 5' flanking region, 3' flanking region, exon, or intron of the six environment-adaptation P450 genes. In D. melanogaster, seven out of the eight Drosophila P450s (CYP4E2, CYP6A2, CYP6A8, CYP6A9, CYP6G1, CYP6W1, CYP12A4, CYP12D1) implicated in insecticide resistance are associated with a variety of transposons. By contrast, all five Drosophila P450s (CYP302A1, CYP306A1, CYP307A1, CYP314A1 and CYP315A1) involved in ecdysone biosynthesis and developmental regulation are free of TE insertions. Conclusion These results indicate that TEs are selectively retained within or in close proximity to xenobiotic-metabolizing P450 genes.

  19. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2016-01-01

    Full Text Available Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is a feature selection algorithm, indeed. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. Few overlaps between these two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  20. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    Science.gov (United States)

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is a feature selection algorithm, indeed. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. Few overlaps between these two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  1. A Gene Co-Expression Network in Whole Blood of Schizophrenia Patients Is Independent of Antipsychotic-Use and Enriched for Brain-Expressed Genes

    NARCIS (Netherlands)

    de Jong, Simone; Boks, Marco P. M.; Fuller, Tova F.; Strengman, Eric; Janson, Esther; de Kovel, Carolien G. F.; Ori, Anil P. S.; Vi, Nancy; Mulder, Flip; Blom, Jan Dirk; Glenthoj, Birte; Schubart, Chris D.; Cahn, Wiepke; Kahn, Rene S.; Horvath, Steve; Ophoff, Roel A.

    2012-01-01

    Despite large-scale genome-wide association studies (GWAS), the underlying genes for schizophrenia are largely unknown. Additional approaches are therefore required to identify the genetic background of this disorder. Here we report findings from a large gene expression study in peripheral blood of

  2. Conjugated linoleic acid-enriched butter improved memory and up-regulated phospholipase A2 encoding-genes in rat brain tissue.

    Science.gov (United States)

    Gama, Marco A S; Raposo, Nádia R B; Mury, Fábio B; Lopes, Fernando C F; Dias-Neto, Emmanuel; Talib, Leda L; Gattaz, Wagner F

    2015-10-01

    Reduced phospholipase A2 (PLA2) activity has been reported in blood cells and in postmortem brains of patients with Alzheimer disease (AD), and there is evidence that conjugated linoleic acid (CLA) modulates the activity of PLA2 groups in non-brain tissues. As CLA isomers were shown to be actively incorporated and metabolized in the brains of rats, we hypothesized that feeding a diet naturally enriched in CLA would affect the activity and expression of Pla2-encoding genes in rat brain tissue, with possible implications for memory. To test this hypothesis, Wistar rats were trained for the inhibitory avoidance task and fed a commercial diet (control) or experimental diets containing either low CLA- or CLA-enriched butter for 4 weeks. After this period, the rats were tested for memory retrieval and killed for tissue collection. Hippocampal expression of 19 Pla2 genes was evaluated by qPCR, and activities of PLA2 groups (cPLA2, iPLA2, and sPLA2) were determined by radioenzymatic assay. Rats fed the high CLA diet had increased hippocampal mRNA levels for specific PLA2 isoforms (iPla2g6γ, cPla2g4a, sPla2g3, sPla2g1b, and sPla2g12a) and higher enzymatic activity of all PLA2 groups as compared to those fed the control and the low CLA diet. The increment in PLA2 activities correlated significantly with memory enhancement, as assessed by increased latency in the step-down inhibitory avoidance task after 4 weeks of treatment (rs = 0.69 for iPLA2, P < 0.001; rs = 0.81 for cPLA2, P < 0.001; and rs = 0.69 for sPLA2, P < 0.001). In light of the previous reports showing reduced PLA2 activity in AD brains, the present findings suggest that dairy products enriched in cis-9, trans-11 CLA may be useful in the treatment of this disease.

  3. pathDIP: an annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis

    Science.gov (United States)

    Rahmati, Sara; Abovsky, Mark; Pastrello, Chiara; Jurisica, Igor

    2017-01-01

    Molecular pathway data are essential in current computational and systems biology research. While there are many primary and integrated pathway databases, several challenges remain, including low proteome coverage (57%), low overlap across different databases, unavailability of direct information about underlying physical connectivity of pathway members, and high fraction of protein-coding genes without any pathway annotations, i.e. ‘pathway orphans’. In order to address all these challenges, we developed pathDIP, which integrates data from 20 source pathway databases, ‘core pathways’, with physical protein–protein interactions to predict biologically relevant protein–pathway associations, referred to as ‘extended pathways’. Cross-validation determined 71% recovery rate of our predictions. Data integration and predictions increase coverage of pathway annotations for protein-coding genes to 86%, and provide novel annotations for 5732 pathway orphans. PathDIP (http://ophid.utoronto.ca/pathdip) annotates 17 070 protein-coding genes with 4678 pathways, and provides multiple query, analysis and output options. PMID:27899558

  4. Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs

    Directory of Open Access Journals (Sweden)

    Silengo Lorenzo

    2004-05-01

    Full Text Available Abstract Background Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated gene. The computational identification of such motifs is made easier by the fact that they often appear several times in the upstream region of the regulated genes, so that the number of occurrences of relevant motifs is often significantly larger than expected by pure chance. Results To exploit this fact, we construct sets of genes characterized by the statistical overrepresentation of a certain motif in their upstream regions. Then we study the functional characterization of these sets by analyzing their annotation to Gene Ontology terms. For the sets showing a statistically significant specific functional characterization, we conjecture that the upstream motif characterizing the set is a binding site for a transcription factor involved in the regulation of the genes in the set. Conclusions The method we propose is able to identify many known binding sites in S. cerevisiae and new candidate targets of regulation by known transcription factors. Its application to less well studied organisms is likely to be valuable in the exploration of their regulatory interaction network.
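
    The second step described in the record above, testing whether a motif-defined gene set is enriched for a Gene Ontology term, is commonly done with a one-sided hypergeometric test. The sketch below assumes that test; the abstract does not say which statistic the authors used, and the function name and gene counts are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term of interest."""
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented numbers: 6000 genes in the genome, 120 annotated to the term,
# 50 genes share the overrepresented upstream motif, 8 of them carry the term.
p_value = hypergeom_enrichment_p(population=6000, annotated=120, sample=50, overlap=8)
print(f"p = {p_value:.3g}")
```

    Because many motif sets and many GO terms are tested, the resulting p-values would need a multiple-testing correction before any set is called significant.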

  5. Transcriptome Analysis Reveals Candidate Genes Involved in Gibberellin-Induced Fruit Setting in Triploid Loquat (Eriobotrya japonica)

    Science.gov (United States)

    Jiang, Shuang; Luo, Jun; Xu, Fanjie; Zhang, Xueying

    2016-01-01

    The triploid loquat (Eriobotrya japonica) is a new germplasm with a high edible fruit rate. Under natural conditions, the triploid loquat has a low fruit setting ratio (not more than 10 fruits in a tree), reflecting fertilization failure. To unravel the molecular mechanism by which gibberellin (GA) treatment induces parthenocarpy in triploid loquats, a transcriptome analysis of fruit setting induced by GA3 was performed using RNA-seq at four different stages during the development of young fruit. Approximately 344 million high quality reads in seven libraries were de novo assembled, yielding 153,900 unique transcripts with more than 79.9% functionally annotated transcripts. A total of 2,220, 2,974, and 1,614 differentially expressed genes (DEGs) were observed at 3, 7, and 14 days after GA treatment, respectively. The weighted gene co-expression network and Venn diagram analysis of DEGs revealed that sixteen candidate genes may play critical roles in the fruit setting after GA treatment. Five genes were related to auxin, in which one auxin synthesis gene of yucca was upregulated, suggesting that auxin may act as a signal for fruit setting. Furthermore, ABA 8′-hydroxylase was upregulated, while ethylene-forming enzyme was downregulated, suggesting that multiple hormones may be involved in GA signaling. Four transcription factors, NAC7, NAC23, bHLH35, and HD16, were potentially negatively regulated in fruit setting, and two cell division-related genes, arr9 and CYCA3, were upregulated. In addition, the expression of the GA receptor gid1 was downregulated by GA treatment, suggesting that the negative feedback mechanism in GA signaling may be regulated by gid1. Altogether, the results of the present study provide information from a comprehensive gene expression analysis and insight into the molecular mechanism underlying fruit setting under GA treatment in E. japonica. PMID:28066478

  6. Enrichment of provitamin A content in wheat (Triticum aestivum L.) by introduction of the bacterial carotenoid biosynthetic genes CrtB and CrtI.

    Science.gov (United States)

    Wang, Cheng; Zeng, Jian; Li, Yin; Hu, Wei; Chen, Ling; Miao, Yingjie; Deng, Pengyi; Yuan, Cuihong; Ma, Cheng; Chen, Xi; Zang, Mingli; Wang, Qiong; Li, Kexiu; Chang, Junli; Wang, Yuesheng; Yang, Guangxiao; He, Guangyuan

    2014-06-01

    Carotenoid content is a primary determinant of wheat nutritional value and affects its end-use quality. Wheat grains contain very low carotenoid levels and trace amounts of provitamin A content. In order to enrich the carotenoid content in wheat grains, the bacterial phytoene synthase gene (CrtB) and carotene desaturase gene (CrtI) were transformed into the common wheat cultivar Bobwhite. Expression of CrtB or CrtI alone slightly increased the carotenoid content in the grains of transgenic wheat, while co-expression of both genes resulted in a darker red/yellow grain phenotype, accompanied by a total carotenoid content increase of approximately 8-fold achieving 4.76 μg g⁻¹ of seed dry weight, a β-carotene increase of 65-fold to 3.21 μg g⁻¹ of seed dry weight, and a provitamin A content (sum of α-carotene, β-carotene, and β-cryptoxanthin) increase of 76-fold to 3.82 μg g⁻¹ of seed dry weight. The high provitamin A content in the transgenic wheat was stably inherited over four generations. Quantitative PCR analysis revealed that enhancement of provitamin A content in transgenic wheat was also a result of the highly coordinated regulation of endogenous carotenoid biosynthetic genes, suggesting a metabolic feedback regulation in the wheat carotenoid biosynthetic pathway. These transgenic wheat lines are not only valuable for breeding wheat varieties with nutritional benefits for human health but also for understanding the mechanism regulating carotenoid biosynthesis in wheat endosperm.

  7. Effect of a long-chain n-3 polyunsaturated fatty acid-enriched diet on adipose tissue lipid profiles and gene expression in Holstein dairy cows.

    Science.gov (United States)

    Elis, Sebastien; Desmarchais, Alice; Freret, Sandrine; Maillard, Virginie; Labas, Valérie; Cognié, Juliette; Briant, Eric; Hivelin, Celine; Dupont, Joëlle; Uzbekova, Svetlana

    2016-12-01

    The objective of this study was to determine whether fish oil supplement has an effect on adipose tissue lipid profiles and gene expression in postpartum dairy cows. Holstein cows were supplemented with either long-chain n-3 polyunsaturated fatty acid (PUFA; protected fish oil) or control PUFA (n-6; toasted soybeans) for 2 mo after calving (n = 23 per diet). These cows showed no difference in milk production or metabolic parameters, but exhibited a tendency toward a decrease in early embryo mortality rate after artificial insemination. We hypothesized that, in addition to this effect, modifications in adipose tissue (AT) gene expression and lipid profiles would occur in response to diet. Subcutaneous AT samples were thus collected from the dewlaps of n-3 and n-6 dairy cows at 1 mo antepartum, and 1 wk, 2 mo, and 5 mo postpartum for the analysis of lipids and gene expression. Lipid profiles were obtained by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry in both positive and negative modes. We found 37 lipid species in the 200 to 1,200 m/z range, which differed between the n-3 and control groups, suggesting that the n-3 supplement affected the lipid composition through the enrichment of lipids integrating long-chain PUFA from fish oil sources: eicosapentaenoic and docosahexaenoic acid. Moreover, a decrease in triacylglycerolipids was observed in AT of n-3 supplemented cows. The expression of 44 genes involved in fatty acid metabolism and the adipokine system was assessed by real-time reverse-transcription PCR. Hierarchical clustering, according to either postpartum stage or diet, enabled us to group genes exhibiting similar kinetic properties during lactation or by those that varied in similar ways after n-3 supplementation, respectively. Among the genes exhibiting a dietary effect, FABP4, LIPE, CD36, and PLIN1 were overexpressed in n-3 AT samples compared with the control, suggesting an increase in lipolysis due to n-3 supplementation, which

  8. Characterization of an 18,166 EST dataset for cassava (Manihot esculenta Crantz) enriched for drought-responsive genes.

    Science.gov (United States)

    Lokko, Y; Anderson, J V; Rudd, S; Raji, A; Horvath, D; Mikel, M A; Kim, R; Liu, L; Hernandez, A; Dixon, A G O; Ingelbrecht, I L

    2007-09-01

    Cassava (Manihot esculenta Crantz) is a staple food for over 600 million people in the tropics and subtropics and is increasingly used as an industrial crop for starch production. Cassava has a high growth rate under optimal conditions but also performs well in drought-prone areas and on marginal soils. To increase the tools for understanding and manipulating drought tolerance in cassava, we generated expressed sequence tags (ESTs) from normalized cDNA libraries prepared from dehydration-stressed and control well-watered tissues. Analysis of a total of 18,166 ESTs resulted in the identification of 8,577 unique gene clusters (5,383 singletons and 3,194 clusters). Functional categories could be assigned to 63% of the unigenes, while another approximately 11% were homologous to hypothetical genes with unclear functions. The remaining approximately 26% were not significantly homologous to sequences in public databases suggesting that some may be novel and putatively specific to cassava. The dehydration-stressed library uncovered numerous ESTs with recognized roles in drought-responses, including those that encode late-embryogenesis-abundant proteins thought to confer osmoprotective functions during water stress, transcription factors, heat-shock proteins as well as proteins involved in signal transduction and oxidative stress. The unigene clusters were screened for short tandem repeats for further development as microsatellite markers. A total of 592 clusters contained 646 repeats, representing 3.3% of the ESTs queried. The ESTs presented here are the first dehydration stress transcriptome of cassava and can be utilized for the development of microarrays and gene-derived molecular markers to further dissect the molecular basis of drought tolerance in cassava.
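
    The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only figures from the abstract; the one assumption, marked in a comment, is that the 3.3% figure is the number of repeat-containing clusters divided by the total number of ESTs.

```python
# Figures quoted in the abstract above.
total_ests = 18_166
singletons = 5_383
multi_est_clusters = 3_194
unigenes_reported = 8_577
ssr_clusters = 592

# Singletons plus multi-member clusters should give the unigene count.
unigenes = singletons + multi_est_clusters

# Assumption: 3.3% = repeat-containing clusters / all ESTs queried.
ssr_share_pct = round(100 * ssr_clusters / total_ests, 1)

# Approximate annotation shares (percent of unigenes) should sum to 100.
annotation_pct = {
    "functional category": 63,
    "hypothetical": 11,
    "no significant hit": 26,
}

print(unigenes == unigenes_reported, ssr_share_pct, sum(annotation_pct.values()))
```

    Under that assumption the quoted percentage is reproduced; dividing by the number of unigenes instead would give roughly 6.9%.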

  9. The genome of the generalist plant pathogen Fusarium avenaceum is enriched with genes involved in redox, signaling and secondary metabolism.

    Directory of Open Access Journals (Sweden)

    Erik Lysøe

    Full Text Available Fusarium avenaceum is a fungus commonly isolated from soil and associated with a wide range of host plants. We present here three genome sequences of F. avenaceum, one isolated from barley in Finland and two from spring and winter wheat in Canada. The sizes of the three genomes range from 41.6-43.1 MB, with 13217-13445 predicted protein-coding genes. Whole-genome analysis showed that the three genomes are highly syntenic, and share >95% gene orthologs. Comparative analysis to other sequenced Fusaria shows that F. avenaceum has a very large potential for producing secondary metabolites, with between 75 and 80 key enzymes belonging to the polyketide, non-ribosomal peptide, terpene, alkaloid and indole-diterpene synthase classes. In addition to known metabolites from F. avenaceum, fuscofusarin and JM-47 were detected for the first time in this species. Many protein families are expanded in F. avenaceum, such as transcription factors, and proteins involved in redox reactions and signal transduction, suggesting evolutionary adaptation to a diverse and cosmopolitan ecology. We found that 20% of all predicted proteins were considered to be secreted, supporting a life in the extracellular space during interaction with plant hosts.

  10. The Genome of the Generalist Plant Pathogen Fusarium avenaceum Is Enriched with Genes Involved in Redox, Signaling and Secondary Metabolism

    Science.gov (United States)

    Lysøe, Erik; Harris, Linda J.; Walkowiak, Sean; Subramaniam, Rajagopal; Divon, Hege H.; Riiser, Even S.; Llorens, Carlos; Gabaldón, Toni; Kistler, H. Corby; Jonkers, Wilfried; Kolseth, Anna-Karin; Nielsen, Kristian F.; Thrane, Ulf; Frandsen, Rasmus J. N.

    2014-01-01

    Fusarium avenaceum is a fungus commonly isolated from soil and associated with a wide range of host plants. We present here three genome sequences of F. avenaceum, one isolated from barley in Finland and two from spring and winter wheat in Canada. The sizes of the three genomes range from 41.6–43.1 MB, with 13217–13445 predicted protein-coding genes. Whole-genome analysis showed that the three genomes are highly syntenic, and share >95% gene orthologs. Comparative analysis to other sequenced Fusaria shows that F. avenaceum has a very large potential for producing secondary metabolites, with between 75 and 80 key enzymes belonging to the polyketide, non-ribosomal peptide, terpene, alkaloid and indole-diterpene synthase classes. In addition to known metabolites from F. avenaceum, fuscofusarin and JM-47 were detected for the first time in this species. Many protein families are expanded in F. avenaceum, such as transcription factors, and proteins involved in redox reactions and signal transduction, suggesting evolutionary adaptation to a diverse and cosmopolitan ecology. We found that 20% of all predicted proteins were considered to be secreted, supporting a life in the extracellular space during interaction with plant hosts. PMID:25409087

  11. Correlation of a set of gene variants, life events and personality features on adult ADHD severity.

    Science.gov (United States)

    Müller, Daniel J; Chiesa, Alberto; Mandelli, Laura; De Luca, Vincenzo; De Ronchi, Diana; Jain, Umesh; Serretti, Alessandro; Kennedy, James L

    2010-07-01

    Increasing evidence suggests that symptoms of attention deficit hyperactivity disorder (ADHD) could persist into adult life in a substantial proportion of cases. The aim of the present study was to investigate the impact of (1) adverse events, (2) personality traits and (3) genetic variants chosen on the basis of previous findings and (4) their possible interactions on adult ADHD severity. One hundred and ten individuals diagnosed with adult ADHD were evaluated for occurrence of adverse events in childhood and adulthood, and personality traits by the Temperament and Character Inventory (TCI). Common polymorphisms within a set of nine important candidate genes (SLC6A3, DBH, DRD4, DRD5, HTR2A, CHRNA7, BDNF, PRKG1 and TAAR9) were genotyped for each subject. Life events, personality traits and genetic variations were analyzed in relationship to severity of current symptoms, according to the Brown Attention Deficit Disorder Scale (BADDS). Genetic variations were not significantly associated with severity of ADHD symptoms. Life stressors displayed only a minor effect as compared to personality traits. Indeed, symptoms' severity was significantly correlated with the temperamental trait of Harm avoidance and the character trait of Self directedness. The results of the present work are in line with previous evidence of a significant correlation between some personality traits and adult ADHD. However, several limitations such as the small sample size and the exclusion of patients with other severe comorbid psychiatric disorders could have influenced the significance of present findings.

  12. Gene set based integrated data analysis reveals phenotypic differences in a brain cancer model.

    Directory of Open Access Journals (Sweden)

    Kjell Petersen

    Full Text Available A key challenge in the data analysis of biological high-throughput experiments is to handle the often low number of samples in the experiments compared to the number of biomolecules that are simultaneously measured. Combining experimental data using independent technologies to illuminate the same biological trends, as well as complementing each other in a larger perspective, is one natural way to overcome this challenge. In this work we investigated if integrating proteomics and transcriptomics data from a brain cancer animal model using gene set based analysis methodology, could enhance the biological interpretation of the data relative to more traditional analysis of the two datasets individually. The brain cancer model used is based on serial passaging of transplanted human brain tumor material (glioblastoma, GBM) through several generations in rats. These serial transplantations lead over time to genotypic and phenotypic changes in the tumors and represent a medically relevant model with a rare access to samples and where consequent analyses of individual datasets have revealed relatively few significant findings on their own. We found that the integrated analysis both performed better in terms of significance measure of its findings compared to individual analyses, as well as providing independent verification of the individual results. Thus a better context for overall biological interpretation of the data can be achieved.

  13. Mountain pine beetles colonizing historical and naive host trees are associated with a bacterial community highly enriched in genes contributing to terpene metabolism.

    Science.gov (United States)

    Adams, Aaron S; Aylward, Frank O; Adams, Sandye M; Erbilgin, Nadir; Aukema, Brian H; Currie, Cameron R; Suen, Garret; Raffa, Kenneth F

    2013-06-01

    The mountain pine beetle, Dendroctonus ponderosae, is a subcortical herbivore native to western North America that can kill healthy conifers by overcoming host tree defenses, which consist largely of high terpene concentrations. The mechanisms by which these beetles contend with toxic compounds are not well understood. Here, we explore a component of the hypothesis that beetle-associated bacterial symbionts contribute to the ability of D. ponderosae to overcome tree defenses by assisting with terpene detoxification. Such symbionts may facilitate host tree transitions during range expansions currently being driven by climate change. For example, this insect has recently breached the historical geophysical barrier of the Canadian Rocky Mountains, providing access to naïve tree hosts and unprecedented connectivity to eastern forests. We use culture-independent techniques to describe the bacterial community associated with D. ponderosae beetles and their galleries from their historical host, Pinus contorta, and their more recent host, hybrid P. contorta-Pinus banksiana. We show that these communities are enriched with genes involved in terpene degradation compared with other plant biomass-processing microbial communities. These pine beetle microbial communities are dominated by members of the genera Pseudomonas, Rahnella, Serratia, and Burkholderia, and the majority of genes involved in terpene degradation belong to these genera. Our work provides the first metagenome of bacterial communities associated with a bark beetle and is consistent with a potential microbial contribution to detoxification of tree defenses needed to survive the subcortical environment.

  14. Impact of date palm fruits extracts and probiotic enriched diet on antioxidant status, innate immune response and immune-related gene expression of European seabass (Dicentrarchus labrax).

    Science.gov (United States)

    Guardiola, F A; Porcino, C; Cerezuela, R; Cuesta, A; Faggio, C; Esteban, M A

    2016-05-01

    The dietary use of plants or plant extracts as natural and innocuous additives has potential in aquaculture as an alternative to antibiotics and immunoprophylactics. The aim of the current study was to evaluate the potential effects of dietary supplementation of date palm fruit extracts alone or in combination with Pdp11 probiotic on serum antioxidant status, on the humoral and cellular innate immune status, as well as on the expression levels of some immune-related genes in head-kidney and gut of European sea bass (Dicentrarchus labrax) after 2 and 4 weeks of administration. This study showed for the first time in European sea bass an immunostimulation in several of the parameters evaluated in fish fed with a diet enriched with date palm fruit extracts or fed with this substance in combination with Pdp11 probiotic, mainly after 4 weeks of treatment. In the same way, dietary supplementation with the mixture diet has positive effects on the expression levels of immune-related genes, chiefly in head-kidney of Dicentrarchus labrax. Therefore, the combination of both could be considered of great interest as potential additives for farmed fish.

  15. Performance of single and concatenated sets of mitochondrial genes at inferring metazoan relationships relative to full mitogenome data.

    Directory of Open Access Journals (Sweden)

    Justin C Havird

    Full Text Available Mitochondrial (mt) genes are some of the most popular and widely-utilized genetic loci in phylogenetic studies of metazoan taxa. However, their linked nature has raised questions on whether using the entire mitogenome for phylogenetics is overkill (at best) or pseudoreplication (at worst). Moreover, no studies have addressed the comparative phylogenetic utility of mitochondrial genes across individual lineages within the entire Metazoa. To comment on the phylogenetic utility of individual mt genes as well as concatenated subsets of genes, we analyzed mitogenomic data from 1865 metazoan taxa in 372 separate lineages spanning genera to subphyla. Specifically, phylogenies inferred from these datasets were statistically compared to ones generated from all 13 mt protein-coding (PC) genes (i.e., the "supergene" set) to determine which single genes performed "best" at, and the minimum number of genes required to, recover the "supergene" topology. Surprisingly, the popular marker COX1 performed poorest, while ND5, ND4, and ND2 were most likely to reproduce the "supergene" topology. Averaged across all lineages, the longest ∼2 mt PC genes were sufficient to recreate the "supergene" topology, although this average increased to ∼5 genes for datasets with 40 or more taxa. Furthermore, concatenation of the three "best" performing mt PC genes outperformed that of the three longest mt PC genes (i.e., ND5, COX1, and ND4). Taken together, while not all mt PC genes are equally interchangeable in phylogenetic studies of the metazoans, some subset can serve as a proxy for the 13 mt PC genes. However, the exact number and identity of these genes is specific to the lineage in question and cannot be applied indiscriminately across the Metazoa.

  16. A set of vectors for introduction of antibiotic resistance genes by in vitro Cre-mediated recombination

    OpenAIRE

    Vassetzky Yegor S; Dmitriev Petr V

    2008-01-01

    Abstract Background Introduction of new antibiotic resistance genes in the plasmids of interest is a frequent task in molecular cloning practice. Classical approaches involving digestion with restriction endonucleases and ligation are time-consuming. Findings We have created a set of insertion vectors (pINS) carrying genes that provide resistance to various antibiotics (puromycin, blasticidin and G418) and containing a loxP site. Each vector (pINS-Puro, pINS-Blast or pINS-Neo) contains either...

  17. Robust de novo pathway enrichment with KeyPathwayMiner 5

    DEFF Research Database (Denmark)

    Alcaraz, Nicolas; List, Markus; Dissing-Hansen, Martin

    2016-01-01

    Identifying functional modules or novel active pathways, recently termed de novo pathway enrichment, is a computational systems biology challenge that has gained much attention during the last decade. Given a large biological interaction network, KeyPathwayMiner extracts connected subnetworks...... several network perturbation techniques and over a range of perturbation degrees. In addition, users may now provide a gold-standard set to determine how enriched extracted pathways are with relevant genes compared to randomized versions of the original network....

  18. ATRX binds to atypical chromatin domains at the 3' exons of zinc finger genes to preserve H3K9me3 enrichment.

    Science.gov (United States)

    Valle-García, David; Qadeer, Zulekha A; McHugh, Domhnall S; Ghiraldini, Flávia G; Chowdhury, Asif H; Hasson, Dan; Dyer, Michael A; Recillas-Targa, Félix; Bernstein, Emily

    2016-06-02

    ATRX is a SWI/SNF chromatin remodeler proposed to govern genomic stability through the regulation of repetitive sequences, such as rDNA, retrotransposons, and pericentromeric and telomeric repeats. However, few direct ATRX target genes have been identified and high-throughput genomic approaches are currently lacking for ATRX. Here we present a comprehensive ChIP-sequencing study of ATRX in multiple human cell lines, in which we identify the 3' exons of zinc finger genes (ZNFs) as a new class of ATRX targets. These 3' exonic regions encode the zinc finger motifs, which can range from 1-40 copies per ZNF gene and share large stretches of sequence similarity. These regions often contain an atypical chromatin signature: they are transcriptionally active, contain high levels of H3K36me3, and are paradoxically enriched in H3K9me3. We find that these ZNF 3' exons are co-occupied by SETDB1, TRIM28, and ZNF274, which form a complex with ATRX. CRISPR/Cas9-mediated loss-of-function studies demonstrate (i) a reduction of H3K9me3 at the ZNF 3' exons in the absence of ATRX and ZNF274 and, (ii) H3K9me3 levels at atypical chromatin regions are particularly sensitive to ATRX loss compared to other H3K9me3-occupied regions. As a consequence of ATRX or ZNF274 depletion, cells with reduced levels of H3K9me3 show increased levels of DNA damage, suggesting that ATRX binds to the 3' exons of ZNFs to maintain their genomic stability through preservation of H3K9me3.

  19. Effects of Cr(III) and Cr(VI) on nitrification inhibition as determined by SOUR, function-specific gene expression and 16S rRNA sequence analysis of wastewater nitrifying enrichments

    Science.gov (United States)

    The effect of Cr(III) and Cr(VI) on ammonia oxidation, the transcriptional responses of functional genes involved in nitrification and changes in 16S rRNA level sequences were examined in nitrifying enrichment cultures. The nitrifying bioreactor was operated as a continuous react...

  20. Pilot Sequencing of Onion Genomic DNA Reveals Fragments of Transposable Elements, Low Gene Densities, and Significant Gene Enrichment After Methyl Filtration

    Science.gov (United States)

    Onion (Allium cepa) is a diploid (2n=2x=16) monocot with one of the largest nuclear genomes among cultivated plants, over 6 and 16 times that of maize and rice, respectively. In this study, we sequenced onion BACs to estimate gene densities and investigate the nature and distribution of repetitive ...

  1. The development of an efficient multipurpose bean pod mottle virus viral vector set for foreign gene expression and RNA silencing.

    Science.gov (United States)

    Zhang, Chunquan; Bradshaw, Jeffrey D; Whitham, Steven A; Hill, John H

    2010-05-01

    Plant viral vectors are valuable tools for heterologous gene expression, and because of virus-induced gene silencing (VIGS), they also have important applications as reverse genetics tools for gene function studies. Viral vectors are especially useful for plants such as soybean (Glycine max) that are recalcitrant to transformation. Previously, two generations of bean pod mottle virus (BPMV; genus Comovirus) vectors have been developed for overexpressing and silencing genes in soybean. However, the design of the previous vectors imposes constraints that limit their utility. For example, VIGS target sequences must be expressed as fusion proteins in the same reading frame as the viral polyprotein. This requirement limits the design of VIGS target sequences to open reading frames. Furthermore, expression of multiple genes or simultaneous silencing of one gene and expression of another was not possible. To overcome these and other issues, a new BPMV-based vector system was developed to facilitate a variety of applications for gene function studies in soybean as well as in common bean (Phaseolus vulgaris). These vectors are designed for simultaneous expression of multiple foreign genes, insertion of noncoding/antisense sequences, and simultaneous expression and silencing. The simultaneous expression of green fluorescent protein and silencing of phytoene desaturase shows that marker gene-assisted silencing is feasible. These results demonstrate the utility of this BPMV vector set for a wide range of applications in soybean and common bean, and they have implications for improvement of other plant virus-based vector systems.

  2. Genomic determinants of sporulation in Bacilli and Clostridia: towards the minimal set of sporulation-specific genes.

    Science.gov (United States)

    Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J

    2012-11-01

    Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.

  3. Cyclin T1-dependent genes in activated CD4 T and macrophage cell lines appear enriched in HIV-1 co-factors.

    Directory of Open Access Journals (Sweden)

    Wendong Yu

    Full Text Available HIV-1 is dependent upon cellular co-factors to mediate its replication cycle in CD4(+) T cells and macrophages, the two major cell types infected by the virus in vivo. One critical co-factor is Cyclin T1, a subunit of a general RNA polymerase II elongation factor known as P-TEFb. Cyclin T1 is targeted directly by the viral Tat protein to activate proviral transcription. Cyclin T1 is up-regulated when resting CD4(+) T cells are activated and during macrophage differentiation or activation, conditions that are also necessary for high levels of HIV-1 replication. Because Cyclin T1 is a subunit of a transcription factor, the up-regulation of Cyclin T1 in these cells results in the induction of cellular genes, some of which might be HIV-1 co-factors. Using shRNA depletions of Cyclin T1 and transcriptional profiling, we identified 54 cellular mRNAs that appear to be Cyclin T1-dependent for their induction in activated CD4(+) T Jurkat T cells and during differentiation and activation of MM6 cells, a human monocytic cell line. The promoters for these Cyclin T1-dependent genes (CTDGs) are over-represented in two transcription factor binding sites, SREBP1 and ARP1. Notably, 10 of these CTDGs have been reported to be involved in HIV-1 replication, a significant over-representation of such genes when compared to randomly generated lists of 54 genes (p value<0.00021). The results of siRNA depletion and dominant-negative protein experiments with two CTDGs identified here, CDK11 and Casein kinase 1 gamma 1, suggest that these genes are involved either directly or indirectly in HIV-1 replication. It is likely that the 54 CTDGs identified here include novel HIV-1 co-factors. The presence of CTDGs in the protein space that was available for HIV-1 to sample during its evolution and acquisition of Tat function may provide an explanation for why CTDGs are enriched in viral co-factors.
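
    The abstract assesses over-representation by comparing against randomly generated lists of 54 genes. A closely related analytic check, shown here only as a sketch and not as the authors' procedure, is the hypergeometric upper tail: the probability of seeing at least k annotated genes when a list is drawn at random without replacement. The background size (20,000 genes) and the number of reported co-factors (1,000) are hypothetical placeholders; only the list size of 54 and the overlap of 10 come from the abstract.

```python
from math import comb

def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total

# Hypothetical background: 20,000 genes, 1,000 of them reported co-factors.
# Only draws=54 and k=10 are taken from the abstract.
p = hypergeom_tail(k=10, population=20_000, successes=1_000, draws=54)
print(f"{p:.2e}")
```

    The resulting value depends entirely on the assumed background, so it should not be compared directly with the p value quoted above.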

  4. Dominant effects of the Huntington's disease HTT CAG repeat length are captured in gene-expression data sets by a continuous analysis mathematical modeling strategy.

    Science.gov (United States)

    Lee, Jong-Min; Galkina, Ekaterina I; Levantovsky, Rachel M; Fossale, Elisa; Anne Anderson, Mary; Gillis, Tammy; Srinidhi Mysore, Jayalakshmi; Coser, Kathryn R; Shioda, Toshi; Zhang, Bin; Furia, Matthew D; Derry, Jonathan; Kohane, Isaac S; Seong, Ihn Sik; Wheeler, Vanessa C; Gusella, James F; MacDonald, Marcy E

    2013-08-15

    In Huntington's disease (HD), the size of the expanded HTT CAG repeat mutation is the primary driver of the processes that determine age at onset of motor symptoms. However, correlation of cellular biochemical parameters also extends across the normal repeat range, supporting the view that the CAG repeat represents a functional polymorphism with dominant effects determined by the longer allele. A central challenge to defining the functional consequences of this single polymorphism is the difficulty of distinguishing its subtle effects from the multitude of other sources of biological variation. We demonstrate that an analytical approach based upon continuous correlation with CAG size was able to capture the modest (∼21%) contribution of the repeat to the variation in genome-wide gene expression in 107 lymphoblastoid cell lines, with alleles ranging from 15 to 92 CAGs. Furthermore, a mathematical model from an iterative strategy yielded predicted CAG repeat lengths that were significantly positively correlated with true CAG allele size and negatively correlated with age at onset of motor symptoms. Genes negatively correlated with repeat size were also enriched in a set of genes whose expression was CAG-correlated in human HD cerebellum. These findings both reveal the relatively small, but detectable impact of variation in the CAG allele in global data in these peripheral cells and provide a strategy for building multi-dimensional data-driven models of the biological network that drives the HD disease process by continuous analysis across allelic panels of neuronal cells vulnerable to the dominant effects of the HTT CAG repeat.

  5. Validation of a set of reference genes to study response to herbicide stress in grasses

    OpenAIRE

    Petit Cécile; Pernin Fanny; Heydel Jean-Marie; Délye Christophe

    2012-01-01

    Abstract Background Non-target-site based resistance to herbicides is a major threat to the chemical control of agronomically noxious weeds. This adaptive trait is endowed by differences in the expression of a number of genes in plants that are resistant or sensitive to herbicides. Quantification of the expression of such genes requires normalising qPCR data using reference genes with stable expression in the system studied as internal standards. The aim of this study was to validate referenc...

  6. Identification of self-consistent modulons from bacterial microarray expression data with the help of structured regulon gene sets

    KAUST Repository

    Permina, Elizaveta A.

    2013-01-01

    Identification of bacterial modulons from series of gene expression measurements on microarrays is a principal problem, especially relevant for inadequately studied but practically important species. Usage of a priori information on regulatory interactions helps to evaluate parameters for regulatory subnetwork inference. We suggest a procedure for modulon construction where a seed regulon is iteratively updated with genes having expression patterns similar to those for regulon member genes. A set of genes essential for a regulon is used to control modulon updating. Essential genes for a regulon were selected as a subset of regulon genes highly related by different measures to each other. Using Escherichia coli as a model, we studied how modulon identification depends on the data, including the microarray experiments set, the adopted relevance measure and the regulon itself. We have found that results of modulon identification are highly dependent on all parameters studied and thus the resulting modulon varies substantially depending on the identification procedure. Yet, modulons that were identified correctly displayed higher stability during iterations, which allows developing a procedure for reliable modulon identification in the case of less studied species where the known regulatory interactions are sparse. Copyright © 2013 Taylor & Francis.
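
    The iterative procedure described in this abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: Pearson correlation as the relevance measure, the mean profile as the modulon summary, the 0.9 threshold, and the rule that an update is rejected if it would drop an essential gene. The gene names and expression values are invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def mean_profile(expr, genes):
    """Average expression profile over a collection of genes."""
    n_samples = len(next(iter(expr.values())))
    return [sum(expr[g][i] for g in genes) / len(genes) for i in range(n_samples)]

def grow_modulon(expr, seed, essential, threshold=0.9, max_iter=20):
    """Iteratively update a seed regulon with similarly expressed genes."""
    modulon = set(seed)
    for _ in range(max_iter):
        profile = mean_profile(expr, sorted(modulon))
        updated = {g for g, v in expr.items() if pearson(v, profile) >= threshold}
        # Control step (assumed): reject an update that drops an essential gene.
        if not set(essential) <= updated:
            break
        if updated == modulon:  # membership is stable, stop iterating
            break
        modulon = updated
    return modulon

# Invented toy data: four samples per gene.
expr = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [1.1, 2.0, 3.2, 3.9],
    "d": [4, 3, 2, 1],
    "e": [1, 3, 2, 4],
}
modulon = grow_modulon(expr, seed={"a", "b"}, essential={"a"})
print(sorted(modulon))
```

    In this toy run gene c joins the seed because its profile tracks a and b closely, while d (anti-correlated) and e (correlation of about 0.8) stay out. As the abstract stresses, the outcome is sensitive to the data set, the relevance measure and the starting regulon.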

  7. Platform dependence of inference on gene-wise and gene-set involvement in human lung development

    Directory of Open Access Journals (Sweden)

    Kho Alvin T

    2009-06-01

    Full Text Available Abstract Background With the recent development of microarray technologies, the comparability of gene expression data obtained from different platforms poses an important problem. We evaluated two widely used platforms, Affymetrix U133 Plus 2.0 and the Illumina HumanRef-8 v2 Expression Bead Chips, for comparability in a biological system in which changes may be subtle, namely fetal lung tissue as a function of gestational age. Results We performed the comparison via sequence-based probe matching between the two platforms. "Significance grouping" was defined as a measure of comparability. Using both expression correlation and significance grouping as measures of comparability, we demonstrated that despite overall cross-platform differences at the single gene level, increased correlation between the two platforms was found in genes with higher expression level, higher probe overlap, and lower p-value. We also demonstrated that biological function as determined via KEGG pathways or GO categories is more consistent across platforms than single gene analysis. Conclusion We conclude that while the comparability of the platforms at the single gene level may be increased by increasing sample size, they are highly comparable ontologically even for subtle differences in a relatively small sample size. Biologically relevant inference should therefore be reproducible across laboratories using different platforms.

  8. Effects of Long-Term Environmental Enrichment on Anxiety, Memory, Hippocampal Plasticity and Overall Brain Gene Expression in C57BL6 Mice

    Science.gov (United States)

    Hüttenrauch, Melanie; Salinas, Gabriela; Wirths, Oliver

    2016-01-01

    There is ample evidence that physical activity exerts positive effects on a variety of brain functions by facilitating neuroprotective processes and influencing neuroplasticity. Accordingly, numerous studies have shown that continuous exercise can successfully diminish or prevent the pathology of neurodegenerative diseases such as Alzheimer’s disease in transgenic mouse models. However, the long-term effect of physical activity on brain health of aging wild-type (WT) mice has not yet been studied in detail. Here, we show that prolonged physical and cognitive stimulation, mediated by an enriched environment (EE) paradigm for a duration of 11 months, leads to reduced anxiety and improved spatial reference memory in C57BL6 WT mice. While the number of CA1 pyramidal neurons remained unchanged between standard housed (SH) and EE mice, the number of dentate gyrus (DG) neurons, as well as the CA1 and DG volume were significantly increased in EE mice. A whole-brain deep sequencing transcriptome analysis, carried out to better understand the molecular mechanisms underlying the observed effects, revealed an up-regulation of a variety of genes upon EE, mainly associated with synaptic plasticity and transcription regulation. The present findings corroborate the impact of continuous physical activity as a potential prospective route in the prevention of age-related cognitive decline and neurodegenerative disorders. PMID:27536216

  9. Identification of compound-protein interactions through the analysis of gene ontology, KEGG enrichment for proteins and molecular fragments of compounds.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong

    2016-12-01

    Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance, incremental feature selection, and the basic machine learning algorithm random forest, were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on a random forest and yielding maximum MCCs of approximately 77.55 % were constructed and may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.

  10. Effects of long-term environmental enrichment on anxiety, memory, hippocampal plasticity and overall brain gene expression in C57BL6 mice

    Directory of Open Access Journals (Sweden)

    Melanie Hüttenrauch

    2016-08-01

    Full Text Available There is ample evidence that physical activity exerts positive effects on a variety of brain functions by facilitating neuroprotective processes and influencing neuroplasticity. Accordingly, numerous studies have shown that continuous exercise can successfully diminish or prevent the pathology of neurodegenerative diseases such as Alzheimer’s disease in transgenic mouse models. However, the long-term effect of physical activity on brain health of aging WT mice has not been studied in detail yet. Here, we show that prolonged physical and cognitive stimulation, mediated by an enriched environment (EE) paradigm for a duration of eleven months, leads to reduced anxiety and improved spatial reference memory in C57BL6 wildtype (WT) mice. While the number of CA1 pyramidal neurons remained unchanged between standard housed (SH) and EE mice, the number of dentate gyrus (DG) neurons, as well as the CA1 and DG volume were significantly increased in EE mice. A whole-brain deep sequencing transcriptome analysis, carried out to better understand the molecular mechanisms underlying the observed effects, revealed an up-regulation of a variety of genes upon EE, mainly associated with synaptic plasticity and transcription regulation. The present findings corroborate the impact of continuous physical activity as a potential prospective route in the prevention of age-related cognitive decline and neurodegenerative disorders.

  11. Liver-Enriched Gene 1, a Glycosylated Secretory Protein, Binds to FGFR and Mediates an Anti-stress Pathway to Protect Liver Development in Zebrafish.

    Directory of Open Access Journals (Sweden)

    Minjie Hu

    2016-02-01

    Full Text Available Unlike mammals and birds, teleost fish undergo external embryogenesis, and therefore their embryos are constantly challenged by stresses from their living environment. These stresses, when becoming too harsh, will cause arrest of cell proliferation, abnormal cell death or senescence. Such organisms have to evolve a sophisticated anti-stress mechanism to protect the process of embryogenesis/organogenesis. However, very few signaling molecule(s) mediating such activity have been identified. liver-enriched gene 1 (leg1) is an uncharacterized gene that encodes a novel secretory protein containing a single domain DUF781 (domain of unknown function 781) that is well conserved in vertebrates. In the zebrafish genome, there are two copies of leg1, namely leg1a and leg1b. leg1a and leg1b are closely linked on chromosome 20 and share high homology, but are differentially expressed. In this report, we generated two leg1a mutant alleles using the TALEN technique, then characterized liver development in the mutants. We show that a leg1a mutant exhibits a stress-dependent small liver phenotype that can be prevented by chemicals blocking the production of reactive oxygen species. Further studies reveal that Leg1a binds to FGFR3 and mediates a novel anti-stress pathway to protect liver development through enhancing Erk activity. More importantly, we show that the binding of Leg1a to FGFR relies on the glycosylation at the 70th asparagine (Asn70 or N70), and mutating Asn70 to Ala70 compromised Leg1's function in liver development. Therefore, Leg1 plays a unique role in protecting liver development under different stress conditions by serving as a secreted signaling molecule/modulator.

  12. Automated Enrichment, Transduction, and Expansion of Clinical-Scale CD62L+ T Cells for Manufacturing of Gene Therapy Medicinal Products

    Science.gov (United States)

    Priesner, Christoph; Aleksandrova, Krasimira; Esser, Ruth; Mockel-Tenbrinck, Nadine; Leise, Jana; Drechsel, Katharina; Marburger, Michael; Quaiser, Andrea; Goudeva, Lilia; Arseniev, Lubomir; Kaiser, Andrew D.; Glienke, Wolfgang; Koehl, Ulrike

    2016-01-01

    Multiple clinical studies have demonstrated that adaptive immunotherapy using redirected T cells against advanced cancer has led to promising results with improved patient survival. The continuously increasing interest in those advanced gene therapy medicinal products (GTMPs) leads to a manufacturing challenge regarding automation, process robustness, and cell storage. Therefore, this study addresses the proof of principle in clinical-scale selection, stimulation, transduction, and expansion of T cells using the automated closed CliniMACS® Prodigy system. Naïve and central memory T cells from apheresis products were first immunomagnetically enriched using anti-CD62L magnetic beads and further processed freshly (n = 3) or split for cryopreservation and processed after thawing (n = 1). Starting with 0.5 × 10^8 purified CD3+ T cells, three mock runs and one run including transduction with green fluorescent protein (GFP)-containing vector resulted in a median final cell product of 16 × 10^8 T cells (32-fold expansion) up to harvesting after 2 weeks. Expression of CD62L was downregulated on T cells after thawing, which led to the decision to purify CD62L+CD3+ T cells freshly with cryopreservation thereafter. Most important in the split product, a very similar expansion curve was reached comparing the overall freshly CD62L selected cells with those after thawing, which could be demonstrated in the T cell subpopulations as well by showing a nearly identical conversion of the CD4/CD8 ratio. In the GFP run, the transduction efficacy was 83%. In-process control also demonstrated sufficient glucose levels during automated feeding and medium removal. The robustness of the process and the constant quality of the final product in a closed and automated system give rise to improve harmonized manufacturing protocols for engineered T cells in future gene therapy studies. PMID:27562135

  13. Different gene sets contribute to different symptom dimensions of depression and anxiety

    NARCIS (Netherlands)

    van Veen, Tineke; Goeman, Jelle J.; Monajemi, Ramin; Wardenaar, Klaas J.; Hartman, Catharina A.; Snieder, Harold; Nolte, Ilja M.; Penninx, Brenda W. J. H.; Zitman, Frans G.

    2012-01-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom-dimensions provide a more homogenous phenotype. Furthermore, as effects of individual gene

  14. A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments.

    Science.gov (United States)

    Broët, Philippe; Lewin, Alex; Richardson, Sylvia; Dalmasso, Cyril; Magdelenat, Henri

    2004-11-01

    Multiclass response (MCR) experiments are those in which there are more than two classes to be compared. In these experiments, though the null hypothesis is simple, there are typically many patterns of gene expression changes across the different classes that lead to complex alternatives. In this paper, we propose a new strategy for selecting genes in MCR that is based on a flexible mixture model for the marginal distribution of a modified F-statistic. Using this model, false positive and negative discovery rates can be estimated and combined to produce a rule for selecting a subset of genes. Moreover, the method proposed allows calculation of these rates for any predefined subset of genes. We illustrate the performance of our approach using simulated datasets and a real breast cancer microarray dataset. In this latter study, we investigate predefined subsets of genes and point out interesting differences between three distinct biological pathways. http://www.bgx.org.uk/software.html

  15. Identification of candidate genes for prostate cancer-risk SNPs utilizing a normal prostate tissue eQTL data set

    Science.gov (United States)

    Thibodeau, S. N.; French, A. J.; McDonnell, S. K.; Cheville, J.; Middha, S.; Tillmans, L.; Riska, S.; Baheti, S.; Larson, M. C.; Fogarty, Z.; Zhang, Y.; Larson, N.; Nair, A.; O'Brien, D.; Wang, L.; Schaid, D J.

    2015-01-01

    Multiple studies have identified loci associated with the risk of developing prostate cancer but the associated genes are not well studied. Here we create a normal prostate tissue-specific eQTL data set and apply this data set to previously identified prostate cancer (PrCa)-risk SNPs in an effort to identify candidate target genes. The eQTL data set is constructed by the genotyping and RNA sequencing of 471 samples. We focus on 146 PrCa-risk SNPs, including all SNPs in linkage disequilibrium with each risk SNP, resulting in 100 unique risk intervals. We analyse cis-acting associations where the transcript is located within 2 Mb (±1 Mb) of the risk SNP interval. Of all SNP–gene combinations tested, 41.7% of SNPs demonstrate a significant eQTL signal after adjustment for sample histology and 14 expression principal component covariates. Of the 100 PrCa-risk intervals, 51 have a significant eQTL signal and these are associated with 88 genes. This study provides a rich resource to study biological mechanisms underlying genetic risk to PrCa. PMID:26611117
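    The cis-acting criterion in the abstract above (transcript within ±1 Mb of the risk SNP interval) can be illustrated with a small coordinate check. This is only a sketch under stated assumptions: the `Region` type, the function name `in_cis_window`, and the reading of "located within" as "overlaps the interval extended by 1 Mb on each side" are hypothetical choices made for illustration, not details taken from the study.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval (hypothetical helper type, 1-based inclusive)."""
    chrom: str
    start: int
    end: int


def in_cis_window(transcript: Region, risk_interval: Region,
                  flank: int = 1_000_000) -> bool:
    """Return True if the transcript overlaps the risk interval
    extended by `flank` bases on each side (assumed meaning of cis)."""
    if transcript.chrom != risk_interval.chrom:
        return False
    window_start = max(1, risk_interval.start - flank)
    window_end = risk_interval.end + flank
    # Two closed intervals overlap when each starts before the other ends.
    return transcript.start <= window_end and transcript.end >= window_start


# Made-up coordinates, for illustration only.
interval = Region("chr8", 128_000_000, 128_100_000)
near_gene = Region("chr8", 128_900_000, 128_950_000)
far_gene = Region("chr8", 129_200_000, 129_300_000)
print(in_cis_window(near_gene, interval), in_cis_window(far_gene, interval))
```

    Treating the criterion as an overlap test keeps transcripts that only partly fall inside the window; a stricter containment test would be an equally plausible reading of the abstract.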

  16. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data.

    Science.gov (United States)

    Ganesh Kumar, Pugalendhi; Kavitha, Muthu Subash; Ahn, Byeong-Cheol

    2016-01-01

    This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. The simple interpretable rules with most relevant genes and effectively classified

  17. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium.

    Science.gov (United States)

    Hohman, Timothy J; Bush, William S; Jiang, Lan; Brown-Gentry, Kristin D; Torstenson, Eric S; Dudek, Scott M; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W; Ritchie, Marylyn D; Martin, Eden R; Schellenberg, Gerard D; Mayeux, Richard; Farrer, Lindsay A; Pericak-Vance, Margaret A; Haines, Jonathan L; Thornton-Wells, Tricia A

    2016-02-01

    Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis.

  18. Candidate genes for chronic obstructive pulmonary disease in two large data sets

    DEFF Research Database (Denmark)

    Bakke, P S; Zhu, G; Gulsvik, A;

    2011-01-01

    to these phenotypes in this first study were tested in a second, family based, study that included 635 pedigrees with 1910 individuals. Significant associations to the binary COPD phenotype in both populations were seen for STAT1 (rs13010343) and NFKBIB/SIRT2 (rs2241704) (p... of the GC gene were significantly associated with FEV1 in percent predicted and FEV1/FVC, respectively in both populations (pSIRT2, and GC genes in two independent populations, the associations of the former two genes...

  19. Implementation of BacMam virus gene delivery technology in a drug discovery setting.

    Science.gov (United States)

    Kost, Thomas A; Condreay, J Patrick; Ames, Robert S; Rees, Stephen; Romanos, Michael A

    2007-05-01

    Membrane protein targets constitute a key segment of drug discovery portfolios and significant effort has gone into increasing the speed and efficiency of pursuing these targets. However, issues still exist in routine gene expression and stable cell-based assay development for membrane proteins, which are often multimeric or toxic to host cells. To enhance cell-based assay capabilities, modified baculovirus (BacMam virus) gene delivery technology has been successfully applied to the transient expression of target proteins in mammalian cells. Here, we review the development, full implementation and benefits of this platform-based gene expression technology in support of SAR and HTS assays across GlaxoSmithKline.

  20. Identification and Validation of a New Set of Five Genes for Prediction of Risk in Early Breast Cancer

    Directory of Open Access Journals (Sweden)

    Giorgio Mustacchi

    2013-05-01

    Full Text Available Molecular tests predicting the outcome of breast cancer patients based on gene expression levels can be used to assist in making treatment decisions after consideration of conventional markers. In this study we identified a subset of 20 mRNAs differentially regulated in breast cancer by analyzing several publicly available array gene expression data sets using the R/Bioconductor package. Using RTqPCR we evaluated 261 consecutive invasive breast cancer cases not selected for age, adjuvant treatment, nodal and estrogen receptor status from paraffin embedded sections. The biological samples dataset was split into a training set (137 cases) and a validation set (124 cases). The gene signature was developed on the training set and a multivariate stepwise Cox analysis selected five genes independently associated with DFS: FGF18 (HR = 1.13, p = 0.05), BCL2 (HR = 0.57, p = 0.001), PRC1 (HR = 1.51, p = 0.001), MMP9 (HR = 1.11, p = 0.08), SERF1A (HR = 0.83, p = 0.007). These five genes were combined into a linear score (signature) weighted according to the coefficients of the Cox model, as: 0.125FGF18 − 0.560BCL2 + 0.409PRC1 + 0.104MMP9 − 0.188SERF1A (HR = 2.7, 95% CI = 1.9–4.0, p < 0.001). The signature was then evaluated on the validation set assessing the discrimination ability by a Kaplan Meier analysis, using the same cut offs classifying patients at low, intermediate or high risk of disease relapse as defined on the training set (p < 0.001). Our signature, after a further clinical validation, could be proposed as prognostic signature for disease free survival in breast cancer patients where the indication for adjuvant chemotherapy added to endocrine treatment is uncertain.
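    The linear score described in the abstract above (0.125 FGF18 − 0.560 BCL2 + 0.409 PRC1 + 0.104 MMP9 − 0.188 SERF1A) is a weighted sum of five expression values, and can be sketched as follows. The coefficients are those quoted in the abstract; the function name, the dictionary layout, and the input values are hypothetical. The abstract does not state how expression is normalized or where the low/intermediate/high risk cut-offs lie, so neither is modelled here.

```python
# Cox-model coefficients as quoted in the abstract.
COEFFICIENTS = {
    "FGF18": 0.125,
    "BCL2": -0.560,
    "PRC1": 0.409,
    "MMP9": 0.104,
    "SERF1A": -0.188,
}


def signature_score(expression: dict) -> float:
    """Weighted sum of the five genes' expression values.

    `expression` maps gene symbol to a (suitably normalized) expression
    value; the normalization itself is an assumption left to the caller.
    """
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Illustrative input only: every gene at expression value 1.0.
example = {gene: 1.0 for gene in COEFFICIENTS}
print(round(signature_score(example), 3))
```

    Each coefficient is close to the natural logarithm of the corresponding hazard ratio (for example, ln 0.57 ≈ −0.56 for BCL2), which is what one would expect from Cox-model weights: genes with HR below 1 lower the score, genes with HR above 1 raise it.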

  1. Candidate genes for chronic obstructive pulmonary disease in two large data sets

    DEFF Research Database (Denmark)

    Bakke, P S; Zhu, G; Gulsvik, A

    2011-01-01

    Lack of reproducibility of findings has been a criticism of genetic association studies in complex diseases like chronic obstructive pulmonary disease (COPD). We selected 257 polymorphisms of 16 genes with reported or potential relationships to COPD and genotyped these variants in a case-control s...... of the GC gene were significantly associated with FEV1 in percent predicted and FEV1/FVC, respectively in both populations (pgenes in two independent populations, the associations of the former two genes...

  2. Genome-wide survey and developmental expression mapping of zebrafish SET domain-containing genes

    National Research Council Canada - National Science Library

    Sun, Xiao-Jian; Xu, Peng-Fei; Zhou, Ting; Hu, Ming; Fu, Chun-Tang; Zhang, Yong; Jin, Yi; Chen, Yi; Chen, Sai-Juan; Huang, Qiu-Hua; Liu, Ting Xi; Chen, Zhu

    2008-01-01

    .... Since some of these genes have been revealed to be essential for embryonic development, we propose that the zebrafish, a vertebrate model organism possessing many advantages for developmental studies...

  3. Determining Semantically Related Significant Genes.

    Science.gov (United States)

    Taha, Kamal

    2014-01-01

    GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene set enrichment. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.

  4. Different gene sets contribute to different symptom dimensions of depression and anxiety

    OpenAIRE

    van Veen, Tineke; Goeman, Jelle J.; Monajemi, Ramin; Wardenaar, Klaas J; Hartman, Catharina A; Snieder, Harold; Nolte, Ilja M; Penninx, Brenda W. J. H.; Zitman, Frans G.

    2012-01-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom-dimensions provide a more homogenous phenotype. Furthermore, as effects of individual genes are small, analysis of genetic data at the pathway-level provides more power to detect associations and yield valuable biological insight. In 1,398 individuals with a Major Depressive Disorder, t...

  5. Prioritizing predicted cis-regulatory elements for co-expressed gene sets based on Lasso regression models.

    Science.gov (United States)

    Hu, Hong; Roqueiro, Damian; Dai, Yang

    2011-01-01

    Computational prediction of cis-regulatory elements for a set of co-expressed genes based on sequence analysis provides an overwhelming volume of potential transcription factor binding sites. It presents a challenge to prioritize transcription factors for regulatory functional studies. A novel approach based on the use of Lasso regression models is proposed to address this problem. We examine the ability of the Lasso model using time-course microarray data obtained from a comprehensive study of gene expression profiles in skin and mucosal wounds in mouse over all stages of wound healing.

  6. CLEAN: CLustering Enrichment ANalysis

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario

    2009-07-01

    Full Text Available Abstract Background Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses have been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score. The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at http://Clusteranalysis.org. The package integrates routines for calculating gene specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView. 
Conclusion Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the

  7. A two-sample test for high-dimensional data with applications to gene-set testing

    CERN Document Server

    Chen, Song Xi; 10.1214/09-AOS716

    2010-01-01

    We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical $T^2$ test does not work for this "large $p$, small $n$" situation. The proposed test does not require explicit conditions in the relationship between the data dimension and sample size. This offers much flexibility in analyzing high-dimensional data. An application of the proposed test is in testing significance for sets of genes which we demonstrate in an empirical study on a leukemia data set.
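    The abstract above does not spell out the test statistic. As a hedged illustration of the general idea (comparing mean vectors without inverting a p × p sample covariance matrix, which is what breaks Hotelling's $T^2$ when $p > n$), the sketch below computes an unbiased estimate of the squared Euclidean distance between the two mean vectors by summing inner products over distinct pairs of observations only. All function names are hypothetical, the exact form used in the paper should be checked against the paper itself, and the standardization needed to obtain a p-value is deliberately left out.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def off_diagonal_mean(sample):
    """Average of x_i . x_j over all ordered pairs with i != j."""
    n = len(sample)
    total = sum(dot(sample[i], sample[j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def cross_mean(sample1, sample2):
    """Average of x_i . y_j over all pairs across the two samples."""
    total = sum(dot(x, y) for x in sample1 for y in sample2)
    return total / (len(sample1) * len(sample2))


def mean_difference_statistic(sample1, sample2):
    """Unbiased estimate of ||mu1 - mu2||^2 (illustrative sketch only).

    Dropping the i == j terms removes the variance contribution that
    would otherwise bias the naive squared distance between sample means.
    """
    if len(sample1) < 2 or len(sample2) < 2:
        raise ValueError("each sample needs at least two observations")
    return (off_diagonal_mean(sample1) + off_diagonal_mean(sample2)
            - 2 * cross_mean(sample1, sample2))


# Toy data: rows are observations, columns are genes in the set.
group_a = [[1.0, 0.0], [3.0, 0.0]]
group_b = [[0.0, 0.0], [0.0, 0.0]]
print(mean_difference_statistic(group_a, group_b))
```

    For gene-set testing, each row would hold the expression values of the genes in one set for one sample, so the number of columns can exceed the number of rows without causing any numerical difficulty.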

  8. HoxBlinc RNA recruits Set1/MLL complexes to activate Hox gene expression patterns and mesoderm lineage development

    Science.gov (United States)

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Nao; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2015-01-01

    Summary Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1+ mesoderm and then promotes hematopoietic differentiation through regulating hoxb gene pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated KD or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb gene expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1+ precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1+ precursors and differentiation of Flk1+ cells into hematopoietic lineages. PMID:26725110

  9. A set of vectors for introduction of antibiotic resistance genes by in vitro Cre-mediated recombination

    Directory of Open Access Journals (Sweden)

    Vassetzky Yegor S

    2008-12-01

    Full Text Available Abstract Background Introduction of new antibiotic resistance genes in the plasmids of interest is a frequent task in molecular cloning practice. Classical approaches involving digestion with restriction endonucleases and ligation are time-consuming. Findings We have created a set of insertion vectors (pINS) carrying genes that provide resistance to various antibiotics (puromycin, blasticidin and G418) and containing a loxP site. Each vector (pINS-Puro, pINS-Blast or pINS-Neo) contains either a chloramphenicol or a kanamycin resistance gene and is unable to replicate in most E. coli strains as it contains a conditional R6Kγ replication origin. Introduction of the antibiotic resistance genes into the vector of interest is achieved by Cre-mediated recombination between the replication-incompetent pINS and a replication-competent target vector. The recombination mix is then transformed into E. coli and selected by the resistance marker (kanamycin or chloramphenicol) present in pINS, which allows recovery of the recombinant plasmids with 100% efficiency. Conclusion Here we propose a simple strategy that allows the introduction of various antibiotic-resistance genes into any plasmid containing a replication origin, an ampicillin resistance gene and a loxP site.

  10. Gene expression profiling identifies a set of transcripts that are up-regulated in human testicular seminoma.

    Science.gov (United States)

    Yamada, Shigeyuki; Kohu, Kazuyoshi; Ishii, Tomohiko; Ishidoya, Shigeto; Ishidoya, Shigeru; Hiramatsu, Masayoshi; Kanto, Satoru; Fukuzaki, Atsushi; Adachi, Yutsu; Endoh, Mareyuki; Moriya, Takuya; Sasaki, Hiroki; Satake, Masanobu; Arai, Yoichi

    2004-10-31

    Seminoma constitutes one subtype of human testicular germ cell tumors and is uniformly composed of cells that are morphologically similar to the primordial germ cells and/or the cells in the carcinoma in situ. We performed a genome-wide exploration of the genes that are specifically up-regulated in seminoma by oligonucleotide-based microarray analysis. This revealed 106 genes that are significantly and consistently up-regulated in the seminomas compared to the adjacent normal tissues of the testes. The microarray data were validated by semi-quantitative RT-PCR analysis. Of the 106 genes, 42 mapped to a small number of specific chromosomal regions, namely, 1q21, 2p23, 6p21-22, 7p14-15, 12p11, 12p13, 12q13-14 and 22q12-13. This list of up-regulated genes may be useful in identifying the causative oncogene(s) and/or the origin of seminoma. Furthermore, immunohistochemical analysis revealed that the seminoma cells specifically expressed the six gene products that were selected randomly from the list. These proteins include CCND2 and DNMT3A and may be useful as molecular pathological markers of seminoma.

  11. Enriched transcription factor signatures in triple negative breast cancer indicates possible targeted therapies with existing drugs

    Directory of Open Access Journals (Sweden)

    Scooter Willis

    2015-06-01

    Conclusion: With the increasing number of large sample size breast cancer cohorts, an exploratory analysis of genes that are consistently enriched in TN sharing common promoter motifs allows for the identification of possible therapeutic targets with extensive validation in patient derived data sets.

  12. twzPEA: A Topology and Working Zone Based Pathway Enrichment Analysis Framework

    Science.gov (United States)

    Sensitive detection of involvement and adaptation of key signaling, regulatory, and metabolic pathways holds the key to deciphering molecular mechanisms such as those in the biomass-to-biofuel conversion process in yeast. Typical gene set enrichment analyses often do not use topology information in...

  13. Serum 25-Hydroxyvitamin D Levels, phosphoprotein enriched in diabetes gene product (PED/PEA-15) and leptin-to-adiponectin ratio in women with PCOS

    Directory of Open Access Journals (Sweden)

    Savastano Silvia

    2011-11-01

    Full Text Available Abstract Background Polycystic ovary syndrome (PCOS) is frequently associated with hypovitaminosis D. Vitamin D is endowed with pleiotropic effects, including effects on insulin resistance (IR) and apoptotic pathways. Disruption of the complex mechanism that regulates ovarian apoptosis has been reported in PCOS. Phosphoprotein enriched in diabetes gene product (PED/PEA-15), an anti-apoptotic protein involved in type 2 diabetes mellitus (T2DM), is overexpressed in PCOS women, independently of obesity. Leptin-to-adiponectin ratio (L/A) is a biomarker of IR and low-grade inflammation in PCOS. The aim of the study was to investigate the levels of 25-hydroxy vitamin D (25(OH)D) and L/A, in association with PED/PEA-15 protein abundance, in both lean and overweight/obese (o/o) women with PCOS. Patients and Methods PED/PEA-15 protein abundance and circulating levels of 25(OH)D, L/A, sex hormone-binding globulin, and testosterone were evaluated in 90 untreated PCOS patients (25 ± 4 yrs; range 18-34) and 40 healthy controls of comparable age and BMI, from the same geographical area. FAI (free androgen index) and the homeostasis model assessment of insulin resistance (HoMA-IR) index were calculated. Results In o/o PCOS, 25(OH)D levels were significantly lower, and L/A values were significantly higher than in lean PCOS (p ... Conclusions Lower 25(OH)D and higher L/A were associated with PED/PEA-15 protein abundance in PCOS, suggesting their involvement in the ovarian imbalance between pro- and anti-apoptotic mechanisms, with high L/A and insulin and low 25(OH)D levels as the main determinants of PED/PEA-15 protein variability. Further studies, also involving different apoptotic pathways or inflammatory cytokines and granulosa cells, are mandatory to better define the possible bidirectional relationships between 25(OH)D, PED/PEA-15 protein abundance, leptin and adiponectin in PCOS pathogenesis.
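
    The abstract names three derived indices (L/A, FAI and HoMA-IR) without giving their formulas. The sketch below assumes the commonly used definitions: HOMA-IR as fasting insulin (µU/mL) times fasting glucose (mmol/L) divided by 22.5, and FAI as 100 times total testosterone over SHBG (both in nmol/L). The function names and the example values are illustrative only and are not taken from the study.

```python
def homa_ir(fasting_insulin_uU_ml: float, fasting_glucose_mmol_l: float) -> float:
    """Homeostasis model assessment of insulin resistance (assumed standard form)."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5


def free_androgen_index(total_testosterone_nmol_l: float, shbg_nmol_l: float) -> float:
    """Free androgen index: 100 * total testosterone / SHBG (assumed standard form)."""
    return 100.0 * total_testosterone_nmol_l / shbg_nmol_l


def leptin_adiponectin_ratio(leptin: float, adiponectin: float) -> float:
    """L/A ratio; both inputs must be expressed in the same concentration unit."""
    return leptin / adiponectin


# Hypothetical values for one subject (not study data).
example_homa = homa_ir(fasting_insulin_uU_ml=10.0, fasting_glucose_mmol_l=4.5)
example_fai = free_androgen_index(total_testosterone_nmol_l=2.0, shbg_nmol_l=40.0)
example_la = leptin_adiponectin_ratio(leptin=20.0, adiponectin=8.0)
```

    Unit conventions differ between laboratories, so inputs would need converting to the units named in the docstrings before these functions are applied.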

  14. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    Science.gov (United States)

    2014-11-11

    information gleaned from these transposon studies was used to inform our next set of designs by predicting genes switching from N to E or I as paralogous ...remaining in RGD and homologs found in other organisms. A BLASTp score of 1e-5 was used as the similarity cutoff. Functional classifications... homologs to RGD in that organism. Inside the dashed circle is for prokaryotes and archaea. Those outside are for eukaryotes.

  15. Identification of a novel set of genes reflecting different in vivo invasive patterns of human GBM cells

    Directory of Open Access Journals (Sweden)

    Monticone Massimiliano

    2012-08-01

    Full Text Available Abstract Background Most patients affected by Glioblastoma multiforme (GBM, grade IV glioma) experience a recurrence of the disease because of the spreading of tumor cells beyond surgical boundaries. Unveiling mechanisms causing this process is a logical goal to impair the killing capacity of GBM cells by molecular targeting. We noticed that our long-term GBM cultures, established from different patients, may display two categories/types of growth behavior in an orthotopic xenograft model: expansion of the tumor mass and formation of tumor branches/nodules (nodular like, NL-type) or highly diffuse single tumor cell infiltration (HD-type). Methods We determined by DNA microarrays the gene expression profiles of three NL-type and three HD-type long-term GBM cultures. Subsequently, individual genes with different expression levels between the two groups were identified using Significance Analysis of Microarrays (SAM). Real time RT-PCR, immunofluorescence and immunoblot analyses were performed for a selected subgroup of regulated gene products to confirm the results obtained by the expression analysis. Results Here, we report the identification of a set of 34 differentially expressed genes in the two types of GBM cultures. Twenty-three of these genes encode proteins localized to the plasma membrane and 9 of these encode proteins involved in the process of cell adhesion. Conclusions This study suggests the participation in the diffuse infiltrative/invasive process of GBM cells within the CNS of a novel set of genes coding for membrane-associated proteins, which should thus be susceptible to an inhibition strategy by specific targeting. Massimiliano Monticone and Antonio Daga contributed equally to this work

  16. Effects of Selenium-Enriched Probiotics on Lipid Metabolism, Antioxidative Status, Histopathological Lesions, and Related Gene Expression in Mice Fed a High-Fat Diet.

    Science.gov (United States)

    Nido, Sonia Agostinho; Shituleni, Shituleni Andreas; Mengistu, Berhe Mekonnen; Liu, Yunhuan; Khan, Alam Zeb; Gan, Fang; Kumbhar, Shahnawaz; Huang, Kehe

    2016-06-01

    A total of 80 female albino mice were randomly allotted into five groups (n = 16) as follows: (A) normal control, (B) high-fat diet (HFD), (C) HFD + probiotics (P), (D) HFD + sodium selenite (SS), and (E) HFD + selenium-enriched probiotics (SP). The selenium content of diets in groups A, B, C, D, and E was 0.05, 0.05, 0.05, 0.3, and 0.3 μg/g, respectively. The amount of probiotics contained in groups C and E was similar (Lactobacillus acidophilus 0.25 × 10^11/mL and Saccharomyces cerevisiae 0.25 × 10^9/mL colony-forming units (CFU)). The high-fat diet was composed of 15 % lard, 1 % cholesterol, 0.3 % cholic acid, and 83.7 % basal diet. At the end of the 4-week experiment, blood and liver samples were collected for the measurements of lipid metabolism, antioxidative status, histopathological lesions, and related gene expression. The results show that HFD significantly increased body weight and liver damage compared to control, while P, SS, or SP supplementation attenuated the body weight gain and liver damage in mice. P, SS, or SP supplementation also significantly reversed the changes of alanine aminotransferase (ALT), aspartate aminotransferase (AST), total cholesterol (TC), triglyceride (TG), low-density lipoprotein (LDL), total protein (TP), high-density lipoprotein (HDL), glutathione peroxidase (GSH-Px), superoxide dismutase (SOD), catalase (CAT), and malondialdehyde (MDA) levels induced by HFD. Generally, adding P, SS, or SP up-regulated mRNA expression of carnitine palmitoyltransferase-I (CPT1), carnitine palmitoyltransferase II (CPT2), acetyl-CoA acetyltransferase II (ACAT2), acyl-coenzyme A oxidase (ACOX2), and peroxisome proliferator-activated receptor alpha (PPARα) and down-regulated mRNA expression of fatty acid synthase (FAS), lipoprotein lipase (LPL), peroxisome proliferator-activated receptor gamma (PPARγ), and sterol regulatory element-binding protein-1 (SREBP1) involved in lipid metabolism. Among the group

  17. Genetic investigation of 100 heart genes in sudden unexplained death victims in a forensic setting

    DEFF Research Database (Denmark)

    Christiansen, Sofie Lindgren; Hertz, Christin Løth; Ferrero, Laura

    2016-01-01

    indicate that broad genetic investigation of SUD victims increases the diagnostic outcome, and the investigation should comprise genes involved in both cardiomyopathies and cardiac channelopathies. European Journal of Human Genetics advance online publication, 21 September 2016; doi:10.1038/ejhg.2016.118....

  18. Using RNAi in C. "elegans" to Demonstrate Gene Knockdown Phenotypes in the Undergraduate Biology Lab Setting

    Science.gov (United States)

    Roy, Nicole M.

    2013-01-01

    RNA interference (RNAi) is a powerful technology used to knock down genes in basic research and medicine. In 2006 RNAi technology using Caenorhabditis elegans (C. elegans) was awarded the Nobel Prize in medicine and thus students graduating in the biological sciences should have experience with this technology. However,…

  19. The analysis of translation-related gene set boosts debates around origin and evolution of mimiviruses

    Science.gov (United States)

    Colson, Philippe; La Scola, Bernard

    2017-01-01

    The giant mimiviruses challenged the well-established concept of viruses, blurring the roots of the tree of life, mainly due to their genetic content. Along with other nucleo-cytoplasmic large DNA viruses, they compose a new proposed order—named Megavirales—whose origin and evolution generate heated debate in the scientific community. The presence of an arsenal of genes not widespread in the virosphere related to important steps of the translational process, including transfer RNAs, aminoacyl-tRNA synthetases, and translation factors for peptide synthesis, constitutes an important element of this debate. In this review, we highlight the main findings to date about the translational machinery of the mimiviruses and compare their distribution along the distinct members of the family Mimiviridae. Furthermore, we discuss how the presence and/or absence of the translation-related genes among mimiviruses raises important insights to boost the debate on their origin and evolutionary history. PMID:28207761

  20. GAMYB controls different sets of genes and is differentially regulated by microRNA in aleurone cells and anthers.

    Science.gov (United States)

    Tsuji, Hiroyuki; Aya, Koichiro; Ueguchi-Tanaka, Miyako; Shimada, Yukihisa; Nakazono, Mikio; Watanabe, Ryosuke; Nishizawa, Naoko K; Gomi, Kenji; Shimada, Asako; Kitano, Hidemi; Ashikari, Motoyuki; Matsuoka, Makoto

    2006-08-01

    GAMYB is a component of gibberellin (GA) signaling in cereal aleurone cells, and has an important role in flower development. However, it is unclear how GAMYB function is regulated. We examined the involvement of a microRNA, miR159, in the regulation of GAMYB expression in cereal aleurone cells and flower development. In aleurone cells, no miR159 expression was observed with or without GA treatment, suggesting that miR159 is not involved in the regulation of GAMYB and GAMYB-like genes in this tissue. miR159 was expressed in tissues other than aleurone, and miR159 over-expressors showed similar but more severe phenotypes than the gamyb mutant. GAMYB and GAMYB-like genes are co-expressed with miR159 in anthers, and the mRNA levels for GAMYB and GAMYB-like genes are negatively correlated with miR159 levels during anther development. Thus, OsGAMYB and OsGAMYB-like genes are regulated by miR159 in flowers. A microarray analysis revealed that OsGAMYB and its upstream regulator SLR1 are involved in the regulation of almost all GA-mediated gene expression in rice aleurone cells. Moreover, different sets of genes are regulated by GAMYB in aleurone cells and anthers. GAMYB binds directly to promoter regions of its target genes in anthers as well as aleurone cells. Based on these observations, we suggest that the regulation of GAMYB expression and GAMYB function are different in aleurone cells and flowers in rice.

  1. Transcript and protein profiling identify candidate gene sets of potential adaptive significance in New Zealand Pachycladon

    Directory of Open Access Journals (Sweden)

    Schmidt Silvia

    2010-05-01

    Full Text Available Abstract Background Transcript profiling of closely related species provides a means for identifying genes potentially important in species diversification. However, the predictive value of transcript profiling for inferring downstream-physiological processes has been unclear. In the present study we use shotgun proteomics to validate inferences from microarray studies regarding physiological differences in three Pachycladon species. We compare transcript and protein profiling and evaluate their predictive value for inferring glucosinolate chemotypes characteristic of these species. Results Evidence from heterologous microarrays and shotgun proteomics revealed differential expression of genes involved in glucosinolate hydrolysis (myrosinase-associated proteins) and biosynthesis (methylthioalkylmalate isomerase and dehydrogenase), the interconversion of carbon dioxide and bicarbonate (carbonic anhydrases), water use efficiency (ascorbate peroxidase, 2 cys peroxiredoxin, 20 kDa chloroplastic chaperonin, mitochondrial succinyl CoA ligase) and others (glutathione-S-transferase, serine racemase, vegetative storage proteins, genes related to translation and photosynthesis). Differences in glucosinolate hydrolysis products were directly confirmed. Overall, prediction of protein abundances from transcript profiles was stronger than prediction of transcript abundance from protein profiles. Protein profiles also proved to be more accurate predictors of glucosinolate profiles than transcript profiles. The similarity of species profiles for both transcripts and proteins reflected previously inferred phylogenetic relationships while glucosinolate chemotypes did not. Conclusions We have used transcript and protein profiling to predict physiological processes that evolved differently during diversification of three Pachycladon species. This approach has also identified candidate genes potentially important in adaptation, which are now the focus of ongoing study

  2. Microarray analysis identifies a common set of cellular genes modulated by different HCV replicon clones

    OpenAIRE

    Gerosolimo Germano; Dallapiccola Bruno; Bruni Roberto; Ferraris Alessandro; Tataseo Paola; Tritarelli Elena; Marcantonio Cinzia; Ciccaglione Anna; Costantino Angela; Rapicetta Maria

    2008-01-01

    Abstract Background Hepatitis C virus (HCV) RNA synthesis and protein expression affect cell homeostasis by modulation of gene expression. The impact of HCV replication on global cell transcription has not been fully evaluated. Thus, we analysed the expression profiles of different clones of human hepatoma-derived Huh-7 cells carrying a self-replicating HCV RNA which express all viral proteins (HCV replicon system). Results First, we compared the expression profile of HCV replicon clone 21-5 ...

  3. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications.

    Directory of Open Access Journals (Sweden)

    Karina Stucken

    Full Text Available Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N(2)) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N(2) fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N(2) fixation capacity. Further comparisons to all available cyanobacterial

  4. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications.

    Science.gov (United States)

    Stucken, Karina; John, Uwe; Cembella, Allan; Murillo, Alejandro A; Soto-Liebe, Katia; Fuentes-Valdés, Juan J; Friedel, Maik; Plominsky, Alvaro M; Vásquez, Mónica; Glöckner, Gernot

    2010-02-16

    Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N(2)) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N(2) fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N(2) fixation capacity. Further comparisons to all available cyanobacterial genomes

  5. In vivo validation of a computationally predicted conserved Ath5 target gene set.

    Directory of Open Access Journals (Sweden)

    Filippo Del Bene

    2007-09-01

    Full Text Available So far, the computational identification of transcription factor binding sites is hampered by the complexity of vertebrate genomes. Here we present an in silico procedure to predict target sites of a transcription factor in complex genomes using its binding site. In a first step, sequence comparison of closely related genomes identifies the binding sites in conserved cis-regulatory regions (phylogenetic footprinting). Subsequently, more remote genomes are introduced into the comparison to identify highly conserved and therefore putatively functional binding sites (phylogenetic filtering). When applied to the binding site of atonal homolog 5 (Ath5 or ATOH7), this procedure efficiently filters evolutionarily conserved binding sites out of more than 300,000 instances in a vertebrate genome. We validate a selection of the linked target genes by showing coexpression with and transcriptional regulation by Ath5. Finally, chromatin immunoprecipitation demonstrates the occupancy of the target gene promoters by Ath5. Thus, our procedure, applied to whole genomes, is a fast and predictive tool to in silico filter the target genes of a given transcription factor with defined binding site.
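
    The two-step logic described above (footprinting against close genomes, then filtering against remote ones) can be illustrated with a toy sketch. It is heavily simplified and is not the authors' pipeline: it only checks whether a motif occurs anywhere in each orthologous promoter rather than at aligned positions, the motif CAGCTG is a placeholder rather than the published Ath5 site, and all gene names and sequences are invented.

```python
import re


def genes_with_site(promoters: dict, motif: str) -> set:
    """Return the genes whose promoter sequence contains the motif (a regex)."""
    pattern = re.compile(motif)
    return {gene for gene, seq in promoters.items() if pattern.search(seq)}


def conserved_targets(reference, close_species, remote_species, motif):
    """Step 1 keeps genes with the site in the reference and all close genomes;
    step 2 additionally requires the site in all remote genomes."""
    candidates = genes_with_site(reference, motif)
    for promoters in close_species:       # phylogenetic footprinting
        candidates &= genes_with_site(promoters, motif)
    footprinted = set(candidates)
    for promoters in remote_species:      # phylogenetic filtering
        candidates &= genes_with_site(promoters, motif)
    return footprinted, candidates


# Invented toy promoters keyed by a shared ortholog name.
reference = {"geneA": "TTCAGCTGAA", "geneB": "GGCAGCTGTT", "geneC": "AAAAAAAA"}
close = [{"geneA": "ACAGCTGA", "geneB": "CCCAGCTGCC", "geneC": "TTTT"}]
remote = [{"geneA": "GCAGCTGG", "geneB": "GGGGGGGG"}]

footprinted, filtered = conserved_targets(reference, close, remote, "CAGCTG")
print(sorted(footprinted), sorted(filtered))
```

    A realistic version would work on genome alignments and require the site at corresponding positions, which is what makes the filter stringent enough to reduce hundreds of thousands of motif instances to a short candidate list.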

  6. Screening Key Genes Associated with the Development and Progression of Non-small Cell Lung Cancer Based on Gene-enrichment Analysis and Meta-analysis

    Institute of Scientific and Technical Information of China (English)

    何文武; 冼磊; 王永勇; 胡艳玲; 陈铭伍

    2012-01-01

    Background and objective Non-small cell lung cancer (NSCLC) is one of the most common malignant tumors; however, its causes are still not completely understood. This study was designed to screen the key genes and pathways related to NSCLC occurrence and development and to establish the scientific foundation for the genetic mechanisms and targeted therapy of NSCLC. Methods Both gene set-enrichment analysis (GSEA) and meta-analysis (meta) were used to screen the critical pathways and genes that might be correlated with the development and progression of lung cancer at the transcription level. Results Using the GSEA and meta methods, focal adhesion and regulation of actin cytoskeleton were determined to be the more prominent overlapping significant pathways. In the focal adhesion pathway, 31 genes were statistically significant (P<0.05), whereas in the regulation of actin cytoskeleton pathway, 32 genes were statistically significant (P<0.05). Conclusion The focal adhesion and the regulation of actin cytoskeleton pathways might play important roles in the occurrence and development of NSCLC. Further studies are needed to determine the biological function of the positive genes.

  7. Rapid identification of Salmonella serovars in feces by specific detection of virulence genes, invA and spvC, by an enrichment broth culture-multiplex PCR combination assay.

    Science.gov (United States)

    Chiu, C H; Ou, J T

    1996-10-01

    In order to make a rapid and definite diagnosis of Salmonella enteritis in children, an enrichment broth culture-multiplex PCR combination assay was devised to identify Salmonella serovars directly from fecal samples. Two pairs of oligonucleotide primers were prepared according to the sequences of the chromosomal invA and plasmid spvC genes. PCR with these two primers would produce either one amplicon (from the invA gene) or two amplicons (from the invA and spvC genes), depending on whether or not the Salmonella bacteria contained a virulence plasmid. The fecal sample was diluted 10- to 20-fold into gram-negative enrichment broth and incubated to eliminate inhibitory compounds and also to allow selective enrichment of the bacteria. One or two amplicons were obtained, the expected result if Salmonella bacteria were present. The detection limit of this PCR was about 200 bacteria per reaction mixture. The primers were specific, as no amplification products were obtained with 18 species and 22 isolates of non-Salmonella bacteria tested which could be present in the feces or cause contamination. In contrast, when 23 commonly seen Salmonella serovars (38 isolates) were tested, all were shown to carry the invA gene and seven concomitantly harbored the spvC gene of the virulence plasmid. This assay was applied to the diagnosis of Salmonella enteritis in 57 children who were suffering from mucoid and/or bloody diarrhea. Of the 57 children, 38 were PCR positive and 22 were culture positive. There were two culture-positive samples that were not detected by PCR. Thus, this PCR assay showed an efficiency of 95% (38 of 40), which is much higher than the 60% (24 of 40) by culture alone. Not only is this method more sensitive, rapid, and efficient but it will cause only an incremental increase in the cost of stool processing, since enrichment cultivation of fecal samples from diarrheal patients using gram-negative enrichment broth is a routine practice for identification in many
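
    The efficiency figures quoted above are simple proportions. As a minimal sketch, assuming the denominator of 40 is the number of samples positive by either method (the 38 PCR-positive samples plus the 2 culture-positive samples missed by PCR), and taking the culture numerator of 24 directly from the fraction stated in the abstract:

```python
pcr_positive = 38                     # PCR-positive children (from the abstract)
culture_positive_missed_by_pcr = 2    # culture-positive but PCR-negative samples
culture_numerator = 24                # numerator of the "24 of 40" fraction as stated

# Assumed reading: all samples positive by at least one method.
total_positive = pcr_positive + culture_positive_missed_by_pcr

pcr_efficiency = pcr_positive / total_positive
culture_efficiency = culture_numerator / total_positive

print(f"PCR: {pcr_efficiency:.0%}, culture: {culture_efficiency:.0%}")
```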

  8. Comparative genomic analysis of SET domain family reveals the origin, expansion, and putative function of the arthropod-specific SmydA genes as histone modifiers in insects

    Science.gov (United States)

    Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling

    2017-01-01

    Abstract The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain–containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. PMID:28444351

  9. Functional Gene-Set Analysis Does Not Support a Major Role for Synaptic Function in Attention Deficit/Hyperactivity Disorder (ADHD)

    Directory of Open Access Journals (Sweden)

    Anke R. Hammerschlag

    2014-07-01

    Full Text Available Attention Deficit/Hyperactivity Disorder (ADHD) is one of the most common childhood-onset neuropsychiatric disorders. Despite high heritability estimates, genome-wide association studies (GWAS) have failed to find significant genetic associations, likely due to the polygenic character of ADHD. Nevertheless, genetic studies suggested the involvement of several processes important for synaptic function. Therefore, we applied a functional gene-set analysis to formally test whether synaptic functions are associated with ADHD. Gene-set analysis tests the joint effect of multiple genetic variants in groups of functionally related genes. This method provides increased statistical power compared to conventional GWAS. We used data from the Psychiatric Genomics Consortium including 896 ADHD cases and 2455 controls, and 2064 parent-affected offspring trios, providing sufficient statistical power to detect gene sets representing a genotype relative risk of at least 1.17. Although all synaptic genes together showed a significant association with ADHD, this association was not stronger than that of randomly generated gene sets matched for the same number of genes. Further analyses showed no association of specific synaptic function categories with ADHD after correction for multiple testing. Given current sample size and gene sets based on current knowledge of genes related to synaptic function, our results do not support a major role for common genetic variants in synaptic genes in the etiology of ADHD.
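
    The comparison against randomly generated gene sets of matched size is a competitive test, and its core idea fits in a few lines. The sketch below is a generic illustration, not the authors' analysis: it assumes one precomputed association score per gene (for example a -log10 p-value), uses the sum of scores as the set statistic, and all names, the seed and the toy scores are hypothetical.

```python
import random


def competitive_p_value(gene_scores, gene_set, n_random=1000, seed=0):
    """Empirical p-value: share of size-matched random gene sets whose summed
    score is at least as large as that of the gene set of interest."""
    rng = random.Random(seed)
    genes = sorted(gene_scores)
    members = [g for g in gene_set if g in gene_scores]
    observed = sum(gene_scores[g] for g in members)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(genes, len(members))
        if sum(gene_scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_random + 1)


# Toy scores: 100 genes with increasing scores; the set holds the top five.
toy_scores = {f"g{i}": float(i) for i in range(100)}
toy_set = {"g95", "g96", "g97", "g98", "g99"}
print(competitive_p_value(toy_scores, toy_set))
```

    Under this design a large gene set can be significant in a self-contained test and still fail the competitive comparison, which is the pattern the abstract reports for the full set of synaptic genes.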

  10. Root Exudates of Various Host Plants of Rhizobium leguminosarum Contain Different Sets of Inducers of Rhizobium Nodulation Genes.

    Science.gov (United States)

    Zaat, S A; Wijffelman, C A; Mulders, I H; van Brussel, A A; Lugtenberg, B J

    1988-04-01

    Rhizobium promoters involved in the formation of root nodules on leguminous plants are activated by flavonoids in plant root exudate. A series of Rhizobium strains which all contain the inducible Rhizobium leguminosarum nodA promoter fused to the Escherichia coli lacZ gene, and which differ only in the source of the regulatory nodD gene, were recently used to show that the regulatory nodD gene determines which flavonoids are able to activate the nodA promoter (HP Spaink, CA Wijffelman, E Pees, RJH Okker, BJJ Lugtenberg 1987 Nature 328: 337-340). Since these strains therefore are able to discriminate between various flavonoids, they were used to determine whether or not plants that are nodulated by R. leguminosarum produce different inducers. After chromatographic separation of root exudate constituents from Vicia sativa L. subsp. nigra (L.), V. hirsuta (L.) S.F. Gray, Pisum sativum L. cv Rondo, and Trifolium subterraneum L., the fractions were tested with a set of strains containing a nodD gene of R. leguminosarum, R. trifolii, or Rhizobium meliloti, respectively. It appeared that the source of nodD determined whether, and to what extent, the R. leguminosarum nodA promoter was induced. Lack of induction could not be attributed to the presence of inhibitors. Most of the inducers were able to activate the nodA promoter in the presence of one particular nodD gene only. The inducers that were active in the presence of the R. leguminosarum nodD gene were different in each root exudate.

  11. The transcriptional response to encystation stimuli in Giardia lamblia is restricted to a small set of genes.

    Science.gov (United States)

    Morf, Laura; Spycher, Cornelia; Rehrauer, Hubert; Fournier, Catharine Aquino; Morrison, Hilary G; Hehl, Adrian B

    2010-10-01

    The protozoan parasite Giardia lamblia undergoes stage differentiation in the small intestine of the host to an environmentally resistant and infectious cyst. Encystation involves the secretion of an extracellular matrix comprised of cyst wall proteins (CWPs) and a β(1-3)-GalNAc homopolymer. Upon the induction of encystation, genes coding for CWPs are switched on, and mRNAs coding for a Myb transcription factor and enzymes involved in cyst wall glycan synthesis are upregulated. Encystation in vitro is triggered by several protocols, which call for changes in bile concentrations or availability of lipids, and elevated pH. However, the conditions for induction are not standardized and we predicted significant protocol-specific side effects. This makes reliable identification of encystation factors difficult. Here, we exploited the possibility of inducing encystation with two different protocols, which we show to be equally effective, for a comparative mRNA profile analysis. The standard encystation protocol induced a bipartite transcriptional response with surprisingly minor involvement of stress genes. A comparative analysis revealed a core set of only 18 encystation genes and showed that a majority of genes was indeed upregulated as a side effect of inducing conditions. We also established a Myb binding sequence as a signature motif in encystation promoters, suggesting coordinated regulation of these factors.

  12. HoxBlinc RNA Recruits Set1/MLL Complexes to Activate Hox Gene Expression Patterns and Mesoderm Lineage Development

    Directory of Open Access Journals (Sweden)

    Changwang Deng

    2016-01-01

    Full Text Available Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1+ mesoderm and then promotes hematopoietic differentiation through regulation of hoxb pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated knockdown or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2–b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1+ precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1+ precursors and differentiation of Flk1+ cells into hematopoietic lineages.

  13. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.)

    Directory of Open Access Journals (Sweden)

    Byregowda Munishamappa

    2010-03-01

    .8% in molecular function. Further, 19 genes were identified as differentially expressed between FW-responsive genotypes and 20 between SMD-responsive genotypes. The generated ESTs were compiled together with 908 ESTs available in the public domain at the time of analysis, and a set of 5,085 unigenes was defined and used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers, with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs was used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs was confirmed for all 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained for 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs, which indicates the possibility of assaying SNPs in 37 genes using the cleaved amplified polymorphic sequences (CAPS) assay. Conclusion The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have shown conservation of a considerable number of pigeonpea transcripts across the legume and model plant species analysed, as well as some putative pigeonpea-specific genes. Validation of the identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.
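    The abstract reports an average PIC value without stating how it was computed. As a minimal sketch, assuming the commonly used Botstein-style definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², the value for a single marker can be calculated from its allele frequencies. The function name and the example frequencies below are hypothetical and are not taken from the study.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content of one marker.

    Assumes the Botstein-style formula:
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability-weighted homozygosity term
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for uninformative matings between identical heterozygotes
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical marker with four equally frequent alleles
print(round(pic([0.25, 0.25, 0.25, 0.25]), 3))  # prints 0.703
```

    A marker with a single allele gives a PIC of 0, and skewed allele frequencies pull the value down, so an average of 0.40 with about four alleles per marker would be consistent with unevenly distributed alleles under this assumed definition.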

  14. A Method of Gene-Function Annotation Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    In bioinformatics, it is very important to use computers to annotate the functions of newly sequenced biological sequences. Based on the GO database and the BLAST program, a novel method for the functional annotation of new biological sequences is presented, using variable-precision rough set theory. The proposed method is applied to real data in the GO database to examine its effectiveness. Numerical results show that the proposed method achieves better precision, recall rate and harmonic mean value than existing methods.

  15. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from multiple Agave sisalana tissues to increase the efficiency of normalization and maximize the number of independent genes, using the SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones from the libraries revealed 3320 unigenes with an average insert length of about 1.2 kb, indicating that the non-redundancy of the libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and a knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.
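    The non-redundancy figure quoted above can be checked from the two counts in the abstract. This is a small sketch that assumes non-redundancy means the share of sequenced clones that represent distinct unigenes; the function name is hypothetical.

```python
def non_redundancy(unigenes: int, clones_sequenced: int) -> float:
    """Percentage of sequenced clones that represent distinct unigenes."""
    if clones_sequenced <= 0:
        raise ValueError("clones_sequenced must be positive")
    return 100.0 * unigenes / clones_sequenced


# Counts reported in the abstract: 3320 unigenes from 3875 sequenced clones
print(f"{non_redundancy(3320, 3875):.1f}%")  # prints 85.7%
```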

  16. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-01-01

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from multiple Agave sisalana tissues to increase the efficiency of normalization and maximize the number of independent genes, using the SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones from the libraries revealed 3320 unigenes with an average insert length of about 1.2 kb, indicating that the non-redundancy of the libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and a knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944

  17. [Expression of SET-NUP214 fusion gene in patients with T-cell acute lymphoblastic leukemia and its clinical significance].

    Science.gov (United States)

    Dai, Hai-Ping; Wang, Qian; Wu, Li-Li; Ping, Na-Na; Wu, Chun-Xiao; Xie, Jun-Dan; Pan, Jin-Lan; Xue, Yong-Quan; Wu, De-Pei; Chen, Su-Ning

    2012-10-01

    This study aimed to investigate the occurrence and clinical significance of the SET-NUP214 fusion gene in patients with T-cell acute lymphoblastic leukemia (T-ALL) and to analyse the clinical and biological characteristics of this disease. RT-PCR was used to detect the expression of the SET-NUP214 fusion gene in 58 T-ALL cases. Interphase FISH and Array-CGH were used to detect the deletion of 9q34. Direct sequencing was applied to detect mutations of PHF6 and NOTCH1. The results showed that the SET-NUP214 fusion gene was detected by RT-PCR in 6 out of 58 T-ALL cases (10.3%). Besides T-lineage antigens, expression of CD13 and/or CD33 was detected in all 6 cases. Deletions of 9q34 were detected in 4 out of the 6 patients by FISH. Array-CGH results of 3 SET-NUP214 positive T-ALL patients confirmed that this fusion gene resulted from a cryptic deletion of 9q34.11q34.13. PHF6 and NOTCH1 gene mutations were found in 4 and 5 out of 6 SET-NUP214 positive T-ALL patients, respectively. It is concluded that the SET-NUP214 fusion gene often results from del(9)(q34). PHF6 and NOTCH1 mutations may be potential leukemogenic events in SET-NUP214 positive T-ALL.

  18. A statistical approach towards the derivation of predictive gene sets for potency ranking of chemicals in the mouse embryonic stem cell test.

    Science.gov (United States)

    Schulpen, Sjors H W; Pennings, Jeroen L A; Tonk, Elisa C M; Piersma, Aldert H

    2014-03-21

    The embryonic stem cell test (EST) is applied as a model system for detection of embryotoxicants. The application of transcriptomics allows a more detailed effect assessment compared to the morphological endpoint. Genes involved in cell differentiation, modulated by chemical exposures, may be useful as biomarkers of developmental toxicity. We describe a statistical approach to obtain a predictive gene set for toxicity potency ranking of compounds within one class. This resulted in a gene set based on differential gene expression across concentration-response series of phthalatic monoesters. We determined the concentration at which gene expression was changed at least 1.5-fold. Genes responding with the same potency ranking in vitro and in vivo embryotoxicity were selected. A leave-one-out cross-validation showed that the relative potency of each phthalate was always predicted correctly. The classical morphological 50% effect level (ID50) in EST was similar to the predicted concentration using gene set expression responses. A general down-regulation of development-related genes and up-regulation of cell-cycle related genes was observed, reminiscent of the differentiation inhibition in EST. This study illustrates the feasibility of applying dedicated gene set selections as biomarkers for developmental toxicity potency ranking on the basis of in vitro testing in the EST.
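    The thresholding and ranking steps described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: it takes the lowest tested concentration at which a gene's expression changes at least 1.5-fold in either direction, without interpolating between doses, and ranks compounds by that concentration. The function names, compound names and response values are hypothetical.

```python
FOLD_THRESHOLD = 1.5  # fold-change cutoff mentioned in the abstract


def threshold_concentration(series, fold=FOLD_THRESHOLD):
    """Lowest tested concentration with at least `fold`-fold change.

    `series` is a list of (concentration, fold_change) pairs, where
    fold_change is treated / control expression (1.0 means no change).
    Returns None if the threshold is never reached.
    """
    for conc, fc in sorted(series):
        # Count both up-regulation and down-regulation
        if fc >= fold or fc <= 1.0 / fold:
            return conc
    return None


def potency_ranking(responses):
    """Order compounds from most to least potent (lowest threshold first)."""
    thresholds = {name: threshold_concentration(s) for name, s in responses.items()}
    reached = {name: c for name, c in thresholds.items() if c is not None}
    return sorted(reached, key=reached.get)


# Hypothetical concentration-response series for one gene
responses = {
    "compound_A": [(10, 1.1), (30, 1.6), (100, 2.4)],
    "compound_B": [(10, 1.0), (30, 1.2), (100, 0.5)],
    "compound_C": [(10, 1.8), (30, 2.2), (100, 3.0)],
}
print(potency_ranking(responses))  # prints ['compound_C', 'compound_A', 'compound_B']
```

    A gene would then be retained for the predictive set only if its ranking matches the in vivo potency ranking, which is the selection criterion the abstract describes.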

  19. An application of MeSH enrichment analysis in livestock.

    Science.gov (United States)

    Morota, G; Peñagaricano, F; Petersen, J L; Ciobanu, D C; Tsuyuzaki, K; Nikaido, I

    2015-08-01

    An integral part of functional genomics studies is to assess the enrichment of specific biological terms in lists of genes found to be playing an important role in biological phenomena. Contrasting the observed frequency of annotated terms with those of the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay in ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary including additional categories that are not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics is yet to be explored. In this study, MeSH ORA was evaluated to discern biological properties of identified genes and contrast them with the results obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome-wide prediction in swine and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with those of GO terms. Moreover, MeSH yielded unique annotations, which are not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as another choice of annotation to draw biological inferences from genes identified via experimental analyses. When used in combination with GO terms, our results indicate that MeSH can enhance our functional interpretations for specific biological conditions or the genetic basis of complex