WorldWideScience

Sample records for gene set analysis

  1. Gene set analysis for GWAS

    DEFF Research Database (Denmark)

    Debrabant, Birgit; Soerensen, Mette

    2014-01-01

    Abstract We discuss the use of modified Kolmogorov-Smirnov (KS) statistics in the context of gene set analysis and review corresponding null and alternative hypotheses. Especially, we show that, when enhancing the impact of highly significant genes in the calculation of the test statistic...... parameter and the genesis and distribution of the gene-level statistics, and illustrate the effects of differential weighting in a real-life example....
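As an illustration of what a weighted Kolmogorov-Smirnov statistic for gene sets can look like, here is a minimal Python sketch of a running-sum score in which set members contribute in proportion to `|statistic| ** weight`. The function name, the `weight` parameter and the toy numbers are assumptions made for this example; the truncated abstract does not give the authors' exact statistic.

```python
def weighted_ks_score(stats, in_set, weight=1.0):
    """Signed maximum deviation of a weighted running sum over ranked genes.

    stats  : gene-level statistics (larger means more significant)
    in_set : booleans, True where the gene belongs to the gene set
    weight : 0 gives the classical unweighted KS-type statistic; larger
             values increase the impact of highly significant set members
    """
    order = sorted(range(len(stats)), key=lambda i: stats[i], reverse=True)
    hit_total = sum(abs(stats[i]) ** weight for i in order if in_set[i])
    n_miss = sum(1 for flag in in_set if not flag)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running = 0.0
    best = 0.0
    for i in order:
        if in_set[i]:
            running += abs(stats[i]) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                         # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy data: one set member at the top of the ranking, one at the bottom.
stats = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5]
in_set = [True, False, False, False, False, True]
unweighted = weighted_ks_score(stats, in_set, weight=0.0)
weighted = weighted_ks_score(stats, in_set, weight=1.0)
```

With `weight=0` both set members count equally; with `weight=1` the top-ranked member dominates the score, which is the kind of differential weighting the abstract refers to.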

  2. Gene set analysis using variance component tests

    Science.gov (United States)

    2013-01-01

    Background Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. Results We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). Conclusion We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data. PMID:23806107
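The abstract above mentions computing p-values by permutation. The following Python sketch shows the general idea of a self-contained permutation test for one gene set with a binary exposure. It is not the TEGS statistic: the statistic used here (sum over genes of squared differences in group means) ignores gene-gene correlation and is chosen only to keep the example short. All names and the toy data are assumptions.

```python
import random


def set_statistic(expr, labels):
    """Sum over genes of the squared difference between the two group means."""
    group1 = [row for row, lab in zip(expr, labels) if lab == 1]
    group0 = [row for row, lab in zip(expr, labels) if lab == 0]
    if not group1 or not group0:
        raise ValueError("both groups must be non-empty")
    total = 0.0
    for j in range(len(expr[0])):
        mean1 = sum(row[j] for row in group1) / len(group1)
        mean0 = sum(row[j] for row in group0) / len(group0)
        total += (mean1 - mean0) ** 2
    return total


def permutation_pvalue(expr, labels, n_perm=1000, seed=0):
    """Permutation p-value obtained by shuffling the sample labels.

    expr   : list of samples, each a list of expression values for the set
    labels : 0/1 exposure status per sample
    """
    rng = random.Random(seed)
    observed = set_statistic(expr, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(expr, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


# Toy data: two genes, three exposed and three unexposed samples.
expr = [[10, 10], [11, 9], [9, 11], [0, 1], [1, 0], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]
p_value = permutation_pvalue(expr, labels)
```

Shuffling whole sample labels keeps the correlation among genes intact in every permuted data set, which is why permutation is a common choice when that correlation is unknown.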

  3. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

    Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...... analyses to the chicken expression data led to different ranking of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses depended on the statistical method used. Methods for predicting the possible...

  4. Gene set analysis for interpreting genetic studies

    DEFF Research Database (Denmark)

    Pers, Tune H

    2016-01-01

    Interpretation of genome-wide association study (GWAS) results is lagging behind the discovery of new genetic associations. Consequently, there is an urgent need for data-driven methods for interpreting genetic association studies. Gene set analysis (GSA) can identify aetiologic pathways and func......

  5. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). In spite of the wealth of information provided by functional profiling methods, a common limitation to all of them is their inherent unidimensional nature. In order to overcome this restriction we present a multidimensional logistic model that allows studying the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results that cannot be derived from the conventional unidimensional functional profiling methods. We report sound results of gene set associations that remained undetected by the conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.

  6. Gene set analysis for longitudinal gene expression data

    Directory of Open Access Journals (Sweden)

    Piepho Hans-Peter

    2011-07-01

    Abstract Background Gene set analysis (GSA) has become a successful tool to interpret gene expression profiles in terms of biological functions, molecular pathways, or genomic locations. GSA performs statistical tests for independent microarray samples at the level of gene sets rather than individual genes. Nowadays, an increasing number of microarray studies are conducted to explore the dynamic changes of gene expression in a variety of species and biological scenarios. In these longitudinal studies, gene expression is repeatedly measured over time such that a GSA needs to take into account the within-gene correlations in addition to possible between-gene correlations. Results We provide a robust nonparametric approach to compare the expressions of longitudinally measured sets of genes under multiple treatments or experimental conditions. The limiting distributions of our statistics are derived when the number of genes goes to infinity while the number of replications can be small. When the number of genes in a gene set is small, we recommend permutation tests based on our nonparametric test statistics to achieve reliable type I error and better power while incorporating unknown correlations between and within genes. Simulation results demonstrate that the proposed method has greater power than other methods for various data distributions and heteroscedastic correlation structures. This method was used for an IL-2 stimulation study and significantly altered gene sets were identified. Conclusions The simulation study and the real data application showed that the proposed gene set analysis provides a promising tool for longitudinal microarray analysis. R scripts for simulating longitudinal data and calculating the nonparametric statistics are posted on the North Dakota INBRE website http://ndinbre.org/programs/bioinformatics.php. Raw microarray data is available in Gene Expression Omnibus (National Center for Biotechnology Information) with

  7. Self-Contained Statistical Analysis of Gene Sets

    Science.gov (United States)

    Cannon, Judy L.; Ricoy, Ulises M.; Johnson, Christopher

    2016-01-01

    Microarrays are a powerful tool for studying differential gene expression. However, lists of many differentially expressed genes are often generated, and unraveling meaningful biological processes from the lists can be challenging. For this reason, investigators have sought to quantify the statistical probability of compiled gene sets rather than individual genes. The gene sets typically are organized around a biological theme or pathway. We compute correlations between different gene set tests and elect to use Fisher’s self-contained method for gene set analysis. We improve Fisher’s differential expression analysis of a gene set by limiting the p-value of an individual gene within the gene set to prevent a small percentage of genes from determining the statistical significance of the entire set. In addition, we also compute dependencies among genes within the set to determine which genes are statistically linked. The method is applied to T-ALL (T-lineage Acute Lymphoblastic Leukemia) to identify differentially expressed gene sets between T-ALL and normal patients and T-ALL and AML (Acute Myeloid Leukemia) patients. PMID:27711232
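The idea of combining gene-level p-values with Fisher's method while limiting how small any single p-value may be can be sketched in Python as follows. The floor parameter `p_floor` and the function name are assumptions for this example, since the abstract does not state the limit the authors use. The chi-square tail probability is computed with the closed form that holds for even degrees of freedom. Note that the chi-square reference assumes independent p-values and, once a floor is applied, is only approximate (the floor can only make the statistic smaller, so the result errs on the conservative side).

```python
import math


def fisher_combined(pvalues, p_floor=0.0):
    """Fisher's combined p-value with each input limited to at least p_floor.

    The statistic is X = -2 * sum(ln p_i), referred to a chi-square
    distribution with 2k degrees of freedom for k p-values.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    clipped = [max(p, p_floor) for p in pvalues]
    if any(p <= 0.0 or p > 1.0 for p in clipped):
        raise ValueError("p-values must lie in (0, 1] after applying the floor")

    half = -sum(math.log(p) for p in clipped)  # this is X / 2
    k = len(clipped)
    # Survival function of chi-square with 2k df:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


# One extreme gene among otherwise unremarkable ones.
gene_pvalues = [1e-12, 0.5, 0.5]
unlimited = fisher_combined(gene_pvalues)
limited = fisher_combined(gene_pvalues, p_floor=0.01)
```

Without the floor, the single extreme gene drives the set-level p-value; with the floor its influence is capped, which matches the motivation given in the abstract.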

  8. A general modular framework for gene set enrichment analysis

    Directory of Open Access Journals (Sweden)

    Strimmer Korbinian

    2009-02-01

    Abstract Background Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear. Results We conduct an extensive survey of statistical approaches for gene set analysis and identify a common modular structure underlying most published methods. Based on this finding we propose a general framework for detecting gene set enrichment. This framework provides a meta-theory of gene set analysis that not only helps to gain a better understanding of the relative merits of each embedded approach but also facilitates a principled comparison and offers insights into the relative interplay of the methods. Conclusion We use this framework to conduct a computer simulation comparing 261 different variants of gene set enrichment procedures and to analyze two experimental data sets. Based on the results we offer recommendations for best practices regarding the choice of effective procedures for gene set enrichment analysis.

  9. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data

    Directory of Open Access Journals (Sweden)

    Tintle Nathan L

    2012-08-01

    Abstract Background Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. Results We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Conclusions Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.

  10. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data.

    Directory of Open Access Journals (Sweden)

    Boris P Hejblum

    2015-06-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets that were found to be linked with the influenza vaccine as well, although previous analyses had associated them with the pneumococcal vaccine only. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package.

  11. The limitations of simple gene set enrichment analysis assuming gene independence.

    Science.gov (United States)

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, due to the significant variance inflation they produce in the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
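To see why ignoring gene-gene correlation inflates significance, consider the textbook case of averaging m gene-level statistics that each have unit variance and an average pairwise correlation rho: the variance of the mean is (1 + (m - 1) * rho) / m rather than 1 / m. The Python sketch below computes that inflation factor and rescales a naive z-score accordingly. This is a generic illustration under those assumptions, not the computation performed in the paper; the function names and the example numbers are hypothetical.

```python
import math


def variance_inflation_factor(set_size, mean_correlation):
    """Factor by which correlation inflates the variance of a set-level mean,
    relative to the independence assumption: 1 + (m - 1) * rho."""
    return 1.0 + (set_size - 1) * mean_correlation


def corrected_z(naive_z, set_size, mean_correlation):
    """Rescale a z-score that was computed as if the genes were independent."""
    return naive_z / math.sqrt(variance_inflation_factor(set_size, mean_correlation))


# A 50-gene set with a modest average correlation of 0.1.
vif = variance_inflation_factor(50, 0.1)
z_after = corrected_z(4.0, 50, 0.1)
```

Even a modest average correlation gives a several-fold variance inflation for a set of this size, so a z-score that looks striking under independence can shrink to an unremarkable value.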

  12. Analysis of gene set using shrinkage covariance matrix approach

    Science.gov (United States)

    Karjanto, Suryaefiza; Aripin, Rasimah

    2013-09-01

    Microarray methodology has been exploited for different applications such as gene discovery and disease diagnosis. This technology is also used for quantitative and highly parallel measurements of gene expression. Recently, microarrays have been one of the main interests of statisticians because they provide a perfect example of the paradigms of modern statistics. In this study, an alternative approach to estimating the covariance matrix is proposed to solve the high dimensionality problem in microarrays. An extension of the traditional Hotelling's T² statistic is constructed for determining the significant gene sets across experimental conditions using a shrinkage approach. Real data sets were used as illustrations to compare the performance of the proposed methods with other methods. The results across the methods are consistent, implying that this approach provides an alternative to existing techniques.
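For orientation, a common generic form of a shrinkage-based two-sample Hotelling statistic is written out below. The abstract does not specify the estimator, so the shrinkage target D, the intensity lambda and the notation are assumptions for illustration only.

```latex
% Two groups with n_1 and n_2 samples, mean vectors \bar{x}_1 and \bar{x}_2
% over the p genes in the set, and pooled sample covariance S (p x p).
% When p exceeds n_1 + n_2 - 2, S is singular, so it is replaced by a
% shrinkage estimate S^* that is invertible for \lambda > 0.
\[
  S^{*} = \lambda D + (1 - \lambda) S, \qquad 0 \le \lambda \le 1,
\]
\[
  T^{2} = \frac{n_1 n_2}{n_1 + n_2}\,
          (\bar{x}_1 - \bar{x}_2)^{\top} \left(S^{*}\right)^{-1} (\bar{x}_1 - \bar{x}_2).
\]
% D is a well-conditioned target, for example the diagonal matrix of the
% gene-wise variances taken from S.
```

Setting lambda to 0 recovers the traditional statistic, which is only defined when S is invertible.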

  13. GSMA: Gene Set Matrix Analysis, An Automated Method for Rapid Hypothesis Testing of Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Chris Cheadle

    2007-01-01

    Background: Microarray technology has become highly valuable for identifying complex global changes in gene expression patterns. The assignment of functional information to these complex patterns remains a challenging task in effectively interpreting data and correlating results from across experiments, projects and laboratories. Methods which allow the rapid and robust evaluation of multiple functional hypotheses increase the power of individual researchers to data mine gene expression data more efficiently. Results: We have developed gene set matrix analysis (GSMA) as a useful method for the rapid testing of group-wise up- or downregulation of gene expression simultaneously for multiple lists of genes (gene sets) against entire distributions of gene expression changes (datasets) for single or multiple experiments. The utility of GSMA lies in its flexibility to rapidly poll gene sets related by known biological function or as designated solely by the end-user against large numbers of datasets simultaneously. Conclusions: GSMA provides a simple and straightforward method for hypothesis testing in which genes are tested by groups across multiple datasets for patterns of expression enrichment.

  14. New cyt b gene universal primer set for forensic analysis.

    Science.gov (United States)

    Lopez-Oceja, A; Gamarra, D; Borragan, S; Jiménez-Moreno, S; de Pancorbo, M M

    2016-07-01

    Analysis of mitochondrial DNA, and in particular the cytochrome b gene (cyt b), has become an essential tool for species identification in routine forensic practice. In cases of degraded samples, where the DNA is fractionated, universal primers that are highly efficient for the amplification of the target region are necessary. Therefore, in the present study a new universal cyt b primer set with high species identification capabilities, even in samples with highly degraded DNA, has been developed. In order to achieve this objective, the primers were designed following the alignment of complete sequences of the cyt b from 751 species from the Class of Mammalia listed in GenBank. A highly variable region of 148bp flanked by highly conserved sequences was chosen for placing the primers. The effectiveness of the new pair of primers was examined in 63 animal species belonging to 38 Families from 14 Orders and 5 Classes (Mammalia, Aves, Reptilia, Actinopterygii, and Malacostraca). Species determination was possible in all cases, which shows that the fragment analyzed provided a high capability for species identification. Furthermore, to ensure the efficiency of the 148bp fragment, the intraspecific variability was analyzed by calculating the concordance between individuals with the BLAST tool from the NCBI (National Center for Biotechnology Information). The intraspecific concordance levels were above 97% in all species. Likewise, the phylogenetic information from the selected fragment was confirmed by obtaining the phylogenetic tree from the sequences of the species analyzed. Evidence of the high power of phylogenetic discrimination of the analyzed fragment of the cyt b was obtained, as 93.75% of the species were grouped within their corresponding Orders. Finally, the analysis of 40 degraded samples with small-size DNA fragments showed that the new pair of primers permits identifying the species, even when the DNA is highly degraded as it is very common in

  15. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    Science.gov (United States)

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  16. Gene set enrichment analysis for non-monotone association and multiple experimental categories

    OpenAIRE

    Heinloth Alexandra N; Irwin Richard D; Dai Shuangshuang; Lin Rongheng; Boorman Gary A; Li Leping

    2008-01-01

    Abstract Background Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statis...

  17. ErmineJ: Tool for functional analysis of gene expression data sets

    Directory of Open Access Journals (Sweden)

    Braynen William

    2005-11-01

    Abstract Background It is common for the results of a microarray study to be analyzed in the context of biologically-motivated groups of genes such as pathways or Gene Ontology categories. The most common method for such analysis uses the hypergeometric distribution (or a related technique) to look for "over-representation" of groups among genes selected as being differentially expressed or otherwise of interest based on a gene-by-gene analysis. However, this method suffers from some limitations, and biologist-friendly tools that implement alternatives have not been reported. Results We introduce ErmineJ, a multiplatform user-friendly stand-alone software tool for the analysis of functionally-relevant sets of genes in the context of microarray gene expression data. ErmineJ implements multiple algorithms for gene set analysis, including over-representation and resampling-based methods that focus on gene scores or correlation of gene expression profiles. In addition to a graphical user interface, ErmineJ has a command line interface and an application programming interface that can be used to automate analyses. The graphical user interface includes tools for creating and modifying gene sets, visualizing the Gene Ontology as a table or tree, and visualizing gene expression data. ErmineJ comes with a complete user manual, and is open-source software licensed under the Gnu Public License. Conclusion The availability of multiple analysis algorithms, together with a rich feature set and simple graphical interface, should make ErmineJ a useful addition to the biologist's informatics toolbox. ErmineJ is available from http://microarray.cu.genome.org.
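The hypergeometric over-representation test mentioned in the abstract above can be sketched in a few lines of Python. This is a generic illustration of the technique, not ErmineJ's implementation or API; the function name, argument names and the example counts are assumptions.

```python
from math import comb


def ora_pvalue(n_universe, n_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_universe : number of genes assayed
    n_set      : number of those genes that belong to the gene set
    n_selected : number of genes selected as differentially expressed
    n_overlap  : number of selected genes that belong to the gene set

    Returns P(overlap >= n_overlap) when n_selected genes are drawn at
    random, without replacement, from the universe.
    """
    max_overlap = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes assayed, a 100-gene pathway,
# 500 genes selected, 15 of which fall in the pathway (5 expected by chance).
p_value = ora_pvalue(10_000, 100, 500, 15)
```

The test only sees which genes passed the selection threshold, not how strongly each one changed, which is one of the limitations that motivates score-based and resampling-based alternatives.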

  18. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis.

    Science.gov (United States)

    Glez-Peña, Daniel; Gómez-López, Gonzalo; Pisano, David G; Fdez-Riverola, Florentino

    2009-07-01

    WhichGenes is a web-based interactive gene set building tool offering a very simple interface to extract always-updated gene lists from multiple databases and unstructured biological data sources. While the user can specify new gene sets of interest by following a simple four-step wizard, the tool is able to run several queries in parallel. Every time a new set is generated, it is automatically added to the private gene-set cart and the user is notified by an e-mail containing a direct link to the new set stored in the server. WhichGenes provides functionalities to edit, delete and rename existing sets as well as the capability of generating new ones by combining previous existing sets (intersection, union and difference operators). The user can export his sets configuring the output format and selecting among multiple gene identifiers. In addition to the user-friendly environment, WhichGenes allows programmers to access its functionalities in a programmatic way through a Representational State Transfer web service. WhichGenes front-end is freely available at http://www.whichgenes.org/, WhichGenes API is accessible at http://www.whichgenes.org/api/.
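The three operators for combining existing sets (intersection, union and difference) correspond directly to standard set operations. The Python sketch below shows them on two made-up gene lists; it does not call the WhichGenes service, and the helper name and the gene lists are hypothetical.

```python
def combine_gene_sets(first, second, operator):
    """Combine two gene sets with one of the three named operators."""
    if operator == "intersection":
        return first & second
    if operator == "union":
        return first | second
    if operator == "difference":
        return first - second
    raise ValueError(f"unknown operator: {operator}")


# Placeholder gene lists, chosen only for illustration.
set_a = {"TP53", "BRCA1", "EGFR"}
set_b = {"EGFR", "MYC"}

shared = combine_gene_sets(set_a, set_b, "intersection")
merged = combine_gene_sets(set_a, set_b, "union")
only_a = combine_gene_sets(set_a, set_b, "difference")
```

Difference is the only one of the three that depends on argument order, so a tool exposing it has to make clear which set is subtracted from which.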

  19. Gene-based analysis of regionally enriched cortical genes in GWAS data sets of cognitive traits and psychiatric disorders.

    Directory of Open Access Journals (Sweden)

    Kari M Ersland

    BACKGROUND: Despite its estimated high heritability, the genetic architecture leading to differences in cognitive performance remains poorly understood. Different cortical regions play important roles in normal cognitive functioning and impairment. Recently, we reported on sets of regionally enriched genes in three different cortical areas (frontomedial, temporal and occipital cortices) of the adult rat brain. It has been suggested that genes preferentially, or specifically, expressed in one region or organ reflect functional specialisation. Employing a gene-based approach to the analysis, we used the regionally enriched cortical genes to mine a genome-wide association study (GWAS) of the Norwegian Cognitive NeuroGenetics (NCNG) sample of healthy adults for association to nine psychometric test measures. In addition, we explored GWAS data sets for the serious psychiatric disorders schizophrenia (SCZ) (n = 3 samples) and bipolar affective disorder (BP) (n = 3 samples), to which cognitive impairment is linked. PRINCIPAL FINDINGS: At the single gene level, the temporal cortex enriched gene RAR-related orphan receptor B (RORB) showed the strongest overall association, namely to a test of verbal intelligence (Vocabulary, P = 7.7E-04). We also applied gene set enrichment analysis (GSEA) to test the candidate genes, as gene sets, for enrichment of association signal in the NCNG GWAS and in GWASs of BP and of SCZ. We found that genes differentially expressed in the temporal cortex showed a significant enrichment of association signal in a test measure of non-verbal intelligence (Reasoning) in the NCNG sample. CONCLUSION: Our gene-based approach suggests that RORB could be involved in verbal intelligence differences, while the genes enriched in the temporal cortex might be important to intellectual functions as measured by a test of reasoning in the healthy population. These findings warrant further replication in independent samples on cognitive traits.

  20. A comparison of gene set analysis methods in terms of sensitivity, prioritization and specificity.

    Directory of Open Access Journals (Sweden)

    Adi L Tarca

    Identification of functional sets of genes associated with conditions of interest from omics data was first reported in 1999, and since then a plethora of enrichment methods have been published for systematic analysis of gene set collections including Gene Ontology and biological pathways. Despite their widespread usage in reducing the complexity of omics experiment results, their performance is poorly understood. Leveraging the existence of disease-specific gene sets in KEGG and Metacore® databases, we compared the performance of sixteen methods under relaxed assumptions while using 42 real datasets (over 1,400 samples). Most of the methods ranked high the gene sets designed for specific diseases whenever samples from affected individuals were compared against controls via microarrays. The top methods for gene set prioritization were different from the top ones in terms of sensitivity, and four of the sixteen methods had large false positive rates assessed by permuting the phenotype of the samples. The best overall methods among those that generated reasonably low false positive rates, when permuting phenotypes, were PLAGE, GLOBALTEST, and PADOG. The best method in the category that generated higher than expected false positives was MRGSE.

  1. Associations between DNA methylation and schizophrenia-related intermediate phenotypes - a gene set enrichment analysis.

    Science.gov (United States)

    Hass, Johanna; Walton, Esther; Wright, Carrie; Beyer, Andreas; Scholz, Markus; Turner, Jessica; Liu, Jingyu; Smolka, Michael N; Roessner, Veit; Sponheim, Scott R; Gollub, Randy L; Calhoun, Vince D; Ehrlich, Stefan

    2015-06-03

    Multiple genetic approaches have identified microRNAs as key effectors in psychiatric disorders as they post-transcriptionally regulate expression of thousands of target genes. However, their role in specific psychiatric diseases remains poorly understood. In addition, epigenetic mechanisms such as DNA methylation, which affect the expression of both microRNAs and coding genes, are critical for our understanding of molecular mechanisms in schizophrenia. Using clinical, imaging, genetic, and epigenetic data of 103 patients with schizophrenia and 111 healthy controls of the Mind Clinical Imaging Consortium (MCIC) study of schizophrenia, we conducted gene set enrichment analysis to identify markers for schizophrenia-associated intermediate phenotypes. Genes were ranked based on the correlation between DNA methylation patterns and each phenotype, and then searched for enrichment in 221 predicted microRNA target gene sets. We found the predicted hsa-miR-219a-5p target gene set to be significantly enriched for genes (EPHA4, PKNOX1, ESR1, among others) whose methylation status is correlated with hippocampal volume independent of disease status. Our results were strengthened by significant associations between hsa-miR-219a-5p target gene methylation patterns and hippocampus-related neuropsychological variables. IPA pathway analysis of the respective predicted hsa-miR-219a-5p target genes revealed associated network functions in behavior and developmental disorders. Altered methylation patterns of predicted hsa-miR-219a-5p target genes are associated with a structural aberration of the brain that has been proposed as a possible biomarker for schizophrenia. The (dys)regulation of microRNA target genes by epigenetic mechanisms may confer additional risk for developing psychiatric symptoms. Further study is needed to understand possible interactions between microRNAs and epigenetic changes and their impact on risk for brain-based disorders such as schizophrenia.

  2. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    Science.gov (United States)

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including GeneCards® (the human gene database), MalaCards (the human diseases database), and PathCards (the biological pathways database). Expression-based analysis in GeneAnalytics relies on LifeMap Discovery® (the embryonic development and stem cells database), which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  3. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    Science.gov (United States)

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including GeneCards® (the human gene database), MalaCards (the human diseases database), and PathCards (the biological pathways database). Expression-based analysis in GeneAnalytics relies on LifeMap Discovery® (the embryonic development and stem cells database), which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  4. Investigating the effect of paralogs on microarray gene-set analysis

    LENUS (Irish Health Repository)

    Faure, Andre J

    2011-01-24

    Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.

  5. Global adaptive rank truncated product method for gene-set analysis in association studies.

    Science.gov (United States)

    Vilor-Tejedor, Natalia; Calle, M Luz

    2014-09-01

    Gene set analysis (GSA) aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. We present a new implementation of the Adaptive Rank Truncated Product method (ARTP) for analyzing the association of a set of Single Nucleotide Polymorphisms (SNPs) in a gene or pathway. The new implementation, referred to as globalARTP, improves the original one by allowing the different SNPs in the set to have different modes of inheritance. We perform a simulation study for exploring the power of the proposed methodology in a set of scenarios with different numbers of causal SNPs with different effect sizes. Moreover, we show the advantage of using the gene set approach in the context of an Alzheimer's disease case-control study where we explore the endocytosis pathway. The new method is implemented in the R function globalARTP of the globalGSA package available at http://cran.r-project.org.
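
As a rough illustration of the adaptive rank truncated product idea named in this abstract, the sketch below combines the K smallest SNP p-values for several truncation points K and calibrates the minimum over K against permutation replicates. It is a generic sketch under assumptions, not the globalARTP function of the globalGSA package: the function names are hypothetical, the per-SNP p-values (observed and permuted) are assumed to be computed elsewhere, and the handling of different modes of inheritance is not shown.

```python
import math


def log_rtp(p_values, k):
    """Log of the rank truncated product: sum of logs of the k smallest p-values."""
    return sum(math.log(p) for p in sorted(p_values)[:k])


def artp_p_value(observed, null_sets, truncation_points):
    """Adaptive rank truncated product p-value from permutation replicates.

    observed:          per-SNP p-values for the real data
    null_sets:         list of per-SNP p-value lists, one per permutation
    truncation_points: candidate values of k (each <= number of SNPs)
    """
    rows = [list(observed)] + [list(ps) for ps in null_sets]
    n = len(rows)
    min_p = [1.0] * n
    for k in truncation_points:
        stats = [log_rtp(ps, k) for ps in rows]
        for i, s in enumerate(stats):
            # A smaller log-product means stronger evidence, so count
            # replicates (including row i itself) at least as extreme.
            p_k = sum(1 for t in stats if t <= s) / n
            min_p[i] = min(min_p[i], p_k)
    # Second layer: calibrate the minimum over k against the same replicates.
    return sum(1 for m in min_p if m <= min_p[0]) / n
```

The quadratic loop over replicates keeps the sketch short; a sort-based ranking would be the natural choice for a realistic number of permutations.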

  6. Gene-Based Analysis of Regionally Enriched Cortical Genes in GWAS Data Sets of Cognitive Traits and Psychiatric Disorders

    DEFF Research Database (Denmark)

    Ersland, Kari M; Christoforou, Andrea; Stansberg, Christine;

    2012-01-01

    Despite its estimated high heritability, the genetic architecture leading to differences in cognitive performance remains poorly understood. Different cortical regions play important roles in normal cognitive functioning and impairment. Recently, we reported on sets of regionally enriched genes in three different cortical areas (frontomedial, temporal and occipital cortices) of the adult rat brain. It has been suggested that genes preferentially, or specifically, expressed in one region or organ reflect functional specialisation. Employing a gene-based approach to the analysis, we used the regionally enriched cortical genes to mine a genome-wide association study (GWAS) of the Norwegian Cognitive NeuroGenetics (NCNG) sample of healthy adults for association to nine psychometric tests measures. In addition, we explored GWAS data sets for the serious psychiatric disorders schizophrenia (SCZ) (n...

  7. Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms

    Directory of Open Access Journals (Sweden)

    Hilu Khidir W

    2009-03-01

    Background: Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al. Results: We performed maximum likelihood bootstrap analyses of the complete, 3-gene 567-taxon data matrix and the incomplete, 5-gene 567-taxon data matrix. Although the 5-gene matrix has more missing data (27.5%) than the 3-gene data matrix (2.9%), the 5-gene analysis resulted in higher levels of bootstrap support. Within the 567-taxon tree, the increase in support is most evident for relationships among the 170 taxa for which both matK and 26S rDNA sequences were added, and there is little gain in support for relationships among the 119 taxa having neither matK nor 26S rDNA sequences. The 5-gene analysis also places the enigmatic Hydrostachys in Lamiales (BS = 97%) rather than in Cornales (BS = 100% in the 3-gene analysis). The placement of Hydrostachys in Lamiales is unprecedented in molecular analyses, but it is consistent with embryological and morphological data. Conclusion: Adding available, and often incomplete, sets of sequences to existing data sets can be a fast and inexpensive way to increase support for phylogenetic relationships and produce novel and credible phylogenetic hypotheses.

  8. Core gene set as the basis of multilocus sequence analysis of the subclass Actinobacteridae.

    Directory of Open Access Journals (Sweden)

    Toïdi Adékambi

    Comparative genomic sequencing is shedding new light on bacterial identification, taxonomy and phylogeny. An in silico assessment of a core gene set necessary for cellular functioning was made to determine a consensus set of genes that would be useful for the identification, taxonomy and phylogeny of the species belonging to the subclass Actinobacteridae, which contained two orders, Actinomycetales and Bifidobacteriales. The subclass Actinobacteridae comprised about 85% of the actinobacteria families. The following recommended criteria were used to establish a comprehensive gene set: the gene should (i) be long enough to contain phylogenetically useful information, (ii) not be subject to horizontal gene transfer, (iii) be a single copy, (iv) have at least two regions sufficiently conserved to allow the design of amplification and sequencing primers and (v) predict whole-genome relationships. We applied these constraints to 50 different Actinobacteridae genomes and made 1,224 pairwise comparisons of the genome conserved regions and gene fragments obtained by using the Sequence VARiability Analysis Program (SVARAP), which allows the primers to be designed. Following a comparative statistical modeling phase, 3 gene fragments were selected, ychF, rpoB, and secY, with R² > 0.85. Selected sets of broad range primers were tested from the 3 gene fragments and were demonstrated to be useful for amplification and sequencing of 25 species belonging to 9 genera of Actinobacteridae. The intraspecies similarities were 96.3-100% for ychF, 97.8-100% for rpoB and 96.9-100% for secY among 73 strains belonging to 15 species of the subclass Actinobacteridae, compared to 99.4-100% for 16S rRNA. The phylogenetic topology obtained from the combined datasets ychF+rpoB+secY was globally similar to that inferred from the 16S rRNA but with higher confidence. It was concluded that multi-locus sequence analysis using core gene set might represent the first consensus and valid approach for

  9. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies.

    Science.gov (United States)

    Kofler, Robert; Schlötterer, Christian

    2012-08-01

    An analysis of gene set [e.g. Gene Ontology (GO)] enrichment assumes that all genes are sampled independently from each other with the same probability. These assumptions are violated in genome-wide association (GWA) studies since (i) longer genes typically have more single-nucleotide polymorphisms resulting in a higher probability of being sampled and (ii) overlapping genes are sampled in clusters. Herein, we introduce Gowinda, a software specifically designed to test for enrichment of gene sets in GWA studies. We show that GO tests on GWA data could result in a substantial number of false-positive GO terms. Permutation tests implemented in Gowinda eliminate these biases, but maintain sufficient power to detect enrichment of GO terms. Since sufficient resolution for large datasets requires millions of permutations, we use multi-threading to keep computation times reasonable. Gowinda is implemented in Java (v1.6) and freely available at http://code.google.com/p/gowinda/. Contact: christian.schloetterer@vetmeduni.ac.at. Manual: http://code.google.com/p/gowinda/wiki/Manual. Test data and tutorial: http://code.google.com/p/gowinda/wiki/Tutorial. Validation: http://code.google.com/p/gowinda/wiki/VALIDATION.
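
The reasoning in this abstract (sample SNPs rather than genes, so that long and overlapping genes are drawn with realistic probability) can be sketched as a small permutation test. This is an illustrative single-threaded sketch under assumptions, not Gowinda's implementation: the function names, the dictionary-based inputs and the choice to count each gene once per gene set are hypothetical simplifications.

```python
import random


def genes_per_term(snps, snp_to_genes, gene_to_terms):
    """Count, per gene set, the distinct genes hit by the given SNPs."""
    genes = set()
    for snp in snps:
        genes.update(snp_to_genes.get(snp, ()))
    counts = {}
    for gene in genes:
        for term in gene_to_terms.get(gene, ()):
            counts[term] = counts.get(term, 0) + 1
    return counts


def snp_permutation_test(candidates, all_snps, snp_to_genes, gene_to_terms,
                         n_perm=10000, seed=0):
    """Empirical enrichment p-value per gene set, sampling SNPs at random.

    Drawing random SNP sets of the same size as the candidate set means a
    long gene (many SNPs) is hit more often under the null, which is the
    bias a gene-level test would ignore.
    """
    rng = random.Random(seed)
    observed = genes_per_term(candidates, snp_to_genes, gene_to_terms)
    exceed = {term: 0 for term in observed}
    pool = sorted(all_snps)
    for _ in range(n_perm):
        sample = rng.sample(pool, len(candidates))
        counts = genes_per_term(sample, snp_to_genes, gene_to_terms)
        for term, obs in observed.items():
            if counts.get(term, 0) >= obs:
                exceed[term] += 1
    return {term: (exceed[term] + 1) / (n_perm + 1) for term in observed}
```

With a long gene covering most SNPs, its gene set receives a large p-value even when a candidate SNP falls inside it, whereas a hit in a short gene is less likely by chance and therefore scores lower.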

  10. Integrated analysis of DNA copy number and gene expression microarray data using gene sets

    NARCIS (Netherlands)

    R.X. de Menezes (Renee); M. Boetzer (Marten); M. Sieswerda (Melle); G.J.B. van Ommen; J.M. Boer (Judith)

    2009-01-01

    Background: Genes that play an important role in tumorigenesis are expected to show association between DNA copy number and RNA expression. Optimal power to find such associations can only be achieved if analysing copy number and gene expression jointly. Furthermore, some copy number

  11. Meta-analysis of differentiating mouse embryonic stem cell gene expression kinetics reveals early change of a small gene set.

    Directory of Open Access Journals (Sweden)

    Clive H Glover

    2006-11-01

    Stem cell differentiation involves critical changes in gene expression. Identification of these should provide endpoints useful for optimizing stem cell propagation as well as potential clues about mechanisms governing stem cell maintenance. Here we describe the results of a new meta-analysis methodology applied to multiple gene expression datasets from three mouse embryonic stem cell (ESC) lines obtained at specific time points during the course of their differentiation into various lineages. We developed methods to identify genes with expression changes that correlated with the altered frequency of functionally defined, undifferentiated ESC in culture. In each dataset, we computed a novel statistical confidence measure for every gene which captured the certainty that a particular gene exhibited an expression pattern of interest within that dataset. This permitted a joint analysis of the datasets, despite the different experimental designs. Using a ranking scheme that favored genes exhibiting patterns of interest, we focused on the top 88 genes whose expression was consistently changed when ESC were induced to differentiate. Seven of these (103728_at, 8430410A17Rik, Klf2, Nr0b1, Sox2, Tcl1, and Zfp42) showed a rapid decrease in expression concurrent with a decrease in frequency of undifferentiated cells and remained predictive when evaluated in additional maintenance and differentiating protocols. Through a novel meta-analysis, this study identifies a small set of genes whose expression is useful for identifying changes in stem cell frequencies in cultures of mouse ESC. The methods and findings have broader applicability to understanding the regulation of self-renewal of other stem cell types.

  12. Gene-set meta-analysis of lung cancer identifies pathway related to systemic lupus erythematosus.

    Science.gov (United States)

    Rosenberger, Albert; Sohns, Melanie; Friedrichs, Stefanie; Hung, Rayjean J; Fehringer, Gord; McLaughlin, John; Amos, Christopher I; Brennan, Paul; Risch, Angela; Brüske, Irene; Caporaso, Neil E; Landi, Maria Teresa; Christiani, David C; Wei, Yongyue; Bickeböller, Heike

    2017-01-01

    Gene-set analysis (GSA) is an approach using the results of single-marker genome-wide association studies when investigating pathways as a whole with respect to the genetic basis of a disease. We performed a meta-analysis of seven GSAs for lung cancer, applying the method META-GSA. Overall, the information taken from 11,365 cases and 22,505 controls from within the TRICL/ILCCO consortia was used to investigate a total of 234 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. META-GSA reveals the systemic lupus erythematosus KEGG pathway hsa05322, driven by the gene region 6p21-22, as also implicated in lung cancer (p = 0.0306). This gene region is known to be associated with squamous cell lung carcinoma. The most important genes driving the significance of this pathway belong to the genomic areas HIST1-H4L, -1BN, -2BN, -H2AK, -H4K and C2/C4A/C4B. Within these areas, the markers most significantly associated with LC are rs13194781 (located within HIST12BN) and rs1270942 (located between C2 and C4A). We have discovered a pathway currently marked as specific to systemic lupus erythematosus as being significantly implicated in lung cancer. The gene region 6p21-22 in this pathway appears to be more extensively associated with lung cancer than previously assumed. Given wide-stretched linkage disequilibrium to the area APOM/BAG6/MSH5, there is currently simply not enough information or evidence to conclude whether the potential pleiotropy of lung cancer and systemic lupus erythematosus is spurious, biological, or mediated. Further research into this pathway and gene region will be necessary.

  13. Microarray analysis identifies a common set of cellular genes modulated by different HCV replicon clones

    Directory of Open Access Journals (Sweden)

    Gerosolimo Germano

    2008-06-01

    Background: Hepatitis C virus (HCV) RNA synthesis and protein expression affect cell homeostasis by modulation of gene expression. The impact of HCV replication on global cell transcription has not been fully evaluated. Thus, we analysed the expression profiles of different clones of human hepatoma-derived Huh-7 cells carrying a self-replicating HCV RNA which express all viral proteins (HCV replicon system). Results: First, we compared the expression profile of HCV replicon clone 21-5 with both the Huh-7 parental cells and the 21-5 cured (21-5c) cells. In the latter, the HCV RNA has been eliminated by IFN-α treatment. To confirm data, we also analyzed microarray results from both the 21-5 and two other HCV replicon clones, 22-6 and 21-7, compared to the Huh-7 cells. The study was carried out by using the Applied Biosystems (AB) Human Genome Survey Microarray v1.0, which provides 31,700 probes that correspond to 27,868 human genes. Microarray analysis revealed a specific transcriptional program induced by HCV in replicon cells with respect to both IFN-α-cured and Huh-7 cells. From the original datasets of differentially expressed genes, we selected by Venn diagrams a final list of 38 genes modulated by HCV in all clones. Most of the 38 genes have never been described before and showed high fold-change associated with significant p-value, strongly supporting data reliability. Classification of the 38 genes by Panther System identified functional categories that were significantly enriched in this gene set, such as histones and ribosomal proteins as well as extracellular matrix and intracellular protein traffic. The dataset also included new genes involved in lipid metabolism, extracellular matrix and cytoskeletal network, which may be critical for HCV replication and pathogenesis. Conclusion: Our data provide a comprehensive analysis of alterations in gene expression induced by HCV replication and reveal modulation of new genes potentially useful

  14. Selection and validation of a set of reliable reference genes for quantitative sod gene expression analysis in C. elegans

    Directory of Open Access Journals (Sweden)

    Vandesompele Jo

    2008-01-01

    Background: In the nematode Caenorhabditis elegans the conserved Ins/IGF-1 signaling pathway regulates many biological processes including life span, stress response, dauer diapause and metabolism. Detection of differentially expressed genes may contribute to a better understanding of the mechanism by which the Ins/IGF-1 signaling pathway regulates these processes. Appropriate normalization is an essential prerequisite for obtaining accurate and reproducible quantification of gene expression levels. The aim of this study was to establish a reliable set of reference genes for gene expression analysis in C. elegans. Results: Real-time quantitative PCR was used to evaluate the expression stability of 12 candidate reference genes (act-1, ama-1, cdc-42, csq-1, eif-3.C, mdh-1, gpd-2, pmp-3, tba-1, Y45F10D.4, rgs-6 and unc-16) in wild-type, three Ins/IGF-1 pathway mutants, dauers and L3 stage larvae. After geNorm analysis, cdc-42, pmp-3 and Y45F10D.4 showed the most stable expression pattern and were used to normalize 5 sod expression levels. Significant differences in mRNA levels were observed for sod-1 and sod-3 in daf-2 relative to wild-type animals, whereas in dauers sod-1, sod-3, sod-4 and sod-5 are differentially expressed relative to third stage larvae. Conclusion: Our findings emphasize the importance of accurate normalization using stably expressed reference genes. The methodology used in this study is generally applicable to reliably quantify gene expression levels in the nematode C. elegans using quantitative PCR.
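
The reference-gene ranking mentioned in this abstract relies on an expression stability measure. The sketch below follows the usual description of the geNorm M value (for each candidate, the mean over all other candidates of the standard deviation of the log2 expression ratios across samples); it is a minimal illustration, not the geNorm software, and it omits the stepwise exclusion of the least stable gene. The function name, input layout and the gene labels in any call are hypothetical.

```python
import math
from statistics import mean, stdev


def genorm_m(expression):
    """Stability measure M per candidate reference gene (lower = more stable).

    expression: dict mapping gene name -> list of relative quantities,
                one value per sample, in the same sample order for every gene.
    """
    genes = list(expression)
    m_values = {}
    for gene_j in genes:
        pairwise_variation = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            # Spread of the log2 ratio across samples: zero when the two
            # genes change in exact proportion to each other.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expression[gene_j], expression[gene_k])]
            pairwise_variation.append(stdev(log_ratios))
        m_values[gene_j] = mean(pairwise_variation)
    return m_values
```

Two genes whose quantities stay proportional across samples contribute zero pairwise variation to each other, so a gene that drifts independently of the rest ends up with the largest M.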

  15. Joint genetic analysis using variant sets reveals polygenic gene-context interactions.

    Directory of Open Access Journals (Sweden)

    Francesco Paolo Casale

    2017-04-01

    Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations to validate iSet before applying the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for sub-group specific genetic effects in human lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.

  16. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer

    Energy Technology Data Exchange (ETDEWEB)

    Pandi, Narayanan Sathiya, E-mail: sathiyapandi@gmail.com; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Highlights:
    • Identified stomach lineage specific gene set (SLSGS) was found to be under expressed in gastric tumors.
    • Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer.
    • In silico pathway scanning identified estrogen-α signaling is a putative regulator of SLSGS in gastric cancer.
    • Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients.

    Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.

  17. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer.

    Science.gov (United States)

    Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.

  18. Combining distance matrices on identical taxon sets for multi-gene analysis with singular value decomposition.

    Directory of Open Access Journals (Sweden)

    Melanie Abeysundera

    We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
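
A minimal sketch of the combination step described in this abstract, using NumPy: each gene's distance matrix is flattened to its upper triangle, the flattened vectors are stacked as rows, and the first right singular vector is reshaped into a combined matrix. The function name is hypothetical, and the rescaling of the singular vector to the mean input distance is an assumption made here for readability, since the abstract does not say how the authors scale the result.

```python
import numpy as np


def combine_distance_matrices(matrices):
    """Combine per-gene distance matrices on identical taxa via SVD.

    matrices: sequence of symmetric (n x n) arrays, same taxon order in each.
    Returns (combined n x n matrix, singular values).
    """
    mats = [np.asarray(m, dtype=float) for m in matrices]
    n = mats[0].shape[0]
    upper = np.triu_indices(n, k=1)
    # One row per gene, one column per taxon pair.
    stacked = np.vstack([m[upper] for m in mats])
    _, singular_values, vt = np.linalg.svd(stacked, full_matrices=False)
    first = vt[0]
    if first.sum() < 0:
        # Singular vectors are defined up to sign; keep distances positive.
        first = -first
    # Assumed scaling: match the mean of all input distances.
    first = first * stacked.mean() / first.mean()
    combined = np.zeros((n, n))
    combined[upper] = first
    combined = combined + combined.T
    return combined, singular_values
```

Because the first right singular vector equals the rows of the stacked matrix weighted by the first left singular vector (divided by the first singular value), it is a weighted combination of the per-gene distances. Large trailing singular values would suggest that the genes do not share a single common signal.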

  19. Gene set based integrated data analysis reveals phenotypic differences in a brain cancer model.

    Directory of Open Access Journals (Sweden)

    Kjell Petersen

    A key challenge in the data analysis of biological high-throughput experiments is to handle the often low number of samples in the experiments compared to the number of biomolecules that are simultaneously measured. Combining experimental data using independent technologies to illuminate the same biological trends, as well as complementing each other in a larger perspective, is one natural way to overcome this challenge. In this work we investigated whether integrating proteomics and transcriptomics data from a brain cancer animal model using gene set based analysis methodology could enhance the biological interpretation of the data relative to more traditional analysis of the two datasets individually. The brain cancer model used is based on serial passaging of transplanted human brain tumor material (glioblastoma, GBM) through several generations in rats. These serial transplantations lead over time to genotypic and phenotypic changes in the tumors and represent a medically relevant model with a rare access to samples and where consequent analyses of individual datasets have revealed relatively few significant findings on their own. We found that the integrated analysis both performed better in terms of significance measure of its findings compared to individual analyses and provided independent verification of the individual results. Thus a better context for overall biological interpretation of the data can be achieved.

  20. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation

    Directory of Open Access Journals (Sweden)

    Jiying Wang

    2016-05-01

    Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiling using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs.

  1. Gene set enrichment analysis and ingenuity pathway analysis of metastatic clear cell renal cell carcinoma cell line.

    Science.gov (United States)

    Khan, Mohammed I; Dębski, Konrad J; Dabrowski, Michał; Czarnecka, Anna M; Szczylik, Cezary

    2016-08-01

    In recent years, genome-wide RNA expression analysis has become a routine tool that offers a great opportunity to study and understand the key role of genes that contribute to carcinogenesis. Various microarray platforms and statistical approaches can be used to identify genes that might serve as prognostic biomarkers and be developed as antitumor therapies in the future. Metastatic renal cell carcinoma (mRCC) is a serious, life-threatening disease, and there are few treatment options for patients. In this study, we performed one-color microarray gene expression (4×44K) analysis of the mRCC cell line Caki-1 and the healthy kidney cell line ASE-5063. A total of 1,921 genes were differentially expressed in the Caki-1 cell line (1,023 upregulated and 898 downregulated). Gene Set Enrichment Analysis (GSEA) and Ingenuity Pathway Analysis (IPA) approaches were used to analyze the differential-expression data. The objective of this research was to identify complex biological changes that occur during metastatic development using Caki-1 as a model mRCC cell line. Our data suggest that there are multiple deregulated pathways associated with metastatic clear cell renal cell carcinoma (mccRCC), including integrin-linked kinase (ILK) signaling, leukocyte extravasation signaling, IGF-I signaling, CXCR4 signaling, and phosphoinositol 3-kinase/AKT/mammalian target of rapamycin signaling. The IPA upstream analysis predicted top transcriptional regulators that are either activated or inhibited, such as estrogen receptors, TP53, KDM5B, SPDEF, and CDKN1A. The GSEA approach was used to further confirm enriched pathway data following IPA.

  2. A meta-analysis of multiple matched copy number and transcriptomics data sets for inferring gene regulatory relationships.

    Directory of Open Access Journals (Sweden)

    Richard Newton

    Full Text Available Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments.

  3. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2016-01-01

    Full Text Available Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is a feature selection algorithm, indeed. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. Few overlaps between these two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  4. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    Science.gov (United States)

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is a feature selection algorithm, indeed. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. Few overlaps between these two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  5. Transcriptome Analysis Reveals Candidate Genes Involved in Gibberellin-Induced Fruit Setting in Triploid Loquat (Eriobotrya japonica)

    Science.gov (United States)

    Jiang, Shuang; Luo, Jun; Xu, Fanjie; Zhang, Xueying

    2016-01-01

    The triploid loquat (Eriobotrya japonica) is a new germplasm with a high edible fruit rate. Under natural conditions, the triploid loquat has a low fruit setting ratio (not more than 10 fruits in a tree), reflecting fertilization failure. To unravel the molecular mechanism of gibberellin (GA) treatment to induce parthenocarpy in triploid loquats, a transcriptome analysis of fruit setting induced by GA3 was analyzed using RNA-seq at four different stages during the development of young fruit. Approximately 344 million high quality reads in seven libraries were de novo assembled, yielding 153,900 unique transcripts with more than 79.9% functionally annotated transcripts. A total of 2,220, 2,974, and 1,614 differentially expressed genes (DEGs) were observed at 3, 7, and 14 days after GA treatment, respectively. The weighted gene co-expression network and Venn diagram analysis of DEGs revealed that sixteen candidate genes may play critical roles in the fruit setting after GA treatment. Five genes were related to auxin, in which one auxin synthesis gene of yucca was upregulated, suggesting that auxin may act as a signal for fruit setting. Furthermore, ABA 8′-hydroxylase was upregulated, while ethylene-forming enzyme was downregulated, suggesting that multiple hormones may be involved in GA signaling. Four transcription factors, NAC7, NAC23, bHLH35, and HD16, were potentially negatively regulated in fruit setting, and two cell division-related genes, arr9 and CYCA3, were upregulated. In addition, the expression of the GA receptor gid1 was downregulated by GA treatment, suggesting that the negative feedback mechanism in GA signaling may be regulated by gid1. Altogether, the results of the present study provide information from a comprehensive gene expression analysis and insight into the molecular mechanism underlying fruit setting under GA treatment in E. japonica. PMID:28066478

  6. Genetic network and gene set enrichment analysis to identify biomarkers related to cigarette smoking and lung cancer.

    Science.gov (United States)

    Fang, Xiaocong; Netzer, Michael; Baumgartner, Christian; Bai, Chunxue; Wang, Xiangdong

    2013-02-01

    Cigarette smoking is the most demonstrated risk factor for the development of lung cancer, while the related genetic mechanisms are still unclear. The preprocessed microarray expression dataset was downloaded from Gene Expression Omnibus database. Samples were classified according to the disease state, stage and smoking state. A new computational strategy was applied for the identification and biological interpretation of new candidate genes in lung cancer and smoking by coupling a network-based approach with gene set enrichment analysis. Network analysis was performed by pair-wise comparison according to the disease states (tumor or normal), smoking states (current smokers or nonsmokers or former smokers), or the disease stage (stages I-IV). The most activated metabolic pathways were identified by gene set enrichment analysis. Panels of top ranked gene candidates in smoking or cancer development were identified, including genes involved in cell proliferation and drug metabolism like cytochrome P450 and WW domain containing transcription regulator 1. Semaphorin 5A and protein phosphatase 1F are the common genes represented as major hubs in both the smoking and cancer related network. Six pathways, e.g. cell cycle, DNA replication, RNA transport, protein processing in endoplasmic reticulum, vascular smooth muscle contraction and endocytosis were commonly involved in smoking and lung cancer when comparing the top ten selected pathways. New approach of bioinformatics for biomarker identification and validation can probe into deep genetic relationships between cigarette smoking and lung cancer. Our studies indicate that disease-specific network biomarkers, interaction between genes/proteins, or cross-talking of pathways provide more specific values for the development of precision therapies for lung. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Phospholipase C isozymes are deregulated in colorectal cancer--insights gained from gene set enrichment analysis of the transcriptome.

    Directory of Open Access Journals (Sweden)

    Stine A Danielsen

    Full Text Available Colorectal cancer (CRC) is one of the most common cancer types in developed countries. To identify molecular networks and biological processes that are deregulated in CRC compared to normal colonic mucosa, we applied Gene Set Enrichment Analysis to two independent transcriptome datasets, including a total of 137 CRC and ten normal colonic mucosa samples. Eighty-two gene sets as described by the Kyoto Encyclopedia of Genes and Genomes database had significantly altered gene expression in both datasets. These included networks associated with cell division, DNA maintenance, and metabolism. Among signaling pathways with known changes in key genes, the "Phosphatidylinositol signaling network", comprising part of the PI3K pathway, was found deregulated. The downregulated genes in this pathway included several members of the Phospholipase C protein family, and the reduced expression of two of these, PLCD1 and PLCE1, were successfully validated in CRC biopsies (n = 70) and cell lines (n = 19) by quantitative analyses. The repression of both genes was found associated with KRAS mutations (P = 0.005 and 0.006, respectively), and we observed that microsatellite stable carcinomas with reduced PLCD1 expression more frequently had TP53 mutations (P = 0.002). Promoter methylation analyses of PLCD1 and PLCE1 performed in cell lines and tumor biopsies revealed that methylation of PLCD1 can contribute to reduced expression in 40% of the microsatellite instable carcinomas. In conclusion, we have identified significantly deregulated pathways in CRC, and validated repression of PLCD1 and PLCE1 expression. This illustrates that the GSEA approach may guide discovery of novel biomarkers in cancer.

  8. rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs.

    Science.gov (United States)

    Hundt, Christian; Hildebrandt, Andreas; Schmidt, Bertil

    2016-09-23

    Gene Set Enrichment Analysis (GSEA) is a popular method to reveal significant dependencies between predefined sets of gene symbols and observed phenotypes by evaluating the deviation of gene expression values between cases and controls. An established measure of inter-class deviation, the enrichment score, is usually computed using a weighted running sum statistic over the whole set of gene symbols. Due to the lack of analytic expressions the significance of enrichment scores is determined using a non-parametric estimation of their null distribution by permuting the phenotype labels of the probed patients. Accordingly, GSEA is a time-consuming task due to the large number of required permutations to accurately estimate the nominal p-value - a circumstance that is even more pronounced during multiple hypothesis testing since its estimate is lower-bounded by the inverse number of samples in permutation space. We present rapidGSEA - a software suite consisting of two tools for facilitating permutation-based GSEA: cudaGSEA and ompGSEA. cudaGSEA is a CUDA-accelerated tool using fine-grained parallelization schemes on massively parallel architectures while ompGSEA is a coarse-grained multi-threaded tool for multi-core CPUs. Nominal p-value estimation of 4,725 gene sets on a data set consisting of 20,639 unique gene symbols and 200 patients (183 cases + 17 controls) each probing one million permutations takes 19 hours on a Xeon CPU and less than one hour on a GeForce Titan X GPU while the established GSEA tool from the Broad Institute (broadGSEA) takes roughly 13 days. cudaGSEA outperforms broadGSEA by around two orders-of-magnitude on a single Tesla K40c or GeForce Titan X GPU. ompGSEA provides around one order-of-magnitude speedup to broadGSEA on a standard Xeon CPU. The rapidGSEA suite is open-source software and can be downloaded at https://github.com/gravitino/cudaGSEA as standalone application or package for the R framework.
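
    The two ideas in this abstract, a weighted running-sum enrichment score and a nominal p-value estimated by permuting phenotype labels, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the rapidGSEA or Broad Institute implementation: the function names, the mean-difference gene statistic, and the two-sided add-one p-value estimator are all choices made here for brevity. It assumes the gene set is a non-empty proper subset of the measured genes with non-zero statistics.

```python
import random

def enrichment_score(scores, gene_set, weight=1.0):
    """Weighted running sum over genes ranked by decreasing score.

    scores: dict gene -> gene-level statistic; gene_set: set of gene names.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hit_total = sum(abs(scores[g]) ** weight for g in ranked if g in gene_set)
    n_miss = sum(1 for g in ranked if g not in gene_set)
    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(scores[g]) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                          # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best

def mean_difference(expr, is_case):
    """Illustrative gene statistic: mean in cases minus mean in controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    stats = {}
    for gene, values in expr.items():
        case_sum = sum(v for v, c in zip(values, is_case) if c)
        ctrl_sum = sum(v for v, c in zip(values, is_case) if not c)
        stats[gene] = case_sum / n_case - ctrl_sum / n_ctrl
    return stats

def permutation_p_value(expr, is_case, gene_set, n_perm=1000, seed=0):
    """Estimate a two-sided nominal p-value by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = enrichment_score(mean_difference(expr, is_case), gene_set)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        null_es = enrichment_score(mean_difference(expr, labels), gene_set)
        if abs(null_es) >= abs(observed):
            exceed += 1
    # The add-one estimator can never fall below 1 / (n_perm + 1).
    return observed, (exceed + 1) / (n_perm + 1)
```

    The last line shows why the permutation count matters: the smallest p-value this estimator can report is 1 / (n_perm + 1), so resolving small p-values across thousands of gene sets requires many permutations, which is the cost that parallel implementations aim to reduce.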

  9. Transcriptome analysis of cortical tissue reveals shared sets of downregulated genes in autism and schizophrenia

    Science.gov (United States)

    Ellis, S E; Panitch, R; West, A B; Arking, D E

    2016-01-01

    Autism (AUT), schizophrenia (SCZ) and bipolar disorder (BPD) are three highly heritable neuropsychiatric conditions. Clinical similarities and genetic overlap between the three disorders have been reported; however, the causes and the downstream effects of this overlap remain elusive. By analyzing transcriptomic RNA-sequencing data generated from post-mortem cortical brain tissues from AUT, SCZ, BPD and control subjects, we have begun to characterize the extent of gene expression overlap between these disorders. We report that the AUT and SCZ transcriptomes are significantly correlated (P<0.001), whereas the other two cross-disorder comparisons (AUT–BPD and SCZ–BPD) are not. Among AUT and SCZ, we find that the genes differentially expressed across disorders are involved in neurotransmission and synapse regulation. Despite the lack of global transcriptomic overlap across all three disorders, we highlight two genes, IQSEC3 and COPS7A, which are significantly downregulated compared with controls across all three disorders, suggesting either shared etiology or compensatory changes across these neuropsychiatric conditions. Finally, we tested for enrichment of genes differentially expressed across disorders in genetic association signals in AUT, SCZ or BPD, reporting lack of signal in any of the previously published genome-wide association study (GWAS). Together, these studies highlight the importance of examining gene expression from the primary tissue involved in neuropsychiatric conditions—the cortical brain. We identify a shared role for altered neurotransmission and synapse regulation in AUT and SCZ, in addition to two genes that may more generally contribute to neurodevelopmental and neuropsychiatric conditions. PMID:27219343

  10. Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2007-01-01

    Full Text Available Abstract Background Members of the family Iridoviridae can cause severe diseases resulting in significant economic and environmental losses. Very little is known about how iridoviruses cause disease in their host. In the present study, we describe the re-analysis of the Iridoviridae family of complex DNA viruses using a variety of comparative genomic tools to yield a greater consensus among the annotated sequences of its members. Results A series of genomic sequence comparisons were made among and between the Ranavirus and Megalocytivirus genera in order to identify novel conserved ORFs. Of these two genera, the Megalocytivirus genomes required the greatest number of altered annotations. Prior to our re-analysis, the Megalocytivirus species orange-spotted grouper iridovirus and rock bream iridovirus shared 99% sequence identity, but only 82 out of 118 potential ORFs were annotated; in contrast, we predict that these species share an identical complement of genes. These annotation changes allowed the redefinition of the group of core genes shared by all iridoviruses. Seven new core genes were identified, bringing the total number to 26. Conclusion Our re-analysis of genomes within the Iridoviridae family provides a unifying framework to understand the biology of these viruses. Further re-defining the core set of iridovirus genes will continue to lead us to a better understanding of the phylogenetic relationships between individual iridoviruses as well as giving us a much deeper understanding of iridovirus replication. In addition, this analysis will provide a better framework for characterizing and annotating currently unclassified iridoviruses.

  11. The analysis of translation-related gene set boosts debates around origin and evolution of mimiviruses

    Science.gov (United States)

    Colson, Philippe; La Scola, Bernard

    2017-01-01

    The giant mimiviruses challenged the well-established concept of viruses, blurring the roots of the tree of life, mainly due to their genetic content. Along with other nucleo-cytoplasmic large DNA viruses, they compose a new proposed order—named Megavirales—whose origin and evolution generate heated debate in the scientific community. The presence of an arsenal of genes not widespread in the virosphere related to important steps of the translational process, including transfer RNAs, aminoacyl-tRNA synthetases, and translation factors for peptide synthesis, constitutes an important element of this debate. In this review, we highlight the main findings to date about the translational machinery of the mimiviruses and compare their distribution along the distinct members of the family Mimiviridae. Furthermore, we discuss how the presence and/or absence of the translation-related genes among mimiviruses raises important insights to boost the debate on their origin and evolutionary history. PMID:28207761

  12. Microarray analysis identifies a common set of cellular genes modulated by different HCV replicon clones

    OpenAIRE

    Gerosolimo Germano; Dallapiccola Bruno; Bruni Roberto; Ferraris Alessandro; Tataseo Paola; Tritarelli Elena; Marcantonio Cinzia; Ciccaglione Anna; Costantino Angela; Rapicetta Maria

    2008-01-01

    Abstract Background Hepatitis C virus (HCV) RNA synthesis and protein expression affect cell homeostasis by modulation of gene expression. The impact of HCV replication on global cell transcription has not been fully evaluated. Thus, we analysed the expression profiles of different clones of human hepatoma-derived Huh-7 cells carrying a self-replicating HCV RNA which express all viral proteins (HCV replicon system). Results First, we compared the expression profile of HCV replicon clone 21-5 ...

  13. Comparative genomic analysis of SET domain family reveals the origin, expansion, and putative function of the arthropod-specific SmydA genes as histone modifiers in insects

    Science.gov (United States)

    Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling

    2017-01-01

    Abstract The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain–containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. PMID:28444351

  14. Functional Gene-Set Analysis Does Not Support a Major Role for Synaptic Function in Attention Deficit/Hyperactivity Disorder (ADHD)

    Directory of Open Access Journals (Sweden)

    Anke R. Hammerschlag

    2014-07-01

    Full Text Available Attention Deficit/Hyperactivity Disorder (ADHD) is one of the most common childhood-onset neuropsychiatric disorders. Despite high heritability estimates, genome-wide association studies (GWAS) have failed to find significant genetic associations, likely due to the polygenic character of ADHD. Nevertheless, genetic studies suggested the involvement of several processes important for synaptic function. Therefore, we applied a functional gene-set analysis to formally test whether synaptic functions are associated with ADHD. Gene-set analysis tests the joint effect of multiple genetic variants in groups of functionally related genes. This method provides increased statistical power compared to conventional GWAS. We used data from the Psychiatric Genomics Consortium including 896 ADHD cases and 2455 controls, and 2064 parent-affected offspring trios, providing sufficient statistical power to detect gene sets representing a genotype relative risk of at least 1.17. Although all synaptic genes together showed a significant association with ADHD, this association was not stronger than that of randomly generated gene sets matched for the same number of genes. Further analyses showed no association of specific synaptic function categories with ADHD after correction for multiple testing. Given current sample size and gene sets based on current knowledge of genes related to synaptic function, our results do not support a major role for common genetic variants in synaptic genes in the etiology of ADHD.
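
    The comparison against randomly generated gene sets of the same size can be sketched as an empirical test. This is a simplified illustration and not the authors' pipeline: the function name, the use of the mean gene-level statistic as the set-level score, and the add-one p-value are assumptions made here, and real analyses typically also match random sets on properties such as gene length or variant count.

```python
import random

def competitive_p_value(gene_stats, gene_set, n_random=1000, seed=0):
    """Compare a gene set's mean statistic with size-matched random sets.

    gene_stats: dict gene -> association statistic (larger = stronger).
    gene_set: set of gene names; genes missing from gene_stats are ignored.
    Returns (observed mean, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    genes = sorted(gene_stats)                      # fixed order for reproducibility
    members = [g for g in genes if g in gene_set]
    size = len(members)
    observed = sum(gene_stats[g] for g in members) / size
    exceed = 0
    for _ in range(n_random):
        sample = rng.sample(genes, size)            # random set, same number of genes
        if sum(gene_stats[g] for g in sample) / size >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_random + 1)
```

    A set can be significantly associated in absolute terms and still yield a large p-value here, which is the distinction the abstract draws: the question is whether the set stands out from the genomic background, not merely whether it carries any signal.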

  15. Social Set Analysis

    DEFF Research Database (Denmark)

    Vatrapu, Ravi; Mukkamala, Raghava Rao; Hussain, Abid

    2016-01-01

    Current analytical approaches in computational social science can be characterized by four dominant paradigms: text analysis (information extraction and classification), social network analysis (graph theory), social complexity analysis (complex systems science), and social simulations (cellular...... automata and agent-based modeling). However, when it comes to organizational and societal units of analysis, there exists no approach to conceptualize, model, analyze, explain, and predict social media interactions as individuals' associations with ideas, values, identities, and so on. To address...... this limitation, based on the sociology of associations and the mathematics of set theory, this paper presents a new approach to big data analytics called social set analysis. Social set analysis consists of a generative framework for the philosophies of computational social science, theory of social data...

  16. Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm.

    Science.gov (United States)

    Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui; Zhu, Hu

    2016-01-01

    Phellinus is a kind of fungus and is known as one of the elemental components in drugs to avoid cancers. With the purpose of finding optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were conducted and a large volume of experimental data was generated. In this work, we use the data collected from experiments for regression analysis, and then a mathematical model for predicting Phellinus production is obtained. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of parameters involved in culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. These optimized parameter values are in accordance with biological experimental results, indicating that our method has good predictive ability for culture condition optimization.

  17. Data set for diet specific differential gene expression analysis in three Spodoptera moths

    Directory of Open Access Journals (Sweden)

    A. Roy

    2016-09-01

    Full Text Available Examination of closely related species pairs is suggested for evolutionary comparisons of different degrees of polyphagy, which we did here with three taxa of lepidopteran herbivores, Spodoptera spp. (S. littoralis, S. frugiperda maize (C) and rice (R) strains), for an RNAseq analysis of the midguts from the 3rd instar insect larvae for differential metabolic responses after feeding on pinto bean based artificial diet vs maize leaves. Paired-end (2×100 bp) Illumina HiSeq2500 sequencing resulted in a total of 24, 23, 24, and 21 million reads for the SF-C-Maize, SF-C-Pinto, SF-R-Maize, SF-R-Pinto, and a total of 35 and 36 million reads for the SL-Maize and SL-Pinto samples, respectively. After quality control measures, a total of 62.2 million reads from SL and 71.7 million reads from SF were used for transcriptome assembly (TA). The resulting final de novo reference TA (backbone) for the SF taxa contained 37,985 contigs with a N50 contig size of 1030 bp and a maximum contig length of 17,093 bp, while for SL, 28,329 contigs were generated with a N50 contig size of 1980 bp and a maximum contig length of 18,267 bp. The data presented herein contains supporting information related to our research article Roy et al. (2016), http://dx.doi.org/10.1016/j.ibmb.2016.02.006 [1].
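
    The N50 contig size quoted for both assemblies is a length-weighted summary statistic. As a brief sketch of the usual definition (the function name is chosen here for illustration, and assemblers may differ in tie-handling details), N50 is the length L such that contigs of length L or longer together contain at least half of all assembled bases:

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half:       # half of the assembled bases are now covered
            return length
    raise ValueError("no contigs given")
```

    Because the statistic is weighted by length, an assembly with fewer but longer contigs can report a higher N50 than one with more contigs, which is consistent with the SL assembly above having fewer contigs (28,329) but a larger N50 (1980 bp) than the SF assembly.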

  18. Gene set analyses for interpreting microarray experiments on prokaryotic organisms

    OpenAIRE

    Heffron Fred; Van Bruggen Dirk; DeJongh Matthew; Best Aaron A; Tintle Nathan L; Porwollik Steffen; Taylor Ronald C

    2008-01-01

    Abstract Background Despite the widespread usage of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed which use biologically defined sets of genes in interpretation, instead of examining results gene-by-gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, ...

  19. Comparative genomic analysis of Brucella abortus vaccine strain 104M reveals a set of candidate genes associated with its virulence attenuation.

    Science.gov (United States)

    Yu, Dong; Hui, Yiming; Zai, Xiaodong; Xu, Junjie; Liang, Long; Wang, Bingxiang; Yue, Junjie; Li, Shanhu

    2015-01-01

    The Brucella abortus strain 104M, a spontaneously attenuated strain, has been used as a vaccine strain in humans against brucellosis for 6 decades in China. Despite many studies, the molecular mechanisms that cause the attenuation are still unclear. Here, we determined the whole-genome sequence of 104M and conducted a comprehensive comparative analysis against the whole genome sequences of the virulent strain, A13334, and other reference strains. This analysis revealed a highly similar genome structure between 104M and A13334. The further comparative genomic analysis between 104M and A13334 revealed a set of genes missing in 104M. Some of these genes were identified to be directly or indirectly associated with virulence. Similarly, a set of mutations in the virulence-related genes was also identified, which may be related to virulence alteration. This study provides a set of candidate genes associated with virulence attenuation in B. abortus vaccine strain 104M.

  20. Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs

    Directory of Open Access Journals (Sweden)

    Silengo Lorenzo

    2004-05-01

    Full Text Available Abstract Background Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated gene. The computational identification of such motifs is made easier by the fact that they often appear several times in the upstream region of the regulated genes, so that the number of occurrences of relevant motifs is often significantly larger than expected by pure chance. Results To exploit this fact, we construct sets of genes characterized by the statistical overrepresentation of a certain motif in their upstream regions. Then we study the functional characterization of these sets by analyzing their annotation to Gene Ontology terms. For the sets showing a statistically significant specific functional characterization, we conjecture that the upstream motif characterizing the set is a binding site for a transcription factor involved in the regulation of the genes in the set. Conclusions The method we propose is able to identify many known binding sites in S. cerevisiae and new candidate targets of regulation by known transcription factors. Its application to less well studied organisms is likely to be valuable in the exploration of their regulatory interaction network.
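
    The second step described here, asking whether a motif-defined gene set is unusually rich in genes annotated to a given Gene Ontology term, is commonly assessed with a hypergeometric upper-tail probability. The abstract does not state which test the authors used, so the sketch below is an assumed, generic formulation with a hypothetical function name:

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the genome; K: genes annotated to the term;
    n: genes in the motif-defined set; k: annotated genes observed in the set.
    """
    total = comb(N, n)
    upper = min(n, K)
    # comb(a, b) is 0 when b > a, so impossible outcomes drop out of the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

    Since one such probability is computed per motif and per term, the resulting p-values would need correction for multiple testing before any set is called functionally characterized.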

  1. Social Set Analysis

    DEFF Research Database (Denmark)

    Vatrapu, Ravi; Hussain, Abid; Buus Lassen, Niels

    2015-01-01

    This paper argues that the basic premise of Social Network Analysis (SNA) -- namely that social reality is constituted by dyadic relations and that social interactions are determined by structural properties of networks -- is neither necessary nor sufficient for Big Social Data analytics of Facebook or Twitter data. However, there exists no other holistic computational social science approach beyond the relational sociology and graph theory of SNA. To address this limitation, this paper presents an alternative holistic approach to Big Social Data analytics called Social Set Analysis (SSA). Based on the sociology of associations and the mathematics of classical, fuzzy and rough set theories, this paper proposes a research program, the function of which is to design, develop and evaluate social set analytics in terms of fundamentally novel formal models, predictive methods and visual...

  2. Whole-Transcriptome RNA-seq, Gene Set Enrichment Pathway Analysis, and Exon Coverage Analysis of Two Plastid RNA Editing Mutants.

    Science.gov (United States)

    Hackett, Justin B; Lu, Yan

    2017-04-07

    In land plants, plastid and mitochondrial RNAs are subject to post-transcriptional C-to-U RNA editing. T-DNA insertions in the ORGANELLE RNA RECOGNITION MOTIF PROTEIN6 gene resulted in reduced photosystem II (PSII) activity and smaller plant and leaf sizes. Exon coverage analysis of the ORRM6 gene showed that orrm6-1 and orrm6-2 are loss-of-function mutants. Compared to other ORRM proteins, ORRM6 affects a relatively small number of RNA editing sites. Sanger sequencing of reverse transcription-PCR products of plastid transcripts revealed two plastid RNA editing sites that are substantially affected in the orrm6 mutants: psbF-C77 and accD-C794. The psbF gene encodes the beta subunit of cytochrome b559, an essential component of PSII. The accD gene encodes the beta subunit of acetyl-CoA carboxylase, a protein required in plastid fatty acid biosynthesis. Whole-transcriptome RNA-seq demonstrated that editing at psbF-C77 is nearly absent and the editing extent at accD-C794 was significantly reduced. Gene set enrichment pathway analysis showed that expression of multiple gene sets involved in photosynthesis, especially photosynthetic electron transport, is significantly up-regulated in both orrm6 mutants. The up-regulation could be a mechanism to compensate for the reduced PSII electron transport rate in the orrm6 mutants. These results further demonstrated that Organelle RNA Recognition Motif protein ORRM6 is required in editing of specific RNAs in the Arabidopsis (Arabidopsis thaliana) plastid.

  3. A MultiSite GatewayTM vector set for the functional analysis of genes in the model Saccharomyces cerevisiae

    Directory of Open Access Journals (Sweden)

    Nagels Durand Astrid

    2012-09-01

    Full Text Available Abstract Background Recombinatorial cloning using the GatewayTM technology has been the method of choice for high-throughput omics projects, resulting in the availability of entire ORFeomes in GatewayTM compatible vectors. The MultiSite GatewayTM system allows combining multiple genetic fragments such as promoter, ORF and epitope tag in one single reaction. To date, this technology has not been accessible in the yeast Saccharomyces cerevisiae, one of the most widely used experimental systems in molecular biology, due to the lack of appropriate destination vectors. Results Here, we present a set of three-fragment MultiSite GatewayTM destination vectors that have been developed for gene expression in S. cerevisiae and that allow the assembly of any promoter, open reading frame, epitope tag arrangement in combination with any of four auxotrophic markers and three distinct replication mechanisms. As an example of its applicability, we used yeast three-hybrid to provide evidence for the assembly of a ternary complex of plant proteins involved in jasmonate signalling and consisting of the JAZ, NINJA and TOPLESS proteins. Conclusion Our vectors make MultiSite GatewayTM cloning accessible in S. cerevisiae and implement a fast and versatile cloning method for the high-throughput functional analysis of (heterologous) proteins in one of the most widely used model organisms for molecular biology research.

  4. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Directory of Open Access Journals (Sweden)

    Nilotpal Chowdhury

    Full Text Available Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate, adjusted for expression of cell cycle related genes and for the 4 major molecular subtypes). The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and
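
    The step of ranking per-gene Cox coefficients and passing them to an enrichment analysis can be sketched as follows. This is the simple unweighted running-sum (Kolmogorov-Smirnov-like) score, not necessarily the GSEA variant the authors ran; significance testing by permutation is omitted, and the function names are hypothetical.

```python
def rank_by_coefficient(coefficients: dict) -> list:
    """Order gene names from the largest to the smallest coefficient."""
    return sorted(coefficients, key=coefficients.get, reverse=True)


def enrichment_score(ranked_genes: list, gene_set: set) -> float:
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up at each set member and down
    at each non-member; return the running sum with the largest
    absolute value (positive: set members cluster near the top).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme
```

    A set whose genes carry the largest positive coefficients scores close to +1, and a set concentrated among the most negative coefficients scores close to -1.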

  5. AnovArray: a set of SAS macros for the analysis of variance of gene expression data

    Directory of Open Access Journals (Sweden)

    Renard Jean-Paul

    2005-06-01

    Full Text Available Abstract Background Analysis of variance is a powerful approach to identify differentially expressed genes in a complex experimental design for microarray and macroarray data. The advantage of the ANOVA model is the possibility to evaluate multiple sources of variation in an experiment. Results AnovArray is a package implementing ANOVA for gene expression data using SAS® statistical software. The originality of the package is (1) to quantify the different sources of variation on all genes together, (2) to provide a quality control of the model, and (3) to propose two models for a gene's variance estimation and to perform a correction for multiple comparisons. Conclusion AnovArray is freely available at http://www-mig.jouy.inra.fr/stat/AnovArray and requires only SAS® statistical software.

  6. GO-based Functional Dissimilarity of Gene Sets

    Directory of Open Access Journals (Sweden)

    Aguilar-Ruiz Jesús S

    2011-09-01

    Full Text Available Abstract Background The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and a quantification of functional coherence can help to clarify the role of a group of genes working together. Results To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Conclusions Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.

  7. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome

    Directory of Open Access Journals (Sweden)

    Gaora Peadar Ó

    2010-10-01

    Full Text Available Abstract Background Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets. Results Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p Conclusion Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of

  8. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

    Science.gov (United States)

    Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press.

  9. Gene set analyses for interpreting microarray experiments on prokaryotic organisms.

    Energy Technology Data Exchange (ETDEWEB)

    Tintle, Nathan; Best, Aaron; Dejongh, Matthew; VanBruggen, Dirk; Heffron, Fred; Porwollik, Steffen; Taylor, Ronald C.

    2008-11-05

    Background: Recent advances in microarray technology have brought with them the need for enhanced methods of biologically interpreting gene expression data. Recently, methods like Gene Set Enrichment Analysis (GSEA) and variants of Fisher’s exact test have been proposed which utilize a priori biological information. Typically, these methods are demonstrated with a priori biological information from the Gene Ontology. Results: Alternative gene set definitions are presented based on gene sets inferred from the SEED: open-source software environment for comparative genome annotation and analysis of microbial organisms. Many of these gene sets are then shown to provide consistent expression across a series of experiments involving Salmonella Typhimurium. Implementation of the gene sets in an analysis of microarray data is then presented for the Salmonella Typhimurium data. Conclusions: SEED inferred gene sets can be naturally defined based on subsystems in the SEED. The consistent expression values of these SEED inferred gene sets suggest their utility for statistical analyses of gene expression data based on a priori biological information

  10. Semantic particularity measure for functional characterization of gene sets using gene ontology.

    Science.gov (United States)

    Bettembourg, Charles; Diot, Christian; Dameron, Olivier

    2014-01-01

    Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions, and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity. We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure. Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.
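
    The frequency-based notion of informativeness mentioned above is commonly written as IC(t) = -log p(t), where p(t) is the share of corpus annotations that fall on term t or its descendants. The sketch below shows only that standard quantity; it does not reproduce the paper's particularity measure, which the abstract does not define, and the input layout is an assumption.

```python
from math import log


def information_content(cumulative_counts: dict, root: str) -> dict:
    """Return IC(t) = -log(count(t) / count(root)) for each term.

    Assumption: cumulative_counts[t] already includes annotations to
    every descendant of t, so the root holds the corpus total.
    Terms with a zero count are skipped because their IC is undefined.
    """
    total = cumulative_counts[root]
    return {
        term: log(total / count)
        for term, count in cumulative_counts.items()
        if count > 0
    }
```

    Under this definition the root carries no information (IC = 0), and rarely used, specific terms receive the highest values.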

  11. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    Science.gov (United States)

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Abstract Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. 
Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association
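
    Controlling redundancy with a user-defined overlap threshold can be illustrated with a small sketch. It is not the ReCiPa implementation: the overlap definition (shared genes divided by the size of the smaller pathway) and the greedy merge order are assumptions, and the function names are hypothetical.

```python
def overlap_fraction(a: set, b: set) -> float:
    """Share of the smaller gene set that also appears in the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_redundant(pathways: dict, threshold: float) -> dict:
    """Greedily merge pathway pairs whose overlap reaches the threshold.

    Merged pathways are named "X+Y" and hold the union of both gene
    sets; merging repeats until no remaining pair reaches the threshold.
    """
    merged = {name: set(genes) for name, genes in pathways.items()}
    changed = True
    while changed:
        changed = False
        names = sorted(merged)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                if overlap_fraction(merged[first], merged[second]) >= threshold:
                    union = merged.pop(first) | merged.pop(second)
                    merged[f"{first}+{second}"] = union
                    changed = True
                    break
            if changed:
                break
    return merged
```

    Lower thresholds merge more aggressively, which leaves fewer and less correlated gene sets for the enrichment step.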

  12. Mapping of three QTLs for seed setting and analysis on the candidate gene for qSS-1 in rice (Oryza sativa L.)

    Institute of Scientific and Technical Information of China (English)

    Elsheikh Y M Ahmed; ZHANG Yan-pei; YU Jian-ping; Rashid M A Rehman; ZHANG Zhan-ying; ZHANG Hong-liang; LI Jin-jie; LI Zi-chao

    2016-01-01

    Low seed setting is one of the major hindrances to grain yield in rice. One of the main causes of low spikelet fertility (seed setting) is male sterility or pollen abortion. Notably, pollen abortion has been frequently observed in advanced progenies of rice. In the present study, 149 BC2F6 individuals with significant segregation in spikelet fertility were generated from the cross between N040212 (indica) and Nipponbare (japonica) and used for primary gene mapping. Three QTLs, qSS-1, qSS-7 and qSS-9, on chromosomes 1, 7 and 9, respectively, were found to be associated with seed setting. Recombinant analysis and physical mapping information from publicly available resources showed that the qSS-1, qSS-7 and qSS-9 loci mapped to intervals of 188, 701 and 3741 kb, respectively. The seed setting QTL qSS-1 was further fine mapped to 93.5 kb using a BC2F7 population of 1849 individuals. There are 16 putative genes in this 93.5 kb region. Pollen vitality tests and artificial pollination indicated that the male gamete has abnormal pollen while the female gamete was normal. These data showed that the low seed setting rate associated with qSS-1 may be caused by abnormal pollen grains. These results will be useful for cloning and functional analysis of the target gene governing spikelet fertility (seed setting) and for understanding the genetic bases of pollen sterility.

  13. Human longevity and variation in DNA damage response and repair: study of the contribution of sub-processes using competitive gene-set analysis.

    Science.gov (United States)

    Debrabant, Birgit; Soerensen, Mette; Flachsbart, Friederike; Dato, Serena; Mengel-From, Jonas; Stevnsner, Tinna; Bohr, Vilhelm A; Kruse, Torben A; Schreiber, Stefan; Nebel, Almut; Christensen, Kaare; Tan, Qihua; Christiansen, Lene

    2014-09-01

    DNA-damage response and repair are crucial to maintain genetic stability, and are consequently considered central to aging and longevity. Here, we investigate whether this pathway overall is associated with longevity, and whether specific sub-processes are more strongly associated with longevity than others. Data on 592 SNPs from 77 genes involved in nine sub-processes were analyzed: DNA-damage response, base excision repair (BER), nucleotide excision repair, mismatch repair, non-homologous end-joining, homologous recombinational repair (HRR), RecQ helicase activities (RECQ), telomere functioning and mitochondrial DNA processes. The study population was 1089 long-lived and 736 middle-aged Danes. A self-contained set-based test of all SNPs displayed association with longevity (P-value=9.9 × 10⁻⁵), supporting that the overall pathway could affect longevity. Investigation of the nine sub-processes using the competitive gene-set analysis by Wang et al. indicated that BER, HRR and RECQ were more strongly associated with longevity than the respective remaining genes of the pathway (P-values=0.004-0.048). For HRR and RECQ, only one gene contributed to the significance, whereas for BER several genes contributed. These associations did, however, generally not pass correction for multiple testing. Still, these findings indicate that, of the entire pathway, variation in BER might influence longevity the most. These modest sized P-values were not replicated in a German sample. This might, though, be due to differences in genotyping procedures and investigated SNPs, potentially inducing differences in the coverage of gene regions. Specifically, five genes were not covered at all in the German data. Therefore, investigations in additional study populations are needed before final conclusions can be drawn.

  14. The Influence of DNA Extraction Procedure and Primer Set on the Bacterial Community Analysis by Pyrosequencing of Barcoded 16S rRNA Gene Amplicons.

    Science.gov (United States)

    Starke, Ingo C; Vahjen, Wilfried; Pieper, Robert; Zentek, Jürgen

    2014-01-01

    In this study, the effect of different DNA extraction procedures and primer sets on pyrosequencing results regarding the composition of bacterial communities in the ileum of piglets was investigated. Ileal chyme from piglets fed a diet containing different amounts of zinc oxide was used to evaluate a pyrosequencing study with barcoded 16S rRNA PCR products. Two DNA extraction methods (bead beating versus silica gel columns) and two primer sets targeting variable regions of bacterial 16S rRNA genes (8f-534r versus 968f-1401r) were considered. The SEED viewer software of the MG-RAST server was used for automated sequence analysis. A total of 5.2 × 10⁵ sequences were used for analysis after processing for read length (150 bp), minimum sequence occurrence (5), and exclusion of eukaryotic and unclassified/uncultured sequences. DNA extraction procedures and primer sets differed significantly in total sequence yield. The distribution of bacterial order and main bacterial genera was influenced significantly by both parameters. However, this study has shown that the results of pyrosequencing studies using barcoded PCR amplicons of bacterial 16S rRNA genes depend on DNA extraction and primer choice, as well as on the manner of downstream sequence analysis.

  15. The Influence of DNA Extraction Procedure and Primer Set on the Bacterial Community Analysis by Pyrosequencing of Barcoded 16S rRNA Gene Amplicons

    Directory of Open Access Journals (Sweden)

    Ingo C. Starke

    2014-01-01

    Full Text Available In this study, the effect of different DNA extraction procedures and primer sets on pyrosequencing results regarding the composition of bacterial communities in the ileum of piglets was investigated. Ileal chyme from piglets fed a diet containing different amounts of zinc oxide was used to evaluate a pyrosequencing study with barcoded 16S rRNA PCR products. Two DNA extraction methods (bead beating versus silica gel columns) and two primer sets targeting variable regions of bacterial 16S rRNA genes (8f-534r versus 968f-1401r) were considered. The SEED viewer software of the MG-RAST server was used for automated sequence analysis. A total of 5.2 × 10⁵ sequences were used for analysis after processing for read length (150 bp), minimum sequence occurrence (5), and exclusion of eukaryotic and unclassified/uncultured sequences. DNA extraction procedures and primer sets differed significantly in total sequence yield. The distribution of bacterial order and main bacterial genera was influenced significantly by both parameters. However, this study has shown that the results of pyrosequencing studies using barcoded PCR amplicons of bacterial 16S rRNA genes depend on DNA extraction and primer choice, as well as on the manner of downstream sequence analysis.

  16. Analysis of the real EADGENE data set:

    DEFF Research Database (Denmark)

    Sørensen, Peter; Bonnet, Agnès; Buitenhuis, Bart

    2007-01-01

    ...) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The main result from these analyses was that HC and PCA were able to separate tissue samples taken at 24 h following E. coli infection from the other samples. The second approach identified groups of differentially co-expressed genes, by identifying clusters of genes highly correlated when animals were infected with E. coli but not correlated more than expected by chance when the infective pathogen was S. aureus. The third approach looked at differential expression of predefined gene sets. Gene sets were defined based on information retrieved from biological databases such as Gene Ontology. Based on these annotation sources the teams used either the GlobalTest or the Fisher exact test to identify differentially expressed...

  17. Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes (Open Access publication)

    Directory of Open Access Journals (Sweden)

    Sørensen Peter

    2007-11-01

    Full Text Available Abstract A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods for the detection of differentially expressed genes in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE funded microarray study looking at the gene expression changes following artificial infection with two different mastitis causing bacteria: Escherichia coli and Staphylococcus aureus. It was reassuring to see that most of the teams found the same main biological results. In fact, most of the differentially expressed genes were found for infection by E. coli between uninfected and 24 h challenged udder quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised a biological problem of cross-talk between infected and uninfected quarters which will have to be dealt with for further microarray studies.

  18. Gene Set-Based Functionome Analysis of Pathogenesis in Epithelial Ovarian Serous Carcinoma and the Molecular Features in Different FIGO Stages

    Directory of Open Access Journals (Sweden)

    Chia-Ming Chang

    2016-06-01

    Full Text Available Serous carcinoma (SC) is the most common subtype of epithelial ovarian carcinoma and is divided into four stages by the Federation of Gynecologists and Obstetrics (FIGO) staging system. Currently, the molecular functions and biological processes of SC at different FIGO stages have not been quantified. Here, we conducted a whole-genome integrative analysis to investigate the functions of SC at different stages. The function, as defined by the GO term or canonical pathway gene set, was quantified by measuring the changes in the gene expressional order between cancerous and normal control states. The quantified function, i.e., the gene set regularity (GSR) index, was utilized to investigate the pathogenesis and functional regulation of SC at different FIGO stages. We showed that the informativeness of the GSR indices was sufficient for accurate pattern recognition and classification for machine learning. The function regularity presented by the GSR indices showed stepwise deterioration during SC progression from FIGO stage I to stage IV. The pathogenesis of SC was centered on cell cycle deregulation and accompanied with multiple functional aberrations as well as their interactions.

  19. 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis

    NARCIS (Netherlands)

    Greshock, J; Naylor, TL; Margolin, A; Diskin, S; Cleaver, SH; Futreal, PA; deJong, PJ; Zhao, SY; Liebman, M; Weber, BL

    2004-01-01

    Array-based comparative genomic hybridization (aCGH) is a recently developed tool for genome-wide determination of DNA copy number alterations. This technology has tremendous potential for disease-gene discovery in cancer and developmental disorders as well as numerous other applications. However, w

  20. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression

  1. Involvement of astrocyte and oligodendrocyte gene sets in migraine.

    Science.gov (United States)

    Eising, Else; de Leeuw, Christiaan; Min, Josine L; Anttila, Verneri; Verheijen, Mark Hg; Terwindt, Gisela M; Dichgans, Martin; Freilinger, Tobias; Kubisch, Christian; Ferrari, Michel D; Smit, August B; de Vries, Boukje; Palotie, Aarno; van den Maagdenberg, Arn Mjm; Posthuma, Danielle

    2016-06-01

    Migraine is a common episodic brain disorder characterized by recurrent attacks of severe unilateral headache and additional neurological symptoms. Two main migraine types can be distinguished based on the presence of aura symptoms that can accompany the headache: migraine with aura and migraine without aura. Multiple genetic and environmental factors confer disease susceptibility. Recent genome-wide association studies (GWAS) indicate that migraine susceptibility genes are involved in various pathways, including neurotransmission, which have already been implicated in genetic studies of monogenic familial hemiplegic migraine, a subtype of migraine with aura. To further explore the genetic background of migraine, we performed a gene set analysis of migraine GWAS data of 4954 clinic-based patients with migraine, as well as 13,390 controls. Curated sets of synaptic genes and sets of genes predominantly expressed in three glial cell types (astrocytes, microglia and oligodendrocytes) were investigated. Our results show that gene sets containing astrocyte- and oligodendrocyte-related genes are associated with migraine, which is especially true for gene sets involved in protein modification and signal transduction. Observed differences between migraine with aura and migraine without aura indicate that both migraine types, at least in part, seem to have a different genetic background. © International Headache Society 2015.

  2. Design and Generation of MLPA Probe Sets for Combined Copy Number and Small-Mutation Analysis of Human Genes: EGFR as an Example

    Directory of Open Access Journals (Sweden)

    Malgorzata Marcinkowska

    2010-01-01

    Multiplex ligation-dependent probe amplification (MLPA) is a multiplex copy number analysis method that is routinely used to identify large mutations in many clinical and research labs. One of the most important drawbacks of the standard MLPA setup is the complicated, and therefore expensive, procedure for generating long MLPA probes. This drawback substantially limits the applicability of MLPA to those genomic regions for which ready-to-use commercial kits are available. Here we present a simple protocol for designing MLPA probe sets that are composed entirely of short oligonucleotide half-probes generated through chemical synthesis. As an example, we present the design and generation of an MLPA assay for parallel copy number and small-mutation analysis of the EGFR gene.

  3. XRCC5 as a risk gene for alcohol dependence: evidence from a genome-wide gene-set-based analysis and follow-up studies in Drosophila and humans.

    Science.gov (United States)

    Juraeva, Dilafruz; Treutlein, Jens; Scholz, Henrike; Frank, Josef; Degenhardt, Franziska; Cichon, Sven; Ridinger, Monika; Mattheisen, Manuel; Witt, Stephanie H; Lang, Maren; Sommer, Wolfgang H; Hoffmann, Per; Herms, Stefan; Wodarz, Norbert; Soyka, Michael; Zill, Peter; Maier, Wolfgang; Jünger, Elisabeth; Gaebel, Wolfgang; Dahmen, Norbert; Scherbaum, Norbert; Schmäl, Christine; Steffens, Michael; Lucae, Susanne; Ising, Marcus; Smolka, Michael N; Zimmermann, Ulrich S; Müller-Myhsok, Bertram; Nöthen, Markus M; Mann, Karl; Kiefer, Falk; Spanagel, Rainer; Brors, Benedikt; Rietschel, Marcella

    2015-01-01

    Genetic factors have as large a role as environmental factors in the etiology of alcohol dependence (AD). Although genome-wide association studies (GWAS) enable systematic searches for loci not hitherto implicated in the etiology of AD, many true findings may be missed owing to correction for multiple testing. The aim of the present study was to circumvent this limitation by searching for biological system-level differences, and then following up these findings in humans and animals. Gene-set-based analysis of GWAS data from 1333 cases and 2168 controls identified 19 significantly associated gene-sets, of which 5 could be replicated in an independent sample. Clustered in these gene-sets were novel and previously identified susceptibility genes. The most frequently present gene, i.e. in 6 out of 19 gene-sets, was X-ray repair complementing defective repair in Chinese hamster cells 5 (XRCC5). Previous human and animal studies have implicated XRCC5 in alcohol sensitivity. This phenotype is inversely correlated with the development of AD, presumably as more alcohol is required to achieve the desired effects. In the present study, the functional role of XRCC5 in AD was further validated in animals and humans. Drosophila mutants with reduced function of Ku80 (the homolog of mammalian XRCC5) due to RNAi silencing showed reduced sensitivity to ethanol. In humans with free access to intravenous ethanol self-administration in the laboratory, the maximum achieved blood alcohol concentration was influenced in an allele-dose-dependent manner by genetic variation in XRCC5. In conclusion, our convergent approach identified new candidates and generated independent evidence for the involvement of XRCC5 in alcohol dependence.

  4. Derivation of Tissue-specific Functional Gene Sets to Aid Transcriptomic Analysis of Chemical Impacts on the Teleost Reproductive Axis.

    Science.gov (United States)

    Oligonucleotide microarrays are a powerful tool for unsupervised analysis of chemical impacts on biological systems. However, the lack of well annotated biological pathways for many aquatic organisms, including fish, and the poor power of microarray-based analyses to detect diffe...

  5. Exon Array Analysis using re-defined probe sets results in reliable identification of alternatively spliced genes in non-small cell lung cancer

    Directory of Open Access Journals (Sweden)

    Gröne Jörn

    2010-11-01

    Background: Treatment of non-small cell lung cancer with novel targeted therapies is a major unmet clinical need. Alternative splicing is a mechanism which generates diverse protein products and is of functional relevance in cancer. Results: In this study, a genome-wide analysis of the alteration of splicing patterns between lung cancer and normal lung tissue was performed. We generated an exon array data set derived from matched pairs of lung cancer and normal lung tissue including both the adenocarcinoma and the squamous cell carcinoma subtypes. An enhanced workflow was developed to reliably detect differential splicing in an exon array data set. In total, 330 genes were found to be differentially spliced in non-small cell lung cancer compared to normal lung tissue. Microarray findings were validated with independent laboratory methods for CLSTN1, FN1, KIAA1217, MYO18A, NCOR2, NUMB, SLK, SYNE2, TPM1 (in total, 10 events) and ADD3, which was analysed in depth. We achieved a high validation rate of 69%. Evidence was found that the activity of FOX2, the splicing factor shown to cause cancer-specific splicing patterns in breast and ovarian cancer, is not altered at the transcript level in several cancer types including lung cancer. Conclusions: This study demonstrates how alternatively spliced genes can reliably be identified in a cancer data set. Our findings underline that key processes of cancer progression in NSCLC are affected by alternative splicing, which can be exploited in the search for novel targeted therapies.

  6. Analysis on a Fractal Set

    CERN Document Server

    Raut, Santanu

    2010-01-01

    The formulation of a new analysis on a zero measure Cantor set $C (\subset I=[0,1])$ is presented. A non-archimedean absolute value is introduced in $C$ exploiting the concept of {\em relative} infinitesimals and a scale invariant ultrametric valuation of the form $\log_{\varepsilon^{-1}} (\varepsilon/x)$ for a given scale $\varepsilon>0$ and infinitesimals $0<$ ... set, if it exists. The formulation of a scale invariant real analysis is also outlined, when the singleton $\{0\}$ of the real line $R$ is replaced by a zero measure Cantor set. The Cantor function is realised as a locally constant function in this setting. The ordinary derivative $dx/dt$ in $R$ is replaced by the scale invariant logarithmic derivative $d\log x/d\log t$ on the set of valued infinitesimals. As a result, the ordinary real valued functions are expected to enjo...

  7. Dominant effects of the Huntington's disease HTT CAG repeat length are captured in gene-expression data sets by a continuous analysis mathematical modeling strategy.

    Science.gov (United States)

    Lee, Jong-Min; Galkina, Ekaterina I; Levantovsky, Rachel M; Fossale, Elisa; Anne Anderson, Mary; Gillis, Tammy; Srinidhi Mysore, Jayalakshmi; Coser, Kathryn R; Shioda, Toshi; Zhang, Bin; Furia, Matthew D; Derry, Jonathan; Kohane, Isaac S; Seong, Ihn Sik; Wheeler, Vanessa C; Gusella, James F; MacDonald, Marcy E

    2013-08-15

    In Huntington's disease (HD), the size of the expanded HTT CAG repeat mutation is the primary driver of the processes that determine age at onset of motor symptoms. However, correlation of cellular biochemical parameters also extends across the normal repeat range, supporting the view that the CAG repeat represents a functional polymorphism with dominant effects determined by the longer allele. A central challenge to defining the functional consequences of this single polymorphism is the difficulty of distinguishing its subtle effects from the multitude of other sources of biological variation. We demonstrate that an analytical approach based upon continuous correlation with CAG size was able to capture the modest (∼21%) contribution of the repeat to the variation in genome-wide gene expression in 107 lymphoblastoid cell lines, with alleles ranging from 15 to 92 CAGs. Furthermore, a mathematical model from an iterative strategy yielded predicted CAG repeat lengths that were significantly positively correlated with true CAG allele size and negatively correlated with age at onset of motor symptoms. Genes negatively correlated with repeat size were also enriched in a set of genes whose expression was CAG-correlated in human HD cerebellum. These findings both reveal the relatively small but detectable impact of variation in the CAG allele in global data in these peripheral cells and provide a strategy for building multi-dimensional data-driven models of the biological network that drives the HD disease process by continuous analysis across allelic panels of neuronal cells vulnerable to the dominant effects of the HTT CAG repeat.
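
    The "continuous correlation with CAG size" idea can be illustrated by scoring each gene with the correlation between its expression and the repeat length across cell lines. The following is a minimal sketch and not the authors' pipeline: the function names, the choice of Pearson correlation and the toy values are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_genes_by_cag_correlation(expression, cag_lengths):
    """Return (gene, r) pairs sorted by |r|, strongest first.

    expression  : dict mapping gene name -> expression values, one per cell line
    cag_lengths : repeat length of the longer allele, one per cell line
    (both inputs are hypothetical data structures for this sketch)
    """
    scores = {gene: pearson(values, cag_lengths)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy data: four cell lines with made-up expression values.
cag = [15, 40, 60, 92]
toy_expression = {
    "up": [1.0, 2.0, 3.0, 4.0],
    "down": [4.0, 3.0, 2.0, 1.0],
    "flat": [1.0, 3.0, 1.0, 3.0],
}
ranking = rank_genes_by_cag_correlation(toy_expression, cag)
```

    Genes at the top of such a ranking, split by the sign of the correlation, could then serve as input to an enrichment analysis.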

  8. Analysis of the real EADGENE data set:

    DEFF Research Database (Denmark)

    Jaffrézic, Florence; de Koning, Dirk-Jan; Boettcher, Paul J

    2007-01-01

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical...... methods for the detection of differentially expressed genes in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE funded microarray study looking at the gene expression changes following artificial infection with two...... quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised...

  9. JAG: A Computational Tool to Evaluate the Role of Gene-Sets in Complex Traits.

    Science.gov (United States)

    Lips, Esther S; Kooyman, Maarten; de Leeuw, Christiaan; Posthuma, Danielle

    2015-05-14

    Gene-set analysis has been proposed as a powerful tool to deal with the highly polygenic architecture of complex traits, as well as with the small effect sizes typically found in GWAS studies for complex traits. We developed a tool, Joint Association of Genetic variants (JAG), which can be applied to Genome Wide Association (GWA) data and tests for the joint effect of all single nucleotide polymorphisms (SNPs) located in a user-specified set of genes or biological pathway. JAG assigns SNPs to genes and incorporates self-contained and/or competitive tests for gene-set analysis. JAG uses permutation to evaluate gene-set significance, which implicitly controls for linkage disequilibrium, sample size, gene size, the number of SNPs per gene and the number of genes in the gene-set. We conducted a power analysis using the Wellcome Trust Case Control Consortium (WTCCC) Crohn's disease data set and show that JAG correctly identifies validated gene-sets for Crohn's disease and has more power than currently available tools for gene-set analysis. JAG is a powerful, novel tool for gene-set analysis, and can be freely downloaded from the CTG Lab website.
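
    The permutation step described above can be sketched as follows. Shuffling only the case/control labels leaves every individual's genotype row intact, so the correlation between SNPs (linkage disequilibrium) and the number of SNPs in the set are the same in every permuted data set. This is an illustrative sketch under stated assumptions, not JAG's implementation: the set statistic used here (sum of squared case-control differences in mean allele count) and all function names are hypothetical stand-ins.

```python
import random

def set_statistic(genotypes, labels):
    """Sum over SNPs of the squared case-control difference in mean allele count.

    genotypes : list of per-individual lists of allele counts (0, 1 or 2)
    labels    : list of 1 (case) / 0 (control), one per individual
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    stat = 0.0
    for j in range(len(genotypes[0])):
        mean_case = sum(g[j] for g in cases) / len(cases)
        mean_ctrl = sum(g[j] for g in controls) / len(controls)
        stat += (mean_case - mean_ctrl) ** 2
    return stat

def permutation_p(genotypes, labels, n_perm=1000, seed=0):
    """Self-contained test: empirical p-value from label permutations."""
    rng = random.Random(seed)
    observed = set_statistic(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute phenotypes, keep genotype rows intact
        if set_statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy data: 6 cases carrying two copies at both SNPs, 6 controls carrying none.
toy_genotypes = [[2, 2]] * 6 + [[0, 0]] * 6
toy_labels = [1] * 6 + [0] * 6
p_value = permutation_p(toy_genotypes, toy_labels, n_perm=200)
```

    A competitive test would instead compare the set's statistic with that of randomly drawn gene sets of matching size.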

  10. JAG: A Computational Tool to Evaluate the Role of Gene-Sets in Complex Traits

    Directory of Open Access Journals (Sweden)

    Esther S. Lips

    2015-05-01

    Gene-set analysis has been proposed as a powerful tool to deal with the highly polygenic architecture of complex traits, as well as with the small effect sizes typically found in GWAS studies for complex traits. We developed a tool, Joint Association of Genetic variants (JAG), which can be applied to Genome Wide Association (GWA) data and tests for the joint effect of all single nucleotide polymorphisms (SNPs) located in a user-specified set of genes or biological pathway. JAG assigns SNPs to genes and incorporates self-contained and/or competitive tests for gene-set analysis. JAG uses permutation to evaluate gene-set significance, which implicitly controls for linkage disequilibrium, sample size, gene size, the number of SNPs per gene and the number of genes in the gene-set. We conducted a power analysis using the Wellcome Trust Case Control Consortium (WTCCC) Crohn’s disease data set and show that JAG correctly identifies validated gene-sets for Crohn’s disease and has more power than currently available tools for gene-set analysis. JAG is a powerful, novel tool for gene-set analysis, and can be freely downloaded from the CTG Lab website.

  11. Identification and analysis of house-keeping and tissue-specific genes based on RNA-seq data sets across 15 mouse tissues.

    Science.gov (United States)

    Zeng, Jingyao; Liu, Shoucheng; Zhao, Yuhui; Tan, Xinyu; Aljohi, Hasan Awad; Liu, Wanfei; Hu, Songnian

    2016-01-15

    Recently, RNA-seq has become a widely used technology for transcriptome profiling due to its single-base accuracy and high throughput. In this study, we applied a computational approach to an integrated RNA-seq dataset across 15 normal mouse tissues, and consequently assigned 8408 house-keeping (HK) genes and 2581 tissue-specific (TS) genes among the UCSC RefGene annotation. Apart from some basic genomic features, we also performed expression, function and pathway analysis with clustering, DAVID and Ingenuity Pathway Analysis, indicating the physiological connections (tissues) and diverse biological roles of HK genes (fundamental processes) and TS genes (tissue-corresponding processes). Moreover, we used RT-PCR to test 18 candidate HK genes and finally identified a novel list of highly stable internal control genes: Ywhae, Ddb1, Eif4h, etc. In summary, this study provides a new HK gene and TS gene resource for further genetic and evolution research and helps us better understand morphogenesis and biological diversity in mouse.

  12. Genetic and functional analysis of a set of HIV-1 envelope genes obtained from biological clones with varying syncytium-inducing capacities.

    NARCIS (Netherlands)

    A.C. Andeweg (Arno); M. Groenink (Maarten); P. Leeflang; R.E.Y. de Goede; A.D.M.E. Osterhaus (Ab); M. Tersmette; M.L. Bosch (Marnix)

    1992-01-01

    To study HIV-1 envelope-mediated syncytium formation we have amplified, cloned, expressed, and sequenced individual envelope genes from a set of eight biological HIV-1 clones. These clones were obtained from two patients and display either a syncytium-inducing (SI) or nonsyncytium-induci

  13. Studying the Complex Expression Dependences between Sets of Coexpressed Genes

    Directory of Open Access Journals (Sweden)

    Mario Huerta

    2014-01-01

    Organisms simplify the orchestration of gene expression by coregulating genes whose products function together in the cell. The use of clustering methods to obtain sets of coexpressed genes from expression arrays is very common; nevertheless, there are no appropriate tools to study the expression networks among these sets of coexpressed genes. The aim of the developed tools is to allow studying the complex expression dependences that exist between sets of coexpressed genes. For this purpose, we start by detecting the nonlinear expression relationships between pairs of genes, plus the coexpressed genes. Next, we form networks among sets of coexpressed genes that maintain nonlinear expression dependences between all of them. The expression relationship between the sets of coexpressed genes is defined by the expression relationship between the skeletons of these sets, where this skeleton represents the coexpressed genes with a well-defined nonlinear expression relationship with the skeleton of the other sets. As a result, we can study the nonlinear expression relationships between a target gene and other sets of coexpressed genes, or start the study from the skeleton of the sets, to study the complex relationships of activation and deactivation between the sets of coexpressed genes that carry out the different cellular processes present in the expression experiments.

  14. A Rough Set based Gene Expression Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    J. J. Emilyn

    2011-01-01

    Problem statement: Microarray technology helps in monitoring the expression levels of thousands of genes across collections of related samples. Approach: The main goal in the analysis of large and heterogeneous gene expression datasets was to identify groups of genes that are expressed in a set of experimental conditions. Results: Several clustering techniques have been proposed for identifying gene signatures and understanding their role, and many of them have been applied to gene expression data, but with only partial success. The main aim of this work was to develop a clustering algorithm that would successfully identify gene patterns. The proposed novel clustering technique (RCGED) provides an efficient way of finding the hidden and unique gene expression patterns. It overcomes the restriction of one object being placed in only one cluster. Conclusion/Recommendations: The proposed algorithm is termed intelligent because it automatically determines the optimum number of clusters. The proposed algorithm was tested on a colon cancer dataset and the results were compared with the Rough Fuzzy K Means algorithm.

  15. Transcriptome profiling of Set5 and Set1 methyltransferases: Tools for visualization of gene expression

    Directory of Open Access Journals (Sweden)

    Glòria Mas Martín

    2014-12-01

    Cells regulate transcription by coordinating the activities of multiple histone modifying complexes. We recently identified the yeast histone H4 methyltransferase Set5 and discovered functional overlap with the histone H3 methyltransferase Set1 in gene expression. Specifically, using next-generation RNA sequencing (RNA-Seq), we found that Set5 and Set1 function synergistically to regulate specific transcriptional programs at subtelomeres and transposable elements. Here we provide a comprehensive description of the methodology and analysis tools corresponding to the data deposited in NCBI's Gene Expression Omnibus (GEO) under the accession number GSE52086. This data complements the experimental methods described in Mas Martín G et al. (2014) and provides the means to explore the cooperative functions of histone H3 and H4 methyltransferases in the regulation of transcription. Furthermore, a fully annotated R code is included to enable researchers to use the following computational tools: comparison of significant differential expression (SDE) profiles; gene ontology enrichment of SDE; and enrichment of SDE relative to chromosomal features, such as centromeres, telomeres, and transposable elements. Overall, we present a bioinformatics platform that can be generally implemented for similar analyses with different datasets and in different organisms.

  16. Non-homologous end-joining pathway associated with occurrence of myocardial infarction: gene set analysis of genome-wide association study data.

    Directory of Open Access Journals (Sweden)

    Jeffrey J W Verschuren

    PURPOSE: DNA repair deficiencies have been postulated to play a role in the development and progression of cardiovascular disease (CVD). The hypothesis is that DNA damage accumulating with age may induce cell death, which promotes formation of unstable plaques. Defects in DNA repair mechanisms may therefore increase the risk of CVD events. We examined whether the joint effect of common genetic variants in 5 DNA repair pathways may influence the risk of CVD events. METHODS: The PLINK set-based test was used to examine the association to myocardial infarction (MI) of the DNA repair pathway in GWAS data of 866 subjects of the GENetic DEterminants of Restenosis (GENDER) study and 5,244 subjects of the PROspective Study of Pravastatin in the Elderly at Risk (PROSPER) study. We included the main DNA repair pathways (base excision repair, nucleotide excision repair, mismatch repair, homologous recombination and non-homologous end-joining (NHEJ)) in the analysis. RESULTS: The NHEJ pathway was associated with the occurrence of MI in both GENDER (P = 0.0083) and PROSPER (P = 0.014). This association was mainly driven by genetic variation in the MRE11A gene (P_GENDER = 0.0001 and P_PROSPER = 0.002). The homologous recombination pathway was associated with MI in GENDER only (P = 0.011); for the other pathways no associations were observed. CONCLUSION: This is the first study analyzing the joint effect of common genetic variation in DNA repair pathways and the risk of CVD events, demonstrating an association between the NHEJ pathway and MI in 2 different cohorts.

  17. Comparison of gene sets for expression profiling: prediction of metastasis from low-malignant breast cancer

    DEFF Research Database (Denmark)

    Thomassen, Mads; Tan, Qihua; Eiriksdottir, Freyja;

    2007-01-01

    -six tumors from low-risk patients and 34 low-malignant T2 tumors from patients with slightly higher risk have been examined by genome-wide gene expression analysis. Nine prognostic gene sets were tested in this data set. RESULTS: A 32-gene profile (HUMAC32) that accurately predicts metastasis has previously...... sets, mainly developed in high-risk cancers, predict metastasis from low-malignant cancer....

  18. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

    Directory of Open Access Journals (Sweden)

    Lijing Xu

    High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in the GO biological process category and 90% of the gene sets in the GO molecular function and cellular component categories were functionally cohesive (LPv < 0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.

  19. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data.

    Science.gov (United States)

    Lewin, Alex; Grieve, Ian C

    2006-10-03

    Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.
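
    For a single GO term, the one-sided Fisher's exact test mentioned above equals the upper tail of a hypergeometric distribution: the probability of seeing at least the observed number of annotated genes among the differentially expressed genes if those genes were drawn at random. The sketch below shows only this per-term baseline, not the POSOC grouping step; the function name and the example counts are hypothetical.

```python
from math import comb

def enrichment_p(n_genes, n_annotated, n_selected, n_overlap):
    """One-sided Fisher's exact (hypergeometric tail) p-value.

    n_genes     : genes on the array (the background)
    n_annotated : background genes annotated with the GO term
    n_selected  : genes declared differentially expressed
    n_overlap   : differentially expressed genes carrying the term
    Returns P(overlap >= n_overlap) under random selection.
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_genes - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Made-up counts: 1000 genes, 40 carry the term, 50 selected, 8 in common.
example_p = enrichment_p(1000, 40, 50, 8)
```

    Because GO terms share genes through the ontology hierarchy, p-values computed this way for neighbouring terms are strongly dependent, which is the difficulty the grouping approach addresses.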

  20. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data

    Directory of Open Access Journals (Sweden)

    Grieve Ian C

    2006-10-01

    Background: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. Results: We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Conclusion: Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.

  1. A Web-based and Grid-enabled dChip version for the analysis of large sets of gene expression data

    Directory of Open Access Journals (Sweden)

    Scaglione Silvia

    2008-11-01

    Background: Microarray techniques are one of the main methods used to investigate thousands of gene expression profiles to shed light on complex biological processes responsible for serious diseases, with a great scientific impact and a wide application area. Several standalone applications have been developed to analyze microarray data. Two of the best-known free analysis software packages are the R-based Bioconductor and dChip. The part of the dChip software concerning the calculation and the analysis of gene expression has been modified to permit its execution on both cluster environments (supercomputers) and Grid infrastructures (distributed computing). This work is not aimed at replacing existing tools, but it provides researchers with a method to analyze large datasets without any hardware or software constraints. Results: An application able to perform the computation and the analysis of gene expression on large datasets has been developed using algorithms provided by dChip. Different tests have been carried out to validate the results and to compare the performance obtained on different infrastructures. Validation tests have been performed using a small dataset related to the comparison of HUVEC (Human Umbilical Vein Endothelial Cells) and Fibroblasts, derived from the same donors, treated with IFN-α. Moreover, performance tests have been executed to compare performance on different environments using a large dataset including about 1000 samples related to Breast Cancer patients. Conclusion: A Grid-enabled software application for the analysis of large Microarray datasets has been proposed. The dChip software has been ported to the Linux platform and modified, using appropriate parallelization strategies, to permit its execution on both cluster environments and Grid infrastructures. The added value provided by the use of Grid technologies is the possibility to exploit both computational and data Grid infrastructures to analyze

  2. Globaltest and GOEAST: two different approaches for Gene Ontology analysis

    NARCIS (Netherlands)

    Hulsegge, B.; Kommadath, A.; Smits, M.A.

    2009-01-01

    Background Gene set analysis is a commonly used method for analysing microarray data by considering groups of functionally related genes instead of individual genes. Here we present the use of two gene set analysis approaches: Globaltest and GOEAST. Globaltest is a method for testing whether sets of

  3. Gene set of nuclear-encoded mitochondrial regulators is enriched for common inherited variation in obesity.

    Directory of Open Access Journals (Sweden)

    Nadja Knoll

    There are hints of an altered mitochondrial function in obesity. Nuclear-encoded genes are relevant for mitochondrial function (3 gene sets of known relevant pathways: (1) 16 nuclear regulators of mitochondrial genes, (2) 91 genes for oxidative phosphorylation and (3) 966 nuclear-encoded mitochondrial genes). Gene set enrichment analysis (GSEA) showed no association with type 2 diabetes mellitus in these gene sets. Here we performed a GSEA for the same gene sets for obesity. Genome wide association study (GWAS) data from a case-control approach on 453 extremely obese children and adolescents and 435 lean adult controls were used for GSEA. For independent confirmation, we analyzed 705 obesity GWAS trios (extremely obese child and both biological parents) and a population-based GWAS sample (KORA F4, n = 1,743). A meta-analysis was performed on all three samples. In each sample, the distribution of significance levels between the respective gene set and those of all genes was compared using the leading-edge-fraction-comparison test (cut-offs between the 50th and 95th percentile of the set of all gene-wise corrected p-values) as implemented in the MAGENTA software. In the case-control sample, significant enrichment of associations with obesity was observed above the 50th percentile for the set of the 16 nuclear regulators of mitochondrial genes (p(GSEA,50) = 0.0103). This finding was not confirmed in the trios (p(GSEA,50) = 0.5991), but in KORA (p(GSEA,50) = 0.0398). The meta-analysis again indicated a trend for enrichment (p(MAGENTA,50) = 0.1052, p(MAGENTA,75) = 0.0251). The GSEA revealed that weak association signals for obesity might be enriched in the gene set of 16 nuclear regulators of mitochondrial genes.

  4. Gene set of nuclear-encoded mitochondrial regulators is enriched for common inherited variation in obesity.

    Science.gov (United States)

    Knoll, Nadja; Jarick, Ivonne; Volckmar, Anna-Lena; Klingenspor, Martin; Illig, Thomas; Grallert, Harald; Gieger, Christian; Wichmann, Heinz-Erich; Peters, Annette; Hebebrand, Johannes; Scherag, André; Hinney, Anke

    2013-01-01

    There are hints of an altered mitochondrial function in obesity. Nuclear-encoded genes are relevant for mitochondrial function (3 gene sets of known relevant pathways: (1) 16 nuclear regulators of mitochondrial genes, (2) 91 genes for oxidative phosphorylation and (3) 966 nuclear-encoded mitochondrial genes). Gene set enrichment analysis (GSEA) showed no association with type 2 diabetes mellitus in these gene sets. Here we performed a GSEA for the same gene sets for obesity. Genome wide association study (GWAS) data from a case-control approach on 453 extremely obese children and adolescents and 435 lean adult controls were used for GSEA. For independent confirmation, we analyzed 705 obesity GWAS trios (extremely obese child and both biological parents) and a population-based GWAS sample (KORA F4, n = 1,743). A meta-analysis was performed on all three samples. In each sample, the distribution of significance levels between the respective gene set and those of all genes was compared using the leading-edge-fraction-comparison test (cut-offs between the 50th and 95th percentile of the set of all gene-wise corrected p-values) as implemented in the MAGENTA software. In the case-control sample, significant enrichment of associations with obesity was observed above the 50th percentile for the set of the 16 nuclear regulators of mitochondrial genes (p(GSEA,50) = 0.0103). This finding was not confirmed in the trios (p(GSEA,50) = 0.5991), but in KORA (p(GSEA,50) = 0.0398). The meta-analysis again indicated a trend for enrichment (p(MAGENTA,50) = 0.1052, p(MAGENTA,75) = 0.0251). The GSEA revealed that weak association signals for obesity might be enriched in the gene set of 16 nuclear regulators of mitochondrial genes.

  5. A Bayesian variable selection procedure to rank overlapping gene sets

    Directory of Open Access Journals (Sweden)

    Skarman Axel

    2012-05-01

    Full Text Available Abstract Background Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations. Conclusions Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize.

  6. Gene-Set Local Hierarchical Clustering (GSLHC)--A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups.

    Directory of Open Access Journals (Sweden)

    Feng-Hsiang Chung

    Full Text Available Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights in the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases.

  7. Comparative genomic analysis of eutherian kallikrein genes

    Directory of Open Access Journals (Sweden)

    Marko Premzl

    2017-03-01

    Full Text Available The present study attempted to update and revise the eutherian kallikrein genes implicated in major physiological and pathological processes and in medical molecular diagnostics. Using the eutherian comparative genomic analysis protocol and freely available genomic sequence assemblies, the tests of reliability of eutherian public genomic sequences annotated the most comprehensive curated third-party gene data set of eutherian kallikrein genes, including 121 complete coding sequences among 335 potential coding sequences. The present analysis first described 13 major gene clusters of eutherian kallikrein genes, and explained their differential gene expansion patterns. An updated classification and nomenclature of eutherian kallikrein genes was proposed as a new framework for future experiments.

  8. goSTAG: gene ontology subtrees to tag and annotate genes within a set.

    Science.gov (United States)

    Bennett, Brian D; Bushel, Pierre R

    2017-01-01

    Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories. We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. goSTAG converts gene lists from genomic analyses into biological themes
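
    The over-representation analysis (ORA) step mentioned in this record is commonly computed as an upper-tail hypergeometric test. The following is a minimal sketch of that general test only, not goSTAG's own implementation; the function name `ora_p_value` and its parameters are hypothetical and chosen for illustration.

```python
from math import comb


def ora_p_value(universe_size, category_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of annotated genes in the background
    category_size: background genes annotated to the category (e.g. a GO term)
    list_size:     number of genes in the submitted gene list
    overlap:       genes in the list that are annotated to the category
    """
    max_overlap = min(category_size, list_size)
    total = comb(universe_size, list_size)
    # Sum the probabilities of observing `overlap` or more category genes.
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (assumed, not from the abstract): 10 background genes,
# 4 in the category, 3 in the list, 2 of them in the category.
p = ora_p_value(10, 4, 3, 2)
```

    In practice such p-values are computed for every category and then corrected for multiple testing before any clustering of terms.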

  9. Genome-wide transcriptional analysis of grapevine berry ripening reveals a set of genes similarly modulated during three seasons and the occurrence of an oxidative burst at véraison

    Directory of Open Access Journals (Sweden)

    Dal Ri Antonio

    2007-11-01

    Full Text Available Abstract Background Grapevine (Vitis species) is among the most important fruit crops in terms of cultivated area and economic impact. Despite this relevance, little is known about the transcriptional changes and the regulatory circuits underlying the biochemical and physical changes occurring during berry development. Results Fruit ripening in the non-climacteric crop species Vitis vinifera L. has been investigated at the transcriptional level by the use of the Affymetrix Vitis GeneChip® which contains approximately 14,500 unigenes. Gene expression data obtained from berries sampled before and after véraison in three growing years, were analyzed to identify genes specifically involved in fruit ripening and to investigate seasonal influences on the process. From these analyses a core set of 1477 genes was found which was similarly modulated in all seasons. We were able to separate ripening specific isoforms within gene families and to identify ripening related genes which appeared strongly regulated also by the seasonal weather conditions. Transcripts annotation by Gene Ontology vocabulary revealed five overrepresented functional categories of which cell wall organization and biogenesis, carbohydrate and secondary metabolisms and stress response were specifically induced during the ripening phase, while photosynthesis was strongly repressed. About 19% of the core gene set was characterized by genes involved in regulatory processes, such as transcription factors and transcripts related to hormonal metabolism and signal transduction. Auxin, ethylene and light emerged as the main stimuli influencing berry development. In addition, an oxidative burst, previously not detected in grapevine, characterized by rapid accumulation of H2O2 starting from véraison and by the modulation of many ROS scavenging enzymes, was observed. Conclusion The time-course gene expression analysis of grapevine berry development has identified the occurrence of two well

  10. Competence Set Analysis Under Risk and Uncertainty

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Competence set analysis can be applied to solve decision-making problems successfully and satisfactorily. This paper focuses on expansion strategies for the competence set under risk and uncertainty. A systematic formulation of competence set analysis is described, several expansion principles and strategies for different cases are presented, and their applications in a personnel training program are discussed. Some conclusions and suggestions for further work are included.

  11. A Bayesian variable selection procedure for ranking overlapping gene sets

    DEFF Research Database (Denmark)

    Skarman, Axel; Mahdi Shariati, Mohammad; Janss, Luc

    2012-01-01

    described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian...... variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our...... data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability...

  12. GeneBrowser 2: an application to explore and identify common biological traits in a set of genes

    Directory of Open Access Journals (Sweden)

    Oliveira José

    2010-07-01

    Full Text Available Abstract Background The development of high-throughput laboratory techniques created a demand for computer-assisted result analysis tools. Many of these techniques return lists of genes whose interpretation requires finding relevant biological roles for the problem at hand. The required information is typically available in public databases, and usually, this information must be manually retrieved to complement the analysis. This process is a very time-consuming task that should be automated as much as possible. Results GeneBrowser is a web-based tool that, for a given list of genes, combines data from several public databases with visualisation and analysis methods to help identify the most relevant and common biological characteristics. The functionalities provided include the following: a central point with the most relevant biological information for each inserted gene; a list of the most related papers in PubMed and gene expression studies in ArrayExpress; and an extended approach to functional analysis applied to Gene Ontology, homologies, gene chromosomal localisation and pathways. Conclusions GeneBrowser provides a unique entry point to several visualisation and analysis methods, providing fast and easy analysis of a set of genes. GeneBrowser fills the gap between Web portals that analyse one gene at a time and functional analysis tools that are limited in scope and usually desktop-based.

  13. Genome-wide association analysis for heat tolerance at flowering detected a large set of genes involved in adaptation to thermal and other stresses

    Science.gov (United States)

    Lafarge, Tanguy; Bueno, Crisanta; Frouin, Julien; Jacquin, Laval; Courtois, Brigitte; Ahmadi, Nourollah

    2017-01-01

    Fertilization sensitivity to heat in rice is a major issue within climate change scenarios in the tropics. A panel of 167 indica landraces and improved varieties was phenotyped for spikelet sterility (SPKST) under 38°C during anthesis and for several secondary traits potentially affecting panicle micro-climate and thus the fertilization process. The panel was genotyped with an average density of one marker per 29 kb using genotyping by sequencing. Genome-wide association analyses (GWAS) were conducted using three methods based on single marker regression, haplotype regression and simultaneous fitting of all markers, respectively. Fourteen loci significantly associated with SPKST under at least two GWAS methods were detected. A large number of associations was also detected for the secondary traits. Analysis of co-localization of SPKST associated loci with QTLs detected in progenies of bi-parental crosses reported in the literature made it possible to narrow down the position of eight of those QTLs, including the most documented one, qHTSF4.1. Gene families underlying loci associated with SPKST corresponded to functions ranging from sensing abiotic stresses and regulating plant response, such as wall-associated kinases and heat shock proteins, to cell division and gametophyte development. Analysis of diversity in the vicinity of loci associated with SPKST within the rice three thousand genomes revealed widespread distribution of the favourable alleles across O. sativa genetic groups. However, few accessions assembled the favourable alleles at all loci. Effective donors included the heat tolerant variety N22 and some Indian and Taiwanese varieties. These results provide a basis for breeding for heat tolerance during anthesis and for functional validation of major loci governing this trait. PMID:28152098

  14. Reduced retinal microvascular density, improved forepaw reach, comparative microarray and gene set enrichment analysis with c-jun targeting DNA enzyme.

    Directory of Open Access Journals (Sweden)

    Cecilia W S Chan

    Full Text Available Retinal neovascularization is a critical component in the pathogenesis of common ocular disorders that cause blindness, and treatment options are limited. We evaluated the therapeutic effect of a DNA enzyme targeting c-jun mRNA in mice with pre-existing retinal neovascularization. A single injection of Dz13 in a lipid formulation containing N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methyl-sulfate and 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine inhibited c-Jun expression and reduced retinal microvascular density. The DNAzyme inhibited retinal microvascular density as effectively as VEGF-A antibodies. Comparative microarray and gene expression analysis determined that Dz13 suppressed not only c-jun but a range of growth factors and matrix-degrading enzymes. Dz13 in this formulation inhibited microvascular endothelial cell proliferation, migration and tubule formation in vitro. Moreover, animals treated with Dz13 sensed the top of the cage in a modified forepaw reach model, unlike mice given a DNAzyme with scrambled RNA-binding arms that did not affect c-Jun expression. These findings demonstrate reduction of microvascular density and improvement in forepaw reach in mice administered catalytic DNA.

  15. Genes2GO: A web application for querying gene sets for specific GO terms.

    Science.gov (United States)

    Chawla, Konika; Kuiper, Martin

    2016-01-01

    Gene ontology annotations have become an essential resource for biological interpretations of experimental findings. The process of gathering basic annotation information in tables that link gene sets with specific gene ontology terms can be cumbersome, in particular if it requires above average computer skills or bioinformatics expertise. We have therefore developed Genes2GO, an intuitive R-based web application. Genes2GO uses the biomaRt package of Bioconductor in order to retrieve custom sets of gene ontology annotations for any list of genes from organisms covered by the Ensembl database. Genes2GO produces a binary matrix file, indicating for each gene the presence or absence of specific annotations for a gene. It should be noted that other GO tools do not offer this user-friendly access to annotations. Genes2GO is freely available and listed under http://www.semantic-systems-biology.org/tools/externaltools/.
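
    The binary matrix described in this record (one row per gene, one column per GO term, presence or absence as 1 or 0) can be illustrated with a short sketch. This is not Genes2GO's code, which is R-based; the function name `annotation_matrix`, the gene names and the GO identifiers below are placeholders chosen for illustration and do not represent real annotations.

```python
def annotation_matrix(gene_to_terms, terms):
    """Return {gene: [0/1, ...]} with one column per entry of `terms`."""
    return {
        gene: [1 if term in annotated else 0 for term in terms]
        for gene, annotated in gene_to_terms.items()
    }


# Placeholder input (assumed): genes mapped to the GO terms annotated to them.
toy_annotations = {
    "geneA": {"GO:0000001", "GO:0000002"},
    "geneB": {"GO:0000002"},
    "geneC": set(),
}
query_terms = ["GO:0000001", "GO:0000002", "GO:0000003"]

matrix = annotation_matrix(toy_annotations, query_terms)

# Print a tab-separated table, one gene per row.
print("\t".join(["gene"] + query_terms))
for gene, row in matrix.items():
    print("\t".join([gene] + [str(value) for value in row]))
```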

  16. A reference gene set for chemosensory receptor genes of Manduca sexta.

    Science.gov (United States)

    Koenig, Christopher; Hirsh, Ariana; Bucks, Sascha; Klinner, Christian; Vogel, Heiko; Shukla, Aditi; Mansfield, Jennifer H; Morton, Brian; Hansson, Bill S; Grosse-Wilde, Ewald

    2015-11-01

    The order of Lepidoptera has historically been crucial for chemosensory research, with many important advances coming from the analysis of species like Bombyx mori or the tobacco hornworm, Manduca sexta. Specifically M. sexta has long been a major model species in the field, especially regarding the importance of olfaction in an ecological context, mainly the interaction with its host plants. In recent years transcriptomic data has led to the discovery of members of all major chemosensory receptor families in the species, but the data was fragmentary and incomplete. Here we present the analysis of the newly available high-quality genome data for the species, supplemented by additional transcriptome data to generate a high quality reference gene set for the three major chemosensory receptor gene families, the gustatory (GR), olfactory (OR) and antennal ionotropic receptors (IR). Coupled with gene expression analysis our approach allows association of specific receptor types and behaviors, like pheromone and host detection. The dataset will provide valuable support for future analysis of these essential chemosensory modalities in this species and in Lepidoptera in general.

  17. Transcriptome analysis of acetic-acid-treated yeast cells identifies a large set of genes whose overexpression or deletion enhances acetic acid tolerance.

    Science.gov (United States)

    Lee, Yeji; Nasution, Olviyani; Choi, Eunyong; Choi, In-Geol; Kim, Wankee; Choi, Wonja

    2015-08-01

    Acetic acid inhibits the metabolic activities of Saccharomyces cerevisiae. Therefore, a better understanding of how S. cerevisiae cells acquire the tolerance to acetic acid is of importance to develop robust yeast strains to be used in industry. To do this, we examined the transcriptional changes that occur at 12 h post-exposure to acetic acid, revealing that 56 and 58 genes were upregulated and downregulated, respectively. Functional categorization of them revealed that 22 protein synthesis genes and 14 stress response genes constituted the largest portion of the upregulated and downregulated genes, respectively. To evaluate the association of the regulated genes with acetic acid tolerance, 3 upregulated genes (DBP2, ASC1, and GND1) were selected among 34 non-protein synthesis genes, and 54 viable mutants individually deleted for the downregulated genes were retrieved from the non-essential haploid deletion library. Strains overexpressing ASC1 and GND1 displayed enhanced tolerance to acetic acid, whereas a strain overexpressing DBP2 was sensitive. Fifty of 54 deletion mutants displayed enhanced acetic acid tolerance. Three chosen deletion mutants (hsps82Δ, ato2Δ, and ssa3Δ) were also tolerant to benzoic acid but not propionic and sorbic acids. Moreover, all those five (two overexpressing and three deleted) strains were more efficient in proton efflux and lower in membrane permeability and internal hydrogen peroxide content than controls. Individually or in combination, those physiological changes are likely to contribute at least in part to enhanced acetic acid tolerance. Overall, information of our transcriptional profile was very useful to identify molecular factors associated with acetic acid tolerance.

  18. Identifying the genetic variation of gene expression using gene sets: application of novel gene Set eQTL approach to PharmGKB and KEGG.

    Directory of Open Access Journals (Sweden)

    Ryan Abo

    Full Text Available Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Consideration of different approaches to account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set - expression quantitative trait loci analysis (GS-eQTL). The method uses GSs to initially group SNPs and mRNA expression, followed by the application of principal components analysis (PCA) to collapse the variation and reduce the dimensionality within the GSs. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system using PharmGKB and KEGG defined GSs. We observed a large number of significant GS-eQTL associations, in which the most significant associations arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations.

  19. Identification of a core set of genes that signifies pathways underlying cardiac hypertrophy

    DEFF Research Database (Denmark)

    Strom, C.C.; Kruhoffer, M.; Knudsen, Steen

    2004-01-01

    Although the molecular signals underlying cardiac hypertrophy have been the subject of intense investigation, the extent of common and distinct gene regulation between different forms of cardiac hypertrophy remains unclear. We hypothesized that a general and comparative analysis of hypertrophic...... gene expression, using microarray technology in multiple models of cardiac hypertrophy, including aortic banding, myocardial infarction, an arteriovenous shunt and pharmacologically induced hypertrophy, would uncover networks of conserved hypertrophy-specific genes and identify novel genes involved...... in hypertrophic signalling. From gene expression analyses (8740 probe sets, n = 46) of rat ventricular RNA, we identified a core set of 139 genes with consistent differential expression in all hypertrophy models as compared to their controls, including 78 genes not previously associated with hypertrophy and 61...

  20. Identification of a core set of genes that signifies pathways underlying cardiac hypertrophy

    DEFF Research Database (Denmark)

    Strøm, Claes C; Kruhøffer, Mogens; Knudsen, Steen

    2004-01-01

    Although the molecular signals underlying cardiac hypertrophy have been the subject of intense investigation, the extent of common and distinct gene regulation between different forms of cardiac hypertrophy remains unclear. We hypothesized that a general and comparative analysis of hypertrophic...... gene expression, using microarray technology in multiple models of cardiac hypertrophy, including aortic banding, myocardial infarction, an arteriovenous shunt and pharmacologically induced hypertrophy, would uncover networks of conserved hypertrophy-specific genes and identify novel genes involved...... in hypertrophic signalling. From gene expression analyses (8740 probe sets, n = 46) of rat ventricular RNA, we identified a core set of 139 genes with consistent differential expression in all hypertrophy models as compared to their controls, including 78 genes not previously associated with hypertrophy and 61...

  1. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    cells are capable of regulating their gene expression, so that each cell can only express a particular set of genes yielding limited numbers of proteins with specialized functions. Therefore a rigid control of differential gene expression is necessary for cellular diversity. On the other hand, aberrant...... gene regulation will disrupt the cell’s fundamental processes, which in turn can cause disease. Hence, understanding gene regulation is essential for deciphering the code of life. Along with the development of high throughput sequencing (HTS) technology and the subsequent large-scale data analysis......, genome-wide assays have increased our understanding of gene regulation significantly. This thesis describes the integration and analysis of HTS data across different important aspects of gene regulation. Gene expression can be regulated at different stages when the genetic information is passed from gene...

  2. Time series analysis of benzo[a]pyrene-induced transcriptome changes suggests that a network of transcription factors regulates the effects on functional gene sets

    NARCIS (Netherlands)

    Delft, J.H.M. van; Mathijs, K.; Staal, Y.C.M.; Herwijnen, M.H.M. van; Brauers, K.J.J.; Boorsma, A.; Kleinjans, J.C.S.

    2010-01-01

    Chemical carcinogens may cause a multitude of effects inside cells, thereby affecting transcript levels of genes by direct activation of transcription factors (TF) or indirectly through the formation of DNA damage. As the temporal profiles of these responses may be profoundly different, examining ti

  3. Regularized Multiple-Set Canonical Correlation Analysis

    Science.gov (United States)

    Takane, Yoshio; Hwang, Heungsun; Abdi, Herve

    2008-01-01

    Multiple-set canonical correlation analysis (Generalized CANO or GCANO for short) is an important technique because it subsumes a number of interesting multivariate data analysis techniques as special cases. More recently, it has also been recognized as an important technique for integrating information from multiple sources. In this paper, we…

  4. ROUGH SET BASED CLUSTERING OF GENE EXPRESSION DATA: A SURVEY

    Directory of Open Access Journals (Sweden)

    J.JEBA EMILYN

    2010-12-01

    Full Text Available Microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. But the high dimensionality of gene expression data makes it difficult to analyze. Many clustering algorithms are available. In this paper we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. Then we introduce rough clustering and explore its advantage over strict and fuzzy clustering. We also explain why rough clustering is preferred over other conventional methods by presenting a survey of a few clustering algorithms based on rough set theory for gene expression data. We conclude by stating that this area is a promising research field for the research community.

  5. Multi-edge gene set networks reveal novel insights into global relationships between biological themes.

    Directory of Open Access Journals (Sweden)

    Jignesh R Parikh

    Full Text Available Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.

  6. Multi-edge gene set networks reveal novel insights into global relationships between biological themes.

    Science.gov (United States)

    Parikh, Jignesh R; Xia, Yu; Marto, Jarrod A

    2012-01-01

    Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of the approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.

  7. Textrous!: extracting semantic textual meaning from gene sets.

    Directory of Open Access Journals (Sweden)

    Hongyu Chen

    Full Text Available The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily appreciable scientific textual 'tokens' from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.

  8. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    Directory of Open Access Journals (Sweden)

    Hettne Kristina M

    2013-01-01

    Full Text Available Abstract Background Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values Results Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.

  9. A brain region-specific predictive gene map for autism derived by profiling a reference gene set.

    Directory of Open Access Journals (Sweden)

    Ajay Kumar

    Full Text Available Molecular underpinnings of complex psychiatric disorders such as autism spectrum disorders (ASD) remain largely unresolved. Increasingly, structural variations in discrete chromosomal loci are implicated in ASD, expanding the search space for its disease etiology. We exploited the high genetic heterogeneity of ASD to derive a predictive map of candidate genes by an integrated bioinformatics approach. Using a reference set of 84 Rare and Syndromic candidate ASD genes (AutRef84), we built a composite reference profile based on both functional and expression analyses. First, we created a functional profile of AutRef84 by performing Gene Ontology (GO) enrichment analysis which encompassed three main areas: 1) neurogenesis/projection, 2) cell adhesion, and 3) ion channel activity. Second, we constructed an expression profile of AutRef84 by conducting DAVID analysis which found enrichment in brain regions critical for sensory information processing (olfactory bulb, occipital lobe), executive function (prefrontal cortex), and hormone secretion (pituitary). Disease specificity of this dual AutRef84 profile was demonstrated by comparative analysis with control, diabetes, and non-specific gene sets. We then screened the human genome with the dual AutRef84 profile to derive a set of 460 potential ASD candidate genes. Importantly, the power of our predictive gene map was demonstrated by capturing 18 existing ASD-associated genes which were not part of the AutRef84 input dataset. The remaining 442 genes are entirely novel putative ASD risk genes. Together, we used a composite ASD reference profile to generate a predictive map of novel ASD candidate genes which should be prioritized for future research.

  10. Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes

    NARCIS (Netherlands)

    Jafrezic, F.; Koning, de D.J.; Boettcher, P.; Bonnet, A.; Buitenhuis, B.; Closset, R.; Dejean, S.; Delmas, C.; Detilleux, J.C.; Dovc, P.; Duval, M.; Foulley, J.L.; Hedegaard, J.; Hoprnshoj, H.; Hulsegge, B.; Janss, L.; Jensen, K.; Jiang, L.; Lavric, M.; Cao Le, K.A.; Lund, M.S.; Malinverni, R.; Marot, G.; Nie, H.; Petzl, W.; Pool, M.H.; Robert-Granie, C.; Cristobal, M.; Schothorst, van E.M.; Schuberth, H.J.; Sorensen, P.; Stella, A.; Tosser-klopp, G.; Waddington, D.; Watson, M.; Yang, M.; Zerbe, H.; Seyfert, H.M.

    2007-01-01

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods

  11. Allele diversity for abiotic stress responsive candidate genes in chickpea reference set using gene based SNP markers

    Directory of Open Access Journals (Sweden)

    Manish Roorkiwal

    2014-06-01

    Full Text Available Chickpea is an important food legume crop for the semi-arid regions; however, its productivity is adversely affected by various biotic and abiotic stresses. Identification of candidate genes associated with abiotic stress response will help breeding efforts aiming to enhance its productivity. With this objective, 10 abiotic stress responsive candidate genes were selected on the basis of prior knowledge of this complex trait. These 10 genes were subjected to allele specific sequencing across a chickpea reference set comprising 300 genotypes including 211 accessions of chickpea mini core collection. A total of 1.3 Mbp sequence data were generated. Multiple sequence alignment revealed 79 SNPs and 41 indels in nine genes while the CAP2 gene was found to be conserved across all the genotypes. Among ten candidate genes, the maximum number of SNPs (34) was observed in abscisic acid stress and ripening (ASR) gene including 22 transitions, 11 transversions and one tri-allelic SNP. Nucleotide diversity varied from 0.0004 to 0.0029 while PIC values ranged from 0.01 (AKIN gene) to 0.43 (CAP2 promoter). Haplotype analysis revealed that alleles were represented by more than two haplotype blocks, except alleles of the CAP2 and sucrose synthase (SuSy) gene, where only one haplotype was identified. These genes can be used for association analysis and if validated, may be useful for enhancing abiotic stress, including drought tolerance, through molecular breeding.

  12. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth

    Science.gov (United States)

    Chaillou, Thomas; Jackson, Janna R.; England, Jonathan H.; Kirby, Tyler J.; Richards-White, Jena; Esser, Karyn A.; Dupont-Versteegden, Esther E.

    2014-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. PMID:25554798

  13. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth.

    Science.gov (United States)

    Chaillou, Thomas; Jackson, Janna R; England, Jonathan H; Kirby, Tyler J; Richards-White, Jena; Esser, Karyn A; Dupont-Versteegden, Esther E; McCarthy, John J

    2015-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. Copyright © 2015 the American Physiological Society.
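
The selection rule quoted in the abstract (a ≥2-fold increase or ≥50% decrease relative to control, conserved across both growth models) can be sketched as follows. This is an illustrative reading, not the authors' pipeline: the function names, the gene labels, and the expression values are hypothetical, the values are assumed to be on a linear (not log) scale, and "conserved" is assumed to mean the same direction of change in both models.

```python
def call_change(treated, control):
    """Classify one gene by its treated/control expression ratio (linear scale)."""
    ratio = treated / control
    if ratio >= 2.0:   # at least a 2-fold increase
        return "up"
    if ratio <= 0.5:   # at least a 50% decrease
        return "down"
    return "unchanged"


def conserved_changes(model_a, model_b, control):
    """Genes called in the same direction in both growth models (assumed meaning)."""
    conserved = {}
    for gene in model_a.keys() & model_b.keys() & control.keys():
        call_a = call_change(model_a[gene], control[gene])
        call_b = call_change(model_b[gene], control[gene])
        if call_a == call_b and call_a != "unchanged":
            conserved[gene] = call_a
    return conserved


# Hypothetical expression values for three made-up genes.
control = {"g1": 100.0, "g2": 100.0, "g3": 100.0}
hypertrophy = {"g1": 220.0, "g2": 40.0, "g3": 130.0}
regrowth = {"g1": 310.0, "g2": 120.0, "g3": 45.0}

conserved = conserved_changes(hypertrophy, regrowth, control)
```

Only `g1` passes here: `g2` and `g3` each cross a threshold in one model but not in the other.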

  14. Gene set analyses of genome-wide association studies on 49 quantitative traits measured in a single genetic epidemiology dataset.

    Science.gov (United States)

    Kim, Jihye; Kwon, Ji-Sun; Kim, Sangsoo

    2013-09-01

    Gene set analysis is a powerful tool for interpreting a genome-wide association study result and is gaining popularity these days. Comparison of the gene sets obtained for a variety of traits measured from a single genetic epidemiology dataset may give insights into the biological mechanisms underlying these traits. Based on the previously published single nucleotide polymorphism (SNP) genotype data on 8,842 individuals enrolled in the Korea Association Resource project, we performed a series of systematic genome-wide association analyses for 49 quantitative traits of basic epidemiological, anthropometric, or blood chemistry parameters. Each analysis result was subjected to subsequent gene set analyses based on Gene Ontology (GO) terms using gene set analysis software, GSA-SNP, identifying a set of GO terms significantly associated to each trait (pcorr neuronal or nerve systems.

  15. Applying gene set enrichment analysis and meta-analysis to screen key genes controlling the development and progression of hepatic carcinoma

    Institute of Scientific and Technical Information of China (English)

    曹骥; 卢晓旭; 胡艳玲; 李瑗; 朱伶群; 杨春; 欧超; 唐艳萍

    2012-01-01

    AIM: To analyze vast amounts of hepatic carcinoma-related microarray data and identify crucial genes that control the development and progression of hepatocellular carcinoma (HCC). METHODS: Cross-species comparison can be used to explore the similarities between HCC-related gene expression profiles of human beings and other species. In order to screen genes that are involved in hepatocarcinogenesis, gene set enrichment analysis (GSEA) and meta-analysis were performed to study five gene expression data sets of independent species. RESULTS: Among the five gene expression data sets, three up-regulated and two down-regulated pathways were found to be consistent by gene set enrichment analysis. The up-regulated pathways are amino sugar and nucleotide sugar metabolism, cell cycle, and thyroid cancer, while the down-regulated pathways are linoleic acid metabolism and arachidonic acid metabolism. A total of 1,708 genes with P < 0.05 were found in the meta-analysis of the five datasets, of which 720 could be assigned to functional pathways by DAVID and KEGG. These pathways include cell cycle, oocyte meiosis, and DNA replication. Cell cycle is the overlapping significant pathway between the two methods. Twenty-five genes with P < 0.05 were identified in the meta-analysis of the cell cycle pathway. Five significant genes may be involved in the occurrence and progression of HCC. CONCLUSION: Cell cycle may be the crucial pathway affecting signal transduction in hepatocarcinogenesis.

  16. Algorithm for Finding Optimal Gene Sets in Microarray Prediction

    CERN Document Server

    Deutsch, J M

    2001-01-01

    Motivation: Microarray data has recently been shown to be efficacious in distinguishing closely related cell types that often appear in the diagnosis of cancer. It is useful to determine the minimum number of genes needed to do such a diagnosis both for clinical use and to determine the importance of specific genes for cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, all using different combinations of genes to generate a set of optimal predictors. Results: We apply this method to the leukemia data of the Whitehead/MIT group that attempts to differentially diagnose two kinds of leukemia, and also to data of Khan et al. to distinguish four different kinds of childhood cancers. In the latter case we were able to reduce the number of genes needed from 96 down to 15, while at the same time being able to perfectly classify all of their test data. Availability: http://stravinsky.ucsc.edu/josh/gesses/ Contact: josh@physics.ucsc.edu

  17. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Science.gov (United States)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  18. Core set approach to reduce uncertainty of gene trees

    Directory of Open Access Journals (Sweden)

    Okuhara Yoshiyasu

    2006-05-01

    Full Text Available Abstract Background A genealogy based on gene sequences within a species plays an essential role in the estimation of the character, structure, and evolutionary history of that species. Because intraspecific sequences are more closely related than interspecific ones, detailed information on the evolutionary process may be available by determining all the node sequences of trees and may provide insight into functional constraints and adaptations. However, strong evolutionary correlations on a few lineages make this determination difficult as a whole, and the maximum parsimony (MP) method frequently allows a number of topologies with the same total branching length. Results Kitazoe et al. developed a multidimensional vector-space representation of phylogeny. It converts additivity of evolutionary distances to orthogonality among the vectors expressing branches, and provides a unified index to measure deviations from the orthogonality. In this paper, this index is used to detect and exclude sequences with large deviations from orthogonality, and then to select a maximum subset ("core set") of sequences for which MP generates a single solution. Once the core set tree, all of whose node sequences are given, is formed, the excluded sequences are found to have basically two phylogenetic positions on this tree, respectively. Fortunately, since multiple substitutions are rare in intra-species sequences, the variance of nucleotide transitions is confined to a small range. By applying the core set approach to 38 partial env sequences of HIV-1 in a single patient and also 198 mitochondrial COI and COII DNA sequences of Anopheles dirus, we demonstrate how consistently this approach constructs the tree. Conclusion In the HIV dataset, we confirmed that the obtained core set tree is the unique maximum set for which MP proposes a single tree. In the mosquito data set, the fluctuation of nucleotide transitions caused by the sequences excluded from the core set was very small

  19. ChIP-Enrich: gene set enrichment testing for ChIP-seq data.

    Science.gov (United States)

    Welch, Ryan P; Lee, Chee; Imbriano, Paul M; Patil, Snehal; Weymouth, Terry E; Smith, R Alex; Scott, Laura J; Sartor, Maureen A

    2014-07-01

    Gene set enrichment testing can enhance the biological interpretation of ChIP-seq data. Here, we develop a method, ChIP-Enrich, for this analysis which empirically adjusts for gene locus length (the length of the gene body and its surrounding non-coding sequence). Adjustment for gene locus length is necessary because it is often positively associated with the presence of one or more peaks and because many biologically defined gene sets have an excess of genes with longer or shorter gene locus lengths. Unlike alternative methods, ChIP-Enrich can account for the wide range of gene locus length-to-peak presence relationships (observed in ENCODE ChIP-seq data sets). We show that ChIP-Enrich has a well-calibrated type I error rate using permuted ENCODE ChIP-seq data sets; in contrast, two commonly used gene set enrichment methods, Fisher's exact test and the binomial test implemented in Genomic Regions Enrichment of Annotations Tool (GREAT), can have highly inflated type I error rates and biases in ranking. We identify DNA-binding proteins, including CTCF, JunD and glucocorticoid receptor α (GRα), that show different enrichment patterns for peaks closer to versus further from transcription start sites. We also identify known and potential new biological functions of GRα. ChIP-Enrich is available as a web interface (http://chip-enrich.med.umich.edu) and Bioconductor package. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
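
For orientation, the Fisher's exact test that the abstract names as a commonly used baseline can be sketched in its one-sided (hypergeometric) form. This is not ChIP-Enrich and applies no adjustment for gene locus length, which is exactly the omission the abstract says can inflate type I error; the function name, parameter names, and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_set, n_hits, n_overlap):
    """One-sided Fisher's exact p-value P(X >= n_overlap).

    n_universe: all genes considered
    n_set:      genes belonging to the gene set
    n_hits:     genes with at least one peak
    n_overlap:  genes that are both in the set and have a peak
    """
    max_overlap = min(n_set, n_hits)
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes, a set of 5, 4 genes with peaks, 3 of them in the set.
p = enrichment_p_value(n_universe=20, n_set=5, n_hits=4, n_overlap=3)
```

Because every gene is treated as equally likely to carry a peak, long gene loci that attract more peaks by chance are not accounted for in this sketch.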

  20. Different gene sets contribute to different symptom dimensions of depression and anxiety.

    Science.gov (United States)

    van Veen, Tineke; Goeman, Jelle J; Monajemi, Ramin; Wardenaar, Klaas J; Hartman, Catharina A; Snieder, Harold; Nolte, Ilja M; Penninx, Brenda W J H; Zitman, Frans G

    2012-07-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom-dimensions provide a more homogeneous phenotype. Furthermore, as effects of individual genes are small, analysis of genetic data at the pathway-level provides more power to detect associations and yield valuable biological insight. In 1,398 individuals with a Major Depressive Disorder, the symptom dimensions of the tripartite model of anxiety and depression, General Distress, Anhedonic Depression, and Anxious Arousal, were measured with the Mood and Anxiety Symptoms Questionnaire (30-item Dutch adaptation; MASQ-D30). Association of these symptom dimensions with candidate gene sets and gene sets from two public pathway databases was tested using the Global test. One pathway was associated with General Distress, and concerned molecules expressed in the endoplasmic reticulum lumen. Seven pathways were associated with Anhedonic Depression. Important themes were neurodevelopment, neurodegeneration, and cytoskeleton. Furthermore, three gene sets associated with Anxious Arousal regarded development, morphology, and genetic recombination. The individual pathways explained up to 1.7% of the variance. These data demonstrate mechanisms that influence the specific dimensions. Moreover, they show the value of using dimensional phenotypes on one hand and gene sets on the other hand.

  1. Glutamatergic and GABAergic gene sets in attention-deficit/hyperactivity disorder

    DEFF Research Database (Denmark)

    Naaijen, J; Bralten, J; Poelmans, G

    2017-01-01

    Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are main excitatory and inhibitory neurotransmitters in the brain; their balance...... within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analysis for ADHD symptom severity, divided into inattention and hyperactivity/impulsivity symptoms......, autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome...

  2. The Core Mouse Response to Infection by Neospora Caninum Defined by Gene Set Enrichment Analyses

    Science.gov (United States)

    Ellis, John; Goodswen, Stephen; Kennedy, Paul J; Bush, Stephen

    2012-01-01

    In this study, the BALB/c and Qs mouse responses to infection by the parasite Neospora caninum were investigated in order to identify host response mechanisms. Investigation was done using gene set (enrichment) analyses of microarray data. GSEA, MANOVA, Romer, subGSE and SAM-GS were used to study the contrasts Neospora strain type, Mouse type (BALB/c and Qs) and time post infection (6 hours post infection and 10 days post infection). The analyses show that the major signal in the core mouse response to infection is from time post infection and can be defined by gene ontology terms Protein Kinase Activity, Cell Proliferation and Transcription Initiation. Several terms linked to signaling, morphogenesis, response and fat metabolism were also identified. At 10 days post infection, genes associated with fatty acid metabolism were identified as up regulated in expression. The value of gene set (enrichment) analyses in the analysis of microarray data is discussed. PMID:23012496

  3. A gene pattern mining algorithm using interchangeable gene sets for prokaryotes

    Directory of Open Access Journals (Sweden)

    Kim Sun

    2008-02-01

    Full Text Available Abstract Background Mining gene patterns that are common to multiple genomes is an important biological problem, which can lead us to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms use the optimization problem formulation which is solved using the dynamic programming technique. Unfortunately, extending these algorithms to multiple genome cases is not trivial due to the rapid increase in time and space complexity. Results In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community to handle the situation where family classification information is not available using interchangeable sets. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene pattern can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiment shows that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection. Conclusion The discovered gene patterns can be used for detecting orthologs and genes that collaborate for a common biological function.
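
The bi-directional best hit (BBH) baseline mentioned in the abstract can be sketched as below. This illustrates BBH only, not the authors' gene pattern mining algorithm; the function names, gene identifiers, and similarity scores are hypothetical, and the scores are assumed to come from some pairwise sequence comparison where higher means more similar.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene."""
    return {
        query: max(targets, key=targets.get)
        for query, targets in scores.items()
        if targets
    }


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {"a1": {"b1": 250.0, "b2": 90.0}, "a2": {"b1": 120.0, "b2": 100.0}}
b_vs_a = {"b1": {"a1": 240.0, "a2": 110.0}, "b2": {"a2": 95.0, "a1": 80.0}}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here `a2` and `b2` stay unpaired because the best hit of `a2` is `b1`, which in turn prefers `a1`. Pairs that fail the reciprocity requirement in this way are one reason a method can gain recall over BBH.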

  4. Filtering Genes for Cluster and Network Analysis

    Directory of Open Access Journals (Sweden)

    Parkhomenko Elena

    2009-06-01

    Full Text Available Abstract Background Prior to cluster analysis or genetic network analysis it is customary to filter, or remove genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
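
The customary filter described in the Background (dropping genes whose variation across samples falls below an arbitrary threshold) can be sketched as follows. This is the simple baseline, not the SUMCOV method proposed in the paper; the function name, gene labels, expression values, and threshold are hypothetical.

```python
from statistics import variance


def filter_by_variance(expression, threshold):
    """Keep genes whose sample variance across samples is at least `threshold`."""
    return {
        gene: values
        for gene, values in expression.items()
        if variance(values) >= threshold
    }


# Hypothetical expression values: one row per gene, one value per sample.
expression = {
    "geneA": [1.0, 1.1, 0.9, 1.0],   # nearly constant
    "geneB": [0.2, 2.5, 4.1, 0.3],   # strongly varying
    "geneC": [5.0, 5.0, 5.0, 5.0],   # constant
}

kept = filter_by_variance(expression, threshold=0.5)
```

The threshold is arbitrary, as the abstract notes, and moving it changes which genes reach the downstream cluster or network analysis.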

  5. Music analysis and point-set compression

    DEFF Research Database (Denmark)

    Meredith, David

    A musical analysis represents a particular way of understanding certain aspects of the structure of a piece of music. The quality of an analysis can be evaluated to some extent by the degree to which knowledge of it improves performance on tasks such as mistake spotting, memorising a piece...... as the minimum description length principle and relates closely to certain ideas in the theory of Kolmogorov complexity. Inspired by this general principle, the hypothesis explored in this paper is that the best ways of understanding (or explanations for) a piece of music are those that are represented...... by the shortest possible descriptions of the piece. With this in mind, two compression algorithms are presented, COSIATEC and SIATECCompress. Each of these algorithms takes as input an in extenso description of a piece of music as a set of points in pitch-time space representing notes. Each algorithm...

  6. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets.

    Science.gov (United States)

    Khan, Aziz; Mathelier, Anthony

    2017-05-31

    A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited. To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets. Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene , with the web application available at https://asntech.shinyapps.io/intervene .
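
The kind of computation behind a pairwise intersection view can be sketched for plain gene lists as below. This is not Intervene's implementation or its command-line interface, and it does not handle genomic regions; the function name, list names, and gene lists are hypothetical, and the Jaccard index is an assumed choice of similarity measure.

```python
from itertools import combinations


def pairwise_intersections(named_sets):
    """Return {(name_a, name_b): (shared_count, jaccard_index)} for every pair."""
    result = {}
    for name_a, name_b in combinations(sorted(named_sets), 2):
        set_a, set_b = named_sets[name_a], named_sets[name_b]
        shared = set_a & set_b
        union = set_a | set_b
        jaccard = len(shared) / len(union) if union else 0.0
        result[(name_a, name_b)] = (len(shared), jaccard)
    return result


# Hypothetical gene lists from three experiments.
gene_lists = {
    "listA": {"TP53", "BRCA1", "MYC"},
    "listB": {"MYC", "EGFR", "TP53"},
    "listC": {"KRAS"},
}

overlaps = pairwise_intersections(gene_lists)
```

A matrix of such pairwise values is what a clustered heat map of set overlaps would be drawn from.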

  7. Analysis of the real EADGENE data set: Multivariate approaches and post analysis (Open Access publication)

    Directory of Open Access Journals (Sweden)

    Schuberth Hans-Joachim

    2007-11-01

    Full Text Available Abstract The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The main result from these analyses was that HC and PCA were able to separate tissue samples taken at 24 h following E. coli infection from the other samples. The second approach identified groups of differentially co-expressed genes, by identifying clusters of genes highly correlated when animals were infected with E. coli but not correlated more than expected by chance when the infective pathogen was S. aureus. The third approach looked at differential expression of predefined gene sets. Gene sets were defined based on information retrieved from biological databases such as Gene Ontology. Based on these annotation sources the teams used either the GlobalTest or the Fisher exact test to identify differentially expressed gene sets. The main result from these analyses was that gene sets involved in immune defence responses were differentially expressed.
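
    A one-sided Fisher exact test for over-representation of a predefined gene set among differentially expressed genes reduces to an upper hypergeometric tail. The sketch below shows that calculation under this assumption; the function name `fisher_enrichment_p` and the counts in the example are hypothetical and are not taken from the workshop data.

```python
from math import comb

def fisher_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided Fisher exact (hypergeometric) p-value: P(overlap >= n_overlap).

    n_universe: all genes measured
    n_in_set:   genes annotated to the gene set
    n_selected: differentially expressed genes
    n_overlap:  differentially expressed genes that are in the gene set
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of all overlaps at least as large as the observed one.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Illustrative counts: 20 genes measured, 5 in the set, 5 selected, all 5 overlap.
p_value = fisher_enrichment_p(20, 5, 5, 5)
```

    In practice the p-values from many gene sets would also need a multiple-testing correction, which this sketch leaves out.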

  8. Comprehensive set of integrative plasmid vectors for copper-inducible gene expression in Myxococcus xanthus.

    Science.gov (United States)

    Gómez-Santos, Nuria; Treuner-Lange, Anke; Moraleda-Muñoz, Aurelio; García-Bravo, Elena; García-Hernández, Raquel; Martínez-Cayuela, Marina; Pérez, Juana; Søgaard-Andersen, Lotte; Muñoz-Dorado, José

    2012-04-01

    Myxococcus xanthus is widely used as a model system for studying gliding motility, multicellular development, and cellular differentiation. Moreover, M. xanthus is a rich source of novel secondary metabolites. The analysis of these processes has been hampered by the limited set of tools for inducible gene expression. Here we report the construction of a set of plasmid vectors to allow copper-inducible gene expression in M. xanthus. Analysis of the effect of copper on strain DK1622 revealed that copper concentrations of up to 500 μM during growth and 60 μM during development do not affect physiological processes such as cell viability, motility, or aggregation into fruiting bodies. Of the copper-responsive promoters in M. xanthus reported so far, the multicopper oxidase cuoA promoter was used to construct expression vectors, because no basal expression is observed in the absence of copper and induction linearly depends on the copper concentration in the culture medium. Four different plasmid vectors have been constructed, with different marker selection genes and sites of integration in the M. xanthus chromosome. The vectors have been tested and gene expression quantified using the lacZ gene. Moreover, we demonstrate the functional complementation of the motility defect caused by lack of PilB by the copper-induced expression of the pilB gene. These versatile vectors are likely to deepen our understanding of the biology of M. xanthus and may also have biotechnological applications.

  9. Jetset: selecting the optimal microarray probe set to represent a gene

    DEFF Research Database (Denmark)

    Li, Qiyuan; Birkbak, Nicolai Juul; Gyorffy, Balazs

    2011-01-01

    Background: Interpretation of gene expression microarrays requires a mapping from probe set to gene. On many Affymetrix gene expression microarrays, a given gene may be detected by multiple probe sets, which may deliver inconsistent or even contradictory measurements. Therefore, obtaining...... an unambiguous expression estimate of a pre-specified gene can be a nontrivial but essential task. Results: We developed scoring methods to assess each probe set for specificity, splice isoform coverage, and robustness against transcript degradation. We used these scores to select a single representative probe...... set for each gene, thus creating a simple one-to-one mapping between gene and probe set. To test this method, we evaluated concordance between protein measurements and gene expression values, and between sets of genes whose expression is known to be correlated. For both test cases, we identified genes...

  10. Gene regulatory network inference using fused LASSO on multiple data sets.

    Science.gov (United States)

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M O; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-02-11

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed into a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions.
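
    For orientation, a generic fused LASSO objective over K data sets can be written as below. This is a textbook-style sketch and not the authors' exact formulation, which the abstract does not give. The symbols are assumptions: y^(k) is the expression of a target gene in data set k, X^(k) holds the expression of the transcription factor coding genes, and beta^(k) are the regulatory weights. The first penalty encourages few regulators per gene (constraint 1) and the second encourages similar networks across data sets (constraint 2); how constraint 3 enters is not shown here.

```latex
\min_{\beta^{(1)},\dots,\beta^{(K)}}
  \sum_{k=1}^{K} \bigl\| y^{(k)} - X^{(k)} \beta^{(k)} \bigr\|_2^2
  + \lambda_1 \sum_{k=1}^{K} \bigl\| \beta^{(k)} \bigr\|_1
  + \lambda_2 \sum_{1 \le k < l \le K} \bigl\| \beta^{(k)} - \beta^{(l)} \bigr\|_1,
\qquad \lambda_1, \lambda_2 \ge 0 .
```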

  11. Application of multidisciplinary analysis to gene expression.

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Xuefel (University of New Mexico, Albuquerque, NM); Kang, Huining (University of New Mexico, Albuquerque, NM); Fields, Chris (New Mexico State University, Las Cruces, NM); Cowie, Jim R. (New Mexico State University, Las Cruces, NM); Davidson, George S.; Haaland, David Michael; Sibirtsev, Valeriy (New Mexico State University, Las Cruces, NM); Mosquera-Caro, Monica P. (University of New Mexico, Albuquerque, NM); Xu, Yuexian (University of New Mexico, Albuquerque, NM); Martin, Shawn Bryan; Helman, Paul (University of New Mexico, Albuquerque, NM); Andries, Erik (University of New Mexico, Albuquerque, NM); Ar, Kerem (University of New Mexico, Albuquerque, NM); Potter, Jeffrey (University of New Mexico, Albuquerque, NM); Willman, Cheryl L. (University of New Mexico, Albuquerque, NM); Murphy, Maurice H. (University of New Mexico, Albuquerque, NM)

    2004-01-01

    Molecular analysis of cancer, at the genomic level, could lead to individualized patient diagnostics and treatments. The developments to follow will signal a significant paradigm shift in the clinical management of human cancer. Despite our initial hopes, however, it seems that simple analysis of microarray data cannot elucidate clinically significant gene functions and mechanisms. Extracting biological information from microarray data requires a complicated path involving multidisciplinary teams of biomedical researchers, computer scientists, mathematicians, statisticians, and computational linguists. The integration of the diverse outputs of each team is the limiting factor in the progress to discover candidate genes and pathways associated with the molecular biology of cancer. Specifically, one must deal with sets of significant genes identified by each method and extract whatever useful information may be found by comparing these different gene lists. Here we present our experience with such comparisons, and share methods developed in the analysis of an infant leukemia cohort studied on Affymetrix HG-U95A arrays. In particular, spatial gene clustering, hyper-dimensional projections, and computational linguistics were used to compare different gene lists. In spatial gene clustering, different gene lists are grouped together and visualized on a three-dimensional expression map, where genes with similar expressions are co-located. In another approach, projections from gene expression space onto a sphere clarify how groups of genes can jointly have more predictive power than groups of individually selected genes. Finally, online literature is automatically rearranged to present information about genes common to multiple groups, or to contrast the differences between the lists. The combination of these methods has improved our understanding of infant leukemia. 
While the complicated reality of the biology dashed our initial, optimistic hopes for simple answers from

  12. Integrating genome-wide association study and expression quantitative trait loci data identifies multiple genes and gene set associated with neuroticism.

    Science.gov (United States)

    Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng

    2017-08-01

    Neuroticism is a fundamental personality trait with a significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from a genome-wide association study (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism, involving a total of 170,906 subjects. An eQTL dataset containing 927,753 eQTLs was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted with summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism-associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single gene analysis identified 6 significant genes for neuroticism, including MSRA (p value=2.27×10^-10), MGC57346 (p value=6.92×10^-7), BLK (p value=1.01×10^-6), XKR6 (p value=1.11×10^-6), C17ORF69 (p value=1.12×10^-6) and KIAA1267 (p value=4.00×10^-6). Gene set enrichment analysis observed a significant association for the Chr8p23 gene set (false discovery rate=0.033). Our results provide novel clues for studies of the genetic mechanism of neuroticism. Copyright © 2017. Published by Elsevier Inc.

  13. Identification of a set of genes showing regionally enriched expression in the mouse brain

    Directory of Open Access Journals (Sweden)

    Marra Marco A

    2008-07-01

    Full Text Available Abstract Background The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters ( Results We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.

  14. Function analysis of unknown genes

    DEFF Research Database (Denmark)

    Rogowska-Wrzesinska, A.

    2002-01-01

      This thesis entitled "Function analysis of unknown genes" presents the use of proteome analysis for the characterisation of yeast (Saccharomyces cerevisiae) genes and their products (proteins especially those of unknown function). This study illustrates that proteome analysis can be used...... to describe different aspects of molecular biology of the cell, to study changes that occur in the cell due to overexpression or deletion of a gene and to identify various protein modifications. The biological questions and the results of the described studies show the diversity of the information that can...... genes and proteins. It reports the first global proteome database collecting 36 yeast single gene deletion mutants and selecting over 650 differences between analysed mutants and the wild type strain. The obtained results show that two-dimensional gel electrophoresis and mass spectrometry based proteome...

  15. Beyond main effects of gene-sets: harsh parenting moderates the association between a dopamine gene-set and child externalizing behavior

    NARCIS (Netherlands)

    J. Windhorst (Judith); V. Mileva-Seitz; R.C.A. Rippe (Ralph C.A.); H.W. Tiemeier (Henning); V.W.V. Jaddoe (Vincent); F.C. Verhulst (Frank); M.H. van IJzendoorn (Marinus); M.J. Bakermans-Kranenburg (Marian)

    2016-01-01

    Background: In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene-based and gene-se

  16. Beyond main effects of gene-sets: Harsh parenting moderates the association between a dopamine gene-set and child externalizing behavior

    OpenAIRE

    Windhorst, D.A.; Mileva, V.R.; Rippe, R.C.A.; Tiemeier, H; Jaddoe, V. W. V.; Verhulst, F. C.; IJzendoorn, van, M.H.; Bakermans, M.J.

    2016-01-01

    Abstract Background In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene‐based and gene‐set approaches in tests of Gene by Environment (G × E) effects on complex behavior. This approach can offer an important alternative or complement to candidate gene and genome‐wide environmental interact...

  17. Analysis of the heat setting process

    Science.gov (United States)

    Besler, N.; Gloy, Y. S.; Gries, T.

    2016-07-01

    Heat setting is an expensive and energy-intensive textile process. Heat setting is necessary to guarantee size accuracy and dimensional stability for textile materials. Depending on the material, different heat setting methods such as saturated steam or hot air are used for the fixation. The research aim is to define the influence of heat setting on mechanical characteristics and to analyse the correlation of heat setting parameters for polyester. Heat setting parameters are varied using a “one factor at a time” experimental design. Mechanical characteristics and the material quality of heat set and not heat set material are evaluated to analyse the heat setting influence. In the described experimental design, up to a temperature of 195 °C and a dwell time of 30 seconds, the material shrinkage of polyester increases with increasing temperature and dwell time. Shrinkage in wales direction is higher than in course direction. The tensile strength in course direction stays constant, whereas the tensile strength in wales direction can be increased by heat setting.

  18. Gene Selection Integrated with Biological Knowledge for Plant Stress Response Using Neighborhood System and Rough Set Theory.

    Science.gov (United States)

    Meng, Jun; Zhang, Jing; Luan, Yushi

    2015-01-01

    Mining knowledge from gene expression data is a hot research topic and direction in bioinformatics. Gene selection and sample classification are significant research trends, due to the large number of genes and small number of samples in gene expression data. Rough set theory has been successfully applied to gene selection, as it can select attributes without redundancy. To improve the interpretability of the selected genes, some researchers have introduced biological knowledge. In this paper, we first employ a neighborhood system to deal directly with the new information table formed by integrating gene expression data with biological knowledge, which can simultaneously present the information from multiple perspectives and does not weaken the information of individual genes for selection and classification. Then, we give a novel framework for gene selection and propose a significant gene selection method based on this framework by employing a reduction algorithm from rough set theory. The proposed method is applied to the analysis of plant stress response. Experimental results on three data sets show that the proposed method is effective, as it can select significant gene subsets without redundancy and achieve high classification accuracy. Biological analysis of the results shows that the interpretability is good.

  19. Human Effector / Initiator Gene Sets That Regulate Myometrial Contractility During Term and Preterm Labor

    Science.gov (United States)

    WEINER, Carl P.; MASON, Clifford W.; DONG, Yafeng; BUHIMSCHI, Irina A.; SWAAN, Peter W.; BUHIMSCHI, Catalin S.

    2010-01-01

    Objective Distinct processes govern transition from quiescence to activation during term (TL) and preterm labor (PTL). We sought gene sets responsible for TL and PTL, along with the effector genes necessary for labor independent of gestation and underlying trigger. Methods Expression was analyzed in term and preterm +/− labor (n =6 subjects/group). Gene sets were generated using logic operations. Results 34 genes were similarly expressed in PTL/TL but absent from nonlabor samples (Effector Set). 49 genes were specific to PTL (Preterm Initiator Set) and 174 to TL (Term Initiator Set). The gene ontogeny processes comprising Term Initiator and Effector Sets were diverse, though inflammation was represented in 4 of the top 10; inflammation dominated the Preterm Initiator Set. Comments TL and PTL differ dramatically in initiator profiles. Though inflammation is part of the Term Initiator and the Effector Sets, it is an overwhelming part of PTL associated with intraamniotic inflammation. PMID:20452493

  20. Evidence for intron length conservation in a set of mammalian genes associated with embryonic development

    LENUS (Irish Health Repository)

    2011-10-05

    Abstract Background We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples of genes that showed the most extreme conservation of total intron content in mammals. Results Gene sets annotated as being involved in pattern specification in the early embryo or containing the homeobox DNA-binding domain, were significantly enriched among genes with highly conserved intron content. We used ancestral sequences reconstructed with probabilistic models that account for insertion and deletion mutations to distinguish insertion and deletion events on lineages leading to human and mouse from their last common ancestor. Using a randomization procedure, we show that genes containing the homeobox domain show less change in intron content than expected, given the number of insertion and deletion events within their introns. Conclusions Our results suggest selection for gene expression precision or the existence of additional development-associated genes for which transcriptional delay is functionally significant.

  1. Evidence for intron length conservation in a set of mammalian genes associated with embryonic development

    Directory of Open Access Journals (Sweden)

    Korir Paul K

    2011-10-01

    Full Text Available Abstract Background We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples of genes that showed the most extreme conservation of total intron content in mammals. Results Gene sets annotated as being involved in pattern specification in the early embryo or containing the homeobox DNA-binding domain, were significantly enriched among genes with highly conserved intron content. We used ancestral sequences reconstructed with probabilistic models that account for insertion and deletion mutations to distinguish insertion and deletion events on lineages leading to human and mouse from their last common ancestor. Using a randomization procedure, we show that genes containing the homeobox domain show less change in intron content than expected, given the number of insertion and deletion events within their introns. Conclusions Our results suggest selection for gene expression precision or the existence of additional development-associated genes for which transcriptional delay is functionally significant.

  2. Chromatin analysis of occluded genes

    Science.gov (United States)

    Lee, Jae Hyun; Gaetz, Jedidiah; Bugarija, Branimir; Fernandes, Croydon J.; Snyder, Gregory E.; Bush, Eliot C.; Lahn, Bruce T.

    2009-01-01

    We recently described two opposing states of transcriptional competency. One is termed ‘competent’ whereby a gene is capable of responding to trans-acting transcription factors of the cell, such that it is active if appropriate transcriptional activators are present, though it can also be silent if activators are absent or repressors are present. The other is termed ‘occluded’ whereby a gene is silenced by cis-acting, chromatin-based mechanisms in a manner that blocks it from responding to trans-acting factors, such that it is silent even when activators are present in the cellular milieu. We proposed that gene occlusion is a mechanism by which differentiated cells stably maintain their phenotypic identities. Here, we describe chromatin analysis of occluded genes. We found that DNA methylation plays a causal role in maintaining occlusion for a subset of occluded genes. We further examined a variety of other chromatin marks typically associated with transcriptional silencing, including histone variants, covalent histone modifications and chromatin-associated proteins. Surprisingly, we found that although many of these marks are robustly linked to silent genes (which include both occluded genes and genes that are competent but silent), none is linked specifically to occluded genes. Although the observation does not rule out a possible causal role of these chromatin marks in occlusion, it does suggest that these marks might be a secondary effect rather than a primary cause of the silent state in many genes. PMID:19380460

  3. A rough set based rational clustering framework for determining correlated genes.

    Science.gov (United States)

    Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy

    2016-06-01

    Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and to understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique, which will efficiently remove the irrelevant dimensions in a high-dimensional space and obtain appropriate meaningful clusters. This paper proposes a novel biclustering technique that is based on rough set theory. The proposed algorithm uses correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix and mean squared residue to generate the initial biclusters. Furthermore, the biclusters are refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using mean squared residue. The algorithm is illustrated with yeast gene expression data and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selection of initial clusters and also the restriction of one object belonging to only one cluster by allowing overlapping of biclusters.
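
    The mean squared residue mentioned above is commonly defined, following Cheng and Church, as the average squared deviation of each bicluster entry from what its row mean, column mean and overall mean would predict. The sketch below assumes that standard definition; the abstract does not confirm the exact variant used, and the function name `mean_squared_residue` and the small matrices are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by row and column indices."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: how far the entry is from a purely additive row + column pattern.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)

# Rows that differ only by a constant shift form a perfectly coherent bicluster.
coherent = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
msr_coherent = mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2])

# A checkerboard pattern has no additive structure.
checker = [[1, 0], [0, 1]]
msr_checker = mean_squared_residue(checker, [0, 1], [0, 1])
```

    A low value indicates that the selected genes rise and fall together across the selected conditions, which is why the measure suits both seeding and refining biclusters.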

  4. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    Science.gov (United States)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. The microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T² statistic is not positive definite and becomes singular, so it cannot be inverted. In this research, Hotelling's T² statistic is combined with a shrinkage approach as an alternative way to estimate the covariance matrix to detect significant gene sets. The use of a shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator of the covariance matrix into an improved biased one. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
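
    As a rough illustration of why shrinkage removes the singularity, a generic shrinkage estimator and the resulting two-sample statistic can be written as follows. This is a sketch under assumptions, not the authors' estimator: S is the pooled sample covariance of the p genes in a set, T is an assumed positive definite target (for example the diagonal of S), lambda is the shrinkage intensity, and n_1, n_2 are the group sizes. How the trimmed mean enters the estimator is not specified in the abstract and is not shown.

```latex
S^{*} = \lambda\, T + (1 - \lambda)\, S, \qquad 0 < \lambda \le 1,
\qquad
T^{2} = \frac{n_1 n_2}{n_1 + n_2}\,
  (\bar{x}_1 - \bar{x}_2)^{\top} \bigl(S^{*}\bigr)^{-1} (\bar{x}_1 - \bar{x}_2).
```

    Because S is positive semidefinite and T is positive definite, S* is positive definite for any lambda > 0, so the inverse exists even when the number of genes exceeds the number of samples.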

  5. Object oriented data analysis: Sets of trees

    OpenAIRE

    Wang, Haonan; Marron, J.S.

    2007-01-01

    Object oriented data analysis is the statistical analysis of populations of complex objects. In the special case of functional data analysis, these data objects are curves, where standard Euclidean approaches, such as principal component analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie groups and symmetric spaces, or of ...

  6. A method for developing regulatory gene set networks to characterize complex biological systems.

    Science.gov (United States)

    Suphavilai, Chayaporn; Zhu, Liugen; Chen, Jake Y

    2015-01-01

    Traditional approaches to studying molecular networks are based on linking genes or proteins. Higher-level networks linking gene sets or pathways have been proposed recently. Several types of gene set networks have been used to study complex molecular networks, such as co-membership gene set networks (M-GSNs) and co-enrichment gene set networks (E-GSNs). Gene set networks are useful for studying the biological mechanisms of diseases and drug perturbations. In this study, we proposed a new approach for constructing directed, regulatory gene set networks (R-GSNs) to reveal novel relationships among gene sets or pathways. We collected several gene set collections and high-quality gene regulation data in order to construct R-GSNs in a comparative study with co-membership gene set networks (M-GSNs). We described a method for constructing both global and disease-specific R-GSNs and determining their significance. To demonstrate the potential applications to disease biology studies, we constructed and analysed an R-GSN specifically built for Alzheimer's disease. R-GSNs can provide new biological insights complementary to those derived at the protein regulatory network level or from M-GSNs. When integrated properly with functional genomics data, R-GSNs can help enable future research on systems biology and translational bioinformatics.

  7. Gene set based association analyses for the WSSV resistance of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Yu, Yang; Liu, Jingwen; Li, Fuhua; Zhang, Xiaojun; Zhang, Chengsong; Xiang, Jianhai

    2017-01-01

    White Spot Syndrome Virus (WSSV) is regarded as the virus with the strongest pathogenicity to shrimp. For threshold traits such as disease resistance, marker-assisted selection (MAS) is considered to be a more effective approach. In the present study, association analyses of single nucleotide polymorphisms (SNPs) located in a set of immune-related genes were conducted to identify markers associated with WSSV resistance. SNPs were detected by bioinformatics analysis of RNA sequencing data generated by the Illumina sequencing platform and Roche 454 sequencing technology. A total of 681 SNPs located in the exons of immune-related genes were selected as candidate SNPs. Among these SNPs, 77 loci were genotyped in a WSSV-susceptible group and a resistant group. Association analysis was performed based on the logistic regression method under additive and dominance models in the GenABEL package. As a result, five SNPs showed associations with WSSV resistance at a significance level of 0.05. In addition, SNP-SNP interaction analysis was conducted. The combination of SNP loci in TRAF6, Cu/Zn SOD and nLvALF2 exhibited a significant effect on the WSSV resistance of shrimp. Gene expression analysis revealed that these SNPs might influence the expression of these immune-related genes. This study provides a useful method for performing MAS in shrimp. PMID:28094323

  8. Gene set-based module discovery in the breast cancer transcriptome

    Directory of Open Access Journals (Sweden)

    Zhang Michael Q

    2009-02-01

    Full Text Available Abstract Background Although microarray-based studies have revealed a global view of gene expression in cancer cells, we still have little knowledge about the regulatory mechanisms underlying the transcriptome. Several computational methods applied to yeast data have recently succeeded in identifying expression modules, which are defined as co-expressed gene sets under common regulatory mechanisms. However, such module discovery methods have not been applied to cancer transcriptome data. Results In order to decode oncogenic regulatory programs in cancer cells, we developed a novel module discovery method termed EEM by extending a previously reported module discovery method, and applied it to breast cancer expression data. Starting from seed gene sets prepared based on cis-regulatory elements, ChIP-chip data, and gene locus information, EEM identified 10 principal expression modules in breast cancer based on their expression coherence. Moreover, EEM depicted their activity profiles, which predict regulatory programs in each subtype of breast tumors. For example, our analysis revealed that the expression module regulated by the Polycomb repressive complex 2 (PRC2) is downregulated in triple negative breast cancers, suggesting similarity of transcriptional programs between stem cells and aggressive breast cancer cells. We also found that the activity of the PRC2 expression module is negatively correlated with the expression of EZH2, a component of PRC2 which belongs to the E2F expression module. E2F-driven EZH2 overexpression may be responsible for the repression of the PRC2 expression modules in triple negative tumors. Furthermore, our network analysis predicts regulatory circuits in breast cancer cells. Conclusion These results demonstrate that the gene set-based module discovery approach is a powerful tool to decode regulatory programs in cancer cells.

  9. Coverage and characteristics of the Affymetrix GeneChip Human Mapping 100K SNP set.

    Directory of Open Access Journals (Sweden)

    2006-05-01

    Full Text Available Improvements in technology have made it possible to conduct genome-wide association mapping at costs within reach of academic investigators, and experiments are currently being conducted with a variety of high-throughput platforms. To provide an appropriate context for interpreting results of such studies, we summarize here results of an investigation of one of the first of these technologies to be publicly available, the Affymetrix GeneChip Human Mapping 100K set of single nucleotide polymorphisms (SNPs). In a systematic analysis of the pattern and distribution of SNPs in the Mapping 100K set, we find that SNPs in this set are undersampled from coding regions (both nonsynonymous and synonymous) and oversampled from regions outside genes, relative to SNPs in the overall HapMap database. In addition, we utilize a novel multilocus linkage disequilibrium (LD) coefficient based on information content (analogous to the information content scores commonly used for linkage mapping) that is equivalent to the familiar measure r² in the special case of two loci. Using this approach, we are able to summarize for any subset of markers, such as the Affymetrix Mapping 100K set, the information available for association mapping in that subset, relative to the information available in the full set of markers included in the HapMap, and highlight circumstances in which this multilocus measure of LD provides substantial additional insight about the haplotype structure in a region over pairwise measures of LD.

  10. A new set of reference genes for RT-qPCR assays in the yeast Dekkera bruxellensis.

    Science.gov (United States)

    de Barros Pita, Will; Leite, Fernanda Cristina Bezerra; de Souza Liberal, Anna Theresa; Pereira, Luciana Filgueira; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante; de Morais, Marcos Antonio

    2012-12-01

    The yeast Dekkera bruxellensis has been recently regarded as an important microorganism for bioethanol production owing to its ability to convert glucose, sucrose, and cellobiose to ethanol. The aim of this work was to validate a new set of reference genes for gene expression analysis by quantitative real-time PCR in D. bruxellensis and compare the influence of the method of choice for quantification of mRNA levels with the reliability of our data. Three candidate reference genes, DbEFA1, DbEFB1, and DbYNA1, were used in a quantitative analysis of 4 genes of interest, DbYNR1, DbTPS1, DbADH7, and DbUBA4, based on an approach for calculating the normalization factors by means of the geNorm applet. Each reference gene was also individually used for a 2^(-ΔΔCq) (comparative Cq method) calculation of the relative expression of genes of interest. Our results showed that the 3 reference genes provided enough stability and were complementary to the normalization factors method in different culture conditions. This work was able to confirm the usefulness of a previously reported reference gene, EFA1/TEF1, and increased the set of possible reference genes in D. bruxellensis to 4. Moreover, this can improve the reliability of the analysis of the regulation of gene expression in the industrial yeast D. bruxellensis.
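
    The comparative Cq calculation mentioned in this abstract can be illustrated with a minimal sketch. It assumes roughly 100% amplification efficiency (a doubling per cycle) and a single reference gene; the function name and all Cq values are hypothetical and are not taken from the study.

```python
def relative_expression(cq_target_sample, cq_ref_sample,
                        cq_target_calibrator, cq_ref_calibrator):
    """Fold change of a target gene by the comparative Cq (2^-ddCq) method.

    Assumes ~100% PCR efficiency, i.e. product doubles every cycle.
    """
    # Normalize the target gene to the reference gene in each condition.
    delta_sample = cq_target_sample - cq_ref_sample
    delta_calibrator = cq_target_calibrator - cq_ref_calibrator
    # Compare the treated sample with the calibrator (control) condition.
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Cq values: the target amplifies two cycles earlier in the
# sample than in the calibrator, while the reference gene is unchanged.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

    With these made-up inputs the target gene appears four-fold up-regulated relative to the calibrator.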

  11. Music analysis and point-set compression

    DEFF Research Database (Denmark)

    Meredith, David

    2015-01-01

    COSIATEC, SIATECCompress and Forth’s algorithm are point-set compression algorithms developed for discovering repeated patterns in music, such as themes and motives that would be of interest to a music analyst. To investigate their effectiveness and versatility, these algorithms were evaluated on three analytical tasks that depend on the discovery of repeated patterns: classifying folk song melodies into tune families, discovering themes and sections in polyphonic music, and discovering subject and countersubject entries in fugues. Each algorithm computes a compressed encoding of a point-set representation of a musical object in the form of a list of compact patterns, each pattern being given with a set of vectors indicating its occurrences. However, the algorithms adopt different strategies in their attempts to discover encodings that maximize compression. The best-performing algorithm on the folk...

  12. A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction.

    Science.gov (United States)

    Jung, Hye-Young; Leem, Sangseob; Lee, Sungyoung; Park, Taesung

    2016-12-01

    Gene-gene interaction (GGI) is one of the most popular approaches for finding the missing heritability of common complex traits in genetic association studies. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGIs. In order to identify the best interaction model associated with disease susceptibility, MDR compares all possible genotype combinations in terms of their predictability of disease status from a simple binary high (H) and low (L) risk classification. However, this simple binary classification does not reflect the uncertainty of H/L classification. We regard classifying H/L as equivalent to defining the degree of membership of two risk groups H/L. By adopting fuzzy set theory, we propose Fuzzy MDR, which takes into account the uncertainty of H/L classification. Fuzzy MDR allows the possibility of partial membership of H/L through a membership function which transforms the degree of uncertainty into a [0,1] scale. The best genotype combinations can be selected as those that maximize a new fuzzy set based accuracy measure. Two simulation studies are conducted to compare the power of the proposed Fuzzy MDR with that of MDR. Our results show that Fuzzy MDR has higher power than MDR. We illustrate the proposed Fuzzy MDR by analysing the bipolar disorder (BD) trait of the WTCCC dataset to detect GGI associated with BD. We propose a novel Fuzzy MDR method to detect gene-gene interaction by taking into account the uncertainty of H/L classification and show that it has higher power than MDR. Fuzzy MDR can be easily extended to handle continuous phenotypes as well. The program written in R for the proposed Fuzzy MDR is available at https://statgen.snu.ac.kr/software/FuzzyMDR.
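
    The contrast between a hard H/L label and a partial membership on a [0,1] scale can be sketched as follows. The abstract does not specify the membership function used by Fuzzy MDR, so the case proportion used here is purely a hypothetical stand-in, and the sketch assumes a balanced case/control design; both function names are illustrative.

```python
def binary_risk(cases, controls):
    """MDR-style hard label for one genotype cell: 1 = high risk, 0 = low risk.

    Assumes a balanced design, so "more cases than controls" means high risk.
    """
    return 1 if cases > controls else 0


def fuzzy_high_membership(cases, controls):
    """Hypothetical degree of membership in the high-risk group, in [0, 1].

    This is NOT the published Fuzzy MDR membership function, only an
    illustration of replacing a hard label with a graded one.
    """
    total = cases + controls
    if total == 0:
        return 0.5  # no observations: treated as maximally uncertain (assumption)
    return cases / total


# Two hypothetical genotype cells that receive the same hard label
# but carry very different amounts of evidence.
clear_cell = (30, 10)
borderline_cell = (11, 10)
labels = [binary_risk(*clear_cell), binary_risk(*borderline_cell)]
memberships = [fuzzy_high_membership(*clear_cell),
               fuzzy_high_membership(*borderline_cell)]
```

    Both cells are labelled high risk by the binary rule, whereas the graded score separates the clear cell (0.75) from the borderline one (about 0.52), which is the kind of uncertainty the abstract says the binary classification ignores.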

  13. Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

    Institute of Scientific and Technical Information of China (English)

    Heng Li; Tao Liu; Hai-Hong Li; Yan Li; Li-Jun Fang; Hui-Min Xie; Wei-Mou Zheng; Bai-Lin Hao; Jin-Song Liu; Zhao Xu; Jiao Jin; Lin Fang; Lei Gao; Yu-Dong Li; Zi-Xing Xing; Shao-Gen Gao

    2005-01-01

    With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENESH and BGF. The predictions were compared on nucleotide, exon and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
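
    A nucleotide-level comparison of the kind described here is often summarized by sensitivity and specificity. The sketch below assumes the convention common in the gene-finding literature, where sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP); the abstract does not say which measures were used, and the exon coordinates are hypothetical.

```python
def coding_positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    Sensitivity = fraction of annotated coding bases that were predicted.
    Specificity = fraction of predicted coding bases that are annotated
    (the gene-finding convention, equivalent to precision).
    """
    truth = coding_positions(true_exons)
    predicted = coding_positions(predicted_exons)
    true_positives = len(truth & predicted)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical annotation and prediction: the first exon is exact,
# the second is shifted by 50 bases.
annotated = [(1, 100), (201, 300)]
predicted = [(1, 100), (251, 350)]
sn, sp = nucleotide_accuracy(annotated, predicted)
```

    Here 150 of the 200 annotated coding bases are recovered, and 150 of the 200 predicted bases are correct, so both measures equal 0.75.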

  14. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

    Full Text Available With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional

  15. A decision-theory approach to interpretable set analysis for high-dimensional data.

    Science.gov (United States)

    Boca, Simina M; Bravo, Héctor Corrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni

    2013-09-01

    A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses.
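
    The idea of "atoms" as non-overlapping sets derived from overlapping annotations can be sketched as below. This is one plausible reading, assumed for illustration: variables are grouped by their exact annotation signature, so each variable falls into exactly one atom. The function name and the toy gene sets are hypothetical.

```python
from collections import defaultdict


def build_atoms(gene_sets):
    """Split overlapping gene sets into non-overlapping atoms.

    Each atom collects the genes that belong to exactly the same
    combination of original sets (an assumption of this sketch).
    """
    # Record, for every gene, which sets annotate it.
    membership = defaultdict(set)
    for set_name, genes in gene_sets.items():
        for gene in genes:
            membership[gene].add(set_name)

    # Group genes that share an identical annotation signature.
    atoms = defaultdict(set)
    for gene, set_names in membership.items():
        atoms[frozenset(set_names)].add(gene)
    return dict(atoms)


# Two hypothetical overlapping sets sharing one gene.
gene_sets = {"setA": {"g1", "g2", "g3"}, "setB": {"g3", "g4"}}
atoms = build_atoms(gene_sets)
```

    The two overlapping sets yield three disjoint atoms: genes only in setA, genes only in setB, and the shared gene g3.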

  16. Evaluation of endogenous control genes for gene expression studies across multiple tissues and in the specific sets of fat- and muscle-type samples of the pig.

    Science.gov (United States)

    Gu, Y R; Li, M Z; Zhang, K; Chen, L; Jiang, A A; Wang, J Y; Li, X W

    2011-08-01

    To normalize a set of quantitative real-time PCR (q-PCR) data, it is essential to determine an optimal number/set of housekeeping genes, as the abundance of housekeeping genes can vary across tissues or cells during different developmental stages, or even under certain environmental conditions. In this study, of the 20 commonly used endogenous control genes, 13, 18 and 17 genes exhibited credible stability in 56 different tissues, 10 types of adipose tissue and five types of muscle tissue, respectively. Our analysis clearly showed that three optimal housekeeping genes are adequate for an accurate normalization, which correlated well with the theoretical optimal number (r ≥ 0.94). In terms of economical and experimental feasibility, we recommend the use of the three most stable housekeeping genes for calculating the normalization factor. Based on our results, the three most stable housekeeping genes in all analysed samples (TOP2B, HSPCB and YWHAZ) are recommended for accurate normalization of q-PCR data. We also suggest that two different sets of housekeeping genes are appropriate for 10 types of adipose tissue (the HSPCB, ALDOA and GAPDH genes) and five types of muscle tissue (the TOP2B, HSPCB and YWHAZ genes), respectively. Our report will serve as a valuable reference for other studies aimed at measuring tissue-specific mRNA abundance in porcine samples.
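
    A normalization factor built from several housekeeping genes is commonly taken as the geometric mean of their relative quantities. The sketch below assumes that convention, which the abstract does not state explicitly; the gene names come from the abstract, while the quantities and function names are hypothetical.

```python
import math


def normalization_factor(reference_quantities):
    """Geometric mean of the relative quantities of the reference genes."""
    if not reference_quantities or any(q <= 0 for q in reference_quantities):
        raise ValueError("reference quantities must be positive and non-empty")
    # Average in log space to avoid overflow on long lists.
    log_mean = sum(math.log(q) for q in reference_quantities) / len(reference_quantities)
    return math.exp(log_mean)


def normalize(target_quantity, reference_quantities):
    """Scale a target gene's relative quantity by the normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)


# Hypothetical relative quantities for TOP2B, HSPCB and YWHAZ in one sample.
references = [1.0, 2.0, 4.0]
factor = normalization_factor(references)
normalized_target = normalize(6.0, references)
```

    Using a geometric rather than arithmetic mean keeps one highly abundant reference gene from dominating the factor.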

  17. Identifying the optimal gene and gene set in hepatocellular carcinoma based on differential expression and differential co-expression algorithm.

    Science.gov (United States)

    Dong, Li-Yang; Zhou, Wei-Zhong; Ni, Jun-Wei; Xiang, Wei; Hu, Wen-Hao; Yu, Chang; Li, Hai-Yan

    2017-02-01

    The objective of this study was to identify the optimal gene and gene set for hepatocellular carcinoma (HCC) utilizing the differential expression and differential co-expression (DEDC) algorithm. The DEDC algorithm consisted of four parts: calculating differential expression (DE) by absolute t-value in t-statistics; computing differential co-expression (DC) based on Z-test; determining optimal thresholds on the basis of Chi-squared (χ2) maximization, with the corresponding gene being the optimal gene; and evaluating functional relevance of genes categorized into different partitions to determine the optimal gene set with the highest mean minimum functional information (FI) gain (Δ*G). The optimal thresholds divided genes into four partitions: high DE and high DC (HDE-HDC), high DE and low DC (HDE-LDC), low DE and high DC (LDE-HDC), and low DE and low DC (LDE-LDC). In addition, the optimal gene was validated by conducting a reverse transcription-polymerase chain reaction (RT-PCR) assay. The optimal thresholds for DC and DE were 1.032 and 1.911, respectively. Using the optimal gene, the genes were divided into four partitions: HDE-HDC (2,053 genes), HDE-LDC (2,822 genes), LDE-HDC (2,622 genes), and LDE-LDC (6,169 genes). The optimal gene was microtubule-associated protein RP/EB family member 1 (MAPRE1), and the RT-PCR assay validated the significant difference between the HCC and normal state. The optimal gene set was nucleoside metabolic process (GO:0009116) with Δ*G = 18.681 and 24 HDE-HDC partitions in total. In conclusion, we successfully investigated the optimal gene, MAPRE1, and gene set, nucleoside metabolic process, which may be potential biomarkers for targeted therapy and provide significant insight for revealing the pathological mechanism underlying HCC.
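
    The four-way partition by DE and DC thresholds can be sketched as follows. The threshold values are those reported in the abstract; whether a score equal to a threshold counts as "high" is not stated, so the inclusive comparison is an assumption, and the gene scores are hypothetical.

```python
DE_THRESHOLD = 1.911  # reported optimal threshold for differential expression
DC_THRESHOLD = 1.032  # reported optimal threshold for differential co-expression


def partition(de_score, dc_score,
              de_threshold=DE_THRESHOLD, dc_threshold=DC_THRESHOLD):
    """Assign a gene to one of HDE-HDC, HDE-LDC, LDE-HDC or LDE-LDC.

    A score at or above its threshold is treated as "high" (assumption).
    """
    de_label = "HDE" if de_score >= de_threshold else "LDE"
    dc_label = "HDC" if dc_score >= dc_threshold else "LDC"
    return f"{de_label}-{dc_label}"


# Hypothetical (DE, DC) scores for four genes.
scores = {"geneA": (2.5, 1.5), "geneB": (2.5, 0.5),
          "geneC": (0.7, 1.5), "geneD": (0.7, 0.5)}
partitions = {gene: partition(de, dc) for gene, (de, dc) in scores.items()}
```

    Each gene lands in exactly one of the four partitions named in the abstract.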

  18. Construction and expression of SET gene and siRNA recombinant adenovirus vectors

    Institute of Scientific and Technical Information of China (English)

    Xu Bo-qun; Lu Pin-hong; Li Ying; Xue Kai; Li Mei; Ma Xiang; Diao Fei-yan; Cui Yu-gui; Liu Jia-yin

    2010-01-01

    Objective: To construct a SET gene recombinant adenovirus vector and a SET gene small interfering RNA (siRNA) recombinant adenovirus vector for over-expression or knock-down of SET levels. Methods: The cDNA sequence of SET was cloned by reverse transcription polymerase chain reaction (RT-PCR) and the SET gene fragment was subcloned into the adenovirus shuttle plasmid pAdTrack-CMV to construct the shuttle plasmid pAdTrack-SET. The shuttle plasmid pAdTrack-SET was transformed into BJ5183 cells with the adenoviral backbone pAdEasy-1 to obtain the homologous recombinant Ad-CMV-SET, and the recombinant Ad-CMV-SET was packaged and amplified in AD293 cells. The expression of SET in AD293 cells was detected by Western blot. In addition, we constructed a SET gene siRNA recombinant adenovirus vector (Ad-H1-SiRNA/SET), and its efficacy in knocking down SET protein was detected in infected GC-2spd(ts) cells by Western blot. Results: Both recombinant adenovirus vectors, the SET gene recombinant adenovirus vector Ad-CMV-SET and the SET gene siRNA recombinant adenovirus vector Ad-H1-SiRNA/SET, were proven to be constructed successfully by endonuclease digestion and sequencing. AD293 cells infected with either Ad-CMV-SET or Ad-H1-SiRNA/SET were observed to express GFP. The expression of SET protein was up-regulated significantly in AD293 cells infected with the SET gene recombinant adenovirus vector. In contrast, SET protein was significantly down-regulated in the GC-2spd(ts) cells infected with Ad-H1-SiRNA/SET (P<0.05), and the knockdown efficiency was approximately 50%-70%. Conclusion: The recombinant adenovirus vectors Ad-CMV-SET and Ad-H1-SiRNA/SET were successfully constructed and effectively expressed in germ cells and somatic cells. They provide an experimental tool for further study of the SET gene in the physiological and pathophysiological mechanisms of reproduction-related diseases.

  19. Generation of an algorithm based on minimal gene sets to clinically subtype triple negative breast cancer patients.

    Science.gov (United States)

    Ring, Brian Z; Hout, David R; Morris, Stephan W; Lawrence, Kasey; Schweitzer, Brock L; Bailey, Daniel B; Lehmann, Brian D; Pietenpol, Jennifer A; Seitz, Robert S

    2016-02-23

    Recently, a gene expression algorithm, TNBCtype, was developed that can divide triple-negative breast cancer (TNBC) into molecularly-defined subtypes. The algorithm has potential to provide predictive value for TNBC subtype-specific response to various treatments. TNBCtype used in a retrospective analysis of neoadjuvant clinical trial data of TNBC patients demonstrated that TNBC subtype and pathological complete response to neoadjuvant chemotherapy were significantly associated. Herein we describe an expression algorithm reduced to 101 genes with the power to subtype TNBC tumors similar to the original 2188-gene expression algorithm and predict patient outcomes. The new classification model was built using the same expression data sets used for the original TNBCtype algorithm. Gene set enrichment followed by shrunken centroid analysis were used for feature reduction, then elastic-net regularized linear modeling was used to identify genes for a centroid model classifying all subtypes, comprised of 101 genes. The predictive capability of both this new "lean" algorithm and the original 2188-gene model were applied to an independent clinical trial cohort of 139 TNBC patients treated initially with neoadjuvant doxorubicin/cyclophosphamide and then randomized to receive either paclitaxel or ixabepilone to determine association of pathologic complete response within the subtypes. The new 101-gene expression model reproduced the classification provided by the 2188-gene algorithm and was highly concordant in the same set of seven TNBC cohorts used to generate the TNBCtype algorithm (87%), as well as in the independent clinical trial cohort (88%), when cases with significant correlations to multiple subtypes were excluded. 
    Clinical responses to both neoadjuvant treatment arms found BL2 to be significantly associated with poor response (Odds Ratio (OR) = 0.12, p = 0.03 for the 2188-gene model; OR = 0.23, p ... sets can recapitulate the TNBC subtypes identified by the original 2188...

  20. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    Science.gov (United States)

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression by an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate the expression association profile of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no studies were observed to be significantly different for expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.

  1. A transcriptomic approach to identify regulatory genes involved in fruit set of wild-type and parthenocarpic tomato genotypes.

    Science.gov (United States)

    Ruiu, Fabrizio; Picarella, Maurizio Enea; Imanishi, Shunsuke; Mazzucato, Andrea

    2015-10-01

    The tomato parthenocarpic fruit (pat) mutation associates a strong competence for parthenocarpy with homeotic transformation of anthers and aberrancy of ovules. To dissect this complex floral phenotype, genes involved in the pollination-independent fruit set of the pat mutant were investigated by microarray analysis using wild-type and mutant ovaries. Normalized expression data were subjected to one-way ANOVA and 2499 differentially expressed genes (DEGs) displaying a >1.5 log-fold change in at least one of the pairwise comparisons analyzed were detected. DEGs were categorized into 20 clusters, and the clusters were classified into five groups representing transcripts with similar expression dynamics. The "regulatory function" group (685 DEGs) contained putative negative or positive fruit set regulators, the "pollination-dependent" group (411 DEGs) included genes activated by pollination, and the "fruit growth-related" group (815 DEGs) included genes activated at early fruit growth. The remaining groups listed genes with different or similar expression patterns at all stages in the two genotypes. qRT-PCR validation of 20 DEGs plus four other selected genes confirmed the high reliability of the microarray expression data; the average correlation coefficient for the 20 DEGs was 0.90. All groups contained relevant genes encoding transcription factors that regulate meristem differentiation and floral organ development, genes involved in the metabolism, transport and response of hormones, and genes involved in cell division and in primary and secondary metabolism. Among pathways related to secondary metabolites, genes related to the synthesis of flavonoids emerged, supporting the recent evidence that these compounds are important at the fruit set phase. Selected genes showing a de-regulated expression pattern in pat were studied in four other parthenocarpic genotypes, either genetically anonymous or carrying lesions in known gene sequences. This comparative approach offered novel insights for improving the present

  2. Transcriptional shift identifies a set of genes driving breast cancer chemoresistance.

    Directory of Open Access Journals (Sweden)

    Laura Vera-Ramirez

    Full Text Available BACKGROUND: Distant recurrences after antineoplastic treatment remain a serious problem for breast cancer clinical management, which threatens patients' lives. Systemic therapy is administered to eradicate cancer cells from the organism, both at the site of the primary tumor and at any other potential location. Despite this intervention, a significant proportion of breast cancer patients relapse even many years after their primary tumor has been successfully treated according to current clinical standards, evidencing the existence of a chemoresistant cell subpopulation originating from the primary tumor. METHODS/FINDINGS: To identify key molecules and signaling pathways which drive breast cancer chemoresistance we performed gene expression analysis before and after anthracycline and taxane-based chemotherapy and compared the results between different histopathological response groups (good-, mid- and bad-response), established according to the Miller & Payne grading system. Two cohorts of 33 and 73 breast cancer patients receiving neoadjuvant chemotherapy were recruited for whole-genome expression analysis and validation assay, respectively. Identified genes were subjected to a bioinformatic analysis in order to ascertain the molecular function of the proteins they encode and the signaling in which they participate. High throughput technologies identified 65 gene sequences which were over-expressed in all groups (P ≤ 0.05, Bonferroni test). Notably we found that, after chemotherapy, a significant proportion of these genes were over-expressed in the good responders group, making their tumors indistinguishable from those of the bad responders in their expression profile (P ≤ 0.05, Benjamini-Hochberg's method). CONCLUSIONS: These data identify a set of key molecular pathways selectively up-regulated in post-chemotherapy cancer cells, which may become appropriate targets for the development of future directed therapies against breast cancer.

  3. Transcriptional Shift Identifies a Set of Genes Driving Breast Cancer Chemoresistance

    Science.gov (United States)

    Vera-Ramirez, Laura; Sanchez-Rovira, Pedro; Ramirez-Tortosa, Cesar L.; Quiles, Jose L.; Ramirez-Tortosa, MCarmen; Lorente, Jose A.

    2013-01-01

    Background Distant recurrences after antineoplastic treatment remain a serious problem for breast cancer clinical management, which threatens patients’ lives. Systemic therapy is administered to eradicate cancer cells from the organism, both at the site of the primary tumor and at any other potential location. Despite this intervention, a significant proportion of breast cancer patients relapse even many years after their primary tumor has been successfully treated according to current clinical standards, evidencing the existence of a chemoresistant cell subpopulation originating from the primary tumor. Methods/Findings To identify key molecules and signaling pathways which drive breast cancer chemoresistance we performed gene expression analysis before and after anthracycline and taxane-based chemotherapy and compared the results between different histopathological response groups (good-, mid- and bad-response), established according to the Miller & Payne grading system. Two cohorts of 33 and 73 breast cancer patients receiving neoadjuvant chemotherapy were recruited for whole-genome expression analysis and validation assay, respectively. Identified genes were subjected to a bioinformatic analysis in order to ascertain the molecular function of the proteins they encode and the signaling in which they participate. High throughput technologies identified 65 gene sequences which were over-expressed in all groups (P ≤ 0.05, Bonferroni test). Notably we found that, after chemotherapy, a significant proportion of these genes were over-expressed in the good responders group, making their tumors indistinguishable from those of the bad responders in their expression profile (P ≤ 0.05, Benjamini-Hochberg's method). Conclusions These data identify a set of key molecular pathways selectively up-regulated in post-chemotherapy cancer cells, which may become appropriate targets for the development of future directed therapies against breast cancer. PMID:23326553
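
    The two multiple-testing corrections named in this abstract, Bonferroni and Benjamini-Hochberg, can be illustrated with a minimal pure-Python sketch. It shows the standard adjusted p-value definitions only; how the study applied them is not described here, and the p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

    Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate, so its adjusted values are never larger than the Bonferroni ones.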

  4. ANALYSIS OF CIRCUIT TOLERANCE BASED ON RANDOM SET THEORY

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Monte Carlo analysis has been an accepted method for circuit tolerance analysis, but its heavy computational complexity has always limited its application. Based on random set theory, this paper presents a simple and flexible tolerance analysis method to estimate circuit yield. It is an alternative to Monte Carlo analysis, but reduces the number of calculations dramatically.
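
    For context, the Monte Carlo baseline that this abstract contrasts against can be sketched on a toy circuit. The example below is not the paper's random-set method; it assumes a hypothetical resistive voltage divider with uniformly distributed component tolerances and estimates yield as the fraction of sampled circuits whose output stays within specification.

```python
import random


def divider_yield(r1_nominal, r2_nominal, tolerance, v_in,
                  v_out_min, v_out_max, trials=10000, seed=0):
    """Estimate the yield of a voltage divider by Monte Carlo sampling.

    Each resistor is drawn uniformly within +/- tolerance of its nominal
    value (an assumption; real parts may follow other distributions).
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    passed = 0
    for _ in range(trials):
        r1 = r1_nominal * (1 + rng.uniform(-tolerance, tolerance))
        r2 = r2_nominal * (1 + rng.uniform(-tolerance, tolerance))
        v_out = v_in * r2 / (r1 + r2)
        if v_out_min <= v_out <= v_out_max:
            passed += 1
    return passed / trials


# Hypothetical design: 10 V input, two 1 kOhm resistors, 5 V +/- 0.1 V spec.
ideal_yield = divider_yield(1000.0, 1000.0, 0.0, 10.0, 4.9, 5.1)
yield_5pct = divider_yield(1000.0, 1000.0, 0.05, 10.0, 4.9, 5.1)
```

    Every trial requires a full circuit evaluation, which is the computational cost the abstract says the random-set approach is meant to reduce.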

  5. A DSRPCL-SVM Approach to Informative Gene Analysis

    Institute of Scientific and Technical Information of China (English)

    Wei Xiong; Zhibin Cai; Jinwen Ma

    2008-01-01

    Microarray data based tumor diagnosis is a very interesting topic in bioinformatics. One of the key problems is the discovery and analysis of informative genes of a tumor. Although there are many elaborate approaches to this problem, it is still difficult to select a reasonable set of informative genes for tumor diagnosis only with microarray data. In this paper, we classify the genes expressed through microarray data into a number of clusters via the distance sensitive rival penalized competitive learning (DSRPCL) algorithm and then detect the informative gene cluster or set with the help of support vector machine (SVM). Moreover, the critical or powerful informative genes can be found through further classifications and detections on the obtained informative gene clusters. It is well demonstrated by experiments on the colon, leukemia, and breast cancer datasets that our proposed DSRPCL-SVM approach leads to a reasonable selection of informative genes for tumor diagnosis.

  6. Identification of candidate genes in osteoporosis by integrated microarray analysis

    OpenAIRE

    Li, J J; Wang, B. Q.; Fei, Q.; Yang, Y; Li, D.

    2017-01-01

    Objectives In order to screen the altered gene expression profile in peripheral blood mononuclear cells of patients with osteoporosis, we performed an integrated analysis of the online microarray studies of osteoporosis. Methods We searched the Gene Expression Omnibus (GEO) database for microarray studies of peripheral blood mononuclear cells in patients with osteoporosis. Subsequently, we integrated gene expression data sets from multiple microarray studies to obtain differentially expressed...

  7. Identification of noise in linear data sets by factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Roscoe, B.A.; Hopke, Ph.K. (Illinois Univ., Urbana (USA))

    1982-01-01

    A technique which has the ability to identify bad data points, after the data has been generated, is classical factor analysis. The ability of classical factor analysis to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it to confidently isolate errors.

  8. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, van D.A.M.; Goeman, J.J.; Jong, de E.; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set

  9. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, D.A. van; Goeman, J.J.; Jong, Esther de; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set

  10. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, D.A. van; Goeman, J.J.; Jong, Esther de; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set anal

  11. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, van D.A.M.; Goeman, J.J.; Jong, de E.; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set anal

  12. An improved method for functional similarity analysis of genes based on Gene Ontology.

    Science.gov (United States)

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-12-23

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure gene functional similarity. First, WIS computes the IC of a term by employing its depth, the number of its ancestors and the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .

  13. META-GSA: Combining Findings from Gene-Set Analyses across Several Genome-Wide Association Studies.

    Directory of Open Access Journals (Sweden)

    Albert Rosenberger

    Gene-set analysis (GSA) methods are used as complementary approaches to genome-wide association studies (GWASs). The single marker association estimates of a predefined set of genes are either contrasted with those of all remaining genes or with a null non-associated background. To pool the p-values from several GSAs, it is important to take into account the concordance of the observed patterns resulting from single marker association point estimates across any given gene set. Here we propose an enhanced version of Fisher's inverse χ2-method, META-GSA, which weights each study to account for imperfect correlation between association patterns. We investigated the performance of META-GSA by simulating GWASs with 500 cases and 500 controls at 100 diallelic markers in 20 different scenarios, simulating different relative risks between 1 and 1.5 in gene sets of 10 genes. Wilcoxon's rank sum test was applied as GSA for each study. We found that META-GSA has greater power to discover truly associated gene sets than simple pooling of the p-values, e.g. 59% versus 37% when the true relative risk for 5 of 10 genes was assumed to be 1.5. Under the null hypothesis of no difference in the true association pattern between the gene set of interest and the set of remaining genes, the results of both approaches are almost uncorrelated. We recommend not relying on p-values alone when combining the results of independent GSAs. We applied META-GSA to pool the results of four case-control GWASs of lung cancer risk (Central European Study and Toronto/Lunenfeld-Tanenbaum Research Institute Study; German Lung Cancer Study and MD Anderson Cancer Center Study), which had already been analyzed separately with four different GSA methods (EASE, SLAT, mSUMSTAT and GenGen). This application revealed the pathway GO0015291 "transmembrane transporter activity" as significantly enriched with associated genes (GSA-method: EASE, p = 0.0315 corrected for multiple testing). Similar
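    The record above builds on Fisher's inverse χ2-method for pooling p-values. As a point of reference, the plain, unweighted form of that method can be sketched as follows. This is only an illustration of the classical baseline under the assumption of independent studies; it does not implement the study weighting that META-GSA adds, and the function name `fisher_combined_pvalue` is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Pool independent p-values with Fisher's (unweighted) inverse chi-square method.

    Hypothetical helper for illustration; META-GSA's weighting is NOT included.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null hypothesis
    # it follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in pvalues)

    # For an even number of degrees of freedom (2k) the chi-square survival
    # function has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

    With a single study the pooled value equals the input p-value, which is a quick sanity check on the formula.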

  14. Exploration of data partitioning in an eight-gene data set

    DEFF Research Database (Denmark)

    Rota, Jadranka; Wahlberg, Niklas

    2012-01-01

    Molecular data sets for phylogenetic inference continue to increase in size, especially with respect to the number of genes sampled. As more and more genes are included in analyses, the importance of partitioning the data to avoid problems that can arise from underparameterization becomes more...... apparent. With an eight-gene data set from 38 metalmark moth species (12 genera represented) and three outgroups, we explored different data partitioning strategies and their influence on convergence and mixing of Markov Chains Monte Carlo in a Bayesian setting. We found that in larger data sets......, with an increase in the number of partitions that are made a priori (e.g. by gene and codon position), convergence and mixing become poor. This problem can be overcome by using a recently published algorithm in which homologous sites are grouped into blocks with similar evolutionary rates that can then be modelled...

  15. Hierarchical Parallelization of Gene Differential Association Analysis

    Directory of Open Access Journals (Sweden)

    Dwarkadas Sandhya

    2011-09-01

    Abstract Background Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Results Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. Conclusions The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.

  16. Gene expression analysis identifies global gene dosage sensitivity in cancer

    DEFF Research Database (Denmark)

    Fehrmann, Rudolf S. N.; Karjalainen, Juha M.; Krajewska, Malgorzata;

    2015-01-01

    expression. We reanalyzed 77,840 expression profiles and observed a limited set of 'transcriptional components' that describe well-known biology, explain the vast majority of variation in gene expression and enable us to predict the biological function of genes. On correcting expression profiles...... for these components, we observed that the residual expression levels (in 'functional genomic mRNA' profiling) correlated strongly with copy number. DNA copy number correlated positively with expression levels for 99% of all abundantly expressed human genes, indicating global gene dosage sensitivity. By applying...

  17. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils

    NARCIS (Netherlands)

    Hannula, S.E.; van Veen, J.A.

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in

  18. Identification of Human HK Genes and Gene Expression Regulation Study in Cancer from Transcriptomics Data Analysis

    Science.gov (United States)

    Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun

    2013-01-01

    The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of the transcriptome. Many different human tissue/cell transcriptome datasets coming from RNA-Seq technology are available in public data resources. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is more closely related to physiological condition than to tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in gene expression level and variation, we first apply an improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (HK genes that are specific to the cancer group but not to the normal group) are expressed at higher and more variable levels in the cancer condition than in the normal condition. Cancer-associated HK genes tend to be AT-rich, and they are enriched in cell cycle regulation related functions and constitute some cancer signatures. The expression of large genes is also avoided in the cancer group. These studies will help us understand which cell type-specific patterns of gene expression differ among different cell types, and particularly for cancer. PMID:23382867

  19. Identification of noise in linear data sets by factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Roscoe, B.A.; Hopke, P.K.

    1982-01-01

    With the use of atomic and nuclear methods to analyze samples for a multitude of elements, very large data sets have been generated. Due to the ease of obtaining these results with computerized systems, the elemental data acquired are not always as thoroughly checked as they should be, leading to some, if not many, bad data points. It is advantageous to have some feeling for the trouble spots in a data set before it is used for further studies. A technique which has the ability to identify bad data points, after the data has been generated, is classical factor analysis. The ability of classical factor analysis to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it to confidently isolate errors.

  20. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

    Directory of Open Access Journals (Sweden)

    Alexander Kaever

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.

  1. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

    Science.gov (United States)

    Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter

    2014-01-01

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.

  2. Module network inference from a cancer gene expression data set identifies microRNA regulated modules.

    Directory of Open Access Journals (Sweden)

    Eric Bonnet

    BACKGROUND: MicroRNAs (miRNAs) are small RNAs that recognize and regulate mRNA target genes. Multiple lines of evidence indicate that they are key regulators of numerous critical functions in development and disease, including cancer. However, defining the place and function of miRNAs in complex regulatory networks is not straightforward. Systems approaches, like the inference of a module network from expression data, can help to achieve this goal. METHODOLOGY/PRINCIPAL FINDINGS: During the last decade, much progress has been made in the development of robust and powerful module network inference algorithms. In this study, we analyze and assess experimentally a module network inferred from both miRNA and mRNA expression data, using our recently developed module network inference algorithm based on probabilistic optimization techniques. We show that several miRNAs are predicted as statistically significant regulators for various modules of tightly co-expressed genes. A detailed analysis of three of those modules demonstrates that the specific assignment of miRNAs is functionally coherent and supported by literature. We further designed a set of experiments to test the assignment of miR-200a as the top regulator of a small module of nine genes. The results strongly suggest that miR-200a is regulating the module genes via the transcription factor ZEB1. Interestingly, this module is most likely involved in epithelial homeostasis and its dysregulation might contribute to the malignant process in cancer cells. CONCLUSIONS/SIGNIFICANCE: Our results show that a robust module network analysis of expression data can provide novel insights into miRNA function in important cellular processes. Such a computational approach, starting from expression data alone, can be helpful in the process of identifying the function of miRNAs by suggesting modules of co-expressed genes in which they play a regulatory role. As shown in this study, those modules can then be

  3. Constructing Minimal Spanning Tree Based on Rough Set Theory for Gene Selection

    Directory of Open Access Journals (Sweden)

    Soumen Kumar Pati

    2012-11-01

    Microarray gene datasets are often high-dimensional, which causes difficulty in clustering and classification. Datasets containing a huge number of genes lead to increased complexity and therefore degraded dataset handling performance. Often, not all the measured features of these high-dimensional datasets are relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique for unsupervised learning, based on directed minimal spanning trees and rough set theory, is proposed. The method first computes a similarity factor between each pair of attributes using the indiscernibility relation, a concept of rough set theory. Based on the similarity factors, an attribute similarity set is formed, from which a directed weighted graph is constructed with attributes as vertices and the inverse of the similarity factor as edge weights. Then, all possible minimal spanning trees of the graph are generated. From each tree, iteratively, the most important vertex is included in the reduct set and all its out-going edges are removed. The process stops when the edge set is empty, thus producing multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied to several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.

  4. CONSTRUCTING MINIMAL SPANNING TREE BASED ON ROUGH SET THEORY FOR GENE SELECTION

    Directory of Open Access Journals (Sweden)

    Soumen Kumar Pati

    2013-01-01

    Microarray gene datasets are often high-dimensional, which causes difficulty in clustering and classification. Datasets containing a huge number of genes lead to increased complexity and therefore degraded dataset handling performance. Often, not all the measured features of these high-dimensional datasets are relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique for unsupervised learning, based on directed minimal spanning trees and rough set theory, is proposed. The method first computes a similarity factor between each pair of attributes using the indiscernibility relation, a concept of rough set theory. Based on the similarity factors, an attribute similarity set is formed, from which a directed weighted graph is constructed with attributes as vertices and the inverse of the similarity factor as edge weights. Then, all possible minimal spanning trees of the graph are generated. From each tree, iteratively, the most important vertex is included in the reduct set and all its out-going edges are removed. The process stops when the edge set is empty, thus producing multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied to several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.

  5. Genomic determinants of sporulation in Bacilli and Clostridia: towards the minimal set of sporulation-specific genes.

    Science.gov (United States)

    Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J

    2012-11-01

    Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.

  6. APPLICATION OF FUZZY SET THEORY IN FAULT TREE ANALYSIS

    Institute of Scientific and Technical Information of China (English)

    1998-01-01

    Based on fuzzy set theory and the extension principle, and using fuzzy numbers as the boundary condition of fault tree analysis, a new method of analyzing the fuzzy fault probability of the top event is developed. Fuzzy importance analysis of the basic events is proposed as well. A practical example is given. This method is a new way to handle vagueness in fault tree analysis and has great value in engineering practice.
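    To make the idea of propagating fuzzy probabilities to the top event concrete, here is a minimal sketch. It assumes triangular fuzzy numbers written as (low, mode, high) triples and independent basic events; neither assumption is stated in the record, and the function names `and_gate` and `or_gate` are hypothetical. Because both gate formulas increase monotonically in every input, applying them component-wise yields the exact lower bound, peak and upper bound of the output, although the output is in general no longer exactly triangular between those points.

```python
from math import prod


def and_gate(events):
    """Fuzzy probability that all independent input events occur.

    Each event is an assumed triangular fuzzy number (low, mode, high).
    """
    return tuple(prod(e[i] for e in events) for i in range(3))


def or_gate(events):
    """Fuzzy probability that at least one independent input event occurs."""
    return tuple(1.0 - prod(1.0 - e[i] for e in events) for i in range(3))


# Illustrative (made-up) basic events feeding a small tree: top = (A AND B) OR C.
event_a = (0.1, 0.2, 0.3)
event_b = (0.5, 0.5, 0.5)
event_c = (0.01, 0.02, 0.03)
top_event = or_gate([and_gate([event_a, event_b]), event_c])
```

    The spread between the first and last components of `top_event` shows how imprecision in the basic events carries through to the top event.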

  7. A novel CpG island set identifies tissue-specific methylation at developmental gene loci.

    Directory of Open Access Journals (Sweden)

    Robert Illingworth

    2008-01-01

    CpG islands (CGIs) are dense clusters of CpG sequences that punctuate the CpG-deficient human genome and associate with many gene promoters. As CGIs also differ from bulk chromosomal DNA by their frequent lack of cytosine methylation, we devised a CGI enrichment method based on nonmethylated CpG affinity chromatography. The resulting library was sequenced to define a novel human blood CGI set that includes many that are not detected by current algorithms. Approximately half of CGIs were associated with annotated gene transcription start sites, the remainder being intra- or intergenic. Using an array representing over 17,000 CGIs, we established that 6%-8% of CGIs are methylated in genomic DNA of human blood, brain, muscle, and spleen. Inter- and intragenic CGIs are preferentially susceptible to methylation. CGIs showing tissue-specific methylation were overrepresented at numerous genetic loci that are essential for development, including HOX and PAX family members. The findings enable a comprehensive analysis of the roles played by CGI methylation in normal and diseased human tissues.

  8. GIFtS: annotation landscape analysis with GeneCards

    Directory of Open Access Journals (Sweden)

    Dalah Irina

    2009-10-01

    Abstract Background Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results We present the GeneCards Inferred Functionality Score (GIFtS) which allows a quantitative assessment of a gene's annotation status, by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS value, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source, and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a

  9. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium.

    Science.gov (United States)

    Hohman, Timothy J; Bush, William S; Jiang, Lan; Brown-Gentry, Kristin D; Torstenson, Eric S; Dudek, Scott M; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W; Ritchie, Marylyn D; Martin, Eden R; Schellenberg, Gerard D; Mayeux, Richard; Farrer, Lindsay A; Pericak-Vance, Margaret A; Haines, Jonathan L; Thornton-Wells, Tricia A

    2016-02-01

    Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis.

  10. Mechanical Unloading of Mouse Bone in Microgravity Significantly Alters Cell Cycle Gene Set Expression

    Science.gov (United States)

    Blaber, Elizabeth; Dvorochkin, Natalya; Almeida, Eduardo; Kaplan, Warren; Burns, Brendan

    2012-07-01

    unloading in spaceflight, we conducted genome-wide microarray analysis of total RNA isolated from the mouse pelvis. Specifically, 16-week-old mice were subjected to 15 days of spaceflight onboard NASA's STS-131 space shuttle mission. The pelvis of each mouse was dissected, the bone marrow was flushed and the bones were briefly stored in RNAlater. The pelves were then homogenized, and RNA was isolated using TRIzol. RNA concentration and quality were measured using a Nanodrop spectrometer and 0.8% agarose gel electrophoresis. Samples of cDNA were analyzed using an Affymetrix GeneChip Gene 1.0 ST (Sense Target) Array System for Mouse and GenePattern Software. We normalized the ST gene arrays using Robust Multichip Average (RMA) normalization, which summarizes perfectly matched spots on the array through the median polish algorithm, rather than normalizing according to mismatched spots. We also used Limma for statistical analysis, using the BioConductor Limma Library by Gordon Smyth, and differential expression analysis to identify genes with significant changes in expression between the two experimental conditions. Finally, we used GSEApreRanked for Gene Set Enrichment Analysis (GSEA), with Kolmogorov-Smirnov style statistics, to identify groups of genes that are regulated together using the t-statistics derived from Limma. Preliminary results show that 6,603 genes expressed in pelvic bone had statistically significant alterations in spaceflight compared to ground controls. These prominently included cell cycle arrest molecules p21 and p18, cell survival molecule Crbp1, and cell cycle molecules cyclin D1 and Cdk1. Additionally, GSEA results indicated alterations in molecular targets of cyclin D1 and Cdk4, senescence pathways resulting from abnormal laminin maturation, cell-cell contacts via E-cadherin, and several pathways relating to protein translation and metabolism. In total, 111 gene sets out of 2,488, about 4%, showed statistically significant set alterations. 
These
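
    The Kolmogorov-Smirnov style statistic mentioned in this record can be illustrated with a minimal sketch. It assumes the classic unweighted running sum over a list of genes ranked by t-statistic; GSEApreRanked itself can weight steps by the ranking metric, which is not reproduced here. The function name and the toy gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-like running-sum statistic (illustrative sketch).

    ranked_genes: gene identifiers ordered from most up- to most
                  down-regulated (e.g. by t-statistic).
    gene_set:     identifiers belonging to the set under test.
    Returns the running-sum value with the largest absolute deviation
    from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_total = len(ranked_genes)
    n_hit = len(members)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    step_hit = 1.0 / n_hit                 # step up when a set member is seen
    step_miss = 1.0 / (n_total - n_hit)    # step down otherwise

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += step_hit if gene in members else -step_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy usage with made-up gene labels.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})
bottom_score = enrichment_score(ranked, {"g5", "g6"})
```

    A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom a negative one; significance would still have to be assessed by permutation, which this sketch omits.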

  11. An Efficient Soft Set-Based Approach for Conflict Analysis.

    Science.gov (United States)

    Sutoyo, Edi; Mungad, Mungad; Hamid, Suraya; Herawan, Tutut

    2016-01-01

    Conflict analysis has been used as an important tool in economic, business, governmental and political disputes, games, management negotiations, military operations, etc. Many formal mathematical models have been proposed to handle conflict situations, and one of the most popular is rough set theory. With its ability to handle vagueness in the conflict data set, rough set theory has been used successfully. However, computational time is still an issue when determining the certainty, coverage, and strength of conflict situations. In this paper, we present an alternative approach to handling conflict situations, based on some ideas from soft set theory. The novelty of the proposed approach is that, unlike rough set theory, which uses decision rules, it is based on the concept of co-occurrence of parameters in soft set theory. We illustrate the proposed approach by means of a tutorial example of voting analysis in conflict situations. Furthermore, we apply the proposed approach to a real-world dataset of political conflict in the Indonesian Parliament. We show that the proposed approach achieves lower computational time than rough set theory, by up to 3.9%.

  12. An Efficient Soft Set-Based Approach for Conflict Analysis.

    Directory of Open Access Journals (Sweden)

    Edi Sutoyo

    Full Text Available Conflict analysis has been used as an important tool in economic, business, governmental and political disputes, games, management negotiations, military operations, etc. Many formal mathematical models have been proposed to handle conflict situations, and one of the most popular is rough set theory. With its ability to handle vagueness in the conflict data set, rough set theory has been used successfully. However, computational time is still an issue when determining the certainty, coverage, and strength of conflict situations. In this paper, we present an alternative approach to handling conflict situations, based on some ideas from soft set theory. The novelty of the proposed approach is that, unlike rough set theory, which uses decision rules, it is based on the concept of co-occurrence of parameters in soft set theory. We illustrate the proposed approach by means of a tutorial example of voting analysis in conflict situations. Furthermore, we apply the proposed approach to a real-world dataset of political conflict in the Indonesian Parliament. We show that the proposed approach achieves lower computational time than rough set theory, by up to 3.9%.

  13. An Efficient Soft Set-Based Approach for Conflict Analysis

    Science.gov (United States)

    Sutoyo, Edi; Mungad, Mungad; Hamid, Suraya; Herawan, Tutut

    2016-01-01

    Conflict analysis has been used as an important tool in economic, business, governmental and political disputes, games, management negotiations, military operations, etc. Many formal mathematical models have been proposed to handle conflict situations, and one of the most popular is rough set theory. With its ability to handle vagueness in the conflict data set, rough set theory has been used successfully. However, computational time is still an issue when determining the certainty, coverage, and strength of conflict situations. In this paper, we present an alternative approach to handling conflict situations, based on some ideas from soft set theory. The novelty of the proposed approach is that, unlike rough set theory, which uses decision rules, it is based on the concept of co-occurrence of parameters in soft set theory. We illustrate the proposed approach by means of a tutorial example of voting analysis in conflict situations. Furthermore, we apply the proposed approach to a real-world dataset of political conflict in the Indonesian Parliament. We show that the proposed approach achieves lower computational time than rough set theory, by up to 3.9%. PMID:26928627

  14. Prioritizing predicted cis-regulatory elements for co-expressed gene sets based on Lasso regression models.

    Science.gov (United States)

    Hu, Hong; Roqueiro, Damian; Dai, Yang

    2011-01-01

    Computational prediction of cis-regulatory elements for a set of co-expressed genes based on sequence analysis provides an overwhelming volume of potential transcription factor binding sites. It presents a challenge to prioritize transcription factors for regulatory functional studies. A novel approach based on the use of Lasso regression models is proposed to address this problem. We examine the ability of the Lasso model using time-course microarray data obtained from a comprehensive study of gene expression profiles in skin and mucosal wounds in mouse over all stages of wound healing.

  15. Constructing Minimal Spanning Tree Based on Rough Set Theory for Gene Selection

    Directory of Open Access Journals (Sweden)

    Soumen Kumar Pati

    2013-02-01

    Full Text Available Microarray gene datasets often have high dimensionality, which causes difficulty in clustering and classification. Datasets containing a huge number of genes lead to increased complexity and, therefore, degradation of dataset handling performance. Often, not all the measured features of these high-dimensional datasets are relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique, based on a directed minimal spanning tree and rough set theory, is proposed for unsupervised learning. The method first computes a similarity factor between each pair of attributes using the indiscernibility relation, a concept of rough set theory. Based on the similarity factors, an attribute similarity set is formed, from which a directed weighted graph is constructed with attributes as vertices and the inverse of the similarity factor as edge weights. Then, all possible minimal spanning trees of the graph are generated. From each tree, iteratively, the most important vertex is included in the reduct set and all its outgoing edges are removed. The process stops when the edge set is empty, thus producing multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied to several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.

  16. Rice Transcriptome Analysis to Identify Possible Herbicide Quinclorac Detoxification Genes

    Directory of Open Access Journals (Sweden)

    Wenying Xu

    2015-09-01

    Full Text Available Quinclorac is a highly selective auxin-type herbicide that is widely used for the effective control of barnyard grass in paddy rice fields, improving the world’s rice yield. The herbicidal mode of action of quinclorac has been proposed, and hormone interactions affect quinclorac signaling. Because of its widespread use, quinclorac may be transported outside rice fields with the drainage waters, leading to soil and water pollution and environmental health problems. In this study, we used the 57K Affymetrix rice whole-genome array to identify quinclorac signaling response genes to study the molecular mechanisms of action and detoxification of quinclorac in rice plants. Overall, 637 probe sets were identified with differential expression levels under either 6 or 24 h of quinclorac treatment. Auxin-related genes such as GH3 and OsIAAs responded to quinclorac treatment. Gene Ontology analysis showed that detoxification-related gene families were significantly enriched, including cytochrome P450, GST, UGT, and ABC and drug transporter genes. Moreover, real-time RT-PCR analysis showed that top candidate P450 families such as CYP81, CYP709C and CYP72A genes were universally induced by different herbicides. Some Arabidopsis genes from the same P450 families were up-regulated under quinclorac treatment. We conducted rice whole-genome GeneChip analysis and the first global identification of quinclorac response genes. This work may provide potential markers for detoxification of quinclorac and biomonitors of environmental chemical pollution.

  17. Identification and Validation of a New Set of Five Genes for Prediction of Risk in Early Breast Cancer

    Directory of Open Access Journals (Sweden)

    Giorgio Mustacchi

    2013-05-01

    Full Text Available Molecular tests predicting the outcome of breast cancer patients based on gene expression levels can be used to assist in making treatment decisions after consideration of conventional markers. In this study we identified a subset of 20 mRNAs differentially regulated in breast cancer by analyzing several publicly available array gene expression data sets using an R/Bioconductor package. Using RT-qPCR we evaluated 261 consecutive invasive breast cancer cases, not selected for age, adjuvant treatment, nodal and estrogen receptor status, from paraffin-embedded sections. The biological samples dataset was split into a training set (137 cases) and a validation set (124 cases). The gene signature was developed on the training set, and a multivariate stepwise Cox analysis selected five genes independently associated with DFS: FGF18 (HR = 1.13, p = 0.05), BCL2 (HR = 0.57, p = 0.001), PRC1 (HR = 1.51, p = 0.001), MMP9 (HR = 1.11, p = 0.08), SERF1A (HR = 0.83, p = 0.007). These five genes were combined into a linear score (signature) weighted according to the coefficients of the Cox model, as: 0.125·FGF18 − 0.560·BCL2 + 0.409·PRC1 + 0.104·MMP9 − 0.188·SERF1A (HR = 2.7, 95% CI = 1.9–4.0, p < 0.001). The signature was then evaluated on the validation set, assessing its discrimination ability by a Kaplan-Meier analysis using the same cut-offs classifying patients at low, intermediate or high risk of disease relapse as defined on the training set (p < 0.001). Our signature, after further clinical validation, could be proposed as a prognostic signature for disease-free survival in breast cancer patients where the indication for adjuvant chemotherapy added to endocrine treatment is uncertain.
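
    The linear score in this record can be written out as a short sketch. The coefficients are those quoted in the abstract; the function names and the input format (a mapping from gene symbol to an expression value on whatever scale the authors used) are assumptions, and the risk cut-offs are not given in the abstract, so none are encoded here.

```python
import math

# Cox model coefficients as quoted in the abstract.
COEFFICIENTS = {
    "FGF18": 0.125,
    "BCL2": -0.560,
    "PRC1": 0.409,
    "MMP9": 0.104,
    "SERF1A": -0.188,
}


def signature_score(expression):
    """Weighted sum of the five expression values (hypothetical helper).

    expression: dict mapping gene symbol -> expression value.
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def hazard_ratio(gene):
    """Per-unit hazard ratio implied by a Cox coefficient, exp(beta)."""
    return math.exp(COEFFICIENTS[gene])


# Toy usage with made-up expression values (not patient data).
example = {"FGF18": 1.0, "BCL2": 1.0, "PRC1": 1.0, "MMP9": 1.0, "SERF1A": 1.0}
score = signature_score(example)
```

    As a consistency check, exponentiating each coefficient reproduces the per-gene hazard ratios listed in the abstract to two decimals, for example exp(0.409) for PRC1.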

  18. Quantitative Expression Analysis in Brassica napus by Northern Blot Analysis and Reverse Transcription-Quantitative PCR in a Complex Experimental Setting.

    Science.gov (United States)

    Rumlow, Annekathrin; Keunen, Els; Klein, Jan; Pallmann, Philip; Riemenschneider, Anja; Cuypers, Ann; Papenbrock, Jutta

    Analysis of gene expression is one of the major ways to better understand plant reactions to changes in environmental conditions. The comparison of many different factors influencing plant growth challenges the gene expression analysis for specific gene-targeted experiments, especially with regard to the choice of suitable reference genes. The aim of this study is to compare expression results obtained by Northern blot, semi-quantitative PCR and RT-qPCR, and to identify a reliable set of reference genes for oilseed rape (Brassica napus L.) suitable for comparing gene expression under complex experimental conditions. We investigated the influence of several factors such as sulfur deficiency, different time points during the day, varying light conditions, and their interaction on gene expression in oilseed rape plants. The expression of selected reference genes was indeed influenced under these conditions in different ways. Therefore, a recently developed algorithm, called GrayNorm, was applied to validate a set of reference genes for normalizing results obtained by Northern blot analysis. After careful comparison of the three methods mentioned above, Northern blot analysis seems to be a reliable and cost-effective alternative for gene expression analysis under a complex growth regime. For using this method in a quantitative way a number of references was validated revealing that for our experiment a set of three references provides an appropriate normalization. Semi-quantitative PCR was prone to many handling errors and difficult to control while RT-qPCR was very sensitive to expression fluctuations of the reference genes.

  19. Comparison of pathway analysis approaches using lung cancer GWAS data sets.

    Directory of Open Access Journals (Sweden)

    Gordon Fehringer

    Full Text Available Pathway analysis has been proposed as a complement to single SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had available software for performing analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations; GenGen (a version of Gene Set Enrichment Analysis (GSEA)), which uses a Kolmogorov-Smirnov-like running sum statistic as the test statistic; and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. There were nearly 18,000 genes available for analysis, following mapping of more than 300,000 SNPs from each data set. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). This included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR≤0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but results suggest mSUMSTAT has advantages over the other approaches, and may be a useful pathway analysis tool to use alongside other methods such as the commonly used GSEA (GenGen) approach.
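
    The idea of averaging χ² statistics over a pathway can be sketched as follows. This is not the mSUMSTAT implementation: it assumes one χ² value per gene is already available and, as a deliberate simplification, compares the pathway mean against randomly drawn gene sets of the same size rather than against phenotype permutations. All names and the toy values are hypothetical.

```python
import random


def mean_chi2(gene_stats, genes):
    """Average the gene-level chi-square statistics over a list of genes."""
    return sum(gene_stats[g] for g in genes) / len(genes)


def pathway_p_value(gene_stats, pathway, n_draws=1000, seed=0):
    """Empirical p-value for a pathway's mean chi-square (sketch).

    gene_stats: dict mapping gene -> chi-square statistic.
    pathway:    iterable of gene identifiers in the pathway.
    The null distribution is built from random gene sets of equal size,
    an assumption made here for brevity.
    """
    rng = random.Random(seed)
    genes_in = [g for g in pathway if g in gene_stats]
    if not genes_in:
        raise ValueError("no pathway genes have statistics")
    observed = mean_chi2(gene_stats, genes_in)

    all_genes = sorted(gene_stats)
    exceed = 0
    for _ in range(n_draws):
        draw = rng.sample(all_genes, len(genes_in))
        if mean_chi2(gene_stats, draw) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (exceed + 1) / (n_draws + 1)


# Toy usage: three strongly associated genes among twenty (made-up values).
toy_stats = {f"gene{i}": 1.0 for i in range(20)}
for name in ("gene0", "gene1", "gene2"):
    toy_stats[name] = 10.0
observed_mean, p_value = pathway_p_value(toy_stats, ["gene0", "gene1", "gene2"], n_draws=200)
```

    A mean statistic of this kind can be dominated by one very strong locus, which is the interpretive caveat the abstract raises for the CHRNA3-CHRNA5-CHRNB4 cluster.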

  20. Variable precision rough set for multiple decision attribute analysis

    Institute of Scientific and Technical Information of China (English)

    Lai; Kin; Keung

    2008-01-01

    A variable precision rough set (VPRS) model is used to solve the multi-attribute decision analysis (MADA) problem with multiple conflicting decision attributes and multiple condition attributes. By introducing confidence measures and a β-reduct, the VPRS model can rationally solve the conflicting decision analysis problem with multiple decision attributes and multiple condition attributes. For illustration, a medical diagnosis example is utilized to show the feasibility of the VPRS model in solving the MADA...

  1. Integrative Data Analysis: The Simultaneous Analysis of Multiple Data Sets

    Science.gov (United States)

    Curran, Patrick J.; Hussong, Andrea M.

    2009-01-01

    There are both quantitative and methodological techniques that foster the development and maintenance of a cumulative knowledge base within the psychological sciences. Most noteworthy of these techniques is meta-analysis, which allows for the synthesis of summary statistics drawn from multiple studies when the original data are not available.…

  2. Accurate Gene Expression-Based Biodosimetry Using a Minimal Set of Human Gene Transcripts

    Energy Technology Data Exchange (ETDEWEB)

    Tucker, James D., E-mail: jtucker@biology.biosci.wayne.edu [Department of Biological Sciences, Wayne State University, Detroit, Michigan (United States); Joiner, Michael C. [Department of Radiation Oncology, Wayne State University, Detroit, Michigan (United States); Thomas, Robert A.; Grever, William E.; Bakhmutsky, Marina V. [Department of Biological Sciences, Wayne State University, Detroit, Michigan (United States); Chinkhota, Chantelle N.; Smolinski, Joseph M. [Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan (United States); Divine, George W. [Department of Public Health Sciences, Henry Ford Hospital, Detroit, Michigan (United States); Auner, Gregory W. [Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan (United States)

    2014-03-15

    Purpose: Rapid and reliable methods for conducting biological dosimetry are a necessity in the event of a large-scale nuclear event. Conventional biodosimetry methods lack the speed, portability, ease of use, and low cost required for triaging numerous victims. Here we address this need by showing that polymerase chain reaction (PCR) on a small number of gene transcripts can provide accurate and rapid dosimetry. The low cost and relative ease of PCR compared with existing dosimetry methods suggest that this approach may be useful in mass-casualty triage situations. Methods and Materials: Human peripheral blood from 60 adult donors was acutely exposed to cobalt-60 gamma rays at doses of 0 (control) to 10 Gy. mRNA expression levels of 121 selected genes were obtained 0.5, 1, and 2 days after exposure by reverse-transcriptase real-time PCR. Optimal dosimetry at each time point was obtained by stepwise regression of dose received against individual gene transcript expression levels. Results: Only 3 to 4 different gene transcripts, ASTN2, CDKN1A, GDF15, and ATM, are needed to explain ≥0.87 of the variance (R²). Receiver-operator characteristics, a measure of sensitivity and specificity, of 0.98 for these statistical models were achieved at each time point. Conclusions: The actual and predicted radiation doses agree very closely up to 6 Gy. Dosimetry at 8 and 10 Gy shows some effect of saturation, thereby slightly diminishing the ability to quantify higher exposures. Analyses of these gene transcripts may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations or in clinical radiation emergencies.
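
    The "fraction of variance explained" quoted in this record is the coefficient of determination. A minimal sketch of how it could be computed from actual and predicted doses is given below; the function name and the dose values are made up for illustration and are not the study's data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical doses in Gy; the predictions flatten at the high end to
# mimic the saturation effect the abstract describes.
actual_dose = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
predicted_dose = [0.0, 2.0, 4.0, 6.0, 7.0, 8.0]
fit_quality = r_squared(actual_dose, predicted_dose)
```

    Under-prediction confined to the two highest doses lowers the value only modestly, which is consistent with the abstract's remark that saturation slightly diminishes quantification of higher exposures.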

  3. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils

    Science.gov (United States)

    Hannula, S. Emilia; van Veen, Johannes A.

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter, and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose. PMID:27965632

  4. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils.

    Science.gov (United States)

    Hannula, S Emilia; van Veen, Johannes A

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter, and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose.

  5. Primer sets developed for fungal functional genes reveal shifts in functionality of fungal community in soils

    Directory of Open Access Journals (Sweden)

    Emilia Silja Hannula

    2016-11-01

    Full Text Available Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose.

  6. Meta-analysis based variable selection for gene expression data.

    Science.gov (United States)

    Li, Quefeng; Wang, Sijian; Huang, Chiang-Ching; Yu, Menggang; Shao, Jun

    2014-12-01

    Recent advances in biotechnology and its wide applications have led to the generation of many high-dimensional gene expression data sets that can be used to address similar biological questions. Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. When the dimensions of datasets are high, it is desirable to incorporate variable selection into meta-analysis to improve model interpretation and prediction. To our knowledge, all existing methods conduct variable selection with meta-analyzed data in an "all-in-or-all-out" fashion, that is, a gene is either selected in all of the studies or not selected in any study. However, because heterogeneity commonly exists in meta-analyzed data, including choices of biospecimens, study population, and measurement sensitivity, it is possible that a gene is important in some studies while unimportant in others. In this article, we propose a novel method called meta-lasso for variable selection with high-dimensional meta-analyzed data. Through a hierarchical decomposition of regression coefficients, our method not only borrows strength across multiple data sets to boost the power to identify important genes, but also keeps the selection flexibility among data sets to take into account data heterogeneity. We show that our method possesses gene selection consistency, that is, when the sample size of each data set is large, with high probability, our method can identify all important genes and remove all unimportant genes. Simulation studies demonstrate a good performance of our method. We applied our meta-lasso method to a meta-analysis of five cardiovascular studies. The analysis results are clinically meaningful.

  7. A small set of extra-embryonic genes defines a new landmark for bovine embryo staging.

    Science.gov (United States)

    Degrelle, Séverine A; Lê Cao, Kim-Anh; Heyman, Yvan; Everts, Robin E; Campion, Evelyne; Richard, Christophe; Ducroix-Crépy, Céline; Tian, X Cindy; Lewin, Harris A; Renard, Jean-Paul; Robert-Granié, Christèle; Hue, Isabelle

    2011-01-01

    Axis specification in mouse is determined by a sequence of reciprocal interactions between embryonic and extra-embryonic tissues so that a few extra-embryonic genes appear as 'patterning' the embryo. Considering these interactions as essential, but lacking in most mammals the genetically driven approaches used in mouse and the corresponding patterning mutants, we examined whether a molecular signature originating from extra-embryonic tissues could relate to the developmental stage of the embryo proper and predict it. To this end, we have profiled bovine extra-embryonic tissues at peri-implantation stages, when gastrulation and early neurulation occur, and analysed the subsequent expression profiles through the use of predictive methods as previously reported for tumour classification. A set of six genes (CALM1, CPA3, CITED1, DLD, HNRNPDL, and TGFB3), half of which had not been previously associated with any extra-embryonic feature, appeared significantly discriminative and mainly dependent on embryonic tissues for its faithful expression. The predictive value of this set of genes for gastrulation and early neurulation stages, as assessed on naive samples, was remarkably high (93%). In silico connected to the bovine orthologues of the mouse patterning genes, this gene set is proposed as a new trait for embryo staging. As such, this will allow saving the bovine embryo proper for molecular or cellular studies. To us, it offers as well new perspectives for developmental phenotyping and modelling of embryonic/extra-embryonic co-differentiation.

  8. Dynamic association rules for gene expression data analysis.

    Science.gov (United States)

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to resolve the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one to efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, the Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. 
Four gene expression datasets showed that the proposed
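
    Support and confidence, the two association-rule quantities named in this record, can be illustrated with a minimal sketch. The abstract does not give the exact form of its one-sided confidence interval, so the normal-approximation lower bound used below is an assumption, as are the function names, the item labels and the toy data.

```python
import math

Z_95_ONE_SIDED = 1.6449  # approximate 95th percentile of the standard normal


def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence, n_a


def lower_bound(p_hat, n, z=Z_95_ONE_SIDED):
    """One-sided lower confidence bound for a proportion (assumed Wald form)."""
    return p_hat - z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def rule_is_meaningful(transactions, antecedent, consequent, min_confidence):
    """Accept a rule only if the lower bound of its confidence clears the threshold."""
    _, confidence, n_a = rule_metrics(transactions, antecedent, consequent)
    if n_a == 0:
        return False
    return lower_bound(confidence, n_a) > min_confidence


# Toy usage: each "transaction" is the set of discretized gene states in one sample.
samples = (
    [{"A_up", "B_down"}] * 6
    + [{"A_up"}] * 2
    + [{"B_down"}] * 2
)
support, confidence, _ = rule_metrics(samples, {"A_up"}, {"B_down"})
```

    Requiring the lower bound, rather than the point estimate, to exceed the threshold makes rules supported by only a few samples harder to accept.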

  9. An Empirical Analysis of Rough Set Categorical Clustering Techniques

    Science.gov (United States)

    2017-01-01

    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessors, such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or have difficulty selecting, their best clustering attribute. This analysis therefore motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernible relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effect of the number of clusters on rough accuracy, purity and entropy is described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, produced in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy. PMID:28068344
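
    Purity and entropy, two of the evaluation measures named in this record, can be sketched as follows. The abstract does not define them, so the commonly used definitions are assumed here: purity as the fraction of objects carrying the majority class of their cluster, and entropy as the size-weighted average Shannon entropy (base 2) of the class distribution within each cluster. Function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def _class_counts(clusters, labels):
    """Group true class labels by assigned cluster."""
    counts = {}
    for cluster, label in zip(clusters, labels):
        counts.setdefault(cluster, Counter())[label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of objects that carry the majority class of their cluster."""
    counts = _class_counts(clusters, labels)
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean Shannon entropy (base 2) of classes per cluster."""
    counts = _class_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for class_counter in counts.values():
        size = sum(class_counter.values())
        h = -sum((k / size) * math.log2(k / size) for k in class_counter.values())
        total += (size / n) * h
    return total


# Toy usage: cluster 0 mixes two classes, cluster 1 is pure.
assigned = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "b", "b", "b"]
toy_purity = purity(assigned, truth)
toy_entropy = entropy(assigned, truth)
```

    Higher purity and lower entropy both indicate clusters that align more closely with the known classes.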

  10. An Empirical Analysis of Rough Set Categorical Clustering Techniques.

    Science.gov (United States)

    Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat

    2017-01-01

    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessor approaches such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or have difficulty selecting, their best clustering attribute. This analysis therefore motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effects of the number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, produced in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.
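
    The rough set notions this abstract relies on (indiscernibility classes and rough accuracy) can be illustrated with a small sketch. It is not the MIA algorithm itself, whose details the abstract does not give; it only shows the standard Pawlak definitions on a hypothetical four-object table, and the function and attribute names are assumptions made for illustration.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Partition objects into classes that agree on every attribute in attrs."""
    groups = defaultdict(set)
    for obj, values in table.items():
        key = tuple(values[a] for a in attrs)
        groups[key].add(obj)
    return list(groups.values())


def rough_accuracy(classes, target):
    """Ratio |lower approximation| / |upper approximation| of target."""
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical categorical data set (not taken from the paper).
table = {
    "o1": {"color": "red", "size": "big"},
    "o2": {"color": "red", "size": "small"},
    "o3": {"color": "blue", "size": "big"},
    "o4": {"color": "blue", "size": "big"},
}

classes = indiscernibility_classes(table, ["color"])
accuracy = rough_accuracy(classes, {"o1", "o3", "o4"})
print(classes, accuracy)  # accuracy is 2 / 4 = 0.5
```

    Here the attribute "color" cannot separate o1 from o2, so the target set {o1, o3, o4} is only roughly definable: its lower approximation is {o3, o4} and its upper approximation is the whole table.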

  11. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    K.M. Hettne (Kristina); J. Boorsma (Jeffrey); D.A.M. van Dartel (Dorien A M); J.J. Goeman (Jelle); E.C. de Jong (Esther); A.H. Piersma (Aldert); R.H. Stierum (Rob); J. Kleinjans (Jos); J.A. Kors (Jan)

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with

  12. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    K.M. Hettne (Kristina); J. Boorsma (Jeffrey); D.A.M. van Dartel (Dorien A M); J.J. Goeman (Jelle); E.C. de Jong (Esther); A.H. Piersma (Aldert); R.H. Stierum (Rob); J. Kleinjans (Jos); J.A. Kors (Jan)

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with g

  13. General approach for in vivo recovery of cell type-specific effector gene sets.

    Science.gov (United States)

    Barsi, Julius C; Tu, Qiang; Davidson, Eric H

    2014-05-01

    Differentially expressed, cell type-specific effector gene sets hold the key to multiple important problems in biology, from theoretical aspects of developmental gene regulatory networks (GRNs) to various practical applications. Although individual cell types of interest have been recovered by various methods and analyzed, systematic recovery of multiple cell type-specific gene sets from whole developing organisms has remained problematic. Here we describe a general methodology using the sea urchin embryo, a material of choice because of the large-scale GRNs already solved for this model system. This method utilizes the regulatory states expressed by given cells of the embryo to define cell type and includes a fluorescence activated cell sorting (FACS) procedure that results in no perturbation of transcript representation. We have extensively validated the method by spatial and qualitative analyses of the transcriptome expressed in isolated embryonic skeletogenic cells and as a consequence, generated a prototypical cell type-specific transcriptome database.

  14. Assessing the Association of Mitochondrial Genetic Variation With Primary Open-Angle Glaucoma Using Gene-Set Analyses.

    Science.gov (United States)

    Khawaja, Anthony P; Cooke Bailey, Jessica N; Kang, Jae Hee; Allingham, R Rand; Hauser, Michael A; Brilliant, Murray; Budenz, Donald L; Christen, William G; Fingert, John; Gaasterland, Douglas; Gaasterland, Terry; Kraft, Peter; Lee, Richard K; Lichter, Paul R; Liu, Yutao; Medeiros, Felipe; Moroi, Syoko E; Richards, Julia E; Realini, Tony; Ritch, Robert; Schuman, Joel S; Scott, William K; Singh, Kuldev; Sit, Arthur J; Vollrath, Douglas; Wollstein, Gadi; Zack, Donald J; Zhang, Kang; Pericak-Vance, Margaret; Weinreb, Robert N; Haines, Jonathan L; Pasquale, Louis R; Wiggs, Janey L

    2016-09-01

    Recent studies indicate that mitochondrial proteins may contribute to the pathogenesis of primary open-angle glaucoma (POAG). In this study, we examined the association between POAG and common variations in gene-encoding mitochondrial proteins. We examined genetic data from 3430 POAG cases and 3108 controls derived from the combination of the GLAUGEN and NEIGHBOR studies. We constructed biological-system coherent mitochondrial nuclear-encoded protein gene-sets by intersecting the MitoCarta database with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We examined the mitochondrial gene-sets for association with POAG and with normal-tension glaucoma (NTG) and high-tension glaucoma (HTG) subsets using Pathway Analysis by Randomization Incorporating Structure. We identified 22 KEGG pathways with significant mitochondrial protein-encoding gene enrichment, belonging to six general biological classes. Among the pathway classes, mitochondrial lipid metabolism was associated with POAG overall (P = 0.013) and with NTG (P = 0.0006), and mitochondrial carbohydrate metabolism was associated with NTG (P = 0.030). Examining the individual KEGG pathway mitochondrial gene-sets, fatty acid elongation and synthesis and degradation of ketone bodies, both lipid metabolism pathways, were significantly associated with POAG (P = 0.005 and P = 0.002, respectively) and NTG (P = 0.0004 and P < 0.0001, respectively). Butanoate metabolism, a carbohydrate metabolism pathway, was significantly associated with POAG (P = 0.004), NTG (P = 0.001), and HTG (P = 0.010). We present an effective approach for assessing the contributions of mitochondrial genetic variation to open-angle glaucoma. Our findings support a role for mitochondria in POAG pathogenesis and specifically point to lipid and carbohydrate metabolism pathways as being important.

  15. Comparative genomic analysis of soybean flowering genes.

    Directory of Open Access Journals (Sweden)

    Chol-Hee Jung

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant

  16. Can survival prediction be improved by merging gene expression data sets?

    Directory of Open Access Journals (Sweden)

    Haleh Yasrebi

    BACKGROUND: High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used for characterization of tumor biopsies for clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial study limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not clear at all whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This apparently was due to the fact that a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation of time to death or relapse, (e) the reduced number of genes in the merged data

  17. A sequence-based approach to identify reference genes for gene expression analysis

    Directory of Open Access Journals (Sweden)

    Chari Raj

    2010-08-01

    Background: An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate. Methods: Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated. Results: We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung

  18. Identification of a novel set of genes reflecting different in vivo invasive patterns of human GBM cells

    Directory of Open Access Journals (Sweden)

    Monticone Massimiliano

    2012-08-01

    Background: Most patients affected by Glioblastoma multiforme (GBM, grade IV glioma) experience a recurrence of the disease because of the spreading of tumor cells beyond surgical boundaries. Unveiling the mechanisms causing this process is a logical goal to impair the killing capacity of GBM cells by molecular targeting. We noticed that our long-term GBM cultures, established from different patients, may display two categories/types of growth behavior in an orthotopic xenograft model: expansion of the tumor mass and formation of tumor branches/nodules (nodular-like, NL-type) or highly diffuse single tumor cell infiltration (HD-type). Methods: We determined by DNA microarrays the gene expression profiles of three NL-type and three HD-type long-term GBM cultures. Subsequently, individual genes with different expression levels between the two groups were identified using Significance Analysis of Microarrays (SAM). Real time RT-PCR, immunofluorescence and immunoblot analyses were performed for a selected subgroup of regulated gene products to confirm the results obtained by the expression analysis. Results: Here, we report the identification of a set of 34 differentially expressed genes in the two types of GBM cultures. Twenty-three of these genes encode proteins localized to the plasma membrane, and 9 of these encode proteins involved in the process of cell adhesion. Conclusions: This study suggests the participation in the diffuse infiltrative/invasive process of GBM cells within the CNS of a novel set of genes coding for membrane-associated proteins, which should thus be susceptible to an inhibition strategy by specific targeting. Massimiliano Monticone and Antonio Daga contributed equally to this work.

  19. Human gene correlation analysis (HGCA): a tool for the identification of transcriptionally co-expressed genes.

    Science.gov (United States)

    Michalopoulos, Ioannis; Pavlopoulos, Georgios A; Malatras, Apostolos; Karelas, Alexandros; Kostadima, Myrto-Areti; Schneider, Reinhard; Kossida, Sophia

    2012-06-06

    Bioinformatics and high-throughput technologies such as microarray studies allow the expression levels of large numbers of genes to be measured simultaneously, thus helping us to understand the molecular mechanisms of various biological processes in a cell. We calculate the Pearson Correlation Coefficient (r-value) between probe set signal values from Affymetrix Human Genome Microarray samples and cluster the human genes according to the r-value correlation matrix using the Neighbour Joining (NJ) clustering method. A hypergeometric distribution is applied to the text annotations of the probe sets to quantify term overrepresentation. The aim of the tool is the identification of closely correlated genes for a given gene of interest and/or the prediction of its biological function, which is based on the annotations of the respective gene cluster. Human Gene Correlation Analysis (HGCA) is a tool to classify human genes according to their coexpression levels and to identify overrepresented annotation terms in correlated gene groups. It is available at: http://biobank-informatics.bioacademy.gr/coexpression/.
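
    The two statistical ingredients named in this abstract, the Pearson correlation between probe set signals and a hypergeometric test for term overrepresentation, can be sketched as follows. This is a minimal illustration in plain Python, not HGCA's own code; the function names and the toy numbers are assumptions, and the Neighbour Joining clustering step is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def hypergeom_upper_tail(population, annotated, drawn, hits):
    """P(X >= hits): chance of seeing at least `hits` annotated genes in a
    cluster of `drawn` genes taken from `population` genes, of which
    `annotated` carry the term."""
    total = math.comb(population, drawn)
    favourable = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(hits, min(annotated, drawn) + 1)
    )
    return favourable / total


# Hypothetical signals for two probe sets across four samples.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.0])

# Hypothetical cluster: 5 genes drawn from 10, all 5 carrying a term
# that annotates 5 of the 10 genes overall.
p = hypergeom_upper_tail(population=10, annotated=5, drawn=5, hits=5)
print(r, p)
```

    A small p-value suggests the term appears in the cluster more often than chance would predict; in practice such p-values would also need a multiple-testing correction, which this sketch omits.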

  20. Gene Bionetwork Analysis of Ovarian Primordial Follicle Development

    Science.gov (United States)

    Nilsson, Eric E.; Savenkova, Marina I.; Schindler, Ryan; Zhang, Bin; Schadt, Eric E.; Skinner, Michael K.

    2010-01-01

    Ovarian primordial follicles are critical for female reproduction and comprise a finite pool of gametes arrested in development. A systems biology approach was used to identify regulatory gene networks essential for primordial follicle development. Transcriptional responses to eight different growth factors known to influence primordial follicles were used to construct a bionetwork of regulatory genes involved in rat primordial follicle development. Over 1,500 genes were found to be regulated by the various growth factors and a network analysis identified critical gene modules involved in a number of signaling pathways and cellular processes. A set of 55 genes was identified as potential critical regulators of these gene modules, and a sub-network associated with development was determined. Within the network two previously identified regulatory genes were confirmed (i.e., Pdgfa and Fgfr2) and a new factor was identified, connective tissue growth factor (CTGF). CTGF was tested in ovarian organ cultures and found to stimulate primordial follicle development. Therefore, the relevant gene network associated with primordial follicle development was validated and the critical genes and pathways involved in this process were identified. This is one of the first applications of network analysis to a normal developmental process. These observations provide insights into potential therapeutic targets for preventing ovarian disease and promoting female reproduction. PMID:20661288

  1. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  2. Gene bionetwork analysis of ovarian primordial follicle development.

    Directory of Open Access Journals (Sweden)

    Eric E Nilsson

    Ovarian primordial follicles are critical for female reproduction and comprise a finite pool of gametes arrested in development. A systems biology approach was used to identify regulatory gene networks essential for primordial follicle development. Transcriptional responses to eight different growth factors known to influence primordial follicles were used to construct a bionetwork of regulatory genes involved in rat primordial follicle development. Over 1,500 genes were found to be regulated by the various growth factors and a network analysis identified critical gene modules involved in a number of signaling pathways and cellular processes. A set of 55 genes was identified as potential critical regulators of these gene modules, and a sub-network associated with development was determined. Within the network two previously identified regulatory genes were confirmed (i.e., Pdgfa and Fgfr2) and a new factor was identified, connective tissue growth factor (CTGF). CTGF was tested in ovarian organ cultures and found to stimulate primordial follicle development. Therefore, the relevant gene network associated with primordial follicle development was validated and the critical genes and pathways involved in this process were identified. This is one of the first applications of network analysis to a normal developmental process. These observations provide insights into potential therapeutic targets for preventing ovarian disease and promoting female reproduction.

  3. Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG.

    Science.gov (United States)

    Li, Zhen; Li, Bi-Qing; Jiang, Min; Chen, Lei; Zhang, Jian; Liu, Lin; Huang, Tao

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB related studies, while 5,500 non-RB genes were randomly selected from Ensemble genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict the other cancer related genes as well.

  4. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB related studies, while 5,500 non-RB genes were randomly selected from Ensemble genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict the other cancer related genes as well.

  5. Gene Expression Signature in Endemic Osteoarthritis by Microarray Analysis

    Directory of Open Access Journals (Sweden)

    Xi Wang

    2015-05-01

    Kashin-Beck Disease (KBD) is an endemic osteochondropathy with an unknown pathogenesis. Diagnosis of KBD is effective only in advanced cases, which eliminates the possibility of early treatment and leads to an inevitable exacerbation of symptoms. Therefore, we aim to identify an accurate blood-based gene signature for the detection of KBD. Previously published gene expression profile data on cartilage and peripheral blood mononuclear cells (PBMCs) from adults with KBD were compared to select potential target genes. Microarray analysis was conducted to evaluate the expression of the target genes in a cohort of 100 KBD patients and 100 healthy controls. A gene expression signature was identified using a training set, which was subsequently validated using an independent test set with a minimum redundancy maximum relevance (mRMR) algorithm and support vector machine (SVM) algorithm. Fifty unique genes were differentially expressed between KBD patients and healthy controls. A 20-gene signature was identified that distinguished between KBD patients and controls with 90% accuracy, 85% sensitivity, and 95% specificity. This study identified a 20-gene signature that accurately distinguishes between patients with KBD and controls using peripheral blood samples. These results promote the further development of blood-based genetic biomarkers for detection of KBD.
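
    The reported figures are mutually consistent for a class-balanced test set, where accuracy is the mean of sensitivity and specificity: (0.85 + 0.95) / 2 = 0.90. The sketch below checks this with hypothetical confusion-matrix counts (20 patients and 20 controls). The abstract does not state the test-set size, so these counts are assumptions chosen only to reproduce the percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # fraction of patients correctly detected
    specificity = tn / (tn + fp)   # fraction of controls correctly cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts: 17 of 20 patients and 19 of 20 controls classified correctly.
accuracy, sensitivity, specificity = classification_metrics(tp=17, fn=3, tn=19, fp=1)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

    With unequal group sizes the simple average no longer holds, and accuracy becomes a prevalence-weighted mean of the two rates.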

  6. Stochastic analysis in discrete and continuous settings with normal martingales

    CERN Document Server

    Privault, Nicolas

    2009-01-01

    This volume gives a unified presentation of stochastic analysis for continuous and discontinuous stochastic processes, in both discrete and continuous time. It is mostly self-contained and accessible to graduate students and researchers having already received a basic training in probability. The simultaneous treatment of continuous and jump processes is done in the framework of normal martingales; that includes the Brownian motion and compensated Poisson processes as specific cases. In particular, the basic tools of stochastic analysis (chaos representation, gradient, divergence, integration by parts) are presented in this general setting. Applications are given to functional and deviation inequalities and mathematical finance.

  7. Adaptive Learning Environments: A Requirements Analysis in Business Settings

    Directory of Open Access Journals (Sweden)

    Kai Michael Höver

    2009-08-01

    The design and development of an adaptive learning system (ALS) should be guided by a thorough analysis of users’ expectations and needs. A requirements analysis has been carried out by means of scenario-based semi-structured interviews in order to investigate the personalization and adaptation preferences of different stakeholder groups in business settings. Results show that an ALS has a decided advantage over a non-adaptive learning system by offering individual treatment of learners. The adaptation of content and learning activities to learner knowledge and learning goal, particularly determined by the job role, is perceived to be most relevant.

  8. A Project Risk Ranking Approach Based on Set Pair Analysis

    Institute of Scientific and Technical Information of China (English)

    Gao Feng; Chen Yingwu

    2006-01-01

    Set Pair Analysis (SPA) is a new methodology to describe and process system uncertainty. It is different from stochastic or fuzzy methods in reasoning and operation, and it has been applied in many areas recently. In this paper, the application of SPA in risk ranking is presented, which includes a review of risk ranking, an introduction to the Connecting Degree (CD), which plays a key role in SPA, the arithmetic and Tendency Grade (TG) of CDs, and a proposed risk ranking approach. Finally, a case analysis is presented to illustrate the reasonableness of this approach. It is found that this approach is very convenient to operate, while the ranking result is more comprehensible.

  9. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

    Directory of Open Access Journals (Sweden)

    Burgun Anita

    2006-05-01

    Background: Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results: We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion: The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO term networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions.

  10. Reachable set modeling and engagement analysis of exoatmospheric interceptor

    Institute of Scientific and Technical Information of China (English)

    Chai Hua; Liang Yangang; Chen Lei; Tang Guojin

    2014-01-01

    A novel reachable set (RS) model is developed within a framework of exoatmospheric interceptor engagement analysis. The boost phase steering scheme and trajectory distortion mechanism of the interceptor are first explored. A mathematical model of the distorted RS is then formulated through a dimension-reduction analysis. By treating the outer boundary of the RS on the sphere surface as a spherical convex hull, two relevant theorems are proposed and the RS envelope is depicted by computational geometry theory. Based on the RS model, algorithms for intercept window analysis and launch parameter determination are proposed, and numerical simulations are carried out for interceptors with different energy or launch points. Results show that the proposed method can avoid intensive on-line computation and provide an accurate and effective approach for interceptor engagement analysis. The suggested RS model also serves as a ready reference for other related problems such as interceptor effectiveness evaluation and platform disposition.

  11. Reachable set modeling and engagement analysis of exoatmospheric interceptor

    Directory of Open Access Journals (Sweden)

    Chai Hua

    2014-12-01

    A novel reachable set (RS) model is developed within a framework of exoatmospheric interceptor engagement analysis. The boost phase steering scheme and trajectory distortion mechanism of the interceptor are firstly explored. A mathematical model of the distorted RS is then formulated through a dimension–reduction analysis. By treating the outer boundary of the RS on sphere surface as a spherical convex hull, two relevant theorems are proposed and the RS envelope is depicted by the computational geometry theory. Based on RS model, the algorithms of intercept window analysis and launch parameters determination are proposed, and numerical simulations are carried out for interceptors with different energy or launch points. Results show that the proposed method can avoid intensive on-line computation and provide an accurate and effective approach for interceptor engagement analysis. The suggested RS model also serves as a ready reference to other related problems such as interceptor effectiveness evaluation and platform disposition.

  12. Different gene sets contribute to different symptom dimensions of depression and anxiety

    OpenAIRE

    van Veen, Tineke; Goeman, Jelle J.; Monajemi, Ramin; Wardenaar, Klaas J; Hartman, Catharina A; Snieder, Harold; Nolte, Ilja M; Penninx, Brenda W. J. H.; Zitman, Frans G.

    2012-01-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom-dimensions provide a more homogenous phenotype. Furthermore, as effects of individual genes are small, analysis of genetic data at the pathway-level provides more power to detect associations and yield valuable biological insight. In 1,398 individuals with a Major Depressive Disorder, t...

  13. Identification of noise in linear data sets by factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Roscoe, B.A.; Hopke, P.K.

    1981-01-01

    The approach to classical factor analysis described in this paper, i.e. doing the analysis for varying numbers of factors without prior assumptions about the number of factors, prevents one from getting erroneous results caused by assumptions built into the computer code. Identification of a factor containing most of the variance of one variable with little variance of other variables pinpoints a possible difficulty in the data, if the singularity has no obvious physical significance. Examination of the factor scores will determine whether the problem is isolated to a few samples or spread over all the samples. Having this information, one may then go back to the raw data and take the appropriate corrective action. Classical factor analysis has the ability to identify several types of errors in data after it has been generated. It is therefore ideally suited for scanning large data sets. The ease of the identification technique makes it a beneficial tool to use before reduction and analysis of large data sets and should, in the long run, save time and effort.

  14. Sequencing and Gene Expression Analysis of Leishmania tropica LACK Gene.

    Directory of Open Access Journals (Sweden)

    Nour Hammoudeh

    2014-12-01

    Leishmania Homologue of receptors for Activated C Kinase (LACK) antigen is a 36-kDa protein, which provokes a very early immune response against Leishmania infection. There are several reports on the expression of LACK through different life-cycle stages of genus Leishmania, but only a few of them have focused on L. tropica. The present study provides details of the cloning, DNA sequencing and gene expression of LACK in this parasite species. First, several local isolates of Leishmania parasites were typed in our laboratory using the PCR technique to verify the Leishmania species. After that, the LACK gene was amplified and cloned into a vector for sequencing. Finally, the expression of this molecule in logarithmic and stationary growth phase promastigotes, as well as in amastigotes, was evaluated by the Reverse Transcription-PCR (RT-PCR) technique. The typing result confirmed that all our local isolates belong to L. tropica. The LACK gene sequence was determined and high similarity was observed with the sequences of other Leishmania species. Furthermore, the expression of the LACK gene in both promastigote and amastigote forms was confirmed. Overall, the data set the stage for future studies of the properties and immune role of LACK gene products.

  15. Statistical analysis of unlabeled point sets: comparing molecules in chemoinformatics.

    Science.gov (United States)

    Dryden, Ian L; Hirst, Jonathan D; Melville, James L

    2007-03-01

    We consider Bayesian methodology for comparing two or more unlabeled point sets. Application of the technique to a set of steroid molecules illustrates its potential utility involving the comparison of molecules in chemoinformatics and bioinformatics. We initially match a pair of molecules, where one molecule is regarded as random and the other fixed. A type of mixture model is proposed for the point set coordinates, and the parameters of the distribution are a labeling matrix (indicating which pairs of points match) and a concentration parameter. An important property of the likelihood is that it is invariant under rotations and translations of the data. Bayesian inference for the parameters is carried out using Markov chain Monte Carlo simulation, and it is demonstrated that the procedure works well on the steroid data. The posterior distribution is difficult to simulate from, due to multiple local modes, and we also use additional data (partial charges on atoms) to help with this task. An approximation is considered for speeding up the simulation algorithm, and the approximating fast algorithm leads to essentially identical inference to that under the exact method for our data. Extensions to multiple molecule alignment are also introduced, and an algorithm is described which also works well on the steroid data set. After all the steroid molecules have been matched, exploratory data analysis is carried out to examine which molecules are similar. Also, further Bayesian inference for the multiple alignment problem is considered.

  16. SIGNATURE: A workbench for gene expression signature analysis

    Directory of Open Access Journals (Sweden)

    Chang Jeffrey T

    2011-11-01

    Abstract Background The biological phenotype of a cell, such as a characteristic visual image or behavior, reflects activities derived from the expression of collections of genes. As such, an ability to measure the expression of these genes provides an opportunity to develop more precise and varied sets of phenotypes. However, to use this approach requires computational methods that are difficult to implement and apply, and thus there is a critical need for intelligent software tools that can reduce the technical burden of the analysis. Tools for gene expression analyses are unusually difficult to implement in a user-friendly way because their application requires a combination of biological data curation, statistical computational methods, and database expertise. Results We have developed SIGNATURE, a web-based resource that simplifies gene expression signature analysis by providing software, data, and protocols to perform the analysis successfully. This resource uses Bayesian methods for processing gene expression data coupled with a curated database of gene expression signatures, all carried out within a GenePattern web interface for easy use and access. Conclusions SIGNATURE is available for public use at http://genepattern.genome.duke.edu/signature/.

  17. The transcriptional response to encystation stimuli in Giardia lamblia is restricted to a small set of genes.

    Science.gov (United States)

    Morf, Laura; Spycher, Cornelia; Rehrauer, Hubert; Fournier, Catharine Aquino; Morrison, Hilary G; Hehl, Adrian B

    2010-10-01

    The protozoan parasite Giardia lamblia undergoes stage differentiation in the small intestine of the host to an environmentally resistant and infectious cyst. Encystation involves the secretion of an extracellular matrix comprised of cyst wall proteins (CWPs) and a β(1-3)-GalNAc homopolymer. Upon the induction of encystation, genes coding for CWPs are switched on, and mRNAs coding for a Myb transcription factor and enzymes involved in cyst wall glycan synthesis are upregulated. Encystation in vitro is triggered by several protocols, which call for changes in bile concentrations or availability of lipids, and elevated pH. However, the conditions for induction are not standardized and we predicted significant protocol-specific side effects. This makes reliable identification of encystation factors difficult. Here, we exploited the possibility of inducing encystation with two different protocols, which we show to be equally effective, for a comparative mRNA profile analysis. The standard encystation protocol induced a bipartite transcriptional response with surprisingly minor involvement of stress genes. A comparative analysis revealed a core set of only 18 encystation genes and showed that a majority of genes was indeed upregulated as a side effect of inducing conditions. We also established a Myb binding sequence as a signature motif in encystation promoters, suggesting coordinated regulation of these factors.

  18. An analysis of strategic price setting in retail gasoline markets

    Science.gov (United States)

    Jaureguiberry, Florencia

    This dissertation studies price-setting behavior in the retail gasoline industry. The main questions addressed are: How important is a retail station's brand and proximity to competitors when retail stations set price? How do retailers adjust their pricing when they cater to consumers who are less aware of competing options or have less discretion over where they purchase gasoline? These questions are explored in two separate analyses using unique datasets containing retail pricing behavior of stations in California and in 24 different metropolitan areas. The evidence suggests that brand and location generate local market power for gasoline stations. After controlling for market and station characteristics, the analysis finds a spread of 11 cents per gallon between the highest and the lowest priced retail gasoline brands. The analysis also indicates that when the nearest competitor is located over 2 miles away as opposed to next door, consumers will pay an additional 1 cent per gallon of gasoline. In order to quantify the significance of local market power, data for stations located near major airport rental car locations are utilized. The presumption here is that rental car users are less aware of, or less sensitive to, fueling options near the rental car return location and are to some extent "captured consumers". Retailers located near rental car locations have incentives to adjust their pricing strategies to exploit this. The analysis of pricing near rental car locations indicates that retailers charge prices that are 4 cents per gallon higher than other stations in the same metropolitan area. This analysis is of interest to regulators who are concerned with issues of consolidation, market power, and pricing in the retail gasoline industry. This dissertation concludes with a discussion of the policy implications of the empirical analysis.

  19. Using GenePattern for Gene Expression Analysis

    Science.gov (United States)

    Kuehn, Heidi; Liberzon, Arthur; Reich, Michael; Mesirov, Jill P.

    2013-01-01

    The abundance of genomic data now available in biomedical research has stimulated the development of sophisticated statistical methods for interpreting the data, and of special visualization tools for displaying the results in a concise and meaningful manner. However, biologists often find these methods and tools difficult to understand and use correctly. GenePattern is a freely available software package that addresses this issue by providing more than 100 analysis and visualization tools for genomic research in a comprehensive user-friendly environment for users at all levels of computational experience and sophistication. This unit demonstrates how to prepare and analyze microarray data in GenePattern. PMID:18551415

  20. Gene expression profiling identifies a set of transcripts that are up-regulated in human testicular seminoma.

    Science.gov (United States)

    Yamada, Shigeyuki; Kohu, Kazuyoshi; Ishii, Tomohiko; Ishidoya, Shigeto; Ishidoya, Shigeru; Hiramatsu, Masayoshi; Kanto, Satoru; Fukuzaki, Atsushi; Adachi, Yutsu; Endoh, Mareyuki; Moriya, Takuya; Sasaki, Hiroki; Satake, Masanobu; Arai, Yoichi

    2004-10-31

    Seminoma constitutes one subtype of human testicular germ cell tumors and is uniformly composed of cells that are morphologically similar to the primordial germ cells and/or the cells in the carcinoma in situ. We performed a genome-wide exploration of the genes that are specifically up-regulated in seminoma by oligonucleotide-based microarray analysis. This revealed 106 genes that are significantly and consistently up-regulated in the seminomas compared to the adjacent normal tissues of the testes. The microarray data were validated by semi-quantitative RT-PCR analysis. Of the 106 genes, 42 mapped to a small number of specific chromosomal regions, namely, 1q21, 2p23, 6p21-22, 7p14-15, 12p11, 12p13, 12q13-14 and 22q12-13. This list of up-regulated genes may be useful in identifying the causative oncogene(s) and/or the origin of seminoma. Furthermore, immunohistochemical analysis revealed that the seminoma cells specifically expressed the six gene products that were selected randomly from the list. These proteins include CCND2 and DNMT3A and may be useful as molecular pathological markers of seminoma.

  1. Stacks: an analysis tool set for population genomics.

    Science.gov (United States)

    Catchen, Julian; Hohenlohe, Paul A; Bassham, Susan; Amores, Angel; Cresko, William A

    2013-06-01

    Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
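
    The smoothed sliding window mentioned in this abstract can be pictured as a kernel-weighted average of a per-SNP statistic along a chromosome. The following is a minimal sketch under stated assumptions, not the Stacks implementation: the function name `kernel_smooth`, the Gaussian kernel, the 3-sigma cutoff and the default `sigma` are all hypothetical choices made for illustration.

```python
import math


def kernel_smooth(positions, values, centers, sigma=150_000.0):
    """Gaussian-weighted average of a per-SNP statistic at each window center.

    positions: base-pair coordinates of the SNPs (hypothetical input layout)
    values:    the statistic measured at each SNP
    centers:   coordinates at which a smoothed value is wanted
    sigma:     kernel width in base pairs (assumed default, not from the source)
    """
    smoothed = []
    for center in centers:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            dist = pos - center
            # Ignore SNPs further than 3 sigma from the window center.
            if abs(dist) > 3 * sigma:
                continue
            weight = math.exp(-(dist * dist) / (2 * sigma * sigma))
            weighted_sum += weight * val
            weight_total += weight
        # A window with no SNPs in range has no defined average.
        smoothed.append(weighted_sum / weight_total if weight_total > 0 else float("nan"))
    return smoothed
```

    With this sketch, SNPs near a window center dominate the average and distant SNPs contribute little, which is the general idea behind smoothing a noisy SNP-by-SNP statistic across a reference genome.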

  2. Fuzzy-Set Based Sentiment Analysis of Big Social Data

    DEFF Research Database (Denmark)

    Mukkamala, Raghava Rao; Hussain, Abid; Vatrapu, Ravi

    Computational approaches to social media analytics are largely limited to graph theoretical approaches such as social network analysis (SNA) informed by the social philosophical approach of relational sociology. There are no other unified modelling approaches to social data that integrate...... the conceptual, formal, software, analytical and empirical realms. In this paper, we first present and discuss a theory and conceptual model of social data. Second, we outline a formal model based on fuzzy set theory and describe the operational semantics of the formal model with a real-world social data example...... from Facebook. Third, we briefly present and discuss the Social Data Analytics Tool (SODATO) that realizes the conceptual model in software and provisions social data analysis based on the conceptual and formal models. Fourth, we use SODATO to fetch social data from the Facebook wall of a global brand...

  3. Fuzzy-Set Based Sentiment Analysis of Big Social Data

    DEFF Research Database (Denmark)

    Mukkamala, Raghava Rao; Hussain, Abid; Vatrapu, Ravi

    2014-01-01

    Abstract: Computational approaches to social media analytics are largely limited to graph theoretical approaches such as social network analysis (SNA) informed by the social philosophical approach of relational sociology. There are no other unified modelling approaches to social data that integrate...... the conceptual, formal, software, analytical and empirical realms. In this paper, we first present and discuss a theory and conceptual model of social data. Second, we outline a formal model based on fuzzy set theory and describe the operational semantics of the formal model with a real-world social data example...... from Facebook. Third, we briefly present and discuss the Social Data Analytics Tool (SODATO) that realizes the conceptual model in software and provisions social data analysis based on the conceptual and formal models. Fourth, we use SODATO to fetch social data from the Facebook wall of a global brand...

  4. GAMYB controls different sets of genes and is differentially regulated by microRNA in aleurone cells and anthers.

    Science.gov (United States)

    Tsuji, Hiroyuki; Aya, Koichiro; Ueguchi-Tanaka, Miyako; Shimada, Yukihisa; Nakazono, Mikio; Watanabe, Ryosuke; Nishizawa, Naoko K; Gomi, Kenji; Shimada, Asako; Kitano, Hidemi; Ashikari, Motoyuki; Matsuoka, Makoto

    2006-08-01

    GAMYB is a component of gibberellin (GA) signaling in cereal aleurone cells, and has an important role in flower development. However, it is unclear how GAMYB function is regulated. We examined the involvement of a microRNA, miR159, in the regulation of GAMYB expression in cereal aleurone cells and flower development. In aleurone cells, no miR159 expression was observed with or without GA treatment, suggesting that miR159 is not involved in the regulation of GAMYB and GAMYB-like genes in this tissue. miR159 was expressed in tissues other than aleurone, and miR159 over-expressors showed similar but more severe phenotypes than the gamyb mutant. GAMYB and GAMYB-like genes are co-expressed with miR159 in anthers, and the mRNA levels for GAMYB and GAMYB-like genes are negatively correlated with miR159 levels during anther development. Thus, OsGAMYB and OsGAMYB-like genes are regulated by miR159 in flowers. A microarray analysis revealed that OsGAMYB and its upstream regulator SLR1 are involved in the regulation of almost all GA-mediated gene expression in rice aleurone cells. Moreover, different sets of genes are regulated by GAMYB in aleurone cells and anthers. GAMYB binds directly to promoter regions of its target genes in anthers as well as aleurone cells. Based on these observations, we suggest that the regulation of GAMYB expression and GAMYB function are different in aleurone cells and flowers in rice.

  5. Platform dependence of inference on gene-wise and gene-set involvement in human lung development

    Directory of Open Access Journals (Sweden)

    Kho Alvin T

    2009-06-01

    Abstract Background With the recent development of microarray technologies, the comparability of gene expression data obtained from different platforms poses an important problem. We evaluated two widely used platforms, Affymetrix U133 Plus 2.0 and the Illumina HumanRef-8 v2 Expression Bead Chips, for comparability in a biological system in which changes may be subtle, namely fetal lung tissue as a function of gestational age. Results We performed the comparison via sequence-based probe matching between the two platforms. "Significance grouping" was defined as a measure of comparability. Using both expression correlation and significance grouping as measures of comparability, we demonstrated that despite overall cross-platform differences at the single gene level, increased correlation between the two platforms was found in genes with higher expression level, higher probe overlap, and lower p-value. We also demonstrated that biological function as determined via KEGG pathways or GO categories is more consistent across platforms than single gene analysis. Conclusion We conclude that while the comparability of the platforms at the single gene level may be increased by increasing sample size, they are highly comparable ontologically even for subtle differences in a relatively small sample size. Biologically relevant inference should therefore be reproducible across laboratories using different platforms.

  6. Identification of the Core Set of Carbon-Associated Genes in a Bioenergy Grassland Soil

    Science.gov (United States)

    Howe, Adina; Yang, Fan; Williams, Ryan J.; Meyer, Folker; Hofmockel, Kirsten S.

    2016-01-01

    Despite the central role of soil microbial communities in global carbon (C) cycling, little is known about soil microbial community structure and even less about their metabolic pathways. Efforts to characterize soil communities often focus on identifying differences in gene content across environmental gradients, but an alternative question is what genes are similar in soils. These genes may indicate critical species or potential functions that are required in all soils. Here we identified the “core” set of C cycling sequences widely present in multiple soil metagenomes from a fertilized prairie (FP). Of 226,887 sequences associated with known enzymes involved in the synthesis, metabolism, and transport of carbohydrates, 843 were identified to be consistently prevalent across four replicate soil metagenomes. This core metagenome was functionally and taxonomically diverse, representing five enzyme classes and 99 enzyme families within the CAZy database. Though it only comprised 0.4% of all CAZy-associated genes identified in FP metagenomes, the core was found to be comprised of functions similar to those within cumulative soils. The FP CAZy-associated core sequences were present in multiple publicly available soil metagenomes and most similar to soils sharing geographic proximity. In soil ecosystems, where high diversity remains a key challenge for metagenomic investigations, these core genes represent a subset of critical functions necessary for carbohydrate metabolism, which can be targeted to evaluate important C fluxes in these and other similar soils. PMID:27855202

  7. GeneMesh: a web-based microarray analysis tool for relating differentially expressed genes to MeSH terms

    Directory of Open Access Journals (Sweden)

    Argraves W Scott

    2010-04-01

    Abstract Background An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Results Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. Conclusions GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies
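
    The Z score normalization that this abstract mentions for heat maps can be illustrated per gene (per row) as follows. This is a minimal sketch, not GeneMesh code: the function name `zscore_rows` is hypothetical, and the use of the population standard deviation and the all-zero output for constant rows are assumptions, since the abstract does not specify either.

```python
import statistics


def zscore_rows(matrix):
    """Return a copy of `matrix` with each row centered and scaled.

    Each row is taken to hold one gene's expression values across samples
    (an assumed layout). Rows with zero variance are mapped to zeros so
    that they render as a neutral color in a heat map.
    """
    normalized = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)  # population SD; an assumption
        if sd == 0:
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(x - mean) / sd for x in row])
    return normalized
```

    Normalizing each gene separately puts genes with very different absolute intensities on a common scale, so a heat map shows relative up- and down-regulation rather than raw signal strength.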

  8. A fuzzy set preference model for market share analysis

    Science.gov (United States)

    Turksen, I. B.; Willson, Ian A.

    1992-01-01

    Consumer preference models are widely used in new product design, marketing management, pricing, and market segmentation. The success of new products depends on accurate market share prediction and design decisions based on consumer preferences. The vague linguistic nature of consumer preferences and product attributes, combined with the substantial differences between individuals, creates a formidable challenge to marketing models. The most widely used methodology is conjoint analysis. Conjoint models, as currently implemented, represent linguistic preferences as ratio or interval-scaled numbers, use only numeric product attributes, and require aggregation of individuals for estimation purposes. It is not surprising that these models are costly to implement, are inflexible, and have a predictive validity that is not substantially better than chance. This affects the accuracy of market share estimates. A fuzzy set preference model can easily represent linguistic variables either in consumer preferences or product attributes with minimal measurement requirements (ordinal scales), while still estimating overall preferences suitable for market share prediction. This approach results in flexible individual-level conjoint models which can provide more accurate market share estimates from a smaller number of more meaningful consumer ratings. Fuzzy sets can be incorporated within existing preference model structures, such as a linear combination, using the techniques developed for conjoint analysis and market share estimation. The purpose of this article is to develop and fully test a fuzzy set preference model which can represent linguistic variables in individual-level models implemented in parallel with existing conjoint models. The potential improvements in market share prediction and predictive validity can substantially improve management decisions about what to make (product design), for whom to make it (market segmentation), and how much to make (market share

  9. Genome-Wide Temporal Expression Profiling in Caenorhabditis elegans Identifies a Core Gene Set Related to Long-Term Memory.

    Science.gov (United States)

    Freytag, Virginie; Probst, Sabine; Hadziselimovic, Nils; Boglari, Csaba; Hauser, Yannick; Peter, Fabian; Gabor Fenyves, Bank; Milnik, Annette; Demougin, Philippe; Vukojevic, Vanja; de Quervain, Dominique J-F; Papassotiropoulos, Andreas; Stetak, Attila

    2017-07-12

    The identification of genes related to encoding, storage, and retrieval of memories is a major interest in neuroscience. In the current study, we analyzed the temporal gene expression changes in a neuronal mRNA pool during an olfactory long-term associative memory (LTAM) in Caenorhabditis elegans hermaphrodites. Here, we identified a core set of 712 (538 upregulated and 174 downregulated) genes that follows three distinct temporal peaks demonstrating multiple gene regulation waves in LTAM. Compared with the previously published positive LTAM gene set (Lakhina et al., 2015), 50% of the identified upregulated genes here overlap with the previous dataset, possibly representing stimulus-independent memory-related genes. On the other hand, the remaining genes were not previously identified in positive associative memory and may specifically regulate aversive LTAM. Our results suggest a multistep gene activation process during the formation and retrieval of long-term memory and define general memory-implicated genes as well as conditioning-type-dependent gene sets. SIGNIFICANCE STATEMENT The identification of genes regulating different steps of memory is of major interest in neuroscience. Identification of common memory genes across different learning paradigms and the temporal activation of the genes are poorly studied. Here, we investigated the temporal aspects of Caenorhabditis elegans gene expression changes using aversive olfactory associative long-term memory (LTAM) and identified three major gene activation waves. Like in previous studies, aversive LTAM is also CREB dependent, and CREB activity is necessary immediately after training. Finally, we define a list of memory paradigm-independent core gene sets as well as conditioning-dependent genes.

  10. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.)

    Directory of Open Access Journals (Sweden)

    Byregowda Munishamappa

    2010-03-01

    .8% in molecular function. Further, 19 genes were identified as differentially expressed between FW-responsive genotypes and 20 between SMD-responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in the public domain at the time of analysis, and a set of 5,085 unigenes was defined and used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers, with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs was used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs was confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in the case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs, which indicates the possibility of assaying SNPs in 37 genes using the cleaved amplified polymorphic sequences (CAPS) assay. Conclusion The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have shown conservation of a considerable number of pigeonpea transcripts across the legume and model plant species analysed, as well as some putative pigeonpea-specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.
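
    The polymorphic information content (PIC) reported in this abstract is a per-marker summary of allele frequencies. The abstract does not say which definition was used; the sketch below assumes the commonly cited form PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, and the function name `pic` and the input check are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphic information content for one marker.

    allele_freqs: frequencies of the marker's alleles, expected to sum to 1.
    Uses the assumed formula 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in allele_freqs]
    homozygosity = sum(squares)
    cross_term = 0.0
    for i in range(len(squares)):
        for j in range(i + 1, len(squares)):
            cross_term += 2 * squares[i] * squares[j]
    return 1.0 - homozygosity - cross_term


# Two equally frequent alleles: 1 - 0.5 - 0.125
print(pic([0.5, 0.5]))  # 0.375
```

    Under this definition a marker with a single allele has a PIC of 0, and the value rises as more alleles occur at similar frequencies, which is why it is used to rank markers by how informative they are.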

  11. Biclustering methods: biological relevance and application in gene expression analysis.

    Directory of Open Access Journals (Sweden)

    Ali Oghabian

    DNA microarray technologies are used extensively to profile the expression levels of thousands of genes under various conditions, yielding extremely large data-matrices. Thus, analyzing this information and extracting biologically relevant knowledge becomes a considerable challenge. A classical approach for tackling this challenge is to use clustering (also known as one-way clustering) methods where genes (or respectively samples) are grouped together based on the similarity of their expression profiles across the set of all samples (or respectively genes). An alternative approach is to develop biclustering methods to identify local patterns in the data. These methods extract subgroups of genes that are co-expressed across only a subset of samples and may feature important biological or medical implications. In this study we evaluate 13 biclustering and 2 clustering (k-means and hierarchical) methods. We use several approaches to compare their performance on two real gene expression data sets. For this purpose we apply four evaluation measures in our analysis: (1) we examine how well the considered (bi)clustering methods differentiate various sample types; (2) we evaluate how well the groups of genes discovered by the (bi)clustering methods are annotated with similar Gene Ontology categories; (3) we evaluate the capability of the methods to differentiate genes that are known to be specific to the particular sample types we study and (4) we compare the running time of the algorithms. In the end, we conclude that as long as the samples are well defined and annotated, the contamination of the samples is limited, and the samples are well replicated, biclustering methods such as Plaid and SAMBA are useful for discovering relevant subsets of genes and samples.

  12. Risk analysis of colorectal cancer incidence by gene expression analysis

    Science.gov (United States)

    Shangkuan, Wei-Chuan; Lin, Hung-Che; Chang, Yu-Tien; Jian, Chen-En; Fan, Hueng-Chuen; Chen, Kang-Hua; Liu, Ya-Fang; Hsu, Huan-Ming; Chou, Hsiu-Ling; Yao, Chung-Tay

    2017-01-01

    Background Colorectal cancer (CRC) is one of the leading cancers worldwide. Several studies have performed microarray data analyses for cancer classification and prognostic analyses. Microarray assays also enable the identification of gene signatures for molecular characterization and treatment prediction. Objective Microarray gene expression data from the online Gene Expression Omnibus (GEO) database were used to distinguish colorectal cancer from normal colon tissue samples. Methods We collected microarray data from the GEO database to establish colorectal cancer microarray gene expression datasets for a combined analysis. Using the Prediction Analysis for Microarrays (PAM) method and the GSEA MSigDB resource, we analyzed the 14,698 genes that were identified through an examination of their expression values between normal and tumor tissues. Results Ten genes (ABCG2, AQP8, SPIB, CA7, CLDN8, SCNN1B, SLC30A10, CD177, PADI2, and TGFBI) were found to be good indicators of the candidate genes that correlate with CRC. From these selected genes, an average of six significant genes were obtained using the PAM method, with an accuracy rate of 95%. The results demonstrate the potential of utilizing a model with the PAM method for data mining. After a detailed review of the published reports, the results confirmed that the screened candidate genes are good indicators for cancer risk analysis using the PAM method. Conclusions Six genes were selected with 95% accuracy to effectively classify normal and colorectal cancer tissues. We hope that these results will provide the basis for new research projects in clinical practice that aim to rapidly assess colorectal cancer risk using microarray gene expression analysis. PMID:28229027

  13. Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression

    Institute of Scientific and Technical Information of China (English)

    Amichai Painsky; Saharon Rosset

    2014-01-01

    The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression, in which each row can only be a member of a single bicluster while columns can participate in multiple clusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters in the spirit of the optimal set cover problem. We present our algorithmic solution as a combination of existing biclustering algorithms and combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span, non-overlapping row submatrices, while considering their unique nature.
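
The exclusive-row constraint described in this abstract can be illustrated with a small Python sketch. This is not the authors' algorithm (which combines existing biclustering algorithms with combinatorial auction techniques); it is a plain greedy heuristic in the spirit of set cover. The candidate list, the area-based ordering, and the function name `greedy_exclusive_rows` are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

# A candidate bicluster is a pair (row indices, column indices).
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]

def greedy_exclusive_rows(candidates: List[Bicluster]) -> List[Bicluster]:
    """Pick candidates greedily so that no row belongs to two chosen biclusters.

    Columns are allowed to appear in several chosen biclusters.
    """
    chosen: List[Bicluster] = []
    used_rows: set = set()
    # Hypothetical ordering: consider larger candidates (rows x columns) first.
    ordered = sorted(candidates, key=lambda b: len(b[0]) * len(b[1]), reverse=True)
    for rows, cols in ordered:
        if used_rows.isdisjoint(rows):  # exclusive-row constraint
            chosen.append((rows, cols))
            used_rows |= rows
    return chosen

# Toy candidates: the second shares row 2 with the first and is rejected;
# the third shares only columns with the first and is accepted.
candidates = [
    (frozenset({0, 1, 2}), frozenset({0, 1})),
    (frozenset({2, 3}), frozenset({1, 2})),
    (frozenset({3, 4}), frozenset({0, 1})),
]
selected = greedy_exclusive_rows(candidates)
```

A greedy pass like this gives no optimality guarantee; it only shows how row exclusivity and column sharing differ as constraints.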

  14. Association Analysis for Visual Exploration of Multivariate Scientific Data Sets.

    Science.gov (United States)

    Liu, Xiaotong; Shen, Han-Wei

    2016-01-01

    The heterogeneity and complexity of multivariate characteristics pose a unique challenge to visual exploration of multivariate scientific data sets, as it requires investigating the usually hidden associations between different variables and specific scalar values to understand the data's multi-faceted properties. In this paper, we present a novel association analysis method that guides visual exploration of scalar-level associations in the multivariate context. We model the directional interactions between scalars of different variables as information flows based on association rules. We introduce the concepts of informativeness and uniqueness to describe how information flows between scalars of different variables and how they are associated with each other in the multivariate domain. Based on scalar-level associations represented by a probabilistic association graph, we propose the Multi-Scalar Informativeness-Uniqueness (MSIU) algorithm to evaluate the informativeness and uniqueness of scalars. We present an exploration framework with multiple interactive views to explore the scalars of interest with confident associations in the multivariate spatial domain, and provide guidelines for visual exploration using our framework. We demonstrate the effectiveness and usefulness of our approach through case studies using three representative multivariate scientific data sets.

  15. Gene expression risk signatures maintain prognostic power in multiple myeloma despite microarray probe set translation

    DEFF Research Database (Denmark)

    Hermansen, N E U; Borup, R; Andersen, M K

    2016-01-01

    INTRODUCTION: Gene expression profiling (GEP) risk models in multiple myeloma are based on 3'-end microarrays. We hypothesized that GEP risk signatures could retain prognostic power despite being translated and applied to whole-transcript microarray data. METHODS: We studied CD138-positive bone...... marrow plasma cells in a prospective cohort of 59 samples from newly diagnosed patients eligible for high-dose therapy (HDT) and 67 samples from previous HDT patients with progressive disease. We used Affymetrix Human Gene 1.1 ST microarrays for GEP. Nine GEP risk signatures were translated by probe set......-87). Various translated GEP risk signatures or combinations hereof were significantly correlated with survival: among newly diagnosed patients mainly in combination with cytogenetic high-risk markers and among relapsed patients mainly in combination with ISS stage III. CONCLUSION: Translated GEP risk...

  16. Identification of candidate genes in osteoporosis by integrated microarray analysis

    Science.gov (United States)

    Li, J. J.; Wang, B. Q.; Yang, Y.; Li, D.

    2016-01-01

    Objectives In order to screen the altered gene expression profile in peripheral blood mononuclear cells of patients with osteoporosis, we performed an integrated analysis of the online microarray studies of osteoporosis. Methods We searched the Gene Expression Omnibus (GEO) database for microarray studies of peripheral blood mononuclear cells in patients with osteoporosis. Subsequently, we integrated gene expression data sets from multiple microarray studies to obtain differentially expressed genes (DEGs) between patients with osteoporosis and normal controls. Gene function analysis was performed to uncover the functions of identified DEGs. Results A total of three microarray studies were selected for integrated analysis. In all, 1125 genes were found to be significantly differentially expressed between osteoporosis patients and normal controls, with 373 upregulated and 752 downregulated genes. Positive regulation of the cellular amino metabolic process (gene ontology (GO): 0033240, false discovery rate (FDR) = 1.00E + 00) was significantly enriched under the GO category for biological processes, while for molecular functions, flavin adenine dinucleotide binding (GO: 0050660, FDR = 3.66E-01) and androgen receptor binding (GO: 0050681, FDR = 6.35E-01) were significantly enriched. DEGs were enriched in many osteoporosis-related signalling pathways, including those of mitogen-activated protein kinase (MAPK) and calcium. Protein-protein interaction (PPI) network analysis showed that the significant hub proteins contained ubiquitin specific peptidase 9, X-linked (Degree = 99), ubiquitin specific peptidase 19 (Degree = 57) and ubiquitin conjugating enzyme E2 B (Degree = 57). Conclusion Analysis of gene function of identified differentially expressed genes may expand our understanding of fundamental mechanisms leading to osteoporosis. 
Moreover, significantly enriched pathways, such as MAPK and calcium, may be involved in osteoporosis through osteoblastic differentiation and

  17. GeneNet Toolbox for MATLAB: a flexible platform for the analysis of gene connectivity in biological networks.

    Science.gov (United States)

    Taylor, Avigail; Steinberg, Julia; Andrews, Tallulah S; Webber, Caleb

    2015-02-01

    We present GeneNet Toolbox for MATLAB (also available as a set of standalone applications for Linux). The toolbox, available as command-line or with a graphical user interface, enables biologists to assess connectivity among a set of genes of interest ('seed-genes') within a biological network of their choosing. Two methods are implemented for calculating the significance of connectivity among seed-genes: 'seed randomization' and 'network permutation'. Options include restricting analyses to a specified subnetwork of the primary biological network, and calculating connectivity from the seed-genes to a second set of interesting genes. Pre-analysis tools help the user choose the best connectivity-analysis algorithm for their network. The toolbox also enables visualization of the connections among seed-genes. GeneNet Toolbox functions execute in reasonable time for very large networks (∼10 million edges) on a desktop computer. GeneNet Toolbox is open source and freely available from http://avigailtaylor.github.io/gntat14. Supplementary data are available at Bioinformatics online. avigail.taylor@dpag.ox.ac.uk. © The Author 2014. Published by Oxford University Press.
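
The idea behind a seed-randomization test can be sketched in a few lines of Python. This is a minimal illustration, not the GeneNet Toolbox implementation (which is written for MATLAB and whose exact sampling scheme is not described in the abstract). It assumes an undirected network given as a list of gene pairs, draws random gene sets uniformly from the network's nodes, and reports an empirical p-value; all names are hypothetical.

```python
import random
from itertools import combinations

def edges_within(genes, edges):
    """Count network edges whose two endpoints are both in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)

def seed_randomization_pvalue(seeds, edges, n_perm=1000, rng_seed=0):
    """Empirical p-value: how often a random gene set of the same size
    is at least as connected as the seed set."""
    rng = random.Random(rng_seed)
    nodes = sorted({g for edge in edges for g in edge})
    observed = edges_within(seeds, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(seeds))
        if edges_within(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy network: a fully connected group A-D plus a chain E-F-G-H-I-J.
edges = list(combinations("ABCD", 2)) + [
    ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J"),
]
seeds = ["A", "B", "C", "D"]
observed, p_value = seed_randomization_pvalue(seeds, edges)
```

Uniform sampling ignores node degree, so highly connected seed genes can look significant for trivial reasons; a degree-matched sampler, or a permutation of the network itself, addresses that at the cost of more bookkeeping.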

  18. Gene family level comparative analysis of gene expression in mammals validates the ortholog conjecture.

    Science.gov (United States)

    Rogozin, Igor B; Managadze, David; Shabalina, Svetlana A; Koonin, Eugene V

    2014-04-01

    The ortholog conjecture (OC), which is central to functional annotation of genomes, posits that orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of Gene Ontology (GO) annotations and expression profiles, among within-species paralogs compared with orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. However, several subsequent studies suggest that GO annotations and microarray data could artificially inflate functional similarity between paralogs from the same organism. We sought to test the OC using approaches distinct from those used in previous studies. Analysis of a large RNAseq data set from multiple human and mouse tissues shows that expression similarity (correlation coefficients, ranks, or Z-scores) between orthologs is substantially greater than that for between-species paralogs with the same sequence divergence, in agreement with the OC and the results of recent detailed analyses. These findings are further corroborated by a fine-grain analysis in which expression profiles of orthologs and paralogs were compared separately for individual gene families. Expression profiles of within-species paralogs are more strongly correlated than profiles of orthologs, but it is shown that this is caused by high background noise, that is, correlation between profiles of unrelated genes in the same organism. Z-scores and rank scores show a nonmonotonic dependence of expression profile similarity on sequence divergence. This complexity of gene expression evolution after duplication might be at least partially caused by selection for protein dosage rebalancing following gene duplication.

  19. Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2006-07-01

    Full Text Available Abstract Background A complete understanding of the regulatory mechanisms of gene expression is the next important issue of genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies involved the use of yeast, which has much simpler regulatory networks than human and has many genome-wide binding data and gene expression data under diverse conditions. Studies of genome-wide transcriptional networks of human genomes currently lag behind those of yeast. Results We report herein a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. The Z scores from the application of gene set analysis with gene sets of transcription factor binding sites (TFBSs) were successfully used to represent the activity of TFBSs in a given microarray data set. A significant correlation between the Z scores of gene sets of TFBSs and individual genes across multiple conditions permitted successful identification of many known human transcriptional regulatory elements of genes as well as the prediction of numerous putative TFBSs of many genes, which will constitute a good starting point for further experiments. Using Z scores of gene sets of TFBSs produced better predictions than the use of mRNA levels of a transcription factor itself, suggesting that the Z scores of gene sets of TFBSs better represent diverse mechanisms for changing the activity of transcription factors in the cell. In addition, cis-regulatory modules, combinations of co-acting TFBSs, were readily identified by our analysis. Conclusion By a strategic combination of gene set level analysis of gene expression data sets and promoter analysis, we were able to identify and predict many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding
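
The two steps named in this abstract (a Z score per TFBS gene set in each condition, then a correlation between those Z scores and a gene's expression across conditions) can be sketched as follows. The abstract does not give the formula, so the parametric Z score used here (set mean minus overall mean, scaled by the overall standard deviation and the square root of the set size) is an assumption, as are the toy data and all names.

```python
import math
from statistics import mean, pstdev

def gene_set_z(values, gene_set):
    """Z score of a gene set within one condition.

    `values` maps gene -> expression change (e.g. log ratio) in that condition.
    Assumed formula: (set mean - overall mean) * sqrt(set size) / overall SD.
    """
    all_values = list(values.values())
    mu, sigma = mean(all_values), pstdev(all_values)
    members = [values[g] for g in gene_set if g in values]
    return (mean(members) - mu) * math.sqrt(len(members)) / sigma

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: three conditions, four genes, one TFBS gene set.
conditions = [
    {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0},
    {"g1": 0.0, "g2": 0.0, "g3": 2.0, "g4": 2.0},
    {"g1": 1.0, "g2": 3.0, "g3": 0.0, "g4": 0.0},
]
tfbs_set = {"g1", "g2"}
z_scores = [gene_set_z(c, tfbs_set) for c in conditions]

# Expression of a hypothetical candidate target gene in the same conditions.
target_expression = [5.0, 1.0, 4.0]
r = pearson(z_scores, target_expression)
```

A high correlation in such a sketch only flags a candidate association; with three conditions it carries no statistical weight, and a real analysis would need many conditions and a multiple-testing correction.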

  20. A six degrees-of-freedom marker set for gait analysis: repeatability and comparison with a modified Helen Hayes set.

    Science.gov (United States)

    Collins, Thomas D; Ghoussayni, Salim N; Ewins, David J; Kent, Jenny A

    2009-08-01

    Kinematic gait analysis is limited by simplified marker sets and related models. The majority of sets in clinical use were developed with low resolution imaging systems so required various assumptions about body behaviour. Further major limitations include soft tissue artefact and ambiguity in landmark identification. An alternative is the use of sets based on six degrees-of-freedom (DOF) principles, primarily using marker clusters for tracking. This study evaluates performance of a 6DOF set, based largely on CAST/ISB recommendations, through comparison with a conventional set and assessment of repeatability. Ten healthy subjects were assessed in treadmill walking, with both sets applied simultaneously on two occasions. Data were analysed using repeatability coefficients, correlation of key features, and comparison of joint angle curves and difference curves with confidence bands. Apart from pelvic tilt all segment and joint angles from both sets showed high within and between session repeatability (CMC>0.80). Hip rotations showed clear differences between the two sets with indications in support of the 6DOF set. Knee coronal angles showed evidence of cross-talk in the conventional set, highlighting difficulties with anatomical identification despite control measures such as a foot alignment template. Knee transverse angles showed inconsistent patterns for both sets. At the ankle the conventional set only allowed true measurement in two planes so with high repeatability the 6DOF set is preferable. The 6DOF set showed comparable performance to the conventional set and overcomes a number of theoretical limitations, however further development is needed prior to clinical implementation.

  1. MoSET1 (Histone H3K4 Methyltransferase) in Magnaporthe oryzae Regulates Global Gene Expression during Infection-Related Morphogenesis.

    Directory of Open Access Journals (Sweden)

    Kieu Thi Minh Pham

    2015-07-01

    Full Text Available Here we report the genetic analyses of histone lysine methyltransferase (KMT) genes in the phytopathogenic fungus Magnaporthe oryzae. Eight putative M. oryzae KMT genes were targeted for gene disruption by homologous recombination. Phenotypic assays revealed that the eight KMTs were involved in various infection processes to varying degrees. Moset1 disruptants (Δmoset1), impaired in histone H3 lysine 4 methylation (H3K4me), showed the most severe defects in infection-related morphogenesis, including conidiation and appressorium formation. Consequently, Δmoset1 lost pathogenicity on wheat host plants, thus indicating that H3K4me is an important epigenetic mark for infection-related gene expression in M. oryzae. Interestingly, appressorium formation was greatly restored in the Δmoset1 mutants by exogenous addition of cAMP or of the cutin monomer, 16-hydroxypalmitic acid. The Δmoset1 mutants were still infectious on the super-susceptible barley cultivar Nigrate. These results suggested that MoSET1 plays roles in various aspects of infection, including signal perception and overcoming host-specific resistance. However, since Δmoset1 was also impaired in vegetative growth, the impact of MoSET1 on gene regulation was not infection specific. ChIP-seq analysis of H3K4 di- and tri-methylation (H3K4me2/me3) and MoSET1 protein during infection-related morphogenesis, together with RNA-seq analysis of the Δmoset1 mutant, led to the following conclusions: 1) Approximately 5% of M. oryzae genes showed significant changes in H3K4-me2 or -me3 abundance during infection-related morphogenesis. 2) In general, H3K4-me2 and -me3 abundance was positively associated with active transcription. 3) Lack of MoSET1 methyltransferase, however, resulted in up-regulation of a significant portion of the M. oryzae genes in the vegetative mycelia (1,491 genes) and during infection-related morphogenesis (1,385 genes), indicating that MoSET1 has a role in gene repression either

  2. A plastome primer set for comprehensive quantitative real time RT-PCR analysis of Zea mays: a starter primer set for other Poaceae species

    Directory of Open Access Journals (Sweden)

    Dunn Sade N

    2008-06-01

    Full Text Available Abstract Background Quantitative Real Time RT-PCR (q2(RT)PCR) is a maturing technique which gives researchers the ability to quantify and compare very small amounts of nucleic acids. Primer design and optimization is an essential yet time-consuming aspect of using q2(RT)PCR. In this paper we describe the design and empirical optimization of primers to amplify and quantify plastid RNAs from Zea mays that are robust enough to use with other closely related species. Results Primers were designed and successfully optimized for 57 of the 104 reported genes in the maize plastome plus two nuclear genes. All 59 primer pairs produced single amplicons after end-point reverse transcriptase polymerase chain reactions (RT-PCR), as visualized on agarose gels and subsequently verified by q2(RT)PCR. Primer pairs were divided into several categories based on the optimization requirements or the uniqueness of the target gene. An in silico test suggested the majority of the primer sets should work with other members of the Poaceae family. An in vitro test of the primer set on two unsequenced species (Panicum virgatum and Miscanthus sinensis) supported this assumption by successfully producing single amplicons for each primer pair. Conclusion Due to the highly conserved chloroplast genome in plant families it is possible to utilize primer pairs designed against one genomic sequence to detect the presence and abundance of plastid genes or transcripts from genomes that have yet to be sequenced. Analysis of steady state transcription of vital system genes is a necessary requirement to comprehensively elucidate gene expression in any organism. The primer pairs reported in this paper were designed for q2(RT)PCR of maize chloroplast genes but should be useful for other members of the Poaceae family. Both in silico and in vitro data are presented to support this assumption.

  3. Coupled Two-Way Clustering Analysis of Gene Microarray Data

    CERN Document Server

    Getz, G; Domany, E

    2000-01-01

    We present a novel coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task: we present an algorithm, based on iterative clustering, which performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  4. Coupled two-way clustering analysis of gene microarray data

    Science.gov (United States)

    Getz, Gad; Levine, Erel; Domany, Eytan

    2000-10-01

    We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  5. Function analysis of unknown genes

    DEFF Research Database (Denmark)

    Rogowska-Wrzesinska, A.

    2002-01-01

    leading to decreased growth rate, decreased glucose metabolism, decreased amino acid and protein synthesis and increased protein degradation. Some of these responses define a new type of stress that results from changes in the internal cell environment by overexpression of a membrane protein. Chapter 5...... that have been post-translationally modified by N- or C-terminal truncation and we show that this protein processing is not random and shows a specific pattern for a given yeast strain. Chapter 7 illustrates the construction of yeast proteome database and its potential application in characterising yeast...... analysis is a powerful tool to study yeast proteome and the complex proteome database gives a broad view on the molecular cell biology of yeast. The global database approach allows combining proteome data from different mutants and experiment conditions (e.g. heat stress, phosphate labelling, N...

  6. ExAtlas: An interactive online tool for meta-analysis of gene expression data.

    Science.gov (United States)

    Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H

    2015-12-01

    We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
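
Two of the standard meta-analysis methods listed above, Fisher's method and the z-score method, can be written compactly in Python. This is a generic textbook sketch and not ExAtlas code: it assumes independent, one-sided p-values and equal weights, and the function names are hypothetical. Fisher's statistic is X = -2 Σ ln p, compared with a chi-square distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom.

```python
import math
from statistics import NormalDist

def fisher_combined_p(pvalues):
    """Fisher's method for k independent p-values."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{i < k} (X/2)^i / i!
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))

def zscore_combined_p(pvalues):
    """Unweighted z-score (Stouffer) method for one-sided p-values."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1 - nd.cdf(z)

# Hypothetical p-values for one gene in two independent data sets.
study_pvalues = [0.05, 0.05]
p_fisher = fisher_combined_p(study_pvalues)
p_zscore = zscore_combined_p(study_pvalues)
```

The two methods generally give different combined p-values for the same inputs. Fixed-effects and random-effects models, also listed in the abstract, additionally need effect sizes and their variances, which p-values alone do not provide.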

  7. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    IP-seq and small RNA-seq, we delineated the landscape of the promoters with bidirectional transcription that yields steady-state RNA in only one direction (Paper III). A subsequent motif analysis enabled us to uncover specific DNA signals – early polyA sites – that make RNA on the reverse strand sensitive...... they regulated or if the sites had global elevated usage rates by multiple TFs. Using RNA-seq, 5’end-seq in combination with depletion of 5’ exonuclease as well as nonsense-mediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V......). Gene enrichment analysis on the detected NMD substrates revealed an unappreciated NMD-based regulatory mechanism of the genes hosting multiple intronic snoRNAs, which can facilitate differential expression of individual snoRNAs from a single host gene locus. Finally, supported by RNA-seq and small RNA-seq...

  8. Refining ensembles of predicted gene regulatory networks based on characteristic interaction sets.

    Directory of Open Access Journals (Sweden)

    Lukas Windhager

    Full Text Available Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate
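
The contrast between voting over all predicted networks and voting within topologically similar groups can be shown with a toy Python sketch. It is not the stepwise procedure proposed in the abstract: networks are assumed to be sets of (regulator, target) pairs, grouping is done by hand on a single interaction, and the names `ensemble_vote` and `cooccurrence` are hypothetical.

```python
from collections import Counter

def ensemble_vote(networks, min_fraction=0.5):
    """Keep interactions present in at least `min_fraction` of the networks."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, c in counts.items() if c / len(networks) >= min_fraction}

def cooccurrence(networks, e1, e2):
    """Fraction of the networks containing e1 that also contain e2."""
    with_e1 = [net for net in networks if e1 in net]
    if not with_e1:
        return 0.0
    return sum(e2 in net for net in with_e1) / len(with_e1)

# Toy predictions: two competing topologies, two networks each.
networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C")},
    {("A", "C"), ("C", "B")},
    {("A", "C"), ("C", "B")},
]

# Voting over everything mixes both topologies into one ensemble.
mixed = ensemble_vote(networks, min_fraction=0.5)

# Co-occurring interactions hint at how to group the networks.
dep_same = cooccurrence(networks, ("A", "B"), ("B", "C"))
dep_other = cooccurrence(networks, ("A", "B"), ("A", "C"))

# Voting within one group yields a single consistent hypothesis.
group = [net for net in networks if ("A", "B") in net]
group_ensemble = ensemble_vote(group, min_fraction=0.5)
```

In this toy case the mixed ensemble contains all four interactions, which matches no individual prediction, while the group ensemble recovers one coherent topology.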

  9. Microarray analysis of relative gene expression stability for selection of internal reference genes in the rhesus macaque brain

    Directory of Open Access Journals (Sweden)

    Urbanski Henryk F

    2010-06-01

    Full Text Available Abstract Background Normalization of gene expression data refers to the comparison of expression values using reference standards that are consistent across all conditions of an experiment. In PCR studies, genes designated as "housekeeping genes" have been used as internal reference genes under the assumption that their expression is stable and independent of experimental conditions. However, verification of this assumption is rarely performed. Here we assess the use of gene microarray analysis to facilitate selection of internal reference sequences with higher expression stability across experimental conditions than can be expected using traditional selection methods. We recently demonstrated that relative gene expression from qRT-PCR data normalized using GAPDH, ALG9 and RPL13A expression values mirrored relative expression using quantile normalization in Robust Multichip Analysis (RMA) on the Affymetrix® GeneChip® rhesus Macaque Genome Array. Having shown that qRT-PCR and Affymetrix® GeneChip® data from the same hormone replacement therapy (HRT) study yielded concordant results, we used quantile-normalized gene microarray data to identify the most stably expressed among probe sets for prospective internal reference genes across three brain regions from the HRT study and an additional study of normally menstruating rhesus macaques (cycle study). Gene selection was limited to 575 previously published human "housekeeping" genes. Twelve animals were used per study, and three brain regions were analyzed from each animal. Gene expression stabilities were determined using geNorm, NormFinder and BestKeeper software packages. Results Sequences co-annotated for ribosomal protein S27a (RPS27A) and ubiquitin were among the most stably expressed under all conditions and selection criteria used for both studies. Higher annotation quality on the human GeneChip® facilitated more targeted analysis than could be accomplished using the rhesus GeneChip®. In

  10. An ancient dental gene set governs development and continuous regeneration of teeth in sharks.

    Science.gov (United States)

    Rasch, Liam J; Martin, Kyle J; Cooper, Rory L; Metscher, Brian D; Underwood, Charlie J; Fraser, Gareth J

    2016-07-15

    The evolution of oral teeth is considered a major contributor to the overall success of jawed vertebrates. This is especially apparent in cartilaginous fishes including sharks and rays, which develop elaborate arrays of highly specialized teeth, organized in rows and retain the capacity for life-long regeneration. Perpetual regeneration of oral teeth has been either lost or highly reduced in many other lineages including important developmental model species, so cartilaginous fishes are uniquely suited for deep comparative analyses of tooth development and regeneration. Additionally, sharks and rays can offer crucial insights into the characters of the dentition in the ancestor of all jawed vertebrates. Despite this, tooth development and regeneration in chondrichthyans is poorly understood and remains virtually uncharacterized from a developmental genetic standpoint. Using the emerging chondrichthyan model, the catshark (Scyliorhinus spp.), we characterized the expression of genes homologous to those known to be expressed during stages of early dental competence, tooth initiation, morphogenesis, and regeneration in bony vertebrates. We have found that expression patterns of several genes from Hh, Wnt/β-catenin, Bmp and Fgf signalling pathways indicate deep conservation over ~450 million years of tooth development and regeneration. We describe how these genes participate in the initial emergence of the shark dentition and how they are redeployed during regeneration of successive tooth generations. We suggest that at the dawn of the vertebrate lineage, teeth (i) were most likely continuously regenerative structures, and (ii) utilised a core set of genes from members of key developmental signalling pathways that were instrumental in creating a dental legacy redeployed throughout vertebrate evolution. These data lay the foundation for further experimental investigations utilizing the unique regenerative capacity of chondrichthyan models to answer evolutionary

  11. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser

    Science.gov (United States)

    2010-01-01

    Background Candida species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of Candida genomic data. We have developed the Candida Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of Candida species. CGOB incorporates all available Candida clade genome sequences including two Candida albicans isolates (SC5314 and WO-1) and 8 closely related species (Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis, Candida guilliermondii and Candida lusitaniae). Saccharomyces cerevisiae is also included as a reference genome. Results CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total CGOB includes 65617 genes arranged into 13625 homology columns. We have also generated improved Candida gene sets by merging/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all Candida species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species or lineage-specific. We also identified one example of intron gain in C. albicans. Conclusions Our analysis provides an important resource that is now available for the Candida community. CGOB is available at http://cgob.ucd.ie. PMID:20459735

  12. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser

    LENUS (Irish Health Repository)

    Fitzpatrick, David A

    2010-05-10

    Abstract Background Candida species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of Candida genomic data. We have developed the Candida Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of Candida species. CGOB incorporates all available Candida clade genome sequences including two Candida albicans isolates (SC5314 and WO-1) and 8 closely related species (Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis, Candida guilliermondii and Candida lusitaniae). Saccharomyces cerevisiae is also included as a reference genome. Results CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total CGOB includes 65617 genes arranged into 13625 homology columns. We have also generated improved Candida gene sets by merging/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all Candida species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species or lineage-specific. We also identified one example of intron gain in C. albicans. Conclusions Our analysis provides an important resource that is now available for the Candida community. CGOB is available at http://cgob.ucd.ie.

  13. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser

    Directory of Open Access Journals (Sweden)

    Byrne Kevin P

    2010-05-01

    Abstract Background Candida species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of Candida genomic data. We have developed the Candida Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of Candida species. CGOB incorporates all available Candida clade genome sequences including two Candida albicans isolates (SC5314 and WO-1) and 8 closely related species (Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis, Candida guilliermondii and Candida lusitaniae). Saccharomyces cerevisiae is also included as a reference genome. Results CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total CGOB includes 65617 genes arranged into 13625 homology columns. We have also generated improved Candida gene sets by merging/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all Candida species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species or lineage-specific. We also identified one example of intron gain in C. albicans. Conclusions Our analysis provides an important resource that is now available for the Candida community. CGOB is available at http://cgob.ucd.ie.

  14. Evaluation of reference genes for gene expression analysis using quantitative RT-PCR in Azospirillum brasilense.

    Directory of Open Access Journals (Sweden)

    Mary McMillan

    Azospirillum brasilense is a nitrogen fixing bacterium that has been shown to have various beneficial effects on plant growth and yield. Under normal conditions A. brasilense exists in a motile flagellated form, which, under starvation or stress conditions, can undergo differentiation into an encapsulated, cyst-like form. Quantitative RT-PCR can be used to analyse changes in gene expression during this differentiation process. The accuracy of quantification of mRNA levels by qRT-PCR relies on the normalisation of data against stably expressed reference genes. No suitable set of reference genes has yet been described for A. brasilense. Here we evaluated the expression of ten candidate reference genes (16S rRNA, gapB, glyA, gyrA, proC, pykA, recA, recF, rpoD, and tpiA) in wild-type and mutant A. brasilense strains under different culture conditions, including conditions that induce differentiation. Analysis with the software programs BestKeeper, NormFinder and GeNorm indicated that gyrA, glyA and recA are the most stably expressed reference genes in A. brasilense. The results also suggested that the use of two reference genes (gyrA and glyA) is sufficient for effective normalisation of qRT-PCR data.

  15. Evaluation of reference genes for gene expression analysis using quantitative RT-PCR in Azospirillum brasilense.

    Science.gov (United States)

    McMillan, Mary; Pereg, Lily

    2014-01-01

    Azospirillum brasilense is a nitrogen fixing bacterium that has been shown to have various beneficial effects on plant growth and yield. Under normal conditions A. brasilense exists in a motile flagellated form, which, under starvation or stress conditions, can undergo differentiation into an encapsulated, cyst-like form. Quantitative RT-PCR can be used to analyse changes in gene expression during this differentiation process. The accuracy of quantification of mRNA levels by qRT-PCR relies on the normalisation of data against stably expressed reference genes. No suitable set of reference genes has yet been described for A. brasilense. Here we evaluated the expression of ten candidate reference genes (16S rRNA, gapB, glyA, gyrA, proC, pykA, recA, recF, rpoD, and tpiA) in wild-type and mutant A. brasilense strains under different culture conditions, including conditions that induce differentiation. Analysis with the software programs BestKeeper, NormFinder and GeNorm indicated that gyrA, glyA and recA are the most stably expressed reference genes in A. brasilense. The results also suggested that the use of two reference genes (gyrA and glyA) is sufficient for effective normalisation of qRT-PCR data.
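As an illustration of how reference-gene stability can be scored, the sketch below computes a geNorm-style stability measure M: for each candidate gene, the mean over all other candidates of the standard deviation of the log2 expression ratios across samples. This follows the commonly described geNorm definition and is not taken from the abstract above. The gene labels and quantities are hypothetical, and BestKeeper and NormFinder use different criteria.

```python
from math import log2
from statistics import mean, stdev


def stability_m(expression):
    """Return a geNorm-style stability value M for each candidate gene.

    expression maps a gene label to a list of relative quantities,
    one per sample (same sample order for every gene).
    A lower M means the gene's ratio to the other candidates varies less.
    """
    genes = list(expression)
    m_values = {}
    for gene in genes:
        pairwise_variation = []
        for other in genes:
            if other == gene:
                continue
            # log2 ratio of the two genes in each sample
            ratios = [log2(x / y)
                      for x, y in zip(expression[gene], expression[other])]
            pairwise_variation.append(stdev(ratios))
        m_values[gene] = mean(pairwise_variation)
    return m_values


# Hypothetical relative quantities in three samples (illustrative only).
example = {
    "ref1": [1.0, 2.0, 4.0],
    "ref2": [2.0, 4.0, 8.0],   # constant ratio to ref1
    "ref3": [1.0, 8.0, 2.0],   # ratio to the others fluctuates
}
m = stability_m(example)
most_stable = min(m, key=m.get)
```

Ranking candidates by ascending M and keeping the lowest-scoring genes mirrors the kind of selection the abstract reports, although the actual tools apply further steps that this sketch omits.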

  16. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Science.gov (United States)

    Vashisht, Shikha; Bagler, Ganesh

    2012-01-01

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  17. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Directory of Open Access Journals (Sweden)

    Shikha Vashisht

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  18. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    Science.gov (United States)

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
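The rOP statistic described above can be sketched in a few lines. The example below assumes the K study-level p-values are independent and uniform under the null hypothesis, in which case the r-th smallest of them follows a Beta(r, K - r + 1) distribution whose tail probability can be written as a binomial sum. The function names and input values are hypothetical, and the sketch leaves out the paper's estimation of r and its one-sided correction.

```python
from math import comb


def rop_statistic(p_values, r):
    """Return the r-th smallest p-value (r is 1-indexed)."""
    if not 1 <= r <= len(p_values):
        raise ValueError("r must be between 1 and the number of studies")
    return sorted(p_values)[r - 1]


def rop_p_value(p_values, r):
    """Null probability that the r-th ordered p-value is this small.

    Assumes independent Uniform(0, 1) p-values under the null, so the
    statistic is Beta(r, K - r + 1) and its CDF equals the probability
    that at least r of the K p-values fall at or below the observed value.
    """
    k = len(p_values)
    x = rop_statistic(p_values, r)
    return sum(comb(k, j) * x**j * (1 - x)**(k - j) for j in range(r, k + 1))


# Hypothetical p-values for one gene across four studies.
study_p = [0.01, 0.2, 0.03, 0.5]
p_min = rop_p_value(study_p, r=1)       # r = 1: minimum p-value method
p_majority = rop_p_value(study_p, r=3)  # r = 3: "majority of studies"
p_max = rop_p_value(study_p, r=4)       # r = K: maximum p-value method
```

Setting r = 1 or r = K recovers the minimum and maximum p-value methods that the abstract uses as comparators, and intermediate r targets genes that are differentially expressed in most studies.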

  19. Identification and functional validation of a unique set of drought induced genes preferentially expressed in response to gradual water stress in peanut.

    Science.gov (United States)

    Govind, Geetha; Harshavardhan, Vokkaliga ThammeGowda; ThammeGowda, Harshavardhan Vokkaliga; Patricia, Jayaker Kalaiarasi; Kalaiarasi, Patricia Jayaker; Dhanalakshmi, Ramachandra; Iyer, Dhanalakshmi Ramchandra; Senthil Kumar, Muthappa; Muthappa, Senthil Kumar; Sreenivasulu, Nese; Nese, Sreenivasulu; Udayakumar, Makarla; Makarla, Udaya Kumar

    2009-06-01

    Peanut, a relatively drought-tolerant crop, was chosen to characterize the genes expressed under gradual water deficit stress. Nearly 700 genes were found to be enriched in a subtractive cDNA library from the gradual process of drought stress adaptation. Further, expression of the drought-inducible genes related to various signaling components and gene sets involved in protecting cellular function has been described based on dot blot experiments. Fifty genes (25 regulators and 25 functional related genes) selected based on dot blot experiments were tested for their stress responsiveness using northern blot analysis, which confirmed their differential regulation under drought stress treatments at different field capacities. ESTs generated from this subtracted cDNA library offered a rich source of stress-related genes including signaling components. The additional 50% of uncharacterized sequences are noteworthy. Insights gained from this study would provide the foundation for further studies to understand how peanut plants are able to adapt to naturally occurring harsh drought conditions. At present functional validation is not feasible in peanut; hence, as a proof of concept, seven orthologues of drought-induced peanut genes were silenced in the heterologous N. benthamiana system using virus-induced gene silencing. These results point to the functional importance of the HSP70 gene and key regulators such as Jumonji in the drought stress response.

  20. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    Science.gov (United States)

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  1. Molecular analysis of the glucocerebrosidase gene locus

    Energy Technology Data Exchange (ETDEWEB)

    Winfield, S.L.; Martin, B.M.; Fandino, A. [Clinical Neuroscience Branch, Bethesda, MD (United States)] [and others]

    1994-09-01

    Gaucher disease is due to a deficiency in the activity of the lysosomal enzyme glucocerebrosidase. Both the functional gene for this enzyme and a pseudogene are located in close proximity on chromosome 1q21. Analysis of the mutations present in patient samples has suggested interaction between the functional gene and the pseudogene in the origin of mutant genotypes. To investigate the involvement of regions flanking the functional gene and pseudogene in the origin of mutations found in Gaucher disease, a YAC clone containing DNA from this locus has been subcloned and characterized. The original YAC containing ~360 kb was truncated with the use of fragmentation plasmids to about 85 kb. A lambda library derived from this YAC was screened to obtain clones containing glucocerebrosidase sequences. PCR amplification was used to identify subclones containing 5′, central, or 3′ sequences of the functional gene or of the pseudogene. Clones spanning the entire distance from the last exon of the functional gene to intron 1 of the pseudogene, the 5′ end of the functional gene and 16 kb of 5′ flanking region and approximately 15 kb of 3′ flanking region of the pseudogene were sequenced. Sequence data from 48 kb of intergenic and flanking regions of the glucocerebrosidase gene and its pseudogene has been generated. A large number of Alu sequences and several simple repeats have been found. Two of these repeats exhibit fragment length polymorphism. There is almost 100% homology between the 3′ flanking regions of the functional gene and the pseudogene, extending to about 4 kb past the termination codons. A much lower degree of homology is observed in the 5′ flanking region. Patient samples are currently being screened for polymorphisms in these flanking regions.

  2. Exploiting gene families for phylogenomic analysis of myzostomid transcriptome data.

    Directory of Open Access Journals (Sweden)

    Stefanie Hartmann

    BACKGROUND: In trying to understand the evolutionary relationships of organisms, the current flood of sequence data offers great opportunities, but also reveals new challenges with regard to data quality, the selection of data for subsequent analysis, and the automation of steps that were once done manually for single-gene analyses. Even though genome or transcriptome data is available for representatives of most bilaterian phyla, some enigmatic taxa still have an uncertain position in the animal tree of life. This is especially true for myzostomids, a group of symbiotic (or parasitic) protostomes that are either placed with annelids or flatworms. METHODOLOGY: Based on similarity criteria, Illumina-based transcriptome sequences of one myzostomid were compared to protein sequences of one additional myzostomid and 29 reference metazoa and clustered into gene families. These families were then used to investigate the phylogenetic position of Myzostomida using different approaches: Alignments of 989 sequence families were concatenated, and the resulting superalignment was analyzed under a Maximum Likelihood criterion. We also used all 1,878 gene trees with at least one myzostomid sequence for a supertree approach: the individual gene trees were computed and then reconciled into a species tree using gene tree parsimony. CONCLUSIONS: Superalignments require strictly orthologous genes, and both the gene selection and the widely varying amount of data available for different taxa in our dataset may cause anomalous placements and low bootstrap support. In contrast, gene tree parsimony is designed to accommodate multilocus gene families and therefore allows a much more comprehensive data set to be analyzed. Results of this supertree approach showed a well-resolved phylogeny, in which myzostomids were part of the annelid radiation, and major bilaterian taxa were found to be monophyletic.

  3. When is hub gene selection better than standard meta-analysis?

    Directory of Open Access Journals (Sweden)

    Peter Langfelder

    Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) Finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as good as (if not better than) a consensus network approach in terms of validation success (criterion 2). The article also reports a comparison of meta-analysis techniques

  4. Functional annotation and identification of candidate disease genes by computational analysis of normal tissue gene expression data.

    Directory of Open Access Journals (Sweden)

    Laura Miozzi

    BACKGROUND: High-throughput gene expression data can predict gene function through the "guilt by association" principle: coexpressed genes are likely to be functionally associated. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG), small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated to similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin. CONCLUSIONS/SIGNIFICANCE: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations we provide candidate genes for several genetic diseases of unknown molecular basis.

  5. Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks

    Directory of Open Access Journals (Sweden)

    Zwinderman Aeilko H

    2009-09-01

    Abstract Background We generalized penalized canonical correlation analysis for analyzing microarray gene-expression measurements for checking completeness of known metabolic pathways and identifying candidate genes for incorporation in the pathway. We used Wold's method for calculation of the canonical variates, and we applied ridge penalization to the regression of pathway genes on canonical variates of the non-pathway genes, and the elastic net to the regression of non-pathway genes on the canonical variates of the pathway genes. Results We performed a small simulation to illustrate the model's capability to identify new candidate genes to incorporate in the pathway: in our simulations it appeared that a gene was correctly identified if the correlation with the pathway genes was 0.3 or more. We applied the methods to a gene-expression microarray data set of 12,209 genes measured in 45 patients with glioblastoma, and we considered genes to incorporate in the glioma-pathway: we identified more than 25 genes that correlated > 0.9 with canonical variates of the pathway genes. Conclusion We concluded that penalized canonical correlation analysis is a powerful tool to identify candidate genes in pathway analysis.

  6. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Abstract Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, involves difficulty in setting an overall p-value due to the complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. A larger number of SNP-SNP pairs than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes the above issues is thus needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) to handle the large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods and a subsequent variable selection procedure in the SIS method, suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (General-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.

  7. GRYFUN: a web application for GO term annotation visualization and analysis in protein sets.

    Science.gov (United States)

    Bastos, Hugo P; Sousa, Lisete; Clarke, Luka A; Couto, Francisco M

    2015-01-01

    Functional context for biological sequence is provided in the form of annotations. However, within a group of similar sequences there can be annotation heterogeneity in terms of coverage and specificity. This in turn can introduce issues regarding the interpretation of actual functional similarity and overall functional coherence of such a group. One way to mitigate such issues is through the use of visualization and statistical techniques. Therefore, in order to help interpret this annotation heterogeneity we created a web application that generates Gene Ontology annotation graphs for protein sets and their associated statistics from simple frequencies to enrichment values and Information Content based metrics. The publicly accessible website http://xldb.di.fc.ul.pt/gryfun/ currently accepts lists of UniProt accession numbers in order to create user-defined protein sets for subsequent annotation visualization and statistical assessment. GRYFUN is a freely available web application that allows GO annotation visualization of protein sets and which can be used for annotation coherence and cohesiveness analysis and annotation extension assessments within under-annotated protein sets.

  8. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    Science.gov (United States)

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.

  9. Association analysis of chromosome 1 migraine candidate genes

    Directory of Open Access Journals (Sweden)

    MacMillan John

    2007-08-01

    Abstract Background Migraine with aura (MA) is a subtype of typical migraine. Migraine with aura (MA) also encompasses a rare severe subtype Familial Hemiplegic Migraine (FHM) with several known genetic loci. The type 2 FHM (FHM-2) susceptibility locus maps to chromosome 1q23 and mutations in the ATP1A2 gene at this site have recently been implicated. We have previously provided evidence of linkage of typical migraine (predominantly MA) to microsatellite markers on chromosome 1, in the 1q31 and 1q23 regions. In this study, we have undertaken a large genomic investigation involving candidate genes that lie within the chromosome 1q23 and 1q31 regions using an association analysis approach. Methods We have genotyped a large population of case-controls (243 unrelated Caucasian migraineurs versus 243 controls) examining a set of 5 single nucleotide polymorphisms (SNPs) and the Fas Ligand dinucleotide repeat marker, located within the chromosome 1q23 and 1q31 regions. Results Several genes have been studied including membrane protein (ATP 1 subtype A4 and FasL), cytoplasmic glycoprotein (CASQ 1) genes and potassium (KCN J9 and KCN J10) and calcium (CACNA1E) channel genes in 243 migraineurs (including 85% MA and 15% of migraine without aura (MO)) and 243 matched controls. After correction for multiple testing, chi-square results showed non-significant P values (P > 0.008) across all SNPs (and a CA repeat) tested in these different genes, however results with the KCN J10 marker gave interesting results (P = 0.02) that may be worth exploring further in other populations. Conclusion These results do not show a significant role for the tested candidate gene variants and also do not support the hypothesis that a common chromosome 1 defective gene influences both FHM and the more common forms of migraine.
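The kind of test and correction this abstract reports can be illustrated with a short sketch. It assumes a Bonferroni-style correction over six markers (5 SNPs plus one repeat marker), which would give a threshold near the quoted 0.008; the abstract does not name the correction actually used. The allele counts are invented for illustration and are not data from the study.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table.

    Rows are cases and controls, columns are the two alleles:
        cases:    a, b
        controls: c, d
    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele counts for one marker (illustrative only).
stat, p = chi_square_2x2(20, 10, 10, 20)
threshold = bonferroni_threshold(0.05, 6)  # 5 SNPs + 1 repeat marker (assumed)
significant = p < threshold
```

With these invented counts the nominal p-value falls just under 0.01 but above the corrected threshold, the same pattern as the KCN J10 result described above: suggestive on its own, not significant after correction.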

  10. Higher primates, but not New World monkeys, have a duplicate set of enhancers flanking their apoC-I genes.

    Science.gov (United States)

    Puppione, Donald L

    2014-09-01

    Previous studies have demonstrated that the apoC-I gene and its pseudogene on human chromosome 19 are flanked by a duplicate set of enhancers. Multienhancers, ME.1 and ME.2, are located upstream from the genes and the hepatic control region enhancers, HCR.1 and HCR.2, are located downstream. The duplication of the enhancers has been thought to have occurred when the apoC-I gene was duplicated during primate evolution. Currently, the only primate data are for the human enhancers. Examining the genome of other primates (great and lesser apes, Old and New World monkeys), it was possible to locate the duplicate set of enhancers in apes and Old World monkeys. However, only a single set was found in New World monkeys. These observations provide additional evidence that the apoC-I gene and the flanking enhancers underwent duplication after the divergence of Old and New World monkeys.

  11. Gene expression analysis of flax seed development

    Directory of Open Access Journals (Sweden)

    Sharpe Andrew

    2011-04-01

    Full Text Available Abstract Background Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages), seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages), and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (ESTs) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development. Conclusions We have developed a foundational database of expressed sequences and collection of plasmid

  12. Large Scale Gene Expression Meta-Analysis Reveals Tissue-Specific, Sex-Biased Gene Expression in Humans

    Science.gov (United States)

    Mayne, Benjamin T.; Bianco-Miotto, Tina; Buckberry, Sam; Breen, James; Clifton, Vicki; Shoubridge, Cheryl; Roberts, Claire T.

    2016-01-01

    The severity and prevalence of many diseases are known to differ between the sexes. Organ specific sex-biased gene expression may underpin these and other sexually dimorphic traits. To further our understanding of sex differences in transcriptional regulation, we performed meta-analyses of sex biased gene expression in multiple human tissues. We analyzed 22 publicly available human gene expression microarray data sets including over 2500 samples from 15 different tissues and 9 different organs. Briefly, by using an inverse-variance method we determined the effect size difference of gene expression between males and females. We found the greatest sex differences in gene expression in the brain, specifically in the anterior cingulate cortex, (1818 genes), followed by the heart (375 genes), kidney (224 genes), colon (218 genes), and thyroid (163 genes). More interestingly, we found different parts of the brain with varying numbers and identity of sex-biased genes, indicating that specific cortical regions may influence sexually dimorphic traits. The majority of sex-biased genes in other tissues such as the bladder, liver, lungs, and pancreas were on the sex chromosomes or involved in sex hormone production. On average in each tissue, 32% of autosomal genes that were expressed in a sex-biased fashion contained androgen or estrogen hormone response elements. Interestingly, across all tissues, we found approximately two-thirds of autosomal genes that were sex-biased were not under direct influence of sex hormones. To our knowledge this is the largest analysis of sex-biased gene expression in human tissues to date. We identified many sex-biased genes that were not under the direct influence of sex chromosome genes or sex hormones. These may provide targets for future development of sex-specific treatments for diseases.
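
    The inverse-variance method named in this abstract can be sketched as a fixed-effect pooled estimate, in which each data set's effect size is weighted by the reciprocal of its variance. The abstract does not say whether a fixed- or random-effects model was used, so the fixed-effect form here is an assumption, and the effect sizes and variances are hypothetical.

```python
import math

def inverse_variance_pool(effects, variances):
    """Fixed-effect inverse-variance pooled effect and its standard error."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical male-vs-female effect sizes for one gene in three data sets.
effects = [0.5, 0.3, 0.4]
variances = [0.04, 0.01, 0.02]
pooled, pooled_se = inverse_variance_pool(effects, variances)
```

    The most precise data set (smallest variance) dominates the pooled estimate, which is why the result sits closest to 0.3 here.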

  13. Large scale gene expression meta-analysis reveals tissue-specific, sex-biased gene expression in humans

    Directory of Open Access Journals (Sweden)

    Benjamin Mayne

    2016-10-01

    Full Text Available The severity and prevalence of many diseases are known to differ between the sexes. Organ specific sex-biased gene expression may underpin these and other sexually dimorphic traits. To further our understanding of sex differences in transcriptional regulation, we performed meta-analyses of sex biased gene expression in multiple human tissues. We analysed 22 publicly available human gene expression microarray data sets including over 2500 samples from 15 different tissues and 9 different organs. Briefly, by using an inverse-variance method we determined the effect size difference of gene expression between males and females. We found the greatest sex differences in gene expression in the brain, specifically in the anterior cingulate cortex (1818 genes), followed by the heart (375 genes), kidney (224 genes), colon (218 genes) and thyroid (163 genes). More interestingly, we found different parts of the brain with varying numbers and identity of sex-biased genes, indicating that specific cortical regions may influence sexually dimorphic traits. The majority of sex-biased genes in other tissues such as the bladder, liver, lungs and pancreas were on the sex chromosomes or involved in sex hormone production. On average in each tissue, 32% of autosomal genes that were expressed in a sex-biased fashion contained androgen or estrogen hormone response elements. Interestingly, across all tissues, we found approximately two-thirds of autosomal genes that were sex-biased were not under direct influence of sex hormones. To our knowledge this is the largest analysis of sex-biased gene expression in human tissues to date. We identified many sex-biased genes that were not under the direct influence of sex chromosome genes or sex hormones. These may provide targets for future development of sex-specific treatments for diseases.

  14. Functional analysis of plastid-encoded genes

    OpenAIRE

    Swiatek, Magdalena

    2002-01-01

    Plastid chromosomes from a variety of plant species contain several conserved open reading frames of unknown function, which most probably represent functional genes. The primary aim of this thesis was the analysis of the role of two such ORFs, designated ycfs or hypothetical chloroplast reading frames, namely ycf9 (ORF62) and ycf10 (ORF229, cemA). Both were analyzed in Nicotiana tabacum (tobacco) via their inactivation using biolistic plastid transformation. A new experiment...

  15. Computational Analysis of PTEN Gene Mutation

    Directory of Open Access Journals (Sweden)

    Siew-Kien Mah

    2012-01-01

    Full Text Available Post-genomic data can be efficiently analyzed using computational tools. Computational analysis has the advantage over biochemical and biophysical methods in terms of higher coverage. In this research, we applied computational analysis to PTEN gene mutations. Mutations in PTEN are responsible for many human diseases. The results of this research provide insights into the protein domains of PTEN and the distribution of mutations.

  16. SAP domain-dependent Mkl1 signaling stimulates proliferation and cell migration by induction of a distinct gene set indicative of poor prognosis in breast cancer patients

    Science.gov (United States)

    2014-01-01

    Background The main cause of death of breast cancer patients is not the primary tumor itself but the metastatic disease. Identifying breast cancer-specific signatures for metastasis and learning more about the nature of the genes involved in the metastatic process would 1) improve our understanding of the mechanisms of cancer progression and 2) reveal new therapeutic targets. Previous studies showed that the transcriptional regulator megakaryoblastic leukemia-1 (Mkl1) induces tenascin-C expression in normal and transformed mammary epithelial cells. Tenascin-C is known to be expressed in metastatic niches, is highly induced in cancer stroma and promotes breast cancer metastasis to the lung. Methods Using HC11 mammary epithelial cells overexpressing different Mkl1 constructs, we devised a subtractive transcript profiling screen to identify the mechanism by which Mkl1 induces a gene set co-regulated with tenascin-C. We performed computational analysis of the Mkl1 target genes and used cell biological experiments to confirm the effect of these gene products on cell behavior. To analyze whether this gene set is prognostic of accelerated cancer progression in human patients, we used the bioinformatics tool GOBO, which allowed us to investigate a large breast tumor data set linked to patient data. Results We discovered a breast cancer-specific set of genes including tenascin-C, which is regulated by Mkl1 in a SAP domain-dependent, serum response factor-independent manner and is strongly implicated in cell proliferation, cell motility and cancer. Downregulation of this set of transcripts by overexpression of Mkl1 lacking the SAP domain inhibited cell growth and cell migration. Many of these genes are direct Mkl1 targets since their promoter-reporter constructs were induced by Mkl1 in a SAP domain-dependent manner. Transcripts most strongly reduced in the absence of the SAP domain were mechanoresponsive. Finally, expression of this gene set is associated with high

  17. Some new thin sets of integers in Harmonic Analysis

    CERN Document Server

    Li, Daniel; Rodriguez-Piazza, Luis

    2009-01-01

    We randomly construct various subsets $\Lambda$ of the integers which have both smallness and largeness properties. They are small since they are very close, in various meanings, to Sidon sets: the continuous functions with spectrum in $\Lambda$ have uniformly convergent series, and their Fourier coefficients are in $\ell_p$ for all $p>1$; moreover, all the Lebesgue spaces $L^q_\Lambda$ are equal for $q<+\infty$. On the other hand, they are large in the sense that they are dense in the Bohr group and that the space of the bounded functions with spectrum in $\Lambda$ is non separable. So these sets are very different from the thin sets of integers previously known.

  18. Separate enrichment analysis of pathways for up- and downregulated genes.

    Science.gov (United States)

    Hong, Guini; Zhang, Wenjing; Li, Hongdong; Shen, Xiaopei; Guo, Zheng

    2014-03-06

    Two strategies are often adopted for enrichment analysis of pathways: the analysis of all differentially expressed (DE) genes together or the analysis of up- and downregulated genes separately. However, few studies have examined the rationales of these enrichment analysis strategies. Using both microarray and RNA-seq data, we show that gene pairs with functional links in pathways tended to have positively correlated expression levels, which could result in an imbalance between the up- and downregulated genes in particular pathways. We then show that the imbalance could greatly reduce the statistical power for finding disease-associated pathways through the analysis of all-DE genes. Further, using gene expression profiles from five types of tumours, we illustrate that the separate analysis of up- and downregulated genes could identify more pathways that are really pertinent to phenotypic difference. In conclusion, analysing up- and downregulated genes separately is more powerful than analysing all of the DE genes together.
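
    The loss of power described in this abstract can be illustrated with a hypergeometric over-representation test, a common choice for pathway enrichment. The abstract does not name the test the authors used, and all gene counts below are hypothetical. The sketch compares a pathway whose differentially expressed members are all upregulated when tested against the upregulated list alone versus the combined list.

```python
from math import comb

def hypergeom_pvalue(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    favourable = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_selected)

# Hypothetical numbers: 1000 background genes, a 50-gene pathway,
# 50 upregulated and 50 downregulated DE genes, 10 pathway genes all upregulated.
p_up_only = hypergeom_pvalue(1000, 50, 50, 10)
p_all_de = hypergeom_pvalue(1000, 50, 100, 10)
```

    With these counts the same ten pathway genes give a smaller p-value against the upregulated list than against the pooled list, because pooling doubles the number of selected genes without adding any pathway hits.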

  19. A set of vectors with a tetracycline-regulatable promoter system for modulated gene expression in Saccharomyces cerevisiae.

    Science.gov (United States)

    Garí, E; Piedrafita, L; Aldea, M; Herrero, E

    1997-07-01

    A set of Saccharomyces cerevisiae expression vectors has been developed in which transcription is driven by a hybrid tetO-CYC1 promoter through the action of a tetR-VP16 (tTA) activator. Expression from the promoter is regulated by tetracycline or derivatives. Various modalities of promoter and activator are used in order to achieve different levels of maximal expression. In the presence of antibiotic in the growth medium at concentrations that do not affect cell growth, expression from the tetO promoter is negligible, and upon antibiotic removal induction ratios of up to 1000-fold are observed with a lacZ reporter system. With the strongest system, overexpression levels comparable with those observed with GAL1-driven promoters are reached. For each particular promoter/tTA combination, expression can be modulated by changing the tetracycline concentration in the growth medium. These vectors may be useful for the study of the function of essential genes in yeast, as well as for phenotypic analysis of genes in overexpression conditions, without restrictions imposed by growth medium composition.

  20. Transcriptional analysis of Pleurotus ostreatus laccase genes.

    Science.gov (United States)

    Pezzella, Cinzia; Lettera, Vincenzo; Piscitelli, Alessandra; Giardina, Paola; Sannia, Giovanni

    2013-01-01

    Fungal laccases (p-diphenol:oxygen oxidoreductase; EC 1.10.3.2) are multi-copper-containing oxidases that catalyse the oxidation of a great variety of phenolic compounds and aromatic amines through simultaneous reduction of molecular oxygen to water. Fungi generally produce several laccase isoenzymes encoded by complex multi-gene families. The Pleurotus ostreatus genome encodes 11 putative laccase coding genes, and only six different laccase isoenzymes have been isolated and characterised so far. Laccase expression was found to be regulated by culture conditions and developmental stages, even if the redundancy of these genes still raises the question of their respective functions in vivo. In this context, laccase transcript profiling analysis has been used to unravel the physiological role played by the different isoforms produced by P. ostreatus. Even if the reported results depict a complex picture of the transcriptional responses exhibited by the analysed laccase genes, they allow speculation on the role of the isoforms in vivo. Among the produced laccases, LACC10 (POXC) seems to play a major role during vegetative growth, since its transcription is downregulated when the fungus starts the fructification process. Furthermore, a new tessera has been added to the puzzling mosaic of the heterodimeric laccase LACC2 (POXA3). The LACC2 small subunit seems to play an additional physiological role during fructification, besides that of LACC2 complex activation/stabilisation.

  1. Analysis of gene expression in rabbit muscle

    Directory of Open Access Journals (Sweden)

    Alena Gálová

    2014-02-01

    Full Text Available Increasing consumer knowledge of the link between diet and health has raised the demand for high quality food. Meat and meat products may be considered irreplaceable in human nutrition. Breeding livestock for a higher content of lean meat and the use of modern hybrids entail problems with the quality of meat. Analysis of livestock genomes could provide a great deal of important information, which may significantly affect the improvement process. Domestic animals are invaluable resources for the study of the molecular architecture of complex traits. Although the mapping of quantitative trait loci (QTL) responsible for economically important traits in domestic animals has achieved remarkable results in recent decades, not all of the genetic variation in the complex traits has been captured because of the low density of markers used in QTL mapping studies. The genome-wide association study (GWAS), which utilizes high-density single-nucleotide polymorphisms (SNPs), provides a new way to tackle this issue. New technologies now allow the production of microarrays containing thousands of hybridization probes on a single membrane or other solid support. We used microarray analysis to study gene expression in rabbit muscle at different developmental stages. The outputs from GeneSpring GX software are presented in this work. After the evaluation of gene expression in rabbits, genes of interest in relation to meat quality parameters will be selected and further analyzed by the available methods of molecular biology and genetics.

  2. A game-theoretic perspective on rough set analysis

    Institute of Scientific and Technical Information of China (English)

    YAO Jing-tao; HERBERT Joseph P

    2008-01-01

    Determining the correct threshold values for the probabilistic rough set approaches has been a heated issue among the community. Existing techniques offer no way of guaranteeing that the calculated values optimize the classification ability of the decision rules derived from this configuration. This article formulates a game-theoretic approach to calculating these thresholds to ensure correct approximation region size. Using payoff tables created from approximation measures and modified conditional risk strategies, we provide the user with tolerance levels for their loss functions. Using the tolerance values, new thresholds are calculated to provide correct classification regions. This will aid in determining a set of optimal region threshold values for decision making.
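
    For orientation, the role the thresholds play in a probabilistic rough set model can be sketched as a three-way region assignment. This is the standard textbook formulation with an upper threshold alpha and a lower threshold beta, not the game-theoretic procedure of the article, which concerns how such thresholds are chosen. The function name and the example values are hypothetical.

```python
def classify_region(cond_prob, alpha, beta):
    """Assign an equivalence class to a region from P(X | [x]).

    alpha is the upper threshold (positive region), beta the lower one
    (negative region); anything in between falls in the boundary region.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if cond_prob >= alpha:
        return "positive"
    if cond_prob <= beta:
        return "negative"
    return "boundary"

# Hypothetical thresholds and conditional probabilities.
regions = [classify_region(p, alpha=0.75, beta=0.25) for p in (0.9, 0.5, 0.1)]
```

    Raising alpha or lowering beta widens the boundary region, which is the trade-off a threshold-selection method has to balance.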

  3. Lists2Networks: Integrated analysis of gene/protein lists

    Directory of Open Access Journals (Sweden)

    Ma'ayan Avi

    2010-02-01

    Full Text Available Abstract Background Systems biologists are faced with the difficulty of analyzing results from large-scale studies that profile the activity of many genes, RNAs and proteins, applied in different experiments, under different conditions, and reported in different publications. To address this challenge it is desirable to compare the results from different related studies such as mRNA expression microarrays, genome-wide ChIP-X, RNAi screens, proteomics and phosphoproteomics experiments in a coherent global framework. In addition, linking high-content multilayered experimental results with prior biological knowledge can be useful for identifying functional themes and forming novel hypotheses. Results We present Lists2Networks, a web-based system that allows users to upload lists of mammalian genes/proteins onto a server-based program for integrated analysis. The system includes web-based tools to manipulate lists with different set operations, to expand lists using existing mammalian networks of protein-protein interactions, co-expression correlation, or background knowledge co-annotation correlation, as well as to apply gene-list enrichment analyses against many gene-list libraries of prior biological knowledge such as pathways, gene ontology terms, kinase-substrate, microRNA-mRNA, and protein-protein interactions, metabolites, and protein domains. Such analyses can be applied to several lists at once against many prior knowledge libraries of gene-lists associated with specific annotations. The system also contains features that allow users to export networks and share lists with other users of the system. Conclusions Lists2Networks is a user friendly web-based software system expected to significantly ease the computational analysis process for experimental systems biologists employing high-throughput experiments at multiple layers of regulation. The system is freely available at http://www.lists2networks.org.

  4. Functional Analysis and Treatment of Elopement across Two School Settings

    Science.gov (United States)

    Lang, Russell; Davis, Tonya; O'Reilly, Mark; Machalicek, Wendy; Rispoli, Mandy; Sigafoos, Jeff; Lancioni, Giulio; Regester, April

    2010-01-01

    The elopement of a child with Asperger syndrome was assessed using functional analyses and was treated in two school settings (classroom and resource room). Functional analyses indicated that elopement was maintained by access to attention in the resource room and obtaining a preferred activity in the classroom. Attention- and tangible-based…

  5. Gas hydrates and magnetism : comparative geological settings for diagenetic analysis

    Energy Technology Data Exchange (ETDEWEB)

    Esteban, L.; Enkin, R.J. [Natural Resources Canada, Sidney, BC (Canada). Geological Survey of Canada; Hamilton, T. [Camosun College, Victoria, BC (Canada)

    2008-07-01

    Geophysical and geochemical methods assist in locating and quantifying natural gas hydrate deposits. They are also useful in understanding these resources, their climate impacts and their potential role in geohazards. In order to understand the mechanisms of gas hydrate formation and its natural distribution in sediments, magnetic studies were conducted on cores from three different geological settings. This paper presented the results of a detailed magnetic investigation, as well as petrological observations, that were conducted on cores from a permafrost setting in the Mackenzie Delta located in the Canadian Northwest Territories Mallik region, and two marine settings, from the Cascadia margin off Vancouver Island and the Indian National Gas Hydrate Program from the Bengal Fan. The paper provided background information on the permafrost setting in Mallik region of the Mackenzie Delta as well as the Cascadia margin. The magnetic properties of gas hydrate bearing sediments were found to be a combination of the original detrital content and the diagenetic transformations of iron minerals caused by the unique environment produced by gas hydrate formation. The availability of methane to provide food for bacteria coupled with the concentration of solutes outside gas hydrate accumulation zones led to the creation of iron sulphides. These new minerals were observable using magnetic techniques, which help in delineating the gas hydrate formation mechanism and may be developed into new geophysical methods of gas hydrate exploration. 7 refs., 7 figs.

  6. Health care priority setting in Norway a multicriteria decision analysis

    NARCIS (Netherlands)

    Defechereux, T.; Paolucci, F.; Mirelman, A.; Youngkong, S.; Botten, G.; Hagen, T.P.; Niessen, L.W.

    2012-01-01

    BACKGROUND: Priority setting in population health is increasingly based on explicitly formulated values. The Patients Rights Act of the Norwegian tax-based health service guarantees all citizens health care in case of a severe illness, a proven health benefit, and proportionality between need and tr

  7. Design and experimental application of a novel non-degenerate universal primer set that amplifies prokaryotic 16S rRNA genes with a low possibility to amplify eukaryotic rRNA genes.

    Science.gov (United States)

    Mori, Hiroshi; Maruyama, Fumito; Kato, Hiromi; Toyoda, Atsushi; Dozono, Ayumi; Ohtsubo, Yoshiyuki; Nagata, Yuji; Fujiyama, Asao; Tsuda, Masataka; Kurokawa, Ken

    2014-01-01

    The deep sequencing of 16S rRNA genes amplified by universal primers has revolutionized our understanding of microbial communities by allowing the characterization of the diversity of the uncultured majority. However, some universal primers also amplify eukaryotic rRNA genes, leading to a decrease in the efficiency of sequencing of prokaryotic 16S rRNA genes with possible mischaracterization of the diversity in the microbial community. In this study, we compared 16S rRNA gene sequences from genome-sequenced strains and identified candidates for non-degenerate universal primers that could be used for the amplification of prokaryotic 16S rRNA genes. The 50 identified candidates were investigated to calculate their coverage for prokaryotic and eukaryotic rRNA genes, including those from uncultured taxa and eukaryotic organelles, and a novel universal primer set, 342F-806R, covering many prokaryotic, but not eukaryotic, rRNA genes was identified. This primer set was validated by the amplification of 16S rRNA genes from a soil metagenomic sample and subsequent pyrosequencing using the Roche 454 platform. The same sample was also used for pyrosequencing of the amplicons by employing a commonly used primer set, 338F-533R, and for shotgun metagenomic sequencing using the Illumina platform. Our comparison of the taxonomic compositions inferred by the three sequencing experiments indicated that the non-degenerate 342F-806R primer set can characterize the taxonomic composition of the microbial community without substantial bias, and is highly expected to be applicable to the analysis of a wide variety of microbial communities.
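
    The coverage calculation described in this abstract can be sketched as an in silico check of how many template sequences contain exact binding sites for a non-degenerate primer pair. This is a simplified illustration rather than the authors' pipeline: it requires perfect matches, and the primer and template sequences are hypothetical placeholders, not the 342F-806R primers.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_amplified(template, forward, reverse):
    """True if the forward primer matches exactly and the reverse primer's
    binding site (its reverse complement) occurs downstream of it."""
    fwd_pos = template.find(forward)
    if fwd_pos < 0:
        return False
    rev_site = reverse_complement(reverse)
    return template.find(rev_site, fwd_pos + len(forward)) >= 0

def coverage(templates, forward, reverse):
    """Fraction of templates predicted to be amplified by the primer pair."""
    if not templates:
        return 0.0
    hits = sum(is_amplified(t, forward, reverse) for t in templates)
    return hits / len(templates)

# Hypothetical primers and templates for illustration only.
forward_primer = "ACGTAC"
reverse_primer = "AAGCTG"  # binds where the template reads "CAGCTT"
templates = [
    "GGACGTACTTTTCAGCTTGG",  # both sites present, correct order
    "GGACGTACTTTTCAGGTTGG",  # one mismatch in the reverse site
    "ACGTACCAGCTT",          # both sites present, adjacent
    "CAGCTTACGTAC",          # sites present but in the wrong order
]
pair_coverage = coverage(templates, forward_primer, reverse_primer)
```

    Running the same function separately on prokaryotic and eukaryotic reference sets would give the two coverage figures that such a primer comparison needs.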

  8. Phylogenetic and evolutionary analysis of NBS-encoding genes in Rutaceae fruit crops.

    Science.gov (United States)

    Xu, Qiang; Biswas, Manosh Kumar; Lan, Hong; Zeng, Wenfang; Liu, Chaoyang; Xu, Jidi; Deng, Xiuxin

    2011-02-01

    The nucleotide-binding site leucine-rich repeat (NBS-LRR) genes are the largest class of disease resistance genes in plants. However, our understanding of the evolution of NBS-LRR genes in Rutaceae fruit crops is rather limited. We report an evolutionary study of 103 NBS-encoding genes isolated from Poncirus trifoliata (trifoliate orange), Citrus reticulata (tangerine) and their F(1) progeny. In all, 58 of the sequences contained a continuous open reading frame. Phylogenetic analysis classified the 58 NBS genes into nine clades, eight of which were genus specific. This was taken to imply that most of the ancestors of these NBS genes evolved after the genus split. The motif pattern of the 58 NBS-encoding genes was consistent with their phylogenetic profile. An extended phylogenetic analysis, incorporating citrus NBS genes from the public database, classified 95 citrus NBS genes into six clades, half of which were genus specific. RFLP analysis showed that citrus NBS-encoding genes have been evolving rapidly, and that they are unstable when passed through an intergeneric cross. Of 32 NBS-encoding genes tracked by gene-specific PCR, 24 showed segregation distortion among a set of 94 F(1) individuals. This study provides new insight into the evolution of Rutaceae NBS genes and their behaviour following an intergeneric cross.

  9. A comparative analysis of biclustering algorithms for gene expression data.

    Science.gov (United States)

    Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V

    2013-05-01

    The need to analyze high-dimension biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.

  10. Correlation of a set of gene variants, life events and personality features on adult ADHD severity.

    Science.gov (United States)

    Müller, Daniel J; Chiesa, Alberto; Mandelli, Laura; De Luca, Vincenzo; De Ronchi, Diana; Jain, Umesh; Serretti, Alessandro; Kennedy, James L

    2010-07-01

    Increasing evidence suggests that symptoms of attention deficit hyperactivity disorder (ADHD) could persist into adult life in a substantial proportion of cases. The aim of the present study was to investigate the impact of (1) adverse events, (2) personality traits and (3) genetic variants chosen on the basis of previous findings and (4) their possible interactions on adult ADHD severity. One hundred and ten individuals diagnosed with adult ADHD were evaluated for occurrence of adverse events in childhood and adulthood, and personality traits by the Temperament and Character Inventory (TCI). Common polymorphisms within a set of nine important candidate genes (SLC6A3, DBH, DRD4, DRD5, HTR2A, CHRNA7, BDNF, PRKG1 and TAAR9) were genotyped for each subject. Life events, personality traits and genetic variations were analyzed in relationship to severity of current symptoms, according to the Brown Attention Deficit Disorder Scale (BADDS). Genetic variations were not significantly associated with severity of ADHD symptoms. Life stressors displayed only a minor effect as compared to personality traits. Indeed, symptoms' severity was significantly correlated with the temperamental trait of Harm avoidance and the character trait of Self directedness. The results of the present work are in line with previous evidence of a significant correlation between some personality traits and adult ADHD. However, several limitations such as the small sample size and the exclusion of patients with other severe comorbid psychiatric disorders could have influenced the significance of present findings.

  11. The Rules of Standard Setting Organizations: an Empirical Analysis

    OpenAIRE

    Chiao, Benjamin; Lerner, Josh; Tirole, Jean

    2006-01-01

    This paper empirically explores the procedures employed by standard-setting organizations. Consistent with Lerner-Tirole (2004), we find (a) a negative relationship between the extent to which an SSO is oriented to technology sponsors and the concession level required of sponsors and (b) a positive correlation between the sponsor-friendliness of the selected SSO and the quality of the standard. We also develop and test two extensions of the earlier model: the presence of provisions mandating ...

  12. Vector optimization set-valued and variational analysis

    CERN Document Server

    Chen, Guang-ya; Yang, Xiaogi

    2005-01-01

    This book is devoted to vector or multiple criteria approaches in optimization. Topics covered include: vector optimization, vector variational inequalities, vector variational principles, vector minmax inequalities and vector equilibrium problems. In particular, problems with variable ordering relations and set-valued mappings are treated. The nonlinear scalarization method is extensively used throughout the book to deal with various vector-related problems. The results presented are original and should be interesting to researchers and graduates in applied mathematics and operations research.

  13. Analysis of the real EADGENE data set: Multivariate approaches and post analysis

    OpenAIRE

    Schuberth Hans-Joachim; van Schothorst Evert M; Lund Mogens; San Cristobal Magali; Robert-Granié Christèle; Pool Marco H; Petzl Wolfram; Nie Haisheng; Cao Kim-Anh; de Koning Dirk-Jan; Jiang Li; Jensen Kirsty; Hulsegge Ina; Jaffrézic Florence; Hornshøj Henrik

    2007-01-01

    Abstract The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The m...

  14. Epistemic Analysis of Strategic Games with Arbitrary Strategy Sets

    NARCIS (Netherlands)

    Apt, K.R.; Samet, D.

    2007-01-01

    We provide here an epistemic analysis of arbitrary strategic games based on the possibility correspondences. Such an analysis calls for the use of transfinite iterations of the corresponding operators. Our approach is based on Tarski’s Fixpoint Theorem and applies both to the notions of rationalizab

  15. Epistemic Analysis of Strategic Games with Arbitrary Strategy Sets

    CERN Document Server

    Apt, Krzysztof R

    2007-01-01

    We provide here an epistemic analysis of arbitrary strategic games based on the possibility correspondences. Such an analysis calls for the use of transfinite iterations of the corresponding operators. Our approach is based on Tarski's Fixpoint Theorem and applies both to the notions of rationalizability and the iterated elimination of strictly dominated strategies.

  16. Performance of single and concatenated sets of mitochondrial genes at inferring metazoan relationships relative to full mitogenome data.

    Directory of Open Access Journals (Sweden)

    Justin C Havird

    Mitochondrial (mt) genes are some of the most popular and widely utilized genetic loci in phylogenetic studies of metazoan taxa. However, their linked nature has raised questions on whether using the entire mitogenome for phylogenetics is overkill (at best) or pseudoreplication (at worst). Moreover, no studies have addressed the comparative phylogenetic utility of mitochondrial genes across individual lineages within the entire Metazoa. To comment on the phylogenetic utility of individual mt genes as well as concatenated subsets of genes, we analyzed mitogenomic data from 1865 metazoan taxa in 372 separate lineages spanning genera to subphyla. Specifically, phylogenies inferred from these datasets were statistically compared to ones generated from all 13 mt protein-coding (PC) genes (i.e., the "supergene" set) to determine which single genes performed "best" at, and the minimum number of genes required to, recover the "supergene" topology. Surprisingly, the popular marker COX1 performed poorest, while ND5, ND4, and ND2 were most likely to reproduce the "supergene" topology. Averaged across all lineages, the longest ∼2 mt PC genes were sufficient to recreate the "supergene" topology, although this average increased to ∼5 genes for datasets with 40 or more taxa. Furthermore, concatenation of the three "best" performing mt PC genes outperformed that of the three longest mt PC genes (i.e., ND5, COX1, and ND4). Taken together, while not all mt PC genes are equally interchangeable in phylogenetic studies of the metazoans, some subset can serve as a proxy for the 13 mt PC genes. However, the exact number and identity of these genes is specific to the lineage in question and cannot be applied indiscriminately across the Metazoa.

  17. A set of vectors for introduction of antibiotic resistance genes by in vitro Cre-mediated recombination

    OpenAIRE

    Vassetzky Yegor S; Dmitriev Petr V

    2008-01-01

    Background: Introduction of new antibiotic resistance genes into plasmids of interest is a frequent task in molecular cloning practice. Classical approaches involving digestion with restriction endonucleases and ligation are time-consuming. Findings: We have created a set of insertion vectors (pINS) carrying genes that provide resistance to various antibiotics (puromycin, blasticidin and G418) and containing a loxP site. Each vector (pINS-Puro, pINS-Blast or pINS-Neo) contains either...

  18. Linking Hematopoietic Differentiation to Co-Expressed Sets of Pluripotency-Associated and Imprinted Genes and to Regulatory microRNA-Transcription Factor Motifs

    Science.gov (United States)

    Hamed, Mohamed; Trumm, Johannes; Spaniol, Christian; Sethi, Riccha; Irhimeh, Mohammad R.; Fuellen, Georg; Paulsen, Martina

    2017-01-01

    Maintenance of cell pluripotency, differentiation, and reprogramming are regulated by complex gene regulatory networks (GRNs) including monoallelically-expressed imprinted genes. Besides transcriptional control, epigenetic modifications and microRNAs contribute to cellular differentiation. As a model system for studying the capacity of cells to preserve their pluripotency state and the onset of differentiation and subsequent specialization, murine hematopoiesis was used and compared to embryonic stem cells (ESCs) as a control. Using published microarray data, the expression profiles of two sets of genes, pluripotent and imprinted, were compared to a third set of known hematopoietic genes. We found that more than half of the pluripotent and imprinted genes are clearly upregulated in ESCs but subsequently repressed during hematopoiesis. The remaining genes were either upregulated in hematopoietic progenitors or in differentiated blood cells. The three gene sets each consist of three similarly behaving gene groups with similar expression profiles in various lineages of the hematopoietic system as well as in ESCs. To explain this co-regulation behavior, we explored the transcriptional and post-transcriptional mechanisms of pluripotent and imprinted genes and their regulator/target miRNAs in six different hematopoietic lineages. On this basis, lineage-specific transcription factor (TF)-miRNA regulatory networks were generated and their topologies and functional impacts during hematopoiesis were analyzed. This led to the identification of TF-miRNA co-regulatory motifs, for which we validated the contribution to the cellular development of the corresponding lineage in terms of statistical significance and relevance to biological evidence. This analysis also identified key miRNAs and TFs/genes that might play important roles in the derived lineage networks. These molecular associations suggest new aspects of the cellular regulation of the onset of cellular differentiation and

  19. Identification and Analysis of the SET-Domain Family in Silkworm, Bombyx mori

    Directory of Open Access Journals (Sweden)

    Hailong Zhao

    2015-01-01

    As an important economic insect, Bombyx mori is also a useful model organism for lepidopteran insects. SET-domain-containing proteins belong to a group of enzymes named after a common domain that utilizes the cofactor S-adenosyl-L-methionine (SAM) to achieve methylation of its substrates. Many SET-domain-containing proteins have been shown to display catalytic activity towards particular lysine residues on histones, but emerging evidence also indicates that various nonhistone proteins are specifically targeted by this clade of enzymes. To explore the diverse functions of the SET-domain superfamily in insects, we identified, cloned, and analyzed the SET-domain proteins in silkworm, Bombyx mori. Firstly, 24 genes containing a SET domain were characterized from the silkworm genome, and 17 of them belonged to six subfamilies: SUV39, SET1, SET2, SUV4-20, EZ, and SMYD. Secondly, SET domains of the silkworm SET-domain family were intraspecifically and interspecifically conserved, especially for the catalytic core “NHSC” motif, substrate binding site, and catalytic site in the SET domain. Lastly, further analyses indicated that the silkworm SET-domain gene BmSu(var)3-9 showed different characteristics and expression profiles compared to other invertebrates. Overall, our results provide new insight into the functional and evolutionary features of the SET-domain family.

  20. Identification and Analysis of the SET-Domain Family in Silkworm, Bombyx mori.

    Science.gov (United States)

    Zhao, Hailong; Zheng, Chunqin; Cui, Hongjuan

    2015-01-01

    As an important economic insect, Bombyx mori is also a useful model organism for lepidopteran insects. SET-domain-containing proteins belong to a group of enzymes named after a common domain that utilizes the cofactor S-adenosyl-L-methionine (SAM) to achieve methylation of its substrates. Many SET-domain-containing proteins have been shown to display catalytic activity towards particular lysine residues on histones, but emerging evidence also indicates that various nonhistone proteins are specifically targeted by this clade of enzymes. To explore the diverse functions of the SET-domain superfamily in insects, we identified, cloned, and analyzed the SET-domain proteins in silkworm, Bombyx mori. Firstly, 24 genes containing a SET domain were characterized from the silkworm genome, and 17 of them belonged to six subfamilies: SUV39, SET1, SET2, SUV4-20, EZ, and SMYD. Secondly, SET domains of the silkworm SET-domain family were intraspecifically and interspecifically conserved, especially for the catalytic core "NHSC" motif, substrate binding site, and catalytic site in the SET domain. Lastly, further analyses indicated that the silkworm SET-domain gene BmSu(var)3-9 showed different characteristics and expression profiles compared to other invertebrates. Overall, our results provide new insight into the functional and evolutionary features of the SET-domain family.

  1. Extensible Data Set Architecture for Systems Analysis Project

    Data.gov (United States)

    National Aeronautics and Space Administration — The process of aircraft design requires the integration of data from individual analysis of aerodynamic, structural, thermal, and behavioral properties of a flight...

  2. Fuzzy set theoretic approach to fault tree analysis

    African Journals Online (AJOL)

    user

    Research in conventional fault tree analysis (FTA) is based mainly on failure ... Thus for a very complex system having large number of components, the ..... Smaller, the triangular fuzzy number B-Ai, will result in the best approximation for B.

  3. Teaching Result Analysis Using Rough Sets and Data Mining

    CERN Document Server

    Ramasubramanian, P; Thangavelu, P; Winston, J Joy

    2009-01-01

    The development of IT and the WWW offers a range of teaching strategies from which teachers can choose, and students can acquire knowledge through different learning models. Problem-based learning is a popular teaching strategy. According to educational theory, increased learning motivation among students can increase learning effectiveness. In this paper, we propose a concept map for each student and staff member. This map finds the result of the subjects and also recommends a sequence of remedial teaching. Rough set theory is used to deal with uncertainty in the hidden patterns of the data. For each competence, the lower and upper approximations are calculated based on the brainstorm maps.
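
    The lower and upper approximations mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' data or procedure, so the student records, attribute names and the `approximations` helper below are all hypothetical; only the standard rough-set definitions are used.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) rough-set approximations of `target`.

    objects    -- dict mapping object name -> dict of attribute values
    attributes -- attributes that define the indiscernibility relation
    target     -- set of object names to approximate
    """
    # Objects with identical values on the chosen attributes are
    # indiscernible and fall into the same equivalence class.
    classes = defaultdict(set)
    for name, record in objects.items():
        key = tuple(record[a] for a in attributes)
        classes[key].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical records, invented purely for illustration.
students = {
    "s1": {"quiz": "high", "homework": "done"},
    "s2": {"quiz": "high", "homework": "done"},
    "s3": {"quiz": "low", "homework": "done"},
    "s4": {"quiz": "low", "homework": "missing"},
}
passed = {"s1", "s3"}
lower, upper = approximations(students, ["quiz", "homework"], passed)
```

    In this toy data s1 and s2 are indiscernible but only s1 passed, so neither belongs to the lower approximation while both belong to the upper one; the gap between the two sets is the uncertainty that rough set theory makes explicit.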

  4. A geometric level set model for ultrasounds analysis

    Energy Technology Data Exchange (ETDEWEB)

    Sarti, A.; Malladi, R.

    1999-10-01

    We propose a partial differential equation (PDE) for filtering and segmentation of echocardiographic images based on a geometric-driven scheme. The method allows edge-preserving image smoothing and a semi-automatic segmentation of the heart chambers that regularizes the shapes and improves edge fidelity, especially in the presence of distinct gaps in the edge map, as is common in ultrasound imagery. A numerical scheme for solving the proposed PDE is borrowed from level set methods. Results on human in vivo acquired 2D, 2D+time, 3D and 3D+time echocardiographic images are shown.

  5. Bridging cancer biology with the clinic: relative expression of a GRHL2-mediated gene-set pair predicts breast cancer metastasis.

    Directory of Open Access Journals (Sweden)

    Xinan Yang

    Identification and characterization of crucial gene target(s) that will allow focused therapeutics development remains a challenge. We have interrogated the putative therapeutic targets associated with the transcription factor Grainy head-like 2 (GRHL2), a critical epithelial regulatory factor. We demonstrate the possibility to define the molecular functions of critical genes in terms of their personalized expression profiles, allowing appropriate functional conclusions to be derived. A novel methodology, relative expression analysis with gene-set pairs (RXA-GSP), is designed to explore the potential clinical utility of cancer-biology discovery. Observing that Grhl2-overexpression leads to increased metastatic potential in vitro, we established a model assuming Grhl2-induced or -inhibited genes confer poor or favorable prognosis respectively for cancer metastasis. Training on public gene expression profiles of 995 breast cancer patients, this method prioritized one gene-set pair (GRHL2, CDH2, FN1, CITED2, MKI67 versus CTNNB1 and CTNNA3) from all 2717 possible gene-set pairs (GSPs). The identified GSP significantly dichotomized 295 independent patients for metastasis-free survival (log-rank tested p = 0.002; severe empirical p = 0.035). It also showed evidence of clinical prognostication in another independent 388 patients collected from three studies (log-rank tested p = 3.3e-6). This GSP is independent of most traditional prognostic indicators, and is only significantly associated with the histological grade of breast cancer (p = 0.0017), a GRHL2-associated clinical character (p = 6.8e-6, Spearman correlation), suggesting that this GSP is reflective of GRHL2-mediated events. Furthermore, a literature review indicates the therapeutic potential of the identified genes. This research demonstrates a novel strategy to integrate both biological experiments and clinical gene expression profiles for extracting and elucidating the genomic

  6. Gene expression meta-analysis identifies chromosomal regions involved in ovarian cancer survival

    DEFF Research Database (Denmark)

    Thomassen, Mads; Jochumsen, Kirsten M; Mogensen, Ole;

    2009-01-01

    Ovarian cancer cells exhibit complex karyotypic alterations causing deregulation of numerous genes. Some of these genes are probably causal for cancer formation and local growth, whereas others are causal for metastasis and recurrence. By using publicly available data sets, we have investigated the relation of gene expression and chromosomal position to identify chromosomal regions of importance for early recurrence of ovarian cancer. By use of Gene Set Enrichment Analysis, we have ranked chromosomal regions according to their association to survival. Over-representation analysis including 1-4 consecutive cytogenetic bands identified regions with increased expression for chromosome 5q12-14, and a very large region of chromosome 7 with the strongest signal at 7p15-13 among tumors from short-living patients. Reduced gene expression was identified at 4q26-32, 6p12-q15, 9p21-q32, and 11p14-11. We...
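
    The abstract does not say which statistic the over-representation analysis used. As a rough illustration only, the sketch below assumes the one-sided hypergeometric test that is commonly applied to such questions; the function name and the gene counts are hypothetical.

```python
from math import comb


def overrepresentation_p(total_genes, genes_in_region, selected, selected_in_region):
    """One-sided hypergeometric tail: probability of seeing at least
    `selected_in_region` region genes among `selected` genes drawn
    without replacement from `total_genes` genes."""
    denominator = comb(total_genes, selected)
    most_possible = min(genes_in_region, selected)
    numerator = sum(
        comb(genes_in_region, i)
        * comb(total_genes - genes_in_region, selected - i)
        for i in range(selected_in_region, most_possible + 1)
    )
    return numerator / denominator


# Hypothetical counts: 10 genes in total, 5 of them in the region of
# interest, 5 genes selected as survival-associated, all 5 in the region.
p_small = overrepresentation_p(10, 5, 5, 5)
p_trivial = overrepresentation_p(10, 5, 5, 0)
```

    A small tail probability means the region holds more survival-associated genes than chance alone would suggest; in practice such p-values would also need correction for the number of regions tested.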

  7. A tandem sequence motif acts as a distance-dependent enhancer in a set of genes involved in translation by binding the proteins NonO and SFPQ

    Directory of Open Access Journals (Sweden)

    Roepcke Stefan

    2011-12-01

    Background: Bioinformatic analyses of expression control sequences in promoters of co-expressed or functionally related genes enable the discovery of common regulatory sequence motifs that might be involved in co-ordinated gene expression. By studying promoter sequences of the human ribosomal protein genes we recently identified a novel highly specific Localized Tandem Sequence Motif (LTSM). In this work we sought to identify additional genes and LTSM-binding proteins to elucidate potential regulatory mechanisms. Results: Genome-wide analyses identified a considerable number of additional LTSM-positive genes, the products of which are involved in translation, among them translation initiation and elongation factors and 5S rRNA. Electromobility shift assays then showed specific signals demonstrating the binding of protein complexes to LTSM in ribosomal protein gene promoters. Pull-down assays with LTSM-containing oligonucleotides and subsequent mass spectrometric analysis identified the related multifunctional nucleotide binding proteins NonO and SFPQ in the binding complex. Functional characterization then revealed that LTSM enhances the transcriptional activity of the promoters depending on the distance from the transcription start site. Conclusions: Our data demonstrate the power of bioinformatic analyses for the identification of biologically relevant sequence motifs. LTSM and the LTSM-binding proteins NonO and SFPQ identified here were discovered through a synergistic combination of bioinformatic and biochemical methods and are regulators of the expression of a set of genes of the translational apparatus in a distance-dependent manner.

  8. The development of an efficient multipurpose bean pod mottle virus viral vector set for foreign gene expression and RNA silencing.

    Science.gov (United States)

    Zhang, Chunquan; Bradshaw, Jeffrey D; Whitham, Steven A; Hill, John H

    2010-05-01

    Plant viral vectors are valuable tools for heterologous gene expression, and because of virus-induced gene silencing (VIGS), they also have important applications as reverse genetics tools for gene function studies. Viral vectors are especially useful for plants such as soybean (Glycine max) that are recalcitrant to transformation. Previously, two generations of bean pod mottle virus (BPMV; genus Comovirus) vectors have been developed for overexpressing and silencing genes in soybean. However, the design of the previous vectors imposes constraints that limit their utility. For example, VIGS target sequences must be expressed as fusion proteins in the same reading frame as the viral polyprotein. This requirement limits the design of VIGS target sequences to open reading frames. Furthermore, expression of multiple genes or simultaneous silencing of one gene and expression of another was not possible. To overcome these and other issues, a new BPMV-based vector system was developed to facilitate a variety of applications for gene function studies in soybean as well as in common bean (Phaseolus vulgaris). These vectors are designed for simultaneous expression of multiple foreign genes, insertion of noncoding/antisense sequences, and simultaneous expression and silencing. The simultaneous expression of green fluorescent protein and silencing of phytoene desaturase shows that marker gene-assisted silencing is feasible. These results demonstrate the utility of this BPMV vector set for a wide range of applications in soybean and common bean, and they have implications for improvement of other plant virus-based vector systems.

  9. Dissection of the oncogenic MYCN transcriptional network reveals a large set of clinically relevant cell cycle genes as drivers of neuroblastoma tumorigenesis.

    Science.gov (United States)

    Murphy, Derek M; Buckley, Patrick G; Bryan, Kenneth; Watters, Karen M; Koster, Jan; van Sluis, Peter; Molenaar, Jan; Versteeg, Rogier; Stallings, Raymond L

    2011-06-01

    Amplification of the oncogenic transcription factor MYCN plays a major role in the pathogenesis of several pediatric cancers, including neuroblastoma, medulloblastoma, and rhabdomyosarcoma. For neuroblastoma, MYCN amplification is the most powerful genetic predictor of poor patient survival, yet the mechanism by which MYCN drives tumorigenesis is only partially understood. To gain an insight into the distribution of MYCN binding and to identify clinically relevant MYCN target genes, we performed an integrated analysis of MYCN ChIP-chip and mRNA expression using the MYCN repressible SHEP-21N neuroblastoma cell line. We hypothesized that genes exclusively MYCN bound in SHEP-21N cells over-expressing MYCN would be enriched for direct targets which contribute to the process of disease progression. Integrated analysis revealed that MYCN drives tumorigenesis predominantly as a positive regulator of target gene transcription. A high proportion of genes (24%) that are MYCN bound and up-regulated in the SHEP-21N model are significantly associated with poor overall patient survival (OS) in a set of 88 tumors. In contrast, the proportion of genes down-regulated when bound by MYCN in the SHEP-21N model and which are significantly associated with poor overall patient survival when under-expressed in primary tumors was significantly lower (5%). Gene ontology analysis determined a highly statistically significant enrichment for cell cycle related genes within the over-expressed MYCN target group which were also associated with poor OS. We conclude that the over-expression of MYCN leads to aberrant binding and over-expression of genes associated with cell cycle regulation which are significantly correlated with poor OS and MYCN amplification.

  10. Gene identification for risk of relapse in stage I lung adenocarcinoma patients: a combined methodology of gene expression profiling and computational gene network analysis.

    Science.gov (United States)

    Ludovini, Vienna; Bianconi, Fortunato; Siggillino, Annamaria; Piobbico, Danilo; Vannucci, Jacopo; Metro, Giulio; Chiari, Rita; Bellezza, Guido; Puma, Francesco; Della Fazia, Maria Agnese; Servillo, Giuseppe; Crinò, Lucio

    2016-05-24

    Risk assessment and treatment choice remain a challenge in early non-small-cell lung cancer (NSCLC). The aim of this study was to identify novel genes involved in the risk of early relapse (ER) compared to no relapse (NR) in resected lung adenocarcinoma (AD) patients using a combination of high throughput technology and computational analysis. We identified 18 patients (13 NR and 5 ER) with stage I AD. Frozen samples of patients in ER, NR and corresponding normal lung (NL) were subjected to microarray technology and quantitative PCR (Q-PCR). A gene network computational analysis was performed to select predictive genes. An independent set of 79 stage I AD samples was used to validate selected genes by Q-PCR. From microarray analysis we selected 50 genes, using the fold change ratio of ER versus NR. They were validated both in pool and individually in patient samples (ER and NR) by Q-PCR. Fourteen increased and 25 decreased genes showed concordance between the two methods. They were used to perform a computational gene network analysis that identified 4 increased (HOXA10, CLCA2, AKR1B10, FABP3) and 6 decreased (SCGB1A1, PGC, TFF1, PSCA, SPRR1B and PRSS1) genes. Moreover, in an independent dataset of AD samples, we showed that both high FABP3 expression and low SCGB1A1 expression were associated with a worse disease-free survival (DFS). Our results indicate that it is possible to define, through gene expression and computational analysis, a characteristic gene profile of patients with an increased risk of relapse that may become a tool for patient selection for adjuvant therapy.

  11. Canine Mammary Carcinomas: A Comparative Analysis of Altered Gene Expression

    Directory of Open Access Journals (Sweden)

    Farruk M. Lutful Kabir

    2015-12-01

    Breast cancer represents the second most frequent neoplasm in humans and sexually intact female dogs after lung and skin cancers, respectively. Many similar features in human and dog cancers, including spontaneous development, clinical presentation, tumor heterogeneity, disease progression and response to conventional therapies, have supported development of this comparative model as an alternative to mice. The highly conserved similarities between canine and human genomes are also key to this comparative analysis, especially when compared to the murine genome. Studies with canine mammary tumor (CMT) models have shown a strong genetic correlation with their human counterparts, particularly in terms of altered expression profiles of cell cycle regulatory genes, tumor suppressor and oncogenes and also a large group of non-coding RNAs or microRNAs (miRNAs). Because CMTs are considered predictive intermediate models for human breast cancer, similarities in genetic alterations and cancer predisposition between humans and dogs have raised further interest. Many cancer-associated genetic defects critical to mammary tumor development and oncogenic determinants of metastasis have been reported and appear to be similar in both species. Comparative analysis of deregulated gene sets or cancer signaling pathways has shown that a significant proportion of orthologous genes are comparably up- or down-regulated in both human and dog breast tumors. Particularly, a group of cell cycle regulators called cyclin-dependent kinase inhibitors (CKIs) acting as potent tumor suppressors are frequently defective in CMTs. Interestingly, comparative analysis of coding sequences has also shown that these genes are highly conserved in mammals in terms of their evolutionary divergence from a common ancestor. Moreover, co-deletion and/or homozygous loss of the INK4A/ARF/INK4B (CDKN2A/B) locus, encoding three members of the CKI tumor suppressor gene families (p16/INK4A, p14ARF and p15

  12. Analysis of tectonic settings of global superlarge porphyry copper deposits

    Institute of Scientific and Technical Information of China (English)

    XIA Bin (夏斌); CHEN Genwen (陈根文); WANG He (王核)

    2003-01-01

    About three quarters of superlarge porphyry copper deposits throughout the world occur along the eastern Pacific basin rim, most of which were formed during the Mesozoic-Cenozoic. Porphyry copper deposits often occur in the upper parts of a subduction zone and in a within-plate orogenic belt. Some porphyry copper deposits are inconsistent with plate subduction with respect to their formation time, and most of them in the world are associated with a tensional environment. Metallogenic porphyries originated from the mantle, and the involvement of lower-crust or oceanic-crust materials has played an important role. Based on the geochemical characteristics and tectonic settings of the ore-bearing porphyries in the Gandise and Yulong metallogenic zones, it is proposed that delamination may be an important mechanism for the formation of porphyry copper deposits.

  13. Selection and validation of a set of reliable reference genes for quantitative RT-PCR studies in the brain of the Cephalopod Mollusc Octopus vulgaris

    Directory of Open Access Journals (Sweden)

    Biffali Elio

    2009-07-01

    Background: Quantitative real-time polymerase chain reaction (RT-qPCR) is valuable for studying the molecular events underlying physiological and behavioral phenomena. Normalization of real-time PCR data is critical for a reliable mRNA quantification. Here we identify reference genes to be utilized in RT-qPCR experiments to normalize and monitor the expression of target genes in the brain of the cephalopod mollusc Octopus vulgaris, an invertebrate. Such an approach is novel for this taxon and of advantage in future experiments given the complexity of the behavioral repertoire of this species when compared with its relatively simple neural organization. Results: We chose 16S and 18S rRNA, actB, EEF1A, tubA and ubi as candidate reference genes (housekeeping genes, HKG). The expression of 16S and 18S was highly variable and did not meet the requirements of candidate HKG. The expression of the other genes was almost stable and uniform among samples. We analyzed the expression of HKG in two different sets of animals using tissues taken from the central nervous system (brain parts) and mantle (here considered as control tissue) by BestKeeper, geNorm and NormFinder. We found that HKG expressions differed considerably with respect to brain area and octopus samples in an HKG-specific manner. However, when the mantle is treated as control tissue and the entire central nervous system is considered, NormFinder revealed tubA and ubi as the most suitable HKG pair. These two genes were utilized to evaluate the relative expression of the genes FoxP, creb, dat and TH in O. vulgaris. Conclusion: We analyzed the expression profiles of some genes here identified for O. vulgaris by applying RT-qPCR analysis for the first time in cephalopods. We validated candidate reference genes and found the expression of ubi and tubA to be the most appropriate to evaluate the expression of target genes in the brain of different octopuses. Our results also underline the
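
    The abstract does not state how relative expression was computed from the two selected reference genes. The sketch below assumes the widely used 2^(-ddCt) calculation with 100% amplification efficiency, normalizing against the mean Ct of the reference pair (averaging Ct values is equivalent to taking the geometric mean of the reference quantities). The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_control, ct_refs_control):
    """Fold change of a target gene in a sample versus a control tissue,
    normalized to the mean Ct of several reference genes."""
    # Delta-Ct: target minus averaged references, per tissue.
    delta_sample = ct_target - mean(ct_refs)
    delta_control = ct_target_control - mean(ct_refs_control)
    # Delta-delta-Ct converted to a fold change, assuming the amount
    # of product doubles every cycle.
    return 2.0 ** -(delta_sample - delta_control)


# Invented Ct values: a target gene in a brain sample versus control
# tissue, each with two reference-gene measurements (e.g. tubA and ubi).
fold = relative_expression(
    ct_target=24.0, ct_refs=[18.0, 20.0],
    ct_target_control=26.0, ct_refs_control=[18.0, 20.0],
)
```

    With these numbers the target crosses threshold two cycles earlier in the sample than in the control after normalization, which corresponds to a four-fold higher relative expression.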

  14. Cyber-Bullying in School Settings: A Research Citation Analysis

    Science.gov (United States)

    Piotrowski, Chris

    2011-01-01

    Research on the topic of cyber-bullying has proliferated over the past decade, particularly on its impact on school-aged children. Thus, it would be of interest to examine the scope and extent of research interest in the topic in scholarly publications. This paper reports on a reference citation analysis of the database PsycINFO, using…

  15. Multigroup Confirmatory Factor Analysis: Locating the Invariant Referent Sets

    Science.gov (United States)

    French, Brian F.; Finch, W. Holmes

    2008-01-01

    Multigroup confirmatory factor analysis (MCFA) is a popular method for the examination of measurement invariance and specifically, factor invariance. Recent research has begun to focus on using MCFA to detect invariance for test items. MCFA requires certain parameters (e.g., factor loadings) to be constrained for model identification, which are…

  16. Canonical correlation analysis for gene-based pleiotropy discovery.

    Directory of Open Access Journals (Sweden)

    Jose A Seoane

    2014-10-01

    Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and in testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. In order to do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels.

  17. Comparative mitogenomic analyses of three scallops (Bivalvia: Pectinidae) reveal high level variation of genomic organization and a diversity of transfer RNA gene sets

    Directory of Open Access Journals (Sweden)

    Kong Xiaoyu

    2009-05-01

    Full Text Available Abstract Background It can be seen from the available mollusk mitogenomes that the family Pectinidae exhibits the most variation in genome organization. In this study, comparative mitogenomic analyses were performed for three scallops from the subfamily Chlamydinae (Pectinidae), with the goal of characterizing the degree of variability of mitogenome organization and other characteristics among species from the same subfamily and exploring their possible evolution route. Findings The complete or nearly complete mtDNA sequences of scallop Mimachlamys nobilis (17 935 bp), Mizuhopecten yessoensis (20 964 bp) and Chlamys farreri (17 035 bp) were determined using long PCR amplification and primer walking sequencing strategy. Highly variable size difference of the three genomes resulted primarily from length and number variations of non-coding regions, and the major difference in gene content of the three scallop species are due to varying tRNA gene sets. Only 21, 16, and 17 tRNA genes were detected in the mitogenomes of M. nobilis, M. yessoensis and C. farreri, respectively. Remarkably, no trnS gene could be identified in any of the three scallops. A newly-detected trnA-like sequence within the mitogenome of M. yessoensis seems to exemplify the functional loss of a tRNA gene, and the duplication of trnD in M. yessoensis raises a fundamental question of whether the retention of the tRNA gene copy of 2-tRNAs is easier than that of 4-tRNAs. Analysis of putative evolutionary pathways of gene rearrangement indicates that transposition of neighboring gene blocks may play an important role in the evolution of mitogenomes in scallops. Parsimonious analysis of the genomic variations implies that the mitogenomes of M. yessoensis and C. farreri are likely to derive independently from a common ancestor that was closely related to M. nobilis. Conclusion Comparative mitogenomic analyses among three species from the subfamily Chlamydinae show that the three genomes

  18. Application of Ordinal Set Pair Analysis in Annual Rainfall Prediction of Liao River Basin

    Institute of Scientific and Technical Information of China (English)

    2011-01-01

    [Objective] The research aimed to study the application of ordinal set pair analysis in the annual precipitation prediction of Liao River basin. [Method] The ordinal theory was introduced into the set pair analysis modeling, and the prediction model of set pair analysis was improved. A kind of rainfall prediction model based on the ordinal set pair analysis (OSPA) was put forward. The time sequence of annual rainfall in the hydrological rainfall station of Liao River basin during 1956-2006 was the research obje...

  19. GEGEINTOOL: A Computer-Based Tool for Automated Analysis of Gene-Gene Interactions in Large Epidemiological Studies in Cardiovascular Genomics

    Directory of Open Access Journals (Sweden)

    Oscar Coltell

    2013-06-01

    Full Text Available Current methods of data analysis of gene-gene interactions in complex diseases, after taking into account environmental factors using traditional approaches, are inefficient. High-throughput methods of analysis in large scale studies including thousands of subjects and hundreds of SNPs should be implemented. We developed an integrative computer tool, GEGEINTOOL (GEne- GEne INTeraction tOOL), for large-scale analysis of gene-gene interactions, in human studies of complex diseases including a large number of subjects, SNPs, as well as environmental factors. That resource uses standard statistical packages (SPSS, etc.) to build and fit the gene-gene interaction models by means of syntax scripts in predicting one or more continuous or dichotomic phenotypes. Codominant, dominant and recessive genetic interaction models including control for covariates are automatically created for each SNP in order to test the best model. From the standard outputs, GEGEINTOOL extracts a selected set of parameters (regression coefficients, p-values, adjusted means, etc.), and groups them in a single MS Excel Spreadsheet. The tool allows editing the set of filter parameters, filtering the selected results depending on p-values, as well as plotting the selected gene-gene interactions to check consistency. In conclusion, GEGEINTOOL is a useful and friendly tool for exploring and identifying gene-gene interactions in complex diseases.

  20. Analysis On Classification Techniques In Mammographic Mass Data Set

    OpenAIRE

    K.K.Kavitha; Dr.A.Kangaiammal

    2015-01-01

    Data mining, the extraction of hidden information from large databases, is used to predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data-mining classification techniques deal with determining the group to which each data instance belongs. They can deal with a wide variety of data, so that large amounts of data can be involved in processing. This paper deals with analysis on various data mining classification techniques such a...

  1. Analysis of the complement and molecular evolution of tRNA genes in cow

    Directory of Open Access Journals (Sweden)

    Barris Wesley C

    2009-04-01

    Full Text Available Abstract Background Detailed information regarding the number and organization of transfer RNA (tRNA) genes at the genome level is becoming readily available with the increase of DNA sequencing of whole genomes. However the identification of functional tRNA genes is challenging for species that have large numbers of repetitive elements containing tRNA derived sequences, such as Bos taurus. Reliable identification and annotation of entire sets of tRNA genes allows the evolution of tRNA genes to be understood on a genomic scale. Results In this study, we explored the B. taurus genome using bioinformatics and comparative genomics approaches to catalogue and analyze cow tRNA genes. The initial analysis of the cow genome using tRNAscan-SE identified 31,868 putative tRNA genes and 189,183 pseudogenes, where 28,830 of the 31,868 predicted tRNA genes were classified as repetitive elements by the RepeatMasker program. We then used comparative genomics to further discriminate between functional tRNA genes and tRNA-derived sequences for the remaining set of 3,038 putative tRNA genes. For our analysis, we used the human, chimpanzee, mouse, rat, horse, dog, chicken and fugu genomes to predict that the number of active tRNA genes in cow lies in the vicinity of 439. Of this set, 150 tRNA genes were 100% identical in their sequences across all nine vertebrate genomes studied. Using clustering analyses, we identified a new tRNA-GlyCCC subfamily present in all analyzed mammalian genomes. We suggest that this subfamily originated from an ancestral tRNA-GlyGCC gene via a point mutation prior to the radiation of the mammalian lineages. Lastly, in a separate analysis we created phylogenetic profiles for each putative cow tRNA gene using a representative set of genomes to gain an overview of common evolutionary histories of tRNA genes. Conclusion The use of a combination of bioinformatics and comparative genomics approaches has allowed the confident identification of a

  2. Health care priority setting in Norway a multicriteria decision analysis

    Directory of Open Access Journals (Sweden)

    Defechereux Thierry

    2012-02-01

    Full Text Available Abstract Background Priority setting in population health is increasingly based on explicitly formulated values. The Patients' Rights Act of the Norwegian tax-based health service guarantees all citizens health care in case of a severe illness, a proven health benefit, and proportionality between need and treatment. This study compares the values of the country's health policy makers with these three official principles. Methods In total 34 policy makers participated in a discrete choice experiment, weighting the relative value of six policy criteria. We used multivariate logistic regression with selection as the dependent variable to derive odds ratios for each criterion. Next, we constructed a composite league table - based on the sum score for the probability of selection - to rank potential interventions in five major disease areas. Results The group considered cost effectiveness, large individual benefits and severity of disease as the most important criteria in decision making. Priority interventions are those related to cardiovascular diseases and respiratory diseases. Interventions related to mental health rank as less attractive. Conclusions Norwegian policy makers' values are in agreement with principles formulated in national health laws. Multi-criteria decision approaches may provide a tool to support explicit allocation decisions.

  3. Validation of a set of reference genes to study response to herbicide stress in grasses

    OpenAIRE

    Petit Cécile; Pernin Fanny; Heydel Jean-Marie; Délye Christophe

    2012-01-01

    Abstract Background Non-target-site based resistance to herbicides is a major threat to the chemical control of agronomically noxious weeds. This adaptive trait is endowed by differences in the expression of a number of genes in plants that are resistant or sensitive to herbicides. Quantification of the expression of such genes requires normalising qPCR data using reference genes with stable expression in the system studied as internal standards. The aim of this study was to validate referenc...

  4. Cartilage-selective genes identified in genome-scale analysis of non-cartilage and cartilage gene expression

    Directory of Open Access Journals (Sweden)

    Cohn Zachary A

    2007-06-01

    Full Text Available Abstract Background Cartilage plays a fundamental role in the development of the human skeleton. Early in embryogenesis, mesenchymal cells condense and differentiate into chondrocytes to shape the early skeleton. Subsequently, the cartilage anlagen differentiate to form the growth plates, which are responsible for linear bone growth, and the articular chondrocytes, which facilitate joint function. However, despite the multiplicity of roles of cartilage during human fetal life, surprisingly little is known about its transcriptome. To address this, a whole genome microarray expression profile was generated using RNA isolated from 18–22 week human distal femur fetal cartilage and compared with a database of control normal human tissues aggregated at UCLA, termed Celsius. Results 161 cartilage-selective genes were identified, defined as genes significantly expressed in cartilage with low expression and little variation across a panel of 34 non-cartilage tissues. Among these 161 genes were cartilage-specific genes such as cartilage collagen genes and 25 genes which have been associated with skeletal phenotypes in humans and/or mice. Many of the other cartilage-selective genes do not have established roles in cartilage or are novel, unannotated genes. Quantitative RT-PCR confirmed the unique pattern of gene expression observed by microarray analysis. Conclusion Defining the gene expression pattern for cartilage has identified new genes that may contribute to human skeletogenesis as well as provided further candidate genes for skeletal dysplasias. The data suggest that fetal cartilage is a complex and transcriptionally active tissue and demonstrate that the set of genes selectively expressed in the tissue has been greatly underestimated.

  5. Analysis On Classification Techniques In Mammographic Mass Data Set

    Directory of Open Access Journals (Sweden)

    Mrs. K. K. Kavitha

    2015-07-01

    Full Text Available Data mining, the extraction of hidden information from large databases, is used to predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data-mining classification techniques deal with determining the group to which each data instance belongs. They can deal with a wide variety of data, so that large amounts of data can be involved in processing. This paper deals with analysis on various data mining classification techniques such as Decision Tree Induction, Naïve Bayes, and k-Nearest Neighbour (KNN) classifiers in a mammographic mass dataset.

  6. Analysis of Duplicate Genes in Soybean

    Institute of Scientific and Technical Information of China (English)

    C.M. Cai; K.J. Van; M.Y. Kim; S.H. Lee

    2007-01-01

    Gene duplication is a major determinant of the size and gene complement of eukaryotic genomes (Lockton and Gaut, 2005). There are a number of different ways in which duplicate genes can arise (Sankoff, 2001), but the most spectacular method of gene duplication may be whole genome duplication via polyploidization.

  7. Genome-wide analysis of homeobox genes from Mesobuthus martensii reveals Hox gene duplication in scorpions.

    Science.gov (United States)

    Di, Zhiyong; Yu, Yao; Wu, Yingliang; Hao, Pei; He, Yawen; Zhao, Huabin; Li, Yixue; Zhao, Guoping; Li, Xuan; Li, Wenxin; Cao, Zhijian

    2015-06-01

    Homeobox genes belong to a large gene group, which encodes the famous DNA-binding homeodomain that plays a key role in development and cellular differentiation during embryogenesis in animals. Here, one hundred forty-nine homeobox genes were identified from the Asian scorpion, Mesobuthus martensii (Chelicerata: Arachnida: Scorpiones: Buthidae) based on our newly assembled genome sequence with approximately 248 × coverage. The identified homeobox genes were categorized into eight classes including 82 families: 67 ANTP class genes, 33 PRD genes, 11 LIM genes, five POU genes, six SINE genes, 14 TALE genes, five CUT genes, two ZF genes and six unclassified genes. Transcriptome data confirmed that more than half of the genes were expressed in adults. The homeobox gene diversity of the eight classes is similar to the previously analyzed Mandibulata arthropods. Interestingly, it is hypothesized that the scorpion M. martensii may have two Hox clusters. The first complete genome-wide analysis of homeobox genes in Chelicerata not only reveals the repertoire of scorpion, arachnid and chelicerate homeobox genes, but also shows some insights into the evolution of arthropod homeobox genes.

  8. Identification of self-consistent modulons from bacterial microarray expression data with the help of structured regulon gene sets

    KAUST Repository

    Permina, Elizaveta A.

    2013-01-01

    Identification of bacterial modulons from series of gene expression measurements on microarrays is a principal problem, especially relevant for inadequately studied but practically important species. Usage of a priori information on regulatory interactions helps to evaluate parameters for regulatory subnetwork inference. We suggest a procedure for modulon construction where a seed regulon is iteratively updated with genes having expression patterns similar to those for regulon member genes. A set of genes essential for a regulon is used to control modulon updating. Essential genes for a regulon were selected as a subset of regulon genes highly related by different measures to each other. Using Escherichia coli as a model, we studied how modulon identification depends on the data, including the microarray experiments set, the adopted relevance measure and the regulon itself. We have found that results of modulon identification are highly dependent on all parameters studied and thus the resulting modulon varies substantially depending on the identification procedure. Yet, modulons that were identified correctly displayed higher stability during iterations, which allows developing a procedure for reliable modulon identification in the case of less studied species where the known regulatory interactions are sparse. Copyright © 2013 Taylor & Francis.
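
    The iterative updating described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the relevance measure (Pearson correlation with the mean profile of the current set), the threshold of 0.8, the stopping rule, and all function names and data are assumptions made here for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def mean_profile(genes, expr):
    """Average expression profile of a group of genes."""
    columns = zip(*(expr[g] for g in genes))
    return [sum(c) / len(c) for c in columns]


def grow_modulon(expr, seed, essential, threshold=0.8, max_iter=20):
    """Iteratively update a seed regulon (hypothetical update rule).

    expr: dict mapping gene -> expression values across experiments
    seed: initial regulon genes
    essential: genes that must remain in the set for an update to be accepted
    """
    current = set(seed)
    for _ in range(max_iter):
        centre = mean_profile(sorted(current), expr)
        updated = {g for g, prof in expr.items()
                   if pearson(prof, centre) >= threshold}
        if not set(essential) <= updated:
            break  # essential genes dropped out: keep the last accepted set
        if updated == current:
            break  # no change between iterations: converged
        current = updated
    return current


# Made-up expression data for five hypothetical genes.
example_expr = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [1.1, 2.0, 3.2, 3.9, 5.1],
    "d": [5, 4, 3, 2, 1],
    "e": [3, 1, 4, 1, 5],
}
print(sorted(grow_modulon(example_expr, seed={"a", "b"}, essential={"a"})))
```

    Using the essential genes as a guard means an update that loses the core of the regulon is rejected rather than followed, which is one simple way to read "a set of genes essential for a regulon is used to control modulon updating".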

  9. Basis set expansion for inverse problems in plasma diagnostic analysis

    Science.gov (United States)

    Jones, B.; Ruiz, C. L.

    2013-07-01

    A basis set expansion method [V. Dribinski, A. Ossadtchi, V. A. Mandelshtam, and H. Reisler, Rev. Sci. Instrum. 73, 2634 (2002), doi:10.1063/1.1482156] is applied to recover physical information about plasma radiation sources from instrument data, which has been forward transformed due to the nature of the measurement technique. This method provides a general approach for inverse problems, and we discuss two specific examples relevant to diagnosing fast z pinches on the 20-25 MA Z machine [M. E. Savage, L. F. Bennett, D. E. Bliss, W. T. Clark, R. S. Coats, J. M. Elizondo, K. R. LeChien, H. C. Harjes, J. M. Lehr, J. E. Maenchen, D. H. McDaniel, M. F. Pasik, T. D. Pointon, A. C. Owen, D. B. Seidel, D. L. Smith, B. S. Stoltzfus, K. W. Struve, W. A. Stygar, L. K. Warne, J. R. Woodworth, C. W. Mendel, K. R. Prestwich, R. W. Shoup, D. L. Johnson, J. P. Corley, K. C. Hodge, T. C. Wagoner, and P. E. Wakeland, in Proceedings of the Pulsed Power Plasma Sciences Conference (IEEE, 2007), p. 979]. First, Abel inversion of time-gated, self-emission x-ray images from a wire array implosion is studied. Second, we present an approach for unfolding neutron time-of-flight measurements from a deuterium gas puff z pinch to recover information about emission time history and energy distribution. Through these examples, we discuss how noise in the measured data limits the practical resolution of the inversion, and how the method handles discontinuities in the source function and artifacts in the projected image. We add to the method a propagation of errors calculation for estimating uncertainties in the inverted solution.

  10. Basis set expansion for inverse problems in plasma diagnostic analysis

    Energy Technology Data Exchange (ETDEWEB)

    Jones, B.; Ruiz, C. L. [Sandia National Laboratories, PO Box 5800, Albuquerque, New Mexico 87185 (United States)

    2013-07-15

    A basis set expansion method [V. Dribinski, A. Ossadtchi, V. A. Mandelshtam, and H. Reisler, Rev. Sci. Instrum. 73, 2634 (2002)] is applied to recover physical information about plasma radiation sources from instrument data, which has been forward transformed due to the nature of the measurement technique. This method provides a general approach for inverse problems, and we discuss two specific examples relevant to diagnosing fast z pinches on the 20–25 MA Z machine [M. E. Savage, L. F. Bennett, D. E. Bliss, W. T. Clark, R. S. Coats, J. M. Elizondo, K. R. LeChien, H. C. Harjes, J. M. Lehr, J. E. Maenchen, D. H. McDaniel, M. F. Pasik, T. D. Pointon, A. C. Owen, D. B. Seidel, D. L. Smith, B. S. Stoltzfus, K. W. Struve, W. A. Stygar, L. K. Warne, J. R. Woodworth, C. W. Mendel, K. R. Prestwich, R. W. Shoup, D. L. Johnson, J. P. Corley, K. C. Hodge, T. C. Wagoner, and P. E. Wakeland, in Proceedings of the Pulsed Power Plasma Sciences Conference (IEEE, 2007), p. 979]. First, Abel inversion of time-gated, self-emission x-ray images from a wire array implosion is studied. Second, we present an approach for unfolding neutron time-of-flight measurements from a deuterium gas puff z pinch to recover information about emission time history and energy distribution. Through these examples, we discuss how noise in the measured data limits the practical resolution of the inversion, and how the method handles discontinuities in the source function and artifacts in the projected image. We add to the method a propagation of errors calculation for estimating uncertainties in the inverted solution.

  11. Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions

    Directory of Open Access Journals (Sweden)

    Chanock Stephen J

    2008-03-01

    Full Text Available Abstract Background The risk of common diseases is likely determined by the complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). Traditional methods of data analysis are poorly suited for detecting complex interactions due to sparseness of data in high dimensions, which often occurs when data are available for a large number of SNPs for a relatively small number of samples. Validation of associations observed using multiple methods should be implemented to minimize likelihood of false-positive associations. Moreover, high-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time. Investigating associations for each individual SNP or interactions between SNPs using traditional approaches is inefficient and prone to false positives. Results We developed the Polymorphism Interaction Analysis tool (PIA version 2.0) to include different approaches for ranking and scoring SNP combinations, to account for imbalances between case and control ratios, stratify on particular factors, and examine associations of user-defined pathways (based on SNP or gene) with case status. PIA v. 2.0 detected 2-SNP interactions as the highest ranking model 77% of the time, using simulated data sets of genetic models of interaction (minor allele frequency = 0.2; heritability = 0.01; N = 1600) generated previously [Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH: A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 2007, 31:306–315.]. Interacting SNPs were detected in both balanced (20 SNPs) and imbalanced data (case:control 1:2 and 1:4, 10 SNPs) in the context of non-interacting SNPs. Conclusion PIA v. 2.0 is a useful tool for exploring gene*gene or gene*environment interactions and identifying a small number of putative associations which may be investigated further using other

  12. The Schizophrenia-Associated BRD1 Gene Regulates Behavior, Neurotransmission, and Expression of Schizophrenia Risk Enriched Gene Sets in Mice

    DEFF Research Database (Denmark)

    Qvist, Per; Christensen, Jane Hvarregaard; Vardya, Irina;

    2016-01-01

    BACKGROUND: The schizophrenia-associated BRD1 gene encodes a transcriptional regulator whose comprehensive chromatin interactome is enriched with schizophrenia risk genes. However, the biology underlying the disease association of BRD1 remains speculative. METHODS: This study assessed......-inhibition imbalances involving loss of parvalbumin immunoreactive interneurons. RNA-sequencing analyses of cortical and striatal micropunches from Brd1(+/-) and wild-type mice revealed differential expression of genes enriched for schizophrenia risk, including several schizophrenia genome-wide association study risk...... the transcriptional drive of a schizophrenia-associated BRD1 risk variant in vitro. Accordingly, to examine the effects of reduced Brd1 expression, we generated a genetically modified Brd1(+/-) mouse and subjected it to behavioral, electrophysiological, molecular, and integrative genomic analyses with focus...

  13. Different gene sets contribute to different symptom dimensions of depression and anxiety

    NARCIS (Netherlands)

    van Veen, Tineke; Goeman, Jelle J.; Monajemi, Ramin; Wardenaar, Klaas J.; Hartman, Catharina A.; Snieder, Harold; Nolte, Ilja M.; Penninx, Brenda W. J. H.; Zitman, Frans G.

    2012-01-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom-dimensions provide a more homogenous phenotype. Furthermore, as effects of individual gene

  14. Comprehensive screening for a complete set of Japanese-population-specific filaggrin gene mutations.

    Science.gov (United States)

    Kono, M; Nomura, T; Ohguchi, Y; Mizuno, O; Suzuki, S; Tsujiuchi, H; Hamajima, N; McLean, W H I; Shimizu, H; Akiyama, M

    2014-04-01

    Mutations in FLG coding profilaggrin cause ichthyosis vulgaris and are an important predisposing factor for atopic dermatitis. Until now, most case-control studies and population-based screenings have been performed only for prevalent mutations. In this study, we established a high-throughput FLG mutation detection system by real-time PCR with a set of two double-dye probes and conducted comprehensive screening for almost all of the Japanese-population-specific FLG mutations (ten FLG mutations). The present comprehensive screening for all ten FLG mutations provided a more precise prevalence rate for FLG mutations (11.1%, n = 820), which seemed high compared with data of previous reports based on screening for limited numbers of FLG mutations. Our comprehensive screening suggested that population-specific FLG mutations may be a significant predisposing factor for hay fever (odds ratio = 2.01 [95% CI: 1.027-3.936, P < 0.05]), although the sample sizes of this study were too small for reliable subphenotype analysis on the association between FLG mutations and hay fever in the eczema patients and the noneczema individuals, and it is not clear whether the association between FLG mutations and hay fever is due to the close association between FLG mutations and hay fever patients with eczema.
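
    An odds ratio with a 95% confidence interval, as quoted in this abstract, is commonly computed from a 2x2 table using the Wald interval on the log scale. The sketch below shows that standard calculation only; the abstract does not give the underlying counts or state which interval method was used, so the counts here are made up and the function name is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: carriers with the condition      b: carriers without it
    c: non-carriers with the condition  d: non-carriers without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, NOT taken from the study.
estimate, low, high = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {estimate:.2f}, 95% CI {low:.3f}-{high:.3f}")
```

    An interval whose lower bound sits only just above 1, as in the abstract, indicates borderline evidence, which is consistent with the authors' caution about sample size.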

  15. Setting up Multiplex Panels for Genetic Testing of Familial Hypertrophic Cardiomyopathy Based on Linkage Analysis

    Directory of Open Access Journals (Sweden)

    Hoorieh SAGHAFI

    2016-03-01

    Full Text Available Background: Familial hypertrophic cardiomyopathy (HCM) is caused by mutations in genes encoding cardiac sarcomere proteins. Nowadays genetic testing of HCM plays an important role in clinical practice by contributing to the diagnosis, prognosis, and screening of high-risk individuals. The aim of this study was developing a reliable testing strategy for HCM based on linkage analysis and appropriate for Iranian population. Methods: Six panels of four microsatellite markers surrounding MYH7, MYBPC3, TNNT2, TNNI3, TPM1, and MYL2 genes (24 markers in total) were selected for multiplex PCR and fragment length analysis. Characteristics of markers and informativeness of the panels were evaluated in 50 unrelated Iranians. The efficacy of the strategy was verified in a family with HCM. Results: All markers were highly polymorphic. The panels were informative in 96-100% of samples. Multipoint linkage analysis excluded the linkage between the disease and all six genes by obtaining maximum LOD score ≤-2. Conclusion: This study suggests a reliable genetic testing method based on linkage analysis between 6 sarcomere genes and familial HCM. It could be applied for diagnostic, predictive, or screening testing in clinical setting. Keywords: Cardiomyopathy, Hypertrophic, Genetic linkage, Diagnosis
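
    The exclusion criterion in this abstract (LOD score ≤ -2) can be illustrated with the simpler two-point LOD score for phase-known meioses. This is a simplified stand-in, not the multipoint analysis the study used; the function names and the recombinant counts are assumptions for illustration.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score at recombination fraction theta.

    Compares the likelihood of the observed recombinants under linkage
    (recombination fraction theta) with the likelihood under free
    recombination (theta = 0.5), on a log10 scale.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * log10(theta)
                  + non_recombinants * log10(1 - theta))
    log_unlinked = meioses * log10(0.5)
    return log_linked - log_unlinked


def is_excluded(recombinants, meioses, theta, cutoff=-2.0):
    """Conventional exclusion rule: linkage is rejected when LOD <= cutoff."""
    return lod_score(recombinants, meioses, theta) <= cutoff


# Made-up family data: 4 recombinants among 8 informative meioses.
print(round(lod_score(4, 8, theta=0.01), 2), is_excluded(4, 8, theta=0.01))
```

    Many recombinants between a marker and the disease at a small assumed recombination fraction drive the score strongly negative, which is what allows a gene to be excluded in a family.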

  16. A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments.

    Science.gov (United States)

    Broët, Philippe; Lewin, Alex; Richardson, Sylvia; Dalmasso, Cyril; Magdelenat, Henri

    2004-11-01

    Multiclass response (MCR) experiments are those in which there are more than two classes to be compared. In these experiments, though the null hypothesis is simple, there are typically many patterns of gene expression changes across the different classes that lead to complex alternatives. In this paper, we propose a new strategy for selecting genes in MCR that is based on a flexible mixture model for the marginal distribution of a modified F-statistic. Using this model, false positive and negative discovery rates can be estimated and combined to produce a rule for selecting a subset of genes. Moreover, the method proposed allows calculation of these rates for any predefined subset of genes. We illustrate the performance of our approach using simulated datasets and a real breast cancer microarray dataset. In this latter study, we investigate predefined subsets of genes and point out interesting differences between three distinct biological pathways. http://www.bgx.org.uk/software.html
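
    Once a mixture model has been fitted, error rates for any chosen subset of genes can be estimated from per-gene posterior probabilities. The sketch below uses a common posterior-averaging estimator and assumes the posterior probability of the null component is already available for each gene; the mixture fitting itself is not shown, the estimator is not claimed to be the authors' exact rule, and the gene names and probabilities are made up.

```python
def discovery_rates(posterior_null, selected):
    """Estimate false discovery and false non-discovery rates for a subset.

    posterior_null: dict mapping gene -> posterior probability of being null
    selected: genes declared differentially expressed
    Returns (fdr, fnr): the mean null probability among selected genes and
    the mean non-null probability among the genes left out.
    """
    selected = set(selected)
    inside = [p for g, p in posterior_null.items() if g in selected]
    outside = [1 - p for g, p in posterior_null.items() if g not in selected]
    fdr = sum(inside) / len(inside) if inside else 0.0
    fnr = sum(outside) / len(outside) if outside else 0.0
    return fdr, fnr


def select_by_cutoff(posterior_null, cutoff):
    """Select genes whose posterior null probability is at most cutoff."""
    return {g for g, p in posterior_null.items() if p <= cutoff}


# Made-up posterior probabilities for five hypothetical genes.
example_posterior = {"g1": 0.02, "g2": 0.10, "g3": 0.40, "g4": 0.90, "g5": 0.95}
chosen = select_by_cutoff(example_posterior, cutoff=0.2)
print(sorted(chosen), discovery_rates(example_posterior, chosen))
```

    Because the rates are averages over whichever genes are passed in, the same function applies to a predefined subset such as a biological pathway, which matches the abstract's point that the rates can be calculated for any predefined subset of genes.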

  17. RNA amplification for successful gene profiling analysis

    Directory of Open Access Journals (Sweden)

    Wang Ena

    2005-07-01

    Full Text Available Abstract The study of clinical samples is often limited by the amount of material available to study. While proteins cannot be multiplied in their natural form, DNA and RNA can be amplified from small specimens and used for high-throughput analyses. Therefore, genetic studies offer the best opportunity to screen for novel insights of human pathology when little material is available. Precise estimates of DNA copy numbers in a given specimen are necessary. However, most studies investigate static variables such as the genetic background of patients or mutations within pathological specimens without a need to assess proportionality of expression among different genes throughout the genome. Comparative genomic hybridization of DNA samples represents a crude exception to this rule since genomic amplification or deletion is compared among different specimens directly. For gene expression analysis, however, it is critical to accurately estimate the proportional expression of distinct RNA transcripts since such proportions directly govern cell function by modulating protein expression. Furthermore, comparative estimates of relative RNA expression at different time points portray the response of cells to environmental stimuli, indirectly informing about broader biological events affecting a particular tissue in physiological or pathological conditions. This cognitive reaction of cells is similar to the detection of electroencephalographic patterns which inform about the status of the brain in response to external stimuli. As our need to understand human pathophysiology at the global level increases, the development and refinement of technologies for high fidelity messenger RNA amplification have become the focus of increasing interest during the past decade. The need to increase the abundance of RNA has been met not only for gene specific amplification, but, most importantly for global transcriptome wide, unbiased amplification. Now gene

  18. Analysis of RNAseq datasets from a comparative infectious disease zebrafish model using GeneTiles bioinformatics.

    Science.gov (United States)

    Veneman, Wouter J; de Sonneville, Jan; van der Kolk, Kees-Jan; Ordas, Anita; Al-Ars, Zaid; Meijer, Annemarie H; Spaink, Herman P

    2015-03-01

    We present a RNA deep sequencing (RNAseq) analysis of a comparison of the transcriptome responses to infection of zebrafish larvae with Staphylococcus epidermidis and Mycobacterium marinum bacteria. We show how our developed GeneTiles software can improve RNAseq analysis approaches by more confidently identifying a large set of markers upon infection with these bacteria. For analysis of RNAseq data currently, software programs such as Bowtie2 and Samtools are indispensable. However, these programs that are designed for a LINUX environment require some dedicated programming skills and have no options for visualisation of the resulting mapped sequence reads. Especially with large data sets, this makes the analysis time consuming and difficult for non-expert users. We have applied the GeneTiles software to the analysis of previously published and newly obtained RNAseq datasets of our zebrafish infection model, and we have shown the applicability of this approach also to published RNAseq datasets of other organisms by comparing our data with a published mammalian infection study. In addition, we have implemented the DEXSeq module in the GeneTiles software to identify genes, such as glucagon A, that are differentially spliced under infection conditions. In the analysis of our RNAseq data, this has led to the possibility to improve the size of data sets that could be efficiently compared without using problem-dedicated programs, leading to a quick identification of marker sets. Therefore, this approach will also be highly useful for transcriptome analyses of other organisms for which well-characterised genomes are available.

  19. Rice fortification: a comparative analysis in mandated settings.

    Science.gov (United States)

    Forsman, Carmen; Milani, Peiman; Schondebare, Jill A; Matthias, Dipika; Guyondet, Christophe

    2014-09-01

    Legal mandates can play an important role in the success of rice fortification programs that involve the private sector. However, merely enacting mandatory legislation does not guarantee success; it requires a coordinated, multidimensional cross-sector effort that addresses stewardship, develops an appropriate rice fortification technology, enables sustainable production and distribution channels through a range of private-sector players, ensures quality, generates consumer demand, and monitors progress. Furthermore, economic sustainability must be built into the supply chain and distribution network to enable the program to outlast government administrations and/or time-limited funding. Hence, mandates can serve as valuable long-term enablers of cross-sector mobilization and collaboration and as catalysts of civil society engagement in and ownership of fortification programs. This paper compares the rice fortification experiences of Costa Rica and the Philippines--two countries with mandates, yet distinctly different industry landscapes. Costa Rica has achieved national success through strong government stewardship and active market development--key elements of success regardless of industry structure. With a comparatively more diffuse rice industry structure, the Philippines has also had success in limited geographies where key stakeholders have played an active role in market development. A comparative analysis provides lessons that may be relevant to other rice fortification programs.

  20. The map-1 gene family in root-knot nematodes, Meloidogyne spp.: a set of taxonomically restricted genes specific to clonal species.

    Directory of Open Access Journals (Sweden)

    Iva Tomalova

    Full Text Available Taxonomically restricted genes (TRGs), i.e., genes that are restricted to a limited subset of phylogenetically related organisms, may be important in adaptation. In parasitic organisms, TRG-encoded proteins are possible determinants of the specificity of host-parasite interactions. In the root-knot nematode (RKN) Meloidogyne incognita, the map-1 gene family encodes expansin-like proteins that are secreted into plant tissues during parasitism, thought to act as effectors to promote successful root infection. MAP-1 proteins exhibit a modular architecture, with variable number and arrangement of 58 and 13-aa domains in their central part. Here, we address the evolutionary origins of this gene family using a combination of bioinformatics and molecular biology approaches. Map-1 genes were solely identified in one single member of the phylum Nematoda, i.e., the genus Meloidogyne, and not detected in any other nematode, thus indicating that the map-1 gene family is indeed a TRG family. A phylogenetic analysis of the distribution of map-1 genes in RKNs further showed that these genes are specifically present in species that reproduce by mitotic parthenogenesis, with the exception of M. floridensis, and could not be detected in RKNs reproducing by either meiotic parthenogenesis or amphimixis. These results highlight the divergence between mitotic and meiotic RKN species as a critical transition in the evolutionary history of these parasites. Analysis of the sequence conservation and organization of repeated domains in map-1 genes suggests that gene duplication(s) together with domain loss/duplication have contributed to the evolution of the map-1 family, and that some strong selection mechanism may be acting upon these genes to maintain their functional role(s) in the specificity of the plant-RKN interactions.

  1. Homogeneity analysis with k sets of variables: An alternating least squares method with optimal scaling features

    NARCIS (Netherlands)

    van der Burg, Eeke; de Leeuw, Jan; Verdegaal, R.

    1986-01-01

    Homogeneity analysis, or multiple correspondence analysis, is usually applied to k separate variables. In this paper, it is applied to sets of variables by using sums within sets. The resulting technique is referred to as OVERALS. It uses the notion of optimal scaling, with transformations that can

  2. Gene expression profile analysis of type 2 diabetic mouse liver.

    Directory of Open Access Journals (Sweden)

    Fang Zhang

    Full Text Available Liver plays a key role in glucose metabolism and homeostasis, and impaired hepatic glucose metabolism contributes to the development of type 2 diabetes. However, the precise gene expression profile of diabetic liver and its association with diabetes and related diseases are yet to be further elucidated. In this study, we detected the gene expression profile by high-throughput sequencing in 9-week-old normal and type 2 diabetic db/db mouse liver. Totally 12132 genes were detected, and 2627 genes were significantly changed in diabetic mouse liver. Biological process analysis showed that the upregulated genes in diabetic mouse liver were mainly enriched in metabolic processes. Surprisingly, the downregulated genes in diabetic mouse liver were mainly enriched in immune-related processes, although all the altered genes were still mainly enriched in metabolic processes. Similarly, KEGG pathway analysis showed that metabolic pathways were the major pathways altered in diabetic mouse liver, and downregulated genes were enriched in immune and cancer pathways. Analysis of the key enzyme genes in fatty acid and glucose metabolism showed that some key enzyme genes were significantly increased and none of the detected key enzyme genes were decreased. In addition, FunDo analysis showed that liver cancer and hepatitis were most likely to be associated with diabetes. Taken together, this study provides the digital gene expression profile of diabetic mouse liver, and demonstrates the main diabetes-associated hepatic biological processes, pathways, key enzyme genes in fatty acid and glucose metabolism and potential hepatic diseases.

  3. Identification of candidate genes for prostate cancer-risk SNPs utilizing a normal prostate tissue eQTL data set

    Science.gov (United States)

    Thibodeau, S. N.; French, A. J.; McDonnell, S. K.; Cheville, J.; Middha, S.; Tillmans, L.; Riska, S.; Baheti, S.; Larson, M. C.; Fogarty, Z.; Zhang, Y.; Larson, N.; Nair, A.; O'Brien, D.; Wang, L.; Schaid, D J.

    2015-01-01

    Multiple studies have identified loci associated with the risk of developing prostate cancer but the associated genes are not well studied. Here we create a normal prostate tissue-specific eQTL data set and apply this data set to previously identified prostate cancer (PrCa)-risk SNPs in an effort to identify candidate target genes. The eQTL data set is constructed by the genotyping and RNA sequencing of 471 samples. We focus on 146 PrCa-risk SNPs, including all SNPs in linkage disequilibrium with each risk SNP, resulting in 100 unique risk intervals. We analyse cis-acting associations where the transcript is located within 2 Mb (±1 Mb) of the risk SNP interval. Of all SNP–gene combinations tested, 41.7% of SNPs demonstrate a significant eQTL signal after adjustment for sample histology and 14 expression principal component covariates. Of the 100 PrCa-risk intervals, 51 have a significant eQTL signal and these are associated with 88 genes. This study provides a rich resource to study biological mechanisms underlying genetic risk to PrCa. PMID:26611117
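The cis-window rule in the abstract above (transcripts within ±1 Mb of a risk SNP interval) can be illustrated with a minimal Python sketch. The function name, the coordinate convention, and the choice to count any overlap with the flanked window (rather than full containment) are assumptions made for illustration; the abstract does not specify them, and the sketch assumes both features lie on the same chromosome.

```python
def in_cis_window(interval_start, interval_end,
                  transcript_start, transcript_end,
                  flank=1_000_000):
    """Return True if the transcript overlaps the risk interval
    extended by `flank` base pairs on each side.

    Assumption: coordinates are base-pair positions on one chromosome,
    and partial overlap with the window is enough to qualify.
    """
    window_start = max(0, interval_start - flank)
    window_end = interval_end + flank
    return transcript_start <= window_end and transcript_end >= window_start
```

With a hypothetical risk interval at 5.0 to 5.1 Mb, a transcript at 5.90 to 5.95 Mb falls inside the window, whereas one starting at 6.2 Mb does not.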

  4. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data.

    Science.gov (United States)

    Ganesh Kumar, Pugalendhi; Kavitha, Muthu Subash; Ahn, Byeong-Cheol

    2016-01-01

    This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. The simple interpretable rules with most relevant genes and effectively classified

  5. Candidate genes for chronic obstructive pulmonary disease in two large data sets

    DEFF Research Database (Denmark)

    Bakke, P S; Zhu, G; Gulsvik, A;

    2011-01-01

    to these phenotypes in this first study were tested in a second, family based, study that included 635 pedigrees with 1910 individuals. Significant associations to the binary COPD phenotype in both populations were seen for STAT1 (rs13010343) and NFKBIB/SIRT2 (rs2241704) (p... of the GC gene were significantly associated with FEV1 in percent predicted and FEV1/FVC, respectively in both populations (pSIRT2, and GC genes in two independent populations, the associations of the former two genes...

  6. Implementation of BacMam virus gene delivery technology in a drug discovery setting.

    Science.gov (United States)

    Kost, Thomas A; Condreay, J Patrick; Ames, Robert S; Rees, Stephen; Romanos, Michael A

    2007-05-01

    Membrane protein targets constitute a key segment of drug discovery portfolios and significant effort has gone into increasing the speed and efficiency of pursuing these targets. However, issues still exist in routine gene expression and stable cell-based assay development for membrane proteins, which are often multimeric or toxic to host cells. To enhance cell-based assay capabilities, modified baculovirus (BacMam virus) gene delivery technology has been successfully applied to the transient expression of target proteins in mammalian cells. Here, we review the development, full implementation and benefits of this platform-based gene expression technology in support of SAR and HTS assays across GlaxoSmithKline.

  7. A network-based gene-weighting approach for pathway analysis

    Institute of Scientific and Technical Information of China (English)

    Zhaoyuan Fang; Weidong Tian; Hongbin Ji

    2012-01-01

    Classical algorithms aiming at identifying biological pathways significantly related to studying conditions frequently reduced pathways to gene sets, with an obvious ignorance of the constitutive non-equivalence of various genes within a defined pathway. We here designed a network-based method to determine such non-equivalence in terms of gene weights. The gene weights determined are biologically consistent and robust to network perturbations. By integrating the gene weights into the classical gene set analysis, with a subsequent correction for the "over-counting" bias associated with multi-subunit proteins, we have developed a novel gene-weighted pathway analysis approach, as implemented in an R package called "Gene Association Network-based Pathway Analysis" (GANPA). Through analysis of several microarray datasets, including the p53 dataset, asthma dataset and three breast cancer datasets, we demonstrated that our approach is biologically reliable and reproducible, and therefore helpful for microarray data interpretation and hypothesis generation.

  8. Pathway-based analysis of GWAs data identifies association of sex determination genes with susceptibility to testicular germ cell tumors.

    Science.gov (United States)

    Koster, Roelof; Mitra, Nandita; D'Andrea, Kurt; Vardhanabhuti, Saran; Chung, Charles C; Wang, Zhaoming; Loren Erickson, R; Vaughn, David J; Litchfield, Kevin; Rahman, Nazneen; Greene, Mark H; McGlynn, Katherine A; Turnbull, Clare; Chanock, Stephen J; Nathanson, Katherine L; Kanetsky, Peter A

    2014-11-15

    Genome-wide association (GWA) studies of testicular germ cell tumor (TGCT) have identified 18 susceptibility loci, some containing genes encoding proteins important in male germ cell development. Deletions of one of these genes, DMRT1, lead to male-to-female sex reversal and are associated with development of gonadoblastoma. To further explore genetic association with TGCT, we undertook a pathway-based analysis of SNP marker associations in the Penn GWAs (349 TGCT cases and 919 controls). We analyzed a custom-built sex determination gene set consisting of 32 genes using three different methods of pathway-based analysis. The sex determination gene set ranked highly compared with canonical gene sets, and it was associated with TGCT (FDRG = 2.28 × 10^-5, FDRM = 0.014 and FDRI = 0.008 for Gene Set Analysis-SNP (GSA-SNP), Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) analysis, respectively). The association remained after removal of DMRT1 from the gene set (FDRG = 0.0002, FDRM = 0.055 and FDRI = 0.009). Using data from the NCI GWA scan (582 TGCT cases and 1056 controls) and UK scan (986 TGCT cases and 4946 controls), we replicated these findings (NCI: FDRG = 0.006, FDRM = 0.014, FDRI = 0.033, and UK: FDRG = 1.04 × 10^-6, FDRM = 0.016, FDRI = 0.025). After removal of DMRT1 from the gene set, the sex determination gene set remains associated with TGCT in the NCI (FDRG = 0.039, FDRM = 0.050 and FDRI = 0.055) and UK scans (FDRG = 3.00 × 10^-5, FDRM = 0.056 and FDRI = 0.044). With the exception of DMRT1, genes in the sex determination gene set have not previously been identified as TGCT susceptibility loci in these GWA scans, demonstrating the complementary nature of a pathway-based approach for genome-wide analysis of TGCT.

  9. Mutation analysis of the preproghrelin gene

    DEFF Research Database (Denmark)

    Larsen, Lesli H; Gjesing, Anette P; Sørensen, Thorkild I A;

    2005-01-01

    To investigate the preproghrelin gene for variants and their association with obesity and type 2 diabetes.

  10. Candidate genes for chronic obstructive pulmonary disease in two large data sets

    DEFF Research Database (Denmark)

    Bakke, P S; Zhu, G; Gulsvik, A

    2011-01-01

    Lack of reproducibility of findings has been a criticism of genetic association studies in complex diseases like chronic obstructive pulmonary disease (COPD). We selected 257 polymorphisms of 16 genes with reported or potential relationships to COPD and genotyped these variants in a case-control s...... of the GC gene were significantly associated with FEV1 in percent predicted and FEV1/FVC, respectively in both populations (pgenes in two independent populations, the associations of the former two genes...

  11. Genome-wide survey and developmental expression mapping of zebrafish SET domain-containing genes

    National Research Council Canada - National Science Library

    Sun, Xiao-Jian; Xu, Peng-Fei; Zhou, Ting; Hu, Ming; Fu, Chun-Tang; Zhang, Yong; Jin, Yi; Chen, Yi; Chen, Sai-Juan; Huang, Qiu-Hua; Liu, Ting Xi; Chen, Zhu

    2008-01-01

    .... Since some of these genes have been revealed to be essential for embryonic development, we propose that the zebrafish, a vertebrate model organism possessing many advantages for developmental studies...

  12. Genome-wide analysis of the WRKY gene family in physic nut (Jatropha curcas L.).

    Science.gov (United States)

    Xiong, Wangdan; Xu, Xueqin; Zhang, Lin; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Jiang, Huawu; Wu, Guojiang

    2013-07-25

    The WRKY proteins, which contain highly conserved WRKYGQK amino acid sequences and zinc-finger-like motifs, constitute a large family of transcription factors in plants. They participate in diverse physiological and developmental processes. WRKY genes have been identified and characterized in a number of plant species. We identified a total of 58 WRKY genes (JcWRKY) in the genome of the physic nut (Jatropha curcas L.). On the basis of their conserved WRKY domain sequences, all of the JcWRKY proteins could be assigned to one of the previously defined groups, I-III. Phylogenetic analysis of JcWRKY genes with Arabidopsis and rice WRKY genes, and separately with castor bean WRKY genes, revealed no evidence of recent gene duplication in the JcWRKY gene family. Transcript abundance of JcWRKY gene products was tested in different tissues under normal growth conditions. In addition, 47 WRKY genes responded to at least one abiotic stress (drought, salinity, phosphate starvation and nitrogen starvation) in individual tissues (leaf, root and/or shoot cortex). Our study provides a useful reference data set as the basis for cloning and functional analysis of physic nut WRKY genes.

  13. Application of Multi-SOM clustering approach to macrophage gene expression analysis.

    Science.gov (United States)

    Ghouila, Amel; Yahia, Sadok Ben; Malouche, Dhafer; Jmel, Haifa; Laouini, Dhafer; Guerfali, Fatma Z; Abdelhak, Sonia

    2009-05-01

    The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and to organize them efficiently. The clustering techniques are largely recognized as useful exploratory tools for gene expression data analysis. Genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes that belong to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still show limits, particularly for the estimation of the number of clusters and the interpretation of hierarchical dendrogram, which may significantly influence the outputs of the analysis process. We propose here a multi level SOM based clustering algorithm named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of the estimation of clusters number. To test the validity of the proposed clustering algorithm, we first tested it on supervised training data sets. Results were evaluated by computing the number of misclassified samples. We have then used Multi-SOM for the analysis of macrophage gene expression data generated in vitro from the same individual blood infected with 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.

  14. Genomewide identification, classification and analysis of NAC type gene family in maize

    Indian Academy of Sciences (India)

    Xiaojian Peng; Yang Zhao; Xiaoming Li; Min Wu; Wenbo Chai; Lei Sheng; Yu Wang; Qing Dong; Haiyang Jiang; Beijiu Cheng

    2015-09-01

    NAC transcription factors comprise a large plant-specific gene family. Increasing evidence suggests that members of this family have diverse functions in plant growth and development. In this study, we performed a genomewide survey of NAC type genes in maize (Zea mays L.). A complete set of 148 nonredundant NAC genes (ZmNAC1–ZmNAC148) were identified in the maize genome using Blast search tools, and divided into 12 groups (a–l) based on phylogeny. Chromosomal location of these genes revealed that they are distributed unevenly across all 10 chromosomes. Segmental and tandem duplication contributed largely to the expansion of the maize NAC gene family. The Ka/Ks ratio suggested that the duplicated genes of the maize NAC family mainly experienced purifying selection, with limited functional divergence after duplication events. Microarray analysis indicated most of the maize NAC genes were expressed across different developmental stages. Moreover, 19 maize NAC genes grouped with published stress-responsive genes from other plants were found to contain putative stress-responsive cis-elements in their promoter regions. All these stress-responsive genes belonged to the group d (stress-related). Further, these genes showed differential expression patterns over time in response to drought treatments by quantitative real-time PCR analysis. Our results reveal a comprehensive overview of the maize NAC, and form the foundation for future functional research to uncover their roles in maize growth and development.

  15. Intertwining threshold settings, biological data and database knowledge to optimize the selection of differentially expressed genes from microarray

    OpenAIRE

    Paul Chuchana; Philippe Holzmuller; Frederic Vezilier; David Berthier; Isabelle Chantal; Dany Severac; Jean Loup Lemesre; Gerard Cuny; Philippe Nirdé; Bruno Bucheton

    2010-01-01

    BACKGROUND: Many tools used to analyze microarrays in different conditions have been described. However, the integration of deregulated genes within coherent metabolic pathways is lacking. Currently no objective selection criterion based on biological functions exists to determine a threshold demonstrating that a gene is indeed differentially expressed. METHODOLOGY/PRINCIPAL FINDINGS: To improve transcriptomic analysis of microarrays, we propose a new statistical appro...

  16. Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes

    Directory of Open Access Journals (Sweden)

    Eils Roland

    2005-11-01

    Full Text Available Abstract Background The extensive use of DNA microarray technology in the characterization of the cell transcriptome is leading to an ever increasing amount of microarray data from cancer studies. Although similar questions for the same type of cancer are addressed in these different studies, a comparative analysis of their results is hampered by the use of heterogeneous microarray platforms and analysis methods. Results In contrast to a meta-analysis approach where results of different studies are combined on an interpretative level, we investigate here how to directly integrate raw microarray data from different studies for the purpose of supervised classification analysis. We use median rank scores and quantile discretization to derive numerically comparable measures of gene expression from different platforms. These transformed data are then used for training of classifiers based on support vector machines. We apply this approach to six publicly available cancer microarray gene expression data sets, which consist of three pairs of studies, each examining the same type of cancer, i.e. breast cancer, prostate cancer or acute myeloid leukemia. For each pair, one study was performed by means of cDNA microarrays and the other by means of oligonucleotide microarrays. In each pair, high classification accuracies (> 85%) were achieved with training and testing on data instances randomly chosen from both data sets in a cross-validation analysis. To exemplify the potential of this cross-platform classification analysis, we use two leukemia microarray data sets to show that important genes with regard to the biology of leukemia are selected in an integrated analysis, which are missed in either single-set analysis. Conclusion Cross-platform classification of multiple cancer microarray data sets yields discriminative gene expression signatures that are found and validated on a large number of microarray samples, generated by different laboratories and
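As a rough illustration of the quantile discretization mentioned in the abstract above, the following Python sketch replaces each expression value in a sample by the index of the rank-based bin it falls into, so that samples measured on different scales become numerically comparable. The function name, the number of bins, and the tie handling are assumptions for illustration; the abstract does not state the authors' exact procedure.

```python
def quantile_discretize(values, n_bins=4):
    """Map each value of one sample to a bin index in 0..n_bins-1.

    Genes are ranked within the sample and the ranks are split into
    n_bins groups of (nearly) equal size. Ties are broken by input
    order, which is a simplifying assumption of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


# Two hypothetical samples on very different measurement scales
# (e.g. intensities versus ratios) yield the same discrete profile.
sample_a = [10.0, 200.0, 35.0, 80.0]
sample_b = [0.1, 5.2, 0.9, 2.0]
codes_a = quantile_discretize(sample_a, n_bins=2)
codes_b = quantile_discretize(sample_b, n_bins=2)
```

Because only within-sample ranks are used, the transformed values no longer depend on the platform's measurement scale, which is what makes pooled classifier training feasible.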

  17. RISK DISCLOSURE ANALYSIS IN THE CORPORATE GOVERNANCE ANNUAL REPORT USING FUZZY-SET QUALITATIVE COMPARATIVE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Pedro Carmona

    2016-05-01

    Full Text Available This paper explores the necessary and sufficient conditions of good Corporate Governance practices for high risk disclosure by firms in their Corporate Governance Annual Report. Additionally, we explore whether those recipes have changed during the financial crisis. With a sample of 271 Spanish listed companies, we applied fuzzy-set qualitative comparative analysis to a database of financial and non-financial data. We report that Board of Directors independence, size, level of activity and gender diversity, CEO duality, Audit Committee independence, being audited by the Big Four auditing firms and the presence of institutional investors are associated with high risk disclosure. The conditions included in almost every combination are the presence of institutional investors and being audited by the Big Four. We found similar combinations for 2006 and 2012, while the analysis for 2009 showed the lowest number of causal configurations.

  18. Comprehensive analysis of plant rapid alkalization factor (RALF) genes.

    Science.gov (United States)

    Sharma, Arti; Hussain, Adil; Mun, Bong-Gyu; Imran, Qari Muhammad; Falak, Noreen; Lee, Sang-Uk; Kim, Jae Young; Hong, Jeum Kyu; Loake, Gary John; Ali, Asad; Yun, Byung-Wook

    2016-09-01

    Receptor mediated signal carriers play a critical role in the regulation of plant defense and development. Rapid alkalization factor (RALF) proteins potentially comprise important signaling components which may have a key role in plant biology. The RALF gene family contains a large number of genes in several plant species; however, only a few RALF genes have been characterized to date. In this study, an extensive database search identified 39, 43, 34 and 18 RALF genes in Arabidopsis, rice, maize and soybean, respectively. These RALF genes were found to be highly conserved across the 4 plant species. A comprehensive analysis including the chromosomal location, gene structure, subcellular location, conserved motifs, protein structure, protein-ligand interaction and promoter analysis was performed. RALF genes from four plant species were divided into 7 groups based on phylogenetic analysis. In silico expression analysis of these genes, using microarray and EST data, revealed that these genes exhibit a variety of expression patterns. Furthermore, RALF genes showed distinct expression patterns of transcript accumulation in vivo following nitrosative and oxidative stresses in Arabidopsis. Predicted interaction between RALF and heme ligand also showed that RALF proteins may contribute towards transporting or scavenging oxygen moieties. This suggests a possible role for RALF genes during changes in cellular redox status. Collectively, our data provide a valuable resource to prime future research in the role of RALF genes in plant growth and development.

  19. A combined strategy of "in silico" transcriptome analysis and web search engine optimization allows an agile identification of reference genes suitable for normalization in gene expression studies.

    Science.gov (United States)

    Faccioli, Primetta; Ciceri, Gian Paolo; Provero, Paolo; Stanca, Antonio Michele; Morcia, Caterina; Terzi, Valeria

    2007-03-01

    Traditionally housekeeping genes have been employed as endogenous reference (internal control) genes for normalization in gene expression studies. Since the utilization of single housekeepers cannot assure an unbiased result, new normalization methods involving multiple housekeeping genes and normalizing using their mean expression have been recently proposed. Moreover, since a gold standard gene suitable for every experimental condition does not exist, it is also necessary to validate the expression stability of every putative control gene on the specific requirements of the planned experiment. As a consequence, finding a good set of reference genes is for sure a non-trivial problem requiring quite a lot of lab-based experimental testing. In this work we identified novel candidate barley reference genes suitable for normalization in gene expression studies. An advanced web search approach aimed to collect, from publicly available web resources, the most interesting information regarding the expression profiling of candidate housekeepers on a specific experimental basis has been set up and applied, as an example, on stress conditions. A complementary lab-based analysis has been carried out to verify the expression profile of the selected genes in different tissues and during heat shock response. This combined dry/wet approach can be applied to any species and physiological condition of interest and can be considered very helpful to identify putative reference genes to be shortlisted every time a new experimental design has to be set up.
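The multi-gene normalization idea in the abstract above can be sketched in a few lines of Python. The sketch divides a target gene's expression by the geometric mean of several reference genes; the use of a geometric rather than arithmetic mean is an assumption here (the abstract only says "mean expression"), and the function name and values are hypothetical.

```python
from statistics import geometric_mean


def normalize_to_references(target_expression, reference_expressions):
    """Return target expression relative to the geometric mean of the
    reference (housekeeping) genes measured in the same sample.

    Assumption: all inputs are positive, linear-scale expression values.
    """
    factor = geometric_mean(reference_expressions)
    return target_expression / factor


# Hypothetical sample: one target gene and two reference genes.
relative = normalize_to_references(8.0, [2.0, 8.0])
```

Averaging over several references dampens the effect of any single housekeeping gene that turns out to vary under the experimental condition, which is the motivation the abstract gives for moving away from single-gene normalization.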

  20. Slide Set: Reproducible image analysis and batch processing with ImageJ.

    Science.gov (United States)

    Nanes, Benjamin A

    2015-11-01

    Most imaging studies in the biological sciences rely on analyses that are relatively simple. However, manual repetition of analysis tasks across multiple regions in many images can complicate even the simplest analysis, making record keeping difficult, increasing the potential for error, and limiting reproducibility. While fully automated solutions are necessary for very large data sets, they are sometimes impractical for the small- and medium-sized data sets common in biology. Here we present the Slide Set plugin for ImageJ, which provides a framework for reproducible image analysis and batch processing. Slide Set organizes data into tables, associating image files with regions of interest and other relevant information. Analysis commands are automatically repeated over each image in the data set, and multiple commands can be chained together for more complex analysis tasks. All analysis parameters are saved, ensuring transparency and reproducibility. Slide Set includes a variety of built-in analysis commands and can be easily extended to automate other ImageJ plugins, reducing the manual repetition of image analysis without the set-up effort or programming expertise required for a fully automated solution.

  1. Random matrix analysis of localization properties of gene coexpression network.

    Science.gov (United States)

    Jalan, Sarika; Solymosi, Norbert; Vattay, Gábor; Li, Baowen

    2010-04-01

    We analyze gene coexpression network under the random matrix theory framework. The nearest-neighbor spacing distribution of the adjacency matrix of this network follows Gaussian orthogonal statistics of random matrix theory (RMT). Spectral rigidity test follows random matrix prediction for a certain range and deviates afterwards. Eigenvector analysis of the network using inverse participation ratio suggests that the statistics of bulk of the eigenvalues of network is consistent with those of the real symmetric random matrix, whereas few eigenvalues are localized. Based on these IPR calculations, we can divide eigenvalues in three sets: (a) The nondegenerate part that follows RMT. (b) The nondegenerate part, at both ends and at intermediate eigenvalues, which deviates from RMT and expected to contain information about important nodes in the network. (c) The degenerate part with zero eigenvalue, which fluctuates around RMT-predicted value. We identify nodes corresponding to the dominant modes of the corresponding eigenvectors and analyze their structural properties.
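For readers unfamiliar with the inverse participation ratio (IPR) used in the abstract above, a minimal Python sketch follows. It uses the standard definition, the sum of fourth powers of the components of a normalized eigenvector; the function name is illustrative and the sketch does not reproduce the authors' code.

```python
def inverse_participation_ratio(vector):
    """IPR of a vector: sum(x^4) / (sum(x^2))^2.

    Dividing by the squared norm makes the result independent of the
    vector's scaling. The value is 1/N for a vector spread evenly over
    N nodes and 1 for a vector concentrated on a single node.
    """
    norm_sq = sum(x * x for x in vector)
    if norm_sq == 0:
        raise ValueError("IPR is undefined for the zero vector")
    return sum(x ** 4 for x in vector) / norm_sq ** 2


ipr_delocalized = inverse_participation_ratio([0.5, 0.5, 0.5, 0.5])
ipr_localized = inverse_participation_ratio([1.0, 0.0, 0.0, 0.0])
```

Eigenvectors with a large IPR are dominated by a few nodes, which is why the localized eigenvalues are the ones expected to point to important genes in the network.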

  2. A two-sample test for high-dimensional data with applications to gene-set testing

    CERN Document Server

    Chen, Song Xi (doi:10.1214/09-AOS716)

    2010-01-01

    We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical $T^2$ test does not work for this "large $p$, small $n$" situation. The proposed test does not require explicit conditions in the relationship between the data dimension and sample size. This offers much flexibility in analyzing high-dimensional data. An application of the proposed test is in testing significance for sets of genes which we demonstrate in an empirical study on a leukemia data set.
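
    One reason Hotelling's $T^2$ breaks down in the "large $p$, small $n$" setting is that the pooled sample covariance matrix becomes singular, so the inverse the statistic needs does not exist. The sketch below only demonstrates that rank deficiency on simulated data; the sample sizes and names are assumptions, and it does not implement the test proposed in the abstract.

```python
import numpy as np


def pooled_covariance(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pooled sample covariance of two samples (rows = observations)."""
    n1, n2 = x.shape[0], y.shape[0]
    sx = np.cov(x, rowvar=False)
    sy = np.cov(y, rowvar=False)
    return ((n1 - 1) * sx + (n2 - 1) * sy) / (n1 + n2 - 2)


rng = np.random.default_rng(1)
n1, n2, p = 10, 12, 50          # far more variables (genes) than samples
x = rng.normal(size=(n1, p))
y = rng.normal(size=(n2, p))

s_pooled = pooled_covariance(x, y)
rank = np.linalg.matrix_rank(s_pooled)
# The rank cannot exceed n1 + n2 - 2, which is below p here, so the
# p x p matrix is singular and cannot be inverted for Hotelling's T^2.
is_singular = rank < p
```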

  3. Integrated Weighted Gene Co-expression Network Analysis with an Application to Chronic Fatigue Syndrome

    Directory of Open Access Journals (Sweden)

    Rajeevan Mangalathu S

    2008-11-01

    Background: Systems biologic approaches such as Weighted Gene Co-expression Network Analysis (WGCNA) can effectively integrate gene expression and trait data to identify pathways and candidate biomarkers. Here we show that the additional inclusion of genetic marker data allows one to characterize network relationships as causal or reactive in a chronic fatigue syndrome (CFS) data set. Results: We combine WGCNA with genetic marker data to identify a disease-related pathway and its causal drivers, an analysis which we refer to as "Integrated WGCNA" or IWGCNA. Specifically, we present the following IWGCNA approach: (1) construct a co-expression network, (2) identify trait-related modules within the network, (3) use a trait-related genetic marker to prioritize genes within the module, (4) apply an integrated gene screening strategy to identify candidate genes and (5) carry out causality testing to verify and/or prioritize results. By applying this strategy to a CFS data set consisting of microarray, SNP and clinical trait data, we identify a module of 299 highly correlated genes that is associated with CFS severity. Our integrated gene screening strategy results in 20 candidate genes. We show that our approach yields biologically interesting genes that function in the same pathway and are causal drivers for their parent module. We use a separate data set to replicate findings and use Ingenuity Pathways Analysis software to functionally annotate the candidate gene pathways. Conclusion: We show how WGCNA can be combined with genetic marker data to identify disease-related pathways and the causal drivers within them. The systems genetics approach described here can easily be used to generate testable genetic hypotheses in other complex disease studies.

  4. HoxBlinc RNA recruits Set1/MLL complexes to activate Hox gene expression patterns and mesoderm lineage development

    Science.gov (United States)

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Nao; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2015-01-01

    Summary Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1+ mesoderm and then promotes hematopoietic differentiation through regulating hoxb gene pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated KD or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb gene expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1+ precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1+ precursors and differentiation of Flk1+ cells into hematopoietic lineages. PMID:26725110

  5. A set of vectors for introduction of antibiotic resistance genes by in vitro Cre-mediated recombination

    Directory of Open Access Journals (Sweden)

    Vassetzky Yegor S

    2008-12-01

    Background: Introduction of new antibiotic resistance genes into plasmids of interest is a frequent task in molecular cloning practice. Classical approaches involving digestion with restriction endonucleases and ligation are time-consuming. Findings: We have created a set of insertion vectors (pINS) carrying genes that provide resistance to various antibiotics (puromycin, blasticidin and G418) and containing a loxP site. Each vector (pINS-Puro, pINS-Blast or pINS-Neo) contains either a chloramphenicol or a kanamycin resistance gene and is unable to replicate in most E. coli strains as it contains a conditional R6Kγ replication origin. Introduction of the antibiotic resistance genes into the vector of interest is achieved by Cre-mediated recombination between the replication-incompetent pINS and a replication-competent target vector. The recombination mix is then transformed into E. coli and selected by the resistance marker (kanamycin or chloramphenicol) present in pINS, which allows recovery of the recombinant plasmids with 100% efficiency. Conclusion: Here we propose a simple strategy that allows the introduction of various antibiotic-resistance genes into any plasmid containing a replication origin, an ampicillin resistance gene and a loxP site.

  7. A Comprehensive Gene Expression Meta-analysis Identifies Novel Immune Signatures in Rheumatoid Arthritis Patients

    Science.gov (United States)

    Afroz, Sumbul; Giddaluru, Jeevan; Vishwakarma, Sandeep; Naz, Saima; Khan, Aleem Ahmed; Khan, Nooruddin

    2017-01-01

    Rheumatoid arthritis (RA), a symmetric polyarticular arthritis, has long been feared as one of the most disabling forms of arthritis. Identification of gene signatures associated with RA onset and progression would lead toward development of novel diagnostics and therapeutic interventions. This study was undertaken to identify unique gene signatures of RA patients through large-scale meta-profiling of a diverse collection of gene expression data sets. We carried out a meta-analysis of 8 publicly available RA patients’ (107 RA patients and 76 healthy controls) gene expression data sets and further validated a few meta-signatures in RA patients through quantitative real-time PCR (RT-qPCR). We identified a robust meta-profile comprising 33 differentially expressed genes, which were consistently and significantly expressed across all the data sets. Our meta-analysis unearthed upregulation of a few novel gene signatures including PLCG2, HLA-DOB, HLA-F, EIF4E2, and CYFIP2, which were validated in peripheral blood mononuclear cell samples of RA patients. Further, functional and pathway enrichment analysis reveals perturbation of several meta-genes involved in signaling pathways pertaining to inflammation, antigen presentation, hypoxia, and apoptosis during RA. Additionally, PLCG2 (phospholipase Cγ2) popped out as a novel meta-gene involved in most of the pathways relevant to RA including inflammasome activation, platelet aggregation, and activation, thereby suggesting PLCG2 as a potential therapeutic target for controlling excessive inflammation during RA. In conclusion, these findings highlight the utility of meta-analysis approach in identifying novel gene signatures that might provide mechanistic insights into disease onset, progression and possibly lead toward the development of better diagnostic and therapeutic interventions against RA. PMID:28210261

  8. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    Science.gov (United States)

    2014-11-11

    information gleaned from these transposon studies was used to inform our next set of designs by predicting genes switching from N to E or I as paralogous ...remaining in RGD and homologs found in other organisms. A BLASTp score of 1e-5 was used as the similarity cutoff. Functional classifications... homologs to RGD in that organism. Inside the dashed circle is for prokaryotes and archaea. Those outside are for eukaryotes.

  9. Analysis of gene expression using gene sets discriminates cancer patients with and without late radiation toxicity

    NARCIS (Netherlands)

    J.P. Svensson; L.J.A. Stalpers; R.E.E. Esveldt-van Lange; N.A.P. Franken; J. Haveman; B. Klein; I. Turesson; H. Vrieling; M. Giphart-Gassler

    2006-01-01

    Background Radiation is an effective anti-cancer therapy but leads to severe late radiation toxicity in 5%-10% of patients. Assuming that genetic susceptibility impacts this risk, we hypothesized that the cellular response of normal tissue to X-rays could discriminate patients with and without late

  10. EXP-PAC: providing comparative analysis and storage of next generation gene expression data.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Lefèvre, Christophe

    2012-07-01

    Microarrays and, more recently, RNA sequencing have led to an increase in available gene expression data. How to manage and store these data is becoming a key issue. In response we have developed EXP-PAC, a web-based software package for storage, management and analysis of gene expression and sequence data. Unique to this package are SQL-based querying of gene expression data sets, distributed normalization of raw gene expression data and analysis of gene expression data across experiments and species. This package has been populated with lactation data in the International Milk Genomics Consortium web portal (http://milkgenomics.org/). Source code is also available and can be hosted on a Windows, Linux or Mac Apache server connected to a private or public network (http://mamsap.it.deakin.edu.au/~pcc/Release/EXP_PAC.html).

  11. Global analysis of gene expression in pulmonary fibrosis reveals distinct programs regulating lung inflammation and fibrosis

    Science.gov (United States)

    Kaminski, Naftali; Allard, John D.; Pittet, Jean F.; Zuo, Fengrong; Griffiths, Mark J. D.; Morris, David; Huang, Xiaozhu; Sheppard, Dean; Heller, Renu A.

    2000-02-01

    The molecular mechanisms of pulmonary fibrosis are poorly understood. We have used oligonucleotide arrays to analyze the gene expression programs that underlie pulmonary fibrosis in response to bleomycin, a drug that causes lung inflammation and fibrosis, in two strains of susceptible mice (129 and C57BL/6). We then compared the gene expression patterns in these mice with 129 mice carrying a null mutation in the epithelial-restricted integrin β6 subunit (β6−/−), which develop inflammation but are protected from pulmonary fibrosis. Cluster analysis identified two distinct groups of genes involved in the inflammatory and fibrotic responses. Analysis of gene expression at multiple time points after bleomycin administration revealed sequential induction of subsets of genes that characterize each response. The availability of this comprehensive data set should accelerate the development of more effective strategies for intervention at the various stages in the development of fibrotic diseases of the lungs and other organs.

  12. Integrated analysis of gene expression by association rules discovery

    Directory of Open Access Journals (Sweden)

    Carazo Jose M

    2006-02-01

    Background: Microarray technology is generating huge amounts of data about the expression level of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge, and to fully understand such datasets, it is essential to include external biological information about genes and gene products in the analysis of expression data. However, most of the current approaches to analyzing microarray datasets focus mainly on the analysis of experimental data, and external biological information is incorporated as a posterior process. Results: In this study we present a method for the integrative analysis of microarray data based on the Association Rules Discovery data mining technique. The approach integrates gene annotations and expression data to discover intrinsic associations among both data sources based on co-occurrence patterns. We applied the proposed methodology to the analysis of gene expression datasets in which genes were annotated with metabolic pathways, transcriptional regulators and Gene Ontology categories. Automatically extracted associations revealed significant relationships among these gene attributes and expression patterns, many of which are clearly supported by recently reported work. Conclusion: The integration of external biological information and gene expression data can provide insights about the biological processes associated with gene expression programs. In this paper we show that the proposed methodology is able to integrate multiple gene annotations and expression data in the same analytic framework and extract meaningful associations among heterogeneous sources of data. An implementation of the method is included in the Engene software package.
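
    Association rules over co-occurrence patterns are usually scored by support and confidence. The sketch below shows those two measures on a toy table in which each gene is a set of annotation and expression items. The item labels and the data are invented for illustration, and the code is not taken from the Engene package.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of genes whose item set contains every item in `items`."""
    wanted = frozenset(items)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate P(consequent | antecedent) from co-occurrence counts."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / base


# Hypothetical genes: one annotation item plus one expression item each.
genes = [
    frozenset({"pathway:glycolysis", "t1:up"}),
    frozenset({"pathway:glycolysis", "t1:up"}),
    frozenset({"pathway:glycolysis", "t1:down"}),
    frozenset({"pathway:ribosome", "t1:down"}),
]

rule_support = support(genes, {"pathway:glycolysis", "t1:up"})
rule_confidence = confidence(genes, {"pathway:glycolysis"}, {"t1:up"})
```

    A rule such as "glycolysis genes are up-regulated at t1" would be kept only if both values exceed thresholds chosen by the analyst.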

  13. Phylogenetic analysis of cubilin (CUBN) gene.

    Science.gov (United States)

    Shaik, Abjal Pasha; Alsaeed, Abbas H; Kiranmayee, S; Bammidi, Vk; Sultana, Asma

    2013-01-01

    Cubilin (CUBN; also known as intrinsic factor-cobalamin receptor [Homo sapiens Entrez Pubmed ref NM_001081.3; NG_008967.1; GI: 119606627]), located in the epithelium of the intestine and kidney, acts as a receptor for intrinsic factor - vitamin B12 complexes. Mutations in CUBN may play a role in autosomal recessive megaloblastic anemia. The current study investigated the possible role of CUBN in evolution using phylogenetic testing. A total of 588 BLAST hits were found for the cubilin query sequence and these hits showed a putative conserved domain, the CUB superfamily (as of 27 November 2012). A first-pass phylogenetic tree was constructed to identify the taxa which most often contained the CUBN sequences. Following this, we narrowed down the search by manually deleting sequences which were not CUBN. A repeat phylogenetic analysis of 25 taxa was performed using the PhyML, RAxML and TreeDyn software to confirm that CUBN is a conserved protein, emphasizing its importance as an extracellular domain present in proteins mostly known to be involved in development in many chordate taxa but not found in prokaryotes, plants and yeast. No horizontal gene transfers have been found between different taxa.

  14. How Many Genes Are Needed for a Discriminant Microarray Data Analysis?

    CERN Document Server

    Li, W; Li, Wentian; Yang, Yaning

    2001-01-01

    The analysis of the leukemia data from Whitehead/MIT group is a discriminant analysis (also called a supervised learning). Among thousands of genes whose expression levels are measured, not all are needed for discriminant analysis: a gene may either not contribute to the separation of two types of tissues/cancers, or it may be redundant because it is highly correlated with other genes. There are two theoretical frameworks in which variable selection (or gene selection in our case) can be addressed. The first is model selection, and the second is model averaging. We have carried out model selection using Akaike information criterion and Bayesian information criterion with logistic regression (discrimination, prediction, or classification) to determine the number of genes that provide the best model. These model selection criteria set upper limits of 22-25 and 12-13 genes for this data set with 38 samples, and the best model consists of only one (no.4847, zyxin) or two genes. We have also carried out model aver...
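
    The two criteria named in this abstract have standard closed forms, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters, n the number of samples and L the maximised likelihood. The sketch below applies them to candidate models of increasing size. The log-likelihood values are invented for illustration and are not results from the study; only the sample size of 38 comes from the abstract.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical maximised log-likelihoods of logistic regressions,
# keyed by the number of genes in the model (made-up numbers).
candidates = {1: -12.0, 2: -10.5, 3: -10.2}
n_samples = 38

# Each model has one coefficient per gene plus an intercept.
best_by_aic = min(candidates, key=lambda g: aic(candidates[g], g + 1))
best_by_bic = min(candidates, key=lambda g: bic(candidates[g], g + 1, n_samples))
```

    With 38 samples, BIC charges ln 38 (about 3.6) per parameter against 2 for AIC, so it tends to prefer smaller gene sets.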

  15. Gene expression profile analysis of human intervertebral disc degeneration

    OpenAIRE

    Kai Chen; Dajiang Wu; Xiaodong Zhu; Haijian Ni; Xianzhao Wei; Ningfang Mao; Yang Xie; Yunfei Niu; Ming Li

    2013-01-01

    In this study, we used microarray analysis to investigate the biogenesis and progression of intervertebral disc degeneration. The gene expression profiles of 37 disc tissue samples obtained from patients with herniated discs and degenerative disc disease collected by the National Cancer Institute Cooperative Tissue Network were analyzed. Differentially expressed genes between more and less degenerated discs were identified by significant analysis of microarray. A total of 555 genes were signi...

  16. Genetic investigation of 100 heart genes in sudden unexplained death victims in a forensic setting

    DEFF Research Database (Denmark)

    Christiansen, Sofie Lindgren; Hertz, Christin Løth; Ferrero, Laura

    2016-01-01

    indicate that broad genetic investigation of SUD victims increases the diagnostic outcome, and the investigation should comprise genes involved in both cardiomyopathies and cardiac channelopathies. European Journal of Human Genetics advance online publication, 21 September 2016; doi:10.1038/ejhg.2016.118....

  17. Using RNAi in C. elegans to Demonstrate Gene Knockdown Phenotypes in the Undergraduate Biology Lab Setting

    Science.gov (United States)

    Roy, Nicole M.

    2013-01-01

    RNA interference (RNAi) is a powerful technology used to knock down genes in basic research and medicine. In 2006 RNAi technology using Caenorhabditis elegans (C. elegans) was awarded the Nobel Prize in medicine and thus students graduating in the biological sciences should have experience with this technology. However,…

  18. Integrative meta-analysis of differential gene expression in acute myeloid leukemia.

    Directory of Open Access Journals (Sweden)

    Brady G Miller

    BACKGROUND: Acute myeloid leukemia (AML) is a heterogeneous disease with an overall poor prognosis. Gene expression profiling studies of patients with AML have provided key insights into disease pathogenesis while exposing potential diagnostic and prognostic markers and therapeutic targets. A systematic comparison of the large body of gene expression profiling studies in AML has the potential to test the extensibility of conclusions based on single studies and provide further insights into AML. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we systematically compared 25 published reports of gene expression profiling in AML. There were a total of 4,918 reported genes, of which one third were reported in more than one study. We found that only a minority of reported prognostically-associated genes (9.6%) were replicated in at least one other study. In a combined analysis, we comprehensively identified both gene sets and functional gene categories and pathways that exhibited significant differential regulation in distinct prognostic categories, including many previously unreported associations. CONCLUSIONS/SIGNIFICANCE: We developed a novel approach for granular, cross-study analysis of gene-by-gene data and their relationships with established prognostic features and patient outcome. We identified many robust novel prognostic molecular features in AML that were undetected in prior studies, and which provide insights into AML pathogenesis with potential diagnostic, prognostic, and therapeutic implications. Our database and integrative analysis are available online (http://gat.stamlab.org).

  19. Three Approaches to Data Analysis Test Theory, Rough Sets and Logical Analysis of Data

    CERN Document Server

    Chikalov, Igor; Lozina, Irina; Moshkov, Mikhail; Nguyen, Hung Son; Skowron, Andrzej; Zielosko, Beata

    2013-01-01

    In this book, the following three approaches to data analysis are presented:
    - Test Theory, founded by Sergei V. Yablonskii (1924-1998); the first publications appeared in 1955 and 1958,
    - Rough Sets, founded by Zdzisław I. Pawlak (1926-2006); the first publications appeared in 1981 and 1982,
    - Logical Analysis of Data, founded by Peter L. Hammer (1936-2006); the first publications appeared in 1986 and 1988.
    These three approaches have much in common, but researchers active in one of these areas often have a limited knowledge about the results and methods developed in the other two. On the other hand, each of the approaches shows some originality and we believe that the exchange of knowledge can stimulate further development of each of them. This can lead to new theoretical results and real-life applications and, in particular, new results based on combination of these three data analysis approaches can be expected.

  20. Turkish Special Education Teachers' Implementation of Functional Analysis in Classroom Settings

    Science.gov (United States)

    Erbas, Dilek; Yucesoy, Serife; Turan, Yasemin; Ostrosky, Michaelene M.

    2006-01-01

    Three Turkish special education teachers conducted a functional analysis to identify variables that might initiate or maintain the problem behaviors of three children with developmental disabilities. The analysis procedures were conducted in natural classroom settings. In Phase 1, following initial training in functional analysis procedures, the…

  1. Analysis of bacterial xylose isomerase gene diversity using gene-targeted metagenomics.

    Science.gov (United States)

    Nurdiani, Dini; Ito, Michihiro; Maruyama, Toru; Terahara, Takeshi; Mori, Tetsushi; Ugawa, Shin; Takeyama, Haruko

    2015-08-01

    Bacterial xylose isomerases (XI) are promising resources for efficient biofuel production from xylose in lignocellulosic biomass. Here, we investigated xylose isomerase gene (xylA) diversity in three soil metagenomes differing in plant vegetation and geographical location, using an amplicon pyrosequencing approach and two newly-designed primer sets. A total of 158,555 reads from three metagenomic DNA replicates for each soil sample were classified into 1127 phylotypes, detected in triplicate and defined by 90% amino acid identity. The phylotype coverage was estimated to be within the range of 84.0-92.7%. The xylA gene phylotypes obtained were phylogenetically distributed across the two known xylA groups. They shared 49-100% identities with their closest-related XI sequences in GenBank. Phylotypes demonstrating analysis, suggesting soil-specific xylA genotypes and taxonomic compositions. The differences among xylA members and their compositions in the soil were strongly correlated with 16S rRNA variation between soil samples, also assessed by amplicon pyrosequencing. This is the first report of xylA diversity in environmental samples assessed by amplicon pyrosequencing. Our data provide information regarding xylA diversity in nature, and can be a basis for the screening of novel xylA genotypes for practical applications.

  2. A Hybrid SOM-SVM Approach for the Zebrafish Gene Expression Analysis

    Institute of Scientific and Technical Information of China (English)

    Wei Wu; Xin Liu; Min Xu; Jin-Rong Peng; Rudy Setiono

    2005-01-01

    Microarray technology can be employed to quantitatively measure the expression of thousands of genes in a single experiment. It has become one of the main tools for global gene expression analysis in molecular biology research in recent years. The large amount of expression data generated by this technology makes the study of certain complex biological problems possible, and machine learning methods are expected to play a crucial role in the analysis process. In this paper, we present our results from integrating the self-organizing map (SOM) and the support vector machine (SVM) for the analysis of the various functions of zebrafish genes based on their expression. The most distinctive characteristic of our zebrafish gene expression data is that the number of samples of different classes is imbalanced. We discuss how SOM can be used as a data-filtering tool to improve the classification performance of the SVM on this data set.

  3. A resampling-based meta-analysis for detection of differential gene expression in breast cancer

    Directory of Open Access Journals (Sweden)

    Ergul Gulusan

    2008-12-01

    Background: Accuracy in the diagnosis of breast cancer and classification of cancer subtypes has improved over the years with the development of well-established immunohistopathological criteria. More recently, diagnostic gene-sets at the mRNA expression level have been tested as better predictors of disease state. However, breast cancer is heterogeneous in nature; thus extraction of differentially expressed gene-sets that stably distinguish normal tissue from various pathologies poses challenges. Meta-analysis of high-throughput expression data using a collection of statistical methodologies leads to the identification of robust tumor gene expression signatures. Methods: A resampling-based meta-analysis strategy, which involves the use of resampling and application of distribution statistics in combination to assess the degree of significance in differential expression between sample classes, was developed. Two independent microarray datasets that contain normal breast, invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC) samples were used for the meta-analysis. Expression of the genes, selected from the gene list for classification of normal breast samples and breast tumors encompassing both the ILC and IDC subtypes, was tested on 10 independent primary IDC samples and matched non-tumor controls by real-time qRT-PCR. Other existing breast cancer microarray datasets were used in support of the resampling-based meta-analysis. Results: The two independent microarray studies were found to be comparable, although differing in their experimental methodologies (Pearson correlation coefficient, R = 0.9389 and R = 0.8465 for ductal and lobular samples, respectively). The resampling-based meta-analysis has led to the identification of a highly stable set of genes for classification of normal breast samples and breast tumors encompassing both the ILC and IDC subtypes. The expression results of the selected genes obtained through real...
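
    The general idea of using resampling to judge differential expression between two sample classes can be sketched with a plain permutation test on one gene. This is a generic illustration under assumed names and made-up values; it is not the specific resampling strategy developed in the study.

```python
import numpy as np


def permutation_pvalue(a, b, n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in group means.

    Class labels are shuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one
    (with the usual +1 correction so it is never exactly zero).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical log-expression values of one gene in tumor and normal samples.
tumor = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4]
normal = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
p_value = permutation_pvalue(tumor, normal)
```

    In a meta-analysis setting, such per-gene p-values would be computed in each dataset and then combined; how they are combined is a separate design choice not shown here.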

  4. Bioinformatics analysis of estrogen-responsive genes

    Science.gov (United States)

    Handel, Adam E.

    2016-01-01

    Estrogen is a steroid hormone that plays critical roles in a myriad of intracellular pathways. The expression of many genes is regulated through the steroid hormone receptors ESR1 and ESR2. These bind to DNA and modulate the expression of target genes. Identification of estrogen target genes is greatly facilitated by the use of transcriptomic methods, such as RNA-seq and expression microarrays, and chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq). Combining transcriptomic and ChIP-seq data enables a distinction to be drawn between direct and indirect estrogen target genes. This chapter will discuss some methods of identifying estrogen target genes that do not require any expertise in programming languages or complex bioinformatics. PMID:26585125

  5. Effective Boolean dynamics analysis to identify functionally important genes in large-scale signaling networks.

    Science.gov (United States)

    Trinh, Hung-Cuong; Kwon, Yung-Keun

    2015-11-01

    Efficiently identifying functionally important genes in order to understand the minimal requirements of normal cellular development is challenging. To this end, a variety of structural measures have been proposed and their effectiveness has been investigated in recent literature; however, few studies have shown the effectiveness of dynamics-based measures. This led us to investigate a dynamic measure for identifying functionally important genes, whose effectiveness was verified through application to two large-scale human signaling networks. We specifically consider Boolean sensitivity-based dynamics against an update-rule perturbation (BSU) as a dynamic measure. Through investigations on two large-scale human signaling networks, we found that genes with relatively high BSU values show slower evolutionary rates and higher proportions of essential genes and drug targets than other genes. Gene-ontology analysis showed clear differences between the former and latter groups of genes. Furthermore, we compare the identification accuracies of essential genes and drug targets via BSU and five well-known structural measures. Although BSU did not always show the best performance, it effectively identified a putative set of genes that is significantly different from the results obtained via the structural measures. Most interestingly, BSU showed the highest synergy effect in identifying the functionally important genes in conjunction with other measures. Our results imply that Boolean-sensitive dynamics can be used as a measure to effectively identify functionally important genes in signaling networks.
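
    A toy version of a sensitivity measure against an update-rule perturbation is sketched below. It assumes synchronous updating, models the perturbation as negating one gene's update rule, and scores a gene by the fraction of initial states whose attractor changes. These choices, and all names, are assumptions made for illustration; the exact BSU definition used in the study may differ.

```python
from itertools import product
from typing import Callable, FrozenSet, Sequence, Tuple

State = Tuple[bool, ...]
Rule = Callable[[State], bool]


def step(state: State, rules: Sequence[Rule]) -> State:
    """Synchronous update: every gene applies its rule to the current state."""
    return tuple(rule(state) for rule in rules)


def attractor(state: State, rules: Sequence[Rule]) -> FrozenSet[State]:
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return frozenset(seen[seen.index(state):])


def rule_perturbation_sensitivity(rules: Sequence[Rule], gene: int) -> float:
    """Fraction of initial states whose attractor changes when the update
    rule of `gene` is replaced by its negation (assumed perturbation)."""
    original_rule = rules[gene]
    perturbed = list(rules)
    perturbed[gene] = lambda s: not original_rule(s)
    states = list(product([False, True], repeat=len(rules)))
    changed = sum(attractor(s, rules) != attractor(s, perturbed) for s in states)
    return changed / len(states)


# Two-gene toy network in which each gene copies the other.
toy_rules = [lambda s: s[1], lambda s: s[0]]
toy_sensitivity = rule_perturbation_sensitivity(toy_rules, gene=0)
```

    Exhaustive enumeration of initial states is only feasible for small networks; for large signaling networks the states would have to be sampled.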

  6. Transcriptomic analysis of tomato carpel development reveals alterations in ethylene and gibberellin synthesis during pat3/pat4 parthenocarpic fruit set

    Directory of Open Access Journals (Sweden)

    Pascual Laura

    2009-05-01

    Background: Tomato fruit set is a key process that has a great economic impact on crop production. We employed the Affymetrix GeneChip Tomato Genome Array to compare the transcriptome of a non-parthenocarpic line, UC82, with that of the parthenocarpic line RP75/59 (pat3/pat4 mutant). We analyzed the transcriptome under normal conditions as well as with forced parthenocarpic development in RP75/59, emasculating the flowers 2 days before anthesis. This analysis helps to understand fruit set in tomato. Results: Differentially expressed genes were extracted with maSigPro, which is designed for the analysis of single and multiseries time course microarray experiments. A total of 2842 genes showed changes throughout normal carpel development and fruit set. Most of them showed a change of expression at or after anthesis. The main differences between lines were concentrated at the anthesis stage. We found 758 genes differentially expressed in parthenocarpic fruit set. Among these genes we detected cell cycle-related genes that were still activated at anthesis in the parthenocarpic line, which shows the lack of arrest in the parthenocarpic line at anthesis. Key genes for the synthesis of gibberellins and ethylene, which were up-regulated in the parthenocarpic line, were also detected. Conclusion: Comparisons between array experiments determined that anthesis was the most different stage and the key point at which most of the genes were modulated. In the parthenocarpic line, anthesis seemed to be a short transitional stage to fruit set. In this line, the high GA content leads to the development of a parthenocarpic fruit, and ethylene may mimic pollination signals, inducing auxin synthesis in the ovary and the development of a jelly fruit.

  7. Meta-analysis of several gene lists for distinct types of cancer: A simple way to reveal common prognostic markers

    Directory of Open Access Journals (Sweden)

    Sun Xiao

    2007-04-01

    Abstract Background Although prognostic biomarkers specific for particular cancers have been discovered, microarray analysis of gene expression profiles, supported by integrative analysis algorithms, helps to identify common factors in molecular oncology. Similarities of Ordered Gene Lists (SOGL) is a recently proposed approach to meta-analysis suitable for identifying features shared by two data sets. Here we extend the idea of SOGL to the detection of significant prognostic marker genes from microarrays of multiple data sets. Three data sets for leukemia and six others for different solid tumors are used to demonstrate our method, using established statistical techniques. Results We describe a set of significantly similar ordered gene lists, representing outcome comparisons for distinct types of cancer. This kind of similarity could improve the diagnostic accuracies of individual studies when SOGL is incorporated into the support vector machine algorithm. In particular, we investigate the similarities among three ordered gene lists pertaining to mesothelioma survival, prostate recurrence and glioma survival. The similarity-driving genes are related to the outcomes of patients with lung cancer with a hazard ratio of 4.47 (p = 0.035). Many of these genes are involved in breakdown of EMC proteins regulating angiogenesis, and may be used for further research on prognostic markers and molecular targets of gene therapy for cancers. Conclusion The proposed method and its application show the potential of such meta-analyses in clinical studies of gene expression profiles.
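
    The idea of comparing ordered gene lists can be illustrated with a small sketch. It is a simplified, one-sided illustration and not the published SOGL procedure: it only counts how many genes two ranked lists share among their top n entries and sums these counts with weights that decay with n. The function names, the exponential weighting and the value of alpha are assumptions made for this example, and no significance assessment is attempted.

```python
import math


def top_overlaps(list_a, list_b):
    """Return O_n for n = 1..N: genes shared by the top n of both ranked lists.

    Assumes each list holds unique gene identifiers, best-ranked first.
    """
    depth = min(len(list_a), len(list_b))
    return [len(set(list_a[:n]) & set(list_b[:n])) for n in range(1, depth + 1)]


def similarity_score(list_a, list_b, alpha=0.1):
    """Weighted sum of overlaps; overlaps near the top of the lists count most.

    The weight exp(-alpha * n) and the default alpha are illustrative assumptions.
    """
    overlaps = top_overlaps(list_a, list_b)
    return sum(math.exp(-alpha * n) * o_n for n, o_n in enumerate(overlaps, start=1))


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4"]          # hypothetical gene identifiers
    reversed_ranking = list(reversed(ranked))
    print(top_overlaps(ranked, ranked))
    print(top_overlaps(ranked, reversed_ranking))
    print(similarity_score(ranked, ranked) > similarity_score(ranked, reversed_ranking))
```

    Two lists that agree near the top receive a higher score than two lists with the same genes in opposite order, which is the property a similarity measure for ordered lists needs.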

  8. Inverse bifurcation analysis: application to simple gene systems

    Directory of Open Access Journals (Sweden)

    Schuster Peter

    2006-07-01

    Abstract Background Bifurcation analysis has proven to be a powerful method for understanding the qualitative behavior of gene regulatory networks. In addition to the more traditional forward problem of determining the mapping from parameter space to the space of model behavior, the inverse problem of determining model parameters to result in certain desired properties of the bifurcation diagram provides an attractive methodology for addressing important biological problems. These include understanding how the robustness of qualitative behavior arises from system design as well as providing a way to engineer biological networks with qualitative properties. Results We demonstrate that certain inverse bifurcation problems of biological interest may be cast as optimization problems involving minimal distances of reference parameter sets to bifurcation manifolds. This formulation allows for an iterative solution procedure based on performing a sequence of eigen-system computations and one-parameter continuations of solutions, the latter being a standard capability in existing numerical bifurcation software. As applications of the proposed method, we show that the problem of maximizing regions of a given qualitative behavior as well as the reverse engineering of bistable gene switches can be modelled and efficiently solved.

  9. Analysis of multiplex gene expression maps obtained by voxelation

    Directory of Open Access Journals (Sweden)

    Smith Desmond J

    2009-04-01

    Abstract Background Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions without taking into account the location information of a gene's expression in the mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. Results To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from gene expression maps averaged over the left and right hemispheres, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Within each group of similar genes and each cluster, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find genes similar to certain target genes, we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions with respect to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in
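
    The query step described in this record (wavelet features per expression map, then Euclidean distance between feature vectors) might be sketched as follows. The abstract does not say which wavelet or how many decomposition levels were used, so the one-level Haar approximation coefficients, the toy 4x4 maps and all function names below are assumptions chosen only to keep the example small.

```python
import math


def haar_approximation(grid):
    """One-level 2D Haar approximation coefficients of an even-sized grid.

    Each 2x2 block is reduced to (sum of its four values) / 2, which gives
    a coarse feature vector for the map.
    """
    features = []
    for r in range(0, len(grid), 2):
        for c in range(0, len(grid[0]), 2):
            block_sum = grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]
            features.append(block_sum / 2)
    return features


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def rank_similar(query, maps):
    """Return the other gene names ordered from most to least similar to the query."""
    query_features = haar_approximation(maps[query])
    scored = [
        (euclidean(query_features, haar_approximation(grid)), name)
        for name, grid in maps.items()
        if name != query
    ]
    return [name for _, name in sorted(scored)]


if __name__ == "__main__":
    # Hypothetical expression maps on a 4x4 voxel grid.
    maps = {
        "geneA": [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
        "geneB": [[3, 5, 0, 0], [5, 3, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
        "geneC": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]],
    }
    print(rank_similar("geneA", maps))
```

    Keeping only the coarse coefficients makes the comparison insensitive to voxel-level noise: geneA and geneB differ voxel by voxel but are expressed in the same region, so they end up closest.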

  10. Transcript and protein profiling identify candidate gene sets of potential adaptive significance in New Zealand Pachycladon

    Directory of Open Access Journals (Sweden)

    Schmidt Silvia

    2010-05-01

    Abstract Background Transcript profiling of closely related species provides a means for identifying genes potentially important in species diversification. However, the predictive value of transcript profiling for inferring downstream-physiological processes has been unclear. In the present study we use shotgun proteomics to validate inferences from microarray studies regarding physiological differences in three Pachycladon species. We compare transcript and protein profiling and evaluate their predictive value for inferring glucosinolate chemotypes characteristic of these species. Results Evidence from heterologous microarrays and shotgun proteomics revealed differential expression of genes involved in glucosinolate hydrolysis (myrosinase-associated proteins) and biosynthesis (methylthioalkylmalate isomerase and dehydrogenase), the interconversion of carbon dioxide and bicarbonate (carbonic anhydrases), water use efficiency (ascorbate peroxidase, 2 cys peroxiredoxin, 20 kDa chloroplastic chaperonin, mitochondrial succinyl CoA ligase) and others (glutathione-S-transferase, serine racemase, vegetative storage proteins, genes related to translation and photosynthesis). Differences in glucosinolate hydrolysis products were directly confirmed. Overall, prediction of protein abundances from transcript profiles was stronger than prediction of transcript abundance from protein profiles. Protein profiles also proved to be more accurate predictors of glucosinolate profiles than transcript profiles. The similarity of species profiles for both transcripts and proteins reflected previously inferred phylogenetic relationships while glucosinolate chemotypes did not. Conclusions We have used transcript and protein profiling to predict physiological processes that evolved differently during diversification of three Pachycladon species. This approach has also identified candidate genes potentially important in adaptation, which are now the focus of ongoing study.

  11. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications.

    Directory of Open Access Journals (Sweden)

    Karina Stucken

    Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N(2)) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N(2) fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N(2) fixation capacity. Further comparisons to all available cyanobacterial

  12. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications.

    Science.gov (United States)

    Stucken, Karina; John, Uwe; Cembella, Allan; Murillo, Alejandro A; Soto-Liebe, Katia; Fuentes-Valdés, Juan J; Friedel, Maik; Plominsky, Alvaro M; Vásquez, Mónica; Glöckner, Gernot

    2010-02-16

    Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N(2)) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N(2) fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N(2) fixation capacity. Further comparisons to all available cyanobacterial genomes

  13. The Sentiment Trend Analysis of Twitter Based on Set Pair Contact Degree

    Directory of Open Access Journals (Sweden)

    Chunying Zhang

    2013-01-01

    The sentiment trend of Twitter users has a great influence on their friends and their wider audience. This paper addresses the sentiment state of users of Twitter, a unique medium, and applies the set pair analysis method to trend analysis. First, starting from the set pair contact degree and a set pair affective computing model, we compare the relative sizes of the same degree, difference degree and opposition degree of the emotion to build a user sentiment trend analysis model. Second, we analyze how a user's own sentiment trend is influenced when the value of the difference coefficient changes. Third, the analysis yields a user's sentiment orientation threshold, which serves as a prerequisite for predicting user behavior. Finally, an example calculating the sentiment trend of one Twitter user leads to the conclusion that analyzing user emotion from a three-dimensional angle is more realistic than analyzing it from a single angle.
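
    As a rough illustration of the quantities named in this record, the sketch below uses the standard set pair analysis form of the contact (connection) degree, mu = a + b*i + c*j, where a, b and c are the same, difference and opposition degrees with a + b + c = 1, the difference coefficient i lies in [-1, 1], and j = -1. How the paper derives the three degrees from tweets is not stated in the abstract, so the tweet counts, the trend labels and the function names here are assumptions for the example only.

```python
def contact_degree(same, different, opposite, i=0.0, j=-1.0):
    """Return (a, b, c, mu) for counts of same, different and opposite items.

    a, b, c are the same, difference and opposition degrees (they sum to 1);
    mu = a + b*i + c*j, with the difference coefficient i in [-1, 1].
    """
    if not -1.0 <= i <= 1.0:
        raise ValueError("difference coefficient i must lie in [-1, 1]")
    total = same + different + opposite
    if total <= 0:
        raise ValueError("at least one item is required")
    a, b, c = same / total, different / total, opposite / total
    return a, b, c, a + b * i + c * j


def trend(same, opposite):
    """Label the trend by comparing the same count with the opposition count."""
    if same > opposite:
        return "same"
    if same < opposite:
        return "opposite"
    return "balanced"


if __name__ == "__main__":
    # Hypothetical user: 6 tweets agree with a reference sentiment,
    # 3 are uncertain, 1 disagrees.
    for coefficient in (-1.0, 0.0, 0.5, 1.0):
        a, b, c, mu = contact_degree(6, 3, 1, i=coefficient)
        print(coefficient, round(mu, 3), trend(6, 1))
```

    Sweeping the difference coefficient from -1 to 1 shows how strongly the uncertain share b can shift the contact degree while the comparison of a and c stays the same.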

  14. In vivo validation of a computationally predicted conserved Ath5 target gene set.

    Directory of Open Access Journals (Sweden)

    Filippo Del Bene

    2007-09-01

    So far, the computational identification of transcription factor binding sites is hampered by the complexity of vertebrate genomes. Here we present an in silico procedure to predict target sites of a transcription factor in complex genomes using its binding site. In a first step, sequence comparison of closely related genomes identifies the binding sites in conserved cis-regulatory regions (phylogenetic footprinting). Subsequently, more remote genomes are introduced into the comparison to identify highly conserved and therefore putatively functional binding sites (phylogenetic filtering). When applied to the binding site of atonal homolog 5 (Ath5 or ATOH7), this procedure efficiently filters evolutionarily conserved binding sites out of more than 300,000 instances in a vertebrate genome. We validate a selection of the linked target genes by showing coexpression with and transcriptional regulation by Ath5. Finally, chromatin immunoprecipitation demonstrates the occupancy of the target gene promoters by Ath5. Thus, our procedure, applied to whole genomes, is a fast and predictive tool to in silico filter the target genes of a given transcription factor with defined binding site.
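
    The two-step logic of this record (footprinting against close genomes, then filtering against more remote ones) can be sketched in a few lines. This is a toy illustration, not the authors' pipeline: it assumes gap-free, pre-aligned sequences of equal length, treats a site as conserved only if the motif occurs at the same alignment column, and uses a generic placeholder motif (CANNTG) because the abstract does not give the binding site that was used. All sequences and names are hypothetical.

```python
import re

# Placeholder motif; the lookahead lets overlapping instances be reported.
MOTIF = re.compile(r"(?=(CA[ACGT]{2}TG))")


def motif_positions(sequence):
    """Return the set of start positions of the motif in one sequence."""
    return {match.start() for match in MOTIF.finditer(sequence.upper())}


def conserved_sites(reference, others):
    """Keep reference motif positions that also carry the motif in every other sequence."""
    kept = motif_positions(reference)
    for sequence in others:
        kept &= motif_positions(sequence)
    return sorted(kept)


def footprint_then_filter(reference, close_genomes, remote_genomes):
    """Step 1: footprinting with close genomes. Step 2: filtering with remote genomes."""
    footprinted = conserved_sites(reference, close_genomes)
    filtered = [
        position
        for position in footprinted
        if all(position in motif_positions(seq) for seq in remote_genomes)
    ]
    return footprinted, filtered


if __name__ == "__main__":
    reference = "TTCAGCTGAAACATATGTT"   # hypothetical aligned region
    close = ["GTCACCTGAGACAGATGTA"]      # closely related genome
    remote = ["AACAGGTGCCTTTTTTTTT"]     # more distant genome
    print(footprint_then_filter(reference, close, remote))
```

    In the toy data both motif instances survive footprinting, but only the first is still present in the remote sequence, so the filtering step reduces the candidate list from two sites to one.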

  15. PKA phosphorylation redirects ERα to promoters of a unique gene set to induce tamoxifen resistance.

    Science.gov (United States)

    de Leeuw, R; Flach, K; Bentin Toaldo, C; Alexi, X; Canisius, S; Neefjes, J; Michalides, R; Zwart, W

    2013-07-25

    Protein kinase A (PKA)-induced estrogen receptor alpha (ERα) phosphorylation at serine residue 305 (ERαS305-P) can induce tamoxifen (TAM) resistance in breast cancer. How this phospho-modification affects ERα specificity and translates into TAM resistance is unclear. Here, we show that S305-P modification of ERα reprograms the receptor, redirecting it to new transcriptional start sites, thus modulating the transcriptome. By altering the chromatin-binding pattern, Ser305 phosphorylation of ERα translates into a 26-gene expression classifier that identifies breast cancer patients with a poor disease outcome after TAM treatment. MYC-target genes and networks were significantly enriched in this gene classifier that includes a number of selective targets for ERαS305-P. The enhanced expression of MYC increased cell proliferation in the presence of TAM. We demonstrate that activation of the PKA signaling pathway alters the transcriptome by redirecting ERα to new transcriptional start sites, resulting in altered transcription and TAM resistance.

  16. Heat-inducible RNAi for gene functional analysis in plants.

    Science.gov (United States)

    Masclaux, Frédéric; Galaud, Jean-Philippe

    2011-01-01

    Controlling gene expression during plant development is an efficient method to explore gene function, and RNA interference (RNAi) is now considered a powerful technology for gene functional analysis. However, constitutive gene silencing cannot be used with genes involved in fundamental processes such as embryo viability or plant growth, and alternative silencing strategies that avoid these limitations should be preferred. Tissue-specific and inducible promoters, able to control gene expression at the spatial and/or temporal level, can be used to circumvent viability problems. In this chapter, after a rapid overview of the inducible promoters currently used for transgenic approaches in plants, we describe a method we have developed to study gene function by heat-inducible RNAi. This system is easy to use and complementary to those based on chemical gene inducer treatments and might be useful for both research and biotechnological applications.

  17. Database for exchangeable gene trap clones: pathway and gene ontology analysis of exchangeable gene trap clone mouse lines.

    Science.gov (United States)

    Araki, Masatake; Nakahara, Mai; Muta, Mayumi; Itou, Miharu; Yanai, Chika; Yamazoe, Fumika; Miyake, Mikiko; Morita, Ayaka; Araki, Miyuki; Okamoto, Yoshiyuki; Nakagata, Naomi; Yoshinobu, Kumiko; Yamamura, Ken-ichi; Araki, Kimi

    2014-02-01

    Gene trapping in embryonic stem (ES) cells is a proven method for large-scale random insertional mutagenesis in the mouse genome. We have established an exchangeable gene trap system, in which a reporter gene can be exchanged for any other DNA of interest through Cre/mutant lox-mediated recombination. We isolated trap clones, analyzed trapped genes, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp]. The number of registered ES cell lines was 1162 on 31 August 2013. We also established 454 mouse lines from trap ES clones and deposited them in the mouse embryo bank at the Center for Animal Resources and Development, Kumamoto University, Japan. The EGTC database is the most extensive academic resource for gene-trap mouse lines. Because we us