Aravind, L.; Anantharaman, Vivek; Venancio, Thiago M.
The genomic revolution has provided the first glimpses of the architecture of regulatory networks. Combined with evolutionary information, the “network view” of life processes leads to remarkable insights into how biological systems have been shaped by various forces. This understanding is critical because biological systems, including regulatory networks, are not products of engineering but of historical contingencies. In this light, we attempt a synthetic overview of the natural history of regulatory networks operating in the development and differentiation of multicellular organisms. We first introduce regulatory networks and their organizational principles as can be deduced using ideas from graph theory. We then discuss findings from comparative genomics to illustrate the effects of lineage-specific expansions, gene loss, and non-protein-coding DNA on the architecture of networks. We consider the interaction between expansions of transcription factors, cis-regulatory elements, and more general chromatin-state-stabilizing elements in the emergence of morphological complexity. Finally, we consider a case study of the Notch sub-network, which is present throughout Metazoa, to examine how such a regulatory system has been pieced together in evolution from new innovations and pre-existing components that were originally functionally distinct. PMID:19530132
Song, Qi; Grene, Ruth; Heath, Lenwood S; Li, Song
In gene regulatory networks, transcription factors often function as co-regulators to synergistically induce or inhibit expression of their target genes. However, most existing module-finding algorithms can only identify densely connected genes but not co-regulators in regulatory networks. We have developed a new computational method, CoReg, to identify transcription co-regulators in large-scale regulatory networks. CoReg calculates gene similarities based on the number of common neighbors of any two genes. Using simulated and real networks, we compared the performance of different similarity indices and existing module-finding algorithms, and found that CoReg outperforms other published methods in identifying co-regulatory genes. We applied CoReg to a large-scale network of Arabidopsis with more than 2.8 million edges, and we analyzed more than 2,300 published gene expression profiles to characterize co-expression patterns of gene modules identified by CoReg. We identified three types of modules in the Arabidopsis network: regulator modules, target modules and intermediate modules. Regulator modules include genes with more than 90% of their edges as outgoing edges; target modules include genes with more than 90% of their edges as incoming edges. Other modules are classified as intermediate modules. We found that genes in target modules tend to be highly co-expressed under abiotic stress conditions, suggesting this network structure is robust against perturbation. Our analysis shows that CoReg is an accurate method for identifying co-regulatory genes in large-scale networks. We provide CoReg as an R package, which can be applied to finding co-regulators in any organism with genome-scale regulatory network data.
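The common-neighbor idea behind CoReg can be illustrated with a short Python sketch. It assumes the network is given as (regulator, target) pairs, uses the raw count of shared neighbors as the similarity index (CoReg compares several indices, so this is only one possible choice), and applies the 90% outgoing/incoming rule to a module that is already defined. The function names `common_neighbor_scores` and `module_type` are hypothetical and are not part of the CoReg R package.

```python
from itertools import combinations


def neighbor_sets(edges):
    """Map each gene to the set of genes it shares an edge with (direction ignored)."""
    nbrs = {}
    for src, dst in edges:
        nbrs.setdefault(src, set()).add(dst)
        nbrs.setdefault(dst, set()).add(src)
    return nbrs


def common_neighbor_scores(edges):
    """Number of shared neighbors for every gene pair that has at least one."""
    nbrs = neighbor_sets(edges)
    scores = {}
    for a, b in combinations(sorted(nbrs), 2):
        shared = len(nbrs[a] & nbrs[b])
        if shared:
            scores[(a, b)] = shared
    return scores


def module_type(module, edges, cutoff=0.9):
    """Classify a module by the share of its genes' edge ends that are outgoing."""
    out_deg = sum(1 for src, _ in edges if src in module)
    in_deg = sum(1 for _, dst in edges if dst in module)
    total = out_deg + in_deg
    if total == 0:
        return "isolated"
    if out_deg / total > cutoff:
        return "regulator"
    if in_deg / total > cutoff:
        return "target"
    return "intermediate"


# Hypothetical toy network: TFs A and B co-regulate X, Y and Z.
edges = [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X"), ("B", "Y"), ("B", "Z")]
print(common_neighbor_scores(edges)[("A", "B")])  # 3
print(module_type({"A", "B"}, edges), module_type({"X", "Y", "Z"}, edges))  # regulator target
```

A and B receive a high score even though they share no edge with each other, which is the co-regulator pattern that density-based module finders miss.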
Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang
A gene regulatory network (GRN) represents the interactions of genes inside a cell or tissue, in which vertices and edges stand for genes and their regulatory interactions, respectively. Reconstruction of gene regulatory networks, in particular genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive: they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but have difficulty constructing GRNs at genome scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by conditional mutual information using a parallel computing framework (hence the package name, CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application to a real genomic dataset confirms the practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .
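Conditional mutual information can be sketched for the simplest case of a single conditioning gene. Assuming jointly Gaussian expression values (an assumption made here for simplicity; CMIP's own estimator and its automatic threshold are not reproduced), I(X;Y|Z) reduces to a function of the partial correlation of X and Y given Z. The helper names below are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """I(X;Y) in nats for jointly Gaussian X, Y: -0.5 * ln(1 - r^2)."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)


def gaussian_cmi(x, y, z):
    """I(X;Y|Z) in nats for jointly Gaussian X, Y, Z with one conditioning gene Z,
    computed from the partial correlation of X and Y given Z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    partial = (rxy - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))
    return -0.5 * math.log(1.0 - partial ** 2)


# Simulated profiles: X and Y both follow a shared regulator Z plus independent noise.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
print(f"I(X;Y) = {gaussian_mi(x, y):.3f}, I(X;Y|Z) = {gaussian_cmi(x, y, z):.3f}")
```

Because X and Y here depend on each other only through Z, the marginal mutual information is large while the conditional value is close to zero, so a pruning step of this kind would treat the X-Y edge as indirect; the cut-off for that decision is left to the caller in this sketch.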
Reconstruction of the regulatory network is an important step in understanding how organisms control the expression of gene products and therefore phenotypes. Recent studies have pointed out the importance of regulatory network plasticity in bacterial adaptation and evolution. The evolution of such networks within and outside the species boundary is, however, still obscure. Sinorhizobium meliloti is an ideal species for such a study, having three large replicons, many available genomes and significant knowledge of its transcription factors (TFs). Each replicon has a specific functional and evolutionary mark, which might also emerge from the analysis of their regulatory signatures. Here we have studied the plasticity of the regulatory network within and outside the S. meliloti species, looking for the presence of 41 TF binding motifs in 51 strains and 5 related rhizobial species. We detected a preference of several TFs for one of the three replicons, and the function of regulated genes was found to be in accordance with the overall replicon functional signature: housekeeping functions for the chromosome, metabolism for the chromid, symbiosis for the megaplasmid. This suggests a replicon-specific wiring of the regulatory network in the S. meliloti species. At the same time, a significant part of the predicted regulatory network is shared between the chromosome and the chromid, adding a further layer by which the chromid integrates itself into the core genome. Furthermore, the regulatory network distance was found to be correlated with both promoter regions and accessory genome evolution inside the species, indicating that both pangenome compartments are involved in regulatory network evolution. We also observed that genes which are not included in the species regulatory network are more likely to belong to the accessory genome, indicating that regulatory interactions should also be considered to predict gene conservation in
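Searching genomes for TF binding motifs of this kind is commonly done with a position weight matrix (PWM) scan, sketched below under stated assumptions: log2 odds against a uniform background, a pseudocount of 0.5, the forward strand only, and a hand-picked score threshold. The abstract does not name the scanning tool or cut-offs used in the study, and the training sites and promoter fragment are invented for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=0.5, background=0.25):
    """Position weight matrix (log2 odds vs a uniform background) from aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Invented training sites and promoter fragment, for illustration only.
pwm = log_odds_matrix(["TTGACA", "TTGACA", "TTGACT"])
for pos, score in scan("GGGGTTGACAGGGG", pwm, threshold=5.0):
    print(pos, round(score, 2))
```

Comparing which strains or replicons carry hits for a given matrix is then a matter of running the same scan over each genome's promoter regions.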
De-novo reverse-engineering of genome-scale regulatory networks is a fundamental problem of biological and translational research. One of the major obstacles in developing and evaluating approaches for de-novo gene network reconstruction is the absence of high-quality genome-scale gold-standard networks of direct regulatory interactions. To establish a foundation for assessing the accuracy of de-novo gene network reverse-engineering, we constructed high-quality genome-scale gold-standard networks of direct regulatory interactions in Saccharomyces cerevisiae that incorporate binding and gene knockout data. Then we used 7 performance metrics to assess the accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different datasets spanning 4 data types. We found that most reconstructed networks had statistically significant accuracies. We also determined which statistical approaches and datasets/data types lead to networks with better reconstruction accuracies. While we found that de-novo reverse-engineering of the entire network is a challenging problem, it is possible to reconstruct sub-networks around some transcription factors with good accuracy. The latter transcription factors can be identified by assessing their connectivity in the inferred networks. Overall, this study provides the gene network reverse-engineering community with a rigorous assessment of the accuracy of S. cerevisiae gene network reconstruction and of the variability in performance of various approaches for learning both the entire network and sub-networks around transcription factors.
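Scoring a reconstructed network against a gold standard comes down to comparing edge sets. The abstract does not list its seven metrics, so the sketch below uses precision, recall and F1 as common examples, treats edges as directed (TF, target) pairs, and shows how the same comparison can be restricted to the sub-network around a single transcription factor. The function names and the toy data are hypothetical.

```python
def edge_metrics(predicted, gold):
    """Precision, recall and F1 of predicted (TF, target) edges against a gold standard."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def tf_subnetwork_metrics(predicted, gold, tf):
    """Same metrics restricted to edges leaving a single transcription factor."""
    return edge_metrics([e for e in predicted if e[0] == tf],
                        [e for e in gold if e[0] == tf])


# Hypothetical gold standard and predictions.
gold = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}
predicted = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G4"), ("TF2", "G5")]
print(edge_metrics(predicted, gold))
print(tf_subnetwork_metrics(predicted, gold, "TF1"))  # (1.0, 1.0, 1.0)
```

In this toy case the whole network scores only moderately while the TF1 sub-network is recovered perfectly, mirroring the observation that some TF-centered sub-networks can be reconstructed well even when the entire network cannot.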
Koch, Christopher; Konieczka, Jay; Delorey, Toni; Lyons, Ana; Socha, Amanda; Davis, Kathleen; Knaack, Sara A; Thompson, Dawn; O'Shea, Erin K; Regev, Aviv; Roy, Sushmita
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data, to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Jensen, Paul A
Background: Several methods have been developed for analyzing genome-scale models of metabolism and transcriptional regulation. Many of these methods, such as Flux Balance Analysis, use constrained optimization to predict relationships between metabolic flux and the genes that encode and regulate enzyme activity. Recently, mixed integer programming has been used to encode these gene-protein-reaction (GPR) relationships into a single optimization problem, but these techniques are often of limited generality and lack a tool for automating the conversion of rules to a coupled regulatory/metabolic model. Results: We present TIGER, a Toolbox for Integrating Genome-scale Metabolism, Expression, and Regulation. TIGER converts a series of generalized, Boolean or multilevel rules into a set of mixed integer inequalities. The package also includes implementations of existing algorithms to integrate high-throughput expression data with genome-scale models of metabolism and transcriptional regulation. We demonstrate how TIGER automates the coupling of a genome-scale metabolic model with GPR logic and models of transcriptional regulation, thereby serving as a platform for algorithm development and large-scale metabolic analysis. Additionally, we demonstrate how TIGER's algorithms can be used to identify inconsistencies and improve existing models of transcriptional regulation with examples from the reconstructed transcriptional regulatory network of Saccharomyces cerevisiae. Conclusion: The TIGER package provides a consistent platform for algorithm development and for extending existing genome-scale metabolic models with regulatory networks and high-throughput data.
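The conversion of Boolean GPR rules into mixed integer inequalities can be illustrated with the standard linearization of AND and OR over binary variables. This is a sketch, not TIGER's implementation (which also handles multilevel rules): each constraint is a (coefficients, sense, right-hand side) triple, nested rules are handled by introducing an auxiliary binary variable, and all helper names are hypothetical.

```python
from itertools import product


def and_constraints(y, xs):
    """Linear constraints forcing binary y == x1 AND ... AND xn."""
    cons = [({y: 1, x: -1}, "<=", 0) for x in xs]      # y <= x_i
    lhs = {y: 1}
    for x in xs:
        lhs[x] = lhs.get(x, 0) - 1
    cons.append((lhs, ">=", 1 - len(xs)))               # y >= sum(x_i) - (n - 1)
    return cons


def or_constraints(y, xs):
    """Linear constraints forcing binary y == x1 OR ... OR xn."""
    cons = [({y: 1, x: -1}, ">=", 0) for x in xs]      # y >= x_i
    lhs = {y: 1}
    for x in xs:
        lhs[x] = lhs.get(x, 0) - 1
    cons.append((lhs, "<=", 0))                         # y <= sum(x_i)
    return cons


def satisfied(cons, assignment):
    """Check a 0/1 assignment against (coefficients, sense, rhs) constraints."""
    for coeffs, sense, rhs in cons:
        value = sum(c * assignment[v] for v, c in coeffs.items())
        if (sense == "<=" and value > rhs) or (sense == ">=" and value < rhs):
            return False
    return True


# Hypothetical GPR rule: reaction r is active if gA AND (gB OR gC).
# The OR clause gets an auxiliary binary variable "aux".
cons = or_constraints("aux", ["gB", "gC"]) + and_constraints("r", ["gA", "aux"])
names = ["gA", "gB", "gC", "aux", "r"]
feasible = [dict(zip(names, v)) for v in product([0, 1], repeat=len(names))
            if satisfied(cons, dict(zip(names, v)))]
print(len(feasible))  # 8: exactly one (aux, r) pair per gene state
```

For y = x1 AND x2 the generated constraints are y <= x1, y <= x2 and y >= x1 + x2 - 1; the brute-force check only confirms that the inequalities reproduce the rule, and a real model would hand them to a MILP solver alongside the flux constraints.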
Nepal, Chirag; O'Rourke, Colm J; Oliveira, Douglas Vnp
-exome sequencing, targeted exome sequencing) and epigenomic data from 496 patients, and used the three most recurrently mutated genes to stratify patients (IDH, KRAS, TP53, 'undetermined'). Using this molecular dissection approach, each subgroup was determined to possess unique mutational signature preferences, co...... all 3 mutations ('undetermined') harbored the most extensive structural alterations while IDH mutant tumors displayed the most extensive DNA methylome dysregulation, consistent with previous findings. CONCLUSION: Stratification of iCCA patients based on occurrence of mutations in three classifier...... genes (IDH, KRAS, TP53) revealed unique oncogenic programs (mutational, structural, epi-mutational) that influence pharmacologic response in drug repositioning protocols. This genome dissection approach highlights the potential of individual mutations to induce extensive molecular heterogeneity...
Davila-Velderrain, Jose; Servin-Marquez, Andres; Alvarez-Buylla, Elena R
The gene regulatory network of floral organ cell fate specification of Arabidopsis thaliana is a robust developmental regulatory module. Although this finding was proposed to explain the overall conservation of floral organ types and organization among angiosperms, it has not been confirmed that the network components are conserved at the molecular level among flowering plants. Using the genomic data that have accumulated, we address the conservation of the genes involved in this network and the forces that have shaped its evolution during the divergence of angiosperms. We recovered the network gene homologs for 18 species of flowering plants spanning nine families. We found that all the genes are highly conserved, with no evidence of positive selection. We studied the sequence conservation features of the genes in the context of their known biological function and the strength of the purifying selection acting upon them in relation to their placement within the network. Our results suggest an association between protein length and sequence conservation, evolutionary rates, and functional category. On the other hand, we found no significant correlation between the strength of purifying selection and gene placement. Our results confirm that the studied robust developmental regulatory module has been subjected to strong functional constraints. However, unlike previous studies, our results do not support the notion that network topology plays a major role in constraining evolutionary rates. We speculate that the dynamical functional role of genes within the network, and not just their connectivity, could play an important role in constraining evolution.
Background: Low temperature leads to major crop losses every year. Although several studies have focused on the diversity of cold tolerance in multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, a genome-scale molecular understanding is still lacking. Results: In this study, we report the genome-scale transcript response diversity of 10 A. thaliana ecotypes originating from different geographical locations to non-freezing cold stress (10°C). To analyze the transcriptional response diversity, we initially compared transcriptome changes in all 10 ecotypes using Arabidopsis NimbleGen ATH6 microarrays. In total, 6061 transcripts were significantly cold regulated (p … cold stress regulon genes. Significant numbers of non-synonymous amino acid changes were observed in the coding region of the CBF regulon genes. Considering the limited knowledge about regulatory interactions between transcription factors and their target genes in the model plant A. thaliana, we adopted a powerful systems genetics approach, Network Component Analysis (NCA), to construct an in-silico transcriptional regulatory network model of the response to cold stress. The resulting regulatory network contained 1,275 nodes and 7,720 connections, with 178 transcription factors and 1,331 target genes. Conclusions: A. thaliana ecotypes exhibit considerable variation in transcriptome-level responses to non-freezing cold stress treatment. Ecotype-specific transcripts and related gene ontology (GO) categories were identified to delineate natural variation of cold-stress-regulated differential gene expression in the model plant A. thaliana. The predicted regulatory network model was able to identify new ecotype-specific transcription factors and their regulatory interactions, which might be crucial for their local geographic adaptation to cold temperature. Additionally, since the approach presented here is general, it could be adapted to study networks regulating
Ravcheev, Dmitry A; Godzik, Adam; Osterman, Andrei L; Rodionov, Dmitry A
Bacteroides thetaiotaomicron, a predominant member of the human gut microbiota, is characterized by its ability to utilize a wide variety of polysaccharides using the extensive saccharolytic machinery that is controlled by an expanded repertoire of transcription factors (TFs). The availability of genomic sequences for multiple Bacteroides species opens an opportunity for their comparative analysis to enable characterization of their metabolic and regulatory networks. A comparative genomics approach was applied for the reconstruction and functional annotation of the carbohydrate utilization regulatory networks in 11 Bacteroides genomes. Bioinformatics analysis of promoter regions revealed putative DNA-binding motifs and regulons for 31 orthologous TFs in the Bacteroides. Among the analyzed TFs there are 4 SusR-like regulators, 16 AraC-like hybrid two-component systems (HTCSs), and 11 regulators from other families. Novel DNA motifs of HTCSs and SusR-like regulators in the Bacteroides have the common structure of direct repeats with a long spacer between two conserved sites. The inferred regulatory network in B. thetaiotaomicron contains 308 genes encoding polysaccharide and sugar catabolic enzymes, carbohydrate-binding and transport systems, and TFs. The analyzed TFs control pathways for utilization of host and dietary glycans to monosaccharides and their further interconversions to intermediates of the central metabolism. The reconstructed regulatory network allowed us to suggest and refine specific functional assignments for sugar catabolic enzymes and transporters, providing a substantial improvement to the existing metabolic models for B. thetaiotaomicron. The obtained collection of reconstructed TF regulons is available in the RegPrecise database (http://regprecise.lbl.gov).
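The reported motif architecture, two conserved sites in direct (same-strand) orientation separated by a long spacer, can be searched for with a simple scan. In this sketch the site length `k`, the spacer range and the allowed number of mismatches are hypothetical parameters; the abstract does not give values for the Bacteroides HTCS or SusR-like motifs, and the promoter fragment is invented.

```python
def find_direct_repeats(seq, k, min_spacer, max_spacer, max_mismatches=0):
    """Return (start1, start2, site) for pairs of length-k sites in the same orientation
    separated by min_spacer..max_spacer bases, allowing up to max_mismatches differences."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + k + spacer
            if j + k > len(seq):
                break  # larger spacers would run past the end of the sequence
            mismatches = sum(a != b for a, b in zip(site, seq[j:j + k]))
            if mismatches <= max_mismatches:
                hits.append((i, j, site))
    return hits


# Invented promoter fragment: ACGTAC repeated after a 10-base spacer.
promoter = "TTTT" + "ACGTAC" + "G" * 10 + "ACGTAC" + "TTTT"
print(find_direct_repeats(promoter, k=6, min_spacer=8, max_spacer=12))
```

Allowing one or two mismatches makes the scan tolerant of degenerate half-sites, at the cost of more spurious hits in low-complexity sequence.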
Yan, Koon-Kiu; Fang, Gang; Bhardwaj, Nitin; Alexander, Roger P; Gerstein, Mark
The genome has often been called the operating system (OS) for a living organism. A computer OS is described by a regulatory control network termed the call graph, which is analogous to the transcriptional regulatory network in a cell. To apply our firsthand knowledge of the architecture of software systems to understand cellular design principles, we present a comparison between the transcriptional regulatory network of a well-studied bacterium (Escherichia coli) and the call graph of a canonical OS (Linux) in terms of topology and evolution. We show that both networks have a fundamentally hierarchical layout, but there is a key difference: The transcriptional regulatory network possesses a few global regulators at the top and many targets at the bottom; conversely, the call graph has many regulators controlling a small set of generic functions. This top-heavy organization leads to highly overlapping functional modules in the call graph, in contrast to the relatively independent modules in the regulatory network. We further develop a way to measure evolutionary rates comparably between the two networks and explain this difference in terms of network evolution. The process of biological evolution via random mutation and subsequent selection tightly constrains the evolution of regulatory network hubs. The call graph, however, exhibits rapid evolution of its highly connected generic components, made possible by designers' continual fine-tuning. These findings stem from the design principles of the two systems: robustness for biological systems and cost effectiveness (reuse) for software systems.
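One simple way to expose the layered layout compared above is to assign each node a level by its longest downstream path: nodes that regulate nothing sit at level 0, and every regulator sits one level above its highest target. This is an assumption chosen for illustration, not necessarily the hierarchy-construction method used by Yan et al., and it requires the network to be acyclic once self-loops are dropped. The function names and toy graphs are hypothetical.

```python
from collections import Counter


def hierarchy_levels(edges):
    """Level 0 for nodes without outgoing edges; otherwise 1 + the highest level
    among the node's targets. Self-loops are ignored; other cycles raise ValueError."""
    targets, nodes = {}, set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:
            targets.setdefault(src, set()).add(dst)
    level = {}

    def visit(node, path):
        if node in level:
            return level[node]
        if node in path:
            raise ValueError(f"cycle through {node!r}: levels are undefined")
        path.add(node)
        level[node] = 1 + max((visit(t, path) for t in targets.get(node, ())), default=-1)
        path.discard(node)
        return level[node]

    for node in nodes:
        visit(node, set())
    return level


def level_sizes(edges):
    """Number of nodes at each level, bottom (0) first."""
    return dict(sorted(Counter(hierarchy_levels(edges).values()).items()))


# Hypothetical pyramid: one global regulator, two mid-level regulators, four targets.
pyramid = [("G", "A"), ("G", "B"), ("A", "t1"), ("A", "t2"), ("B", "t3"), ("B", "t4")]
# Hypothetical top-heavy graph: many callers sharing one generic function.
top_heavy = [(f"f{i}", "util") for i in range(4)]
print(level_sizes(pyramid))    # {0: 4, 1: 2, 2: 1}
print(level_sizes(top_heavy))  # {0: 1, 1: 4}
```

A pyramid with many nodes at level 0 and few at the top matches the description of the E. coli regulatory network, whereas a wide upper layer over a few shared nodes resembles the top-heavy call graph.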
Doane, Ashley S; Elemento, Olivier
Regulatory elements determine the connectivity of molecular networks and mediate a variety of regulatory processes ranging from DNA looping to transcriptional, posttranscriptional, and posttranslational regulation. This review highlights our current understanding of the different types of regulatory elements found in molecular networks with a focus on DNA regulatory elements. We highlight technical advances and current challenges for the mapping of regulatory elements at the genome-wide scale, and describe new computational methods to uncover these elements via reconstructing regulatory networks from large genomic datasets. WIREs Syst Biol Med 2017, 9:e1374. doi: 10.1002/wsbm.1374 © 2017 Wiley Periodicals, Inc.
Belcastro, Vincenzo; Gregoretti, Francesco; Siciliano, Velia; Santoro, Michele; D'Angelo, Giovanni; Oliva, Gennaro; di Bernardo, Diego
Gene expression is a carefully regulated phenomenon in the cell. “Reverse-engineering” algorithms try to reconstruct the regulatory interactions among genes from genome-scale measurements of gene expression profiles (microarrays). Mammalian cells express tens of thousands of genes; hence, hundreds of gene expression profiles are necessary to obtain acceptable statistical evidence of interactions between genes. As the number of profiles to be analyzed increases, so do computational costs and memory requirements. In this work, we designed and developed a parallel computing algorithm to reverse-engineer genome-scale gene regulatory networks from thousands of gene expression profiles. The algorithm is based on computing the pairwise mutual information for each gene pair. We successfully tested it by reverse-engineering the Mus musculus (mouse) gene regulatory network in liver from gene expression profiles collected from a public repository. A parallel hierarchical clustering algorithm was implemented to discover “communities” within the gene network. Network communities are enriched for genes involved in the same biological functions. The inferred network was used to identify two mitochondrial proteins.
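Pairwise mutual information between expression profiles can be sketched with a plug-in estimator on equal-width bins. The bin count, the estimator and the fixed threshold are assumptions for illustration; the abstract does not state which estimator or cut-off was used, and the function names are hypothetical. Because each gene pair is scored independently, the loop over pairs is the part a parallel implementation would distribute across workers.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins):
    """Map values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profile: put everything in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Plug-in estimate of I(X;Y) in nats from binned expression profiles."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mi_network(profiles, threshold, bins=3):
    """Score every gene pair independently; keep pairs whose MI exceeds threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        mi = mutual_information(profiles[g1], profiles[g2], bins)
        if mi > threshold:
            edges[(g1, g2)] = mi
    return edges


# Hypothetical four-sample profiles.
profiles = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
print(mi_network(profiles, threshold=0.1, bins=2))
```

With hundreds of profiles, the number of bins and the threshold would need to be tuned, for example against a permutation-based null, rather than fixed as in this toy example.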
Liu, Guodong; Marras, Antonio; Nielsen, Jens
regulatory information is necessary to improve the accuracy and predictive ability of metabolic models. Here we review the strategies for the reconstruction of a transcriptional regulatory network (TRN) for yeast and the integration of such a reconstruction into a flux balance analysis-based metabolic model....... While many large-scale TRN reconstructions have been reported for yeast, these reconstructions still need to be improved regarding the functionality and dynamic property of the regulatory interactions. In addition, mathematical modeling approaches need to be further developed to efficiently integrate...
Glinsky, Gennadi V
Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique-to-human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS, termed rapidly evolving in humans TADs (revTADs), comprise 0.8-10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P … Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of
Hu, Guangan; Chen, Jianzhu
.... To identify transcription factors and their interactions in memory CD8⁺ T-cell development, we construct a genome-wide regulatory network and apply it to identify key transcription factors that regulate memory signature genes...
Winck, Flavia Vischi
The unicellular green alga Chlamydomonas reinhardtii is a long-established model organism for studies on photosynthesis and carbon metabolism-related physiology. Under conditions of air-level carbon dioxide concentration [CO2], a carbon concentrating mechanism (CCM) is induced to facilitate cellular carbon uptake. The CCM increases the availability of carbon dioxide at the site of cellular carbon fixation. To improve our understanding of the transcriptional control of the CCM, we employed FAIRE-seq (formaldehyde-assisted Isolation of Regulatory Elements followed by deep sequencing) to determine nucleosome-depleted chromatin regions of algal cells subjected to carbon deprivation. Our FAIRE data recapitulated the positions of known regulatory elements in the promoter of the periplasmic carbonic anhydrase (Cah1) gene, which is upregulated during CCM induction, and revealed new candidate regulatory elements at a genome-wide scale. In addition, time series expression patterns of 130 transcription factor (TF) and transcription regulator (TR) genes were obtained for cells cultured under photoautotrophic conditions and subjected to a shift from high to low [CO2]. Groups of co-expressed genes were identified, and a putative directed gene-regulatory network underlying the CCM was reconstructed from the gene expression data using the recently developed IOTA (inner composition alignment) method. Among the candidate regulatory genes, two members of the MYB-related TF family, Lcr1 (Low-CO2 response regulator 1) and Lcr2 (Low-CO2 response regulator 2), may play an important role in down-regulating the expression of a particular set of TF and TR genes in response to low [CO2]. The results obtained provide new insights into the transcriptional control of the CCM and reveal more than 60 new candidate regulatory genes. Deep sequencing of nucleosome-depleted genomic regions indicated the presence of new, previously unknown regulatory elements in the C. reinhardtii genome.
Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert, a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and the algorithms being developed, mean that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system.
Wang, Dong; Amornsiripanitch, Nita; Dong, Xinnian
Many biological processes are controlled by intricate networks of transcriptional regulators. With the development of microarray technology, transcriptional changes can be examined at the whole-genome level. However, such analysis often lacks information on the hierarchical relationship between components of a given system. Systemic acquired resistance (SAR) is an inducible plant defense response involving a cascade of transcriptional events induced by salicylic acid through the transcription cofactor NPR1. To identify additional regulatory nodes in the SAR network, we performed microarray analysis on Arabidopsis plants expressing the NPR1-GR (glucocorticoid receptor) fusion protein. Since nuclear translocation of NPR1-GR requires dexamethasone, we were able to control NPR1-dependent transcription and identify direct transcriptional targets of NPR1. We show that NPR1 directly upregulates the expression of eight WRKY transcription factor genes. This large family of 74 transcription factors has been implicated in various defense responses, but no specific WRKY factor has been placed in the SAR network. Identification of NPR1-regulated WRKY factors allowed us to perform in-depth genetic analysis on a small number of WRKY factors and test well-defined phenotypes of single and double mutants associated with NPR1. Among these WRKY factors we found both positive and negative regulators of SAR. This genomics-directed approach unambiguously positioned five WRKY factors in the complex transcriptional regulatory network of SAR. Our work not only discovered new transcription regulatory components in the signaling network of SAR but also demonstrated that functional studies of large gene families have to take into consideration sequence similarity as well as the expression patterns of the candidates.
Sankar, Savita; Yellajoshyula, Dhananjay; Zhang, Bo; Teets, Bryan; Rockweiler, Nicole; Kroll, Kristen L.
Neural cell fate acquisition is mediated by transcription factors expressed in nascent neuroectoderm, including Geminin and members of the Zic transcription factor family. However, the regulatory networks through which this occurs are not well defined. Here, we identified Geminin-associated chromatin locations in embryonic stem cells and Geminin- and Zic1-associated locations during neural fate acquisition at a genome-wide level. We determined how Geminin deficiency affected histone acetylation at gene promoters during this process. We integrated these data to demonstrate that Geminin associates with and promotes histone acetylation at neurodevelopmental genes, while Geminin and Zic1 bind a shared gene subset. Geminin- and Zic1-associated genes exhibit embryonic nervous system-enriched expression and encode other regulators of neural development. Both Geminin- and Zic1-associated peaks are enriched for Zic1 consensus binding motifs, while Zic1-bound peaks are also enriched for Sox3 motifs, suggesting co-regulatory potential. Accordingly, we found that Geminin and Zic1 could cooperatively activate the expression of several shared targets encoding transcription factors that control neurogenesis, neural plate patterning, and neuronal differentiation. We used these data to construct gene regulatory networks underlying neural fate acquisition. Establishment of this molecular program in nascent neuroectoderm directly links early neural cell fate acquisition with regulatory control of later neurodevelopment. PMID:27881878
Kelley, David R; Snoek, Jasper; Rinn, John L
The complex language of eukaryotic gene expression remains incompletely understood. Despite the importance suggested by the many noncoding variants statistically associated with human disease, nearly all such variants have unknown mechanisms. Here, we address this challenge using an approach based on a recent machine learning advance: deep convolutional neural networks (CNNs). We introduce the open source package Basset to apply CNNs to learn the functional activity of DNA sequences from genomics data. We trained Basset on a compendium of accessible genomic sites mapped in 164 cell types by DNase-seq, and demonstrate greater predictive accuracy than previous methods. Basset's predicted changes in accessibility between variant alleles were far greater for genome-wide association study (GWAS) SNPs that are likely to be causal than for nearby SNPs in linkage disequilibrium with them. With Basset, a researcher can perform a single sequencing assay in their cell type of interest and simultaneously learn that cell's chromatin accessibility code and annotate every mutation in the genome with its influence on present accessibility and latent potential for accessibility. Thus, Basset offers a powerful computational approach to annotate and interpret the noncoding genome. © 2016 Kelley et al.; Published by Cold Spring Harbor Laboratory Press.
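The first layer of a DNA CNN such as Basset scans one-hot-encoded sequence with convolutional filters. A minimal pure-Python sketch of that single operation, assuming one hand-set filter that rewards exact matches to a hypothetical motif ("GATA") and a hypothetical bias of -3; Basset learns its filters from DNase-seq data and stacks further layers, and none of these function names come from the Basset package.

```python
# Sketch of one convolutional filter over one-hot DNA, with ReLU and global
# max pooling. The filter weights are hand-set for illustration only.

BASES = "ACGT"

def one_hot(seq):
    """Encode DNA as rows of 4 values (A, C, G, T); ambiguous bases such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def motif_filter(motif):
    """Filter whose weights score +1 for each position matching `motif`."""
    return one_hot(motif)

def conv1d(x, w, bias=0.0):
    """'Valid' 1-D cross-correlation of one-hot input x with filter w, followed by ReLU."""
    width = len(w)
    scores = []
    for i in range(len(x) - width + 1):
        s = bias
        for j in range(width):
            s += sum(xi * wi for xi, wi in zip(x[i + j], w[j]))
        scores.append(max(0.0, s))  # ReLU activation
    return scores

def max_pool(scores):
    """Global max pooling: the strongest filter response anywhere in the sequence."""
    return max(scores) if scores else 0.0

if __name__ == "__main__":
    activations = conv1d(one_hot("CCGATAAC"), motif_filter("GATA"), bias=-3.0)
    print(activations, max_pool(activations))
```

With the bias set to minus (motif length - 1), only a full match yields a positive activation; a learned filter would instead carry real-valued weights resembling a position weight matrix.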
Mathilde de Taffin
Collier, the single Drosophila COE (Collier/EBF/Olf-1) transcription factor, is required in several developmental processes, including head patterning and the specification of muscle and neuron identity during embryogenesis. To identify direct Collier (Col) targets in different cell types, we used ChIP-seq to map Col binding sites throughout the genome at mid-embryogenesis. In vivo Col binding peaks were associated with 415 potential direct target genes. Gene Ontology analysis revealed a strong enrichment for proteins with DNA-binding and/or transcription-regulatory properties. Characterization of a selection of candidates, using transgenic CRM-reporter assays, identified direct Col targets in dorso-lateral somatic muscles and in specific neuron types in the central nervous system. These data provide new evidence that direct control by Col of the expression of the transcription regulators apterous and eyes-absent (eya) is critical for specifying neuronal identities. They also show that cross-regulation between col and eya in muscle progenitor cells is required for the specification of muscle identity, revealing a new parallel between the myogenic regulatory networks operating in Drosophila and vertebrates. Col regulation of eya, in both specific muscle and neuronal lineages, may illustrate one mechanism behind the evolutionary diversification of Col biological roles.
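The abstract does not state how peaks were associated with target genes. A common heuristic is to assign each peak to the gene with the nearest transcription start site (TSS) within a distance window. A minimal sketch under that assumption; the 10 kb window, the use of the peak midpoint, and the coordinates and gene names are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical TSS annotation: per chromosome, (position, gene) pairs sorted by position.
EXAMPLE_TSS = {"chr2L": [(1000, "geneA"), (5000, "geneB"), (20000, "geneC")]}

def nearest_gene(peak, tss_by_chrom, max_distance=10_000):
    """Gene whose TSS is closest to the peak midpoint, or None if beyond max_distance."""
    chrom, start, end = peak
    mid = (start + end) // 2
    tss = tss_by_chrom.get(chrom, [])
    if not tss:
        return None
    i = bisect_left([pos for pos, _ in tss], mid)
    # The nearest TSS is either just before or just after the insertion point.
    candidates = [tss[k] for k in (i - 1, i) if 0 <= k < len(tss)]
    pos, gene = min(candidates, key=lambda item: abs(item[0] - mid))
    return gene if abs(pos - mid) <= max_distance else None

def assign_targets(peaks, tss_by_chrom, max_distance=10_000):
    """Putative direct targets: the nearest gene of every peak that falls within the window."""
    genes = (nearest_gene(p, tss_by_chrom, max_distance) for p in peaks)
    return {g for g in genes if g is not None}
```

Nearest-TSS assignment ignores long-range enhancer-promoter contacts, which is one reason studies like this one validate candidate CRMs with reporter assays.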
Kim, Man-Sun; Kim, Dongsan; Kang, Nam Sook; Kim, Jeong-Rae
To discover the common characteristics of the various cell types in the human body, many studies have sought the set of genes commonly expressed across cell types and tissues. However, the functional characteristics of a cell are determined by the complex regulatory relationships among genes rather than by the expressed genes themselves. Therefore, to uncover the common features of various cell types, it is more important to identify and analyze a core regulatory network in which all regulatory relationships between genes are active across all cell types. Here, based on hundreds of tissue-specific gene regulatory networks constructed from recent genome-wide experimental data, we constructed the core regulatory network. Interestingly, we found that the core regulatory network is organized as simple cascades and has few complex regulatory structures such as feedback or feed-forward loops. Moreover, we discovered that regulatory links from genes in the core regulatory network to genes in the peripheral regulatory network are much more abundant than links in the reverse direction. These results suggest that the core regulatory network sits at the top of the regulatory network and acts as a 'hub' in terms of information flow, and that the information common to all cells can be modified to achieve tissue-specific characteristics through various types of feedback and feed-forward loops in the peripheral regulatory networks. We also found that, compared with peripheral genes, the genes in the core regulatory network are evolutionarily conserved, essential, non-disease and non-druggable genes. Overall, our study provides insight into how all human cells share a common function and generate tissue-specific functional traits by transmitting and processing information through the regulatory network. Copyright © 2017 Elsevier Inc. All rights reserved.
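The core-network construction and the directionality comparison described in this abstract can be sketched on toy data. Assumptions, labeled loudly: networks are sets of (regulator, target) edges, the core network is the intersection of all tissue networks, every gene touched by a core edge counts as a core gene, and the three tissue networks are invented.

```python
from collections import Counter

# Hypothetical toy data: three tissue-specific networks as (regulator, target) edge sets.
EXAMPLE_TISSUES = [
    {("A", "B"), ("B", "C"), ("A", "X"), ("Y", "Z")},
    {("A", "B"), ("B", "C"), ("B", "W"), ("X", "Y")},
    {("A", "B"), ("B", "C"), ("C", "V")},
]

def core_edges(tissue_networks):
    """Regulatory links active in every tissue-specific network (requires at least one network)."""
    return set.intersection(*(set(net) for net in tissue_networks))

def link_direction_counts(tissue_networks):
    """Count links of the union network by (source class, target class).
    Any gene touched by a core edge is treated as a core gene (an assumption)."""
    core = {gene for edge in core_edges(tissue_networks) for gene in edge}
    counts = Counter()
    for reg, tgt in set().union(*tissue_networks):
        src = "core" if reg in core else "peripheral"
        dst = "core" if tgt in core else "peripheral"
        counts[(src, dst)] += 1
    return counts

def count_feed_forward_loops(edges):
    """Count X->Y, Y->Z, X->Z triples with X, Y, Z distinct."""
    targets = {}
    for reg, tgt in edges:
        targets.setdefault(reg, set()).add(tgt)
    total = 0
    for x, ys in targets.items():
        for y in ys:
            if y == x:
                continue
            for z in targets.get(y, ()):
                if z not in (x, y) and z in ys:
                    total += 1
    return total
```

In this toy example the core network is the cascade A -> B -> C with no feed-forward loops, and three links run from core to peripheral genes while none run the other way, mirroring the qualitative pattern the abstract reports.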
Barah, Pankaj; Jayavelu, Naresh Doni; Rasmussen, Simon
BACKGROUND: Low temperature leads to major crop losses every year. Although several studies have focused on the diversity of cold tolerance levels among multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, a genome-scale molecular understanding is still lacking. RE...
Terpstra, I.R.; Snoek, L.B.; Keurentjes, J.J.B.; Peeters, A.J.M.; Ackerveken, van den G.
Gene expression differences between individuals within a species can be largely explained by differences in genetic background. The effect of genetic variants (alleles) of genes on expression can be studied in a multifactorial way by application of genetical genomics or expression quantitative trait
Vishnubalaji, R; Hamam, R; Abdulla, M-H
Despite recent advances in cancer management, colorectal cancer (CRC) remains the third most common cancer and a major health-care problem worldwide. MicroRNAs have recently emerged as key regulators of cancer development and progression by targeting multiple cancer-related genes; however, ... such regulatory networks are not well characterized in CRC. Thus, the aim of this study was to perform global messenger RNA (mRNA) and microRNA expression profiling in the same CRC samples and adjacent normal tissues and to identify potential miRNA-mRNA regulatory networks. Our data revealed 1273 significantly ... in cell proliferation, and migration in vitro. Concordantly, small interfering RNA-mediated knockdown of EZH2 led to similar effects on CRC cell growth in vitro. Therefore, our data have revealed several hundred potential miRNA-mRNA regulatory networks in CRC and suggest targeting relevant networks ...
Freyre-González, Julio A; Tauch, Andreas
Corynebacterium glutamicum is a Gram-positive, facultatively anaerobic, rod-shaped soil bacterium able to grow on a diversity of carbon sources such as sugars and organic acids. It is a biotechnologically relevant organism because of its highly efficient ability to biosynthesize amino acids such as l-glutamic acid and l-lysine. Here, we reconstructed the most complete C. glutamicum regulatory network to date and comprehensively analyzed its global organizational properties, systems-level features and functional architecture. Our analyses show the tremendous power of Abasy Atlas to study the functional organization of regulatory networks. We created two models of the C. glutamicum regulatory network: all-evidences (containing both weakly and strongly supported interactions, genomic coverage = 73%) and strongly-supported (accounting only for strongly supported evidence, genomic coverage = 71%). Using state-of-the-art methodologies, we prove that power-law behaviors truly govern the connectivity and clustering coefficient distributions. We found a previously unreported circuit motif that we named the complex feed-forward motif. We highlight the importance of feedback loops for the functional architecture, regardless of whether they are statistically over-represented in the network. We show that the previously reported top-down approach is inadequate for inferring the hierarchy governing a regulatory network, because feedback bridges different hierarchical layers and the top-down approach disregards the presence of intermodular genes shaping the integration layer. Taken together, our findings further support a diamond-shaped, three-layered hierarchy exhibiting some feedback between the processing and coordination layers, which is shaped by four classes of systems-level elements: global regulators, locally autonomous modules, basal machinery and intermodular genes. Copyright © 2016 Elsevier B.V. All rights reserved.
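The argument that a strict top-down approach breaks down in the presence of feedback can be illustrated with a small sketch. It is not the hierarchy-inference method used in this study. It assumes a naive top-down layering that repeatedly peels off regulators with no incoming links (Kahn's algorithm), plus Tarjan's strongly connected components to locate the feedback that blocks it. Gene names are hypothetical.

```python
from collections import defaultdict

def top_down_layers(edges):
    """Naive top-down layering (Kahn's algorithm). Self-loops (autoregulation)
    are ignored. Returns None when feedback between distinct genes leaves nodes
    that can never be placed in a layer."""
    edges = {(r, t) for r, t in edges if r != t}
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for r, t in edges:
        children[r].append(t)
        indegree[t] += 1
    layer = sorted(n for n in nodes if indegree[n] == 0)
    layers, placed = [], 0
    while layer:
        layers.append(layer)
        placed += len(layer)
        nxt = []
        for n in layer:
            for t in children[n]:
                indegree[t] -= 1
                if indegree[t] == 0:
                    nxt.append(t)
        layer = sorted(nxt)
    return layers if placed == len(nodes) else None

def feedback_components(edges):
    """Strongly connected components with more than one gene (Tarjan's
    algorithm); each contains at least one feedback loop."""
    graph = defaultdict(list)
    nodes = set()
    for r, t in edges:
        graph[r].append(t)
        nodes.update((r, t))
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            if len(comp) > 1:
                comps.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return comps
```

For a pure cascade G -> A -> B -> C the layering succeeds. Adding the single feedback link B -> A makes it fail, and the component {A, B} pinpoints the loop that bridges the would-be layers.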
Vishnubalaji, R; Hamam, R; Abdulla, M-H
Despite recent advances in cancer management, colorectal cancer (CRC) remains the third most common cancer and a major health-care problem worldwide. MicroRNAs have recently emerged as key regulators of cancer development and progression by targeting multiple cancer-related genes; however, ... such regulatory networks are not well characterized in CRC. Thus, the aim of this study was to perform global messenger RNA (mRNA) and microRNA expression profiling in the same CRC samples and adjacent normal tissues and to identify potential miRNA-mRNA regulatory networks. Our data revealed 1273 significantly ... -β (using SB-431542) pathways led to dose- and time-dependent inhibition of CRC cell growth. Similarly, our data revealed up- (42) and downregulated (61) microRNAs in the same matched samples. Using target prediction and bioinformatics, ~77% of the upregulated genes were predicted to be targeted by microRNAs ...
Tiffany B. Taylor
Bacteria have evolved complex regulatory networks that enable the integration of multiple intracellular and extracellular signals to coordinate responses to environmental changes. However, our knowledge of how regulatory systems function and evolve is still relatively limited. There is often extensive homology between components of different networks, due to past cycles of gene duplication, divergence, and horizontal gene transfer, raising the possibility of cross-talk or redundancy. Consequently, evolutionary resilience is built into gene networks: homology between regulators can potentially allow rapid rescue of lost regulatory function across distant regions of the genome. In our recent study [Taylor et al., Science (2015) 347(6225)], we found that mutations that facilitate cross-talk between pathways can contribute to gene network evolution, but that such mutations come with severe pleiotropic costs. This work raises a number of questions about how this phenomenon occurs.
Tegnér, Jesper N.
Mapping out cellular networks in general, and transcriptional networks in particular, has proved to be a bottleneck hampering our understanding of biological processes. Integrative approaches that fuse computational and experimental technologies to decode transcriptional networks at high resolution are therefore of utmost importance. Yet this is challenging, since the control of gene expression in eukaryotes is a complex multi-level process influenced by several epigenetic factors and by the fine interplay between regulatory proteins and the promoter structure governing the combinatorial regulation of gene expression. In this chapter we review how CAGE data can be integrated with other measurements, such as expression, physical interactions and computational predictions of regulatory motifs, which together can provide a genome-wide picture of eukaryotic transcriptional regulatory networks at a new level of resolution. © 2010 by Pan Stanford Publishing Pte. Ltd. All rights reserved.
Gene regulatory networks are perhaps the most important organizational level in the cell, where signals from the cell state and the outside environment are integrated in terms of the activation and inhibition of genes. For the last decade, the study of such networks has been fueled by large-scale experiments and renewed attention from the theoretical field. Different models have been proposed, for instance, to investigate expression dynamics, to explain the network topology observed in bacteria and yeast, and to analyze the evolvability and robustness of such networks. Yet how these gene regulatory networks evolve and become evolvable remains an open question. An individual-oriented evolutionary model is used to shed light on this matter. Each individual has a genome from which its gene regulatory network is derived. Mutations, such as gene duplications and deletions, alter the genome, while the resulting network determines the gene expression pattern and hence fitness. With this protocol we let a population of individuals evolve under Darwinian selection in an environment that changes through time. Our work demonstrates that long-term evolution of complex gene regulatory networks in a changing environment can lead to a striking increase in the efficiency of generating beneficial mutations. We show that the population evolves towards genotype-phenotype mappings that allow for an orchestrated network-wide change in the gene expression pattern, requiring only a few specific gene indels. The genes involved are hubs of the networks or genes that directly influence the hubs. Moreover, throughout the evolutionary trajectory the networks maintain their mutational robustness. In other words, evolution in an alternating environment leads to networks that are sensitive to a small class of beneficial mutations, while the majority of mutations remain neutral: an example of the evolution of evolvability.
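The abstract describes a genome from which a network is derived and which is altered by gene duplications and deletions. The following is a minimal sketch of only that genome-to-network mapping and the two indel operators. The representation is entirely hypothetical and is not the authors' model: each gene encodes a regulator of some "family", and its promoter carries binding sites for certain families. Expression dynamics, fitness, and selection in the alternating environment are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    family: str                      # family of the encoded regulator (hypothetical)
    sites: frozenset = frozenset()   # families whose products bind this gene's promoter

def derive_network(genome):
    """Genome -> network: g regulates h whenever g's product binds a site on h."""
    return {(g.name, h.name) for g in genome for h in genome if g.family in h.sites}

def duplicate(genome, index, new_name):
    """Gene duplication: insert a copy (same family and promoter) after the original."""
    g = genome[index]
    return genome[:index + 1] + [Gene(new_name, g.family, g.sites)] + genome[index + 1:]

def delete(genome, index):
    """Gene deletion: drop the gene at `index`."""
    return genome[:index] + genome[index + 1:]

# Hypothetical toy genome: "hub" regulates a, b and c; a also regulates b.
EXAMPLE_GENOME = [
    Gene("hub", "H"),
    Gene("a", "X", frozenset({"H"})),
    Gene("b", "Y", frozenset({"H", "X"})),
    Gene("c", "Z", frozenset({"H"})),
]
```

The toy genome yields 4 regulatory links. Duplicating the hub adds 3 links at once, whereas duplicating the peripheral gene c adds only 1, and deleting the hub leaves a single link. This is one way a few indels in hub genes can produce a network-wide change.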
Jung, Kwang-Woo; Yang, Dong-Hoon; Kim, Min-Kyu; Seo, Ho Seong; Lim, Sangyong; Bahn, Yong-Sun
have been found to show high radiation resistance. Organisms with the ability to resist radiation have unique regulatory networks to overcome this stress. Cryptococcus neoformans is one of the radiation-resistant fungi and is found in highly radioactive environments. However, it remains unclear how radiation-resistant eukaryotic microorganisms differ from radiation-sensitive ones. Here, we performed transcriptome analysis of C. neoformans to explore gene expression profiles after gamma radiation exposure and functionally characterized some of the identified radiation resistance genes. Notably, we identified a novel regulator of radiation resistance, named Bdr1 (a bZIP transcription factor (TF) for DNA damage response 1), which is not closely homologous to any known TF and is transcriptionally controlled by the Rad53 kinase. Therefore, our work could shed light not only on the radiation response but also on the radiation resistance mechanism of C. neoformans. Copyright © 2016 Jung et al.
Motamedian, Ehsan; Mohammadi, Maryam; Shojaosadati, Seyed Abbas; Heydari, Mona
Integration of different biological networks and data types has been a major challenge in systems biology. The present study introduces the transcriptional regulated flux balance analysis (TRFBA) algorithm, which integrates transcriptional regulatory and metabolic models using a set of expression data for various perturbations. TRFBA treats the expression levels of genes as new continuous variables and introduces two new types of linear constraints. The first constraint limits the rate of the reaction(s) supported by a metabolic gene, using a constant parameter (C) that converts expression levels into upper bounds on those reactions. Following the concept of constraint-based modeling, the second set of constraints correlates the expression level of each target gene with that of its regulating genes. A set of constraints and binary variables was also added to prevent the second set of constraints from overlapping. TRFBA was applied to Escherichia coli and Saccharomyces cerevisiae models to estimate growth rates under various environmental and genetic perturbations. The sensitivity of the prediction error to the algorithm parameter was evaluated to find the best value of C. The results indicate a significant improvement in the quantitative prediction of growth in comparison with previously presented algorithms. The robustness of the algorithm to changes in the expression data and the regulatory network was tested to evaluate the effect of noisy and incomplete data. Furthermore, applying the added constraints to perturbations without a gene expression profile demonstrates that these constraints can be used to improve the growth predictions of FBA. TRFBA is implemented in MATLAB and requires the COBRA Toolbox. Source code is freely available at http://sbme.modares.ac.ir. Supplementary data are available at Bioinformatics online.
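TRFBA's first constraint can be read as v_j <= C * E_g: the flux through reaction j is capped by C times the expression E_g of its gene g. A minimal sketch of the effect, assuming a hypothetical unbranched pathway (uptake -> r1 -> r2 -> biomass) with 1:1 stoichiometry. In that case the steady-state condition forces equal flux through every reaction, so the FBA optimum is simply the tightest bound and no LP solver is needed. Gene names, expression values, C = 2.0, and the default bound for gene-less reactions are invented. The second (regulator-target) constraint set is not modeled, and real use involves an LP over a genome-scale model (MATLAB with the COBRA Toolbox, per the abstract).

```python
# Hypothetical toy model: an unbranched pathway with 1:1 stoichiometry.
REACTION_GENES = {"uptake": "gene_u", "r1": "gene_1", "r2": "gene_2", "biomass": None}
PATHWAY = ["uptake", "r1", "r2", "biomass"]

def expression_upper_bounds(reaction_genes, expression, c, default_ub=1000.0):
    """First TRFBA-style constraint: ub(reaction) = C * expression(gene).
    Reactions without an associated gene keep `default_ub` (an assumption)."""
    return {rxn: (c * expression[gene] if gene is not None else default_ub)
            for rxn, gene in reaction_genes.items()}

def max_growth_unbranched(pathway, upper_bounds):
    """In an unbranched 1:1 pathway, steady state (S v = 0) forces equal flux
    through every reaction, so maximal biomass flux is the smallest upper bound."""
    return min(upper_bounds[rxn] for rxn in pathway)

if __name__ == "__main__":
    wild_type = {"gene_u": 8.0, "gene_1": 12.0, "gene_2": 2.5}
    knockdown = dict(wild_type, gene_2=0.5)
    for label, expr in [("wild type", wild_type), ("gene_2 knockdown", knockdown)]:
        bounds = expression_upper_bounds(REACTION_GENES, expr, c=2.0)
        print(label, max_growth_unbranched(PATHWAY, bounds))
```

Lowering the expression of one gene tightens its reaction's bound and, here, directly lowers predicted growth. The fit-to-data tuning of C described in the abstract determines how strongly expression translates into flux limits.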
Wong, Darren Chern Jan; Lopez Gutierrez, Rodrigo; Gambetta, Gregory Alan; Castellarin, Simone Diego
Coordinated transcriptional and metabolic reprogramming ensures a plant's continued growth and survival under adverse environmental conditions. Transcription factors (TFs) act to modulate gene expression through complex cis-regulatory element (CRE) interactions. Genome-wide analysis of known plant CREs was performed for all currently predicted protein-coding gene promoters in grapevine (Vitis vinifera L.). Many CREs, such as abscisic acid (ABA)-responsive, drought-responsive, auxin-responsive, and evening elements, exhibit bona fide CRE properties, such as a strong position bias towards the transcription start site (TSS) and over-representation compared with random promoters. Genes containing these CREs are enriched in a large repertoire of plant biological pathways. Large-scale transcriptome analyses also show that these CREs are highly implicated in grapevine development and stress response. Numerous CRE-driven modules in condition-specific gene co-expression networks (GCNs) were identified, and many of these modules were highly enriched for plant biological functions. Several modules corroborate known roles of CREs in drought response, pathogen defense, cell wall metabolism, and fruit ripening, whereas others reveal novel functions in plants. Comparisons with Arabidopsis suggest a general conservation of promoter architecture, gene expression dynamics, and GCN structure across species. Systems analyses of CREs provide insights into the grapevine cis-regulatory code and establish a foundation for future genomic studies in grapevine. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
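The two CRE properties named in this abstract, position bias relative to the TSS and over-representation, can be measured with a short sketch. Assumptions: each promoter string ends immediately upstream of the TSS (last base = position -1), only the forward strand is scanned, and the background is approximated by composition-preserving shuffles. The study itself compared against random promoters, so the shuffle is a swapped-in technique. The motif and sequences are illustrative only.

```python
import random
from collections import Counter

def motif_positions(promoter, motif):
    """Start positions of exact `motif` matches relative to the TSS, assuming the
    promoter string ends just upstream of the TSS (last base = position -1)."""
    n, k = len(promoter), len(motif)
    return [i - n for i in range(n - k + 1) if promoter[i:i + k] == motif]

def position_histogram(promoters, motif, bin_size=100):
    """Count motif starts per bin; bins are labeled by their lower edge (e.g. -100 for [-100, 0))."""
    counts = Counter()
    for p in promoters:
        for pos in motif_positions(p, motif):
            counts[(pos // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))

def overrepresentation(promoters, motif, n_shuffles=100, seed=0):
    """Observed motif count divided by the mean count in shuffled promoters."""
    rng = random.Random(seed)

    def shuffled(seq):
        chars = list(seq)
        rng.shuffle(chars)
        return "".join(chars)

    observed = sum(len(motif_positions(p, motif)) for p in promoters)
    background = [sum(len(motif_positions(shuffled(p), motif)) for p in promoters)
                  for _ in range(n_shuffles)]
    mean_bg = sum(background) / n_shuffles
    return observed / mean_bg if mean_bg > 0 else float("inf")
```

A real analysis would also scan the reverse complement, use degenerate motifs or position weight matrices, and test the positional histogram against a uniform expectation.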
In vivo genome-wide analysis of multiple tissues identifies gene regulatory networks, novel functions and downstream regulatory genes for Bapx1 and its co-regulation with Sox9 in the mammalian vertebral column.
Chatterjee, Sumantra; Sivakamasundari, V; Yap, Sook Peng; Kraus, Petra; Kumar, Vibhor; Xing, Xing; Lim, Siew Lan; Sng, Joel; Prabhakar, Shyam; Lufkin, Thomas
Vertebrate organogenesis is a highly complex process involving sequential cascades of transcription factor activation or repression. Interestingly, a single developmental control gene can occasionally be essential for the morphogenesis and differentiation of tissues and organs arising from vastly disparate embryological lineages. Here we elucidated the role of the mammalian homeobox gene Bapx1 during the embryogenesis of five distinct organs at E12.5 (vertebral column, spleen, gut, forelimb and hindlimb) using expression profiling of sorted wild-type and mutant cells combined with genome-wide binding site analysis. Furthermore, we analyzed the development of the vertebral column at the molecular level by combining transcriptional profiling and genome-wide binding data for Bapx1 with similarly generated data sets for Sox9 to assemble a detailed gene regulatory network, revealing genes not previously reported to be controlled by either of these two transcription factors. The gene regulatory network appears to control cell fate decisions and morphogenesis in the vertebral column, along with the prevention of premature chondrocyte differentiation, thus providing a detailed molecular view of vertebral column development.
Wang, Ronghua; Mei, Yi; Xu, Liang; Zhu, Xianwen; Wang, Yan; Guo, Jun; Liu, Liwang
Heat stress (HS) causes detrimental effects on plant morphology, physiology, and biochemistry that lead to drastic reductions in plant biomass production and economic yield worldwide. To date, little is known about the HS-responsive genes involved in the thermotolerance mechanism of radish. In this study, a total of 6600 differentially expressed genes (DEGs) between the control and Heat24 cDNA libraries of radish were identified by high-throughput sequencing. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses predominantly assigned several genes, including MAPK, DREB, ERF, AP2, GST, Hsf, and Hsp, to signal transduction, metabolic pathways, and biosynthesis and abiotic stress-responsive pathways. These pathways played significant roles in reducing stress-induced damage and enhancing heat tolerance in radish. Expression patterns of 24 candidate genes were validated by reverse-transcription quantitative PCR (RT-qPCR). Based mainly on the analysis of DEGs combined with a previous miRNA analysis, a schematic model of the HS-responsive regulatory network was proposed. To counter the effects of HS, a rapid response of the plasma membrane leads to the opening of specific calcium channels and cytoskeletal reorganization, after which HS-responsive genes are activated to repair damaged proteins and ultimately facilitate further enhancement of thermotolerance in radish. These results could provide fundamental insight into the regulatory network underlying heat tolerance in radish and facilitate further genetic manipulation of thermotolerance in root vegetable crops.
Lee, Wenqing Jean; Chatterjee, Sumantra; Yap, Sook Peng; Lim, Siew Lan; Xing, Xing; Kraus, Petra; Sun, Wenjie; Hu, Xiaoming; Sivakamasundari, V.; Chan, Hsiao Yun; Kolatkar, Prasanna R.; Prabhakar, Shyam
Embryogenesis is an intricate process involving multiple genes and pathways. Some of the key transcription factors controlling specific cell types are the Sox trio, namely Sox5, Sox6, and Sox9, which play crucial roles in organogenesis, working in a concerted manner. However, much still needs to be learned about their combinatorial roles during this process. A developmental genomics and systems biology approach can complement the reductionist methodology of current developmental biology and provide a more comprehensive and integrated view of the interrelationships of the complex regulatory networks that operate during organogenesis. By combining cell type-specific transcriptome analysis and in vivo ChIP-Seq of the Sox trio using mouse embryos, we provide evidence for the direct control of Sox5 and Sox6 by the transcriptional trio in the murine model and by Morpholino knockdown in zebrafish, and demonstrate novel roles of Tgfb2, Fbxl18, and Tle3 in the formation of Sox5-, Sox6-, and Sox9-dependent tissues. Concurrently, a complete embryonic gene regulatory network has been generated, identifying a wide repertoire of genes involved in, and controlled by, the Sox trio in the intricate process of normal embryogenesis. PMID:28630873
Extending genome-wide association analysis by including gene expression data may assist in the dissection of complex traits. We examined piebald, a pigmentation phenotype in both humans and Merino sheep, by analysing multiple data types using a systems approach. First, a case-control analysis of 49,034 ovine SNPs was performed, which confirmed a multigenic basis for the condition. We combined these results with gene expression data from five tissue types analysed with a skin-specific microarray. Promoter sequence analysis of differentially expressed genes allowed us to reverse-engineer a regulatory network. Likewise, by testing two-locus models derived from all pair-wise comparisons across piebald-associated SNPs, we generated an epistatic network. At the intersection of both networks, we identified thirteen genes, with insulin-like growth factor binding protein 7 (IGFBP7), platelet-derived growth factor receptor alpha (PDGFRA) and the tetraspanin platelet activator CD9 at the kernel of the intersection. Further, we report a number of differentially expressed genes in regions containing highly associated SNPs, including ATRN, DOCK7, FGFR1OP, GLI3, SILV and TBX15. The application of network theory facilitated co-analysis of genetic variation with gene expression, recapitulated aspects of the known molecular biology of skin pigmentation and provided insights into the transcriptional regulation and epistatic interactions involved in piebald Merino sheep.
Tong, Weida; Ostroff, Stephen; Blais, Burton; Silva, Primal; Dubuc, Martine; Healy, Marion; Slikker, William
Genomics science has played a major role in the generation of new knowledge in basic research, and the question now arises as to its potential to support regulatory processes. However, the integration of genomics into the regulatory decision-making process requires rigorous assessment and would benefit from consensus among international partners and research communities. To that end, the Global Coalition for Regulatory Science Research (GCRSR) hosted the fourth Global Summit on Regulatory Science (GSRS2014) to discuss the role of genomics in regulatory decision making, with a specific emphasis on applications in food safety and medical product development. Challenges and issues were discussed in the context of developing an international consensus on objective criteria for the analysis, interpretation and reporting of genomics data, with an emphasis on transparency, traceability and "fitness for purpose" for the intended application. It was recognized that there is a need for a global path towards establishing a regulatory bioinformatics framework for the development of transparent, reliable, reproducible and auditable processes in the management of food and medical product safety risks. It was also recognized that training is an important mechanism for achieving internationally consistent outcomes. GSRS2014 provided an effective venue for regulators and researchers to meet, discuss common issues, and develop collaborations to address the challenges posed by the application of genomics to regulatory science, with the ultimate goal of wisely integrating novel technical innovations into regulatory decision-making. Published by Elsevier Inc.
Ramesh, Archana; Trevino, Robert; VON Hoff, Daniel D; Kim, Seungchan
Gene regulatory networks (GRNs) learned from high throughput genomic data are often hard to visualize due to the large number of nodes and edges involved, rendering them difficult to appreciate. This becomes an important issue when modular structures are inherent in the inferred networks, such as in the recently proposed context-specific GRNs.(12) In this study, we investigate the application of graph clustering techniques to discern modularity in such highly complex graphs, focusing on context-specific GRNs. Identified modules are then associated with a subset of samples and the key pathways enriched in the module. Specifically, we study the use of Markov clustering and spectral clustering on cancer datasets to yield evidence on the possible association amongst different tumor types. Two sets of gene expression profiling data were analyzed to reveal context-specificity as well as modularity in genomic regulations.
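Markov clustering (MCL), one of the two techniques named in this abstract, alternates expansion and inflation until the flow matrix stops changing. Expansion raises a column-stochastic matrix to a power, simulating random-walk flow. Inflation raises entries element-wise to a power and renormalizes columns, strengthening strong flows. A minimal pure-Python sketch, assuming the GRN is treated as an undirected (symmetric) adjacency matrix with added self-loops. The parameter names are hypothetical and are not those of the `mcl` command-line tool. Spectral clustering is not shown.

```python
def normalize_columns(m):
    """Make each column sum to 1 (column-stochastic); all-zero columns stay zero."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of a symmetric adjacency matrix; returns clusters as frozensets of node indices."""
    n = len(adjacency)
    # Add self-loops (a standard MCL preprocessing step), then normalize columns.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):  # expansion: m ** expansion
            expanded = matmul(expanded, m)
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows of attractor nodes keep nonzero entries; each such row lists one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Hypothetical example graph: two disconnected triangles.
TWO_TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
```

Raising the inflation parameter generally yields finer, more numerous clusters. Real GRN-scale runs use sparse matrices and pruning rather than the dense lists shown here.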
Seo, Sang Woo; Kim, Donghyuk; O'Brien, Edward J.
... We demonstrate that GadEWX directly and coherently regulate several proton-generating/consuming enzymes with pairs of negative-feedback loops for pH homeostasis. In addition, GadEWX regulate genes with assorted functions, including molecular chaperones, acid resistance, stress response and other ... regulatory activities. These results show how GadEWX simultaneously coordinate many cellular processes to produce the overall response of E. coli to acid stress.
Myc is a master transcription factor that has been demonstrated to be required for embryonic stem cell (ESC) pluripotency, self-renewal, and inhibition of differentiation. Although recent works have identified several Myc targets in ESCs, the list of Myc binding sites is largely incomplete due to the low sensitivity and specificity of the available antibodies. To systematically identify Myc binding sites in mouse ESCs, we used a stringent streptavidin-based genome-wide chromatin immunoprecipitation followed by sequencing (ChIP-Seq) approach with biotin-tagged Myc (Bio-Myc), as well as ChIP-Seq of the Myc binding partner Max. This analysis identified 4325 Myc binding sites, of which 2885 were newly identified. The identified sites overlap with more than 85% of the Max binding sites and are enriched for H3K4me3-positive promoters and active enhancers. Remarkably, this analysis reveals that Myc/Max regulates chromatin modifiers and transcriptional regulators involved in stem cell self-renewal, linking the Myc-centered network with the Polycomb and Core networks. These results provide insights into the contribution of Myc and Max to maintaining stem cell self-renewal and keeping these cells in an undifferentiated state.
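Statements such as "overlap with more than 85% of the Max binding sites" and "2885 were newly identified" rest on interval-overlap counts between peak sets. A minimal sketch, assuming peaks are half-open (chrom, start, end) intervals and that any overlap of at least 1 bp counts. All coordinates are hypothetical. The quadratic scan is for clarity only; sorted sweeps, interval trees, or tools such as bedtools scale to genome-wide peak sets.

```python
def overlaps(a, b):
    """Half-open intervals (chrom, start, end) overlap if they share a chromosome
    and each starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapped(query, reference):
    """Fraction of `query` peaks overlapping at least one `reference` peak."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)

def novel_peaks(peaks, previous):
    """Peaks that overlap none of a previously reported set."""
    return [p for p in peaks if not any(overlaps(p, r) for r in previous)]

# Hypothetical coordinates for illustration only.
MAX_PEAKS = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr3", 0, 10)]
MYC_PEAKS = [("chr1", 150, 250), ("chr2", 140, 300), ("chr1", 600, 700)]
PREVIOUS_MYC_PEAKS = [("chr1", 120, 180), ("chr2", 200, 260)]
```

Here half of the hypothetical Max peaks are overlapped by Myc peaks, and one Myc peak is new relative to the earlier set. Note that the half-open convention means intervals that merely touch (end 600, start 600) do not count as overlapping.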
Liang, Siqi; Tippens, Nathaniel D; Zhou, Yaoda; Mort, Matthew; Stenson, Peter D; Cooper, David N; Yu, Haiyuan
The mechanistic details of most disease-causing mutations remain poorly explored within the context of regulatory networks. We present a high-resolution three-dimensional integrated regulatory network (iRegNet3D) in the form of a web tool, where we resolve the interfaces of all known transcription factor (TF)-TF, TF-DNA and chromatin-chromatin interactions for the analysis of both coding and non-coding disease-associated mutations to obtain mechanistic insights into their functional impact. Using iRegNet3D, we find that disease-associated mutations may perturb the regulatory network through diverse mechanisms including chromatin looping. iRegNet3D promises to be an indispensable tool in large-scale sequencing and disease association studies.
To achieve adaptability, all living organisms are built from regulatory networks at different levels that are capable of responding differentially to a variety of environmental inputs. The structure of regulatory networks determines their phenotypic plasticity, that is, the degree of detail and appropriateness of regulatory responses to environmental or developmental challenges. This regulatory network structure is encoded within the genotype. Our conceptual simulation study investigates how network structure constrains the evolution of networks and their adaptive abilities. The focus is on one structural parameter, network size. We show that small regulatory networks adapt quickly but, in the longer term, less well than larger networks. Selection leads to an optimal network size that depends on the heterogeneity of the environment and the time pressure of adaptation. Optimal mutation rates are higher for smaller networks. We place special emphasis on discussing our simulation results against the background of functional observations from experimental and evolutionary biology.
The basidiomycetous fungus Cryptococcus neoformans is known to be highly radiation resistant and has been found in lethally radioactive environments such as the damaged nuclear reactor at Chernobyl. To elucidate the mechanisms underlying the radiation resistance phenotype of C. neoformans, we identified genes affected by gamma radiation through genome-wide transcriptome analysis and characterized their functions. We found that genes involved in DNA damage repair systems were upregulated in response to gamma radiation. In particular, deletion of the recombinase gene RAD51 and two DNA-dependent ATPase genes, RAD54 and RDH54, increased cellular susceptibility to both gamma radiation and DNA-damaging agents. A variety of oxidative stress response genes were also upregulated. Among them, sulfiredoxin contributed to gamma radiation resistance in a peroxiredoxin/thioredoxin-independent manner. Furthermore, we found that genes involved in molecular chaperone expression, ubiquitination systems, and autophagy were induced, whereas genes involved in the biosynthesis of proteins and fatty acids/sterols were downregulated. Most importantly, we discovered a number of novel C. neoformans genes whose expression was modulated by gamma radiation exposure and whose deletion rendered cells susceptible to gamma radiation exposure as well as to DNA damage insults. Among these genes, we found that a unique transcription factor containing the basic leucine zipper domain, named Bdr1, served as a regulator of the gamma radiation resistance of C. neoformans by controlling expression of DNA repair genes, and its expression was regulated by the evolutionarily conserved DNA damage response protein kinase Rad53. Taken together, the current transcriptome and functional analyses contribute to the understanding of the unique molecular mechanism of the radiation-resistant fungus C. neoformans.
Seo, Sang Woo; Kim, Donghyuk; Szubin, Richard
Three transcription factors (TFs), OxyR, SoxR, and SoxS, play a critical role in transcriptional regulation of the defense system for oxidative stress in bacteria. However, their full genome-wide regulatory potential is unknown. Here, we perform a genome-scale reconstruction of the OxyR, SoxR, and SoxS regulons in Escherichia coli K-12 MG1655. Integrative data analysis reveals that a total of 68 genes in 51 transcription units (TUs) belong to these regulons. Among them, 48 genes showed more than 2-fold changes in expression level under single-TF-knockout conditions. This reconstruction expands...
Transcription plays a key role in cellular processes and its regulation is of paramount importance. The aim of the work described in this thesis is to study the transcription regulatory network of Saccharomyces cerevisiae, employing genome-wide approaches. All three presented research studies
Ovcharenko, I; Nobrega, M A
Synonymous gene regulation, defined as driving shared temporal and/or spatial expression of groups of genes, is likely predicated on genomic elements that contain similar modules of certain transcription factor binding sites (TFBS). We have developed a method to scan vertebrate genomes for evolutionarily conserved modules of TFBS in a predefined configuration, and created a tool, named SynoR, that identifies synonymous regulatory elements (SREs) in vertebrate genomes. SynoR performs de novo identification of SREs using known patterns of TFBS in active regulatory elements (REs) as seeds for genome scans. Layers of multiple-species conservation allow the use of differential phylogenetic sequence conservation filters in the search for SREs, and the results are displayed so as to provide extensive annotation of genes containing detected REs. Gene Ontology categories are used to further classify the identified genes functionally, and integrated GNF Expression Atlas 2 data allow cataloging of the tissue specificities of the predicted SREs. We illustrate how this new tool can be used to establish a link between human diseases and noncoding genomic content. SynoR is publicly available at http://synor.dcode.org.
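Scanning for "modules of TFBS in a predefined configuration" can be illustrated with a single regular expression that requires the motifs in a fixed order with bounded spacing. This is a simplified sketch, not SynoR's algorithm: it ignores the multi-species conservation filters, scans one strand only, and writes motifs as regular expressions rather than weight matrices. The motifs and sequence are hypothetical.

```python
import re


def module_regex(motifs, max_gap):
    """Compile a pattern matching the motifs in the given order, separated by at most
    max_gap bases. The lookahead lets overlapping modules be reported."""
    spacer = f"[ACGTN]{{0,{max_gap}}}"
    body = spacer.join(f"(?:{motif})" for motif in motifs)
    return re.compile(f"(?=({body}))")


def find_modules(sequence, motifs, max_gap):
    """Return (start, end) coordinates of every ordered TFBS module in sequence."""
    pattern = module_regex(motifs, max_gap)
    return [(m.start(), m.start() + len(m.group(1))) for m in pattern.finditer(sequence.upper())]


# Hypothetical motifs written as regular expressions ([CG] = either base).
motifs = ["CACGTG", "TGA[CG]TCA"]
seq = "AAAACACGTGAAATGACTCAGGG"
print(find_modules(seq, motifs, max_gap=5))
```

Letting the regex engine backtrack over spacer lengths avoids the mistake of greedily committing to the first occurrence of each motif, which can miss valid modules.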
Akalin, Altuna; Fredman, David; Arner, Erik; Dong, Xianjun; Bryne, Jan Christian; Suzuki, Harukazu; Daub, Carsten O; Hayashizaki, Yoshihide; Lenhard, Boris
Genomic regulatory blocks (GRBs) are chromosomal regions spanned by highly conserved non-coding elements (HCNEs), most of which serve as regulatory inputs of one target gene in the region. The target genes are most often transcription factors involved in embryonic development and differentiation. GRBs often contain extensive gene deserts, as well as additional 'bystander' genes intertwined with HCNEs but whose expression and function are unrelated to those of the target gene. The tight regulation of target genes, complex arrangement of regulatory inputs, and the differential responsiveness of genes in the region call for the examination of fundamental rules governing transcriptional activity in GRBs. Here we use extensive CAGE tag mapping of transcription start sites across different human tissues and differentiation stages combined with expression data and a number of sequence and epigenetic features to discover these rules and patterns. We show evidence that GRB target genes have properties that set them apart from their bystanders as well as other genes in the genome: longer CpG islands, a higher number and wider spacing of alternative transcription start sites, and a distinct composition of transcription factor binding sites in their core/proximal promoters. Target gene expression correlates with the acetylation state of HCNEs in the region. Additionally, target gene promoters have a distinct combination of activating and repressing histone modifications in mouse embryonic stem cell lines. GRB targets are genes with a number of unique features that are the likely cause of their ability to respond to regulatory inputs from very long distances.
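CpG islands, whose length distinguishes GRB target genes here, are usually called from the GC fraction and the observed/expected CpG ratio. The sketch below computes both for a single segment; the thresholds (length of at least 200 bp, GC above 50%, obs/exp above 0.6) are the widely used Gardiner-Garden and Frommer style criteria, assumed here rather than taken from this study. The function names are hypothetical, and measuring island length would additionally require a sliding-window segmentation.

```python
def cpg_stats(seq):
    """GC fraction and observed/expected CpG ratio of a DNA sequence.

    obs/exp = (number of CG dinucleotides * length) / (count(C) * count(G)).
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = sum(1 for i in range(n - 1) if s[i] == "C" and s[i + 1] == "G")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Assumed Gardiner-Garden and Frommer style thresholds, not from the study."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CG" * 100))              # GC fraction and obs/exp for a pure CpG repeat
print(looks_like_cpg_island("AT" * 100))  # an AT repeat has no CpGs
```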
Matsuda, Kazunari; Oki, Shinya; Iida, Hideaki; Andrabi, Munazah; Yamaguchi, Katsushi
To obtain insight into the transcription factor (TF)-dependent regulation of epiblast stem cells (EpiSCs), we performed ChIP-seq analysis of the genomic binding regions of five major TFs. Analysis of in vivo biotinylated ZIC2, OTX2, SOX2, POU5F1 and POU3F1 binding in EpiSCs identified several new features. (1) Megabase-scale genomic domains rich in ZIC2 peaks and genes alternate with those rich in POU3F1 but sparse in genes, reflecting the clustering of regulatory regions that act at short and long-range, which involve binding of ZIC2 and POU3F1, respectively. (2) The enhancers bound by ZIC2 and OTX2 prominently regulate TF genes in EpiSCs. (3) The binding sites for SOX2 and POU5F1 in mouse embryonic stem cells (ESCs) and EpiSCs are divergent, reflecting the shift in the major acting TFs from SOX2/POU5F1 in ESCs to OTX2/ZIC2 in EpiSCs. (4) This shift in the major acting TFs appears to be primed by binding of ZIC2 in ESCs at relevant genomic positions that later function as enhancers following the disengagement of SOX2/POU5F1 from major regulatory functions and subsequent binding by OTX2. These new insights into EpiSC gene regulatory networks gained from this study are highly relevant to early stage embryogenesis. PMID:28455373
Kohn, Donald B; Porteus, Matthew H; Scharenberg, Andrew M
Gene editing is a rapidly developing area of biotechnology in which the nucleotide sequence of the genome of living cells is precisely changed. The use of genome-editing technologies to modify various types of blood cells, including hematopoietic stem cells, has emerged as an important field of therapeutic development for hematopoietic disease. Although these technologies offer the potential for generation of transformative therapies for patients suffering from myriad disorders of hematopoiesis, their application for therapeutic modification of primary human cells is still in its infancy. Consequently, development of ethical and regulatory frameworks that ensure their safe and effective use is an increasingly important consideration. Here, we review a number of issues that have the potential to impact the clinical implementation of genome-editing technologies, and suggest paths forward for resolving them such that new therapies can be safely and rapidly translated to the clinic. © 2016 by The American Society of Hematology.
Salmonella enterica pathogenicity island 1 (SPI-1) encodes proteins required for invasion of gut epithelial cells. The timing of invasion is tightly controlled by a complex regulatory network. The transcription factor (TF) HilD is the master regulator of this process and senses environmental signals associated with invasion. HilD activates transcription of genes within and outside SPI-1, including six other TFs. Thus, the transcriptional program associated with host cell invasion is controlled by at least 7 TFs. However, very few of the regulatory targets are known for these TFs, and the extent of the regulatory network is unclear. In this study, we used complementary genomic approaches to map the direct regulatory targets of all 7 TFs. Our data reveal a highly complex and interconnected network that includes many previously undescribed regulatory targets. Moreover, the network extends well beyond the 7 TFs, due to the inclusion of many additional TFs and noncoding RNAs. By comparing gene expression profiles of regulatory targets for the 7 TFs, we identified many uncharacterized genes that are likely to play direct roles in invasion. We also uncovered cross talk between SPI-1 regulation and other regulatory pathways, which, in turn, identified gene clusters that likely share related functions. Our data are freely available through an intuitive online browser and represent a valuable resource for the bacterial research community.
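How far a network "extends well beyond" its master regulator can be expressed as graph reachability: every node reachable from the master regulator through regulator-to-target edges, whether directly or via intermediate TFs and noncoding RNAs. A breadth-first search sketch on an entirely hypothetical toy network (node names are placeholders, not genes from the study):

```python
from collections import deque


def downstream_targets(network, source):
    """All nodes reachable from `source` by following regulator -> target edges (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in network.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(source)
    return seen


# Hypothetical toy network: a master regulator controlling other TFs and an sRNA.
network = {
    "master": ["tf1", "tf2", "geneA"],
    "tf1": ["geneB", "srna1"],
    "tf2": ["tf3"],
    "tf3": ["geneC"],
    "srna1": ["geneD"],
}
direct = set(network["master"])
everything = downstream_targets(network, "master")
print(sorted(everything - direct))  # targets reached only through intermediate regulators
```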
Sang Woo Seo
Three transcription factors (TFs), OxyR, SoxR, and SoxS, play a critical role in transcriptional regulation of the defense system for oxidative stress in bacteria. However, their full genome-wide regulatory potential is unknown. Here, we perform a genome-scale reconstruction of the OxyR, SoxR, and SoxS regulons in Escherichia coli K-12 MG1655. Integrative data analysis reveals that a total of 68 genes in 51 transcription units (TUs) belong to these regulons. Among them, 48 genes showed more than 2-fold changes in expression level under single-TF-knockout conditions. This reconstruction expands the genome-wide roles of these factors to include direct activation of genes related to amino acid biosynthesis (methionine and aromatic amino acids), cell wall synthesis (lipid A biosynthesis and peptidoglycan growth), and divalent metal ion transport (Mn2+, Zn2+, and Mg2+). Investigating the co-regulation of these genes with other stress-response TFs reveals that they are independently regulated by stress-specific TFs.
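A "more than 2-fold change" criterion corresponds to an absolute log2 fold change greater than 1. The sketch below applies that cutoff to wild-type versus knockout expression values; the pseudocount, function name, and expression values are assumptions for illustration, and a real analysis would use replicates and a statistical test rather than a bare threshold.

```python
import math


def changed_genes(wild_type, knockout, threshold=2.0, pseudocount=1.0):
    """Genes whose expression differs more than `threshold`-fold between conditions.

    Returns {gene: log2(knockout / wild_type)} with a pseudocount to avoid log(0).
    """
    limit = math.log2(threshold)
    result = {}
    for gene, wt in wild_type.items():
        ko = knockout[gene]
        lfc = math.log2((ko + pseudocount) / (wt + pseudocount))
        if abs(lfc) > limit:
            result[gene] = lfc
    return result


# Hypothetical expression values (e.g. normalized read counts).
wt = {"g1": 99, "g2": 49, "g3": 10}
ko = {"g1": 24, "g2": 49, "g3": 31}
print(changed_genes(wt, ko))
```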
Background: Signal transduction is the major mechanism through which cells transmit external stimuli to evoke intracellular biochemical responses. Diverse cellular stimuli create a wide variety of transcription factor activities through signal transduction pathways, resulting in different gene expression patterns. Understanding the relationship between external stimuli and the corresponding cellular responses, as well as the subsequent effects on downstream genes, is a major challenge in systems biology. Thus, a systematic approach is needed to integrate experimental data and theoretical hypotheses to identify the physiological consequences of environmental stimuli. Results: We proposed a systematic approach that combines forward and reverse engineering to link the signal transduction cascade with the gene responses. To demonstrate the feasibility of our strategy, we focused on linking the NF-κB signaling pathway with the inflammatory gene regulatory responses because NF-κB has long been recognized to play a crucial role in inflammation. We first utilized forward engineering (Hybrid Functional Petri Nets) to construct the NF-κB signaling pathway and reverse engineering (Network Components Analysis) to build a gene regulatory network (GRN). Then, we demonstrated that the corresponding IKK profiles can be identified in the GRN and are consistent with the experimental validation of the IKK kinase assay. We found that the time-lapse gene expression of several cytokines and chemokines (TNF-α, IL-1, IL-6, CXCL1, CXCL2 and CCL3) is concordant with the NF-κB activity profile, and these genes have stronger influence strength within the GRN. Such regulatory effects have highlighted the crucial roles of NF-κB signaling in the acute inflammatory response and enhance our understanding of the systemic inflammatory response syndrome. Conclusion: We successfully identified and distinguished the corresponding signaling profiles among three microarray
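Network Components Analysis, used above as the reverse-engineering step, factors an expression matrix E (genes x samples) into A @ P, where A holds connection strengths whose zero pattern is fixed by known TF-gene connectivity and P holds hidden TF activity profiles. The sketch below is a simplified alternating least-squares version under that constraint; it omits the identifiability checks and normalization that full NCA requires, and the function name nca_als and the synthetic data are hypothetical.

```python
import numpy as np


def nca_als(expression, connectivity, n_iter=100, seed=0):
    """Factor expression (genes x samples) as A @ P, keeping A zero wherever
    connectivity (genes x TFs, boolean) has no edge. Returns A, P and the
    reconstruction error after each iteration."""
    E = np.asarray(expression, dtype=float)
    mask = np.asarray(connectivity, dtype=bool)
    rng = np.random.default_rng(seed)
    A = rng.normal(size=mask.shape) * mask  # random start respecting the topology
    errors = []
    for _ in range(n_iter):
        # Step 1: best TF activity profiles P for the current connection strengths A.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Step 2: refit each gene's allowed connection strengths for the current P.
        for i in range(mask.shape[0]):
            cols = np.flatnonzero(mask[i])
            if cols.size:
                A[i, cols] = np.linalg.lstsq(P[cols].T, E[i], rcond=None)[0]
        errors.append(float(np.linalg.norm(E - A @ P)))
    return A, P, errors


# Synthetic check: data generated from a known sparse A and random P.
rng = np.random.default_rng(1)
mask = rng.random((20, 3)) < 0.5
E = (rng.normal(size=(20, 3)) * mask) @ rng.normal(size=(3, 8))
A, P, errors = nca_als(E, mask)
print(errors[0], errors[-1])
```

Because each step solves its least-squares subproblem exactly while the other factor is held fixed, the reconstruction error never increases from one iteration to the next; convergence to the true factors is not guaranteed without the NCA identifiability conditions.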
Thomas, P.; Durek, P.; Solt, I.; Klinger, B.; Witzel, F.; Schulthess, P.; Mayer, Y.; Tikk, D.; Blüthgen, N.; Leser, U.
Motivation: A highly interlinked network of transcription factors (TFs) orchestrates the context-dependent expression of human genes. ChIP-chip experiments that interrogate the binding of particular TFs to genomic regions are used to reconstruct gene regulatory networks at genome-scale, but are
Matsuda, Kazunari; Mikami, Tomoyuki; Oki, Shinya; Iida, Hideaki; Andrabi, Munazah; Boss, Jeremy M; Yamaguchi, Katsushi; Shigenobu, Shuji; Kondoh, Hisato
To obtain insight into the transcription factor (TF)-dependent regulation of epiblast stem cells (EpiSCs), we performed ChIP-seq analysis of the genomic binding regions of five major TFs. Analysis of in vivo biotinylated ZIC2, OTX2, SOX2, POU5F1 and POU3F1 binding in EpiSCs identified several new features. (1) Megabase-scale genomic domains rich in ZIC2 peaks and genes alternate with those rich in POU3F1 but sparse in genes, reflecting the clustering of regulatory regions that act at short and long-range, which involve binding of ZIC2 and POU3F1, respectively. (2) The enhancers bound by ZIC2 and OTX2 prominently regulate TF genes in EpiSCs. (3) The binding sites for SOX2 and POU5F1 in mouse embryonic stem cells (ESCs) and EpiSCs are divergent, reflecting the shift in the major acting TFs from SOX2/POU5F1 in ESCs to OTX2/ZIC2 in EpiSCs. (4) This shift in the major acting TFs appears to be primed by binding of ZIC2 in ESCs at relevant genomic positions that later function as enhancers following the disengagement of SOX2/POU5F1 from major regulatory functions and subsequent binding by OTX2. These new insights into EpiSC gene regulatory networks gained from this study are highly relevant to early stage embryogenesis. © 2017. Published by The Company of Biologists Ltd.
Aalt D J van Dijk
Mutational robustness of gene regulatory networks refers to their ability to generate constant biological output upon mutations that change network structure. Such networks contain regulatory interactions (transcription factor-target gene interactions) but often also protein-protein interactions between transcription factors. Using computational modeling, we study factors that influence robustness and infer several network properties governing it. These include the type of mutation, i.e. whether a regulatory interaction or a protein-protein interaction is mutated, and, in the case of mutation of a regulatory interaction, the sign of the interaction (activating vs. repressive). In addition, we analyze the effect of combinations of mutations, and we compare networks containing monomeric transcription factors with those containing dimeric ones. Our results are consistent with available data on biological networks, for example data based on evolutionary conservation of network features. As a novel and remarkable property, we predict that networks are more robust against mutations in monomeric than in dimeric transcription factors, a prediction for which analysis of the conservation of DNA-binding residues in monomeric vs. dimeric transcription factors provides indirect evidence.
Garg, Abhishek; Mohanram, Kartik; De Micheli, Giovanni; Xenarios, Ioannis
Advancements in high-throughput technologies to measure increasingly complex biological phenomena at the genomic level are rapidly changing the face of biological research from the single-gene single-protein experimental approach to studying the behavior of a gene in the context of the entire genome (and proteome). This shift in research methodologies has resulted in a new field of network biology that deals with modeling cellular behavior in terms of network structures such as signaling pathways and gene regulatory networks. In these networks, different biological entities such as genes, proteins, and metabolites interact with each other, giving rise to a dynamical system. Even though there exists a mature field of dynamical systems theory to model such network structures, some technical challenges are unique to biology, such as the inability to measure precise kinetic information on gene-gene or gene-protein interactions and the need to model increasingly large networks comprising thousands of nodes. These challenges have renewed interest in developing new computational techniques for modeling complex biological systems. This chapter presents a modeling framework based on Boolean algebra and finite-state machines that is reminiscent of the approach used for digital circuit synthesis and simulation in the field of very-large-scale integration (VLSI). The proposed formalism provides a common mathematical framework for developing computational techniques that model different aspects of regulatory networks, such as steady-state behavior, stochasticity, and gene perturbation experiments.
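A synchronous Boolean network is itself a finite-state machine: the state is the tuple of gene values and each update maps one state to the next, so steady-state behavior corresponds to the attractors (fixed points and cycles) of that map. The exhaustive enumeration below visits all 2^n states and is therefore only a sketch for small networks; it is not the chapter's framework, and the rule encoding (one Python function per gene) is an assumption.

```python
from itertools import product


def attractors(update_rules, n_genes):
    """Enumerate attractors of a synchronous Boolean network.

    update_rules: one function per gene mapping the full state tuple to 0/1.
    Each attractor is returned as a tuple of states, rotated to start at its
    smallest state so that the same cycle reached from different starts compares equal.
    """
    def step(state):
        return tuple(int(rule(state)) for rule in update_rules)

    found = set()
    for start in product((0, 1), repeat=n_genes):
        visited = set()
        state = start
        while state not in visited:  # iterate until some state repeats
            visited.add(state)
            state = step(state)
        cycle = [state]               # the first repeated state lies on the attractor
        nxt = step(state)
        while nxt != state:
            cycle.append(nxt)
            nxt = step(nxt)
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found


# Usage: a mutual-repression toggle switch, x0' = NOT x1, x1' = NOT x0.
rules = [lambda s: not s[1], lambda s: not s[0]]
print(attractors(rules, 2))
```

Under synchronous updating the toggle switch has two fixed points (the bistable states) plus a two-state cycle, which is a known artifact of synchronous update schemes.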
With the rapid development of high-throughput techniques and the accumulation of large transcriptomic datasets, many computational methods and algorithms, such as differential analysis and network analysis, have been proposed to explore genome-wide gene expression characteristics. These efforts aim to transform underlying genomic information into valuable knowledge in biological and medical research. Recently, a large number of integrative research methods have been devoted to interpreting the development and progression of neoplastic diseases, and differential regulatory analysis (DRA) based on gene coexpression networks (GCNs) increasingly serves as a robust complement to conventional differential expression analysis in revealing the regulatory functions of cancer-related genes, such as those involved in evading growth suppressors and resisting cell death. Differential regulatory analysis based on GCNs is promising and plays an essential role in discovering the system-level properties of carcinogenesis. Here we briefly review the paradigm of differential regulatory analysis based on GCNs. We also focus on the applications of this analysis in cancer research and argue that DRA is essential for revealing the underlying molecular mechanisms in large-scale carcinogenesis studies.
Christian L Barrett
The number of complete, publicly available genome sequences is now greater than 200, and this number is expected to grow rapidly in the near future as metagenomic and environmental sequencing efforts escalate and the cost of sequencing drops. To use these data for understanding particular organisms and for discerning general principles about how organisms function, it will be necessary to reconstruct their various biochemical reaction networks. Principal among these will be transcriptional regulatory networks. Given the physical and logical complexity of these networks, the various sources of (often noisy) data that can be used for their elucidation, the monetary costs involved, and the huge number of potential experiments (approximately 10^12) that can be performed, experiment design algorithms will be necessary for synthesizing the various computational and experimental data to maximize the efficiency of regulatory network reconstruction. This paper presents an algorithm for experimental design to systematically and efficiently reconstruct transcriptional regulatory networks. It is meant to be applied iteratively in conjunction with an experimental laboratory component. The algorithm is presented here in the context of reconstructing transcriptional regulation for metabolism in Escherichia coli, and, through a retrospective analysis with previously performed experiments, we show that the produced experiment designs conform to how a human would design experiments. The algorithm is able to use probability estimates based on a wide range of computational and experimental sources to suggest experiments with the highest potential of discovering the greatest amount of new regulatory knowledge.
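The idea of ranking experiments by their "potential of discovering the greatest amount of new regulatory knowledge" can be illustrated with an information-theoretic score: the total binary entropy of the candidate interactions an experiment would resolve. This is a swapped-in heuristic, not the paper's algorithm; it assumes interactions are independent and that an experiment fully resolves each interaction it tests. All names and probabilities are hypothetical.

```python
import math


def binary_entropy(p):
    """Uncertainty (bits) about an interaction that exists with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_experiments(candidates, prior):
    """Rank experiments by the total uncertainty of the interactions each would test.

    candidates: experiment name -> list of interaction ids it would resolve.
    prior: interaction id -> current probability that the interaction exists.
    """
    scores = {name: sum(binary_entropy(prior[i]) for i in tested)
              for name, tested in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical probabilities combined from computational and experimental evidence.
prior = {"a": 0.5, "b": 0.5, "c": 0.95, "d": 0.01}
candidates = {"exp1": ["a", "b"], "exp2": ["c", "d"], "exp3": ["a"]}
order, scores = rank_experiments(candidates, prior)
print(order)
```

Interactions with probabilities near 0.5 contribute the most, so experiments targeting the most uncertain hypotheses rank first, while experiments that would mostly confirm near-certain interactions rank last.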
Bojanovic, Klara; Long, Katherine
chemicals and has the potential to be used as an efficient cell factory for various products. P. putida KT2440 is a genome-sequenced strain and a well-characterized pseudomonad. Our major aim is to identify small RNA molecules (sRNAs) and their regulatory networks. A previous study identified 37 sRNAs ... in this strain, whereas in other pseudomonads many more sRNAs have been found so far. P. putida KT2440 has been grown in different conditions that are likely to be encountered in industrial fermentations, with the aim of using sRNAs to generate improved cell factories. For this purpose, cells were grown in LB ... and harvested in different growth phases, as well as under osmotic, membrane and oxidative stress conditions. RNA sequencing data have been analysed with the open-source software system Rockhopper, which revealed over 180 putative sRNAs. Most of them (86%) appear to be novel and uncharacterized. The majority...
Cussat-Blanc, Sylvain; Harrington, Kyle; Pollack, Jordan
Artificial gene regulatory networks (GRNs) are biologically inspired dynamical systems used to control various kinds of agents, from the cells in developmental models to embodied robot swarms. Most recent work uses a genetic algorithm (GA) or an evolution strategy to optimize the network for a specific task. However, the empirical performance of these algorithms is unsatisfactory. This paper presents an algorithm that primarily exploits a network distance metric, which allows genet...
Baran, Nicole M.; McGrath, Patrick T; Streelman, J Todd
Animal behavior is ultimately the product of gene regulatory networks (GRNs) for brain development and neural networks for brain function. The GRN approach has advanced the fields of genomics and development, and we identify organizational similarities between networks of genes that build the brain and networks of neurons that encode brain function. In this perspective, we engage the analogy between developmental networks and neural networks, exploring the advantages of using GRN logic to stu...
Fioravanti, Fabio; Helmer-Citterich, Manuela; Nardelli, Enrico
Gene regulatory networks are widely used by biologists to describe the interactions among genes, proteins and other components at the intracellular level. Recently, a great deal of effort has been devoted to giving gene regulatory networks a formal semantics based on existing computational frameworks. For this purpose, we consider Statecharts, which are a modular, hierarchical and executable formal model widely used to represent software systems. We use Statecharts to model small, recurring patterns of interactions in gene regulatory networks, called motifs. We present an improved method for modeling gene regulatory network motifs using Statecharts, and we describe the successful modeling of several motifs, including those that could not be modeled, or whose models could not be distinguished, using the method of a previous proposal. We model motifs in an easy and intuitive way by taking advantage of the visual features of Statecharts. Our modeling approach is able to simulate some interesting temporal properties of gene regulatory network motifs: the delay in the activation and the deactivation of the "output" gene in the coherent type-1 feedforward loop, the pulse in the incoherent type-1 feedforward loop, the bistable nature of double positive and double negative feedback loops, the oscillatory behavior of the negative feedback loop, and the "lock-in" effect of positive autoregulation. We present a Statecharts-based approach for the modeling of gene regulatory network motifs in biological systems. The basic motifs used to build more complex networks (that is, simple regulation, reciprocal regulation, feedback loop, feedforward loop, and autoregulation) can be faithfully described and their temporal dynamics can be analyzed.
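The delays attributed above to the coherent type-1 feedforward loop (X activates Y, and X and Y together control Z) can be reproduced with a plain synchronous Boolean simulation instead of Statecharts. In this swapped-in sketch, Y and Z are updated from the previous step's values; with AND logic Z switches on one step after X but off immediately, and with OR logic the reverse holds. The function name and the one-step delay are modeling assumptions.

```python
def simulate_c1ffl(signal, logic="and"):
    """Synchronous Boolean simulation of a coherent type-1 FFL: X -> Y, (X, Y) -> Z.

    signal: sequence of 0/1 values of the input X over time.
    Returns the trajectory of the output Z.
    """
    if logic not in ("and", "or"):
        raise ValueError("logic must be 'and' or 'or'")
    gate = (lambda a, b: a and b) if logic == "and" else (lambda a, b: a or b)
    y = 0
    z_trace = []
    for x in signal:
        # The right-hand side uses the previous-step Y, so Y lags X by one step.
        y, z = x, int(gate(x, y))
        z_trace.append(z)
    return z_trace


pulse = [0, 0, 1, 1, 1, 0, 0]
print(simulate_c1ffl(pulse, "and"))  # ON is delayed, OFF is immediate
print(simulate_c1ffl(pulse, "or"))   # ON is immediate, OFF is delayed
```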
Weaver, D.C.; Workman, Christopher; Stormo, Gary D.
Systematic gene expression analyses provide comprehensive information about the transcriptional response to different environmental and developmental conditions. With enough gene expression data points, computational biologists may eventually generate predictive computer models of transcription ... regulation. Such models will require computational methodologies that are consistent with the behavior of known biological systems and that remain tractable. We represent regulatory relationships between genes as linear coefficients or weights, with the "net" regulation influence on a gene's expression being ... the mathematical summation of the independent regulatory inputs. Test regulatory networks generated with this approach display stable and cyclically stable gene expression levels, consistent with known biological systems. We include variables to model the effect of environmental conditions on transcription regulation...
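The weight representation described above can be sketched as a discrete-time update in which each gene's net input is the weighted sum of all regulator levels. The logistic squashing function that keeps levels in (0, 1) and the bias vector standing in for environmental conditions are assumptions of this sketch, not details confirmed by the abstract; the function name is hypothetical.

```python
import numpy as np


def simulate(weights, x0, steps, bias=None):
    """Iterate x(t+1) = sigmoid(W @ x(t) + b).

    weights[i][j] is the (assumed linear) influence of gene j on gene i;
    bias stands in for environmental inputs (an assumption of this sketch).
    Returns the trajectory as an array of shape (steps + 1, n_genes).
    """
    W = np.asarray(weights, dtype=float)
    b = np.zeros(W.shape[0]) if bias is None else np.asarray(bias, dtype=float)
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(steps):
        net = W @ x + b                   # summation of independent regulatory inputs
        x = 1.0 / (1.0 + np.exp(-net))    # squash into an expression level in (0, 1)
        trajectory.append(x)
    return np.array(trajectory)


# Usage: gene 1 strongly represses gene 0; gene 1 has no inputs.
traj = simulate([[0.0, -5.0], [0.0, 0.0]], x0=[0.0, 1.0], steps=1)
print(traj[1])
```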
Rodionov, Dmitry A [Sanford-Burnham Medical Research Institute]; Novichkov, Pavel S [Lawrence Berkeley National Laboratory]
The goal of this project was to develop an integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done at the Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) the RegPredict web platform for TRN inference and regulon reconstruction in microbial genomes, and (2) the RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components of the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated in RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were: develop an integrated platform for genome-scale regulon reconstruction; infer regulatory annotations in several groups of bacteria and build reference collections of microbial regulons; and develop a KnowledgeBase on microbial transcriptional regulation.
Kalender Atak, Zeynep; Imrichova, Hana; Svetlichnyy, Dmitry; Hulselmans, Gert; Christiaens, Valerie; Reumers, Joke; Ceulemans, Hugo; Aerts, Stein
The identification of functional non-coding mutations is a key challenge in the field of genomics. Here we introduce μ-cisTarget to filter, annotate and prioritize cis-regulatory mutations based on their putative effect on the underlying "personal" gene regulatory network. We validated μ-cisTarget by re-analyzing the TAL1 and LMO1 enhancer mutations in T-ALL, and the TERT promoter mutation in melanoma. Next, we re-sequenced the full genomes of ten cancer cell lines and used matched transcriptome data and motif discovery to identify master regulators with de novo binding sites that result in the up-regulation of nearby oncogenic drivers. μ-cisTarget is available from http://mucistarget.aertslab.org.
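A de novo binding site created by a non-coding mutation can be detected, in the simplest case, by comparing the best motif score in a sequence window before and after the substitution. The sketch below uses a generic log-odds position weight matrix; it is not μ-cisTarget's scoring method, scans the forward strand only, and the motif counts, sequence, and function names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (list of dicts) into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def best_score(matrix, seq):
    """Highest motif score over all windows of seq (forward strand only)."""
    w = len(matrix)
    return max(sum(matrix[j][seq[i + j]] for j in range(w)) for i in range(len(seq) - w + 1))


def variant_effect(matrix, ref_seq, pos, alt):
    """Change in best motif score caused by substituting `alt` at `pos`."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_score(matrix, alt_seq) - best_score(matrix, ref_seq)


# Hypothetical GATA-like motif built from toy counts.
pwm = log_odds([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(variant_effect(pwm, "CCCGCTACCC", 4, "A"))  # positive: C>A creates GATA
```

A positive difference flags a candidate gained site; in practice one would also scan the reverse complement and calibrate scores against a background distribution.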
Needham, Chris J; Manfield, Iain W; Bulpitt, Andrew J; Gilmartin, Philip M; Westhead, David R
The elucidation of networks from a compendium of gene expression data is one of the goals of systems biology and can be a valuable source of new hypotheses for experimental researchers. For Arabidopsis, there exist several thousand microarrays which form a valuable resource from which to learn. A novel Bayesian network-based algorithm to infer gene regulatory networks from gene expression data is introduced and applied to learn parts of the transcriptomic network in Arabidopsis thaliana from a large number (thousands) of separate microarray experiments. Starting from an initial set of genes of interest, a network is grown by iterative addition to the model of the gene, from another defined set of genes, which gives the 'best' learned network structure. The gene set for iterative growth can be as large as the entire genome. A number of networks are inferred and analysed; these show (i) an agreement with the current literature on the circadian clock network, (ii) the ability to model other networks, and (iii) that the learned network hypotheses can suggest new roles for poorly characterized genes, through addition of relevant genes from an unconstrained list of over 15,000 possible genes. To demonstrate the latter point, the method is used to suggest that particular GATA transcription factors are regulators of photosynthetic genes. Additionally, the performance in recovering a known network from different amounts of synthetically generated data is evaluated. Our results show that plausible regulatory networks can be learned from such gene expression data alone. This work demonstrates that network hypotheses can be generated from existing gene expression data for use by experimental biologists.
Schober, Steffen; Kracht, David; Heckel, Reinhard; Bossert, Martin
Boolean models of regulatory networks are assumed to be tolerant to perturbations. Qualitatively, this implies that each function can only depend on a few nodes. Biologically motivated constraints further show that functions found in Boolean regulatory networks belong to certain classes of functions, for example, the unate functions. It turns out that these classes have specific properties in the Fourier domain. That motivates us to study the problem of detecting controlling nodes in classes of Boolean networks using spectral techniques. We consider networks with unbalanced functions and functions of an average sensitivity less than (2/3)k, where k is the number of controlling variables for a function. Further, we consider the class of 1-low networks, which include unate networks, linear threshold networks, and networks with nested canalyzing functions. We show that the application of spectral learning algorithms leads to both better time and sample complexity for the detection of controlling nodes compared with algorithms based on exhaustive search. For a particular algorithm, we state analytical upper bounds on the number of samples needed to find the controlling nodes of the Boolean functions. Further, improved algorithms for detecting controlling nodes in large-scale unate networks are given and numerically studied.
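First-order Fourier coefficients give a simple illustration of the spectral idea. For a Boolean function f on {-1,+1}^n, the coefficient for input i is E[f(x)·x_i] under uniformly random inputs. For unate functions this coefficient is nonzero exactly when input i is relevant. A controlling node can therefore be detected by estimating these averages from random samples, without an exhaustive search over input subsets. The sketch below assumes uniformly distributed samples and a hand-picked threshold. It is a generic illustration, not the specific algorithms or sample bounds of the paper.

```python
import random

def estimate_first_order_coefficients(f, n, num_samples, rng):
    """Monte Carlo estimate of f_hat(i) = E[f(x) * x_i], x uniform on {-1,+1}^n."""
    sums = [0.0] * n
    for _ in range(num_samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        y = f(x)
        for i in range(n):
            sums[i] += y * x[i]
    return [s / num_samples for s in sums]

def detect_controlling_nodes(f, n, num_samples=2000, threshold=0.1, seed=0):
    """Indices whose estimated first-order coefficient exceeds the (assumed) threshold."""
    rng = random.Random(seed)
    coeffs = estimate_first_order_coefficients(f, n, num_samples, rng)
    return [i for i, c in enumerate(coeffs) if abs(c) > threshold]
```

For a majority vote over inputs 0, 2 and 4 of an 8-input function, each relevant coefficient is 1/2 and the sketch returns `[0, 2, 4]`. A parity function such as x0·x1 has all first-order coefficients equal to zero, which is why such tests are tied to restricted function classes like the unate functions.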
Elkon, Ran; Agami, Reuven
Genetic variants associated with common diseases are usually located in noncoding parts of the human genome. Delineation of the full repertoire of functional noncoding elements, together with efficient methods for probing their biological roles, is therefore of crucial importance. Over the past decade, DNA accessibility and various epigenetic modifications have been associated with regulatory functions. Mapping these features across the genome has enabled researchers to begin to document the full complement of putative regulatory elements. High-throughput reporter assays to probe the functions of regulatory regions have also been developed but these methods separate putative regulatory elements from the chromosome so that any effects of chromatin context and long-range regulatory interactions are lost. Definitive assignment of function(s) to putative cis-regulatory elements requires perturbation of these elements. Genome-editing technologies are now transforming our ability to perturb regulatory elements across entire genomes. Interpretation of high-throughput genetic screens that incorporate genome editors might enable the construction of an unbiased map of functional noncoding elements in the human genome.
Wang, Chen; Xuan, Jianhua; Shih, Ie-Ming; Clarke, Robert; Wang, Yue
With the advent of high-throughput biotechnology capable of monitoring genomic signals, it becomes increasingly promising to understand molecular cellular mechanisms through systems biology approaches. One of the active research topics in systems biology is to infer gene transcriptional regulatory networks using various genomic data; this inference problem can be formulated as a linear model with latent signals associated with some regulatory proteins called transcription factors (TFs). As common statistical assumptions may not hold for genomic signals, typical latent variable algorithms such as independent component analysis (ICA) are incapable of revealing the true underlying regulatory signals. Liao et al. proposed to perform inference using an approach named network component analysis (NCA), the optimization of which is achieved by a least-squares fitting approach with biological knowledge constraints. However, the incompleteness of biological knowledge and its inconsistency with gene expression data are not considered in the original NCA solution, which could greatly affect the inference accuracy. To overcome these limitations, we propose a linear extraction scheme, namely regulatory component analysis (RCA), to infer underlying regulatory signals even with partial biological knowledge. Numerical simulations show a significant improvement of our proposed RCA over NCA, not only when the signal-to-noise ratio (SNR) is low, but also when the given biological knowledge is incomplete and inconsistent with gene expression data. Furthermore, real biological experiments on E. coli are performed for regulatory network inference in comparison with several typical linear latent variable methods, which again demonstrate the effectiveness and improved performance of the proposed algorithm.
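NCA factorizes an expression matrix E (genes × samples) as E ≈ AP. The zero pattern of the connectivity matrix A is fixed by known TF-gene links, and P holds the latent TF activities. Below is a minimal alternating least-squares sketch of that constrained fit. It assumes a boolean `mask` marking the allowed links and omits NCA's identifiability checks and normalization (A and P are only determined up to scaling). It does not implement the RCA extension for partial or inconsistent knowledge.

```python
import numpy as np

def nca_als(E, mask, n_iter=100, seed=0):
    """Fit E ~ A @ P with A restricted to the nonzero pattern of `mask`."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(mask.shape) * mask  # random start on allowed links only
    P = np.zeros((mask.shape[1], E.shape[1]))
    for _ in range(n_iter):
        # Given connectivity strengths A, solve for TF activities P.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Given P, refit each gene's row using only its allowed regulators.
        for i in range(E.shape[0]):
            idx = np.flatnonzero(mask[i])
            if idx.size:
                A[i, idx] = np.linalg.lstsq(P[idx].T, E[i], rcond=None)[0]
    return A, P

def relative_residual(E, A, P):
    """Frobenius-norm misfit of the factorization, relative to ||E||."""
    return np.linalg.norm(E - A @ P) / np.linalg.norm(E)
```

Each half-step is an exact least-squares solve, so the residual cannot increase from one iteration to the next. Entries of A outside the mask stay at zero throughout.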
Liu, Fei; Zhang, Shao-Wu; Guo, Wei-Feng; Wei, Ze-Gang; Chen, Luonan
The inference of gene regulatory networks (GRNs) from expression data can mine the direct regulations among genes and gain deep insights into biological processes at a network level. During past decades, numerous computational approaches have been introduced for inferring GRNs. However, many of them still suffer from various problems, e.g., Bayesian network (BN) methods cannot handle large-scale networks due to their high computational complexity, while information theory-based methods cannot identify the directions of regulatory interactions and also suffer from false positive/negative problems. To overcome these limitations, in this work we present a novel algorithm, namely local Bayesian network (LBN), to infer GRNs from gene expression data by using a network decomposition strategy and a false-positive edge elimination scheme. Specifically, the LBN algorithm first uses conditional mutual information (CMI) to construct an initial network or GRN, which is decomposed into a number of local networks or GRNs. Then, the BN method is employed to generate a series of local BNs by selecting the k-nearest neighbors of each gene as its candidate regulatory genes, which significantly reduces the exponential search space of all possible GRN structures. Integrating these local BNs forms a tentative network or GRN by performing CMI, which reduces redundant regulations in the GRN and thus alleviates the false positive problem. The final network or GRN can be obtained by iteratively performing CMI and local BN on the tentative network. In the iterative process, the false or redundant regulations are gradually removed. When tested on the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in E. coli, our results suggest that LBN significantly outperforms other state-of-the-art methods (ARACNE, GENIE3 and NARROMI), with more accurate and robust performance. In particular, the decomposition strategy with local Bayesian networks not only effectively reduce
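Under a Gaussian assumption, the conditional mutual information used to prune edges has a closed form in terms of covariance determinants: CMI(X;Y|Z) = ½ log(|Σ_XZ|·|Σ_YZ| / (|Σ_Z|·|Σ_XYZ|)). The sketch below assumes jointly Gaussian expression data. The abstract does not state whether LBN uses exactly this estimator, and the function names are illustrative. An edge X-Y whose CMI given a neighbour Z is near zero is a candidate for removal as an indirect regulation.

```python
import numpy as np

def _logdet_cov(data, cols):
    """Log-determinant of the sample covariance of the given columns (0 for none)."""
    if not cols:
        return 0.0
    cov = np.atleast_2d(np.cov(data[:, cols], rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def gaussian_cmi(data, x, y, z=()):
    """CMI(X;Y|Z) in nats for column x, column y and conditioning columns z."""
    z = list(z)
    return 0.5 * (_logdet_cov(data, [x] + z) + _logdet_cov(data, [y] + z)
                  - _logdet_cov(data, z) - _logdet_cov(data, [x, y] + z))
```

With an empty `z` this reduces to the ordinary Gaussian mutual information, -½ log(1 - r²). In a chain X → Z → Y, the value for X and Y is clearly positive, but it drops to about zero once Z is conditioned on.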
Chu, Brian K; Tse, Margaret J; Sato, Royce R; Read, Elizabeth L
Gene regulatory networks with dynamics characterized by multiple stable states underlie cell fate decisions. Quantitative models that can link molecular-level knowledge of gene regulation to a global understanding of network dynamics have the potential to guide cell-reprogramming strategies. Networks are often modeled by the stochastic Chemical Master Equation, but methods for systematic identification of key properties of the global dynamics are currently lacking. The Markov State Model approach applied here identifies the number, phenotypes, and lifetimes of long-lived states for a set of common gene regulatory network models. Application of transition path theory to the constructed Markov State Model decomposes global dynamics into a set of dominant transition paths and associated relative probabilities for stochastic state-switching. In this proof-of-concept study, we found that the Markov State Model provides a general framework for analyzing and visualizing stochastic multistability and state transitions in gene networks. Our results suggest that this framework, adopted from the field of atomistic Molecular Dynamics, can be a useful tool for quantitative Systems Biology at the network scale.
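In transition path theory, the forward committor q_i of a Markov state model is the probability that the chain, started in state i, reaches target set B before set A. It satisfies q_i = Σ_j T_ij q_j for states outside A and B, with q = 0 on A and q = 1 on B. The sketch below computes the committor and the stationary distribution for a small transition matrix, here a hypothetical 5-state chain standing in for a coarse-grained gene network. Building the transition matrix from Chemical Master Equation simulations is not shown.

```python
import numpy as np

def forward_committor(T, A, B):
    """Probability of reaching set B before set A from each state of chain T."""
    n = T.shape[0]
    M = np.eye(n) - T          # interior rows: q_i - sum_j T_ij q_j = 0
    rhs = np.zeros(n)
    for i in set(A) | set(B):
        M[i] = 0.0
        M[i, i] = 1.0          # boundary rows fix q_i directly
        rhs[i] = 1.0 if i in B else 0.0
    return np.linalg.solve(M, rhs)

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalised to sum to one."""
    evals, evecs = np.linalg.eig(T.T)
    k = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, k])
    return pi / pi.sum()
```

Consider a symmetric random walk on five states (stay with probability 0.5, step left or right with 0.25 each, reflecting at the ends). With A = {0} and B = {4}, the committor is linear, [0, 0.25, 0.5, 0.75, 1], and the stationary distribution is uniform.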
Background The ambition of most molecular biologists is the understanding of the intricate network of molecular interactions that control biological systems. As scientists uncover the components and the connectivity of these networks, it becomes possible to study their dynamical behavior as a whole and discover the specific role of each of their components. Since the behavior of a network is by no means intuitive, it becomes necessary to use computational models to understand its behavior and to be able to make predictions about it. Unfortunately, most current computational models describe small networks due to the scarcity of kinetic data available. To overcome this problem, we previously published a methodology to convert a signaling network into a dynamical system, even in the total absence of kinetic information. In this paper we present a software implementation of such methodology. Results We developed SQUAD, a software for the dynamic simulation of signaling networks using the standardized qualitative dynamical systems approach. SQUAD converts the network into a discrete dynamical system, and it uses a binary decision diagram algorithm to identify all the steady states of the system. Then, the software creates a continuous dynamical system and localizes its steady states, which lie near the steady states of the discrete system. The software permits simulations on the continuous system, allowing for the modification of several parameters. Importantly, SQUAD includes a framework for perturbing networks in a manner similar to what is performed in experimental laboratory protocols, for example by activating receptors or knocking out molecular components. Using this software we have been able to successfully reproduce the behavior of the regulatory network implicated in T-helper cell differentiation. Conclusion The simulation of regulatory networks aims at predicting the behavior of a whole system when subject
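The first step in SQUAD is to find the steady states of the discrete (Boolean) network. On a small example this can be illustrated by brute force. The sketch below enumerates all 2^n states, which is only feasible for a handful of nodes (SQUAD uses binary decision diagrams for this step). The two-node mutual-inhibition network and the knockout override are hypothetical and are not the T-helper model.

```python
from itertools import product

def boolean_steady_states(rules, nodes):
    """Return every state in which each node's update rule reproduces its value.

    rules maps a node name to a function of the full state (a dict of 0/1 values).
    Fixed points are the same under synchronous and asynchronous updating.
    """
    steady = []
    for values in product((0, 1), repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(rules[n](state) == state[n] for n in nodes):
            steady.append(state)
    return steady

# Hypothetical two-node network: X and Y repress each other (bistable switch).
rules = {"X": lambda s: int(not s["Y"]), "Y": lambda s: int(not s["X"])}
# In silico knockout: X is clamped off, mimicking a perturbation experiment.
knockout = dict(rules, X=lambda s: 0)
```

The intact switch has two steady states, {X: 0, Y: 1} and {X: 1, Y: 0}. The knockout leaves only {X: 0, Y: 1}.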
Xiong, Hao; Choe, Yoonsuck
Reverse engineering of genetic regulatory networks from experimental data is the first step toward the modeling of genetic networks. Linear state-space models, also known as linear dynamical models, have been applied to model genetic networks from gene expression time series data, but existing works have not taken into account available structural information. Without structural constraints, estimated models may contradict biological knowledge and estimation methods may over-fit. In this report, we extended expectation-maximization (EM) algorithms to incorporate prior network structure and to estimate genetic regulatory networks that can track and predict gene expression profiles. We applied our method to synthetic data and to SOS data and showed that our method significantly outperforms the regular EM without structural constraints. The Matlab code is available upon request and the SOS data can be downloaded from http://www.weizmann.ac.il/mcb/UriAlon/Papers/SOSData/, courtesy of Uri Alon. Zak's data is available from his website, http://www.che.udel.edu/systems/people/zak.
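A linear dynamical model writes x_{t+1} = A x_t + noise. Prior network structure can be imposed by fixing to zero every entry of A outside the known links. The sketch below assumes the expression states are fully observed, so each row of A reduces to a least-squares fit over its allowed regulators. It is a simplified stand-in for the structurally constrained EM algorithm described above, which also handles hidden states, not an implementation of it. The `mask` convention is an assumption.

```python
import numpy as np

def fit_masked_transition(X, mask):
    """Least-squares estimate of A in x[t+1] = A @ x[t] under a structural mask.

    X: (time points, genes) array; mask[i, j] is True if gene j may regulate gene i.
    Entries outside the mask are held at zero.
    """
    past, future = X[:-1], X[1:]
    n = X.shape[1]
    A = np.zeros((n, n))
    for i in range(n):
        idx = np.flatnonzero(mask[i])
        if idx.size:
            A[i, idx] = np.linalg.lstsq(past[:, idx], future[:, i], rcond=None)[0]
    return A
```

Restricting each row to its allowed regulators both enforces the prior structure and reduces the number of parameters, which is one way structural knowledge limits over-fitting.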
Brant K Peterson
The identification of regulatory sequences in animal genomes remains a significant challenge. Comparative genomic methods that use patterns of evolutionary conservation to identify non-coding sequences with regulatory function have yielded many new vertebrate enhancers. However, these methods have not contributed significantly to the identification of regulatory sequences in sequenced invertebrate taxa. We demonstrate here that this differential success, which is often attributed to fundamental differences in the nature of vertebrate and invertebrate regulatory sequences, is instead primarily a product of the relatively small size of sequenced invertebrate genomes. We sequenced and compared loci involved in early embryonic patterning from four species of true fruit flies (family Tephritidae) that have genomes four to six times larger than those of Drosophila melanogaster. Unlike in Drosophila, where virtually all non-coding DNA is highly conserved, blocks of conserved non-coding sequence in tephritids are flanked by large stretches of poorly conserved sequence, similar to what is observed in vertebrate genomes. We tested the activities of nine conserved non-coding sequences flanking the even-skipped gene of the tephritid Ceratitis capitata in transgenic D. melanogaster embryos, six of which drove patterns that recapitulate those of known D. melanogaster enhancers. In contrast, none of the three non-conserved tephritid non-coding sequences that we tested drove expression in D. melanogaster embryos. Based on the landscape of non-coding conservation in tephritids, and our initial success in using conservation in tephritids to identify D. melanogaster regulatory sequences, we suggest that comparison of tephritid genomes may provide a systematic means to annotate the non-coding portion of the D. melanogaster genome. We also propose that large genomes be given more consideration in the selection of species for comparative genomics projects, to provide
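Conserved non-coding blocks of the kind described above are commonly called by sliding a window along a pairwise alignment and keeping windows above an identity threshold. The sketch below assumes a gapped pairwise alignment given as two equal-length strings. The window size and 80% threshold are hypothetical, since the study's alignment method and cutoffs are not given here.

```python
def conserved_blocks(aln_a, aln_b, window=10, min_identity=0.8):
    """Merge overlapping alignment windows whose identity meets the threshold.

    Returns (start, end) column intervals, end exclusive. Gap characters ("-")
    never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    match = [a == b and a != "-" for a, b in zip(aln_a, aln_b)]
    blocks = []
    for start in range(len(match) - window + 1):
        if sum(match[start:start + window]) / window >= min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                blocks[-1][1] = end          # overlaps the previous block: extend it
            else:
                blocks.append([start, end])
    return [tuple(b) for b in blocks]
```

Because a window may contain a few mismatches, called blocks can extend slightly past the strictly identical region. In a compact Drosophila-like locus, nearly every window would pass the threshold. In a larger tephritid-like locus, blocks come out separated by unconserved stretches.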
Joana P Gonçalves
Explaining regulatory mechanisms is crucial to understand complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gaining insight into dynamics, but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to their inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Custom graphics are finally depicted to expose the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. Regulatory players involved in
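The temporal biclustering step can be sketched with a simplified discretization. Each gene's profile becomes a string of U (up), D (down) and N (no change) symbols between consecutive time points. Genes sharing the same symbols over a contiguous window of time points then form a candidate module. The tolerance, minimum sizes and function names below are assumptions. The sketch reports nested windows as well as maximal ones, and it does not implement the personalized ranking of regulators used by Regulatory Snapshots.

```python
from collections import defaultdict

def discretize_changes(profile, tol=0.5):
    """Map consecutive differences to 'U' (up), 'D' (down) or 'N' (no change)."""
    symbols = []
    for prev, curr in zip(profile, profile[1:]):
        delta = curr - prev
        symbols.append("U" if delta > tol else "D" if delta < -tol else "N")
    return "".join(symbols)

def contiguous_time_modules(expr, min_genes=2, min_len=2, tol=0.5):
    """Group genes sharing a symbol pattern over contiguous transitions.

    expr maps gene name -> list of expression values at successive time points.
    Returns (start, end, pattern, genes) with transitions start..end-1.
    """
    patterns = {g: discretize_changes(p, tol) for g, p in expr.items()}
    length = len(next(iter(patterns.values())))
    modules = []
    for start in range(length):
        for end in range(start + min_len, length + 1):
            groups = defaultdict(list)
            for gene, pat in patterns.items():
                groups[pat[start:end]].append(gene)
            for pat, genes in groups.items():
                if len(genes) >= min_genes:
                    modules.append((start, end, pat, sorted(genes)))
    return modules
```

Restricting modules to contiguous windows preserves the chronology of events, one of the shortcomings of generic biclustering listed above.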
Ferreira, Luiz Henrique O.; de Castro, Maria Clicia S.; da Silva, Fabricio A. B.
Boolean networks have been used for some time to model gene regulatory networks (GRNs), which describe cell functions. Such models can help biologists make predictions and prognoses, and even design specialized treatments when a disturbance in the GRN leads to a disease condition. However, the amount of information related to a GRN can be huge, making the task of inferring its Boolean network representation quite a challenge. The method shown here takes into account information about the interactome to build a network, where each node represents a protein, and uses the entropy of each node as a key to reduce the size of the network, allowing the subsequent inference process to focus only on the main protein hubs, those with the most potential to interfere in overall network behavior.
Kim, Jinki; Yi, Gwan-Su
Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To apply it to a large-scale signaling network, it adopts a new subgraph search algorithm using a novel data structure called path-tree, which is a tree structure composed of isomorphic graphs of query regulatory motifs. This algorithm was evaluated using signaling networks of various sizes generated from the integration of various human signaling pathways, and its speed and scalability outperformed those of other algorithms. RMOD includes interactive analysis and auxiliary tools that make it possible to manage the whole process, from building the signaling network and querying regulatory motifs to analyzing regulatory motifs with graphical illustration and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and the mechanisms underlying their regulatory activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod.
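Feedback loops are one family of regulatory motifs that can be classified from edge signs alone. A cycle with an even number of inhibitory edges is a positive feedback loop, and one with an odd number is negative. The sketch below enumerates simple cycles in a small signed network by depth-first search, reporting each cycle once, starting from its smallest node. It is exhaustive (exponential in the worst case) and is not RMOD's compression or path-tree search. The edge encoding is an assumption.

```python
def feedback_loops(edges):
    """Enumerate simple directed cycles and label them by overall sign.

    edges maps (source, target) -> +1 (activation) or -1 (inhibition).
    Returns a list of (cycle_nodes, "positive" | "negative").
    """
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    nodes = sorted({n for e in edges for n in e})
    order = {n: i for i, n in enumerate(nodes)}
    loops = []

    def dfs(start, node, path, sign):
        for nxt in graph.get(node, []):
            s = sign * edges[(node, nxt)]
            if nxt == start:
                loops.append((tuple(path), "positive" if s > 0 else "negative"))
            elif order[nxt] > order[start] and nxt not in path:
                # Only visit nodes ranked after the start so each cycle is found once.
                dfs(start, nxt, path + [nxt], s)

    for start in nodes:
        dfs(start, start, [start], 1)
    return loops
```

For example, with A activating B, B inhibiting A, and B and C activating each other, the sketch reports A-B as a negative loop and B-C as a positive loop.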
It is increasingly apparent that genes and networks that influence complex behaviour are evolutionarily conserved, which is paradoxical considering that behaviour is labile over evolutionary timescales. How does adaptive change in behaviour arise if behaviour is controlled by conserved, pleiotropic, and likely evolutionarily constrained genes? Pleiotropy and connectedness are known to constrain the general rate of protein evolution, prompting some to suggest that the evolution of complex traits, including behaviour, is fuelled by regulatory sequence evolution. However, we seldom have data on the strength of selection on mutations in coding and regulatory sequences, and this hinders our ability to study how pleiotropy influences coding and regulatory sequence evolution. Here we use population genomics to estimate the strength of selection on coding and regulatory mutations for a transcriptional regulatory network that influences complex behaviour of honey bees. We found that replacement mutations in highly connected transcription factors and target genes experience significantly stronger negative selection relative to weakly connected transcription factors and targets. Adaptively evolving proteins were significantly more likely to reside at the periphery of the regulatory network, while proteins with signs of negative selection were near the core of the network. Interestingly, connectedness and network structure had minimal influence on the strength of selection on putative regulatory sequences for both transcription factors and their targets. Our study indicates that adaptive evolution of complex behaviour can arise because of positive selection on protein-coding mutations in peripheral genes, and on regulatory sequence mutations in both transcription factors and their targets throughout the network.
In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained by applying the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. To elucidate the functional relevance of the inferred network, we perform a Gene Ontology (GO) analysis of its structural components. Our analysis reveals that the most significant GO terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, using a curated list of census cancer genes, we find an enrichment of these genes in these functional modules. Finally, we study cooperative effects of chromosomes based on information about interacting genes in the breast cancer network. We find that chromosome 21 is the most coactive with other chromosomes. To our knowledge, this is the first study investigating the genome-scale breast cancer network.
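GO term enrichment of a network module is commonly assessed with a one-sided hypergeometric test. Suppose there are N annotated genes, K of which carry a term. The p-value is then the probability of observing at least k term-carrying genes in a module of size n. The sketch below computes this probability exactly and adds a Benjamini-Hochberg correction across terms. The abstract does not state the study's exact test or correction, so treat both as assumptions.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Exact integer arithmetic in `math.comb` avoids underflow for moderate gene counts. For example, drawing all 5 annotated genes in a 5-gene module out of 10 genes gives p = 1/252.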
We present the current form of a provisional DNA sequence-based regulatory gene network that explains in outline how endomesodermal specification in the sea urchin embryo is controlled. The model of the network is in a continuous process of revision and growth as new genes are added and new experimental results become available; see http://www.its.caltech.edu/mirsky/endomeso.htm (End-mes Gene Network Update) for the latest version. The network contains over 40 genes at present, many newly uncovered in the course of this work, and most encoding DNA-binding transcriptional regulatory factors. The architecture of the network was approached initially by construction of a logic model that integrated the extensive experimental evidence now available on endomesoderm specification. The internal linkages between genes in the network have been determined functionally, by measurement of the effects of regulatory perturbations on the expression of all relevant genes in the network. Five kinds of perturbation have been applied: (1) use of morpholino antisense oligonucleotides targeted to many of the key regulatory genes in the network; (2) transformation of other regulatory factors into dominant repressors by construction of Engrailed repressor domain fusions; (3) ectopic expression of given regulatory factors, from genetic expression constructs and from injected mRNAs; (4) blockade of the beta-catenin/Tcf pathway by introduction of mRNA encoding the intracellular domain of cadherin; and (5) blockade of the Notch signaling pathway by introduction of mRNA encoding the extracellular domain of the Notch receptor. The network model predicts the cis-regulatory inputs that link each gene into the network. Therefore, its architecture is testable by cis-regulatory analysis. Strongylocentrotus purpuratus and Lytechinus variegatus genomic BAC recombinants that include a large number of the genes in the network have been sequenced and annotated. Tests of the cis-regulatory predictions of
Smyth, Stuart J
New breeding techniques in plant agriculture exploded onto the scene about two years ago, in 2014. While these innovative plant breeding techniques, soon to be led by CRISPR/Cas9, initially appear to hold tremendous promise for plant breeding, if not a revolution for the industry, the question of how the products of these technologies will be regulated is rapidly becoming a key aspect of the technology's future potential. Regulation of innovative technologies and products has always lagged behind the science, but in the past decade, regulatory systems in many jurisdictions have become gridlocked as they try to regulate genetically modified (GM) crops. This regulatory inability to efficiently assess and approve innovative new agricultural products is particularly important for new plant breeding techniques: if these techniques are classified as genetically modified breeding techniques, their acceptance and future will diminish considerably, as they will be rejected by the European Union. Conversely, if the techniques are accepted as conventional plant breeding, then the future is blindingly bright. This article examines the international debate about the regulation of new plant breeding techniques, assesses how the Canadian regulatory system has approached the regulation of these technologies through two of its more public product approvals, GM apples and GM potatoes, and then discusses other crop variety approvals and those in the regulatory pipeline.
Russo, Francesco; Belling, Kirstine; Jensen, Anders Boeck
MicroRNAs (miRNAs) are small noncoding RNAs involved in the posttranscriptional regulation of messenger RNAs (mRNAs). Each miRNA targets a specific set of mRNAs. Upon binding, the miRNA inhibits mRNA translation or facilitates mRNA degradation. miRNAs are frequently deregulated in several pathologi...... on sequence complementarity and integration of expression data. In the last section of the chapter we discuss new opportunities in the study of miRNA regulatory networks in the context of temporal disease progression and comorbidities....
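Sequence-complementarity approaches to miRNA target prediction typically start from the seed, nucleotides 2-8 of the miRNA. A 3'UTR segment that is the reverse complement of the seed is a candidate 7mer (7mer-m8) binding site. The sketch below finds such sites. The miRNA and UTR sequences in the usage example are made up for illustration. Real predictors also consider other site types, site context and conservation.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_sites(mirna, utr):
    """0-based start positions in utr that pair with miRNA nucleotides 2-8."""
    site = reverse_complement(mirna[1:8])   # 1-based positions 2..8
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]
```

For a hypothetical miRNA `UGCAUCGAAGCUAGCUAGGAUC`, the seed `GCAUCGA` pairs with the UTR motif `UCGAUGC`. In the made-up UTR `AAAUCGAUGCAAAUCGAUGCA`, the sketch finds sites at positions 3 and 13.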
Lawrence, Charles E. [Brown Univ., Providence, RI (United States)]; McCue, Lee Ann [Brown Univ., Providence, RI (United States)]
The transcription regulatory network is arguably the most important foundation of cellular function, since it exerts the most fundamental control over the abundance of virtually all of a cell’s functional macromolecules. The two major components of a prokaryotic cell’s transcription regulation network are the transcription factors (TFs) and the transcription factor binding sites (TFBS); these components are connected by the binding of TFs to their cognate TFBS under appropriate environmental conditions. Comparative genomics has proven to be a powerful bioinformatics method with which to study transcription regulation on a genome-wide level. We have further extended comparative genomics technologies that we introduced over the last several years. Specifically, we developed and applied statistical approaches to analysis of correlated sequence data (i.e., sequences from closely related species). We also combined these technologies with functional genomic, proteomic and sequence data from multiple species, and developed computational technologies that provide inferences on the regulatory network connections, identifying the cognate transcription factor for predicted regulatory sites. Arguably the most important contribution of this work emerged in the course of the project. Specifically, the development of novel procedures of estimation and prediction in discrete high-dimensional settings has broad implications for biology, genomics and well beyond. We showed that these procedures enjoy advantages over existing technologies in the identification of TFBS. These efforts are aimed toward identifying a cell’s complete transcription regulatory network and underlying molecular mechanisms.
Background Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus comprises metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from Escherichia coli and other model bacterial species. Comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. Results To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp). Conclusions We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S
Baran, Nicole M; McGrath, Patrick T; Streelman, J Todd
Animal behavior is ultimately the product of gene regulatory networks (GRNs) for brain development and neural networks for brain function. The GRN approach has advanced the fields of genomics and development, and we identify organizational similarities between networks of genes that build the brain and networks of neurons that encode brain function. In this perspective, we engage the analogy between developmental networks and neural networks, exploring the advantages of using GRN logic to study behavior. Applying the GRN approach to the brain and behavior provides a quantitative and manipulative framework for discovery. We illustrate features of this framework using the example of social behavior and the neural circuitry of aggression.
Park, Heewon; Shimamura, Teppei; Imoto, Seiya; Miyano, Satoru
-specific generalized cross-validation for choosing the sample-specific tuning parameters in the kernel-based L1-type regularization method. Numerical studies demonstrate that the proposed adaptive NetworkProfiler effectively performs sample-specific gene network construction. We apply the proposed statistical strategy to the publicly available Sanger Genomic data analysis, and extract anti-cancer drug sensitivity-specific gene regulatory networks.
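Sample-specific networks built with kernel-based L1 regularization can be sketched as a weighted lasso. Samples whose modulator value (for example, a drug-sensitivity score) is close to that of a target sample receive larger Gaussian kernel weights. The regression of one gene on candidate regulators is then fit with those weights, by coordinate descent. The sketch assumes centred data without an intercept and fixes the penalty and bandwidth by hand. The adaptive NetworkProfiler, by contrast, chooses sample-specific tuning parameters by generalized cross-validation. All names and values here are illustrative.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso shrinkage operator S(z, lam)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def weighted_lasso(X, y, w, lam, n_iter=200):
    """Minimise 0.5 * sum_s w_s (y_s - X_s @ beta)^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float)                        # residual for beta = 0
    col_scale = (w[:, None] * X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0:
                continue
            resid += X[:, j] * beta[j]             # remove regulator j's contribution
            rho = np.sum(w * X[:, j] * resid)
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
            resid -= X[:, j] * beta[j]             # add back the updated contribution
    return beta

def sample_specific_weights(modulator, target_value, bandwidth):
    """Gaussian kernel weights centred on one sample's modulator value."""
    return np.exp(-0.5 * ((modulator - target_value) / bandwidth) ** 2)
```

Suppose a gene is driven by regulator 0 in low-modulator samples and by regulator 1 in high-modulator samples. Fitting with weights centred on a low value then selects regulator 0, and fitting with weights centred on a high value selects regulator 1.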
Andresen, Eva Kammer
Bacteria are remarkable organisms with the capacity to adapt to new environments by remodelling their gene expression profiles. The specific genomic material of any bacterium determines its capacity for any gene regulatory repertoire. However, by evolutionary shaping, these regulatory networks are subjected to forces that allow the bacteria to break genomic constraints, remodel existing regulatory networks, and colonise new environments. While experimental evolution studies have documented that global regulators of gene expression are indeed targets for adaptive mutations, it is less clear to which......) as a natural model system, the work has focused on characterising a number of mutations in global regulators that are known to provide an adaptive advantage in this specific environment. The aim has been to provide a molecular explanation of the effects of the specific mutations in relation to regulatory...
Transcription regulation is important for nearly all cellular processes. To understand how transcription is regulated by different regulatory complexes, DNA microarray expression analysis is used to determine the genome-wide changes in mRNA levels upon deletion of individual factors that belong to
The rapid advancement of high-throughput technologies provides huge amounts of information on gene expression and protein activity on a genome-wide scale. The availability of genomics, transcriptomics, proteomics, and metabolomics datasets gives an unprecedented opportunity to study the detailed molecular regulation that is important for precision medicine. However, designing effective and efficient methods to infer the network structure and dynamic properties of regulatory networks remains a significant challenge. In recent years a number of computational methods have been designed to explore regulatory mechanisms as well as to estimate unknown model parameters. Among them, Bayesian inference can combine prior knowledge and experimental data to generate updated information about the regulatory mechanisms. This chapter gives a brief review of Bayesian statistical methods used to infer network structure and estimate model parameters from experimental data.
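As a minimal illustration of how Bayesian inference combines prior knowledge with data, the sketch below assumes the simplest possible case: a single regulatory weight `w` in a linear model `y = w * x + noise`, with Gaussian noise of known variance and a Gaussian prior on `w`. The function name `posterior_weight` and the linear-Gaussian model are illustrative assumptions, not a method from the chapter; realistic network inference usually needs sampling (MCMC) or variational methods.

```python
def posterior_weight(x, y, prior_mean, prior_var, noise_var):
    """Conjugate Gaussian update for w in y = w * x + eps, eps ~ N(0, noise_var).

    Prior: w ~ N(prior_mean, prior_var). Returns (posterior_mean, posterior_var).
    (Hypothetical helper for illustration only.)
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Precisions (inverse variances) add: prior precision + data precision.
    post_prec = 1.0 / prior_var + sxx / noise_var
    post_var = 1.0 / post_prec
    # Posterior mean is a precision-weighted blend of prior belief and data evidence.
    post_mean = post_var * (prior_mean / prior_var + sxy / noise_var)
    return post_mean, post_var
```

For example, `posterior_weight([1.0, 2.0, 3.0], [2.1, 3.9, 6.2], 0.0, 1.0, 0.25)` gives a posterior mean of about 2.0 with variance 1/57; shrinking `prior_var` pulls the estimate toward the prior mean.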
Voytas, Daniel F; Gao, Caixia
Plant agriculture is poised at a technological inflection point. Recent advances in genome engineering make it possible to precisely alter DNA sequences in living cells, providing unprecedented control over a plant's genetic material. Potential future crops derived through genome engineering include those that better withstand pests, that have enhanced nutritional value, and that are able to grow on marginal lands. In many instances, crops with such traits will be created by altering only a few nucleotides among the billions that comprise plant genomes. As such, and with the appropriate regulatory structures in place, crops created through genome engineering might prove to be more acceptable to the public than plants that carry foreign DNA in their genomes. Public perception and the performance of the engineered crop varieties will determine the extent to which this powerful technology contributes towards securing the world's food supply.
Roy, Sushmita; Thompson, Dawn
The opportunistic human fungal pathogen Candida glabrata is second only to C. albicans as a cause of Candida infections, yet it is more closely related to Saccharomyces cerevisiae. Recent advances in functional genomics technologies and in computational approaches to decipher regulatory networks, together with comparison of these networks among these and other Ascomycete species, have revealed both unique and shared strategies for adaptation to a human commensal/opportunistic pathogen lifestyle and for antifungal drug resistance in C. glabrata. Recently, several C. glabrata sister species in the Nakaseomyces clade, representing both human-associated (commensal) and environmental isolates, have had their genomes sequenced and analyzed. This has paved the way for comparative functional genomics studies that characterize the regulatory networks in these species and identify informative patterns of conservation and divergence linked to phenotypic evolution in the Nakaseomyces lineage. © FEMS 2015. All rights reserved.
Regulatory networks underlying mycorrhizal development delineated by genome-wide expression profiling and functional analysis of the transcription factor repertoire of the plant symbiotic fungus Laccaria bicolor.
Daguerre, Y; Levati, E; Ruytinx, J; Tisserant, E; Morin, E; Kohler, A; Montanini, B; Ottonello, S; Brun, A; Veneault-Fourrey, C; Martin, F
as Secreted Transcriptional Activator Proteins (STAPs). Transcriptional regulators required for ECM symbiosis development in L. bicolor have been uncovered and classified through genome-wide analysis. This study also identifies the STAPs as a new class of potential ECM effectors, highly expressed in mycorrhizae, which may be involved in the control of the symbiotic root transcriptome.
Dec 9, 2013. Gene coexpression patterns can reveal gene collections with functional consistency. This study systematically constructs regulatory networks for pituitary tumours by integrating gene coexpression, transcriptional and posttranscriptional regulation. Through network analysis, we elaborate the incidence mechanism of pituitary ...
Talukdar, Husain A; Foroughi Asl, Hassan; Jain, Rajeev K; Ermel, Raili; Ruusalepp, Arno; Franzén, Oscar; Kidd, Brian A; Readhead, Ben; Giannarelli, Chiara; Kovacic, Jason C; Ivert, Torbjörn; Dudley, Joel T; Civelek, Mete; Lusis, Aldons J; Schadt, Eric E; Skogsberg, Josefin; Michoel, Tom; Björkegren, Johan L M
Inferring molecular networks can reveal how genetic perturbations interact with environmental factors to cause common complex diseases. We analyzed genetic and gene expression data from seven tissues relevant to coronary artery disease (CAD) and identified regulatory gene networks (RGNs) and their key drivers. By integrating data from genome-wide association studies, we identified 30 CAD-causal RGNs interconnected in vascular and metabolic tissues, and we validated them with corresponding data from the Hybrid Mouse Diversity Panel. As proof of concept, by targeting the key drivers AIP, DRAP1, POLR2I, and PQBP1 in a cross-species-validated, arterial-wall RGN involving RNA-processing genes, we re-identified this RGN in THP-1 foam cells and independent data from CAD macrophages and carotid lesions. This characterization of the molecular landscape in CAD will help better define the regulation of CAD candidate genes identified by genome-wide association studies and is a first step toward achieving the goals of precision medicine. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Wolt, Jeffrey D; Wang, Kan; Yang, Bing
Genome editing with engineered nucleases (GEEN) is a highly specific and efficient tool for crop improvement with the potential to rapidly generate useful novel phenotypes/traits. Genome editing techniques introduce targeted double-strand breaks that engage DNA-repair pathways, leading to base additions or deletions by non-homologous end joining, or to targeted gene replacements or transgene insertions via homology-directed repair. Many of these techniques and the ancillary processes they employ generate phenotypic variation that is indistinguishable from that obtained through natural means or conventional mutagenesis; therefore, they do not readily fit the definitions of "genetically engineered" or "genetically modified" used within most regulatory regimes. Resolving ambiguities regarding the regulatory status of genome editing techniques is critical to their application in developing economically useful crop traits. Continued regulatory focus on the process used, rather than on the nature of the novel phenotype developed, confuses regulators, product developers, and the public alike, and creates uncertainty about the use of genome engineering tools for crop improvement. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Walczak, Aleksandra M
Genetic regulatory networks enable cells to respond to changes in internal and external conditions by dynamically coordinating their gene expression profiles. Our ability to make quantitative measurements in these biochemical circuits has deepened our understanding of what kinds of computations genetic regulatory networks can perform and with what reliability. These advances have motivated researchers to look for connections between the architecture and function of genetic regulatory networks. The information transmitted between a network's inputs and its outputs has been proposed as one such measure of function, relevant in certain biological contexts. Here we summarize recent developments in the application of information theory to gene regulatory networks. We first review the basic concepts in information theory needed to understand recent work. We then discuss the functional complexity of gene regulation, which arises from the molecular nature of the regulatory interactions. We end by reviewing som...
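The information a regulatory element transmits can be quantified as the mutual information between its input and output. The sketch below assumes the input (for example, a transcription-factor level) and the output (for example, target-gene expression) have already been discretized into a joint probability table; the function name and the toy tables are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal over inputs
    py = [sum(col) for col in zip(*joint)]      # marginal over outputs
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi
```

A noiseless binary switch (`[[0.5, 0.0], [0.0, 0.5]]`) transmits 1 bit, independent input and output transmit 0 bits, and a noisy switch (`[[0.4, 0.1], [0.1, 0.4]]`) transmits roughly 0.28 bits.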
The first step in the definition of transcriptional regulatory networks is to establish correct relationships between transcription factors (TFs) and their target genes, together with the effect of their regulatory activity (activator or repressor). Fundamental advances in this direction have been made possible by the introduction of experimental techniques such as Chromatin Immunoprecipitation, which, coupled with next-generation sequencing technologies (ChIP-Seq), permit the genome-wide identification of TF binding sites. This chapter provides a survey on how data of this kind are to be processed and integrated with expression and other types of data to infer transcriptional regulatory rules and codes.
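A common first step in relating ChIP-seq peaks to candidate target genes is to assign each peak to the nearest transcription start site (TSS) within a distance cutoff. The sketch below assumes peaks are given as summit positions, TSSs as `(position, gene)` pairs per chromosome, and a hypothetical 10 kb cutoff; it ignores strand and is not any specific published pipeline.

```python
from bisect import bisect_left


def assign_peaks_to_genes(peaks, tss, max_distance=10_000):
    """Assign each ChIP-seq peak summit to its nearest TSS on the same chromosome.

    peaks: list of (chrom, summit_position)
    tss: dict mapping chrom -> list of (tss_position, gene_name)
    Returns a list of (chrom, summit, gene, signed_distance); gene and distance
    are None when no TSS lies within max_distance. Strand is ignored.
    """
    # Sort TSSs once per chromosome so each lookup is a binary search.
    index = {}
    for chrom, sites in tss.items():
        ordered = sorted(sites)
        index[chrom] = ([pos for pos, _ in ordered], [gene for _, gene in ordered])

    results = []
    for chrom, summit in peaks:
        positions, genes = index.get(chrom, ([], []))
        i = bisect_left(positions, summit)
        best = None
        # The nearest TSS is either just left or just right of the insertion point.
        for j in (i - 1, i):
            if 0 <= j < len(positions):
                dist = summit - positions[j]
                if best is None or abs(dist) < abs(best[1]):
                    best = (genes[j], dist)
        if best is not None and abs(best[1]) <= max_distance:
            results.append((chrom, summit, best[0], best[1]))
        else:
            results.append((chrom, summit, None, None))
    return results
```

Real analyses usually account for strand, distinguish upstream from downstream peaks, and may link distal peaks to genes using chromatin-interaction data; the cutoff also depends on the organism.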
Chain, P; Garcia, E; Mcloughlin, K; Ovcharenko, I
This project was begun to implement, test, and experimentally validate the results of a novel algorithm for genome-wide identification of candidate transcription-factor binding sites in prokaryotes. Most techniques used to identify regulatory regions rely on conservation between different genomes or on predetermined sequence motifs to perform a genome-wide search. Therefore, such techniques cannot be applied to new genome sequences for which such motifs have not yet been discovered. This project aimed to apply a de novo search algorithm to identify candidate binding-site motifs in intergenic regions of prokaryotic organisms, initially testing the available genomes of the Yersinia genus. We retrofitted existing nucleotide pattern-matching algorithms and analyzed the candidate sites they identified, as well as their target genes, to screen for meaningful patterns. Using properly annotated prokaryotic genomes, this project aimed to develop a set of procedures to identify candidate intergenic sites important for gene regulation. We planned to demonstrate this in Yersinia pestis, a model biodefense, Category A Select Agent pathogen, and then follow up with experimental evidence that these regions are indeed involved in regulation. The ability to quickly characterize transcription-factor binding sites will help clarify how known virulence pathways are modulated in biodefense-related organisms, and will aid the exploration of regulons (gene regulatory networks) and novel metabolic pathways in environmental microbes.
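The project's own algorithm is not described here, so as a stand-in the sketch below shows one simple de novo strategy: count every k-mer in the intergenic sequences of interest and rank k-mers by enrichment relative to background sequences. The function names, the pseudocount, and the toy sequences are assumptions; practical motif finders also handle reverse complements, degenerate or gapped motifs, and statistical significance.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every k-mer made only of A/C/G/T across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def enriched_kmers(target, background, k=6, pseudocount=1.0, top=5):
    """Rank k-mers seen in `target` by log2 frequency ratio versus `background`."""
    t_counts = kmer_counts(target, k)
    b_counts = kmer_counts(background, k)
    vocab = 4 ** k
    # Pseudocounts keep k-mers absent from the background from scoring infinity.
    t_total = sum(t_counts.values()) + pseudocount * vocab
    b_total = sum(b_counts.values()) + pseudocount * vocab
    scores = {
        kmer: math.log2(((c + pseudocount) / t_total) /
                        ((b_counts[kmer] + pseudocount) / b_total))
        for kmer, c in t_counts.items()
    }
    # Sort by score (descending), breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

With a handful of intergenic sequences that each contain `TTGACA` and a background that lacks it, `TTGACA` rises to the top of the ranking.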
Loots, G G; Sharan, R; Ovcharenko, I; Ben-Hur, A
The binding of transcription factors to specific regulatory sequence elements is a primary mechanism for controlling gene transcription. Eukaryotic genes are often regulated by several transcription factors, whose binding sites are tightly clustered and form cis-regulatory modules. In this paper we present a web-server, CREME, for identifying and visualizing cis-regulatory modules in the promoter regions of a given set of potentially co-regulated genes. CREME relies on a database of putative transcription factor binding sites that have been annotated across the human genome using a library of position weight matrices and evolutionary conservation with the mouse and rat genomes. A search algorithm is applied to this dataset to identify combinations of transcription factors whose binding sites tend to co-occur in close proximity in the promoter regions of the input gene set. The identified cis-regulatory modules are statistically scored and significant combinations are reported and graphically visualized. Our web-server is available at http://creme.dcode.org/.
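CREME builds on a database of putative binding sites annotated with position weight matrices. The sketch below shows the basic scoring step such an annotation relies on: convert per-position base counts into a log-odds PWM and report every sequence window that scores above a threshold. It assumes a uniform background, a forward-strand-only scan, and a hypothetical threshold, and it omits the evolutionary-conservation filter and the module co-occurrence search that CREME adds.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=0.25, pseudocount=0.5):
    """Turn per-position base counts (list of {base: count}) into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

For a toy three-position motif built from ten `TGA` observations, scanning `CCTGACCTGTTGA` with a threshold of 5.0 reports hits at positions 2 and 10.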
Macnamara, Cicely K; Chaplain, Mark AJ
Gene regulatory networks (GRNs) play an important role in maintaining cellular function by correctly timing key processes such as cell division and apoptosis. GRNs are known to contain similar structural components, which describe how genes and proteins within a network interact - typically by feedback. In many GRNs, proteins bind to gene-sites in the nucleus thereby altering the transcription rate. If the binding reduces the transcription rate there is a negative feedback leading to oscillatory behaviour in mRNA and protein levels, both spatially (e.g. by observing fluorescently labelled molecules in single cells) and temporally (e.g. by observing protein/mRNA levels over time). Mathematical modelling of GRNs has focussed on such oscillatory behaviour. Recent computational modelling has demonstrated that spatial movement of the molecules is a vital component of GRNs, while it has been proved rigorously that the diffusion coefficient of the protein/mRNA acts as a bifurcation parameter and gives rise to a Hopf-bifurcation. In this paper we consider the spatial aspect further by considering the specific location of gene and protein production, showing that there is an optimum range for the distance between an mRNA gene-site and a protein production site in order to achieve oscillations. We first present a model of a well-known GRN, the Hes1 system, and then extend the approach to examine spatio-temporal models of synthetic GRNs e.g. n-gene repressilator and activator-repressor systems. By incorporating the idea of production sites into such models we show that the spatial component is vital to fully understand GRN dynamics. Copyright © 2016 Elsevier Ltd. All rights reserved.
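To make the link between delayed negative feedback and oscillation concrete, the sketch below integrates a common well-mixed (non-spatial) delay-differential caricature of Hes1 autorepression: protein represses its own mRNA after a transcriptional delay. It is not the spatial production-site model of this paper, and all parameter values (decay rates 0.03/min, Hill coefficient 5, repression threshold 100, delays of 30 and 5 min) are illustrative assumptions. The function name `simulate_hes1` is hypothetical.

```python
def simulate_hes1(tau=30.0, t_end=5000.0, dt=0.1,
                  alpha_m=1.0, alpha_p=1.0, mu_m=0.03, mu_p=0.03,
                  p0=100.0, n=5):
    """Euler integration of a delayed negative-feedback loop:

        dm/dt = alpha_m / (1 + (p(t - tau) / p0)**n) - mu_m * m
        dp/dt = alpha_p * m - mu_p * p

    Protein is taken to be zero for t <= 0. Returns protein levels at each step.
    """
    lag = int(round(tau / dt))
    steps = int(round(t_end / dt))
    m, p = 0.0, 0.0
    protein = [p]  # protein[k] is the protein level at time k * dt
    for k in range(steps):
        p_delayed = protein[k - lag] if k >= lag else 0.0
        dm = alpha_m / (1.0 + (p_delayed / p0) ** n) - mu_m * m
        dp = alpha_p * m - mu_p * p
        m += dt * dm
        p += dt * dp
        protein.append(p)
    return protein
```

With these assumed parameters a linear stability estimate puts the critical delay near 18 min, so a 30 min delay should give sustained oscillations, while a 5 min delay should give oscillations that damp out to a steady state.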
Turner, Leslie M; White, Michael A; Tautz, Diethard; Payseur, Bret A
Hybrid dysfunction, a common feature of reproductive barriers between species, is often caused by negative epistasis between loci ("Dobzhansky-Muller incompatibilities"). The nature and complexity of hybrid incompatibilities remain poorly understood because identifying interacting loci that affect complex phenotypes is difficult. With subspecies in the early stages of speciation, an array of genetic tools, and detailed knowledge of reproductive biology, house mice (Mus musculus) provide a model system for dissecting hybrid incompatibilities. Male hybrids between M. musculus subspecies often show reduced fertility. Previous studies identified loci and several X chromosome-autosome interactions that contribute to sterility. To characterize the genetic basis of hybrid sterility in detail, we used a systems genetics approach, integrating mapping of gene expression traits with sterility phenotypes and QTL. We measured genome-wide testis expression in 305 male F2s from a cross between wild-derived inbred strains of M. musculus musculus and M. m. domesticus. We identified several thousand cis- and trans-acting QTL contributing to expression variation (eQTL). Many trans eQTL cluster into eleven 'hotspots,' seven of which co-localize with QTL for sterility phenotypes identified in the cross. The number and clustering of trans eQTL, but not cis eQTL, were substantially lower when mapping was restricted to a 'fertile' subset of mice, providing evidence that trans eQTL hotspots are related to sterility. Functional annotation of transcripts with eQTL provides insights into the biological processes disrupted by sterility loci and guides prioritization of candidate genes. Using a conditional mapping approach, we identified eQTL dependent on interactions between loci, revealing a complex system of epistasis. Our results illuminate established patterns, including the role of the X chromosome in hybrid sterility. The integrated mapping approach we employed is applicable in a broad
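The core of eQTL mapping can be sketched as a single-marker scan: regress each transcript's expression on each marker's genotype, keep the best-scoring marker, and call the association cis if the marker lies near the gene and trans otherwise. The sketch assumes additive genotype coding (0/1/2 copies of one parental allele in F2 mice), a hypothetical 5 Mb cis window, and toy data; it omits permutation-based significance thresholds, covariates, and the conditional (epistasis) mapping described above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lod_single_marker(genotypes, expression):
    """LOD for simple linear regression: (n/2) * log10(RSS_null / RSS_fit) = -(n/2) log10(1 - r^2)."""
    r = pearson(genotypes, expression)
    n = len(genotypes)
    return -(n / 2) * math.log10(max(1.0 - r * r, 1e-12))


def scan_eqtl(expr, genos, gene_pos, marker_pos, cis_window=5.0):
    """Return {gene: (best_marker, lod, 'cis' or 'trans')}; positions are (chrom, Mb)."""
    calls = {}
    for gene, values in expr.items():
        best_marker, best_lod = max(
            ((marker, lod_single_marker(g, values)) for marker, g in genos.items()),
            key=lambda item: item[1])
        g_chr, g_mb = gene_pos[gene]
        m_chr, m_mb = marker_pos[best_marker]
        kind = "cis" if g_chr == m_chr and abs(g_mb - m_mb) <= cis_window else "trans"
        calls[gene] = (best_marker, best_lod, kind)
    return calls
```

In a genome-wide study the same loop runs over thousands of transcripts and markers, and clusters of trans calls mapping to the same marker are what appear as eQTL hotspots.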
Genetic regulatory networks are dynamic systems that describe the interactions among gene products (mRNAs and proteins). The internal states of a genetic regulatory network consist of the concentrations of the mRNAs and proteins involved in it, which are very helpful for understanding its dynamic behavior. However, because of limitations such as those of experimental techniques, not all internal states of a genetic regulatory network can be measured effectively. Estimating the unmeasured states from the available measurements therefore becomes an important issue. In this study, we design a state observer to estimate the states of genetic regulatory networks with time delays from available measurements. Furthermore, based on a linear matrix inequality (LMI) approach, a criterion is established to guarantee that the dynamics of the estimation error are globally asymptotically stable. A gene repressilator network is employed to illustrate the effectiveness of our design approach.
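The idea of a state observer can be illustrated without delays or LMIs. The sketch below swaps in a classical Luenberger observer for a hypothetical two-state linear mRNA/protein model in which only the protein level is measured. The matrix entries, the gain `L`, and the function name are assumptions chosen so that the error dynamics `A - L C` are stable; this is not the delay-dependent LMI design of the study.

```python
import math

# Hypothetical linearized mRNA (m) / protein (p) model: x' = A x, measured output y = p.
A = [[-0.5, -1.0],   # m' = -0.5 m - 1.0 p  (protein represses its own mRNA)
     [0.8, -0.3]]    # p' =  0.8 m - 0.3 p  (translation and protein decay)


def matvec(M, v):
    """2x2 matrix times 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def observer_error(L, x0=(1.0, 0.5), t_end=20.0, dt=0.01):
    """Euler-simulate the plant and a Luenberger observer; return the final error norm.

    Observer: xh' = A xh + L * (y - p_hat), where y = p is the only measured state.
    """
    x = list(x0)        # true state, unknown to the observer
    xh = [0.0, 0.0]     # observer starts with no knowledge of the state
    for _ in range(int(round(t_end / dt))):
        innovation = x[1] - xh[1]  # measured protein minus estimated protein
        dx = matvec(A, x)
        dxh = matvec(A, xh)
        dxh = [dxh[0] + L[0] * innovation, dxh[1] + L[1] * innovation]
        x = [x[0] + dt * dx[0], x[1] + dt * dx[1]]
        xh = [xh[0] + dt * dxh[0], xh[1] + dt * dxh[1]]
    return math.hypot(x[0] - xh[0], x[1] - xh[1])
```

With `L = (0.0, 2.0)` the eigenvalues of `A - L C` are -1.3 and -1.5, so the estimate converges much faster than with `L = (0.0, 0.0)`, where the error only decays at the plant's own rate (eigenvalues -0.4 ± 0.89i).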
Qin, Jing; Hu, Yaohua; Xu, Feng; Yalamanchili, Hari Krishna; Wang, Junwen
Inferring gene regulatory networks from gene expression data at the whole-genome level is still an arduous challenge, especially in higher organisms, where the number of genes is large but the number of experimental samples is small. The accuracy of current methods at genome scale has been reported to drop significantly from Escherichia coli to Saccharomyces cerevisiae because of the increase in the number of genes. This limits the applicability of current methods to more complex genomes, such as those of human and mouse. The least absolute shrinkage and selection operator (LASSO) is widely used for gene regulatory network inference from gene expression profiles. However, the accuracy of LASSO on large genomes is not satisfactory. In this study, we apply two extensions of LASSO, the L0 and L1/2 regularization models, to infer gene regulatory networks from both high-throughput gene expression data and transcription factor binding data in mouse embryonic stem cells (mESCs). We find that both the L0 and L1/2 regularization models significantly outperform LASSO in network inference. Incorporating interactions between transcription factors and their targets markedly improved the prediction accuracy. The current study demonstrates the efficiency and applicability of these two models for gene regulatory network inference from integrative omics data in large genomes. The two models should help biologists study gene regulation in higher model organisms on a genome-wide scale. Copyright © 2014 Elsevier Inc. All rights reserved.
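For reference, the LASSO baseline that the study compares against can be sketched for a single target gene: regress its expression on candidate regulators and let the L1 penalty zero out non-regulators. The sketch uses cyclic coordinate descent with soft thresholding on the objective (1/2n)·||y − Xb||² + lam·||b||₁; the data are synthetic and the function names hypothetical. The L0 and L1/2 models replace the L1 penalty with non-convex ones (for L0, the coordinate update typically becomes hard rather than soft thresholding), which is not shown here.

```python
import random


def soft_threshold(z, lam):
    """Proximal operator of lam * |b|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of regressor j with the residual excluding its own contribution.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def synthetic_data(n=200, p=8, seed=0):
    """Hypothetical data: the target depends only on regulators 0 and 2, plus small noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[2] + 0.1 * rng.gauss(0, 1) for row in X]
    return X, y
```

On `synthetic_data()`, `lasso_cd(X, y, lam=0.1)` drives the coefficients of the six non-regulators to zero while shrinking the two true coefficients slightly toward zero.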
Rodionov, Dmitry A.; Novichkov, Pavel; Stavrovskaya, Elena D.; Rodionova, Irina A.; Li, Xiaoqing; Kazanov, Marat D.; Ravcheev, Dmitry A.; Gerasimova, Anna V.; Kazakov, Alexey E.; Kovaleva, Galina Y.; Permina, Elizabeth A.; Laikova, Olga N.; Overbeek, Ross; Romine, Margaret F.; Fredrickson, Jim K.; Arkin, Adam P.; Dubchak, Inna; Osterman, Andrei L.; Gelfand, Mikhail S.
Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. Despite the growing number of genome-scale gene expression studies, our ability to convert the results of these studies into accurate regulatory annotations and to project them from model organisms to other organisms is extremely limited. Comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. The genus Shewanella comprises metabolically versatile gamma-proteobacteria whose lifestyles and natural environments are substantially different from those of Escherichia coli and other model bacterial species. To explore conservation and variation in the Shewanella transcriptional networks, we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from genome context analysis, whereas the others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. However, even orthologous regulators with conserved DNA-binding motifs may control substantially different gene sets, revealing striking differences in regulatory strategies between the Shewanella spp. and E. coli. Multiple examples of regulatory network rewiring include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. NagR for N-acetylglucosamine catabolism and PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp).
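The bookkeeping behind propagating a regulon by orthology, and detecting contraction or expansion, can be sketched with set operations: map the reference regulon's targets through an ortholog table and compare them with the genes that carry a predicted binding site in the new genome. This is a minimal sketch under assumed inputs; the gene IDs are hypothetical, and the actual reconstruction also relies on motif searches, genome context, and manual curation.

```python
def project_regulon(source_targets, orthologs, predicted_sites):
    """Compare a reference regulon with binding-site predictions in a query genome.

    source_targets: set of target gene IDs in the reference genome (e.g. E. coli)
    orthologs: dict mapping reference gene -> ortholog in the query genome
    predicted_sites: set of query genes with a predicted upstream binding site
    """
    projected = {orthologs[g] for g in source_targets if g in orthologs}
    return {
        "conserved": projected & predicted_sites,    # target kept, site still present
        "lost": projected - predicted_sites,         # ortholog exists but no site (contraction)
        "gained": predicted_sites - projected,       # new target in query genome (expansion)
        "no_ortholog": {g for g in source_targets if g not in orthologs},
    }
```

A regulon whose "lost" and "gained" sets are both large is a candidate for the kind of rewiring described above, even when the regulator and its motif are conserved.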
Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T
Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and among the most important components of TRNs are transcription factors (TFs), proteins that bind specifically to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies and in computational biology have led to multiple large regulatory network models (directed networks), each with a large corpus of supporting data and gene annotation. There are multiple biological motivations for exploring large regulatory network models, including validating TF-target gene relationships, identifying co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacterium (a small genome with high data completeness).
Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms, deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. It was initially designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As in previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto these networks. To support this, we added stimulon data directly into the database and also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP-based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to predict putative
Tan, Mehmet; Alhajj, Reda; Polat, Faruk
Controlling gene regulatory networks (GRNs) is an important and hard problem. As in all control problems, the curse of dimensionality is the main issue in real applications. Hundreds of genes may regulate a single biological activity in an organism, which implies a huge state space, even for Boolean models. Accordingly, the literature shows that only models of small portions of the genome have been usable in control applications. In this paper, we strengthen our framework for controlling GRNs by eliminating the need for expert knowledge to specify a crucial threshold that is necessary for producing effective results. Our framework applies the factored Markov decision problem (FMDP) method to the control problem of GRNs. The FMDP is a suitable framework for large state spaces because it represents the probability distribution of state transitions using compact models, so that more space- and time-efficient algorithms can be devised for solving control problems. We mapped the GRN control problem to an FMDP and propose a model reduction algorithm that helps find approximate solutions for large networks using existing FMDP solvers. The test results reported in this paper demonstrate the efficiency and effectiveness of the proposed approach.
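The control problem can be illustrated on a network small enough that the state space can be enumerated directly. The sketch below swaps in a flat (non-factored) Markov decision process solved by value iteration; the paper's FMDP instead represents transitions compactly so that much larger networks stay tractable. The three-gene ring, the single "force gene a ON" intervention, its cost, the target state, and the discount factor are all hypothetical choices.

```python
from itertools import product

GENES = 3
TARGET = (1, 1, 1)   # hypothetical desired expression state
GAMMA = 0.9          # discount factor
ACTION_COST = 0.1    # cost of intervening


def step(state, action):
    """Deterministic Boolean update plus optional intervention; returns (next_state, reward)."""
    a, b, c = state
    nxt = [1 - c, a, b]   # hypothetical ring: a' = NOT c, b' = a, c' = b
    if action == 1:
        nxt[0] = 1        # intervention: force gene a ON
    nxt = tuple(nxt)
    reward = (1.0 if nxt == TARGET else 0.0) - ACTION_COST * action
    return nxt, reward


def value_iteration(tol=1e-9):
    """Return state values and a greedy policy for the enumerated MDP."""
    states = list(product((0, 1), repeat=GENES))
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(r + GAMMA * V[nxt] for nxt, r in (step(s, a) for a in (0, 1)))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {}
    for s in states:
        q = [step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in (0, 1)]
        policy[s] = 0 if q[0] >= q[1] else 1   # prefer no intervention on ties
    return V, policy
```

Here the optimal policy leaves the network alone while it drifts toward the target from the all-off state, then intervenes at every step to hold it there, because without intervention gene a would switch off.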
Santillán Zerón, Moisés
Knowing the complete genome of a given species is just one piece of the puzzle. To fully unveil the systems behavior of an organism, an organ, or even a single cell, we need to understand the underlying gene regulatory dynamics. Given the complexity of the whole system, this ultimate goal is unattainable for the moment. But perhaps, by analyzing the simplest genetic systems, we may be able to develop the mathematical techniques and procedures required to tackle more complex genetic networks in the near future. In the present work, we introduce techniques for developing mathematical models of simple bacterial gene networks, such as the tryptophan and lactose operons. Despite all of the underlying assumptions, such models can provide valuable information about gene regulation dynamics. Here, we pay special attention to robustness as an emergent property. These notes are organized as follows. In the first section, the long historical relation between mathematics, physics, and biology is briefly reviewed. Recently, multidisciplinary work in biology has received great attention in the form of systems biology; the main concepts of this novel science are discussed in the second section. A very brief introduction to the essential concepts of molecular biology is given in the third section. In the fourth section, a brief introduction to chemical kinetics is presented. Finally, in the fifth section, a mathematical model for the lactose operon is developed and analyzed.
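As a rough illustration of the kind of model these notes develop, the sketch below integrates a one-variable, lac-like positive-feedback model with forward Euler. It is not the model from the notes: it assumes that internal inducer is proportional to permease level times external inducer (a quasi-steady-state simplification) and uses made-up parameters chosen so that the system is bistable, which shows how two cells in the same medium can settle into different expression states.

```python
# Hypothetical parameters (dimensionless), chosen to give bistability.
LEAK = 0.05   # basal expression of the permease
VMAX = 3.0    # maximal induced expression
K = 1.0       # half-saturation constant of the Hill function
N_HILL = 2    # cooperativity
DECAY = 1.0   # dilution/degradation rate


def dydt(y, inducer):
    """Rate of change of permease level y given the external inducer level."""
    internal = inducer * y  # assumed: inducer uptake scales with permease level
    induction = internal**N_HILL / (K**N_HILL + internal**N_HILL)
    return LEAK + VMAX * induction - DECAY * y


def steady_state(y0, inducer=1.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration until t_end; returns the final level."""
    y = y0
    for _ in range(int(t_end / dt)):
        y += dt * dydt(y, inducer)
    return y


low = steady_state(0.0)   # cell with an uninduced history
high = steady_state(1.0)  # cell with a previously induced history
print(f"low state ~ {low:.3f}, high state ~ {high:.3f}")
```

Both runs use the same inducer level; only the starting point differs. This history dependence (hysteresis) is a behavior often discussed for the lac operon, and checking how it survives parameter changes is one simple way to probe robustness.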
Ciofani, Maria; Madar, Aviv; Galan, Carolina; Sellars, Maclean; Mace, Kieran; Pauli, Florencia; Agarwal, Ashish; Huang, Wendy; Parkhurst, Christopher N.; Muratet, Michael; Newberry, Kim M.; Meadows, Sarah; Greenfield, Alex; Yang, Yi; Jain, Preti; Kirigin, Francis F.; Birchmeier, Carmen; Wagner, Erwin F.; Murphy, Kenneth M.; Myers, Richard M.; Bonneau, Richard; Littman, Dan R.
Th17 cells have critical roles in mucosal defense and are major contributors to inflammatory disease. Their differentiation requires the nuclear hormone receptor RORγt working with multiple other essential transcription factors (TFs). We have used an iterative systems approach, combining genome-wide TF occupancy, expression profiling of TF mutants, and expression time series, to delineate the Th17 global transcriptional regulatory network. We find that cooperatively bound BATF and IRF4 contribute to initial chromatin accessibility, and with STAT3 initiate a transcriptional program that is then globally tuned by the lineage-specifying TF RORγt, which plays a focal deterministic role at key loci. Integration of multiple datasets allowed inference of an accurate predictive model that we computationally and experimentally validated, identifying multiple new Th17 regulators, including Fosl2, a key determinant of cellular plasticity. This interconnected network can be used to investigate new therapeutic approaches to manipulate Th17 functions in the setting of inflammatory disease. PMID:23021777
Reeves, Paul H; Ellis, Christine M; Ploense, Sara E; Wu, Miin-Feng; Yadav, Vandana; Tholl, Dorothea; Chételat, Aurore; Haupt, Ina; Kennerley, Brian J; Hodgens, Charles; Farmer, Edward E; Nagpal, Punita; Reed, Jason W
For self-pollinating plants to reproduce, male and female organ development must be coordinated as flowers mature. The Arabidopsis transcription factors AUXIN RESPONSE FACTOR 6 (ARF6) and ARF8 regulate this complex process by promoting petal expansion, stamen filament elongation, anther dehiscence, and gynoecium maturation, thereby ensuring that pollen released from the anthers is deposited on the stigma of a receptive gynoecium. ARF6 and ARF8 induce jasmonate production, which in turn triggers expression of MYB21 and MYB24, encoding R2R3 MYB transcription factors that promote petal and stamen growth. To understand the dynamics of this flower maturation regulatory network, we have characterized morphological, chemical, and global gene expression phenotypes of arf, myb, and jasmonate pathway mutant flowers. We found that MYB21 and MYB24 promoted not only petal and stamen development but also gynoecium growth. As well as regulating reproductive competence, both the ARF and MYB factors promoted nectary development or function and volatile sesquiterpene production, which may attract insect pollinators and/or repel pathogens. Mutants lacking jasmonate synthesis or response had decreased MYB21 expression and stamen and petal growth at the stage when flowers normally open, but had increased MYB21 expression in petals of older flowers, resulting in renewed and persistent petal expansion at later stages. Both auxin response and jasmonate synthesis promoted positive feedbacks that may ensure rapid petal and stamen growth as flowers open. MYB21 also fed back negatively on expression of jasmonate biosynthesis pathway genes to decrease flower jasmonate level, which correlated with termination of growth after flowers have opened. These dynamic feedbacks may promote timely, coordinated, and transient growth of flower organs.
Patel N and Wang JTL (2015). Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J. Biosci. 40, 731–740 (Sep 28, 2015). DOI 10.1007/s12038-015-9558-9. 1. Introduction. 1.1 Background. Using gene expression data to infer gene regulatory networks (GRNs) is a key approach to ...
Wellmer, Frank; Riechmann, José Luis
The analysis of the gene regulatory networks underlying development is of central importance for a better understanding of the mechanisms that control the formation of the different cell types, tissues or organs of an organism. The recent development of genomic technologies has made it possible to study these networks at a global level. In this paper, we summarize some of the recent advances that have been made in the understanding of plant development by the application of genomic technologies. We focus on a few specific processes, namely flower and root development and the control of the cell cycle, but we also highlight landmark studies in other areas that opened new avenues of experimentation or analysis. We describe the methods and strategies that are currently used for the analysis of plant development by genomic technologies, as well as some of the problems and limitations that hamper their application. Since many genomic technologies and concepts were first developed and tested in organisms other than plants, we refer to work in non-plant species and compare the current state of network analysis in plants to that in other multicellular organisms.
On 9 and 10 May 2012, the IEA International CCS Regulatory Network (Network), launched in Paris in May 2008 to provide a neutral forum for CCS regulators, policy makers and stakeholders to share updates and views on CCS regulatory developments, held its fourth meeting at the International Energy Agency (IEA) offices in Paris, France. The aim of the meeting was to: provide an update on government efforts to develop and implement carbon capture and storage (CCS) legal and regulatory frameworks; and consider ways in which governments are dealing with some of the more difficult or complex aspects of CCS regulation. This report summarises the proceedings of the meeting.
Yousefi, Mohammadmahdi R; Dougherty, Edward R
A basic issue for translational genomics is to model gene interaction via gene regulatory networks (GRNs) and thereby provide an informatics environment to study the effects of intervention (say, via drugs) and to derive effective intervention strategies. Taking the view that the phenotype is characterized by the long-run behavior (steady-state distribution) of the network, we desire interventions that optimally move the probability mass from undesirable to desirable states. Heretofore, two external control approaches have been taken to shift the steady-state mass of a GRN: (i) use a user-defined cost function for which a desirable shift of the steady-state mass is a by-product, and (ii) use heuristics to design a greedy algorithm. Neither approach provides an optimal control policy relative to long-run behavior. We use a linear programming approach to optimally shift the steady-state mass from undesirable to desirable states; i.e., optimization is directly based on the amount of shift and therefore must outperform previously proposed methods. Moreover, the same basic linear programming structure is used for both unconstrained and constrained optimization, where, in the latter case, constraints limit the amount of mass that may be shifted to 'ambiguous' states, these being states that are not directly undesirable relative to the pathology of interest but that bear some perceived risk. We apply the method to probabilistic Boolean networks, but the theory applies to any Markovian GRN. Supplementary materials, including the simulation results, MATLAB source code and a description of suboptimal methods, are available at http://gsp.tamu.edu/Publications/supplementary/yousefi13b. Supplementary data are available at Bioinformatics online.
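The quantity this approach optimizes is the steady-state distribution of a Markovian GRN. A minimal NumPy sketch of computing it, together with the probability mass on undesirable states, for a small hypothetical transition matrix is shown below; the linear program that chooses the control policy is not reproduced here.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic Markov chain."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations (P - I)^T pi = 0 with a normalization row.
    A = np.vstack([(P - np.eye(n)).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def undesirable_mass(pi, undesirable_states):
    """Total steady-state probability of the listed (undesirable) states."""
    return float(sum(pi[s] for s in undesirable_states))


# Hypothetical 2-state network: state 1 is the undesirable phenotype.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print(pi, undesirable_mass(pi, [1]))
```

For this matrix the balance condition 0.1 pi_0 = 0.5 pi_1 gives pi = (5/6, 1/6), so one sixth of the long-run mass sits in the undesirable state; an intervention would aim to change P so that this number drops.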
Singh, Pramesh; Chen, Tianlong; Arendsee, Zebulun; Wurtele, Eve S.; Bassler, Kevin E.
Orphan genes, which are genes unique to a particular species, have recently drawn significant attention for their potential role in organismal robustness. Their origin and regulatory interaction patterns remain largely unknown. Recently, methods that infer a network using the context likelihood of relatedness (CLR) and then apply modularity-maximizing community detection algorithms to the inferred network were shown to be effective at finding the functional structure of regulatory networks. We apply improved versions of these methods to gene expression data from Arabidopsis thaliana, identify groups (clusters) of interacting genes with related patterns of expression, and analyze the structure within those groups. Focusing on clusters that contain orphan genes, we compare the identified clusters to gene ontology (GO) terms, regulons, and pathway designations and analyze their hierarchical structure. We predict new regulatory interactions and unravel the structure of the regulatory interaction patterns of orphan genes. Work supported by the NSF through Grants DMR-1507371 and IOS-1546858.
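The context likelihood of relatedness (CLR) step can be sketched as follows. Each pairwise mutual-information (MI) value is converted into z-scores against the background MI distributions of both genes, negative z-scores are set to zero, and the two are combined as sqrt(z_i^2 + z_j^2). The Gaussian MI estimate from Pearson correlation is a simplifying assumption (CLR implementations usually estimate MI with binning or kernel methods), and the subsequent modularity-based community detection is not shown.

```python
import numpy as np


def gaussian_mi(expr):
    """MI under a Gaussian assumption, MI = -0.5 * log(1 - r^2).

    expr: array of shape (samples, genes).
    """
    r = np.corrcoef(expr, rowvar=False)
    np.fill_diagonal(r, 0.0)  # ignore self-pairs
    r2 = np.clip(r**2, 0.0, 1.0 - 1e-12)
    return -0.5 * np.log(1.0 - r2)


def clr(mi):
    """Context likelihood of relatedness scores from an MI matrix."""
    mu = mi.mean(axis=1, keepdims=True)
    sd = mi.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    z = np.maximum(0.0, (mi - mu) / sd)  # z[i, j]: MI(i, j) vs. gene i's background
    score = np.sqrt(z**2 + z.T**2)       # combine both genes' backgrounds
    np.fill_diagonal(score, 0.0)
    return score


# Synthetic example: gene 1 tracks gene 0; genes 2 and 3 are independent.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 4))
expr[:, 1] = expr[:, 0] + 0.3 * rng.normal(size=200)
scores = clr(gaussian_mi(expr))
print(np.round(scores, 2))
```

Normalizing against each gene's own background is what lets CLR down-weight genes that have moderately high MI with almost everything.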
Diogo Fernando Veiga; Pedro de Stege Cecconello; José Eduardo De Lucca; Luismar Marques Porto
In this work, we developed an extension of the IsaViz software, an RDF (Resource Description Framework) authoring tool, designed to be a graphical environment to build models of metabolic and regulatory networks. This environment, called Metabolic IsaViz, was linked to a genomic library of types and was modeled on the basis of ontologies. Biochemical pathways included data at sequence level (e.g., the amino acid sequence of enzymes), besides kinetic and thermodynamic parameters for the reactions. M...
Harrington, Eoghan D; Jensen, Lars J; Bork, Peer
Continuing improvements in DNA sequencing technologies are providing us with vast amounts of genomic data from an ever-widening range of organisms. The resulting challenge for bioinformatics is to interpret this deluge of data and place it back into its biological context. Biological networks provide a conceptual framework with which we can describe part of this context, namely the different interactions that occur between the molecular components of a cell. Here, we review the computational methods available to predict biological networks from genomic sequence data and discuss how they relate...
Erkelenz, Steffen; Theiss, Stephan; Otte, Marianne; Widera, Marek; Peter, Jan Otto; Schaal, Heiner
Effective splice site selection is critically controlled by flanking splicing regulatory elements (SREs) that can enhance or repress splice site use. Although several computational algorithms currently identify a multitude of potential SRE motifs, their predictive power with respect to mutation effects is limited. Following a RESCUE-type approach, we defined a hexamer-based 'HEXplorer score' as the average Z-score of all six hexamers overlapping with a given nucleotide in an arbitrary genomic sequence. Plotted along genomic regions, HEXplorer score profiles varied slowly in the vicinity of splice sites. They reflected the respective splice enhancing and silencing properties of splice site neighborhoods beyond the identification of single dedicated SRE motifs. In particular, HEXplorer score differences between mutant and reference sequences faithfully represented exonic mutation effects on splice site usage. Using the HIV-1 pre-mRNA as a model system highly dependent on SREs, we found an excellent correlation between splicing activity and HEXplorer score across 29 mutations. We successfully predicted and confirmed five novel SREs and optimized mutations inactivating a known silencer. The HEXplorer score allowed landscaping of splicing regulatory regions, provided a quantitative measure of mutation effects on splice enhancing and silencing properties and permitted calculation of the mutationally most effective nucleotide. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
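The per-nucleotide score described above (the average Z-score of the up to six hexamers that overlap a position) can be sketched as follows. The hexamer Z-score table here is a hypothetical toy, not the published HEXplorer table; positions near the sequence ends, which are covered by fewer than six hexamers, are averaged over the hexamers available (an assumption); and summarizing a mutation as the summed profile difference is one simple choice, not necessarily the published definition.

```python
def hexplorer_profile(seq, hexamer_z):
    """Average Z-score of all hexamers overlapping each nucleotide.

    hexamer_z maps a 6-mer to its Z-score; unknown 6-mers score 0.
    """
    n = len(seq)
    profile = []
    for i in range(n):
        # A hexamer starting at s covers positions s..s+5.
        starts = range(max(0, i - 5), min(i, n - 6) + 1)
        scores = [hexamer_z.get(seq[s:s + 6], 0.0) for s in starts]
        profile.append(sum(scores) / len(scores) if scores else 0.0)
    return profile


def mutation_effect(ref, mut, hexamer_z):
    """Summed profile difference, mutant minus reference (equal lengths)."""
    if len(ref) != len(mut):
        raise ValueError("reference and mutant must have the same length")
    ref_profile = hexplorer_profile(ref, hexamer_z)
    mut_profile = hexplorer_profile(mut, hexamer_z)
    return sum(m - r for m, r in zip(mut_profile, ref_profile))


# Toy table: one enhancer-like hexamer with a positive Z-score (hypothetical).
toy_z = {"GAAGAA": 2.0}
ref = "CCGAAGAACC"
mut = "CCGATGAACC"  # A->T substitution disrupts the hexamer
print(round(mutation_effect(ref, mut, toy_z), 3))  # -3.133
```

A negative value means the mutation lowers the local enhancing potential in this toy setting, which is the kind of signed, position-resolved readout the score is meant to give.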
Siggia, Eric D
Background: To explain the vastly different phenotypes exhibited by the same organism under different conditions, it is essential that we understand how the organism's genes are coordinately regulated. While there are many excellent tools for predicting sequences encoding proteins or RNA genes, few algorithms exist to predict regulatory sequences on a genome-wide scale with no prior information. Results: To identify motifs involved in the control of transcription, an algorithm was developed that searches upstream of operons for improbably frequent dimers. The algorithm was applied to the B. subtilis genome, which is predicted to encode approximately 200 DNA-binding proteins. The dimers found to be over-represented could be clustered into 317 distinct groups, each thought to represent a class of motifs uniquely recognized by some transcription factor. For each cluster of dimers, a representative weight matrix was derived and scored over the regions upstream of the operons to predict the sites recognized by the cluster's factor, and a putative regulon of the operons immediately downstream of the sites was inferred. The distribution in number of operons per predicted regulon is comparable to that for well-characterized transcription factors. The most highly over-represented dimers matched σA, T-box, and σW sites. We have evidence to suggest that at least 52 of our clusters of dimers represent actual regulatory motifs, based on the groups' weight matrix matches to experimentally characterized sites, the functional similarity of the component operons of the groups' regulons, and the positional biases of the weight matrix matches. All predictions are assigned a significance value, and thresholds are set to avoid false positives. Where possible, we examine our false negatives, drawing examples from known regulatory motifs and regulons inferred from RNA expression data. Conclusions: We have demonstrated that in the case of B. subtilis
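The weight-matrix step can be illustrated with a generic position weight matrix (PWM) sketch: build log-odds scores from aligned sites with a pseudocount and a uniform background, then scan an upstream sequence and keep windows above a threshold. In the study the matrices were derived from clusters of over-represented dimers and thresholds came from significance values; the toy sites, pseudocount, background and cutoff below are assumptions.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=0.5, background=0.25):
    """Log2-odds matrix (one dict per position) from equal-length sites."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return matrix


def scan(seq, matrix, threshold):
    """Return (start, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(matrix[k][seq[start + k]] for k in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy aligned sites and a toy upstream region (hypothetical).
pwm = weight_matrix(["TTGACA", "TTGACA", "TTGATA"])
hits = scan("GGGTTGACAGGG", pwm, threshold=5.0)
print(hits)
```

The pseudocount keeps unseen bases from producing log(0), and a log-odds scale lets window scores be compared against a single cutoff regardless of motif composition.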
Mandal, Sudip; Saha, Goutam; Pal, Rajat Kumar
Correct inference of genetic regulations inside a cell from biological data such as time-series microarray data is one of the greatest challenges of the post-genomic era for biologists and researchers. The Recurrent Neural Network (RNN) is one of the most popular and simple approaches to model the dynamics as well as to infer correct dependencies among genes. Inspired by the behavior of social elephants, we propose a new metaheuristic, the Elephant Swarm Water Search Algorithm (ESWSA), to infer Gene Regulatory Networks (GRNs). This algorithm is mainly based on the water search strategy of intelligent and social elephants during drought, which uses different types of communication techniques. Initially, the algorithm was tested against benchmark small- and medium-scale artificial genetic networks with and without different noise levels, and its efficiency was evaluated in terms of parametric error, minimum fitness value, execution time, accuracy of prediction of true regulation, etc. Next, the proposed algorithm was tested against real-time gene expression data of the Escherichia coli SOS network, and the results were compared with other state-of-the-art optimization methods. The experimental results suggest that ESWSA is very efficient for the GRN inference problem and performs better than other methods in many ways.
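The RNN formulation typically used for this kind of inference models each gene's expression e_i with de_i/dt = (sigmoid(sum_j w_ij e_j + beta_i) - e_i) / tau_i, and a metaheuristic such as ESWSA searches for the weights w, biases beta and time constants tau that minimize the error between simulated and observed time series. The sketch below shows only that simulation and a mean-squared-error fitness under an Euler discretization; the optimizer itself is not reproduced, and the three-gene parameters are made up.

```python
import numpy as np


def simulate_rnn(W, beta, tau, e0, dt, steps):
    """Euler integration of de/dt = (sigmoid(W e + beta) - e) / tau."""
    e = np.array(e0, dtype=float)
    trajectory = [e.copy()]
    for _ in range(steps):
        drive = 1.0 / (1.0 + np.exp(-(W @ e + beta)))
        e = e + dt * (drive - e) / tau
        trajectory.append(e.copy())
    return np.array(trajectory)


def fitness(W, beta, tau, observed, dt):
    """Mean squared error between simulated and observed time series."""
    simulated = simulate_rnn(W, beta, tau, observed[0], dt, len(observed) - 1)
    return float(np.mean((simulated - observed) ** 2))


# Hypothetical 3-gene network: gene 0 activates gene 1, gene 1 represses gene 2.
W_true = np.array([[0.0, 0.0, 0.0],
                   [8.0, 0.0, 0.0],
                   [0.0, -8.0, 0.0]])
beta = np.array([1.0, -4.0, 4.0])
tau = np.ones(3)
observed = simulate_rnn(W_true, beta, tau, [0.1, 0.1, 0.9], dt=0.1, steps=50)

print(fitness(W_true, beta, tau, observed, dt=0.1))          # 0.0 for the true weights
print(fitness(np.zeros((3, 3)), beta, tau, observed, 0.1))  # larger for a wrong guess
```

An optimizer only needs this fitness function as a black box: it proposes candidate (W, beta, tau) vectors and keeps those that lower the error, and the signs of the recovered w_ij are read as activation or repression.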
Berto, Stefano; Perdomo-Sabogal, Alvaro; Gerighausen, Daniel; Qin, Jing; Nowick, Katja
Cognitive abilities, such as memory, learning, language, problem solving, and planning, involve the frontal lobe and other brain areas. Not much is known yet about the molecular basis of cognitive abilities, but it seems clear that cognitive abilities are determined by the interplay of many genes. One approach for analyzing the genetic networks involved in cognitive functions is to study the coexpression networks of genes with known importance for proper cognitive functions, such as genes that have been associated with cognitive disorders like intellectual disability (ID) or autism spectrum disorders (ASD). Because many of these genes are gene regulatory factors (GRFs), we aimed to provide insights into the gene regulatory networks active in the human frontal lobe. Using genome-wide human frontal lobe expression data from 10 independent data sets, we first derived 10 individual coexpression networks for all GRFs, including their potential target genes. We observed a high level of variability among these 10 independently derived networks, indicating that relying on results from a single study can provide only limited biological insights. To focus instead on the most confident information from these 10 networks, we developed a method for integrating such independently derived networks into a consensus network. This consensus network revealed robust GRF interactions that are conserved across the frontal lobes of different healthy human individuals. Within this network, we detected a strong central module that is enriched for 166 GRFs known to be involved in brain development and/or cognitive disorders. Interestingly, several hubs of the consensus network encode GRFs that have not yet been associated with brain functions. Their central role in the network makes them excellent new candidates for an essential role in the regulatory network of the human frontal lobe, which should be investigated in future studies. PMID:27014338
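One simple way to integrate independently derived networks, shown here only as an illustration and not as the authors' method, is support counting: keep an edge if it appears in at least a chosen number of the individual networks, then rank genes by their degree in the resulting consensus network to nominate hub candidates. The gene names and the support cutoff are hypothetical.

```python
from collections import Counter


def consensus_edges(networks, min_support):
    """Keep undirected edges present in at least min_support networks."""
    support = Counter()
    for edges in networks:
        # frozenset makes (A, B) and (B, A) the same edge; the set avoids
        # counting an edge twice within one network.
        support.update({frozenset(edge) for edge in edges})
    return {tuple(sorted(edge)) for edge, count in support.items() if count >= min_support}


def hubs(edges):
    """Genes ranked by degree in the consensus network."""
    degree = Counter(gene for edge in edges for gene in edge)
    return degree.most_common()


# Three hypothetical coexpression networks from independent data sets.
networks = [
    {("TF1", "GENE_A"), ("TF1", "GENE_B"), ("TF2", "GENE_C")},
    {("GENE_A", "TF1"), ("TF1", "GENE_B"), ("TF3", "GENE_D")},
    {("TF1", "GENE_A"), ("TF2", "GENE_C"), ("TF1", "GENE_B")},
]
consensus = consensus_edges(networks, min_support=2)
print(sorted(consensus))
print(hubs(consensus))
```

Raising `min_support` trades coverage for confidence; the TF3 edge seen in only one data set is dropped here, which mirrors the motivation of discarding study-specific links.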
Ben-Tabou de-Leon, Smadar
Developmental gene regulatory networks robustly control the timely activation of regulatory and differentiation genes. The structure of these networks underlies their capacity to buffer intrinsic and extrinsic noise and maintain embryonic morphology. Here I illustrate how the use of specific architectures by the sea urchin developmental regulatory networks enables the robust control of cell fate decisions. The Wnt/β-catenin signaling pathway patterns the primary embryonic axis, while the BMP signaling pathway patterns the secondary embryonic axis in the sea urchin embryo and across Bilateria. Interestingly, in the sea urchin, in both cases the signaling pathway that defines the axis directly controls the expression of a set of downstream regulatory genes. I propose that this direct activation of a set of regulatory genes enables a uniform regulatory response and a clear-cut cell fate decision in the endoderm and in the dorsal ectoderm. The specification of the mesodermal pigment cell lineage is activated by Delta signaling, which initiates a triple positive feedback loop that locks down the pigment specification state. I propose that the use of compound positive feedback circuitry provides the endodermal cells enough time to turn off mesodermal genes and ensures a correct mesoderm vs. endoderm fate decision. Thus, I argue that understanding the control properties of repeatedly used regulatory architectures illuminates their role in embryogenesis and provides possible explanations for their resistance to evolutionary change.
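The lock-down argument can be illustrated with a toy model that is not fitted to sea urchin data: three genes activate one another in a ring, and the first gene also receives a transient input standing in for Delta signaling. With the assumed Hill parameters the ring has a stable OFF state and a stable ON state, so a pulse of signal flips the circuit ON and it stays ON after the signal is withdrawn.

```python
import numpy as np

K = 0.4  # hypothetical activation threshold; Hill coefficient n = 2


def hill(u):
    return u**2 / (K**2 + u**2)


def step(x, signal, dt):
    """One Euler step of a three-gene positive feedback ring."""
    x1, x2, x3 = x
    dx = np.array([
        hill(signal + x3) - x1,  # gene 1: driven by the signal or by gene 3
        hill(x1) - x2,           # gene 2: activated by gene 1
        hill(x2) - x3,           # gene 3: activated by gene 2
    ])
    return x + dt * dx


def run(pulse_duration, dt=0.01, relax_time=50.0):
    """Apply the signal for pulse_duration, then withdraw it and relax."""
    x = np.zeros(3)
    for _ in range(int(pulse_duration / dt)):
        x = step(x, signal=1.0, dt=dt)  # signal ON
    for _ in range(int(relax_time / dt)):
        x = step(x, signal=0.0, dt=dt)  # signal withdrawn
    return x


print(run(pulse_duration=10.0))  # all genes settle near 0.8: locked ON
print(run(pulse_duration=0.0))   # stays at 0: never switched
```

With K = 0.4 the symmetric fixed points solve x = x^2 / (0.16 + x^2), giving 0, an unstable threshold at 0.2 and a stable ON state at 0.8; the signal only has to push the loop past the threshold once.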
The relationship between the design and functionality of molecular networks is now a key issue in biology. Comparison of regulatory networks performing similar tasks can provide insights into how network architecture is constrained by the functions it directs. Here, we discuss methods of network comparison based on network architecture and signaling logic. Introducing local and global signaling scores for the difference between two networks, we quantify similarities between evolutionarily closely and distantly related bacteriophages. Despite the large evolutionary separation between phages lambda and 186, their networks are found to be similar when difference is measured in terms of global signaling. Finally, we discuss how network alignment can be used to pinpoint protein similarities viewed from the network perspective.
Splinter, E.; de Laat, W.
The non-coding part of our genome contains sequence motifs that can control gene transcription over distance. Here, we discuss functional genomics studies that uncover and characterize these sequences across the mammalian genome. The picture emerging is of a genome being a complex regulatory
Lehmann, Martin; Sneppen, K.
Sensing a graded input and differentiating between its different levels is at the core of many developmental decisions. Here, we examine how this can be realized in a simple system. We model gene regulatory circuits that reach distinct states when setting the underlying gene copy number...
Avsec, Žiga; Barekatain, Mohammadamin; Cheng, Jun; Gagneur, Julien
Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as the transcription start site, exon boundaries, or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its ability to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint prediction based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at goo.gl/3yMY5w. Supplementary data are available at Bioinformatics online.
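The core idea of a spline transformation, mapping a scalar such as the distance to a landmark to f(d) = sum_k w_k B_k(d) with B-spline basis functions B_k and learnable weights w_k, can be sketched in NumPy. This is not the CONCISE Keras layer: the uniform knot placement, number of intervals and cubic degree are assumptions, and in a network the weights would be trained rather than fixed.

```python
import numpy as np


def bspline_basis(x, lo, hi, n_intervals=5, degree=3):
    """Uniform B-spline basis evaluated at x (Cox-de Boor recursion).

    Returns an array of shape (len(x), n_intervals + degree); inputs are
    clipped to [lo, hi), where the basis functions sum to one.
    """
    x = np.clip(np.asarray(x, dtype=float), lo, np.nextafter(hi, lo))[:, None]
    h = (hi - lo) / n_intervals
    t = lo + h * np.arange(-degree, n_intervals + degree + 1)  # knot vector
    # Degree 0: indicator of each knot interval [t_i, t_{i+1}).
    basis = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for p in range(1, degree + 1):
        left = (x - t[:-p - 1]) / (t[p:-1] - t[:-p - 1]) * basis[:, :-1]
        right = (t[p + 1:] - x) / (t[p + 1:] - t[1:-p]) * basis[:, 1:]
        basis = left + right
    return basis


def spline_transform(distances, weights, lo, hi, degree=3):
    """f(d) = sum_k w_k B_k(d); len(weights) fixes the number of intervals."""
    n_intervals = len(weights) - degree
    return bspline_basis(distances, lo, hi, n_intervals, degree) @ np.asarray(weights)


# Hypothetical distances (bp) to a transcription start site.
d = np.array([0.0, 120.0, 450.0, 999.0])
B = bspline_basis(d, lo=0.0, hi=1000.0)
print(B.shape, B.sum(axis=1))  # (4, 8); each row sums to 1
print(spline_transform(d, weights=np.linspace(-1, 1, 8), lo=0.0, hi=1000.0))
```

Because each distance activates only a few neighboring basis functions, the learned curve is smooth and local, which is one plausible reason such a module trains more stably than stacks of rectified linear units.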
Dong, Xinran; Wang, Xiao; Zhang, Feng; Tian, Weidong
Accelerated evolution of regulatory sequences can alter the expression pattern of target genes and cause phenotypic changes. In this study, we used DNase I hypersensitive sites (DHSs) to annotate putative regulatory sequences in the human genome and conducted a genome-wide analysis of the effects of accelerated evolution on regulatory sequences. Working under the assumption that local ancient repeat elements of DHSs are under neutral evolution, we discovered that ∼0.44% of DHSs are under accelerated evolution (ace-DHSs). We found that ace-DHSs tend to be more active than background DHSs and are strongly associated with epigenetic marks of active transcription. The target genes of ace-DHSs are significantly enriched in neuron-related functions, and their expression levels are positively selected in the human brain. Thus, these lines of evidence strongly suggest that accelerated evolution of regulatory sequences plays an important role in the evolution of human-specific phenotypes. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Many processes change genomes, including horizontal gene transfer (Koonin and Wolf, 2008). Horizontal gene transfer: drastic modification of genetic material; rapid exploration of new niches and phenotypes. Horizontal gene transfer regulates. New selective forces for gene ...
Vân Anh Huynh-Thu
One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high-throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods (Random Forests or Extra-Trees). The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms in deciphering the genetic regulatory network of Escherichia coli. It makes no assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
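The decomposition described above can be sketched with scikit-learn. Each gene in turn is the regression target, a Random Forest is trained on all other genes, and the forest's impurity-based feature importances become scores for candidate regulator-to-target links, which are then pooled and ranked. This is a simplified illustration, not the reference GENIE3 implementation; the number of trees, the `max_features` setting and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def genie3_like(expr, gene_names, n_trees=100, seed=0):
    """Rank (regulator, target, importance) links from expr (samples x genes)."""
    expr = (expr - expr.mean(axis=0)) / expr.std(axis=0)  # unit variance per gene
    n_genes = expr.shape[1]
    links = []
    for target in range(n_genes):
        inputs = [g for g in range(n_genes) if g != target]
        forest = RandomForestRegressor(
            n_estimators=n_trees, max_features="sqrt", random_state=seed
        )
        forest.fit(expr[:, inputs], expr[:, target])
        for regulator, importance in zip(inputs, forest.feature_importances_):
            links.append((gene_names[regulator], gene_names[target], float(importance)))
    # Pool the links from all p regression problems into one global ranking.
    links.sort(key=lambda link: link[2], reverse=True)
    return links


# Synthetic data: G2 is driven by G0; G1 and G3 are noise.
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 4))
expr[:, 2] = 2.0 * expr[:, 0] + 0.2 * rng.normal(size=200)
links = genie3_like(expr, ["G0", "G1", "G2", "G3"], n_trees=50)
print(links[:3])
```

Note that a correlation-only signal like this one scores G0 to G2 and G2 to G0 similarly; tree importances rank links but do not by themselves resolve direction.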
Sprink, Thorben; Eriksson, Dennis; Schiemann, Joachim; Hartung, Frank
Novel plant genome editing techniques call for updated legislation regulating the use of plants produced by genetic engineering or genome editing, especially in the European Union. Established more than 25 years ago and based on a clear distinction between transgenic and conventionally bred plants, the current EU Directives fail to accommodate the new continuum between genetic engineering and conventional breeding. Although Directive 2001/18/EC contains both process- and product-related terms, it is commonly interpreted as strictly process-based legislation. In view of several emerging techniques that are closer to conventional breeding than to standard genetic engineering, we argue that it should instead be interpreted with greater regard to the resulting product. Legal guidance on how to classify plants produced with novel genome editing techniques under the decade-old legislation is urgently needed, as private companies and public researchers are waiting impatiently with products and projects in the pipeline. Here we outline the process in the EU of developing legislation that properly matches scientific progress. As the process faces several hurdles, we also compare it with existing frameworks in other countries and discuss ideas for an alternative regulatory system.
Chen, Yun-Ru; Huang, Hsuan-Cheng; Lin, Chen-Ching
The development of disease involves a systematic disturbance inside cells and is associated with changes in the interactions or regulations among genes forming biological networks. The bridges inside a network are critical for shortening the distances between nodes. We observed that, inside the human gene regulatory network, one strongly connected core bridged the whole network. The remaining regulations, outside the core, formed a weakly connected component surrounding the core as a periphery. Furthermore, the regulatory feedback loops (FBLs) inside the core form an interface-like structure between the core and the periphery; we termed these regulatory FBLs the interface core. Notably, both cancer-associated and essential biomolecules and regulations were significantly overrepresented in the interface core. These results imply that the interface core is not only critical for the network structure but also central in cellular systems. Furthermore, the enrichment of cancer-associated and essential regulations in the interface core might be attributed to its bridgeness in the network. More importantly, we identified one regulatory FBL, between HNF4A and NR2F2, that possesses the highest bridgeness in the interface core. Further investigation suggested that disturbance of the HNF4A-NR2F2 FBL might protect tumor cells from apoptotic processes. Our results emphasize the relevance of regulatory network properties to cellular systems and might reveal a critical role of the interface core in cancer. © The Author 2017. Published by Oxford University Press. All rights reserved.
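The core/periphery split can be made concrete with Tarjan's algorithm for strongly connected components (SCCs). In this sketch the largest SCC is taken as the core, and only two-gene feedback loops (A regulates B and B regulates A) inside it are reported; longer loops and the paper's bridgeness measure are not computed. The toy edge list in the check is hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Tarjan's algorithm on a directed graph given as (regulator, target) pairs."""
    graph, nodes = defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        nodes.update((u, v))
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop the component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return comps


def core_and_two_gene_loops(edges):
    """Largest SCC (the 'core') and the two-gene feedback loops inside it."""
    core = max(strongly_connected_components(edges), key=len)
    edge_set = set(edges)
    loops = {tuple(sorted((u, v))) for u, v in edge_set
             if u != v and (v, u) in edge_set and u in core and v in core}
    return core, sorted(loops)
```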
The inference of gene regulatory networks from expression data is an important area of research that provides insight into the inner workings of a biological system. Relevance-network-based approaches provide a simple and easily scalable way to understand interactions between genes. Until now, most work based on relevance networks has focused on discovering direct regulation using the correlation coefficient or mutual information. However, some of the more complicated interactions, such as interactive regulation and coregulation, are not easily detected. In this work, we propose a relevance network model for gene regulatory network inference that employs both mutual information and conditional mutual information to determine the interactions between genes. For this purpose, we propose a conditional mutual information estimator based on adaptive partitioning, which allows us to condition on both discrete and continuous random variables. Our experimental results demonstrate that the proposed regulatory network inference algorithm can provide better performance when the target network contains coregulated and interactively regulated genes.
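Why conditioning helps can be seen with plug-in entropy estimates on discrete data. This sketch uses simple frequency counts rather than the adaptive-partitioning estimator proposed in the paper, so it applies only to already-discretized expression levels. In the XOR-style toy case of the check, a target z depends on x only jointly with y: I(X;Z) is zero, yet I(X;Z|Y) equals log 2.

```python
from collections import Counter
from math import log


def entropy(*columns):
    """Plug-in (frequency-count) joint entropy, in nats, of one or more discrete columns."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * log(c / n) for c in counts.values())


def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)
```

A relevance network built on pairwise MI alone would miss the x-z link in such a case; adding a CMI test conditioned on candidate partners recovers it.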
Karp, Peter D
Background: Escherichia coli is the model organism whose regulatory network is the most extensively characterized. Over the last few years, our project has been collecting and curating the literature on E. coli transcription initiation and operons, providing in both the RegulonDB and EcoCyc databases the largest electronically encoded network available. A paper published recently by Ma et al. (2004) showed several differences between the versions of the network present in these two databases. These discrepancies have been corrected, and annotations from this and other groups (Shen-Orr et al., 2002) have been added, making RegulonDB and EcoCyc the largest comprehensive and continuously curated regulatory network of E. coli K-12. Results: Several groups have been using these curated data in their bioinformatics and systems biology projects, combining them with external data obtained from other sources and thereby enlarging the dataset initially obtained from RegulonDB or EcoCyc for the E. coli K-12 regulatory network. The groups of Uri Alon and Hong-Wu Ma kindly provided the interactions they had added to enrich their public versions of the E. coli regulatory network. We used these to search for the original references and curated them with the same standards we apply regularly, in several cases adding the original references (instead of reviews, or where references were missing) together with the corresponding experimental evidence codes. We also corrected all discrepancies between the two databases, as explained below. Conclusion: One hundred and fifty new interactions have been added to our databases as a result of this specific curation effort, in addition to those added through our continuous curation work. RegulonDB gene names are now based on those of EcoCyc to avoid confusion due to gene names and synonyms, and the public releases of RegulonDB and EcoCyc are henceforth synchronized to avoid confusion due to
Yang, Bing; Wittkopp, Patricia J
Transcriptional control of gene expression is regulated by biochemical interactions between cis-regulatory DNA sequences and trans-acting factors that form complex regulatory networks. Genetic changes affecting both cis- and trans-acting sequences in these networks have been shown to alter patterns of gene expression as well as higher-order organismal phenotypes. Here, we investigate how the structure of these regulatory networks relates to patterns of polymorphism and divergence in gene expression. To do this, we compared a transcriptional regulatory network inferred for Drosophila melanogaster to differences in gene regulation observed between two strains of D. melanogaster as well as between two pairs of closely related species: Drosophila sechellia and Drosophila simulans, and D. simulans and D. melanogaster. We found that the number of transcription factors predicted to directly regulate a gene ("in-degree") was negatively correlated with divergence in both gene expression (mRNA abundance) and cis-regulation. This observation suggests that the number of transcription factors directly regulating a gene's expression affects the conservation of cis-regulation and gene expression over evolutionary time. We also tested the hypothesis that transcription factors regulating more target genes (higher "out-degree") are less likely to evolve changes in their cis-regulation and expression (presumably due to increased pleiotropy), but found little support for this predicted relationship. Taken together, these data show how the architecture of regulatory networks can influence regulatory evolution. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
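In-degree and out-degree are simple counts over a directed TF-to-target edge list, and a rank correlation such as Spearman's is one natural way to relate in-degree to a per-gene divergence measure. The sketch below is illustrative only: the edge list and any divergence values are hypothetical, and the study's exact statistical tests are not reproduced here.

```python
from collections import Counter


def degrees(edges):
    """In-degree (number of regulating TFs per target) and out-degree
    (number of targets per TF) from (tf, target) pairs."""
    unique = set(edges)
    in_degree = Counter(target for _, target in unique)
    out_degree = Counter(tf for tf, _ in unique)
    return in_degree, out_degree


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A negative value of `spearman(in_degrees, divergences)` over target genes would correspond to the direction of the relationship reported above.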
Wade, Joseph T
Bacterial genomes encode numerous transcription factors, DNA-binding proteins that regulate transcription initiation. Identifying the regulatory targets of transcription factors is a major challenge of systems biology. Here I describe the use of two genome-scale approaches, ChIP-seq and RNA-seq, that are used to map transcription factor regulons. ChIP-seq maps the association of transcription factors with DNA, and RNA-seq determines changes in RNA levels associated with transcription factor perturbation. I discuss the strengths and weaknesses of these and related approaches, and I describe how ChIP-seq and RNA-seq can be combined to map individual transcription factor regulons and entire regulatory networks.
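One common way to combine the two assays is to intersect genes bound by the TF (ChIP-seq) with genes whose RNA levels change when the TF is perturbed (RNA-seq). This sketch assumes peaks have already been called and summarized by their center coordinates, and that differential expression statistics are available as log2 fold changes and adjusted p-values; the 300 bp window and the cutoffs are arbitrary placeholders, not values from the text.

```python
def genes_near_peaks(peak_centers, gene_starts, window=300):
    """Genes whose start coordinate lies within `window` bp of any ChIP-seq peak
    center (strand and operon structure are ignored in this sketch)."""
    return {gene for gene, start in gene_starts.items()
            if any(abs(start - peak) <= window for peak in peak_centers)}


def classify_targets(bound, log2fc, padj, min_abs_lfc=1.0, alpha=0.05):
    """Combine binding (ChIP-seq) with expression change after TF perturbation (RNA-seq)."""
    changed = {g for g, lfc in log2fc.items()
               if abs(lfc) >= min_abs_lfc and padj.get(g, 1.0) < alpha}
    return {
        "direct": bound & changed,           # bound and differentially expressed
        "indirect": changed - bound,         # changed, but no nearby binding
        "bound_unchanged": bound - changed,  # bound, but no detectable effect
    }
```

The "bound_unchanged" set is worth keeping: binding without an expression change under the tested condition is one of the weaknesses of ChIP-seq alone that the combination exposes.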
Reverse engineering of gene regulatory networks has been an intensively studied topic in bioinformatics, since it constitutes an intermediate step from explorative to causative gene expression analysis. Many methods have been proposed in recent years, leading to a wide range of mathematical approaches. In practice, different mathematical approaches generate different network structures, so it is very important for users to be able to assess the performance of these algorithms. We conducted a comparative study of six different reverse engineering methods, including relevance networks, neural networks, and Bayesian networks. Our approach consists of generating defined benchmark data, analyzing these data with the different methods, and assessing algorithmic performance by statistical analyses. Performance was evaluated across different network sizes and noise levels. The results of the comparative study highlight the neural network approach as the best-performing method among those studied.
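Scoring an inferred network against a known benchmark network typically reduces to comparing edge sets, with edges treated as directed (regulator, target) pairs. The sketch below computes precision, recall, and F1 for a fixed prediction, plus precision/recall points along a ranked edge list; these are common choices and not necessarily the statistics used in the comparative study.

```python
def evaluate(predicted, true_edges):
    """Precision, recall and F1 of a predicted edge set against a benchmark."""
    predicted, true_edges = set(predicted), set(true_edges)
    tp = len(predicted & true_edges)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pr_curve(ranked_edges, true_edges):
    """(precision, recall) after each successive edge of a ranked prediction."""
    true_edges = set(true_edges)
    if not true_edges:
        raise ValueError("benchmark network has no edges")
    points, tp = [], 0
    for k, edge in enumerate(ranked_edges, start=1):
        tp += edge in true_edges  # bool counts as 0 or 1
        points.append((tp / k, tp / len(true_edges)))
    return points
```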
Background: Recently, supervised learning methods have been used to reconstruct gene regulatory networks from gene expression data. Network reconstruction is modeled as a binary classification problem for each pair of genes: a statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been shown to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge that a given pair of genes does not interact is typically unavailable. Results: A recent advance in data-mining research is a method capable of learning a classifier from only positive and unlabeled examples, without the need for labeled negative examples. When this method is applied to the reconstruction of gene regulatory networks, we show that it significantly outperforms the current state-of-the-art machine learning methods. We assess the new method using both simulated and experimental data, and obtain a major performance improvement. Conclusions: Compared with unsupervised methods for gene network inference, supervised methods are potentially more accurate, but they need a complete set of known regulatory connections for training. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.
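A minimal sketch of one well-known positive-unlabeled (PU) strategy, the calibration trick of Elkan and Noto (2008): train a classifier to separate labeled positives from unlabeled examples, then divide its scores by c, the average score on held-out known positives. This is not necessarily the method used in the paper. In the gene-network setting each example would be a feature vector describing a gene pair (for instance, the two genes' expression profiles); here one-dimensional toy features and a hand-written logistic regression keep the example dependency-free.

```python
import math
import random


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, labels, epochs=500, lr=0.1):
    """Batch gradient-descent logistic regression; returns a scoring function."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def pu_classifier(positives, unlabeled, holdout_frac=0.2, seed=0):
    """Train 'labeled vs unlabeled', then rescale by c = P(labeled | truly positive),
    estimated as the mean score on held-out known positives."""
    pos = list(positives)
    random.Random(seed).shuffle(pos)
    n_hold = max(1, int(len(pos) * holdout_frac))
    held_out, train_pos = pos[:n_hold], pos[n_hold:]
    X = train_pos + list(unlabeled)
    labels = [1] * len(train_pos) + [0] * len(unlabeled)
    g = train_logistic(X, labels)
    c = sum(g(x) for x in held_out) / len(held_out)
    return lambda x: min(1.0, g(x) / c)
```

The unlabeled set is treated as a noisy negative class during training; the rescaling by c is what corrects for the true positives hidden inside it.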
This chapter is split into two main sections: first, I present an introduction to gene networks; second, I discuss various approaches to gene network modeling, including some examples that use different data sources. Computational modeling has been used for many different biological systems, and many approaches have been developed to address the different needs posed by the various application fields. The modeling approaches presented here are not limited to gene regulatory networks, and occasionally I present other examples. The material covered here is an update based on several previous publications by Thomas Schlitt and Alvis Brazma (FEBS Lett 579(8), 1859-1866, 2005; Philos Trans R Soc Lond B Biol Sci 361(1467), 483-494, 2006; BMC Bioinformatics 8(suppl 6), S9, 2007) that formed the foundation for a lecture on gene regulatory networks at the In Silico Systems Biology workshop series at the European Bioinformatics Institute in Hinxton.
Mar 13, 2012. Human adenocarcinoma (AC) is the most frequently diagnosed type of human lung cancer, and its absolute incidence is increasing dramatically. Our study aimed to interpret the mechanisms of human adenocarcinoma through a regulatory network built from differentially expressed genes (DEGs). We used the ...
Background: Few genome-scale models of organisms focus on regulatory networks, and none of them integrates all known levels of regulation. In particular, regulations involving metabolite pools are often neglected. However, metabolite pools link the metabolic network to the genetic network through genetic regulations, including those involving effectors of transcription factors or riboswitches. Consequently, they play pivotal roles in the global organization of the genetic and metabolic regulatory networks. Results: We report the manually curated reconstruction of the genetic and metabolic regulatory networks of the central metabolism of Bacillus subtilis (transcriptional, translational and post-translational regulations, and modulation of enzymatic activities). We provide a systematic graphic representation of the regulations of each metabolic pathway based on the central role of metabolites in regulation. We show that the complex regulatory network of B. subtilis can be decomposed into sets of locally regulated modules, which are coordinated by global regulators. Conclusion: This work reveals the strong involvement of metabolite pools in the general regulation of the metabolic network. Breaking the metabolic network down into modules based on the control of metabolite pools reveals the functional organization of the genetic and metabolic regulatory networks of B. subtilis.
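A rough computational analogue of "locally regulated modules coordinated by global regulators" is to remove the regulators with the most targets and take the connected components of what remains. The out-degree threshold below is a hypothetical criterion for calling a regulator global; the paper's modules were defined through manual curation around metabolite pools, which this sketch does not attempt.

```python
from collections import defaultdict


def local_modules(edges, hub_threshold):
    """Treat regulators with >= hub_threshold targets as global regulators, drop
    every edge touching them, and return (global_regulators, modules), where the
    modules are the connected components of the remaining undirected graph."""
    out_degree = defaultdict(int)
    for regulator, _ in set(edges):
        out_degree[regulator] += 1
    global_regs = {r for r, d in out_degree.items() if d >= hub_threshold}
    neighbors = defaultdict(set)
    for u, v in edges:
        if u not in global_regs and v not in global_regs:
            neighbors[u].add(v)
            neighbors[v].add(u)
    modules, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        module, frontier = set(), [start]
        while frontier:  # depth-first traversal of one component
            node = frontier.pop()
            if node not in module:
                module.add(node)
                frontier.extend(neighbors[node] - module)
        seen |= module
        modules.append(module)
    return global_regs, modules
```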
Krol, Elizaveta; Blom, Jochen; Winnebald, Jörn; Berhörster, Alexander; Barnett, Melanie J; Goesmann, Alexander; Baumbach, Jan; Becker, Anke
Sinorhizobium meliloti is a symbiotic soil bacterium that forms nitrogen-fixing nodules on roots of leguminous plants, including Medicago truncatula (barrel medic) and M. sativa (alfalfa). The Sinorhizobium-Medicago symbiosis is an important symbiosis model system. Knowledge gained from this system can be extended to other agriculturally important "rhizobial" symbioses. Since the publication of the S. meliloti genome in 2001, many new genetic, biochemical and physiological data have been generated. Effective methods to organize, store, and mine these postgenome data are crucial for the continued success of the S. meliloti model system. In 2009, we introduced a portal for rhizobial genomes, RhizoGATE (Becker et al., J. Biotechnol. 140, 45-50). The RhizoGATE portal combines continuously updated S. meliloti genome annotation with postgenome data resources. Here we report the integration of a new component, RhizoRegNet, into RhizoGATE. RhizoRegNet combines transcriptome data and operon predictions with published data on regulatory interactions. By allowing searching and visualisation of complex transcriptional regulatory networks, RhizoRegNet advances our understanding of transcriptional regulation in S. meliloti. The current version of RhizoRegNet is divided into 13 functional modules containing information for 114 regulators, 475 regulated genes, and 178 transcription factor binding motifs. In this report, we provide an example of how RhizoRegNet facilitates visualisation and analysis of the regulatory network for exopolysaccharide biosynthesis and motility. Presently, RhizoRegNet contains regulatory network information for S. meliloti and the closely related bacterium S. medicae, but it can be expanded to include other rhizobial species. Copyright © 2010 Elsevier B.V. All rights reserved.
Rämö, Pauli; Kesseli, Juha; Yli-Harja, Olli
Boolean networks are used to model large nonlinear systems such as gene regulatory networks. We present results that can be used to understand how the choice of functions affects network dynamics. The so-called bias-map and its fixed points capture much of a function's dynamical role in the network. We define the concept of stabilizing functions and show that many Post and canalizing functions are also stabilizing functions. Boolean networks constructed using the same type of stabilizing functions are always stable, regardless of the average in-degree of the network functions. We derive the number of all stabilizing functions and find it to be much larger than the number of Post and canalizing functions. We also discuss the implementation of functions and apply the presented results to biological data that approximate the distribution of regulatory functions in eukaryotic cells. We find that the obtained theoretical results on the number of active genes are biologically plausible. Finally, based on the presented results, we discuss why canalizing and Post regulatory functions seem to be common in cells.
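A Boolean function is canalizing if some input, fixed at some value, determines the output regardless of the other inputs. The sketch below tests that property on a truth table and counts canalizing functions by brute force; under this definition constant functions count as canalizing. Post and stabilizing functions, and the paper's bias-map, are not implemented; `bias` here is simply the fraction of input combinations mapped to 1.

```python
from itertools import product


def is_canalizing(table, k):
    """table maps each k-tuple of 0/1 inputs to a 0/1 output."""
    for i in range(k):
        for a in (0, 1):
            outputs = {table[x] for x in product((0, 1), repeat=k) if x[i] == a}
            if len(outputs) == 1:  # input i fixed at a forces a single output
                return True
    return False


def count_canalizing(k):
    """Brute-force count over all 2**(2**k) Boolean functions of k inputs."""
    inputs = list(product((0, 1), repeat=k))
    return sum(is_canalizing(dict(zip(inputs, outs)), k)
               for outs in product((0, 1), repeat=len(inputs)))


def bias(table):
    """Fraction of input combinations mapped to 1."""
    return sum(table.values()) / len(table)
```

For k = 2, every one of the 16 functions except XOR and XNOR is canalizing, so `count_canalizing(2)` returns 14; brute force becomes impractical quickly as k grows, since the number of functions is doubly exponential in k.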
Ma, Tianle; Zhang, Aidong
Reconstructing context-specific transcriptional regulatory networks is crucial for deciphering the principles of regulatory mechanisms underlying various conditions. Recent studies that reconstructed transcriptional networks have focused on individual organisms or cell types and relied on data repositories of context-free regulatory relationships. Here we present a comprehensive framework to systematically derive putative regulator-target pairs in any given context by integrating context-specific transcriptional profiling with public data repositories of gene regulatory networks. Moreover, our framework can identify core regulatory modules and signature genes underlying global regulatory circuitry, and detect network rewiring and core rewired modules in different contexts by considering gene modules and edge (gene interaction) modules collaboratively. We applied our methods to analyze autism RNA-seq experiment data and obtained biologically meaningful results. In particular, all 11 hub genes in a predicted rewired autistic regulatory subnetwork have been linked to autism according to our literature review. The predicted rewired autistic regulatory network may shed new light on disease mechanisms. Published by Elsevier Inc.
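At its simplest, rewiring between two contexts can be read off as the edges gained and lost, with genes ranked by how many rewired edges touch them. This edge-level sketch ignores the gene-module and edge-module analysis the framework uses; the edge lists in the check are hypothetical.

```python
from collections import Counter


def rewiring(edges_a, edges_b):
    """Edges gained and lost from context A to context B, plus genes ranked
    by how many rewired edges they participate in."""
    a, b = set(edges_a), set(edges_b)
    gained, lost = b - a, a - b
    touched = Counter()
    for u, v in gained | lost:
        touched[u] += 1
        touched[v] += 1
    return gained, lost, touched.most_common()
```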
Doni Jayavelu, Naresh; Bar, Nadav
Understanding gene transcription regulatory networks is critical to deciphering the molecular mechanisms of different cellular states. Most studies focus on static transcriptional networks. In the current study, we used the gastrin-regulated system as a model to understand the dynamics of transcriptional networks composed of transcription factors (TFs) and target genes (TGs). The hormone gastrin activates and stimulates signaling pathways leading to various cellular states through transcriptional programs. Dysregulation of gastrin can result in cancerous tumors, for example. However, the regulatory networks involving gastrin are highly complex, and the roles of most of the components of these networks are unknown. We used time series microarray data of AR42J adenocarcinoma cells treated with gastrin combined with static TF-TG relationships integrated from different sources, and we reconstructed the dynamic activities of TFs using network component analysis (NCA). Based on the peak expression of TGs and activity of TFs, we created active sub-networks at four time ranges after gastrin treatment, namely immediate-early (IE), mid-early (ME), mid-late (ML) and very late (VL). Network analysis revealed that the active sub-networks were topologically different at the early and late time ranges. Gene ontology analysis unveiled that each active sub-network was highly enriched in a particular biological process. Interestingly, network motif patterns were also distinct between the sub-networks. This analysis can be applied to other time series microarray datasets, focusing on smaller sub-networks that are activated in a cascade, allowing better overview of the mechanisms involved at each time range.
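The time-range assignment can be sketched as follows, assuming TF activity profiles have already been estimated (for example by NCA, which this sketch does not implement) and that each time range is defined by an upper time bound. The rule that an edge is active when the TF and its target peak in the same range, and the hour boundaries in the check, are hypothetical simplifications.

```python
def peak_time(profile, times):
    """Time point at which a profile (TG expression or TF activity) is maximal."""
    return times[max(range(len(profile)), key=profile.__getitem__)]


def time_range(t, bounds):
    """bounds: [(label, upper_time), ...] in increasing order of upper_time."""
    for label, upper in bounds:
        if t <= upper:
            return label
    return bounds[-1][0]


def active_subnetworks(tf_tg_edges, profiles, times, bounds):
    """Assign each TF->TG edge to the time range in which both ends peak."""
    peak_range = {g: time_range(peak_time(p, times), bounds) for g, p in profiles.items()}
    subnets = {label: set() for label, _ in bounds}
    for tf, tg in tf_tg_edges:
        if tf in peak_range and peak_range.get(tg) == peak_range[tf]:
            subnets[peak_range[tf]].add((tf, tg))
    return subnets
```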
Azpeitia, Eugenio; Davila-Velderrain, José; Villarreal, Carlos; Alvarez-Buylla, Elena R
Understanding how genotypes map onto phenotypes requires an integrative understanding of the processes regulating cell differentiation and morphogenesis, which comprise development. Such a task requires theoretical and computational approaches to integrate and follow the concerted action of multiple genetic and nongenetic components that have highly nonlinear interactions. Gene regulatory network (GRN) models have been proposed to approach such a task. GRN models have become very useful for understanding how such interactions restrict the multi-gene expression patterns that characterize different cell fates. More recently, such single-cell temporal models have been extended to recover the temporal and spatial components of morphogenesis. Since the complete genomic GRN is still unknown and intractable for any organism, and some clear developmental modules have been identified, we focus here on the analysis of well-curated and experimentally grounded small GRN modules. One of the first experimentally grounded GRNs to be proposed and validated corresponds to the regulatory module involved in floral organ determination. In this chapter we use this GRN as an example of the methodologies involved in: (1) formalizing and integrating molecular genetic data into the logical functions (Boolean functions) that rule gene interactions and dynamics in a Boolean GRN; (2) the algorithms and computational approaches used to recover the steady states that correspond to each cell type, as well as the set of initial GRN configurations that lead to each such state (i.e., basins of attraction); (3) the approaches used to validate a GRN model using wild-type and mutant or overexpression data, or to test the robustness of the proposed GRN; (4) some of the methods that have been used to incorporate random fluctuations into the GRN Boolean functions and enable stochastic GRN models to address the temporal sequence with which gene configurations and cell fates are
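Steady states and basins of attraction of a small Boolean GRN can be found by exhaustive enumeration, since n genes give only 2^n states. The three-gene rules below are hypothetical (they are not the floral organ determination module), and the update is synchronous; asynchronous updating preserves the fixed points but can change cyclic attractors and basin sizes.

```python
from itertools import product

# Hypothetical three-gene rules (NOT the floral organ module): each rule reads
# the current state, a dict gene -> 0/1, and returns that gene's next value.
RULES = {
    "A": lambda s: int(not s["B"]),
    "B": lambda s: int(not s["A"]),
    "C": lambda s: s["A"],
}
GENES = sorted(RULES)


def step(state):
    """Synchronous update: every gene is recomputed from the same current state."""
    current = dict(zip(GENES, state))
    return tuple(RULES[g](current) for g in GENES)


def attractors_and_basins(step_fn, n):
    """Map each attractor (a tuple of states, rotated to start at its smallest
    state) to the number of the 2**n initial states that reach it."""
    basins = {}
    for start in product((0, 1), repeat=n):
        trajectory, state = [], start
        while state not in trajectory:  # iterate until a state repeats
            trajectory.append(state)
            state = step_fn(state)
        cycle = trajectory[trajectory.index(state):]
        i = cycle.index(min(cycle))
        attractor = tuple(cycle[i:] + cycle[:i])
        basins[attractor] = basins.get(attractor, 0) + 1
    return basins
```

For these toy rules, the two fixed points (A, B, C) = (0, 1, 0) and (1, 0, 1) each attract 2 of the 8 states, and the remaining 4 states fall into a two-state cycle between (0, 0, 1) and (1, 1, 0).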
Gui, Shupeng; Rice, Andrew P; Chen, Rui; Wu, Liang; Liu, Ji; Miao, Hongyu
Gene regulatory interactions are of fundamental importance to various biological functions and processes. However, only a few previous computational studies have claimed success in revealing genome-wide regulatory landscapes from temporal gene expression data, especially for complex eukaryotes such as humans. Moreover, recent work suggests that these methods still suffer from the curse of dimensionality when the network size increases to 100 or more. Here we present a novel scalable algorithm for identifying genome-wide gene regulatory network (GRN) structures, and we have verified the algorithm's performance by extensive simulation studies based on the DREAM challenge benchmark data. The highlight of our method is that its superior performance does not degrade even for network sizes on the order of 10⁴, and it is thus readily applicable to large-scale complex networks. This breakthrough is achieved by considering both prior biological knowledge and multiple topological properties (i.e., sparsity and hub gene structure) of complex networks in the regularized formulation. We also validate and illustrate the application of our algorithm in practice using time-course gene expression data from a study of human respiratory epithelial cells in response to influenza A virus (IAV) infection, as well as ChIP-seq data from ENCODE on transcription factor (TF) and target gene interactions. An interesting finding enabled by the proposed algorithm is that the largest hub structures (e.g., the top ten) in the GRN are all centered on transcription factors in the context of epithelial cell infection by IAV. The proposed algorithm is the first scalable method for large complex network structure identification. The GRN structure identified by our algorithm could reveal possible biological links and help researchers choose which gene functions to investigate in a biological event. The algorithm described in this article is implemented in MATLAB®, and the source code is
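The sparsity ingredient of such a regularized formulation can be illustrated with an L1-penalized (lasso) regression of one target gene on candidate regulators, solved by cyclic coordinate descent. This is a generic sketch, not the authors' algorithm: it omits their prior-knowledge and hub-structure terms, assumes centered columns, and fits no intercept.

```python
def soft_threshold(rho, lam):
    """Proximal operator of the L1 penalty."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.
    X: n rows of d (centered) regulator values; y: centered target values."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    resid = list(y)  # residual y - Xw for the current w (initially w = 0)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0:
                continue
            # correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
            w[j] = new_wj
    return w
```

Nonzero entries of `w` are the candidate regulators of the target; repeating the fit for every gene gives a sparse network, which is the part of the idea this sketch covers.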
Bruex, Angela; Kainkaryam, Raghunandan M.; Wieckowski, Yana; Kang, Yeon Hee; Bernhardt, Christine; Xia, Yang; Zheng, Xiaohua; Wang, Jean Y.; Lee, Myeong Min; Benfey, Philip; Woolf, Peter J.; Schiefelbein, John
The root epidermis of Arabidopsis provides an exceptional model for studying the molecular basis of cell fate and differentiation. To obtain a systems-level view of root epidermal cell differentiation, we used a genome-wide transcriptome approach to define and organize a large set of genes into a transcriptional regulatory network. Using cell fate mutants that produce only one of the two epidermal cell types, together with fluorescence-activated cell-sorting to preferentially analyze the root epidermis transcriptome, we identified 1,582 genes differentially expressed in the root-hair or non-hair cell types, including a set of 208 “core” root epidermal genes. The organization of the core genes into a network was accomplished by using 17 distinct root epidermis mutants and 2 hormone treatments to perturb the system and assess the effects on each gene's transcript accumulation. In addition, temporal gene expression information from a developmental time series dataset and predicted gene associations derived from a Bayesian modeling approach were used to aid the positioning of genes within the network. Further, a detailed functional analysis of likely bHLH regulatory genes within the network, including MYC1, bHLH54, bHLH66, and bHLH82, showed that three distinct subfamilies of bHLH proteins participate in root epidermis development in a stage-specific manner. The integration of genetic, genomic, and computational analyses provides a new view of the composition, architecture, and logic of the root epidermal transcriptional network, and it demonstrates the utility of a comprehensive systems approach for dissecting a complex regulatory network. PMID:22253603
Linksvayer, Timothy A; Fewell, Jennifer H; Gadau, Jürgen; Laubichler, Manfred D
The evolution and development of complex phenotypes in social insect colonies, such as queen-worker dimorphism or division of labor, can, in our opinion, only be fully understood within an expanded mechanistic framework of Developmental Evolution. Conversely, social insects offer a fertile research area in which fundamental questions of Developmental Evolution can be addressed empirically. We review the concept of gene regulatory networks (GRNs) that aims to fully describe the battery of interacting genomic modules that are differentially expressed during the development of individual organisms. We discuss how distinct types of network models have been used to study different levels of biological organization in social insects, from GRNs to social networks. We propose that these hierarchical networks spanning different organizational levels from genes to societies should be integrated and incorporated into full GRN models to elucidate the evolutionary and developmental mechanisms underlying social insect phenotypes. Finally, we discuss prospects and approaches to achieve such an integration. © 2012 WILEY PERIODICALS, INC.
Background: Transcriptional regulation of gene activity is essential for any living organism. To this end, transcription factors recognize specific binding sites within the DNA to regulate the expression of particular target genes. Genome-scale reconstruction of the resulting regulatory networks is important for biotechnology and human medicine, but it is cost-intensive, time-consuming, and impractical to perform separately for every species. Using bioinformatics methods, networks can be partially transferred from well-studied model organisms to closely related species. However, prediction quality is limited by the low level of evolutionary conservation of transcription factor binding sites, even among organisms of the same genus. Results: Here we present an integrated bioinformatics workflow that assures the reliability of transferred gene regulatory networks. Our approach combines three methods that can be applied on a large scale: re-assessment of annotated binding sites, subsequent binding site prediction, and homology detection. A gene regulatory interaction is considered conserved if (1) the transcription factor, (2) the adjusted binding site, and (3) the target gene are conserved. The power of the approach is demonstrated by transferring gene regulations from the model organism Corynebacterium glutamicum to the human pathogens C. diphtheriae and C. jeikeium, and to the biotechnologically relevant C. efficiens. For these three organisms, we identified reliable transcriptional regulations for ~40% of the common transcription factors, compared with ~5% for which knowledge was available before. Conclusion: Our results suggest that trustworthy genome-scale transfer of gene regulatory networks between organisms is feasible in general but is still limited by the level of evolutionary conservation.
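The three-part conservation rule can be expressed directly: an interaction is transferred only if the TF and the target both have orthologs in the query organism and a binding site is still found upstream of the target ortholog. In this sketch the ortholog table, upstream sequences, and consensus motif are hypothetical inputs, and a mismatch-tolerant consensus scan stands in for the paper's binding-site re-assessment and prediction steps.

```python
def has_site(upstream, motif, max_mismatches=1):
    """True if some window of `upstream` matches `motif` with at most
    `max_mismatches` substitutions (one strand only, no gaps)."""
    width = len(motif)
    return any(
        sum(a != b for a, b in zip(upstream[i:i + width], motif)) <= max_mismatches
        for i in range(len(upstream) - width + 1)
    )


def transfer_network(edges, ortholog, site_found):
    """edges: (tf, target) pairs from the reference organism.
    ortholog: reference gene -> query-organism gene (missing key = no ortholog).
    site_found: callable(query_tf, query_target) -> bool."""
    transferred = []
    for tf, target in edges:
        q_tf, q_target = ortholog.get(tf), ortholog.get(target)
        # (1) TF conserved, (3) target conserved, (2) binding site still present
        if q_tf is not None and q_target is not None and site_found(q_tf, q_target):
            transferred.append((q_tf, q_target))
    return transferred
```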
Alessandra M. Sullivan
Our understanding of gene regulation in plants is constrained by our limited knowledge of plant cis-regulatory DNA and its dynamics. We mapped DNase I hypersensitive sites (DHSs) in A. thaliana seedlings and used genomic footprinting to delineate ∼700,000 sites of in vivo transcription factor (TF) occupancy at nucleotide resolution. We show that variation associated with 72 diverse quantitative phenotypes localizes within DHSs. TF footprints encode an extensive cis-regulatory lexicon subject to recent evolutionary pressures, and widespread TF binding within exons may have shaped codon usage patterns. The architecture of A. thaliana TF regulatory networks is strikingly similar to that of animals in spite of diverged regulatory repertoires. We analyzed regulatory landscape dynamics during heat shock and photomorphogenesis, revealing thousands of environmentally sensitive elements and enabling the mapping of key TF regulatory circuits underlying these fundamental responses. Our results provide an extensive resource for the study of A. thaliana gene regulation and functional biology.
Zopf, Christopher; Maheshri, Narendra
At eukaryotic promoters, chromatin can influence the relationship between a gene's expression and transcription factor (TF) activity. This additional complexity might allow single promoters to exhibit dynamical behavior commonly attributed to regulatory motifs involving multiple genes. We investigate the role of promoter chromatin architecture in the kinetics of gene activation using a previously described set of promoter variants based on the phosphate-regulated PHO5 promoter in S. cerevisiae. Accurate quantitative measurement of transcription activation kinetics is made possible by a controllable and observable TF input to a promoter of interest, which produces an observable expression output in single cells. We find that the particular architecture of these promoters can result in a significant delay in activation, filtering of noisy TF signals, and a memory of previous activation: dynamical behaviors reminiscent of a feed-forward loop but requiring only a single promoter. We suggest this is a consequence of chromatin transactions at the promoter, which likely passes through a long-lived "primed" state between its inactive and competent states. Finally, we show that our experimental setup can be generalized as a "gene oscilloscope" to probe the kinetics of heterologous promoter architectures.
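Why an intermediate "primed" state produces an activation delay can be seen in a deliberately simple sketch, assuming irreversible first-order transitions and a deterministic population average (not the authors' model, which also accounts for noise filtering and memory). Both chains below have the same mean activation time (1/1.0 versus 1/2.0 + 1/2.0), but the three-state chain starts more slowly and reaches half activation later.

```python
def simulate(rates, t_end=5.0, dt=1e-3):
    """Euler integration of an irreversible chain of promoter states.

    rates[i] is the rate from state i to state i + 1; the last state is 'competent'.
    Returns a list of (time, fraction of promoters in the competent state).
    """
    p = [1.0] + [0.0] * len(rates)  # all promoters start inactive
    trace = [(0.0, p[-1])]
    for step in range(1, int(round(t_end / dt)) + 1):
        flux = [rates[i] * p[i] for i in range(len(rates))]
        for i, f in enumerate(flux):
            p[i] -= f * dt
            p[i + 1] += f * dt
        trace.append((step * dt, p[-1]))
    return trace


def half_time(trace):
    """First time at which at least half the promoters are competent."""
    for t, frac in trace:
        if frac >= 0.5:
            return t
    return float("inf")


direct = half_time(simulate([1.0]))       # inactive -> competent
primed = half_time(simulate([2.0, 2.0]))  # inactive -> primed -> competent
```

The direct chain reaches half activation near ln 2 ≈ 0.69 time units, the primed chain near 0.84, with a sigmoidal rather than exponential onset.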
Pluripotency is a state that exists transiently in the early embryo and, remarkably, can be recapitulated in vitro by deriving embryonic stem cells or by reprogramming somatic cells to become induced pluripotent stem cells. The state of pluripotency, which is stabilized by an interconnected network of pluripotency-associated genes, integrates external signals and exerts control over the decision between self-renewal and differentiation at the transcriptional, post-transcriptional and epigenetic levels. Recent evidence of alternative pluripotency states indicates the regulatory flexibility of this network. Insights into the underlying principles of the pluripotency network may provide unprecedented opportunities for studying development and for regenerative medicine.
Gottesman, Omri; Kuivaniemi, Helena; Tromp, Gerard; Faucett, W Andrew; Li, Rongling; Manolio, Teri A; Sanderson, Saskia C; Kannry, Joseph; Zinberg, Randi; Basford, Melissa A; Brilliant, Murray; Carey, David J; Chisholm, Rex L; Chute, Christopher G; Connolly, John J; Crosslin, David; Denny, Joshua C; Gallego, Carlos J; Haines, Jonathan L; Hakonarson, Hakon; Harley, John; Jarvik, Gail P; Kohane, Isaac; Kullo, Iftikhar J; Larson, Eric B; McCarty, Catherine; Ritchie, Marylyn D; Roden, Dan M; Smith, Maureen E; Böttinger, Erwin P; Williams, Marc S
The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute-funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and health-care informatics, particularly for electronic phenotyping, genome-wide association studies, genomic medicine implementation, and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here, we describe the evolution, accomplishments, opportunities, and challenges of the network from its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting toward the implementation of genomic medicine.
Pájaro, Manuel; Otero-Muras, Irene; Vázquez, Carlos; Alonso, Antonio A
Gene regulation is inherently stochastic. In many Systems and Synthetic Biology applications, such as the reverse engineering and de novo design of genetic circuits, stochastic effects, although potentially crucial, are often neglected because of the high computational cost of stochastic simulations. With advances in these fields there is an increasing need for tools that provide accurate approximations of the stochastic dynamics of gene regulatory networks (GRNs) with reduced computational effort. This work presents SELANSI (SEmi-LAgrangian SImulation of GRNs), a software toolbox for the simulation of stochastic multidimensional gene regulatory networks. SELANSI exploits intrinsic structural properties of gene regulatory networks to accurately approximate the corresponding chemical master equation (CME) with a partial integral differential equation (PIDE), which is solved efficiently by a semi-Lagrangian method. The networks considered may involve multiple genes with self- and cross-regulation, in which genes can be regulated by different transcription factors. Moreover, the validity of the method is not restricted to a particular type of kinetics. The tool offers full flexibility regarding network topology, kinetics and parameterization, as well as simulation options. SELANSI runs under the MATLAB environment and is available under the GPLv3 license at https://sites.google.com/view/selansi.
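For context on the cost being avoided, the exact reference approach for sampling the CME is Gillespie's stochastic simulation algorithm (SSA). The sketch below applies it to the simplest hypothetical case, constitutive production (rate `k`) and first-order degradation (rate `gamma * n`) of one protein, whose stationary mean is k/gamma. It is not SELANSI's semi-Lagrangian PIDE solver; it only shows the per-reaction simulation loop that such approximations aim to replace.

```python
import random


def gillespie_birth_death(k=10.0, gamma=1.0, t_end=2000.0, burn_in=50.0, seed=1):
    """Gillespie SSA for 0 -> P (rate k) and P -> 0 (rate gamma * n).

    Returns the time-weighted mean copy number after a burn-in period.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    weighted, elapsed = 0.0, 0.0
    while t < t_end:
        a_birth, a_death = k, gamma * n
        a_total = a_birth + a_death        # always > 0 because k > 0
        tau = rng.expovariate(a_total)     # exponential waiting time to next event
        dwell = min(tau, t_end - t)
        if t >= burn_in:                   # accumulate time spent at copy number n
            weighted += n * dwell
            elapsed += dwell
        t += tau
        if rng.random() * a_total < a_birth:
            n += 1                         # production event
        else:
            n -= 1                         # degradation event
    return weighted / elapsed


mean_copies = gillespie_birth_death()
```

Each run costs one loop iteration per reaction event (here roughly 20 per time unit), which is what becomes expensive for larger networks and longer horizons.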
Lanza, Val F; Baquero, Fernando; de la Cruz, Fernando; Coque, Teresa M
AcCNET (Accessory genome Constellation Network) is a Perl application that aims to compare the accessory genomes of a large number of genomic units at both qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows phylogenetic and functional information about the genomes concerned to be merged into the network, thus extending the capability of current methods of network analysis. The AcCNET bipartite network opens a new perspective for exploring the pangenome of bacterial species, focusing on the accessory genome behind the idiosyncrasy of a particular strain and/or population. AcCNET is available under GNU General Public License version 3.0 (GPLv3) from http://sourceforge.net/projects/accnet. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
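A genome-to-protein-family bipartite network of this kind can be sketched in a few lines. This is a Python illustration, not AcCNET's Perl code: it assumes the proteomes have already been clustered into protein families (the `genome_clusters` mapping is a hypothetical input) and treats as "core" the families present in every genome, so that only accessory families remain after filtering.

```python
from typing import Dict, List, Set, Tuple


def build_bipartite(genome_clusters: Dict[str, Set[str]]) -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Return the genome-family edge list and the set of core families shared by all genomes."""
    edges = sorted((g, c) for g, clusters in genome_clusters.items() for c in clusters)
    core = set.intersection(*genome_clusters.values())
    return edges, core


def accessory_edges(edges: List[Tuple[str, str]], core: Set[str]) -> List[Tuple[str, str]]:
    """Drop core families so that only the accessory genome stays in the network."""
    return [(g, c) for g, c in edges if c not in core]


# Toy usage: three genomes, five protein families.
edges, core = build_bipartite({"g1": {"a", "b", "c"}, "g2": {"a", "b", "d"}, "g3": {"a", "e"}})
acc = accessory_edges(edges, core)
```

The resulting edge list can be written out as a two-column table, a format most network analysis platforms can import.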
Sanz, J; Borge-Holthoefer, J; Moreno, Y
The topological analysis of biological networks has been a prolific topic in network science during the last decade. A persistent problem with this approach is the inherent uncertainty and noisiness of the data. This problem is especially marked for transcriptional regulatory networks (TRNs) in bacteria. The datasets are incomplete because the regulatory pathways associated with a substantial fraction of bacterial genes remain unknown. Furthermore, the direction, strength and sign of the links are sometimes unknown or simply overlooked. Finally, the experimental approaches used to infer the regulations are highly heterogeneous, in a way that induces systematic experimental-topological correlations. And yet, the quality of the available data increases constantly. In this work we capitalize on these advances to point out the influence of data (in)completeness and quality on some classical results on the topological analysis of TRNs, especially regarding modularity at different level...
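Modularity is usually quantified with the Newman-Girvan score Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the summed degree of its nodes. The sketch below computes Q for a given partition, assuming an undirected, unweighted graph; the abstract does not state which modularity measure was used, and directed or signed TRN edges would need a different variant.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    intra = defaultdict(int)   # L_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge, split along the bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
q_single = modularity(edges, {v: "A" for v in range(6)})
```

Missing or spurious edges change both L_c and d_c, which is one way incompleteness of the data feeds directly into modularity estimates.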
Narang, Vipin; Ramli, Muhamad Azfar; Singhal, Amit; Kumar, Pavanish; de Libero, Gennaro; Poidinger, Michael; Monterola, Christopher
Human gene regulatory networks (GRN) can be difficult to interpret due to a tangle of edges interconnecting thousands of genes. We constructed a general human GRN from extensive transcription factor and microRNA target data obtained from public databases. In a subnetwork of this GRN that is active during estrogen stimulation of MCF-7 breast cancer cells, we benchmarked automated algorithms for identifying core regulatory genes (transcription factors and microRNAs). Among these algorithms, K-core decomposition, PageRank and betweenness centrality were the most effective at discovering core regulatory genes, as evaluated by the previously known roles of these genes in MCF-7 biology and by their ability to explain the up- or down-regulation of up to 70% of the remaining genes. Finally, we validated the use of the K-core algorithm for organizing the GRN into an easier-to-interpret layered hierarchy in which more influential regulatory genes percolate towards the inner layers. The integrated human gene and miRNA network and the software used in this study are provided as supplementary materials (S1 Data) accompanying this manuscript.
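K-core decomposition assigns each gene a core number: the largest k such that the gene survives when nodes with fewer than k neighbours are repeatedly removed. A minimal peeling sketch follows, assuming regulatory edges are treated as undirected and self-loops (autoregulation) are ignored; the study's own software may handle direction and weights differently.

```python
def undirected_adjacency(edges):
    """Symmetric adjacency sets from (regulator, target) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_numbers(adj):
    """Core number of every node, computed by iterative peeling."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        k = max(k, min(degree[v] for v in remaining))  # shell level never decreases
        stack = [v for v in remaining if degree[v] <= k]
        while stack:  # peel nodes with residual degree <= k, cascading as degrees drop
            v = stack.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            core[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return core


# A triangle of mutually regulating genes plus one peripheral target.
cores = core_numbers(undirected_adjacency([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]))
```

Sorting genes by core number gives the layered, inside-out ordering described above, with the most interconnected regulators in the innermost shell.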
Kluger, Harriet M
Background: High-throughput gene expression experiments yield large amounts of data that can augment our understanding of disease processes, in addition to classifying samples. Here we present new paradigms of data separation based on the construction of transcriptional regulatory networks for normal and abnormal cells using sequence predictions, literature-based data and gene expression studies. We analyzed expression datasets from a number of diseased and normal cells, including different types of acute leukemia and breast cancer with variable clinical outcome. Results: We constructed sample-specific regulatory networks to identify links between transcription factors (TFs) and regulated genes that differentiate between healthy and diseased states. This approach has the advantage of identifying key transcription factor-gene pairs with differential activity between healthy and diseased states, rather than merely using gene expression profiles, thus pointing to processes that may be involved in gene deregulation. We then generalized this approach by studying simultaneous changes in the functionality of multiple regulatory links pointing to a regulated gene or emanating from one TF (or, equivalently, changes in gene centrality defined by its in-degree or out-degree, respectively). We found that samples can often be separated more robustly on the basis of these measures of gene centrality than by using individual links. We examined distributions of distances (the number of links needed to traverse the path between each pair of genes) in the transcriptional networks for gene subsets whose collective expression profiles could best separate each dataset into predefined groups. We found that genes that optimally classify samples are concentrated in neighborhoods in the gene regulatory networks. This suggests that genes deregulated in diseased states exhibit a remarkable degree of connectivity. Conclusion: Transcription factor-regulated gene links and
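The two network measures used above, degree centrality and path distance, can be computed directly from a TF-to-target edge list. In this sketch, assumed to operate on directed edges, in-degree counts the regulators of a gene, out-degree counts the targets of a TF, and breadth-first search gives the number of links on the shortest directed path; whether the study used directed or undirected distances is not stated in the abstract.

```python
from collections import defaultdict, deque


def degree_centrality(edges):
    """In-degree (regulators per gene) and out-degree (targets per TF)."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for tf, target in edges:
        outdeg[tf] += 1
        indeg[target] += 1
    return dict(indeg), dict(outdeg)


def distances_from(edges, source):
    """Number of links on the shortest directed path from source to each reachable gene."""
    adj = defaultdict(list)
    for tf, target in edges:
        adj[tf].append(target)
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search visits genes in order of distance
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
indeg, outdeg = degree_centrality(edges)
dist_a = distances_from(edges, "A")
```

Running `distances_from` from every gene in a subset yields the pairwise distance distribution that can be compared against randomly chosen gene sets.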
Imam, Saheed; Noguera, Daniel R.; Donohue, Timothy J.
Photosynthesis is a crucial biological process that depends on the interplay of many components. This work analyzed the gene targets for 4 transcription factors: FnrL, PrrA, CrpK and MppG (RSP_2888), which are known or predicted to control photosynthesis in Rhodobacter sphaeroides. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) identified 52 operons under direct control of FnrL, illustrating its regulatory role in photosynthesis, iron homeostasis, nitrogen metabolism and regulation of sRNA synthesis. Using global gene expression analysis combined with ChIP-seq, we mapped the regulons of PrrA, CrpK and MppG. PrrA regulates ∼34 operons encoding mainly photosynthesis and electron transport functions, while CrpK, a previously uncharacterized Crp-family protein, regulates genes involved in photosynthesis and maintenance of iron homeostasis. Furthermore, CrpK and FnrL share similar DNA binding determinants, possibly explaining our observation of the ability of CrpK to partially compensate for the growth defects of a ΔFnrL mutant. We show that the Rrf2 family protein, MppG, plays an important role in photopigment biosynthesis, as part of an incoherent feed-forward loop with PrrA. Our results reveal a previously unrealized, high degree of combinatorial regulation of photosynthetic genes and significant cross-talk between their transcriptional regulators, while illustrating previously unidentified links between photosynthesis and the maintenance of iron homeostasis. PMID:25503406
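An incoherent feed-forward loop is one in which a regulator X acts on a target Z both directly and, with the opposite sign, through an intermediate Y. The sketch below simulates the generic type-1 form (X activates Y and Z, Y represses Z) after X switches on, with unit production and decay rates and a hypothetical repression threshold `K`. It illustrates the motif in general, not the specific PrrA/MppG wiring in R. sphaeroides, which the abstract does not spell out.

```python
def incoherent_ffl(t_end=15.0, dt=1e-3, K=0.3, n=2.0):
    """Euler simulation of a type-1 incoherent feed-forward loop after X switches on at t = 0.

    Y is produced at unit rate while X is on; Z is produced at unit rate scaled by
    repression from Y (Hill function with threshold K and coefficient n).
    Both decay at unit rate. Returns the Z trajectory.
    """
    y = z = 0.0
    zs = [z]
    for _ in range(int(round(t_end / dt))):
        dy = 1.0 - y                          # X-driven production minus decay
        dz = 1.0 / (1.0 + (y / K) ** n) - z   # X drives Z; accumulating Y represses it
        y += dy * dt
        z += dz * dt
        zs.append(z)
    return zs


zs = incoherent_ffl()
z_steady = 1.0 / (1.0 + (1.0 / 0.3) ** 2)  # expected Z once Y has saturated at 1
```

Z rises quickly, peaks before Y has accumulated, and then settles at a much lower level: the pulse-like response characteristic of this motif.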
Zare, Hossein; Kaveh, Mostafa; Khodursky, Arkady
Transcriptional networks consist of multiple regulatory layers corresponding to the activity of global regulators, specialized repressors and activators, as well as proteins and enzymes shaping the DNA template. Such intrinsic complexity makes uncovering connections difficult and calls for methodologies adapted to the available data. Here we present a new computational method that predicts interactions between transcription factors and target genes using compendia of microarray gene expression data and documented interactions between genes and transcription factors. The proposed method, called Kernel Embedding of Regulatory Networks (KEREN), is based on the concept of gene-regulon association and captures hidden geometric patterns of the network via manifold embedding. We applied KEREN to reconstruct transcription regulatory interactions on a genome-wide scale in the model bacterium Escherichia coli (E. coli). The method not only yielded accurate predictions of verifiable interactions, outperforming comparable methodologies on certain metrics, but also demonstrated the utility of a geometric approach in the analysis of high-dimensional biological data. We also describe possible applications of kernel embedding techniques to other function and network discovery algorithms.
Next-generation sequencing was exploited to gain deeper insight into the response to infection by Candidatus Liberibacter asiaticus (CaLas), the bacterium associated with huanglongbing (HLB), especially the immune dysregulation and metabolic dysfunction caused by source-sink disruption. Previous fruit transcriptome data were compared with additional RNA-Seq data from three tissues: immature fruit, and young and mature leaves. Four categories of orchard trees were studied: symptomatic, asymptomatic, apparently healthy, and healthy. Principal component analysis found distinct expression patterns between immature and mature fruits and leaf samples for all four categories of trees. A predicted protein-protein interaction network identified HLB-regulated sugar transporter genes as playing key roles in the overall plant response. Gene set and pathway enrichment analyses highlight the role of sucrose and starch metabolism in disease symptom development in all tissues. HLB-regulated genes (glucose-phosphate transporter, invertase, and starch-related genes) are likely to determine the disruption of the source-sink relationship. In infected leaves, transcriptomic changes were observed for light-reaction genes (downregulation), sucrose metabolism (upregulation), and starch biosynthesis (upregulation). In parallel, symptomatic fruits over-expressed genes involved in photosynthesis and in sucrose and raffinose metabolism, and downregulated starch biosynthesis. We visualized gene networks between tissues that induce a source-sink shift. CaLas alters hormone crosstalk, so that the tissue-specific plant immune responses necessary for bacterial clearance are weak and ineffective. Accordingly, expression of WRKYs (including WRKY70) was higher in fruits than in leaves. Systemic acquired responses were inadequately activated in young leaves, which are generally considered the sites where most new infections occur.
Fung, Elizabeth-Sharon [Los Alamos National Laboratory]
Choice of a T-lymphoid fate by hematopoietic progenitor cells depends on sustained Notch-Delta signaling combined with tightly regulated activities of multiple transcription factors. To dissect the regulatory network connections that mediate this process, we have used high-resolution analysis of regulatory gene expression trajectories from the beginning to the end of specification; tests of the short-term Notch dependence of these gene expression changes; and perturbation analyses of the effects of overexpression of two essential transcription factors, namely PU.1 and GATA-3. Quantitative expression measurements of >50 transcription factor and marker genes have been used to derive the principal components of regulatory change through which T-cell precursors progress from primitive multipotency to T-lineage commitment. Distinct parts of the path reveal separate contributions of Notch signaling, GATA-3 activity, and downregulation of PU.1. Using BioTapestry, the results have been assembled into a draft gene regulatory network for the specification of T-cell precursors and the choice of T as opposed to myeloid dendritic or mast-cell fates. This network also accommodates effects of E proteins and mutual repression circuits of Gfi1 against Egr-2 and of TCF-1 against PU.1 as proposed elsewhere, but requires additional functions that remain unidentified. Distinctive features of this network structure include the intense dose-dependence of GATA-3 effects; the gene-specific modulation of PU.1 activity based on Notch activity; the lack of direct opposition between PU.1 and GATA-3; and the need for a distinct, late-acting repressive function or functions to extinguish stem and progenitor-derived regulatory gene expression.
Background: Fission yeast Schizosaccharomyces pombe and budding yeast Saccharomyces cerevisiae are among the original model organisms in the study of the cell-division cycle. Unlike for budding yeast, no large-scale regulatory network has been constructed for fission yeast; its network has only been partially characterized. As a result, important regulatory cascades in budding yeast have no known or complete counterpart in fission yeast. Results: By integrating genome-wide data from multiple time course cell cycle microarray experiments, we reconstructed a gene regulatory network. Based on the network, in addition to previously known regulatory hubs in M phase, we discovered a new putative regulatory hub in the form of the HMG box transcription factor SPBC19G7.04. Further, we inferred periodic activities of several less well known transcription factors over the course of the cell cycle, identified over 500 putative regulatory targets and detected many new phase-specific and conserved cis-regulatory motifs. In particular, we show that SPBC19G7.04 has highly significant periodic activity that peaks in early M phase, which is coordinated with the late G2 activity of the forkhead transcription factor fkh2. Finally, using an enhanced Bayesian algorithm to co-cluster the expression data, we obtained 31 clusters of co-regulated genes that (1) constitute regulatory modules from different phases of the cell cycle, (2) have a phase order that is coherent across the 10 time course experiments, and (3) lead to the identification of phase-specific control elements at both the transcriptional and post-transcriptional levels in S. pombe. In particular, the ribosome biogenesis clusters expressed in G2 phase reveal new, highly conserved RNA motifs. Conclusion: Using a systems-level analysis of the phase-specific nature of the S. pombe cell cycle gene regulation, we have provided new testable evidence for post-transcriptional regulation in the G2 phase of the fission yeast cell cycle
Teixeira, Miguel C; Monteiro, Pedro T; Palma, Margarida; Costa, Catarina; Godinho, Cláudia P; Pais, Pedro; Cavalheiro, Mafalda; Antunes, Miguel; Lemos, Alexandre; Pedreira, Tiago; Sá-Correia, Isabel
The YEAst Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT, www.yeastract.com) information system has been, for 11 years, a key tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in Saccharomyces cerevisiae. Since its last update in June 2017, YEASTRACT includes approximately 163,000 regulatory associations between transcription factors (TFs) and target genes in S. cerevisiae, based on more than 1600 bibliographic references; it also includes 247 specific DNA-binding consensus sequences recognized by 113 TFs. This release of the YEASTRACT database provides new tools to visualize each regulatory network interactively, enabling the user to select and observe subsets of the network, such as: (i) considering only DNA binding evidence or both DNA binding and expression evidence; (ii) considering only either positive or negative regulatory associations; or (iii) considering only one set of related environmental conditions. A further tool for observing TF regulons is also offered, enabling a clear-cut understanding of the exact meaning of the available data. We believe that with this new version, YEASTRACT will strengthen its role as an open web resource instrumental for yeast biologists and Systems Biology researchers. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Alzheimer's disease (AD) is the most common form of dementia and leads to irreversible neurodegenerative damage of the brain. Finding the dynamic responses of genes, signaling proteins, transcription factor (TF) activities, and regulatory networks during the progressive deterioration in AD would represent a significant advance in discovering the pathogenesis of AD. However, high-throughput technologies for measuring TF activities are not yet available on a genome-wide scale. In this study, based on DNA microarray gene expression data and a priori information on TFs, the network component analysis (NCA) algorithm is applied to determine the TF activities and regulatory influences on target genes (TGs) in incipient, moderate, and severe AD. On that basis, the dynamic gene regulatory networks of the deteriorative courses of AD were reconstructed. To select significant genes that are differentially expressed in different courses of AD, we used independent component analysis (ICA), which performs better than traditional clustering methods and can assign one gene to several meaningful biological processes. The molecular biological analysis showed that the changes of TF activities and interactions of signaling proteins in mitosis, cell cycle, immune response, and inflammation play an important role in the deterioration of AD.
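Network component analysis models an expression matrix E (genes × samples) as E ≈ A P, where the zero pattern of the control-strength matrix A (genes × TFs) encodes prior TF-target knowledge and P (TFs × samples) holds the hidden TF activities. The sketch below shows only one simplified step, assuming A is already known: P is then a least-squares solution. Full NCA also re-estimates the non-zero entries of A under identifiability conditions, which this sketch omits.

```python
import numpy as np


def estimate_tf_activities(E: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Least-squares estimate of TF activities P in E ≈ A @ P.

    E: genes x samples matrix of log expression ratios.
    A: genes x TFs control-strength matrix; zeros mark absent TF->gene links.
       Assumed known here (full NCA would also re-estimate its non-zero entries).
    """
    P, *_ = np.linalg.lstsq(A, E, rcond=None)
    return P


# Synthetic check: two TFs, four genes, three conditions (e.g. disease stages).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0],
              [0.5, 0.0],
              [0.0, 1.0],
              [0.7, -0.4]])   # gene 4 is co-regulated by both TFs
P_true = rng.normal(size=(2, 3))
E = A @ P_true
P_hat = estimate_tf_activities(E, A)
```

With noise-free data and an A of full column rank, the estimate recovers the simulated activities exactly; with real microarray data it gives the best-fitting activities for the assumed connectivity.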
Crombach, Anton; Wotton, Karl R.; Jiménez-Guri, Eva; Jaeger, Johannes
Developmental gene networks implement the dynamic regulatory mechanisms that pattern and shape the organism. Over evolutionary time, the wiring of these networks changes, yet the patterning outcome is often preserved, a phenomenon known as “system drift.” System drift is illustrated by the gap gene network—involved in segmental patterning—in dipteran insects. In the classic model organism Drosophila melanogaster and the nonmodel scuttle fly Megaselia abdita, early activation and placement of gap gene expression domains show significant quantitative differences, yet the final patterning output of the system is essentially identical in both species. In this detailed modeling analysis of system drift, we use gene circuits which are fit to quantitative gap gene expression data in M. abdita and compare them with an equivalent set of models from D. melanogaster. The results of this comparative analysis show precisely how compensatory regulatory mechanisms achieve equivalent final patterns in both species. We discuss the larger implications of the work in terms of “genotype networks” and the ways in which the structure of regulatory networks can influence patterns of evolutionary change (evolvability). PMID:26796549
Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard
The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating prior knowledge about the network into the inference process, along with an effective description of the search space of dynamic models. Finally, it is also important to understand how a given inference method is affected by experimental and other noise in the data used. This paper presents a novel inference algorithm, using the algebraic framework of Boolean polynomial dynamical systems (BPDS), that meets all these requirements. The algorithm takes as input time series data, including those from network perturbations such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the
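In a Boolean polynomial dynamical system, each gene's update rule is a polynomial over GF(2): AND is x·y, OR is x + y + x·y, and NOT x is 1 + x. The sketch below covers only the forward direction, not the inference algorithm: it defines a hypothetical three-gene BPDS, updates it synchronously, and enumerates its attractors, which are the long-term behaviors an inferred model is expected to reproduce.

```python
from itertools import product


def step(state):
    """Synchronous update of a hypothetical 3-gene BPDS (arithmetic mod 2)."""
    x1, x2, x3 = state
    return (
        x2 % 2,         # x1' = x2            (activation by gene 2)
        x1 % 2,         # x2' = x1            (activation by gene 1)
        (x1 * x2) % 2,  # x3' = x1 AND x2     (polynomial x1*x2)
    )


def find_attractors(update, n):
    """Follow every one of the 2**n states until a state repeats; collect the cycles."""
    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = update(state)
        attractors.add(frozenset(seen[seen.index(state):]))  # the repeating part
    return attractors


attractors = find_attractors(step, 3)
```

This network has two fixed points, all genes off and all genes on, and one two-state cycle in which genes 1 and 2 alternate; exhaustive enumeration is only practical for small networks, which is one reason inference methods rely on structured search spaces.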
Background: Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcription regulation upon estrogen stimulation is a critical biological process underlying the onset and progress of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, but the exact molecular mechanism of this response is still not well understood. Results: We developed a modulated empirical Bayes model and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon stimulation by 17β-estradiol. In the network, significant TF genomic hubs were identified, including ER-alpha and AP-1; significant non-genomic hubs include ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2. Although the early and late networks were distinct (…). Conclusions: We identified a number of estrogen-regulated target genes and established an estrogen-regulated network that distinguishes the genomic and non-genomic actions of the estrogen receptor. Many gene targets of this network were no longer active in anti-estrogen-resistant cell lines, possibly because their DNA methylation and histone acetylation patterns have changed.
Jolly, Emmitt R
Background: A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results: Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion: We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.
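The abstract does not give the formula used to estimate the false positive rate from several selection criteria. As a rough illustration only, the sketch below assumes that each filter (binding affinity, positional distribution, cross-species conservation) passes a non-target gene with a known background rate and that the filters act independently on non-targets; the function names, rates, and gene counts are hypothetical.

```python
from math import prod


def expected_false_positives(n_genes, pass_rates):
    """Expected number of non-target genes passing every filter by chance,
    assuming the filters act independently on non-targets."""
    return n_genes * prod(pass_rates)


def estimate_targets(n_genes, pass_rates, n_observed):
    """Return (expected false positives, estimated FDR, estimated true targets)."""
    fp = expected_false_positives(n_genes, pass_rates)
    fdr = min(1.0, fp / n_observed) if n_observed else 0.0
    true_targets = max(0.0, n_observed - fp)
    return fp, fdr, true_targets


# Illustrative numbers: ~6000 genes, three filters with background pass rates
# of 10%, 20% and 5%, and 150 genes passing all three.
print(estimate_targets(6000, [0.1, 0.2, 0.05], 150))
```

With these made-up rates roughly 6 false positives are expected among 150 hits (an FDR near 4%). If the criteria are correlated on non-targets, the independence assumption underestimates the false positive count, which is one reason a real analysis would calibrate against shuffled or background sequences.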
Chowdhury, Ahsan Raja; Chetty, Madhu; Evans, Rob
Microarray gene expression data can provide insights into biological processes at a system-wide level and are commonly used for reverse engineering gene regulatory networks (GRNs). Because noise from different sources accumulates, microarray expression profiles are inherently noisy, which significantly affects the GRN reconstruction process. Microarray replicates (both biological and technical), generated to increase the reliability of data obtained under noisy conditions, have limited influence on the accuracy of reconstruction. Therefore, instead of conventional deterministic GRN modeling approaches, stochastic techniques are becoming increasingly necessary for inferring GRNs from noisy microarray data. In this paper, we propose a new stochastic GRN model by investigating the incorporation of various standard noise measurements into the deterministic S-system model. Experimental evaluations performed on synthetic networks of varying sizes, representing different stochastic processes, demonstrate the effect of noise on the accuracy of genetic network modeling and the significance of stochastic modeling for GRN reconstruction. The proposed stochastic model is subsequently applied to infer the regulations among genes in two real-life networks: (1) the well-studied IRMA network, a real-life in vivo synthetic network constructed within the yeast Saccharomyces cerevisiae, and (2) the SOS DNA repair network in Escherichia coli.
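In an S-system, each gene obeys dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, where the kinetic orders g_ij and h_ij encode activation (positive) or repression (negative). The sketch below adds noise by Euler-Maruyama integration with additive Gaussian white noise of strength `sigma`; that noise term is an assumption chosen for illustration and is only one of the "standard noise measurements" the abstract alludes to. The two-gene circuit and all parameter values are hypothetical.

```python
import numpy as np


def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    production = alpha * np.prod(x[None, :] ** g, axis=1)
    degradation = beta * np.prod(x[None, :] ** h, axis=1)
    return production - degradation


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=2000, sigma=0.0, seed=0):
    """Euler-Maruyama integration with additive Gaussian noise of strength sigma
    (sigma=0 recovers the deterministic S-system)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        drift = s_system_rhs(x, alpha, beta, g, h)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.size)
        # clip at a small positive value so the power laws stay defined
        x = np.maximum(x + drift * dt + noise, 1e-9)
        trajectory.append(x.copy())
    return np.array(trajectory)


# Hypothetical two-gene circuit: gene 1 represses gene 0 (g[0, 1] = -1),
# gene 0 activates gene 1 (g[1, 0] = 1), and each gene degrades in
# proportion to its own concentration (h is the identity).
alpha = np.array([2.0, 2.0])
beta = np.array([1.0, 1.0])
g = np.array([[0.0, -1.0], [1.0, 0.0]])
h = np.array([[1.0, 0.0], [0.0, 1.0]])

deterministic = simulate([0.5, 0.5], alpha, beta, g, h)
noisy = simulate([0.5, 0.5], alpha, beta, g, h, sigma=0.05)
```

The deterministic run settles at the steady state (X0, X1) = (1, 2), where 2/X1 = X0 and 2*X0 = X1; the noisy run fluctuates around it. Fitting g, h, alpha and beta to such noisy trajectories is the reconstruction problem the abstract addresses.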
Shao, Bin; Wu, Jiayi; Tian, Binghui; Ouyang, Qi
Reconstructing the topological structure of biological regulatory networks from microarray expression data or protein expression profiles is one of the major tasks in systems biology. In recent years, various mathematical methods have been developed to meet this task. Here, based on our previously reported reverse engineering method, we propose a new constraint, the minimum network constraint, to facilitate the reconstruction of biological networks. Three well-studied regulatory networks (the budding yeast cell cycle network, the fission yeast cell cycle network, and the SOS network of Escherichia coli) were used as test sets to verify the performance of this method. Numerical results show that biological networks prefer minimal networks to fulfill their functional tasks, making it possible to apply minimal network criteria in the network reconstruction process. Two scenarios were considered in the reconstruction process: generating data using different initial conditions, and generating data from knock-out and over-expression experiments. In both cases, network structures are revealed faithfully in a few steps using our approach. Copyright © 2015 Elsevier Ltd. All rights reserved.
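The abstract does not describe how the minimum network constraint is implemented. One simple way to express the same parsimony idea, sketched below for Boolean data only, is to pick for each gene the smallest set of regulators whose current states determine that gene's next state in every observed transition. This is a swapped-in illustration, not the authors' reverse engineering method, and the function names and three-gene rules are hypothetical.

```python
from itertools import combinations, product


def consistent(transitions, regulators, target):
    """True if the target's next state is a function of the chosen
    regulators' current states across all observed transitions."""
    seen = {}
    for state, next_state in transitions:
        key = tuple(state[r] for r in regulators)
        if seen.setdefault(key, next_state[target]) != next_state[target]:
            return False
    return True


def minimal_regulators(transitions, n_genes, target):
    """Smallest regulator set (first found in lexicographic order) that
    explains the data, or None if even all genes together cannot."""
    for size in range(n_genes + 1):
        for regulators in combinations(range(n_genes), size):
            if consistent(transitions, regulators, target):
                return regulators
    return None


def minimal_network(transitions, n_genes):
    return {gene: minimal_regulators(transitions, n_genes, gene) for gene in range(n_genes)}


# Hypothetical rules used only to generate example data:
#   x0' = NOT x2,  x1' = x0,  x2' = x0 OR x1
RULES = [lambda s: 1 - s[2], lambda s: s[0], lambda s: s[0] | s[1]]
TRANSITIONS = [(s, tuple(f(s) for f in RULES)) for s in product((0, 1), repeat=3)]

print(minimal_network(TRANSITIONS, 3))  # {0: (2,), 1: (0,), 2: (0, 1)}
```

The exhaustive subset search grows combinatorially with the number of genes, so it only conveys the criterion; with fewer observed transitions the minimal set can shrink, which is exactly the underdetermination the constraint is meant to resolve.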
Shlykova, Irina; Ponosov, Arcady
There are different ways to model gene regulatory networks. Differential equations allow for a detailed description of the network's dynamics and provide an explicit model of the changes in gene concentration over time. The production and relative degradation rate functions used in such models depend on a vector of steeply sloped threshold functions that characterize the activity of genes. The most popular example of such threshold functions comes from the Boolean network approach, where the threshold functions are step functions. The system of differential equations then becomes piecewise linear. The dynamics of this system can be described very easily between the thresholds, but not in the switching domains. For instance, this approach fails to analyze stationary points of the system and to define continuous solutions in the switching domains. These problems were studied in , , but the proposed model did not take into account time delays in cellular systems. However, analysis of real gene expression data shows a considerable number of time-delayed interactions, suggesting that time delay is essential in gene regulation. Therefore, delays may have a great effect on the dynamics of the system and are one of the critical factors that should be considered in the reconstruction of gene regulatory networks. The goal of this work is to apply singular perturbation analysis to certain systems with delay and to obtain an analog of Tikhonov's theorem, which provides sufficient conditions for constructing the limit system in the delay case.
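To make the role of steep threshold functions and delay concrete, the sketch below simulates a single self-repressing gene, dx/dt = k(1 - Z(x(t - tau))) - gamma*x, where Z is a Hill-type sigmoid that approaches the Boolean step function at the threshold theta as the steepness p grows. This is only a numerical illustration, not the singular perturbation analysis of the paper; the model, the Euler scheme with a constant pre-history, and all parameter values are assumptions.

```python
def hill(y, theta, p):
    """Steep sigmoid Z = y^p / (y^p + theta^p); tends to a step at theta as p grows."""
    return y ** p / (y ** p + theta ** p)


def simulate(tau, k=1.0, gamma=1.0, theta=0.5, p=20, x0=0.1, dt=0.01, t_end=40.0):
    """Euler integration of dx/dt = k*(1 - Z(x(t - tau))) - gamma*x."""
    lag = int(round(tau / dt))     # delay expressed in time steps
    n = int(round(t_end / dt))
    xs = [x0]
    for i in range(n):
        # constant history x(t) = x0 for t <= 0
        delayed = xs[i - lag] if i >= lag else x0
        dx = k * (1.0 - hill(delayed, theta, p)) - gamma * xs[i]
        xs.append(xs[i] + dt * dx)
    return xs


no_delay = simulate(tau=0.0)
with_delay = simulate(tau=2.0)
print(no_delay[-1])                               # settles at the threshold, x = 0.5
print(max(with_delay[-1000:]) - min(with_delay[-1000:]))  # large sustained swing
```

Without delay the trajectory slides into the switching domain and stays at the threshold, the kind of behavior that is hard to describe with pure step functions; with a delay of 2 time units the same circuit oscillates, which illustrates why the abstract argues that delays can change the qualitative dynamics.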
Lowe, Elijah K; Cuomo, Claudia; Arnone, Maria I
Gene regulatory networks (GRNs) describe the interactions for a developmental process at a given time and place. Historically, perturbation experiments have been one of the key methods for analyzing and reconstructing a GRN, and the GRN governing early development in the sea urchin embryo stands as one of the most deeply dissected so far. As technology progresses, so do the methods used to address different biological questions. Next-generation sequencing (NGS) has become a standard experimental technique for genome and transcriptome sequencing and for studies of protein-DNA interactions and DNA accessibility. While several efforts have been made toward integrating different omics approaches for the study of the regulatory genome in many animals, only in a few cases have these been applied with the purpose of reconstructing and experimentally testing developmental GRNs. Here, we review emerging approaches integrating multiple NGS technologies for the prediction and validation of gene interactions within echinoderm GRNs. These approaches can be applied to both 'model' and 'non-model' organisms. Although a number of issues still need to be addressed, advances in NGS applications, such as assay for transposase-accessible chromatin sequencing (ATAC-seq), combined with the availability of embryos belonging to different species, all separated by various evolutionary distances and accessible to experimental regulatory biology, place echinoderms in an unprecedented position for the reconstruction and evolutionary comparison of developmental GRNs. We conclude that sequencing technologies and integrated omics approaches allow the examination of GRNs on a genome-wide scale only if biological perturbation and cis-regulatory analyses are experimentally accessible, as in the case of echinoderm embryos. © The Author 2017. Published by Oxford University Press. All rights reserved.
Pino Del Carpio, Dunia; Basnet, Ram Kumar; Arends, Danny; Lin, Ke; De Vos, Ric C H; Muth, Dorota; Kodde, Jan; Boutilier, Kim; Bucher, Johan; Wang, Xiaowu; Jansen, Ritsert; Bonnema, Guusje
Studies of metabolic variation in Brassica rapa have largely focused on profiling the diversity of metabolic compounds in specific crop types or regional varieties, but none has aimed to identify genes with a regulatory function in metabolite composition. Here we followed a genetical genomics approach to identify regulatory genes for six biosynthetic pathways of health-related phytochemicals, i.e., carotenoids, tocopherols, folates, glucosinolates, flavonoids and phenylpropanoids. Leaves from six-week-old plants of a Brassica rapa doubled haploid population, consisting of 92 genotypes, were profiled for their secondary metabolite composition using both targeted and LC-MS-based untargeted metabolomics approaches. Furthermore, the same population was profiled for transcript variation using a microarray containing EST sequences derived mainly from three Brassica species: B. napus, B. rapa and B. oleracea. The biochemical pathway analysis was based on network analyses of both metabolite QTLs (mQTLs) and transcript QTLs (eQTLs). Co-localization of mQTLs and eQTLs led to the identification of candidate regulatory genes involved in the biosynthesis of carotenoids, tocopherols and glucosinolates. We subsequently focused on the well-characterized glucosinolate pathway and revealed two hotspots of co-localization of eQTLs with mQTLs in linkage groups A03 and A09. Our results indicate that such a large-scale genetical genomics approach combining transcriptomics and metabolomics data can provide new insights into the genetic regulation of metabolite composition in Brassica vegetables.
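The co-localization step can be pictured as an interval-overlap test: an mQTL and an eQTL co-localize when their support intervals lie on the same linkage group and overlap. The sketch below shows that test and a per-linkage-group count of co-localizing pairs as a crude hotspot summary. It is an illustration under that simple overlap assumption, not the study's network analysis; the QTL names and centimorgan intervals are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class QTL(NamedTuple):
    name: str
    linkage_group: str
    start_cm: float   # left end of the support interval (cM)
    end_cm: float     # right end of the support interval (cM)


def overlaps(a, b):
    """Two QTL intervals co-localize if they share a linkage group and overlap."""
    return (a.linkage_group == b.linkage_group
            and a.start_cm <= b.end_cm and b.start_cm <= a.end_cm)


def colocalized(mqtls, eqtls):
    """All (mQTL, eQTL) name pairs whose intervals overlap."""
    return [(m.name, e.name) for m in mqtls for e in eqtls if overlaps(m, e)]


def hotspot_counts(mqtls, eqtls):
    """Number of co-localizing mQTL/eQTL pairs per linkage group."""
    return Counter(m.linkage_group for m in mqtls for e in eqtls if overlaps(m, e))


# Hypothetical intervals for illustration only.
MQTLS = [QTL("mQTL_gsl_1", "A03", 10, 20), QTL("mQTL_gsl_2", "A09", 40, 50)]
EQTLS = [QTL("eQTL_g1", "A03", 15, 25), QTL("eQTL_g2", "A03", 30, 35),
         QTL("eQTL_g3", "A09", 48, 60), QTL("eQTL_g4", "A01", 10, 20)]

print(colocalized(MQTLS, EQTLS))
```

Genes whose eQTLs repeatedly overlap a metabolite's mQTL become candidate regulators; a real analysis would also account for interval width and the number of traits mapped, since wide intervals overlap by chance more often.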
Bérenguier, D.; Chaouiya, C.; Monteiro, P. T.; Naldi, A.; Remy, E.; Thieffry, D.; Tichit, L.
The dynamical analysis of large biological regulatory networks requires the development of scalable methods for mathematical modeling. Following the approach initially introduced by Thomas, we formalize the interactions between the components of a network in terms of discrete variables, functions, and parameters. Model simulations result in directed graphs, called state transition graphs. We are particularly interested in reachability properties and asymptotic behaviors, which correspond to terminal strongly connected components (or "attractors") in the state transition graph. A well-known problem is the exponential increase of the size of state transition graphs with the number of network components, in particular when using the biologically realistic asynchronous updating assumption. To address this problem, we have developed several complementary methods enabling the analysis of the behavior of large and complex logical models: (i) the definition of transition priority classes to simplify the dynamics; (ii) a model reduction method preserving essential dynamical properties; and (iii) a novel algorithm to compact state transition graphs and directly generate compressed representations, emphasizing relevant transient and asymptotic dynamical properties. The power of an approach combining these different methods is demonstrated by applying them to a recent multilevel logical model for the network controlling CD4+ T helper cell response to antigen presentation and to a dozen cytokines. This model accounts for the differentiation of canonical Th1 and Th2 lymphocytes, as well as of inflammatory Th17 and regulatory T cells, along with many hybrid subtypes. All these methods have been implemented into the software GINsim, which enables the definition, the analysis, and the simulation of logical regulatory graphs.
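For small Boolean models, attractors can be found by brute force: build the full asynchronous state transition graph (each transition updates one component whose target value differs from its current value), compute its strongly connected components with Tarjan's algorithm, and keep the terminal ones. The sketch below does exactly that; it is not GINsim's compaction algorithm, and enumerating all 2^n states is precisely the exponential blow-up the abstract describes. The two example models and all function names are hypothetical.

```python
from itertools import product


def async_successors(rules, state):
    """Asynchronous update: change one component whose rule disagrees with it."""
    successors = []
    for i, f in enumerate(rules):
        target = f(state)
        if target != state[i]:
            successors.append(state[:i] + (target,) + state[i + 1:])
    return successors


def state_transition_graph(rules):
    return {s: async_successors(rules, s) for s in product((0, 1), repeat=len(rules))}


def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive; fine for small graphs)."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(frozenset(component))

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def attractors(rules):
    """Terminal SCCs: components with no transition leaving them."""
    graph = state_transition_graph(rules)
    return [scc for scc in strongly_connected_components(graph)
            if all(w in scc for v in scc for w in graph[v])]


# Hypothetical models: a mutual-repression toggle switch and a negative loop.
TOGGLE_SWITCH = [lambda s: 1 - s[1], lambda s: 1 - s[0]]
NEGATIVE_LOOP = [lambda s: 1 - s[1], lambda s: s[0]]

print(attractors(TOGGLE_SWITCH))   # two stable states: (1, 0) and (0, 1)
print(attractors(NEGATIVE_LOOP))   # one cyclic attractor over all four states
```

The toggle switch yields two point attractors, whereas the negative loop yields a single four-state cyclic attractor under asynchronous updating.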
Meng, Yijun; Shao, Chaogang; Chen, Ming
Current achievements in plant microRNA (miRNA) research are inspiring. Molecular cloning and functional elucidation have greatly advanced our understanding of this small RNA species. As one of the ultimate goals, many research efforts have been devoted to drawing a comprehensive view of miRNA-mediated gene regulatory networks in plants. Numerous bioinformatics tools capable of network analysis are available. However, the most important point for network construction is to obtain reliable analytical results based on sufficient experimental data. Here, we introduce a general workflow to retrieve and analyze the data sets that serve as the cornerstones for network construction. For the upstream analyses of miRNA genes, the sequence features of miRNA promoters should be characterized, and the regulatory relationships between transcription factors (TFs) and miRNA genes need to be investigated. For the downstream part, we emphasize that high-throughput degradome sequencing data are especially useful for identifying genuine miRNA-target pairs. Functional characterization of the miRNA targets is essential to provide deep biological insights into specific miRNA-mediated pathways. For the miRNAs themselves, studies on their organ- or tissue-specific expression patterns and on their mechanisms of self-regulation are discussed. In addition, exhaustive literature mining is required to further support or improve the established networks. We hope that the introduced framework for miRNA-mediated network construction is timely and useful and will inspire more research efforts in the miRNA field.
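Degradome data support a miRNA-target pair when sequenced 5' ends of cleaved transcripts pile up at the predicted slicing site, which in plants lies between the target nucleotides paired with miRNA positions 10 and 11 (counted from the miRNA 5' end). The sketch below finds perfectly complementary sites only and counts degradome tags whose 5' end falls on the nucleotide paired with miRNA position 10. Real plant targets usually tolerate mismatches and dedicated degradome tools score them, so this is a simplified illustration; the miRNA sequence, target, and tag positions are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def target_site(mirna):
    """Perfectly complementary target site (5'->3') for an RNA miRNA."""
    return mirna.translate(COMPLEMENT)[::-1]


def predicted_cleavage_positions(mirna, target):
    """0-based target positions where the 3' cleavage fragment should start,
    i.e. the nucleotide paired with miRNA position 10 (perfect matches only)."""
    site = target_site(mirna)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start + len(site) - 10)
        start = target.find(site, start + 1)
    return positions


def degradome_support(mirna, target, tag_starts):
    """Count degradome reads whose 5' end maps exactly to a predicted cleavage site."""
    sites = set(predicted_cleavage_positions(mirna, target))
    return sum(1 for pos in tag_starts if pos in sites)


# Hypothetical 21-nt miRNA and a target that contains its perfect site.
MIRNA = "AUGCUUAGCAUCGGAUUCCAG"
TARGET = "GGGGG" + target_site(MIRNA) + "AAAAA"

print(predicted_cleavage_positions(MIRNA, TARGET))         # [16]
print(degradome_support(MIRNA, TARGET, [16, 16, 3, 17]))   # 2
```

Because miRNA position 1 pairs with the last nucleotide of the site, position 10 pairs with site offset len(site) - 10, which is where the degradome tag is expected to begin.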
Goossen, Emray R.; Buster, Duke A.
Over the years, avionics systems have increased in complexity to the point where 1st-tier suppliers to an aircraft OEM find it financially beneficial to outsource the design of subsystems to 2nd-tier and at times 3rd-tier suppliers. Combined with challenging schedule and budgetary pressures, the environment in which safety-critical systems are being developed introduces new hurdles for regulatory agencies and industry. This new environment of both complex systems and tiered development has raised concerns about the ability of designers to ensure that safety considerations are fully addressed throughout the tier levels. It has also raised questions about the sufficiency of current regulatory guidance to ensure proper flow-down of safety awareness, avionics application understanding at the lower tiers, OEM and 1st-tier oversight practices, and the capabilities of lower-tier suppliers. Therefore, NASA established a research project to address Regulatory Compliance in a Multi-tier Supplier Network. This research was divided into three major study efforts: (1) Describe Modern Multi-tier Avionics Development; (2) Identify Current Issues in Achieving Safety and Regulatory Compliance; and (3) Short-term/Long-term Recommendations Toward Higher Assurance Confidence. This report presents our findings on the risks and weaknesses, along with our recommendations. It also includes a collection of industry-identified risks, an assessment of guideline weaknesses related to multi-tier development of complex avionics systems, and a postulation of potential modifications to guidelines to close the identified risks and weaknesses.
Smedley, Damian; Schubach, Max; Jacobsen, Julius O B; Köhler, Sebastian; Zemojtel, Tomasz; Spielmann, Malte; Jäger, Marten; Hochheiser, Harry; Washington, Nicole L; McMurry, Julie A; Haendel, Melissa A; Mungall, Christopher J; Lewis, Suzanna E; Groza, Tudor; Valentini, Giorgio; Robinson, Peter N
The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
McClure, Ryan S.; Overall, Christopher C.; McDermott, Jason E.; Hill, Eric A.; Markillie, Lye Meng; McCue, Lee Ann; Taylor, Ronald C.; Ludwig, Marcus; Bryant, Donald A.; Beliaev, Alexander S.
Cyanobacterial regulation of gene expression must contend with a genome organization that lacks apparent functional context, as the majority of cellular processes and metabolic pathways are encoded by genes found at disparate locations across the genome. In addition, the fact that coordinated regulation of cyanobacterial cellular machinery takes place with significantly fewer transcription factors than in other Eubacteria suggests the involvement of post-transcriptional mechanisms and regulatory adaptations that are not fully understood. Global transcript abundance from the model cyanobacterium Synechococcus sp. PCC 7002 grown under 42 different conditions was analyzed using context likelihood of relatedness (CLR). The resulting 903-gene network, which was organized into 11 modules, not only allowed classification of cyanobacterial responses to specific environmental variables but also provided insight into the transcriptional network topology and led to the expansion of predicted regulons. When used in conjunction with the genome sequence, the global transcript abundance allowed identification of putative post-transcriptional changes in expression as well as novel potential targets of both DNA-binding proteins and asRNA regulators. The results offer a new perspective on the multi-level regulation that governs cellular adaptations of the fast-growing, physiologically robust cyanobacterium Synechococcus sp. PCC 7002 to changing environmental variables. The study also extends a methodological knowledge-based framework for studying multi-scale regulatory mechanisms that operate in cyanobacteria. Finally, it provides valuable context for integrating systems-level data to enhance evidence-driven genomic annotation, especially in organisms where traditional context analyses cannot be implemented due to a lack of operon-based functional organization.
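Context likelihood of relatedness scores a gene pair by how unusual their mutual information (MI) is relative to each gene's own background of MI values: with z_i = max(0, (MI_ij - mean_i) / sd_i), the CLR score is sqrt(z_i^2 + z_j^2). The sketch below follows that definition with a simple equal-width histogram MI estimator; the binning choice, the tiny epsilon guard, and the synthetic 10-gene matrix are assumptions, and published CLR implementations use more careful MI estimators.

```python
import numpy as np


def mutual_information(x, y, bins=5):
    """Plug-in mutual information (nats) from a 2-D histogram of two profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def clr(expr, bins=5):
    """CLR scores for a genes x conditions expression matrix."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    # background of each gene: its MI with every other gene (diagonal excluded)
    off_diag = mi[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    mean = off_diag.mean(axis=1, keepdims=True)
    std = off_diag.std(axis=1, keepdims=True) + 1e-12
    z = np.maximum(0.0, (mi - mean) / std)   # z[i, j] uses gene i's background
    scores = np.sqrt(z ** 2 + z.T ** 2)
    np.fill_diagonal(scores, 0.0)
    return scores


# Synthetic data: 10 genes x 200 conditions; gene 1 closely tracks gene 0.
rng = np.random.default_rng(0)
expr = rng.uniform(size=(10, 200))
expr[1] = expr[0] + 0.01 * rng.standard_normal(200)
scores = clr(expr)
```

Thresholding the resulting score matrix gives the network edges; normalizing by each gene's background is what lets CLR discount genes that have high MI with everything.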
Background: Inflammation is a hallmark of many human diseases. Elucidating the mechanisms underlying systemic inflammation has long been an important topic in basic and clinical research. When primary pathogenetic events remain unclear due to their immense complexity, constructing and analyzing the gene regulatory network of inflammation is at times the best way to understand the detrimental effects of disease. However, it is difficult to recognize and evaluate relevant biological processes from huge quantities of experimental data. It is hence appealing to find an algorithm that can generate a gene regulatory network of systemic inflammation from high-throughput genomic studies of human diseases. Such a network will be essential for extracting valuable information from the complex and chaotic network under diseased conditions. Results: In this study, we construct a gene regulatory network of inflammation using data extracted from the Ensembl and JASPAR databases. We also integrate and apply a number of systematic algorithms, such as a cross-correlation threshold, the maximum likelihood estimation method and the Akaike Information Criterion (AIC), on time-lapse microarray data to refine the genome-wide transcriptional regulatory network in response to bacterial endotoxins in the context of dynamically activated genes, which are regulated by transcription factors (TFs) such as NF-κB. This systematic approach is used to investigate the stochastic interactions represented by the dynamic leukocyte gene expression profiles of human subjects exposed to an inflammatory stimulus (bacterial endotoxin). Based on the kinetic parameters of the dynamic gene regulatory network, we identify important properties of the immune system (such as susceptibility to infection), which may be useful for translational research. Finally, the robustness of the inflammatory gene network is also inferred by analyzing the hubs and "weak ties" structures of the gene network.
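The three ingredients named above can be chained in a simple way: screen candidate regulators by lagged (cross-)correlation, fit each candidate subset by least squares (the maximum likelihood estimate under Gaussian noise), and keep the subset with the lowest AIC = n*ln(RSS/n) + 2k. The sketch below assumes a linear one-step-lag model for a single target gene; the model form, threshold, maximum subset size, and synthetic data are all assumptions rather than the authors' kinetic model.

```python
from itertools import combinations

import numpy as np


def lagged_corr(reg, target):
    """Correlation between a regulator at time t and the target at time t+1."""
    return np.corrcoef(reg[:-1], target[1:])[0, 1]


def fit_aic(X, y):
    """Least-squares fit (the Gaussian maximum-likelihood estimate) and its AIC."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    n = len(y)
    k = A.shape[1] + 1   # regression coefficients plus the noise variance
    return n * np.log(rss / n) + 2 * k, coef


def select_regulators(expr, target, threshold=0.3, max_size=3):
    """Return (AIC, regulator tuple) minimizing AIC among screened candidates."""
    y = expr[target, 1:]
    candidates = [j for j in range(expr.shape[0])
                  if j != target and abs(lagged_corr(expr[j], expr[target])) >= threshold]
    best = (fit_aic(np.empty((len(y), 0)), y)[0], ())   # intercept-only model
    for size in range(1, min(max_size, len(candidates)) + 1):
        for subset in combinations(candidates, size):
            aic, _ = fit_aic(expr[list(subset), :-1].T, y)
            if aic < best[0]:
                best = (aic, subset)
    return best


# Synthetic time series: gene 4 at t+1 is driven by genes 0 (+) and 2 (-) at t.
rng = np.random.default_rng(1)
expr = rng.standard_normal((5, 60))
expr[4, 1:] = 0.8 * expr[0, :-1] - 0.6 * expr[2, :-1] + 0.1 * rng.standard_normal(59)
aic, regulators = select_regulators(expr, target=4)
```

The 2k penalty is what stops the search from simply adding every screened gene; AIC still occasionally admits a spurious regulator, so the selected set should be read as a candidate list.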
Tong, Pin; Monahan, Jack; Prendergast, James G D
Large-scale gene expression datasets are providing an increasing understanding of the location of cis-eQTLs in the human genome and their role in disease. However, little is currently known about the extent of regulatory site-sharing between genes. This is despite its potentially wide-ranging implications, from how genetic variants may shape multiple phenotypes to the evolution of human gene order. By first identifying the location of non-redundant cis-eQTLs, we show that regulatory site-sharing is a relatively common phenomenon in the human genome, with over 10% of non-redundant regulatory variants linked to the expression of multiple nearby genes. We show that these shared, local regulatory sites are linked to high levels of chromatin looping between the regulatory sites and their associated genes. In addition, these co-regulated gene modules are found to be strongly conserved across mammalian species, suggesting that shared regulatory sites have played an important role in shaping human gene order. Because these shared cis-eQTLs are associated with multiple genes, they also appear to be unusually important for understanding the genetics of human phenotypes and pleiotropy, with shared regulatory sites more often linked to multiple human phenotypes than other regulatory variants. This study shows that regulatory site-sharing is likely an underappreciated aspect of gene regulation and has important implications for the understanding of various biological phenomena, including how the two- and three-dimensional structures of the genome have been shaped and the potential causes of disease pleiotropy outside coding regions.
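The headline statistic (over 10% of non-redundant regulatory variants linked to several nearby genes) is a simple count once variant-gene associations are in hand: group associations by variant and ask how many variants reach more than one distinct gene. The sketch below shows only that bookkeeping; identifying non-redundant cis-eQTLs, the hard part of the study, is not modeled, and the variant and gene identifiers are hypothetical.

```python
from collections import defaultdict


def shared_fraction(associations):
    """associations: iterable of (variant_id, gene_id) pairs for non-redundant
    cis-eQTLs. Returns (fraction of variants linked to >1 gene, shared variants)."""
    genes_per_variant = defaultdict(set)
    for variant, gene in associations:
        genes_per_variant[variant].add(gene)   # duplicate pairs count once
    shared = sorted(v for v, genes in genes_per_variant.items() if len(genes) > 1)
    return len(shared) / len(genes_per_variant), shared


# Hypothetical associations for illustration only.
ASSOCIATIONS = [("var1", "GENE_A"), ("var1", "GENE_B"), ("var2", "GENE_C"),
                ("var3", "GENE_D"), ("var3", "GENE_D")]

print(shared_fraction(ASSOCIATIONS))
```

Here one of three variants (var1) is shared; duplicate records such as the repeated var3-GENE_D pair are collapsed so they do not inflate the count.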
Knabe, Johannes F
Genetic Regulatory Networks (GRNs) in biological organisms are primary engines for cells to enact their engagements with environments, via incessant, continually active coupling. In differentiated multicellular organisms, tremendous complexity has arisen in the course of evolution of life on earth. Engineering and science have so far achieved no working system that can compare with this complexity, depth and scope of organization. Abstracting the dynamics of genetic regulatory control to a computational framework in which artificial GRNs in artificial simulated cells differentiate while connected in a changing topology, it is possible to apply Darwinian evolution in silico to study the capacity of such developmental/differentiated GRNs to evolve. In this volume an evolutionary GRN paradigm is investigated for its evolvability and robustness in models of biological clocks, in simple differentiated multicellularity, and in evolving artificial developing 'organisms' which grow and express an ontogeny starting fr...
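As a much-simplified picture of applying Darwinian evolution in silico to artificial GRNs, the sketch below evolves the weight matrix of a threshold network, s(t+1) = sign(W s(t)), so that development from a fixed initial state ends in a target expression pattern. It uses mutation of random interactions plus elitist truncation selection over parents and offspring. This Wagner-style single-cell model is an assumption for illustration; it does not include the differentiating cells, changing topology, clocks, or ontogeny studied in the volume, and all parameter values are hypothetical.

```python
import numpy as np


def develop(W, s0, steps=20):
    """Iterate s(t+1) = sign(W s(t)), treating 0 as +1, and return the final state."""
    s = s0.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s


def fitness(W, s0, target):
    """Fraction of genes whose final state matches the target pattern."""
    return float(np.mean(develop(W, s0) == target))


def evolve(n_genes=8, pop_size=20, generations=100, mut_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    s0 = rng.choice([-1, 1], size=n_genes)       # initial expression state
    target = rng.choice([-1, 1], size=n_genes)   # hypothetical target pattern
    pop = [rng.standard_normal((n_genes, n_genes)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        children = []
        for W in pop:
            # mutate a random subset of the regulatory interactions
            mask = rng.random(W.shape) < mut_rate
            children.append(W + mask * rng.standard_normal(W.shape))
        # elitist truncation selection over parents and offspring
        ranked = sorted(pop + children, key=lambda W: fitness(W, s0, target), reverse=True)
        pop = ranked[:pop_size]
        history.append(fitness(pop[0], s0, target))
    return pop[0], history


best_W, history = evolve()
print(history[0], history[-1])
```

Because parents compete with their offspring, the best fitness can never decrease from one generation to the next; questions of evolvability and robustness then become questions about how easily mutations to `W` reach, and stay at, high-fitness networks.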
Serna, Laura; Martin, Cathie
Sometimes, proteins, biological structures or even organisms have similar functions and appearances but have evolved through widely divergent pathways. There is experimental evidence to suggest that different developmental pathways have converged to produce similar outgrowths of the aerial plant epidermis, referred to as trichomes. The emerging picture suggests that trichomes in Arabidopsis thaliana and, perhaps, in cotton develop through a transcriptional regulatory network that differs from those regulating trichome formation in Antirrhinum and Solanaceous species. Several lines of evidence suggest that the duplication of a gene controlling anthocyanin production and subsequent divergence might be the major force driving trichome formation in Arabidopsis, whereas the multicellular trichomes of Antirrhinum and Solanaceous species appear to have a different regulatory origin.
Hinman, Veronica F; Yankura, Kristen A; McCauley, Brenna S
Developmental gene regulatory networks (GRNs) explain how regulatory states are established in particular cells during development and how these states then determine the final form of the embryo. Evolutionary changes to the sequence of the genome will direct reorganization of GRN architectures, which in turn will lead to the alteration of developmental programs. A comparison of GRN architectures must consequently reveal the molecular basis for the evolution of developmental programs among different organisms. This review highlights some of the important findings that have emerged from the most extensive direct comparison of GRN architectures to date. Comparison of the orthologous GRNs for endomesodermal specification in the sea urchin and sea star provides examples of several discrete, functional GRN subcircuits and shows that they are subject to diverse selective pressures. This demonstrates that different regulatory linkages may be more or less amenable to evolutionary change. One of the more surprising findings from this comparison is that GRN-level functions may be maintained while the factors performing the functions have changed, suggesting that GRNs have a high capacity for compensatory changes involving transcription factor binding to cis-regulatory modules.
Buckley, Katherine M.
The gut epithelium is an ancient site of complex communication between the animal immune system and the microbial world. While elements of self/non-self receptors and effector mechanisms differ greatly among animal phyla, some aspects of recognition, regulation, and response are broadly conserved. A gene regulatory network (GRN) approach provides a means to investigate the nature of this conservation and divergence even as more peripheral functional details remain incompletely understood. The sea urchin embryo is an unparalleled experimental model for disentangling the GRNs that govern embryonic development. By applying this theoretical framework to the free-swimming, feeding larval stage of the purple sea urchin, it is possible to delineate the conserved regulatory circuitry that regulates the gut-associated immune response. This model provides a morphologically simple system in which to efficiently unravel regulatory connections that are phylogenetically relevant to immunity in vertebrates. Here, we review the organism-wide cellular and transcriptional immune response of the sea urchin larva. A large set of transcription factors and signal systems, including epithelial expression of interleukin 17 (IL17), are important mediators in the activation of the early gut-associated response. Many of these have homologs that are active in vertebrate immunity, while others are ancient in animals but absent in vertebrates or specific to echinoderms. This larval model provides a means to experimentally characterize immune function encoded in the sea urchin genome and the regulatory interconnections that control immune response and resolution across the tissues of the organism.
Le Borgne, Michel
Background: Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well-studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to cope with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles in order to infer the functional effect of each regulation, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. Results: We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions) by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions. Conclusion: Our approach requires neither accurate expression levels nor time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments are enough to determine
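A simplified version of the sign-consistency idea: every observed variation of a gene with at least one regulator must be explained by some regulator u, meaning variation(u) * sign(u -> v) equals variation(v); a regulator whose variation was not measured is assumed able to explain anything. The sketch below enumerates all sign assignments, keeps those consistent with every profile, and reports edges whose sign is identical across all consistent assignments (or `None` if the profiles contradict the network). Exhaustive enumeration is feasible only for toy networks and is not the authors' solver; the three-gene network and profiles are hypothetical.

```python
from itertools import product


def explained(v, sv, edges, signs, profile):
    """Is the observed variation sv of gene v explained by some regulator?"""
    preds = [u for (u, t) in edges if t == v]
    if not preds:
        return True   # input node: nothing to explain
    # an unobserved regulator could account for any variation
    return any(u not in profile or profile[u] * signs[(u, v)] == sv for u in preds)


def infer_signs(edges, profiles):
    """Edges whose sign (+1 inducer, -1 repressor) is the same in every sign
    assignment consistent with all profiles; None if no assignment is consistent."""
    solutions = []
    for values in product((1, -1), repeat=len(edges)):
        signs = dict(zip(edges, values))
        if all(explained(v, sv, edges, signs, p) for p in profiles for v, sv in p.items()):
            solutions.append(signs)
    if not solutions:
        return None
    first = solutions[0]
    return {e: first[e] for e in edges if all(s[e] == first[e] for s in solutions)}


# Hypothetical network A->B, A->C, B->C and two sign-of-variation profiles.
EDGES = [("A", "B"), ("A", "C"), ("B", "C")]
P1 = {"A": 1, "B": 1, "C": -1}
P2 = {"A": -1, "B": -1, "C": -1}

print(infer_signs(EDGES, [P1, P2]))                        # {('A', 'B'): 1}
print(infer_signs(EDGES, [P1, {"A": -1, "B": 1}]))         # None: inconsistent data
```

Only the sign of A->B is forced by these two profiles; the signs of A->C and B->C remain undetermined, mirroring the abstract's point that only part of a network can be annotated from a given set of experiments.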
Adabor, Emmanuel S; Acquaah-Mensah, George K; Oduro, Francis T
Bayesian Networks have been used for the inference of transcriptional regulatory relationships among genes, and are valuable for obtaining biological insights. However, finding the optimal Bayesian Network (BN) structure is NP-hard, so heuristic approaches have been sought to solve this problem effectively. In this work, we develop a hybrid search method combining Simulated Annealing with a Greedy Algorithm (SAGA). SAGA explores most of the search space in a two-phase search: first a Simulated Annealing search, then a Greedy search. Three background-corrected and normalized microarray datasets were used to test the algorithm. BN structure learning was also conducted on these datasets with other established search methods, as implemented in BANJO (Bayesian Network Inference with Java Objects). The Bayesian Dirichlet Equivalence (BDe) metric was used to score the networks produced with SAGA. SAGA predicted transcriptional regulatory relationships with high sensitivity and specificity, in networks that achieved higher BDe scores. Thus, the proposed method competes well with existing search algorithms for Bayesian Network structure learning of transcriptional regulatory networks. Copyright © 2014 Elsevier Inc. All rights reserved.
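The two-phase idea behind SAGA (annealing first, then greedy refinement) can be sketched generically. In this sketch each bit stands for the presence of one candidate edge, and `score` is a hypothetical stand-in for the BDe metric. It is not the authors' implementation, and unlike a real BN search it does not reject cyclic graphs.

```python
import math
import random
from typing import Callable, List


def saga_like_search(score: Callable[[List[int]], float], n_bits: int,
                     sa_steps: int = 2000, t0: float = 1.0,
                     cooling: float = 0.995, seed: int = 0) -> List[int]:
    """Simulated annealing over single-bit flips, then greedy hill climbing."""
    rng = random.Random(seed)
    current = [0] * n_bits
    cur_score = score(current)
    best, best_score = current[:], cur_score
    t = t0
    # Phase 1: simulated annealing; worse moves are accepted with prob exp(delta / t).
    for _ in range(sa_steps):
        cand = current[:]
        cand[rng.randrange(n_bits)] ^= 1
        s = score(cand)
        if s >= cur_score or rng.random() < math.exp((s - cur_score) / t):
            current, cur_score = cand, s
            if s > best_score:
                best, best_score = cand[:], s
        t = max(t * cooling, 1e-6)
    # Phase 2: greedy refinement of the best annealing solution.
    improved = True
    while improved:
        improved = False
        for i in range(n_bits):
            cand = best[:]
            cand[i] ^= 1
            s = score(cand)
            if s > best_score:
                best, best_score, improved = cand, s, True
    return best
```

The greedy phase guarantees that the returned structure is at least a local optimum with respect to single-edge changes, whatever state the annealing phase ends in.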
Circadian rhythm is fundamental in regulating a wide range of cellular, metabolic, physiological, and behavioral activities in mammals. Although a small number of key circadian genes have been identified through extensive molecular and genetic studies in the past, whether other key circadian genes exist, and how they drive the genome-wide circadian oscillation of gene expression in different tissues, remain unknown. Here we try to address these questions by integrating all available circadian microarray data in mammals. We identified 41 common circadian genes that showed circadian oscillation in a wide range of mouse tissues, with a remarkable consistency of circadian phases across tissues. Comparisons across mouse, rat, rhesus macaque, and human showed that, relative to mouse, the circadian phases of known key circadian genes were delayed by 4-5 hours in rat and by 8-12 hours in macaque and human. A systematic gene regulatory network for the mouse circadian rhythm was constructed after incorporating promoter analysis and transcription factor knockout or mutant microarray data. We observed a significant association of the cis-regulatory elements EBOX, DBOX, RRE, and HSE with the different phases of circadian oscillating genes. The analysis of the network structure revealed the paths through which light, food, and heat can entrain the circadian clock, and showed that NR3C1 and FKBP/HSP90 complexes are central to the control of circadian genes through diverse environmental signals. Our study improves our understanding of the structure, design principle, and evolution of gene regulatory networks involved in the mammalian circadian rhythm.
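Phase comparisons such as the rat-versus-mouse delay require a per-gene estimate of peak time. The abstract does not state how phases were computed, so the sketch below uses a common alternative, a cosinor fit `y = M + a*cos(wt) + b*sin(wt)` with a 24 h period. It assumes samples evenly spaced over whole periods, which makes the closed-form coefficients exact. The example profiles are synthetic.

```python
import math
from typing import Sequence


def acrophase(times: Sequence[float], values: Sequence[float],
              period: float = 24.0) -> float:
    """Peak time (hours, modulo period) of a cosinor fit.

    Assumes evenly spaced samples covering a whole number of periods, so the
    cosine and sine regressors are orthogonal and the mean M drops out.
    """
    w = 2 * math.pi / period
    n = len(values)
    a = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    # a*cos(wt) + b*sin(wt) = A*cos(wt - phi) with phi = atan2(b, a).
    return (math.atan2(b, a) / w) % period


def phase_delay(phase_ref: float, phase_other: float, period: float = 24.0) -> float:
    """Delay of phase_other relative to phase_ref, in [0, period)."""
    return (phase_other - phase_ref) % period
```

For a synthetic gene sampled every 2 h for 48 h that peaks at 6 h in one species and 10 h in another, the estimated delay is 4 h.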
Background: Plant secondary metabolites are critical to various biological processes. However, the regulation of these metabolites is complex because of regulatory rewiring or crosstalk. To unveil how regulatory behaviors on secondary metabolism reshape biological processes, we constructed and analyzed a dynamic regulatory network of secondary metabolic pathways in Arabidopsis. Results: The dynamic regulatory network was constructed by integrating co-expressed gene pairs and regulatory interactions. Regulatory interactions were either predicted from conserved transcription factor binding sites (TFBSs) or proved by experiments. We found that integrating two data types (co-expression and predicted regulatory interactions) increased the number of high-confidence regulatory interactions by over 10% compared with using either data type alone. The dynamic changes in the regulatory network systematically manifested regulatory rewiring that helps explain regulatory mechanisms. In terpenoid metabolism, for example, they revealed the regulatory crosstalk of RAV1 (AT1G13260) and ATHB1 (AT3G01470) on HMG1 (hydroxymethylglutaryl-CoA reductase, AT1G76490), and the regulation of epoxysqualene biosynthesis and sterol biosynthesis by RAV1. In addition, we investigated regulatory rewiring with expression, network topology and upstream signaling pathways. Regulatory rewiring was revealed by the variability of gene expression: pathway genes and transcription factors (TFs) were significantly differentially expressed under different conditions (such as terpenoid biosynthetic genes in tissue experiments and E2F/DP family members in genotype experiments). Both network topology and signaling pathways supported regulatory rewiring. For example, we discovered a correlation among the numbers of pathway genes, TFs and network topology: one-gene pathways (such as δ-carotene biosynthesis) were regulated by fewer TFs, and were not critical to the metabolic network because of their low degrees in the topology. Upstream signaling pathways of 50
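One simple way to integrate TFBS-predicted interactions with co-expression is to keep only the predicted TF-target pairs whose expression profiles are also strongly correlated. The sketch assumes Pearson correlation and a threshold `r_min` of 0.8. Both are illustrative choices, not the study's criteria, and the gene names and profiles are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def high_confidence_edges(predicted: Set[Tuple[str, str]],
                          expr: Dict[str, List[float]],
                          r_min: float = 0.8) -> Set[Tuple[str, str]]:
    """Keep TFBS-predicted (tf, target) pairs that are also co-expressed."""
    kept = set()
    for tf, target in predicted:
        if tf in expr and target in expr:
            if abs(pearson(expr[tf], expr[target])) >= r_min:
                kept.add((tf, target))
    return kept
```

The absolute value keeps negatively correlated pairs as well, since a repressor is expected to anti-correlate with its target.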
The mammalian genome is packed tightly in the nucleus of the cell. This packing is primarily facilitated by histone proteins and results in an ordered organization of the genome in chromosome territories that can be roughly divided into heterochromatic and euchromatic domains. On top of this organization, several distinct gene regulatory elements on the same chromosome or on other chromosomes are thought to communicate dynamically via chromatin looping. Advances in genome-wide technologies have revealed the existence of a plethora of these regulatory elements in various eukaryotic genomes. These regulatory elements are defined by particular in vitro assays as promoters, enhancers, insulators and boundary elements. However, recent studies indicate that the in vivo distinction between these elements is often less strict. Regulatory elements are bound by a mixture of common and lineage-specific transcription factors, which mediate the long-range interactions between these elements. Inappropriate modulation of the binding of these transcription factors can alter the interactions between regulatory elements, which in turn leads to aberrant gene expression, with disease as an ultimate consequence. Here we discuss the bimodal behavior of regulatory elements that act in cis (with a focus on enhancers), how their activity is modulated by transcription factor binding, and the effect this has on gene regulation.
Yi Kan Wang
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) that combines time-series data from treatment with the cytokine TNF and steady-state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady-state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.
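To see why steady-state and time-series data can be pooled in one regression, assume (for illustration only; this is not cMIKANA) a linear model `dx/dt = W z`. Here `z` holds expression levels plus input columns such as a knockdown indicator. Time-series samples contribute finite-difference slopes as targets, and steady-state samples contribute rows with a target of zero. Both kinds of row then enter one least-squares fit. All names and numbers below are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[List[float], float]


def rows_for_gene(series: Sequence[Sequence[float]], dt: float,
                  steady_states: Sequence[Sequence[float]], gene: int) -> List[Row]:
    """Regression rows for one gene under the assumed model dx/dt = W z."""
    rows = []
    for z0, z1 in zip(series, series[1:]):
        rows.append((list(z0), (z1[gene] - z0[gene]) / dt))  # finite-difference slope
    for z in steady_states:
        rows.append((list(z), 0.0))  # at steady state the derivative is zero
    return rows


def least_squares(rows: List[Row]) -> List[float]:
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination.

    Raises ZeroDivisionError if the pooled data cannot identify every weight.
    """
    p = len(rows[0][0])
    A = [[sum(r[0][i] * r[0][j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[0][i] * r[1] for r in rows) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w
```

In a trajectory recorded without the perturbation, the input column is all zeros, so the time series alone cannot identify the input's effect. A single steady-state row recorded under the perturbation supplies it.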
de Luis Balaguer, Maria Angels; Sozzani, Rosangela
Gene regulatory network (GRN) models have been shown to predict and represent interactions among sets of genes. Here, we first show the basic steps to implement a simple but computationally efficient algorithm to infer GRNs based on dynamic Bayesian networks (DBNs), and we then explain how to approximate DBN-based GRN models with continuous models. In addition, we show a MATLAB implementation of the key steps of this method, which we use to infer an Arabidopsis root GRN.
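The core of DBN-based inference is scoring candidate regulators of a gene at time t+1 from the states of other genes at time t. The sketch below is a simplified illustration, not the chapter's MATLAB method. It assumes binarized data and a single parent per gene, and scores each candidate by maximum-likelihood log-likelihood. No complexity penalty is needed because every single-parent model has the same number of parameters.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def parent_loglik(series: Sequence[Tuple[int, ...]], parent: int, target: int) -> float:
    """Log-likelihood of target(t+1) given parent(t), with ML conditional probabilities."""
    pairs = Counter((series[t][parent], series[t + 1][target])
                    for t in range(len(series) - 1))
    totals: Counter = Counter()
    for (p_state, _), count in pairs.items():
        totals[p_state] += count
    return sum(count * math.log(count / totals[p_state])
               for (p_state, _), count in pairs.items())


def best_parent(series: Sequence[Tuple[int, ...]], target: int) -> int:
    """Index of the single regulator that best predicts the target one step ahead."""
    n_genes = len(series[0])
    return max(range(n_genes), key=lambda p: parent_loglik(series, p, target))
```

When gene 1 simply copies gene 0 with a one-step lag, gene 0 predicts gene 1 perfectly (log-likelihood 0), so it is selected as the parent.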
Mahadevan, Radhakrishnan; von Kamp, Axel; Klamt, Steffen
Stoichiometric and constraint-based methods of computational strain design have become an important tool for rational metabolic engineering. One of those relies on the concept of constrained minimal cut sets (cMCSs). However, like most other techniques, cMCSs can consider only reaction (or gene) knockouts to achieve a desired phenotype. We generalize the cMCSs approach to constrained regulatory MCSs (cRegMCSs), in which up/downregulation of reaction rates can be combined with reaction deletions. We show that flux up/downregulations can virtually be treated as cuts, allowing their direct integration into the algorithmic framework of cMCSs. Because of the vastly enlarged search spaces in genome-scale networks, we developed strategies to (optionally) preselect suitable candidates for flux regulation and novel algorithmic techniques to further enhance the efficiency and speed of cMCSs calculation. We illustrate the cRegMCSs approach with a simple example network and then apply it to identify strain designs for ethanol production in a genome-scale metabolic model of Escherichia coli. The results clearly show that cRegMCSs combining reaction deletions and flux regulations provide a much larger number of suitable strain designs, many of which are significantly smaller than cMCSs involving only knockouts. Furthermore, with cRegMCSs, one may also enable the fine tuning of desired behaviours in a narrower range. The new cRegMCSs approach may thus accelerate the implementation of model-based strain designs for the bio-based production of fuels and chemicals. MATLAB code and the examples can be downloaded at http://www.mpi-magdeburg.mpg.de/projects/cna/etcdownloads.html. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
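The constrained cut-set idea can be stated in elementary-mode terms: choose a minimal set of reactions to delete so that every undesired (target) mode is blocked while at least one desired mode survives. The brute-force sketch below illustrates that definition on hypothetical modes. It is not the efficient algorithm described above, and it omits flux up/downregulation, which cRegMCSs add.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def constrained_mcs(target_modes: List[Set[str]], desired_modes: List[Set[str]],
                    max_size: int = 3) -> List[Tuple[str, ...]]:
    """Minimal deletion sets that hit every target mode and spare a desired mode."""
    reactions = sorted(set().union(*target_modes, *desired_modes))
    found: List[FrozenSet[str]] = []
    for k in range(1, max_size + 1):  # increasing size, so first hits are minimal
        for combo in combinations(reactions, k):
            cut = frozenset(combo)
            if any(f <= cut for f in found):
                continue  # superset of a smaller cut set: not minimal
            blocks_targets = all(cut & mode for mode in target_modes)
            keeps_desired = any(not (cut & mode) for mode in desired_modes)
            if blocks_targets and keeps_desired:
                found.append(cut)
    return [tuple(sorted(c)) for c in found]
```

With target modes `{r1, r2}` and `{r2, r3}` and desired mode `{r1, r4}`, deleting `r2` is the only minimal constrained cut set. `{r1, r3}` also blocks both targets, but it destroys the desired mode.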
We present a computational method in which modular and Groebner basis (GB) computations in Boolean rings are used to solve problems in Boolean gene regulatory networks (BNs). In contrast to other known algebraic approaches, the degree of intermediate polynomials during the calculation of Groebner bases with our method never grows, resulting in a significant improvement in running time and memory consumption. We also show how calculations in temporal logic for model checking can be carried out by means of our direct and efficient Groebner basis computation in Boolean rings. We present our experimental results in finding attractors and control strategies of Boolean networks to illustrate our theoretical arguments. The results are promising. Our algebraic approach is more efficient than the state-of-the-art model checker NuSMV on BNs. More importantly, our approach finds all solutions for the BN problems.
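Algebraic approaches of this kind build on translating Boolean logic into polynomials over GF(2) with the idempotence rule x*x = x: `NOT x = 1 + x`, `x AND y = xy`, and `x OR y = x + y + xy`. The sketch shows only that translation. Fixed points (steady-state attractors) are then found by brute-force enumeration, not by Groebner basis computation, and the example network is hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Sequence, Tuple

Monomial = FrozenSet[int]   # product of variables; x*x = x, so a set suffices
Poly = FrozenSet[Monomial]  # GF(2) sum of distinct monomials

ONE: Poly = frozenset({frozenset()})


def var(i: int) -> Poly:
    return frozenset({frozenset({i})})


def add(p: Poly, q: Poly) -> Poly:
    return p ^ q  # equal monomials cancel because 1 + 1 = 0


def mul(p: Poly, q: Poly) -> Poly:
    result: set = set()
    for m in p:
        for n in q:
            result ^= {m | n}  # union of variable sets encodes x_i * x_i = x_i
    return frozenset(result)


def NOT(p: Poly) -> Poly:
    return add(ONE, p)


def AND(p: Poly, q: Poly) -> Poly:
    return mul(p, q)


def OR(p: Poly, q: Poly) -> Poly:
    return add(add(p, q), mul(p, q))


def evaluate(p: Poly, x: Sequence[int]) -> int:
    return sum(all(x[i] for i in m) for m in p) % 2


def fixed_points(fs: List[Poly], n: int) -> List[Tuple[int, ...]]:
    """States with f_i(x) = x_i for every node (enumeration, not Groebner bases)."""
    return [x for x in product((0, 1), repeat=n)
            if all(evaluate(f, x) == x[i] for i, f in enumerate(fs))]
```

For the update rules `x0' = x1`, `x1' = x0`, `x2' = x0 OR x1`, the only fixed points are all-off and all-on.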
A key objective of gene network modeling is to develop intervention strategies to alter regulatory dynamics in such a way as to reduce the likelihood of undesirable phenotypes. Optimal stationary intervention policies have been developed for gene regulation in the framework of probabilistic Boolean networks in a number of settings. To mitigate the possibility of detrimental side effects, for instance, in the treatment of cancer, it may be desirable to keep the expected number of treatments below some bound. This paper formulates a general constraint approach for optimal therapeutic intervention by suitably adapting the reward function and then applies this formulation to bound the expected number of treatments. A mutated mammalian cell cycle is considered as a case study.
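One way to adapt the reward function so that treatments are discouraged is to charge a fixed `penalty` each time treatment is applied. Raising the penalty then lowers the expected number of treatments. The sketch below assumes this Lagrangian-style penalty, which may differ from the paper's formulation. It computes a stationary policy by discounted value iteration on a hypothetical two-state chain (0 = desirable, 1 = undesirable) rather than a full probabilistic Boolean network.

```python
from typing import List, Set


def intervention_policy(P: List[List[List[float]]], bad_states: Set[int],
                        penalty: float, gamma: float = 0.9,
                        iters: int = 500) -> List[int]:
    """Stationary policy (0 = no treatment, 1 = treatment) by value iteration.

    P[a][s][s2] is the transition probability under action a. Reward per step:
    -1 in an undesirable state, minus `penalty` whenever treatment is applied.
    """
    n = len(P[0])
    V = [0.0] * n

    def q(s: int, a: int) -> float:
        reward = (-1.0 if s in bad_states else 0.0) - penalty * a
        return reward + gamma * sum(P[a][s][t] * V[t] for t in range(n))

    for _ in range(iters):
        V = [max(q(s, 0), q(s, 1)) for s in range(n)]
    return [1 if q(s, 1) > q(s, 0) else 0 for s in range(n)]
```

With a zero penalty the policy treats in every state, because treatment always lowers the chance of reaching the undesirable state. A large enough penalty switches treatment off entirely. Intermediate values trade phenotype quality against treatment frequency.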
Horizontal gene transfer (HGT) is a major force driving bacterial evolution. Because of their ability to cross inter-species barriers, bacterial plasmids are essential agents for HGT. This ability, however, places specific demands on plasmid physiology, in particular the need to overcome a multilevel selection process with opposing demands. We analyzed the transcriptional network of plasmid R388, one of the most promiscuous plasmids in Proteobacteria. Transcriptional analysis by fluorescence expression profiling and quantitative PCR revealed a regulatory network controlled by six transcriptional repressors. The regulatory network relied on strong promoters, which were tightly repressed in negative feedback loops. Computational simulations and theoretical analysis indicated that this architecture would show a transcriptional burst after plasmid conjugation, linking the magnitude of the feedback gain with the intensity of the transcriptional burst. Experimental analysis showed that transcriptional overshooting occurred when the plasmid invaded a new population of susceptible cells. We propose that transcriptional overshooting allows genome rebooting after horizontal gene transfer, and might have an adaptive role in overcoming the opposing demands of multilevel selection.
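The burst can be reproduced in a minimal model: a single self-repressed gene, `dx/dt = alpha / (1 + (x/K)**n) - delta*x`, standing in for R388's six-repressor network. Promoter activity is maximal right after the plasmid enters a repressor-free cell and then relaxes to its repressed steady state. All parameter values are hypothetical, and the model is far simpler than the paper's simulations.

```python
def burst_ratio(alpha: float, K: float = 1.0, n: float = 2.0, delta: float = 1.0,
                dt: float = 0.01, t_end: float = 50.0) -> float:
    """Initial / steady-state promoter activity of a negatively autoregulated gene.

    Starts with no repressor (x = 0), as after conjugation into a naive cell,
    and integrates with forward Euler until the system has settled.
    """
    x = 0.0
    initial_activity = alpha / (1 + (x / K) ** n)  # unrepressed promoter at entry
    for _ in range(int(t_end / dt)):
        x += dt * (alpha / (1 + (x / K) ** n) - delta * x)
    final_activity = alpha / (1 + (x / K) ** n)
    return initial_activity / final_activity
```

With `alpha = 10`, the repressor settles at x = 2, so the activity falls from 10 to 2, a fivefold burst. A stronger promoter (`alpha = 100`) is repressed more tightly at steady state and yields a larger burst. This parallels, in a toy setting, the link between feedback strength and burst intensity described above.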
Ahituv, Nadav; Prabhakar, Shyam; Poulin, Francis; Rubin, Edward M.; Couronne, Olivier
Our inability to associate distant regulatory elements with the genes that they regulate has largely precluded their examination for sequence alterations contributing to human disease. One major obstacle is the large genomic space surrounding targeted genes in which such elements could potentially reside. In order to delineate gene regulatory boundaries, we used whole-genome human-mouse-chicken (HMC) and human-mouse-frog (HMF) multiple alignments to compile conserved blocks of synteny (CBS), under the hypothesis that these blocks have been kept intact throughout evolution at least in part by the requirement of regulatory elements to stay linked to the genes that they regulate. A total of 2,116 and 1,942 CBS longer than 200 kb were assembled for HMC and HMF, respectively, encompassing 1.53 and 0.86 Gb of human sequence. To support the existence of complex long-range regulatory domains within these CBS, we analyzed the prevalence and distribution of chromosomal aberrations leading to position effects (disruption of a gene's regulatory environment), observing a clear bias not only for mapping onto CBS but also for longer CBS. Our results provide a genome-wide data set characterizing the regulatory domains of genes and the conserved regulatory elements within them.
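Compiling conserved blocks of synteny amounts to chaining orthologous anchors that stay collinear and close together, then keeping the long blocks. The sketch is a simplified illustration under stated assumptions. Anchors come from one human chromosome, sorted by position. A block breaks when the partner species jumps chromosomes or either gap exceeds a hypothetical `max_gap`, and orientation and inversions are ignored. Only the 200 kb cutoff comes from the text.

```python
from typing import List, Tuple

# (human position, other-species chromosome, other-species position)
Anchor = Tuple[int, str, int]


def synteny_blocks(anchors: List[Anchor], max_gap: int,
                   min_len: int = 200_000) -> List[Tuple[int, int]]:
    """Human-coordinate spans of collinear anchor chains at least min_len long."""
    if not anchors:
        return []
    blocks = []
    start = prev = anchors[0]
    for anchor in anchors[1:]:
        same_block = (anchor[1] == prev[1]
                      and anchor[0] - prev[0] <= max_gap
                      and abs(anchor[2] - prev[2]) <= max_gap)
        if not same_block:
            blocks.append((start[0], prev[0]))
            start = anchor
        prev = anchor
    blocks.append((start[0], prev[0]))
    # Block length is measured between its outermost anchors.
    return [(s, e) for s, e in blocks if e - s >= min_len]
```

In the test data, the first three anchors chain into a 250 kb block on one partner chromosome. The remaining two form a 100 kb block that falls below the cutoff.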
Balasubramanian, Deepak; Schneper, Lisa; Kumari, Hansi; Mathee, Kalai
Pseudomonas aeruginosa is a metabolically versatile bacterium that is found in a wide range of biotic and abiotic habitats. It is a major human opportunistic pathogen causing numerous acute and chronic infections. The critical traits contributing to the pathogenic potential of P. aeruginosa are the production of a myriad of virulence factors, formation of biofilms and antibiotic resistance. Expression of these traits is under stringent regulation, and it responds to largely unidentified environmental signals. This review is focused on providing a global picture of virulence gene regulation in P. aeruginosa. In addition to key regulatory pathways that control the transition from acute to chronic infection phenotypes, some regulators have been identified that modulate multiple virulence mechanisms. Despite a propensity for chaotic behaviour, no chaotic motifs were readily observed in the P. aeruginosa virulence regulatory network. Having a bird's-eye view of the regulatory cascades provides opportunities to pose questions, formulate hypotheses and evaluate theories in elucidating P. aeruginosa pathogenesis. Understanding the mechanisms involved in making P. aeruginosa a successful pathogen is essential in helping devise control strategies. PMID:23143271
Hill, Jonathon T; Demarest, Bradley; Gorsi, Bushra; Smith, Megan; Yost, H Joseph
During embryogenesis, the heart forms as a linear tube that then undergoes multiple simultaneous morphogenetic events to obtain its mature shape. To understand the gene regulatory networks (GRNs) driving this phase of heart development, during which many congenital heart disease malformations likely arise, we conducted an RNA-seq time course in zebrafish from 30 hpf to 72 hpf and identified 5861 genes with altered expression. We clustered the genes by temporal expression pattern, identified transcription factor binding motifs enriched in each cluster, and generated a model GRN for the major gene batteries in heart morphogenesis. This approach predicted hundreds of regulatory interactions and found batteries enriched in specific cell and tissue types, indicating that the approach can be used to narrow the search for novel genetic markers and regulatory interactions. Subsequent analyses using two mutants, Tbx5 and nkx2-5, confirmed the GRN and identified sets of duplicated zebrafish genes that do not show temporal subfunctionalization. This dataset provides an essential resource for future studies on the genetic/epigenetic pathways implicated in congenital heart defects and the mechanisms of cardiac transcriptional regulation. © 2017. Published by The Company of Biologists Ltd.
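Motif enrichment in a gene cluster is often assessed with a one-sided hypergeometric test: given N genes, K of which carry the motif, how surprising is it to see at least k motif carriers among the n genes of a cluster? The abstract does not name the test used, so this is an illustrative choice, and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K with motif, n in cluster)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, if 5 of 10 genes carry a motif and a 5-gene cluster contains all of them, the p-value is 1/C(10,5) = 1/252.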
Stephen B Montgomery
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 Genomes Project in African and European population samples and gene expression data from lymphoblastoid cell lines. We detect numbers of expression quantitative trait loci (eQTLs) comparable to those detected with genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
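Allele-specific expression is commonly tested by counting RNA-seq reads for each allele at a heterozygous site and asking whether the split departs from 50:50. The exact two-sided binomial test below is one standard choice, not necessarily the one used in this study, and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no likelier than k."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))
```

Ten reference reads and zero alternative reads give p = 2/1024, about 0.002. A 5:5 split gives p = 1.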
Nikolaev, Lev G; Akopov, Sergey B; Chernov, Igor P; Sverdlov, Eugene D
The availability of complete human and other metazoan genome sequences has greatly facilitated the positioning and analysis of various genomic functional elements, with initial emphasis on coding sequences. However, complete functional maps of sequenced eukaryotic genomes should also include the positions of all non-coding regulatory elements. Unfortunately, experimental data on the genomic positions of a multitude of regulatory sequences, such as enhancers, silencers, insulators, transcription terminators, and replication origins, are very limited, especially at the whole-genome level. Since most genomic regulatory elements (e.g. enhancers) are generally gene-, tissue-, or cell-specific, the prediction of these elements by computational methods is difficult and often ambiguous. Therefore, the development of high-throughput experimental approaches for identifying and mapping genomic functional elements is highly desirable. At the same time, the creation of a whole-genome map of hundreds of thousands of regulatory elements in several hundred tissue/cell types is presently far beyond our capabilities. A possible alternative to the whole-genome approach is to concentrate efforts on individual genomic segments and then to integrate the data obtained into a whole-genome functional map. Moreover, maps of polygenic fragments with functional cis-regulatory elements would provide valuable data on complex regulatory systems, including their variability and evolution. Here, we review experimental approaches to the realization of these ideas, including our own developments of experimental techniques for the selection of cis-acting functionally active DNA fragments from large (megabase-sized) segments of mammalian genomes.
Hinman, Veronica F; Cheatle Jarvela, Alys M
One of the central concerns of evolutionary developmental biology is to understand how the specification of cell types can change during evolution. In the last decade, developmental biology has progressed toward a systems-level understanding of cell specification processes. In particular, the focus has been on determining the regulatory interactions of the repertoire of genes that make up gene regulatory networks (GRNs). Echinoderms provide an extraordinary model system for determining how GRNs evolve. This review highlights the comparative GRN analyses arising from the echinoderm system. This work shows that certain types of GRN subcircuits or motifs, i.e., those involving positive feedback, tend to be conserved and may provide a constraint on development. This conservation may be due to a required arrangement of transcription factor binding sites in cis-regulatory modules. The review will also discuss ways in which novelty may arise, in particular through the co-option of regulatory genes and subcircuits. The development of the sea urchin larval skeleton, a novel feature that arose in echinoderms, has provided a model for the study of co-option mechanisms. Finally, the types of GRNs that can permit the great diversity in the patterns of ciliary bands and their associated neurons found among these taxa are discussed. The availability of genomic resources is rapidly expanding for echinoderms, including genome sequences not only for multiple species of sea urchins but also for a species of sea star, sea cucumber, and brittle star. This will enable echinoderms to become a particularly powerful system for understanding how developmental GRNs evolve. Copyright © 2014 Wiley Periodicals, Inc.
ABSTRACT: BACKGROUND: The evolution of high-throughput technologies that measure gene expression levels has created a database for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods exist for discovering qualitative causal relationships between genes with high accuracy from microarray data, but large-scale quantitative analysis on real biological datasets cannot yet be performed, as existing approaches are not suited to real microarray data, which are noisy and insufficient. RESULTS: This paper analyzes several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and to offer a comprehensive comparison of approaches under a common framework. The algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and their ability to reproduce biological behaviour, their scalability and their robustness to noise are assessed and compared. CONCLUSIONS: We present a comparison framework for assessing evolutionary algorithms used to infer gene regulatory networks. Promising methods are identified and a platform for the development of appropriate model formalisms is established.
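The evolutionary algorithms compared in such studies share a core loop of mutating candidate parameters and keeping those that fit the data better. The sketch shows the simplest member of the family, a (1+1) evolution strategy with the 1/5 success rule. It fits the weights of a hypothetical linear model relating regulator levels to a target's expression. This is a generic illustration, not one of the specific algorithms or model formalisms evaluated in the paper.

```python
import random
from typing import List, Sequence, Tuple


def mse(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Mean squared error of the linear prediction sum_j w_j * x_j."""
    return sum((sum(wi * xi for wi, xi in zip(w, row)) - t) ** 2
               for row, t in zip(X, y)) / len(y)


def one_plus_one_es(X: Sequence[Sequence[float]], y: Sequence[float],
                    iters: int = 3000, sigma: float = 0.5,
                    seed: int = 1) -> Tuple[List[float], float]:
    """(1+1) evolution strategy with the 1/5 success rule for step-size control."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    err = mse(w, X, y)
    for _ in range(iters):
        child = [wi + rng.gauss(0.0, sigma) for wi in w]
        child_err = mse(child, X, y)
        if child_err <= err:
            w, err = child, child_err
            sigma *= 1.22            # success: widen the search
        else:
            sigma *= 1.22 ** -0.25   # failure: narrow it (balances at ~1/5 successes)
    return w, err
```

Because only improvements are accepted, the error never increases. On noiseless synthetic data generated from known weights, the search converges close to those weights.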
Carré, Clément; Mas, André; Krouk, Gabriel
Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. Several techniques are currently used to assay transcription factor-target relationships experimentally, providing important information about real gene regulatory network connections. These techniques include classical ChIP-seq, yeast one-hybrid, or, more recently, DAP-seq or target technologies, and they are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory network-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10⁴ genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale-free properties (built ex nihilo). In combination with a supervised support vector machine algorithm (one that accepts prior knowledge), we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks, and in particular demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to inform the design of experiments that would enable learning algorithms to solve gene regulatory networks in the future. By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data (Escherichia coli K14 network
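Scale-free topologies like those FRANK simulates are often generated by preferential attachment. The sketch below is a generic illustration, not FRANK's algorithm. Each new gene receives `m` regulators drawn from existing genes with probability proportional to out-degree + 1, so genes that already regulate many targets tend to become hubs. Edges point from regulator to target, so the result is acyclic.

```python
import random
from typing import List, Tuple


def preferential_attachment_grn(n: int, m: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Directed (regulator, target) edges grown by preferential attachment.

    Genes 0..m-1 are seeds; every later gene gets m distinct regulators chosen
    among earlier genes with probability proportional to out-degree + 1.
    """
    rng = random.Random(seed)
    out_deg = [0] * n
    edges = []
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:
            weights = [out_deg[i] + 1 for i in range(new)]
            chosen.add(rng.choices(range(new), weights=weights)[0])
        for reg in sorted(chosen):
            edges.append((reg, new))
            out_deg[reg] += 1
    return edges
```

Adding feedback loops or expression dynamics, as FRANK does, would require additional steps beyond this topology sketch.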
Voytas, Daniel F.; Gao, Caixia
Plant agriculture is poised at a technological inflection point. Recent advances in genome engineering make it possible to precisely alter DNA sequences in living cells, providing unprecedented control over a plant's genetic material. Potential future crops derived through genome engineering include those that better withstand pests, that have enhanced nutritional value, and that are able to grow on marginal lands. In many instances, crops with such traits will be created by altering only a f...
The coordinated expression of the different genes in an organism is essential to sustain functionality under the random external perturbations to which the organism might be subjected. To cope with such external variability, the global dynamics of the genetic network must possess two central properties. (a) It must be robust enough to guarantee stability under a broad range of external conditions, and (b) it must be flexible enough to recognize and integrate specific external signals that may help the organism to change and adapt to different environments. This compromise between robustness and adaptability has been observed in dynamical systems operating at the brink of a phase transition between order and chaos. Such systems are termed critical. Thus, criticality, a precise, measurable, and well-characterized property of dynamical systems, makes it possible for robustness and adaptability to coexist in living organisms. In this work we investigate the dynamical properties of the gene transcription networks reported for S. cerevisiae, E. coli, and B. subtilis, as well as the network of segment polarity genes of D. melanogaster and the network of flower development of A. thaliana. We use hundreds of microarray experiments to infer the nature of the regulatory interactions among genes, and implement these data into Boolean models of the genetic networks. Our results show that, to the extent that currently available experimental data allow us to judge, the five networks under study indeed operate close to criticality. The generality of this result suggests that criticality at the genetic level might constitute a fundamental evolutionary mechanism that generates the great diversity of dynamically robust living forms that we observe around us.
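A standard way to locate a Boolean network relative to the order-chaos transition is the average sensitivity of its update functions. This is the mean number of single-input flips that change a function's output. Networks with a mean near 1 are considered critical, those below 1 ordered, and those above 1 chaotic. The abstract does not say which criticality measure was used, so this sketch illustrates the concept only, on hypothetical two-input functions.

```python
from itertools import product
from typing import Callable, List, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def average_sensitivity(f: BoolFn, k: int) -> float:
    """Mean number of single-input flips that change f's output, over all 2**k inputs."""
    total = 0
    for x in product((0, 1), repeat=k):
        out = f(x)
        for i in range(k):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            total += f(flipped) != out
    return total / 2 ** k


def network_sensitivity(functions: List[Tuple[BoolFn, int]]) -> float:
    """Average sensitivity over nodes; each entry is (update function, number of inputs)."""
    return sum(average_sensitivity(f, k) for f, k in functions) / len(functions)
```

A two-input AND has average sensitivity 1 (critical on its own), whereas XOR has 2 (chaotic), so a network mixing the two averages 1.5.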
Functional annotations of large plant genome projects mostly provide information on gene function and gene families based on the presence of protein domains and gene homology, but not necessarily in association with gene expression or metabolic and regulatory networks. These additional annotations a...
Österlund, Tobias; Bordel, Sergio; Nielsen, Jens
We analyze the topology and organization of nine transcriptional regulatory networks for E. coli, yeast, mouse and human, and we evaluate how the structure of these networks influences two of their key properties, namely controllability and stability. We calculate the controllability for each network......% for the human network. The high controllability (low number of drivers needed to control the system) in yeast, mouse and human is due to the presence of internal loops in their regulatory networks where the TFs regulate each other in a circular fashion. We refer to these internal loops as circular control...... motifs (CCM). The E. coli transcriptional regulatory network, which does not have any CCMs, shows a hierarchical structure in contrast to the eukaryal networks. The presence of CCMs also has influence on the stability of these networks, as the presence of cycles...
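The number of driver nodes is commonly computed with the structural-controllability result of Liu, Slotine and Barabási: N_D = max(N - |M*|, 1), where M* is a maximum matching of the directed network. The excerpt does not state the authors' exact procedure, so this is an assumed, standard formulation. The sketch finds the matching with augmenting paths (Kuhn's algorithm) on toy networks.

```python
from typing import Dict, List, Set, Tuple


def driver_node_count(n: int, edges: List[Tuple[int, int]]) -> int:
    """Minimum driver nodes N_D = max(n - |maximum matching|, 1)."""
    succ: Dict[int, List[int]] = {u: [] for u in range(n)}
    for u, v in edges:
        succ[u].append(v)
    matched_by: Dict[int, int] = {}  # target node -> regulator matched to it

    def augment(u: int, seen: Set[int]) -> bool:
        # Try to match regulator u, re-routing earlier matches if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    matching_size = sum(augment(u, set()) for u in range(n))
    return max(n - matching_size, 1)
```

A three-gene cascade needs one driver and a star with one TF and two targets needs two. A three-gene cycle, the simplest circular control motif, is perfectly matched and needs only one driver. This illustrates why circular motifs lower the driver count.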
Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co²⁺. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previous experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators to target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. In each case, deletion of
We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs of a set of TFs from ChIP-Seq binding profiles, and predicted the targets of miRNAs from annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign to each regulatory interaction. Other types of edges, such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes, were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly across tissues, having more interacting partners, and being more likely to be essential. We found an over-representation of notable network motifs, including a feed-forward loop (FFL) in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data become available in the near future, our methods of data integration have various potential applications.
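Motif analysis starts by enumerating feed-forward loops (x→y, y→z, and x→z) in the directed network. The sketch only counts FFLs. Assessing over-representation would additionally require comparing the count with degree-preserving randomized networks. Node names in the example are hypothetical and mirror the TF→miRNA→gene case in the abstract.

```python
from typing import Dict, List, Set, Tuple


def feed_forward_loops(edges: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """All (x, y, z) with edges x->y, y->z and x->z, for distinct x, y, z."""
    succ: Dict[str, Set[str]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    loops = []
    for x, ys in succ.items():
        for y in ys:
            for z in succ.get(y, ()):
                if z in ys and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)
```

In a mixed network, the same function picks out TF→miRNA→gene loops as long as miRNA nodes are included in the edge list.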
Dong, Xianjun; Navratilova, Pavla; Fredman, David; Drivenes, Øyvind; Becker, Thomas S; Lenhard, Boris
Using a comparative genomics approach to reconstruct the fate of genomic regulatory blocks (GRBs) and identify exonic remnants that have survived the disappearance of their host genes after whole-genome duplication (WGD) in teleosts, we discover a set of 38 candidate cis-regulatory coding exons (RCEs) with predicted target genes. These elements demonstrate evolutionary separation of overlapping protein-coding and regulatory information after WGD in teleosts. We present evidence that the corresponding mammalian exons are still under both coding and non-coding selection pressure, are more conserved than other protein coding exons in the host gene and several control sets, and share key characteristics with highly conserved non-coding elements in the same regions. Their dual function is corroborated by existing experimental data. Additionally, we show examples of human exon remnants stemming from the vertebrate 2R WGD. Our findings suggest that long-range cis-regulatory inputs for developmental genes are not limited to non-coding regions, but can also overlap the coding sequence of unrelated genes. Thus, exonic regulatory elements in GRBs might be functionally equivalent to those in non-coding regions, calling for a re-evaluation of the sequence space in which to look for long-range regulatory elements and experimentally test their activity.
Ishii, Tetsuya; Araki, Motoko
The global agricultural landscape regarding the commercial cultivation of genetically modified (GM) crops is mosaic. Meanwhile, a new plant breeding technique, genome editing, is expected to make genetic engineering-mediated crop breeding more socially acceptable because it can be used to develop crop varieties without introducing transgenes, which have hampered the regulatory review and public acceptance of GM crops. The present study revealed that product- and process-based concepts have been implemented to regulate GM crops in 30 countries. Moreover, this study analyzed the regulatory responses to genome-edited crops in the USA, Argentina, Sweden and New Zealand. The findings suggested that countries will likely be divided in their policies on genome-edited crops: some will deregulate transgene-free crops, while others will regulate all types of crops that have been modified by genome editing. These implications are discussed from the viewpoint of public acceptance. PMID:27960622
The recognition of a positive correlation between an organism's genome size and its transposable element (TE) content represents a key discovery in the field of genome biology. Considerable evidence accumulated since then suggests the involvement of TEs in genome structure, evolution and function. The global genome reorganization brought about by transposon activity might play an adaptive/regulatory role in the host response to environmental challenges, reminiscent of McClintock’s original ‘Controlling Element’ hypothesis. This regulatory aspect of TEs is also garnering support in light of recent evidence that casts TEs as distributed genomic control modules. According to this view, TEs are capable of actively reprogramming host gene circuits and ultimately fine-tuning the host response to specific environmental stimuli. Moreover, stress-induced changes in the epigenetic status of TEs may allow them to propagate their stress-responsive elements to host genes; the resulting genome fluidity can permit phenotypic plasticity and adaptation to stress. Given their predominance in plant genomes, their nested organization in genic regions and their potential regulatory role in stress response, TEs hold unexplored potential for crop improvement programs. This review intends to present current information about the roles played by TEs in plant genome organization, evolution and function, and to highlight the regulatory mechanisms in plant stress responses. We will also briefly discuss the connection between TE activity, host epigenetic response and phenotypic plasticity as a critical link for crossing the translational bridge from a purely basic study of TEs to the applied field of stress adaptation and crop improvement.
Roukos, Dimitrios H
The post-ENCODE era now shapes a new biomedical research direction for understanding the transcriptional and signaling networks driving gene expression and core cellular processes such as cell fate, survival, and apoptosis. Over the past half century, Francis Crick's 'central dogma', with its single gene/protein-phenotype (trait/disease) view, has defined biology, human physiology, disease, diagnostics, and drug discovery. However, the ENCODE project and several other genomic studies using high-throughput sequencing technologies, computational strategies, and imaging techniques to visualize regulatory networks provide evidence that the transcriptional process and gene expression are regulated by highly complex, dynamic molecular and signaling networks. This Focus article describes the limitations of linear, experimentation-based diagnostics and therapeutics in curing advanced cancer and the need to move from reductionist to network-based approaches. Given the evident wide genomic heterogeneity, the power and challenges of next-generation sequencing (NGS) technologies in identifying a patient's personal mutational landscape, in order to tailor the best targeted drugs to the individual patient, are discussed. However, the available drugs are not capable of targeting aberrant signaling networks, and functional transcriptional heterogeneity and functional genome organization remain poorly understood. Therefore, future clinical genome network medicine, aimed at overcoming multiple problems in the new fields of regulatory DNA mapping, noncoding RNA, enhancer RNAs, and the dynamic complexity of transcriptional circuitry, is also discussed, in the expectation of new innovative technology and a strong appreciation of clinical data and evidence-based medicine. The problems and potential solutions in the discovery of next-generation, molecular, and signaling circuitry-based biomarkers and drugs are explored. © 2013 Wiley Periodicals, Inc.
Hudson, Thomas J.; Anderson, Warwick; Aretz, Axel; Barker, Anna D.; Bell, Cindy; Bernabé, Rosa R.; Bhan, M. K.; Calvo, Fabien; Eerola, Iiro; Gerhard, Daniela S.; Guttmacher, Alan; Guyer, Mark; Hemsley, Fiona M.; Jennings, Jennifer L.; Kerr, David; Klatt, Peter; Kolar, Patrik; Kusuda, Jun; Lane, David P.; Laplace, Frank; Lu, Youyong; Nettekoven, Gerd; Ozenberger, Brad; Peterson, Jane; Rao, T. S.; Remacle, Jacques; Schafer, Alan J.; Shibata, Tatsuhiro; Stratton, Michael R.; Vockley, Joseph G.; Watanabe, Koichi; Yang, Huanming; Yuen, Matthew M. F.; Knoppers, Bartha M.; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn G.; Dyke, Stephanie O. M.; Joly, Yann; Kato, Kazuto; Kennedy, Karen L.; Nicolás, Pilar; Parker, Michael J.; Rial-Sebbag, Emmanuelle; Romeo-Casabona, Carlos M.; Shaw, Kenna M.; Wallace, Susan; Wiesner, Georgia L.; Zeps, Nikolajs; Lichter, Peter; Biankin, Andrew V.; Chabannon, Christian; Chin, Lynda; Clément, Bruno; de Alava, Enrique; Degos, Françoise; Ferguson, Martin L.; Geary, Peter; Hayes, D. Neil; Johns, Amber L.; Kasprzyk, Arek; Nakagawa, Hidewaki; Penny, Robert; Piris, Miguel A.; Sarin, Rajiv; Scarpa, Aldo; van de Vijver, Marc; Futreal, P. Andrew; Aburatani, Hiroyuki; Bayés, Mónica; Bowtell, David D. L.; Campbell, Peter J.; Estivill, Xavier; Grimmond, Sean M.; Gut, Ivo; Hirst, Martin; López-Otín, Carlos; Majumder, Partha; Marra, Marco; McPherson, John D.; Ning, Zemin; Puente, Xose S.; Ruan, Yijun; Stunnenberg, Hendrik G.; Swerdlow, Harold; Velculescu, Victor E.; Wilson, Richard K.; Xue, Hong H.; Yang, Liu; Spellman, Paul T.; Bader, Gary D.; Boutros, Paul C.; Flicek, Paul; Getz, Gad; Guigó, Roderic; Guo, Guangwu; Haussler, David; Heath, Simon; Hubbard, Tim J.; Jiang, Tao; Jones, Steven M.; Li, Qibin; López-Bigas, Nuria; Luo, Ruibang; Muthuswamy, Lakshmi; Ouellette, B. F. Francis; Pearson, John V.; Quesada, Victor; Raphael, Benjamin J.; Sander, Chris; Speed, Terence P.; Stein, Lincoln D.; Stuart, Joshua M.; Teague, Jon W.; Totoki, Yasushi; Tsunoda, Tatsuhiko; Valencia, Alfonso; Wheeler, David A.; Wu, Honglong; Zhao, Shancen; Zhou, Guangyu; Lathrop, Mark; Thomas, Gilles; Yoshida, Teruhiko; Axton, Myles; Gunter, Chris; Miller, Linda J.; Zhang, Junjun; Haider, Syed A.; Wang, Jianxin; Yung, Christina K.; Cross, Anthony; Liang, Yong; Gnaneshan, Saravanamuttu; Guberman, Jonathan; Hsu, Jack; Chalmers, Don R. C.; Hasel, Karl W.; Kaan, Terry S. H.; Lowrance, William W.; Masui, Tohru; Rodriguez, Laura Lyman; Vergely, Catherine; Cloonan, Nicole; Defazio, Anna; Eshleman, James R.; Etemadmoghadam, Dariush; Gardiner, Brooke A.; Kench, James G.; Sutherland, Robert L.; Tempero, Margaret A.; Waddell, Nicola J.; Wilson, Peter J.; Gallinger, Steve; Tsao, Ming-Sound; Shaw, Patricia A.; Petersen, Gloria M.; Mukhopadhyay, Debabrata; DePinho, Ronald A.; Thayer, Sarah; Shazand, Kamran; Beck, Timothy; Sam, Michelle; Timms, Lee; Ballin, Vanessa; Ji, Jiafu; Zhang, Xiuqing; Chen, Feng; Hu, Xueda; Yang, Qi; Tian, Geng; Zhang, Lianhai; Xing, Xiaofang; Li, Xianghong; Zhu, Zhenggang; Yu, Yingyan; Yu, Jun; Tost, Jörg; Brennan, Paul; Holcatova, Ivana; Zaridze, David; Brazma, Alvis; Egevad, Lars; Prokhortchouk, Egor; Banks, Rosamonde Elizabeth; Uhlén, Mathias; Viksna, Juris; Ponten, Fredrik; Skryabin, Konstantin; Birney, Ewan; Borg, Ake; Børresen-Dale, Anne-Lise; Caldas, Carlos; Foekens, John A.; Martin, Sancha; Reis-Filho, Jorge S.; Richardson, Andrea L.; Sotiriou, Christos; van't Veer, Laura; Birnbaum, Daniel; Blanche, Hélène; Boucher, Pascal; Boyault, Sandrine; Masson-Jacquemier, Jocelyne D.; Pauporté, Iris; Pivot, Xavier; Vincent-Salomon, Anne; Tabone, Eric; Theillet, Charles; Treilleux, Isabelle; Bioulac-Sage, Paulette; Decaens, Thomas; Franco, Dominique; Gut, Marta; Samuel, Didier; Zucman-Rossi, Jessica; Eils, Roland; Brors, Benedikt; Korbel, Jan O.; Korshunov, Andrey; Landgraf, Pablo; Lehrach, Hans; Pfister, Stefan; Radlwimmer, Bernhard; Reifenberger, Guido; Taylor, Michael D.; von Kalle, Christof; Majumder, Partha P.; Pederzoli, Paolo; Lawlor, Rita T.; Delledonne, Massimo; Bardelli, Alberto; Gress, Thomas; Klimstra, David; Zamboni, Giuseppe; Nakamura, Yusuke; Miyano, Satoru; Fujimoto, Akihiro; Campo, Elias; de Sanjosé, Silvia; Montserrat, Emili; González-Díaz, Marcos; Jares, Pedro; Himmelbauer, Heinz; Bea, Silvia; Aparicio, Samuel; Easton, Douglas F.; Collins, Francis S.; Compton, Carolyn C.; Lander, Eric S.; Burke, Wylie; Green, Anthony R.; Hamilton, Stanley R.; Kallioniemi, Olli P.; Ley, Timothy J.; Liu, Edison T.; Wainwright, Brandon J.
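The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeut...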
Myocardial infarction (MI) is a severe coronary artery disease and a leading cause of mortality and morbidity worldwide. However, the molecular mechanisms of MI have yet to be fully elucidated. In this study, we compiled MI-related genes, MI-related microRNAs (miRNAs) and known human transcription factors (TFs), and then identified 1,232 feed-forward loops (FFLs) among these miRNAs, TFs and their co-regulated target genes by integrating target predictions. By merging these FFLs, the first miRNA- and TF-mediated regulatory network for MI was constructed, from which four regulators (SP1, ESR1, miR-21-5p and miR-155-5p) and three regulatory modules that might play crucial roles in MI were identified. Furthermore, based on the miRNA- and TF-mediated regulatory network and a literature survey, we proposed a pathway model for miR-21-5p, the miR-29 family and SP1 to demonstrate their potential co-regulatory mechanisms in cardiac fibrosis, apoptosis and angiogenesis. The majority of the regulatory relations in the model were confirmed by previous studies, which demonstrates the reliability and validity of this miRNA- and TF-mediated regulatory network. Our study will aid in deciphering the complex regulatory mechanisms involved in MI and provide putative therapeutic targets for MI.
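The merging step can be sketched as follows, assuming FFLs are already available as (regulator, co-regulator, target) triples. Ranking regulators by how many FFLs they participate in is an illustrative criterion chosen here, not necessarily the one the authors used, and all node names are placeholders rather than real MI regulators.

```python
from collections import Counter


def merge_ffls(ffls):
    """Union of the three edges (a->b, b->c, a->c) of every FFL (a, b, c)."""
    edges = set()
    for a, b, c in ffls:
        edges.update({(a, b), (b, c), (a, c)})
    return edges


def rank_regulators(ffls, top=3):
    """Count FFL memberships of each regulator (positions a and b only)."""
    counts = Counter()
    for a, b, _target in ffls:
        counts[a] += 1
        counts[b] += 1
    return counts.most_common(top)


# Placeholder FFLs: (regulator, co-regulator, shared target)
ffls = [
    ("miR_X", "TF_A", "g1"),
    ("miR_X", "TF_A", "g2"),
    ("TF_B", "miR_X", "g3"),
    ("TF_A", "miR_Y", "g1"),
]
network = merge_ffls(ffls)
print(len(network), rank_regulators(ffls, top=2))
```

Edges shared by several FFLs (here miR_X→TF_A and TF_A→g1) appear once in the merged network, so the merged edge count is smaller than three times the number of loops.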
Förster, Jochen; Famili, I.; Fu, P.
... and the environment were included. A total of 708 structural open reading frames (ORFs) were accounted for in the reconstructed network, corresponding to 1035 metabolic reactions. Further, 140 reactions were included on the basis of biochemical evidence resulting in a genome-scale reconstructed metabolic network ... with Escherichia coli. The reconstructed metabolic network is the first comprehensive network for a eukaryotic organism, and it may be used as the basis for in silico analysis of phenotypic functions...
Background: Noise has many important roles in cellular genetic regulatory functions at the nanomolar scale. At present, no good theory exists for identifying all possible mechanisms by which genetic regulatory networks attenuate molecular noise to achieve regulatory ability, or amplify it to randomize outcomes to the advantage of diversity. Therefore, noise filtering in genetic regulatory networks is an important topic for gene networks under intrinsic fluctuation and extrinsic noise. Results: Based on a stochastic dynamic regulation equation, the intrinsic fluctuation in reaction rates is modeled as a state-dependent stochastic process, which influences the stability of the gene regulatory network, especially at low concentrations of reacting species. The mechanisms by which genetic regulatory networks attenuate or amplify extrinsic fluctuation are then revealed from a nonlinear stochastic filtering point of view. Furthermore, a simple measure of the attenuation or amplification level of extrinsic noise for genetic regulatory networks is introduced using a nonlinear robust filtering method. Based on a global linearization scheme, a convenient method is introduced to measure noise attenuation or amplification for each gene of the nonlinear stochastic regulatory network by solving a set of filtering problems, which correspond to a set of linearized stochastic regulatory networks. Finally, using the proposed methods, several simulation examples of genetic regulatory networks are given to measure their robust stability under intrinsic fluctuations and to estimate the genes' attenuation and amplification levels under extrinsic noise. Conclusion: In this study, a stochastic nonlinear dynamic model is developed for genetic regulatory networks under intrinsic fluctuation and extrinsic noise. With the proposed method, we could determine the robust stability under intrinsic fluctuations and identify the genes that are
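To make the notion of intrinsic fluctuation concrete, the sketch below simulates the simplest birth-death model of gene expression with Gillespie's stochastic simulation algorithm. This is a swapped-in illustration, not the authors' stochastic-differential-equation and nonlinear-filtering framework. It assumes a constant production rate k and first-order degradation rate gamma; for this model the stationary copy number is Poisson with mean k/gamma, so the Fano factor (variance/mean) should come out close to 1.

```python
import random


def gillespie_birth_death(k, gamma, t_end, x0=0, seed=0):
    """Exact SSA trajectory of 0 -> X (rate k) and X -> 0 (rate gamma * x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        a_birth = k
        a_death = gamma * x
        a_total = a_birth + a_death
        t += rng.expovariate(a_total)  # exponential waiting time to next reaction
        if t > t_end:
            break
        # choose which reaction fires, proportionally to its propensity
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states


def time_averaged_stats(times, states, t_end, burn_in=0.0):
    """Time-weighted mean and variance of a piecewise-constant trajectory."""
    total = s1 = s2 = 0.0
    for i, x in enumerate(states):
        start = max(times[i], burn_in)
        stop = times[i + 1] if i + 1 < len(times) else t_end
        if stop <= start:
            continue
        dt = stop - start
        total += dt
        s1 += x * dt
        s2 += x * x * dt
    mean = s1 / total
    return mean, s2 / total - mean ** 2


times, states = gillespie_birth_death(k=10.0, gamma=1.0, t_end=2000.0)
mean, var = time_averaged_stats(times, states, t_end=2000.0, burn_in=50.0)
print(f"mean={mean:.2f} (k/gamma=10), Fano={var / mean:.2f}")
```

Fluctuations of this kind grow in relative size as k/gamma shrinks (the coefficient of variation is 1/sqrt(k/gamma) for a Poisson distribution), which is consistent with the abstract's remark that intrinsic noise matters most at low concentrations.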
Gaiti, Federico; Calcino, Andrew D; Tanurdžić, Miloš; Degnan, Bernard M
Animals rely on genomic regulatory systems to direct the dynamic spatiotemporal and cell-type specific gene expression that is essential for the development and maintenance of a multicellular lifestyle. Although it is widely appreciated that these systems ultimately evolved from genomic regulatory mechanisms present in single-celled stem metazoans, it remains unclear how this occurred. Here, we focus on the contribution of the non-coding portion of the genome to the evolution of animal gene regulation, specifically on recent insights from non-bilaterian metazoan lineages, and unicellular and colonial holozoan sister taxa. High-throughput next-generation sequencing, largely in bilaterian model species, has led to the discovery of tens of thousands of non-coding RNA genes (ncRNAs), including short, long and circular forms, and uncovered the central roles they play in development. Based on the analysis of non-bilaterian metazoan, unicellular holozoan and fungal genomes, the evolution of some ncRNAs, such as Piwi-interacting RNAs, correlates with the emergence of metazoan multicellularity, while others, including microRNAs, long non-coding RNAs and circular RNAs, appear to be more ancient. Analysis of non-coding regulatory DNA and histone post-translational modifications has revealed that some cis-regulatory mechanisms, such as those associated with proximal promoters, are present in non-animal holozoans, while others appear to be metazoan innovations, most notably distal enhancers. In contrast, the cohesin-CTCF system for regulating higher-order chromatin structure and enhancer-promoter long-range interactions appears to be restricted to bilaterians. Taken together, most bilaterian non-coding regulatory mechanisms appear to have originated before the divergence of crown metazoans. However, differential expansion of non-coding RNA and cis-regulatory DNA repertoires in bilaterians may account for their increased regulatory and morphological complexity relative to non
Yue, Dandan; Guan, Zhi-Hong; Li, Tao; Liao, Rui-Quan; Liu, Feng; Lai, Qiang
In this paper, the cluster synchronization of coupled genetic regulatory networks with a directed topology is studied by using the event-based strategy and pinning control. An event-triggered condition with a threshold consisting of the neighbors' discrete states at their own event time instants and a state-independent exponential decay function is proposed. The intra-cluster states information and extra-cluster states information are involved in the threshold in different ways. By using the Lyapunov function approach and the theories of matrices and inequalities, we establish the cluster synchronization criterion. It is shown that both the avoidance of continuous transmission of information and the exclusion of the Zeno behavior are ensured under the presented triggering condition. Explicit conditions on the parameters in the threshold are obtained for synchronization. The stability criterion of a single GRN is also given under the reduced triggering condition. Numerical examples are provided to validate the theoretical results.
Ahsen, Mehmet Eren; Niculescu, Silviu-Iulian
This brief examines a deterministic, ODE-based model for gene regulatory networks (GRN) that incorporates nonlinearities and time-delayed feedback. An introductory chapter provides some insights into molecular biology and GRNs. The mathematical tools necessary for studying the GRN model are then reviewed, in particular Hill functions and Schwarzian derivatives. One chapter is devoted to the analysis of GRNs under negative feedback with time delays, and a special case of a homogeneous GRN is considered. Asymptotic stability analysis of GRNs under positive feedback is then considered in a separate chapter, in which conditions leading to bi-stability are derived. Graduate and advanced undergraduate students and researchers in control engineering, applied mathematics, systems biology and synthetic biology will find this brief to be a clear and concise introduction to the modeling and analysis of GRNs.
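A sketch of the kind of model the brief studies: a single self-repressing gene whose mRNA production is a decreasing Hill function of the protein level tau time units earlier. The two-variable form, the parameter names and the forward-Euler integration with constant history are assumptions made for illustration, not the brief's exact equations or analysis.

```python
def hill_repression(x, K, n):
    """Decreasing Hill function: 1 at x = 0, 1/2 at x = K, steeper for larger n."""
    return 1.0 / (1.0 + (x / K) ** n)


def simulate_delayed_repressor(alpha, beta_m, gamma, beta_p, K, n, tau,
                               dt=0.01, t_end=50.0, m0=0.0, p0=0.0):
    """Forward-Euler integration of
        dm/dt = alpha * h(p(t - tau)) - beta_m * m
        dp/dt = gamma * m - beta_p * p
    with constant history p(t) = p0 for t <= 0."""
    steps = int(round(t_end / dt))
    lag = int(round(tau / dt))  # delay expressed in integration steps
    m, p = [m0], [p0]
    for i in range(steps):
        p_delayed = p[i - lag] if i >= lag else p0
        dm = alpha * hill_repression(p_delayed, K, n) - beta_m * m[i]
        dp = gamma * m[i] - beta_p * p[i]
        m.append(m[i] + dt * dm)
        p.append(p[i] + dt * dp)
    return m, p


# Undelayed, n = 1, unit parameters: the steady state solves p = 1 / (1 + p)
m, p = simulate_delayed_repressor(1.0, 1.0, 1.0, 1.0, K=1.0, n=1, tau=0.0)
print(round(p[-1], 4))
```

With unit parameters, tau = 0 and n = 1, the equilibrium satisfies p = 1/(1 + p), i.e. p = (sqrt(5) - 1)/2 ≈ 0.618, which the trajectory approaches. Steeper Hill functions (larger n) combined with longer delays are the regime in which delayed negative feedback can destabilize the equilibrium; the brief derives the precise conditions.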
Taylor-Teeples, M; Lin, L; de Lucas, M; Turco, G; Toal, T W; Gaudinier, A; Young, N F; Trabucco, G M; Veling, M T; Lamothe, R; Handakumbura, P P; Xiong, G; Wang, C; Corwin, J; Tsoukalas, A; Zhang, L; Ware, D; Pauly, M; Kliebenstein, D J; Dehesh, K; Tagkopoulos, I; Breton, G; Pruneda-Paz, J L; Ahnert, S E; Kay, S A; Hazen, S P; Brady, S M
The plant cell wall is an important factor for determining cell shape, function and response to the environment. Secondary cell walls, such as those found in xylem, are composed of cellulose, hemicelluloses and lignin and account for the bulk of plant biomass. The coordination between transcriptional regulation of synthesis for each polymer is complex and vital to cell function. A regulatory hierarchy of developmental switches has been proposed, although the full complement of regulators remains unknown. Here we present a protein-DNA network between Arabidopsis thaliana transcription factors and secondary cell wall metabolic genes with gene expression regulated by a series of feed-forward loops. This model allowed us to develop and validate new hypotheses about secondary wall gene regulation under abiotic stress. Distinct stresses are able to perturb targeted genes to potentially promote functional adaptation. These interactions will serve as a foundation for understanding the regulation of a complex, integral plant component.
Boja, Emily S; Rodriguez, Henry
Better biomarkers are urgently needed for cancer detection, diagnosis, and prognosis. While the genomics community is making significant advances in understanding the molecular basis of disease, proteomics will delineate the functional units of a cell, proteins, and their intricate interaction networks and signaling pathways underlying the disease. Great progress has been made in characterizing thousands of proteins qualitatively and quantitatively in complex biological systems by utilizing multi-dimensional sample fractionation strategies, mass spectrometry and protein microarrays. Comparative/quantitative analysis of the human cancer proteome landscape in high-quality clinical biospecimens (e.g., tissue and biofluids) has the potential to reveal protein/peptide biomarkers responsible for this disease by means of their altered levels of expression, post-translational modifications, and different forms of protein variants. Despite technological advances in proteomics, major hurdles still exist in every step of the biomarker development pipeline. The National Cancer Institute's Clinical Proteomic Technologies for Cancer initiative (NCI-CPTC) has taken a critical step to close the gap between biomarker discovery and qualification by introducing a pre-clinical "verification" stage in the pipeline, partnering with clinical laboratory organizations to develop and implement common standards, and developing regulatory science documents with the US Food and Drug Administration to educate the proteomics community on analytical evaluation requirements for multiplex assays in order to ensure the safety and effectiveness of these tests for their intended use.
Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction, and is an important research field in systems biology. Traditional Bayesian networks have high computational complexity, and their network structure scoring model relies on a single feature. Information-based approaches cannot identify the direction of regulation. To make up for the shortcomings of these methods, this paper presents a novel hybrid learning method (DBNCS) based on the dynamic Bayesian network (DBN) that, for the first time, constructs multiple time-delayed GRNs by combining a comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, that is, to construct the search space. Redundant regulations are then removed using a recursive optimization (RO) algorithm, thereby reducing the false positive rate. Next, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on benchmark GRN datasets from the DREAM challenge as well as on the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and its outstanding performance in GRN inference.
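The information-based pruning stage can be sketched with a plug-in estimator of conditional mutual information on discretized expression profiles. This uses ordinary CMI as a simpler stand-in for the CMI2 measure in CMI2NI, and the threshold rule `keep_edge` is a hypothetical example. It also shows why such measures alone cannot orient edges: mutual information is symmetric in its two arguments.

```python
import math
from collections import Counter


def entropy(*cols):
    """Plug-in joint entropy (nats) of one or more equal-length discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y); symmetric in x and y
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def keep_edge(x, y, z, threshold=0.05):
    """Hypothetical rule: drop the x-y edge if z fully explains their dependence."""
    return conditional_mutual_information(x, y, z) > threshold


# Chain X -> Z -> Y with discretized (0/1) expression levels
x = [0, 1, 0, 1, 1, 0, 0, 1]
z = list(x)
y = list(z)
print(mutual_information(x, y))
print(keep_edge(x, y, z))
```

In this toy chain X and Y are strongly dependent (their mutual information equals H(X) = ln 2), yet conditioning on Z removes the dependence entirely, so the indirect X-Y edge is pruned. Deciding which way the remaining edges point is the job the DBN stage takes over.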
Su, Yi-Hsien; Li, Enhu; Geiss, Gary K.; Longabaugh, William J. R.; Krämer, Alexander; Davidson, Eric H.
The current gene regulatory network (GRN) for the sea urchin embryo pertains to pregastrular specification functions in the endomesodermal territories. Here we extend gene regulatory network analysis to the adjacent oral and aboral ectoderm territories over the same period. A large fraction of the regulatory genes predicted by the sea urchin genome project and shown in ancillary studies to be expressed in either oral or aboral ectoderm by 24 h are included, though universally expressed and pan-ectodermal regulatory genes are in general not. The loci of expression of these genes have been determined by whole mount in situ hybridization. We have carried out a global perturbation analysis in which expression of each gene was interrupted by introduction of morpholino antisense oligonucleotide, and the effects on all other genes were measured quantitatively, both by QPCR and by a new instrumental technology (NanoString Technologies nCounter Analysis System). At its current stage the network model, built in BioTapestry, includes 22 genes encoding transcription factors, 4 genes encoding known signaling ligands, and 3 genes that are yet unknown but are predicted to perform specific roles. Evidence emerged from the analysis pointing to distinctive subcircuit features observed earlier in other parts of the GRN, including a double negative transcriptional regulatory gate, and dynamic state lockdowns by feedback interactions. While much of the regulatory apparatus is downstream of Nodal signaling, as expected from previous observations, there are also cohorts of independently activated oral and aboral ectoderm regulatory genes, and we predict yet unidentified signaling interactions between oral and aboral territories. PMID:19268450
Festuccia, Nicola; Dubois, Agnès; Vandormael-Pournin, Sandrine; Gallego Tejeda, Elena; Mouren, Adrien; Bessonnard, Sylvain; Mueller, Florian; Proux, Caroline; Cohen-Tannoudji, Michel; Navarro, Pablo
Pluripotent mouse embryonic stem cells maintain their identity throughout virtually infinite cell divisions. This phenomenon, referred to as self-renewal, depends on a network of sequence-specific transcription factors (TFs) and requires daughter cells to accurately reproduce the gene expression pattern of the mother. However, dramatic chromosomal changes take place in mitosis, generally leading to the eviction of TFs from chromatin. Here, we report that Esrrb, a major pluripotency TF, remains bound to key regulatory regions during mitosis. We show that mitotic Esrrb binding is highly dynamic, driven by specific recognition of its DNA-binding motif and is associated with early transcriptional activation of target genes after completion of mitosis. These results indicate that Esrrb may act as a mitotic bookmarking factor, opening another perspective to molecularly understand the role of sequence-specific TFs in the epigenetic control of self-renewal, pluripotency and genome reprogramming.
Stranger, Barbara E; Dermitzakis, Emmanouil T
The regulation of gene expression plays an important role in complex phenotypes, including disease in humans. For some genes, the genetic mechanisms influencing gene expression are well elucidated; however, it is unclear how applicable these results are to gene expression on a genome-wide level. Studies in model organisms and humans have clearly documented gene expression variation among individuals and shown that a significant proportion of this variation has a genetic basis. Recent...
Maeso, Ignacio; Irimia, Manuel; Tena, Juan J; González-Pérez, Esther; Tran, David; Ravi, Vydianathan; Venkatesh, Byrappa; Campuzano, Sonsoles; Gómez-Skarmeta, José Luis; Garcia-Fernàndez, Jordi
Developmental genes are regulated by complex, distantly located cis-regulatory modules (CRMs), often forming genomic regulatory blocks (GRBs) that are conserved among vertebrates and among insects. We have investigated GRBs associated with Iroquois homeobox genes in 39 metazoans. Despite 600 million years of independent evolution, Iroquois genes are linked to ankyrin-repeat-containing Sowah genes in nearly all studied bilaterians. We show that Iroquois-specific CRMs populate the Sowah locus, suggesting that regulatory constraints underlie the maintenance of the Iroquois-Sowah syntenic block. Surprisingly, tetrapod Sowah orthologs are intronless and not associated with Iroquois; however, teleost and elephant shark data demonstrate that this is a derived feature, and that many Iroquois-CRMs were ancestrally located within Sowah introns. Retroposition, gene, and genome duplication have allowed selective elimination of Sowah exons from the Iroquois regulatory landscape while keeping associated CRMs, resulting in large associated gene deserts. These results highlight the importance of CRMs in imposing constraints to genome architecture, even across large phylogenetic distances, and of gene duplication-mediated genetic redundancy to disentangle these constraints, increasing genomic plasticity.
Yu, Chunxiao; McClure, Ryan; Nudel, Kathleen; Daou, Nadine; Genco, Caroline Attardo
The Neisseria gonorrhoeae ferric uptake regulator (Fur) protein controls expression of iron homeostasis genes in response to intracellular iron levels. In this study, using transcriptome sequencing (RNA-seq) analysis of an N. gonorrhoeae fur strain, we defined the gonococcal Fur and iron regulons and characterized Fur-controlled expression of an ArsR-like DNA binding protein. We observed that 158 genes (8% of the genome) showed differential expression in response to iron in an N. gonorrhoeae wild-type or fur strain, while 54 genes exhibited differential expression in response to Fur. The Fur regulon was extended to additional regulators, including NrrF and 13 other small RNAs (sRNAs), and two transcriptional factors. One transcriptional factor, coding for an ArsR-like regulator (ArsR), exhibited increased expression under iron-replete conditions in the wild-type strain but showed decreased expression across iron conditions in the fur strain, an effect that was reversed in a fur-complemented strain. Fur was shown to bind to the promoter region of the arsR gene downstream of a predicted σ(70) promoter region. Electrophoretic mobility shift assay (EMSA) analysis confirmed binding of the ArsR protein to the norB promoter region, and sequence analysis identified two additional putative targets, NGO1411 and NGO1646. A gonococcal arsR strain demonstrated decreased survival in human endocervical epithelial cells compared to that of the wild-type and arsR-complemented strains, suggesting that the ArsR regulon includes genes required for survival in host cells. Collectively, these results demonstrate that the N. gonorrhoeae Fur functions as a global regulatory protein to repress or activate expression of a large repertoire of genes, including additional transcriptional regulatory proteins. Gene regulation in bacteria in response to environmental stimuli, including iron, is of paramount importance to both bacterial replication and, in the case of pathogenic bacteria
Araki, Motoko; Ishii, Tetsuya
Genome editing technology, including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas, has enabled far more efficient genetic engineering, even in non-human primates. This biotechnology is more likely to develop into medicine for preventing genetic disease if corrective genome editing is integrated into assisted reproductive technology, such as in vitro fertilization. Although rapid advances in genome editing are expected to make germline gene correction feasible in a clinical setting, many issues still need to be addressed before this could occur. Here, we examine the current status of genome editing in mammalian embryonic stem cells and zygotes and discuss potential issues in the international regulatory landscape regarding human germline gene modification. Moreover, we address some of the ethical and social issues that would arise when each country considers whether genome editing-mediated germline gene correction for preventive medicine should be permitted.
Drulhe, Samuel; Ferrari-Trecate, Giancarlo; De Jong, Hidde; Viari, Alain
http://dx.doi.org/10.1007/11730637_16
Recent advances in experimental biology techniques have led to the production of enormous amounts of data on the dynamics of genetic regulatory networks. In this paper, we present an approach for the identification of PieceWise-Affine (PWA) models of genetic regulatory networks from experimental data, focusing on the reconstruction of switching thresholds associated with regulatory interactions. In particular, our method takes into account geometric c...
Liang, Jinling; Lam, James
In this article, the state estimation problem is investigated for genetic regulatory networks (GRNs) with parameter uncertainties and stochastic disturbances. To account for the unavoidable modelling errors and parameter fluctuations, the network parameters are assumed to be time-varying but norm-bounded. Furthermore, scalar multiplicative white noises are introduced into both the translation process and the feedback regulation process in order to reflect the inherent intracellular and extracellular noise perturbations. The purpose of the addressed problem is to design a linear state estimator that can estimate the true concentration of the mRNA and the protein of the uncertain GRNs. By resorting to the Lyapunov-Krasovskii functional method combined with the linear matrix inequality (LMI) technique, sufficient conditions are first established for ensuring the stochastic stability of the dynamics of the estimation error, and the estimator gains are then designed in terms of the solutions to some LMIs that can be easily solved by using the standard numerical software. A three-node GRN is presented to show the effectiveness of the proposed design procedures.
Microarray technologies have been the basis of numerous important findings regarding gene expression over the last few decades. Studies have generated large amounts of data describing various processes, which, owing to public databases, are widely available for further analysis. Given their lower cost and greater maturity compared with newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight into features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, and known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, and we indicate which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.
Xing, Linlin; Guo, Maozu; Liu, Xiaoyan; Wang, Chunyu; Wang, Lei; Zhang, Yin
The reconstruction of gene regulatory networks (GRNs) from gene expression data can reveal regulatory relationships among genes and provide deep insights into the complicated regulatory mechanisms of life. However, it remains a great challenge in systems biology and bioinformatics. In recent years, numerous computational approaches have been developed for this goal, and Bayesian network (BN) methods have drawn the most attention among them because of their inherent probabilistic nature. However, Bayesian network methods are time-consuming and cannot handle large-scale networks due to their high computational complexity, while mutual information-based methods are highly effective but undirected and have a high false-positive rate. To solve these problems, we propose a Candidate Auto Selection algorithm (CAS), based on mutual information and breakpoint detection, that restricts the search space in order to accelerate Bayesian network learning. First, the proposed CAS algorithm automatically selects the neighbor candidates of each node before searching for the best GRN structure. Then, based on the CAS algorithm, we propose a globally optimal greedy search method (CAS + G), which focuses on finding the highest-scoring network structure, and a local learning method (CAS + L), which focuses on learning the structure faster with little loss of quality. Results show that the proposed CAS algorithm can effectively reduce the search space of Bayesian networks by identifying the neighbor candidates of each node. In our experiments, the CAS + G method outperforms the state-of-the-art method on simulated data for inferring GRNs, and the CAS + L method is significantly faster than the state-of-the-art method with little loss of accuracy. Hence, the CAS-based methods effectively decrease the computational complexity of Bayesian network learning and are more suitable for GRN inference.
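The candidate-selection step can be illustrated with a minimal sketch. It assumes discretized expression profiles and uses a simple "largest gap" rule as the breakpoint detector; the function names (`mutual_information`, `select_candidates`) are hypothetical, and this is not the authors' CAS implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in nats) between two equal-length discrete profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def select_candidates(target: str, profiles: Dict[str, Sequence[int]]) -> List[str]:
    """Rank other genes by MI with `target`, then cut the ranking at the largest
    drop between consecutive scores (an assumed, simple breakpoint rule)."""
    scores = sorted(
        ((mutual_information(profiles[target], prof), gene)
         for gene, prof in profiles.items() if gene != target),
        reverse=True,
    )
    if len(scores) < 2:
        return [gene for _, gene in scores]
    gaps = [scores[i][0] - scores[i + 1][0] for i in range(len(scores) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [gene for _, gene in scores[:cut]]


# Hypothetical binarized profiles (1 = high expression) over six conditions.
profiles = {
    "A": [0, 0, 1, 1, 0, 1],
    "B": [0, 0, 1, 1, 0, 1],  # identical to A
    "C": [1, 1, 0, 0, 1, 0],  # exact complement of A
    "D": [0, 1, 0, 1, 0, 1],  # weakly related to A
}
candidates = select_candidates("A", profiles)
```

Only the genes retained here would then be considered as potential parents of `A` in the structure search, which is what shrinks the Bayesian network search space.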
Parker, Brian John; Moltke, Ida; Roth, Adam
a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein...... identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one...... involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we...
Biological signaling processes may be mediated by complex networks in which network components and network sectors interact with each other in complex ways. Studies of complex networks benefit from approaches in which the roles of individual components are considered in the context of the network. The plant immune signaling network, which controls inducible responses to pathogen attack, is such a complex network. We studied the Arabidopsis immune signaling network upon challenge with a strain of the bacterial pathogen Pseudomonas syringae expressing the effector protein AvrRpt2 (Pto DC3000 AvrRpt2). This bacterial strain feeds multiple inputs into the signaling network, allowing many parts of the network to be activated at once. mRNA profiles for 571 immune response genes of 22 Arabidopsis immunity mutants and wild type were collected 6 hours after inoculation with Pto DC3000 AvrRpt2. The mRNA profiles were analyzed as detailed descriptions of changes in the network state resulting from the genetic perturbations. Regulatory relationships among the genes corresponding to the mutations were inferred by recursively applying a non-linear dimensionality reduction procedure to the mRNA profile data. The resulting static network model accurately predicted 23 of 25 regulatory relationships reported in the literature, suggesting that predictions of novel regulatory relationships are also accurate. The network model revealed two striking features: (i) the components of the network are highly interconnected; and (ii) negative regulatory relationships are common between signaling sectors. Complex regulatory relationships, including a novel negative regulatory relationship between the early microbe-associated molecular pattern-triggered signaling sectors and the salicylic acid sector, were further validated. We propose that prevalent negative regulatory relationships among the signaling sectors make the plant immune signaling network a "sector
Mammalian host response to pathogenic infections is controlled by a complex regulatory network connecting regulatory proteins such as transcription factors and signaling proteins to target genes. An important challenge in infectious disease research is to understand molecular similarities and differences in mammalian host response to diverse sets of pathogens. Recently, systems biology studies have produced rich collections of omic profiles measuring host response to infectious agents such as influenza viruses at multiple levels. To gain a comprehensive understanding of the regulatory network driving host response to multiple infectious agents, we integrated host transcriptomes and proteomes using a network-based approach. Our approach combines expression-based regulatory network inference, structured-sparsity based regression, and network information flow to infer putative physical regulatory programs for expression modules. We applied our approach to identify regulatory networks, modules and subnetworks that drive host response to multiple influenza infections. The inferred regulatory network and modules are significantly enriched for known pathways of immune response and implicate apoptosis, splicing, and interferon signaling processes in the differential response to viral infections of different pathogenicities. We used the learned network to prioritize regulators and study virus- and time-point-specific networks. RNAi-based knockdown of predicted regulators, which include several previously unknown regulators, had a significant impact on viral replication. Taken together, our integrated analysis identified novel module-level patterns that capture strain- and pathogenicity-specific patterns of expression and helped identify important regulators of host response to influenza infection.
Background: Gene regulation and metabolic reactions are two primary activities of life. Although many studies have been dedicated to each system, the coupling between them is less well understood. To bridge this gap, we propose a joint model of gene regulation and metabolic reactions. Results: We integrate regulatory and metabolic networks by adding links specifying the feedback control from the substrates of metabolic reactions to enzyme gene expression. We adopt two alternative approaches to build those links: inferring the links between metabolites and transcription factors to fit the data, or explicitly encoding the general hypotheses of feedback control as links between metabolites and enzyme expression. A perturbation dataset is explained by paths in the joint network if the predicted response along the paths is consistent with the observed response. The consistency requirement for explaining the perturbation data imposes constraints on attributes of the network such as the functions of links and the activities of paths. We build a probabilistic graphical model over the attributes to specify these constraints, and apply an inference algorithm to identify the attribute values that optimally explain the data. The inferred models allow us to (1) identify the feedback links between metabolites and regulators and their functions, (2) identify the active paths responsible for relaying perturbation effects, (3) computationally test the general hypotheses pertaining to the feedback control of enzyme expression, and (4) evaluate the advantage of an integrated model over separate systems. Conclusion: The modeling results provide insight into the mechanisms of the coupling between the two systems and possible "design rules" pertaining to enzyme gene regulation. The model can be used to investigate less well-probed systems and to generate consistent hypotheses and predictions for further validation.
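The path-consistency criterion can be illustrated with a deterministic check. Assuming each link carries a sign (+1 activation, -1 repression), the predicted effect along a path is the product of its link signs, and the path explains a perturbation if that sign matches the observed response. This is a simplified sketch: the study encodes such constraints in a probabilistic graphical model, and the graph, node names, and `explaining_paths` helper below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical signed graph: graph[u] = [(v, sign), ...]; +1 = activation, -1 = repression.
Graph = Dict[str, List[Tuple[str, int]]]


def explaining_paths(graph: Graph, source: str, target: str,
                     observed_sign: int, max_edges: int = 4) -> List[List[str]]:
    """Return simple source->target paths whose sign product equals observed_sign."""
    results: List[List[str]] = []

    def dfs(node: str, path: List[str], sign: int) -> None:
        if node == target:
            # The predicted effect of a path is the product of its edge signs.
            if sign == observed_sign:
                results.append(list(path))
            return
        if len(path) - 1 >= max_edges:
            return
        for nxt, edge_sign in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no revisited nodes)
                path.append(nxt)
                dfs(nxt, path, sign * edge_sign)
                path.pop()

    dfs(source, [source], 1)
    return results


# Hypothetical example: metabolite M represses TF T, T activates enzyme gene E,
# and M also activates E directly. E is observed to decrease when M rises.
graph = {"M": [("T", -1), ("E", +1)], "T": [("E", +1)]}
paths = explaining_paths(graph, "M", "E", observed_sign=-1)
```

Here only the indirect route through the transcription factor is consistent with the observed decrease, so it would be the "active path" for that perturbation.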
Borodina, Irina; Nielsen, Jens
Genome-scale metabolic models are the focal point of systems biology as they allow the collection of various data types in a form suitable for mathematical analysis. High-quality metabolic networks and metabolic networks with incorporated regulation have been successfully used for the analysis of...
Background: Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agricultural losses. There is growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole-genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results: Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream, and intronic regions of Fg genes, based on multiple alignments of the sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in the promoter regions of Fg genes associated with a specific functional category. Through comparison with Saccharomyces cerevisiae (Sc) and Schizosaccharomyces pombe (Sp), we observed conservation of transcription factors (TFs), their binding sites, and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc polyadenylation signals among the list of conserved motifs. Conclusion: This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights in their
Creixell, Pau; Reimand, Jueri; Haider, Syed
Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been ...
Dong, Zhanshan; Danilevskaya, Olga; Abadie, Tabare; Messina, Carlos; Coles, Nathan; Cooper, Mark
The transition from vegetative to reproductive development is a critical event in the plant life cycle. The accurate prediction of flowering time in elite germplasm is important for decisions in maize breeding programs and best agronomic practices. The understanding of the genetic control of flowering time in maize has significantly advanced in the past decade. Through comparative genomics, mutant analysis, genetic analysis and QTL cloning, and transgenic approaches, more than 30 flowering time candidate genes in maize have been revealed and the relationships among these genes have been partially uncovered. Based on the knowledge of the flowering time candidate genes, a conceptual gene regulatory network model for the genetic control of flowering time in maize is proposed. To demonstrate the potential of the proposed gene regulatory network model, a first attempt was made to develop a dynamic gene network model to predict flowering time of maize genotypes varying for specific genes. The dynamic gene network model is composed of four genes and was built on the basis of gene expression dynamics of the two late-flowering id1 and dlf1 mutants, the early-flowering landrace Gaspe Flint, and the temperate inbred B73. The model was evaluated against the phenotypic data of the id1 dlf1 double mutant and the ZMM4-overexpressing transgenic lines. The model provides a working example that leverages knowledge from model organisms for the utilization of maize genomic information to predict a whole-plant trait phenotype, flowering time, of maize genotypes.
Thiagarajan, Raghuram; Alavi, Amir; Podichetty, Jagdeep T; Bazil, Jason N; Beard, Daniel A
Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. For example, both inverse identification and forward simulations of genome-scale gene regulatory network models pose compute-intensive problems. This issue is addressed here by combining the processing power of Graphics Processing Units (GPUs) and a parallel reverse engineering algorithm for inference of regulatory networks. It is shown that, given an appropriate data set, information on genome-scale networks (systems of 1000 or more state variables) can be inferred using a reverse-engineering algorithm in a matter of days on a small-scale modern GPU cluster.
Imrichová, Hana; Van de Sande, Bram; Standaert, Laura; Christiaens, Valerie; Hulselmans, Gert; Herten, Koen; Naval Sanchez, Marina; Potier, Delphine; Svetlichnyy, Dmitry; Kalender Atak, Zeynep; Fiers, Mark; Marine, Jean-Christophe; Aerts, Stein
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and identify an extensive up-regulated network controlled directly by p53. Similarly, we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org. PMID:25058159
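A simplified version of the ranking-and-recovery idea: for each motif, genes are ranked genome-wide by motif score, a recovery curve counts how many input genes appear among the top-ranked positions, and the area under that curve up to a cutoff `top_k` is compared across motifs. The cutoff, the unnormalized area, and the z-score normalization across motifs are assumptions for illustration; iRegulon's actual scoring details may differ, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def recovery_auc(ranking: List[str], gene_set: Set[str], top_k: int) -> int:
    """Unnormalized area under the recovery curve over the top_k ranked genes.

    The recovery curve at rank r is the number of gene-set members in ranking[:r].
    """
    recovered = 0
    area = 0
    for gene in ranking[:top_k]:
        if gene in gene_set:
            recovered += 1
        area += recovered
    return area


def normalized_enrichment(rankings: Dict[str, List[str]], gene_set: Set[str],
                          top_k: int) -> Dict[str, float]:
    """z-score of each motif's AUC relative to all motifs (assumed normalization)."""
    aucs = {motif: recovery_auc(r, gene_set, top_k) for motif, r in rankings.items()}
    values = list(aucs.values())
    mu, sd = mean(values), pstdev(values)
    return {motif: (a - mu) / sd if sd > 0 else 0.0 for motif, a in aucs.items()}


# Hypothetical toy data: three motifs, each ranking the same six genes.
genes_of_interest = {"g1", "g2"}
rankings = {
    "motifA": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "motifB": ["g3", "g4", "g1", "g2", "g5", "g6"],
    "motifC": ["g5", "g6", "g3", "g4", "g1", "g2"],
}
scores = normalized_enrichment(rankings, genes_of_interest, top_k=3)
```

A motif that ranks the co-expressed genes near the top (here `motifA`) accumulates area early and stands out from the other motifs; the recovered genes above the cutoff would be its candidate direct targets.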
Zhao, Li-Li; Zhang, Tong; Liu, Bing-Rong; Liu, Tie-Fu; Tao, Na; Zhuang, Li-Wei
Transcription factors (TFs) and microRNAs (miRNAs) have been found to play crucial roles in cancer development. However, the effects of TFs and miRNAs in pancreatic cancer pathogenesis remain unclear. We attempted to reveal possible mechanisms of pancreatic cancer at the transcriptional level. Using the GSE16515 dataset downloaded from the Gene Expression Omnibus database, we first identified the differentially expressed genes (DEGs) in pancreatic cancer with the limma package in R. The DEGs were then mapped into DAVID to conduct Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. TFs and miRNAs whose targets were significantly enriched among the DEGs were identified by Fisher's exact test, and a pancreatic cancer double-factor regulatory network was then constructed. In total, 1117 DEGs were identified, and they were significantly enriched in 4 KEGG pathways. A double-factor regulatory network was established, including 29 DEGs, 24 TFs, and 25 miRNAs. In the network, LAMC2, BRIP1, and miR-155 were identified as possibly involved in pancreatic cancer development. In conclusion, the double-factor regulatory network was found to play an important role in pancreatic cancer progression, and our results shed new light on the molecular mechanism of pancreatic cancer.
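Enrichment of a regulator's targets among the DEGs is commonly assessed with a one-sided Fisher's exact test, which for a 2×2 overlap table reduces to the upper tail of the hypergeometric distribution. A minimal sketch, assuming a fixed gene universe; the function name and all counts except the 1117 DEGs are hypothetical, not values from the study.

```python
from math import comb


def enrichment_p_value(n_universe: int, n_targets: int, n_degs: int, overlap: int) -> float:
    """One-sided Fisher's exact test for over-representation: the hypergeometric
    upper tail P(X >= overlap), where X is the number of a regulator's targets
    found among n_degs genes drawn without replacement from n_universe genes."""
    total = comb(n_universe, n_degs)
    upper = min(n_targets, n_degs)
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_degs - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: a universe of 20000 genes and a regulator with 300 targets,
# 40 of which fall among the 1117 DEGs.
p = enrichment_p_value(n_universe=20000, n_targets=300, n_degs=1117, overlap=40)
```

In practice the p-values for all tested TFs and miRNAs would also be corrected for multiple testing before selecting regulators for the network.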
To maintain a stable intracellular environment, cells use complex and specialized defense systems against a variety of external perturbations, such as electrophilic stress, heat shock, and hypoxia. Irrespective of the type of stress, many adaptive mechanisms contributing to cellular homeostasis appear to operate through gene regulatory networks organized into negative feedback loops. In general, the degree of deviation of the controlled variables, such as electrophiles, misfolded proteins, and O2, is first detected by specialized sensor molecules, and the signal is then transduced to specific transcription factors. Transcription factors can regulate the expression of a suite of anti-stress genes, many of which encode enzymes that counteract the perturbed variables. The objective of this study was to explore, using control theory and computational approaches, the theoretical basis underlying the steady-state dose-response relationship between cellular stressors and intracellular biochemical species (controlled variables, transcription factors, and gene products) in these gene regulatory networks. Our work indicated that the shape of dose-response curves (linear, superlinear, or sublinear) depends on changes in the specific values of local response coefficients (gains) distributed in the feedback loop. Multimerization of anti-stress enzymes and transcription factors into homodimers, homotrimers, or even higher-order multimers plays a significant role in maintaining robust homeostasis. Moreover, our simulation showed that dose-response curves for the controlled variables can transition sequentially through four distinct phases as stressor level increases: initial superlinear with lesser control, superlinear more highly controlled, linear uncontrolled, and sublinear catastrophic. Each phase relies on specific gain-changing events that come into play as stressor level increases. The low-dose region is intrinsically nonlinear
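The effect of a negative feedback loop on the local gain can be illustrated with a one-variable toy model. Assume a stressor S produces the controlled variable X, and X induces its own removal enzyme E through a Hill function, so that at steady state S = k_deg · E(X) · X. The local response coefficient is d ln X / d ln S: without feedback (emax = 0) it is exactly 1, and with feedback it falls below 1. The model form, parameter values, and function names are assumptions rather than the study's model, and this toy reproduces only a controlled regime, not the four phases described above.

```python
import math


def hill(x: float, k: float, n: float) -> float:
    """Hill activation function x^n / (k^n + x^n)."""
    return x**n / (k**n + x**n)


def steady_state(stressor: float, k_deg: float = 1.0, e0: float = 0.1,
                 emax: float = 10.0, K: float = 1.0, n: float = 2.0) -> float:
    """Solve stressor = k_deg * E(X) * X for X >= 0 by bisection,
    with E(X) = e0 + emax * hill(X, K, n) (an assumed induction law)."""
    if k_deg <= 0 or e0 + emax <= 0:
        raise ValueError("removal must grow without bound for a solution to exist")

    def removal(x: float) -> float:
        return k_deg * (e0 + emax * hill(x, K, n)) * x  # increasing in x

    lo, hi = 0.0, 1.0
    while removal(hi) < stressor:  # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if removal(mid) < stressor:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_gain(stressor: float, eps: float = 1e-4, **params: float) -> float:
    """Local response coefficient d ln X / d ln S by central finite difference."""
    up = steady_state(stressor * (1 + eps), **params)
    down = steady_state(stressor * (1 - eps), **params)
    return (math.log(up) - math.log(down)) / (math.log(1 + eps) - math.log(1 - eps))


open_loop_gain = log_gain(1.0, emax=0.0)  # no induction of the removal enzyme
closed_loop_gain = log_gain(1.0)          # with feedback induction
```

A gain below 1 means X rises less than proportionally with the stressor, which is the sense in which the feedback loop "controls" the variable.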
Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence
Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge of a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLNs), which combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic Network, i.e., a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the proliferation/differentiation switch of keratinocyte cells, a set of experimental transcriptomic data, and various descriptions of genes, all encoded into first-order logic. As the training data are imbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging the predictions of the individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. The first test measures the average performance on a balanced edge prediction problem; the second deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, an MLN provided with only numerical discretized gene expression data does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a
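Asymmetric bagging can be sketched as follows: every bag keeps all of the rare positive examples (known regulations) and draws a bootstrap sample of negatives of the same size, one model is trained per bag, and the ensemble averages their scores. To keep the example self-contained, a nearest-centroid scorer stands in for the Markov Logic Network base learner; that substitution, the feature vectors, and the helper names are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def train_centroid(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Stand-in base learner (NOT an MLN): scores a point in [0, 1] by how much
    closer it lies to the positive centroid than to the negative centroid."""
    def centroid(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(pos), centroid(neg)

    def score(x: Vector) -> float:
        d_pos, d_neg = math.dist(x, c_pos), math.dist(x, c_neg)
        return d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5

    return score


def asymmetric_bagging(pos: List[Vector], neg: List[Vector],
                       n_bags: int = 10, seed: int = 0) -> Scorer:
    """One model per bag: all positives plus a bootstrap sample (with replacement)
    of negatives of the same size; the ensemble averages the bag scores."""
    rng = random.Random(seed)
    models = [train_centroid(pos, rng.choices(neg, k=len(pos))) for _ in range(n_bags)]
    return lambda x: sum(m(x) for m in models) / len(models)


# Hypothetical 2-D features for gene pairs: few regulations, many non-regulations.
positives = [[1.0, 1.0], [1.1, 0.9]]
negatives = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [-0.1, 0.1], [0.05, -0.05]]
predict = asymmetric_bagging(positives, negatives, n_bags=5)
```

Balancing each bag keeps the majority class from dominating any single model, while averaging over bags lets the ensemble still see most of the negative examples.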
Rowe, Heather C; Rieseberg, Loren H
Interspecific hybridization creates individuals harboring diverged genomes. The interaction of these genomes can generate successful evolutionary novelty or disadvantageous genomic conflict. The annual sunflowers Helianthus annuus and H. petiolaris have a rich history of hybridization in natural populations. Although first-generation hybrids generally have low fertility, hybrid swarms that include later-generation and fully fertile backcross plants have been identified, as well as at least three independently originated stable hybrid taxa. We examine patterns of transcript accumulation in the earliest stages of hybridization of these species via analyses of transcriptome sequences from laboratory-derived F1 offspring of an inbred H. annuus cultivar and a wild H. petiolaris accession. While nearly 14% of the reference transcriptome showed significant accumulation differences between parental accessions, total F1 transcript levels showed little evidence of dominance, as midparent transcript levels were highly predictive of transcript accumulation in F1 plants. Allelic bias in F1 transcript accumulation was detected in 20% of transcripts containing sufficient polymorphism to distinguish parental alleles; however, the magnitudes of these biases were generally smaller than differences among parental accessions. While analyses of allelic bias suggest that cis regulatory differences between H. annuus and H. petiolaris are common, their effect on transcript levels may be more subtle than that of trans-acting regulatory differences. Overall, these analyses found little evidence of regulatory incompatibility or dominance interactions between parental genomes within F1 hybrid individuals, although it is unclear whether this is a legacy or an enabler of introgression between species.
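Two quantities in this analysis are easy to make concrete. The midparent expectation is the mean of the two parental transcript levels, and allelic bias in an F1 can be tested per transcript by asking whether allele-specific read counts depart from a 50:50 split. The sketch below assumes an exact binomial test for the latter; the study's actual statistical model may differ, and the function names are hypothetical.

```python
from math import comb


def allelic_bias_p_value(allele_a_reads: int, allele_b_reads: int) -> float:
    """Two-sided exact binomial test of H0: each parental allele yields 50% of reads.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail probability, capped at 1.
    """
    n = allele_a_reads + allele_b_reads
    k = min(allele_a_reads, allele_b_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def midparent(parent_a_level: float, parent_b_level: float) -> float:
    """Midparent expectation for an F1 transcript under purely additive inheritance."""
    return (parent_a_level + parent_b_level) / 2
```

An F1 level close to `midparent(...)` indicates additivity (no dominance), while a small `allelic_bias_p_value(...)` for a transcript points to cis-regulatory divergence between the parental alleles.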
Carrera, Javier; Elena, Santiago F.; Jaramillo, Alfonso
Transcriptional profiling has been widely used as a tool for unveiling the coregulation of genes in response to genetic and environmental perturbations. These coregulations have been used, in a few instances, to infer global transcriptional regulatory models. Here, using the large amount of transcriptomic information available for the bacterium Escherichia coli, we seek to understand the design principles determining the regulation of its transcriptome. Combining transcriptomic and signaling data, we develop an evolutionary computational procedure that generates alternative genomic transcriptional regulatory networks (GTRNs) that still maintain their adaptability to dynamic environments. We apply our methodology to an E. coli GTRN and show that it could be rewired into simpler transcriptional regulatory structures. These rewired GTRNs still maintain the global physiological response to fluctuating environments. Rewired GTRNs contain 73% fewer regulated operons. Genes with similar functions and coordinated patterns of expression across environments are clustered into longer regulated operons. These synthetic GTRNs are more sensitive and show a more robust response to challenging environments. This result illustrates that the natural configuration of the E. coli GTRN does not necessarily result from selection for robustness to environmental perturbations, but that evolutionary contingencies may have been important as well. We also discuss the limitations of our methodology in the context of the demand theory. Our procedure will be useful as a novel way to analyze global transcription regulation networks and in synthetic biology for the de novo design of genomes. PMID:22927389
An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimentally validating the functional significance of human regulatory variants, limit our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants, e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput, single-copy, site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant-harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In discussing future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation.
Tomato fruit ripening is a complex developmental programme partly mediated by transcriptional regulatory networks. Several transcription factors (TFs), members of gene families such as MADS-box and ERF, have been shown to play a significant role in ripening through interconnections in an intricate network. The accumulation of large datasets of expression profiles corresponding to different stages of tomato fruit ripening, and the availability of bioinformatics tools for their analysis, provide an opportunity to identify TFs that might regulate gene clusters with similar co-expression patterns. By applying the LeMoNe algorithm to our microarray datasets representing four stages of fruit ripening (breaker, turning, pink, and red ripe), we identified two TFs, a SlWRKY22-like and a SlER24 transcriptional activator, as regulators of modules. The WRKY22-like module comprised a subgroup of six calcium-sensing transcripts whose expression patterns, validated by real-time PCR, were similar to that of the TF. A promoter motif search identified a cis-acting element, the W-box, recognized by WRKY TFs, that was present in the promoter region of all six calcium-sensing genes. Moreover, publicly available microarray datasets of similar ripening stages were also analyzed with LeMoNe, yielding TFs such as SlERF.E1, SlERF.C1, SlERF.B2, SLERF.A2, SlWRKY24, SLWRKY37 and MADS-box/TM29 that might also play an important role in the regulation of ripening. These results suggest that the SlWRKY22-like TF might be involved in the coordinated regulation of expression of the six calcium-sensing genes. In conclusion, the LeMoNe tool might lead to the identification of putative TF targets for further physiological analysis as regulators of tomato fruit ripening.
Kemmeren, Patrick; Sameith, Katrin; van de Pasch, Loes A L; Benschop, Joris J; Lenstra, Tineke L; Margaritis, Thanasis; O'Duibhir, Eoghan; Apweiler, Eva; van Wageningen, Sake; Ko, Cheuk W; van Heesch, Sebastiaan; Kashani, Mehdi M; Ampatziadis-Michailidis, Giannis; Brok, Mariel O; Brabers, Nathalie A C H; Miles, Anthony J; Bouwmeester, Diane; van Hooff, Sander R; van Bakel, Harm; Sluiters, Erik; Bakker, Linda V; Snel, Berend; Lijnzaad, Philip; van Leenen, Dik; Groot Koerkamp, Marian J A; Holstege, Frank C P
To understand regulatory systems, it would be useful to uniformly determine how different components contribute to the expression of all other genes. We therefore monitored mRNA expression genome-wide, for individual deletions of one-quarter of yeast genes, focusing on (putative) regulators. The resulting genetic perturbation signatures reflect many different properties. These include the architecture of protein complexes and pathways, identification of expression changes compatible with viability, and the varying responsiveness to genetic perturbation. The data are assembled into a genetic perturbation network that shows different connectivities for different classes of regulators. Four feed-forward loop (FFL) types are overrepresented, including incoherent type 2 FFLs that likely represent feedback. Systematic transcription factor classification shows a surprisingly high abundance of gene-specific repressors, suggesting that yeast chromatin is not as generally restrictive to transcription as is often assumed. The data set is useful for studying individual genes and for discovering properties of an entire regulatory system. Copyright © 2014 Elsevier Inc. All rights reserved.
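Feed-forward loops in a signed regulatory network can be enumerated and split into coherent and incoherent classes with a short sketch. It assumes edges are given as a hypothetical dictionary of (regulator, target) pairs signed +1 or -1, and it does not reproduce the finer type numbering (e.g. incoherent type 2) used in the study.

```python
def find_ffls(edges):
    """Enumerate feed-forward loops (X->Y, Y->Z and X->Z) in a signed digraph.

    edges maps (regulator, target) to +1 (activation) or -1 (repression).
    A loop is 'coherent' when the sign of the direct edge X->Z equals the
    product of the signs along the indirect path X->Y->Z, else 'incoherent'.
    """
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore autoregulation
            for z in targets.get(y, ()):
                if z in (x, y) or z not in x_targets:
                    continue  # need three distinct nodes and a direct X->Z edge
                indirect = edges[(x, y)] * edges[(y, z)]
                kind = "coherent" if edges[(x, z)] == indirect else "incoherent"
                loops.append((x, y, z, kind))
    return sorted(loops)
```

Counting the `kind` field over all loops, and comparing the counts with those from degree-preserving randomized networks, is one way to test whether particular FFL classes are overrepresented.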
Curto, Gloria G; Gard, Chris; Ribes, Vanessa
Over the past two decades, Pax proteins have received considerable attention from researchers working on the generation and assembly of neural circuits during vertebrate development. Through tissue- or cell-based phenotypic analyses, or more recently using genome-wide approaches, these studies have highlighted the pleiotropic functions of Pax proteins during neurogenesis. This review discusses the wide range of molecular and cellular mechanisms by which these transcription factors control, in time and space, the number and identity of neurons produced during development. We first focus on the position of Pax proteins within gene regulatory networks that generate patterns of cellular differentiation within the central nervous system. Next, the architecture of Pax-linked regulatory loops that provide a tempo of differentiation to progenitor cells is presented. Finally, we examine the molecular foundations providing a "multitasking" property to Pax proteins. Amongst the Pax factors that are expressed within the developing nervous system, Pax6 is the most extensively studied and thus holds a dominant position in this article. Copyright © 2015 Elsevier Ltd. All rights reserved.
Schaub, M. A.
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
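The idea of looking past the reported SNP to its linkage-disequilibrium proxies can be sketched as follows. This is an illustrative assumption, not the authors' scoring scheme: LD is measured as r² from phased 0/1 haplotypes, the 0.8 cut-off is arbitrary, and each SNP's functional evidence is a hypothetical set of annotation labels.

```python
def allele_freq(haplotypes, site):
    """Frequency of allele 1 at a site across phased 0/1 haplotypes."""
    return sum(h[site] for h in haplotypes) / len(haplotypes)


def r_squared(haplotypes, i, j):
    """LD r^2 between biallelic sites i and j from phased 0/1 haplotypes."""
    p_i = allele_freq(haplotypes, i)
    p_j = allele_freq(haplotypes, j)
    p_ij = sum(h[i] * h[j] for h in haplotypes) / len(haplotypes)
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else 0.0


def prioritize(haplotypes, lead, annotations, threshold=0.8):
    """Rank the lead SNP and its LD proxies (r^2 >= threshold) by evidence.

    annotations[k] is the set of functional-evidence labels overlapping SNP k;
    SNPs with more distinct evidence types are ranked first.
    """
    candidates = [k for k in range(len(annotations))
                  if k == lead or r_squared(haplotypes, lead, k) >= threshold]
    return sorted(candidates, key=lambda k: (-len(annotations[k]), k))
```

When a proxy SNP carries more types of functional support than the reported SNP, it is ranked first, mirroring the observation that the best-supported functional SNP is often not the reported one.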
In flowering plants, male gametophyte development occurs in the anther. The tapetum, the innermost of the four anther somatic layers, surrounds the developing reproductive cells and provides materials for pollen development. A DYT1-TDF1-AMS-MS188 genetic pathway regulating tapetum development has been established. Here we used laser microdissection and pressure catapulting to capture the Arabidopsis tapetum at two stages and analyze its transcriptome. By comprehensively analyzing these data together with microarray data from dyt1, tdf1, ams and ms188 mutants, we identified possible downstream genes for each transcription factor. In addition to activating the expression of the downstream transcription factors of the pathway, these transcription factors regulate many other biological processes. Briefly, DYT1 may also regulate early tapetum development via E3 ubiquitin ligases and many other transcription factors. TDF1 is likely involved in redox and cell degradation. AMS probably regulates lipid transfer proteins, which are involved in pollen wall formation, and other E3 ubiquitin ligases that degrade proteins produced in earlier processes. MS188 is responsible for most cell wall-related genes, functioning both in tapetum cell wall degradation and in pollen wall formation. These results suggest a more complex gene regulatory network for tapetum development and function.
Cogburn, L A; Wang, X; Carre, W; Rejto, L; Aggrey, S E; Duclos, M J; Simon, J; Porter, T E
The genetic networks that govern the differentiation and growth of major tissues of economic importance in the chicken are largely unknown. Under a functional genomics project, our consortium has generated 30 609 expressed sequence tags (ESTs) and developed several chicken DNA microarrays, which represent the Chicken Metabolic/Somatic (10 K) and Neuroendocrine/Reproductive (8 K) Systems (http://udgenome.ags.udel.edu/cogburn/). One of the major challenges facing functional genomics is the development of mathematical models to reconstruct functional gene networks and regulatory pathways from vast volumes of microarray data. In initial studies with liver-specific microarrays (3.1 K), we have examined gene expression profiles in liver during the peri-hatch transition and during a strong metabolic perturbation (fasting and re-feeding) in divergently selected broiler chickens (fast- vs. slow-growth lines). The expression of many genes controlling metabolic pathways is dramatically altered by these perturbations. Our analysis has revealed a large number of clusters of functionally related genes (mainly metabolic enzymes and transcription factors) that control major metabolic pathways. Currently, we are conducting transcriptional profiling studies of multiple tissues during development of two sets of divergently selected broiler chickens (fast- vs. slow-growing and fat vs. lean lines). Transcriptional profiling across multiple tissues should permit construction of a detailed genetic blueprint that illustrates the developmental events and hierarchy of genes that govern growth and development of chickens. This review will briefly describe the recent acquisition of chicken genomic resources (ESTs and microarrays) and our consortium's efforts to help launch the new era of functional genomics in the chicken.
Dall'Olio, Giovanni Marco; Bertranpetit, Jaume; Wagner, Andreas; Laayouni, Hafid
Genotype networks are a concept used in systems biology to study sets of genotypes that have the same phenotype, and the ability of these genotypes to bring forth novel phenotypes. In the past they have been applied to determine the genetic heterogeneity, and stability to mutations, of systems such as metabolic networks and RNA folds. Recently, they have provided the basis for reconciling the neutralist and selectionist views of evolution. Here, we adapted this concept to the study of population genetics data. Specifically, we applied genotype networks to the human 1000 Genomes dataset, and analyzed networks composed of short haplotypes of single nucleotide variants (SNVs). The result is a scan of how properties related to genetic heterogeneity and stability to mutations are distributed along the human genome. We found that genes involved in acquired immunity, such as some HLA and MHC genes, tend to have the most heterogeneous and connected networks, and that coding regions tend to be more heterogeneous and stable to mutations than non-coding regions. We also found, using coalescent simulations, that regions under selection have more extended and connected networks. The application of the concept of genotype networks can provide a new opportunity to understand the evolutionary processes that shaped our genome. Learning how the genotype space of each region of our genome has been explored during the evolutionary history of the human species can lead to a better understanding of how selective pressures and neutral factors have shaped genetic diversity within populations and among individuals. Combined with the availability of larger sequencing datasets, genotype networks represent a new approach to the study of human genetic diversity that looks at the whole genome, and goes beyond the classical division between selection and neutrality methods. PMID:24911413
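A genotype network over short SNV haplotypes can be built by treating each distinct haplotype as a node and joining haplotypes that differ at a single variant. The sketch below assumes haplotypes are equal-length allele strings (a hypothetical encoding) and reports the number of connected components as one simple connectivity summary; the study's own heterogeneity and stability measures are not reproduced here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def genotype_network(haplotypes):
    """Build a genotype network from equal-length haplotype strings.

    Nodes are the distinct haplotypes; an edge joins two haplotypes that differ
    at exactly one variant position. Returns (nodes, edges, n_components).
    """
    nodes = sorted(set(haplotypes))
    edges = [(a, b) for a, b in combinations(nodes, 2) if hamming(a, b) == 1]
    parent = {node: node for node in nodes}  # union-find forest

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)
    n_components = len({find(node) for node in nodes})
    return nodes, edges, n_components
```

Applied window by window along a chromosome, the node count tracks heterogeneity, while edges per node and the number of components give a rough sense of how connected, and therefore how mutationally robust, the explored genotype space is.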
A central challenge in genetics is to understand when and why mutations alter the phenotype of an organism. The consequences of gene inhibition have been systematically studied and can be predicted reasonably well across a genome. However, many sequence variants important for disease and evolution may alter gene regulation rather than gene function. The consequences of altering a regulatory interaction (or "edge") rather than a gene (or "node") in a network have not been as extensively studied. Here we use an integrative analysis and evolutionary conservation to identify features that predict when the loss of a regulatory interaction is detrimental in the extensively mapped transcription network of budding yeast. Properties such as the strength of an interaction, its location and context in a promoter, regulator and target gene importance, and the potential for compensation (redundancy) are each associated to some extent with interaction importance. Combined, however, these features predict quite well whether the loss of a regulatory interaction is detrimental across many promoters and for many different transcription factors. Thus, despite the potential for regulatory diversity, common principles can be used to understand and predict when changes in regulation are most harmful to an organism.
Westenberg, Michel A.; Hijum, Sacha A.F.T. van; Lulko, Andrzej T.; Kuipers, Oscar P.; Roerdink, Jos B.T.M.; Linsen, L; Hagen, H; Hamann, B
We present GENeVis, an application to visualize gene expression time series data in a gene regulatory network context. This is a network of regulator proteins that regulate the expression of their respective target genes. The networks are represented as graphs, in which the nodes represent genes,
Ye, Chun; Galbraith, Simon J; Liao, James C; Eskin, Eleazar
Understanding the relationship between genetic variation and gene expression is a central question in genetics. With the availability of data from high-throughput technologies such as ChIP-Chip, expression, and genotyping arrays, we can begin not only to identify associations but also to understand how genetic variations perturb the underlying transcriptional regulatory networks to induce differential gene expression. In this study, we describe a simple model of transcription regulation where the expression of a gene is completely characterized by two properties: the concentrations and promoter affinities of active transcription factors. We devise a method that extends Network Component Analysis (NCA) to determine how genetic variations in the form of single nucleotide polymorphisms (SNPs) perturb these two properties. Applying our method to a segregating population of Saccharomyces cerevisiae, we found statistically significant examples of trans-acting SNPs located in regulatory hotspots that perturb transcription factor concentrations and affinities for target promoters to cause global differential expression, and of cis-acting genetic variations that perturb the promoter affinities of transcription factors on a single gene to cause local differential expression. Although many genetic variations linked to gene expression have been identified, it is not clear how they perturb the underlying regulatory networks that govern gene expression. Our work begins to fill this void by showing that many genetic variations affect the concentrations of active transcription factors in a cell and their affinities for target promoters. Understanding the effects of these perturbations can help us to paint a more complete picture of the complex landscape of transcription regulation. The software package implementing the algorithms discussed in this work is available as a MATLAB package upon request.
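The two-property model above corresponds to the matrix decomposition used in Network Component Analysis, E ≈ A·P, where A (genes × TFs) holds promoter affinities restricted to known TF-gene links and P (TFs × samples) holds active TF concentrations. The sketch below is a simplified alternating-least-squares version on linear data; it assumes a known connectivity pattern and does not check NCA's identifiability criteria or implement the authors' SNP extension. Names are hypothetical.

```python
import numpy as np


def nca_als(E, pattern, n_iter=50, seed=0):
    """Factor E (genes x samples) as A @ P with A restricted to `pattern`.

    pattern is a genes x TFs 0/1 (or boolean) matrix of allowed TF-gene links.
    Returns A, P and the Frobenius residual after each iteration.
    """
    pattern = np.asarray(pattern, dtype=bool)
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(pattern.shape) * pattern
    history = []
    for _ in range(n_iter):
        # P-step: least-squares TF activities given the current A.
        P, *_ = np.linalg.lstsq(A, E, rcond=None)
        # A-step: refit each gene's allowed affinities given P.
        for i in range(E.shape[0]):
            allowed = np.flatnonzero(pattern[i])
            A[i] = 0.0
            if allowed.size:
                coef, *_ = np.linalg.lstsq(P[allowed].T, E[i], rcond=None)
                A[i, allowed] = coef
        history.append(float(np.linalg.norm(E - A @ P)))
    return A, P, history
```

Because each half-step solves its own least-squares problem exactly, the residual cannot increase between iterations. Comparing the fitted A and P across genotype groups is, loosely, how cis effects (changes in a gene's row of A) could be separated from trans effects (changes in P).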
Cheng, Ye; Quinn, Jeffrey Francis; Weiss, Lauren Anne
To date, genome-wide single nucleotide polymorphism (SNP) and copy number variant (CNV) association studies of autism spectrum disorders (ASDs) have led to promising signals but not to easily interpretable or translatable results. Our own genome-wide association study (GWAS) showed significant association to an intergenic SNP near Semaphorin 5A (SEMA5A) and provided evidence for reduced expression of the same gene. In a novel GWAS follow-up approach, we map an expression regulatory pathway for a GWAS candidate gene, SEMA5A, in silico by using population expression and genotype data sets. We find that the SEMA5A regulatory network significantly overlaps rare autism-specific CNVs. The SEMA5A regulatory network includes previous autism candidate genes and regions, including MACROD2, A2BP1, MCPH1, MAST4, CDH8, CADM1, FOXP1, AUTS2, MBD5, 7q21, 20p, USH2A, KIRREL3, DBF4B and RELN, among others. Our results provide: (i) a novel data-derived network implicated in autism, (ii) evidence that the same pathway seeded by an initial SNP association shows association with rare genetic variation in ASDs, (iii) a potential mechanism of action and interpretation for the previous autism candidate genes and genetic variants that fall in this network, and (iv) a novel approach that can be applied to other candidate genes for complex genetic disorders. We take a step towards better understanding of the significance of SEMA5A pathways in autism that can guide interpretation of many other genetic results in ASDs.
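One standard way to ask whether a gene network "significantly overlaps" a set of CNV-affected genes is a one-sided hypergeometric test. The abstract does not state which statistic was used, so this is an assumed, illustrative choice; all gene counts passed to it would be hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) when n network genes are drawn from N genes, K of which are CNV hits."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

Exact integer arithmetic via `math.comb` avoids underflow for moderate gene counts. A permutation test that preserves gene length or CNV size would be a more conservative alternative, since large genes are hit by CNVs more often.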
Li, Zhonghai; Liu, Guodong; Qu, Yinbo
Filamentous fungi are considered the most efficient producers of lignocellulose-degrading enzymes. Penicillium oxalicum strains possess extraordinary fungal lignocellulolytic enzyme systems and can efficiently utilize plant biomass. In recent years, the regulatory aspects of production of hydrolytic enzymes by P. oxalicum have been well established. This review aims to discuss the recent developments for the production of lignocellulolytic enzymes by P. oxalicum. The main cellulolytic transcription factors mediating the complex transcriptional regulatory network are highlighted. The genome-wide identification of cellulolytic transcription factors, the cascade regulation network for cellulolytic gene expression, and the synergistic and dose-controlled regulation by cellulolytic regulators are discussed. A cellulase regulatory network sensitive to inducers in the intracellular environment, the cross-talk between the regulation of lignocellulose-degrading enzymes and of amylase, and accessory enzymes are also described. Finally, strategies for the metabolic engineering of P. oxalicum, which show promising applications in enzymatic hydrolysis for biochemical production, are presented. Copyright © 2017. Published by Elsevier Ltd.
Zhang, Qingyang; Shi, Xuan
Gaussian Bayesian networks have become a widely used framework to estimate directed associations between joint Gaussian variables, where the network structure encodes the decomposition of multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normality assumption is moderately or severely violated, making it unsuitable for dealing with recent genomic data such as the Cancer Genome Atlas data. In the present paper, we propose a mixture copula Bayesian network model which provides great flexibility in modeling non-Gaussian and multimodal data for causal inference. The parameters in mixture copula functions can be efficiently estimated by a routine expectation-maximization algorithm. A heuristic search algorithm based on Bayesian information criterion is developed to estimate the network structure, and prediction can be further improved by the best-scoring network out of multiple predictions from random initial values. Our method outperforms Gaussian Bayesian networks and regular copula Bayesian networks in terms of modeling flexibility and prediction accuracy, as demonstrated using a cell signaling data set. We apply the proposed methods to the Cancer Genome Atlas data to study the genetic and epigenetic pathways that underlie serous ovarian cancer.
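A building block of copula-based structure search is to move each variable onto a common Gaussian scale through its ranks and then score candidate parent sets with BIC. The sketch below is deliberately simpler than the paper's model: it uses a single empirical Gaussian copula rather than a mixture of copulas fitted by EM, and a linear-Gaussian local score. Function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def normal_scores(data):
    """Map each column to standard-normal scores via its ranks.

    This is the usual empirical Gaussian-copula transform; ties are broken
    by order of appearance (stable sort).
    """
    n = data.shape[0]
    ranks = data.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable") + 1
    uniform = ranks / (n + 1)  # ranks 1..n mapped strictly inside (0, 1)
    return np.vectorize(_STD_NORMAL.inv_cdf)(uniform)


def bic_local(z, child, parents):
    """BIC score of column `child` regressed linearly on `parents` (higher is better)."""
    n = z.shape[0]
    design = np.column_stack([np.ones(n)] + [z[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(design, z[:, child], rcond=None)
    resid = z[:, child] - design @ beta
    sigma2 = resid @ resid / n  # maximum-likelihood residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    n_params = design.shape[1] + 1  # coefficients plus the variance
    return loglik - 0.5 * n_params * np.log(n)
```

A greedy hill-climbing search would add, remove or reverse one edge at a time and keep the change whenever the summed local BIC scores increase, subject to acyclicity; restarting from random initial structures and keeping the best-scoring network follows the strategy the abstract describes.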
Castro-Melchor, Marlene; Charaniya, Salim; Karypis, George; Takano, Eriko; Hu, Wei-Shou
Background: The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made
Lam, M.C.; Puchalka, J.; Diez, M.S.; Martins Dos Santos, V.A.P.
Systems biology is aimed at achieving a holistic understanding of living organisms, while synthetic biology seeks to design and construct new living organisms with targeted functionalities. Genome sequencing and the fields of ‘omics’ technology have proven a goldmine of information for scientists
Background: It is one of the ultimate goals of modern biological research to fully elucidate the intricate interplay and regulation of the molecular determinants that drive and characterize the progression of diverse life phenomena, such as the cell cycle, development, aging, and the progressive and recurrent pathogenesis of complex diseases. Large-scale, genome-wide, time-resolved data are becoming increasingly available, providing a golden opportunity to tackle the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results: In particular, this methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address gene regulations that can span any number of time intervals. This bioinformatics toolbox provides a unified approach to uncovering time trends of gene regulation through decision analysis of a newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast and human HeLa cell-cycle data and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with current knowledge of cell-cycle phase characteristics. Conclusion: We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell-cycle datasets. In addition to uncovering the time trends of gene regulation in the cell cycle, this unified approach can also be used to study the complex
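The delayed-correlation idea, correlating a regulator's profile at time t with a target's profile at time t + lag, can be shown in a few lines. This sketch covers only the pairwise lag scan; it does not reproduce TdGRN's time-delayed expression matrix or its decision analysis, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def best_delay(regulator, target, max_lag):
    """Return (lag, r) maximizing |corr(regulator[t], target[t + lag])|."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        x = regulator[:len(regulator) - lag]
        y = target[lag:]
        if len(x) < 3:
            break  # too few overlapping time points for a meaningful correlation
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best
```

A positive r at the best lag suggests delayed activation and a negative r delayed repression; with short time courses, such lag estimates are noisy and need a significance threshold before being called edges.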
Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina
Discovering gene regulatory networks from data has been one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. The neural networks are used to model each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the underlying relations among genes. The results obtained on artificial and real datasets confirm the effectiveness of the method for discovering regulatory networks through proper modeling of the temporal dynamics of gene expression profiles.
Koonin, Eugene V; Karev, Georgy P
Power Laws, Scale-free Networks and Genome Biology deals with crucial aspects of the theoretical foundations of systems biology, namely power law distributions and scale-free networks, which have emerged as the hallmarks of biological organization in the post-genomic era. The chapters in the book not only describe the interesting mathematical properties of biological networks but also move beyond phenomenology, toward models of evolution capable of explaining the emergence of these features. The collection of chapters, contributed by both physicists and biologists, strives to address the problems in this field in a rigorous but not excessively mathematical manner and to represent different viewpoints, which is crucial in this emerging discipline. Each chapter includes, in addition to technical descriptions of properties of biological networks and evolutionary models, a more general and accessible introduction to the respective problems. Most chapters emphasize the potential of theoretical systems biology for disco...
Krouk, Gabriel; Lingeman, Jesse; Colon, Amy Marshall; Coruzzi, Gloria; Shasha, Dennis
The goal of systems biology is to generate models for predicting how a system will react under untested conditions or in response to genetic perturbations. This paper discusses experimental and analytical approaches to deriving causal relationships in gene regulatory networks.
Herranz, Héctor; Cohen, Stephen M
Biological systems are continuously challenged by an environment that is variable. Yet, a key feature of developmental and physiological processes is their remarkable stability. This review considers how microRNAs contribute to gene regulatory networks that confer robustness.
Virginia Balbi; Alessandra Devoto
.... In this review, we focus on the latest published work on jasmonate (JA) signalling components and new regulatory nodes in the transcriptional network that regulates a number of diverse plant responses to developmental and environmental cues...
Geeven, G.; van Kesteren, R.E.; Smit, A.B.; de Gunst, M.C.M.
Motivation: Gene regulatory networks, in which edges between nodes describe interactions between transcriptional regulators and their target genes, determine the coordinated spatiotemporal expression of genes. Especially in higher organisms, context-specific combinatorial regulation by transcription
Seo, Sang Woo; Kim, Donghyuk; Latif, Haythem
The ferric uptake regulator (Fur) plays a critical role in the transcriptional regulation of iron metabolism. However, the full regulatory potential of Fur remains undefined. Here we comprehensively reconstruct the Fur transcriptional regulatory network in Escherichia coli K-12 MG1655 in response...
Lavansiri, Direk; Bull, Trevor
The Energy Regulatory Commission of Thailand is a new regulatory agency. The structure of the energy sector; the tradition of administration; and, the lack of access to experienced personnel in Thailand all pose particular challenges. The Commission is meeting these challenges through regional and international networking to assist in developing policies and procedures that allow it to meet international benchmarks.
Chanda-Kapata, Pascalina; Kapata, Nathan; Moraes, Albertina Ngomah; Chongwe, Gershom; Munthali, James
Genomic research has the potential to increase knowledge in health sciences, but the process has to ensure the safety, integrity and well-being of research participants. A legal framework for the conduct of health research in Zambia is available. However, the ethical, policy and regulatory framework to operationalise genomic research requires a paradigm shift. This paper outlines the current legal and policy framework as well as the ethics environment, and suggests recommendations for Zambia to fully benefit from the opportunity that genomic research presents. This will entail creating national research interest, improving knowledge levels, and building community trust among researchers, policymakers, donors, regulators and, most importantly, patients and research participants. The risks and benefits will need to be weighed against each other objectively.
Simeonidis, Evangelos; Chandrasekaran, Sriram; Price, Nathan D
The integration of transcriptional regulatory and metabolic networks is a crucial step in the process of predicting metabolic behaviors that emerge from either genetic or environmental changes. Here, we present a guide to PROM (probabilistic regulation of metabolism), an automated method for the construction and simulation of integrated metabolic and transcriptional regulatory networks that enables large-scale phenotypic predictions for a wide range of model organisms.
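As we understand PROM, its key quantity is the conditional probability that a target gene is ON when its transcription factor is OFF, estimated from binarized expression data and then used to scale the target's flux bound when the factor is knocked out. The sketch below shows only that step. The function names, the 0/1 encoding of binarized states, and `v_max` are illustrative assumptions, and the flux balance analysis that would consume the bound is omitted.

```python
from typing import Sequence


def prob_target_on_given_tf_off(tf_states: Sequence[int], target_states: Sequence[int]) -> float:
    """Estimate P(target ON | TF OFF) from paired binarized samples (1 = ON, 0 = OFF)."""
    if len(tf_states) != len(target_states):
        raise ValueError("state vectors must have equal length")
    # Target states observed in the samples where the TF is OFF.
    when_tf_off = [t for f, t in zip(tf_states, target_states) if f == 0]
    if not when_tf_off:
        # No sample shows the TF OFF, so the data give no evidence for a constraint.
        return 1.0
    return sum(when_tf_off) / len(when_tf_off)


def knockout_flux_bound(v_max: float, p_on: float) -> float:
    """Scale a target reaction's maximal flux by P(target ON | TF OFF)."""
    return p_on * v_max


tf = [1, 1, 0, 0, 0, 0]
target = [1, 1, 1, 0, 0, 0]
p = prob_target_on_given_tf_off(tf, target)  # 1 of the 4 TF-OFF samples has the target ON
print(p, knockout_flux_bound(10.0, p))  # 0.25 2.5
```

Returning 1.0 when the factor is never observed OFF is a deliberately permissive choice here: without evidence, the target's flux is left unconstrained.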
Hase, Takeshi; Ghosh, Samik; Yamanaka, Ryota; Kitano, Hiroaki
Elucidating gene regulatory networks (GRNs) from large-scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus-driven approaches combining different algorithms, have emerged as a promising strategy for inferring accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) simply combining many algorithms does not always yield a performance improvement that justifies the cost of consensus and (ii) TopkNet, integrating only high-performance algorithms, provides significant performance improvement over the best individual algorithms and the community prediction. These results suggest that a priori determination of high-performance algorithms is key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can help determine potentially optimal algorithms for reconstructing unknown regulatory networks: if the expression data associated with a known regulatory network are similar to those associated with an unknown regulatory network, the optimal algorithms determined for the known network can be repurposed to infer the unknown network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, when the similarity between two expression datasets is high, TopkNet integrating the algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study, provides a powerful strategy for harnessing the wisdom of crowds in reconstructing unknown regulatory networks.
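The consensus idea can be illustrated with a generic average-rank (Borda-style) combination restricted to the k best-benchmarked algorithms. This is a sketch under assumptions, not TopkNet itself: the abstract does not give TopkNet's combination rule, and the names `predictions`, `benchmark_scores`, and `k` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def average_rank_consensus(
    predictions: Dict[str, List[Edge]],
    benchmark_scores: Dict[str, float],
    k: int,
) -> List[Edge]:
    """Combine ranked edge lists from the k best-scoring algorithms.

    predictions maps an algorithm name to its edge list, most confident first.
    An edge missing from an algorithm's list gets that list's length as its rank.
    """
    top = sorted(benchmark_scores, key=benchmark_scores.get, reverse=True)[:k]
    all_edges = {e for name in top for e in predictions[name]}
    mean_rank: Dict[Edge, float] = {}
    for edge in all_edges:
        ranks = []
        for name in top:
            ranking = predictions[name]
            ranks.append(ranking.index(edge) if edge in ranking else len(ranking))
        mean_rank[edge] = sum(ranks) / len(ranks)
    # Lower mean rank means stronger consensus; ties are broken by edge name.
    return sorted(all_edges, key=lambda e: (mean_rank[e], e))


preds = {
    "A": [("g1", "g2"), ("g2", "g3")],
    "B": [("g2", "g3"), ("g1", "g2")],
    "C": [("g3", "g1")],
}
scores = {"A": 0.9, "B": 0.8, "C": 0.1}
print(average_rank_consensus(preds, scores, k=2))  # [('g1', 'g2'), ('g2', 'g3')]
```

Because algorithm C falls outside the top two, its edge never enters the consensus, which mirrors the abstract's point that adding low-performing algorithms is not free.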
de Matos Simoes, Ricardo; Tripathi, Shailesh; Emmert-Streib, Frank
The physical periphery of a biological cell is mainly described by signaling pathways, which are triggered by transmembrane proteins and receptors that act as sentinels controlling the whole gene regulatory network of a cell. However, our current knowledge of the gene regulatory mechanisms governed by extracellular signals is severely limited. The purpose of this paper is threefold. First, we infer a gene regulatory network from a large-scale B-cell lymphoma expression data set using the C3NET algorithm. Second, we provide a functional and structural analysis of the largest connected component of this network, revealing that this network component corresponds to the peripheral region of a cell. Third, we analyze the hierarchical organization of network components of the whole inferred B-cell gene regulatory network by introducing a new approach which exploits the variability within the data as well as the inferential characteristics of C3NET. As a result, we find a functional bisection of the network corresponding to different cellular components. Overall, our study highlights the peripheral gene regulatory network of B-cells and shows that it is centered around hub transmembrane proteins located at the physical periphery of the cell. In addition, we identify a variety of novel pathological transmembrane proteins, such as ion channel complexes and signaling receptors, in B-cell lymphoma.
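The core selection step of C3NET, as we understand it, keeps for every gene only the edge to its single strongest significant mutual-information (MI) partner. The sketch below assumes a precomputed symmetric MI matrix and replaces the significance test with a fixed `threshold`; both are simplifications for illustration, and the function name is hypothetical.

```python
from typing import List, Set, Tuple


def c3net_core(mi: List[List[float]], threshold: float) -> Set[Tuple[int, int]]:
    """Keep, for every gene, only its single strongest MI partner above threshold.

    mi is a symmetric matrix of precomputed mutual-information values; threshold
    stands in for a proper significance test. Edges are undirected (i, j), i < j.
    """
    n = len(mi)
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        best_j, best_mi = -1, threshold
        for j in range(n):
            if j != i and mi[i][j] > best_mi:
                best_j, best_mi = j, mi[i][j]
        if best_j >= 0:  # gene i has at least one partner above the threshold
            edges.add((min(i, best_j), max(i, best_j)))
    return edges


mi = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.5, 0.1],
    [0.2, 0.5, 0.0, 0.05],
    [0.1, 0.1, 0.05, 0.0],
]
print(sorted(c3net_core(mi, threshold=0.15)))  # [(0, 1), (1, 2)]
```

Gene 3 has no MI value above the threshold, so it stays unconnected; this conservativeness is what keeps the inferred network sparse.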
Probabilistic Boolean networks (PBNs) have recently been introduced as a promising class of models of genetic regulatory networks. The dynamic behaviour of PBNs can be analysed in the context of Markov chains. A key goal is the determination of the steady-state (long-run) behaviour of a PBN by analysing the corresponding Markov chain. This allows one to compute the long-term influence of a gene on another gene or determine the long-term joint probabilistic behaviour of a few selected genes. Because matrix-based methods quickly become prohibitive for large sizes of networks, we propose the use of Monte Carlo methods. However, the rate of convergence to the stationary distribution becomes a central issue. We discuss several approaches for determining the number of iterations necessary to achieve convergence of the Markov chain corresponding to a PBN. Using a recently introduced method based on the theory of two-state Markov chains, we illustrate the approach on a sub-network designed from human glioma gene expression data and determine the joint steady-state probabilities for several groups of genes.
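A minimal Monte Carlo sketch of the idea: simulate an independent PBN with random gene perturbation, discard a burn-in, and count visited states. The burn-in and sample sizes are fixed by hand here as an assumption; the two-state Markov chain method the abstract uses to choose them is not implemented, and all names are hypothetical.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[int, ...]
Predictor = Callable[[State], int]


def estimate_steady_state(
    predictors: Sequence[Sequence[Tuple[Predictor, float]]],
    initial: State,
    burn_in: int,
    samples: int,
    perturb: float = 0.01,
    seed: int = 0,
) -> Dict[State, float]:
    """Monte Carlo estimate of the stationary distribution of an independent PBN.

    predictors[i] lists (Boolean function, selection probability) pairs for gene i.
    At each step every gene flips independently with probability `perturb`; if no
    gene flipped, each gene is updated by a randomly selected predictor.
    """
    rng = random.Random(seed)
    state = initial
    counts: Dict[State, int] = {}
    for step in range(burn_in + samples):
        flips = [rng.random() < perturb for _ in state]
        if any(flips):
            state = tuple(1 - x if flip else x for x, flip in zip(state, flips))
        else:
            new_state = []
            for options in predictors:
                funcs, weights = zip(*options)
                chosen = rng.choices(funcs, weights=weights)[0]
                new_state.append(chosen(state))  # every gene reads the old state
            state = tuple(new_state)
        if step >= burn_in:  # discard the transient before counting visits
            counts[state] = counts.get(state, 0) + 1
    return {s: c / samples for s, c in counts.items()}


# Gene 0 is always ON; gene 1 copies gene 0 with probability 0.8, else switches OFF.
pbn = [
    [(lambda s: 1, 1.0)],
    [(lambda s: s[0], 0.8), (lambda s: 0, 0.2)],
]
print(estimate_steady_state(pbn, (0, 0), burn_in=1000, samples=20000))
```

The perturbation probability makes the chain ergodic, so a unique stationary distribution exists; how many samples are enough is exactly the convergence question the abstract raises.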
Kaznessis Yiannis N
Background Tightly regulated gene networks, precisely controlling the expression of protein molecules, have received considerable interest from the biomedical community due to their promising applications. Among the most well-studied inducible transcription systems are the tetracycline regulatory expression systems based on the tetracycline resistance operon of Escherichia coli, Tet-Off (tTA) and Tet-On (rtTA). Despite their initial success and improved designs, limitations still persist, such as low inducer sensitivity. Instead of looking at these networks statically and simply changing or mutating the promoter and operator regions by trial and error, a systematic investigation of the dynamic behavior of the network can result in rational design of regulatory gene expression systems. Sophisticated algorithms can accurately capture the dynamical behavior of gene networks. With computer-aided design, we aim to improve the synthesis of regulatory networks and propose new designs that enable tighter control of expression. Results In this paper we engineer novel networks by recombining existing genes or parts of genes. We synthesize four novel regulatory networks based on the Tet-Off and Tet-On systems. We model all the known individual biomolecular interactions involved in transcription, translation, regulation and induction. With multiple time-scale stochastic-discrete and stochastic-continuous models we accurately capture the transient and steady-state dynamics of these networks. Important biomolecular interactions are identified and the strength of the interactions engineered to satisfy design criteria. A set of clear design rules is developed and appropriate mutants of regulatory proteins and operator sites are proposed. Conclusion The complexity of biomolecular interactions is accurately captured through computer simulations. Computer simulations allow us to look into the molecular level, portray the dynamic behavior of gene regulatory
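Stochastic-discrete simulation of the kind mentioned above is commonly done with Gillespie's stochastic simulation algorithm (SSA). The sketch below is a generic, much-reduced example rather than the authors' Tet-Off/Tet-On model: a single mRNA species produced at a Hill-repressed rate and degraded linearly, with the repressor level held constant. All parameter names and values are hypothetical.

```python
import random
from typing import List, Tuple


def gillespie_repressed_gene(
    k_max: float,
    repressor: float,
    K: float,
    n: float,
    gamma: float,
    t_end: float,
    seed: int = 0,
) -> List[Tuple[float, int]]:
    """Gillespie SSA for one mRNA species.

    Reactions: 0 -> mRNA at rate k_max / (1 + (repressor / K)^n), and
    mRNA -> 0 at rate gamma * m. The repressor level is held constant.
    """
    rng = random.Random(seed)
    a_prod = k_max / (1.0 + (repressor / K) ** n)
    t, m = 0.0, 0
    trajectory = [(t, m)]
    while True:
        a_total = a_prod + gamma * m
        if a_total == 0.0:
            break  # no reaction can fire
        t += rng.expovariate(a_total)  # exponential waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if m == 0 or rng.random() * a_total < a_prod:
            m += 1
        else:
            m -= 1
        trajectory.append((t, m))
    return trajectory


traj = gillespie_repressed_gene(k_max=10.0, repressor=5.0, K=5.0, n=2.0, gamma=0.1, t_end=500.0)
print(len(traj), traj[-1])
```

With these illustrative values the production propensity is 5 per time unit, so the copy number should fluctuate around 5 / 0.1 = 50 once the transient has passed.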
Christen, Beat; Abeliuk, Eduardo; Collier, John M; Kalogeraki, Virginia S; Passarelli, Ben; Coller, John A; Fero, Michael J; McAdams, Harley H; Shapiro, Lucy
.... Full discovery of its essential genome, including non‐coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of a bacterial cell. Using hyper...
Makai, Szabolcs; Tamás, László; Juhász, Angéla
Wheat has been cultivated for 10,000 years, and ever since the origin of hexaploid wheat it has been exempt from natural selection. Instead, it was under the constant selective pressure of human agriculture from harvest to sowing during every year, producing a vast array of varieties. Wheat has been adopted globally, accumulating variation for genes involved in yield traits, environmental adaptation and resistance. However, one small but important part of the wheat genome has hardly changed: the regulatory regions of both the x- and y-type high molecular weight glutenin subunit (HMW-GS) genes, which are alone responsible for approximately 12% of the grain protein content. The phylogeny of the HMW-GS regulatory regions of the Triticeae demonstrates that a genetic bottleneck may have led to their decreased diversity during domestication and subsequent cultivation. It has also highlighted the fact that the wild relatives of wheat may offer an unexploited genetic resource for the regulatory region of these genes. Significant research efforts have been made in the public sector and by international agencies, using wild crosses to exploit the available genetic variation, and as a result synthetic hexaploids are now being utilized by a number of breeding companies. However, the newly emerging tool of genome editing provides significantly improved efficiency in exploiting the natural variation in HMW-GS genes and incorporating it into elite cultivars and breeding lines. Recent advances in the understanding of the regulation of these genes underline the need for an overview of the regulatory elements for genome editing purposes.
Ahnert, S E; Fink, T M A
Network motifs have been studied extensively over the past decade, and certain motifs, such as the feed-forward loop, play an important role in regulatory networks. Recent studies have used Boolean network motifs to explore the link between form and function in gene regulatory networks and have found that the structure of a motif does not strongly determine its function, if this is defined in terms of the gene expression patterns the motif can produce. Here, we offer a different, higher-level definition of the 'function' of a motif, in terms of two fundamental properties of its dynamical state space as a Boolean network. One is the basin entropy, which is a complexity measure of the dynamics of Boolean networks. The other is the diversity of cyclic attractor lengths that a given motif can produce. Using these two measures, we examine all 104 topologically distinct three-node motifs and show that the structural properties of a motif, such as the presence of feedback loops and feed-forward loops, predict fundamental characteristics of its dynamical state space, which in turn determine aspects of its functional versatility. We also show that these higher-level properties have a direct bearing on real regulatory networks, as both basin entropy and cycle length diversity show a close correspondence with the prevalence, in neural and genetic regulatory networks, of the 13 connected motifs without self-interactions that have been studied extensively in the literature. © 2016 The Authors.
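Both higher-level properties can be computed by exhaustive enumeration of a small motif's state space. The sketch below assumes synchronous, deterministic updates and uses the natural logarithm for the basin entropy (the Shannon entropy of the fraction of states draining into each attractor); the function names are hypothetical and the paper's exact conventions may differ.

```python
import math
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]


def attractor_of(state: State, rules: Sequence[Rule]) -> FrozenSet[State]:
    """Follow synchronous updates from `state` until a state repeats; return the cycle."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = tuple(rule(state) for rule in rules)
    return frozenset(trajectory[seen[state]:])


def basin_entropy_and_cycle_lengths(rules: Sequence[Rule]) -> Tuple[float, Set[int]]:
    """Basin entropy (natural log) and the set of attractor cycle lengths."""
    n = len(rules)
    basins: Dict[FrozenSet[State], int] = {}
    for state in product((0, 1), repeat=n):
        cycle = attractor_of(state, rules)
        basins[cycle] = basins.get(cycle, 0) + 1
    total = 2 ** n
    entropy = -sum((b / total) * math.log(b / total) for b in basins.values())
    return entropy, {len(c) for c in basins}


# A three-node ring in which each node copies its predecessor.
ring = [lambda s: s[2], lambda s: s[0], lambda s: s[1]]
h, lengths = basin_entropy_and_cycle_lengths(ring)
print(round(h, 3), sorted(lengths))  # 1.255 [1, 3]
```

The ring has two fixed points (000 and 111) and two 3-cycles, giving basins of sizes 1, 1, 3 and 3 out of 8 states, which is why its cycle-length set contains both 1 and 3.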
Brutlag, Douglas L.; Gray, Nancy Ryan
This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.
Background Interspecific hybridization creates individuals harboring diverged genomes. The interaction of these genomes can generate successful evolutionary novelty or disadvantageous genomic conflict. Annual sunflowers Helianthus annuus and H. petiolaris have a rich history of hybridization in natural populations. Although first-generation hybrids generally have low fertility, hybrid swarms that include later generation and fully fertile backcross plants have been identified, as well as at least three independently originated stable hybrid taxa. We examine patterns of transcript accumulation in the earliest stages of hybridization of these species via analyses of transcriptome sequences from laboratory-derived F1 offspring of an inbred H. annuus cultivar and a wild H. petiolaris accession. Results While nearly 14% of the reference transcriptome showed significant accumulation differences between parental accessions, total F1 transcript levels showed little evidence of dominance, as midparent transcript levels were highly predictive of transcript accumulation in F1 plants. Allelic bias in F1 transcript accumulation was detected in 20% of transcripts containing sufficient polymorphism to distinguish parental alleles; however, the magnitude of these biases was generally smaller than the differences between parental accessions. Conclusions While analyses of allelic bias suggest that cis regulatory differences between H. annuus and H. petiolaris are common, their effect on transcript levels may be more subtle than that of trans-acting regulatory differences. Overall, these analyses found little evidence of regulatory incompatibility or dominance interactions between parental genomes within F1 hybrid individuals, although it is unclear whether this is a legacy or an enabler of introgression between species. PMID:23701699
Levy, Orr; Knisbacher, Binyamin A.; Levanon, Erez Y.; Havlin, Shlomo
Retroelements (REs) are mobile DNA sequences that multiply and spread throughout genomes by a copy-and-paste mechanism. These parasitic elements are active in diverse genomes, from yeast to humans, where they promote diversity, cause disease, and accelerate evolution. Because of their high copy number and sequence similarity, studying their activity and tracking their proliferation dynamics is a challenge. It is particularly difficult to pinpoint the few REs in a genome that are still active in the haystack of degenerate and suppressed elements. We develop a computational framework based on network theory that tracks the path of RE proliferation throughout evolution. We analyze SVA (SINE-VNTR-Alu), the youngest RE family in human genomes, to understand RE dynamics across hominids. Integrating comparative genomics and network tools enables us to track the course of SVA proliferation, identify previously unknown active communities, and detect tentative “master REs” that played key roles in SVA propagation, providing strong support for the fundamental “master gene” model of RE proliferation. The method is generic and thus can be applied to REs of any of the thousands of available genomes to identify active RE communities and master REs that were pivotal in the evolution of their host genomes. PMID:29043294
Hu, Liangdong; Wang, Limin
The Bayesian network is one of the most successful graph models for representing the reactive oxygen species (ROS) regulatory pathway. With the increasing number of microarray measurements, it is possible to construct Bayesian networks directly from microarray data. Although a large number of Bayesian network learning algorithms have been developed, their accuracy is low when they are applied to microarray data, because the databases used for learning contain too few microarray samples. In this paper, we propose a consensus Bayesian network constructed by combining Bayesian networks from the relevant literature with Bayesian networks learned from microarray data; it is expected to be more accurate than a Bayesian network learned from a single database. In the experiments, we validated the Bayesian network combination algorithm on several classic machine learning databases and used the consensus Bayesian network to model the ROS pathway of Escherichia coli.
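One simple way to combine several Bayesian network structures is edge voting: keep directed edges supported by more than a chosen fraction of the input networks, adding the best-supported edges first and skipping any edge that would close a directed cycle, since a Bayesian network must remain acyclic. This is an illustrative technique, not necessarily the authors' combination algorithm; `min_fraction` and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Set, Tuple

Edge = Tuple[str, str]


def creates_cycle(adj: Dict[str, Set[str]], src: str, dst: str) -> bool:
    """True if adding src -> dst would close a directed cycle (dst already reaches src)."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False


def consensus_dag(networks: Sequence[Set[Edge]], min_fraction: float = 0.5) -> Set[Edge]:
    """Keep edges present in more than `min_fraction` of the networks, acyclically."""
    votes = Counter(edge for net in networks for edge in net)
    adj: Dict[str, Set[str]] = {}
    kept: Set[Edge] = set()
    # Most-supported edges first; ties are broken by edge name for determinism.
    for (src, dst), count in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if count / len(networks) <= min_fraction:
            break  # all remaining edges have even less support
        if not creates_cycle(adj, src, dst):
            adj.setdefault(src, set()).add(dst)
            kept.add((src, dst))
    return kept


nets = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "B")},
    {("A", "B"), ("B", "C"), ("C", "A")},
]
print(sorted(consensus_dag(nets)))  # [('A', 'B'), ('B', 'C')]
```

Networks drawn from the literature and networks learned from data could be weighted differently by repeating them in the input list; that weighting scheme is likewise an assumption.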
Østergaard, Simon; Olsson, Lisbeth; Johnston, M.
Increasing the flux through central carbon metabolism is difficult because of rigidity in regulatory structures, at both the genetic and the enzymatic levels. Here we describe metabolic engineering of a regulatory network to obtain a balanced increase in the activity of all the enzymes...... in the pathway, and ultimately, increasing metabolic flux through the pathway of interest. By manipulating the GAL gene regulatory network of Saccharomyces cerevisiae, which is a tightly regulated system, we produced prototrophic mutant strains, which increased the flux through the galactose utilization pathway...
Bhardwaj, Nitin; Yan, Koon-Kiu; Gerstein, Mark B.
Gene regulatory networks have been shown to share some common aspects with commonplace social governance structures. Thus, we can get some intuition into their organization by arranging them into well-known hierarchical layouts. These hierarchies, in turn, can be placed between the extremes of autocracies, with well-defined levels and clear chains of command, and democracies, without such defined levels and with more co-regulatory partnerships between regulators. In general, the presence of partnerships decreases the variation in information flow amongst nodes within a level, more evenly distributing stress. Here we study various regulatory networks (transcriptional, modification, and phosphorylation) for five diverse species, Escherichia coli to human. We specify three levels of regulators—top, middle, and bottom—which collectively govern the non-regulator targets lying in the lowest fourth level. We define quantities for nodes, levels, and entire networks that measure their degree of collaboration and autocratic vs. democratic character. We show individual regulators have a range of partnership tendencies: Some regulate their targets in combination with other regulators in local instantiations of democratic structure, whereas others regulate mostly in isolation, in more autocratic fashion. Overall, we show that in all networks studied the middle level has the highest collaborative propensity and coregulatory partnerships occur most frequently amongst midlevel regulators, an observation that has parallels in corporate settings where middle managers must interact most to ensure organizational effectiveness. There is, however, one notable difference between networks in different species: The amount of collaborative regulation and democratic character increases markedly with overall genomic complexity. PMID:20351254
Tamada, Yoshinori; Imoto, Seiya; Araki, Hiromitsu; Nagasaki, Masao; Print, Cristin; Charnock-Jones, D Stephen; Miyano, Satoru
We present a novel algorithm to estimate genome-wide gene networks consisting of more than 20,000 genes from gene expression data using nonparametric Bayesian networks. Due to the difficulty of learning Bayesian network structures, existing algorithms cannot be applied to more than a few thousand genes. Our algorithm overcomes this limitation by repeatedly estimating subnetworks in parallel for genes selected by neighbor node sampling. Through numerical simulation, we confirmed that our algorithm outperformed a heuristic algorithm while requiring less time. We applied our algorithm to microarray data from human umbilical vein endothelial cells (HUVECs) treated with siRNAs to construct a human genome-wide gene network, which we compared to a small gene network estimated for the genes extracted using a traditional bioinformatics method. The results showed that our genome-wide gene network contains many features of the small network, as well as others that could not be captured during the small network estimation. The results also revealed master-regulator genes that are not in the small network but that control many of the genes in the small network. These analyses would not have been possible without our proposed algorithm.
Sato João R
Background There are several studies in the literature describing measurement error in gene expression data and several others on regulatory network models. However, only a small fraction describe the combination of measurement error and mathematical regulatory network models and show how to identify these networks under different noise levels. Results This article investigates the effects of measurement error on the estimation of the parameters of regulatory networks. Simulation studies indicate that, in both time series (dependent) and non-time series (independent) data, measurement error strongly affects the estimated parameters of the regulatory network models, biasing them as predicted by theory. Moreover, when testing the parameters of the regulatory network models, p-values computed by ignoring the measurement error are not reliable, since the false-positive rate is not controlled under the null hypothesis. To overcome these problems, we present an improved version of the ordinary least squares estimator for independent (regression models) and dependent (autoregressive models) data when the variables are subject to noise. Measurement error estimation procedures for microarrays are also described. Simulation results show that both corrected methods perform better than the standard ones (i.e., those ignoring measurement error). The proposed methodologies are illustrated using microarray data from lung cancer patients and mouse liver time series data. Conclusions Measurement error seriously affects the identification of regulatory network models; it must therefore be reduced or taken into account in order to avoid erroneous conclusions. This could be one of the reasons for the high biological false-positive rates identified in actual regulatory network models.
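The attenuation bias described above, and its classical method-of-moments correction for a single regressor, can be sketched as follows: when the observed predictor carries additive noise of known variance, subtracting that variance from the predictor's sample variance undoes the bias in the slope. This is the textbook errors-in-variables correction, offered as an illustration; the article's own estimator and its autoregressive variant may differ, and `error_var_x` is assumed known here.

```python
from statistics import mean
from typing import Sequence, Tuple


def corrected_ols(x: Sequence[float], y: Sequence[float], error_var_x: float) -> Tuple[float, float]:
    """Slope and intercept of y on x when x carries additive noise of known variance.

    With error_var_x = 0 this reduces to ordinary least squares.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)  # observed variance of x
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    denom = sxx - error_var_x  # estimated variance of the noise-free x
    if denom <= 0:
        raise ValueError("error variance is too large relative to the variance of x")
    slope = sxy / denom
    return slope, my - slope * mx


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(corrected_ols(x, y, 0.0))  # naive OLS
print(corrected_ols(x, y, 5 / 6))  # half of var(x) attributed to noise: slope doubles
```

Ignoring the noise term always shrinks the estimated slope toward zero, which is one reason uncorrected p-values for network edges are unreliable.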
Hüser Andrea T
Background Knowledge of complete bacterial genome sequences opens the way to reconstructing the qualitative topology and global connectivity of transcriptional regulatory networks. Since iron is essential for a variety of cellular processes but also poses problems in biological systems due to its high toxicity, bacteria have evolved complex transcriptional regulatory networks to achieve effective iron homeostasis. Here, we apply a combination of transcriptomics, bioinformatics, in vitro assays, and comparative genomics to decipher the regulatory network of the iron-dependent transcriptional regulator DtxR of Corynebacterium glutamicum. Results A deletion of the dtxR gene of C. glutamicum ATCC 13032 led to the mutant strain C. glutamicum IB2103 that was able to grow in minimal medium only under low-iron conditions. By performing genome-wide DNA microarray hybridizations, differentially expressed genes involved in iron metabolism of C. glutamicum were detected in the dtxR mutant. Bioinformatics analysis of the genome sequence identified a common 19-bp motif within the upstream region of 31 genes, whose differential expression in C. glutamicum IB2103 was verified by real-time reverse transcription PCR. Binding of a His-tagged DtxR protein to oligonucleotides containing the 19-bp motifs was demonstrated in vitro by DNA band shift assays. At least 64 genes encoding a variety of physiological functions in iron transport and utilization, in central carbohydrate metabolism and in transcriptional regulation are controlled directly by the DtxR protein. A comparison with the bioinformatically predicted networks of C. efficiens, C. diphtheriae and C. jeikeium identified evolutionarily conserved elements of the DtxR network. Conclusion This work adds considerably to our current understanding of the transcriptional regulatory network of C. glutamicum genes that are controlled by DtxR. The DtxR protein has a major role in controlling the
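A genome-wide search for a 19-bp binding motif, like the bioinformatics step above, can be approximated by scanning both strands for windows within a few mismatches of a consensus. The sketch below is a simple mismatch-count scan, not the authors' pipeline, and the consensus used in the example is a made-up placeholder rather than the real DtxR motif.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_motif(sequence: str, consensus: str, max_mismatches: int) -> List[Tuple[int, str, int]]:
    """Return (start, strand, mismatches) for every window close enough to the consensus.

    Start positions refer to the forward strand; strand is '+' or '-'. A window
    matching the reverse complement of the consensus is a minus-strand hit.
    """
    k = len(consensus)
    rc = reverse_complement(consensus)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        for strand, target in (("+", consensus), ("-", rc)):
            mm = sum(a != b for a, b in zip(window, target))
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


HYPOTHETICAL_19MER = "ATGCATTACGGATCCTAGC"  # placeholder, not the DtxR consensus
upstream = "GG" + HYPOTHETICAL_19MER + "GG"
print(scan_motif(upstream, HYPOTHETICAL_19MER, max_mismatches=2))
```

A palindromic consensus would be reported on both strands at the same position; a position weight matrix would be the usual refinement over plain mismatch counting.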
Denas, Olgert; Sandstrom, Richard; Cheng, Yong; Beal, Kathryn; Herrero, Javier; Hardison, Ross C; Taylor, James
Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood. We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence in the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequences show binding activity in the other species for human and mouse, respectively. However, 79.87% and 69.22% are repurposed such that they bind the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function. Despite this repurposing of TFos, we did not find substantial changes in their predicted target genes, suggesting that cis-regulatory modules (CRMs) buffer evolutionary events, allowing little or no change in the TFos-target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions. We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the
Klann, Tyler S; Black, Joshua B; Chellappan, Malathi; Safi, Alexias; Song, Lingyun; Hilton, Isaac B; Crawford, Gregory E; Reddy, Timothy E; Gersbach, Charles A
Large genome-mapping consortia and thousands of genome-wide association studies have identified non-protein-coding elements in the genome as having a central role in various biological processes. However, decoding the functions of the millions of putative regulatory elements discovered in these studies remains challenging. CRISPR-Cas9-based epigenome editing technologies have enabled precise perturbation of the activity of specific regulatory elements. Here we describe CRISPR-Cas9-based epigenomic regulatory element screening (CERES) for improved high-throughput screening of regulatory element activity in the native genomic context. Using dCas9KRAB repressor and dCas9p300 activator constructs and lentiviral single guide RNA libraries to target DNase I hypersensitive sites surrounding a gene of interest, we carried out both loss- and gain-of-function screens to identify regulatory elements for the β-globin and HER2 loci in human cells. CERES readily identified known and previously unidentified regulatory elements, some of which were dependent on cell type or direction of perturbation. This technology allows the high-throughput functional annotation of putative regulatory elements in their native chromosomal context.
The SCL (TAL1) transcription factor is a critical regulator of haematopoiesis and its expression is tightly controlled by multiple cis-acting regulatory elements. To elaborate further the DNA elements which control its regulation, we used genomic tiling microarrays covering 256 kb of the human SCL locus to perform a concerted analysis of chromatin structure and binding of regulatory proteins in human haematopoietic cell lines. This approach allowed us to characterise further or redefine known human SCL regulatory elements and led to the identification of six novel elements with putative regulatory function both up and downstream of the SCL gene. They bind a number of haematopoietic transcription factors (GATA1, E2A, LMO2, SCL, LDB1), CTCF or components of the transcriptional machinery and are associated with relevant histone modifications, accessible chromatin and low nucleosomal density. Functional characterisation shows that these novel elements are able to enhance or repress SCL promoter activity, have endogenous promoter function or enhancer-blocking insulator function. Our analysis opens up several areas for further investigation and adds new layers of complexity to our understanding of the regulation of SCL expression.
Alberto J. Martin
One of the main challenges of the post-genomic era is understanding how gene expression is controlled. Changes in gene expression lie behind diverse biological phenomena such as development, disease and adaptation to different environmental conditions. Despite the availability of well-established methods to identify these changes, tools to discern how gene regulation is orchestrated are still required. The regulation of gene expression is usually depicted as a Gene Regulatory Network (GRN), where changes in the network structure (i.e., network topology) represent adjustments of gene regulation. Like other networks, GRNs are composed of basic building blocks: small induced subgraphs called graphlets. Here we present LoTo, a novel method that uses Graphlet Based Metrics (GBMs) to identify topological variations between different states of a GRN. Under our approach, different states of a GRN are analyzed to determine the type of graphlet formed by every triplet of nodes in the network. Subsequently, graphlets occurring in one state of the network are compared to those formed by the same three nodes in another version of the network. Once the comparisons are performed, LoTo applies metrics from binary classification problems, calculated on the existence and absence of graphlets, to assess the topological similarity between both network states. Experiments performed on randomized networks demonstrate that GBMs are more sensitive to topological variation than the same metrics calculated on single edges. Additional comparisons with other common metrics demonstrate that our GBMs are capable of identifying nodes whose local topology changes between different states of the network. Notably, due to the explicit use of graphlets, LoTo captures topological variations that are disregarded by other approaches. LoTo is freely available as an online web server at http://dlab.cl/loto.
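The triplet comparison can be illustrated on undirected networks: classify the induced subgraph of every node triplet in each state as a path, a triangle, or disconnected, then count agreements and disagreements as true positives, false positives and false negatives. This is our own simplification for illustration; LoTo's actual graphlet catalogue (which may include directed graphlets) and its exact GBM definitions may differ, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set, Tuple

Edge = FrozenSet[str]


def graphlet_type(edges: Set[Edge], triple: Tuple[str, ...]) -> Optional[str]:
    """Classify the undirected induced subgraph on three nodes (None if disconnected)."""
    present = sum(frozenset(pair) in edges for pair in combinations(triple, 2))
    return {2: "path", 3: "triangle"}.get(present)


def compare_states(edges_a: Set[Edge], edges_b: Set[Edge], nodes: Set[str]) -> Dict[str, float]:
    """Score state B against reference state A using triplet graphlets."""
    tp = fp = fn = 0
    for triple in combinations(sorted(nodes), 3):
        ga, gb = graphlet_type(edges_a, triple), graphlet_type(edges_b, triple)
        if ga is not None and ga == gb:
            tp += 1  # same graphlet in both states
            continue
        if ga is not None:
            fn += 1  # graphlet of A lost or changed in B
        if gb is not None:
            fp += 1  # graphlet of B absent or different in A
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


def e(a: str, b: str) -> Edge:
    return frozenset((a, b))


triangle = {e("x", "y"), e("y", "z"), e("x", "z")}
path = {e("x", "y"), e("y", "z")}
print(compare_states(triangle, path, {"x", "y", "z"}))
```

Losing one edge turns the triangle into a path, so the single triplet counts as both a false negative and a false positive even though two of its three edges are unchanged; this is how graphlet-level scoring can be more sensitive than edge-level scoring.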
Background Cellular processes are controlled by gene-regulatory networks. Several computational methods are currently used to learn the structure of gene-regulatory networks from data. This study focuses on time series gene expression and gene knock-out data in order to identify the underlying network structure. We compare the performance of different network reconstruction methods using synthetic data generated from an ensemble of reference networks. Data requirements as well as optimal experiments for the reconstruction of gene-regulatory networks are investigated. Additionally, the impact of prior knowledge on network reconstruction as well as the effect of unobserved cellular processes is studied. Results We identify linear Gaussian dynamic Bayesian networks and variable selection based on F-statistics as suitable methods for the reconstruction of gene-regulatory networks from time series data. Commonly used discrete dynamic Bayesian networks perform worse, and this result can be attributed to the inevitable information loss caused by discretization of expression data. It is shown that short time series generated under transcription factor knock-out are the optimal experiments for revealing the structure of gene regulatory networks. Relative to the level of observational noise, we give estimates for the required amount of gene expression data in order to accurately reconstruct gene-regulatory networks. The benefit of using prior knowledge within a Bayesian learning framework is found to be limited to conditions of small gene expression data size. Unobserved processes, like protein-protein interactions, induce dependencies between gene expression levels similar to direct transcriptional regulation. We show that these dependencies cannot be distinguished from transcription factor mediated gene regulation on the basis of gene expression data alone. Conclusion Currently available data size and data quality make the reconstruction of
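Variable selection based on F-statistics can be illustrated, in its simplest form, by testing whether a candidate regulator's level at time t linearly predicts a target's level at time t + 1. The sketch below compares an intercept-only model with a one-regulator model; it is a single-candidate illustration under that lag-1 linear assumption, not the study's full selection procedure, and the function name is hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def lagged_f_statistic(regulator: Sequence[float], target: Sequence[float]) -> float:
    """F statistic for adding regulator(t) as a linear predictor of target(t + 1).

    Compares an intercept-only model with a one-regulator model; under the null
    hypothesis the statistic has 1 and n - 2 degrees of freedom (n lagged pairs).
    """
    x = list(regulator[:-1])
    y = list(target[1:])
    n = len(y)
    if n < 3:
        raise ValueError("need at least three lagged pairs")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("regulator is constant over the series")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    rss0 = sum((b - my) ** 2 for b in y)  # residuals of the intercept-only model
    rss1 = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    if rss1 == 0:
        return math.inf  # the regulator explains the target perfectly
    return (rss0 - rss1) / (rss1 / (n - 2))


reg = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tgt = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1]
print(lagged_f_statistic(reg, tgt))
```

Comparing the statistic with an F(1, n - 2) critical value decides whether the regulator enters the model; with the short series the abstract recommends, n - 2 is small and only strong dependencies pass.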
Silver, Debra L
The cerebral cortex controls our most distinguishing higher cognitive functions. Human-specific gene expression differences are abundant in the cerebral cortex, yet we have only begun to understand how these variations impact brain function. This review discusses the current evidence linking non-coding regulatory DNA changes, including enhancers, with neocortical evolution. Functional interrogation using animal models reveals converging roles for our genome in key aspects of cortical development including progenitor cell cycle and neuronal signaling. New technologies, including iPS cells and organoids, offer potential alternatives to modeling evolutionary modifications in a relevant species context. Several diseases rooted in the cerebral cortex uniquely manifest in humans compared to other primates, thus highlighting the importance of understanding human brain differences. Future studies of regulatory loci, including those implicated in disease, will collectively help elucidate key cellular and genetic mechanisms underlying our distinguishing cognitive traits. © 2015 WILEY Periodicals, Inc.
Soberano de Oliveira, Ana Paula; Patil, Kiran Raosaheb; Nielsen, Jens
is to use the topology of bio-molecular interaction networks in order to constrain the solution space. Such approaches systematically integrate the existing biological knowledge with the 'omics' data. Results: Here we introduce a hypothesis-driven method that integrates bio-molecular network topology...... Factors, Reporter Proteins and Reporter Complexes, and use this to decipher the logic of regulatory circuits playing a key role in yeast glucose repression and human diabetes. Conclusion: Reporter Features offer the opportunity to identify regulatory hot-spots in bio-molecular interaction networks...
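The excerpt is truncated and does not say how Reporter Features are scored. As a hedged illustration only, the sketch below follows the general recipe of the Reporter Metabolites approach as we understand it: convert per-gene p-values to z-scores, aggregate the z-scores of the genes attached to a feature, and standardize the aggregate against random gene sets of the same size. Whether Reporter Features uses exactly this scoring is an assumption; `reporter_scores` is a hypothetical name.

```python
import random
from statistics import NormalDist, mean, stdev


def reporter_scores(gene_p, features, n_samples=1000, seed=0):
    """Background-corrected aggregate z-score for each feature (hypothetical helper)."""
    nd = NormalDist()
    # Small p-values map to large positive z-scores; clip so inv_cdf stays finite.
    z = {g: nd.inv_cdf(1 - min(max(p, 1e-12), 1 - 1e-12)) for g, p in gene_p.items()}
    genes = list(z)
    rng = random.Random(seed)
    scores = {}
    for name, members in features.items():
        members = [g for g in members if g in z]
        k = len(members)
        if k == 0:
            continue
        z_feature = sum(z[g] for g in members) / k ** 0.5
        # Background: the same statistic for random gene sets of size k.
        background = [
            sum(z[g] for g in rng.sample(genes, k)) / k ** 0.5
            for _ in range(n_samples)
        ]
        spread = stdev(background)
        scores[name] = (z_feature - mean(background)) / spread if spread > 0 else 0.0
    return scores


gene_p = {"g1": 0.001, "g2": 0.002, "g3": 0.5, "g4": 0.6, "g5": 0.7, "g6": 0.9}
features = {"hot": ["g1", "g2"], "cold": ["g5", "g6"]}
print(reporter_scores(gene_p, features))
```

A feature whose genes are jointly more significant than random sets of the same size gets a positive score, which is the sense in which it marks a regulatory "hot-spot".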
The interplay between entropy and robustness of gene networks is a core mechanism of systems biology. Entropy is a measure of the randomness or disorder of a physical system; in gene regulatory networks it arises from random parameter fluctuations and environmental noise. The robustness of a gene regulatory network, which can be measured as its ability to tolerate random parameter fluctuations and to attenuate the effect of environmental noise, will be discussed from the perspective of robust H∞ stabilization and filtering. In this review, we also discuss their balancing roles in evolution and potential applications in systems and synthetic biology.
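As a worked illustration of H∞ noise attenuation (a textbook scalar example, not a model taken from this review), consider one gene whose protein level x is degraded at rate a > 0 and perturbed by environmental noise v(t), starting from x(0) = 0:

$$\dot{x}(t) = -a\,x(t) + v(t), \qquad G(s) = \frac{X(s)}{V(s)} = \frac{1}{s + a},$$

$$\|G\|_\infty = \sup_{\omega \in \mathbb{R}} |G(i\omega)| = \sup_{\omega} \frac{1}{\sqrt{\omega^2 + a^2}} = \frac{1}{a},$$

$$\int_0^\infty x^2(t)\,dt \le \rho^2 \int_0^\infty v^2(t)\,dt \ \text{ for every finite-energy } v \iff \rho \ge \frac{1}{a}.$$

In this toy model the best attainable attenuation level is ρ = 1/a, so a faster degradation rate gives stronger noise filtering.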
Carillo Caicedo, G.; Perez-Arriaga, I.J. [Univ. Pontificia Comillas, Madrid (Spain). Inst. de Investigacion Tecnologica]
Any given regulatory scheme for the electric power industry requires a consistent set of rules determining operation and pricing of the different electricity services, in order to guarantee optimal efficiency conditions. This paper determines the optimal operation rules for a distribution network under different regulatory environments, and specifically in a competitive setting. Using realistic examples and a novel reconfiguration algorithm, it is shown that imperfect regulations may result in widely different "optimal" network configurations, therefore emphasizing the need to avoid the introduction of perverse economic incentives in the remuneration of network distribution services. 6 refs, 5 figs
Windhager, Lukas; Zierer, Jonas; Küffner, Robert
Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate predictions for a gene
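The contrast between a single ensemble and group-wise ensembles can be sketched in Python. This is a simplified illustration, not the authors' procedure: instead of extending co-occurring pairs stepwise into characteristic interaction sets, it splits the predictions once on the single pair of interactions with the highest co-occurrence lift (an assumed dependency measure), then votes within each group. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ensemble(networks, min_freq=0.5):
    """Ensemble vote: keep interactions present in at least min_freq of the networks."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, c in counts.items() if c / len(networks) >= min_freq}


def strongest_pair(networks, min_support=2):
    """Pair of interactions whose co-occurrence has the highest lift (> 1), if any."""
    n = len(networks)
    single = Counter(edge for net in networks for edge in set(net))
    pairs = Counter(p for net in networks for p in combinations(sorted(set(net)), 2))
    best, best_lift = None, 1.0
    for (a, b), c in pairs.items():
        if c < min_support:
            continue
        lift = (c / n) / ((single[a] / n) * (single[b] / n))
        if lift > best_lift:
            best, best_lift = (a, b), lift
    return best


def group_ensembles(networks, min_freq=0.5):
    """Split networks on the strongest co-occurring pair, then vote within each group."""
    pair = strongest_pair(networks)
    if pair is None:
        return [ensemble(networks, min_freq)]
    has_pair = [net for net in networks if set(pair) <= set(net)]
    lacks_pair = [net for net in networks if not set(pair) <= set(net)]
    return [ensemble(group, min_freq) for group in (has_pair, lacks_pair) if group]


predicted = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C")},
    {("A", "D"), ("D", "C")},
    {("A", "D"), ("D", "C")},
]
print(sorted(ensemble(predicted)))  # all four interactions mixed into one network
print([sorted(g) for g in group_ensembles(predicted)])
```

The single ensemble merges two incompatible routes from A to C, while the group-ensembles recover each route as a distinct hypothesis.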
Errol A. Blake
Database security has evolved; data security professionals have developed numerous techniques and approaches to assure data confidentiality, integrity, and availability. This paper will show that Traditional Database Security, which has focused primarily on creating user accounts and managing user privileges on database objects, is not enough to protect data confidentiality, integrity, and availability. This paper, a compilation of different journals, articles and classroom discussions, focuses on unifying the process of securing data or information, whether it is in use, in storage or being transmitted. Promoting a change in Database Curriculum Development trends may also play a role in helping secure databases. This paper takes the approach that a conscientious effort to unify the Database Security process, which includes the Database Management System (DBMS) selection process, following regulatory compliance requirements, analyzing and learning from the mistakes of others, implementing Networking Security Technologies, and securing the database, may prevent database breaches.
James J. Lewis
Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq) annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution.
Singh, Navjot; Wade, Joseph T
The ability to map transcription start sites is critical for studies of gene regulation and for identification of novel RNAs. Conventional RNA-seq is often insufficient for identification of transcription start sites due to low coverage and/or RNA processing events. We have developed a highly sensitive, genome-scale method for detection of transcription start sites in bacteria. This method uses deep sequencing of cDNA libraries to identify transcription start sites with strand specificity at nucleotide resolution. Here, we describe the application of this method for transcription start site identification in Escherichia coli.
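The core idea of reading transcription start sites off sequenced cDNA ends can be sketched in Python. This is a hypothetical illustration, not the published pipeline: it assumes each aligned read's 5' end corresponds to the 5' end of the RNA, and it calls a TSS wherever at least `min_count` reads share a 5' end on the same strand. The library chemistry, enrichment steps and statistical thresholds of the real method are not described in the abstract.

```python
from collections import Counter


def call_tss(reads, min_count=5):
    """Call candidate TSSs from aligned reads.

    reads: iterable of (start, end, strand) with 1-based inclusive coordinates.
    Assumes each read's 5' end marks the first transcribed nucleotide.
    """
    counts = Counter()
    for start, end, strand in reads:
        # On the minus strand the 5' end is the right-most coordinate.
        five_prime = start if strand == "+" else end
        counts[(five_prime, strand)] += 1
    return sorted(site for site, c in counts.items() if c >= min_count)


reads = [(100, 150, "+")] * 6 + [(101, 160, "+")] * 2 + [(300, 350, "-")] * 5
print(call_tss(reads))  # [(100, '+'), (350, '-')]
```

Keeping the strand in the key is what gives strand-specific calls, and using single coordinates rather than windows is what gives nucleotide resolution.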
Ali, Joseph; Califf, Robert; Sugarman, Jeremy
PCORnet, the National Patient-Centered Clinical Research Network, seeks to establish a robust national health data network for patient-centered comparative effectiveness research. This article reports the results of a PCORnet survey designed to identify the ethics and regulatory challenges anticipated in network implementation. A 12-item online survey was developed by leadership of the PCORnet Ethics and Regulatory Task Force; responses were collected from the 29 PCORnet networks. The most pressing ethics issues identified related to informed consent, patient engagement, privacy and confidentiality, and data sharing. High priority regulatory issues included IRB coordination, privacy and confidentiality, informed consent, and data sharing. Over 150 IRBs and five different approaches to managing multisite IRB review were identified within PCORnet. Further empirical and scholarly work, as well as practical and policy guidance, is essential if important initiatives that rely on comparative effectiveness research are to move forward.
Background: Microarray data discretization is a basic preprocessing step for many algorithms of gene regulatory network inference. Some common discretization methods from informatics are used to discretize microarray data. The choice of discretization method is often arbitrary, and no systematic comparison of different discretization methods has been conducted in the context of gene regulatory network inference from time series gene expression data. Results: In this study, we propose a new discretization method, "bikmeans", and compare its performance with four other widely used discretization methods using different datasets, modeling algorithms and numbers of intervals. Sensitivities, specificities and total accuracies were calculated and statistical analysis was carried out. The bikmeans method always gave high total accuracies. Conclusions: Our results indicate that proper discretization methods can consistently improve gene regulatory network inference independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significantly better total accuracies than other methods.
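The abstract does not describe how bikmeans works, so the sketch below shows only a generic per-gene k-means discretization, one familiar way of turning continuous expression values into ordered levels before network inference. It is not bikmeans. The deterministic quantile initialization and the function name are our assumptions.

```python
def kmeans_discretize(values, k=3, max_iter=100):
    """Discretize one gene's expression profile into k ordered levels (0 = lowest)."""
    ordered = sorted(values)
    # Deterministic initialization at evenly spaced quantiles.
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Relabel clusters so that level numbers follow centroid order.
    rank = {j: r for r, j in enumerate(sorted(range(k), key=lambda j: centroids[j]))}
    return [rank[lab] for lab in labels]


profile = [0.1, 0.2, 5.0, 5.1, 9.9, 10.0]
print(kmeans_discretize(profile))  # [0, 0, 1, 1, 2, 2]
```

Unlike equal-width binning, the interval boundaries here adapt to where the gene's values actually cluster, which is one reason the choice of discretization method can change downstream inference.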
Lim, Wendell A.; Lee, Connie M.; Tang, Chao
A challenge in biology is to understand how complex molecular networks in the cell execute sophisticated regulatory functions. Here we explore the idea that there are common and general principles that link network structures to biological functions, principles that constrain the design solutions that evolution can converge upon for accomplishing a given cellular task. We describe approaches for classifying networks based on abstract architectures and functions, rather than on the specific molecular components of the networks. For any common regulatory task, can we define the space of all possible molecular solutions? Such inverse approaches might ultimately allow the assembly of a design table of core molecular algorithms that could serve as a guide for building synthetic networks and modulating disease networks. PMID:23352241
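Defining "the space of all possible molecular solutions" can be made concrete with a small enumeration. This is an assumption-laden sketch, not the authors' method: take three nodes (A receives the input, C is the output), let each of the nine possible directed links, self-loops included, be absent, activating (+1) or inhibiting (-1), and keep the topologies in which a signal can travel from A to C. A search for a specific function (for example, adaptation) would then simulate each surviving topology; that step is omitted.

```python
from itertools import product

NODES = ("A", "B", "C")
LINKS = [(src, dst) for src in NODES for dst in NODES]  # 9 directed links, self-loops included


def enumerate_topologies():
    """Yield every signed topology as {(src, dst): +1 or -1}; absent links are omitted."""
    for signs in product((0, 1, -1), repeat=len(LINKS)):
        yield {link: s for link, s in zip(LINKS, signs) if s != 0}


def connects(topology, src="A", dst="C"):
    """True if a directed path leads from src to dst."""
    seen, frontier = {src}, [src]
    while frontier:
        node = frontier.pop()
        for a, b in topology:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return dst in seen


all_topologies = list(enumerate_topologies())
candidates = [t for t in all_topologies if connects(t)]
print(len(all_topologies), len(candidates))  # 19683 16038
```

The count can be checked by hand: of the 3^9 = 19,683 topologies, 2·3^8 = 13,122 have a direct A→C link, and 2·2·3^6 = 2,916 lack it but route A→B→C, giving 16,038.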
Shahdoust, Maryam; Pezeshk, Hamid; Mahjub, Hossein; Sadeghi, Mehdi
Common topological features of the gene regulatory networks of related species suggest that the network of one species can be reconstructed using additional information from the gene expression profiles of related species. We present an algorithm named F-MAP that reconstructs the gene regulatory network by applying knowledge about gene interactions from related species. Our algorithm sets up a Bayesian framework to estimate the precision matrix of one species' microarray gene expression dataset in order to infer the Gaussian graphical model of the network. The conjugate Wishart prior is used, and information from related species is applied, via factor analysis, to estimate the hyperparameters of the prior distribution. Applying the proposed algorithm to six related species of Drosophila shows that the precision of the reconstructed networks is improved considerably compared with networks constructed by other Bayesian approaches.
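The conjugate step can be written out explicitly. What follows is the standard Wishart update for zero-mean Gaussian data, given as a worked illustration; F-MAP's factor-analysis estimate of the hyperparameters $V$ and $\nu$ from related species is not reproduced here. With $n$ zero-mean samples $x_1,\dots,x_n \in \mathbb{R}^p$ and precision matrix $\Omega$:

$$\Omega \sim \mathcal{W}_p(V, \nu), \qquad p(\Omega) \propto |\Omega|^{(\nu - p - 1)/2} \exp\!\left(-\tfrac{1}{2}\operatorname{tr}(V^{-1}\Omega)\right),$$

$$p(x_{1:n} \mid \Omega) \propto |\Omega|^{n/2} \exp\!\left(-\tfrac{1}{2}\operatorname{tr}(S\Omega)\right), \qquad S = \sum_{i=1}^{n} x_i x_i^{\top},$$

$$\Omega \mid x_{1:n} \sim \mathcal{W}_p\!\left((V^{-1} + S)^{-1},\ \nu + n\right), \qquad \hat{\Omega}_{\mathrm{MAP}} = (\nu + n - p - 1)\,(V^{-1} + S)^{-1} \quad (\nu + n \ge p + 1),$$

$$\rho_{ij} = -\frac{\hat{\omega}_{ij}}{\sqrt{\hat{\omega}_{ii}\,\hat{\omega}_{jj}}}.$$

Genes $i$ and $j$ are linked in the Gaussian graphical model when $\omega_{ij} \neq 0$; in practice, when the partial correlation $|\rho_{ij}|$ exceeds a chosen cut-off.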
Parker Joel S
Background: The proneural proteins Mash1 and Ngn2 are key cell-autonomous regulators of neurogenesis in the mammalian central nervous system, yet little is known about the molecular pathways regulated by these transcription factors. Results: Here we identify the downstream effectors of proneural genes in the telencephalon using a genomic approach to analyze the transcriptome of mice that either lack or overexpress proneural genes. Novel targets of Ngn2 and/or Mash1 were identified, such as members of the Notch and Wnt pathways, and proteins involved in adhesion and signal transduction. Next, we searched the non-coding sequence surrounding the predicted proneural downstream effector genes for evolutionarily conserved transcription factor binding sites associated with newly defined consensus binding sites for Ngn2 and Mash1. This allowed us to identify potential novel co-factors and co-regulators for proneural proteins, including Creb, Tcf/Lef, Pou-domain containing transcription factors, Sox9, and Mef2a. Finally, a gene regulatory network was delineated using a novel Bayesian-based algorithm that can incorporate information from diverse datasets. Conclusion: Together, these data shed light on the molecular pathways regulated by proneural genes and demonstrate that the integration of experimentation with bioinformatics can guide both hypothesis testing and hypothesis generation.
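The search for evolutionarily conserved binding sites can be sketched as follows. The assumptions are ours, not the study's: the two species' non-coding sequences are already aligned, and the canonical E-box CANNTG (the generic motif bound by bHLH factors such as Mash1 and Ngn2) stands in for the newly defined consensus sites, which the abstract does not give. A site counts as conserved when both aligned sequences carry it at the same alignment column. Function names are hypothetical.

```python
import re

# Canonical E-box CANNTG; the lookahead also reports overlapping matches.
EBOX = re.compile(r"(?=CA[ACGT]{2}TG)")


def ebox_columns(aligned_seq):
    """Alignment columns where an E-box starts (gap characters '-' never match)."""
    return {m.start() for m in EBOX.finditer(aligned_seq.upper())}


def conserved_eboxes(aligned_a, aligned_b):
    """Columns where both aligned sequences carry an E-box."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    return sorted(ebox_columns(aligned_a) & ebox_columns(aligned_b))


print(conserved_eboxes("AACAGCTGTT", "AACAGATGTT"))  # [2]
print(conserved_eboxes("AACAGCTGTT", "AAGAGATGTT"))  # []
```

Because CANNTG is its own reverse complement, scanning one strand is enough to find sites on both.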
Lin, Lu; Xu, Jian
Interest in thermophilic bacteria as live-cell catalysts in the biofuel and biochemical industries has surged in recent years, owing to their tolerance of high temperature and their wide spectrum of carbon sources, which include cellulose. However, their direct employment as microbial cell factories under highly demanding industrial conditions has been hindered by uncompetitive biofuel productivity, relatively low tolerance to solvent and osmotic stresses, and limited genome engineering tools. In this work we review recent advances in dissecting and engineering the metabolic and regulatory networks of thermophilic bacteria to improve the traits of key interest to the biofuel industry: cellulose degradation, pentose-hexose co-utilization, and tolerance of thermal, osmotic, and solvent stresses. Moreover, new technologies enabling more efficient genetic engineering of thermophiles are discussed, such as improved electroporation, ultrasound-mediated DNA delivery, thermo-stable plasmids and functional selection systems. Expanded applications of such technological advances in thermophilic microbes promise to substantiate a synthetic biology perspective, in which functional parts, modules, chassis, cells and consortia are modularly designed and rationally assembled for the many missions in industry and nature that demand the extraordinary talents of these extremophiles. Copyright © 2013 Elsevier Inc. All rights reserved.
Gu, Fei; Hsu, Hang-Kai; Hsu, Pei-Yin; Wu, Jiejun; Ma, Yilin; Parvin, Jeffrey; Huang, Tim H-M; Jin, Victor X
Global profiling of in vivo protein-DNA interactions using ChIP-based technologies has evolved rapidly in recent years. Although many genome-wide studies have identified thousands of ERα binding sites and have revealed the associated transcription factor (TF) partners, such as AP1, FOXA1 and CEBP, little is known about ERα-associated hierarchical transcriptional regulatory networks. In this study, we applied computational approaches to analyze three publicly available ChIP-based datasets (ChIP-seq, ChIP-PET and ChIP-chip) and to investigate the hierarchical regulatory network for ERα and ERα partner TFs in estrogen-dependent MCF7 breast cancer cells. Sixteen common TFs and two common new TF partners (RORA and PITX2) were found among the ChIP-seq, ChIP-chip and ChIP-PET datasets. The regulatory networks were constructed by scanning the ChIP-peak regions with TF-specific position weight matrices (PWMs). A permutation test was performed to test the reliability of each connection of the network. We then used the DREM software to perform gene ontology functional analysis on the common genes. We found that FOS, PITX2, RORA and FOXA1 were involved in the up-regulated genes. We also conducted ERα and Pol-II ChIP-seq experiments in tamoxifen-resistant MCF7 cells (denoted as MCF7-T in this study) and compared MCF7 and MCF7-T cells. The results showed very little overlap between the two cell lines in terms of targeted genes (21.2% of common genes) and targeted TFs (25% of common TFs). This significant dissimilarity may indicate very different transcriptional regulatory mechanisms in these two cancer cell lines. Our study uncovers new estrogen-mediated regulatory networks by mining three ChIP-based datasets in MCF7 cells and ChIP-seq data in MCF7-T cells. We compared the different ChIP-based technologies as well as different breast cancer cells. Our computational analytical approach may guide biologists to further study the underlying mechanisms in breast
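Scanning a ChIP peak with a PWM and testing the hit by permutation can be sketched in Python. Several details are assumptions rather than the study's settings: a log2-odds PWM with a uniform background and a pseudocount, forward-strand scanning only, and a permutation test that shuffles the peak sequence and asks how often the shuffled peak scores at least as well. The counts matrix below is a hypothetical AP-1-like TGACTCA motif; the function names are ours.

```python
import math
import random


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm


def best_score(pwm, seq):
    """Highest PWM score over all windows of seq (forward strand only)."""
    w = len(pwm)
    return max(sum(pwm[i][seq[j + i]] for i in range(w)) for j in range(len(seq) - w + 1))


def permutation_pvalue(pwm, seq, n_perm=1000, seed=0):
    """Empirical p-value: how often a shuffled peak scores at least as well."""
    rng = random.Random(seed)
    observed = best_score(pwm, seq)
    letters = list(seq)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(letters)
        if best_score(pwm, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


counts = [{b: 10} for b in "TGACTCA"]  # hypothetical: 10 observations per position
peak = "GGGGGTGACTCAGGGGGG"
score, p = permutation_pvalue(log_odds_pwm(counts), peak)
print(round(score, 2), p)
```

Shuffling keeps the peak's base composition, so the test asks whether the motif hit is unusual given that composition rather than against a uniform background.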
Background: The transcriptional regulatory network involved in the low-temperature response leading to acclimation has been established in Arabidopsis. In japonica rice, which can only withstand transient exposure to milder cold stress (10 °C), an oxidative-mediated network has been proposed to play a key role in configuring early responses and short-term defenses. The components, hierarchical organization and physiological consequences of this network were further dissected by a systems-level approach. Results: Regulatory clusters responding directly to oxidative signals were prominent during the initial 6 to 12 hours at 10 °C. Early events mirrored a typical oxidative response, based on striking similarities of the transcriptome to disease-, elicitor- and wounding-induced processes. Targets of oxidative-mediated mechanisms are likely regulated by several classes of bZIP factors acting on as1/ocs/TGA-like element enriched clusters, ERF factors acting on GCC-box/JAre-like element enriched clusters and R2R3-MYB factors acting on MYB2-like element enriched clusters. Temporal induction of several H2O2-induced bZIP, ERF and MYB genes coincided with the transient H2O2 spikes within the initial 6 to 12 hours. Oxidative-independent responses involve DREB/CBF, RAP2 and RAV1 factors acting on DRE/CRT/rav1-like enriched clusters and bZIP factors acting on ABRE-like enriched clusters. Oxidative-mediated clusters were activated earlier than ABA-mediated clusters. Conclusion: Genome-wide, physiological and whole-plant level analyses established a holistic view of the chilling stress response mechanism of japonica rice. The early-response regulatory network triggered by oxidative signals is critical for prolonged survival under sub-optimal temperature. Integration of stress and developmental responses leads to modulated growth and vigor maintenance, contributing to a delay of plastic injuries. © 2010 Yun et al.; licensee BioMed Central Ltd.
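Calling a co-expression cluster "enriched" for a cis-element usually rests on a statistic such as the one-sided hypergeometric test sketched below. The abstract does not state which test was used, so this is an assumed, generic formulation; the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_genome, n_with_element, n_cluster, n_cluster_with_element):
    """One-sided hypergeometric P(X >= observed) for cis-element enrichment in a cluster."""
    upper = min(n_with_element, n_cluster)
    hits = sum(
        comb(n_with_element, i) * comb(n_genome - n_with_element, n_cluster - i)
        for i in range(n_cluster_with_element, upper + 1)
    )
    return hits / comb(n_genome, n_cluster)


# Hypothetical numbers: 20,000 genes, 1,000 carry the element; a 50-gene cluster has 12.
print(enrichment_pvalue(20_000, 1_000, 50, 12))
```

With 1,000 of 20,000 promoters carrying the element, a 50-gene cluster would be expected to contain about 2.5 such genes, so observing 12 yields a very small p-value.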
High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities, be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results, since the observed data take the form of relative fractions of genes or species rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with the opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and we develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.
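The compositional artifact is easiest to see in its most extreme form, two taxa, where relative fractions must sum to one. The Python sketch below uses simulated data only. It shows that two independently drawn absolute abundances become perfectly negatively correlated once converted to fractions, while the variance of their log-ratio, the quantity SparCC builds on as we understand it, is unchanged by that conversion. It is not an implementation of SparCC.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def closure(sample):
    """Turn absolute abundances into relative fractions (what sequencing reports)."""
    total = sum(sample)
    return [v / total for v in sample]


def logratio_variance(samples, i, j):
    """Sample variance of log(x_i / x_j) across samples."""
    vals = [math.log(s[i] / s[j]) for s in samples]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


rng = random.Random(1)
# Two taxa whose absolute abundances are drawn independently of each other.
absolute = [[rng.uniform(1, 100), rng.uniform(1, 100)] for _ in range(50)]
relative = [closure(s) for s in absolute]

r_rel = pearson([s[0] for s in relative], [s[1] for s in relative])
print(round(r_rel, 6))  # -1.0: a spurious, perfectly negative correlation
print(logratio_variance(absolute, 0, 1), logratio_variance(relative, 0, 1))  # equal up to rounding
```

With more taxa the distortion is milder but not absent, which is consistent with the abstract's observation that low-diversity samples are affected most.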
Puberty is a complex physiological event by which animals mature into adults capable of sexual reproduction. To enhance our understanding of the genes and regulatory pathways and networks involved in puberty, we characterized the transcriptome of five reproductive tissues (i.e., hypothalamus, pituitary gland, ovary, uterus, and endometrium) as well as tissues known to be relevant to the growth and metabolism needed to achieve puberty (i.e., longissimus dorsi muscle, adipose, and liver). These tissues were collected from pre- and post-pubertal Brangus heifers (3/8 Brahman, Bos indicus × 5/8 Angus, Bos taurus) derived from a population of cattle used to identify quantitative trait loci associated with fertility traits (i.e., age of first observed corpus luteum (ACL), first service conception (FSC), and heifer pregnancy (HPG)). To exploit the power of complementary omics analyses, pre- and post-puberty co-expression gene networks were constructed by combining the results from genome-wide association studies (GWAS), RNA-Seq, and bovine transcription factors. The eight tissues from pre-pubertal and post-pubertal Brangus heifers revealed 1,515 differentially expressed and 943 tissue-specific genes among the 17,832 genes confirmed by RNA-Seq analysis. The hypothalamus showed the most notable up-regulation of genes with puberty (204 of 275 genes). Combining the results of GWAS and RNA-Seq, we identified 25 loci containing a single nucleotide polymorphism (SNP) associated with ACL, FSC, and/or HPG. Seventeen of these SNPs were within a gene, and 13 of those genes were expressed in the uterus or endometrium. Multi-tissue omics analyses revealed 2,450 co-expressed genes relative to puberty. The pre-pubertal network had 372,861 connections, whereas the post-pubertal network had 328,357 connections. A sub-network from this process revealed key transcriptional regulators (e.g., PITX2, FOXA1, DACH2, PROP1, SIX6). Results from these multi
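In its simplest form, a co-expression network connects gene pairs whose expression profiles are strongly correlated across samples. The sketch below uses plain Pearson correlation with a hypothetical |r| threshold. It is an illustrative baseline under our own assumptions, not the study's pipeline, which integrated GWAS, RNA-Seq and transcription factor information and may have used a different association measure.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def coexpression_network(expr, threshold=0.9):
    """Connect gene pairs whose |Pearson r| across samples reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],   # perfectly co-expressed with g1
    "g3": [5, 4, 3, 2, 1],    # perfectly anti-correlated with g1
    "g4": [1, 3, 2, 3, 1],    # uncorrelated with the others
}
edges = coexpression_network(expr)
print([(a, b) for a, b, _ in edges])  # [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]
```

Building one such network per condition (for example, pre- and post-pubertal samples) and comparing edge counts is the kind of contrast summarized by the connection totals above.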
BACKGROUND: Gene Regulatory Networks (GRNs) have become a major focus of interest in recent years. A number of reverse engineering approaches have been developed to help uncover the regulatory networks giving rise to the observed gene expression profiles. However, this is an ill-posed problem, because more than one genotype (network wiring) can give rise to the same phenotype. We refer to this phenomenon as "gene elasticity." In this work, we study the effect of this particular problem on the pure, data-driven inference of gene regulatory networks. METHODOLOGY: We simulated a four-gene network in order to produce "data" (protein levels) that we use in lieu of real experimental data. We then optimized the network connections between the four genes with a view to recovering the original network that gave rise to the data. We did this for two different cases: one in which only the network connections were optimized, and the other in which both the network connections and the kinetic parameters (given as reaction probabilities in our case) were estimated. We observed that multiple genotypes gave rise to very similar protein levels. Statistical experimentation indicates that it is impossible to differentiate between the different networks on the basis of equilibrium as well as dynamic data. CONCLUSIONS: We show explicitly that reverse engineering of GRNs from pure expression data is an indeterminate problem. Our results suggest that an inferential, purely data-driven approach is unsuitable for reverse engineering transcriptional networks when the gene regulatory networks display a certain level of complexity.
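The core of "gene elasticity" (different wirings, same phenotype) can be shown with a deliberately simple deterministic analogue: a two-gene linear ODE, not the study's four-gene stochastic model with reaction probabilities. In one wiring gene 1 activates gene 2; in the other, gene 2 activates itself instead. The matrices below are hypothetical. Solving (I − W)x* = u gives x* = (1, 2) for both wirings, so their steady-state protein levels cannot tell them apart.

```python
def simulate(W, u, decay=1.0, dt=0.01, steps=2000):
    """Euler integration of dx/dt = W x + u - decay * x for a two-gene network."""
    x = [0.0, 0.0]
    for _ in range(steps):
        dx = [
            sum(W[i][j] * x[j] for j in range(2)) + u[i] - decay * x[i]
            for i in range(2)
        ]
        x = [x[i] + dt * dx[i] for i in range(2)]
    return x


u = [1.0, 1.0]                         # basal production, identical for both wirings
cross = [[0.0, 0.0], [1.0, 0.0]]       # W[i][j]: effect of gene j on gene i; gene 1 -> gene 2
self_loop = [[0.0, 0.0], [0.0, 0.5]]   # gene 2 activates itself instead
print(simulate(cross, u), simulate(self_loop, u))  # both approach [1.0, 2.0]
```

The two wirings do differ in their transient approach to steady state, but in richer models with noise and more free parameters such differences can also be matched, which is the situation the study reports.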
Schwarz, J. M.
It is well known that genes that code for proteins regulate the expression of each other through protein-mediated interactions. With the discovery of microRNAs^1 (miRNAs), it has been conjectured that there are many such regulatory miRNAs in the cell that are never translated into proteins but are important for regulation and, hence, could explain the nature of the non-coding (or junk) DNA.^2 Furthermore, miRNAs are highly conserved molecules. So, just as genes that code for proteins form regulatory networks, we conjecture that miRNAs form a higher-level regulatory network amongst themselves, mediated by the protein-coding gene regulatory network, to form a complex organism. We investigate this conjecture within the framework of random Boolean networks, where the two-level architecture is modelled via two coupled random Boolean networks, with one network taking precedence over the other for various input/output values. Aspects of the evolution of the lower-level network will also be addressed. ^1 D. P. Bartel, Cell 116, 281 (2004). ^2 J. S. Mattick, Sci. Amer. 291, 60 (2004).
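The building block of this framework, a single random Boolean network, can be sketched in Python. This is the standard NK construction (each node reads k randomly chosen nodes through a random truth table, with synchronous updates), together with a routine that measures the length of the attractor cycle a trajectory falls into. The coupling and "precedence" rules between the two levels are not specified in the abstract and are omitted. One could, hypothetically, let some nodes of one network take their inputs from the other network's state.

```python
import random


def random_boolean_network(n, k, seed=0):
    """NK random Boolean network: each node reads k random nodes through a random truth table."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update: every node looks up its inputs' current values."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for j in node_inputs:
            index = (index << 1) | state[j]
        new_state.append(table[index])
    return tuple(new_state)


def attractor_length(state, inputs, tables):
    """Iterate until a state repeats; return the length of the cycle reached."""
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return t - first_seen[state]


inputs, tables = random_boolean_network(n=8, k=2, seed=42)
start = tuple(random.Random(7).randint(0, 1) for _ in range(8))
print(attractor_length(start, inputs, tables))
```

Because the state space is finite (2^n states) and the update is deterministic, every trajectory must eventually enter a cycle, so the loop always terminates.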
Aparicio, S; Morrison, A; Gould, A; Gilthorpe, J; Chaudhuri, C; Rigby, P; Krumlauf, R; Brenner, S
Comparative vertebrate genome sequencing offers a powerful method for detecting conserved regulatory sequences. We propose that the compact genome of the teleost Fugu rubripes is well suited for this purpose. The evolutionary distance of teleosts from other vertebrates offers the maximum stringency for such evolutionary comparisons. To illustrate the comparative genome approach for F. rubripes, we use sequence comparisons between mouse and Fugu Hoxb-4 noncoding regions to identify conserved sequence blocks. We have used two approaches to test the function of these conserved blocks. In the first, homologous sequences were deleted from a mouse enhancer, resulting in a tissue-specific loss of activity when assayed in transgenic mice. In the second approach, Fugu DNA sequences showing homology to mouse sequences were tested for enhancer activity in transgenic mice. This strategy identified a neural element that mediates a subset of Hoxb-4 expression that is conserved between mammals and teleosts. The comparison of noncoding vertebrate sequences with those of Fugu, coupled to a transgenic bioassay, represents a general approach suitable for many genome projects. PMID:7878040
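Identifying conserved sequence blocks in a mouse-Fugu comparison can be sketched as a sliding-window identity scan over a pre-computed pairwise alignment. The window size and identity cut-off below are hypothetical parameters, not values from the study, and the alignment step itself is assumed to have been done already.

```python
def conserved_blocks(aligned_a, aligned_b, window=10, min_identity=0.8):
    """Merge sliding windows whose identity reaches min_identity into (start, end) blocks."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    hits = []
    for i in range(len(aligned_a) - window + 1):
        matches = sum(
            1 for a, b in zip(aligned_a[i:i + window], aligned_b[i:i + window])
            if a == b and a != "-"
        )
        if matches / window >= min_identity:
            hits.append(i)
    blocks = []
    for i in hits:
        if blocks and i <= blocks[-1][1]:
            blocks[-1][1] = i + window  # overlapping or adjacent window: extend the block
        else:
            blocks.append([i, i + window])
    return [tuple(b) for b in blocks]


mouse = "AAAAAAAAAA" + "ACGTACGTAC"
fugu = "CCCCCCCCCC" + "ACGTACGTAC"
print(conserved_blocks(mouse, fugu))  # [(8, 20)]
```

Block boundaries are only accurate to within about one window. Here the reported block starts at column 8, although identity begins at column 10, because a window can pass the cut-off while it still overlaps two mismatched columns.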
Samal, Areejit; Wagner, Andreas; Martin, Olivier C
The ubiquity of modules in biological networks may result from an evolutionary benefit of a modular organization. For instance, modularity may increase the rate of adaptive evolution, because modules can be easily combined into new arrangements that may benefit their carrier. Conversely, modularity may emerge as a by-product of some trait. We here ask whether this last scenario may play a role in genome-scale metabolic networks that need to sustain life in one or more chemical environments. For such networks, we define a network module as a maximal set of reactions that are fully coupled, i.e., whose fluxes can only vary in fixed proportions. This definition overcomes limitations of purely graph based analyses of metabolism by exploiting the functional links between reactions. We call a metabolic network viable in a given chemical environment if it can synthesize all of an organism's biomass compounds from nutrients in this environment. An organism's metabolism is highly versatile if it can sustain life in many different chemical environments. We here ask whether versatility affects the modularity of metabolic networks. Using recently developed techniques to randomly sample large numbers of viable metabolic networks from a vast space of metabolic networks, we use flux balance analysis to study in silico metabolic networks that differ in their versatility. We find that highly versatile networks are also highly modular. They contain more modules and more reactions that are organized into modules. Most or all reactions in a module are associated with the same biochemical pathways. Modules that arise in highly versatile networks generally involve reactions that process nutrients or closely related chemicals. We also observe that the metabolism of E. coli is significantly more modular than even our most versatile networks. Our work shows that modularity in metabolic networks can be a by-product of functional constraints, e.g., the need to sustain life in multiple
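Full coupling, as defined above, can be checked on a toy stoichiometric matrix S (rows: metabolites, columns: reactions). This is a simplified sketch that assumes every reaction is reversible and ignores the uptake bounds and biomass objective of flux balance analysis. Under that assumption, two non-blocked reactions are fully coupled exactly when their rows in a nullspace basis of S are proportional, and modules are the resulting equivalence classes. With irreversibility constraints, as in the study, linear-programming-based flux coupling analysis is needed instead. The toy network and the function names are hypothetical; exact `Fraction` arithmetic avoids floating-point tolerances.

```python
from fractions import Fraction

def nullspace(S):
    """Exact basis of {v : S v = 0}, via reduced row echelon form over the rationals."""
    m = [[Fraction(x) for x in row] for row in S]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -m[row][f]
        basis.append(v)
    return basis

def fully_coupled_pairs(S):
    """Pairs of non-blocked reactions whose kernel rows are proportional."""
    basis = nullspace(S)
    n = len(S[0])
    d = len(basis)
    K = [[vec[i] for vec in basis] for i in range(n)]  # row i: reaction i
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if not any(K[i]) or not any(K[j]):
                continue  # a blocked reaction always carries zero flux
            # rows are proportional iff every 2x2 minor vanishes
            if all(K[i][a] * K[j][b] == K[i][b] * K[j][a]
                   for a in range(d) for b in range(d)):
                pairs.append((i, j))
    return pairs

def modules(S):
    """Group reactions into maximal fully coupled sets with union-find."""
    n = len(S[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, j in fully_coupled_pairs(S):
        parent[find(i)] = find(j)
    groups = {}
    for r in range(n):
        groups.setdefault(find(r), []).append(r)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical toy network. Rows: metabolites A, B.
# Columns: R0 uptake -> A, R1 A -> B, R2 B -> export, R3 A -> export.
S = [[1, -1, 0, -1],
     [0, 1, -1, 0]]
print(modules(S))  # [[1, 2]]
```

R1 and R2 form a module because B is produced only by R1 and consumed only by R2, whereas R0 is split between the two branches and is therefore not fully coupled to either.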
Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar
A gene regulatory network discloses the regulatory interactions amongst genes under a particular condition of the human body. Accurately reconstructing such networks from time-series gene expression data using computational tools poses a stiff challenge for contemporary computer scientists, and it is crucial for understanding how a living organism functions. Unfortunately, computational methods produce many false predictions along with the correct ones. Investigations in the domain therefore aim to identify as many correct regulations as possible when reverse engineering gene regulatory networks, to make the results more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we propose a novel scheme that decreases the number of false predictions by suitably combining several metaheuristic techniques. We have also implemented it using a dataset ensemble approach (i.e., combining multiple datasets). We applied the proposed methodology to real-world experimental datasets of the SOS DNA repair network of Escherichia coli and the IRMA network of Saccharomyces cerevisiae. Subsequently, we experimented on somewhat larger in silico networks, namely the DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we used four datasets in each experiment. The results are encouraging: for larger gene regulatory networks, the proposed methodology reduces the number of false predictions significantly without using any supplementary prior biological information. When a small amount of prior biological information is incorporated, the prediction of true positives improves further.
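The abstract does not spell out how the metaheuristics are combined. One simple way to trade recall for fewer false predictions is majority voting: keep a regulatory edge only if enough independent inference runs (or runs on different datasets) predict it. This is offered purely as an illustration and should not be read as the authors' scheme. The gene names, runs and gold standard below are hypothetical.

```python
from collections import Counter

def consensus_edges(predictions, min_votes):
    """Keep a directed edge (regulator, target) only if at least
    `min_votes` of the predicted networks contain it."""
    votes = Counter(edge for net in predictions for edge in set(net))
    return {edge for edge, count in votes.items() if count >= min_votes}

def count_tp_fp(predicted, gold):
    """True and false positives of a predicted edge set against a gold standard."""
    return len(predicted & gold), len(predicted - gold)

# Hypothetical toy data: three inference runs over genes g1..g3.
gold = {("g1", "g2"), ("g2", "g3"), ("g1", "g3")}
runs = [
    {("g1", "g2"), ("g2", "g3"), ("g3", "g1")},
    {("g1", "g2"), ("g1", "g3"), ("g2", "g1")},
    {("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g2")},
]
union = set().union(*runs)
print("union:", count_tp_fp(union, gold))                         # union: (3, 3)
print("consensus:", count_tp_fp(consensus_edges(runs, 2), gold))  # consensus: (3, 0)
```

Raising `min_votes` removes more false positives but eventually discards true edges as well (with `min_votes=3` only `("g1", "g2")` survives here).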
Fröhlich, Fabian; Kaltenbacher, Barbara; Theis, Fabian J; Hasenauer, Jan
Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, computational methods for analyzing ODE models that describe hundreds or thousands of biochemical species and reactions are still lacking. While individual simulations are feasible, the inference of the model ...
Droege, Gabriele; Barker, Katharine; Astrin, Jonas J; Bartels, Paul; Butler, Carol; Cantrill, David; Coddington, Jonathan; Forest, Félix; Gemeinholzer, Birgit; Hobern, Donald; Mackenzie-Dodds, Jacqueline; Ó Tuama, Éamonn; Petersen, Gitte; Sanjur, Oris; Schindel, David; Seberg, Ole
The Global Genome Biodiversity Network (GGBN) was formed in 2011 with the principal aim of making high-quality well-documented and vouchered collections that store DNA or tissue samples of biodiversity, discoverable for research through a networked community of biodiversity repositories. This is achieved through the GGBN Data Portal (http://data.ggbn.org), which links globally distributed databases and bridges the gap between biodiversity repositories, sequence databases and research results. Advances in DNA extraction techniques combined with next-generation sequencing technologies provide new tools for genome sequencing. Many ambitious genome sequencing projects with the potential to revolutionize biodiversity research consider access to adequate samples to be a major bottleneck in their workflow. This is linked not only to accelerating biodiversity loss and demands to improve conservation efforts but also to a lack of standardized methods for providing access to genomic samples. Biodiversity biobank-holding institutions urgently need to set a standard of collaboration towards excellence in collections stewardship, information access and sharing and responsible and ethical use of such collections. GGBN meets these needs by enabling and supporting accessibility and the efficient coordinated expansion of biodiversity biobanks worldwide.
Ravindran, Vandana; Sunitha, V.; Bagler, Ganesh
Cancer is characterized by a complex web of regulatory mechanisms which makes it difficult to identify features that are central to its control. Molecular integrative models of cancer, generated with the help of data from experimental assays, facilitate use of control theory to probe for ways of controlling the state of such a complex dynamic network. We modeled the human cancer signaling network as a directed graph and analyzed it for its controllability, identification of driver nodes and their characterization. We identified the driver nodes using the maximum matching algorithm and classified them as backbone, peripheral and ordinary based on their role in regulatory interactions and control of the network. We found that the backbone driver nodes were key to driving the regulatory network into cancer phenotype (via mutations) as well as for steering into healthy phenotype (as drug targets). This implies that while backbone genes could lead to cancer by virtue of mutations, they are also therapeutic targets of cancer. Further, based on their impact on the size of the set of driver nodes, genes were characterized as indispensable, dispensable and neutral. Indispensable nodes within backbone of the network emerged as central to regulatory mechanisms of control of cancer. In addition to probing the cancer signaling network from the perspective of control, our findings suggest that indispensable backbone driver nodes could be potentially leveraged as therapeutic targets. This study also illustrates the application of structural controllability for studying the mechanisms underlying the regulation of complex diseases.
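The maximum matching step can be sketched in the standard structural-controllability formulation. Each node gets an "out" copy and an "in" copy, every directed edge u -> v links out(u) to in(v), and nodes whose in-copy is left unmatched by a maximum matching are driver nodes; at least one driver is needed even when the matching is perfect. This minimal sketch uses Kuhn's augmenting-path algorithm and does not reproduce the backbone/peripheral/ordinary classification. Maximum matchings are generally not unique, so the driver set returned is one valid choice rather than the only one. Node names are hypothetical.

```python
def max_matching(nodes, edges):
    """Maximum matching in the bipartite graph that has an 'out' copy and an
    'in' copy of every node, where a directed edge u -> v links out(u) to in(v).
    Uses Kuhn's augmenting-path algorithm; returns {matched in-node: out-node}."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
    match_in = {}

    def try_augment(u, visited):
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-routed elsewhere
            if v not in match_in or try_augment(match_in[v], visited):
                match_in[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())
    return match_in

def driver_nodes(nodes, edges):
    """Nodes whose in-copy is unmatched; if the matching is perfect, a single
    (arbitrary) driver node is still needed."""
    matched = max_matching(nodes, edges)
    unmatched = [v for v in nodes if v not in matched]
    return unmatched if unmatched else [nodes[0]]

# Usage on two toy motifs.
print(driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")]))  # ['a']
print(driver_nodes(["h", "x", "y", "z"], [("h", "x"), ("h", "y"), ("h", "z")]))  # ['h', 'y', 'z']
```

A chain is controllable from its head alone, while a hub fanning out to three targets needs three drivers (N - |M| = 4 - 1), illustrating why the number of driver nodes depends on network topology.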
Ren, Hai-Peng; Huang, Xiao-Na; Hao, Jia-Xuan
Robust adaptation plays a key role in gene regulatory networks and is thought to be an important attribute for organisms or cells to survive in fluctuating conditions. In this paper, a simplified three-node enzyme network is modeled with Michaelis-Menten rate equations for all possible topologies, and a family of topologies and corresponding parameter sets with satisfactory adaptation are obtained using a multi-objective genetic algorithm. The proposed approach improves computational efficiency significantly compared with time-consuming exhaustive search. It provides a systematic way to search for feasible topologies and corresponding parameter sets that give gene regulatory networks robust adaptation. Owing to its universality and simplicity, the proposed methodology can be used to address more complex issues in biological networks.
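At the core of a multi-objective genetic algorithm is Pareto dominance: a candidate topology/parameter set is kept if no other candidate is at least as good on every objective and strictly better on at least one. The sketch below implements only that filter. It assumes two objectives to be maximized and labels them sensitivity and precision purely for illustration, since the abstract does not name its objectives. The scores are made up, and the Michaelis-Menten simulation, crossover and mutation steps are omitted.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    return [name for name, score in candidates.items()
            if not any(dominates(other, score)
                       for other_name, other in candidates.items()
                       if other_name != name)]

# Made-up (sensitivity, precision) scores for four hypothetical topologies.
candidates = {
    "topology_A": (1.2, 8.0),
    "topology_B": (0.9, 12.0),
    "topology_C": (0.8, 7.0),   # dominated by topology_A
    "topology_D": (1.5, 3.0),
}
print(pareto_front(candidates))  # ['topology_A', 'topology_B', 'topology_D']
```

Returning the whole non-dominated family, rather than a single best score, matches the abstract's aim of obtaining "a family of topologies" with satisfactory adaptation.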
Robin P Smith
Inter-individual variation in gene regulatory elements is hypothesized to play a causative role in adverse drug reactions and reduced drug activity. However, relatively little is known about the location and function of drug-dependent elements. To uncover drug-associated elements in a genome-wide manner, we performed RNA-seq and ChIP-seq using antibodies against the pregnane X receptor (PXR) and three active regulatory marks (p300, H3K4me1, H3K27ac) on primary human hepatocytes treated with rifampin or vehicle control. Rifampin and PXR were chosen since they are part of the CYP3A4 pathway, which is known to account for the metabolism of more than 50% of all prescribed drugs. We selected 227 proximal promoters for genes with rifampin-dependent expression or nearby PXR/p300 occupancy sites and assayed their ability to induce luciferase in rifampin-treated HepG2 cells, finding only 10 (4.4%) that exhibited drug-dependent activity. As this result suggested a role for distal enhancer modules, we searched more broadly to identify 1,297 genomic regions bearing a conditional PXR occupancy as well as all three active regulatory marks. These regions are enriched near genes that function in the metabolism of xenobiotics, specifically members of the cytochrome P450 family. We performed enhancer assays in rifampin-treated HepG2 cells for 42 of these sequences as well as 7 sequences that overlap linkage-disequilibrium blocks defined by lead SNPs from pharmacogenomic GWAS studies, revealing 15/42 and 4/7 to be functional enhancers, respectively. A common African haplotype in one of these enhancers in the GSTA locus was found to exhibit potential rifampin hypersensitivity. Combined, our results further suggest that enhancers are the predominant targets of rifampin-induced PXR activation, provide a genome-wide catalog of PXR targets and serve as a model for the identification of drug-responsive regulatory elements.
Poultney, Christopher S.; Greenfield, Alex; Bonneau, Richard
Regulatory and signaling networks coordinate the enormously complex interactions and processes that control cellular processes (such as metabolism and cell division), coordinate response to the environment, and carry out multiple cell decisions (such as development and quorum sensing). Regulatory network inference is the process of inferring these networks, traditionally from microarray data but increasingly incorporating other measurement types such as proteomics, ChIP-seq, metabolomics, and mass cytometry. We discuss existing techniques for network inference. We review in detail our pipeline, which consists of an initial biclustering step, designed to estimate co-regulated groups; a network inference step, designed to select and parameterize likely regulatory models for the control of the co-regulated groups from the biclustering step; and a visualization and analysis step, designed to find and communicate key features of the network. Learning biological networks from even the most complete data sets is challenging; we argue that integrating new data types into the inference pipeline produces networks of increased accuracy, validity, and biological relevance. PMID:22482944
We investigate in this paper reverse engineering of gene regulatory networks from time-series microarray data. We apply dynamic Bayesian networks (DBNs) for modeling cell cycle regulations. In developing a network inference algorithm, we focus on soft solutions that can provide a posteriori probability (APP) of network topology. In particular, we propose a variational Bayesian structural expectation maximization algorithm that can learn the posterior distribution of the network model parameters and topology jointly. We also show how the obtained APPs of the network topology can be used in a Bayesian data integration strategy to integrate two different microarray data sets. The proposed VBSEM algorithm has been tested on yeast cell cycle data sets. To evaluate the confidence of the inferred networks, we apply a moving block bootstrap method. The inferred network is validated by comparing it to the KEGG pathway map.
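The moving block bootstrap used above to assess confidence resamples a time series in contiguous, overlapping blocks rather than as independent points, so short-range temporal dependence survives within each block. A minimal sketch, assuming the data are a list of time points and treating `block_len` as a free choice (the abstract does not report the value used); the DBN/VBSEM fitting that would be rerun on each replicate is not shown, and the function name is hypothetical.

```python
import random

def moving_block_bootstrap(series, block_len, rng):
    """Resample a time series by concatenating randomly chosen overlapping
    blocks of `block_len` consecutive time points, trimmed to the original
    length. Each element of `series` may be a scalar or a tuple holding all
    genes measured at one time point, so cross-gene structure is preserved."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    starts = range(n - block_len + 1)  # every admissible block start
    resampled = []
    while len(resampled) < n:
        s = rng.choice(starts)
        resampled.extend(series[s:s + block_len])
    return resampled[:n]

# Usage: one bootstrap replicate of a hypothetical 3-gene, 8-time-point series.
data = [(t, 2 * t, t % 3) for t in range(8)]
replicate = moving_block_bootstrap(data, block_len=3, rng=random.Random(0))
print(replicate)
```

Confidence in an inferred edge can then be estimated as the fraction of replicates in which it is recovered; longer blocks keep more temporal structure but yield less varied replicates.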
Íñiguez, Luis P; Nova-Franco, Bárbara; Hernández, Georgina
The intricate regulatory network for floral organogenesis in plants, which includes AP2/ERF, SPL and AGL transcription factors and miR172 and miR156 along with other components, is well documented, though its complexity and size keep increasing. The miR172/AP2 node was recently proposed as an essential regulator of the legume-rhizobia nitrogen-fixing symbiosis. Research from our group helped demonstrate the control of common bean (Phaseolus vulgaris) nodulation by miR172c/AP2-1; however, no other components of this regulatory network have been reported. Here we propose AGLs as new protagonists in the regulation of common bean nodulation and discuss the relevance of future deeper analysis of the complex AP2 regulatory network for nodule organogenesis in legumes.
In this Data in Brief we detail the contents and quality controls for the gene expression data (available from the NCBI Gene Expression Omnibus repository under accession number GSE53091) associated with our study published in Genomics (Olsen et al. 2014). We also provide R code to access the data and reproduce the analysis presented in this article.
Background Reverse engineering of gene regulatory networks can be used to predict regulatory interactions of an organism faced with environmental changes, but can prove problematic, especially when focusing on complicated multi-factorial processes. Candida albicans is a major human fungal pathogen. During the infection process, this fungus is able to adapt to conditions of very low iron availability. Such adaptation is an important virulence attribute of virtually all pathogenic microbes. Understanding the regulation of iron acquisition genes will extend our knowledge of the complex regulatory changes during the infection process and might identify new potential drug targets. Thus, there is a need for efficient modelling approaches predicting key regulatory events of iron acquisition genes during the infection process. Results This study deals with the regulation of C. albicans iron uptake genes during adhesion to and invasion into human oral epithelial cells. A reverse engineering strategy is presented, which is able to infer regulatory networks on the basis of gene expression data, making use of relevant selection criteria such as sparseness and robustness. An exhaustive use of available knowledge from different data sources improved the network prediction. The predicted regulatory network proposes a number of new target genes for the transcriptional regulators Rim101, Hap3, Sef1 and Tup1. Furthermore, the molecular mode of action for Tup1 is clarified. Finally, regulatory interactions between the transcription factors themselves are proposed. This study presents a model describing how C. albicans may regulate iron acquisition during contact with and invasion of human oral epithelial cells. There is evidence that some of the proposed regulatory interactions might also occur during oral infection. Conclusions This study focuses on a typical problem in Systems Biology where an interesting biological phenomenon is studied using a small
Monzón-Sandoval, Jimena; Castillo-Morales, Atahualpa; Urrutia, Araxi O; Gutierrez, Humberto
During early development of the nervous system, gene expression patterns are known to vary widely depending on the specific developmental trajectories of different structures. Observable changes in gene expression profiles throughout development are determined by an underlying network of precise regulatory interactions between individual genes. Elucidating the organizing principles that shape this gene regulatory network is one of the central goals of developmental biology. Whether the developmental programme is the result of a dynamic driven by a fixed architecture of regulatory interactions, or alternatively, the result of waves of regulatory reorganization, is not known. Here we contrast these two alternative models by examining existing expression data derived from the developing human brain in prenatal and postnatal stages. We reveal a sharp change in gene expression profiles at birth across brain areas. This sharp division between foetal and postnatal profiles is not the result of pronounced changes in the expression level of existing gene networks. Instead, we demonstrate that the perinatal transition is marked by widespread regulatory rearrangement within and across existing gene clusters, leading to the emergence of new functional groups. This rearrangement is itself organized into discrete blocks of genes, each targeted by a distinct set of transcriptional regulators and associated with specific biological functions. Our results provide evidence of an acute modular reorganization of the regulatory architecture of the brain transcriptome occurring at birth, reflecting the reassembly of new functional associations required for the normal transition from prenatal to postnatal brain development.
Droege, G.; Barker, K.; Seberg, O.; Coddington, J.; Benson, E.; Berendsohn, W. G.; Bunk, B.; Butler, C.; Cawsey, E. M.; Deck, J.; Döring, M.; Flemons, P.; Gemeinholzer, B.; Güntsch, A.; Hollowell, T.; Kelbert, P.; Kostadinov, I.; Kottmann, R.; Lawlor, R. T.; Lyal, C.; Mackenzie-Dodds, J.; Meyer, C.; Mulcahy, D.; Nussbeck, S. Y.; O'Tuama, É.; Orrell, T.; Petersen, G.; Robertson, T.; Söhngen, C.; Whitacre, J.; Wieczorek, J.; Yilmaz, P.; Zetzsche, H.; Zhang, Y.; Zhou, X.
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today’s ever-increasing bioinformatic era, so that complementary data aggregators can easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard PMID:27694206
© The Author(s) 2016. Published by Oxford University Press.
Potapov, Anatolij P; Goemann, Björn; Wingender, Edgar
Currently, there is a gap between purely theoretical studies of the topology of large bioregulatory networks and the practical traditions and interests of experimentalists. While theoretical approaches emphasize the global characterization of regulatory systems, practical approaches focus on the role of distinct molecules and genes in regulation. To bridge the gap between these opposite approaches, one needs to combine 'general' with 'particular' properties and translate abstract topological features of large systems into testable functional characteristics of individual components. Here, we propose a new topological parameter, the pairwise disconnectivity index of a network's element, that is capable of such bridging. The pairwise disconnectivity index quantifies how crucial an individual element is for sustaining the communication ability between connected pairs of vertices in a network displayed as a directed graph. Such an element might be a vertex (e.g., molecules, genes), an edge (e.g., reactions, interactions), or a group of vertices and/or edges. The index can be viewed as a measure of the topological redundancy of regulatory paths that connect different parts of a given network and as a measure of the sensitivity (robustness) of this network to the presence (absence) of each individual element. Accordingly, we introduce the notion of the path-degree of a vertex in terms of its incoming, outgoing and mediated paths. The pairwise disconnectivity index has been applied to the analysis of several regulatory networks from various organisms. The importance of an individual vertex or edge for the coherence of the network is determined by the particular position of the given element in the whole network. Our approach makes it possible to evaluate the effect of removing each element (i.e., vertex, edge, or their combinations) from a network. The greatest potential value of this approach is its ability to systematically analyze the
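The sketch below computes the vertex form of the index under our reading of the definition: Det(v) = (N0 - N(-v)) / N0. Here N0 is the number of ordered pairs of distinct vertices joined by at least one directed path, and N(-v) is the number still joined once v and its edges are removed, so pairs involving v count as disconnected. This reading and the function names are assumptions. Reachability is found by breadth-first search from every vertex, which suits small networks rather than genome-scale ones.

```python
from collections import deque

def reachable_pairs(nodes, edges, removed=None):
    """Count ordered pairs (u, w), u != w, joined by a directed path u -> w,
    after deleting the vertex `removed` (if given) and its edges."""
    adj = {u: [] for u in nodes}
    for u, w in edges:
        if removed not in (u, w):
            adj[u].append(w)
    count = 0
    for source in nodes:
        if source == removed:
            continue
        seen = {source}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        count += len(seen) - 1  # exclude the source itself
    return count

def disconnectivity_index(nodes, edges, v):
    """Fraction of originally connected ordered pairs lost when v is removed."""
    n0 = reachable_pairs(nodes, edges)
    if n0 == 0:
        return 0.0
    return (n0 - reachable_pairs(nodes, edges, removed=v)) / n0

nodes = ["a", "b", "c"]
chain = [("a", "b"), ("b", "c")]
bypass = chain + [("a", "c")]
print(disconnectivity_index(nodes, chain, "b"))   # 1.0
print(disconnectivity_index(nodes, bypass, "b"))  # 0.6666666666666666
```

In the chain a -> b -> c, removing b disconnects every pair. Adding the bypass a -> c lowers its index to 2/3, which is the kind of path redundancy the index is meant to capture.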
Lapointe, Christopher P; Preston, Melanie A; Wilinski, Daniel; Saunders, Harriet A J; Campbell, Zachary T; Wickens, Marvin
A single protein can bind and regulate many mRNAs. Multiple proteins with similar specificities often bind and control overlapping sets of mRNAs. Yet little is known about the architecture or dynamics of overlapped networks. We focused on three proteins with similar structures and related RNA-binding specificities: Puf3p, Puf4p, and Puf5p of S. cerevisiae. Using RNA Tagging, we identified a "super-network" composed of four subnetworks: the Puf3p, Puf4p, and Puf5p subnetworks, and one controlled by both Puf4p and Puf5p. The architecture of individual subnetworks, and thus of the super-network, is determined by competition among particular PUF proteins to bind mRNAs, their affinities for binding elements, and the abundances of the proteins. The super-network responds dramatically: The remaining network can either expand or contract. These strikingly opposite outcomes are determined by an interplay between the relative abundance of the RNAs and proteins, and their affinities for one another. The diverse interplay between overlapping RNA-protein networks provides versatile opportunities for regulation and evolution. © 2017 Lapointe et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Zahadat, Payam; Christensen, David Johan; Schultz, Ulrik Pagh
Designing controllers for modular robots is difficult due to the distributed and dynamic nature of the robots. In this paper fractal gene regulatory networks are evolved to control modular robots in a distributed way. Experiments with different morphologies of modular robot are performed and the ...
Zahadat, Payam; Christensen, David Johan; Katebi, Serajeddin
In this paper we study fractal gene regulatory network (FGRN) controllers based on sensory information. The FGRN controllers are evolved to control a snake robot consisting of seven simulated ATRON modules. Each module contains three tilt sensors which represent the direction of gravity in the ...
Sorek, Matan; Balaban, Nathalie Q; Loewenstein, Yonatan
It is generally believed that associative memory in the brain depends on multistable synaptic dynamics, which enable the synapses to maintain their value for extended periods of time. However, multistable dynamics are not restricted to synapses. In particular, the dynamics of some genetic regulatory networks are multistable, raising the possibility that even single cells, in the absence of a nervous system, are capable of learning associations. Here we study a standard genetic regulatory network model with bistable elements and stochastic dynamics. We demonstrate that such a genetic regulatory network model is capable of learning multiple, general, overlapping associations. The capacity of the network, defined as the number of associations that can be simultaneously stored and retrieved, is proportional to the square root of the number of bistable elements in the genetic regulatory network. Moreover, we compute the capacity of a clonal population of cells, such as in a colony of bacteria or a tissue, to store associations. We show that even if the cells do not interact, the capacity of the population to store associations substantially exceeds that of a single cell and is proportional to the number of bistable elements. Thus, we show that even single cells are endowed with the computational power to learn associations, a power that is substantially enhanced when these cells form a population.
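A minimal illustration of a bistable element, assuming a single self-activating gene with deterministic dynamics dx/dt = beta * x^2 / (1 + x^2) - x and the assumed value beta = 3. This is not the paper's model, which is stochastic; noise is omitted here. The point is that the initial condition decides which stable state the gene settles into, and that state is the "memory" a bistable element can hold.

```python
def simulate(x0: float, beta: float = 3.0, dt: float = 0.01, t_end: float = 50.0) -> float:
    """Forward-Euler integration of dx/dt = beta * x**2 / (1 + x**2) - x.

    Illustrative self-activating gene with Hill coefficient 2 (assumed model).
    For beta > 2 it has two stable states, x = 0 and
    x = (beta + sqrt(beta**2 - 4)) / 2, separated by an unstable threshold
    at (beta - sqrt(beta**2 - 4)) / 2.
    """
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (beta * x * x / (1.0 + x * x) - x)
    return x

low = simulate(0.2)   # starts below the threshold (~0.382) and decays toward 0
high = simulate(1.0)  # starts above the threshold and settles near 2.618
```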
Patrick E. Meyer
The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists of selecting, among the least redundant variables, the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large-scale (up to several thousand genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
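The MRMR selection step and the MRNET pairing rule can be sketched as follows. This is a simplified illustration, not the reference implementation: it assumes the pairwise mutual-information (MI) matrix has already been estimated (the estimator is not shown), ranks every candidate gene, and clips negative scores to zero so that weak edges drop out.

```python
from typing import List

Matrix = List[List[float]]

def mrmr_scores(mi: Matrix, target: int) -> List[float]:
    """Greedy MRMR ranking of every other gene against one target gene.

    Entry i of the result is the score gene i had when it was selected:
    relevance I(X_i; Y) minus mean redundancy with already-selected genes.
    """
    n = len(mi)
    candidates = [i for i in range(n) if i != target]
    selected: List[int] = []
    scores = [0.0] * n
    while candidates:
        best, best_score = candidates[0], float("-inf")
        for i in candidates:
            relevance = mi[i][target]
            redundancy = (sum(mi[i][k] for k in selected) / len(selected)) if selected else 0.0
            if relevance - redundancy > best_score:
                best, best_score = i, relevance - redundancy
        scores[best] = best_score
        selected.append(best)
        candidates.remove(best)
    return scores

def mrnet(mi: Matrix) -> Matrix:
    """Symmetric edge weights: the larger of the two MRMR scores of each pair,
    clipped at zero so that pairs explained away by redundancy get no edge."""
    n = len(mi)
    per_target = [mrmr_scores(mi, j) for j in range(n)]  # per_target[j][i]: gene i vs target j
    return [[0.0 if i == j else max(0.0, per_target[j][i], per_target[i][j])
             for j in range(n)] for i in range(n)]

# Toy MI matrix: gene 0 shares information with genes 1 and 2;
# the 1-2 association is largely explained by gene 0.
mi = [[0.0, 0.9, 0.8],
      [0.9, 0.0, 0.7],
      [0.8, 0.7, 0.0]]
w = mrnet(mi)
```

Here the direct pairs 0-1 and 0-2 keep weights 0.9 and 0.8, while the indirect 1-2 pair is penalized by its redundancy with gene 0 and receives no edge.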
Jayter Silva Paula
PURPOSE: To describe the procedures used in developing Clinical and Regulatory Protocols for primary care teams to use in managing the most common scenarios of impaired vision in Southern Brazil. METHODS: A retrospective review of 1,333 referral forms from all primary care practitioners in Ribeirão Preto city was performed over a 30-day period. The major ophthalmic diagnostic categories were evaluated from these referral forms. The Clinical and Regulatory Protocols were then developed through scientific cooperation between a university and the health care system, in the form of workshops attended by primary care practitioners and regulatory system team members composed of health care administrators, ophthalmologists, and professors of ophthalmology and social medicine. RESULTS: The management of impaired vision was chosen as the theme, since it accounted for 43.6% of the ophthalmology-related referrals from primary care providers of Ribeirão Preto. The Clinical and Regulatory Protocols developed involve distinctive diagnostic and therapeutic interventions that can be performed at the primary care level and in different health care settings. The most relevant clinical and regulatory interventions were expressed as algorithms in order to facilitate the use of the Clinical and Regulatory Protocols by health care practitioners. CONCLUSIONS: These Clinical and Regulatory Protocols could represent a useful tool for health systems with universal access, as well as for health care networks based on primary care and for regulatory system teams. Implementation of these Clinical and Regulatory Protocols can minimize the disparity between the needs of patients with impaired vision and the treatment modalities offered, resulting in a more cooperative health care network.
Background Hepatocellular carcinoma (HCC) is one of the most fatal cancers in the world, and metastasis is a significant cause of the high mortality in patients with HCC. However, the molecular mechanism behind HCC metastasis is not fully understood. Studying regulatory networks may help investigate HCC metastasis through systems biology profiling. Methods By utilizing both sequence information and parallel microRNA (miRNA) and mRNA expression data on the same cohort of HBV-related HCC patients with or without venous metastasis, we constructed combinatorial regulatory networks of non-metastatic and metastatic HCC which contain transcription factor (TF) regulation and miRNA regulation. Differential regulation patterns, classifying marker modules, and key regulatory miRNAs were analyzed by comparing the non-metastatic and metastatic networks. Results Globally, TFs accounted for most of the regulation, while miRNAs accounted for a minor part. However, miRNAs played a more active role in the metastatic network than in the non-metastatic one. Seventeen differential regulatory modules discriminative of metastatic status were identified as a cumulative-module classifier, which could also distinguish survival time. MiR-16, miR-30a, Let-7e and miR-204 were identified as key miRNA regulators contributing to HCC metastasis. Conclusion In this work we demonstrated an integrative approach to differential combinatorial regulatory network analysis in the specific context of venous metastasis of HBV-HCC. Our results suggest possible transcriptional regulatory patterns underlying the different metastatic subgroups of HCC. The workflow in this study can be applied in similar contexts of cancer research and could also be extended to other clinical topics. PMID:23282077
Fröhlich, Fabian; Kaltenbacher, Barbara; Theis, Fabian J; Hasenauer, Jan
Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, computational methods for analyzing ODE models that describe hundreds or thousands of biochemical species and reactions are still lacking. While individual simulations are feasible, inferring the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large-scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics.
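The efficiency argument can be illustrated on a toy model. The sketch assumes a one-species model dx/dt = p0 - p1 * x integrated with explicit Euler, and computes the gradient of a least-squares objective with time-discrete measurements by a reverse (discrete adjoint) sweep through the Euler steps. It stands in for, and is not, the continuous adjoint formulation evaluated in the paper; the model, step size, and measurement values are made up. One forward and one backward sweep yield the whole gradient, whereas central finite differences would need two extra simulations per parameter.

```python
from typing import Dict, List, Sequence

def simulate(p: Sequence[float], x0: float, h: float, n_steps: int) -> List[float]:
    """Explicit-Euler trajectory of dx/dt = p0 - p1 * x (production minus degradation)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + h * (p[0] - p[1] * xs[-1]))
    return xs

def loss(p: Sequence[float], x0: float, h: float, n_steps: int, data: Dict[int, float]) -> float:
    """Least-squares objective over time-discrete measurements {step index: observed value}."""
    xs = simulate(p, x0, h, n_steps)
    return 0.5 * sum((xs[k] - y) ** 2 for k, y in data.items())

def adjoint_gradient(p: Sequence[float], x0: float, h: float, n_steps: int,
                     data: Dict[int, float]) -> List[float]:
    """Gradient of `loss` w.r.t. (p0, p1): one forward sweep, one backward sweep."""
    xs = simulate(p, x0, h, n_steps)
    lam = 0.0          # adjoint state: total derivative dJ/dx_n
    grad = [0.0, 0.0]
    for n in range(n_steps, 0, -1):
        if n in data:  # a measurement at step n adds its residual to the adjoint
            lam += xs[n] - data[n]
        # Step n-1 -> n is x_n = x_{n-1} + h * (p0 - p1 * x_{n-1}).
        grad[0] += lam * h                 # explicit dx_n/dp0
        grad[1] += lam * (-h * xs[n - 1])  # explicit dx_n/dp1
        lam *= 1.0 - h * p[1]              # chain rule through dx_n/dx_{n-1}
    return grad

params, x0, h, n_steps = (1.0, 0.5), 0.0, 0.1, 40
data = {10: 0.7, 20: 1.2, 40: 1.8}  # made-up measurements
grad = adjoint_gradient(params, x0, h, n_steps, data)
```

Comparing `grad` against central finite differences of `loss` is a quick consistency check for an adjoint implementation.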
Batt, Gregory; Page, Michel; Cantone, Irene; Goessler, Gregor; Monteiro, Pedro; de Jong, Hidde
Investigating the relation between the structure and behavior of complex biological networks often involves asking whether the hypothesized structure of a regulatory network is consistent with the observed behavior, or whether a proposed structure can generate a desired behavior. These questions can be cast as a parameter search problem for qualitative models of regulatory networks. We develop a method based on symbolic model checking that avoids enumerating all possible parametrizations, and show that this method performs well on real biological problems, using the IRMA synthetic network and benchmark datasets. We test the consistency between IRMA and time-series expression profiles, and search for parameter modifications that would make the external control of the system behavior more robust. GNA and the IRMA model are available at http://ibis.inrialpes.fr/.
Background In vertebrates, a large part of gene transcriptional regulation is carried out by cis-regulatory modules. These modules are believed to regulate much of the tissue specificity of gene expression. Results We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion Our approach is shown to accurately identify known human liver- and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with specific expression.
Identifying cancer subtypes is an important component of the personalised medicine framework. An increasing number of computational methods have been developed to identify cancer subtypes. However, existing methods rarely use information from gene regulatory networks to facilitate subtype identification. It is widely accepted that gene regulatory networks play crucial roles in understanding the mechanisms of diseases. Different cancer subtypes are likely caused by different regulatory mechanisms. Therefore, there are great opportunities for developing methods that can utilise network information in identifying cancer subtypes. In this paper, we propose a method, weighted similarity network fusion (WSNF), to utilise the information in the complex miRNA-TF-mRNA regulatory network in identifying cancer subtypes. We first build the regulatory network, where the nodes represent the features, i.e. the microRNAs (miRNAs), transcription factors (TFs) and messenger RNAs (mRNAs), and the edges indicate the interactions between the features. The interactions are retrieved from various interactome databases. We then use the network information and the expression data of the miRNAs, TFs and mRNAs to calculate the weights of the features, representing their level of importance. The feature weights are then integrated into a network fusion approach to cluster the samples (patients) and thus to identify cancer subtypes. We applied our method to the TCGA breast invasive carcinoma (BRCA) and glioblastoma multiforme (GBM) datasets. The experimental results show that WSNF performs better than the other commonly used computational methods, and that the information from the miRNA-TF-mRNA regulatory network contributes to the performance improvement. The WSNF method successfully identified five breast cancer subtypes and three GBM subtypes which show significantly different survival patterns. We observed that the expression patterns of the features in some mi...
Palsson, Bernhard Ø
Background Biochemically detailed stoichiometric matrices have now been reconstructed for various bacteria, yeast, and the human cardiac mitochondrion based on genomic and proteomic data. These networks have been manually curated based on legacy data and are elementally and charge balanced. Comparative analysis of these well-curated networks is now possible. Pairs of metabolites often appear together in several network reactions, linking them topologically. This co-occurrence of pairs of metabolites in metabolic reactions is termed herein "metabolite coupling." These metabolite pairs can be directly computed from the stoichiometric matrix, S. Metabolite coupling is derived from the matrix ŜŜ^T, whose off-diagonal elements indicate the number of reactions in which any two metabolites participate together, where Ŝ is the binary form of S. Results Metabolite coupling in the studied networks was found to be dominated by a relatively small group of highly interacting pairs of metabolites. As would be expected, metabolites with high individual connectivity also tended to be those with the highest metabolite coupling, as the most connected metabolites couple more often. For metabolite pairs that are not highly coupled, we show that the number of reactions a pair of metabolites shares across a metabolic network closely approximates a line on a log-log scale. We also show that the preferential coupling of two metabolites with each other is spread across the spectrum of metabolites and is not unique to the most connected metabolites. We provide a measure for determining which metabolite pairs couple more often than would be expected based on their individual connectivity in the network and show that these metabolites often derive their principal biological functions from existing in pairs. Thus, analysis of metabolite coupling provides information beyond that which is found from studying the individual connectivity of individual...
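The coupling matrix follows directly from the definition above: binarize S, then multiply it by its transpose. A minimal sketch in plain Python; the three-reaction network is a made-up example.

```python
from typing import List

def metabolite_coupling(S: List[List[float]]) -> List[List[int]]:
    """Return Ŝ Ŝ^T for a stoichiometric matrix S (rows: metabolites, columns: reactions).

    Ŝ is the binary form of S (1 where a metabolite takes part in a reaction).
    Off-diagonal entry (i, k) counts the reactions metabolites i and k share;
    diagonal entry (i, i) is the connectivity of metabolite i.
    """
    S_hat = [[1 if v != 0 else 0 for v in row] for row in S]
    return [[sum(a * b for a, b in zip(ri, rk)) for rk in S_hat] for ri in S_hat]

# Toy network (rows A, B, C): R1: A -> B, R2: B -> C, R3: A + B -> C
S = [[-1,  0, -1],
     [ 1, -1, -1],
     [ 0,  1,  1]]
C = metabolite_coupling(S)
# C == [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
```

For example, A and B share two reactions (R1 and R3), and B, taking part in all three reactions, has connectivity 3.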
Guo, Liyuan; Wang, Jing
Here, we present the updated rSNPBase 3.0 database (http://rsnp3.psych.ac.cn), which provides human SNP-related regulatory elements, element-gene pairs and SNP-based regulatory networks. This database is the updated version of the SNP regulatory annotation databases rSNPBase and rVarBase. In comparison to the last two versions, there are both structural and data adjustments in rSNPBase 3.0: (i) The most significant new feature is the expansion of the analysis scope from SNP-related regulatory elements to include regulatory element-target gene pairs (E-G pairs), so that it can provide SNP-based gene regulatory networks. (ii) The web interface was modified according to the data content, and a new network search module is provided in rSNPBase 3.0 in addition to the previous regulatory SNP (rSNP) search module. The two search modules support data queries for detailed information (related elements, element-gene pairs, and other extended annotations) on specific SNPs and for SNP-related graphic networks constructed by interacting transcription factors (TFs), miRNAs and genes. (iii) The types of regulatory elements were modified and enriched. To the best of our knowledge, the updated rSNPBase 3.0 is the first data tool that supports SNP functional analysis from a regulatory network perspective; it will provide both a comprehensive understanding and concrete guidance for SNP-related regulatory studies. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Background A myriad of methods to reverse-engineer transcriptional regulatory networks have been developed in recent years. Direct methods directly reconstruct a network of pairwise regulatory interactions, while module-based methods predict a set of regulators for modules of coexpressed genes treated as a single unit. To date, there has been no systematic comparison of the relative strengths and weaknesses of both types of methods. Results We have compared a recently developed module-based algorithm, LeMoNe (Learning Module Networks), to a mutual information based direct algorithm, CLR (Context Likelihood of Relatedness), using benchmark expression data and databases of known transcriptional regulatory interactions for Escherichia coli and Saccharomyces cerevisiae. A global comparison using recall versus precision curves hides the topologically distinct nature of the inferred networks and is not informative about the specific subtasks for which each method is most suited. Analysis of the degree distributions and a regulator-specific comparison show that CLR is 'regulator-centric', making true predictions for a higher number of regulators, while LeMoNe is 'target-centric', recovering a higher number of known targets for fewer regulators, with limited overlap in the predicted interactions between both methods. Detailed biological examples in E. coli and S. cerevisiae are used to illustrate these differences and to show that each method is able to infer parts of the network where the other fails. Biological validation of the inferred networks cautions against over-interpreting recall and precision values computed using incomplete reference networks. Conclusion Our results indicate that module-based and direct methods retrieve largely distinct parts of the underlying transcriptional regulatory networks. The choice of algorithm should therefore be based on the particular biological problem of interest and not on global metrics which cannot be...
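For reference, CLR scores can be sketched as below, following the commonly described formulation: each mutual-information value is z-scored against the background of each gene's MI row, negative z-scores are set to zero, and the two genes' z-scores are combined. Excluding the diagonal from the background and returning 0 for a constant row are choices made for this sketch, and the MI matrix is assumed to be precomputed.

```python
import math
import statistics
from typing import List

def clr(mi: List[List[float]]) -> List[List[float]]:
    """Context Likelihood of Relatedness scores from a symmetric MI matrix.

    Each MI value is z-scored against the background distribution of its row
    (diagonal excluded), negative z-scores are set to 0, and the two genes'
    z-scores are combined as sqrt(z_i**2 + z_j**2).
    """
    n = len(mi)
    mu, sd = [], []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]
        mu.append(statistics.mean(row))
        sd.append(statistics.pstdev(row))

    def z(i: int, j: int) -> float:
        if sd[i] == 0:
            return 0.0  # flat background: no pair stands out for gene i
        return max(0.0, (mi[i][j] - mu[i]) / sd[i])

    return [[0.0 if i == j else math.hypot(z(i, j), z(j, i)) for j in range(n)]
            for i in range(n)]

# Toy MI matrix: only the 0-1 pair stands out from each gene's background.
mi = [[0.0, 0.9, 0.1, 0.1],
      [0.9, 0.0, 0.1, 0.1],
      [0.1, 0.1, 0.0, 0.1],
      [0.1, 0.1, 0.1, 0.0]]
scores = clr(mi)
```

Because each gene is compared with its own background, a moderately high MI value can still score well for a gene whose other associations are uniformly low.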
Holloway, Catherine; Beiko, Robert G
Microbial genomes exhibit complex sets of genetic affinities due to lateral genetic transfer. Assessing the relative contributions of parent-to-offspring inheritance and gene sharing is a vital step in understanding the evolutionary origins and modern-day function of an organism, but recovering and showing these relationships is a challenging problem. We have developed a new approach that uses linear programming to find between-genome relationships, by treating tables of genetic affinities (here, represented by transformed BLAST e-values) as an optimization problem. Validation trials on simulated data demonstrate the effectiveness of the approach in recovering and representing vertical and lateral relationships among genomes. Application of the technique to a set comprising Aquifex aeolicus and 75 other thermophiles showed an important role for large genomes as 'hubs' in the gene sharing network, and suggested that genes are preferentially shared between organisms with similar optimal growth temperatures. We were also able to discover distinct and common genetic contributors to each sequenced representative of genus Pseudomonas. The linear programming approach we have developed can serve as an effective inference tool in its own right, and can be an efficient first step in a more-intensive phylogenomic analysis.
Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A.
A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. PMID:25324314
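A strict version of the polypurine criterion can be scanned with a regular expression. This sketch assumes uninterrupted runs and a hypothetical minimum length of 15 nt; the TTSMI catalog applies its own stability and specificity criteria, which are not reproduced here.

```python
import re
from typing import List, Tuple

def find_polypurine_tracts(seq: str, min_len: int = 15) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for uninterrupted polypurine runs of at least min_len.

    Runs of A/G on the given strand are reported as '+'. Runs of C/T on the
    given strand are polypurine tracts on the complementary strand ('-').
    Coordinates are 0-based and end-exclusive. min_len = 15 is an assumed default.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in re.finditer(r"[AG]{%d,}" % min_len, seq)]
    hits += [(m.start(), m.end(), "-") for m in re.finditer(r"[CT]{%d,}" % min_len, seq)]
    return sorted(hits)

# Made-up sequence: an 11-nt purine run followed by a 12-nt pyrimidine run.
seq = "CC" + "AGAGGAAGAGG" + "TCTTCCTCTTCC" + "A"
tracts = find_polypurine_tracts(seq, min_len=8)
# tracts == [(2, 13, '+'), (13, 25, '-')]
```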
Gene regulatory networks describe the relationships between genes, allowing us to understand gene regulatory interactions in systems biology. Gene expression data from microarray experiments are used to infer gene regulatory networks. However, microarray data are discrete, noisy and non-linear, which makes learning the networks a challenging problem, and existing gene network inference methods do not give consistent results. The current state-of-the-art approach uses an average-ranking-based consensus method to combine and average the ranked predictions of individual methods. However, each individual method then contributes equally to the consensus prediction. We have developed a linear programming-based consensus approach that uses weights learned by linear programming, so that individual methods receive different weights depending on their performance. Our results reveal that assigning different weights to individual methods, rather than equal weights, improves the performance of the consensus. The linear programming-based consensus method had the best performance on the in silico and Saccharomyces cerevisiae networks, and the second best on the Escherichia coli network, where it was outperformed by the Inferelator Pipeline method, which gives inconsistent results across a wide range of microarray data sets.
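The difference between equal and unequal method weights can be shown with a weighted rank aggregation. The sketch assumes the per-method weights are already known (the linear program that learns them is not shown), that every method ranks the same edge set, and uses made-up rankings and weights.

```python
from typing import Dict, List, Optional

def consensus(rankings: List[Dict[str, int]], weights: Optional[List[float]] = None) -> List[str]:
    """Combine per-method edge ranks (1 = most confident) into one ordering.

    With weights=None every method gets weight 1/m, which reproduces the
    average-ranking consensus. Supplying weights (assumed to be learned
    elsewhere) lets better-performing methods count more.
    Assumes every method ranks the same set of edges.
    """
    m = len(rankings)
    if weights is None:
        weights = [1.0 / m] * m
    edges = rankings[0].keys()
    score = {e: sum(w * r[e] for w, r in zip(weights, rankings)) for e in edges}
    return sorted(edges, key=lambda e: (score[e], e))  # lower weighted rank = stronger edge

ranks = [{"A->B": 1, "A->C": 2, "B->C": 3},
         {"A->B": 3, "A->C": 1, "B->C": 2},
         {"A->B": 3, "A->C": 2, "B->C": 1}]
equal = consensus(ranks)                      # ['A->C', 'B->C', 'A->B']
weighted = consensus(ranks, [0.8, 0.1, 0.1])  # ['A->B', 'A->C', 'B->C']
```

Up-weighting the first method promotes the edge it alone ranks first, which the equal-weight average pushes to the bottom.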
Kevin Y Yip
We performed computational reconstruction of the in silico gene regulatory networks in the DREAM3 Challenges. Our task was to learn the networks from two types of data, namely gene expression profiles in deletion strains (the 'deletion data') and time series trajectories of gene expression after some initial perturbation (the 'perturbation data'). In the course of developing the prediction method, we observed that the two types of data contained different and complementary information about the underlying network. In particular, deletion data allow for the detection of direct regulatory activities with strong responses upon deletion of the regulator, while perturbation data provide richer information for the identification of weaker and more complex types of regulation. We applied different techniques to learn the regulation from the two types of data. For deletion data, we learned a noise model to distinguish real signals from random fluctuations using an iterative method. For perturbation data, we used differential equations to model the change of expression levels of a gene along the trajectories due to the regulation of other genes. We tried different models, and combined their predictions. The final predictions were obtained by merging the results from the two types of data. A comparison with the actual regulatory networks suggests that our approach is effective for networks with a range of different sizes. The success of the approach demonstrates the importance of integrating heterogeneous data in network reconstruction.
Kamal Dev Sharma
Cold stress modifies anthers' metabolic pathways to induce pollen sterility. Cold-tolerant plants, unlike susceptible ones, produce a high proportion of viable pollen. Anthers in susceptible plants, when exposed to cold stress, increase abscisic acid (ABA) metabolism and reduce ABA catabolism. Increased ABA negatively regulates expression of the tapetum cell wall-bound invertase and monosaccharide transport genes, resulting in a distorted carbohydrate pool in the anther. Cold stress also reduces endogenous levels of the bioactive gibberellins (GAs) GA4 and GA7 in susceptible anthers by repressing GA biosynthesis genes. Here we discuss recent findings on mechanisms of cold susceptibility in anthers that determine pollen sterility. We also discuss differences in regulatory pathways between cold-stressed anthers of susceptible and tolerant plants that decide pollen sterility or viability.
Gottesman, Omri; Kuivaniemi, Helena; Tromp, Gerard; Faucett, W Andrew; Li, Rongling; Manolio, Teri A; Sanderson, Saskia C; Kannry, Joseph; Zinberg, Randi; Basford, Melissa A; Brilliant, Murray; Carey, David J; Chisholm, Rex L; Chute, Christopher G; Connolly, John J; Crosslin, David; Denny, Joshua C; Gallego, Carlos J; Haines, Jonathan L; Hakonarson, Hakon; Harley, John; Jarvik, Gail P; Kohane, Isaac; Kullo, Iftikhar J; Larson, Eric B; McCarty, Catherine; Ritchie, Marylyn D; Roden, Dan M; Smith, Maureen E; Böttinger, Erwin P; Williams, Marc S
The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute-funded consortium engaged in the development of methods and best practices for using the electronic...
Background Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision-making processes. Advances in high-throughput experimental techniques have initiated innovative data-driven analysis of gene regulatory networks. However, the inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, robust algorithms that directly exploit basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction. Results We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formalism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subjected to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the sparsity of biological interactions. To that end, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function minimizes the number of network connections subject to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters...
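The S-system formulation models each gene with a power-law production term and a power-law degradation term: dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The sketch evaluates that right-hand side and reads off the network topology from nonzero kinetic orders, which is what binary variables in an integer program would switch on or off. The optimization, the bootstrap step, and all parameter values here are illustrative, not from the paper.

```python
import math
from typing import List, Set, Tuple

def s_system_rhs(x: List[float], alpha: List[float], beta: List[float],
                 g: List[List[float]], h: List[List[float]]) -> List[float]:
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    return [alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
            - beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
            for i in range(n)]

def topology(g: List[List[float]], h: List[List[float]]) -> Set[Tuple[int, int]]:
    """Regulatory edges (regulator j, target i) implied by nonzero kinetic orders,
    excluding each gene's effect on its own production or degradation."""
    n = len(g)
    return {(j, i) for i in range(n) for j in range(n)
            if i != j and (g[i][j] != 0 or h[i][j] != 0)}

# Hypothetical two-gene system: gene 1 represses gene 0's production (g01 = -1),
# gene 0 activates gene 1's production (g10 = 1); each gene degrades itself.
g = [[0, -1], [1, 0]]
h = [[1, 0], [0, 1]]
rates = s_system_rhs([1.0, 2.0], [2.0, 1.0], [1.0, 1.0], g, h)  # [0.0, -1.0]
edges = topology(g, h)                                           # {(1, 0), (0, 1)}
```

A sparse network corresponds to most off-diagonal g_ij and h_ij being exactly zero.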
Multilayered hierarchical gene regulatory networks (ML-hGRNs) are very important for understanding the genetic regulation of biological pathways. However, there are currently no computational algorithms available for directly building ML-hGRNs that regulate biological pathways. A bottom-up graphic Gaus...
...that model GRNs from real data. PRISM, a probabilistic learning framework based on B-Prolog, was used to program the Bayesian networks. Instead of ... probabilistic logic paradigm. PRISM is a probabilistic logical framework based on B-Prolog; the language extends Horn clauses to include random variables...
Background To facilitate deciphering the underlying transcriptional regulatory circuits in mouse embryonic stem (ES) cells, recent ChIP-seq data provided genome-wide binding locations of several key transcription factors (TFs); meanwhile, existing efforts profiled gene expression in ES cells and in their early differentiated state. It has been shown that the gene expression profiles are correlated with the binding of these TFs. However, it remains unclear whether other TFs, referred to as cofactors, participate in gene regulation by collaborating with the ChIP-seq TFs. Results Based on our analyses of the ES gene expression profiles and binding sites of potential cofactors in the vicinity of the ChIP-seq TF binding locations, we identified a list of co-binding features that show significantly different characteristics between different gene expression patterns (activated or repressed gene expression in ES cells) at a false discovery rate of 10%. Gene classification with a subset of the identified features achieved up to 20% improvement over classification based only on the ChIP-seq TFs. More than one third of the inferred regulatory roles of cofactor candidates involved in these features are supported by the existing literature. Finally, the predicted target genes of the majority of candidates show the expected expression changes in another independent data set, which serves as a supplementary validation of these candidates. Conclusions Our results revealed a list of combinatorial genomic features that are significantly associated with gene expression in ES cells, suggesting potential cofactors of the ChIP-seq TFs for gene regulation.
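The abstract reports results at a 10% false discovery rate without naming the procedure. A common choice is the Benjamini-Hochberg step-up procedure, sketched below as an assumption, not as the authors' confirmed method; the p-values are made up.

```python
from typing import List

def benjamini_hochberg(pvalues: List[float], q: float = 0.10) -> List[int]:
    """Indices of hypotheses rejected at false discovery rate q (Benjamini-Hochberg step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k = 0  # largest rank whose p-value is at or below its threshold rank * q / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])

rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])  # [0, 1, 2]
```

Because it is a step-up procedure, every hypothesis ranked at or below the largest passing rank is rejected, even if its own p-value missed its individual threshold.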
Basnet, Ram Kumar; Del Carpio, Dunia Pino; Xiao, Dong; Bucher, Johan; Jin, Mina; Boyle, Kerry; Fobert, Pierre; Visser, Richard G F; Maliepaard, Chris; Bonnema, Guusje
Fatty acids in seeds affect seed germination and seedling vigor, and fatty acid composition determines the quality of seed oil. In this study, quantitative trait locus (QTL) mapping of fatty acid and transcript abundance was integrated with gene network analysis to unravel the genetic regulation of seed fatty acid composition in a Brassica rapa doubled haploid population from a cross between a yellow sarson oil type and a black-seeded pak choi. The distribution of major QTLs for fatty acids showed a relationship with the fatty acid types: linkage group A03 for monounsaturated fatty acids, A04 for saturated fatty acids, and A05 for polyunsaturated fatty acids. Using a genetical genomics approach, expression quantitative trait locus (eQTL) hotspots were found at major fatty acid QTLs on linkage groups A03, A04, A05, and A09. An eQTL-guided gene coexpression network of lipid metabolism-related genes showed major hubs at the genes BrPLA2-ALPHA, BrWD-40, a number of seed storage protein genes, and the transcription factor BrMD-2, suggesting essential roles for these genes in lipid metabolism. Three subnetworks were extracted for the economically important and most abundant fatty acids erucic, oleic, linoleic, and linolenic acids. Network analysis, combined with comparison of the genome positions of cis- or trans-eQTLs with fatty acid QTLs, allowed the identification of candidate genes for genetic regulation of these fatty acids. The generated insights into the genetic architecture of fatty acid composition and the underlying complex gene regulatory networks in B. rapa seeds are discussed. © 2016 American Society of Plant Biologists. All Rights Reserved. PMID:26518343
Scribner, Elizabeth Y; Fathallah-Shaykh, Hassan M
Biological networks can be very complex. Mathematical modeling and simulation of regulatory networks can help resolve unanswered questions about these complex systems, which are often impossible to explore experimentally. The network regulating the Drosophila circadian clock is particularly amenable to such modeling given its complexity and what we call the clockwork orange (CWO) anomaly. CWO is a protein whose function in the network as an indirect activator of the genes per, tim, vri, and pdp1 is counterintuitive: in isolated experiments, CWO inhibits transcription of these genes. Although many different types of modeling frameworks have recently been applied to the Drosophila circadian network, this chapter focuses on the application of continuous deterministic dynamic modeling to this network. In particular, we present three distinct systems of ordinary differential equations that have been used to successfully model different aspects of the circadian network. The last model incorporates the newly identified protein CWO, and we explain how this model's unique mathematical equations can be used to explore and resolve the CWO anomaly. Finally, analysis of these equations gives rise to a new network regulatory rule, which clarifies the unusual role of CWO in this dynamical system. © 2011 Elsevier Inc. All rights reserved.
Integrating genetic perturbations with gene expression data not only improves the accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, efficient and powerful algorithms are still needed. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence that genes regulate or are regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL)-based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested, which include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining ones may indicate new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
Bauer, Tobias; Trump, Saskia; Ishaque, Naveed; Thürmann, Loreen; Gu, Lei; Bauer, Mario; Bieg, Matthias; Gu, Zuguang; Weichenhan, Dieter; Mallm, Jan-Philipp; Röder, Stefan; Herberth, Gunda; Takada, Eiko; Mücke, Oliver; Winter, Marcus; Junge, Kristin M; Grützmann, Konrad; Rolle-Kampczyk, Ulrike; Wang, Qi; Lawerenz, Christian; Borte, Michael; Polte, Tobias; Schlesner, Matthias; Schanne, Michaela; Wiemann, Stefan; Geörg, Christina; Stunnenberg, Hendrik G; Plass, Christoph; Rippe, Karsten; Mizuguchi, Junichiro; Herrmann, Carl; Eils, Roland; Lehmann, Irina
Epigenetic mechanisms have emerged as links between prenatal environmental exposure and disease risk later in life. Here, we studied epigenetic changes associated with maternal smoking at base pair resolution by mapping DNA methylation, histone modifications, and transcription in expectant mothers and their newborn children. We found extensive global differential methylation and carefully evaluated these changes to separate environment-associated from genotype-related DNA methylation changes. Differential methylation is enriched in enhancer elements and in particular targets "commuting" enhancers that have multiple regulatory interactions with distal genes. Longitudinal whole-genome bisulfite sequencing revealed that DNA methylation changes associated with maternal smoking persist over years of life. Particularly in children, prenatal environmental exposure leads to chromatin transitions into a hyperactive state. Combined DNA methylation, histone modification, and gene expression analyses indicate that differential methylation in enhancer regions is more often functionally translated than methylation changes in promoters or non-regulatory elements. Finally, we show that epigenetic deregulation of a commuting enhancer targeting c-Jun N-terminal kinase 2 (JNK2) is linked to impaired lung function in early childhood. © 2016 The Authors. Published under the terms of the CC BY 4.0 license.
Abstract Background Pathway databases are becoming increasingly important and almost omnipresent in most types of biological and translational research. However, little is known about the quality and completeness of pathways stored in these databases. The present study conducts a comprehensive assessment of transcriptional regulatory pathways in humans for seven well-studied transcription factors: MYC, NOTCH1, BCL6, TP53, AR, STAT1, and RELA. The benchmarking methodology first integrates genome-wide binding with functional gene expression data to derive direct targets of transcription factors. The lists of experimentally obtained direct targets are then compared with relevant lists of transcriptional targets from 10 commonly used pathway databases. Results The results of this study show that for the majority of pathway databases, the overlap between experimentally obtained target genes and targets reported in transcriptional regulatory pathway databases is surprisingly small and often not statistically significant. The only exception is the MetaCore pathway database, which yields a statistically significant intersection with experimental results in 84% of cases. Additionally, we suggest that the lists of experimentally derived direct targets obtained in this study can be used to reveal new biological insight into transcriptional regulation and to suggest novel putative therapeutic targets in cancer. Conclusions Our study opens a debate on the validity of using many popular pathway databases to obtain transcriptional regulatory targets. We conclude that the choice of pathway databases should be informed by solid scientific evidence and rigorous empirical evaluation. Reviewers This article was reviewed by Prof. Wing Hung Wong, Dr. Thiago Motta Venancio (nominated by Dr. L Aravind), and Prof. Geoff J McLachlan.
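The abstract does not name the test used to judge whether a database overlap is statistically significant. One standard option is a one-sided hypergeometric test, sketched below under that assumption; the gene-universe size and the counts in the usage line are hypothetical.

```python
from math import comb


def overlap_p_value(n_universe, n_experimental, n_database, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    X is the overlap obtained when n_database genes are drawn at random from
    n_universe genes, n_experimental of which are experimentally derived targets.
    """
    upper = min(n_experimental, n_database)
    tail = sum(
        comb(n_experimental, k) * comb(n_universe - n_experimental, n_database - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_database)


# Hypothetical counts: 20,000 genes, 300 experimental targets,
# 150 database targets, 12 of them shared.
print(overlap_p_value(20000, 300, 150, 12))
```

Exact integer binomials from `math.comb` avoid the precision loss of summing floating-point probabilities; a small p-value means the observed overlap is larger than random draws would usually give.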
Piganeau, Gwenael; Vandepoele, Klaas; Gourbière, Sébastien; Van de Peer, Yves; Moreau, Hervé
We used a phylogenetic footprinting approach, adapted to high levels of divergence, to estimate the level of constraint in intergenic regions of the extremely gene-dense Ostreococcus algae genomes (Chlorophyta, Prasinophyceae). We first benchmarked our method against the Saccharomyces sensu stricto genome data and found that the proportion of conserved non-coding sites was consistent with those obtained with methods using calibration by the neutral substitution rate. We then applied our method to the complete genomes of Ostreococcus tauri and O. lucimarinus, which are the most divergent species from the same genus sequenced so far. We found that 77% of intergenic regions in Ostreococcus still contain some phylogenetic footprints, as compared to 88% for Saccharomyces, corresponding to an average rate of constraint on intergenic regions of 17% and 30%, respectively. A comparison with some known functional cis-regulatory elements enabled us to investigate whether some transcriptional regulatory pathways were conserved throughout the green lineage. Strikingly, the size of the phylogenetic footprints depends on the relative orientation of neighboring genes, and appears to be genus-specific. In Ostreococcus, 5' intergenic regions contain four times more conserved sites than 3' intergenic regions, whereas in yeast a higher frequency of constrained sites in intergenic regions between genes on the same DNA strand suggests a higher frequency of bidirectional regulatory elements. The phylogenetic footprinting approach can be used despite high levels of divergence in the ultrasmall Ostreococcus algae, to decipher the structure of constrained regulatory motifs and to identify putative regulatory pathways conserved within the green lineage.
The onset of cancer is unavoidably accompanied by suppression of antitumor immunity. This occurs through mechanisms ranging from the progressive accumulation of regulatory immune cells associated with chronic immune stimulation and inflammation to the expression of immunosuppressive molecules. Some of these molecules are being successfully exploited as therapeutic targets, with impressive clinical results achieved in patients, as in the case of immune checkpoint inhibitors. To limit immune attack, tumor cells exploit specific pathways to render the tumor microenvironment hostile to antitumor effector cells. Local acidification might, in fact, anergize activated T cells and facilitate the accumulation of immune suppressive cells. Moreover, the release of extracellular vesicles by tumor cells can condition distant immune sites, contributing to the onset of systemic immune suppression. Understanding which mechanisms may be prevalent in specific cancers or disease stages, and identifying possible strategies to counterbalance them, would contribute substantially to improving the clinical efficacy of cancer immunotherapy. Here, we intend to highlight these mechanisms, how they could be targeted, and the tools that might be available in the near future to achieve this goal.
It is desirable to have efficient mathematical methods to extract information about regulatory interactions between genes from repeated measurements of gene transcript concentrations. One piece of information of interest is whether the dynamics reach a steady state. In this paper we develop tools that enable the detection of steady states that are modeled by fixed points in discrete finite dynamical systems. We discuss two algebraic models, a univariate model and a multivariate model. We show that these two models are equivalent and that one can be converted to the other by means of a discrete Fourier transform. We give a new, more general definition of a linear finite dynamical system and a necessary and sufficient condition for such a system to be a fixed point system, that is, one in which all cycles are of length one. We show how this result for generalized linear systems can be used to determine when certain nonlinear systems (monomial dynamical systems over finite fields) are fixed point systems. We also show how it is possible to determine in polynomial time when an ordinary linear system (defined over a finite field) is a fixed point system. We conclude with a necessary condition for a univariate finite dynamical system to be a fixed point system.
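The paper's necessary and sufficient condition and its polynomial-time test are not reproduced here. As a reference point, the brute-force sketch below checks the definition itself (every cycle has length one) for an ordinary linear system x -> Ax over a prime field F_p; the names `step` and `is_fixed_point_system` are illustrative, and the exhaustive search over all p**n states limits it to very small systems.

```python
from itertools import product


def step(matrix, state, p):
    """One update x -> A x (mod p) of a linear finite dynamical system over F_p."""
    return tuple(sum(a * x for a, x in zip(row, state)) % p for row in matrix)


def is_fixed_point_system(matrix, p):
    """True if every cycle of x -> A x over F_p^n has length one.

    Exhaustive check over all p**n states, so only usable for very small n.
    """
    n = len(matrix)
    for start in product(range(p), repeat=n):
        seen = set()
        x = start
        # Follow the trajectory until a state repeats; that state lies on a cycle.
        while x not in seen:
            seen.add(x)
            x = step(matrix, x, p)
        # The cycle has length one exactly when this state is a fixed point.
        if step(matrix, x, p) != x:
            return False
    return True


print(is_fixed_point_system([[0, 1], [0, 0]], 2))  # True: every state reaches (0, 0)
print(is_fixed_point_system([[0, 1], [1, 0]], 2))  # False: (1, 0) and (0, 1) alternate
```

A checker like this is mainly useful for validating a faster algebraic criterion on small random matrices.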
Cho, Michael H.; San José Estépar, Raúl; McDonald, Merry-Lynn N.; Laird, Nan; Beaty, Terri H.; Washko, George; Crapo, James D.; Silverman, Edwin K.
Rationale: Emphysema is a heritable trait that occurs in smokers with and without chronic obstructive pulmonary disease. Emphysema occurs in distinct pathologic patterns, but the genetic determinants of these patterns are unknown. Objectives: To identify genetic loci associated with distinct patterns of emphysema in smokers and investigate the regulatory function of these loci. Methods: Quantitative measures of distinct emphysema patterns were generated from computed tomography scans from smokers in the COPDGene Study using the local histogram emphysema quantification method. Genome-wide association studies (GWAS) were performed in 9,614 subjects for five emphysema patterns, and the results were referenced against enhancer and DNase I hypersensitive regions from ENCODE and Roadmap Epigenomics cell lines. Measurements and Main Results: Genome-wide significant associations were identified for seven loci. Two are novel associations (top single-nucleotide polymorphism rs379123 in MYO1D and rs9590614 in VMA8) located within genes that function in cell-cell signaling and cell migration, and five are in loci previously associated with chronic obstructive pulmonary disease susceptibility (HHIP, IREB2/CHRNA3, CYP2A6/ADCK, TGFB2, and MMP12). Five of these seven loci lay within enhancer or DNase I hypersensitivity regions in lung fibroblasts or small airway epithelial cells, respectively. Enhancer enrichment analysis for top GWAS associations (single-nucleotide polymorphisms associated at P emphysema quantified by computed tomography scan. Enhancer regions are significantly enriched among these GWAS results, with pulmonary fibroblasts among the cell types showing the strongest enrichment. PMID:25006744
Genome-wide maps of transcription factor (TF) occupancy and regions of open chromatin implicitly contain DNA sequence signals for multiple factors. We present SeqGL, a novel de novo motif discovery algorithm to identify multiple TF sequence signals from ChIP-, DNase-, and ATAC-seq profiles. SeqGL trains a discriminative model using a k-mer feature representation together with group lasso regularization to extract a collection of sequence signals that distinguish peak sequences from flanking regions. Benchmarked on over 100 ChIP-seq experiments, SeqGL outperformed traditional motif discovery tools in discriminative accuracy. Furthermore, SeqGL can be naturally used with multitask learning to identify genomic and cell-type context determinants of TF binding. SeqGL successfully scales to the large multiplicity of sequence signals in DNase- or ATAC-seq maps. In particular, SeqGL was able to identify a number of ChIP-seq validated sequence signals that were not found by traditional motif discovery algorithms. Thus, compared with widely used motif discovery algorithms, SeqGL demonstrates both greater discriminative accuracy and higher sensitivity for detecting the DNA sequence signals underlying regulatory element maps. SeqGL is available at http://cbio.mskcc.org/public/Leslie/SeqGL/.
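SeqGL represents each sequence by k-mer features and fits a group-lasso-regularized discriminative model that separates peaks from flanking regions. Only the featurization step is sketched below. Merging each k-mer with its reverse complement and dropping windows that contain N are assumptions, not details from the abstract, and the group lasso fit itself is omitted.

```python
from collections import Counter

ACGT = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_counts(sequence, k):
    """Counts of canonical k-mers, skipping windows with non-ACGT symbols such as N."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= ACGT:
            counts[canonical(kmer)] += 1
    return counts


def feature_matrix(peaks, flanks, k):
    """Rows of k-mer counts with labels 1 (peak) and 0 (flanking region)."""
    per_seq = [kmer_counts(s, k) for s in list(peaks) + list(flanks)]
    vocab = sorted(set().union(*per_seq))
    X = [[counts[v] for v in vocab] for counts in per_seq]
    y = [1] * len(peaks) + [0] * len(flanks)
    return vocab, X, y


vocab, X, y = feature_matrix(["ACGT"], ["AAAA"], k=2)
print(vocab, X, y)  # ['AA', 'AC', 'CG'] [[0, 2, 1], [3, 0, 0]] [1, 0]
```

Collapsing reverse complements halves the feature space and reflects the fact that most TFs bind double-stranded DNA regardless of strand.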
Wang, Zhuo; Danziger, Samuel A; Heavner, Benjamin D; Ma, Shuyi; Smith, Jennifer J; Li, Song; Herricks, Thurston; Simeonidis, Evangelos; Baliga, Nitin S; Aitchison, John D; Price, Nathan D
Gene regulatory and metabolic network models have been used successfully in many organisms, but inherent differences between them make networks difficult to integrate. Probabilistic Regulation Of Metabolism (PROM) provides a partial solution, but it does not incorporate network inference and underperforms in eukaryotes. We present an Integrated Deduced REgulation And Metabolism (IDREAM) method that combines statistically inferred Environment and Gene Regulatory Influence Network (EGRIN) models with the PROM framework to create enhanced metabolic-regulatory network models. We used IDREAM to predict phenotypes and genetic interactions between transcription factors and genes encoding metabolic activities in the eukaryote Saccharomyces cerevisiae. IDREAM models contain far fewer interactions than PROM and yet produce significantly more accurate growth predictions. IDREAM consistently outperformed PROM using any of three popular yeast metabolic models and across three experimental growth conditions. Importantly, IDREAM's enhanced accuracy makes it possible to identify subtle synthetic growth defects. Experimentally validated novel genetic interactions involving the pyruvate dehydrogenase complex suggested a new role for the fatty acid-responsive factor Oaf1 in regulating acetyl-CoA production in glucose-grown cells.
Wang, B H; Lim, J W; Lim, J S
Many studies exist on reconstructing gene regulatory networks (GRNs). In this paper, we propose a method based on an advanced neuro-fuzzy system for gene regulatory network reconstruction from microarray time-series data. This approach uses a neural network with a weighted fuzzy function to model the relationships between genes. Fuzzy rules, which determine the regulators of genes, are greatly simplified by this method. Additionally, a regulator selection procedure is proposed, which extracts the exact dynamic relationship between genes using the information obtained from the weighted fuzzy function. Time-series features are extracted from the original data to exploit characteristics of temporal data that are useful for accurate GRN reconstruction. The yeast cell cycle microarray dataset was used for our study. We measured the mean squared prediction error to assess the efficiency of the proposed approach and evaluated its accuracy in terms of precision, sensitivity, and F-score. The proposed method outperformed the other existing approaches.
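The evaluation criteria named above can be computed as sketched below, assuming the reconstructed and reference networks are both given as sets of directed (regulator, target) pairs and that predictions and observations are paired expression values. The function names and the gene labels G1 to G6 are hypothetical.

```python
def edge_metrics(predicted, reference):
    """Precision, sensitivity (recall) and F-score of predicted regulator->target edges."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(reference) if reference else 0.0
    total = precision + sensitivity
    # F-score is the harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f_score


def mean_squared_error(observed, predicted):
    """Mean squared prediction error between observed and predicted expression values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


pred = {("G1", "G2"), ("G2", "G3"), ("G3", "G1"), ("G1", "G4")}
true = {("G1", "G2"), ("G2", "G3"), ("G5", "G6")}
print([round(v, 3) for v in edge_metrics(pred, true)])  # [0.5, 0.667, 0.571]
```

Treating edges as directed pairs means that predicting G2 -> G1 when the reference has G1 -> G2 counts as a false positive.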
Laubichler, Manfred D; Renn, Jürgen
This paper introduces a conceptual framework for the evolution of complex systems based on the integration of regulatory network and niche construction theories. It is designed to apply equally to cases of biological, social and cultural evolution. Within the conceptual framework we focus especially on the transformation of complex networks through the linked processes of externalization and internalization of causal factors between regulatory networks and their corresponding niches and argue that these are an important part of evolutionary explanations. This conceptual framework extends previous evolutionary models and focuses on several challenges, such as the path-dependent nature of evolutionary change, the dynamics of evolutionary innovation and the expansion of inheritance systems. © 2015 The Authors. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution published by Wiley Periodicals, Inc.
Mousavian, Zaynab; Kavousi, Kaveh; Masoudi-Nejad, Ali
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon and established the framework now known as information theory. In recent decades, information theory has gained much attention in systems biology. The aim of this paper is to provide a systematic review of contributions that have applied information theory to inferring or understanding biological systems. Based on the type of system components and the interactions between them, we classify biological systems into 4 main classes: gene regulatory, metabolic, protein-protein interaction and signaling networks. In the first part of this review, we attempt to introduce most of the existing studies on two types of biological networks, namely gene regulatory and metabolic networks, that are founded on the concepts of information theory. Copyright © 2015 Elsevier Ltd. All rights reserved.
Abstract Background Inferring regulatory interactions between genes from time-resolved transcriptomics data, yielding reverse-engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes, coupled with short and often noisy time-resolved read-outs of the system, renders reverse engineering a challenging task. Therefore, the development and assessment of methods that are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of regulatory interactions remains a pressing research problem with valuable applications. Results Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach, which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes that are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank- and symbol-based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has been shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in
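A relevance network scores every gene pair with a similarity measure and keeps the pairs whose score passes a threshold. The sketch below uses Spearman's rank correlation as one example of a rank-based measure; the fixed threshold, the gene names, and the toy series are hypothetical, and the paper's scoring schemes (including asymmetric weighting) are not implemented.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relevance_network(series, threshold):
    """Undirected edges between genes whose |rho| is at least `threshold`."""
    edges = {}
    for g1, g2 in combinations(sorted(series), 2):
        rho = spearman(series[g1], series[g2])
        if abs(rho) >= threshold:
            edges[(g1, g2)] = rho
    return edges


series = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10],
          "g3": [5, 4, 3, 2, 1], "g4": [1, 3, 2, 5, 4]}
print(sorted(relevance_network(series, 0.9)))  # [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]
```

Because only ranks enter the score, a single outlying time point shifts rho far less than it would shift a Pearson correlation on raw values, which is one plausible reason rank-based measures do well on short, noisy series.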
Abstract Background Effects on gene expression due to environmental or genetic changes can be easily measured using microarrays. However, indirect effects on expression can be substantial. The indirect effects of a perturbation need to be distinguished from the direct effects if we are to understand the structure and behavior of regulatory networks. Results The most direct way to perturb a transcriptional network is to alter transcription factor activity. Here, for the first time, we compare expression changes and genomic binding in a simple regulon under conditions of both low and high transcription factor activity. Specifically, we assessed the effects on expression and binding due to deletion of the yeast LEU3 transcription factor gene and effects due to elevation of Leu3 activity. Leu3 activity was elevated through overexpression and the introduction of a mutation that renders the protein constitutively active. Genes that are bound and/or regulated by Leu3 under one or both conditions were characterized in terms of their functional annotations and their predicted potential to be bound by Leu3. We also assessed the evolutionary conservation of the predicted binding potential using a novel alignment-independent method. Both perturbations yield genes that are likely to be direct targets of Leu3, including most of the classically defined targets. Additional direct targets are identified by each of the methods. However, experimental and computational criteria suggest that most genes whose expression is affected by the Leu3 genotype are unlikely to be regulated by binding of the protein. Conclusion Most genes that are differentially expressed in response to Leu3 perturbation are not direct targets, despite the exceptional simplicity of the regulon and the unusually direct nature of the perturbations investigated. These conclusions are reached through computational analyses that support and extend chromatin immunoprecipitation data on the identities of direct targets
Abstract Background The translational efficiency of an mRNA can be modulated by upstream open reading frames (uORFs) present in certain genes. A uORF can attenuate translation of the main ORF by interfering with translational reinitiation at the main start codon. uORFs also occur by chance in the genome, in which case they do not have a regulatory role. Since the sequence determinants for functional uORFs are not understood, it is difficult to discriminate functional from spurious uORFs by sequence analysis. Results We have used comparative genomics to identify novel uORFs in yeast with a high likelihood of having a translational regulatory role. We examined uORFs, previously shown to play a role in regulation of translation in Saccharomyces cerevisiae, for evolutionary conservation within seven Saccharomyces species. Inspection of the set of conserved uORFs yielded the following three characteristics useful for discrimination of functional from spurious uORFs: a length between 4 and 6 codons, a distance from the start of the main ORF between 50 and 150 nucleotides, and finally a lack of overlap with, and clear separation from, neighbouring uORFs. These derived rules are inherently associated with uORFs with properties similar to the GCN4 locus, and may not detect most uORFs of other types. uORFs with high scores based on these rules showed a much higher evolutionary conservation than randomly selected uORFs. In a genome-wide scan in S. cerevisiae, we found 34 conserved uORFs from 32 genes that we predict to be functional; subsequent analysis showed the majority of these to be located within transcripts. A total of 252 genes were found containing conserved uORFs with properties indicative of a functional role; all but 7 are novel. Functional content analysis of this set identified an overrepresentation of genes involved in transcriptional control and development. Conclusion Evolutionary conservation of uORFs in yeasts can be traced up to 100
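The three discrimination rules can be applied to candidate uORFs as sketched below. Several details are assumptions rather than statements from the abstract: uORFs are taken to start at ATG and end at an in-frame stop codon inside the 5' leader, length counts sense codons including the start codon, distance is measured from the uORF stop codon to the main start codon, and "clear separation" is modelled by a hypothetical minimum gap `min_gap`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(leader):
    """Half-open (start, end) spans of ATG-initiated uORFs that stop inside the 5' leader.

    `leader` is the transcript sequence upstream of the main start codon, written
    with DNA letters; each span includes the uORF's stop codon.
    """
    leader = leader.upper()
    spans = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Scan in frame for the first stop codon.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                spans.append((start, pos + 3))
                break
    return spans


def passes_heuristics(span, all_spans, leader_length,
                      codons=(4, 6), distance=(50, 150), min_gap=10):
    """Apply the three conservation-derived rules to one uORF span."""
    start, end = span
    n_codons = (end - start) // 3 - 1      # start codon counted, stop codon not
    dist_to_main = leader_length - end     # nt between uORF stop and main ATG
    if not codons[0] <= n_codons <= codons[1]:
        return False
    if not distance[0] <= dist_to_main <= distance[1]:
        return False
    for other in all_spans:
        if other == span:
            continue
        # Gap between the two spans; negative means they overlap.
        gap = max(other[0] - end, start - other[1])
        if gap < min_gap:
            return False
    return True


leader = "CC" + "ATG" + "AAA" * 4 + "TAA" + "C" * 100
spans = find_uorfs(leader)
print(spans, passes_heuristics(spans[0], spans, len(leader)))  # [(2, 20)] True
```

As the abstract notes, rules of this kind favour uORFs resembling those at the GCN4 locus and may miss functional uORFs of other types.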
Cheng, Long; Hou, Zeng-Guang; Lin, Yingzi; Tan, Min; Zhang, Wenjun Chris; Wu, Fang-Xiang
A recurrent neural network is proposed for solving non-smooth convex optimization problems with convex inequality and linear equality constraints. Since the objective function and inequality constraints may not be smooth, Clarke's generalized gradients of the objective function and inequality constraints are employed to describe the dynamics of the proposed neural network. Using the Lagrangian saddle-point theorem, the equilibrium point set of the proposed neural network is proved to be equivalent to the optimal solution set of the original optimization problem. Under weak conditions, the proposed neural network is proved to be stable, and its state converges to one of its equilibrium points. Compared with existing neural network models for non-smooth optimization problems, the proposed neural network can deal with a larger class of constraints and is not based on the penalty method. Finally, the proposed neural network is used to solve the identification problem of genetic regulatory networks, which can be transformed into a non-smooth convex optimization problem. The simulation results show satisfactory identification accuracy, which demonstrates the effectiveness and efficiency of the proposed approach.
Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically, we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, which enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real-world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.
Barah, Pankaj; B N, Mahantesha Naika; Jayavelu, Naresh Doni; Sowdhamini, Ramanathan; Shameer, Khader; Bones, Atle M.
Differentially evolved responses to various stress conditions in plants are controlled by complex regulatory circuits of transcriptional activators and repressors, such as transcription factors (TFs). To understand the general and condition-specific activities of the TFs and their regulatory relationships with the target genes (TGs), we have used a homogeneous stress gene expression dataset generated on ten natural ecotypes of the model plant Arabidopsis thaliana under five single and six combined stress conditions. Knowledge-based profiles of binding sites for 25 stress-responsive TF families (187 TFs) were generated and tested for their enrichment in the regulatory regions of the associated TGs. Condition-dependent regulatory sub-networks have shed light on the differential utilization of the underlying network topology by stress-specific regulators and multifunctional regulators. The multifunctional regulators maintain the core stress response processes, while the transient regulators confer specificity to certain conditions. Clustering patterns of transcription factor binding sites (TFBS) have reflected the combinatorial nature of transcriptional regulation, and suggested a putative role of homotypic clusters of TFBS in maintaining transcriptional robustness against cis-regulatory mutations, facilitating the preservation of stress response processes. Gene Ontology enrichment analysis of the TGs reflected sequential regulation of stress response mechanisms in plants. PMID:26681689
Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping
Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is determining the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of the model length and the data encoding length. A user-specified fine-tuning parameter is used as a control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and the predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes, and the PMDL principle attempts to determine the best MI threshold without the need for a user-specified fine-tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared with the existing MDL algorithm.
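MI and CMI can be estimated from discretized expression profiles with simple plug-in (histogram) estimators. The sketch below assumes the expression values have already been discretized into a few levels, and it does not implement the PMDL threshold selection, which is the contribution of the abstract above. The profiles are hypothetical. A gene pair with high MI but near-zero CMI given a third gene is the typical signature of an indirect link.

```python
import math
from collections import Counter

def entropy(*columns):
    """Plug-in joint entropy (nats) of one or more equal-length discrete sequences."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log(c / n) for c in Counter(joint).values())

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

# Hypothetical discretized profiles: genes x and y both copy regulator z.
z = [0, 1, 0, 1, 1, 0, 0, 1]
x = list(z)
y = list(z)
mi = mutual_information(x, y)                  # ln 2: strongly associated
cmi = conditional_mutual_information(x, y, z)  # ~0: association explained by z
```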
Brusić, Igor; Kittl, Jörg; Ruhle, Ernst-Olav; Žuti, Vladimir
In September 2010 the Croatian regulatory agency (HAKOM) put into force the ordinance on technical requirements and conditions of use of optical distribution networks. With this ordinance, the Croatian regulatory agency is looking beyond conventional regulatory boundaries by proposing a rather technical approach for the rollout of optical access networks, which will have a significant influence on the deployment of next generation access networks (NGAN) in Croatia. The ordinance stipulates the requirements that have to be fulf...
Fang, Xin; Sastry, Anand; Mih, Nathan
Transcriptional regulatory networks (TRNs) have been studied intensely for more than 25 years. Yet, even for the Escherichia coli TRN (probably the best characterized TRN), several questions remain. Here, we address three questions: (i) How complete is our knowledge of the E. coli TRN; (ii) how well can we predict...... were collected from published, validated chromatin immunoprecipitation (ChIP) data and RegulonDB. For 21 different TF knockouts, up to 63% of the differentially expressed genes in the hiTRN were traced to the knocked-out TF through regulatory cascades. Second, we trained supervised machine learning...
Bal-Price, Anna; Crofton, Kevin M.; Leist, Marcel
of regulatory needs on the one hand and the opportunities provided by new test systems and methods on the other hand. Alignment of academically and industrially driven assay development with regulatory needs in the field of DNT is a core mission of the International STakeholder NETwork (ISTNET) in DNT testing...... as an important guiding principle to assemble predictive integrated testing strategies (ITSs) for DNT. The recommended road map towards AOP-based DNT testing is considered a stepwise approach, operating initially with incomplete AOPs for compound grouping, and focussing on key events of neurodevelopment...
Abstract Background A variety of high-throughput techniques are now available for constructing comprehensive gene regulatory networks in systems biology. In this study, we report a new statistical approach for facilitating in silico inference of regulatory network structure. The new measure of association, the coefficient of intrinsic dependence (CID), is model-free and can be applied to both continuous and categorical distributions. Given two variables X and Y, CID answers whether Y is dependent on X by examining the conditional distribution of Y given X. In this paper, we apply CID to analyze the regulatory relationships between transcription factors (TFs) (X) and their downstream genes (Y) based on clinical data. More specifically, we use estrogen receptor α (ERα) as the variable X, and the analyses are based on 48 clinical breast cancer gene expression arrays (48A). Results The analytical utility of CID was evaluated in comparison with four commonly used statistical methods: Galton-Pearson's correlation coefficient (GPCC), Student's t-test (STT), coefficient of determination (CoD), and mutual information (MI). Compared with GPCC, CoD, and MI, CID is better able to discover regulatory associations where the distribution of the mRNA expression levels of X and Y does not fit linear models. On the other hand, when CID is used to measure the association of a continuous variable (Y) with a discrete variable (X), it performs similarly to STT and appears to outperform CoD and MI. In addition, this study established a two-layer transcriptional regulatory network to exemplify the usage of CID, in combination with GPCC, in deciphering gene networks based on gene expression profiles from patient arrays. Conclusion CID is shown to provide useful information for identifying associations between genes and transcription factors of interest in patient arrays. When coupled with the relationships detected by GPCC, the
Indoliya, Yuvraj; Tiwari, Poonam; Chauhan, Abhisekh Singh; Goel, Ridhi; Shri, Manju; Bag, Sumit Kumar; Chakrabarty, Debasis
Somatic embryogenesis is a unique process in plants and is of considerable interest for biotechnological applications. Compared to japonica, indica rice has been less responsive to in vitro culture. We used the Illumina HiSeq 2000 sequencing platform for comparative transcriptome analysis between the two rice subspecies at six different developmental stages, combined with tag-based digital gene expression profiling. Global gene expression among different samples showed greater complexity in japonica rice than in indica, which may be due to the polyphyletic origin of the two rice subspecies. Expression patterns in the initial stage indicate major differences in the proembryogenic callus induction phase, which may be key to the differences observed between the two subspecies. Our data suggest that phytohormone signaling pathways consist of elaborate networks with frequent crosstalk, thereby allowing plants to regulate the somatic embryogenesis pathway. However, this crosstalk varies between the two rice subspecies. Downregulation of positive regulators of meristem development (i.e. KNOX, OsARF5) and upregulation of their counterparts (OsRRs, MYB, GA20ox1/GA3ox2) in japonica may be responsible for its better regeneration and differentiation of somatic embryos. The comprehensive gene expression information from the present experiment may also help in understanding monocot-specific meristem regulation during the dedifferentiation of somatic cells into embryogenic cells. PMID:26973288
David A Garfield
Regulatory interactions buffer development against genetic and environmental perturbations, but adaptation requires phenotypes to change. We investigated the relationship between robustness and evolvability within the gene regulatory network underlying development of the larval skeleton in the sea urchin Strongylocentrotus purpuratus. We find extensive variation in gene expression in this network throughout development in a natural population, some of which has a heritable genetic basis. Switch-like regulatory interactions predominate during early development, buffer expression variation, and may promote the accumulation of cryptic genetic variation affecting early stages. Regulatory interactions during later development are typically more sensitive (linear), allowing variation in expression to affect downstream target genes. Variation in skeletal morphology is associated primarily with expression variation of a few, primarily structural, genes at terminal positions within the network. These results indicate that the position and properties of gene interactions within a network can have important evolutionary consequences independent of their immediate regulatory role.
An accurate determination of the network structure of gene regulatory systems from high-throughput gene expression data is an essential yet challenging step in studying how the expression of endogenous genes is controlled through a complex interaction of gene products and DNA. While numerous methods have been proposed to infer the structure of gene regulatory networks, none of them works consistently with high accuracy across different data sets. A recent study comparing gene network inference methods showed that an average-ranking-based consensus method consistently performs well under various settings. Here, we propose a linear programming-based consensus method for the inference of gene regulatory networks. Unlike the average-ranking-based method, which treats the contribution of each individual method equally, our new consensus method assigns a weight to each method based on its credibility. As a case study, we applied the proposed consensus method to synthetic and real microarray data sets, and compared its performance to that of the average-ranking-based consensus and individual inference methods. Our results show that our weighted consensus method achieves superior performance over the unweighted one, suggesting that assigning weights to different individual methods rather than giving them equal weights improves the accuracy. © 2016 Elsevier B.V.
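The difference between the average-ranking consensus and a weighted consensus can be sketched in a few lines. In this minimal sketch the per-method weights are simply supplied by hand; the linear-programming step that derives weights from method credibility is not reproduced, and the method names, edges and ranks are hypothetical.

```python
def consensus_ranking(rankings, weights=None):
    """Merge per-method edge rankings (1 = most confident) into one ordering.

    rankings: dict mapping method name -> dict mapping edge -> rank.
    weights:  dict mapping method name -> non-negative weight; None gives
              equal weights, i.e. the average-ranking consensus.
    Assumes every method ranks the same set of edges.
    """
    methods = list(rankings)
    if weights is None:
        weights = {m: 1.0 for m in methods}
    total = sum(weights[m] for m in methods)
    edges = rankings[methods[0]]
    score = {e: sum(weights[m] * rankings[m][e] for m in methods) / total for e in edges}
    # A lower weighted mean rank means a more confident consensus edge.
    return sorted(score, key=lambda e: (score[e], e))

# Hypothetical rankings of three candidate edges by three inference methods.
rankings = {
    "method1": {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 3},
    "method2": {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2},
    "method3": {("A", "B"): 3, ("A", "C"): 2, ("B", "C"): 1},
}
unweighted = consensus_ranking(rankings)
weighted = consensus_ranking(rankings, {"method1": 4.0, "method2": 1.0, "method3": 1.0})
```

Giving more weight to a single method (here the hypothetical `method1`) moves its top edge to the front of the consensus list, which the equal-weight average would otherwise rank last.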
Xiao, Fei; Gao, Lin; Ye, Yusen; Hu, Yuxuan; He, Ruijie
Combining path consistency (PC) algorithms with conditional mutual information (CMI) is a widely used approach to reconstructing gene regulatory networks. CMI has many advantages over the Pearson correlation coefficient in measuring non-linear dependence to infer gene regulatory networks. It can also discriminate direct regulations from indirect ones. However, it is still a challenge to select the conditional genes in an optimal way, which affects the performance and computational complexity of the PC algorithm. In this study, we develop a novel conditional mutual information-based algorithm, namely RPNI (Regulation Pattern based Network Inference), to infer gene regulatory networks. For conditional gene selection, we define the co-regulation pattern, indirect-regulation pattern and mixture-regulation pattern as three candidate patterns to guide the selection of candidate genes. To demonstrate the potential of our algorithm, we apply it to gene expression data from the DREAM challenge. Experimental results show that RPNI outperforms existing conditional mutual information-based methods in both accuracy and time complexity for different sizes of gene samples. Furthermore, the robustness of our algorithm is demonstrated by noise interference analysis using different types of noise.
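The sketch below is not RPNI; its regulation-pattern rules for choosing conditional genes are not reproduced. It illustrates only the baseline PC plus CMI idea the abstract builds on: start from a fully connected graph and drop an edge once some small conditioning set makes the CMI negligible. It assumes jointly Gaussian data, so that CMI = -0.5 log(1 - partial correlation²), and the threshold of 0.01 nats is an arbitrary choice for the hypothetical example.

```python
import itertools
import numpy as np

def gaussian_cmi(data, i, j, cond=()):
    """CMI (nats) between columns i and j given the columns in `cond`,
    assuming joint Gaussianity: I = -0.5 * log(1 - partial_corr**2)."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(np.cov(data[:, idx], rowvar=False))
    pcor = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    return -0.5 * np.log(1.0 - pcor ** 2)

def pc_network(data, threshold=0.01, max_order=1):
    """PC-style pruning: start fully connected, then drop an edge as soon as
    some conditioning set of size 0..max_order makes its CMI small."""
    n_genes = data.shape[1]
    edges = {frozenset(p) for p in itertools.combinations(range(n_genes), 2)}
    for order in range(max_order + 1):
        for edge in list(edges):
            i, j = sorted(edge)
            # Candidate conditioning genes: current neighbours of i or j.
            neighbours = {k for e in edges if i in e or j in e for k in e} - {i, j}
            for cond in itertools.combinations(sorted(neighbours), order):
                if gaussian_cmi(data, i, j, cond) < threshold:
                    edges.discard(edge)
                    break
    return edges

# Hypothetical chain 0 -> 1 -> 2; the indirect 0-2 edge should be pruned.
rng = np.random.default_rng(1)
g0 = rng.normal(size=2000)
g1 = g0 + rng.normal(size=2000)
g2 = g1 + rng.normal(size=2000)
network = pc_network(np.column_stack([g0, g1, g2]))
```

At order 0 the CMI reduces to plain MI, so all three pairs survive; conditioning on gene 1 then removes the indirect 0-2 edge.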
Abstract Background Uncovering the operating principles underlying cellular processes by using 'omics' data is often a difficult task due to the high dimensionality of the solution space, which spans all interactions among the bio-molecules under consideration. A rational way to overcome this problem is to use the topology of bio-molecular interaction networks to constrain the solution space. Such approaches systematically integrate existing biological knowledge with 'omics' data. Results Here we introduce a hypothesis-driven method that integrates bio-molecular network topology with transcriptome data, thereby allowing the identification of key biological features (Reporter Features) around which transcriptional changes are significantly concentrated. We have combined transcriptome data with different biological networks in order to identify Reporter Gene Ontologies, Reporter Transcription Factors, Reporter Proteins and Reporter Complexes, and use these to decipher the logic of regulatory circuits playing a key role in yeast glucose repression and human diabetes. Conclusion Reporter Features offer the opportunity to identify regulatory hot-spots in bio-molecular interaction networks that are significantly affected between or across conditions. Results of the Reporter Feature analysis not only provide a snapshot of the transcriptional regulatory program but are also easy to interpret biologically and provide a powerful way to generate new hypotheses. Our Reporter Features analyses of yeast glucose repression and human diabetes data provide hints towards understanding the principles of transcriptional regulation controlling these two important and potentially closely related systems.
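One way to score a feature, such as the set of targets of one transcription factor, is sketched below in the style of reporter-metabolite scoring: gene-level p-values are converted to z-scores, summed and scaled by √k, and then corrected against random gene sets of the same size. We assume this is close to the Reporter Features computation, but the background size, seed and correction are assumptions, and the gene names and p-values are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def z_from_p(p):
    """Gene-level p-value (0 < p < 1) -> z-score; smaller p gives larger z."""
    return NormalDist().inv_cdf(1.0 - p)

def reporter_score(feature_genes, gene_p, n_background=1000, seed=0):
    """Background-corrected aggregate z-score for one feature (e.g. the
    target genes of a transcription factor)."""
    z = {g: z_from_p(p) for g, p in gene_p.items()}
    k = len(feature_genes)
    raw = sum(z[g] for g in feature_genes) / k ** 0.5
    # Compare with random gene sets of the same size.
    rng = random.Random(seed)
    genes = sorted(z)
    background = [sum(z[g] for g in rng.sample(genes, k)) / k ** 0.5
                  for _ in range(n_background)]
    return (raw - mean(background)) / stdev(background)

# Hypothetical data: 50 genes; the 5 targets of "TF1" are strongly changed.
gene_p = {f"g{i}": (0.001 if i < 5 else 0.5) for i in range(50)}
tf1_score = reporter_score([f"g{i}" for i in range(5)], gene_p)
other_score = reporter_score([f"g{i}" for i in range(10, 15)], gene_p)
```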
MicroRNAs (miRNAs) are potent effectors in gene regulatory networks, where aberrant miRNA expression can contribute to human diseases such as cancer. For a better understanding of the regulatory role of miRNAs in coordinating gene expression, we here present a systems biology approach combining data-driven modeling and model-driven experiments. Such an approach is characterized by an iterative process, including biological data acquisition and integration, network construction, mathematical modeling and experimental validation. To demonstrate the application of this approach, we use it to investigate mechanisms of collective repression of p21 by multiple miRNAs. We first construct a p21 regulatory network based on data from the literature and further expand it using algorithms that predict molecular interactions. Based on the network structure, a detailed mechanistic model is established and its parameter values are determined using data. Finally, the calibrated model is used to study the effect of different miRNA expression profiles and cooperative target regulation on p21 expression levels in different biological contexts.
Rowland, Michael A; Abdelzaher, Ahmed; Ghosh, Preetam; Mayo, Michael L
Network motifs, such as the feed-forward loop (FFL), introduce a range of complex behaviors to transcriptional regulatory networks, yet such properties are typically determined from their isolated study. We characterize the effects of crosstalk on FFL dynamics by modeling the cross regulation between two different FFLs and evaluate the extent to which these patterns occur in vivo. Analytical modeling suggests that crosstalk should overwhelmingly affect individual protein-expression dynamics. Counter to this expectation, we find that entire FFLs are more likely than expected to resist the effects of crosstalk (≈20% for one crosstalk interaction) and remain dynamically modular. The likelihood that cross-linked FFLs are dynamically correlated increases monotonically with additional crosstalk, but is independent of the specific regulation type or connectivity of the interactions. Just one additional regulatory interaction is sufficient to drive the FFL dynamics to a statistically different state. Despite the potential for modularity between sparsely connected network motifs, Escherichia coli (E. coli) appears to favor crosstalk wherein at least one of the cross-linked FFLs remains modular. A gene ontology analysis reveals that stress response processes are significantly overrepresented in the cross-linked motifs found within E. coli. Although the daunting complexity of biological networks affects the dynamical properties of individual network motifs, some resist and remain modular, seemingly insulated from extrinsic perturbations, an intriguing possibility for nature to consistently and reliably provide certain network functionalities wherever the need arises. Published by Elsevier Inc.
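Studying cross-linked FFLs first requires locating FFLs in a directed regulatory network. The minimal motif-enumeration sketch below finds every triple with x→y, x→z and y→z; the crosstalk dynamics modelled in the abstract are not reproduced, and the node names and edges are hypothetical.

```python
def find_ffls(edges):
    """List feed-forward loops (x, y, z) with x->y, x->z and y->z.

    edges: iterable of (regulator, target) pairs; self-loops are ignored.
    """
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)
    ffls = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                if z in x_targets:  # x regulates z directly and via y
                    ffls.append((x, y, z))
    return sorted(ffls)

# Two hypothetical FFLs (A,B,C) and (D,E,F) joined by one crosstalk edge B->F.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "E"), ("D", "F"), ("E", "F"),
         ("B", "F")]
motifs = find_ffls(edges)
```

The crosstalk edge B→F links the two motifs without creating a new FFL, so both are still reported as separate loops.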
Ma, Chihua; Luciani, Timothy; Terebus, Anna; Liang, Jie; Marai, G Elisabeta
Visualizing the complex probability landscape of stochastic gene regulatory networks can further biologists' understanding of phenotypic behavior associated with specific genes. We present PRODIGEN (PRObability DIstribution of GEne Networks), a web-based visual analysis tool for the systematic exploration of probability distributions over simulation time and state space in such networks. PRODIGEN was designed in collaboration with bioinformaticians who research stochastic gene networks. The tool combines existing, expanded, and new visual encodings in a novel way to capture the time-varying characteristics of probability distributions: spaghetti plots over one-dimensional projections, heatmaps of distributions over 2D projections enhanced with overlaid time curves to display temporal changes, and novel individual glyphs of state information corresponding to particular peaks. We demonstrate the effectiveness of the tool through two case studies on the computed probabilistic landscape of a gene regulatory network and of a toggle-switch network. Domain expert feedback indicates that our visual approach can help biologists: 1) visualize probabilities of stable states, 2) explore the temporal probability distributions, and 3) discover small peaks in the probability landscape that have potential relation to specific diseases.
Efforts in phylogenomics have greatly improved our understanding of the backbone tree of life. However, due to systematic errors in sequence data, a sequence-based phylogenomic approach can yield trees that are well resolved yet significantly incongruent. Thus, independent tests of current phylogenetic knowledge are required. Here, we have devised a distance-based strategy to reconstruct a highly resolved backbone tree of life on the basis of the genome context networks of 195 fully sequenced representative species. Along with strongly supporting the monophyly of the three superkingdoms and most taxonomic subdivisions, the derived tree also suggests some intriguing results, such as a high-G+C Gram-positive origin of Bacteria and the classification of Symbiobacterium thermophilum and Alcanivorax borkumensis in Firmicutes. Furthermore, simulation analyses indicate that adding more highly accurate gene relationships can greatly improve the resolution of the phylogenetic tree. Our results demonstrate the feasibility of reconstructing a highly resolved phylogenetic tree from extensible gene networks across all three domains of life. This strategy also implies that the relationships between genes (the gene network) can define what kind of species an organism is.
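The abstract does not state which distance is computed between genome context networks. As an assumption for illustration only, the sketch below uses the Jaccard distance between species' sets of gene relationships (undirected edges between orthologous gene families) to build the pairwise matrix that a distance-based tree method, such as neighbour joining (not shown), would take as input. The species and edges are hypothetical.

```python
def network_distance(edges_a, edges_b):
    """Jaccard distance between two species' gene-relationship sets.

    Edges are undirected pairs of orthologous gene-family identifiers.
    """
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(networks):
    """Pairwise distances for a dict species -> edge list (rows sorted by name)."""
    names = sorted(networks)
    return names, [[network_distance(networks[p], networks[q]) for q in names]
                   for p in names]

# Hypothetical genome context networks for three species.
networks = {
    "sp1": [("g1", "g2"), ("g2", "g3"), ("g3", "g4")],
    "sp2": [("g1", "g2"), ("g2", "g3"), ("g4", "g5")],
    "sp3": [("g6", "g7")],
}
names, dist = distance_matrix(networks)
```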
Olga A Soutourina
Clostridium difficile is an emergent pathogen, and the most common cause of nosocomial diarrhea. In an effort to understand the role of small noncoding RNAs (sRNAs) in C. difficile physiology and pathogenesis, we used an in silico approach to identify 511 sRNA candidates in both intergenic and coding regions. In parallel, RNA-seq and differential 5'-end RNA-seq were used for global identification of C. difficile sRNAs and their transcriptional start sites at three different growth conditions (exponential growth phase, stationary phase, and starvation). This global experimental approach identified 251 putative regulatory sRNAs including 94 potential trans riboregulators located in intergenic regions, 91 cis-antisense RNAs, and 66 riboswitches. Expression of 35 sRNAs was confirmed by gene-specific experimental approaches. Some sRNAs, including an antisense RNA that may be involved in control of C. difficile autolytic activity, showed growth phase-dependent expression profiles. Expression of each of 16 predicted c-di-GMP-responsive riboswitches was observed, and experimental evidence for their regulatory role in coordinated control of motility and biofilm formation was obtained. Finally, we detected abundant sRNAs encoded by multiple C. difficile CRISPR loci. These RNAs may be important for C. difficile survival in bacteriophage-rich gut communities. Altogether, this first experimental genome-wide identification of C. difficile sRNAs provides a firm basis for future RNome characterization and identification of molecular mechanisms of sRNA-based regulation of gene expression in this emergent enteropathogen.
Novichkov, Pavel S; Brettin, Thomas S; Novichkova, Elena S; Dehal, Paramvir S; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A
A web services application programming interface (API) was developed to provide programmatic access to the regulatory interactions accumulated in the RegPrecise database (http://regprecise.lbl.gov), a core resource on transcriptional regulation for the microbial domain of the Department of Energy (DOE) Systems Biology Knowledgebase. RegPrecise captures and visualizes regulogs, sets of genes controlled by orthologous regulators in several closely related bacterial genomes, that were reconstructed by comparative genomics. The current release of RegPrecise 2.0 includes >1400 regulogs controlled either by protein transcription factors or by conserved ribonucleic acid regulatory motifs in >250 genomes from 24 taxonomic groups of bacteria. The reference regulons accumulated in RegPrecise can serve as a basis for automatic annotation of regulatory interactions in newly sequenced genomes. The developed API provides efficient access to the RegPrecise data through a comprehensive set of 14 web service resources. The RegPrecise web services API is freely accessible at http://regprecise.lbl.gov/RegPrecise/services.jsp with no login requirements.
Vasconcelos, E J R; Terrão, M C; Ruiz, J C; Vêncio, R Z N; Cruz, A K
In silico analyses of Leishmania spp. genome data are a powerful resource to improve the understanding of these pathogens' biology. Trypanosomatids such as Leishmania spp. have their protein-coding genes grouped in long polycistronic units of functionally unrelated genes. Gene expression is controlled by a variety of posttranscriptional mechanisms. The high degree of synteny among Leishmania species is accompanied by highly conserved coding sequences (CDS) and poorly conserved intercoding untranslated sequences. To identify the elements involved in the control of gene expression, we conducted an in silico investigation to find conserved intercoding sequences (CICS) in the genomes of L. major, L. infantum, and L. braziliensis. We used a combination of computational tools, such as Linux shell, Perl and R scripting, BLAST, MSPcrunch, SSAKE, and Pred-A-Term algorithms, to construct a pipeline that was able to: (i) search for conservation in target regions, (ii) eliminate CICS redundancy and mask repeat elements, (iii) predict the ends of mRNAs, (iv) analyze the distribution of orthologous genes within the generated LeishCICS-clusters, (v) assign GO terms to the LeishCICS-clusters, and (vi) provide statistical support for the gene-enrichment annotation. We associated the LeishCICS-cluster data, generated at the end of the pipeline, with the expression profile of L. donovani genes during promastigote-amastigote differentiation, as previously evaluated by others (GEO accession: GSE21936). A Pearson's correlation coefficient greater than 0.5 was observed for 730 LeishCICS-clusters containing from 2 to 17 genes. The designed computational pipeline is a useful tool and its application identified potential regulatory cis elements and putative regulons in Leishmania. Copyright © 2012 Elsevier B.V. All rights reserved.
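The abstract does not say how a single correlation value per LeishCICS-cluster was obtained. As an assumption, the sketch below averages Pearson's r over all gene pairs within a cluster and keeps clusters whose mean exceeds 0.5. The cluster names, gene names and expression values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def coexpressed_clusters(clusters, min_r=0.5):
    """Names of clusters whose mean pairwise Pearson r exceeds min_r.

    clusters: dict cluster name -> dict gene -> expression profile.
    """
    kept = []
    for name, profiles in clusters.items():
        pairs = list(combinations(profiles.values(), 2))
        if pairs and mean(pearson(a, b) for a, b in pairs) > min_r:
            kept.append(name)
    return kept

# Hypothetical profiles over four differentiation time points.
clusters = {
    "cluster1": {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8.5], "geneC": [1, 3, 3, 5]},
    "cluster2": {"geneD": [1, 2, 3, 4], "geneE": [4, 3, 2, 1]},
}
selected = coexpressed_clusters(clusters)
```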
Nandi, Anjan K; Sumana, Annagiri; Bhattacharya, Kunal
Social insects provide an excellent platform to investigate the flow of information in regulatory systems, since their successful social organization is essentially achieved by effective information transfer through complex connectivity patterns among the colony members. Network representation of such behavioural interactions offers a powerful tool for structural as well as dynamical analysis of the underlying regulatory systems. In this paper, we focus on the dominance interaction networks in the tropical social wasp Ropalidia marginata, a species in which behavioural observations indicate that such interactions are principally responsible for the transfer of information between individuals about their colony needs, resulting in a regulation of their own activities. Our research reveals that the dominance networks of R. marginata are structurally similar to a class of naturally evolved information processing networks, a fact also confirmed by the predominance of a specific substructure, the 'feed-forward loop', a key functional component in many other information transfer networks. The dynamical analysis through Boolean modelling confirms that the networks are sufficiently stable under small fluctuations and yet capable of more efficient information transfer compared to their randomized counterparts. Our results suggest the involvement of a common structural design principle in different biological regulatory systems and a possible similarity with respect to the effect of selection on the organization levels of such systems. The findings are also consistent with the hypothesis that dominance behaviour has been shaped by natural selection to co-opt the information transfer process in such social insect species, in addition to its primal function of mediation of reproductive competition in the colony. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
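Stability "under small fluctuations" in a Boolean model is commonly probed by damage spreading: flip one node, run both copies forward, and count how many nodes end up differing. The sketch below does this for synchronous updates. The majority update rule and the four-node network are hypothetical, since the study's actual Boolean rules and randomization procedure are not given here. Comparing against randomized counterparts would mean computing the same quantity on rewired versions of the network.

```python
import random

def step(state, regulators, rule):
    """One synchronous update; nodes without regulators keep their state."""
    return {node: (rule([state[r] for r in regulators[node]])
                   if regulators[node] else state[node])
            for node in state}

def majority(inputs):
    # Hypothetical rule: ON (1) if at least half of the inputs are ON.
    return int(2 * sum(inputs) >= len(inputs))

def mean_damage(regulators, rule=majority, steps=3, trials=100, seed=0):
    """Average Hamming distance between a trajectory and a copy started with
    one node flipped, over random start states and every flipped node."""
    rng = random.Random(seed)
    nodes = sorted(regulators)
    total = 0
    for _trial in range(trials):
        start = {n: rng.randint(0, 1) for n in nodes}
        for flipped in nodes:
            a, b = dict(start), dict(start)
            b[flipped] = 1 - b[flipped]
            for _t in range(steps):
                a, b = step(a, regulators, rule), step(b, regulators, rule)
            total += sum(a[n] != b[n] for n in nodes)
    return total / (trials * len(nodes))

# Hypothetical 4-node network: "h" (self-sustaining) regulates a, b and c.
hub = {"h": ["h"], "a": ["h"], "b": ["h"], "c": ["h"]}
damage = mean_damage(hub)
```

In this toy network a flip of any leaf node heals after one step, while a flip of the hub spreads to every node, so the average damage per perturbation is 1.0.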
Behdani, Elham; Bakhtiarizadeh, Mohammad Reza
The immune system is an important biological system that is negatively impacted by stress. This study constructed an integrated regulatory network to enhance our understanding of the gene regulatory network involved in the stress-related immune system. Module inference was used to construct modules of co-expressed genes from bovine leukocyte RNA-Seq data. Transcription factors (TFs) were then assigned to these modules using Lemon-Tree algorithms. In addition, the TFs assigned to each module were confirmed using promoter analysis and protein-protein interaction data. Our integrated method identified three TFs: one previously known to be involved in the immune response (MYBL2) and two (E2F8 and FOXS1) that had not been recognized previously and were identified for the first time in this study as novel regulatory candidates in the immune response. This study provides valuable insights into the regulatory programs of genes involved in the stress-related immune system.
Li, Yifeng; Chen, Haifen; Zheng, Jie; Ngom, Alioune
Accurately reconstructing gene regulatory networks (GRNs) from gene expression data is a challenging task in systems biology. Although some progress has been made, the performance of GRN reconstruction still has much room for improvement. Because many regulatory events are asynchronous, learning gene interactions with multiple time delays is an effective way to improve the accuracy of GRN reconstruction. Here, we propose a new approach, called Max-Min high-order dynamic Bayesian network (MMHO-DBN), by extending the Max-Min hill-climbing Bayesian network technique originally devised for learning a Bayesian network's structure from static data. MMHO-DBN can explicitly model the time lags between regulators and targets in an efficient manner. It first uses constraint-based ideas to limit the space of potential structures, and then applies search-and-score ideas to search for an optimal HO-DBN structure. The performance of MMHO-DBN in GRN reconstruction was evaluated using both synthetic and real gene expression time-series data. Results show that MMHO-DBN is more accurate than current time-delayed GRN learning methods, and has intermediate computing performance. Furthermore, it is able to learn long time-delayed relationships between genes. We applied sensitivity analysis to our model to study the performance variation across different parameter settings. The result provides hints on how to set the parameters of MMHO-DBN.
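MMHO-DBN itself, with constraint-based pruning followed by search-and-score over high-order DBN structures, is not reproduced here. The sketch below shows only the underlying idea of modelling multiple time delays: for one regulator-target pair it scans candidate lags and reports the lag at which the regulator's past best correlates with the target. Lagged correlation is our simplification, not the paper's scoring, and the time series are hypothetical.

```python
import numpy as np

def best_lag(regulator, target, max_lag):
    """Return (lag, r): the delay in 1..max_lag at which regulator(t - lag)
    correlates most strongly (in absolute value) with target(t)."""
    chosen_lag, best_r = None, 0.0
    for lag in range(1, max_lag + 1):
        r = np.corrcoef(regulator[:-lag], target[lag:])[0, 1]
        if abs(r) > abs(best_r):
            chosen_lag, best_r = lag, r
    return chosen_lag, best_r

# Hypothetical series: the target follows the regulator with a delay of 3 steps.
rng = np.random.default_rng(2)
reg = rng.normal(size=200)
tgt = np.zeros(200)
tgt[3:] = 0.9 * reg[:-3] + 0.1 * rng.normal(size=197)
lag, r = best_lag(reg, tgt, max_lag=5)
```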
Summer, Georg; Perkins, Theodore J
A key problem in systems biology is estimating dynamical models of gene regulatory networks. Traditionally, this has been done using regression or other ad hoc methods when the model is linear. More detailed, realistic modeling studies usually employ nonlinear dynamical models, which lead to computationally difficult parameter estimation problems. Functional data analysis methods, however, offer a means to simplify fitting by transforming the problem from one of matching modeled and observed dynamics to one of matching modeled and observed time derivatives: a regression problem, albeit a nonlinear one. We formulate a functional data analysis approach for estimating the parameters of nonlinear dynamical models and evaluate this approach on data from two real systems, the gap gene system of Drosophila melanogaster and the synthetic IRMA network, which was created expressly as a test case for genetic network inference. We also evaluate the approach on simulated data sets generated by the GeneNetWeaver program, the basis for the annual DREAM reverse engineering challenge. We assess the accuracy with which the correct regulatory relationships within the networks are extracted, and consider alternative methods of regularization for the purpose of overfitting avoidance. We also show that the computational efficiency of the functional data analysis approach, and the decomposability of the resulting regression problem, allow us to explicitly enumerate and evaluate all possible regulator combinations for every gene. This gives deeper insight into the relevance of different regulators or regulator combinations, and lets one check for alternative regulatory explanations. Functional data analysis is a powerful approach for estimating detailed nonlinear models of gene expression dynamics, allowing efficient and accurate estimation of regulatory architecture.
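The trick of matching derivatives instead of trajectories can be shown on a toy model. In the minimal sketch below, which uses a hypothetical one-gene model dx/dt = k·u - d·x with a constant input u, the noisy series is first represented as a smooth function. A degree-8 polynomial basis stands in for the spline smoothing typically used in functional data analysis. The derivative of that function is then regressed on the model terms by ordinary least squares, so no ODE is ever integrated. Real models are nonlinear in their parameters, and the enumeration of regulator combinations is not shown.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 201)
k_true, d_true, u = 2.0, 0.5, 1.0  # hypothetical parameters and constant input
# Analytic solution of dx/dt = k*u - d*x with x(0) = 0, plus measurement noise.
x_obs = (k_true * u / d_true) * (1 - np.exp(-d_true * t)) + rng.normal(0, 0.01, t.size)

# Step 1: represent the noisy series as a smooth function of time.
smooth = Polynomial.fit(t, x_obs, deg=8)
x_hat = smooth(t)
dxdt_hat = smooth.deriv()(t)

# Step 2: derivative matching is now a linear regression: dx/dt = k*u - d*x.
design = np.column_stack([np.full_like(t, u), -x_hat])
(k_est, d_est), *_ = np.linalg.lstsq(design, dxdt_hat, rcond=None)
```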
Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting the emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model has yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature. By analyzing all the datasets together, our system inferred the first comprehensive systems-biology dynamical model explaining patterning in planarian regeneration. This method
BACKGROUND: Drug repositioning offers the possibility of faster development times and reduced risks in drug discovery. With the rapid development of high-throughput technologies and the ever-increasing accumulation of whole genome-level datasets, an increasing number of diseases and drugs can be comprehensively characterized by the changes they induce in gene expression, protein, metabolites and phenotypes. METHODOLOGY/PRINCIPAL FINDINGS: We performed a systematic, large-scale analysis of genomic expression profiles of human diseases and drugs to create a disease-drug network. A network of 170,027 significant interactions was extracted from the approximately 24.5 million comparisons between approximately 7,000 publicly available transcriptomic profiles. The network includes 645 disease-disease, 5,008 disease-drug, and 164,374 drug-drug relationships. At least 60% of the disease-disease pairs were in the same disease area as determined by the Medical Subject Headings (MeSH) disease classification tree. The remaining pairs can drive a molecular-level nosology by discovering relationships between seemingly unrelated diseases, such as a connection between bipolar disorder and hereditary spastic paraplegia, and a connection between actinic keratosis and cancer. Among the 5,008 disease-drug links, connections with negative scores suggest new indications for existing drugs, such as the use of some antimalarial drugs for Crohn's disease, and a variety of existing drugs for Huntington's disease, while the positive scoring connections can aid in drug side effect identification, such as tamoxifen's undesired carcinogenic property. From the 164,374 drug-drug relationships, we discover relationships that aid in target and pathway deconvolution, such as (1) KCNMA1 as a potential molecular target of lobeline, and (2) both apoptotic DNA fragmentation and G2/M DNA damage checkpoint regulation as potential pathway targets of daunorubicin. CONCLUSIONS/SIGNIFICANCE: We
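The abstract does not specify how connection scores were computed. As an assumption chosen only to illustrate the sign convention, the sketch below scores a disease-drug pair by the Spearman rank correlation of their expression signatures. A negative score means the drug tends to reverse the disease signature, a candidate new indication. A positive score means the two signatures are shared, a possible side-effect signal. All fold-change values are hypothetical.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation (assumes no tied values)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

# Hypothetical log fold-changes of five genes in a disease and two drugs.
disease = np.array([2.0, -1.5, 0.5, -0.2, 1.2])
drug_reverses = np.array([-1.8, 1.1, -0.3, 0.1, -0.9])  # opposite ordering
drug_mimics = np.array([1.5, -1.0, 0.2, -0.1, 0.8])     # same ordering
score_reverse = spearman(disease, drug_reverses)  # -1: candidate new indication
score_mimic = spearman(disease, drug_mimics)      # +1: shared effects
```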
Zheng, Ming; Wu, Jia-nan