Tornow, Sabine; Mewes, H W
Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks are highly likely to perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the currently used mean Pearson correlations and show that our method is more sensitive in revealing functional relationships.
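The baseline mentioned in this abstract (mean Pearson correlation of a gene group, plus the probability of reaching that value by chance) can be illustrated with a short sketch. This is not the superparamagnetic method itself. The toy `profiles` dictionary, the number of random groups, and the function names are assumptions made only for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def mean_group_correlation(profiles, genes):
    """Mean pairwise Pearson correlation over all gene pairs in the group."""
    return mean(pearson(profiles[a], profiles[b]) for a, b in combinations(genes, 2))


def chance_probability(profiles, genes, n_random=1000, seed=0):
    """Compare the group score with scores of random groups of the same size."""
    rng = random.Random(seed)
    observed = mean_group_correlation(profiles, genes)
    pool = sorted(profiles)
    hits = sum(
        mean_group_correlation(profiles, rng.sample(pool, len(genes))) >= observed
        for _ in range(n_random)
    )
    # Add-one correction so the estimated probability is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical co-expression profiles; g1, g2 and g3 stand for a module
# that was identified in a different network (e.g. protein interactions).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.5, 2.5, 3.5, 4.5],
    "g4": [4.0, 1.0, 3.0, 2.0],
    "g5": [2.0, 3.0, 1.0, 4.0],
    "g6": [3.0, 3.5, 1.0, 2.0],
}
observed, p_value = chance_probability(profiles, ["g1", "g2", "g3"])
```

A small `p_value` would suggest that the module is more tightly co-expressed than a random gene group of the same size.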
Aronow, Bruce J
Background: Although most current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is based entirely on protein-protein interaction network (PPIN) analyses. Results: For the first time, extended versions of the PageRank and HITS algorithms and the K-Step Markov method are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds") and the rest of the known genes as a test list, we perform large-scale cross-validation to rank the candidate genes and to evaluate and compare the performance of our approach. Under appropriate settings (for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and a step size of 6 for the K-Step Markov method), the three methods achieved comparable AUC values, suggesting similar performance. Conclusion: Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.
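As a rough illustration of seed-based ranking, the sketch below runs a plain PageRank with Priors iteration with the back probability of 0.3 quoted in the abstract. It does not reproduce the "extended versions" used in the study. The toy network `ppi`, the seed set, and the iteration count are assumptions.

```python
def pagerank_with_priors(adjacency, seeds, back_prob=0.3, n_iter=100):
    """Random walk that jumps back to the seed genes with probability back_prob."""
    nodes = sorted(adjacency)
    prior = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(prior)
    for _ in range(n_iter):
        # Every node first receives its share of the "jump back" mass.
        new_rank = {v: back_prob * prior[v] for v in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: hand its remaining mass back to the seeds.
                for v in nodes:
                    new_rank[v] += (1.0 - back_prob) * rank[u] * prior[v]
                continue
            share = (1.0 - back_prob) * rank[u] / len(neighbours)
            for v in neighbours:
                new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical undirected interaction network (a simple chain A-B-C-D-E).
ppi = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
seeds = {"A"}
rank = pagerank_with_priors(ppi, seeds)
# Candidate genes (non-seeds) ordered from most to least relevant.
ranked_candidates = sorted((v for v in rank if v not in seeds), key=rank.get, reverse=True)
```

In this sketch, genes close to the seed in the network receive higher scores than distant ones, which is the intuition behind prioritizing candidates near known disease genes.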
Background: In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Results: Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic, linkable, three-color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Conclusion: Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.
Mostafavi, Sara; Morris, Quaid
In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Lee, Sungyoung; Kwon, Min-Seok; Park, Taesung
Most common complex traits, such as obesity, hypertension, diabetes, and cancers, are known to be associated with multiple genes, environmental factors, and their epistasis. Recently, the development of advanced genotyping technologies has allowed us to perform genome-wide association studies (GWASs). For detecting the effects of multiple genes on complex traits, many approaches have been proposed for GWASs. Multifactor dimensionality reduction (MDR) is a powerful and efficient method for detecting high-order gene-gene (GxG) interactions. However, the biological interpretation of GxG interactions identified by MDR analysis is not easy. In order to aid the interpretation of MDR results, we propose a network graph analysis to elucidate the meaning of identified GxG interactions. The proposed network graph analysis consists of three steps. The first step performs GxG interaction analysis using MDR. The second step draws the network graph from the MDR result. The third step provides biological evidence for the identified GxG interactions using external biological databases. The proposed method was applied to Korean Association Resource (KARE) data, containing 8838 individuals with 327,632 single-nucleotide polymorphisms, in order to perform GxG interaction analysis of body mass index (BMI). Our network graph analysis successfully showed that many identified GxG interactions have known biological evidence related to BMI. We expect that our network graph analysis will be helpful for interpreting the biological meaning of GxG interactions.
Nguyen, Cao D; Gardiner, Katheleen J; Cios, Krzysztof J
We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively. Copyright © 2011 Elsevier Inc. All rights reserved.
Chiu, Yu-Chiao; Wang, Li-Ju; Hsiao, Tzu-Hung; Chuang, Eric Y; Chen, Yidong
With the advances in high-throughput gene profiling technologies, a large volume of gene interaction maps has been constructed. A higher-level layer of gene-gene interaction, namely modulated gene interaction, is composed of gene pairs whose interaction strengths are modulated by (i.e., dependent on) the expression level of a key modulator gene. Systematic investigations into the modulation by estrogen receptor (ER), the best-known modulator gene, have revealed its functional and prognostic significance in breast cancer. However, a genome-wide identification of key modulator genes that may further unveil the landscape of modulated gene interaction is still lacking. We proposed a systematic workflow to screen for key modulators based on genome-wide gene expression profiles. We designed four modularity parameters to measure the ability of a putative modulator to perturb gene interaction networks. Applying the method to a dataset of 286 breast tumors, we comprehensively characterized the modularity parameters and identified a total of 973 key modulator genes. The modularity of these modulators was verified in three independent breast cancer datasets. ESR1, the gene encoding ER, appeared in the list, and abundant novel modulators were revealed. For instance, SFRP1, a prognostic predictor of breast cancer, was identified as the second-ranked modulator. Functional annotation analysis of the 973 modulators revealed involvement in ER-related cellular processes as well as immune- and tumor-associated functions. Here we present, as far as we know, the first comprehensive analysis of key modulator genes on a genome-wide scale. The validity of the filtering parameters as well as the conservation of modulators among cohorts were corroborated. Our data bring new insights into the modulated layer of gene-gene interaction and provide candidates for further biological investigations.
Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun
Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. The analysis of these
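The over- and under-representation analysis described in this abstract rests on Fisher's exact test. The sketch below shows the standard one-sided test via the hypergeometric distribution, not the authors' modified version. All counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """Probability of exactly k successes when drawing without replacement."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)


def representation_p_values(k, population, successes, draws):
    """One-sided Fisher p-values for over- and under-representation."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    over = sum(hypergeom_pmf(i, population, successes, draws) for i in range(k, highest + 1))
    under = sum(hypergeom_pmf(i, population, successes, draws) for i in range(lowest, k + 1))
    return over, under


# Hypothetical counts: 1000 gene pairs in the background literature, 100 of
# them annotated with a given interaction term; the domain sub-network holds
# 50 gene pairs, 15 of which carry that term.
over_p, under_p = representation_p_values(k=15, population=1000, successes=100, draws=50)
```

With these made-up counts, about 5 annotated pairs would be expected by chance, so observing 15 yields a small `over_p`, which flags the term as over-represented.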
Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah
The evolution of computer technologies makes it possible to access large amounts and various kinds of biological data via the internet, such as DNA sequences, proteomics data, and the information discovered about them. It is expected that combining these various data could help researchers gain further knowledge. The role of a visualization system is to invoke the human ability to integrate information and to recognize patterns in the data. Thus, when various kinds of data are examined and analyzed manually, an effective visualization system is essential. One instance of such integrated visualization is the combination of protein-protein interaction (PPI) data and the Gene Ontology (GO), which could help enhance the analysis of PPI networks. We introduce a simple but comprehensive visualization system that integrates GO and PPI data, in which GO and PPI graphs are visualized side by side with quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and the GO directed acyclic graph, such as context-based browsing and common-ancestor finding.
Westenberg, M.A.; Hijum, van S.A.F.T.; Lulko, A.T.; Kuipers, O.P.; Roerdink, J.B.T.M.; Linsen, L.; Hagen, H.; Hamann, B.
We present GENeVis, an application to visualize gene expression time series data in a gene regulatory network context. This is a network of regulator proteins that regulate the expression of their respective target genes. The networks are represented as graphs, in which the nodes represent genes,
Hur, Junguk; Özgür, Arzucan; He, Yongqun
Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this large network, a sub-network consisting of 5 E. coli vaccine genes (carA, carB, fimH, fepA, and vat), 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of
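Two of the four centrality metrics named in this abstract, degree and closeness, can be computed with a few lines of code. The sketch below uses a hypothetical star-shaped network with placeholder node names, and assumes an undirected, unweighted graph. Eigenvector and betweenness centrality are omitted for brevity.

```python
from collections import deque


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adjacency)
    return {v: len(neighbours) / (n - 1) for v, neighbours in adjacency.items()}


def closeness_centrality(adjacency):
    """Reachable-node count divided by the sum of shortest-path distances."""
    result = {}
    for source in adjacency:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical network: one hub gene linked to three other genes.
network = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
degree = degree_centrality(network)
closeness = closeness_centrality(network)
top_gene = max(closeness, key=closeness.get)
```

Ranking nodes by such scores is one simple way to pick out the highly ranked genes and interaction types mentioned above.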
Background: The identification of genes that predict in vitro cellular chemosensitivity of cancer cells is of great importance. Chemosensitivity related genes (CRGs) have been widely utilized to guide clinical and cancer chemotherapy decisions. In addition, CRGs potentially share functional characteristics and network features in protein interaction networks (PPIN). Methods: In this study, we proposed a method to identify CRGs based on Gene Ontology (GO) and PPIN. Firstly, we documented 150 pairs of drug-CCRG (curated chemosensitivity related gene) from 492 published papers. Secondly, we characterized CCRGs from the perspective of GO and PPIN. Thirdly, we prioritized CRGs based on CCRGs' GO and network characteristics. Lastly, we evaluated the performance of the proposed method. Results: We found that CCRG enriched GO terms were most often related to chemosensitivity and exhibited higher similarity scores compared to randomly selected genes. Moreover, CCRGs played key roles in maintaining the connectivity and controlling the information flow of PPINs. We then prioritized CRGs using CCRG enriched GO terms and CCRG network characteristics in order to obtain a database of predicted drug-CRGs that included 53 CRGs, 32 of which have been reported to affect susceptibility to drugs. Our proposed method identifies a greater number of drug-CCRGs, and drug-CCRGs are much more significantly enriched in predicted drug-CRGs, compared to a method based on the correlation of gene expression and drug activity. The mean area under the ROC curve (AUC) for our method is 65.2%, whereas that for the traditional method is 55.2%. Conclusions: Our method not only identifies CRGs with expression patterns strongly correlated with drug activity, but also identifies CRGs in which expression is weakly correlated with drug activity. This study provides the framework for the identification of signatures that predict in vitro cellular chemosensitivity and offers a valuable
Yang, Lun; Wei, Dong-Qing; Qi, Ying-Xin; Jiang, Zong-Lai
Identifying genes related to human diseases, such as cancer and cardiovascular disease, is an important task in biomedical research because of its applications in disease diagnosis and treatment. Interactome networks, especially protein-protein interaction networks, have been used for disease gene identification based on the hypothesis that strong candidate genes tend to be closely related to each other according to some measure on the network. We proposed a new measure, called graphlet interaction, to analyze the relationship between network nodes. The graphlet interaction contained 28 different isomers. The results showed that the numbers of graphlet interaction isomers between disease genes in interactome networks were significantly larger than those between randomly picked genes, while graphlet signatures were not. Then, we designed a new type of score, based on the network properties, to identify disease genes using graphlet interaction. Genes with higher scores were more likely to be disease genes, and all candidate genes were ranked according to their scores. The approach was then evaluated by leave-one-out cross-validation. The precision of the current approach reached 90% at about 10% recall, which was higher than that of three previous predominant algorithms: random walk, Endeavour, and the neighborhood-based method. Finally, the approach was applied to predict new disease genes related to 4 common diseases, most of which were identified by other independent experimental studies. In conclusion, we demonstrate that the graphlet interaction is an effective tool for analyzing the network properties of disease genes, and that the scores calculated by graphlet interaction are more precise in identifying disease genes. PMID:24465923
Salem, Saeed; Alroobi, Rami; Banitaan, Shadi; Seridi, Loqmane; Aljarah, Ibrahim; Brewer, James
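Recent advances in proteomic and transcriptomic technologies have resulted in the accumulation of vast amounts of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional module discovery, where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most of the existing module discovery methods use only the interaction data, in this work we propose CLARM, which discovers biological modules by incorporating gene profile data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on Yeast and Human interaction datasets, and gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive with well-established functional module discovery methods.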
Cui, Ying; Cai, Meng; Stanley, H. Eugene
Although there have been many network-based attempts to discover disease-associated genes, most of them have not taken edge weight, which quantifies the relative strength of an interaction, into consideration. We use connection weights in a protein-protein interaction (PPI) network to locate disease-related genes. We analyze the topological properties of both weighted and unweighted PPI networks and design an improved random forest classifier to distinguish disease genes from non-disease genes. We use a cross-validation test to confirm that weighted networks are better able to discover disease-associated genes than unweighted networks, which indicates that including link weight in the analysis of network properties provides a better model of complex genotype-phenotype associations.
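To illustrate why edge weights can change a node's topological profile, the sketch below contrasts unweighted degree with weighted degree (often called node strength). It is only one simple example of a weighted topological property and is not the feature set or the classifier used in the study. The edge list and its weights are hypothetical.

```python
def degree_and_strength(weighted_edges):
    """Return unweighted degree and weighted degree (strength) per node."""
    degree, strength = {}, {}
    for u, v, weight in weighted_edges:
        for node in (u, v):
            degree[node] = degree.get(node, 0) + 1
            strength[node] = strength.get(node, 0.0) + weight
    return degree, strength


# Hypothetical PPI edges as (protein, protein, confidence weight).
edges = [
    ("p1", "p2", 0.9),
    ("p1", "p3", 0.2),
    ("p2", "p3", 0.4),
    ("p3", "p4", 0.1),
]
degree, strength = degree_and_strength(edges)
```

Here `p3` has the most interaction partners, yet `p1` has the larger total interaction weight, so the two views rank the proteins differently.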
Evaluating the potential human health and ecological risks associated with exposures to complex chemical mixtures in the environment is one of the main challenges of chemical safety assessment and environmental protection. There is a need for approaches that can help to integrate chemical monitoring and biological effects data to evaluate risks associated with chemicals present in the environment. Here, we used prior knowledge about chemical-gene interactions to develop a knowledge assembly model for detected chemicals at five locations near the North Branch and Chisago wastewater treatment plants (WWTP) in the St. Croix River Basin, MN and WI. The assembly model was used to generate hypotheses about the biological impacts of the chemicals at each location. The hypotheses were tested using empirical hepatic gene expression data from fathead minnows exposed for 12 d at each location. Empirical gene expression data were also mapped to the assembly models to evaluate the likelihood of a chemical contributing to the observed biological responses using richness and concordance statistics. The prior knowledge approach was able to predict the observed biological pathways impacted at one site but not the other. Atrazine was identified as a potential contributor to the observed gene expression responses at a location upstream of the North Branch WWTP. Four chemicals were identified as contributors to the observed biological responses at the effluent and downstream o
Background: The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. Results: We have developed MINER (Microarray Interactive Network Exploration and Representation), an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Conclusion: Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.
Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui
The PubMed database offers an extensive set of publication data that can be useful, yet is inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but also has the potential to hypothesize new ones. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
Tong, Dong Ling
OBJECTIVE: To model the potential interaction between previously identified biomarkers in childhood sarcomas using artificial neural network inference (ANNI). METHOD: To concisely demonstrate the biological interactions between correlated genes in an interaction network map, only 2 types of sarcomas in the childhood small round blue cell tumors (SRBCTs) dataset are discussed in this paper. A backpropagation neural network was used to model the potential interaction between genes. The prediction weights and signal directions were used to model the strengths of the interaction signals and the direction of the interaction link between genes. The ANN model was validated using Monte Carlo cross-validation to minimize the risk of over-fitting and to optimize the generalization ability of the model. RESULTS: Strong connection links on certain genes (TNNT1 and FNDC5 in rhabdomyosarcoma (RMS); FCGRT and OLFM1 in Ewing's sarcoma (EWS)) suggested their potency as central hubs in the interconnection of genes with different functionalities. The results showed that the RMS patients in this dataset are likely to be congenital and at low risk of cardiomyopathy development. The EWS patients are likely to be complicated by EWS-FLI fusion and deficiency in various signaling pathways, including Wnt, Fas/Rho and intracellular oxygen. CONCLUSIONS: The ANN network inference approach and the examination of identified genes in the published literature within the context of the disease highlight the substantial influence of certain genes in sarcomas.
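Monte Carlo cross-validation, as named in the METHOD section, repeatedly draws random train/test partitions instead of using fixed folds. The sketch below only generates the splits. The number of repeats, the test fraction, and the sample count are assumptions, since the abstract does not state them, and the model training step is left out.

```python
import random


def monte_carlo_splits(n_samples, n_repeats=50, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for repeated random partitions."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first n_test shuffled samples are held out for testing.
        yield shuffled[n_test:], shuffled[:n_test]


# Hypothetical usage with 10 samples and 5 repeats; a model would be trained
# on each train set and scored on the matching test set.
splits = list(monte_carlo_splits(10, n_repeats=5))
```

Averaging the test score over all repeats gives an estimate of generalization that is less dependent on any single partition.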
Recent advances in proteomic and transcriptomic technologies have resulted in the accumulation of vast amounts of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional module discovery, where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most of the existing module discovery methods use only the interaction data, in this work we propose CLARM, which discovers biological modules by incorporating gene profile data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on Yeast and Human interaction datasets, and gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive with well-established functional module discovery methods.
Retinitis pigmentosa (RP) is a highly heterogeneous genetic visual disorder with more than 70 known causative genes, some of them shared with other non-syndromic retinal dystrophies (e.g. Leber congenital amaurosis, LCA). The identification of RP genes has increased steadily during the last decade, and the 30% of cases that still remain unassigned will soon decrease with the advent of exome/genome sequencing. A considerable amount of genetic and functional data on single RD genes and mutations has been gathered, but a comprehensive view of the RP genes and their interacting partners is still very fragmentary. This is the main gap that needs to be filled in order to understand how mutations relate to progressive blinding disorders and to devise effective therapies. We have built an RP-specific network (RPGeNet) by merging data from different sources: high-throughput data from the BioGRID and STRING databases, manually curated data for interactions retrieved from iHOP, as well as interactions filtered out by syntactical parsing from up-to-date abstracts and full-text papers related to the RP research field. The paths emerging when known RP genes were used as baits over the whole interactome have been analysed, and the minimal number of connections among the RP genes and their close neighbors was distilled in order to simplify the search space. In contrast to the analysis of single isolated genes, finding the networks linking disease genes yields powerful etiopathological insights. We here provide an interactive interface, RPGeNet, for the molecular biologist to explore the network centered on the non-syndromic and syndromic RP and LCA causative genes. By integrating tissue-specific expression levels and phenotypic data on top of that network, a more comprehensive biological view will highlight key molecular players of retinal degeneration and unveil new RP disease candidates.
Yang, Yi; Maxwell, Andrew; Zhang, Xiaowei; Wang, Nan; Perkins, Edward J; Zhang, Chaoyang; Gong, Ping
Pathway alterations reflected as changes in gene expression regulation and gene interaction can result from cellular exposure to toxicants. Such information is often used to elucidate toxicological modes of action. From a risk assessment perspective, alterations in biological pathways are a rich resource for setting toxicant thresholds, which may be more sensitive and mechanism-informed than traditional toxicity endpoints. Here we developed a novel differential networks (DNs) approach to connect pathway perturbation with toxicity threshold setting. Our DNs approach consists of 6 steps: time-series gene expression data collection, identification of altered genes, gene interaction network reconstruction, differential edge inference, mapping of genes with differential edges to pathways, and establishment of causal relationships between chemical concentration and perturbed pathways. A one-sample Gaussian process model and a linear regression model were used to identify genes that exhibited significant profile changes across an entire time course and between treatments, respectively. Interaction networks of differentially expressed (DE) genes were reconstructed for different treatments using a state space model and then compared to infer differential edges/interactions. DE genes possessing differential edges were mapped to biological pathways in databases such as KEGG pathways. Using the DNs approach, we analyzed a time-series Escherichia coli live cell gene expression dataset consisting of 4 treatments (control, 10, 100, 1000 mg/L naphthenic acids, NAs) and 18 time points. Through comparison of reconstructed networks and construction of differential networks, 80 genes were identified as DE genes with a significant number of differential edges, and 22 KEGG pathways were altered in a concentration-dependent manner. Some of these pathways were perturbed to a degree as high as 70% even at the lowest exposure concentration, implying a high sensitivity of our DNs approach
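The differential edge inference step described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that each reconstructed network is reduced to a set of undirected edges and that a differential edge is one present in exactly one of two networks; the abstract does not specify the comparison rule, and the gene names below are hypothetical placeholders.

```python
# Minimal sketch of differential edge inference between two gene networks.
# Assumption: networks are undirected and compared by edge presence/absence only.

def edge_set(pairs):
    """Turn (gene, gene) pairs into a set of undirected edges, dropping self-loops."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}

def differential_edges(control, treated):
    """Edges found in exactly one of the two networks (symmetric difference)."""
    return control ^ treated

def differential_degree(diff_edges):
    """Count how many differential edges touch each gene."""
    counts = {}
    for edge in diff_edges:
        for gene in edge:
            counts[gene] = counts.get(gene, 0) + 1
    return counts

# Hypothetical toy networks for a control and one treatment.
control = edge_set([("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")])
treated = edge_set([("geneA", "geneB"), ("geneB", "geneD"),
                    ("geneC", "geneD"), ("geneA", "geneD")])

diff = differential_edges(control, treated)
degree = differential_degree(diff)
```

Genes with many differential edges would then be the ones mapped to pathways; a real analysis would also need a significance criterion for the edge counts, which this sketch omits.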
Poswar, Fabiano de Oliveira; Farias, Lucyana Conceição; Fraga, Carlos Alberto de Carvalho; Bambirra, Wilson; Brito-Júnior, Manoel; Sousa-Neto, Manoel Damião; Santos, Sérgio Henrique Souza; de Paula, Alfredo Maurício Batista; D'Angelo, Marcos Flávio Silveira Vasconcelos; Guimarães, André Luiz Sena
Bioinformatics has emerged as an important tool to analyze the large amount of data generated by research in different diseases. In this study, gene expression for radicular cysts (RCs) and periapical granulomas (PGs) was characterized based on a leader gene approach. A validated bioinformatics algorithm was applied to identify leader genes for RCs and PGs. Genes related to RCs and PGs were first identified in the PubMed, GenBank, GeneAtlas, and GeneCards databases. The Web-available STRING software (The European Molecular Biology Laboratory [EMBL], Heidelberg, Baden-Württemberg, Germany) was used to build the interaction map among the identified genes, using a significance score named the weighted number of links. Based on the weighted number of links, genes were clustered using k-means. The genes in the highest cluster were considered leader genes. Multilayer perceptron neural network analysis was used as a complement for gene classification. For RCs, the suggested leader genes were TP53 and EP300, whereas PGs were associated with the IL2RG, CCL2, CCL4, CCL5, CCR1, CCR3, and CCR5 genes. Our data revealed different gene expression for RCs and PGs, suggesting that not only the inflammatory nature but also other biological processes might differentiate RCs and PGs. Copyright © 2015 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
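The clustering step, in which genes are grouped by weighted number of links and the highest cluster is taken as the leader genes, can be sketched as follows. This is a plain one-dimensional k-means with a deterministic initialisation chosen for reproducibility; the number of clusters, the initialisation, and the scores are assumptions for illustration and are not taken from the study.

```python
# Minimal sketch: 1-D k-means on a per-gene score, then pick the top cluster.
# Assumptions: k = 3, deterministic initial centres, made-up scores.

def kmeans_1d(values, k, max_iter=100):
    """Cluster numbers into k groups; returns (labels, centres)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Spread the initial centres evenly over the sorted values.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

def leader_genes(scores, k=3):
    """Genes in the cluster whose centre is highest."""
    genes = list(scores)
    labels, centres = kmeans_1d([scores[g] for g in genes], k)
    top = max(range(k), key=lambda c: centres[c])
    return sorted(g for g, lab in zip(genes, labels) if lab == top)

# Hypothetical weighted-number-of-links scores.
scores = {"gene_a": 9.5, "gene_b": 9.0, "gene_c": 4.2, "gene_d": 4.0,
          "gene_e": 3.8, "gene_f": 1.0, "gene_g": 0.8, "gene_h": 0.5}
leaders = leader_genes(scores)
```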
Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T
Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and among the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks), each with a large corpus of supporting data and gene annotation. There are multiple possible biological motivations for exploring large regulatory network models, including validating TF-target gene relationships, identifying co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacterium (a small genome with high data-completeness).
Zhu Xiaodong; Guo Ya; Qu Song; Li Ling; Huang Shiting; Li Danrong; Zhang Wei
Objective: To discover radioresistance-associated molecular biomarkers and their mechanisms in nasopharyngeal carcinoma by protein-protein interaction network analysis. Methods: Whole-genome expression microarrays were applied to screen for differentially expressed genes in two cell lines, CNE-2R and CNE-2, with different radiosensitivity. Four differentially expressed genes were randomly selected for further verification by semi-quantitative RT-PCR analysis with self-designed primers. The common differentially expressed genes from two experiments were analyzed with the SNOW online database in order to find the central nodes related to the biomarkers of nasopharyngeal carcinoma radioresistance. The expression of STAT1 in CNE-2R and CNE-2 cells was measured by Western blot. Results: Compared with CNE-2 cells, 374 genes in CNE-2R cells were differentially expressed, while 197 genes showed significant differences. Four randomly selected differentially expressed genes were verified by RT-PCR and showed the same trend of change, consistent with the results of the chip assay. Analysis with the SNOW database demonstrated that those 197 genes could form a complicated interaction network in which STAT1 and JUN might be two key nodes. Indeed, STAT1-α expression in CNE-2R was higher than that in CNE-2 (t=4.96, P<0.05). Conclusions: The key nodes STAT1 and JUN may be molecular biomarkers leading to radioresistance in nasopharyngeal carcinoma, and STAT1-α might have a close relationship with radioresistance.
Cao, HuanHuan; Zhang, YuHang; Zhao, Jia; Zhu, Liucun; Wang, Yi; Li, JiaRui; Feng, Yuan-Ming; Zhang, Ning
Ebola hemorrhagic fever (EHF) is caused by Ebola virus (EBOV). Humans can be infected by EBOV, with a high fatality rate. However, the factors associating EBOV with its host remain ambiguous. According to the "guilt by association" (GBA) principle, proteins interacting with each other are very likely to have similar or identical functions. Based on this assumption, we tried to obtain EBOV infection-related human genes in a protein-protein interaction network using the Dijkstra algorithm. We hope this could contribute to the discovery of novel effective treatments. Finally, 15 genes were selected as potential EBOV infection-related human genes. Copyright © Bentham Science Publishers.
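A shortest-path search of this kind can be sketched in Python with the standard library. The abstract names only the Dijkstra algorithm; the choice to connect every pair of known (seed) genes and to collect the interior nodes of those paths as candidates is an assumption made here for illustration, as are the edge weights (lower meaning a stronger interaction) and the node names.

```python
# Minimal sketch: candidate genes as interior nodes of shortest paths between seeds.
# Assumptions: undirected weighted graph, lower weight = stronger interaction.
import heapq
from itertools import combinations

def dijkstra_path(graph, source, target):
    """Return the lowest-weight path from source to target, or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def candidates_on_paths(graph, seeds):
    """Non-seed genes lying on a shortest path between any two seed genes."""
    found = set()
    for a, b in combinations(sorted(seeds), 2):
        path = dijkstra_path(graph, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(seeds)

def undirected(edges):
    """Build an adjacency dict from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

# Hypothetical toy network: S1..S3 are seed genes, X, Y, Z are unknowns.
ppi = undirected([("S1", "X", 1), ("X", "S2", 1), ("S1", "Y", 5), ("Y", "S2", 5),
                  ("S2", "Z", 1), ("Z", "S3", 1), ("S1", "S3", 10)])
candidates = candidates_on_paths(ppi, {"S1", "S2", "S3"})
```

In this toy network the node Y is bypassed by cheaper routes, so only X and Z are reported. A study of this type would typically also filter the candidates statistically, which is left out here.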
Duchenne Muscular Dystrophy (DMD) is an important pathology associated with human skeletal muscle and has been studied extensively. Gene expression measurements on skeletal muscle of patients afflicted with DMD provide the opportunity to understand the underlying mechanisms that lead to the pathology. Community structure analysis is a useful computational technique for understanding and modeling genetic interaction networks. In this paper, we leverage this technique in combination with gene expression measurements from normal and DMD patient skeletal muscle tissue to study the structure of genetic interactions in the context of DMD. We define a novel framework for transforming a raw dataset of gene expression measurements into an interaction network, and subsequently apply algorithms for community structure analysis for the extraction of topological communities. The emergent communities are analyzed from a biological standpoint in terms of their constituent biological pathways, and an interpretation that draws correlations between functional and structural organization of the genetic interactions is presented. We also compare these communities and associated functions in pathology against those in normal human skeletal muscle. In particular, differential enhancements are observed in the following pathways between pathological and normal cases: Metabolic, Focal adhesion, Regulation of actin cytoskeleton and Cell adhesion, and the implication of these mechanisms is supported by prior work. Furthermore, our study also includes a gene-level analysis to identify genes that are involved in the coupling between the pathways of interest. We believe that our results serve to highlight important distinguishing features in the structural/functional organization of constituent biological pathways, as it relates to normal and DMD cases, and provide the mechanistic basis for further biological investigations into specific pathways differently regulated
Brorsson, C.; Hansen, Niclas Tue; Hansen, Kasper Lage
To develop novel methods for identifying new genes that contribute to the risk of developing type 1 diabetes within the Major Histocompatibility Complex (MHC) region on chromosome 6, independently of the known linkage disequilibrium (LD) between human leucocyte antigen (HLA)-DRB1, -DQA1, -DQB1 genes. We have developed a novel method that combines single nucleotide polymorphism (SNP) genotyping data with protein-protein interaction (ppi) networks to identify disease-associated network modules enriched for proteins encoded from the MHC region. Approximately 2500 SNPs located in the 4 Mb MHC region were analysed in 1000 affected offspring trios generated by the Type 1 Diabetes Genetics Consortium (T1DGC). The most associated SNP in each gene was chosen and genes were mapped to ppi networks for identification of interaction partners. The association testing and resulting interacting protein...
Nath, Artika P; Ritchie, Scott C; Byars, Sean G; Fearnley, Liam G; Havulinna, Aki S; Joensuu, Anni; Kangas, Antti J; Soininen, Pasi; Wennerström, Annika; Milani, Lili; Metspalu, Andres; Männistö, Satu; Würtz, Peter; Kettunen, Johannes; Raitoharju, Emma; Kähönen, Mika; Juonala, Markus; Palotie, Aarno; Ala-Korpela, Mika; Ripatti, Samuli; Lehtimäki, Terho; Abraham, Gad; Raitakari, Olli; Salomaa, Veikko; Perola, Markus; Inouye, Michael
Immunometabolism plays a central role in many cardiometabolic diseases. However, a robust map of immune-related gene networks in circulating human cells, their interactions with metabolites, and their genetic control is still lacking. Here, we integrate blood transcriptomic, metabolomic, and genomic profiles from two population-based cohorts (total N = 2168), including a subset of individuals with matched multi-omic data at 7-year follow-up. We identify topologically replicable gene networks enriched for diverse immune functions including cytotoxicity, viral response, B cell, platelet, neutrophil, and mast cell/basophil activity. These immune gene modules show complex patterns of association with 158 circulating metabolites, including lipoprotein subclasses, lipids, fatty acids, amino acids, small molecules, and CRP. Genome-wide scans for module expression quantitative trait loci (mQTLs) reveal five modules with mQTLs that have both cis and trans effects. The strongest mQTL is in ARHGEF3 (rs1354034) and affects a module enriched for platelet function, independent of platelet counts. Modules of mast cell/basophil and neutrophil function show temporally stable metabolite associations over 7-year follow-up, providing evidence that these modules and their constituent gene products may play central roles in metabolic inflammation. Furthermore, the strongest mQTL in ARHGEF3 also displays clear temporal stability, supporting widespread trans effects at this locus. This study provides a detailed map of natural variation at the blood immunometabolic interface and its genetic basis, and may facilitate subsequent studies to explain inter-individual variation in cardiometabolic disease.
The spatial conformation of a genome plays an important role in the long-range regulation of genome-wide gene expression and methylation, but has not been extensively studied due to the lack of genome conformation data. The recently developed chromosome conformation capturing techniques, such as the Hi-C method empowered by next-generation sequencing, can generate unbiased, large-scale, high-resolution chromosomal interaction (contact) data, providing an unprecedented opportunity to investigate the spatial structure of a genome and its applications in gene regulation, genomics, epigenetics, and cell biology. In this work, we conducted a comprehensive, large-scale computational analysis of this new stream of genome conformation data generated for three different human leukemia cells or cell lines by the Hi-C technique. We developed and applied a set of bioinformatics methods to reliably generate spatial chromosomal contacts from high-throughput sequencing data and to effectively use them to study the properties of the genome structures in one dimension (1D) and two dimensions (2D). Our analysis demonstrates that Hi-C data can be effectively applied to study tissue-specific genome conformation, chromosome-chromosome interaction, chromosomal translocations, and spatial gene-gene interaction and regulation in a three-dimensional genome of primary tumor cells. In particular, for the first time, we constructed a genome-scale spatial gene-gene interaction network, a transcription factor binding site (TFBS)-TFBS interaction network, and a TFBS-gene interaction network from chromosomal contact information. Remarkably, all these networks possess the properties of scale-free modular networks.
Li, Yongsheng; Xu, Juan; Chen, Hong; Zhao, Zheng; Li, Shengli; Bai, Jing; Wu, Aiwei; Jiang, Chunjie; Wang, Yuan; Su, Bin; Li, Xia
DNA methylation is an essential epigenetic mechanism involved in transcriptional control. However, how genes with different methylation patterns are assembled in the protein-protein interaction network (PPIN) remains a mystery. In the present study, we systematically characterized genes with different methylation patterns in the PPIN. A negative association was detected between the methylation levels in the brain tissues and topological centralities. By focusing on two classes of genes with considerably different methylation levels in the brain tissues, namely the low methylated genes (LMGs) and high methylated genes (HMGs), we found that their organizing principles in the PPIN are distinct. The LMGs tend to be at the center of the PPIN, and attacking them causes a more deleterious effect on the network integrity. Furthermore, the LMGs express their functions in a modular pattern, and substantial differences in functions are observed between the two types of genes. The LMGs are enriched in the basic biological functions, such as binding activity and regulation of transcription. More importantly, cancer genes (especially recessive cancer genes), essential genes, and aging-related genes were all found more often in the LMGs. Additionally, our analysis showed that intra-class communication is enhanced, whereas inter-class communication is repressed. Finally, a functional complementation was revealed between methylation and miRNA regulation in the human genome. We have elucidated the assembling principles of genes with different methylation levels in the context of the PPIN, providing key insights into the complex epigenetic regulation mechanisms.
Mammalian genomes contain several dozen large (>0.5 Mbp) lineage-specific gene loci harbouring functionally related genes. However, spatial chromatin folding, organization of the enhancer-promoter networks and their relevance to Topologically Associating Domains (TADs) in these loci remain poorly understood. TADs are principal units of genome folding and represent DNA regions within which DNA interacts more frequently than across the TAD boundary. Here, we used Chromatin Conformation Capture Carbon Copy (5C) technology to characterize the spatial chromatin interaction network in the 3.1 Mb Epidermal Differentiation Complex (EDC) locus harbouring 61 functionally related genes that show lineage-specific activation during terminal keratinocyte differentiation in the epidermis. 5C data validated by 3D-FISH demonstrate that the EDC locus is organized into several TADs showing distinct lineage-specific chromatin interaction networks based on their transcription activity and their gene-rich or gene-poor status. Correlation of the 5C results with genome-wide studies for enhancer-specific histone modifications (H3K4me1 and H3K27ac) revealed that the majority of spatial chromatin interactions that involve the gene-rich TADs at the EDC locus in keratinocytes include both intra- and inter-TAD interaction networks, connecting gene promoters and enhancers. Compared to thymocytes, in which the EDC locus is mostly transcriptionally inactive, these interactions were found to be keratinocyte-specific. In keratinocytes, the promoter-enhancer anchoring regions in the gene-rich transcriptionally active TADs are enriched for the binding of chromatin architectural proteins CTCF, Rad21 and chromatin remodeler Brg1. In contrast to gene-rich TADs, gene-poor TADs show preferential spatial contacts with each other, do not contain active enhancers and show decreased binding of CTCF, Rad21 and Brg1 in keratinocytes. Thus, spatial interactions between gene
Apoptosis is the process of programmed cell death (PCD) that occurs in multicellular organisms. This process of normal cell death is required to maintain the balance of homeostasis. In addition, some diseases, such as obesity, cancer, and neurodegenerative diseases, can be cured through apoptosis, which produces few side effects. An effective comprehension of the mechanisms underlying apoptosis will be helpful to prevent and treat some diseases. The identification of genes related to apoptosis is essential to uncover its underlying mechanisms. In this study, a computational method was proposed to identify novel candidate genes related to apoptosis. First, protein-protein interaction information was used to construct a weighted graph. Second, a shortest path algorithm was applied to the graph to search for new candidate genes. Finally, the obtained genes were filtered by a permutation test. As a result, 26 genes were obtained, and we discuss their likelihood of being novel apoptosis-related genes by collecting evidence from published literature.
Carlos Roberto Arias
Finding a genetic disease-related gene is not a trivial task. Therefore, computational methods are needed to give the biomedical community clues about which genes are more likely to be related to a specific disease as biomarkers. We address the biomarker identification problem using a gene prioritization method called gene prioritization from microarray data based on shortest paths, extended with structural and biological properties and edge flux using a voting scheme (GP-MIDAS-VXEF). The method is based on finding relevant interactions in protein interaction networks, then scoring the genes using shortest paths and topological analysis, and integrating the results using a voting scheme and a biological boosting. We ran two experiments: one on prostate primary and normal samples, and the other on prostate primary tumors with and without lymph node metastasis. We used 137 true prostate cancer genes as a benchmark. In the first experiment, GP-MIDAS-VXEF outperforms all the other state-of-the-art methods in the benchmark by retrieving the most truly related genes from the candidate set in the top 50 scores found. We applied the same technique to infer the significant biomarkers in prostate cancer with lymph node metastasis, which are not well established.
With the wide availability of protein interaction networks and supporting microarray data, identifying biologically significant linear paths in search of a potential pathway is a challenging problem. We propose a color-coding method based on the characteristics of biological network topology and apply heuristic search to speed up the color-coding method. In the experiments, we tested our methods on two datasets: yeast and human prostate cancer networks with gene expression data. Comparisons of our method with other existing methods on known yeast MAPK pathways in terms of precision and recall show that we can find the maximum number of the proteins and perform comparably well. In addition, our method is more efficient than previous ones and detects paths of length 10 within 40 seconds using a 1.73 GHz Intel CPU and 1 GB of main memory running under the Windows operating system.
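For readers unfamiliar with color-coding, the basic randomized scheme can be sketched in a few lines of Python. This is the textbook version only: vertices are colored at random with k colors and dynamic programming over color sets finds a path whose k vertices all have different colors, which guarantees the path is simple. The topology-based coloring and heuristic search described in the abstract are not reproduced, and the function names, trial count, and toy graph are assumptions.

```python
# Minimal sketch of basic color-coding for finding a simple path with k vertices.
# Assumptions: unweighted undirected graph given as {node: set(neighbours)}.
import random

def colorful_path(graph, k, rng):
    """One trial: random coloring, then DP over (end vertex, set of colors used)."""
    colour = {v: rng.randrange(k) for v in graph}
    # table maps (end vertex, color bitmask) to one path realising that state.
    table = {(v, 1 << colour[v]): [v] for v in graph}
    for _ in range(k - 1):
        extended = {}
        for (v, mask), path in table.items():
            for nbr in graph[v]:
                bit = 1 << colour[nbr]
                if mask & bit:
                    continue  # color already used, so skip this neighbour
                extended.setdefault((nbr, mask | bit), path + [nbr])
        table = extended
    # After k - 1 extensions every surviving path uses all k colors.
    for path in table.values():
        return path
    return None

def find_path(graph, k, trials=200, seed=0):
    """Repeat independent trials until a colorful path is found or trials run out."""
    rng = random.Random(seed)
    for _ in range(trials):
        path = colorful_path(graph, k, rng)
        if path is not None:
            return path
    return None

# Hypothetical toy interaction network.
network = {"A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D"},
           "D": {"C"}, "E": {"B"}}
path = find_path(network, 4)
```

A single trial succeeds for a given path with probability k!/k^k, so many trials are needed as k grows; in a scored network the table would keep the best-scoring path per state rather than the first one found.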
Lin, Wen-Hsien; Liu, Wei-Chung; Hwang, Ming-Jing
Human cells of various tissue types differ greatly in morphology despite having the same set of genetic information. Some genes are expressed in all cell types to perform house-keeping functions, while some are selectively expressed to perform tissue-specific functions. In this study, we wished to elucidate how proteins encoded by human house-keeping genes and tissue-specific genes are organized in human protein-protein interaction networks. We constructed protein-protein interaction networks for different tissue types using two gene expression datasets and one protein-protein interaction database. We then calculated three network indices of topological importance, the degree, closeness, and betweenness centralities, to measure the network position of proteins encoded by house-keeping and tissue-specific genes, and quantified their local connectivity structure. Compared to a random selection of proteins, house-keeping gene-encoded proteins tended to have a greater number of directly interacting neighbors and occupy network positions in several shortest paths of interaction between protein pairs, whereas tissue-specific gene-encoded proteins did not. In addition, house-keeping gene-encoded proteins tended to connect with other house-keeping gene-encoded proteins in all tissue types, whereas tissue-specific gene-encoded proteins also tended to connect with other tissue-specific gene-encoded proteins, but only in approximately half of the tissue types examined. Our analysis showed that house-keeping gene-encoded proteins tend to occupy important network positions, while those encoded by tissue-specific genes do not. The biological implications of our findings were discussed and we proposed a hypothesis regarding how cells organize their protein tools in protein-protein interaction networks. Our results led us to speculate that house-keeping gene-encoded proteins might form a core in human protein-protein interaction networks, while clusters of tissue-specific gene
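The three indices named in this abstract (degree, closeness, and betweenness centrality) can be computed on a small unweighted network with the following Python sketch. It uses breadth-first search for distances and Brandes' algorithm for betweenness; the normalisations chosen and the toy star-shaped network are assumptions for illustration and may differ from those used in the study.

```python
# Minimal sketch of degree, closeness and betweenness centrality.
# Assumptions: connected, unweighted, undirected graph as {node: set(neighbours)}.
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def degree_centrality(graph):
    """Fraction of the other nodes that each node touches directly."""
    n = len(graph)
    return {v: len(graph[v]) / (n - 1) for v in graph}

def closeness_centrality(graph):
    """Reachable nodes divided by the sum of distances to them."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(graph):
    """Brandes' algorithm, unnormalised, each unordered pair counted once."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, dist = [], {s: 0}
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}

# Hypothetical star network: one hub protein and three peripheral proteins.
star = {"hub": {"p1", "p2", "p3"}, "p1": {"hub"}, "p2": {"hub"}, "p3": {"hub"}}
degree = degree_centrality(star)
closeness = closeness_centrality(star)
betweenness = betweenness_centrality(star)
```

In the star network the hub scores highest on all three indices, the kind of pattern the abstract reports for house-keeping gene-encoded proteins relative to a random selection.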
Modrzynska, Katarzyna; Pfander, Claudia; Chappell, Lia; Yu, Lu; Suarez, Catherine; Dundas, Kirsten; Gomes, Ana Rita; Goulding, David; Rayner, Julian C; Choudhary, Jyoti; Billker, Oliver
A family of apicomplexa-specific proteins containing AP2 DNA-binding domains (ApiAP2s) was identified in malaria parasites. This family includes sequence-specific transcription factors that are key regulators of development. However, functions for the majority of ApiAP2 genes remain unknown. Here, a systematic knockout screen in Plasmodium berghei identified ten ApiAP2 genes that were essential for mosquito transmission: four were critical for the formation of infectious ookinetes, and three were required for sporogony. We describe non-essential functions for AP2-O and AP2-SP proteins in blood stages, and identify AP2-G2 as a repressor active in both asexual and sexual stages. Comparative transcriptomics across mutants and developmental stages revealed clusters of co-regulated genes with shared cis promoter elements, whose expression can be controlled positively or negatively by different ApiAP2 factors. We propose that stage-specific interactions between ApiAP2 proteins on partly overlapping sets of target genes generate the complex transcriptional network that controls the Plasmodium life cycle. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Constructing, evaluating, and interpreting gene networks generally sits within the broader field of systems biology, which continues to emerge rapidly, particularly with respect to its application to understanding the complexity of signaling in the context of cancer biology. For the purposes of this volume, we take a broad definition of systems biology. Considering an organism or disease within an organism as a system, systems biology is the study of the integrated and coordinated interactions of the network(s) of genes, their variants both natural and mutated (e.g., polymorphisms, rearrangements, alternate splicing, mutations), their proteins and isoforms, and the organic and inorganic molecules with which they interact, to execute the biochemical reactions (e.g., as enzymes, substrates, products) that reflect the function of that system. Central to systems biology, and perhaps the only approach that can effectively manage the complexity of such systems, is the building of quantitative multiscale predictive models. The predictions of the models can vary substantially depending on the nature of the model and its input-output relationships. For example, a model may predict the outcome of a specific molecular reaction(s), a cellular phenotype (e.g., alive, dead, growth arrest, proliferation, and motility), a change in the respective prevalence of cell or subpopulations, or a patient or patient subgroup outcome(s). Such models necessarily require computers. Computational modeling can be thought of as using machine learning and related tools to integrate the very high dimensional data generated from modern, high-throughput omics technologies including genomics (next-generation sequencing), transcriptomics (gene expression microarrays; RNAseq), metabolomics and proteomics (ultra-high-performance liquid chromatography, mass spectrometry), and "subomic" technologies to study the kinome, methylome, and others. Mathematical modeling can be thought of as the use of ordinary
Brett A McKinney
Although many diseases and traits show large heritability, few genetic variants have been found to strongly separate phenotype groups by genotype. Complex regulatory networks of variants and expression of multiple genes lead to small individual-variant effects and difficulty replicating the effect of any single variant in an affected pathway. Interaction network modeling of GWAS identifies effects ignored by univariate models, but population differences may still cause specific genes not to replicate. Integrative network models may help detect indirect effects of variants in the underlying biological pathway. In this study, we used gene-level functional interaction information from the Integrative Multi-species Prediction (IMP) tool to reveal important genes associated with a complex phenotype through evidence from epistasis networks and pathway enrichment. We test this method for augmenting variant-based network analyses with functional interactions by applying it to a smallpox vaccine immune response GWAS. The integrative analysis spotlights the role of genes related to retinoid X receptor alpha (RXRA), which has been implicated in a previous epistasis network analysis of smallpox vaccine.
Neurological disorders are known to show similar phenotypic manifestations such as anxiety, depression, and cognitive impairment. There is a need to identify shared genetic markers and molecular pathways in these diseases, which lead to such comorbid conditions. Our study aims to prioritize novel genetic markers that might increase the susceptibility of patients affected with one neurological disorder to other diseases with similar manifestations. Identification of pathways involving common candidate markers will help in the development of improved diagnosis and treatment strategies for patients affected with neurological disorders. This systems biology study is the first to integrate 3D-structural protein interface descriptors with network topological properties that characterize proteins in a neurological protein interaction network, to aid the identification of genes not previously known to be shared between these diseases. Protein prioritization by machine learning identified known as well as new genetic markers that might have direct or indirect involvement in several neurological disorders. Important gene hubs have also been identified that provide evidence for shared molecular pathways in the neurological disease network.
Characterizing the effects of chemicals in biological systems is often summarized by chemical-gene interactions, which have sparse coverage in the literature. The ToxCast chemical screening program has produced bioactivity data for nearly 2000 chemicals and over 450 gene targets....
Lepoivre, Cyrille; Bergon, Aurélie; Lopez, Fabrice; Perumal, Narayanan B; Nguyen, Catherine; Imbert, Jean; Puthier, Denis
Deciphering gene regulatory networks by in silico approaches is a crucial step in the study of the molecular perturbations that occur in diseases. The development of regulatory maps is a tedious process requiring the comprehensive integration of various lines of evidence scattered over biological databases. Thus, the research community would greatly benefit from having a unified database storing known and predicted molecular interactions. Furthermore, given the intrinsic complexity of the data, the development of new tools offering integrated and meaningful visualizations of molecular interactions is necessary to help users draw new hypotheses without being overwhelmed by the density of the resulting graph. We extend the previously developed TranscriptomeBrowser database with a set of tables containing 1,594,978 human and mouse molecular interactions. The database includes: (i) predicted regulatory interactions (computed by scanning vertebrate alignments with a set of 1,213 position weight matrices), (ii) potential regulatory interactions inferred from systematic analysis of ChIP-seq experiments, (iii) regulatory interactions curated from the literature, (iv) predicted post-transcriptional regulation by micro-RNA, (v) protein kinase-substrate interactions and (vi) physical protein-protein interactions. In order to easily retrieve and efficiently analyze these interactions, we developed InteractomeBrowser, a graph-based knowledge browser that comes as a plug-in for TranscriptomeBrowser. The first objective of InteractomeBrowser is to provide a user-friendly tool to get new insight into any gene list by providing a context-specific display of putative regulatory and physical interactions. To achieve this, InteractomeBrowser relies on a "cell compartments-based layout" that makes use of a subset of the Gene Ontology to map gene products onto relevant cell compartments. This layout is particularly powerful for visual integration of heterogeneous biological information
Jian Xin Jiang
Conclusions: Using identified DEGs, significantly changed biological processes such as nucleic acid metabolic process and KEGG pathways such as cytokine-cytokine receptor interaction in PBMCs of HCC patients were identified. In addition, several important hub genes, for example CUL4A and interleukin (IL)-8, were also uncovered.
Background Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results Genes2FANs is a web-based tool and a database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols and produces a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user's PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPI and links connecting disease genes through FANs, separating diseases into two categories. Conclusions Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. Our finding that disease genes in
We propose a gene regulatory network model which incorporates the microscopic interactions between genes and transcription factors. In particular, the gene's expression level is determined by deterministic synchronous dynamics with a contribution from excitatory interactions. We study the structure of networks that have a particular "function" and are subject to natural selection pressure. The question of network robustness against point mutations is addressed, and we conclude that only a small part of the connections, defined as "essential" for the cell's existence, is fragile. Additionally, the obtained networks are sparse with narrow in-degree and broad out-degree distributions, properties well known from experimental study of biological regulatory networks. Furthermore, during the sampling procedure we observe that significantly different genotypes can emerge under mutation-selection balance. All the preceding features hold for model parameters which lie in the experimentally relevant range.
Aalt D J van Dijk
Full Text Available Mutational robustness of gene regulatory networks refers to their ability to generate constant biological output upon mutations that change network structure. Such networks contain regulatory interactions (transcription factor-target gene interactions) but often also protein-protein interactions between transcription factors. Using computational modeling, we study factors that influence robustness and we infer several network properties governing it. These include the type of mutation, i.e. whether a regulatory interaction or a protein-protein interaction is mutated, and in the case of mutation of a regulatory interaction, the sign of the interaction (activating vs. repressive). In addition, we analyze the effect of combinations of mutations and we compare networks containing monomeric with those containing dimeric transcription factors. Our results are consistent with available data on biological networks, for example based on evolutionary conservation of network features. As a novel and remarkable property, we predict that networks are more robust against mutations in monomer than in dimer transcription factors, a prediction for which analysis of conservation of DNA binding residues in monomeric vs. dimeric transcription factors provides indirect evidence.
Rutter, William B; Salcedo, Andres; Akhunova, Alina; He, Fei; Wang, Shichen; Liang, Hanquan; Bowden, Robert L; Akhunov, Eduard
Two opposing evolutionary constraints exert pressure on plant pathogens: one to diversify virulence factors in order to evade plant defenses, and the other to retain virulence factors critical for maintaining a compatible interaction with the plant host. To better understand how the diversified arsenals of fungal genes promote interaction with the same compatible wheat line, we performed a comparative genomic analysis of two North American isolates of Puccinia graminis f. sp. tritici (Pgt). The patterns of inter-isolate divergence in the secreted candidate effector genes were compared with the levels of conservation and divergence of plant-pathogen gene co-expression networks (GCN) developed for each isolate. Comparative genomic analyses revealed a substantial level of inter-isolate divergence in effector gene complement and sequence divergence. Gene Ontology (GO) analyses of the conserved and unique parts of the isolate-specific GCNs identified a number of conserved host pathways targeted by both isolates. Interestingly, the degree of inter-isolate sub-network conservation varied widely for the different host pathways and was positively associated with the proportion of conserved effector candidates associated with each sub-network. While different Pgt isolates tended to exploit similar wheat pathways for infection, the mode of plant-pathogen interaction varied for different pathways, with some pathways being associated with the conserved set of effectors and others being linked with the diverged or isolate-specific effectors. Our data suggest that at the intra-species level pathogen populations likely maintain divergent sets of effectors capable of targeting the same plant host pathways. This functional redundancy may play an important role in the dynamics of the "arms race" between host and pathogen, serving as the basis for diverse virulence strategies and creating conditions where mutations in certain effector groups will not have a major effect on the pathogen
Metzler, R.; Kinzel, W.; Kanter, I.
Several scenarios of interacting neural networks which are trained either in an identical or in a competitive way are solved analytically. In the case of identical training, each perceptron receives the output of its neighbor. The symmetry of the stationary state as well as the sensitivity to the training algorithm used are investigated. Two competitive perceptrons trained on mutually exclusive learning aims and a perceptron which is trained on the opposite of its own output are examined analytically. An ensemble of competitive perceptrons is used as decision-making algorithms in a model of a closed market (the El Farol Bar problem or the Minority Game, in which a set of agents who have to make a binary decision is considered); each network is trained on the history of minority decisions. This ensemble of perceptrons relaxes to a stationary state whose performance can be better than random.
Miryala, Sravan Kumar; Anbarasu, Anand; Ramaiah, Sudha
Computational analysis of biomolecular interaction networks is gaining importance for understanding the functions of novel genes/proteins. Gene interaction (GI) network analysis and protein-protein interaction (PPI) network analysis play a major role in predicting the functionality of interacting genes or proteins and give insight into the functional relationships and evolutionary conservation of interactions among the genes. An interaction network is a graphical representation of the gene/protein interactome, where each gene/protein is a node and each interaction between genes/proteins is an edge. In this review, we discuss the popular open source databases that serve as data repositories to search and collect protein/gene interaction data, and also the tools available for the generation, visualization and analysis of interaction networks. Various network analysis approaches, such as topological and clustering approaches for studying network properties, and functional enrichment servers that illustrate the functions and pathways of the genes and proteins, are also discussed. The distinctive aim of this review is thus not only to provide an overview of tools and web servers for gene and protein-protein interaction (PPI) network analysis but also to extract useful and meaningful information from the interaction networks. Copyright © 2017 Elsevier B.V. All rights reserved.
Li, Yupeng; Jackson, Scott A
Lim, Néhémy; Senbabaoglu, Yasin; Michailidis, George; d'Alché-Buc, Florence
Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in silico challenges that have aided in calibrating their performance. A number of approaches using either perturbation (knock-out) or wild-type time-series data have appeared in the literature addressing this problem, with the latter using linear temporal models. Nonlinear dynamical models are particularly appropriate for this inference task, given the generation mechanism of the time-series data. In this study, we introduce a novel nonlinear autoregressive model based on operator-valued kernels that simultaneously learns the model parameters, as well as the network structure. A flexible boosting algorithm (OKVAR-Boost) that shares features from L2-boosting and randomization-based algorithms is developed to perform the tasks of parameter learning and network inference for the proposed model. Specifically, at each boosting iteration, a regularized Operator-valued Kernel-based Vector AutoRegressive model (OKVAR) is trained on a random subnetwork. The final model consists of an ensemble of such models. The empirical estimation of the ensemble model's Jacobian matrix provides an estimation of the network structure. The performance of the proposed algorithm is first evaluated on a number of benchmark datasets from the DREAM3 challenge and then on real datasets related to the In vivo Reverse-Engineering and Modeling Assessment (IRMA) and T-cell networks. The high-quality results obtained strongly indicate that it outperforms existing approaches. The OKVAR-Boost Matlab code is available as the archive: http://amis-group.fr/sourcecode-okvar-boost/OKVARBoost-v1.0.zip. Supplementary data are available at Bioinformatics online.
Huan, Jinliang; Wang, Lishan; Xing, Li; Qin, Xianju; Feng, Lingbin; Pan, Xiaofeng; Zhu, Ling
Estrogens are known to regulate the proliferation of breast cancer cells and to alter their cytoarchitectural and phenotypic properties, but the gene networks and pathways by which estrogenic hormones regulate these events are only partially understood. We used global gene expression profiling by Affymetrix GeneChip microarray analysis, with KEGG pathway enrichment, PPI network construction, module analysis and text mining methods to identify patterns and time courses of genes that are either stimulated or inhibited by estradiol (E2) in estrogen receptor (ER)-positive MCF-7 human breast cancer cells. Of the genes queried on the Affymetrix Human Genome U133 plus 2.0 microarray, we identified 628 (12 h), 852 (24 h) and 880 (48 h) differentially expressed genes (DEGs) that showed a robust pattern of regulation by E2. From pathway enrichment analysis, we identified the changes in metabolic pathways of E2-treated samples at each time point. At the 12 h time point, the changes were mainly in pathways in cancer, focal adhesion, and the chemokine signaling pathway. At the 24 h time point, the changes were mainly enriched in neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction and the calcium signaling pathway. At the 48 h time point, the significant pathways were pathways in cancer, regulation of actin cytoskeleton, cell adhesion molecules (CAMs), axon guidance and the ErbB signaling pathway. Of interest, our PPI network analysis and module analysis found that E2 treatment induced enhancement of PRSS23 at the three time points and that PRSS23 was in the central position of each module. Text mining results showed that the important DEGs are related to signaling pathways, such as the ErbB pathway (AREG), Wnt pathway (NDP), MAPK pathway (NTRK3, TH) and IP3 pathway (TRA@), and to some transcription factors (TCF4, MAF). Our studies highlight the diverse gene networks and metabolic and cell regulatory pathways through which E2 operates to achieve its
Full Text Available Abstract Background Genetic interaction profiles are highly informative and helpful for understanding the functional linkages between genes, and therefore have been extensively exploited for annotating gene functions and dissecting specific pathway structures. However, our understanding of the relationship between double concurrent perturbation and various higher level phenotypic changes, e.g. those in cells, tissues or organs, is rather limited. Modifier screens, such as synthetic genetic arrays (SGA), can help us to understand the phenotype caused by combined gene mutations. Unfortunately, exhaustive tests on all possible combined mutations in any genome are vulnerable to combinatorial explosion and are infeasible either technically or financially. Therefore, an accurate computational approach to predict genetic interaction is highly desirable, and such methods have the potential of alleviating the bottleneck in experiment design. Results In this work, we introduce a computational systems biology approach for the accurate prediction of pairwise synthetic genetic interactions (SGI). First, a high-coverage and high-precision functional gene network (FGN) is constructed by integrating protein-protein interaction (PPI), protein complex and gene expression data; then, a graph-based semi-supervised learning (SSL) classifier is utilized to identify SGI, where the topological properties of protein pairs in the weighted FGN are used as input features of the classifier. We compare the proposed SSL method with the state-of-the-art supervised classifier, the support vector machine (SVM), on a benchmark dataset in S. cerevisiae to validate our method's ability to distinguish synthetic genetic interactions from non-interaction gene pairs. Experimental results show that the proposed method can accurately predict genetic interactions in S. cerevisiae (with a sensitivity of 92% and specificity of 91%). Noticeably, the SSL method is more efficient than SVM, especially for
The original publication is available from www.springerlink.com. Sloep, P. (2009). Social Interaction in Learning Networks. In R. Koper (Ed.), Learning Network Services for Professional Development (pp 13-15). Berlin, Germany: Springer Verlag.
Zhang, Song-Yao; Zhang, Shao-Wu; Liu, Lian; Meng, Jia; Huang, Yufei
As the most prevalent mammalian mRNA epigenetic modification, N6-methyladenosine (m6A) has been shown to possess important post-transcriptional regulatory functions. However, the regulatory mechanisms and functional circuits of m6A are still largely elusive. To help unveil the regulatory circuitry mediated by mRNA m6A methylation, we develop here m6A-Driver, an algorithm for predicting m6A-driven genes and associated networks, whose functional interactions are likely to be actively modulated ...
Cline, M.S.; Smoot, M.; Cerami, E.
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
Full Text Available In this paper, we apply the entitymetrics model to our constructed Gene-Citation-Gene (GCG) network. Based on the premise that there is a hidden, but plausible, relationship between an entity in one article and an entity in its citing article, we constructed a GCG network of gene pairs implicitly connected through citation. We compare the performance of this GCG network to a gene-gene (GG) network constructed over the same corpus but which uses gene pairs explicitly connected through traditional co-occurrence. Using 331,411 MEDLINE abstracts collected from 18,323 seed articles and their references, we identify 25 gene pairs. A comparison of these pairs with interactions found in BioGRID reveals that 96% of the gene pairs in the GCG network have known interactions. We measure network performance using degree, weighted degree, closeness, betweenness centrality and PageRank. Combining all measures, we find the GCG network has more gene pairs, but a lower matching rate than the GG network. However, combining top ranked genes in both networks produces a matching rate of 35.53%. By visualizing both the GG and GCG networks, we find that cancer is the most dominant disease associated with the genes in both networks. Overall, the study indicates that the GCG network can be useful for detecting gene interaction in an implicit manner.
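Two of the measures named in this abstract, degree and PageRank, can be illustrated on a toy undirected gene network. The sketch below is not the authors' pipeline: the gene names, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration, and PageRank is computed by plain power iteration.

```python
def degree(adj):
    """Number of neighbours of each node in an undirected adjacency map."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration; damping and iteration count are assumed values."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            nbrs = adj[v]
            if nbrs:
                # Split this node's rank evenly among its neighbours.
                share = damping * rank[v] / len(nbrs)
                for u in nbrs:
                    new[u] += share
            else:
                # A node without neighbours spreads its rank over all nodes.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical toy network: geneA is linked to three other genes.
adj = {
    "geneA": {"geneB", "geneC", "geneD"},
    "geneB": {"geneA"},
    "geneC": {"geneA"},
    "geneD": {"geneA"},
}
deg = degree(adj)
rank = pagerank(adj)
top_gene = max(rank, key=rank.get)
```

In this toy graph the hub gene has both the highest degree and the highest PageRank; on real networks the two rankings can disagree, which is one reason to combine several measures.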
Zhang, Shuqin; Zhao, Hongyu; Ng, Michael K
Networks have been a general tool for studying the complex interactions between different genes, proteins, and other small molecules. Modules, as a fundamental property of many biological networks, have been widely studied, and many computational methods have been proposed to identify the modules in an individual network. However, in many cases, a single network is insufficient for module analysis due to the noise in the data or the tuning of parameters when building the biological network. The availability of a large number of biological networks makes network integration studies possible. By integrating such networks, more informative modules for a specific disease can be derived from the networks constructed from different tissues, and consistent factors for different diseases can be inferred. In this paper, we have developed an effective method for module identification from multiple networks under different conditions. The problem is formulated as an optimization model, which combines the module identification in each individual network with the alignment of the modules from different networks. An approximation algorithm based on eigenvector computation is proposed. In simulation studies, our method outperforms the existing methods, especially when the underlying modules in the multiple networks are different. We also applied our method to two groups of gene coexpression networks for humans, which include one for three different cancers and one for three tissues from morbidly obese patients. We identified 13 modules with three complete subgraphs, and 11 modules with two complete subgraphs, respectively. The modules were validated through Gene Ontology enrichment and KEGG pathway enrichment analysis. We also showed that the main functions of most modules for the corresponding disease have been addressed by other researchers, which may provide the theoretical basis for further studying the modules experimentally.
Edwards, R.; Glass, L.
The explosive growth in knowledge of the genome of humans and other organisms leaves open the question of how the functioning of genes in interacting networks is coordinated for orderly activity. One approach to this problem is to study mathematical properties of abstract network models that capture the logical structures of gene networks. The principal issue is to understand how particular patterns of activity can result from particular network structures, and what types of behavior are possible. We study idealized models in which the logical structure of the network is explicitly represented by Boolean functions that can be represented by directed graphs on n-cubes, but which are continuous in time and described by differential equations, rather than being updated synchronously via a discrete clock. The equations are piecewise linear, which allows significant analysis and facilitates rapid integration along trajectories. We first give a combinatorial solution to the question of how many distinct logical structures exist for n-dimensional networks, showing that the number increases very rapidly with n. We then outline analytic methods that can be used to establish the existence, stability and periods of periodic orbits corresponding to particular cycles on the n-cube. We use these methods to confirm the existence of limit cycles discovered in a sample of a million randomly generated structures of networks of 4 genes. Even with only 4 genes, at least several hundred different patterns of stable periodic behavior are possible, many of them surprisingly complex. We discuss ways of further classifying these periodic behaviors, showing that small mutations (reversal of one or a few edges on the n-cube) need not destroy the stability of a limit cycle. Although these networks are very simple as models of gene networks, their mathematical transparency reveals relationships between structure and behavior, they suggest that the possibilities for orderly dynamics in such
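The class of models described in this abstract (Boolean logic driving continuous-time, piecewise-linear equations) can be sketched in a few lines. The form used below, dx_i/dt = -x_i + F_i(X) with X_i = 1 when x_i > 0, unit decay rates, targets of ±1, the two-gene feedback loop and the Euler step size are all assumptions made for illustration; they are not taken from the paper.

```python
def simulate(x0, rules, dt=0.001, steps=2000):
    """Euler integration of dx_i/dt = -x_i + F_i(X), where X is the thresholded state.

    Returns the sequence of distinct Boolean states (orthants) visited.
    """
    x = list(x0)
    state = tuple(int(v > 0.0) for v in x)
    visited = [state]
    for _ in range(steps):
        # Within one orthant the targets are constant, so the flow is linear.
        targets = [rule(state) for rule in rules]
        x = [v + dt * (-v + f) for v, f in zip(x, targets)]
        state = tuple(int(v > 0.0) for v in x)
        if state != visited[-1]:
            visited.append(state)
    return visited


# Hypothetical two-gene negative feedback loop.
rules = [
    lambda s: 1.0 if s[1] else -1.0,   # gene 2 activates gene 1
    lambda s: -1.0 if s[0] else 1.0,   # gene 1 represses gene 2
]
visited = simulate((0.5, 0.5), rules)
```

The list `visited` records the walk of the trajectory across orthants, i.e. along edges of the n-cube. This two-gene loop only illustrates the switching mechanism; the stable limit cycles discussed in the abstract concern networks of 4 genes.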
Considine, Mark; Lewis, Jenny
The systemic reform of employment services in OECD countries was driven by New Public Management (NPM) and then post-NPM reforms, when first-phase changes such as privatization were amended with `joined up' processes to help manage fragmentation. This article examines the networking strategies of `street-level' employment services staff for the impacts of this. Contrary to expectations, networking has generally declined over the last decade. There are signs of path dependence in networking patterns within each country, but also a convergence of patterns for the UK and Australia, but not The Netherlands. Networking appears to be mediated by policy and regulatory imperatives.
Background Mannoproteins construct the outer cover of the fungal cell wall. The covalently linked cell wall protein Ccw12p is an abundant mannoprotein. It is considered a crucial structural cell wall component, since in baker's yeast the lack of CCW12 results in severe cell wall damage and reduced mating efficiency. Results In order to explore the function of CCW12, we performed a Synthetic Genetic Analysis (SGA) and identified genes that are essential in the absence of CCW12. The resulting interaction network identified 21 genes involved in cell wall integrity, chitin synthesis, cell polarity, vesicular transport and endocytosis. Among those are PFD1, WHI3, SRN2, PAC10, FEN1 and YDR417C, which have not been related to cell wall integrity before. We correlated our results with genetic interaction networks of genes involved in glucan and chitin synthesis. A core of genes essential to maintain cell integrity in response to cell wall stress was identified. In addition, we performed a large-scale transcriptional analysis and compared the transcriptional changes observed in mutant ccw12Δ with transcriptomes from studies investigating responses to constitutive or acute cell wall damage. We identified a set of genes that are highly induced in the majority of the mutants/conditions and are directly related to the cell wall integrity pathway and cell wall compensatory responses. Among those are BCK1, CHS3, EDE1, PFD1, SLT2 and SLA1, which were also identified in the SGA. In contrast, a specific feature of mutant ccw12Δ is the transcriptional repression of genes involved in mating. Physiological experiments substantiate this finding. Further, we demonstrate that Ccw12p is present at the cell periphery and highly concentrated at the presumptive budding site, around the bud, at the septum and at the tip of the mating projection. Conclusions The combination of high throughput screenings, phenotypic analyses and localization studies provides new insight into the function of Ccw
Full Text Available A cross-cultural comparison of social networking structure on McDonald’s Facebook fan sites between Taiwan and the USA was conducted utilizing the individualism/collectivism dimension proposed by Hofstede. Four network indicators are used to describe the network structure of McDonald’s Facebook fan sites: size, density, clique and centralization. Individuals who post on both Facebook sites for the year of 2012 were considered as network participants for the purpose of the study. Due to the huge amount of data, only one thread of postings was sampled from each month of the year of 2012. The final data consists of 1002 postings written by 896 individuals and 5962 postings written by 5532 individuals from Taiwan and the USA respectively. The results indicated that the USA McDonald’s Facebook fan network has more fans, while Taiwan’s McDonald’s Facebook fan network is more densely connected. Cliques did form among the overall multiplex and within the individual uniplex networks in two countries, yet no significant differences were found between them. All the fan networks in both countries are relatively centralized, mostly on the site operators.
Full Text Available Physical interactions between proteins mediate a variety of biological functions, including signal transduction, physical structuring of the cell and regulation. While extensive catalogs of such interactions are known from model organisms, their evolutionary histories are difficult to study given the lack of interaction data from phylogenetic outgroups. Using phylogenomic approaches, we infer an upper bound on the time of origin for a large set of human protein-protein interactions, showing that most such interactions appear relatively ancient, dating no later than the radiation of placental mammals. By analyzing paired alignments of orthologous and putatively interacting protein-coding genes from eight mammals, we find evidence for weak but significant co-evolution, as measured by relative selective constraint, between pairs of genes with interacting proteins. However, we find no strong evidence for shared instances of directional selection within an interacting pair. Finally, we use a network approach to show that the distribution of selective constraint across the protein interaction network is non-random, with a clear tendency for interacting proteins to share similar selective constraints. Collectively, the results suggest that, on the whole, protein interactions in mammals are under selective constraint, presumably due to their functional roles.
Singh, Pramesh; Chen, Tianlong; Arendsee, Zebulun; Wurtele, Eve S.; Bassler, Kevin E.
Orphan genes, which are genes unique to each particular species, have recently drawn significant attention for their potential usefulness for organismal robustness. Their origin and regulatory interaction patterns remain largely undiscovered. Recently, methods that use the context likelihood of relatedness to infer a network followed by modularity maximizing community detection algorithms on the inferred network to find the functional structure of regulatory networks were shown to be effective. We apply improved versions of these methods to gene expression data from Arabidopsis thaliana, identify groups (clusters) of interacting genes with related patterns of expression and analyze the structure within those groups. Focusing on clusters that contain orphan genes, we compare the identified clusters to gene ontology (GO) terms, regulons, and pathway designations and analyze their hierarchical structure. We predict new regulatory interactions and unravel the structure of the regulatory interaction patterns of orphan genes. Work supported by the NSF through Grants DMR-1507371 and IOS-1546858.
Winterbach, W.; Van Mieghem, P.; Reinders, M.; Wang, H.; De Ridder, D.
Molecular interactions are often represented as network models which have become the common language of many areas of biology. Graphs serve as convenient mathematical representations of network models and have themselves become objects of study. Their topology has been intensively researched over
Full Text Available Network analysis is one of the most widely used techniques in many areas of modern science. Most existing tools for that purpose are limited to drawing networks and computing their basic general characteristics. The user is not able to interactively and graphically manipulate the networks, select and explore subgraphs using other statistical and data mining techniques, add and plot various other data within the graph, and so on. In this paper we present a tool that addresses these challenges, an add-on for exploration of networks within the general component-based environment Orange.
Coronnello, C; Tumminello, M; Micciche, S; Mantegna, R.N.
Many biological systems can be described as networks where different elements interact, in order to perform biological processes. We introduce a network associated with the Gene Ontology. Specifically, we construct a correlation-based network where the vertices are the terms of the Gene Ontology and the link between each two terms is weighted on the basis of the number of genes that they have in common. We analyze a filtered network obtained from the correlation-based network and we characterize its evolution over different releases of the Gene Ontology.
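The network construction described in this abstract can be sketched as follows. The sketch assumes the simplest possible weighting, the raw count of shared genes, which may differ from the correlation measure the authors actually use; the term names and gene annotations are hypothetical.

```python
from itertools import combinations


def shared_gene_network(term_to_genes):
    """Return {(term_a, term_b): n_shared} for every pair of terms sharing at least one gene."""
    edges = {}
    for a, b in combinations(sorted(term_to_genes), 2):
        shared = len(term_to_genes[a] & term_to_genes[b])
        if shared > 0:
            # Edge weight = number of genes annotated to both terms (assumed weighting).
            edges[(a, b)] = shared
    return edges


# Hypothetical toy annotations: term -> set of annotated genes.
annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g2", "g3", "g4"},
    "term_C": {"g5"},
}
edges = shared_gene_network(annotations)
```

Rebuilding `edges` from the annotations of successive Gene Ontology releases and comparing the resulting edge sets is one simple way to follow how such a network changes over time.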
Peng, Zhe-Ye; Tang, Zi-Jun; Xie, Min-Zhu
Complex diseases result from gene-gene and gene-environment interactions. However, the detection of high-dimensional gene-gene interactions is computationally challenging. In the last two decades, machine-learning approaches have been developed to detect gene-gene interactions with some success. In this review, we summarize the progress in research on machine learning methods as applied to gene-gene interaction detection. We systematically examine the principles and limitations of the current machine learning methods used in genome-wide association studies (GWAS) to detect gene-gene interactions, such as neural networks (NN), random forest (RF), support vector machines (SVM) and multifactor dimensionality reduction (MDR), and provide some insights into future research directions in the field.
Dickison, Mark E.
This thesis employs methods of statistical mechanics and numerical simulations to study some aspects of dynamic and interacting complex networks. The mapping of various social and physical phenomena to complex networks has been a rich field in the past few decades. Subjects as broad as petroleum engineering, scientific collaborations, and the structure of the internet have all been analyzed in a network physics context, with useful and universal results. In the first chapter we introduce basic concepts in networks, including the two types of network configurations that are studied and the statistical physics and epidemiological models that form the framework of the network research, as well as covering various previously-derived results in network theory that are used in the work in the following chapters. In the second chapter we introduce a model for dynamic networks, where the links or the strengths of the links change over time. We solve the model by mapping dynamic networks to the problem of directed percolation, where the direction corresponds to the time evolution of the network. We show that the dynamic network undergoes a percolation phase transition at a critical concentration p_c that decreases with the rate r at which the network links are changed. The behavior near criticality is universal and independent of r. We find that for dynamic random networks fundamental laws are changed: i) the size of the giant component at criticality scales with the network size N for all values of r, rather than as N^(2/3) in a static network; ii) in the presence of a broad distribution of disorder, the optimal path length between two nodes in a dynamic network scales as N^(1/2), compared to N^(1/3) in a static network. The third chapter consists of a study of the effect of quarantine on the propagation of epidemics on an adaptive network of social contacts. For this purpose, we analyze the susceptible-infected-recovered model in the presence of quarantine, where susceptible
Musungu, Bryan M.; Bhatnagar, Deepak; Brown, Robert L.; Payne, Gary A.; OBrian, Greg; Fakhoury, Ahmad M.; Geisler, Matt
A gene co-expression network (GEN) was generated using a dual RNA-seq study with the fungal pathogen Aspergillus flavus and its plant host Zea mays during the initial 3 days of infection. The analysis deciphered novel pathways and mapped genes of interest in both organisms during the infection. This network revealed a high degree of connectivity in many of the previously recognized pathways in Z. mays such as jasmonic acid, ethylene, and reactive oxygen species (ROS). For the pathogen A. flavus, a link between aflatoxin production and vesicular transport was identified within the network. There was significant interspecies correlation of expression between Z. mays and A. flavus for a subset of 104 Z. mays, and 1942 A. flavus genes. This resulted in an interspecies subnetwork enriched in multiple Z. mays genes involved in the production of ROS. In addition to the ROS from Z. mays, there was enrichment in the vesicular transport pathways and the aflatoxin pathway for A. flavus. Included in these genes, a key aflatoxin cluster regulator, AflS, was found to be co-regulated with multiple Z. mays ROS producing genes within the network, suggesting AflS may be monitoring host ROS levels. The entire GEN for both host and pathogen, and the subset of interspecies correlations, is presented as a tool for hypothesis generation and discovery for events in the early stages of fungal infection of Z. mays by A. flavus. PMID:27917194
New Hi-C technologies have revealed that chromosomes have a complex network of spatial contacts in the cell nucleus of higher organisms, whose organisation is only partially understood. Here, we investigate the structure of such a network in human GM12878 cells, to derive a large scale picture of nuclear architecture. We find that the intensity of intra-chromosomal interactions is power-law distributed. Inter-chromosomal interactions are two orders of magnitude weaker and exponentially distributed, yet they are not randomly arranged along the genomic sequence. Intra-chromosomal contacts broadly occur between epigenomically homologous regions, whereas inter-chromosomal contacts are especially associated with regions rich in highly expressed genes. Overall, genomic contacts in the nucleus appear to be structured as a network of networks where a set of strongly individual chromosomal units, as envisaged in the 'chromosomal territory' scenario derived from microscopy, interact with each other via on average weaker, yet far from random and functionally important interactions.
Koo, Ching Lee; Liew, Mei Jing; Mohamad, Mohd Saberi; Salleh, Abdul Hakim Mohamed
Currently, the greatest statistical and computational challenge in genetic epidemiology is to identify and characterize the genes that interact with other genes and with environmental factors to influence complex multifactorial diseases. These gene-gene interactions are also known as epistasis, a phenomenon that cannot be resolved by traditional statistical methods because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have therefore been used to identify susceptibility genes in such common, multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Finally, it discusses the strengths and weaknesses of each method in detecting gene-gene interactions in complex human disease.
Haibe-Kains, Benjamin; Olsen, Catharina; Djebbari, Amira; Bontempi, Gianluca; Correll, Mick; Bouton, Christopher; Quackenbush, John
Genomics provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is to use these 'known' interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from http://predictivenetworks.org. The PN code base is freely available at https://sourceforge.net/projects/predictivenets/.
Many different approaches have been developed to model and simulate gene regulatory networks. We proposed the following categories for gene regulatory network models: network parts lists, network topology models, network control logic models, and dynamic models. Here we will describe some examples for each of these categories. We will study the topology of gene regulatory networks in yeast in more detail, comparing a direct network derived from transcription factor binding data and an indirect network derived from genome-wide expression data in mutants. Regarding the network dynamics we briefly describe discrete and continuous approaches to network modelling, then describe a hybrid model called Finite State Linear Model and demonstrate that some simple network dynamics can be simulated in this model.
Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin
Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed using the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability than current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/.
Next-generation sequencing was exploited to gain deeper insight into the response to infection by Candidatus Liberibacter asiaticus (CaLas), especially the immune dysregulation and metabolic dysfunction caused by source-sink disruption. Previous fruit transcriptome data were compared with additional RNA-Seq data in three tissues: immature fruit, and young and mature leaves. Four categories of orchard trees were studied: symptomatic, asymptomatic, apparently healthy, and healthy. Principal component analysis found distinct expression patterns between immature and mature fruits and leaf samples for all four categories of trees. A predicted protein-protein interaction network identified HLB-regulated genes for sugar transporters playing key roles in the overall plant responses. Gene set and pathway enrichment analyses highlight the role of sucrose and starch metabolism in disease symptom development in all tissues. HLB-regulated genes (glucose-phosphate-transporter, invertase, starch-related genes) would likely determine the source-sink relationship disruption. In infected leaves, transcriptomic changes were observed for light reactions genes (downregulation), sucrose metabolism (upregulation), and starch biosynthesis (upregulation). In parallel, symptomatic fruits over-expressed genes involved in photosynthesis, sucrose and raffinose metabolism, and downregulated starch biosynthesis. We visualized gene networks between tissues inducing a source-sink shift. CaLas alters the hormone crosstalk, resulting in weak and ineffective tissue-specific plant immune responses necessary for bacterial clearance. Accordingly, expression of WRKYs (including WRKY70) was higher in fruits than in leaves. Systemic acquired responses were inadequately activated in young leaves, generally considered the sites where most new infections occur.
Amoutzias, Gregory D; Robertson, David L; Oliver, Stephen G; Bornberg-Bauer, Erich
By combining phylogenetic, proteomic and structural information, we have elucidated the evolutionary driving forces for the gene-regulatory interaction networks of basic helix-loop-helix transcription factors. We infer that recurrent events of single-gene duplication and domain rearrangement repeatedly gave rise to distinct networks with almost identical hub-based topologies, and multiple activators and repressors. We thus provide the first empirical evidence for scale-free protein networks emerging through single-gene duplications, the dominant importance of molecular modularity in the bottom-up construction of complex biological entities, and the convergent evolution of networks.
Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan
Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), a destructive disease of wheat and barley that causes huge economic losses and, by contaminating food, health problems in humans. Identifying pathogenic genes can shed light on the pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in the laboratory. Computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into the plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on the molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as the seed genes, which implies that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, the G protein coupled receptor pathway and the MAPK signaling pathway, which are known to be related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which
Chen, Yulong; Su, Zhiguang
Establishing a systematic network is aimed at finding essential human gene-gene and gene-disease pathways by means of network inter-connecting patterns and functional annotation analysis. In the present study, we have analyzed functional gene interactions of the short-chain acyl-coenzyme A dehydrogenase gene (ACADS). ACADS plays a vital role in free fatty acid β-oxidation and regulates energy homeostasis. Modules of highly inter-connected genes in the disease-specific ACADS network are derived by integrating gene function and protein interaction data. Among the 8 genes in the ACADS web retrieved from both STRING and GeneMANIA, ACADS is effectively conjoined with 4 genes: HADHA, HADHB, ECHS1 and ACAT1. The functional analysis is done via ontological briefing and candidate disease identification. Interestingly, the ontological aspect of genes in the ACADS network reveals that ACADS, HADHA and HADHB play equally vital roles in fatty acid metabolism. The gene ACAT1, together with ACADS, is involved in ketone metabolism. Our computational gene web analysis also predicts potential candidate diseases, indicating the involvement of ACADS, HADHA, HADHB, ECHS1 and ACAT1 not only in lipid metabolism but also in infant death syndrome, skeletal myopathy, acute hepatic encephalopathy, Reye-like syndrome, episodic ketosis, and metabolic acidosis. The current study presents a comprehensible layout of the ACADS network, its functional strategies and the candidate disease approach associated with the ACADS network. Copyright © 2015 Elsevier B.V. All rights reserved.
In the last ten years important breakthroughs in the understanding of the topology of complexity have been made in the framework of network science. Indeed it has been found that many networks belong to the universality classes called small-world networks or scale-free networks. Moreover it was found that the complex architecture of real world networks strongly affects the critical phenomena defined on these structures. Nevertheless the main focus of the research has been the characterization of single and static networks. Recently, temporal networks and interacting networks have attracted large interest. Indeed many networks are interacting or formed by a multilayer structure. Examples of these networks are found in social networks, where an individual might be at the same time part of different social networks, in economic and financial networks, in physiology or in infrastructure systems. Moreover, many networks are temporal, i.e. the links appear and disappear on a fast time scale. Examples of these networks are social networks of contacts such as face-to-face interactions or mobile-phone communication, the time-dependent correlations in brain activity, etc. Understanding the evolution of temporal and multilayer networks and characterizing critical phenomena in these systems is crucial if we want to describe, predict and control the dynamics of complex systems. In this thesis, we investigate several statistical mechanics models of temporal and interacting networks, to shed light on the dynamics of this new generation of complex networks. First, we investigate a model of temporal social networks aimed at characterizing human social interactions such as face-to-face interactions and phone-call communication. Thanks to the availability of data on these interactions, we are now in a position to compare the proposed model to the real data, finding good agreement. Second, we investigate the entropy of temporal networks and growing networks, to provide
Genome-Wide Analyses of the NAC Transcription Factor Gene Family in Pepper (Capsicum annuum L.): Chromosome Location, Phylogeny, Structure, Expression Patterns, Cis-Elements in the Promoter, and Interaction Network
different types of stress. Our results also showed that CaNAC36 plays an important role in the interaction network, interacting with 48 genes. Most of these genes are in the mitogen-activated protein kinase (MAPK) family. Taken together, our results provide a platform for further studies to identify the biological functions of CaNAC genes.
Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas
Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster--to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.
Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.
Soliman, Maha; Nasraoui, Olfa; Cooper, Nigel G F
The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease. A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx. This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. The major findings were a set of
Bi, Dongbin; Ning, Hao; Liu, Shuai; Que, Xinxiang; Ding, Kejia
To explore molecular mechanisms of bladder cancer (BC), network strategy was used to find biomarkers for early detection and diagnosis. The differentially expressed genes (DEGs) between bladder carcinoma patients and normal subjects were screened using empirical Bayes method of the linear models for microarray data package. Co-expression networks were constructed by differentially co-expressed genes and links. Regulatory impact factors (RIF) metric was used to identify critical transcription factors (TFs). The protein-protein interaction (PPI) networks were constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and clusters were obtained through molecular complex detection (MCODE) algorithm. Centralities analyses for complex networks were performed based on degree, stress and betweenness. Enrichment analyses were performed based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Co-expression networks and TFs (based on expression data of global DEGs and DEGs in different stages and grades) were identified. Hub genes of complex networks, such as UBE2C, ACTA2, FABP4, CKS2, FN1 and TOP2A, were also obtained according to analysis of degree. In gene enrichment analyses of global DEGs, cell adhesion, proteinaceous extracellular matrix and extracellular matrix structural constituent were top three GO terms. ECM-receptor interaction, focal adhesion, and cell cycle were significant pathways. Our results provide some potential underlying biomarkers of BC. However, further validation is required and deep studies are needed to elucidate the pathogenesis of BC. Copyright © 2015 Elsevier Ltd. All rights reserved.
Pavlogiannis, Andreas; Mozhayskiy, Vadim; Tagkopoulos, Ilias
Biological networks tend to have high interconnectivity, complex topologies and multiple types of interactions. This makes it difficult to identify sub-networks that are involved in condition-specific responses. In addition, we generally lack scalable methods that can reveal the information flow in gene regulatory and biochemical pathways. Such methods would help us to identify key participants and paths under specific environmental and cellular contexts. This paper introduces the theory of network flooding, which aims to address the problem of network minimization and regulatory information flow in gene regulatory networks. Given a regulatory biological network, a set of source (input) nodes and optionally a set of sink (output) nodes, our task is to find (a) the minimal sub-network that encodes the regulatory program involving all input and output nodes and (b) the information flow from the source to the sink nodes of the network. Here, we describe a novel, scalable network traversal algorithm and assess its potential to achieve significant network size reduction in both synthetic and E. coli networks. Scalability and sensitivity analysis show that the proposed method scales well with the size of the network, and is robust to noise and missing data. The method of network flooding proves to be a useful, practical approach towards information flow analysis in gene regulatory networks. Further extension of the proposed theory has the potential to lead to a unifying framework for simultaneous network minimization and information flow analysis across various "omics" levels.
Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming
In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
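The random walk with restart (RWR) idea used above can be sketched on a small unweighted network. The walker repeatedly moves to a random neighbour with probability 1 - r and jumps back to the seed genes with probability r, and the steady-state visiting probabilities are used to rank candidates. The function name, the restart probability of 0.5, the convergence tolerance and the toy graph are assumptions chosen for illustration; they are not the settings or networks of the study, and the sketch assumes an undirected graph in which no seed is isolated.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state RWR probabilities for an undirected, unweighted graph.

    adjacency maps each node to the set of its neighbours (symmetric).
    seeds is the set of known disease genes the walker restarts from.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    # Initial (and restart) distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Probability flowing into n from each neighbour m, split by m's degree.
            inflow = sum(p[m] / degree[m] for m in adjacency[n])
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a path a - b - c - d with gene "a" as the only seed.
toy_graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
scores = random_walk_with_restart(toy_graph, seeds={"a"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes closer to the seed receive higher scores, so in this toy case the ranking simply follows distance from "a". On an integrated network, the adjacency would combine edges from several sources, which is where the abstract reports a benefit.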
Wu, Guanming; Haw, Robin
Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, thereby providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data by using network modules, and increasing statistical power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures by which this functional interaction network is constructed: integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example of how to use ReactomeFIViz to perform network-based data analysis for a list of genes.
Barone, Antonio; Toti, Paolo; Giuca, Maria Rita; Derchi, Giacomo; Covani, Ugo
In this theoretical study, a text mining search and clustering analysis of data related to genes potentially involved in human pemphigoid autoimmune blistering diseases (PAIBD) was performed using web tools to create a gene/protein interaction network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was employed to identify a final set of PAIBD-involved genes and to calculate the overall significant interactions among genes: for each gene, the weighted number of links, or WNL, was registered and a clustering procedure was performed using the WNL analysis. Genes were ranked in class (leader, B, C, D and so on, up to orphans). An ontological analysis was performed for the set of 'leader' genes. Using the above-mentioned data network, 115 genes represented the final set; leader genes numbered 7 (intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNG), interleukin (IL)-2, IL-4, IL-6, IL-8 and tumour necrosis factor (TNF)), class B genes were 13, whereas the orphans were 24. The ontological analysis attested that the molecular action was focused on extracellular space and cell surface, whereas the activation and regulation of the immunity system was widely involved. Despite the limited knowledge of the present pathologic phenomenon, attested by the presence of 24 genes revealing no protein-protein direct or indirect interactions, the network showed significant pathways gathered in several subgroups: cellular components, molecular functions, biological processes and the pathologic phenomenon obtained from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database. The molecular basis for PAIBD was summarised and expanded, which will perhaps give researchers promising directions for the identification of new therapeutic targets.
Grammaticos, B; Carstea, A S; Ramani, A
We examine the dynamics of a network of genes, focusing on a periodic chain of genes of arbitrary length. We show that within a given class of sigmoids representing the equilibrium probability of the binding of the RNA polymerase to the core promoter, the system possesses a single stable fixed point. By slightly modifying the sigmoid, introducing 'stiffer' forms, we show that it is possible to find network configurations exhibiting bistable behaviour. Our results do not depend crucially on the length of the chain considered: calculations with finite chains lead to similar results. However, a realistic study of regulatory genetic networks would require the consideration of more complex topologies and interactions.
Kordmahalleh, Mina Moradi; Sefidmazgi, Mohammad Gorji; Harrison, Scott H; Homaifar, Abdollah
The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions. Powerful biotechnologies have been rapidly and successfully measuring levels of genetic expression to illuminate different states of biological systems. This has led to an ensuing challenge to improve the identification of specific regulatory mechanisms through regulatory network reconstructions. Solutions to this challenge will ultimately help to spur forward efforts based on the usage of regulatory network reconstructions in systems biology applications. We have developed a hierarchical recurrent neural network (HRNN) that identifies time-delayed gene interactions using time-course data. A customized genetic algorithm (GA) was used to optimize hierarchical connectivity of regulatory genes and a target gene. The proposed design provides a non-fully connected network with the flexibility of using recurrent connections inside the network. These features and the non-linearity of the HRNN facilitate the process of identifying temporal patterns of a GRN. Our HRNN method was implemented with the Python language. It was first evaluated on simulated data representing linear and nonlinear time-delayed gene-gene interaction models across a range of network sizes and variances of noise. We then further demonstrated the capability of our method in reconstructing GRNs of the Saccharomyces cerevisiae synthetic network for in vivo benchmarking of reverse-engineering and modeling approaches (IRMA). We compared the performance of our method to TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet across different network
BACKGROUND: There is increasing interest in the evolution of protein-protein interactions because this should ultimately be informative of the patterns of evolution of new protein functions within the cell. One model proposes that the evolution of new protein-protein interactions and protein complexes proceeds through the duplication of self-interacting genes. This model is supported by data from yeast. We examined the relationship between gene duplication and self-interaction in the human genome. RESULTS: We investigated the patterns of self-interaction and duplication among 34808 interactions encoded by 8881 human genes, and show that self-interacting proteins are encoded by genes with higher duplicability than genes whose proteins lack this type of interaction. We show that this result is robust against the system used to define duplicate genes. Finally we compared the presence of self-interactions amongst proteins whose genes have duplicated either through whole-genome duplication (WGD) or small-scale duplication (SSD), and show that the former tend to have more interactions in general. After controlling for age differences between the two sets of duplicates this result can be explained by the time since the gene duplication. CONCLUSIONS: Genes encoding self-interacting proteins tend to have higher duplicability than proteins lacking self-interactions. Moreover these duplicate genes have more often arisen through whole-genome rather than small-scale duplication. Finally, self-interacting WGD genes tend to have more interaction partners in general in the PIN, which can be explained by their overall greater age. This work adds to our growing knowledge of the importance of contextual factors in gene duplicability.
Jonathan H. Young
Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.
Mhedbi-Hajri, Nadia; Malfatti, Pierrette; Pédron, Jacques; Gaubert, Stéphane; Reverchon, Sylvie; Van Gijsegem, Frédérique
Successful infection of a pathogen relies on the coordinated expression of numerous virulence factor-encoding genes. In plant-bacteria interactions, this control is very often achieved through the integration of several regulatory circuits controlling cell-cell communication or sensing environmental conditions. Dickeya dadantii (formerly Erwinia chrysanthemi), the causal agent of soft rot on many crops and ornamentals, provokes maceration of infected plants mainly by producing and secreting a battery of plant cell wall-degrading enzymes. However, several other virulence factors have also been characterized. During Arabidopsis infection, most D. dadantii virulence gene transcripts accumulated in a coordinated manner during infection. This activation requires a functional GacA-GacS two-component regulatory system but the Gac system is not involved in the growth phase dependence of virulence gene expression. Here we show that, contrary to Pectobacterium, the AHL-mediated ExpIR quorum-sensing system does not play a major role in the growth phase-dependent control of D. dadantii virulence genes. On the other hand, the global regulator PecS participates in this coordinated expression since, in a pecS mutant, an early activation of virulence genes is observed both in vitro and in planta. This correlated with the known hypervirulence phenotype of the pecS mutant. Analysis of the relationship between the regulatory circuits governed by the PecS and GacA global regulators indicates that these two regulators act independently. PecS prevents a premature expression of virulence genes in the first stages of colonization whereas GacA, presumably in conjunction with other regulators, is required for the activation of virulence genes at the onset of symptom occurrence. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.
The modeling of gene networks from transcriptional expression data is an important tool in biomedical research to reveal signaling pathways and to identify treatment targets. Current gene network modeling is primarily based on the use of Gaussian graphical models applied to continuous data, which give a closed-form marginal likelihood. In this paper, we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which generate counts of mRNA transcripts in cell samples. We propose a generalized linear model to fit the discrete gene expression data and assume that the log ratios of the mean expression levels follow a Gaussian distribution. We restrict the gene network structures to decomposable graphs and derive the graphs by selecting the covariance matrix of the Gaussian distribution with the hyper-inverse Wishart priors. Furthermore, we incorporate prior network models based on gene ontology information, which avails existing biological information on the genes of interest. We conduct simulation studies to examine the performance of our discrete graphical model and apply the method to two real datasets for gene network inference. © The Author 2013. Published by Oxford University Press. All rights reserved.
Peter J Castaldi
Expression quantitative trait (eQTL) studies are a powerful tool for identifying genetic variants that affect levels of messenger RNA. Since gene expression is controlled by a complex network of gene-regulating factors, one way to identify these factors is to search for interaction effects between genetic variants and mRNA levels of transcription factors (TFs) and their respective target genes. However, identification of interaction effects in gene expression data poses a variety of methodological challenges, and it has become clear that such analyses should be conducted and interpreted with caution. Investigating the validity and interpretability of several interaction tests when screening for eQTL SNPs whose effect on the target gene expression is modified by the expression level of a transcription factor, we characterized two important methodological issues. First, we stress the scale-dependency of interaction effects and highlight that commonly applied transformation of gene expression data can induce or remove interactions, making interpretation of results more challenging. We then demonstrate that, in the setting of moderate to strong interaction effects on the order of what may be reasonably expected for eQTL studies, standard interaction screening can be biased due to heteroscedasticity induced by true interactions. Using simulation and real data analysis, we outline a set of reasonable minimum conditions and sample size requirements for reliable detection of variant-by-environment and variant-by-TF interactions using the heteroscedasticity consistent covariance-based approach.
Saik, Olga V; Demenkov, Pavel S; Ivanisenko, Timofey V; Bragina, Elena Yu; Freidin, Maxim B; Goncharova, Irina A; Dosenko, Victor E; Zolotareva, Olga I; Hofestaedt, Ralf; Lavrik, Inna N; Rogaev, Evgeny I; Ivanisenko, Vladimir A
Hypertension and bronchial asthma are major health problems. As of 2014, approximately one billion adults, or ~ 22% of the world population, have had hypertension. As of 2011, 235-330 million people globally have been affected by asthma and approximately 250,000-345,000 people have died each year from the disease. The development of effective therapies against these diseases is complicated by their comorbidity, which is often a major problem in their diagnosis and treatment. Hence, in this study a bioinformatics methodology for analysing the comorbidity of these two diseases has been developed. The search for candidate genes related to the comorbid conditions of asthma and hypertension can help in elucidating the molecular mechanisms underlying the comorbid condition of these two diseases, and can also be useful for genotyping and identifying new drug targets. Using ANDSystem, the reconstruction and analysis of gene networks associated with asthma and hypertension was carried out. The gene network of asthma included 755 genes/proteins and 62,603 interactions, while the gene network of hypertension included 713 genes/proteins and 45,479 interactions. Two hundred and five genes/proteins and 9638 interactions were shared between asthma and hypertension. An approach for ranking genes implicated in the comorbid condition of two diseases was proposed. The approach is based on nine criteria for ranking genes by their importance, including standard methods of gene prioritization (Endeavor, ToppGene) as well as original criteria that take into account the characteristics of an associative gene network and the presence of known polymorphisms in the analysed genes. According to the proposed approach, the genes IL10, TLR4, and CAT had the highest priority in the development of comorbidity of these two diseases. Additionally, it was revealed that the list of top genes is enriched with apoptotic genes and genes involved in
Kari Y Lam
Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms and single cell types). We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method's utility in learning from data collected on different experimental platforms.
Bartsch, Ronny P.; Liu, Kang K. L.; Bashan, Amir; Ivanov, Plamen Ch.
We systematically study how diverse physiologic systems in the human organism dynamically interact and collectively behave to produce distinct physiologic states and functions. This is a fundamental question in the new interdisciplinary field of Network Physiology, and has not been previously explored. Introducing the novel concept of Time Delay Stability (TDS), we develop a computational approach to identify and quantify networks of physiologic interactions from long-term continuous, multi-channel physiological recordings. We also develop a physiologically-motivated visualization framework to map networks of dynamical organ interactions to graphical objects encoded with information about the coupling strength of network links quantified using the TDS measure. Applying a system-wide integrative approach, we identify distinct patterns in the network structure of organ interactions, as well as the frequency bands through which these interactions are mediated. We establish first maps representing physiologic organ network interactions and discover basic rules underlying the complex hierarchical reorganization in physiologic networks with transitions across physiologic states. Our findings demonstrate a direct association between network topology and physiologic function, and provide new insights into understanding how health and distinct physiologic states emerge from networked interactions among nonlinear multi-component complex systems. The investigations presented here are initial steps in building a first atlas of dynamic interactions among organ systems. PMID:26555073
Zhang, L.; Mallick, B. K.
graphical models applied to continuous data, which give a closed-form marginal likelihood. In this paper, we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which
Yan, Chuan; Zhang, Zhibin
The relationship between stability and biodiversity has long been debated in ecology due to opposing empirical observations and theoretical predictions. Species interaction strength is often assumed to be monotonically related to population density, but the effects on stability of ecological networks of non-monotonous interactions that change signs have not been investigated previously. We demonstrate that for four kinds of non-monotonous interactions, shifting signs to negative or neutral interactions at high population density increases persistence (a measure of stability) of ecological networks, while for the other two kinds of non-monotonous interactions shifting signs to positive interactions at high population density decreases persistence of networks. Our results reveal a novel mechanism of network stabilization caused by specific non-monotonous interaction types through either increasing stable equilibrium points or reducing unstable equilibrium points (or both). These specific non-monotonous interactions may be important in maintaining stable and complex ecological networks, as well as other networks such as genes, neurons, the internet and human societies.
Ho Joshua WK
Background: It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though a considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, the accurate identification of those genetic interactions has proven to be very challenging. Methods: In this paper, we propose a new approach for identifying such gene-gene and gene-environment interactions underlying complex diseases. This is a hybrid algorithm that combines a genetic algorithm (GA) and an ensemble of classifiers (called genetic ensemble). Using this approach, the original problem of SNP interaction identification is converted into a data mining problem of combinatorial feature selection. By collecting various single nucleotide polymorphism (SNP) subsets as well as environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. Also considered in this study is the idea of combining identification results obtained from multiple algorithms. A novel formula based on pairwise double fault is designed to quantify the degree of complementarity. Conclusions: Our simulation study demonstrates that the proposed genetic ensemble algorithm has comparable identification power to Multifactor Dimensionality Reduction (MDR) and is slightly better than Polymorphism Interaction Analysis (PIA), which are the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by using our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Experimental results from our simulation studies and real world data application also confirm the effectiveness of the proposed genetic ensemble algorithm, as well as the potential benefits of
Garrett, K A; Andersen, K F; Asche, F; Bowden, R L; Forbes, G A; Kulakow, P A; Zhou, B
Resistance genes are a major tool for managing crop diseases. The networks of crop breeders who exchange resistance genes and deploy them in varieties help to determine the global landscape of resistance and epidemics, an important system for maintaining food security. These networks function as a complex adaptive system, with associated strengths and vulnerabilities, and implications for policies to support resistance gene deployment strategies. Extensions of epidemic network analysis can be used to evaluate the multilayer agricultural networks that support and influence crop breeding networks. Here, we evaluate the general structure of crop breeding networks for cassava, potato, rice, and wheat. All four are clustered due to phytosanitary and intellectual property regulations, and linked through CGIAR hubs. Cassava networks primarily include public breeding groups, whereas others are more mixed. These systems must adapt to global change in climate and land use, the emergence of new diseases, and disruptive breeding technologies. Research priorities to support policy include how best to maintain both diversity and redundancy in the roles played by individual crop breeding groups (public versus private and global versus local), and how best to manage connectivity to optimize resistance gene deployment while avoiding risks to the useful life of resistance genes. Copyright © 2017 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.
David A Garfield
Regulatory interactions buffer development against genetic and environmental perturbations, but adaptation requires phenotypes to change. We investigated the relationship between robustness and evolvability within the gene regulatory network underlying development of the larval skeleton in the sea urchin Strongylocentrotus purpuratus. We find extensive variation in gene expression in this network throughout development in a natural population, some of which has a heritable genetic basis. Switch-like regulatory interactions predominate during early development, buffer expression variation, and may promote the accumulation of cryptic genetic variation affecting early stages. Regulatory interactions during later development are typically more sensitive (linear), allowing variation in expression to affect downstream target genes. Variation in skeletal morphology is associated primarily with expression variation of a few, primarily structural, genes at terminal positions within the network. These results indicate that the position and properties of gene interactions within a network can have important evolutionary consequences independent of their immediate regulatory role.
Di Camillo, Barbara; Toffolo, Gianna; Cobelli, Claudio
In the context of reverse engineering of biological networks, simulators are helpful to test and compare the accuracy of different reverse-engineering approaches in a variety of experimental conditions. A novel gene-network simulator is presented that resembles some of the main features of transcriptional regulatory networks related to topology, interaction among regulators of transcription, and expression dynamics. The simulator generates network topology according to the current knowledge of biological network organization, including scale-free distribution of the connectivity and clustering coefficient independent of the number of nodes in the network. It uses fuzzy logic to represent interactions among the regulators of each gene, integrated with differential equations to generate continuous data, comparable to real data for variety and dynamic complexity. Finally, the simulator accounts for saturation in the response to regulation and transcription activation thresholds and shows robustness to perturbations. It therefore provides a reliable and versatile test bed for reverse engineering algorithms applied to microarray data. Since the simulator describes regulatory interactions and expression dynamics as two distinct, although interconnected aspects of regulation, it can also be used to test reverse engineering approaches that use both microarray and protein-protein interaction data in the process of learning. A first software release is available at http://www.dei.unipd.it/~dicamill/software/netsim as an R programming language package.
Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), a destructive disease of wheat and barley that causes huge economic losses and health problems for humans by contaminating food. Identifying pathogenic genes can shed light on the pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in the lab. Computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into the plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on the molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as the seed genes, which implies that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, the G protein coupled receptor pathway and the MAPK signaling pathway, which are known to be related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other
Song, Won-Min; Zhang, Bin
Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computational complexity O(|V|^3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.
Yang, Jian; Yang, Tinghong; Wu, Duzhi; Lin, Limei; Yang, Fan; Zhao, Jing
Physical and functional interplays between genes or proteins have important biological meaning for cellular functions. Some efforts have been made to construct weighted gene association meta-networks by integrating multiple biological resources, where the weight indicates the confidence of the interaction. However, these existing human gene association networks share only a limited number of interactions, suggesting that they are incomplete and noisy. Here we propose a workflow to construct a weighted human gene association network using information from six existing networks, including two weighted specific PPI networks and four gene association meta-networks. We applied a link prediction algorithm to predict possible missing links in the networks, used a cross-validation approach to refine each network, and finally integrated the refined networks into the final integrated network. The common information among the refined networks increases notably, suggesting higher reliability. Our final integrated network contains many more links than most of the original networks, while its links retain high functional relevance. Used as the background network in a case study of disease gene prediction, the final integrated network performs well, implying its reliability and practical significance. Our workflow could be useful for integrating and refining existing gene association data.
Kocarev, L; Zlatanov, N; Trajanov, D
The concept of vulnerability is introduced for a model of random, dynamical interactions on networks. In this model, known as the influence model, the nodes are arranged in an arbitrary network, while the evolution of the status at a node is according to an internal Markov chain, but with transition probabilities that depend not only on the current status of that node but also on the statuses of the neighbouring nodes. Vulnerability is treated analytically and numerically for several networks with different topological structures, as well as for two real networks--the network of infrastructures and the EU power grid--identifying the most vulnerable nodes of these networks.
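The update rule described in this abstract can be illustrated with a small simulation. The following is a minimal sketch of a binary-status special case, not the authors' vulnerability computation. It assumes a row-stochastic matrix `D` (a hypothetical name) in which `D[i][j]` is the weight node `i` places on node `j`, so that the probability of node `i` taking status 1 at the next step is the weighted sum of the current statuses of the nodes it listens to.

```python
import random

def step(status, D, rng):
    """One synchronous update of a binary influence model.

    status: list of 0/1 values, one per node.
    D: row-stochastic matrix (list of rows); D[i][j] is the weight
       node i places on node j (illustrative assumption).
    """
    new_status = []
    for i in range(len(status)):
        # Probability that node i takes status 1 at the next step.
        p_one = sum(D[i][j] * status[j] for j in range(len(status)))
        new_status.append(1 if rng.random() < p_one else 0)
    return new_status

def simulate(status, D, steps, seed=0):
    """Run `steps` updates and return the list of visited status vectors."""
    rng = random.Random(seed)
    history = [list(status)]
    for _ in range(steps):
        status = step(status, D, rng)
        history.append(status)
    return history
```

With an identity matrix every node listens only to itself, so the statuses never change; with off-diagonal weights a node's status drifts toward that of its neighbours.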
Li, Mo; Belmonte, Juan Carlos Izpisua
Pluripotent stem cells can be isolated from embryos or derived by reprogramming. Pluripotency is stabilized by an interconnected network of pluripotency genes that cooperatively regulate gene expression. Here we describe the molecular principles of pluripotency gene function and highlight post-transcriptional controls, particularly those induced by RNA-binding proteins and alternative splicing, as an important regulatory layer of pluripotency. We also discuss heterogeneity in pluripotency regulation, alternative pluripotency states and future directions of pluripotent stem cell research.
Tiys, Evgeny S; Ivanisenko, Timofey V; Demenkov, Pavel S; Ivanisenko, Vladimir A
Estimation of functional connectivity in gene sets derived from genome-wide or other biological experiments is one of the essential tasks of bioinformatics. A promising approach for solving this problem is to compare gene networks built using experimental gene sets with random networks. One of the resources that make such an analysis possible is CrossTalkZ, which uses the FunCoup database. However, existing methods, including CrossTalkZ, do not take into account individual types of interactions, such as protein/protein interactions, expression regulation, transport regulation, catalytic reactions, etc., but rather work with generalized types characterizing the existence of any connection between network members. We developed the online tool FunGeneNet, which utilizes the ANDSystem and STRING to reconstruct gene networks using experimental gene sets and to estimate their difference from random networks. To compare the reconstructed networks with random ones, the node permutation algorithm implemented in CrossTalkZ was taken as a basis. To study the FunGeneNet applicability, the functional connectivity analysis of networks constructed for gene sets involved in the Gene Ontology biological processes was conducted. We showed that the method sensitivity exceeds 0.8 at a specificity of 0.95. We found that the significance level of the difference between gene networks of biological processes and random networks is determined by the type of connections considered between objects. At the same time, the highest reliability is achieved for the generalized form of connections that takes into account all the individual types of connections. By taking examples of the thyroid cancer networks and the apoptosis network, it is demonstrated that key participants in these processes are involved in the interactions of those types by which these networks differ from random ones. FunGeneNet is a web tool aimed at proving the functionality of networks in a wide range of sizes of
We tackle the problem of completing and inferring genetic networks under stationary conditions from static data. Network completion makes the minimum number of modifications to an initial network so that the completed network is most consistent with the expression data, where addition and deletion of edges are the basic modification operations. For this problem, we present a new method for network completion using dynamic programming and least-squares fitting. This method can find an optimal solution in polynomial time if the maximum indegree of the network is bounded by a constant. We evaluate the effectiveness of our method through computational experiments using synthetic data. Furthermore, we demonstrate that our proposed method can distinguish the differences between two types of genetic networks under stationary conditions from lung cancer and normal gene expression data.
Altay, Gökmen; Emmert-Streib, Frank
Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.
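The combination of mutual information estimates with a maximization step can be sketched as follows. This is an illustrative reading of the abstract, not the published C3NET implementation: it assumes expression values that are already discretized, uses a simple plug-in estimator, and replaces any statistical significance test with a fixed `threshold` (a hypothetical parameter). Each gene contributes at most one edge, to the partner with which it shares the highest mutual information.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def max_mi_network(profiles, threshold=0.0):
    """profiles: dict mapping gene name -> list of discretized values.

    Returns a set of undirected edges (frozensets of two gene names).
    For each gene, only the link to its highest-MI partner is kept,
    and only if that MI value exceeds `threshold`.
    """
    genes = list(profiles)
    edges = set()
    for g in genes:
        best, best_mi = None, threshold
        for h in genes:
            if h == g:
                continue
            mi = mutual_information(profiles[g], profiles[h])
            if mi > best_mi:
                best, best_mi = h, mi
        if best is not None:
            edges.add(frozenset((g, best)))
    return edges
```

Because every gene adds at most one edge, the resulting network has no more edges than genes, which keeps the inferred structure sparse.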
Cui, Ying; Cai, Meng; Dai, Yang; Stanley, H. Eugene
Detecting disease-related genes is crucial in disease diagnosis and drug design. The accepted view is that neighbors of a disease-causing gene in a molecular network tend to cause the same or similar diseases, and network-based methods have been recently developed to identify novel hereditary disease-genes in available biomedical networks. Despite the steady increase in the discovery of disease-associated genes, there is still a large fraction of disease genes that remains under the tip of the iceberg. In this paper we exploit the topological properties of the protein-protein interaction (PPI) network to detect disease-related genes. We compute, analyze, and compare the topological properties of disease genes with non-disease genes in PPI networks. We also design an improved random forest classifier based on these network topological features, and a cross-validation test confirms that our method performs better than previous similar studies.
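As an illustration of what computing topological properties of genes in a PPI network involves, the sketch below derives two common node-level features, degree and local clustering coefficient, from an adjacency dictionary. The choice of these two features is an assumption made for the example; the abstract does not list which properties the authors used, and the classifier itself is not reproduced here.

```python
def degree(adj, node):
    """Number of interaction partners of `node`."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that also interact."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))

def topological_features(adj, genes):
    """Map each gene to a (degree, clustering coefficient) feature pair."""
    return {g: (degree(adj, g), clustering_coefficient(adj, g)) for g in genes}
```

Feature pairs of this kind, computed separately for known disease genes and for other genes, are the sort of input a classifier such as a random forest could be trained on.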
Gene regulatory networks analyze the relationships between genes, allowing us to understand the gene regulatory interactions in systems biology. Gene expression data from microarray experiments is used to obtain the gene regulatory networks. However, the microarray data is discrete, noisy and non-linear, which makes learning the networks a challenging problem, and existing gene network inference methods do not give consistent results. The current state-of-the-art study uses the average-ranking-based consensus method to combine and average the ranked predictions from individual methods. However, each individual method has an equal contribution to the consensus prediction. We have developed a linear programming-based consensus approach which uses learned weights from linear programming among individual methods such that the methods have different weights depending on their performance. Our result reveals that assigning different weights to individual methods rather than giving them equal weights improves the performance of the consensus. The linear programming-based consensus method was evaluated and had the best performance on in silico and Saccharomyces cerevisiae networks, and the second best on the Escherichia coli network, where it was outperformed by the Inferelator Pipeline method, which gives inconsistent results across a wide range of microarray data sets.
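The difference between equal-weight and weighted consensus can be made concrete with a short sketch. It assumes each method supplies a ranking of the same candidate edges and that the weights are already known; learning the weights by linear programming, as the abstract describes, is not implemented here, and the function name is hypothetical.

```python
def consensus_ranking(rankings, weights=None):
    """Combine per-method rankings into one consensus ranking.

    rankings: list of lists, each ordering the same candidate edges
              from most to least confident.
    weights:  one non-negative weight per method; omitting them gives
              every method equal weight (average-rank consensus).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    total = sum(weights)
    score = {}
    for ranking, w in zip(rankings, weights):
        for position, edge in enumerate(ranking, start=1):
            score[edge] = score.get(edge, 0.0) + w * position / total
    # A lower weighted mean rank means stronger consensus support.
    return sorted(score, key=lambda e: (score[e], e))
```

With equal weights the majority view wins; giving one method a larger weight lets its ranking dominate, which is the effect that performance-dependent weights are meant to exploit.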
Naushad, Shaik Mohammad; Ramaiah, M Janaki; Pavithrakumari, Manickam; Jayapriya, Jaganathan; Hussain, Tajamul; Alrokayan, Salman A; Gottumukkala, Suryanarayana Raju; Digumarti, Raghunadharao; Kutala, Vijay Kumar
In the current study, an artificial neural network (ANN)-based breast cancer prediction model was developed from the data of folate and xenobiotic pathway genetic polymorphisms along with the nutritional and demographic variables to investigate how micronutrients modulate susceptibility to breast cancer. The developed ANN model explained 94.2% variability in breast cancer prediction. Fixed effect models of folate (400 μg/day) and B12 (6 μg/day) showed 33.3% and 11.3% risk reduction, respectively. Multifactor dimensionality reduction analysis showed the following interactions in responders to folate: RFC1 G80A × MTHFR C677T (primary), COMT H108L × CYP1A1 m2 (secondary), MTR A2756G (tertiary). The interactions among responders to B12 were RFC1G80A × cSHMT C1420T and CYP1A1 m2 × CYP1A1 m4. ANN simulations revealed that increased folate might restore ER and PR expression and reduce the promoter CpG island methylation of extra cellular superoxide dismutase and BRCA1. Dietary intake of folate appears to confer protection against breast cancer through its modulating effects on ER and PR expression and methylation of EC-SOD and BRCA1. Copyright © 2016 Elsevier B.V. All rights reserved.
Interactions between proteins and genes are considered essential in the description of biomolecular phenomena, and networks of interactions are applied in a systems biology approach. Recently, many studies have sought to extract information from biomolecular text using natural language processing technology. Previous studies have asserted that linguistic information is useful for improving the detection of gene interactions. In particular, syntactic relations among linguistic information are good for detecting gene interactions. However, previous systems give reasonably good precision but poor recall. To improve recall without sacrificing precision, this paper proposes a three-phase method for detecting gene interactions based on syntactic relations. In the first phase, we retrieve syntactic encapsulation categories for each candidate agent and target. In the second phase, we construct a verb list that indicates the nature of the interaction between pairs of genes. In the last phase, we determine direction rules to detect which of two genes is the agent or target. Even without biomolecular knowledge, our method performs reasonably well using a small training dataset. While the first phase contributes to improving recall, the second and third phases contribute to improving precision. In the experimental results using ICML 05 Workshop on Learning Language in Logic (LLL05) data, our proposed method gave an F-measure of 67.2% for the test data, significantly outperforming previous methods. We also describe the contribution of each phase to the performance.
NIH Research Matters, August 12, 2013. Mutated Genes in Schizophrenia Map to Brain Networks. Schizophrenia networks ... have a high number of spontaneous mutations in genes that form a network in the front region ...
Kinzel, Wolfgang; Metzler, Richard; Kanter, Ido
Recent results on the statistical physics of time series generation and prediction are presented. A neural network is trained on quasi-periodic and chaotic sequences and overlaps to the sequence generator as well as the prediction errors are calculated numerically. For each network there exists a sequence for which it completely fails to make predictions. Two interacting networks show a transition to perfect synchronization. A pool of interacting networks shows good coordination in the minority game, a model of competition in a closed market. Finally, as a demonstration, a perceptron predicts bit sequences produced by human beings.
Ching Lee Koo
Recently, the greatest statistical and computational challenge in genetic epidemiology has been to identify and characterize the genes that interact with other genes and with environmental factors to influence complex multifactorial diseases. These gene-gene interactions are also known as epistasis, a phenomenon that cannot be resolved by traditional statistical methods because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Hence, several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have been applied to identify susceptibility genes in such common multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Lastly, it discusses the strengths and weaknesses of each method in detecting gene-gene interactions in complex human disease.
The architecture of tomato inflorescence strongly affects flower production and subsequent crop yield. To understand the genetic activities involved, insight into the underlying network of genes that initiate and control the sympodial growth in the tomato is essential. In this paper, we show how the structure of this network can be derived from available data of the expressions of the involved genes. Our approach starts from employing biological expert knowledge to select the most probable gene candidates behind branching behavior. To find how these genes interact, we develop a stepwise procedure for computational inference of the network structure. Our data consists of expression levels from primary shoot meristems, measured at different developmental stages on three different genotypes of tomato. With the network inferred by our algorithm, we can explain the dynamics corresponding to all three genotypes simultaneously, despite their apparent dissimilarities. We also correctly predict the chronological order of expression peaks for the main hubs in the network. Based on the inferred network, using optimal experimental design criteria, we are able to suggest an informative set of experiments for further investigation of the mechanisms underlying branching behavior.
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data on gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as to identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.
Shlykova, Irina; Ponosov, Arcady
There are different ways to model gene regulatory networks. Differential equations allow for a detailed description of the network's dynamics and provide an explicit model of the gene concentration changes over time. Production and relative degradation rate functions used in such models depend on a vector of steeply sloped threshold functions which characterize the activity of genes. The most popular example of the threshold functions comes from the Boolean network approach, where the threshold functions are given by step functions. The system of differential equations then becomes piecewise linear. The dynamics of this system can be described very easily between the thresholds, but not in the switching domains. For instance, this approach fails to analyze stationary points of the system and to define continuous solutions in the switching domains. These problems have been studied previously, but the proposed model did not take into account time delays in cellular systems. However, analysis of real gene expression data shows a considerable number of time-delayed interactions, suggesting that time delay is essential in gene regulation. Delays may therefore have a great effect on the dynamics of the system and are one of the critical factors that should be considered in the reconstruction of gene regulatory networks. The goal of this work is to apply singular perturbation analysis to certain systems with delay and to obtain an analog of Tikhonov's theorem, which provides sufficient conditions for constructing the limit system in the delay case.
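A step-function model with delay can be illustrated numerically. The sketch below is not the singular perturbation analysis the abstract refers to; it is a forward-Euler simulation of a single hypothetical gene that represses its own production, with rate `k` when the delayed concentration is below the threshold `theta` and zero otherwise, and linear degradation at rate `g`. All parameter names and values are assumptions chosen for illustration, and the behaviour exactly at the threshold is fixed by convention.

```python
def simulate_delayed_switch(k=1.0, g=1.0, theta=0.5, tau=1.0,
                            x0=0.0, dt=0.01, t_end=20.0):
    """Euler integration of x'(t) = k * H(theta - x(t - tau)) - g * x(t).

    H is a step function: production is on (rate k) while the delayed
    concentration is below theta and off otherwise.
    """
    lag = int(round(tau / dt))
    n_steps = int(round(t_end / dt))
    xs = [x0]
    for n in range(n_steps):
        # Constant history: x(t) = x0 for all t <= 0.
        delayed = xs[n - lag] if n >= lag else x0
        production = k if delayed < theta else 0.0
        xs.append(xs[n] + dt * (production - g * xs[n]))
    return xs
```

Between threshold crossings the trajectory is a simple exponential approach to either `k / g` or zero, which is the piecewise-linear behaviour mentioned above; the delay makes the concentration overshoot the threshold in both directions, producing sustained oscillations around `theta`.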
El-Kebir, M.; Brandt, B.W.; Heringa, J.; Klau, G.W.
Background Molecular interactions need to be taken into account to adequately model the complex behavior of biological systems. These interactions are captured by various types of biological networks, such as metabolic, gene-regulatory, signal transduction and protein-protein interaction networks.
Allot, Alexis; Chennen, Kirsley; Nevers, Yannis; Poidevin, Laetitia; Kress, Arnaud; Ripp, Raymond; Thompson, Julie Dawn; Poch, Olivier; Lecompte, Odile
The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user's specific interests and provides an efficient way to share information with collaborators. Furthermore, the user's behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. MyGeneFriends is available at lbgi.fr/mygenefriends. ©Alexis Allot, Kirsley Chennen, Yannis
Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.
Abstract Background Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori assumptions about the interactions, which all simulate the observed patterns. It is important to analyze the properties of the circuits. Findings We have analyzed the simulated gene expression ...
Costanzo, Michael; VanderSluis, Benjamin; Koch, Elizabeth N; Baryshnikova, Anastasia; Pons, Carles; Tan, Guihong; Wang, Wen; Usaj, Matej; Hanchard, Julia; Lee, Susan D; Pelechano, Vicent; Styles, Erin B; Billmann, Maximilian; van Leeuwen, Jolanda; van Dyk, Nydia; Lin, Zhen-Yuan; Kuzmin, Elena; Nelson, Justin; Piotrowski, Jeff S; Srikumar, Tharan; Bahr, Sondra; Chen, Yiqun; Deshpande, Raamesh; Kurat, Christoph F; Li, Sheena C; Li, Zhijian; Usaj, Mojca Mattiazzi; Okada, Hiroki; Pascoe, Natasha; San Luis, Bryan-Joseph; Sharifpoor, Sara; Shuteriqi, Emira; Simpkins, Scott W; Snider, Jamie; Suresh, Harsha Garadi; Tan, Yizhao; Zhu, Hongwei; Malod-Dognin, Noel; Janjic, Vuk; Przulj, Natasa; Troyanskaya, Olga G; Stagljar, Igor; Xia, Tian; Ohya, Yoshikazu; Gingras, Anne-Claude; Raught, Brian; Boutros, Michael; Steinmetz, Lars M; Moore, Claire L; Rosebrock, Adam P; Caudy, Amy A; Myers, Chad L; Andrews, Brenda; Boone, Charles
We generated a global genetic interaction network for Saccharomyces cerevisiae, constructing more than 23 million double mutants, identifying about 550,000 negative and about 350,000 positive genetic interactions. This comprehensive network maps genetic interactions for essential gene pairs, highlighting essential genes as densely connected hubs. Genetic interaction profiles enabled assembly of a hierarchical model of cell function, including modules corresponding to protein complexes and pathways, biological processes, and cellular compartments. Negative interactions connected functionally related genes, mapped core bioprocesses, and identified pleiotropic genes, whereas positive interactions often mapped general regulatory connections among gene pairs, rather than shared functionality. The global network illustrates how coherent sets of genetic interactions connect protein complex and pathway modules to map a functional wiring diagram of the cell. Copyright © 2016, American Association for the Advancement of Science.
Costanzo, Michael; VanderSluis, Benjamin; Koch, Elizabeth N.; Baryshnikova, Anastasia; Pons, Carles; Tan, Guihong; Wang, Wen; Usaj, Matej; Hanchard, Julia; Lee, Susan D.; Pelechano, Vicent; Styles, Erin B.; Billmann, Maximilian; van Leeuwen, Jolanda; van Dyk, Nydia; Lin, Zhen-Yuan; Kuzmin, Elena; Nelson, Justin; Piotrowski, Jeff S.; Srikumar, Tharan; Bahr, Sondra; Chen, Yiqun; Deshpande, Raamesh; Kurat, Christoph F.; Li, Sheena C.; Li, Zhijian; Usaj, Mojca Mattiazzi; Okada, Hiroki; Pascoe, Natasha; Luis, Bryan-Joseph San; Sharifpoor, Sara; Shuteriqi, Emira; Simpkins, Scott W.; Snider, Jamie; Suresh, Harsha Garadi; Tan, Yizhao; Zhu, Hongwei; Malod-Dognin, Noel; Janjic, Vuk; Przulj, Natasa; Troyanskaya, Olga G.; Stagljar, Igor; Xia, Tian; Ohya, Yoshikazu; Gingras, Anne-Claude; Raught, Brian; Boutros, Michael; Steinmetz, Lars M.; Moore, Claire L.; Rosebrock, Adam P.; Caudy, Amy A.; Myers, Chad L.; Andrews, Brenda; Boone, Charles
We generated a global genetic interaction network for Saccharomyces cerevisiae, constructing over 23 million double mutants, identifying ~550,000 negative and ~350,000 positive genetic interactions. This comprehensive network maps genetic interactions for essential gene pairs, highlighting essential genes as densely connected hubs. Genetic interaction profiles enabled assembly of a hierarchical model of cell function, including modules corresponding to protein complexes and pathways, biological processes, and cellular compartments. Negative interactions connected functionally related genes, mapped core bioprocesses, and identified pleiotropic genes, whereas positive interactions often mapped general regulatory connections among gene pairs, rather than shared functionality. The global network illustrates how coherent sets of genetic interactions connect protein complex and pathway modules to map a functional wiring diagram of the cell. PMID:27708008
Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction and is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. To make up for the shortcomings of these methods, this paper presents a novel hybrid learning method (DBNCS) based on the dynamic Bayesian network (DBN) to construct multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, that is, to construct the search space. The redundant regulations are then removed using the recursive optimization (RO) algorithm, thereby reducing the false positive rate. Next, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and its outstanding performance in reconstructing GRNs.
Alexey Anatolievich Morozov
Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary), sets of trees, and distance matrices, but there are no methods to build them using the gene order data as an input. Here we describe several methods to build split networks from the gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from genome structures under study and used as an input for network construction algorithms. Three intermediates are used: set of jackknife trees, distance matrix, and binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and distance matrix (when used with Neighbor-Net algorithm). Binary encoding can also be useful, but only when the methods mentioned above cannot be used.
The objective of this CDA is to evaluate the gene-gene and gene-environment interactions in the etiology of breast cancer in two ongoing case-control studies, the Shanghai Breast Cancer Study (SBCS...
Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina
Discovering gene regulatory networks from data is one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. The networks are used for modeling each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the underlying relations among genes. The results obtained on artificial and real datasets confirm the method's effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.
Background: Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge is typically not available that a given pair of genes do not interact. Results: A recent advance in research on data mining is a method capable of learning a classifier from only positive and unlabeled examples, without needing labeled negative examples. Applied to the reconstruction of gene regulatory networks, we show that this method significantly outperforms the current state of the art of machine learning methods. We assess the new method using both simulated and experimental data, and obtain major performance improvement. Conclusions: Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.
Gupta, Chinmaya; López, José Manuel; Ott, William; Josić, Krešimir; Bennett, Matthew R
Transcriptional delay can significantly impact the dynamics of gene networks. Here we examine how such delay affects bistable systems. We investigate several stochastic models of bistable gene networks and find that increasing delay dramatically increases the mean residence times near stable states. To explain this, we introduce a non-Markovian, analytically tractable reduced model. The model shows that stabilization is the consequence of an increased number of failed transitions between stable states. Each of the bistable systems that we simulate behaves in this manner.
Hooiveld, MHW; Morgan, R; Rieden, PID; Houtzager, E; Pannese, M; Damen, K; Boncinelli, E; Durston, AJ
Understanding why metazoan Hox/HOM-C genes are expressed in spatiotemporal sequences showing colinearity with their genomic sequence is a central challenge in developmental biology. Here, we studied the consequences of ectopically expressing Hox genes to investigate whether Hox-Hox interactions
Li-Feng, Gao; Jian-Jun, Shi; Shan, Guan
In this paper, we attempt to understand complex network evolution from the underlying evolutionary relationship between biological organisms. First, we construct a Pfam domain interaction network for each of the 470 completely sequenced organisms, so that each organism is associated with a specific Pfam domain interaction network; second, we infer the evolutionary relationship of these organisms with the nearest neighbour joining method; third, we use the evolutionary relationship between organisms constructed in the second step as the evolutionary course of the Pfam domain interaction network constructed in the first step. This analysis of the evolutionary course shows: (i) there is a conserved sub-network structure in network evolution; in this sub-network, nodes with lower degree prefer to keep their connectivity invariant, and hubs tend to keep their role as hubs, being attached preferentially to newly added nodes; (ii) few nodes are conserved as hubs; most of the other nodes are conserved as nodes with very low degree; (iii) in the course of network evolution, new nodes are added to the network individually in most cases, or as clusters with relatively high clustering coefficients in a very few cases.
Fujii, Chisato; Kuwahara, Hiroyuki; Yu, Ge; Guo, Lili; Gao, Xin
An accurate determination of the network structure of gene regulatory systems from high-throughput gene expression data is an essential yet challenging step in studying how the expression of endogenous genes is controlled through a complex interaction of gene products and DNA. While numerous methods have been proposed to infer the structure of gene regulatory networks, none of them seem to work consistently over different data sets with high accuracy. A recent study to compare gene network inference methods showed that an average-ranking-based consensus method consistently performs well under various settings. Here, we propose a linear programming-based consensus method for the inference of gene regulatory networks. Unlike the average-ranking-based one, which treats the contribution of each individual method equally, our new consensus method assigns a weight to each method based on its credibility. As a case study, we applied the proposed consensus method on synthetic and real microarray data sets, and compared its performance to that of the average-ranking-based consensus and individual inference methods. Our results show that our weighted consensus method achieves superior performance over the unweighted one, suggesting that assigning weights to different individual methods rather than giving them equal weights improves the accuracy. © 2016 Elsevier B.V.
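The difference between an unweighted and a weighted rank consensus can be sketched in a few lines. The following is a minimal illustration, not the authors' linear programming formulation: the edge scores, the three "methods" and the weights are all invented for the example, and the weights are simply chosen by hand where the paper would derive them from each method's credibility. Ties between scores are not handled specially.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def ranks(scores: Dict[Edge, float]) -> Dict[Edge, float]:
    """Rank edges by descending confidence; rank 1 is the most confident edge."""
    ordered = sorted(scores, key=lambda e: scores[e], reverse=True)
    return {edge: float(i + 1) for i, edge in enumerate(ordered)}


def consensus(methods: List[Dict[Edge, float]], weights: List[float]) -> List[Edge]:
    """Order edges by their weighted mean rank across methods (best first).

    Equal weights give the average-ranking consensus; unequal weights let
    more credible methods dominate. All methods must score the same edges.
    """
    total = sum(weights)
    per_method = [ranks(m) for m in methods]
    combined = {
        edge: sum(w * r[edge] for w, r in zip(weights, per_method)) / total
        for edge in per_method[0]
    }
    return sorted(combined, key=lambda e: combined[e])


# Hypothetical confidence scores from three inference methods.
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
method_b = {("g1", "g2"): 0.1, ("g1", "g3"): 0.8, ("g2", "g3"): 0.7}
method_c = {("g1", "g2"): 0.3, ("g1", "g3"): 0.9, ("g2", "g3"): 0.6}

unweighted = consensus([method_a, method_b, method_c], [1.0, 1.0, 1.0])
weighted = consensus([method_a, method_b, method_c], [5.0, 1.0, 1.0])
print(unweighted)  # g1-g3 ranks first when all methods count equally
print(weighted)    # g1-g2 ranks first when method_a is trusted most
```

With equal weights the two methods that agree outvote the third; once the first method is given a much larger weight, its top edge moves to the head of the consensus list.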
Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni
In silico modeling of gene regulatory networks has gained some momentum recently due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models have been proposed and their corresponding computational properties have been analyzed. These algorithms allow users to compute cyclic attractors of large networks that are currently not feasible using existing software. Hereby we provide a framework to analyze the effect of multiple gene perturbation protocols, and their effect on cell differentiation processes. These algorithms were validated on the T-helper model showing the correct steady state identification and Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
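To make the notions of steady states and cyclic attractors under a synchronous transition model concrete, here is a minimal sketch on a toy network. It enumerates all states by brute force, which is exponential in the number of genes and is exactly the cost that the ROBDD-based algorithms in the paper are meant to avoid; it is not the authors' method. The two-gene mutual-inhibition rule is a hypothetical example, not the T-helper model.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Set, Tuple

State = Tuple[int, ...]


def toggle_switch(state: State) -> State:
    """Hypothetical two-gene network: each gene is ON iff the other is OFF."""
    x, y = state
    return (1 - y, 1 - x)


def attractors(update: Callable[[State], State], n_genes: int) -> Set[FrozenSet[State]]:
    """Find all attractors of a synchronous Boolean network by exhaustive search.

    From every start state, iterate the update rule until a state repeats;
    the states from the first repeated state onward form the attractor.
    """
    found: Set[FrozenSet[State]] = set()
    for start in product((0, 1), repeat=n_genes):
        seen: Dict[State, int] = {}
        state = start
        while state not in seen:
            seen[state] = len(seen)  # remember the order of visits
            state = update(state)
        first = seen[state]  # index at which the cycle begins
        found.add(frozenset(s for s, i in seen.items() if i >= first))
    return found


result = attractors(toggle_switch, 2)
for cycle in sorted(result, key=len):
    kind = "steady state" if len(cycle) == 1 else "cyclic attractor"
    print(kind, sorted(cycle))
```

The toy network has two steady states, (0, 1) and (1, 0), plus a two-state cycle between (0, 0) and (1, 1) that appears because both genes are updated at the same time.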
D. M. Feldman
The processes of fragmentation (regionalization and localization) and globalization turn the state, the basic system-forming element of the state-centric world political system, into a component of the world political network. Political relations between actors of the world political network are governed by effectiveness rather than legitimacy (“victory rules”), which differs from the participatory principles of interstate relations (“participation rules”) accepted by the Westphalian state system. The article argues that the post-Westphalian world political system will witness clashes between victory rules and participation rules and their eventual coexistence, since the very nature of victory rules hinders their institutionalization, consolidation and legitimation. The article suggests that the new system of state relations, regardless of its name, will be no less Westphalian than the preceding one, so new participation rules will have to be formulated and codified.
Bergholdt, R.; Størling, Zenia, Marian; Hansen, Kasper Lage
We have developed an integrative analysis method combining genetic interactions, identified using type 1 diabetes genome scan data, and a high-confidence human protein interaction network. Resulting networks were ranked by the significance of the enrichment of proteins from interacting regions. We identified a number of new protein network modules and novel candidate genes/proteins for type 1 diabetes. We propose this type of integrative analysis as a general method for the elucidation of genes and networks involved in diabetes and other complex diseases.
Van Bel, Michiel; Coppens, Frederik
Selecting and filtering a reference expression and interaction dataset when studying specific pathways and regulatory interactions can be a very time-consuming and error-prone task. In order to reduce the duplicated efforts required to amass such datasets, we have created the CORNET (CORrelation NETworks) platform, which allows for easy access to a wide variety of data types: coexpression data, protein-protein interactions, regulatory interactions, and functional annotations. The CORNET platform outputs its results in either text format or through the Cytoscape framework, which is automatically launched by the CORNET website. CORNET 3.0 is the third iteration of the web platform designed for the user exploration of the coexpression space of plant genomes, with a focus on the model species Arabidopsis thaliana. Here we describe the platform: the tools, data, and best practices when using the platform. We indicate how the platform can be used to infer networks from a set of input genes, such as upregulated genes from an expression experiment. By exploring the network, new target and regulator genes can be discovered, allowing for follow-up experiments and more in-depth study. We also indicate how to avoid common pitfalls when evaluating the networks and how to avoid overinterpretation of the results. All CORNET versions are available at http://bioinformatics.psb.ugent.be/cornet/ .
We investigate interaction networks that we derive from multivariate time series with methods frequently employed in diverse scientific fields such as biology, quantitative finance, physics, earth and climate sciences, and the neurosciences. Mimicking experimental situations, we generate time series with finite length and varying frequency content but from independent stochastic processes. Using the correlation coefficient and the maximum cross-correlation, we estimate interdependencies between these time series. With clustering coefficient and average shortest path length, we observe unweighted interaction networks, derived via thresholding the values of interdependence, to possess non-trivial topologies as compared to Erdős-Rényi networks, which would indicate small-world characteristics. These topologies reflect the mostly unavoidable finiteness of the data, which limits the reliability of typically used estimators of signal interdependence. We propose random networks that are tailored to the way interaction networks are derived from empirical data. Through an exemplary investigation of multichannel electroencephalographic recordings of epileptic seizures (known for their complex spatial and temporal dynamics), we show that such random networks help to distinguish network properties of interdependence structures related to seizure dynamics from those spuriously induced by the applied methods of analysis.
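The core effect, spurious links arising from short but independent time series, can be reproduced with a small sketch. The series count, series length, threshold and random seed below are arbitrary assumptions for illustration, the series are plain Gaussian white noise rather than the frequency-shaped processes used in the study, and only the zero-lag correlation coefficient is used.

```python
import math
import random
from typing import List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def threshold_network(series: List[List[float]], threshold: float) -> List[Set[int]]:
    """Unweighted network: link i and j when |correlation| exceeds the threshold."""
    n = len(series)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def mean_clustering(adj: List[Set[int]]) -> float:
    """Average local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


rng = random.Random(0)  # fixed seed so the run is repeatable
series = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(30)]
adj = threshold_network(series, 0.4)
n_edges = sum(len(nbrs) for nbrs in adj) // 2
print("edges between independent series:", n_edges)
print("mean clustering coefficient:", mean_clustering(adj))
```

Any edges reported here are artifacts of the finite series length, since the underlying processes are independent by construction; lengthening the series or raising the threshold reduces their number.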
Živković, J.; Tadić, B.; Wick, N.; Thurner, S.
We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.
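As a rough illustration of how a minimum spanning tree can be extracted from a correlation network, the sketch below runs Kruskal's algorithm on a small hand-made correlation table. The gene labels and correlation values are invented, and the distance d = sqrt(2(1 - r)) is a common choice for correlation-based trees that is assumed here; the abstract does not state which distance the authors used.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def minimum_spanning_tree(nodes: List[str], corr: Dict[Edge, float]) -> List[Edge]:
    """Kruskal's algorithm on distances d = sqrt(2 * (1 - r)).

    Strongly correlated gene pairs have short distances and are added first;
    an edge is skipped if its endpoints are already connected.
    """
    parent = {n: n for n in nodes}

    def find(n: str) -> str:
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    by_distance = sorted(corr, key=lambda e: math.sqrt(2.0 * (1.0 - corr[e])))
    tree: List[Edge] = []
    for u, v in by_distance:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical pairwise correlations between four genes.
corr = {
    ("A", "B"): 0.9, ("A", "C"): 0.1, ("A", "D"): 0.0,
    ("B", "C"): 0.8, ("B", "D"): 0.2, ("C", "D"): 0.7,
}
tree = minimum_spanning_tree(["A", "B", "C", "D"], corr)
print(tree)
```

The resulting tree keeps the three strongest correlations that connect all four genes without forming a loop, which is the structure on which functional branches would then be looked for.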
Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher
The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
Shin, Yong-Jun; Bleris, Leonidas
Systems biology is an interdisciplinary field that aims at understanding complex interactions in cells. Here we demonstrate that linear control theory can provide valuable insight and practical tools for the characterization of complex biological networks. We provide the foundation for such analyses through the study of several case studies including cascade and parallel forms, feedback and feedforward loops. We reproduce experimental results and provide rational analysis of the observed behavior. We demonstrate that methods such as the transfer function (frequency domain) and linear state-space (time domain) can be used to predict reliably the properties and transient behavior of complex network topologies and point to specific design strategies for synthetic networks.
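The link between the two descriptions mentioned above can be sketched on a two-stage cascade. Assuming each stage is first-order and linear, dx1/dt = k1*u - a1*x1 and dx2/dt = k2*x1 - a2*x2, the transfer function from input u to output x2 is k1*k2 / ((s + a1)(s + a2)), so the steady-state gain is k1*k2 / (a1*a2). The rate constants, step size and simulation time below are arbitrary illustrative values, not parameters from the paper, and forward Euler is used only for simplicity.

```python
def simulate_cascade(k1: float, a1: float, k2: float, a2: float,
                     u: float = 1.0, dt: float = 0.01, t_end: float = 40.0) -> float:
    """Step response of a two-stage linear cascade (time-domain, state-space form).

    Integrates dx1/dt = k1*u - a1*x1 and dx2/dt = k2*x1 - a2*x2 with forward
    Euler from zero initial conditions and returns x2 at t_end.
    """
    x1 = x2 = 0.0
    for _ in range(int(round(t_end / dt))):
        dx1 = k1 * u - a1 * x1
        dx2 = k2 * x1 - a2 * x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x2


def dc_gain(k1: float, a1: float, k2: float, a2: float) -> float:
    """Steady-state gain from the transfer function evaluated at s = 0."""
    return (k1 * k2) / (a1 * a2)


# Hypothetical production (k) and degradation (a) rates for the two stages.
params = dict(k1=2.0, a1=1.0, k2=3.0, a2=0.5)
final_output = simulate_cascade(**params)
predicted = dc_gain(**params)
print(final_output, predicted)
```

The simulated output settles at the value predicted by the transfer function, which is the kind of agreement between frequency-domain and time-domain descriptions that the abstract refers to.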
Li, Zhiyuan; Bianco, Simone; Zhang, Zhaoyang; Tang, Chao
Modeling gene regulatory networks (GRNs) is an important topic in systems biology. Although there has been much work focusing on various specific systems, the generic behavior of GRNs with continuous variables is still elusive. In particular, it is not clear typically how attractors partition among the three types of orbits: steady state, periodic and chaotic, and how the dynamical properties change with network's topological characteristics. In this work, we first investigated these questions in random GRNs with different network sizes, connectivity, fraction of inhibitory links and transcription regulation rules. Then we searched for the core motifs that govern the dynamic behavior of large GRNs. We show that the stability of a random GRN is typically governed by a few embedding motifs of small sizes, and therefore can in general be understood in the context of these short motifs. Our results provide insights for the study and design of genetic networks.
The impact of gene silencing on cellular phenotypes is difficult to establish due to the complexity of interactions in the associated biological processes and pathways. A recent genome-wide RNA knock-down study both identified and phenotypically characterized a set of important genes for the cell cycle in HeLa cells. Here, we combine a molecular interaction network analysis, based on physical and functional protein interactions, in conjunction with evolutionary information, to elucidate the common biological and topological properties of these key genes. Our results show that these genes tend to be conserved with their corresponding protein interactions across several species and are key constituents of the evolutionary conserved molecular interaction network. Moreover, a group of bistable network motifs is found to be conserved within this network, which are likely to influence the network stability and therefore the robustness of cellular functioning. They form a cluster, which displays functional homogeneity and is significantly enriched in genes phenotypically relevant for mitosis. Additional results reveal a relationship between specific cellular processes and the phenotypic outcomes induced by gene silencing. This study introduces new ideas regarding the relationship between genotype and phenotype in the context of the cell cycle. We show that the analysis of molecular interaction networks can result in the identification of genes relevant to cellular processes, which is a promising avenue for future research.
Gu, Zuguang; Zhang, Chenyu; Wang, Jin
Hepatocellular carcinoma (HCC) is one of the most lethal cancers worldwide, and the mechanisms that lead to the disease are still relatively unclear. However, with the development of high-throughput technologies it is possible to gain a systematic view of biological systems to enhance the understanding of the roles of genes associated with HCC. Thus, analysis of the mechanism of molecule interactions in the context of gene regulatory networks can reveal specific sub-networks that lead to the development of HCC. In this study, we aimed to identify the most important gene regulations that are dysfunctional in HCC generation. Our method for constructing gene regulatory network is based on predicted target interactions, experimentally-supported interactions, and co-expression model. Regulators in the network included both transcription factors and microRNAs to provide a complete view of gene regulation. Analysis of gene regulatory network revealed that gene regulation in HCC is highly modular, in which different sets of regulators take charge of specific biological processes. We found that microRNAs mainly control biological functions related to mitochondria and oxidative reduction, while transcription factors control immune responses, extracellular activity and the cell cycle. On the higher level of gene regulation, there exists a core network that organizes regulations between different modules and maintains the robustness of the whole network. There is direct experimental evidence for most of the regulators in the core gene regulatory network relating to HCC. We infer it is the central controller of gene regulation. Finally, we explored the influence of the core gene regulatory network on biological pathways. Our analysis provides insights into the mechanism of transcriptional and post-transcriptional control in HCC. In particular, we highlight the importance of the core gene regulatory network; we propose that it is highly related to HCC and we believe further
Background: Stochastic simulation of gene networks by Markov processes has important applications in molecular biology. The complexity of exact simulation algorithms scales with the number of discrete jumps to be performed. Approximate schemes reduce the computational time by reducing the number of simulated discrete events. Also, answering important questions about the relation between network topology and intrinsic noise generation and propagation should be based on general mathematical results. These general results are difficult to obtain for exact models. Results: We propose a unified framework for hybrid simplifications of Markov models of multiscale stochastic gene network dynamics. We discuss several possible hybrid simplifications, and provide algorithms to obtain them from pure jump processes. In hybrid simplifications, some components are discrete and evolve by jumps, while other components are continuous. Hybrid simplifications are obtained by partial Kramers-Moyal expansion, which is equivalent to the application of the central limit theorem to a sub-model. By averaging and variable aggregation we drastically reduce simulation time and eliminate non-critical reactions. Hybrid and averaged simplifications can be used for more effective simulation algorithms and for obtaining general design principles relating noise to topology and time scales. The simplified models reproduce with good accuracy the stochastic properties of the gene networks, including waiting times in intermittence phenomena, fluctuation amplitudes and stationary distributions. The methods are illustrated on several gene network examples. Conclusion: Hybrid simplifications can be used for onion-like (multi-layered) approaches to multi-scale biochemical systems, in which various descriptions are used at various scales. Sets of discrete and continuous variables are treated with different methods and are coupled together in a physically justified approach.
Zhang, Yihua; Li, Wan; Feng, Yuyan; Guo, Shanshan; Zhao, Xilei; Wang, Yahui; He, Yuehan; He, Weiming; Chen, Lina
Chronic obstructive pulmonary disease (COPD) is a multi-factor disease that can be caused by many factors, including disturbances of metabolism and protein-protein interactions (PPIs). In this paper, a weighted COPD-related metabolic network and a weighted COPD-related PPI network were constructed based on COPD disease genes and functional information. Candidate genes in each of these weighted COPD-related networks were prioritized using a gene prioritization method. Literature review and functional enrichment analysis of the top 100 genes in these two networks suggested an association between COPD and these genes. As assessed by leave-one-out cross-validation, literature validation and functional enrichment analysis, our gene prioritization method outperformed ToppGene and ToppNet for genes from the COPD-related metabolic network or the COPD-related PPI network. The top-ranked genes prioritized from the COPD-related metabolic and PPI networks could promote a better understanding of the molecular mechanism of this disease from different perspectives. The top 100 genes in the COPD-related metabolic network or the COPD-related PPI network might be potential markers for the diagnosis and treatment of COPD.
Background: Being sessile organisms, plants should adjust their metabolism to dynamic changes in their environment. Such adjustments need particular coordination in branched metabolic networks in which a given metabolite can be converted into multiple other metabolites via different enzymatic chains. In the present report, we developed a novel "Gene Coordination" bioinformatics approach and used it to elucidate adjustable transcriptional interactions of two branched amino acid metabolic networks in plants in response to environmental stresses, using publicly available microarray results. Results: Using our "Gene Coordination" approach, we have identified in Arabidopsis plants two oppositely regulated groups of "highly coordinated" genes within the branched Asp-family network, which metabolizes the amino acids Lys, Met, Thr, Ile and Gly, as well as a single group of "highly coordinated" genes within the branched aromatic amino acid metabolic network, which metabolizes the amino acids Trp, Phe and Tyr. These genes possess highly coordinated adjustable negative and positive expression responses to various stress cues, which apparently regulate adjustable metabolic shifts between competing branches of these networks. We also provide evidence implying that these highly coordinated genes are central in imposing intra- and inter-network interactions between the Asp-family and aromatic amino acid metabolic networks as well as differential system interactions with other growth-promoting and stress-associated genome-wide genes. Conclusion: Our novel Gene Coordination approach elucidates that branched amino acid metabolic networks in plants are regulated by specific groups of highly coordinated genes that possess adjustable intra-network, inter-network and genome-wide transcriptional interactions. We also hypothesize that such transcriptional interactions enable the regulatory metabolic adjustments needed for adaptation to the stresses.
Background: Multifactor Dimensionality Reduction (MDR) has been widely applied to detect gene-gene (GxG) interactions associated with complex diseases. Existing MDR methods summarize disease risk by a dichotomous predisposing model (high-risk/low-risk) from one optimal GxG interaction, which does not take the accumulated effects from multiple GxG interactions into account. Results: We propose an Aggregated-Multifactor Dimensionality Reduction (A-MDR) method that exhaustively searches for and detects significant GxG interactions to generate an epistasis enriched gene network. An aggregated epistasis enriched risk score, which takes into account multiple GxG interactions simultaneously, replaces the dichotomous predisposing risk variable and provides higher resolution in the quantification of disease susceptibility. We evaluate this new A-MDR approach in a broad range of simulations. Also, we present the results of an application of the A-MDR method to a data set derived from Juvenile Idiopathic Arthritis patients treated with methotrexate (MTX) that revealed several GxG interactions in the folate pathway that were associated with treatment response. The epistasis enriched risk score that pooled information from 82 significant GxG interactions distinguished MTX responders from non-responders with 82% accuracy. Conclusions: The proposed A-MDR is innovative in the MDR framework to investigate aggregated effects among GxG interactions. New measures (pOR, pRR and pChi) are proposed to detect multiple GxG interactions.
Protein-protein interaction networks provide a global picture of cellular function and biological processes. Some proteins act as hub proteins, highly connected to others, whereas some others have few interactions. The dysfunction of some interactions causes many diseases, including cancer. Proteins interact through their interfaces. Therefore, studying the interface properties of cancer-related proteins will help explain their role in the interaction networks. Similar or overlapping binding sites should be used repeatedly in single interface hub proteins, making them promiscuous. Alternatively, multi-interface hub proteins make use of several distinct binding sites to bind to different partners. We propose a methodology to integrate protein interfaces into cancer interaction networks (ciSPIN, cancer structural protein interface network). The interactions in the human protein interaction network are replaced by interfaces, coming from either known or predicted complexes. We provide a detailed analysis of cancer related human protein-protein interfaces and the topological properties of the cancer network. The results reveal that cancer-related proteins have smaller, more planar, more charged and less hydrophobic binding sites than non-cancer proteins, which may indicate low affinity and high specificity of the cancer-related interactions. We also classified the genes in ciSPIN according to phenotypes. Within phenotypes, for breast cancer, colorectal cancer and leukemia, interface properties were found to be discriminating from non-cancer interfaces with an accuracy of 71%, 67%, 61%, respectively. In addition, cancer-related proteins tend to interact with their partners through distinct interfaces, corresponding mostly to multi-interface hubs, which comprise 56% of cancer-related proteins, and constituting the nodes with higher essentiality in the network (76%). We illustrate the interface related affinity properties of two cancer-related hub
Modularity is a widespread property in biological systems. It implies that interactions occur mainly within groups of system elements. A modular arrangement facilitates adjustment of one module without perturbing the rest of the system. Therefore, modularity of developmental mechanisms is a major factor for evolvability, the potential to produce beneficial variation from random genetic change. Understanding how modularity evolves in gene regulatory networks, that create the distinct gene activity patterns that characterize different parts of an organism, is key to developmental and evolutionary biology. One hypothesis for the evolution of modules suggests that interactions between some sets of genes become maladaptive when selection favours additional gene activity patterns. The removal of such interactions by selection would result in the formation of modules. A second hypothesis suggests that modularity evolves in response to sparseness, the scarcity of interactions within a system. Here I simulate the evolution of gene regulatory networks and analyse diverse experimentally sustained networks to study the relationship between sparseness and modularity. My results suggest that sparseness alone is neither sufficient nor necessary to explain modularity in gene regulatory networks. However, sparseness amplifies the effects of forms of selection that, like selection for additional gene activity patterns, already produce an increase in modularity. That evolution of new gene activity patterns is frequent across evolution also supports that it is a major factor in the evolution of modularity. That sparseness is widespread across gene regulatory networks indicates that it may have facilitated the evolution of modules in a wide variety of cases.
del Genio, Charo I.; Gómez-Gardeñes, Jesús; Bonamassa, Ivan; Boccaletti, Stefano
The structure of many real-world systems is best captured by networks consisting of several interaction layers. Understanding how a multilayered structure of connections affects the synchronization properties of dynamical systems evolving on top of it is a highly relevant endeavor in mathematics and physics and has potential applications in several socially relevant topics, such as power grid engineering and neural dynamics. We propose a general framework to assess the stability of the synchronized state in networks with multiple interaction layers, deriving a necessary condition that generalizes the master stability function approach. We validate our method by applying it to a network of Rössler oscillators with a double layer of interactions and show that highly rich phenomenology emerges from this. This includes cases where the stability of synchronization can be induced even if both layers would have individually induced unstable synchrony, an effect genuinely arising from the true multilayer structure of the interactions among the units in the network. PMID:28138540
Advances in network technologies enable distributed systems, operating in complex physical environments, to coordinate their activities over larger areas within shorter time intervals. In these systems humans and intelligent machines will, in close interaction, be able to reach their goals under
Fu, Changhe; Deng, Su; Jin, Guangxu; Wang, Xinxin; Yu, Zu-Guo
Molecular interaction data at the proteomic and genetic levels provide physical and functional insights into a molecular biosystem and are complementary sources for constructing pathway structures. Despite advances in inferring biological pathways from genetic interaction data, existing models such as activity pathway networks (APN) remain weak when integrating data from the proteomic and genetic levels. New methods are needed to infer pathway structure from both kinds of interaction data. We used a probabilistic graphical model to develop a new method that integrates genetic interaction and protein interaction data and infers detailed pathway structure. We modeled the pathway network as a Bayesian network and applied this model to infer pathways for the coherent subsets of the global genetic interaction profiles and for the available data set of endoplasmic reticulum genes. The protein interaction data were derived from the BioGRID database. Our method can accurately reconstruct known cellular pathway structures, including the SWR complex, ER-Associated Degradation (ERAD) pathway, N-Glycan biosynthesis pathway, Elongator complex, Retromer complex, and Urmylation pathway. By comparing the N-Glycan biosynthesis pathway and Urmylation pathway identified by our approach with those from APN, we found that our method is able to overcome the weakness of APN (certain edges are inexplicable). Based on the underlying protein interaction network, we defined a simple scoring function that uses only genetic interaction information, avoiding the balancing difficulty in the APN. Using an effective stochastic simulation algorithm, the proposed method achieves high performance. We developed a new method based on Bayesian networks to infer detailed pathway structures from interaction data at the proteomic and genetic levels. The results indicate that the developed method performs better in predicting signaling pathways than previously
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by genome-wide profiling of differential gene expression. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulation of the differentially expressed genes. As an alternative, genetical genomics studies have been proposed that treat naturally occurring genetic variants as potential perturbants of the gene regulatory system and recover gene networks via analysis of population gene-expression and genotype data. Despite the many advantages of genetical genomics data analysis, the computational challenge of simultaneously decoding the effects of multifactorial genetic perturbations from data has prevented its widespread application. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of the gene regulatory system by a large number of SNPs to identify a gene network along with the expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of the gene network, by performing inference on the probabilistic graphical model we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network
Royer, Loic; Reimann, Matthias; Stewart, A. Francis; Schroeder, Michael
With the advent of large-scale protein interaction studies, there is much debate about data quality. Can different noise levels in the measurements be assessed by analyzing network structure? Because proteomic regulation is inherently co-operative, modular and redundant, it is inherently compressible when represented as a network. Here we propose that network compression can be used to compare false positive and false negative noise levels in protein interaction networks. We validate this hypothesis by first confirming the detrimental effect of false positives and false negatives. Second, we show that gold standard networks are more compressible. Third, we show that compressibility correlates with co-expression, co-localization, and shared function. Fourth, we also observe correlation with better protein tagging methods, physiological expression in contrast to over-expression of tagged proteins, and smart pooling approaches for yeast two-hybrid screens. Overall, this new measure is a proxy for both sensitivity and specificity and gives complementary information to standard measures such as average degree and clustering coefficients. PMID:22719828
Gene regulatory networks (GRNs) control cellular function and decision making during tissue development and homeostasis. Mathematical tools based on dynamical systems theory are often used to model these networks, but the size and complexity of these models mean that their behaviour is not always intuitive and the underlying mechanisms can be difficult to decipher. For this reason, methods that simplify and aid exploration of complex networks are necessary. To this end we develop a broadly applicable form of the Zwanzig-Mori projection. By first converting a thermodynamic state ensemble model of gene regulation into mass action reactions we derive a general method that produces a set of time evolution equations for a subset of components of a network. The influence of the rest of the network, the bulk, is captured by memory functions that describe how the subnetwork reacts to its own past state via components in the bulk. These memory functions provide probes of near-steady state dynamics, revealing information not easily accessible otherwise. We illustrate the method on a simple cross-repressive transcriptional motif to show that memory functions not only simplify the analysis of the subnetwork but also have a natural interpretation. We then apply the approach to a GRN from the vertebrate neural tube, a well characterised developmental transcriptional network composed of four interacting transcription factors. The memory functions reveal the function of specific links within the neural tube network and identify features of the regulatory structure that specifically increase the robustness of the network to initial conditions. Taken together, the study provides evidence that Zwanzig-Mori projections offer powerful and effective tools for simplifying and exploring the behaviour of GRNs. PMID:29470492
Wang, Yi Kan; Hurley, Daniel G; Schnell, Santiago; Print, Cristin G; Crampin, Edmund J
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.
Multiple sclerosis (MS) is a chronic inflammatory disease of the CNS and has a varying disease course as well as variable response to treatment. Biomarkers may therefore aid personalized treatment. We tested whether in vitro activation of MS patient-derived CD4+ T cells could reveal potential biomarkers. The dynamic gene expression response to activation was dysregulated in patient-derived CD4+ T cells. By integrating our findings with genome-wide association studies, we constructed a highly connected MS gene module, disclosing cell activation and chemotaxis as central components. Changes in several module genes were associated with differences in protein levels, which were measurable in cerebrospinal fluid and were used to classify patients from control individuals. In addition, these measurements could predict disease activity after 2 years and distinguish low and high responders to treatment in two additional, independent cohorts. While further validation is needed in larger cohorts prior to clinical implementation, we have uncovered a set of potentially promising biomarkers.
Understanding the direction of information flow is essential for characterizing how genetic networks affect phenotypes. However, methods to find genetic interactions largely fail to reveal directional dependencies. We combine two orthogonal Cas9 proteins from Streptococcus pyogenes and Staphylococcus aureus to carry out a dual screen in which one gene is activated while a second gene is deleted in the same cell. We analyze the quantitative effects of activation and knockout to calculate genetic interaction and directionality scores for each gene pair.
Bag, Susmita; Ramaiah, Sudha; Anbarasu, Anand
Network studies of genes and proteins reveal the functional basis of the complexity of a gene or protein and its interacting partners. The gene fatty acid-binding protein 4 (fabp4) is highly expressed in adipose tissue, and its product is one of the most abundant proteins in mature adipocytes. Our investigations of the functional modules of fabp4 provide useful information on the functional genes interacting with fabp4, their biochemical properties and their regulatory functions. The present study shows that eight candidate genes (acp1, ext2, insr, lipe, ostf1, sncg, usp15, and vim) are strongly and functionally linked with fabp4. Gene ontology analysis of the network modules of fabp4 gives an explicit picture of the functional aspects of fabp4 and its interacting nodes. The hierarchical mapping on gene ontology indicates gene-specific processes and functions as well as their compartmentalization in tissues. fabp4 and its interacting genes are involved in lipid metabolic activity and are integrated in multi-cellular processes of tissues and organs. They also have important protein/enzyme binding activity. Our study also covered disease-associated nsSNP prediction for fabp4, and it is interesting to note that there are four rsIDs (rs1051231, rs3204631, rs140925685 and rs141169989) with disease allelic variation (T104P, T126P, G27D and G90V, respectively). On the whole, our gene network analysis presents a clear insight into the interactions and functions associated with the fabp4 gene network. Copyright © 2014 Elsevier Ltd. All rights reserved.
Systems biology is an interdisciplinary field that aims at understanding complex interactions in cells. Here we demonstrate that linear control theory can provide valuable insight and practical tools for the characterization of complex biological networks. We provide the foundation for such analyses through several case studies including cascade and parallel forms, feedback and feedforward loops. We reproduce experimental results and provide rational analysis of the observed behavior. We demonstrate that methods such as the transfer function (frequency domain) and linear state-space (time domain) can be used to predict reliably the properties and transient behavior of complex network topologies and point to specific design strategies for synthetic networks.
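The link between the transfer-function and state-space views mentioned above can be illustrated with a small sketch. It assumes a hypothetical two-stage cascade in which each stage is a first-order linear system (production gain k, decay rate a); these parameter values and function names are illustrative and are not taken from the paper. The steady-state (DC) gain read from the transfer function G(s) = k1/(s + a1) · k2/(s + a2) at s = 0 should match the value the time-domain simulation settles to under a unit step input.

```python
def dc_gain(k1, a1, k2, a2):
    """Steady-state gain of the cascade, i.e. the transfer function evaluated at s = 0."""
    return (k1 / a1) * (k2 / a2)


def simulate_cascade(k1, a1, k2, a2, u=1.0, dt=0.001, t_end=20.0):
    """Forward-Euler integration of the state-space form under a constant input u.

    dx1/dt = k1*u  - a1*x1   (first stage, driven by the input)
    dx2/dt = k2*x1 - a2*x2   (second stage, driven by the first)
    Returns the output x2 at time t_end.
    """
    x1 = 0.0
    x2 = 0.0
    for _ in range(int(t_end / dt)):
        dx1 = k1 * u - a1 * x1
        dx2 = k2 * x1 - a2 * x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x2


# Hypothetical parameters chosen only for illustration.
predicted = dc_gain(2.0, 1.0, 3.0, 0.5)
simulated = simulate_cascade(2.0, 1.0, 3.0, 0.5)
print(f"frequency-domain prediction: {predicted:.3f}")
print(f"time-domain simulation:      {simulated:.3f}")
```

The slower stage (the smaller decay rate) dominates how long the output takes to settle, which is the kind of transient property such an analysis is meant to expose.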
Raza, Khalid; Alam, Mansaf
One of the exciting problems in systems biology research is to decipher how the genome controls the development of a complex biological system. Gene regulatory networks (GRNs) help in the identification of regulatory interactions between genes and offer useful information about the functional role of an individual gene in a cellular system. Discovering GRNs leads to a wide range of applications, including identifying disease-related pathways that provide novel tentative drug targets, predicting disease response, and assisting in the diagnosis of various diseases including cancer. Reconstruction of GRNs from available biological data is still an open problem. This paper proposes a recurrent neural network (RNN) based model of GRN, hybridized with a generalized extended Kalman filter for weight update in the backpropagation through time training algorithm. The RNN is a complex neural network that offers a good compromise between biological closeness and mathematical flexibility for modeling GRNs, and it is also able to capture complex, non-linear and dynamic relationships among variables. Gene expression data are inherently noisy, and the Kalman filter performs well for estimation problems even with noisy data. Hence, we applied a non-linear version of the Kalman filter, known as the generalized extended Kalman filter, for weight update during RNN training. The developed model has been tested on four benchmark networks: the DNA SOS repair network, the IRMA network, and two synthetic networks from the DREAM Challenge. A comparison of our results with other state-of-the-art techniques shows the superiority of our proposed model. Further, 5% Gaussian noise was introduced into the dataset, and the results of the proposed model show a negligible effect of noise, demonstrating the noise tolerance of the model. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lev Aleksandrovich Korshunov
To improve the efficiency and competitiveness of the regional economy, effective interaction between educational institutions in the Great Altai region is needed. Innovation growth can enhance this interaction. The article explores the state of network structures in the economy and higher education in the border territories of the countries of Great Altai. The authors propose an updated approach to the three-level classification of network interaction. We analyze the growing influence of countries with emerging economies. We define the factors that impede more stable and multifaceted regional development of these countries. Further, the authors determine indicators of the higher education systems and university-level cooperation systems between the Shanghai Cooperation Organization (SCO) and BRICS countries, showing the international rankings of the universities in these countries. The teaching language is important for overcoming obstacles to interregional cooperation. The authors specify the problems of developing the universities of the SCO and BRICS countries as global educational networks. The research applies basic scientific logical methods of analysis and synthesis, induction and deduction, as well as the SWOT analysis method. We have identified and analyzed the existing economic and educational relations. To promote the innovative economic development of the border territories of the Great Altai, we propose a model of a regional network university. Modern universities function in a new economic environment. Thus, to a great extent, they form the technological and social aspects of this environment. Innovative network structures contribute to the formation of a new network institutional environment of the regional economy, which affects the macro- and microeconomic performance of the region as a whole. The results of the research can help to optimize the regional economies of the border
We have mapped a protein interaction network of human homologs of proteins that modify longevity in invertebrate species. This network is derived from a proteome-scale human protein interaction Core Network generated through unbiased high-throughput yeast two-hybrid searches. The longevity network is composed of 175 human homologs of proteins known to confer increased longevity through loss of function in yeast, nematode, or fly, and 2,163 additional human proteins that interact with these homologs. Overall, the network consists of 3,271 binary interactions among 2,338 unique proteins. A comparison of the average node degree of the human longevity homologs with random sets of proteins in the Core Network indicates that human homologs of longevity proteins are highly connected hubs with a mean node degree of 18.8 partners. Shortest path length analysis shows that proteins in this network are significantly more connected than would be expected by chance. To examine the relationship of this network to human aging phenotypes, we compared the genes encoding longevity network proteins to genes known to be changed transcriptionally during aging in human muscle. In the case of both the longevity protein homologs and their interactors, we observed enrichments for differentially expressed genes in the network. To determine whether homologs of human longevity interacting proteins can modulate life span in invertebrates, homologs of 18 human FRAP1 interacting proteins showing significant changes in human aging muscle were tested for effects on nematode life span using RNAi. Of 18 genes tested, 33% extended life span when knocked-down in Caenorhabditis elegans. These observations indicate that a broad class of longevity genes identified in invertebrate models of aging have relevance to human aging. They also indicate that the longevity protein interaction network presented here is enriched for novel conserved longevity proteins.
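The counts quoted in this abstract can be checked with simple arithmetic. The sketch below assumes every reported interaction is an undirected, binary edge with no self-interactions, so that the average degree is 2E/N; the function and variable names are illustrative only. Note that the 18.8 figure in the abstract refers to the longevity homologs within the Core Network, so it is not directly comparable to the average computed here for the longevity network itself.

```python
def mean_degree(num_edges: int, num_nodes: int) -> float:
    """Average degree of an undirected graph: each edge touches two nodes."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return 2 * num_edges / num_nodes


# Figures as reported in the abstract.
homologs = 175
interactors = 2163
total_proteins = homologs + interactors          # should equal the 2,338 quoted
network_mean = mean_degree(3271, total_proteins)

# "Of 18 genes tested, 33% extended life span" corresponds to about 6 genes.
genes_extending_lifespan = round(0.33 * 18)

print(total_proteins)
print(f"{network_mean:.2f}")
print(genes_extending_lifespan)
```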
Pardee, Keith; Green, Alexander A; Ferrante, Tom; Cameron, D Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J
Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides an alternate, versatile venue for synthetic biologists to operate and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze dried onto paper, enabling the inexpensive, sterile, and abiotic distribution of synthetic-biology-based technologies for the clinic, global health, industry, research, and education. For field use, we create circuits with colorimetric outputs for detection by eye and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small-molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors.
Pardee, Keith; Green, Alexander A.; Ferrante, Tom; Cameron, D. Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J.
Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides a new venue for synthetic biologists to operate, and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze-dried onto paper, enabling the inexpensive, sterile and abiotic distribution of synthetic biology-based technologies for the clinic, global health, industry, research and education. For field use, we create circuits with colorimetric outputs for detection by eye, and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors. PMID:25417167
Song, Mingzhou (Joe) [New Mexico State University, Las Cruces; Lewis, Chris K. [New Mexico State University, Las Cruces; Lance, Eric [New Mexico State University, Las Cruces; Chesler, Elissa J [ORNL; Kirova, Roumyana [Bristol-Myers Squibb Pharmaceutical Research & Development, NJ; Langston, Michael A [University of Tennessee, Knoxville (UTK); Bergeson, Susan [Texas Tech University, Lubbock
The problem of reconstructing generalized logical networks to account for temporal dependencies among genes and environmental stimuli from high-throughput transcriptomic data is addressed. A network reconstruction algorithm was developed that uses the statistical significance as a criterion for network selection to avoid false-positive interactions arising from pure chance. Using temporal gene expression data collected from the brains of alcohol-treated mice in an analysis of the molecular response to alcohol, this algorithm identified genes from a major neuronal pathway as putative components of the alcohol response mechanism. Three of these genes have known associations with alcohol in the literature. Several other potentially relevant genes, highlighted and agreeing with independent results from literature mining, may play a role in the response to alcohol. Additional, previously-unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.
Van Assche, Evelien; Moons, Tim; Cinar, Ozan; Viechtbauer, Wolfgang; Oldehinkel, Albertine J.; Van Leeuwen, Karla; Verschueren, Karine; Colpin, Hilde; Lambrechts, Diether; Van den Noortgate, Wim; Goossens, Luc; Claes, Stephan; van Winkel, Ruud
BACKGROUND: Most gene-environment interaction studies (G × E) have focused on single candidate genes. This approach is criticized for its expectations of large effect sizes and occurrence of spurious results. We describe an approach that accounts for the polygenic nature of most psychiatric
Ferguson, Laura B; Harris, R Adron; Mayfield, Roy Dayne
The alcohol research field has amassed an impressive number of gene expression datasets spanning key brain areas for addiction, species (humans as well as multiple animal models), and stages in the addiction cycle (binge/intoxication, withdrawal/negative effect, and preoccupation/anticipation). These data have improved our understanding of the molecular adaptations that eventually lead to dysregulation of brain function and the chronic, relapsing disorder of addiction. Identification of new medications to treat alcohol use disorder (AUD) will likely benefit from the integration of genetic, genomic, and behavioral information included in these important datasets. Systems pharmacology considers drug effects as the outcome of the complex network of interactions a drug has rather than a single drug-molecule interaction. Computational strategies based on this principle that integrate gene expression signatures of pharmaceuticals and disease states have shown promise for identifying treatments that ameliorate disease symptoms (called in silico gene mapping or connectivity mapping). In this review, we suggest that gene expression profiling for in silico mapping is critical to improve drug repurposing and discovery for AUD and other psychiatric illnesses. We highlight studies that successfully apply gene mapping computational approaches to identify or repurpose pharmaceutical treatments for psychiatric illnesses. Furthermore, we address important challenges that must be overcome to maximize the potential of these strategies to translate to the clinic and improve healthcare outcomes.
Hill, Jonathon T; Demarest, Bradley; Gorsi, Bushra; Smith, Megan; Yost, H Joseph
During embryogenesis the heart forms as a linear tube that then undergoes multiple simultaneous morphogenetic events to obtain its mature shape. To understand the gene regulatory networks (GRNs) driving this phase of heart development, during which many congenital heart disease malformations likely arise, we conducted an RNA-seq timecourse in zebrafish from 30 hpf to 72 hpf and identified 5861 genes with altered expression. We clustered the genes by temporal expression pattern, identified transcription factor binding motifs enriched in each cluster, and generated a model GRN for the major gene batteries in heart morphogenesis. This approach predicted hundreds of regulatory interactions and found batteries enriched in specific cell and tissue types, indicating that the approach can be used to narrow the search for novel genetic markers and regulatory interactions. Subsequent analyses confirmed the GRN using two mutants, Tbx5 and nkx2-5 , and identified sets of duplicated zebrafish genes that do not show temporal subfunctionalization. This dataset provides an essential resource for future studies on the genetic/epigenetic pathways implicated in congenital heart defects and the mechanisms of cardiac transcriptional regulation. © 2017. Published by The Company of Biologists Ltd.
van Vliet-Ostaptchouk, Jana V; Snieder, Harold; Lagou, Vasiliki
Obesity is a complex multifaceted disease resulting from interactions between genetics and lifestyle. The proportion of phenotypic variance ascribed to genetic variance is 0.4 to 0.7 for obesity and recent years have seen considerable success in identifying disease-susceptibility variants. Although with the advent of genome-wide association studies the list of genetic variants predisposing to obesity has significantly increased the identified variants only explain a fraction of disease heritability. Studies of gene-environment interactions can provide more insight into the biological mechanisms involved in obesity despite the challenges associated with such designs. Epigenetic changes that affect gene function without DNA sequence modifications may be a key factor explaining interindividual differences in obesity, with both genetic and environmental factors influencing the epigenome. Disentangling the relative contributions of genetic, environmental and epigenetic marks to the establishment of obesity is a major challenge given the complex interplay between these determinants.
Barrdahl, Myrto; Rudolph, Anja; Hopper, John L
.36, 95% CI: 1.16-1.59, pint = 1.9 × 10(-5) ) in relation to ER- disease risk. The remaining two gene-environment interactions were also identified in relation to ER- breast cancer risk and were found between 3p21-rs6796502 and age at menarche (ORint = 1.26, 95% CI: 1.12-1.43, pint =1.8 × 10...... epidemiological breast cancer risk factors in relation to breast cancer. Analyses were conducted on up to 58,573 subjects (26,968 cases and 31,605 controls) from the Breast Cancer Association Consortium, in one of the largest studies of its kind. Analyses were carried out separately for estrogen receptor (ER......) positive (ER+) and ER negative (ER-) disease. The Bayesian False Discovery Probability (BFDP) was computed to assess the noteworthiness of the results. Four potential gene-environment interactions were identified as noteworthy (BFDP
Gregory, Alice M.; Lau, Jennifer Y. F.; Eley, Thalia C.
Phobias are common disorders causing a great deal of suffering. Studies of gene-environment interaction (G × E) have revealed much about the complex processes underlying the development of various psychiatric disorders but have told us little about phobias. This article describes what is already known about genetic and environmental influences upon phobias and suggests how this information can be used to optimise the chances of discovering G × Es for phobias. In addition to the careful concep...
Toga, Arthur W; Neu, Scott C; Bhatt, Priya; Crawford, Karen L; Ashish, Naveen
The Global Alzheimer's Association Interactive Network (GAAIN) is consolidating the efforts of independent Alzheimer's disease data repositories around the world with the goals of revealing more insights into the causes of Alzheimer's disease, improving treatments, and designing preventative measures that delay the onset of physical symptoms. We developed a system for federating these repositories that is reliant on the tenets that (1) its participants require incentives to join, (2) joining the network is not disruptive to existing repository systems, and (3) the data ownership rights of its members are protected. We are currently in various phases of recruitment with over 55 data repositories in North America, Europe, Asia, and Australia and can presently query >250,000 subjects using GAAIN's search interfaces. GAAIN's data sharing philosophy, which guided our architectural choices, is conducive to motivating membership in a voluntary data sharing network. Copyright © 2016 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.
Interactomics: a complete survey from data generation to knowledge extraction. With the increasing use of high-throughput experimental assays, more and more protein interaction databases are becoming available. As a result, computational analysis of protein-to-protein interaction (PPI) data and networks, now known as interactomics, has become an essential tool to determine functionally associated proteins. From wet lab technologies to data management to knowledge extraction, this timely book guides readers through the new science of interactomics, giving them the tools needed to: Generate
Fabbri, Renato; Fabbri, Ricardo; Antunes, Deborah Christina; Pisani, Marilia Mello; de Oliveira, Osvaldo Novais
This paper reports on stable (or invariant) properties of human interaction networks, with benchmarks derived from public email lists. Activity, recognized through messages sent, along time and topology were observed in snapshots in a timeline, and at different scales. Our analysis shows that activity is practically the same for all networks across timescales ranging from seconds to months. The principal components of the participants in the topological metrics space remain practically unchanged as different sets of messages are considered. The activity of participants follows the expected scale-free trace, thus yielding the hub, intermediary and peripheral classes of vertices by comparison against the Erdös-Rényi model. The relative sizes of these three sectors are essentially the same for all email lists and the same along time. Typically, 45% are peripheral vertices. Similar results for the distribution of participants in the three sectors and for the relative importance of the topological metrics were obtained for 12 additional networks from Facebook, Twitter and ParticipaBR. These properties are consistent with the literature and may be general for human interaction networks, which has important implications for establishing a typology of participants based on quantitative criteria.
Jalili, Mahdi; Gebhardt, Tom; Wolkenhauer, Olaf; Salehzadeh-Yazdi, Ali
Decoding health and disease phenotypes is one of the fundamental objectives in biomedicine. Whereas high-throughput omics approaches are available, it is evident that any single omics approach might not be adequate to capture the complexity of phenotypes. Therefore, integrated multi-omics approaches have been used to unravel genotype-phenotype relationships such as global regulatory mechanisms and complex metabolic networks in different eukaryotic organisms. Some of the progress and challenges associated with integrated omics studies have been reviewed previously in comprehensive studies. In this work, we highlight and review the progress, challenges and advantages associated with emerging approaches that integrate gene expression and protein-protein interaction networks to unravel network-based functional features. This includes identifying disease-related genes, prioritizing genes, clustering protein interactions, developing modules, and extracting active subnetworks and static or dynamic/temporal protein complexes. We also discuss how these approaches contribute to our understanding of the biology of complex traits and diseases. This article is part of a Special Issue entitled: Cardiac adaptations to obesity, diabetes and insulin resistance, edited by Professors Jan F.C. Glatz, Jason R.B. Dyck and Christine Des Rosiers. Copyright © 2018 Elsevier B.V. All rights reserved.
Olszewski Kellen L
Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the
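The mutual-nearest-neighbor idea described in this abstract can be illustrated with a minimal sketch. It covers only the graph-construction step (an edge is kept when two genes appear in each other's k nearest neighbors); the clique-overlap clustering of NNN is not reproduced here. The function names, the use of Pearson correlation as the similarity, and the toy expression profiles are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_knn_edges(profiles, k):
    """Return edges {g, h} where g is among h's k most similar genes and vice versa."""
    genes = list(profiles)
    neighbors = {}
    for g in genes:
        ranked = sorted(
            (h for h in genes if h != g),
            key=lambda h: pearson(profiles[g], profiles[h]),
            reverse=True,
        )
        neighbors[g] = set(ranked[:k])
    return {
        frozenset((g, h))
        for g, h in combinations(genes, 2)
        if h in neighbors[g] and g in neighbors[h]
    }


# Hypothetical toy profiles: A and B rise together, C and D fall together.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.1, 2.2, 0.8],
}
edges = mutual_knn_edges(profiles, k=1)
```

Because the edge test is mutual, a gene whose nearest neighbors all prefer other partners ends up with no edges, which is the property the abstract describes as letting genes remain unclustered.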
Sun, Mengyang; Cheng, Xianrui; Socolar, Joshua E S
A common approach to the modeling of gene regulatory networks is to represent activating or repressing interactions using ordinary differential equations for target gene concentrations that include Hill function dependences on regulator gene concentrations. An alternative formulation represents the same interactions using Boolean logic with time delays associated with each network link. We consider the attractors that emerge from the two types of models in the case of a simple but nontrivial network: a figure-8 network with one positive and one negative feedback loop. We show that the different modeling approaches give rise to the same qualitative set of attractors with the exception of a possible fixed point in the ordinary differential equation model in which concentrations sit at intermediate values. The properties of the attractors are most easily understood from the Boolean perspective, suggesting that time-delay Boolean modeling is a useful tool for understanding the logic of regulatory networks.
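As a rough illustration of the two descriptions contrasted in this abstract, the sketch below defines a Hill activation term, a Boolean threshold rule, and a one-gene Euler integration of dx/dt = H(r) - x. The function names, the unit decay rate, the threshold K = 1 and the step size are assumptions chosen for the example; the figure-8 network and the time delays studied in the paper are not modeled.

```python
def hill_activation(regulator, K=1.0, n=2):
    """Hill activation term: rises from 0 towards 1 as the regulator passes threshold K."""
    return regulator**n / (K**n + regulator**n)


def boolean_activation(regulator, K=1.0):
    """Boolean counterpart: the target is ON only when the regulator exceeds K."""
    return 1 if regulator > K else 0


def simulate_target(regulator, steps=2000, dt=0.01, n=2):
    """Forward-Euler integration of dx/dt = hill_activation(regulator) - x from x = 0."""
    x = 0.0
    for _ in range(steps):
        x += dt * (hill_activation(regulator, n=n) - x)
    return x


# With a steep Hill coefficient the continuous term approaches the Boolean rule.
steep_on = hill_activation(2.0, n=50)
steep_off = hill_activation(0.5, n=50)
steady_state = simulate_target(2.0)
```

For a constant regulator the integrated concentration relaxes to the Hill value, so the steep-coefficient limit of the ODE model and the Boolean rule agree away from the threshold.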
Kevin L Childs
With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
Carson, Matthew B; Lu, Hui
In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second-order neighbors in the PPI network
Ferrari, Raffaele; Lovering, Ruth C; Hardy, John; Lewis, Patrick A; Manzoni, Claudia
The genetic analysis of complex disorders has undoubtedly led to the identification of a wealth of associations between genes and specific traits. However, moving from genetics to biochemistry one gene at a time has, to date, proved rather inefficient and under-powered to comprehensively explain the molecular basis of phenotypes. Here we present a novel approach, weighted protein-protein interaction network analysis (W-PPI-NA), to highlight key functional players within relevant biological processes associated with a given trait. This is exemplified in the current study by applying W-PPI-NA to frontotemporal dementia (FTD): we first built the state-of-the-art FTD protein network (FTD-PN) and then analyzed both its topological and functional features. The FTD-PN resulted from the sum of the individual interactomes built around FTD-spectrum genes, leading to a total of 4198 nodes. Twenty-nine of the 4198 nodes, called inter-interactome hubs (IIHs), represented those interactors able to bridge over 60% of the individual interactomes. Functional annotation analysis not only reiterated and reinforced previous findings from single genes and gene-coexpression analyses but also indicated a number of novel potential disease-related mechanisms, including DNA damage response, gene expression regulation, and cell waste disposal, and potential biomarkers or therapeutic targets including EP300. These processes and targets likely represent the functional core impacted in FTD, reflecting the underlying genetic architecture contributing to disease. The approach presented in this study can be applied to other complex traits for which risk-causative genes are known, as it provides a promising tool for setting the foundations for collating genomics and wet laboratory data in a bidirectional manner. This is and will be critical to accelerate molecular target prioritization and drug discovery.
Bhargava, Anuprabha; Herzel, Hanspeter; Ananthasubramaniam, Bharath
Background Most physiological processes in mammals are temporally regulated by means of a master circadian clock in the brain and peripheral oscillators in most other tissues. A transcriptional-translation feedback network of clock genes produces near 24 h oscillations in clock gene and protein expression. Here, we aim to identify novel additions to the clock network using a meta-analysis of public chromatin immunoprecipitation sequencing (ChIP-seq), proteomics and protein-protein interaction...
Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the need for efficient and powerful algorithms remains. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining may just indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
With knowledge on microbial composition and diversity, investigation of within-community interactions is a further step to elucidate microbial ecological functions, such as the biodegradation of hazardous contaminants. In this work, microbial functional molecular ecological networks were studied in both contaminated and uncontaminated soils to determine the possible influences of oil contamination on microbial interactions and potential functions. Soil samples were obtained from an oil-exploring site located in South China, and the microbial functional genes were analyzed with GeoChip, a high-throughput functional microarray. By building random networks based on a null model, we demonstrated that overall network structures and properties were significantly different between contaminated and uncontaminated soils (P < 0.001). Network connectivity, module numbers, and modularity were all reduced with contamination. Moreover, the topological roles of the genes (module hubs and connectors) were altered with oil contamination. Subnetworks of genes involved in alkane and polycyclic aromatic hydrocarbon degradation were also constructed. Negative co-occurrence patterns prevailed among functional genes, thereby indicating probable competition relationships. The potential keystone genes, defined as either hubs or genes with highest connectivities in the network, were further identified. The network constructed in this study predicted the potential effects of anthropogenic contamination on microbial community co-occurrence interactions.
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
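For context, the Fisher-style over-representation test that this abstract uses as its baseline reduces to a hypergeometric upper tail once a pathway is treated as a plain gene list. The sketch below shows that baseline only; it is not LEGO, and it deliberately omits the network-based gene weights the authors add. The function name and the example counts are assumptions for illustration.

```python
from math import comb


def ora_p_value(total_genes, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    X counts pathway members among `list_size` genes drawn without
    replacement from a background of `total_genes`, of which
    `pathway_size` belong to the pathway.
    """
    denominator = comb(total_genes, list_size)
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, i) * comb(total_genes - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: background of 10 genes, pathway of 5, query list of 5,
# and all 5 query genes fall in the pathway.
p_all_overlap = ora_p_value(10, 5, 5, 5)
p_no_overlap = ora_p_value(10, 5, 5, 0)
```

Every gene contributes equally to this statistic, which is the limitation the abstract points to: neither a gene's role within the pathway nor its interactions affect the p-value.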
Özgür, Arzucan; Hur, Junguk; He, Yongqun
The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information could be automatically obtained via SPARQL queries, formatted in Excel format, and used in an INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. The INO ontology currently has 575 terms including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations: 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated via running the Stanford Parser to obtain dependency relation types. Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations. The LLL dataset contained 34 gene regulation interaction types, each of which associated with multiple keywords. A
Background: It is one of the ultimate goals of modern biological research to fully elucidate the intricate interplay and regulation of the molecular determinants that propel and characterize the progression of diverse life phenomena, such as cell cycling, developmental biology, aging, and the progressive and recurrent pathogenesis of complex diseases. A vast amount of large-scale and genome-wide time-resolved data is becoming increasingly available, which provides a golden opportunity to tackle the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results: This methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address the underlying regulations of genes that can span any unit(s) of time intervals. This bioinformatics toolbox provides a unified approach to uncovering time trends of gene regulations through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with the current knowledge on phase characteristics for the cell cyclings. Conclusion: We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulations for cell cycling, this unified approach can also be used to study the complex
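The notion of a delayed correlation mentioned in this abstract can be sketched as a Pearson correlation between one gene's series and a time-shifted copy of another's. This is only an illustration of the lagged-overlap idea; it does not reproduce TdGRN's time-delayed expression matrix or its decision analysis. The function names and the synthetic series are assumptions.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def lagged_correlation(regulator, target, lag):
    """Correlate regulator[t] with target[t + lag] over the overlapping time points."""
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Return the lag in 0..max_lag with the highest delayed correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_correlation(regulator, target, lag),
    )


# Synthetic example: the target repeats the regulator two time points later.
regulator = [0.0, 1.0, 0.0, 2.0, 5.0, 1.0, 0.0, 3.0, 1.0, 0.0]
target = [0.0, 0.0] + regulator[:-2]
estimated_delay = best_lag(regulator, target, max_lag=4)
```

Each additional lag shortens the overlapping window by one point, so in practice the maximum lag has to stay small relative to the length of the time course.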
Gregory, Alice M; Lau, Jennifer Y F; Eley, Thalia C
Phobias are common disorders causing a great deal of suffering. Studies of gene-environment interaction (G x E) have revealed much about the complex processes underlying the development of various psychiatric disorders but have told us little about phobias. This article describes what is already known about genetic and environmental influences upon phobias and suggests how this information can be used to optimise the chances of discovering G x Es for phobias. In addition to the careful conceptualisation of new studies, it is suggested that data already collected should be re-analysed in light of increased understanding of processes influencing phobias.
Cava, Claudia; Bertoli, Gloria; Colaprico, Antonio; Olsen, Catharina; Bontempi, Gianluca; Castiglioni, Isabella
Modern high-throughput genomic technologies represent a comprehensive hallmark of molecular changes in pan-cancer studies. Although different cancer gene signatures have been revealed, the mechanism of tumourigenesis has yet to be completely understood. Pathways and networks are important tools to explain the role of genes in functional genomic studies. However, few methods consider the functional non-equal roles of genes in pathways and the complex gene-gene interactions in a network. We present a novel method in pan-cancer analysis that identifies de-regulated genes with a functional role by integrating pathway and network data. A pan-cancer analysis of 7158 tumour/normal samples from 16 cancer types identified 895 genes with a central role in pathways and de-regulated in cancer. Comparing our approach with 15 current tools that identify cancer driver genes, we found that 35.6% of the 895 genes identified by our method have been found as cancer driver genes with at least 2/15 tools. Finally, we applied a machine learning algorithm on 16 independent GEO cancer datasets to validate the diagnostic role of cancer driver genes for each cancer. We obtained a list of the top-ten cancer driver genes for each cancer considered in this study. Our analysis 1) confirmed that there are several known cancer driver genes in common among different types of cancer, 2) highlighted that cancer driver genes are able to regulate crucial pathways.
Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey
Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
Sîrbu, Alina; Crane, Martin; Ruskin, Heather J
Microarray technologies have been the basis of numerous important findings regarding gene expression in the few last decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, as well as indicating which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.
Khadka, Sudip; Vangeloff, Abbey D.; Zhang, Chaoying; Siddavatam, Prasad; Heaton, Nicholas S.; Wang, Ling; Sengupta, Ranjan; Sahasrabudhe, Sudhir; Randall, Glenn; Gribskov, Michael; Kuhn, Richard J.; Perera, Rushika; LaCount, Douglas J.
Dengue virus (DENV), an emerging mosquito-transmitted pathogen capable of causing severe disease in humans, interacts with host cell factors to create a more favorable environment for replication. However, few interactions between DENV and human proteins have been reported to date. To identify DENV-human protein interactions, we used high-throughput yeast two-hybrid assays to screen the 10 DENV proteins against a human liver activation domain library. From 45 DNA-binding domain clones containing either full-length viral genes or partially overlapping gene fragments, we identified 139 interactions between DENV and human proteins, the vast majority of which are novel. These interactions involved 105 human proteins, including six previously implicated in DENV infection and 45 linked to the replication of other viruses. Human proteins with functions related to the complement and coagulation cascade, the centrosome, and the cytoskeleton were enriched among the DENV interaction partners. To determine if the cellular proteins were required for DENV infection, we used small interfering RNAs to inhibit their expression. Six of 12 proteins targeted (CALR, DDX3X, ERC1, GOLGA2, TRIP11, and UBE2I) caused a significant decrease in the replication of a DENV replicon. We further showed that calreticulin colocalized with viral dsRNA and with the viral NS3 and NS5 proteins in DENV-infected cells, consistent with a direct role for calreticulin in DENV replication. Human proteins that interacted with DENV had significantly higher average degree and betweenness than expected by chance, which provides additional support for the hypothesis that viruses preferentially target cellular proteins that occupy central position in the human protein interaction network. This study provides a valuable starting point for additional investigations into the roles of human proteins in DENV infection. PMID:21911577
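The comparison of average degree against chance reported in this abstract can be illustrated with a simple permutation test: draw random protein sets of the same size and count how often their mean degree reaches the observed one. The toy network, the node names, the number of random draws and the fixed seed are assumptions for illustration; the study's actual network and statistical procedure are not given in the abstract.

```python
import random
from collections import defaultdict
from statistics import mean


def degrees(edges):
    """Map each node to its number of interaction partners."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    return {node: len(p) for node, p in partners.items()}


def degree_enrichment_p(edges, targets, n_random=1000, seed=0):
    """Empirical p-value that random sets of the same size have mean degree >= observed."""
    deg = degrees(edges)
    nodes = sorted(deg)
    observed = mean(deg[t] for t in targets)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(nodes, len(targets))
        if mean(deg[s] for s in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical network: H1 and H2 are hubs and stand in for viral targets.
edges = [
    ("H1", "a"), ("H1", "b"), ("H1", "c"), ("H1", "d"),
    ("H2", "a"), ("H2", "b"), ("H2", "c"),
    ("d", "e"),
]
observed_mean, p_value = degree_enrichment_p(edges, ["H1", "H2"])
```

The same loop works for betweenness or any other per-node statistic; only the dictionary passed in place of the degree map changes.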
Erbas, Aykut; Olvera de La Cruz, Monica; Olvera Group Collaboration
Surfaces formed by charged polymeric species are highly abundant in both synthetic and biological systems, for which maintaining an optimum contact distance and a pressure balance is paramount. We investigate interactions between surfaces of two same-charged and highly swollen polyelectrolyte gels, using extensive molecular dynamics simulations and minimal analytical methods. The external-pressure responses of the gels and the polymer-free ionic solvent layer separating two surfaces are considered. Simulations confirmed that the surfaces are held apart by osmotic pressure resulting from excess charges diffusing out of the network. Both the solvent layer and pressure dependence are well described by an analytical model based on the Poisson-Boltzmann solution for low and moderate electrostatic strengths. Our results can be of great importance for systems where charged gels or gel-like structures interact in various solvents, including systems encapsulated by gels and microgels in confinement.
Mistry, Divya; Wise, Roger P; Dickerson, Julie A
Identification of central genes and proteins in biomolecular networks provides credible candidates for pathway analysis, functional analysis, and essentiality prediction. The DiffSLC centrality measure predicts central and essential genes and proteins using a protein-protein interaction network. Network centrality measures prioritize nodes and edges based on their importance to the network topology. These measures helped identify critical genes and proteins in biomolecular networks. The proposed centrality measure, DiffSLC, combines the number of interactions of a protein and the gene coexpression values of genes from which those proteins were translated, as a weighting factor to bias the identification of essential proteins in a protein interaction network. Potentially essential proteins with low node degree are promoted through eigenvector centrality. Thus, the gene coexpression values are used in conjunction with the eigenvector of the network's adjacency matrix and edge clustering coefficient to improve essentiality prediction. The outcome of this prediction is shown using three variations: (1) inclusion or exclusion of gene co-expression data, (2) impact of different coexpression measures, and (3) impact of different gene expression data sets. For a total of seven networks, DiffSLC is compared to other centrality measures using Saccharomyces cerevisiae protein interaction networks and gene expression data. Comparisons are also performed for the top ranked proteins against the known essential genes from the Saccharomyces Gene Deletion Project, which show that DiffSLC detects more essential proteins and has a higher area under the ROC curve than other compared methods. This makes DiffSLC a stronger alternative to other centrality methods for detecting essential genes using a protein-protein interaction network that obeys centrality-lethality principle. DiffSLC is implemented using the igraph package in R, and networkx package in Python. The python package can be
Warmflash, Aryeh; Siggia, Eric D; Francois, Paul
The computational evolution of gene networks functions like a forward genetic screen to generate, without preconceptions, all networks that can be assembled from a defined list of parts to implement a given function. Frequently networks are subject to multiple design criteria that cannot all be optimized simultaneously. To explore how these tradeoffs interact with evolution, we implement Pareto optimization in the context of gene network evolution. In response to a temporal pulse of a signal, we evolve networks whose output turns on slowly after the pulse begins, and shuts down rapidly when the pulse terminates. The best performing networks under our conditions do not fall into categories such as feed forward and negative feedback that also encode the input–output relation we used for selection. Pareto evolution can more efficiently search the space of networks than optimization based on a single ad hoc combination of the design criteria.
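As an illustration of the Pareto idea mentioned in this abstract, the following minimal sketch selects the non-dominated candidates from a set of score tuples. It is not the authors' evolutionary procedure: the function names, the two-objective example, and the convention that lower scores are better are all assumptions made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective and
    strictly better on at least one (assumption: lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    front = []
    for i, s in enumerate(scores):
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
            front.append(i)
    return front


# Hypothetical networks scored on two penalties that cannot both be minimized:
# (penalty for turning on too fast, penalty for shutting down too slowly).
example_scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(pareto_front(example_scores))  # the candidate (3.0, 3.0) is dominated by (2.0, 2.0)
```

Keeping the whole front, rather than ranking candidates by one weighted sum of the criteria, avoids committing to a single ad hoc combination of the objectives.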
Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh
Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman's rank correlation coefficient and Blomqvist's measure, and compared them with Pearson's correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson's correlation, Spearman's rank correlation, and Blomqvist's coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist's coefficient was not confirmed by visual methods. Some results of the correlation coefficients are not the same with visualization. The reason may be due to the small number of data.
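The two construction steps described in this abstract (score every gene pair, then keep pairs above a threshold) can be sketched as follows. This is a pure-Python illustration, not the authors' R/Bioconductor code. The gene names and expression values are hypothetical, thresholding on the absolute score is an assumption, and the Blomqvist estimator shown ignores points lying exactly on a median, which is only one of several conventions.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def blomqvist(x, y):
    """Blomqvist's beta: sign concordance of deviations from the medians."""
    mx, my = median(x), median(y)
    products = [(a - mx) * (b - my) for a, b in zip(x, y)]
    pos = sum(1 for p in products if p > 0)
    neg = sum(1 for p in products if p < 0)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def coexpression_edges(expr, score, threshold):
    """Step 1: score all gene pairs. Step 2: keep pairs with |score| > threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        s = score(expr[g1], expr[g2])
        if abs(s) > threshold:
            edges.append((g1, g2, s))
    return edges


# Hypothetical expression vectors (one value per sample).
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [5, 3, 4, 1, 2],
}
for name, fn in [("pearson", pearson), ("spearman", spearman), ("blomqvist", blomqvist)]:
    print(name, coexpression_edges(expr, fn, 0.9))
```

With so few samples the three coefficients can disagree about which pairs pass the threshold, which is consistent with the abstract's remark that small data sets may explain discrepancies.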
Naderi, Elnaz; Mostafaei, Mehdi; Pourshams, Akram
Background. MicroRNAs are small RNA molecules that regulate the expression of certain genes through interaction with mRNA targets and are mainly involved in human cancer. This study was conducted to construct the network of miRNA-mRNA interactions in pancreatic cancer, the fourth leading cause of cancer death. Methods. 56 miRNAs that were exclusively expressed and 1176 genes that were downregulated or silenced in pancreatic cancer were extracted from previous investigations. MiRNA–mRNA interaction data and related networks were analyzed using the MAGIA tool and Cytoscape 3 software. Functional annotations of candidate genes in pancreatic cancer were identified with the DAVID annotation tool. Results. The network consists of 217 mRNA nodes, 15 miRNA nodes, and 241 edges representing 241 regulations between the 15 miRNAs and the 217 target genes. miR-24 was the most significant miRNA and regulated a series of important genes. ACVR2B, GFRA1, and MTHFR were significant target genes that were downregulated. Conclusion. Although the previously collected data appear to be a treasure trove, no study had simultaneously analyzed miRNA and mRNA interactions. A network of miRNA-mRNA interactions will help to corroborate experimental observations and could be used to refine miRNA target predictions for developing new therapeutic approaches. PMID:24895587
Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskaya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike
Background. The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results. We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID and SGD databases. Conclusion. Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047
Atreya, Ravi V; Sun, Jingchun; Zhao, Zhongming
Drug addiction is a complex and chronic mental disease, which places a large burden on the American healthcare system due to its negative effects on patients and their families. Recently, network pharmacology is emerging as a promising approach to drug discovery by integrating network biology and polypharmacology, allowing for a deeper understanding of molecular mechanisms of drug actions at the systems level. This study seeks to apply this approach for investigation of illicit drugs and their targets in order to elucidate their interaction patterns and potential secondary drugs that can aid future research and clinical care. In this study, we extracted 188 illicit substances and their related information from the DrugBank database. The data process revealed 86 illicit drugs targeting a total of 73 unique human genes, which forms an illicit drug-target network. Compared to the full drug-target network from DrugBank, illicit drugs and their target genes tend to cluster together and form four subnetworks, corresponding to four major medication categories: depressants, stimulants, analgesics, and steroids. External analysis of Anatomical Therapeutic Chemical (ATC) second sublevel classifications confirmed that the illicit drugs have neurological functions or act via mechanisms of stimulants, opioids, and steroids. To further explore other drugs potentially having associations with illicit drugs, we constructed an illicit-extended drug-target network by adding the drugs that have the same target(s) as illicit drugs to the illicit drug-target network. After analyzing the degree and betweenness of the network, we identified hubs and bridge nodes, which might play important roles in the development and treatment of drug addiction. Among them, 49 non-illicit drugs might have potential to be used to treat addiction or have addictive effects, including some results that are supported by previous studies. This study presents the first systematic review of the network
Genes under H-NS control can be (a) regulated by H-NS, or (b) regulated by H-NS and StpA, because backup by StpA is partial. [Figure residue: gene expression level for H-NS-regulated xenogenes versus other genes.] Recall that H-NS silences highly transcribable genes. [Figure residue: gene expression level; labels "unilateral", "epistatic", "other genes".]
Hulsman, Marc; Lelieveldt, Boudewijn P. F.; de Ridder, Jeroen; Reinders, Marcel
The three dimensional conformation of the genome in the cell nucleus influences important biological processes such as gene expression regulation. Recent studies have shown a strong correlation between chromatin interactions and gene co-expression. However, predicting gene co-expression from frequent long-range chromatin interactions remains challenging. We address this by characterizing the topology of the cortical chromatin interaction network using scale-aware topological measures. We demonstrate that based on these characterizations it is possible to accurately predict spatial co-expression between genes in the mouse cortex. Consistent with previous findings, we find that the chromatin interaction profile of a gene-pair is a good predictor of their spatial co-expression. However, the accuracy of the prediction can be substantially improved when chromatin interactions are described using scale-aware topological measures of the multi-resolution chromatin interaction network. We conclude that, for co-expression prediction, it is necessary to take into account different levels of chromatin interactions ranging from direct interaction between genes (i.e. small-scale) to chromatin compartment interactions (i.e. large-scale). PMID:25965262
Kentzoglanakis, Kyriakos; Poole, Matthew
In this paper, we investigate the problem of reverse engineering the topology of gene regulatory networks from temporal gene expression data. We adopt a computational intelligence approach comprising swarm intelligence techniques, namely particle swarm optimization (PSO) and ant colony optimization (ACO). In addition, the recurrent neural network (RNN) formalism is employed for modeling the dynamical behavior of gene regulatory systems. More specifically, ACO is used for searching the discrete space of network architectures and PSO for searching the corresponding continuous space of RNN model parameters. We propose a novel solution construction process in the context of ACO for generating biologically plausible candidate architectures. The objective is to concentrate the search effort into areas of the structure space that contain architectures which are feasible in terms of their topological resemblance to real-world networks. The proposed framework is initially applied to the reconstruction of a small artificial network that has previously been studied in the context of gene network reverse engineering. Subsequently, we consider an artificial data set with added noise for reconstructing a subnetwork of the genetic interaction network of S. cerevisiae (yeast). Finally, the framework is applied to a real-world data set for reverse engineering the SOS response system of the bacterium Escherichia coli. Results demonstrate the relative advantage of utilizing problem-specific knowledge regarding biologically plausible structural properties of gene networks over conducting a problem-agnostic search in the vast space of network architectures.
Many plant pathogens secrete virulence effectors into host cells to target important proteins in the host cellular network. However, the dynamic interactions between effectors and the host cellular network have not been fully understood. Here, an integrative network analysis was conducted by combining the Arabidopsis thaliana protein–protein interaction network, known targets of Pseudomonas syringae and Hyaloperonospora arabidopsidis effectors, and gene expression profiles in the immune response. In particular, we focused on the characteristic network topology of the effector targets and differentially expressed genes (DEGs). We found that effectors tended to manipulate key network positions with higher betweenness centrality. The effector targets, especially those that are common targets of an individual effector, tended to be clustered together in the network. Moreover, the distances between the effector targets and DEGs increased over time during infection. In line with this observation, pathogen-susceptible mutants tended to have more DEGs surrounding the effector targets compared with resistant mutants. Our results suggest a common plant–pathogen interaction pattern at the cellular network level, where pathogens employ a potent local impact mode to interfere with key positions in the host network, and the plant organizes an in-depth defense by sequentially activating genes distal to the effector targets.
Soto-Girón, María Juliana; García-Vallejo, Felipe
One key step of human immunodeficiency virus type 1 (HIV-1) infection is the integration of its viral cDNA. This process is mediated through complex networks of host-virus interactions that alter several normal cell functions of the host. To study the complexity of disturbances in cell gene expression networks caused by HIV-1 integration, we constructed a network of human macrophage genes located close to chromatin regions rich in proviruses. To perform the network analysis, we selected 28 genes previously identified as targets of cDNA integration, and their transcriptional profiles were obtained from GEO Profiles (NCBI). A total of 2770 interactions among the 28 genes located around the HIV-1 proviruses in human macrophages formed a highly dense main network connected to five sub-networks. The overall network was significantly enriched in genes associated with signal transduction, cellular communication and regulatory processes. To simulate the effects of HIV-1 integration in infected macrophages, the five genes with the largest number of interactions in the normal network were turned off by setting the corresponding expression values to zero. The HIV-1 infected network showed changes in its topology and alterations in macrophage functions, reflected in a re-programming of biosynthetic and general metabolic processes. Understanding the complex virus-host interactions that occur during HIV-1 integration may provide valuable genomic information for developing new antiviral treatments focused on the management of specific gene expression networks associated with viral integration. This is the first gene network that describes the human macrophage gene interactions related to HIV-1 integration.
Cicin-Sain, Damjan; Ashyraliyev, Maksat; Jaeger, Johannes
Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to
Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar
A gene regulatory network discloses the regulatory interactions amongst genes, at a particular condition of the human body. The accurate reconstruction of such networks from time-series genetic expression data using computational tools offers a stiff challenge for contemporary computer scientists. This is crucial to facilitate the understanding of the proper functioning of a living organism. Unfortunately, the computational methods produce many false predictions along with the correct predictions, which is unwanted. Investigations in the domain focus on the identification of as many correct regulations as possible in the reverse engineering of gene regulatory networks to make it more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we have proposed a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have implemented the same using a dataset ensemble approach (i.e. combining multiple datasets) also. We have employed the proposed methodology on real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IMRA network of Saccharomyces cerevisiae. Subsequently, we have experimented upon somewhat larger, in silico networks, namely, DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we have used four datasets in each experiment. The obtained results are encouraging enough as the proposed methodology can reduce the number of false predictions significantly, without using any supplementary prior biological information for larger gene regulatory networks. It is also observed that if a small amount of prior biological information is incorporated here, the results improve further w.r.t. the prediction of true positives
Background. Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision making processes. Advancement in high throughput experimental techniques has initiated innovative data driven analysis of gene regulatory networks. However, inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, robust algorithms that directly exploit basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction. Results. We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formalism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subjected to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the trait of sparsity of biological interactions. To that end, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function is targeted to minimize the network connections subject to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data-sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters
Greene Casey S
Background. Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) which confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however, and methods to exhaustively analyze sets of these variants for interactions are combinatorial in nature, thus making them computationally infeasible. Efficient algorithms which can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate for noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects, we introduce Spatially Uniform ReliefF (SURF). Results. SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. It is important to note that this success rate increase does not require an increase in algorithmic complexity and allows for increased success rate, even with the removal of a nuisance parameter from the algorithm. Conclusion. Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be
Kleeberger, Steven R.; Ohtsuka, Yoshinori
Inter-individual variation in human responses to air pollutants suggests that some subpopulations are at increased risk to the detrimental effects of pollutant exposure. Extrinsic factors such as previous exposure and nutritional status may influence individual susceptibility. Intrinsic (host) factors that determine susceptibility include age, gender, and pre-existing disease (e.g., asthma), and it is becoming clear that genetic background also contributes to individual susceptibility. Environmental exposures to particulates and genetic factors associated with disease risk likely interact in a complex fashion that varies from one population and one individual to another. The relationships between genetic background and disease risk and severity are often evaluated through traditional family-based linkage studies and positional cloning techniques. However, case-control studies based on association of disease or disease subphenotypes with candidate genes have advantages over family pedigree studies for complex disease phenotypes. This is based in part on continued development of quantitative analysis and the discovery and availability of simple sequence repeats and single nucleotide polymorphisms. Linkage analyses with genetically standardized animal models also provide a useful tool to identify genetic determinants of responses to environmental pollutants. These approaches have identified significant susceptibility quantitative trait loci on mouse chromosomes 1, 6, 11, and 17. Physical mapping and comparative mapping between human and mouse genomes will yield candidate susceptibility genes that may be tested by association studies in human subjects. Human studies and mouse modeling will provide important insight to understanding genetic factors that contribute to differential susceptibility to air pollutants
Su, Mei-Tsz; Lin, Sheng-Hsiang; Chen, Yi-Chi; Kuo, Pao-Lin
Both vascular endothelial growth factor A (VEGFA) and endocrine gland-derived vascular endothelial growth factor (EG-VEGF) systems play major roles in angiogenesis. A body of evidence suggests VEGFs regulate critical processes during pregnancy and have been associated with recurrent pregnancy loss (RPL). However, little information is available regarding the interaction of these two major angiogenesis-related systems in early human pregnancy. This study was conducted to investigate the association of gene polymorphisms and gene-gene interaction among genes in VEGFA and EG-VEGF systems and idiopathic RPL. A total of 98 women with a history of idiopathic RPL and 142 controls were included, and 5 functional SNPs selected from VEGFA, KDR, EG-VEGF (PROK1), PROKR1 and PROKR2 were genotyped. We used multifactor dimensionality reduction (MDR) analysis to choose the best model and evaluate gene-gene interactions. Ingenuity pathways analysis (IPA) was used to explore possible complex interactions. Two receptor gene polymorphisms [KDR (Q472H) and PROKR2 (V331M)] were significantly associated with idiopathic RPL (P<0.01). The MDR test revealed that the KDR (Q472H) polymorphism was the locus best associated with RPL (P=0.02). IPA revealed that the EG-VEGF and VEGFA systems shared several canonical signaling pathways that may contribute to gene-gene interactions, including the Akt, IL-8, EGFR, MAPK, SRC, VHL, HIF-1A and STAT3 signaling pathways. Two receptor gene polymorphisms [KDR (Q472H) and PROKR2 (V331M)] were significantly associated with idiopathic RPL. EG-VEGF and VEGFA systems shared several canonical signaling pathways that may contribute to gene-gene interactions, including the Akt, IL-8, EGFR, MAPK, SRC, VHL, HIF-1A and STAT3.
Background. The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results. We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion. The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification.
Fronto Temporal Dementia (FTD) is a neurodegenerative disorder characterized by degeneration of the fronto temporal lobes and abnormal protein inclusions. It exhibits a broad clinicopathological spectrum and has been linked to mutations in seven different genes. We will provide a picture, which connects the products of these genes, albeit diverse in nature and function, in a network. Despite the paucity of information available for some of these genes, we believe that RNA processing and post-transcriptional regulation of gene expression might constitute a common theme in the network. Recent studies have unraveled the role of mutations affecting the functions of RNA binding proteins and regulation of microRNAs. This review will combine all the recent findings on genes involved in the pathogenesis of FTD, highlighting the importance of a common network of interactions in order to study and decipher the heterogeneous clinical manifestations associated with FTD. This approach could be helpful for the research of potential therapeutic strategies.
Guthke, Reinhard; Möller, Ulrich; Hoffmann, Martin; Thies, Frank; Töpfer, Susanne
The immune response to bacterial infection represents a complex network of dynamic gene and protein interactions. We present an optimized reverse engineering strategy aimed at reconstructing this kind of interaction network. The proposed approach is based on both microarray data and available biological knowledge. The main kinetics of the immune response were identified by fuzzy clustering of gene expression profiles (time series). The number of clusters was optimized using various evaluation criteria. For each cluster, a representative gene with a high fuzzy membership was chosen in accordance with available physiological knowledge. Then hypothetical network structures were identified by seeking systems of ordinary differential equations whose simulated kinetics could fit the gene expression profiles of the cluster-representative genes. For the construction of hypothetical network structures, singular value decomposition (SVD) based methods and a newly introduced heuristic Network Generation Method were compared. It turned out that the proposed novel method could find sparser networks and gave better fits to the experimental data. Contact: Reinhard.Guthke@hki-jena.de.
Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.
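The searching procedure in this abstract can be sketched as follows. This is a minimal illustration, assuming an unweighted interaction graph and breadth-first search; the abstract does not say whether the authors' network is weighted, and the gene names and function names here are hypothetical. The permutation test and screening rules are not shown.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def candidate_genes(graph, genes_a, genes_b):
    """Collect interior nodes of shortest paths linking two seed gene sets."""
    seeds = set(genes_a) | set(genes_b)
    found = set()
    for a in genes_a:
        for b in genes_b:
            path = shortest_path(graph, a, b)
            if path:
                found.update(n for n in path[1:-1] if n not in seeds)
    return found


# Hypothetical toy network: B1 is a bone seed gene, N1 a nerve seed gene.
graph = {
    "B1": ["X"],
    "X": ["B1", "N1", "Y"],
    "Y": ["X"],
    "N1": ["X"],
}
result = candidate_genes(graph, ["B1"], ["N1"])
```

Only X lies on the shortest path between the two seed genes, so it is the sole candidate; Y is adjacent to the path but is never extracted.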
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
Reconstructing gene networks by experimentally testing every possible interaction between genes is tedious, so automated reverse engineering procedures are increasingly adopted instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks with an evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, cloud computing is a promising solution; the most popular approach is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and that the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high
In biology, networks are used in different contexts to represent relationships between entities, such as interactions between genes, proteins or metabolites. Despite progress in the analysis of such networks and their potential to better understand the collective impact of genes on complex traits, one remaining challenge is to establish the biologic validity of gene co-expression networks and to determine what governs their organization. We used WGCNA to construct and analyze networks from seven gene expression datasets from several tissues of mouse recombinant inbred strains (RIS). For six of the seven networks, we found that linkage to module QTLs (mQTLs) could be established for 29.3% of gene co-expression modules detected in the several mouse RIS. For about 74.6% of such genetically-linked modules, the mQTL was on the same chromosome as the one contributing most genes to the module, with genes originating from that chromosome showing higher connectivity than other genes in the modules. Such modules (that we considered as genetically-driven) had network statistic properties (density, centralization and heterogeneity) that set them apart from other modules in the network. Altogether, a sizeable portion of gene co-expression modules detected in mouse RIS panels had genetic determinants as their main organizing principle. In addition to providing a biologic interpretation and validation for these modules, these genetic determinants imparted on them particular properties that set them apart from other modules in the network, to the point that they can be predicted to a large extent on the basis of their network statistics.
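As background for the WGCNA step, an unsigned co-expression adjacency is commonly formed by raising the absolute Pearson correlation between two expression profiles to a soft-thresholding power. The sketch below illustrates that one step in plain Python; it does not use the WGCNA R package, the power `beta=6` is an illustrative assumption, and the gene names and expression values are hypothetical. Module detection and QTL linkage are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_adjacency(profiles, beta=6):
    """Unsigned soft-threshold adjacency |cor|**beta for every gene pair."""
    genes = list(profiles)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[(g, h)] = abs(pearson(profiles[g], profiles[h])) ** beta
    return adj


# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 2.0],
}
adj = soft_adjacency(profiles)
```

The power transform keeps strongly correlated pairs (geneA and geneB) near 1 while pushing weak correlations (geneA and geneC) toward 0, which is why modules built on such an adjacency emphasize tightly co-expressed genes.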
A mathematical tool for scientists and researchers who work with computer and communication networks, Game Theory in Communication Networks: Cooperative Resolution of Interactive Networking Scenarios addresses the question of how to promote cooperative behavior in interactive situations between heterogeneous entities in communication networking scenarios. It explores network design and management from a theoretical perspective, using game theory and graph theory to analyze strategic situations and demonstrate profitable behaviors of the cooperative entities. The book promotes the use of Game T
Genetic interactions help map biological processes and their functional relationships. A genetic interaction is defined as a deviation from the expected phenotype when combining multiple genetic mutations. In Saccharomyces cerevisiae, most genetic interactions are measured under a single phenotype - growth rate in standard laboratory conditions. Recently genetic interactions have been collected under different phenotypic readouts and experimental conditions. How different are these networks and what can we learn from their differences? We conducted a systematic analysis of quantitative genetic interaction networks in yeast performed under different experimental conditions. We find that networks obtained using different phenotypic readouts, in different conditions and from different laboratories overlap less than expected and provide significant unique information. To exploit this information, we develop a novel method to combine individual genetic interaction data sets and show that the resulting network improves gene function prediction performance, demonstrating that individual networks provide complementary information. Our results support the notion that using diverse phenotypic readouts and experimental conditions will substantially increase the amount of gene function information produced by genetic interaction screens.
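The definition of a genetic interaction as a deviation from the expected phenotype can be made concrete with a small sketch. It assumes the commonly used multiplicative expectation for growth fitness (the double mutant is expected to have the product of the single-mutant fitnesses); the abstract does not state which expectation the authors use, and the function name and fitness values are hypothetical.

```python
def interaction_score(f_a, f_b, f_ab):
    """Deviation of observed double-mutant fitness from the multiplicative expectation.

    f_a, f_b: single-mutant fitness relative to wild type (wild type = 1.0).
    f_ab: observed double-mutant fitness.
    Negative scores indicate aggravating interactions, positive scores alleviating ones.
    """
    expected = f_a * f_b
    return f_ab - expected


# Hypothetical measurements.
no_interaction = interaction_score(0.8, 0.5, 0.4)   # observed matches expectation
aggravating = interaction_score(0.8, 0.5, 0.1)      # double mutant is sicker than expected
alleviating = interaction_score(0.8, 0.5, 0.5)      # double mutant is healthier than expected
```

Because the score depends on which phenotype is measured, the same gene pair can receive different scores under different readouts or conditions, which is the variation the abstract sets out to analyze.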
Background: Molecular networks represent the backbone of molecular activity within cells and provide opportunities for understanding the mechanism of diseases. While protein-protein interaction data constitute static network maps, integration of condition-specific co-expression information provides clues to the dynamic features of these networks. Dilated cardiomyopathy is a leading cause of heart failure. Although previous studies have identified putative biomarkers or therapeutic targets for heart failure, the underlying molecular mechanism of dilated cardiomyopathy remains unclear. Results: We developed a network-based comparative analysis approach that integrates protein-protein interactions with gene expression profiles and biological function annotations to reveal dynamic functional modules under different biological states. We found that hub proteins in condition-specific co-expressed protein interaction networks tended to be differentially expressed between biological states. Applying this method to a cohort of heart failure patients, we identified two functional modules that significantly emerged from the interaction networks. The dynamics of these modules between normal and disease states further suggest a potential molecular model of dilated cardiomyopathy. Conclusions: We propose a novel framework to analyze the interaction networks in different biological states. It successfully reveals network modules closely related to heart failure; more importantly, these network dynamics provide new insights into the cause of dilated cardiomyopathy. The revealed molecular modules might be used as potential drug targets and provide new directions for heart failure therapy.
evolved numerous mechanisms to control gene expression in response to specific environmental signals. In addition to two-component systems, small regulatory RNAs (sRNAs) have emerged as major regulators of gene expression. The majority of sRNAs bind to mRNA and regulate their expression. They often have...... multiple targets and are incorporated into large regulatory networks, and the RNA chaperone Hfq in many cases facilitates interactions between sRNAs and their targets. Some sRNAs also act by binding to protein targets and sequestering their function. In this PhD thesis we investigated the transcriptional....... Detailed insights into the mechanisms through which P. putida responds to different stress conditions and increased understanding of bacterial adaptation in natural and industrial settings were gained. Additionally, we identified genome-wide transcription start sites, and many regulatory RNA elements...
Molecular interaction networks establish all cell biological processes. The networks are under intensive research that is facilitated by new high-throughput measurement techniques for the detection, quantification, and characterization of molecules and their physical interactions. For the common model organism yeast Saccharomyces cerevisiae, public databases store a significant part of the accumulated information and, on the way to better understanding of the cellular processes, there is a need to integrate this information into a consistent reconstruction of the molecular interaction network. This work presents and validates RefRec, the most comprehensive molecular interaction network reconstruction currently available for yeast. The reconstruction integrates protein synthesis pathways, a metabolic network, and a protein-protein interaction network from major biological databases. The core of the reconstruction is based on a reference object approach in which genes, transcripts, and proteins are identified using their primary sequences. This enables their unambiguous identification and non-redundant integration. The obtained total number of different molecular species and their connecting interactions is approximately 67,000. In order to demonstrate the capacity of RefRec for functional predictions, it was used for simulating the gene knockout damage propagation in the molecular interaction network in approximately 590,000 experimentally validated mutant strains. Based on the simulation results, a statistical classifier was subsequently able to correctly predict the viability of most of the strains. The results also showed that the usage of different types of molecular species in the reconstruction is important for accurate phenotype prediction. In general, the findings demonstrate the benefits of global reconstructions of molecular interaction networks. With all the molecular species and their physical interactions explicitly modeled, our
Thickening of tree stems is the result of secondary growth, accomplished by the meristematic activity of the vascular cambium. Secondary growth of the stem entails developmental cascades resulting in the formation of secondary phloem outwards and secondary xylem (i.e., wood) inwards of the stem. Signaling and transcriptional reprogramming by the phytohormone ethylene modifies cambial growth and cell differentiation, but the molecular link between ethylene and secondary growth remains unknown. We addressed this shortcoming by analyzing expression profiles and co-expression networks of ethylene pathway genes using the AspWood transcriptome database which covers all stages of secondary growth in aspen (Populus tremula) stems. ACC synthase expression suggests that the ethylene precursor 1-aminocyclopropane-1-carboxylic acid (ACC) is synthesized during xylem expansion and xylem cell maturation. Ethylene-mediated transcriptional reprogramming occurs during all stages of secondary growth, as deduced from AspWood expression profiles of ethylene-responsive genes. A network centrality analysis of the AspWood dataset identified EIN3D and 11 ERFs as hubs. No overlap was found between the co-expressed genes of the EIN3 and ERF hubs, suggesting target diversification and hence independent roles for these transcription factor families during normal wood formation. The EIN3D hub was part of a large co-expression gene module, which contained 16 transcription factors, among them several new candidates that have not been earlier connected to wood formation and a VND-INTERACTING 2 (VNI2) homolog. We experimentally demonstrated Populus EIN3D function in ethylene signaling in Arabidopsis thaliana. The ERF hubs ERF118 and ERF119 were connected on the basis of their expression pattern and gene co-expression module composition to xylem cell expansion and secondary cell wall formation, respectively. We hereby establish data resources for ethylene-responsive genes and
Proteins within a molecular network are expected to be subject to different selective pressures depending on their relative hierarchical positions. However, it is not obvious which genes within a network should be more likely to evolve under positive selection. On one hand, only mutations at genes with a relatively high degree of control over adaptive phenotypes (such as those encoding highly connected proteins) are expected to be “seen” by natural selection. On the other hand, a high degree of pleiotropy at these genes is expected to hinder adaptation. Previous analyses of the human protein-protein interaction network have shown that genes under long-term, recurrent positive selection (as inferred from interspecific comparisons) tend to act at the periphery of the network. It is unknown, however, whether these trends apply to other organisms. Here, we show that long-term positive selection has preferentially targeted the periphery of the yeast interactome. Conversely, in flies, genes under positive selection encode significantly more connected and central proteins. These observations are not due to covariation of genes’ adaptability and centrality with confounding factors. Therefore, the distribution of proteins encoded by genes under recurrent positive selection across protein-protein interaction networks varies from one species to another.
Yousefi, Mohammadmahdi R; Dougherty, Edward R
A basic issue for translational genomics is to model gene interaction via gene regulatory networks (GRNs) and thereby provide an informatics environment to study the effects of intervention (say, via drugs) and to derive effective intervention strategies. Taking the view that the phenotype is characterized by the long-run behavior (steady-state distribution) of the network, we desire interventions to optimally move the probability mass from undesirable to desirable states. Heretofore, two external control approaches have been taken to shift the steady-state mass of a GRN: (i) use a user-defined cost function for which desirable shift of the steady-state mass is a by-product and (ii) use heuristics to design a greedy algorithm. Neither approach provides an optimal control policy relative to long-run behavior. We use a linear programming approach to optimally shift the steady-state mass from undesirable to desirable states, i.e. optimization is directly based on the amount of shift and therefore must outperform previously proposed methods. Moreover, the same basic linear programming structure is used for both unconstrained and constrained optimization, where in the latter case, constraints on the optimization limit the amount of mass that may be shifted to 'ambiguous' states, these being states that are not directly undesirable relative to the pathology of interest but which bear some perceived risk. We apply the method to probabilistic Boolean networks, but the theory applies to any Markovian GRN. Supplementary materials, including the simulation results, MATLAB source code and description of suboptimal methods are available at http://gsp.tamu.edu/Publications/supplementary/yousefi13b. Contact: email@example.com. Supplementary data are available at Bioinformatics online.
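The quantity being optimized in this abstract, the steady-state probability mass on undesirable states, can be illustrated with a small sketch. It only computes the steady-state distribution of a fixed Markov chain by power iteration and sums the mass on a chosen state set; it does not implement the authors' linear program or any control policy. The transition matrix, state labels, and function names are hypothetical.

```python
def steady_state(P, tol=1e-12, max_iter=10000):
    """Steady-state distribution of a row-stochastic matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def undesirable_mass(pi, undesirable):
    """Total steady-state probability on the undesirable states."""
    return sum(pi[s] for s in undesirable)


# Hypothetical two-state network: state 0 is desirable, state 1 is undesirable.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
pi = steady_state(P)
mass = undesirable_mass(pi, {1})
```

An intervention in this setting would alter entries of the transition matrix, and its effect would be judged by how much the undesirable mass computed this way decreases.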
In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained from the application of the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. In order to elucidate the functional relevance of the inferred network, we perform a Gene Ontology (GO) analysis for its structural components. Our analysis reveals that the most significant GO terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information of interacting genes in the breast cancer network. We find that chromosome 21 is most coactive with other chromosomes. To our knowledge this is the first study investigating the genome-scale breast cancer network.
Nussinov, Ruth; Panchenko, Anna R.; Przytycka, Teresa
networks have been identified, including scale free distribution of the vertex degree, network motifs, and modularity, to name a few. These studies of network organization require the network to be as complete as possible, which given the limitations of experimental techniques is not currently the case. Therefore, experimental procedures for detecting biomolecular interactions should be complemented by computational approaches. The paper by Lees et al provides a review of computational methods, integrating multiple independent sources of data to infer physical and functional protein-protein interaction networks. One of the important aspects of protein interactions that should be accounted for in the prediction of protein interaction networks is that many proteins are composed of distinct domains. Protein domains may mediate protein interactions while proteins and their interaction networks may gain complexity through gene duplication and expansion of existing domain architectures via domain rearrangements. The latter mechanisms have been explored in detail in the paper by Cohen-Gihon et al. Protein-protein interactions are not the only component of the cell's interactome. Regulation of cell activity can be achieved at the level of transcription and involve a transcription factor—DNA binding which typically requires recognition of a specific DNA sequence motif. Chip-Chip and the more recent Chip-Seq technologies allow in vivo identification of DNA binding sites and, together with novel in vitro approaches, provide data necessary for deciphering the corresponding binding motifs. Such information, complemented by structures of protein-DNA complexes and knowledge of the differences in binding sites among homologs, opens the door to constructing predictive binding models. The paper by Persikov and Singh provides an example of such a model in the Cys2His2 zinc finger family. 
Recent studies have indicated that the presence of such binding motifs is, however, neither necessary
Zhou, Xiaobo; Qiu, Weiliang; Sathirapongsasuti, J. Fah.; Cho, Michael H.; Mancini, John D.; Lao, Taotao; Thibault, Derek M.; Litonjua, Gus; Bakke, Per S.; Gulsvik, Amund; Lomas, David A.; Beaty, Terri H.; Hersh, Craig P.; Anderson, Christopher; Geigenmuller, Ute; Raby, Benjamin A.; Rennard, Stephen I.; Perrella, Mark A.; Choi, Augustine M.K.; Quackenbush, John; Silverman, Edwin K.
Hedgehog Interacting Protein (HHIP) was implicated in chronic obstructive pulmonary disease (COPD) by genome-wide association studies (GWAS). However, it remains unclear how HHIP contributes to COPD pathogenesis. To identify genes regulated by HHIP, we performed gene expression microarray analysis in a human bronchial epithelial cell line (Beas-2B) stably infected with HHIP shRNAs. HHIP silencing led to differential expression of 296 genes; enrichment for variants nominally associated with COPD was found. Eighteen of the differentially expressed genes were validated by real-time PCR in Beas-2B cells. Seven of 11 validated genes tested in human COPD and control lung tissues demonstrated significant gene expression differences. Functional annotation indicated enrichment for extracellular matrix and cell growth genes. Network modeling demonstrated that the extracellular matrix and cell proliferation genes influenced by HHIP tended to be interconnected. Thus, we identified potential HHIP targets in human bronchial epithelial cells that may contribute to COPD pathogenesis. PMID:23459001
The network-based approach has been used to describe the relationship among genes and various phenotypes, producing a network describing complex biological relationships. Such networks can be constructed by aggregating previously reported associations in the literature from various databases. In this work, we applied the network-based approach to investigate how different brain areas are associated with genetic disorders and genes. In particular, a tripartite network with genes, genetic diseases, and brain areas was constructed based on the associations among them reported in the literature through text mining. In the resulting network, a disproportionately large number of gene-disease and disease-brain associations were attributed to a small subset of genes, diseases, and brain areas. Furthermore, a small number of brain areas were found to be associated with a large number of the same genes and diseases. These core brain regions encompassed the areas identified by previous genome-wide association studies, and suggest potential areas of focus in future imaging genetics research. The approach outlined in this work demonstrates the utility of the network-based approach in studying genetic effects on the brain.
David J. Burks
The Auxin Response Factor (ARF) family of transcription factors is an important regulator of environmental response and symbiotic nodulation in the legume Medicago truncatula. While previous studies have identified members of this family, a recent spurt in gene expression data coupled with genome update and reannotation calls for a reassessment of the prevalence of ARF genes and their interaction networks in M. truncatula. We performed a comprehensive analysis of the M. truncatula genome and transcriptome that entailed a search for novel ARF genes and their co-expression networks. Our investigation revealed 8 novel M. truncatula ARF (MtARF) genes, of the total 22 identified, and uncovered novel gene co-expression networks as well. Furthermore, the topological clustering and single enrichment analysis of several network models revealed the roles of individual members of the MtARF family in nitrogen regulation, nodule initiation, and post-embryonic development through a specialized protein packaging and secretory pathway. In summary, this study not only sheds new light on an important gene family, but also provides a guideline for the identification of new members of gene families and their functional characterization through network analyses.
Pandey, Gaurav; Arora, Sonali; Manocha, Sahil; Whalen, Sean
Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in these networks, they face important data quality challenges of noise and incompleteness that adversely affect the results obtained from their analysis. Here, we apply a robust measure of local network structure called common neighborhood similarity (CNS) to address these challenges. Although several CNS measures have been proposed in the literature, an understanding of their relative efficacies for the analysis of interaction networks has been lacking. We follow the framework of graph transformation to convert the given interaction network into a transformed network corresponding to each of a variety of CNS measures evaluated. The effectiveness of each measure is then estimated by comparing the quality of protein function predictions obtained from its corresponding transformed network with those from the original network. Using a large set of human and fly protein interactions, and a set of over GO terms for both, we find that several of the transformed networks produce more accurate predictions than those obtained from the original network. In particular, the measure and other continuous CNS measures perform well on this task, especially for large networks. Further investigation reveals that the two major factors contributing to this improvement are the abilities of CNS measures to prune out noisy edges and enhance functional coherence in the transformed networks. PMID:25275489
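The graph transformation described in this abstract can be sketched with one simple common neighborhood similarity measure. The example assumes the Jaccard index over neighbor sets, which is one CNS measure among several and not necessarily the one the authors found best; the threshold, protein names, and function names are hypothetical.

```python
from itertools import combinations


def neighbors(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_cns(adj, u, v):
    """Fraction of the combined neighborhoods of u and v that is shared."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def transform(edges, threshold=0.0):
    """Return a new weighted network linking node pairs whose CNS exceeds the threshold."""
    adj = neighbors(edges)
    out = {}
    for u, v in combinations(sorted(adj), 2):
        score = jaccard_cns(adj, u, v)
        if score > threshold:
            out[(u, v)] = score
    return out


# Hypothetical interaction list.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
transformed = transform(edges)
```

In this toy network the original A-E edge is dropped because A and E share no neighbors, while A and B gain an edge despite never interacting directly. This shows on a small scale how such a transformation can both prune edges and add links between nodes with similar neighborhoods.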
Banky, Daniel; Ordog, Rafael; Grolmusz, Vince
Large quantities of reliable protein interaction data are available for model organisms in public depositories (e.g., MINT, DIP, HPRD, INTERACT). Most data correspond to experiments with the proteins of Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, Caenorhabditis elegans, Escherichia coli and Mus musculus. For other important organisms, data availability is poor or non-existent. Here we present NASCENT, a completely automatic web-based tool, also available as a downloadable Java program, capable of modeling and generating protein interaction networks even for non-model organisms. The tool performs protein interaction network modeling through gene-name mapping, and outputs the resulting network in graphical form and also in computer-readable graph formats that can be used directly by popular network modeling software. http://nascent.pitgroup.org.
Antoci, Angelo; Sabatini, Fabio
There is growing evidence that face-to-face interaction is declining in many countries, exacerbating the phenomenon of social isolation. On the other hand, social interaction through online networking sites is steeply rising. To analyze these societal dynamics, we have built an evolutionary game model in which agents can choose between three strategies of social participation: 1) interaction via both online social networks and face-to-face encounters; 2) interaction by exclusive means of face...
Takaku, Tomoiku; Ohyashiki, Junko H.; Zhang, Yu; Ohyashiki, Kazuma
The immune response to viral infection involves a complex network of dynamic gene and protein interactions. We present here the dynamic gene network of the host immune response during human herpesvirus type 6 (HHV-6) infection in an adult T-cell leukemia cell line. Using a pathway-focused oligonucleotide DNA microarray, we found a possible association between chemokine genes regulating Th1/Th2 balance and genes regulating T-cell proliferation during HHV-6B infection. Gene network analysis using an integrated comprehensive workbench, VoyaGene, revealed that a gene encoding a TEC-family kinase, ITK, might be a putative modulator in the host immune response against HHV-6B infection. We conclude that a Th2-dominated inflammatory reaction in host cells may play an important role in HHV-6B-infected T cells, thereby suggesting the possibility that ITK might be a therapeutic target in diseases related to dysregulation of Th1/Th2 balance. This study describes a novel approach to finding genes related to the complex host-virus interaction using microarray data and a Bayesian statistical framework.
Thibodeau, Asa; Márquez, Eladio J; Luo, Oscar; Ruan, Yijun; Menghi, Francesca; Shin, Dong-Guk; Stitzel, Michael L; Vera-Licona, Paola; Ucar, Duygu
Network analysis is a novel method to understand the complex pathogenesis of inflammation-driven atherosclerosis. Using this approach, we attempted to identify key inflammatory genes and their core transcriptional regulators in coronary artery disease (CAD). Initially, we obtained 124 candidate genes associated with inflammation and CAD using Polysearch and CADgene database for which protein-protein interaction network was generated using STRING 9.0 (Search Tool for the Retrieval of Interacting Genes) and visualized using Cytoscape v 2.8.3. Based on betweenness centrality (BC) and node degree as key topological parameters, we identified interleukin-6 (IL-6), vascular endothelial growth factor A (VEGFA), interleukin-1 beta (IL-1B), tumor necrosis factor (TNF) and prostaglandin-endoperoxide synthase 2 (PTGS2) as hub nodes. The backbone network constructed with these five hub genes showed 111 nodes connected via 348 edges, with IL-6 having the largest degree and highest BC. Nuclear factor kappa B1 (NFKB1), signal transducer and activator of transcription 3 (STAT3) and JUN were identified as the three core transcription factors from the regulatory network derived using MatInspector. For the purpose of validation of the hub genes, 97 test networks were constructed, which revealed the accuracy of the backbone network to be 0.7763 while the frequency of the hub nodes remained largely unaltered. Pathway enrichment analysis with ClueGO, KEGG and REACTOME showed significant enrichment of six validated CAD pathways - smooth muscle cell proliferation, acute-phase response, calcidiol 1-monooxygenase activity, toll-like receptor signaling, NOD-like receptor signaling and adipocytokine signaling pathways. Experimental verification of the above findings in 64 cases and 64 controls showed increased expression of the five candidate genes and the three transcription factors in the cases relative to the controls (p<0.05). Thus, analysis of complex networks aids in the
Background: Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results: Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. Like the previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto those networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP-based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known
Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.
Taylor, Ronald C.; Singhal, Mudita; Daly, Don S.; Gilmore, Jason M.; Cannon, William R.; Domico, Kelly O.; White, Amanda M.; Auberry, Deanna L.; Auberry, Kenneth J.; Hooker, Brian S.; Hurst, G. B.; McDermott, Jason E.; McDonald, W. H.; Pelletier, Dale A.; Schmoyer, Denise A.; Wiley, H. S.
An analysis pipeline has been created for deployment of a novel algorithm, the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro), for use in the reconstruction of protein-protein interaction networks. We have combined the Software Environment for BIological Network Inference (SEBINI), an interactive environment for the deployment and testing of network inference algorithms that use high-throughput data, and the Collective Analysis of Biological Interaction Networks (CABIN), software that allows integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources, to allow interactions computed by BEPro to be stored, visualized, and further analyzed. Incorporating BEPro into SEBINI and automatically feeding the resulting inferred network into CABIN, we have created a structured workflow for protein-protein network inference and supplemental analysis from sets of mass spectrometry bait-prey experiment data. SEBINI demo site: https://www.emsl.pnl.gov /SEBINI/ Contact: firstname.lastname@example.org. BEPro is available at http://www.pnl.gov/statistics/BEPro3/index.htm. Contact: email@example.com. CABIN is available at http://www.sysbio.org/dataresources/cabin.stm. Contact: firstname.lastname@example.org.
Davin, Nicolas; Edger, Patrick P; Hefer, Charles A; Mizrachi, Eshchar; Schuetz, Mathias; Smets, Erik; Myburg, Alexander A; Douglas, Carl J; Schranz, Michael E; Lens, Frederic
Many plant genes are known to be involved in the development of cambium and wood, but how the expression and functional interaction of these genes determine the unique biology of wood remains largely unknown. We used the soc1ful loss of function mutant - the woodiest genotype known in the otherwise herbaceous model plant Arabidopsis - to investigate the expression and interactions of genes involved in secondary growth (wood formation). Detailed anatomical observations of the stem in combination with mRNA sequencing were used to assess transcriptome remodeling during xylogenesis in wild-type and woody soc1ful plants. To interpret the transcriptome changes, we constructed functional gene association networks of differentially expressed genes using the STRING database. This analysis revealed functionally enriched gene association hubs that are differentially expressed in herbaceous and woody tissues. In particular, we observed the differential expression of genes related to mechanical stress and jasmonate biosynthesis/signaling during wood formation in soc1ful plants that may be an effect of greater tension within woody tissues. Our results suggest that habit shifts from herbaceous to woody life forms observed in many angiosperm lineages could have evolved convergently by genetic changes that modulate the gene expression and interaction network, and thereby redeploy the conserved wood developmental program. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
Chekaleva, Nadezhda V.; Makarova, Natalia S.; Drobotenko, Yulia B.
The study presented in the article is devoted to the analysis of theory and practice of network interaction within the framework of education clusters. Education clusters are considered to be a novel form of network interaction in pedagogical education in Russia. The aim of the article is to show the advantages and disadvantages of the cluster…
Ling, Hong; Samarasinghe, Sandhya; Kulasiri, Don
Understanding the control of cellular networks consisting of gene and protein interactions and their emergent properties is a central activity of Systems Biology research. For this, continuous, discrete, hybrid, and stochastic methods have been proposed. Currently, the most common approach to modelling accurate temporal dynamics of networks is ordinary differential equations (ODE). However, critical limitations of ODE models are difficulty in kinetic parameter estimation and numerical solution of a large number of equations, making them more suited to smaller systems. In this article, we introduce a novel recurrent artificial neural network (RNN) that addresses above limitations and produces a continuous model that easily estimates parameters from data, can handle a large number of molecular interactions and quantifies temporal dynamics and emergent systems properties. This RNN is based on a system of ODEs representing molecular interactions in a signalling network. Each neuron represents concentration change of one molecule represented by an ODE. Weights of the RNN correspond to kinetic parameters in the system and can be adjusted incrementally during network training. The method is applied to the p53-Mdm2 oscillation system - a crucial component of the DNA damage response pathways activated by a damage signal. Simulation results indicate that the proposed RNN can successfully represent the behaviour of the p53-Mdm2 oscillation system and solve the parameter estimation problem with high accuracy. Furthermore, we presented a modified form of the RNN that estimates parameters and captures systems dynamics from sparse data collected over relatively large time steps. We also investigate the robustness of the p53-Mdm2 system using the trained RNN under various levels of parameter perturbation to gain a greater understanding of the control of the p53-Mdm2 system. Its outcomes on robustness are consistent with the current biological knowledge of this system. As more
Fung, David C Y; Wilkins, Marc R; Hart, David; Hong, Seok-Hee
The force-directed layout is commonly used in computer-generated visualizations of protein-protein interaction networks. While it is good for providing a visual outline of the protein complexes and their interactions, it has two limitations when used as a visual analysis method. The first is poor reproducibility. Repeated running of the algorithm does not necessarily generate the same layout, therefore, demanding cognitive readaptation on the investigator's part. The second limitation is that it does not explicitly display complementary biological information, e.g. Gene Ontology, other than the protein names or gene symbols. Here, we present an alternative layout called the clustered circular layout. Using the human DNA replication protein-protein interaction network as a case study, we compared the two network layouts for their merits and limitations in supporting visual analysis.
Bettencourt, Conceição; Forabosco, Paola; Wiethoff, Sarah; Heidari, Moones; Johnstone, Daniel M; Botía, Juan A; Collingwood, Joanna F; Hardy, John; Milward, Elizabeth A; Ryten, Mina; Houlden, Henry
Aberrant brain iron deposition is observed in both common and rare neurodegenerative disorders, including those categorized as Neurodegeneration with Brain Iron Accumulation (NBIA), which are characterized by focal iron accumulation in the basal ganglia. Two NBIA genes are directly involved in iron metabolism, but whether other NBIA-related genes also regulate iron homeostasis in the human brain, and whether aberrant iron deposition contributes to neurodegenerative processes remains largely unknown. This study aims to expand our understanding of these iron overload diseases and identify relationships between known NBIA genes and their main interacting partners by using a systems biology approach. We used whole-transcriptome gene expression data from human brain samples originating from 101 neuropathologically normal individuals (10 brain regions) to generate weighted gene co-expression networks and cluster the 10 known NBIA genes in an unsupervised manner. We investigated NBIA-enriched networks for relevant cell types and pathways, and whether they are disrupted by iron loading in NBIA diseased tissue and in an in vivo mouse model. We identified two basal ganglia gene co-expression modules significantly enriched for NBIA genes, which resemble neuronal and oligodendrocytic signatures. These NBIA gene networks are enriched for iron-related genes, and implicate synapse and lipid metabolism related pathways. Our data also indicates that these networks are disrupted by excessive brain iron loading. We identified multiple cell types in the origin of NBIA disorders. We also found unforeseen links between NBIA networks and iron-related processes, and demonstrate convergent pathways connecting NBIAs and phenotypically overlapping diseases. Our results are of further relevance for these diseases by providing candidates for new causative genes and possible points for therapeutic intervention. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Autism spectrum disorder (ASD) is marked by a strong genetic heterogeneity, which is underlined by the low overlap between ASD risk gene lists proposed in different studies. In this context, molecular networks can be used to analyze the results of several genome-wide studies in order to underline those network regions harboring genetic variations associated with ASD, the so-called “disease modules.” In this work, we used a recent network diffusion-based approach to jointly analyze multiple ASD risk gene lists. We defined genome-scale prioritizations of human genes in relation to ASD genes from multiple studies, found significantly connected gene modules associated with ASD and predicted genes functionally related to ASD risk genes. Most of them play a role in synapsis and neuronal development and function; many are related to syndromes that can be in comorbidity with ASD and the remaining are involved in epigenetics, cell cycle, cell adhesion and cancer.
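The idea of diffusing risk-gene evidence over a molecular network can be illustrated with a random walk with restart. This is a minimal sketch only: the abstract does not say which diffusion scheme was used, so the algorithm choice, the restart probability of 0.3, the toy network and the gene labels `G1` to `G5` are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-12, max_iter=10_000):
    """Spread seed-gene evidence over an undirected network.

    adjacency: dict mapping each node to the set of its neighbours.
    seeds: set of nodes that carry the initial evidence (e.g. risk genes).
    Returns a dict of steady-state visiting probabilities that sum to 1.
    """
    nodes = sorted(adjacency)
    # All starting probability mass sits on the seed nodes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        updated = {n: restart * start[n] for n in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                # An isolated node keeps its own mass.
                updated[node] += (1.0 - restart) * current[node]
                continue
            share = (1.0 - restart) * current[node] / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical toy network: a triangle G1-G2-G3 with a tail G3-G4-G5.
toy_network = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2", "G4"},
    "G4": {"G3", "G5"},
    "G5": {"G4"},
}
scores = random_walk_with_restart(toy_network, seeds={"G1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Genes that end up with high scores without being seeds themselves are the ones such a scheme would prioritize as functionally close to the seed list.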
Mustafin, Zakhar Sergeevich; Lashin, Sergey Alexandrovich; Matushkin, Yury Georgievich; Gunbin, Konstantin Vladimirovich; Afonnikov, Dmitry Arkadievich
There are many available software tools for visualization and analysis of biological networks. Among them, Cytoscape ( http://cytoscape.org/ ) is one of the most comprehensive packages, with many plugins and applications which extend its functionality by providing analysis of protein-protein interaction, gene regulatory and gene co-expression networks, metabolic, signaling and neural networks, as well as ecological-type networks including food webs, community networks, etc. Nevertheless, only three plugins tagged 'network evolution' are found in the official Cytoscape app store and in the literature. We have developed a new Cytoscape 3.0 application, Orthoscape, aimed at facilitating evolutionary analysis of gene networks and visualizing the results. Orthoscape aids in analysis of evolutionary information available for gene sets and networks by highlighting: (1) the orthology relationships between genes; (2) the evolutionary origin of gene network components; (3) the evolutionary pressure mode (diversifying or stabilizing, negative or positive selection) of orthologous groups in general and/or branch-oriented mode. The distinctive feature of Orthoscape is the ability to control all data analysis steps via a user-friendly interface. Orthoscape allows its users to analyze gene networks or separated gene sets in the context of evolution. At each step of data analysis, Orthoscape also provides convenient visualization and data manipulation.
Ayhan, Yavuz; Sawa, Akira; Ross, Christopher A; Pletnikov, Mikhail V
The pathogenesis of schizophrenia and related mental illnesses likely involves multiple interactions between susceptibility genes of small effects and environmental factors. Gene-environment interactions occur across different stages of neurodevelopment to produce heterogeneous clinical and pathological manifestations of the disease. The main obstacle for mechanistic studies of gene-environment interplay has been the paucity of appropriate experimental systems for elucidating the molecular pathways that mediate gene-environment interactions relevant to schizophrenia. Recent advances in psychiatric genetics and a plethora of experimental data from animal studies allow us to suggest a new approach to gene-environment interactions in schizophrenia. We propose that animal models based on identified genetic mutations and measurable environment factors will help advance studies of the molecular mechanisms of gene-environment interplay.
Background: Visualization concerns the representation of data visually and is an important task in scientific research. Protein-protein interactions (PPI) are discovered using either wet lab techniques, such as mass spectrometry, or in silico prediction tools, resulting in large collections of interactions stored in specialized databases. The set of all interactions of an organism forms a protein-protein interaction network (PIN) and is an important tool for studying the behaviour of the cell machinery. Since graphic representation of PINs may highlight important substructures, e.g. protein complexes, visualization is more and more used to study the underlying graph structure of PINs. Although graphs are well known data structures, there are different open problems regarding PINs visualization: the high number of nodes and connections, the heterogeneity of nodes (proteins) and edges (interactions), and the possibility to annotate proteins and interactions with biological information extracted from ontologies (e.g. Gene Ontology), which enriches the PINs with semantic information but complicates their visualization. Methods: In these last years many software tools for the visualization of PINs have been developed. Initially thought for visualization only, some of them have been successively enriched with new functions for PPI data management and PIN analysis. The paper analyzes the main software tools for PINs visualization considering four main criteria: (i) technology, i.e. availability/license of the software and supported OS (Operating System) platforms; (ii) interoperability, i.e. ability to import/export networks in various formats, ability to export data in a graphic format, extensibility of the system, e.g. through plug-ins; (iii) visualization, i.e. supported layout and rendering algorithms and availability of parallel implementation; (iv) analysis, i.e. availability of network analysis functions, such as clustering or mining of the graph, and the
Background: The existence of negative correlations between degrees of interacting proteins has been debated since such negative degree correlations were found for the large-scale yeast protein-protein interaction (PPI) network of Ito et al. More recent studies observed no such negative correlations for high-confidence interaction sets. In this article, we analyzed a range of experimentally derived interaction networks to understand the role and prevalence of degree correlations in PPI networks. We investigated how degree correlations influence the structure of networks and their tolerance against perturbations such as the targeted deletion of hubs. Results: For each PPI network, we simulated uncorrelated, positively and negatively correlated reference networks. Here, a simple model was developed which can create different types of degree correlations in a network without changing the degree distribution. Differences in static properties associated with degree correlations were compared by analyzing the network characteristics of the original PPI and reference networks. Dynamics were compared by simulating the effect of a selective deletion of hubs in all networks. Conclusion: Considerable differences between the network types were found for the number of components in the original networks. Negatively correlated networks are fragmented into significantly fewer components than observed for positively correlated networks. On the other hand, the selective deletion of hubs showed an increased structural tolerance to these deletions for the positively correlated networks. This results in a lower rate of interaction loss in these networks compared to the negatively correlated networks and a decreased disintegration rate. Interestingly, real PPI networks are most similar to the randomly correlated references with respect to all properties analyzed. Thus, although structural properties of networks can be modified considerably by degree
Sep 28, 2015 ... Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging ... two types of methods differ primarily based on whether ..... negligible, allowing us to draw the qualitative conclusions .... research will be conducted to develop additional biologically.
A genomewide transcriptome assay of two subtropical genotypes of maize was used to observe the expression of genes at seedling stage of drought stress. The number of genes expressed differentially was greater in HKI1532 (a drought tolerant genotype) than in PC3 (a drought sensitive genotype), indicating primary differences at the transcriptional level in stress tolerance. The global coexpression networks of the two genotypes differed significantly with respect to the number of modules and the coexpression pattern within the modules. A total of 174 drought-responsive genes were selected from HKI1532, and their coexpression network revealed key correlations between different adaptive pathways, each cluster of the network representing a specific biological function. Transcription factors related to ABA-dependent stomatal closure, signalling, and phosphoprotein cascades work in concert to compensate for reduced photosynthesis. Under stress, water balance was maintained by coexpression of the genes involved in osmotic adjustments and transporter proteins. Metabolism was maintained by the coexpression of genes involved in cell wall modification and protein and lipid metabolism. The interaction of genes involved in crucial biological functions during stress was identified and the results will be useful in targeting important gene interactions to understand drought tolerance in greater detail.
Vanderweele, Tyler J; Ko, Yi-An; Mukherjee, Bhramar
We show that, in the presence of uncontrolled environmental confounding, joint tests for the presence of a main genetic effect and gene-environment interaction will be biased if the genetic and environmental factors are correlated, even if there is no effect of either the genetic factor or the environmental factor on the disease. When environmental confounding is ignored, such tests will in fact reject the joint null of no genetic effect with a probability that tends to 1 as the sample size increases. This problem with the joint test vanishes under gene-environment independence, but it still persists if estimating the gene-environment interaction parameter itself is of interest. Uncontrolled environmental confounding will bias estimates of gene-environment interaction parameters even under gene-environment independence, but it will not do so if the unmeasured confounding variable itself does not interact with the genetic factor. Under gene-environment independence, if the interaction parameter without controlling for the environmental confounder is nonzero, then there is gene-environment interaction either between the genetic factor and the environmental factor of interest or between the genetic factor and the unmeasured environmental confounder. We evaluate several recently proposed joint tests in a simulation study and discuss the implications of these results for the conduct of gene-environment interaction studies.
Ellwanger, Daniel Christian; Leonhardt, Jörn Florian; Mewes, Hans-Werner
Understanding how regulatory networks globally coordinate the response of a cell to changing conditions, such as perturbations by shifting environments, is an elementary challenge in systems biology which has yet to be met. Genome-wide gene expression measurements are high dimensional as these are reflecting the condition-specific interplay of thousands of cellular components. The integration of prior biological knowledge into the modeling process of systems-wide gene regulation enables the large-scale interpretation of gene expression signals in the context of known regulatory relations. We developed COGERE (http://mips.helmholtz-muenchen.de/cogere), a method for the inference of condition-specific gene regulatory networks in human and mouse. We integrated existing knowledge of regulatory interactions from multiple sources to a comprehensive model of prior information. COGERE infers condition-specific regulation by evaluating the mutual dependency between regulator (transcription factor or miRNA) and target gene expression using prior information. This dependency is scored by the non-parametric, nonlinear correlation coefficient η(2) (eta squared) that is derived by a two-way analysis of variance. We show that COGERE significantly outperforms alternative methods in predicting condition-specific gene regulatory networks on simulated data sets. Furthermore, by inferring the cancer-specific gene regulatory network from the NCI-60 expression study, we demonstrate the utility of COGERE to promote hypothesis-driven clinical research. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
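The η² dependency score can be illustrated with the basic correlation ratio: the share of the variance in target expression that is explained by grouping samples according to regulator level. The sketch below is a simplified one-way version, not the two-way analysis of variance that COGERE is described as using; the function name, the grouping into low, medium and high regulator levels, and the expression values are hypothetical.

```python
def eta_squared(groups):
    """Correlation ratio: between-group sum of squares / total sum of squares.

    groups: list of lists, each holding the target-gene expression values
    observed in samples that share one regulator level.
    Returns a value in [0, 1]; 0 means the grouping explains nothing.
    """
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    if ss_total == 0:
        return 0.0  # constant target expression carries no dependency signal
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Hypothetical target expression, binned by low / medium / high regulator level.
dependent = eta_squared([[1.0, 1.2, 0.8], [2.0, 2.1, 1.9], [3.0, 3.2, 2.8]])
# Same spread of values in every bin: the regulator level explains nothing.
independent = eta_squared([[1.0, 3.0], [1.0, 3.0]])
print(round(dependent, 3), independent)
```

Because the score only compares group means, it also picks up non-monotonic dependencies (for example a target that is high at both low and high regulator levels), which is what distinguishes it from a linear correlation coefficient.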
Goh, K.-I.; Kahng, B.; Kim, D.
Understanding of how protein interaction networks of living organisms have evolved or are organized can be the first stepping stone in unveiling how life works on a fundamental ground. Here we introduce an in silico ``coevolutionary'' model for the protein interaction network and the protein family network. The essential ingredient of the model includes the protein family identity and its robustness under evolution, as well as the three previously proposed: gene duplication, divergence, and mutation. This model produces a prototypical feature of complex networks in a wide range of parameter space, following the generalized Pareto distribution in connectivity. Moreover, we investigate other structural properties of our model in detail with some specific values of parameters relevant to the yeast Saccharomyces cerevisiae, showing excellent agreement with the empirical data. Our model indicates that the physical constraints encoded via the domain structure of proteins play a crucial role in protein interactions.
Censi, Federica; Calcagnini, Giovanni; Mattei, Eugenio; Giuliani, Alessandro
Phenotypic changes at different organization levels from cell to entire organism are associated with changes in the pattern of gene expression. These changes involve the entire genome expression pattern and heavily rely upon correlation patterns among genes. The classical approach used to analyze gene expression data builds upon the application of supervised statistical techniques to detect genes differentially expressed among two or more phenotypes (e.g., normal vs. disease). The use of an a posteriori, unsupervised approach based on principal component analysis (PCA) and the subsequent construction of gene correlation networks can shed light on unexpected behaviour of the gene regulation system while maintaining a more naturalistic view of the studied system. In this chapter we applied an unsupervised method to discriminate DMD patients and controls. The genes having the highest absolute scores in the discrimination between the groups were then analyzed in terms of gene expression networks, on the basis of their mutual correlation in the two groups. The correlation network structures suggest two different modes of gene regulation in the two groups, reminiscent of important aspects of DMD pathogenesis.
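The correlation-network step can be sketched as follows: compute the Pearson correlation between every pair of gene expression profiles within one group of samples and keep an edge wherever the absolute correlation passes a cutoff. This covers only the network construction, not the preceding PCA-based gene selection; the cutoff of 0.8, the gene names and the expression values are assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile cannot be correlated with anything
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.8):
    """Return {(gene_a, gene_b): r} for all pairs with |r| >= threshold."""
    edges = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges[(gene_a, gene_b)] = r
    return edges


# Hypothetical expression profiles of four genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = correlation_network(expression)
for pair, r in edges.items():
    print(pair, round(r, 3))
```

Building one such network per group (for example patients and controls) and comparing their edge sets is one simple way to look for group-specific modes of co-regulation.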
Chuang, Yu-Hsuan; Lill, Christina M; Lee, Pei-Chen
BACKGROUND AND PURPOSE: Drinking caffeinated coffee has been reported to provide protection against Parkinson's disease (PD). Caffeine is an adenosine A2A receptor (encoded by the gene ADORA2A) antagonist that increases dopaminergic neurotransmission and Cytochrome P450 1A2 (gene: CYP1A2...
Pájaro, Manuel; Otero-Muras, Irene; Vázquez, Carlos; Alonso, Antonio A
Gene regulation is inherently stochastic. In many applications concerning Systems and Synthetic Biology such as the reverse engineering and the de novo design of genetic circuits, stochastic effects (yet potentially crucial) are often neglected due to the high computational cost of stochastic simulations. With advances in these fields there is an increasing need of tools providing accurate approximations of the stochastic dynamics of gene regulatory networks (GRNs) with reduced computational effort. This work presents SELANSI (SEmi-LAgrangian SImulation of GRNs), a software toolbox for the simulation of stochastic multidimensional gene regulatory networks. SELANSI exploits intrinsic structural properties of gene regulatory networks to accurately approximate the corresponding Chemical Master Equation with a partial integral differential equation that is solved by a semi-lagrangian method with high efficiency. Networks under consideration might involve multiple genes with self and cross regulations, in which genes can be regulated by different transcription factors. Moreover, the validity of the method is not restricted to a particular type of kinetics. The tool offers total flexibility regarding network topology, kinetics and parameterization, as well as simulation options. SELANSI runs under the MATLAB environment, and is available under GPLv3 license at https://sites.google.com/view/selansi. email@example.com. © The Author(s) 2017. Published by Oxford University Press.
Boutwell, Brian B; Menard, Scott; Barnes, J C; Beaver, Kevin M; Armstrong, Todd A; Boisvert, Danielle
A host of research has examined the possibility that environmental risk factors might condition the influence of genes on various outcomes. Less research, however, has been aimed at exploring the possibility that genetic factors might interact to impact the emergence of human traits. Even fewer studies exist examining the interaction of genes in the prediction of behavioral outcomes. The current study expands this body of research by testing the interaction between genes involved in neural transmission. Our findings suggest that certain dopamine genes interact to increase the odds of criminogenic outcomes in a national sample of Americans. Copyright © 2014 Elsevier Inc. All rights reserved.
Shannon, Paul; Markiel, Andrew; Ozier, Owen; Baliga, Nitin S; Wang, Jonathan T; Ramage, Daniel; Amin, Nada; Schwikowski, Benno; Ideker, Trey
Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
Cliques (maximal complete subnets) in a protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPIs complement the incomplete data from biological experiments. However, clique-based prediction methods depend only on the topology of the network, and the false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method that combines clique-based prediction with gene ontology (GO) annotations to overcome this shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP), and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.
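One way to combine clique topology with annotation filtering is sketched below. It enumerates maximal cliques with the Bron-Kerbosch algorithm, proposes an interaction between the two unmatched proteins whenever two equally sized cliques differ in exactly one member, and then keeps only proposals whose proteins share an annotation term. The abstract does not give the authors' prediction rule or their GO correcting rules, so both the "differ in one member" heuristic and the shared-term filter are assumptions, as are the protein names and the term labels.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected edges into a node -> neighbour-set dict."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(clique))
            return
        for node in list(candidates):
            expand(clique | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def predict_from_cliques(adjacency, min_size=3):
    """Propose u-v when two same-size cliques differ only in u versus v."""
    cliques = [c for c in maximal_cliques(adjacency) if len(c) >= min_size]
    predicted = set()
    for first, second in combinations(cliques, 2):
        if len(first) == len(second) and len(first & second) == len(first) - 1:
            (u,) = first - second
            (v,) = second - first
            predicted.add(frozenset((u, v)))
    return predicted


def filter_by_annotation(predicted, annotations):
    """Keep proposals whose two proteins share at least one annotation term."""
    kept = set()
    for pair in predicted:
        u, v = tuple(pair)
        if annotations.get(u, set()) & annotations.get(v, set()):
            kept.add(pair)
    return kept


# Hypothetical network: two 4-cliques sharing P1-P3, two 3-cliques sharing P5-P6.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
         ("P4", "P1"), ("P4", "P2"), ("P4", "P3"),
         ("P5", "P1"), ("P5", "P2"), ("P5", "P3"),
         ("P5", "P6"), ("P5", "P7"), ("P6", "P7"),
         ("P5", "P8"), ("P6", "P8")]
# Hypothetical annotation labels standing in for GO terms.
annotations = {"P4": {"term_a"}, "P5": {"term_a", "term_b"},
               "P7": {"term_c"}, "P8": {"term_d"}}

adjacency = build_adjacency(edges)
raw_predictions = predict_from_cliques(adjacency)
filtered_predictions = filter_by_annotation(raw_predictions, annotations)
print(sorted(sorted(pair) for pair in filtered_predictions))
```

In this toy case topology alone proposes two interactions, and the annotation filter discards the one whose proteins have nothing in common functionally.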
Jordan K. Boutilier
The pulmonary myocardium is a muscular coat surrounding the pulmonary and caval veins. Although its definitive physiological function is unknown, it may have a pathological role as the source of ectopic beats initiating atrial fibrillation. How the pulmonary myocardium gains pacemaker function is not clearly defined, although recent evidence indicates that changed transcriptional gene expression networks are at fault. The gene expression profile of this distinct cell type in situ was examined to investigate underlying molecular events that might contribute to atrial fibrillation. Via systems genetics, a whole-lung transcriptome data set from the BXD recombinant inbred mouse resource was analyzed, uncovering a pulmonary cardiomyocyte gene network of 24 transcripts, coordinately regulated by chromosome 1 and 2 loci. Promoter enrichment analysis and interrogation of publicly available ChIP-seq data suggested that transcription of this gene network may be regulated by the concerted activity of NKX2-5, serum response factor, myocyte enhancer factor 2, and also, at a post-transcriptional level, by RNA binding protein motif 20. Gene ontology terms indicate that this gene network overlaps with molecular markers of the stressed heart. Therefore, we propose that perturbed regulation of this gene network might lead to altered calcium handling, myocyte growth, and contractile force contributing to the aberrant electrophysiological properties observed in atrial fibrillation. We reveal novel molecular interactions and pathways representing possible therapeutic targets for atrial fibrillation. In addition, we highlight the utility of recombinant inbred mouse resources in detecting and characterizing gene expression networks of relatively small populations of cells that have a pathological significance.
Bruun, Jesper; Traxler, Adrienne
Centrality in student interaction networks (SINs) can be linked to variables like grades, persistence, and participation. Recent efforts in network science have investigated layered, or multiplex, networks as mathematical objects. These networks can be e......, this study investigates how target entropy [5,1] and PageRank [6,7] are affected when we take time and modes of interaction into account. We present our preliminary models and results and outline our future work in this area....
Zanata, Thais B.; Dalsgaard, Bo; Passos, Fernando C.
, such as plant species richness, asymmetry, latitude, insularity, topography, sampling methods and intensity. Results: Hummingbird–flower networks were more specialized than honeyeater–flower networks. Specifically, hummingbird–flower networks had a lower proportion of realized interactions (lower C), decreased...... in the interaction patterns with their floral resources. Location: Americas, Africa, Asia and Oceania/Australia. Methods: We compiled interaction networks between birds and floral resources for 79 hummingbird, nine sunbird and 33 honeyeater communities. Interaction specialization was quantified through connectance...... (C), complementary specialization (H2′), binary (QB) and weighted modularity (Q), with both observed and null-model corrected values. We compared interaction specialization among the three types of bird–flower communities, both independently and while controlling for potential confounding variables...
Full Text Available Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that the interactions from the same kingdom give better results than from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at http://www.stats.ox.ac.uk/bioinfo/resources/PredictingInteractions.
Mallik, Mrinmay Kumar
Biological networks can be analyzed using "Centrality Analysis" to identify the more influential nodes and interactions in the network. This study was undertaken to create and visualize a biological network comprising protein-protein interactions (PPIs) amongst proteins which are preferentially over-expressed in the glioma cancer stem cell component (GCSC) of glioblastomas as compared to the glioma non-stem cancer cell (GNSC) component, and then to analyze this network through centrality analyses (CA) in order to identify the essential proteins in this network and their interactions. In addition, this study proposes a new centrality analysis method pertaining exclusively to transcription factors (TFs) and interactions amongst them. Moreover, the relevant molecular functions, biological processes and biochemical pathways amongst these proteins were sought through enrichment analysis. A protein interaction network was created using a list of proteins which have been shown to be preferentially expressed or over-expressed in GCSCs isolated from glioblastomas as compared to the GNSCs. This list of 38 proteins, created using manual literature mining, was submitted to the Reactome FIViz tool, a web-based application integrated into Cytoscape (an open source software platform for visualizing and analyzing molecular interaction networks and biological pathways), to produce the network. This network was subjected to centrality analyses utilizing ranked lists of six centrality measures using the FIViz application and (for the first time) a dedicated centrality analysis plug-in, CytoNCA. The interactions exclusively amongst the transcription factors were analyzed through a newly proposed centrality analysis method called "Gene Expression Associated Degree Centrality Analysis (GEADCA)". Enrichment analysis was performed using the "network function analysis" tool on Reactome. The CA was able to identify a small set of proteins with consistently high centrality ranks that
Linksvayer, Timothy A; Fewell, Jennifer H; Gadau, Jürgen; Laubichler, Manfred D
The evolution and development of complex phenotypes in social insect colonies, such as queen-worker dimorphism or division of labor, can, in our opinion, only be fully understood within an expanded mechanistic framework of Developmental Evolution. Conversely, social insects offer a fertile research area in which fundamental questions of Developmental Evolution can be addressed empirically. We review the concept of gene regulatory networks (GRNs) that aims to fully describe the battery of interacting genomic modules that are differentially expressed during the development of individual organisms. We discuss how distinct types of network models have been used to study different levels of biological organization in social insects, from GRNs to social networks. We propose that these hierarchical networks spanning different organizational levels from genes to societies should be integrated and incorporated into full GRN models to elucidate the evolutionary and developmental mechanisms underlying social insect phenotypes. Finally, we discuss prospects and approaches to achieve such an integration. © 2012 WILEY PERIODICALS, INC.
Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Li, Yanpeng; Xu, Bo
Protein complexes are important for unraveling the secrets of cellular organization and function. Many computational approaches have been developed to predict protein complexes in protein-protein interaction (PPI) networks. However, most existing approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology (GO) annotation information. In this paper, we constructed ontology attributed PPI networks with PPI data and GO resource. After constructing ontology attributed networks, we proposed a novel approach called CSO (clustering based on network structure and ontology attribute similarity). Structural information and GO attribute information are complementary in ontology attributed networks. CSO can effectively take advantage of the correlation between frequent GO annotation sets and the dense subgraph for protein complex prediction. Our proposed CSO approach was applied to four different yeast PPI data sets and predicted many well-known protein complexes. The experimental results showed that CSO was valuable in predicting protein complexes and achieved state-of-the-art performance.
Quevedo-Tumailli, Viviana F; Ortega-Tenezaca, Bernabé; González-Díaz, Humbert
The spatial distribution of genes in chromosomes seems not to be random. For instance, only 10% of genes are transcribed from bidirectional promoters in humans, and many more are organized into larger clusters. This raises intriguing questions previously asked by different authors. We would like to add a few more questions in this context, related to gene orientation inversions. Does gene orientation (inversion) follow a random pattern? Is it relevant to biological activity somehow? We define a new kind of network, which we call the gene orientation inversion network (GOIN). The GOIN is a complex network that encodes short- and long-range patterns of inversion of the orientation of pairs of genes in the chromosome. We selected Plasmodium falciparum as a case study due to the high relevance of this parasite to public health (causal agent of malaria). We constructed here for the first time all of the GOINs for the genome of this parasite. These networks have an average of 383 nodes (genes in one chromosome) and 1314 links (pairs of genes with inverse orientation). We calculated node centralities and other parameters of these networks. These numerical parameters were used to study different properties of gene inversion patterns, for example, distribution, local communities, similarity to Erdős-Rényi random networks, randomness, and so on. We find clues that seem to indicate that gene orientation inversion does not follow a random pattern. We noted that some gene communities in the GOINs tend to group genes encoding RIFIN-related proteins in the proteome of the parasite. RIFIN-like proteins are a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Consequently, we used these centralities as input of machine learning (ML) models to predict the RIFIN-like activity of 5365 proteins in the proteome of Plasmodium sp. The best linear ML model found discriminates RIFIN-like from other proteins with sensitivity and
Full Text Available Background: Network motifs provided a “conceptual tool” for understanding the functional principles of biological networks, but such motifs have primarily been used to consider static network structures. Static networks, however, cannot be used to reveal time- and region-specific traits of biological systems. To overcome this limitation, we proposed the concept of a “spatiotemporal network motif,” a spatiotemporal sequence of network motifs of sub-networks which are active only at specific time points and body parts. Results: On the basis of this concept, we analyzed the developmental gene regulatory network of the Drosophila melanogaster embryo. We identified spatiotemporal network motifs and investigated their distribution pattern in time and space. As a result, we found how key developmental processes are temporally and spatially regulated by the gene network. In particular, we found that nested feedback loops appeared frequently throughout the entire developmental process. From mathematical simulations, we found that mutual inhibition in the nested feedback loops contributes to the formation of spatial expression patterns. Conclusions: Taken together, the proposed concept and the simulations can be used to unravel the design principle of developmental gene regulatory networks.
Ma, Chunhui; Lv, Qi; Teng, Songsong; Yu, Yinxian; Niu, Kerun; Yi, Chengqin
This study aimed to identify rheumatoid arthritis (RA) related genes based on microarray data using the WGCNA (weighted gene co-expression network analysis) method. Two gene expression profile datasets, GSE55235 (10 RA samples and 10 healthy controls) and GSE77298 (16 RA samples and seven healthy controls), were downloaded from the Gene Expression Omnibus database. Characteristic genes were identified using the metaDE package. WGCNA was used to find disease-related networks based on gene expression correlation coefficients, and module significance, defined as the average gene significance of all genes in a module, was used to assess the correlation between the module and RA status. Genes in the disease-related gene co-expression network were subjected to functional annotation and pathway enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery. Characteristic genes were also mapped to the Connectivity Map to screen small molecules. A total of 599 characteristic genes were identified. For each dataset, characteristic genes in the green, red and turquoise modules were most closely associated with RA, with gene numbers of 54, 43 and 79, respectively. These genes were enriched in a total of 17 Gene Ontology terms, mainly related to immune response (CD97, FYB, CXCL1, IKBKE, CCR1, etc.), inflammatory response (CD97, CXCL1, C3AR1, CCR1, LYZ, etc.) and homeostasis (C3AR1, CCR1, PLN, CCL19, PPT1, etc.). Two small-molecule drugs, sanguinarine and papaverine, were predicted to have a therapeutic effect against RA. Genes related to immune response, inflammatory response and homeostasis presumably have critical roles in RA pathogenesis. Sanguinarine and papaverine have a potential therapeutic effect against RA. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.
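The notion of module significance as an average of gene significances can be sketched in a few lines. The sketch assumes, as is common in WGCNA-style analyses but not stated in the abstract, that gene significance is the absolute Pearson correlation between a gene's expression profile and a binary disease status; the function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def module_significance(expression_by_gene, trait):
    """Average gene significance over all genes in one module.

    Gene significance is taken here (an assumption) as |cor(expression, trait)|.
    """
    significances = [abs(pearson(values, trait))
                     for values in expression_by_gene.values()]
    return sum(significances) / len(significances)


if __name__ == "__main__":
    status = [0, 0, 1, 1]  # hypothetical: 0 = control, 1 = RA
    module = {
        "gene_up": [1.0, 2.0, 3.0, 4.0],
        "gene_down": [4.0, 3.0, 2.0, 1.0],
    }
    print(round(module_significance(module, status), 4))
```

Because the absolute value is used, genes that go up and genes that go down with disease status contribute equally to the module's score.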
The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too.
Tuller, Tamir; Atar, Shimshi; Ruppin, Eytan; Gurevich, Michael; Achiron, Anat
Multiple sclerosis (MS) is a central nervous system autoimmune inflammatory T-cell-mediated disease with a relapsing-remitting course in the majority of patients. In this study, we performed a high-resolution systems biology analysis of gene expression and physical interactions in MS relapse and remission. To this end, we integrated 164 large-scale measurements of gene expression in peripheral blood mononuclear cells of MS patients in relapse or remission and healthy subjects, with large-scale information about the physical interactions between these genes obtained from public databases. These data were analyzed with a variety of computational methods. We find that there is a clear and significant global network-level signal that is related to the changes in gene expression of MS patients in comparison to healthy subjects. However, despite the clear differences in the clinical symptoms of MS patients in relapse versus remission, the network-level signal is weaker when comparing patients in these two stages of the disease. This result suggests that most of the genes have relatively similar expression levels in the two stages of the disease. In accordance with previous studies, we found that the pathways related to regulation of cell death, chemotaxis and inflammatory response are differentially expressed in the disease in comparison to healthy subjects, while pathways related to cell adhesion, cell migration and cell-cell signaling are activated in relapse in comparison to remission. In addition, the current study includes a detailed report of the exact set of genes involved in these pathways and the interactions between them. For example, we found that the genes TP53 and IL1 are 'network hubs' that interact with many of the differentially expressed genes in MS patients versus healthy subjects, and the epidermal growth factor receptor is a 'network hub' in the case of MS patients with relapse versus remission. The statistical approaches employed in this study enabled us
Joah R. MADDEN, Johanna F. NIELSEN, Tim H. CLUTTON-BROCK
Full Text Available The underlying kin structure of groups of animals may be glimpsed from patterns of spatial position or temporal association between individuals, and is presumed to facilitate inclusive fitness benefits. Such structure may be evident at a finer, behavioural, scale with individuals preferentially interacting with kin. We tested whether kin structure within groups of meerkats Suricata suricatta matched three forms of social interaction networks: grooming, dominance or foraging competitions. Networks of dominance interactions were positively related to networks of kinship, with close relatives engaging in dominance interactions with each other. This relationship persisted even after excluding the breeding dominant pair and when we restricted the kinship network to only include links between first order kin, which are most likely to be able to discern kin through simple rules of thumb. Conversely, we found no relationship between kinship networks and either grooming networks or networks of foraging competitions. This is surprising because a positive association between kin in a grooming network, or a negative association between kin in a network of foraging competitions offers opportunities for inclusive fitness benefits. Indeed, the positive association between kin in a network of dominance interactions that we did detect does not offer clear inclusive fitness benefits to group members. We conclude that kin structure in behavioural interactions in meerkats may be driven by factors other than indirect fitness benefits, and that networks of cooperative behaviours such as grooming may be driven by direct benefits accruing to individuals perhaps through mutualism or manipulation [Current Zoology 58 (2): 319-328, 2012].
Zheng, Guangyong; Huang, Tao
In the post-genomic era, an important task is to explore the function of individual biological molecules (i.e., genes, noncoding RNAs, proteins, metabolites) and their organization in living cells. To this end, gene regulatory networks (GRNs) are constructed to show the relationships between biological molecules, in which the vertices of the network denote biological molecules and the edges represent connections between nodes (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). By interpreting GRNs, biologists can understand not only the function of biological molecules but also the organization of the components of living cells, since a gene regulatory network is a comprehensive physiological map of living cells and reflects the influence of genetic and epigenetic factors (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). In this paper, we review inference methods for GRN reconstruction and approaches for analyzing network structure. Because the network method is a powerful tool for studying complex diseases and biological processes, we also introduce its applications in pathway analysis and disease gene identification.
Zhu, Chengyu; Guo, Xiaoli; Jin, Zheng; Sun, Junfeng; Qiu, Yihong; Zhu, Yisheng; Tong, Shanbao
To study the effect of brain development and ageing on the pattern of cortical interactive networks. By causality analysis of multichannel electroencephalography (EEG) with partial directed coherence (PDC), we investigated the different neural networks involved in the whole cortex as well as the anterior and posterior areas in three age groups, i.e., children (0-10 years), mid-aged adults (26-38 years) and the elderly (56-80 years). Comparing the cortical interactive networks across age groups led to the following findings: (1) the cortical interactive network in the right hemisphere develops earlier than its left counterpart in the development stage; (2) the cortical interactive network of the anterior cortex, especially at C3 and F3, undergoes far more extensive changes than the posterior area during brain development and ageing; (3) the asymmetry of the cortical interactive networks declines during ageing, with more loss of connectivity in the left frontal and central areas. The age-related variation of cortical interactive networks from resting EEG provides new insights into brain development and ageing. Our findings demonstrated that the PDC analysis of EEG is a powerful approach for characterizing the cortical functional connectivity during brain development and ageing. Copyright © 2010 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Tembine, Hamidou; Tempone, Raul; Vilanova, Pedro
In this paper we establish a relationship between chemical dynamics and mean field game dynamics. We show that chemical reaction networks can be studied using noisy mean field limits. We provide deterministic, noisy and switching mean field limits
Shubhada R Hegde
Full Text Available Response of cells to changing environmental conditions is governed by the dynamics of intricate biomolecular interactions. It may be reasonable to assume, proteins being the dominant macromolecules that carry out routine cellular functions, that understanding the dynamics of protein:protein interactions might yield useful insights into the cellular responses. The large-scale protein interaction data sets are, however, unable to capture the changes in the profile of protein:protein interactions. In order to understand how these interactions change dynamically, we have constructed conditional protein linkages for Escherichia coli by integrating functional linkages and gene expression information. As a case study, we have chosen to analyze UV exposure in wild-type and SOS deficient E. coli at 20 minutes post irradiation. The conditional networks exhibit similar topological properties. Although the global topological properties of the networks are similar, many subtle local changes are observed, which are suggestive of the cellular response to the perturbations. Some such changes correspond to differences in the path lengths among the nodes of carbohydrate metabolism correlating with its loss in efficiency in the UV treated cells. Similarly, expression of hubs under unique conditions reflects the importance of these genes. Various centrality measures applied to the networks indicate increased importance for replication, repair, and other stress proteins for the cells under UV treatment, as anticipated. We thus propose a novel approach for studying an organism at the systems level by integrating genome-wide functional linkages and the gene expression data.
Canard, E F; Mouquet, N; Mouillot, D; Stanko, M; Miklisova, D; Gravel, D
While niche-based processes have been invoked extensively to explain the structure of interaction networks, recent studies propose that neutrality could also be of great importance. Under the neutral hypothesis, network structure would simply emerge from random encounters between individuals and thus would be directly linked to species abundance. We investigated the impact of species abundance distributions on qualitative and quantitative metrics of 113 host-parasite networks. We analyzed the concordance between neutral expectations and empirical observations at interaction, species, and network levels. We found that species abundance accurately predicts network metrics at all levels. Despite host-parasite systems being constrained by physiology and immunology, our results suggest that neutrality could also explain, at least partially, their structure. We hypothesize that trait matching would determine potential interactions between species, while abundance would determine their realization.
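A neutral expectation of this kind can be sketched as a mass-action null model, in which the probability that a given encounter involves a particular host-parasite pair is the product of the two species' relative abundances. This specific form and the names used are assumptions for illustration; the abstract does not give the authors' exact formulation.

```python
def neutral_interaction_probabilities(host_abundance, parasite_abundance):
    """Expected share of encounters for each host-parasite pair.

    Assumes random encounters between individuals, so the probability of a
    pair is the product of the two relative abundances (mass action).
    """
    total_hosts = sum(host_abundance.values())
    total_parasites = sum(parasite_abundance.values())
    return {
        (host, parasite): (n_host / total_hosts) * (n_par / total_parasites)
        for host, n_host in host_abundance.items()
        for parasite, n_par in parasite_abundance.items()
    }


if __name__ == "__main__":
    hosts = {"host_1": 30, "host_2": 10}          # hypothetical counts
    parasites = {"parasite_1": 1, "parasite_2": 3}  # hypothetical counts
    for pair, prob in neutral_interaction_probabilities(hosts, parasites).items():
        print(pair, round(prob, 4))
```

Under this null model the most abundant host and the most abundant parasite are expected to account for the largest share of interactions, which is the pattern that can then be compared with an observed network.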
Jia, Chen; Qian, Hong; Chen, Min; Zhang, Michael Q.
The transient response to a stimulus and subsequent recovery to a steady state are the fundamental characteristics of a living organism. Here we study the relaxation kinetics of autoregulatory gene networks based on the chemical master equation model of single-cell stochastic gene expression with nonlinear feedback regulation. We report a novel relation between the rate of relaxation, characterized by the spectral gap of the Markov model, and the feedback sign of the underlying gene circuit. When a network has no feedback, the relaxation rate is exactly the decaying rate of the protein. We further show that positive feedback always slows down the relaxation kinetics while negative feedback always speeds it up. Numerical simulations demonstrate that this relation provides a possible method to infer the feedback topology of autoregulatory gene networks by using time-series data of gene expression.
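The no-feedback case can be checked numerically with a deliberately simplified model. The sketch below assumes a one-species birth-death process (constant synthesis rate `s`, per-molecule decay rate `d`), which is simpler than the model analysed in the paper; it integrates a truncated chemical master equation with explicit Euler steps and estimates the relaxation rate from the decay of the mean toward its steady state. All names and parameter values are illustrative.

```python
from math import log


def relaxation_rate(s, d, n_states=60, dt=0.001, t1=1.0, t2=3.0):
    """Estimate the relaxation rate of the mean protein number.

    Integrates the master equation of a birth-death process (no feedback)
    truncated to n_states copy numbers, starting from zero molecules, and
    fits the exponential decay of |mean(t) - s/d| between times t1 and t2.
    """
    p = [0.0] * n_states
    p[0] = 1.0  # start with zero protein molecules
    steady_mean = s / d
    step1, step2 = round(t1 / dt), round(t2 / dt)
    deviation = {}
    for step in range(1, step2 + 1):
        new_p = p[:]
        for n in range(n_states):
            # No synthesis out of the last state (truncation boundary).
            birth = s * p[n] if n < n_states - 1 else 0.0
            death = d * n * p[n]
            new_p[n] -= dt * (birth + death)
            if n < n_states - 1:
                new_p[n + 1] += dt * birth
            if n > 0:
                new_p[n - 1] += dt * death
        p = new_p
        if step in (step1, step2):
            mean = sum(n * q for n, q in enumerate(p))
            deviation[step] = abs(mean - steady_mean)
    return log(deviation[step1] / deviation[step2]) / (t2 - t1)


if __name__ == "__main__":
    print(relaxation_rate(s=2.0, d=0.5))
```

For this simplified model the estimate comes out close to the decay rate `d`, in line with the statement that, without feedback, relaxation is governed by protein decay.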
Eagle, Michael; Johnson, Matthew; Barnes, Tiffany
We introduce a novel data structure, the Interaction Network, for representing interaction data from open problem-solving environment tutors. We show how network community detection techniques can be used to identify sub-goals in problems in a logic tutor. We then use those community structures to generate high-level hints between sub-goals.…
Spreng, R Nathan; Schacter, Daniel L
We investigated age-related changes in default, attention, and control network activity and their interactions in young and old adults. Brain activity during autobiographical and visuospatial planning was assessed using multivariate analysis and with intrinsic connectivity networks as regions of interest. In both groups, autobiographical planning engaged the default network while visuospatial planning engaged the attention network, consistent with a competition between the domains of internalized and externalized cognition. The control network was engaged for both planning tasks. In young subjects, the control network coupled with the default network during autobiographical planning and with the attention network during visuospatial planning. In old subjects, default-to-control network coupling was observed during both planning tasks, and old adults failed to deactivate the default network during visuospatial planning. This failure is not indicative of default network dysfunction per se, evidenced by default network engagement during autobiographical planning. Rather, a failure to modulate the default network in old adults is indicative of a lower degree of flexible network interactivity and reduced dynamic range of network modulation to changing task demands.
Lucila G Alvarez-Zuzek
Full Text Available We propose and study a model for the interplay between two different dynamical processes, one for opinion formation and the other for decision making, on two interconnected networks A and B. The opinion dynamics on network A corresponds to that of the M-model, where the state of each agent can take one of four possible values (S = -2, -1, 1, 2), describing its level of agreement on a given issue. The likelihood to become an extremist (S = ±2) or a moderate (S = ±1) is controlled by a reinforcement parameter r ≥ 0. The decision making dynamics on network B is akin to that of the Abrams-Strogatz model, where agents can be either in favor (S = +1) or against (S = -1) the issue. The probability that an agent changes its state is proportional to the fraction of neighbors that hold the opposite state raised to a power β. Starting from a polarized case scenario in which all agents of network A hold positive orientations while all agents of network B have a negative orientation, we explore the conditions under which one of the dynamics prevails over the other, imposing its initial orientation. We find that, for a given value of β, the two-network system reaches a consensus in the positive state (initial state of network A) when the reinforcement overcomes a crossover value r*(β), while a negative consensus happens for r < r*(β) ...... βc. We develop an analytical mean-field approach that gives an insight into these regimes and shows that both dynamics are equivalent along the crossover line (r*, β*).
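The update rule described for network B can be written down directly. The sketch below assumes a proportionality constant of 1 and treats an agent with no neighbours, or with no opposing neighbours, as never flipping; these choices, and the function name, are illustrative assumptions rather than details taken from the paper.

```python
def flip_probability(state, neighbor_states, beta):
    """Probability that an agent on network B switches its state.

    state           : +1 (in favor) or -1 (against)
    neighbor_states : list of neighbour states, each +1 or -1
    beta            : exponent applied to the fraction of opposing neighbours
    """
    if not neighbor_states:
        return 0.0
    opposing = sum(1 for s in neighbor_states if s == -state)
    if opposing == 0:
        return 0.0  # nobody to imitate, so no change (assumed convention)
    return (opposing / len(neighbor_states)) ** beta


if __name__ == "__main__":
    # Half of the neighbours disagree; compare a linear and a nonlinear rule.
    print(flip_probability(+1, [-1, -1, +1, +1], beta=1))
    print(flip_probability(+1, [-1, -1, +1, +1], beta=2))
```

With β above 1, a local minority has a disproportionately small pull on the agent, whereas β below 1 makes even a small opposing fraction relatively influential.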
Berlanga, Adriana; Bitter-Rijpkema, Marlies; Brouns, Francis; Sloep, Peter; Fetter, Sibren
Berlanga, A. J., Bitter-Rijpkema, M., Brouns, F., Sloep, P. B., & Fetter, S. (2011). Personal Profiles: Enhancing Social Interaction in Learning Networks. International Journal of Web Based Communities, 7(1), 66-82.
Richiardi, Jonas; Altmann, Andre; Milazzo, Anna-Clare; Chang, Catie; Chakravarty, M Mallar; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Bromberg, Uli; Büchel, Christian; Conrod, Patricia; Fauth-Bühler, Mira; Flor, Herta; Frouin, Vincent; Gallinat, Jürgen; Garavan, Hugh; Gowland, Penny; Heinz, Andreas; Lemaître, Hervé; Mann, Karl F; Martinot, Jean-Luc; Nees, Frauke; Paus, Tomáš; Pausova, Zdenka; Rietschel, Marcella; Robbins, Trevor W; Smolka, Michael N; Spanagel, Rainer; Ströhle, Andreas; Schumann, Gunter; Hawrylycz, Mike; Poline, Jean-Baptiste; Greicius, Michael D
During rest, brain activity is synchronized between different regions widely distributed throughout the brain, forming functional networks. However, the molecular mechanisms supporting functional connectivity remain undefined. We show that functional brain networks defined with resting-state functional magnetic resonance imaging can be recapitulated by using measures of correlated gene expression in a post mortem brain tissue data set. The set of 136 genes we identify is significantly enriched for ion channels. Polymorphisms in this set of genes significantly affect resting-state functional connectivity in a large sample of healthy adolescents. Expression levels of these genes are also significantly associated with axonal connectivity in the mouse. The results provide convergent, multimodal evidence that resting-state functional networks correlate with the orchestrated activity of dozens of genes linked to ion channel activity and synaptic function. Copyright © 2015, American Association for the Advancement of Science.
Teschendorff, Andrew E.; Banerji, Christopher R. S.; Severini, Simone; Kuehn, Reimer; Sollich, Peter
One of the key characteristics of cancer cells is an increased phenotypic plasticity, driven by underlying genetic and epigenetic perturbations. However, at a systems-level it is unclear how these perturbations give rise to the observed increased plasticity. Elucidating such systems-level principles is key for an improved understanding of cancer. Recently, it has been shown that signaling entropy, an overall measure of signaling pathway promiscuity, and computable from integrating a sample's gene expression profile with a protein interaction network, correlates with phenotypic plasticity and is increased in cancer compared to normal tissue. Here we develop a computational framework for studying the effects of network perturbations on signaling entropy. We demonstrate that the increased signaling entropy of cancer is driven by two factors: (i) the scale-free (or near scale-free) topology of the interaction network, and (ii) a subtle positive correlation between differential gene expression and node connectivity. Indeed, we show that if protein interaction networks were random graphs, described by Poisson degree distributions, cancer would generally not exhibit an increased signaling entropy. In summary, this work exposes a deep connection between cancer, signaling entropy and interaction network topology. PMID:25919796
Fricke, Evan C; Tewksbury, Joshua J; Rogers, Haldre S
Following defaunation, the loss of interactions with mutualists such as pollinators or seed dispersers may be compensated through increased interactions with remaining mutualists, ameliorating the negative cascading impacts on biodiversity. Alternatively, remaining mutualists may respond to altered competition by reducing the breadth or intensity of their interactions, exacerbating negative impacts on biodiversity. Despite the importance of these responses for our understanding of the dynamics of mutualistic networks and their response to global change, the mechanism and magnitude of interaction compensation within real mutualistic networks remains largely unknown. We examined differences in mutualistic interactions between frugivores and fruiting plants in two island ecosystems possessing an intact or disrupted seed dispersal network. We determined how changes in the abundance and behavior of remaining seed dispersers either increased mutualistic interactions (contributing to "interaction compensation") or decreased interactions (causing an "interaction deficit") in the disrupted network. We found a "rich-get-richer" response in the disrupted network, where remaining frugivores favored the plant species with highest interaction frequency, a dynamic that worsened the interaction deficit among plant species with low interaction frequency. Only one of five plant species experienced compensation and the other four had significant interaction deficits, with interaction frequencies 56-95% lower in the disrupted network. These results do not provide support for the strong compensating mechanisms assumed in theoretical network models, suggesting that existing network models underestimate the prevalence of cascading mutualism disruption after defaunation. This work supports a mutualist biodiversity-ecosystem functioning relationship, highlighting the importance of mutualist diversity for sustaining diverse and resilient ecosystems. © 2017 John Wiley & Sons Ltd.
Wolock, Samuel L.; Yates, Andrew; Petrill, Stephen A.; Bohland, Jason W.; Blair, Clancy; Li, Ning; Machiraju, Raghu; Huang, Kun; Bartlett, Christopher W.
Background: Numerous studies have examined gene × environment interactions (G × E) in cognitive and behavioral domains. However, these studies have been limited in that they have not been able to directly assess differential patterns of gene expression in the human brain. Here, we assessed G × E interactions using two publicly available datasets…
Hu, Yan-Shi; Xin, Juncai; Hu, Ying; Zhang, Lei; Wang, Ju
Our understanding of the molecular mechanisms underlying Alzheimer's disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In the past years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease. In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the relation between the pathways was explored by pathway crosstalk analysis. Furthermore, the network features of these AD-related genes were analyzed in the context of human interactome and an AD-specific network was inferred using the Steiner minimal tree algorithm. We compiled 430 human genes reported to be associated with AD from 823 publications. Biological theme analysis indicated that the biological processes and biochemical pathways related to neurodevelopment, metabolism, cell growth and/or survival, and immunology were enriched in these genes. Pathway crosstalk analysis then revealed that the significantly enriched pathways could be grouped into three interlinked modules-neuronal and metabolic module, cell growth/survival and neuroendocrine pathway module, and immune response-related module-indicating an AD-specific immune-endocrine-neuronal regulatory network. Furthermore, an AD-specific protein network was inferred and novel genes potentially associated with AD were identified. By
Nadeem, Amina; Mumtaz, Sadaf; Naveed, Abdul Khaliq; Aslam, Muhammad; Siddiqui, Arif; Lodhi, Ghulam Mustafa; Ahmad, Tausif
Inflammation plays a significant role in the etiology of type 2 diabetes mellitus (T2DM). The rise in the pro-inflammatory cytokines is the essential step in glucotoxicity and lipotoxicity induced mitochondrial injury, oxidative stress and beta cell apoptosis in T2DM. Among the recognized markers are interleukin (IL)-6, IL-1, IL-10, IL-18, tumor necrosis factor-alpha (TNF-α), C-reactive protein, resistin, adiponectin, tissue plasminogen activator, fibrinogen and haptoglobins. Diabetes mellitus has a firm genetic and a very strong environmental influence, exhibiting a polygenic mode of inheritance. Many single nucleotide polymorphisms (SNPs) in various genes, including those of pro- and anti-inflammatory cytokines, have been reported as a risk for T2DM. Not all the SNPs have been confirmed by consistent results across studies, and wide variations have been reported in various ethnic groups. The inter-ethnic variations can be explained by the fact that gene expression may be regulated by gene-gene, gene-environment and gene-nutrient interactions. This review highlights the impact of these interactions on determining the role of single nucleotide polymorphisms of IL-6, TNF-α, resistin and adiponectin in the pathogenesis of T2DM.
Background. Symptoms and signs (symptoms in brief) are the essential clinical manifestations for individualized diagnosis and treatment in traditional Chinese medicine (TCM). To gain insights into the molecular mechanism of symptoms, we develop a computational approach to identify the candidate genes of symptoms. Methods. This paper presents a network-based approach for the integrated analysis of multiple phenotype-genotype data sources and the prioritization of candidate genes for the associated symptoms. The method first calculates the similarities between symptoms and diseases based on the symptom-disease relationships retrieved from the PubMed bibliographic database. Then the disease-gene associations and protein-protein interactions are utilized to construct a phenotype-genotype network. The PRINCE algorithm is finally used to rank the potential genes for the associated symptoms. Results. The proposed method produces a reliable gene rank list with an AUC (area under curve) of 0.616 in classification. Some novel genes such as CALCA, ESR1, and MTHFR were predicted to be associated with headache symptoms; these are not recorded in the benchmark data set but have been reported in recently published literature. Conclusions. Our study demonstrated that integrating phenotype-genotype relationships into a complex network framework provides an effective approach to identify candidate genes of symptoms.
Oskari Kilpeläinen, Tuomas; Franks, Paul W
to an equal bout of physical activity. Individuals with specific genetic profiles are also expected to be more responsive to the beneficial effects of physical activity in the prevention of type 2 diabetes. Identification of such gene-physical activity interactions could give new insights into the biological...... the reader to the recent advances in the genetics of type 2 diabetes, summarize the current evidence on gene-physical activity interactions in relation to type 2 diabetes, and outline how information on gene-physical activity interactions might help improve the prevention and treatment of type 2 diabetes....... Finally, we will discuss the existing and emerging strategies that might enhance our ability to identify and exploit gene-physical activity interactions in the etiology of type 2 diabetes. © 2014 S. Karger AG, Basel....
Pozuelos, Joan P.; Paz-Alonso, Pedro M.; Castillo, Alejandro; Fuentes, Luis J.; Rueda, M. Rosario
In the present study, we investigated developmental trajectories of alerting, orienting, and executive attention networks and their interactions over childhood. Two cross-sectional experiments were conducted with different samples of 6- to 12-year-old children using modified versions of the attention network task (ANT). In Experiment 1 (N = 106),…
Liu, Wei; Li, Li; Ye, Hua; Tu, Wei
High-throughput biological technologies are now widely applied in biology and medicine, allowing scientists to monitor thousands of parameters simultaneously in a specific sample. However, it is still an enormous challenge to mine useful information from high-throughput data. The emergence of network biology provides deeper insights into complex biological systems and reveals the modularity in tissue/cellular networks. Correlation networks are increasingly used in bioinformatics applications. The weighted gene co-expression network analysis (WGCNA) tool can detect clusters of highly correlated genes. Therefore, we systematically reviewed the application of WGCNA in the study of disease diagnosis, pathogenesis and other related fields. First, we introduced the principle, workflow, advantages and disadvantages of WGCNA. Second, we presented the application of WGCNA in disease, physiology, drug, evolution and genome annotation. Then, we indicated the application of WGCNA in newly developed high-throughput methods. We hope this review will help to promote the application of WGCNA in biomedical research.
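The idea of a weighted correlation network can be sketched in a few lines of Python. The abstract does not give formulas; the sketch below relies on the commonly described unsigned WGCNA adjacency, a_ij = |cor(x_i, x_j)|^β, and covers only that first step (no topological overlap, no module detection). The function names, the toy expression values, and the choice β = 6 are illustrative assumptions, and this is not the WGCNA R package itself.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta for every gene pair.

    `expr` maps a gene name to its expression values across samples.
    Raising to the power beta shrinks weak correlations towards zero
    while keeping strong ones close to one.
    """
    genes = list(expr)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency


# Toy data (hypothetical): geneA and geneB are perfectly correlated,
# geneD only moderately so (r = 0.8 with geneA).
expr = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneD": [1, 3, 2, 4],
}
adj = soft_threshold_adjacency(expr, beta=6)
```

With these toy values the A-B weight stays at 1, whereas the A-D weight drops from a correlation of 0.8 to roughly 0.26, which is the contrast-enhancing effect soft thresholding is meant to provide.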
The Internet and other interactive networks are diffusing across the globe at rates that vary from country to country. Typically, economic and market structure variables are used to explain these differences. The addition of culture to these variables will provide a more robust understanding of the differences in Internet and interactive network diffusion. Existing analyses that identify culture as a predictor of diffusion do not adequately specify the dimensions of culture and their imp...
Protein interaction networks have become a tool to study biological processes, either for predicting molecular functions or for designing proper new drugs to regulate the main biological interactions. Furthermore, such networks are known to be organized in sub-networks of proteins contributing to the same cellular function. However, the protein function prediction is not accurate and each protein has traditionally been assigned to only one function by the network formalism. By considering the network of the physical interactions between proteins of the yeast together with a manual and single functional classification scheme, we introduce a method able to reveal important information on protein function, at both micro- and macro-scale. In particular, the inspection of the properties of oscillatory dynamics on top of the protein interaction network leads to the identification of misclassification problems in protein function assignments, as well as to the correct identification of protein functions. We also demonstrate that our approach can give a network representation of the meta-organization of biological processes by unraveling the interactions between different functional classes.
Mark D Schroeder
The segmentation gene network of Drosophila consists of maternal and zygotic factors that generate, by transcriptional (cross-)regulation, expression patterns of increasing complexity along the anterior-posterior axis of the embryo. Using known binding site information for maternal and zygotic gap transcription factors, the computer algorithm Ahab recovers known segmentation control elements (modules) with excellent success and predicts many novel modules within the network and genome-wide. We show that novel module predictions are highly enriched in the network and typically clustered proximal to the promoter, not only upstream, but also in intronic space and downstream. When placed upstream of a reporter gene, they consistently drive patterned blastoderm expression, in most cases faithfully producing one or more pattern elements of the endogenous gene. Moreover, we demonstrate for the entire set of known and newly validated modules that Ahab's prediction of binding sites correlates well with the expression patterns produced by the modules, revealing basic rules governing their composition. Specifically, we show that maternal factors consistently act as activators and that gap factors act as repressors, except for the bimodal factor Hunchback. Our data suggest a simple context-dependent rule for its switch from repressive to activating function. Overall, the composition of modules appears well fitted to the spatiotemporal distribution of their positive and negative input factors. Finally, by comparing Ahab predictions with different categories of transcription factor input, we confirm the global regulatory structure of the segmentation gene network, but find odd skipped behaving like a primary pair-rule gene. The study expands our knowledge of the segmentation gene network by increasing the number of experimentally tested modules by 50%. For the first time, the entire set of validated modules is analyzed for binding site composition under a
Background: Gene and protein interactions are commonly represented as networks, with the genes or proteins comprising the nodes and the relationship between them as edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks. Results: We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter. Conclusion: We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
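The four triplet classes named above (NNN, NNP, NPP, PPP) can be counted with a short Python sketch. It assumes that each quantitative epistatic score has already been reduced to a sign, 'N' or 'P'; the abstract does not say how that thresholding was done. The function name `classify_triplets` and the toy gene names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def classify_triplets(signed_edges):
    """Count fully connected triplets by the signs of their three edges.

    `signed_edges` maps frozenset({gene_a, gene_b}) to 'N' (negative)
    or 'P' (positive). Sorting the three signs yields one of the class
    labels NNN, NNP, NPP or PPP.
    """
    nodes = sorted({gene for edge in signed_edges for gene in edge})
    counts = Counter()
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(pair in signed_edges for pair in pairs):
            counts["".join(sorted(signed_edges[pair] for pair in pairs))] += 1
    return counts


# Toy network (hypothetical): g1-g2-g3 is an NNP triangle, g2-g3-g4 a PPP triangle.
edges = {
    frozenset(("g1", "g2")): "N",
    frozenset(("g1", "g3")): "N",
    frozenset(("g2", "g3")): "P",
    frozenset(("g2", "g4")): "P",
    frozenset(("g3", "g4")): "P",
}
motif_counts = classify_triplets(edges)
```

Enumerating all node triples is cubic in the number of genes, which is acceptable for a sketch; a real analysis would iterate over neighbor lists and compare the counts against a randomized network to judge whether the distribution is non-random.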
Background: High-throughput screens have revealed large-scale protein interaction networks defining most cellular functions. How the proteins were added to the protein interaction network during its growth is a basic and important issue. Network motifs represent the simplest building blocks of cellular machines and are of biological significance. Results: Here we study the evolution of protein interaction networks from the perspective of network motifs. We find that in current protein interaction networks, proteins of the same age class tend to form motifs and such co-origins of motif constituents are affected by their topologies and biological functions. Further, we find that the proteins within motifs whose constituents are of the same age class tend to be densely interconnected, co-evolve and share the same biological functions, and these motifs tend to be within protein complexes. Conclusions: Our findings provide novel evidence for the hypothesis of the additions of clustered interacting nodes and point out that network motifs, especially the motifs with a dense topology and specific function, may play important roles during this process. Our results suggest functional constraints may be the underlying driving force for such additions of clustered interacting nodes.
The interactions among different microbial populations in a community could play more important roles in determining ecosystem functioning than species numbers and their abundances, but very little is known about such network interactions at a community level. The goal of this project is to develop novel framework approaches and associated software tools to characterize the network interactions in microbial communities based on large-scale, high-throughput metagenomics data and apply these approaches to understand the impacts of environmental changes (e.g., climate change, contamination) on network interactions among different nitrifying populations and associated microbial communities.
Path consistency (PC) algorithms combined with conditional mutual information (CMI) are widely used in the reconstruction of gene regulatory networks. CMI has many advantages over the Pearson correlation coefficient in measuring non-linear dependence to infer gene regulatory networks. It can also discriminate direct regulations from indirect ones. However, it is still a challenge to select the conditional genes in an optimal way, which affects the performance and computational complexity of the PC algorithm. In this study, we develop a novel conditional mutual information-based algorithm, namely RPNI (Regulation Pattern based Network Inference), to infer gene regulatory networks. For conditional gene selection, we define the co-regulation pattern, indirect-regulation pattern and mixture-regulation pattern as three candidate patterns to guide the selection of candidate genes. To demonstrate the potential of our algorithm, we apply it to gene expression data from the DREAM challenge. Experimental results show that RPNI outperforms existing conditional mutual information-based methods in both accuracy and time complexity for different sizes of gene samples. Furthermore, the robustness of our algorithm is demonstrated by noisy interference analysis using different types of noise.
Linghu, Bolan; Franzosa, Eric A; Xia, Yu
Networks of functional associations between genes have recently been successfully used for gene function and disease-related research. A typical approach for constructing such functional linkage gene networks (FLNs) is based on the integration of diverse high-throughput functional genomics datasets. Data integration is a nontrivial task due to the heterogeneous nature of the different data sources and their variable accuracy and completeness. The presence of correlations between data sources also adds another layer of complexity to the integration process. In this chapter we discuss an approach for constructing a human FLN from data integration and a subsequent application of the FLN to novel disease gene discovery. Similar approaches can be applied to nonhuman species and other discovery tasks.
Alanis Lobato, Gregorio; Cannistraci, Carlo; Ravasi, Timothy
Genetic interaction (GI) detection impacts the understanding of human disease and the ability to design personalized treatment. The mapping of every GI in most organisms is far from complete due to the combinatorial amount of gene deletions and knockdowns required. Computational techniques to predict new interactions based only on network topology have been developed in network science but never applied to GI networks. We show that topological prediction of GIs is possible with high precision and propose a graph dissimilarity index that is able to provide robust prediction in both dense and sparse networks. Computational prediction of GIs is a strong tool to aid high-throughput GI determination. The dissimilarity index we propose in this article is able to attain precise predictions that reduce the universe of candidate GIs to test in the lab. © 2013 Elsevier Inc.
Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin
Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.
Lipner, Ettie M.; Garcia, Benjamin J.; Strong, Michael
Tuberculosis and nontuberculous mycobacterial infections constitute a high burden of pulmonary disease in humans, resulting in over 1.5 million deaths per year. Building on the premise that genetic factors influence the instance, progression, and defense of infectious disease, we undertook a systems biology approach to investigate relationships among genetic factors that may play a role in increased susceptibility or control of mycobacterial infections. We combined literature and database mining with network analysis and pathway enrichment analysis to examine genes, pathways, and networks, involved in the human response to Mycobacterium tuberculosis and nontuberculous mycobacterial infections. This approach allowed us to examine functional relationships among reported genes, and to identify novel genes and enriched pathways that may play a role in mycobacterial susceptibility or control. Our findings suggest that the primary pathways and genes influencing mycobacterial infection control involve an interplay between innate and adaptive immune proteins and pathways. Signaling pathways involved in autoimmune disease were significantly enriched as revealed in our networks. Mycobacterial disease susceptibility networks were also examined within the context of gene-chemical relationships, in order to identify putative drugs and nutrients with potential beneficial immunomodulatory or anti-mycobacterial effects. PMID:26751573
The identification of disease-causing genes is a fundamental challenge in human health and of great importance in improving medical care, and provides a better understanding of gene functions. Recent computational approaches based on the interactions among human proteins and disease similarities have shown their power in tackling the issue. In this paper, a novel systematic and global method that integrates two heterogeneous networks for prioritizing candidate disease-causing genes is provided, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein interactions. In this method, the association score function between a query disease and a candidate gene is defined as the weighted sum of all the association scores between similar diseases and neighbouring genes. Moreover, the topological correlation of these two heterogeneous networks can be incorporated into the definition of the score function, and finally an iterative algorithm is designed for this issue. This method was tested with 10-fold cross-validation on all 1,126 diseases that have at least one known causal gene, and it ranked the correct gene as one of the top ten in 622 of all the 1,428 cases, significantly outperforming a state-of-the-art method called PRINCE. The results of this method were applied to study three multi-factorial disorders: breast cancer, Alzheimer disease and diabetes mellitus type 2, and some suggestions of novel causal genes and candidate disease-causing subnetworks were provided for further investigation.
Background: Protein kinases and phosphatases regulate protein phosphorylation, a critical means of modulating protein function, stability and localization. The identification of functional networks for protein phosphatases has been slow due to their redundant nature and the lack of large-scale analyses. We hypothesized that a genome-scale analysis of genetic interactions using the Synthetic Genetic Array could reveal protein phosphatase functional networks. We apply this approach to the conserved type 1 protein phosphatase Glc7, which regulates numerous cellular processes in budding yeast. Results: We created a novel glc7 catalytic mutant (glc7-E101Q). Phenotypic analysis indicates that this novel allele exhibits slow growth and defects in glucose metabolism but normal cell cycle progression and chromosome segregation. This suggests that glc7-E101Q is a hypomorphic glc7 mutant. Synthetic Genetic Array analysis of glc7-E101Q revealed a broad network of 245 synthetic sick/lethal interactions, reflecting that many processes are required when Glc7 function is compromised, such as histone modification, chromosome segregation and cytokinesis, nutrient sensing and DNA damage. In addition, mitochondrial activity and inheritance and lipid metabolism were identified as new processes involved in buffering Glc7 function. An interaction network among 95 genes genetically interacting with GLC7 was constructed by integration of genetic and physical interaction data. The obtained network has a modular architecture, and the interconnection among the modules reflects the cooperation of the processes buffering Glc7 function. Conclusion: We found 245 genes required for the normal growth of the glc7-E101Q mutant. Functional grouping of these genes and analysis of their physical and genetic interaction patterns bring new information on Glc7-regulated processes.
Malod-Dognin, Noël; Ban, Kristina; Pržulj, Nataša
Paralleling the increasing availability of protein-protein interaction (PPI) network data, several network alignment methods have been proposed. Network alignments have been used to uncover functionally conserved network parts and to transfer annotations. However, due to the computational intractability of the network alignment problem, aligners are heuristics providing divergent solutions and no consensus exists on a gold standard, or which scoring scheme should be used to evaluate them. We comprehensively evaluate the alignment scoring schemes and global network aligners on large scale PPI data and observe that three methods, HUBALIGN, L-GRAAL and NATALIE, regularly produce the most topologically and biologically coherent alignments. We study the collective behaviour of network aligners and observe that PPI networks are almost entirely aligned with a handful of aligners that we unify into a new tool, Ulign. Ulign enables complete alignment of two networks, which traditional global and local aligners fail to do. Also, multiple mappings of Ulign define biologically relevant soft clusterings of proteins in PPI networks, which may be used for refining the transfer of annotations across networks. Hence, PPI networks are already well investigated by current aligners, so to gain additional biological insights, a paradigm shift is needed. We propose such a shift come from aligning all available data types collectively rather than any particular data type in isolation from others.
In this paper we establish a relationship between chemical dynamics and mean field game dynamics. We show that chemical reaction networks can be studied using noisy mean field limits. We provide deterministic, noisy and switching mean field limits and illustrate them with numerical examples. © 2011 IEEE.
The MYB transcription factor (TF) family is one of the largest TF families and regulates defense responses to various stresses, hormone signaling as well as many metabolic and developmental processes in plants. Understanding these regulatory hierarchies of gene expression networks in response to developmental and environmental cues is a major challenge due to the complex interactions between the genetic elements. Correlation analyses are useful to unravel co-regulated gene pairs governing biological processes as well as to identify new candidate hub genes in response to these complex processes. High throughput expression profiling data are highly useful for construction of co-expression networks. In the present study, we utilized transcriptome data for comprehensive regulatory network studies of MYB TFs by top down and guide gene approaches. More than 50% of OsMYBs were strongly correlated under fifty experimental conditions with 51 hub genes via the top down approach. Further, clusters were identified using Markov Clustering (MCL). To maximize the clustering performance, parameter evaluation of the MCL inflation score (I) was performed in terms of enriched GO categories by measuring F-score. Comparison of co-expressed clusters and clades from phylogenetic analysis signifies their evolutionarily conserved co-regulatory role. We utilized a compendium of known interactions and biological roles with Gene Ontology enrichment analysis to hypothesize the function of coexpressed OsMYBs. In the other part, the transcriptional regulatory network analysis by the guide gene approach revealed 40 putative targets of 26 OsMYB TF hubs with high correlation value utilizing 815 microarray data. The putative targets with MYB-binding cis-elements enrichment in their promoter region, functional co-occurrence as well as nuclear localization support our finding. Specifically, enrichment of MYB binding regions involved in drought-inducibility implying their regulatory role in drought
Li, Mo; Belmonte, Juan Carlos Izpisua
Pluripotency is a state that exists transiently in the early embryo and, remarkably, can be recapitulated in vitro by deriving embryonic stem cells or by reprogramming somatic cells to become induced pluripotent stem cells. The state of pluripotency, which is stabilized by an interconnected network of pluripotency-associated genes, integrates external signals and exerts control over the decision between self-renewal and differentiation at the transcriptional, post-transcriptional and epigenetic levels. Recent evidence of alternative pluripotency states indicates the regulatory flexibility of this network. Insights into the underlying principles of the pluripotency network may provide unprecedented opportunities for studying development and for regenerative medicine.
Ba, Qian [Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai (China); Key Laboratory of Food Safety Risk Assessment, Ministry of Health, Beijing (China)]; Li, Junyang; Huang, Chao [Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai (China)]; Li, Jingquan; Chu, Ruiai [Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai (China); Key Laboratory of Food Safety Risk Assessment, Ministry of Health, Beijing (China)]; Wu, Yongning [Key Laboratory of Food Safety Risk Assessment, Ministry of Health, Beijing (China)]; Wang, Hui [Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai (China); Key Laboratory of Food Safety Risk Assessment, Ministry of Health, Beijing (China); School of Life Science and Technology, ShanghaiTech University, Shanghai (China)]
Benzo(a)pyrene is a common environmental and foodborne pollutant that has been identified as a human carcinogen. Although the carcinogenicity of benzo(a)pyrene has been extensively reported, its precise molecular mechanisms and the influence on system-level protein networks are not well understood. To investigate the system-level influence of benzo(a)pyrene on protein interactions and regulatory networks, a benzo(a)pyrene-rewired protein interaction network was constructed based on 769 key proteins derived from more than 500 literature reports. The protein interaction network rewired by benzo(a)pyrene was a scale-free, highly-connected biological system. Ten modules were identified, and 25 signaling pathways were enriched, most of which belong to the human diseases category, especially cancer and infectious disease. In addition, two lung-specific and two liver-specific pathways were identified. Three pathways were specific in short and medium-term networks (< 48 h), and five pathways were enriched only in the medium-term network (6 h–48 h). Finally, the expression of linker genes in the network was validated by Western blotting. These findings establish the overall, tissue- and time-specific benzo(a)pyrene-rewired protein interaction networks and provide insights into the biological effects and molecular mechanisms of action of benzo(a)pyrene. - Highlights: • Benzo(a)pyrene induced scale-free, highly-connected protein interaction networks. • 25 signaling pathways were enriched through modular analysis. • Tissue- and time-specific pathways were identified.
Zhou, Xionghui; Liu, Juan
Although many methods have been proposed to reconstruct gene regulatory networks, most of them, when applied to sample-based data, cannot reveal the gene regulatory relations underlying a phenotypic change (e.g. normal versus cancer). In this paper, we adopt the phenotype as a variable when constructing the gene regulatory network, whereas earlier studies either neglected it or only used it to select the differentially expressed genes as the inputs for constructing the gene regulatory network. Specifically, we integrate phenotype information with gene expression data to identify gene dependency pairs by using conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely the gene dependency network. In this way, we have constructed the gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network to be scale-free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specifically related to breast cancer, which suggests that our gene dependency network is meaningful. Its validity has also been supported by literature investigation. From the network, we have selected 43 discriminative hubs as a signature to build a classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms classification models built with published signatures. In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for
Small GTP-binding proteins of the Ras superfamily (Ras, Rho, Rab, Arf, and Ran) regulate key cellular processes such as signal transduction, cell proliferation, cell motility, and vesicle transport. A great deal of experimental evidence supports the existence of signaling cascades and feedback loops within and among the small GTPase subfamilies, suggesting that these proteins function in a coordinated and cooperative manner. The interplay occurs largely through association with bi-partite regulatory and effector proteins but can also occur through the active form of the small GTPases themselves. In order to understand the connectivity of the small GTPase signaling routes, a systems-level approach that analyzes data describing direct and indirect interactions was used to construct the small GTPase protein interaction network. The data were curated from the Search Tool for the Retrieval of Interacting Genes (STRING) database and include only experimentally validated interactions. The network method enables the conceptualization of the overall structure as well as the underlying organization of the protein-protein interactions. The interaction network described here comprises 778 nodes and 1943 edges and has a scale-free topology. Rac1, Cdc42, RhoA, and HRas are identified as the hubs. Ten sub-network motifs are also identified in this study with themes in apoptosis, cell growth/proliferation, vesicle traffic, cell adhesion/junction dynamics, the nicotinamide adenine dinucleotide phosphate (NADPH) oxidase response, transcription regulation, receptor-mediated endocytosis, gene silencing, and growth factor signaling. Bottleneck proteins that bridge signaling paths and proteins that overlap in multiple small GTPase networks are described along with the functional annotation of all proteins in the network.
Tsuchiya, Masa; Selvarajoo, Kumar; Piras, Vincent; Tomita, Masaru; Giuliani, Alessandro
An exacerbated sensitivity to apparently minor stimuli and a general resilience of the entire system coexist in biological systems. This apparent paradox can be explained by considering biological systems as very strongly interconnected network systems. Some nodes of these networks, thanks to their peculiar location in the network architecture, are responsible for the sensitivity aspects, while the large degree of interconnection is at the basis of the resilience properties of the system. One relevant feature of the high degree of connectivity of gene regulation networks is the emergence of collective ordered phenomena influencing the entire genome and not only a specific portion of transcripts. The great majority of existing gene regulation models give the impression of purely local ‘hard-wired’ mechanisms, disregarding the emergence of global ordered behavior encompassing thousands of genes, while the general, genome-wide aspects are less well known. Here we address, from a data analysis perspective, the discrimination between local and global scale regulation. This goal was achieved by examining two biological systems: the innate immune response in macrophages and oscillating growth dynamics in yeast. Our aim was to reconcile the ‘hard-wired’ local view of gene regulation with a global continuous and scalable one borrowed from statistical physics. This reconciliation is based on the network paradigm in which the local ‘hard-wired’ activities correspond to the activation of specific crucial nodes in the regulation network, while the scalable continuous responses can be equated to the collective oscillations of the network after a perturbation.
Yeung, Enoch; Dy, Aaron J; Martin, Kyle B; Ng, Andrew H; Del Vecchio, Domitilla; Beck, James L; Collins, James J; Murray, Richard M
Synthetic gene expression is highly sensitive to intragenic compositional context (promoter structure, spacing regions between promoter and coding sequences, and ribosome binding sites). However, much less is known about the effects of intergenic compositional context (spatial arrangement and orientation of entire genes on DNA) on expression levels in synthetic gene networks. We compare expression of induced genes arranged in convergent, divergent, or tandem orientations. Induction of convergent genes yielded up to 400% higher expression, greater ultrasensitivity, and dynamic range than divergent- or tandem-oriented genes. Orientation affects gene expression whether one or both genes are induced. We postulate that transcriptional interference in divergent and tandem genes, mediated by supercoiling, can explain differences in expression and validate this hypothesis through modeling and in vitro supercoiling relaxation experiments. Treatment with gyrase abrogated intergenic context effects, bringing expression levels within 30% of each other. We rebuilt the toggle switch with convergent genes, taking advantage of supercoiling effects to improve threshold detection and switch stability. Copyright © 2017 Elsevier Inc. All rights reserved.
This paper focuses on the contribution of Australian Football League (AFL) players to their team's on-field network by simulating player interactions within a chosen team list and estimating the net effect on final score margin. A Visual Basic computer program was written, firstly, to isolate the effective interactions between players from a particular team in all 2011 season matches and, secondly, to generate a symmetric interaction matrix for each match. Negative binomial distributions were fitted to each player pairing in the Geelong Football Club for the 2011 season, enabling an interactive match simulation model given the 22 chosen players. Dynamic player ratings were calculated from the simulated network using eigenvector centrality, a method that recognises and rewards interactions with more prominent players in the team network. The centrality ratings were recorded after every network simulation and then applied in final score margin predictions so that each player's match contribution (and, hence, an optimal team) could be estimated. The paper ultimately demonstrates that the presence of highly rated players, such as Geelong's Jimmy Bartel, provides the most utility within a simulated team network. It is anticipated that these findings will facilitate optimal AFL team selection and player substitutions, which are key areas of interest to coaches. Network simulations are also attractive for use within betting markets, specifically to provide information on the likelihood of a chosen AFL team list "covering the line".
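As an illustration of the eigenvector centrality rating mentioned in the abstract above, the following is a minimal sketch using power iteration on a symmetric interaction matrix. It is not the authors' Visual Basic program; the function name, the four-player matrix, and the interaction counts are hypothetical values chosen only for the example.

```python
def eigenvector_centrality(matrix, iterations=200, tol=1e-10):
    """Power iteration on a symmetric, non-negative interaction matrix.

    Returns scores normalised to sum to 1. Assumes the underlying graph
    is connected and not bipartite, so the iteration converges.
    """
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [sum(matrix[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(new)
        if total == 0:
            return scores  # no interactions at all: keep the uniform scores
        new = [value / total for value in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Hypothetical symmetric interaction counts between four players.
interactions = [
    [0, 5, 2, 1],
    [5, 0, 4, 0],
    [2, 4, 0, 1],
    [1, 0, 1, 0],
]
ratings = eigenvector_centrality(interactions)
```

A player scores highly here by interacting often with other highly scored players, which is the property the abstract describes as rewarding interactions with more prominent players.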
Yoon, S.; Goltsev, A. V.; Mendes, J. F. F.
We explore the structural stability of weighted and unweighted networks of positively interacting agents against a negative external field. We study how the agents support the activity of each other to confront the negative field, which suppresses the activity of agents and can lead to collapse of the whole network. The competition between the interactions and the field shapes the structure of stable states of the system. In unweighted networks (uniform interactions) the stable states have the structure of k-cores of the interaction network. The interplay between the topology and the distribution of weights (heterogeneous interactions) strongly impacts the structural stability against a negative field, especially in the case of fat-tailed distributions of weights. We show that apart from critical slowing down there is also a critical change in the system structure that precedes the network collapse. The change can serve as an early warning of the critical transition. To characterize changes of network structure we develop a method based on statistical analysis of the k-core organization and so-called "corona" clusters belonging to the k-cores.
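For readers unfamiliar with k-cores, the following minimal sketch shows the standard pruning definition: repeatedly remove every node with fewer than k remaining neighbours. It illustrates the k-core concept only, not the authors' stability analysis; the function name and the small example graph are hypothetical.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbours.
    """
    alive = set(adjacency)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Count only neighbours that have not been pruned yet.
            degree = sum(1 for nb in adjacency[node] if nb in alive)
            if degree < k:
                alive.discard(node)
                changed = True
    return alive


# Hypothetical graph: a 4-clique (a, b, c, d) with a short tail d-e-f.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
core3 = k_core(graph, 3)
```

In this example the tail nodes are pruned in sequence, leaving only the clique as the 3-core.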
According to the CNIC Security Policy for Control Systems (EDMS #584092), interactive emailing on PCs (and other devices) connected to the Technical Network is prohibited. Please note that from November 6th, neither reading emails nor sending emails interactively using e.g. Outlook or Pine mail clients on PCs connected to the Technical Network will be possible anymore. However, automatically generated emails will not be blocked and can still be sent off using CERNMX.CERN.CH as mail server. These restrictions DO NOT apply to PCs connected to any other network, like the General Purpose (or office) network. If you have questions, please do not hesitate to contact Uwe Epting, Pierre Charrue or Stefan Lueders (Technical-Network.Administrator@cern.ch). Your CNIC Working Group
Zhao, Xiaowei; Li, Ping
In this paper we present an unsupervised neural network model of bilingual lexical development and interaction. We focus on how the representational structures of the bilingual lexicons can emerge, develop, and interact with each other as a function of the learning history. The results show that: (1) distinct representations for the two lexicons…
Background: Data from high-throughput experiments of protein-protein interactions are commonly used to probe the nature of biological organization and extract functional relationships between sets of proteins. What has not been appreciated is that the underlying mechanisms involved in assembling these networks may exhibit considerable probabilistic behaviour. Results: We find that the probability of an interaction between two proteins is generally proportional to the numerical product of their individual interacting partners, or degrees. The degree-weighted behaviour is manifested throughout the protein-protein interaction networks studied here, except for the high-degree, or hub, interaction areas. However, we find that the probabilities of interaction between the hubs are still high. Further evidence is provided by path length analyses, which show that these hubs are separated by very few links. Conclusion: The results suggest that protein-protein interaction networks incorporate probabilistic elements that lead to scale-rich hierarchical architectures. These observations seem to be at odds with a biologically-guided organization. One interpretation of the findings is that we are witnessing the ability of proteins to indiscriminately bind rather than the protein-protein interactions that are actually utilized by the cell in biological processes. Therefore, the topological study of a degree-weighted network requires a more refined methodology to extract biological information about pathways, modules, or other inferred relationships among proteins.
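The degree-product relationship described above can be sketched as follows. The abstract states only proportionality; the normalisation by twice the number of edges used here is an assumption borrowed from the standard configuration-model estimate, and the function name and the toy edge list are hypothetical.

```python
def degree_product_probability(edges):
    """Estimate pairwise interaction probability as k_u * k_v / (2m).

    The 1/(2m) constant is an assumed normalisation (configuration-model
    style); values are capped at 1.0 so they remain valid probabilities.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_m = 2 * len(edges)
    nodes = sorted(degree)
    return {
        (u, v): min(1.0, degree[u] * degree[v] / two_m)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
    }


# Hypothetical toy network of four proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
expected = degree_product_probability(edges)
```

Comparing such expected values with the observed adjacency is one simple way to see where a network departs from purely degree-weighted behaviour.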
Complex biological systems usually pose a trade-off between robustness and fragility where a small number of perturbations can substantially disrupt the system. Although biological systems are robust against changes in many external and internal conditions, even a single mutation can perturb the system substantially, giving rise to a pathophenotype. Recent advances in identifying and analyzing the sequence variations underlying human disorders help to provide a systemic view of the mechanisms underlying various disease phenotypes. Network-based disease-gene prioritization methods rank the relevance of genes in a disease under the hypothesis that genes whose proteins interact with each other tend to exhibit similar phenotypes. In this study, we have tested the robustness of several network-based disease-gene prioritization methods with respect to perturbations of the system using various disease phenotypes from the Online Mendelian Inheritance in Man database. These perturbations have been introduced either in the protein-protein interaction network or in the set of known disease-gene associations. As the network-based disease-gene prioritization methods are based on the connectivity between known disease-gene associations, we have further used these methods to categorize the pathophenotypes with respect to the recoverability of hidden disease-genes. Our results suggest that, in general, disease-genes are connected through multiple paths in the human interactome. Moreover, even when these paths are disturbed, network-based prioritization can reveal hidden disease-gene associations in some pathophenotypes such as breast cancer, cardiomyopathy, diabetes, leukemia, Parkinson disease and obesity to a greater extent than in the rest of the pathophenotypes tested in this study. Gene Ontology (GO) analysis highlighted the role of functional diversity for such diseases.
Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard
The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the
McCormack, Theodore; Frings, Oliver; Alexeyenko, Andrey; Sonnhammer, Erik L L
Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
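The z-score statistic mentioned above can be sketched as follows. This is not the CrossTalkZ implementation (which is written in C++); the function names are hypothetical, the groups are assumed to be disjoint, and the null counts are invented placeholders standing in for link counts obtained from randomized networks.

```python
from statistics import mean, stdev


def crosstalk_links(edges, group_a, group_b):
    """Count edges with one endpoint in each group (groups assumed disjoint)."""
    return sum(
        1
        for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )


def z_score(observed, null_counts):
    """Standard z-score of the observed count against a null sample."""
    return (observed - mean(null_counts)) / stdev(null_counts)


# Hypothetical network and gene groups.
edges = [("g1", "g4"), ("g2", "g4"), ("g2", "g5"), ("g3", "g5"), ("g1", "g2")]
group_a = {"g1", "g2", "g3"}
group_b = {"g4", "g5"}
observed = crosstalk_links(edges, group_a, group_b)

# Placeholder link counts, as if measured on five randomized networks.
null_counts = [1, 2, 3, 2, 2]
z = z_score(observed, null_counts)
```

In practice the null sample would come from many network randomizations, and the abstract notes that the choice of randomization method (in particular, which topological properties it preserves) affects how reliable the resulting statistic is.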
The formation of the nervous system is a multistep process that yields a mature brain. Failure in any of the steps of this process may cause brain malfunction. In the early stages of embryonic development, neural progenitors quickly proliferate and then, at a specific moment, differentiate into neurons or glia. Once they become postmitotic neurons, they migrate to their final destinations and begin to extend their axons to connect with other neurons, sometimes located in quite distant regions, to establish different neural circuits. During the last decade, it has become evident that Zic genes, in addition to playing important roles in early development (e.g., gastrulation and neural tube closure), are involved in different processes of late brain development, such as neuronal migration, axon guidance, and refinement of axon terminals. ZIC proteins are therefore essential for the proper wiring and connectivity of the brain. In this chapter, we review our current knowledge of the role of Zic genes in the late stages of neural circuit formation.
Davis, Darren; Yaveroğlu, Ömer Nebil; Malod-Dognin, Noël; Stojmirovic, Aleksandar; Pržulj, Nataša
Proteins underlie the functioning of a cell, and the wiring of proteins in a protein-protein interaction network (PIN) relates to their biological functions. Proteins with similar wiring in the PIN (topology around them) have been shown to have similar functions. This property has been successfully exploited for predicting protein functions. Topological similarity is also used to guide network alignment algorithms that find similarly wired proteins between PINs of different species; these similarities are used to transfer annotation across PINs, e.g. from model organisms to human. To refine these functional predictions and annotation transfers, we need to gain insight into the variability of the topology-function relationships. For example, a function may be significantly associated with specific topologies, while another function may be weakly associated with several different topologies. Also, the topology-function relationships may differ between different species. To improve our understanding of topology-function relationships and of their conservation among species, we develop a statistical framework that is built upon canonical correlation analysis. Using the graphlet degrees to represent the wiring around proteins in PINs and gene ontology (GO) annotations to describe their functions, our framework: (i) characterizes statistically significant topology-function relationships in a given species, and (ii) uncovers the functions that have conserved topology in PINs of different species, which we term topologically orthologous functions. We apply our framework to PINs of yeast and human, identifying seven biological process and two cellular component GO terms to be topologically orthologous for the two organisms. © The Author 2015. Published by Oxford University Press.
Background: Recent years have seen a dramatic increase in the use of mathematical modeling to gain insight into gene regulatory network behavior across many different organisms. In particular, there has been considerable interest in using mathematical tools to understand how multistable regulatory networks may contribute to developmental processes such as cell fate determination. Indeed, such a network may subserve the formation of unicellular leaf hairs (trichomes) in the model plant Arabidopsis thaliana. Results: In order to investigate the capacity of small gene regulatory networks to generate multiple equilibria, we present a chemical reaction network (CRN)-based modeling formalism and describe a number of methods for CRN analysis in a parameter-free context. These methods are compared and applied to a full set of one-component subnetworks, as well as a large random sample from 40,680 similarly constructed two-component subnetworks. We find that positive feedback and cooperativity mediated by transcription factor (TF) dimerization is a requirement for one-component subnetwork bistability. For subnetworks with two components, the presence of these processes increases the probability that a randomly sampled subnetwork will exhibit multiple equilibria, although we find several examples of bistable two-component subnetworks that do not involve cooperative TF-promoter binding. In the specific case of epidermal differentiation in Arabidopsis, dimerization of the GL3-GL1 complex and cooperative sequential binding of GL3-GL1 to the CPC promoter are each independently sufficient for bistability. Conclusion: Computational methods utilizing CRN-specific theorems to rule out bistability in small gene regulatory networks are far superior to techniques generally applicable to deterministic ODE systems. Using these methods to conduct an unbiased survey of parameter-free deterministic models of small networks, and the Arabidopsis epidermal cell
Liu, Kang K L; Ma, Qianli D Y; Ivanov, Plamen Ch; Bartsch, Ronny P
The human organism is a complex network of interconnected organ systems, where the behavior of one system affects the dynamics of other systems. Identifying and quantifying dynamical networks of diverse physiologic systems under varied conditions is a challenge due to the complexity in the output dynamics of the individual systems and the transient and nonlinear characteristics of their coupling. We introduce a novel computational method based on the concept of time delay stability and major component analysis to investigate how organ systems interact as a network to coordinate their functions. We analyze a large database of continuously recorded multi-channel physiologic signals from healthy young subjects during night-time sleep. We identify a network of dynamic interactions between key physiologic systems in the human organism. Further, we find that each physiologic state is characterized by a distinct network structure with different relative contribution from individual organ systems to the global network dynamics. Specifically, we observe a gradual decrease in the strength of coupling of heart and respiration to the rest of the network with transition from wake to deep sleep, and in contrast, an increased relative contribution to network dynamics from chin and leg muscle tone and eye movement, demonstrating a robust association between network topology and physiologic function.
Nyamugudza, Tendai; Rajasekar, Venkatesh; Sen, Prasad; Nirmala, M.; Madhu Viswanatham, V.
Advancements in networking technology have seen more and more devices becoming connected day by day. This has given organizations the capacity to extend their networks beyond their boundaries to remote offices and remote employees. However, as the network grows, security becomes a major challenge since the attack surface also increases. There is a need to guard the network against different types of attacks, such as intrusion and malware, by using different tools at different networking levels. This paper describes how network intelligence can be acquired by implementing a low-interaction honeypot which detects and tracks network intrusion. A honeypot allows an organization to interact with and gather information about an attack before it compromises the network. This process is important because it allows the organization to learn about future attacks of the same nature and to develop countermeasures. The paper further shows how a honeypot-honeynet based model for an intrusion detection system (IDS) can be used to obtain the most valuable information about the attacker and prevent unexpected harm to the network.
Li, Yi; Rao, Nini; Yang, Feng; Zhang, Ying; Yang, Yang; Liu, Han-ming; Guo, Fengbiao; Huang, Jian
Acid stress is one of the most serious threats that cyanobacteria have to face, and it has an impact at all levels from genome to phenotype. However, very little is known about the detailed response mechanism to acid stress in this species. We present here a general analysis of the gene regulatory network of Synechocystis sp. PCC 6803 in response to acid stress using comparative genome analysis and biocomputational prediction. In this study, we collected 85 genes and used them as an initial template to predict new genes through co-regulation, protein-protein interactions and the phylogenetic profile, and 179 new genes were obtained to form a complete template. In addition, we found that 11 enriched pathways such as glycolysis are closely related to the acid stress response. Finally, we constructed a regulatory network for the intricate relationship of these genes and summarized the key steps in the response to acid stress. This is the first time a bioinformatic approach has been applied systematically to gene interactions in cyanobacteria and to the elaboration of their cell metabolism and regulatory pathways under acid stress; such an approach is more efficient than a traditional experimental study. The results also provide theoretical support for similar research into environmental stresses in cyanobacteria and possible industrial applications. Copyright © 2014 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Larson, Nicholas B; Schaid, Daniel J
Gene-gene interactions are increasingly being addressed as a potentially important contributor to the variability of complex traits. Consequently, attention has moved beyond single-locus analysis of association to more complex genetic models. Although several single-marker approaches toward interaction analysis have been developed, such methods suffer from very high testing dimensionality and do not take advantage of existing information, notably the definition of genes as functional units. Here, we propose a comprehensive family of gene-level score tests for identifying genetic elements of disease risk, in particular pairwise gene-gene interactions. Using kernel machine methods, we devise score-based variance component tests under a generalized linear mixed model framework. We conducted simulations based upon coalescent genetic models to evaluate the performance of our approach under a variety of disease models. These simulations indicate that our methods are generally higher powered than alternative gene-level approaches and at worst competitive with exhaustive SNP-level (where SNP is single-nucleotide polymorphism) analyses. Furthermore, we observe that simulated epistatic effects resulted in significant marginal testing results for the involved genes regardless of whether or not true main effects were present. We detail the benefits of our methods and discuss potential genome-wide analysis strategies for gene-gene interaction analysis in a case-control study design. © 2013 WILEY PERIODICALS, INC.
Several studies have reported gene expression signatures that predict recurrence risk in stage II and III colorectal cancer (CRC) patients with minimal gene membership overlap and undefined biological relevance. The goal of this study was to investigate biological themes underlying these signatures, to infer genes of potential mechanistic importance to the CRC recurrence phenotype and to test whether accurate prognostic models can be developed using mechanistically important genes. We investigated eight published CRC gene expression signatures and found no functional convergence in Gene Ontology enrichment analysis. Using a random walk-based approach, we integrated these signatures and publicly available somatic mutation data on a protein-protein interaction network and inferred 487 genes that were plausible candidate molecular underpinnings for the CRC recurrence phenotype. We named the list of 487 genes a NEM signature because it integrated information from Network, Expression, and Mutation. The signature showed significant enrichment in four biological processes closely related to cancer pathophysiology and provided good coverage of known oncogenes, tumor suppressors, and CRC-related signaling pathways. A NEM signature-based Survival Support Vector Machine prognostic model was trained using a microarray gene expression dataset and tested on an independent dataset. The model-based scores showed a 75.7% concordance with the real survival data and separated patients into two groups with significantly different relapse-free survival (p = 0.002). Similar results were obtained with reversed training and testing datasets (p = 0.007). Furthermore, adjuvant chemotherapy was significantly associated with prolonged survival of the high-risk patients (p = 0.006), but not beneficial to the low-risk patients (p = 0.491). The NEM signature not only reflects CRC biology but also informs patient prognosis and treatment response. Thus, the network
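The abstract above does not say which random walk variant was used. As an illustration only, the following minimal sketch assumes a random walk with restart on an undirected interaction network, with seed weights standing in for signature or mutation evidence. The function name, the toy network and the restart probability of 0.5 are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    adj   : dict mapping node -> set of neighbours (undirected network)
    seeds : dict mapping seed node -> non-negative weight (hypothetical evidence)
    """
    nodes = sorted(adj)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}  # restart distribution
    p = dict(p0)
    for _ in range(max_iter):
        # Every step, a fraction `restart` of the walker mass jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            moving = (1.0 - restart) * p[n]
            if not adj[n]:
                # An isolated node has nowhere to go: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += moving * p0[m]
                continue
            share = moving / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A - B - C - D with a single seed at A.
toy_ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_ppi, {"A": 1.0})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes close to the seeds receive higher scores, so a ranked list like `ranked` can be cut at a threshold to obtain candidate genes. How the 487 genes were actually selected is not stated in the abstract.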
Stifanelli, Patrizia F; Creanza, Teresa M; Anglani, Roberto; Liuzzi, Vania C; Mukherjee, Sayan; Schena, Francesco P; Ancona, Nicola
The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology. In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems arising when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method performs correlation between regression residuals and (c) the 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods, having the most predictive partial correlation estimates and the highest sensitivity in inferring conditional dependencies between genes, even when only a small number of observations was available. The application of this method to inferring gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana revealed a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data relative to a signature of the HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) directly interacting with HRAS, sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling. Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip. Copyright © 2013 The Authors. Published by
Breitkreutz, Ashton; Choi, Hyungwon; Sharom, Jeffrey R.; Boucher, Lorrie; Neduva, Victor; Larsen, Brett; Lin, Zhen-Yuan; Breitkreutz, Bobby-Joe; Stark, Chris; Liu, Guomin; Ahn, Jessica; Dewar-Darch, Danielle; Reguly, Teresa; Tang, Xiaojing; Almeida, Ricardo; Qin, Zhaohui Steve; Pawson, Tony; Gingras, Anne-Claude; Nesvizhskii, Alexey I.; Tyers, Mike
The interactions of protein kinases and phosphatases with their regulatory subunits and substrates underpin cellular regulation. We identified a kinase and phosphatase interaction (KPI) network of 1844 interactions in budding yeast by mass spectrometric analysis of protein complexes. The KPI network contained many dense local regions of interactions that suggested new functions. Notably, the cell cycle phosphatase Cdc14 associated with multiple kinases that revealed roles for Cdc14 in mitogen-activated protein kinase signaling, the DNA damage response, and metabolism, whereas interactions of the target of rapamycin complex 1 (TORC1) uncovered new effector kinases in nitrogen and carbon metabolism. An extensive backbone of kinase-kinase interactions cross-connects the proteome and may serve to coordinate diverse cellular responses. PMID:20489023
Ahmad, Shafqat; Rukh, Gull; Varga, Tibor V
Numerous obesity loci have been identified using genome-wide association studies. A UK study indicated that physical activity may attenuate the cumulative effect of 12 of these loci, but replication studies are lacking. Therefore, we tested whether the aggregate effect of these loci is diminished...... in adults of European ancestry reporting high levels of physical activity. Twelve obesity-susceptibility loci were genotyped or imputed in 111,421 participants. A genetic risk score (GRS) was calculated by summing the BMI-associated alleles of each genetic variant. Physical activity was assessed using self...... combined using meta-analysis weighted by cohort sample size. The meta-analysis yielded a statistically significant GRS × physical activity interaction effect estimate (Pinteraction = 0.015). However, a statistically significant interaction effect was only apparent in North American cohorts (n = 39...
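The two calculations named in the abstract above can be sketched as follows. The abstract says only that the score sums BMI-associated alleles and that cohorts were combined by meta-analysis weighted by sample size. The square-root-of-n weighted Z-score used here is one common scheme and is an assumption, as are all function names and the example numbers.

```python
from math import sqrt
from statistics import NormalDist


def genetic_risk_score(risk_allele_counts):
    """Sum the risk-allele counts (0, 1 or 2, or an imputed dosage) over all loci."""
    return sum(risk_allele_counts)


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-cohort Z-scores with weights proportional to sqrt(sample size).

    Assumption: the source does not state its exact weighting formula.
    """
    weights = [sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical participant genotyped at 12 loci.
grs = genetic_risk_score([1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 1, 2])

# Hypothetical interaction Z-scores from three cohorts of different size.
combined_z = sample_size_weighted_z([1.2, 2.1, 0.4], [5000, 20000, 3000])
combined_p = two_sided_p(combined_z)
```

Because the weights grow with the square root of the cohort size, the largest cohort dominates the combined statistic without fully determining it.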
Colak, Recep; Moser, Flavia; Chu, Jeffrey Shih-Chieh; Schönhuth, Alexander; Chen, Nansheng; Ester, Martin
Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well studied large-scale data sources, available for many not yet exhaustively annotated organisms. It has been well established, when analyzing these two data sources jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear and methods by which to exhaustively search for such constellations had not been presented. We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automatized filtering procedure reduces our output which results in smaller collections of modules, comparable to state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition which adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search, by using the unreduced output to more quickly perform GO term related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples. We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large
Human gene regulatory networks (GRN) can be difficult to interpret due to a tangle of edges interconnecting thousands of genes. We constructed a general human GRN from extensive transcription factor and microRNA target data obtained from public databases. In a subnetwork of this GRN that is active during estrogen stimulation of MCF-7 breast cancer cells, we benchmarked automated algorithms for identifying core regulatory genes (transcription factors and microRNAs). Among these algorithms, we identified K-core decomposition, PageRank and betweenness centrality algorithms as the most effective for discovering core regulatory genes in the network, evaluated based on previously known roles of these genes in MCF-7 biology as well as on their ability to explain the up or down expression status of up to 70% of the remaining genes. Finally, we validated the use of the K-core algorithm for organizing the GRN in an easier-to-interpret layered hierarchy where more influential regulatory genes percolate towards the inner layers. The integrated human gene and miRNA network and software used in this study are provided as supplementary materials (S1 Data) accompanying this manuscript.
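The layered hierarchy described above rests on K-core decomposition. The sketch below is a generic peeling implementation, not the software provided with the study. It assumes edge direction is ignored, and the gene names in the toy network are placeholders.

```python
def core_numbers(adj):
    """Return the core number of each node by repeatedly peeling the lowest-degree node.

    adj : dict mapping node -> set of neighbours (edges treated as undirected)
    """
    degree = {n: len(neighbours) for n, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        n = min(remaining, key=degree.get)   # node with the fewest remaining links
        k = max(k, degree[n])                # the core level never decreases
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


def layers(core):
    """Group nodes by core number; higher numbers are the inner layers."""
    grouped = {}
    for node, k in core.items():
        grouped.setdefault(k, set()).add(node)
    return grouped


# Toy network: a triangle of regulators with one peripheral target.
toy_grn = {
    "TF1": {"TF2", "TF3", "target"},
    "TF2": {"TF1", "TF3"},
    "TF3": {"TF1", "TF2"},
    "target": {"TF1"},
}
hierarchy = layers(core_numbers(toy_grn))
```

In this toy case the three mutually connected regulators end up in the inner layer (core number 2) and the peripheral target in the outer layer (core number 1).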
Marbling is an important trait in characterizing beef quality and a major factor in determining the price of beef in the Korean beef market. In particular, marbling is a complex trait and needs a system-level approach for identifying candidate genes related to the trait. To find the candidate genes associated with marbling, we used a weighted gene coexpression network analysis of the expression values of bovine genes. Hub genes were identified; they were topologically centered with large degree and BC values in the global network. We performed gene expression analysis to detect candidate genes in M. longissimus with divergent marbling phenotype (marbling scores 2 to 7) using qRT-PCR. The results demonstrate that transmembrane protein 60 (TMEM60) and dihydropyrimidine dehydrogenase (DPYD) are associated with increasing marbling fat. We suggest that the network-based approach in livestock may be an important method for analyzing the complex effects of candidate genes associated with complex traits like marbling or tenderness.
Castillo Luis F.
Gene annotation is a process that encompasses multiple approaches on the analysis of nucleic acids or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are being described in an organism genome, construction and visualization of gene networks impose novel challenges in the understanding of complex expression patterns and the generation of new knowledge in genomics research. In order to take advantage of accumulated text data after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by Basic Local Alignment Search Tool (BLAST) and Interproscan, were filtered out by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained along the Wordnet dictionary in order to construct a Resource Description Framework (RDF) using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created where detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information in the generation of gene connections and for data mining purposes, this tool development opens the possibility to visualize complex and abundant transcriptome data, and triggers the formulation of new hypotheses in metabolic pathways analysis.
Auman, Tzach; Chipman, Ariel D
Our understanding of the genetics of arthropod body plan development originally stems from work on Drosophila melanogaster from the late 1970s and onward. In Drosophila, there is a relatively detailed model for the network of gene interactions that proceeds in a sequential-hierarchical fashion to define the main features of the body plan. Over the years, we have a growing understanding of the networks involved in defining the body plan in an increasing number of arthropod species. It is now becoming possible to tease out the conserved aspects of these networks and to try to reconstruct their evolution. In this contribution, we focus on several key nodes of these networks, starting from early patterning in which the main axes are determined and the broad morphological domains of the embryo are defined, and on to later stage wherein the growth zone network is active in sequential addition of posterior segments. The pattern of conservation of networks is very patchy, with some key aspects being highly conserved in all arthropods and others being very labile. Many aspects of early axis patterning are highly conserved, as are some aspects of sequential segment generation. In contrast, regional patterning varies among different taxa, and some networks, such as the terminal patterning network, are only found in a limited range of taxa. The growth zone segmentation network is ancient and is probably plesiomorphic to all arthropods. In some insects, it has undergone significant modification to give rise to a more hardwired network that generates individual segments separately. In other insects and in most arthropods, the sequential segmentation network has undergone a significant amount of systems drift, wherein many of the genes have changed. However, it maintains a conserved underlying logic and function. © The Author 2017. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please
Smadar Ben-Tabou De-Leon
Developmental gene regulatory networks robustly control the timely activation of regulatory and differentiation genes. The structure of these networks underlies their capacity to buffer intrinsic and extrinsic noise and maintain embryonic morphology. Here I illustrate how the use of specific architectures by the sea urchin developmental regulatory networks enables the robust control of cell fate decisions. The Wnt-βcatenin signaling pathway patterns the primary embryonic axis while the BMP signaling pathway patterns the secondary embryonic axis in the sea urchin embryo and across bilateria. Interestingly, in the sea urchin in both cases, the signaling pathway that defines the axis controls directly the expression of a set of downstream regulatory genes. I propose that this direct activation of a set of regulatory genes enables a uniform regulatory response and a clear cut cell fate decision in the endoderm and in the dorsal ectoderm. The specification of the mesodermal pigment cell lineage is activated by Delta signaling that initiates a triple positive feedback loop that locks down the pigment specification state. I propose that the use of compound positive feedback circuitry provides the endodermal cells enough time to turn off mesodermal genes and ensures correct mesoderm vs. endoderm fate decision. Thus, I argue that understanding the control properties of repeatedly used regulatory architectures illuminates their role in embryogenesis and provides possible explanations to their resistance to evolutionary change.
Progress in uncovering the protein interaction networks of several species has led to questions of what underlying principles might govern their organization. Few studies have tried to determine the impact of protein interaction network evolution on the observed physiological differences between species. Using comparative genomics and structural information, we show here that eukaryotic species have rewired their interactomes at a fast rate of approximately 10^-5 interactions changed per protein pair, per million years of divergence. For Homo sapiens this corresponds to 10^3 interactions changed per million years. Additionally we find that the specificity of binding strongly determines the interaction turnover and that different biological processes show significantly different link dynamics. In particular, human proteins involved in immune response, transport, and establishment of localization show signs of positive selection for change of interactions. Our analysis suggests that a small degree of molecular divergence can give rise to important changes at the network level. We propose that the power law distribution observed in protein interaction networks could be partly explained by the cell's requirement for different degrees of protein binding specificity.
Jónsson, Björn Þór; Hoover, Amy K.; Risi, Sebastian
While the success of electronic music often relies on the uniqueness and quality of selected timbres, many musicians struggle with complicated and expensive equipment and techniques to create their desired sounds. Instead, this paper presents a technique for producing novel timbres that are evolved...... the space of potential sounds that can be generated through such compositional sound synthesis networks (CSSNs). To study the effect of evolution on subjective appreciation, participants in a listener study ranked evolved timbres by personal preference, resulting in preferences skewed toward the first......
Larremore, Daniel B.; Clauset, Aaron; Buckee, Caroline O.
The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences. PMID:24130474
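The network construction step described above (each sequence is a node, and two nodes are linked if they share an exact match of significant length) can be sketched as follows. Two sequences share an exact match of at least `min_len` characters exactly when they have a substring of length `min_len` in common, so comparing sets of fixed-length substrings is sufficient. The threshold value and the toy sequences are hypothetical, and the community-detection step with a stochastic block model is not shown.

```python
from itertools import combinations


def substrings_of_length(seq, k):
    """All distinct length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_match_network(sequences, min_len):
    """Link two sequences if they share an exact match of at least min_len characters.

    sequences : dict mapping sequence name -> sequence string for one region
    min_len   : match length treated as significant (an assumed threshold)
    """
    index = {name: substrings_of_length(seq, min_len) for name, seq in sequences.items()}
    edges = set()
    for a, b in combinations(sorted(sequences), 2):
        if index[a] & index[b]:          # at least one common substring
            edges.add((a, b))
    return edges


# Toy amino-acid fragments: s1 and s2 share the block "DEFGH".
toy_hvr = {
    "s1": "ACDEFGHIK",
    "s2": "WWDEFGHYY",
    "s3": "MNPQRSTVW",
}
edges = shared_match_network(toy_hvr, min_len=5)
```

Raising `min_len` to 6 removes the only link in this toy example, which shows how strongly the resulting network depends on the chosen threshold.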
Fung, Elizabeth-sharon [Los Alamos National Laboratory]
Choice of a T-lymphoid fate by hematopoietic progenitor cells depends on sustained Notch-Delta signaling combined with tightly-regulated activities of multiple transcription factors. To dissect the regulatory network connections that mediate this process, we have used high-resolution analysis of regulatory gene expression trajectories from the beginning to the end of specification; tests of the short-term Notch-dependence of these gene expression changes; and perturbation analyses of the effects of overexpression of two essential transcription factors, namely PU.1 and GATA-3. Quantitative expression measurements of >50 transcription factor and marker genes have been used to derive the principal components of regulatory change through which T-cell precursors progress from primitive multipotency to T-lineage commitment. Distinct parts of the path reveal separate contributions of Notch signaling, GATA-3 activity, and downregulation of PU.1. Using BioTapestry, the results have been assembled into a draft gene regulatory network for the specification of T-cell precursors and the choice of T as opposed to myeloid dendritic or mast-cell fates. This network also accommodates effects of E proteins and mutual repression circuits of Gfi1 against Egr-2 and of TCF-1 against PU.1 as proposed elsewhere, but requires additional functions that remain unidentified. Distinctive features of this network structure include the intense dose-dependence of GATA-3 effects; the gene-specific modulation of PU.1 activity based on Notch activity; the lack of direct opposition between PU.1 and GATA-3; and the need for a distinct, late-acting repressive function or functions to extinguish stem and progenitor-derived regulatory gene expression.
A substantial proportion of protein interactions relies on small domains binding to short peptides in the partner proteins. Many of these interactions are relatively low affinity and transient, and they impact on signal transduction. However, neither the number of potential interactions mediated by each domain nor the degree of promiscuity at a whole proteome level has been investigated. We have used a combination of phage display and SPOT synthesis to discover all the peptides in the yeast proteome that have the potential to bind to eight SH3 domains. We first identified the peptides that match a relaxed consensus, as deduced from peptides selected by phage display experiments. Next, we synthesized all the matching peptides at high density on a cellulose membrane, and we probed them directly with the SH3 domains. The domains that we have studied were grouped by this approach into five classes with partially overlapping specificity. Within the classes, however, the domains display a high promiscuity and bind to a large number of common targets with comparable affinity. We estimate that the yeast proteome contains as few as six peptides that bind to the Abp1 SH3 domain with a dissociation constant lower than 100 microM, while it contains as many as 50-80 peptides with corresponding affinity for the SH3 domain of Yfr024c. All the targets of the Abp1 SH3 domain, identified by this approach, bind to the native protein in vivo, as shown by coimmunoprecipitation experiments. Finally, we demonstrate that this strategy can be extended to the analysis of the entire human proteome. We have developed an approach, named WISE (whole interactome scanning experiment), that permits rapid and reliable identification of the partners of any peptide recognition module by peptide scanning of a proteome. Since the SPOT synthesis approach is semiquantitative and provides an approximation of the dissociation constants of the several thousands of interactions that are
Background: A wide range of techniques is now available for analyzing regulatory networks. Nonetheless, most of these techniques fail to interpret large-scale transcriptional data at the post-translational level. Results: We address the question of using large-scale transcriptomic observation of a system perturbation to analyze a regulatory network which contained several types of interactions - transcriptional and post-translational. Our method consisted of post-processing the outputs of an open-source tool named BioQuali - an automatic constraint-based analysis mimicking biologist's local reasoning on a large scale. The post-processing relied on differences in the behavior of the transcriptional and post-translational levels in the network. As a case study, we analyzed a network representation of the genes and proteins controlled by an oncogene in the context of Ewing's sarcoma. The analysis allowed us to pinpoint active interactions specific to this cancer. We also identified the parts of the network which were incomplete and should be submitted for further investigation. Conclusions: The proposed approach is effective for the qualitative analysis of cancer networks. It allows the integrative use of experimental data of various types in order to identify the specific information that should be considered a priority in the initial - and possibly very large - experimental dataset. Iteratively, new datasets can be introduced into the analysis to improve the network representation and make it more specific.
Marine organisms possess a series of cellular strategies to counteract the negative effects of toxic compounds, including the massive reorganization of gene expression networks. Here we report the modulated dose-dependent response of genes activated by diatom polyunsaturated aldehydes (PUAs) in the sea urchin Paracentrotus lividus. PUAs are secondary metabolites derived from the oxidation of fatty acids; they have anti-cancer activity and induce deleterious effects on the reproduction and development of planktonic and benthic organisms that feed on these unicellular algae. Our previous results showed that PUAs target several genes implicated in different functional processes in this sea urchin. Using interactomic Ingenuity Pathway Analysis we now show that the genes targeted by PUAs are correlated with four HUB genes, NF-κB, p53, δ-2-catenin and HIF1A, which have not been previously reported for P. lividus. We propose a working model describing hypothetical pathways potentially involved in toxic aldehyde stress response in sea urchins. This represents the first report on gene networks affected by PUAs, opening new perspectives in understanding the cellular mechanisms underlying the response of benthic organisms to diatom exposure.
Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun
The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting "disease map" network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.
Merritt, Sears; Jacobs, Abigail Z.; Mason, Winter; Clauset, Aaron
In many complex social systems, the timing and frequency of interactions between individuals are observable but friendship ties are hidden. Recovering these hidden ties, particularly for casual users who are relatively less active, would enable a wide variety of friendship-aware applications in domains where labeled data are often unavailable, including online advertising and national security. Here, we investigate the accuracy of multiple statistical features, based either purely on temporal...
Ribarska, Teodora; Goering, Wolfgang; Droop, Johanna; Bastian, Klaus-Marius; Ingenwerth, Marc; Schulz, Wolfgang A
Multiple epigenetic alterations contribute to prostate cancer progression by deregulating gene expression. Epigenetic mechanisms, especially differential DNA methylation at imprinting control regions (termed DMRs), normally ensure the exclusive expression of imprinted genes from one specific parental allele. We therefore wondered to which extent imprinted genes become deregulated in prostate cancer and, if so, whether deregulation is due to altered DNA methylation at DMRs. Therefore, we selected presumptive deregulated imprinted genes from a previously conducted in silico analysis and from the literature and analyzed their expression in prostate cancer tissues by qRT-PCR. We found significantly diminished expression of PLAGL1/ZAC1, MEG3, NDN, CDKN1C, IGF2, and H19, while LIT1 was significantly overexpressed. The PPP1R9A gene, which is imprinted in selected tissues only, was strongly overexpressed, but was expressed biallelically in benign and cancerous prostatic tissues. Expression of many of these genes was strongly correlated, suggesting co-regulation, as in an imprinted gene network (IGN) reported in mice. Deregulation of the network genes also correlated with EZH2 and HOXC6 overexpression. Pyrosequencing analysis of all relevant DMRs revealed generally stable DNA methylation between benign and cancerous prostatic tissues, but frequent hypo- and hyper-methylation was observed at the H19 DMR in both benign and cancerous tissues. Re-expression of the ZAC1 transcription factor induced H19, CDKN1C and IGF2, supporting its function as a nodal regulator of the IGN. Our results indicate that a group of imprinted genes are coordinately deregulated in prostate cancers, independently of DNA methylation changes.
Kohane Isaac S
Background: Biological processes are carried out by coordinated modules of interacting molecules. As clustering methods demonstrate that genes with similar expression display increased likelihood of being associated with a common functional module, networks of coexpressed genes provide one framework for assigning gene function. This has informed the guilt-by-association (GBA) heuristic, widely invoked in functional genomics. Yet although the idea of GBA is accepted, the breadth of GBA applicability is uncertain. Results: We developed methods to systematically explore the breadth of GBA across a large and varied corpus of expression data to answer the following question: To what extent is the GBA heuristic broadly applicable to the transcriptome and, conversely, how broadly is GBA captured by a priori knowledge represented in the Gene Ontology (GO)? Our study provides an investigation of the functional organization of five coexpression networks using data from three mammalian organisms. Our method calculates a probabilistic score between each gene and each Gene Ontology category that reflects coexpression enrichment of a GO module. For each GO category we use Receiver Operating Curves to assess whether these probabilistic scores reflect GBA. This methodology applied to five different coexpression networks demonstrates that the signature of guilt-by-association is ubiquitous and reproducible and that the GBA heuristic is broadly applicable across the population of nine hundred Gene Ontology categories. We also demonstrate the existence of highly reproducible patterns of coexpression between some pairs of GO categories. Conclusion: We conclude that GBA has universal value and that transcriptional control may be more modular than previously realized. Our analyses also suggest that methodologies combining coexpression measurements across multiple genes in a biologically-defined module can aid in characterizing gene function or in characterizing
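The ROC-based assessment described in the abstract above can be illustrated with a minimal sketch. It assumes that each gene has one score for a given GO category and a binary label saying whether the gene is annotated to that category; the function name `roc_auc` and the toy scores are hypothetical and are not taken from the study. The area under the ROC curve is computed here through its rank interpretation: the fraction of (annotated, unannotated) gene pairs in which the annotated gene scores higher, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    scores: per-gene scores for one category (hypothetical values).
    labels: truthy if the gene is annotated to the category, falsy otherwise.
    """
    positives = [s for s, is_member in zip(scores, labels) if is_member]
    negatives = [s for s, is_member in zip(scores, labels) if not is_member]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two annotated genes, two unannotated genes.
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC near 0.5 would indicate that the scores carry no information about category membership, while values near 1 would be consistent with guilt-by-association for that category. The quadratic pairwise loop is chosen for readability and is only suitable for small inputs.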
França Gustavo S
Background: Physical protein-protein interaction (PPI) is a critical phenomenon for the function of most proteins in living organisms and a significant fraction of PPIs are the result of domain-domain interactions. Exon shuffling, intron-mediated recombination of exons from existing genes, is known to have been a major mechanism of domain shuffling in metazoans. Thus, we hypothesized that exon shuffling could have a significant influence in shaping the topology of PPI networks. Results: We tested our hypothesis by compiling exon shuffling and PPI data from six eukaryotic species: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Cryptococcus neoformans and Arabidopsis thaliana. For all four metazoan species, genes enriched in exon shuffling events presented on average higher vertex degree (number of interacting partners) in PPI networks. Furthermore, we verified that a set of protein domains that are simultaneously promiscuous (known to interact with multiple types of other domains), self-interacting (able to interact with another copy of themselves) and abundant in the genomes presents a stronger signal for exon shuffling. Conclusions: Exon shuffling appears to have been a recurrent mechanism for the emergence of new PPIs along metazoan evolution. In metazoan genomes, exon shuffling also promoted the expansion of some protein domains. We speculate that their promiscuous and self-interacting properties may have been decisive for that expansion.
Wang, Tao; Ho, Gloria; Ye, Kenny; Strickler, Howard; Elston, Robert C.
Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense SNPs in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle a large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey’s 1-df model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women’s Health Initiative (WHI), this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with BMI. PMID:18615621
Newman, M E J; Ferrario, Carrie R
The spread of certain diseases can be promoted, in some cases substantially, by prior infection with another disease. One example is that of HIV, whose immunosuppressant effects significantly increase the chances of infection with other pathogens. Such coinfection processes, when combined with nontrivial structure in the contact networks over which diseases spread, can lead to complex patterns of epidemiological behavior. Here we consider a mathematical model of two diseases spreading through a single population, where infection with one disease is dependent on prior infection with the other. We solve exactly for the sizes of the outbreaks of both diseases in the limit of large population size, along with the complete phase diagram of the system. Among other things, we use our model to demonstrate how diseases can be controlled not only by reducing the rate of their spread, but also by reducing the spread of other infections upon which they depend.
The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges in exploiting its biological meaning. When searching for coexpressed genes, the data mining process is largely affected by the selection of algorithms. Thus, it is highly desirable to provide multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models, including the mean, the regression, the delegate, and the ensemble models, to identify coexpressed genes, and enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. Subsequently, the utility of this analytical toolkit is demonstrated by analyzing two sets of real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows for choosing multiple algorithms for analyzing gene expression signatures.
Kandhro, Abdul H.; Shoombuatong, Watshara; Nantasenamat, Chanin; Prachayasittikul, Virapong; Nuchnoi, Pornlada
Background: Dyslipidemia is one of the major forms of lipid disorder, characterized by increased triglycerides (TGs), increased low-density lipoprotein-cholesterol (LDL-C), and decreased high-density lipoprotein-cholesterol (HDL-C) levels in blood. Recently, microRNAs (miRNAs) have been reported to be involved in various biological processes and to have potential use as biomarkers and in the diagnosis of various diseases. Computational approaches including text mining have been used recently to analyze abstracts from public databases to observe the relationships/associations between biological molecules, miRNAs, and disease phenotypes. Materials and Methods: In the present study, the significance of text-mined pair associations (miRNA-lipid disease) was estimated by a one-sided Fisher's exact test. The top 20 significant miRNA-disease associations were visualized in Cytoscape. The CyTargetLinker plug-in for Cytoscape was used to extend the network and predict new miRNA target genes. The Biological Networks Gene Ontology (BiNGO) plug-in for Cytoscape was used to retrieve gene ontology (GO) annotations for the targeted genes. Results: We retrieved 227 miRNA-lipid disease associations including 148 miRNAs. Analysis of the top 20 significant miRNAs with CyTargetLinker provided defined, predicted and validated gene targets; further analysis of the targeted genes with BiNGO showed that they were significantly associated with lipid, cholesterol, apolipoprotein, and fatty acid GO terms. Conclusion: We are the first to provide a reliable miRNA-lipid disease association network based on text mining. This could help future experimental studies that aim to validate predicted gene targets. PMID:29018475
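The one-sided Fisher's exact test mentioned in the abstract above can be sketched as follows. The sketch assumes a 2x2 table of abstract counts (rows: miRNA mentioned or not; columns: disease mentioned or not) and tests for over-representation of co-mentions. The function name and the counts are hypothetical, and the study's actual table construction is not given in the abstract. The p-value is the upper tail of the hypergeometric distribution with all margins fixed.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (alternative: over-representation).

    Table layout (hypothetical counts of abstracts):
                     disease   no disease
        miRNA           a          b
        no miRNA        c          d
    Returns P(X >= a) for the hypergeometric null with fixed margins.
    """
    row1 = a + b          # abstracts mentioning the miRNA
    col1 = a + c          # abstracts mentioning the disease
    total = a + b + c + d
    denominator = comb(total, col1)
    largest = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, largest + 1)
    )
    return numerator / denominator


# Hypothetical counts: 3 co-mentions out of 8 abstracts.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

Exact integer binomial coefficients are used so that the only rounding happens in the final division; for large corpora a statistics library would normally be preferred.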
Mielniczuk, Jan; Teisseyre, Paweł
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures. © 2017 WILEY PERIODICALS, INC.
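For readers unfamiliar with interaction information, a minimal sketch may help. It assumes one commonly used definition, II(X1; X2; Y) = I((X1, X2); Y) - I(X1; Y) - I(X2; Y), estimated by plugging in empirical frequencies; sign conventions differ between authors, and the abstract above does not state which one the paper adopts. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable values."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_information(x1, x2, y):
    """II(X1; X2; Y) = I((X1, X2); Y) - I(X1; Y) - I(X2; Y) (assumed sign)."""
    joint = list(zip(x1, x2))
    return (
        mutual_information(joint, y)
        - mutual_information(x1, y)
        - mutual_information(x2, y)
    )


# Toy data: the phenotype is the XOR of two binary loci, a pure interaction
# in which neither locus is informative on its own.
locus1 = [0, 0, 1, 1]
locus2 = [0, 1, 0, 1]
phenotype = [a ^ b for a, b in zip(locus1, locus2)]
ii_xor = interaction_information(locus1, locus2, phenotype)
```

In the XOR example each locus alone has zero mutual information with the phenotype while the pair determines it completely, so the measure is positive; when the phenotype simply copies one locus, the measure is zero under this definition.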
Simonyan, Kristina; Fuertinger, Stefan
Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. Copyright © 2015 the American Physiological Society.
Erokhin, Maksim; Davydova, Anna; Kyrchanova, Olga; Parshikov, Alexander; Georgiev, Pavel; Chetverina, Darya
Chromatin insulators are regulatory elements involved in the modulation of enhancer-promoter communication. The 1A2 and Wari insulators are located immediately downstream of the Drosophila yellow and white genes, respectively. Using an assay based on the yeast GAL4 activator, we have found that both insulators are able to interact with their target promoters in transgenic lines, forming gene loops. The existence of an insulator-promoter loop is confirmed by the fact that insulator proteins could be detected on the promoter only in the presence of an insulator in the transgene. The upstream promoter regions, which are required for long-distance stimulation by enhancers, are not essential for promoter-insulator interactions. Both insulators support basal activity of the yellow and white promoters in eyes. Thus, the ability of insulators to interact with promoters might play an important role in the regulation of basal gene transcription.
Garg, Abhishek; Mohanram, Kartik; Di Cara, Alessandro; De Micheli, Giovanni; Xenarios, Ioannis
Understanding gene regulation in biological processes and modeling the robustness of underlying regulatory networks is an important problem that is currently being addressed by computational systems biologists. Lately, there has been a renewed interest in Boolean modeling techniques for gene regulatory networks (GRNs). However, due to their deterministic nature, it is often difficult to identify whether these modeling approaches are robust to the addition of stochastic noise that is widespread in gene regulatory processes. Stochasticity in Boolean models of GRNs has been addressed relatively sparingly in the past, mainly by flipping the expression of genes between different expression levels with a predefined probability. This stochasticity in nodes (SIN) model leads to overrepresentation of noise in GRNs and hence non-correspondence with biological observations. In this article, we introduce the stochasticity in functions (SIF) model for simulating stochasticity in Boolean models of GRNs. By providing biological motivation behind the use of the SIF model and applying it to the T-helper and T-cell activation networks, we show that the SIF model provides more biologically robust results than the existing SIN model of stochasticity in GRNs. Algorithms are made available under our Boolean modeling toolbox, GenYsis. The software binaries can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
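The stochasticity in nodes (SIN) scheme described in the abstract above, in which gene states are flipped with a predefined probability, can be sketched as follows. The three-gene network, its update rules and the function name `step_sin` are hypothetical illustrations and do not come from GenYsis; a synchronous update scheme is also an assumption. The SIF model is not specified in the abstract, so it is not sketched here.

```python
import random


def step_sin(state, rules, flip_prob, rng):
    """One synchronous Boolean update followed by SIN-style noise.

    state: dict mapping gene name to bool.
    rules: dict mapping gene name to a function of the current state.
    flip_prob: probability of flipping each node after the update.
    rng: a random.Random instance, passed in for reproducibility.
    """
    # Deterministic synchronous update: every rule reads the old state.
    updated = {gene: bool(rule(state)) for gene, rule in rules.items()}
    # SIN noise: each node is flipped independently with probability flip_prob.
    return {
        gene: (not value) if rng.random() < flip_prob else value
        for gene, value in updated.items()
    }


# Hypothetical toy network: C represses A, A activates B, A and B activate C.
toy_rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}
start = {"A": True, "B": False, "C": False}

rng = random.Random(0)
noiseless = step_sin(start, toy_rules, 0.0, rng)
all_flipped = step_sin(start, toy_rules, 1.0, rng)
```

Setting `flip_prob` to 0 recovers the deterministic Boolean model. Because every node is perturbed independently at every step regardless of its regulatory function, even moderate flip probabilities inject noise everywhere in the network, which is in line with the overrepresentation of noise that the abstract attributes to the SIN model.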
Soberano de Oliveira, Ana Paula; Patil, Kiran Raosaheb; Nielsen, Jens
Background: Uncovering the operating principles underlying cellular processes by using 'omics' data is often a difficult task due to the high-dimensionality of the solution space that spans all interactions among the bio-molecules under consideration. A rational way to overcome this problem is to use the topology of bio-molecular interaction networks in order to constrain the solution space. Such approaches systematically integrate the existing biological knowledge with the 'omics' data. Results: Here we introduce a hypothesis-driven method that integrates bio-molecular network topology with transcriptome data, thereby allowing the identification of key biological features (Reporter Features) around which transcriptional changes are significantly concentrated. We have combined transcriptome data with different biological networks in order to identify Reporter Gene Ontologies, Reporter Transcription...
Champagne, N J; Sharpe, R M; Rockway, J W
The EIGER (Electromagnetic Interactions Generalized) modeling suite is a joint development activity by the Lawrence Livermore National Lab, Sandia National Labs, the University of Houston, and the Navy (Space and Naval Warfare Systems Center-San Diego). The effort endeavors to bring the next generation of hybrid, higher-order, full-wave analysis methods into a single integrated framework. The tools are based upon frequency-domain solutions of Maxwell's equations to model scattering and radiation from complex 2D and 3D structures. The framework employs boundary element solutions of integral equation formulations and finite element solutions of the Helmholtz wave equation. A goal is to use higher-order representations to model both the geometry (using higher-order geometric elements) and numerical methods (using higher-order vector basis functions). In addition, a variety of advanced Green's functions and symmetry operators can be applied to efficiently treat geometries containing such features as layered material regions and periodic structures. Each of these methods can be brought to bear simultaneously, on different portions of a complex structure. HPC implementation issues were addressed during the design of the software architecture, so that the same package runs on platforms ranging from serial desktop workstations through advanced HPC architectures. Our current efforts on higher-order modeling and improved solver libraries will be highlighted
Mirzarezaee, Mitra; Sadeghi, Mehdi; Araabi, Babak N
Proteins interact with each other to perform the essential functions of an organism. They change partners to get involved in various processes at different times or locations. Studying variations of protein interactions within a specific process would help to better understand the dynamic features of protein interactions and their functions. We studied the protein interaction network of Saccharomyces cerevisiae (yeast) during the brewing of Japanese sake. In this process, yeast cells are exposed to several stresses. Analysis of the protein interaction networks of yeast during this process helps to understand how the protein interactions of yeast change during sake brewing. We used gene expression profiles of yeast cells for this purpose. Results of our experiments revealed some characteristics and behaviors of yeast hubs and non-hubs and their dynamical changes during the brewing process. We found that just a small portion of the proteins (12.8 to 21.6%) is responsible for the functional changes of the proteins in the sake brewing process. The number of edges and hubs in the yeast protein interaction networks increases in the first stages of the process and then decreases in the final stages.
Nederhof, E; Bouma, Esther; Riese, Harriette; Laceulle, Odilia; Ormel, J.; Oldehinkel, A.J.
The purpose was to study how functional polymorphisms in the brain derived neurotrophic factor gene (BDNF val66met) and the serotonin transporter gene linked promotor region (5-HTTLPR) interact with childhood adversities in predicting Effortful Control. Effortful Control refers to the ability to
Severe mental illness is a broad category that includes schizophrenia, bipolar disorder and severe depression. Both genetic disposition and environmental exposures play important roles in the development of severe mental illness. Multiple lines of evidence suggest that the roles of genetic and environmental factors depend on each other. Gene-environment interactions may underlie the paradox of strong environmental factors for highly heritable disorders, the low estimates of shared environmental influences in twin studies of severe mental illness and the heritability gap between twin and molecular heritability estimates. Sons and daughters of parents with severe mental illness are more vulnerable to the effects of prenatal and postnatal environmental exposures, suggesting that the expression of genetic liability depends on environment. In the last decade, gene-environment interactions involving specific molecular variants in candidate genes have been identified. Replicated findings include an interaction between a polymorphism in the AKT1 gene and cannabis use in the development of psychosis and an interaction between the length polymorphism of the serotonin transporter gene and childhood maltreatment in the development of persistent depressive disorder. Bipolar disorder has been underinvestigated, with only a single study showing an interaction between a functional polymorphism in BDNF and stressful life events triggering bipolar depressive episodes. The first systematic search for gene-environment interactions has found that a polymorphism in CTNNA3 may sensitise the developing brain to the pathogenic effect of cytomegalovirus in utero, leading to schizophrenia in adulthood. Strategies for genome-wide investigations will likely include coordination between epidemiological and genetic research efforts, systematic assessment of multiple environmental factors in large samples, and prioritization of genetic variants.
Combinatorial gene perturbations provide rich information for a systematic exploration of genetic interactions. Despite successful applications to bacteria and yeast, the scalability of this approach remains a major challenge for higher organisms such as humans. Here, we report a novel experimental and computational framework to efficiently address this challenge by limiting the 'search space' for important genetic interactions. We propose to integrate rich phenotypes of multiple single gene perturbations to robustly predict functional modules, which can subsequently be subjected to further experimental investigations such as combinatorial gene silencing. We present posterior association networks (PANs) to predict functional interactions between genes estimated using a Bayesian mixture modelling approach. The major advantage of this approach over conventional hypothesis tests is that prior knowledge can be incorporated to enhance predictive power. We demonstrate in a simulation study and on biological data that integrating complementary information greatly improves prediction accuracy. To search for significant modules, we perform hierarchical clustering with multiscale bootstrap resampling. We demonstrate the power of the proposed methodologies in applications to Ewing's sarcoma and human adult stem cells using publicly available and custom generated data, respectively. In the former application, we identify a gene module including many confirmed and highly promising therapeutic targets. Genes in the module are also significantly overrepresented in signalling pathways that are known to be critical for proliferation of Ewing's sarcoma cells. In the latter application, we predict a functional network of chromatin factors controlling epidermal stem cell fate. Further examinations using ChIP-seq, ChIP-qPCR and RT-qPCR reveal that the basis of their genetic interactions may arise from transcriptional cross regulation. A Bioconductor package
Munsky, Brian; Trinh, Brooke; Khammash, Mustafa
The cellular environment is abuzz with noise originating from the inherent random motion of reacting molecules in the living cell. In this noisy environment, clonal cell populations exhibit cell-to-cell variability that can manifest as significant phenotypic differences. Noise-induced stochastic fluctuations in cellular constituents can be measured and their statistics quantified using flow cytometry, single molecule fluorescence in situ hybridization, time lapse fluorescence microscopy and other single cell and single molecule measurement techniques. We show that these random fluctuations carry within them valuable information about the underlying genetic network. Far from being a nuisance, the ever-present cellular noise acts as a rich source of excitation that, when processed through a gene network, carries its distinctive fingerprint that encodes a wealth of information about that network. We demonstrate that in some cases the analysis of these random fluctuations enables the full identification of network parameters, including those that may otherwise be difficult to measure. We use theoretical investigations to establish experimental guidelines for the identification of gene regulatory networks, and we apply these guidelines to experimentally identify predictive models for different regulatory mechanisms in bacteria and yeast.
Mohamed Salleh, Faridah Hani; Arif, Shereena Mohd; Zainudin, Suhaila; Firdaus-Raih, Mohd
A gene regulatory network (GRN) is a large and complex network consisting of interacting elements that, over time, affect each other's state. The dynamics of complex gene regulatory processes are difficult to understand using intuitive approaches alone. To overcome this problem, we propose an algorithm for inferring the regulatory interactions from knock-out data using a Gaussian model combined with the Pearson Correlation Coefficient (PCC). There are several problems relating to GRN construction that have been outlined in this paper. We demonstrated the ability of our proposed method to predict (1) the presence of regulatory interactions between genes, (2) their directionality and (3) their states (activation or suppression). The algorithm was applied to network sizes of 10 and 50 genes from DREAM3 datasets and network sizes of 10 from DREAM4 datasets. The predicted networks were evaluated based on AUROC and AUPR. We discovered that high false positive values were generated by our GRN prediction methods because indirect regulations were wrongly predicted as true relationships. We achieved satisfactory results as the majority of sub-networks achieved AUROC values above 0.5. Copyright © 2015 Elsevier Ltd. All rights reserved.
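As a rough illustration of the correlation step only, the sketch below scores gene pairs by PCC on a toy expression table and labels each retained pair by the sign of the correlation. It is not the authors' algorithm: the Gaussian model and the directionality inference are omitted, and the names `pearson` and `candidate_edges`, the 0.8 threshold and the toy data are all assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)

def candidate_edges(expr, threshold=0.8):
    """expr maps gene name -> expression values across experiments.
    Returns (gene_a, gene_b, state) for pairs with |PCC| >= threshold.
    The threshold is an illustrative assumption, not a published value."""
    genes = sorted(expr)
    edges = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            r = pearson(expr[a], expr[b])
            if abs(r) >= threshold:
                state = "activation" if r > 0 else "suppression"
                edges.append((a, b, state))
    return edges

# Toy expression profiles (hypothetical data, four experiments per gene).
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 1.0, 3.0],
}
edges = candidate_edges(toy)
for edge in edges:
    print(edge)
```

A pair such as g2 and g3 would be reported here even if both genes merely tracked g1. A correlation threshold alone cannot separate direct from indirect regulation, which is consistent with the false positives the abstract reports.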
Brian W Kunkle
In this study we have identified key genes that are critical in development of astrocytic tumors. Meta-analysis of microarray studies which compared normal tissue to astrocytoma revealed a set of 646 differentially expressed genes in the majority of astrocytoma. Reverse engineering of these 646 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grade I-IV), and 'key genes' within each grade were identified. Genes found to be most influential to development of the highest grade of astrocytoma, Glioblastoma multiforme, were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. All of these genes were up-regulated, except MPP2 (down-regulated). These 10 genes were able to predict tumor status with 96-100% confidence when using logistic regression, cross validation, and the support vector machine analysis. Markov genes interact with NFkβ, ERK, MAPK, VEGF, growth hormone and collagen to produce a network whose top biological functions are cancer, neurological disease, and cellular movement. Three of the 10 genes - EGFR, COL4A1, and CDK4 - in particular seemed to be potential 'hubs of activity'. Modified expression of these 10 Markov Blanket genes increases lifetime risk of developing glioblastoma compared to the normal population. The glioblastoma risk estimates were dramatically increased with joint effects of 4 or more than 4 Markov Blanket genes. Joint interaction effects of 4, 5, 6, 7, 8, 9 or 10 Markov Blanket genes produced 9, 13, 20.9, 26.7, 52.8, 53.2, 78.1 or 85.9%, respectively, increase in lifetime risk of developing glioblastoma compared to normal population. In summary, it appears that modified expression of several 'key genes' may be required for the development of glioblastoma. Further studies are needed to validate these 'key genes' as useful tools for early detection and novel therapeutic options for these tumors.
Tuqyah Abdullah Al Qazlan
To address one of the most challenging issues at the cellular level, this paper surveys the fuzzy methods used in gene regulatory network (GRN) inference. GRNs represent causal relationships between genes that have a direct influence, through protein production, on the life and the development of living organisms and provide a useful contribution to the understanding of the cellular functions as well as the mechanisms of diseases. Fuzzy systems are based on handling imprecise knowledge, such as biological information. They provide viable computational tools for inferring GRNs from gene expression data, thus contributing to the discovery of gene interactions responsible for specific diseases and/or ad hoc correcting therapies. Increasing computational power and high throughput technologies have provided powerful means to manage these challenging digital ecosystems at different levels from cell to society globally. The main aim of this paper is to report, present, and discuss the main contributions of this multidisciplinary field in a coherent and structured framework.
Araújo, Daniela; Henriques, Mariana; Silva, Sónia
Most cases of candidiasis have been attributed to Candida albicans, but Candida glabrata, Candida parapsilosis and Candida tropicalis, designated as non-C. albicans Candida (NCAC), have been identified as frequent human pathogens. Moreover, Candida biofilms are an escalating clinical problem associated with significant rates of mortality. Biofilms have distinct developmental phases, including adhesion/colonisation, maturation and dispersal, controlled by complex regulatory networks. This review discusses recent advances regarding Candida species biofilm regulatory network genes, which are key components for candidiasis. Copyright © 2016 Elsevier Ltd. All rights reserved.
Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence
Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a
Flowers, Jonathan M; Hanzawa, Yoshie; Hall, Megan C; Moore, Richard C; Purugganan, Michael D
The time to flowering is a key component of the life-history strategy of the model plant Arabidopsis thaliana that varies quantitatively among genotypes. A significant problem for evolutionary and ecological genetics is to understand how natural selection may operate on this ecologically significant trait. Here, we conduct a population genomic study of resequencing data from 52 genes in the flowering time network. McDonald-Kreitman tests of neutrality suggested a strong excess of amino acid polymorphism when pooling across loci. This excess of replacement polymorphism across the flowering time network and a skewed derived frequency spectrum toward rare alleles for both replacement and noncoding polymorphisms relative to synonymous changes is consistent with a large class of deleterious polymorphisms segregating in these genes. Assuming selective neutrality of synonymous changes, we estimate that approximately 30% of amino acid polymorphisms are deleterious. Evidence of adaptive substitution is less prominent in our analysis. The photoperiod regulatory gene, CO, and a gibberellic acid transcription factor, AtMYB33, show evidence of adaptive fixation of amino acid mutations. A test for extended haplotypes revealed no examples of flowering time alleles with haplotypes comparable in length to those associated with the null fri(Col) allele reported previously. This suggests that the FRI gene likely has a uniquely intense or recent history of selection among the flowering time genes considered here. Although there is some evidence for adaptive evolution in these life-history genes, it appears that slightly deleterious polymorphisms are a major component of natural molecular variation in the flowering time network of A. thaliana.
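For readers unfamiliar with the McDonald-Kreitman test, the sketch below computes the neutrality index from the 2×2 table of polymorphic and fixed sites. A value above 1 indicates an excess of replacement polymorphism relative to divergence, the pattern this abstract describes. The counts are invented for illustration and are not taken from the study, and `neutrality_index` is a hypothetical helper name.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds).

    pn, ps: nonsynonymous / synonymous polymorphisms within the species
    dn, ds: nonsynonymous / synonymous fixed differences between species
    """
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("NI is undefined when Ps, Dn or Ds is zero")
    return (pn / ps) / (dn / ds)

# Hypothetical counts pooled across loci (not data from the paper).
ni = neutrality_index(pn=30, ps=20, dn=10, ds=20)
print(ni)
```

In practice the significance of the 2×2 table would also be assessed, for example with Fisher's exact test; that step is left out here to keep the sketch free of dependencies.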
ABSTRACT: BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.
Lee, Meng-Huang; Chang, Shin-Hung
Current CATV system architectures provide one-way delivery of a common menu of entertainment to all homes through the cable network. As the technology evolves, interactive (two-way) services can be provided in cable TV systems. They can supply customers with individualized programming and support real-time two-way communications. As the service type changes from one-way delivery systems to two-way interactive systems, 'on demand services' become a distinct feature of multimedia systems. In this paper, we present our work on building an integrated multimedia system on an interactive CATV network at Shih Chien University. Besides providing the traditional analog TV programming from the cable operator, we filter some channels and reserve them as our campus information channels. In addition to the analog broadcasting channels, the system also provides interactive digital multimedia services, e.g. Video-On-Demand (VOD), Virtual Reality, BBS, World-Wide-Web, and Internet Radio Station. These two kinds of services are integrated in a CATV network by allocating separate frequency bands to the analog broadcasting service and the digital interactive services. Our ongoing work is to port our previous VOD system, which conforms to the DAVIC standard (for interoperability) and runs on an Ethernet network, into the current system.
Abstract Background Various computational models have been of interest due to their use in the modelling of gene regulatory networks (GRNs). As a logical model, probabilistic Boolean networks (PBNs) consider molecular and genetic noise, so the study of PBNs provides significant insights into the understanding of the dynamics of GRNs. This will ultimately lead to advances in developing therapeutic methods that intervene in the process of disease development and progression. The applications of PBNs, however, are hindered by the complexities involved in the computation of the state transition matrix and the steady-state distribution of a PBN. For a PBN with n genes and N Boolean networks, the complexity to compute the state transition matrix is $O(nN2^{2n})$, or $O(nN2^{n})$ for a sparse matrix. Results This paper presents a novel implementation of PBNs based on the notions of stochastic logic and stochastic computation. This stochastic implementation of a PBN is referred to as a stochastic Boolean network (SBN). An SBN provides an accurate and efficient simulation of a PBN without and with random gene perturbation. The state transition matrix is computed in an SBN with a complexity of $O(nL2^{n})$, where L is a factor related to the stochastic sequence length. Since the minimum sequence length required for obtaining an evaluation accuracy approximately increases in a polynomial order with the number of genes, n, and the number of Boolean networks, N, usually increases exponentially with n, L is typically smaller than N, especially in a network with a large number of genes. Hence, the computational efficiency of an SBN is primarily limited by the number of genes, but not directly by the total possible number of Boolean networks. Furthermore, a time-frame expanded SBN enables an efficient analysis of the steady-state distribution of a PBN. These findings are supported by the simulation results of a simplified p53 network, several randomly generated networks and a
Gill, Joel C.; Malamud, Bruce D.
This paper combines research and commentary to reinforce the importance of integrating hazard interactions and interaction networks (cascades) into multi-hazard methodologies. We present a synthesis of the differences between multi-layer single-hazard approaches and multi-hazard approaches that integrate such interactions. This synthesis suggests that ignoring interactions between important environmental and anthropogenic processes could distort management priorities, increase vulnerability to other spatially relevant hazards or underestimate disaster risk. In this paper we proceed to present an enhanced multi-hazard framework through the following steps: (i) description and definition of three groups (natural hazards, anthropogenic processes and technological hazards/disasters) as relevant components of a multi-hazard environment, (ii) outlining of three types of interaction relationship (triggering, increased probability, and catalysis/impedance), and (iii) assessment of the importance of networks of interactions (cascades) through case study examples (based on the literature, field observations and semi-structured interviews). We further propose two visualisation frameworks to represent these networks of interactions: hazard interaction matrices and hazard/process flow diagrams. Our approach reinforces the importance of integrating interactions between different aspects of the Earth system, together with human activity, into enhanced multi-hazard methodologies. Multi-hazard approaches support the holistic assessment of hazard potential and consequently disaster risk. We conclude by describing three ways by which understanding networks of interactions contributes to the theoretical and practical understanding of hazards, disaster risk reduction and Earth system management. Understanding interactions and interaction networks helps us to better (i) model the observed reality of disaster events, (ii) constrain potential changes in physical and social vulnerability
A vast amount of literature has confirmed the role of gene-environment (G×E) interaction in the etiology of complex human diseases. Traditional methods are predominantly focused on the analysis of interaction between a single nucleotide polymorphism (SNP) and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single SNP effects) are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single-variant-based approaches, in this work we propose a sparse principal component regression (sPCR) model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for SNPs in a gene, then the effect of each principal component was modeled by a varying-coefficient (VC) model. The model can jointly model variants in a gene in which their effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR) model is readily interpretable, since the sparsity of the principal component loadings indicates the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset from a Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction method and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a system approach to evaluate gene-based G×E interaction.
Kozlov, Konstantin; Gursky, Vitaly; Kulakovskiy, Ivan; Samsonova, Maria
The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts great interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency. We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used a four-fold cross-validation test and fitting to a random dataset to validate the model and prove its sufficiency in describing the data. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3
Background/Aims: Pediatric sepsis is a disease that threatens the lives of children. The incidence of pediatric sepsis is higher in developing countries due to various reasons, such as insufficient immunization and nutrition, water and air pollution, etc. Exploring the potential genes via different methods is of significance for the prevention and treatment of pediatric sepsis. This study aimed to identify potential genes associated with pediatric sepsis utilizing analysis of gene network and entropy. Methods: The mRNA expression in the blood samples collected from 20 septic children and 30 healthy controls was quantified by using Affymetrix HG-U133A microarray. Two condition-specific protein-protein interaction networks (PINs), one for the healthy controls and the other for the children with sepsis, were deduced by combining the fundamental human PINs with gene expression profiles in the two phenotypes. Subsequently, distinct modules from the two conditional networks were extracted by adopting a maximal clique-merging approach. Delta entropy (ΔS) was calculated between sepsis and control modules. Results: Then, key genes displaying changes in gene composition were identified by matching the control and sepsis modules. Two objective modules were obtained, in which the ribosomal proteins RPL4 and RPL9 as well as TOP2A were considered the probable key genes differentiating sepsis from healthy controls. Conclusion: According to previous reports and this work, TOP2A is a potential gene therapy target for pediatric sepsis. The relationship between pediatric sepsis and RPL4 and RPL9 needs further investigation.
Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R
The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious, though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent lines of evidence for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.
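The two filter rules described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how evidence is counted, so the sketch treats each distinct source type as one independent line of evidence, and `filter_novel`, `min_sources`, the gene names and the source labels are all hypothetical.

```python
def filter_novel(candidates, known, min_sources=2):
    """Apply the two rules to candidate gene-gene associations.

    candidates: dict mapping (gene_a, gene_b) -> set of evidence source names
    known: set of frozenset gene pairs already established in the literature
    min_sources: assumed minimum number of independent sources (rule 2)
    """
    novel = []
    for (a, b), sources in sorted(candidates.items()):
        if frozenset((a, b)) in known:
            continue  # rule 1: skip relationships that are already known
        if len(sources) >= min_sources:
            novel.append((a, b))  # rule 2: supported by multiple sources
    return novel

# Hypothetical candidate associations and their supporting sources.
candidates = {
    ("geneA", "geneB"): {"text_mining", "coexpression"},
    ("geneA", "geneC"): {"text_mining"},
    ("geneB", "geneD"): {"text_mining", "coexpression", "interaction"},
}
known = {frozenset(("geneB", "geneD"))}
novel = filter_novel(candidates, known)
print(novel)
```

Storing known pairs as `frozenset` objects makes the lookup independent of gene order, which suits an undirected association network.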
Systematic Search for Gene-Gene Interaction Effect on Prostate Cancer Risk. Grant number: W81XWH-09-1-0488.
Abstract Background Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example greater probability of membership in a single protein complex) may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results We have developed Genetic Interaction Motif Finding (GIMF), an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast). Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes the generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic
Starnini, Michele; Baronchelli, Andrea; Pastor-Satorras, Romualdo
Face-to-face interaction networks describe social interactions in human gatherings, and are the substrate for processes such as epidemic spreading and gossip propagation. The bursty nature of human behavior characterizes many aspects of empirical data, such as the distribution of conversation lengths, of conversations per person, or of inter-conversation times. Despite several recent attempts, a general theoretical understanding of the global picture emerging from data is still lacking. Here ...
Che, Dongxue; Wang, Yang; Bai, Weiyang; Li, Leijie; Liu, Guiyou; Zhang, Liangcai; Zuo, Yongchun; Tao, Shiheng; Hua, Jinlian; Liao, Mingzhi
Gametogenesis is a complex process that includes mitosis and meiosis and results in the production of ova and sperm. Gametogenesis is dynamic and requires many different genes to work synergistically, but research taking a global perspective on this process is lacking. In this study, we examined the dynamic process of gametogenesis from the perspective of systems biology, based on protein-protein interaction networks (PPINs) and functional analysis. The results showed that gametogenesis genes have strong synergistic effects in PPINs within and between different phases of development. In addition to the synergistic effects on molecular networks, gametogenesis genes showed functional consistency within and between different phases, which provides further evidence of the dynamic process during gametogenesis. Finally, we detected and provide the core molecular modules of the different phases of gametogenesis. The gametogenesis genes and related modules can be obtained from our Web site Gametogenesis Molecule Online (GMO, http://gametsonline.nwsuaflmz.com/index.php), which is freely accessible. GMO may be helpful for the reference and application of these genes and modules in the future identification of key genes in gametogenesis. In summary, this work provides a computational perspective and framework for the analysis of gametogenesis dynamics and modularity in both human and mouse. © The Author 2016. Published by Oxford University Press. All rights reserved.
Mateescu, Raluca G; Garrick, Dorian J; Reecy, James M
Improvements in eating satisfaction will benefit consumers and should increase beef demand, which is of interest to the beef industry. Tenderness, juiciness, and flavor are major determinants of the palatability of beef and are often used to reflect eating satisfaction. Carcass qualities are used as indicator traits for meat quality, with higher quality grade carcasses expected to relate to more tender and palatable meat. However, meat quality is a complex concept determined by many component traits, making genome-wide association studies (GWAS) on any one component challenging to interpret. Recent approaches combining traditional GWAS with gene network interactions theory could be more efficient in dissecting the genetic architecture of complex traits. Phenotypic measures of 23 traits reflecting carcass characteristics, components of meat quality, along with mineral and peptide concentrations were used along with Illumina 54k bovine SNP genotypes to derive an annotated gene network associated with meat quality in 2,110 Angus beef cattle. The efficient mixed model association (EMMAX) approach in combination with a genomic relationship matrix was used to directly estimate the associations between 54k SNP genotypes and each of the 23 component traits. Genomic correlated regions were identified by partial correlations which were further used along with an information theory algorithm to derive gene network clusters. Correlated SNP across 23 component traits were subjected to network scoring and visualization software to identify significant SNP. Significant pathways implicated in the meat quality complex through GO term enrichment analysis included angiogenesis, inflammation, transmembrane transporter activity, and receptor activity. These results suggest that network analysis using partial correlations and annotation of significant SNP can reveal the genetic architecture of complex traits and provide novel information regarding biological mechanisms
Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology.
Estrada, Ernesto; Kalala-Mutombo, Franck; Valverde-Colmeiro, Alba
An “infection,” understood here in a very broad sense, can be propagated through the network of social contacts among individuals. These social contacts include both “close” contacts and “casual” encounters among individuals in transport, leisure, shopping, etc. Knowing the first through the study of the social networks is not a difficult task, but having a clear picture of the network of casual contacts is a very hard problem in a society of increasing mobility. Here we assume, on the basis of several pieces of empirical evidence, that the casual contacts between two individuals are a function of their social distance in the network of close contacts. Then, we assume that we know the network of close contacts and infer the casual encounters by means of nonrandom long-range (LR) interactions determined by the social proximity of the two individuals. This approach is then implemented in a susceptible-infected-susceptible (SIS) model accounting for the spread of infections in complex networks. A parameter called “conductance” controls the feasibility of those casual encounters. In a zero conductance network only contagion through close contacts is allowed. As the conductance increases the probability of having casual encounters also increases. We show here that as the conductance parameter increases, the rate of propagation increases dramatically and the infection is less likely to die out. This increment is particularly marked in networks with scale-free degree distributions, where infections easily become epidemics. Our model provides a general framework for studying epidemic spreading in networks with arbitrary topology with and without casual contacts accounted for by means of LR interactions.
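The effect of the conductance parameter described above can be sketched with a small deterministic SIS iteration. The abstract does not give the functional form of the long-range interactions, so this sketch assumes that a pair of individuals at shortest-path distance d in the close-contact network is coupled with weight conductance^(d-1). It also uses a mean-field update for the infection probabilities rather than a stochastic simulation. The function names, the ring network, and the rates `beta` and `mu` are illustrative choices, not values from the paper.

```python
from collections import deque


def ring_graph(n):
    """Close-contact network: each node is linked to its two ring neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def shortest_path_lengths(adj):
    """All-pairs shortest-path lengths via breadth-first search."""
    dist = {}
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist


def contact_weights(adj, conductance):
    """Assumed LR coupling: weight conductance**(d - 1) at social distance d.

    Close contacts (d = 1) always get weight 1; with zero conductance all
    casual contacts (d >= 2) get weight 0.
    """
    dist = shortest_path_lengths(adj)
    return {
        i: {j: conductance ** (d - 1) for j, d in dist[i].items() if d >= 1}
        for i in adj
    }


def sis_prevalence(adj, conductance, beta, mu, steps=200, p0=0.05):
    """Mean infected fraction after iterating a mean-field SIS update.

    beta: infection probability per unit-weight contact per step (assumed).
    mu:   recovery probability per step (assumed).
    """
    w = contact_weights(adj, conductance)
    p = {i: p0 for i in adj}
    for _ in range(steps):
        new_p = {}
        for i in adj:
            escape = 1.0  # probability that no contact infects node i
            for j, w_ij in w[i].items():
                escape *= 1.0 - beta * w_ij * p[j]
            new_p[i] = (1.0 - p[i]) * (1.0 - escape) + (1.0 - mu) * p[i]
        p = new_p
    return sum(p.values()) / len(p)


network = ring_graph(20)
close_only = sis_prevalence(network, conductance=0.0, beta=0.1, mu=0.3)
with_casual = sis_prevalence(network, conductance=0.5, beta=0.1, mu=0.3)
```

With these illustrative rates the infection dies out when only close contacts transmit (`close_only` decays toward zero), whereas allowing casual contacts keeps it endemic (`with_casual` settles at a positive fraction). This matches the qualitative behaviour the abstract reports for increasing conductance.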