Gareth J Fraser
Vertebrate dentitions originated in the posterior pharynx of jawless fishes more than half a billion years ago. As gnathostomes (jawed vertebrates) evolved, teeth developed on oral jaws and helped to establish the dominance of this lineage on land and in the sea. The advent of oral jaws was facilitated, in part, by the absence of hox gene expression in the first, most anterior, pharyngeal arch. Much later in evolutionary time, teleost fishes evolved a novel toothed jaw in the pharynx, the location of the first vertebrate teeth. To examine the evolutionary modularity of dentitions, we asked whether oral and pharyngeal teeth develop using common or independent gene regulatory pathways. First, we showed that tooth number is correlated on oral and pharyngeal jaws across species of cichlid fishes from Lake Malawi (East Africa), suggestive of common regulatory mechanisms for tooth initiation. Surprisingly, we found that cichlid pharyngeal dentitions develop in a region of dense hox gene expression. Thus, regulation of tooth number is conserved, despite distinct developmental environments of oral and pharyngeal jaws; pharyngeal jaws occupy hox-positive, endodermal sites, and oral jaws develop in hox-negative regions with ectodermal cell contributions. Next, we studied the expression of a dental gene network for tooth initiation, most genes of which are similarly deployed across the two disparate jaw sites. This collection of genes includes members of the ectodysplasin pathway, eda and edar, expressed identically during the patterning of oral and pharyngeal teeth. Taken together, these data suggest that pharyngeal teeth of jawless vertebrates utilized an ancient gene network before the origin of oral jaws, oral teeth, and ectodermal appendages. The first vertebrate dentition likely appeared in a hox-positive, endodermal environment and expressed a genetic program including ectodysplasin pathway genes. This ancient regulatory circuit was co-opted and modified
Veiga-Crespo, P; Poza, M; Prieto-Alcedo, M; Villa, T G
Amber is a plant resin mainly produced by coniferous trees that, after entrapping a variety of living beings, was subjected to a process of fossilization until it turned into yellowish, translucent stones. It is also one of the best sources of ancient DNA on which to perform studies on evolution. Here a method for the sterilization of amber that allows reliable ancient DNA extraction with no actual DNA contamination is described. Working with insects taken from amber, it was possible to amplify the ATP9, PGU1 and rRNA18S ancient genes of Saccharomyces cerevisiae corresponding to samples from the Miocene and Oligocene. After comparison of the current genes with their ancient (up to 35-40 million years) counterparts it was concluded that essential genes such as rRNA18S are highly conserved and that even normal 'house-keeping' genes, such as PGU1, are strikingly conserved along the millions of years that S. cerevisiae has evolved.
Bacteria to eukaryote lateral gene transfers (LGT) are an important potential source of material for the evolution of novel genetic traits. The explosion in the number of newly sequenced genomes provides opportunities to identify and characterize examples of these lateral gene transfer events, and to assess their role in the evolution of new genes. In this paper, we describe an ancient lepidopteran LGT of a glycosyl hydrolase family 31 gene (GH31) from an Enterococcus bacterium. PCR amplification between the LGT and a flanking insect gene confirmed that the GH31 was integrated into the Bombyx mori genome and was not a result of an assembly error. Database searches in combination with degenerate PCR on a panel of 7 lepidopteran families confirmed that the GH31 LGT event occurred deep within the Order approximately 65-145 million years ago. The most basal species in which the LGT was found is Plutella xylostella (superfamily: Yponomeutoidea). Array data from Bombyx mori show that GH31 is expressed, and low dN/dS ratios indicate the LGT coding sequence is under strong stabilizing selection. These findings provide further support for the proposition that bacterial LGTs are relatively common in insects and likely to be an underappreciated source of adaptive genetic material.
Kacar, Betul; Guy, Lionel; Smith, Eric; Baross, John
Two datasets, the geologic record and the genetic content of extant organisms, provide complementary insights into the history of how key molecular components have shaped or driven global environmental and macroevolutionary trends. Changes in global physico-chemical modes over time are thought to be a consistent feature of this relationship between Earth and life, as life is thought to have been optimizing protein functions for the entirety of its approximately 3.8 billion years of history on the Earth. Organismal survival depends on how well critical genetic and metabolic components can adapt to their environments, reflecting an ability to optimize efficiently to changing conditions. The geologic record provides an array of biologically independent indicators of macroscale atmospheric and oceanic composition, but provides little in the way of the exact behaviour of the molecular components that influenced the compositions of these reservoirs. By reconstructing sequences of proteins that might have been present in ancient organisms, we can downselect to a subset of possible sequences that may have been optimized to these ancient environmental conditions. How can one use modern life to reconstruct ancestral behaviours? Configurations of ancient sequences can be inferred from the diversity of extant sequences, and then resurrected in the laboratory to ascertain their biochemical attributes. One way to augment sequence-based, single-gene methods to obtain a richer and more reliable picture of the deep past, is to resurrect inferred ancestral protein sequences in living organisms, where their phenotypes can be exposed in a complex molecular-systems context, and then to link consequences of those phenotypes to biosignatures that were preserved in the independent historical repository of the geological record. As a first step beyond single-molecule reconstruction to the study of functional molecular systems, we present here the ancestral sequence reconstruction of the
Constructing, evaluating, and interpreting gene networks generally sits within the broader field of systems biology, which continues to emerge rapidly, particularly with respect to its application to understanding the complexity of signaling in the context of cancer biology. For the purposes of this volume, we take a broad definition of systems biology. Considering an organism or disease within an organism as a system, systems biology is the study of the integrated and coordinated interactions of the network(s) of genes, their variants both natural and mutated (e.g., polymorphisms, rearrangements, alternate splicing, mutations), their proteins and isoforms, and the organic and inorganic molecules with which they interact, to execute the biochemical reactions (e.g., as enzymes, substrates, products) that reflect the function of that system. Central to systems biology, and perhaps the only approach that can effectively manage the complexity of such systems, is the building of quantitative multiscale predictive models. The predictions of the models can vary substantially depending on the nature of the model and its input-output relationships. For example, a model may predict the outcome of a specific molecular reaction(s), a cellular phenotype (e.g., alive, dead, growth arrest, proliferation, and motility), a change in the respective prevalence of cell or subpopulations, or a patient or patient subgroup outcome(s). Such models necessarily require computers. Computational modeling can be thought of as using machine learning and related tools to integrate the very high dimensional data generated from modern, high throughput omics technologies including genomics (next generation sequencing), transcriptomics (gene expression microarrays; RNAseq), metabolomics and proteomics (ultra high performance liquid chromatography, mass spectrometry), and "subomic" technologies to study the kinome, methylome, and others. Mathematical modeling can be thought of as the use of ordinary
Taylor, William R.; Gibbs, Melanie; Breuker, Casper J.; Holland, Peter W. H.
Gene duplications within the conserved Hox cluster are rare in animal evolution, but in Lepidoptera an array of divergent Hox-related genes (Shx genes) has been reported between pb and zen. Here, we use genome sequencing of five lepidopteran species (Polygonia c-album, Pararge aegeria, Callimorpha dominula, Cameraria ohridella, Hepialus sylvina) plus a caddisfly outgroup (Glyphotaelius pellucidus) to trace the evolution of the lepidopteran Shx genes. We demonstrate that Shx genes originated by tandem duplication of zen early in the evolution of the large clade Ditrysia; Shx genes are not found in a caddisfly or in a member of the basally diverging Hepialidae (swift moths). Four distinct Shx genes were generated early in ditrysian evolution, and were stably retained in all descendant Lepidoptera except the silkmoth, which has additional duplications. Despite extensive sequence divergence, molecular modelling indicates that all four Shx genes have the potential to encode stable homeodomains. The four Shx genes have distinct spatiotemporal expression patterns in early development of the Speckled Wood butterfly (Pararge aegeria), with ShxC demarcating the future sites of extraembryonic tissue formation via strikingly localised maternal RNA in the oocyte. All four genes are also expressed in presumptive serosal cells, prior to the onset of zen expression. Lepidopteran Shx genes represent an unusual example of Hox cluster expansion and integration of novel genes into ancient developmental regulatory networks. PMID:25340822
Belcastro, Vincenzo; di Bernardo, Diego
This chapter provides a step-by-step guide on how to infer gene networks from gene expression profiles. The definition of a gene network is given in Subheading 1, where the different types of networks are discussed. The chapter then guides the reader through a data-gathering process in order to build a compendium of gene expression profiles from a public repository. Gene expression profiles are then discretized and a statistical relationship between genes, called mutual information (MI), is computed. Gene pairs with insignificant MI scores are then discarded by applying one of the described pruning steps. The retained relationships are then used to build a Boolean adjacency matrix, which serves as input for a clustering algorithm that divides the network into modules (or communities). The gene network can then be used as a hypothesis generator for discovering gene function and analyzing gene signatures. Some case studies are presented, and an online web-tool called Netview is described.
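The pipeline summarized in this abstract (discretize, compute MI, prune, build a Boolean adjacency matrix) can be sketched roughly as follows. This is a minimal illustration, not the chapter's actual procedure: the equal-width binning, the fixed MI threshold used for pruning, and the names `discretize`, `mutual_information`, and `infer_network` are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(profile, n_bins=3):
    """Equal-width binning of one gene's expression values (assumed scheme)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in profile]


def mutual_information(xs, ys):
    """Mutual information in bits between two discretized profiles."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def infer_network(profiles, threshold=0.5, n_bins=3):
    """Return a Boolean adjacency matrix as a dict of dicts.

    Pruning here is a plain threshold on MI, a stand-in (assumption) for
    whichever pruning step the chapter describes.
    """
    genes = sorted(profiles)
    binned = {g: discretize(profiles[g], n_bins) for g in genes}
    adj = {g: {h: False for h in genes} for g in genes}
    for g, h in combinations(genes, 2):
        if mutual_information(binned[g], binned[h]) >= threshold:
            adj[g][h] = adj[h][g] = True
    return adj
```

The resulting adjacency matrix could then be passed to any community-detection routine; the choice of clustering algorithm is left open here because the abstract does not name one.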
Li Guoxia; Liang Xianhua; Zhao Weijuan; Sun Hongwei; Guo Min; Xie Jianzhong; Gao Zhengyao; Cui Pengfei; Yang Dawei; Li Rongwu; Zhao Qingyun; Sun Xinmin; Zhao Wenjun; Feng Songlin
Forty samples of Jun porcelain from an ancient Juntai kiln and 3 modern Jun kilns (Kongjia, Miaojia and Xinghang) were selected and analyzed for 25 elements by INAA. The data were used to train a BP (back-propagation) neural network and to make predictions with it. The results indicate that, after proper training, the network can distinguish unknown body and glaze samples of the official Jun porcelain from those of the modern top-grade Jun porcelain. (authors)
Ziesemer, Kirsten A; Mann, Allison E; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T; Brandt, Bernd W; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A; MacDonald, Sandy J; Thomas, Gavin H; Collins, Matthew J; Lewis, Cecil M; Hofman, Corinne; Warinner, Christina
To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341-534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions.
During the second half of the 19th century, the Roman Empire was already considered one of the key players inside the Eurasian networks. This research focuses on four relevant points. From a historiographical perspective, the reconstruction of the trading routes represented a central theme in the history of the relationship between the Roman Empire and the Far East. Imagining a plurality of itineraries and combinations of overland and sea routes, it is possible to reconstruct the complex reality in which the Eurasian networks developed during the Early Roman Empire. As far as economics is concerned, new documentation demonstrates the wide range and the extraordinary impact of Eastern products on Roman markets. A final focus on the process of unravelling and reweaving Chinese silk provides an important clue as to how complex, and by no means unidirectional, the interactions and exchanges in the Eurasian networks were during the first centuries of the Roman Empire.
Kenna, Ralph; Mac Carron, Pádraig
Three years ago, we initiated a programme of research in which ideas and tools from statistical physics and network theory were applied to the field of comparative mythology. The eclecticism of the work, together with the perspectives it delivered, led to widespread media coverage and academic discussion. Here we review some aspects of the project, contextualised with a brief history of the long relationship between science and the humanities. We focus in particular on an Irish epic, summarising some of the outcomes of our quantitative investigation. We also describe the emergence of a new sub-discipline and our hopes for its future.
The highly successful human pathogen Mycobacterium tuberculosis has an extremely low level of genetic variation, which suggests that the entire population resulted from clonal expansion following an evolutionary bottleneck around 35,000 y ago. Here, we show that this population constitutes just the visible tip of a much broader progenitor species, whose extant representatives are human isolates of tubercle bacilli from East Africa. In these isolates, we detected incongruence among gene phylogenies as well as mosaic gene sequences, whose individual elements are retrieved in classical M. tuberculosis. Therefore, despite its apparent homogeneity, the M. tuberculosis genome appears to be a composite assembly resulting from horizontal gene transfer events predating clonal expansion. The amount of synonymous nucleotide variation in housekeeping genes suggests that tubercle bacilli were contemporaneous with early hominids in East Africa, and have thus been coevolving with their human host much longer than previously thought. These results open novel perspectives for unraveling the molecular bases of M. tuberculosis evolutionary success.
Ueda, Sumie; Makino, Kumi; Itoh, Yoshiaki; Tsuchiya, Takashi
We reconstruct the published year of each cuneiform tablet of the Nuzi society in ancient Mesopotamia. The tablets record land transactions, marriages, loans, slavery contracts, etc. The number of tablets seems to increase by logistic growth. This may reflect the dynamics of the concentration of land and other property into a few powerful families over a period of about sixty years, with most of the tablets falling within about thirty years. We reconstruct family trees and social networks of Nuzi and estimate the published years of the cuneiform tablets consistently with the trees and networks, formulating least squares problems with linear inequality constraints.
Girdland Flink, Linus; Allen, Richard; Barnett, Ross; Malmström, Helena; Peters, Joris; Eriksson, Jonas; Andersson, Leif; Dobney, Keith; Larson, Greger
Modern domestic plants and animals are subject to human-driven selection for desired phenotypic traits and behavior. Large-scale genetic studies of modern domestic populations and their wild relatives have revealed not only the genetic mechanisms underlying specific phenotypic traits, but also allowed for the identification of candidate domestication genes. Our understanding of the importance of these genes during the initial stages of the domestication process traditionally rests on the assumption that robust inferences about the past can be made on the basis of modern genetic datasets. A growing body of evidence from ancient DNA studies, however, has revealed that ancient and even historic populations often bear little resemblance to their modern counterparts. Here, we test the temporal context of selection on specific genetic loci known to differentiate modern domestic chickens from their extant wild ancestors. We extracted DNA from 80 ancient chickens excavated from 12 European archaeological sites, dated from ∼280 B.C. to the 18th century A.D. We targeted three unlinked genetic loci: the mitochondrial control region, a gene associated with yellow skin color (β-carotene dioxygenase 2), and a putative domestication gene thought to be linked to photoperiod and reproduction (thyroid-stimulating hormone receptor, TSHR). Our results reveal significant variability in both nuclear genes, suggesting that the commonality of yellow skin in Western breeds and the near fixation of TSHR in all modern chickens took place only in the past 500 y. In addition, mitochondrial variation has increased as a result of recent admixture with exotic breeds. We conclude by emphasizing the perils of inferring the past from modern genetic data alone. PMID:24753608
A novel gene's role in an ancient mechanism: secreted Frizzled-related protein 1 is a critical component in the anterior-posterior Wnt signaling network that governs the establishment of the anterior neuroectoderm in sea urchin embryos.
Khadka, Anita; Martínez-Bartolomé, Marina; Burr, Stephanie D; Range, Ryan C
The anterior neuroectoderm (ANE) in many deuterostome embryos (echinoderms, hemichordates, urochordates, cephalochordates, and vertebrates) is progressively restricted along the anterior-posterior axis to a domain around the anterior pole. In the sea urchin embryo, three integrated Wnt signaling branches (Wnt/β-catenin, Wnt/JNK, and Wnt/PKC) govern this progressive restriction process, which begins around the 32- to 60-cell stage and terminates by the early gastrula stage. We previously have established that several secreted Wnt modulators of the Dickkopf and secreted Frizzled-related protein families (Dkk1, Dkk3, and sFRP-1/5) are expressed within the ANE and play important roles in modulating the Wnt signaling network during this process. In this study, we use morpholino and dominant-negative interference approaches to characterize the function of a novel Frizzled-related protein, secreted Frizzled-related protein 1 (sFRP-1), during ANE restriction. sFRP-1 appears to be related to a secreted Wnt modulator, sFRP3/4, that is essential to block Wnt signaling and establish the ANE in vertebrates. Here, we show that the sea urchin sFRP3/4 orthologue is not expressed during ANE restriction in the sea urchin embryo. Instead, our results indicate that ubiquitously expressed maternal sFRP-1 and Fzl1/2/7 signaling act together as early as the 32- to 60-cell stage to antagonize the ANE restriction mechanism mediated by Wnt/β-catenin and Wnt/JNK signaling. Then, starting from the blastula stage, Fzl5/8 signaling activates zygotic sFRP-1 within the ANE territory, where it works with the secreted Wnt antagonist Dkk1 (also activated by Fzl5/8 signaling) to antagonize Wnt1/Wnt8-Fzl5/8-JNK signaling in a negative feedback mechanism that defines the outer ANE territory boundary. Together, these data indicate that maternal and zygotic sFRP-1 protects the ANE territory by antagonizing the Wnt1/Wnt8-Fzl5/8-JNK signaling pathway throughout ANE restriction, providing precise
Clarke, Thomas H; Garb, Jessica E; Hayashi, Cheryl Y; Arensburger, Peter; Ayoub, Nadia A
The evolution of specialized tissues with novel functions, such as the silk synthesizing glands in spiders, is likely an influential driver of adaptive success. Large-scale gene duplication events and subsequent paralog divergence are thought to be required for generating evolutionary novelty. Such an event has been proposed for spiders, but not tested. We de novo assembled transcriptomes from three cobweb weaving spider species. Based on phylogenetic analyses of gene families with representatives from each of the three species, we found numerous duplication events indicative of a whole genome or segmental duplication. We estimated the age of the gene duplications relative to several speciation events within spiders and arachnids and found that the duplications likely occurred after the divergence of scorpions (order Scorpionida) and spiders (order Araneae), but before the divergence of the spider suborders Mygalomorphae and Araneomorphae, near the evolutionary origin of spider silk glands. Transcripts that are expressed exclusively or primarily within black widow silk glands are more likely to have a paralog descended from the ancient duplication event and have elevated amino acid replacement rates compared with other transcripts. Thus, an ancient large-scale gene duplication event within the spider lineage was likely an important source of molecular novelty during the evolution of silk gland-specific expression. This duplication event may have provided genetic material for subsequent silk gland diversification in the true spiders (Araneomorphae). © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Many different approaches have been developed to model and simulate gene regulatory networks. We proposed the following categories for gene regulatory network models: network parts lists, network topology models, network control logic models, and dynamic models. Here we describe some examples for each of these categories. We study the topology of gene regulatory networks in yeast in more detail, comparing a direct network derived from transcription factor binding data and an indirect network derived from genome-wide expression data in mutants. Regarding network dynamics, we briefly describe discrete and continuous approaches to network modelling, then describe a hybrid model called the Finite State Linear Model and demonstrate that some simple network dynamics can be simulated in this model.
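As a rough illustration of the discrete approach to network dynamics mentioned in this abstract, the sketch below simulates a synchronous Boolean network. It is not the Finite State Linear Model, which is a hybrid of discrete and continuous elements; the three-gene circuit in `RULES` and the function names `step` and `simulate` are hypothetical and chosen only for the example.

```python
def step(state, rules):
    """Synchronous update: every gene's next value is computed from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def simulate(state, rules, max_steps=20):
    """Iterate until a previously visited state recurs; return the visited states in order."""
    seen, trajectory = set(), []
    for _ in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            break
        seen.add(key)
        trajectory.append(dict(state))
        state = step(state, rules)
    return trajectory


# Hypothetical three-gene negative feedback loop (not taken from the source).
RULES = {
    "A": lambda s: not s["C"],  # C represses A
    "B": lambda s: s["A"],      # A activates B
    "C": lambda s: s["B"],      # B activates C
}

if __name__ == "__main__":
    for t, s in enumerate(simulate({"A": True, "B": False, "C": False}, RULES)):
        print(t, s)
```

Because the state space is finite, every trajectory must eventually revisit a state, so the simulation stops on its own once an attractor (a fixed point or a cycle) has been traversed.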
Iñiguez, Alena Mayo; Reinhard, Karl; Carvalho Gonçalves, Marcelo Luiz; Ferreira, Luiz Fernando; Araújo, Adauto; Paulo Vicente, Ana Carolina
Enterobius vermicularis, pinworm, is one of the most common helminths worldwide, infecting nearly a billion people at all socio-economic levels. In prehistoric populations, paleoparasitological findings show a homogeneous distribution of pinworm among hunter-gatherers in North America, intensified with the advent of agriculture. This same increase also occurred in the transition from nomad hunter-gatherers to sedentary farmers in South America, although E. vermicularis infection encompasses only the ancient Andean peoples, with no record among the pre-Columbian populations in the South American lowlands. However, the outline of pinworm paleoepidemiology has been supported by microscopic findings of eggs recovered from coprolites. Since molecular techniques are precise and sensitive in detecting pathogen ancient DNA (aDNA), and could also provide insights into the parasite's evolutionary history, in this work we have performed a molecular paleoparasitological study of E. vermicularis. aDNA was recovered and pinworm 5S rRNA spacer sequences were determined from pre-Columbian coprolites (4110 BC-AD 900) from four different North and South American archaeological sites. The sequence analysis confirmed E. vermicularis identity and revealed a similarity among ancient and modern sequences. Moreover, polymorphisms were identified at the relative positions 160, 173 and 180, in independent coprolite samples from Tulán, San Pedro de Atacama, Chile (1080-950 BC). We also verified the presence of peculiarities (spliced leader (SL1) RNA sequence, splice donor site, the Sm antigen binding site, and RNA secondary structure) which characterise the SL1 RNA gene. The analysis shows that the SL1 RNA gene of contemporary pinworms was present in pre-Columbian E. vermicularis by 6110 years ago. We were successful in detecting E. vermicularis aDNA even in coprolites without direct microscopic evidence of the eggs, improving the diagnosis of helminth infections in the past and further
Polyketides are natural products with a wide range of biological functions and pharmaceutical applications. Discovery and utilization of polyketides can be facilitated by understanding the evolutionary processes that gave rise to the biosynthetic machinery and the natural product potential of extant organisms. Gene duplication and subfunctionalization, as well as horizontal gene transfer, are proposed mechanisms in the evolution of biosynthetic gene clusters. To explain the amount of homology in some polyketide synthases in unrelated organisms such as bacteria and fungi, interkingdom horizontal gene transfer has been invoked as the most likely evolutionary scenario. However, the origin of the genes and the direction of the transfer remained elusive. We used comparative phylogenetics to infer the ancestor of a group of polyketide synthase genes involved in antibiotic and mycotoxin production. We aligned keto synthase domain sequences of all available fungal 6-methylsalicylic acid (6-MSA)-type PKSs and their closest bacterial relatives. To assess the role of symbiotic fungi in the evolution of this gene we generated 24 6-MSA synthase sequence tags from lichen-forming fungi. Our results support an ancient horizontal gene transfer event from an actinobacterial source into ascomycete fungi, followed by gene duplication. Given that actinobacteria are unrivaled producers of biologically active compounds, such as antibiotics, it appears particularly promising to study biosynthetic genes of actinobacterial origin in fungi. The large number of 6-MSA-type PKS sequences found in lichen-forming fungi leads us to hypothesize that the evolution of typical lichen compounds, such as orsellinic acid derivatives, was facilitated by the gain of this bacterial polyketide synthase.
Oliver, Karen L; Lukic, Vesna; Thorne, Natalie P; Berkovic, Samuel F; Scheffer, Ingrid E; Bahlo, Melanie
We apply a novel gene expression network analysis to a cohort of 182 recently reported candidate Epileptic Encephalopathy genes to identify those most likely to be true Epileptic Encephalopathy genes. These candidate genes were identified as having single variants of likely pathogenic significance discovered in a large-scale massively parallel sequencing study. Candidate Epileptic Encephalopathy genes were prioritized according to their co-expression with 29 known Epileptic Encephalopathy genes. We utilized developing brain and adult brain gene expression data from the Allen Human Brain Atlas (AHBA) and compared this to data from Celsius: a large, heterogeneous gene expression data warehouse. We show replicable prioritization results using these three independent gene expression resources, two of which are brain-specific, with small sample size, and the third derived from a heterogeneous collection of tissues with large sample size. Of the nineteen genes that we predicted with the highest likelihood to be true Epileptic Encephalopathy genes, two (GNAO1 and GRIN2B) have recently been independently reported and confirmed. We compare our results to those produced by an established in silico prioritization approach called Endeavour, and finally present gene expression networks for the known and candidate Epileptic Encephalopathy genes. This highlights sub-networks of gene expression, particularly in the network derived from the adult AHBA gene expression dataset. These networks give clues to the likely biological interactions between Epileptic Encephalopathy genes, potentially highlighting underlying mechanisms and avenues for therapeutic targets.
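The idea of prioritizing candidate genes by their co-expression with known disease genes can be sketched as below. The abstract does not specify the scoring rule, so the mean absolute Pearson correlation used here is an assumption, as are the function names `pearson` and `prioritize` and the dictionary-based input format.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def prioritize(candidates, known):
    """Rank candidates by mean absolute correlation with the known genes.

    `candidates` and `known` map gene names to expression vectors measured
    over the same samples (assumed input format).
    """
    scores = {
        gene: sum(abs(pearson(expr, k)) for k in known.values()) / len(known)
        for gene, expr in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Running the same ranking on each expression resource separately and comparing the top of the lists would be one simple way to check whether the prioritization replicates across datasets.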
Cancer is sometimes depicted as a reversion to single cell behavior in cells adapted to live in a multicellular assembly. If this is the case, one would expect that mutation in cancer disrupts functional mechanisms that suppress cell-level traits detrimental to multicellularity. Such mechanisms should have evolved with or after the emergence of multicellularity. This leads to two related, but distinct hypotheses: (1) somatic mutations in cancer will occur in genes that are younger than the emergence of multicellularity (1000 million years [MY]); and (2) genes that are frequently mutated in cancer and whose mutations are functionally important for the emergence of the cancer phenotype evolved within the past 1000 million years, and thus would exhibit an age distribution that is skewed to younger genes. In order to investigate these hypotheses we estimated the evolutionary ages of all human genes and then studied the probability of mutation and their biological function in relation to their age and genomic location for both normal germline and cancer contexts. We observed that under a model of uniform random mutation across the genome, controlled for gene size, genes less than 500 MY old were more frequently mutated in both cases. Paradoxically, causal genes, defined in the COSMIC Cancer Gene Census, were depleted in this age group. When we used functional enrichment analysis to explain this unexpected result we discovered that COSMIC genes with recessive disease phenotypes were enriched for DNA repair and cell cycle control. The non-mutated genes in these pathways are orthologous to those underlying stress-induced mutation in bacteria, which results in the clustering of single nucleotide variations. COSMIC genes were less common in regions where the probability of observing mutational clusters is high, although they are approximately 2-fold more likely to harbor mutational clusters compared to other human genes. Our results suggest this ancient mutational
Gene regulatory networks are perhaps the most important organizational level in the cell where signals from the cell state and the outside environment are integrated in terms of activation and inhibition of genes. For the last decade, the study of such networks has been fueled by large-scale experiments and renewed attention from the theoretical field. Different models have been proposed to, for instance, investigate expression dynamics, explain the network topology we observe in bacteria and yeast, and analyze the evolvability and robustness of such networks. Yet how these gene regulatory networks evolve and become evolvable remains an open question. An individual-oriented evolutionary model is used to shed light on this matter. Each individual has a genome from which its gene regulatory network is derived. Mutations, such as gene duplications and deletions, alter the genome, while the resulting network determines the gene expression pattern and hence fitness. With this protocol we let a population of individuals evolve under Darwinian selection in an environment that changes through time. Our work demonstrates that long-term evolution of complex gene regulatory networks in a changing environment can lead to a striking increase in the efficiency of generating beneficial mutations. We show that the population evolves towards genotype-phenotype mappings that allow for an orchestrated network-wide change in the gene expression pattern, requiring only a few specific gene indels. The genes involved are hubs of the networks, or directly influence the hubs. Moreover, throughout the evolutionary trajectory the networks maintain their mutational robustness. In other words, evolution in an alternating environment leads to a network that is sensitive to a small class of beneficial mutations, while the majority of mutations remain neutral: an example of evolution of evolvability.
Ana Rita Araújo
The Drosophila melanogaster G protein-coupled receptor gene, methuselah (mth), has been described as a novel gene that is less than 10 million years old. Nevertheless, it shows a highly specific expression pattern in embryos, larvae, and adults, and has been implicated in larval development, stress resistance, and in the setting of adult lifespan, among others. Although mth belongs to a gene subfamily with 16 members in D. melanogaster, there is no evidence for functional redundancy in this subfamily. Therefore, it is surprising that a novel gene influences so many traits. Here, we explore the alternative hypothesis that mth is an old gene. Under this hypothesis, in species distantly related to D. melanogaster, there should be a gene with features similar to those of mth. By performing detailed phylogenetic, synteny, protein structure, and gene expression analyses we show that the D. virilis GJ12490 gene is the ortholog of mth in species distantly related to D. melanogaster. We also show that, in D. americana (a species of the virilis group of Drosophila), a common amino acid polymorphism at the GJ12490 orthologous gene is significantly associated with developmental time, size, and lifespan differences. Our results imply that GJ12490 orthologous genes are candidates for developmental time and lifespan differences in Drosophila in general.
McDonald, Bradon R; Currie, Cameron R
Lateral gene transfer (LGT) profoundly shapes the evolution of bacterial lineages. LGT across disparate phylogenetic groups and genome content diversity between related organisms suggest a model of bacterial evolution that views LGT as rampant and promiscuous. It has even driven the argument that species concepts and tree-based phylogenetics cannot be applied to bacteria. Here, we show that acquisition and retention of genes through LGT are surprisingly rare in the ubiquitous and biomedically important bacterial genus Streptomyces. Using a molecular clock, we estimate that the Streptomyces bacteria are ~380 million years old, indicating that this bacterial genus is as ancient as land vertebrates. Calibrating LGT rate to this geologic time span, we find that on average only 10 genes per million years were acquired and subsequently maintained. Over that same time span, Streptomyces accumulated thousands of point mutations. By explicitly incorporating evolutionary timescale into our analyses, we provide a dramatically different view on the dynamics of LGT and its impact on bacterial evolution. IMPORTANCE: Tree-based phylogenetics and the use of species as units of diversity lie at the foundation of modern biology. In bacteria, these pillars of evolutionary theory have been called into question due to the observation of thousands of lateral gene transfer (LGT) events within and between lineages. Here, we show that acquisition and retention of genes through LGT are exceedingly rare in the bacterial genus Streptomyces, with merely one gene acquired in Streptomyces lineages every 100,000 years. These findings stand in contrast to the current assumption of rampant genetic exchange, which has become the dominant hypothesis used to explain bacterial diversity. Our results support a more nuanced understanding of genetic exchange, with LGT impacting evolution over short timescales but playing a significant role over long timescales. Deeper understanding of LGT provides new
This dissertation discusses the topological and dynamical properties of GRNs in cancer, and is divided into four main chapters. First, the basic tools of modern complex network theory are introduced. These traditional tools as well as those developed by myself (set efficiency, interset efficiency, and nested communities) are crucial for understanding the intricate topological properties of GRNs, and later chapters recall these concepts. Second, the biology of gene regulation is discussed, and a method for disease-specific GRN reconstruction developed by our collaboration is presented. This complements the traditional exhaustive experimental approach of building GRNs edge-by-edge by quickly inferring the existence of as of yet undiscovered edges using correlations across sets of gene expression data. This method also provides insight into the distribution of common mutations across GRNs. Third, I demonstrate that the structures present in these reconstructed networks are strongly related to the evolutionary histories of their constituent genes. Investigation of how the forces of evolution shaped the topology of GRNs in multicellular organisms by growing outward from a core of ancient, conserved genes can shed light upon the ''reverse evolution'' of normal cells into unicellular-like cancer states. Next, I simulate the dynamics of the GRNs of cancer cells using the Hopfield model, an infinite range spin-glass model designed with the ability to encode Boolean data as attractor states. This attractor-driven approach facilitates the integration of gene expression data into predictive mathematical models. Perturbations representing therapeutic interventions are applied to sets of genes, and the resulting deviations from their attractor states are recorded, suggesting new potential drug targets for experimentation. Finally, I extend the Hopfield model to modular networks, cyclic attractors, and complex attractors, and apply these concepts to simulations of the cell cycle
The modeling of gene networks from transcriptional expression data is an important tool in biomedical research to reveal signaling pathways and to identify treatment targets. Current gene network modeling is primarily based on the use of Gaussian graphical models applied to continuous data, which give a closed-form marginal likelihood. In this paper, we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which generate counts of mRNA transcripts in cell samples. We propose a generalized linear model to fit the discrete gene expression data and assume that the log ratios of the mean expression levels follow a Gaussian distribution. We restrict the gene network structures to decomposable graphs and derive the graphs by selecting the covariance matrix of the Gaussian distribution with the hyper-inverse Wishart priors. Furthermore, we incorporate prior network models based on gene ontology information, which avails existing biological information on the genes of interest. We conduct simulation studies to examine the performance of our discrete graphical model and apply the method to two real datasets for gene network inference. © The Author 2013. Published by Oxford University Press. All rights reserved.
David B. Tindall
In 1993 over 850 people were arrested for engaging in civil disobedience to prevent the clear-cut logging of pristine ancient temperate rainforests in Clayoquot Sound, Canada. This was the largest incident of this type in Canadian history, and has arguably been Canada's most visible mobilization over a specific environmental issue. This study examines the factors that explain the ongoing participation of individuals in the environmental movement (more broadly, beyond participation in civil disobedience to protect Clayoquot Sound) during the period following the 1993 protest. We focus on the roles of interpersonal social networks and movement identification, and compare their statistical effects with the effects of values and attitudes on the level of participation of individuals in the movement. We compare survey data from members of Friends of Clayoquot Sound (FOCS), a key environmental organization in this protest, with data collected from several surveys of the general public, and also from members of a local countermovement group (a pro-forest industry group) that mobilized against the environmental movement. Although values and attitudes statistically differentiate members of FOCS from the other groups, these variables do not statistically explain ongoing differential participation in the movement amongst FOCS members. Rather, individual level of participation in this environmental movement is best explained by ego-network centrality (the pattern of ties each respondent has to contacts in the movement), as measured by the number of ties FOCS members have to others in a range of environmental organizations, and by their level of identification with the movement. Implications of this research for more recent mobilizations, such as against oil pipelines, are discussed, as are avenues for future research.
Tamada, Yoshinori; Bannai, Hideo; Imoto, Seiya; Katayama, Toshiaki; Kanehisa, Minoru; Miyano, Satoru
Since microarray gene expression data do not contain sufficient information for estimating accurate gene networks, other biological information has been considered to improve the estimated networks. Recent studies have revealed that highly conserved proteins that exhibit similar expression patterns in different organisms, have almost the same function in each organism. Such conserved proteins are also known to play similar roles in terms of the regulation of genes. Therefore, this evolutionary information can be used to refine regulatory relationships among genes, which are estimated from gene expression data. We propose a statistical method for estimating gene networks from gene expression data by utilizing evolutionarily conserved relationships between genes. Our method simultaneously estimates two gene networks of two distinct organisms, with a Bayesian network model utilizing the evolutionary information so that gene expression data of one organism helps to estimate the gene network of the other. We show the effectiveness of the method through the analysis on Saccharomyces cerevisiae and Homo sapiens cell cycle gene expression data. Our method was successful in estimating gene networks that capture many known relationships as well as several unknown relationships which are likely to be novel. Supplementary information is available at http://bonsai.ims.u-tokyo.ac.jp/~tamada/bayesnet/.
Background Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results Genes2FANs is a web based tool and a database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols to produce a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and then the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user’s PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPI and links connecting diseases genes through FANs, separating diseases into two categories. Conclusions Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. Our finding that disease genes in
Binladen, Jonas; Wiuf, Carsten Henrik; Gilbert, M. Thomas P.
in phenotypic traits of extinct taxa. It is well documented that postmortem damage in ancient mtDNA can lead to the generation of artifactual sequences. However, as yet no one has thoroughly investigated the damage spectrum in ancient nuDNA. By comparing clone sequences from 23 fossil specimens, recovered from...
Garrett, K A; Andersen, K F; Asche, F; Bowden, R L; Forbes, G A; Kulakow, P A; Zhou, B
Resistance genes are a major tool for managing crop diseases. The networks of crop breeders who exchange resistance genes and deploy them in varieties help to determine the global landscape of resistance and epidemics, an important system for maintaining food security. These networks function as a complex adaptive system, with associated strengths and vulnerabilities, and implications for policies to support resistance gene deployment strategies. Extensions of epidemic network analysis can be used to evaluate the multilayer agricultural networks that support and influence crop breeding networks. Here, we evaluate the general structure of crop breeding networks for cassava, potato, rice, and wheat. All four are clustered due to phytosanitary and intellectual property regulations, and linked through CGIAR hubs. Cassava networks primarily include public breeding groups, whereas others are more mixed. These systems must adapt to global change in climate and land use, the emergence of new diseases, and disruptive breeding technologies. Research priorities to support policy include how best to maintain both diversity and redundancy in the roles played by individual crop breeding groups (public versus private and global versus local), and how best to manage connectivity to optimize resistance gene deployment while avoiding risks to the useful life of resistance genes. Copyright © 2017 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.
Gene expression level unilateral. Other genes epistatic. Collateral damage. Is there a phenotype? Can the regulatory network of E. coli lacking the xenogene silencing system evolve towards greater fitness? Many mutations emerge in a dynamic genome. Inactivation of the global ...
Li, Yupeng; Jackson, Scott A
NurWaliyuddin, Hanis Z A; Norazmi, Mohd N; Edinur, Hisham A; Chambers, Geoffrey K; Panneerchelvam, Sundararajulu; Zafarina, Zainuddin
The aboriginal populations of Peninsular Malaysia, also known as Orang Asli (OA), comprise three major groups; Semang, Senoi and Proto-Malays. Here, we analyzed for the first time KIR gene polymorphisms for 167 OA individuals, including those from four smallest OA subgroups (Che Wong, Orang Kanaq, Lanoh and Kensiu) using polymerase chain reaction-sequence specific primer (PCR-SSP) analyses. The observed distribution of KIR profiles of OA is heterogenous; Haplotype B is the most frequent in the Semang subgroups (especially Batek) while Haplotype A is the most common type in the Senoi. The Semang subgroups were clustered together with the Africans, Indians, Papuans and Australian Aborigines in a principal component analysis (PCA) plot and shared many common genotypes (AB6, BB71, BB73 and BB159) observed in these other populations. Given that these populations also display high frequencies of Haplotype B, it is interesting to speculate that Haplotype B may be generally more frequent in ancient populations. In contrast, the two Senoi subgroups, Che Wong and Semai are displaced toward Southeast Asian and African populations in the PCA scatter plot, respectively. Orang Kanaq, the smallest and the most endangered of all OA subgroups, has lost some degree of genetic variation, as shown by their relatively high frequency of the AB2 genotype (0.73) and a total absence of KIR2DL2 and KIR2DS2 genes. Orang Kanaq tradition that strictly prohibits intermarriage with outsiders seems to have posed a serious threat to their survival. This present survey is a demonstration of the value of KIR polymorphisms in elucidating genetic relationships among human populations.
Pluripotent stem cells can be isolated from embryos or derived by reprogramming. Pluripotency is stabilized by an interconnected network of pluripotency genes that cooperatively regulate gene expression. Here we describe the molecular principles of pluripotency gene function and highlight post-transcriptional controls, particularly those induced by RNA-binding proteins and alternative splicing, as an important regulatory layer of pluripotency. We also discuss heterogeneity in pluripotency regulation, alternative pluripotency states and future directions of pluripotent stem cell research.
We tackle the problem of completing and inferring genetic networks under stationary conditions from static data. Here, network completion means making the minimum number of modifications to an initial network so that the completed network is most consistent with the expression data, with addition and deletion of edges as the basic modification operations. For this problem, we present a new method for network completion using dynamic programming and least-squares fitting. This method can find an optimal solution in polynomial time if the maximum indegree of the network is bounded by a constant. We evaluate the effectiveness of our method through computational experiments using synthetic data. Furthermore, we demonstrate that our proposed method can distinguish the differences between two types of genetic networks under stationary conditions from lung cancer and normal gene expression data.
... Matters NIH Research Matters August 12, 2013 Mutated Genes in Schizophrenia Map to Brain Networks Schizophrenia networks ... have a high number of spontaneous mutations in genes that form a network in the front region ...
Aguilar-Ruiz Jesus S
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can adjust different linear
Bohlander, S K
The molecular analysis of recurring chromosome rearrangements, especially of translocations and inversions, has provided us with valuable insight into the pathogenesis of hematological malignancies. Many translocations result in the fusion of genes located at the translocation breakpoints. In recent years we have witnessed a rapid rise in the number of chromosome translocations in leukemias being characterized at the molecular level. However, the number of genes being newly identified as translocation fusion genes has not risen at the same pace. This is due to the fact that several genes are involved in more than one translocation forming fusion genes with a number of other partner genes. Not only does one find star-shaped topologies, with one gene forming fusions with several others (e.g. ETV6/PDGFRB, ETV6/JAK2, ETV6/ABL etc.), but also networks connecting several genes with more than one fusion partner (e.g. ETV6/RUNX1 (AML1), RUNX1/CBFA2T1 (ETO), ETV6/EVI1, RUNX1/EVI1, ETV6/ABL, BCR/ABL). The emergence of such networks with the "recycling" of genes in new fusion combinations suggests that there is a rather limited number of genes which can be altered to cause leukemia. Copyright 2001 S. Karger AG, Basel
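The star-shaped and networked topologies described above can be made concrete with a small sketch. The fusion pairs below are only the examples named in the abstract (not a complete catalogue), and the helper name `partner_map` is an illustrative choice:

```python
from collections import defaultdict

# Fusion pairs named in the abstract above (examples only, not a full list).
FUSIONS = [
    "ETV6/PDGFRB", "ETV6/JAK2", "ETV6/ABL", "ETV6/RUNX1",
    "RUNX1/CBFA2T1", "ETV6/EVI1", "RUNX1/EVI1", "BCR/ABL",
]

def partner_map(fusions):
    """Map each gene to the set of genes it forms a fusion with."""
    partners = defaultdict(set)
    for fusion in fusions:
        a, b = fusion.split("/")
        partners[a].add(b)
        partners[b].add(a)
    return partners

partners = partner_map(FUSIONS)
# Genes with more than one fusion partner connect the stars into a network.
recycled = sorted(g for g, p in partners.items() if len(p) > 1)
for gene in recycled:
    print(gene, sorted(partners[gene]))
```

In this toy input, ETV6 is the centre of the largest star, while RUNX1, ABL and EVI1 each link two or more partners and so join the stars into one connected network.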
Nazarieh, Maryam; Wiese, Andreas; Will, Thorsten; Hamed, Mohamed; Helms, Volkhard
Identifying the gene regulatory networks governing the workings and identity of cells is one of the main challenges in understanding processes such as cellular differentiation, reprogramming or cancerogenesis. One particular challenge is to identify the main drivers and master regulatory genes that control such cell fate transitions. In this work, we reformulate this problem as the optimization problems of computing a Minimum Dominating Set and a Minimum Connected Dominating Set for directed graphs. Both MDS and MCDS are applied to the well-studied gene regulatory networks of the model organisms E. coli and S. cerevisiae and to a pluripotency network for mouse embryonic stem cells. The results show that MCDS can capture most of the known key player genes identified so far in the model organisms. Moreover, this method suggests an additional small set of transcription factors as novel key players for governing the cell-specific gene regulatory network which can also be investigated with regard to diseases. To this aim, we investigated the ability of MCDS to define key drivers in breast cancer. The method identified many known drug targets as members of the MDS and MCDS. This paper proposes a new method to identify key player genes in gene regulatory networks. The Java implementation of the heuristic algorithm explained in this paper is available as a Cytoscape plugin at http://apps.cytoscape.org/apps/mcds . The SageMath programs for solving integer linear programming formulations used in the paper are available at https://github.com/maryamNazarieh/KeyRegulatoryGenes and as supplementary material.
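To illustrate what a dominating set means for a directed regulatory network, the following is a minimal sketch, not the authors' method: the abstract mentions integer linear programming and a heuristic available as a Cytoscape plugin, whereas this uses a simple textbook greedy approximation. It assumes a node is "dominated" if it is in the set or is the target of a regulator in the set; the gene names A to E are hypothetical.

```python
def greedy_dominating_set(nodes, edges):
    """Greedy approximation of a dominating set in a directed graph.

    edges holds (regulator, target) pairs. A node counts as dominated if it
    is chosen itself or has at least one chosen regulator.
    """
    targets = {n: set() for n in nodes}
    for regulator, target in edges:
        targets[regulator].add(target)

    uncovered = set(nodes)
    chosen = []
    while uncovered:
        # Pick the node that dominates the most still-uncovered nodes;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(nodes),
                   key=lambda n: len(({n} | targets[n]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | targets[best]
    return chosen

# Hypothetical toy network: A regulates B and C; C and E both regulate D.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("A", "C"), ("C", "D"), ("E", "D")]
dominators = greedy_dominating_set(toy_nodes, toy_edges)
print(dominators)
```

The greedy choice gives no optimality guarantee and does not enforce the connectivity required for an MCDS; it only shows the covering idea behind the MDS formulation.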
Aalt D J van Dijk
Mutational robustness of gene regulatory networks refers to their ability to generate constant biological output upon mutations that change network structure. Such networks contain regulatory interactions (transcription factor-target gene interactions) but often also protein-protein interactions between transcription factors. Using computational modeling, we study factors that influence robustness and we infer several network properties governing it. These include the type of mutation, i.e. whether a regulatory interaction or a protein-protein interaction is mutated, and in the case of mutation of a regulatory interaction, the sign of the interaction (activating vs. repressive). In addition, we analyze the effect of combinations of mutations and we compare networks containing monomeric with those containing dimeric transcription factors. Our results are consistent with available data on biological networks, for example based on evolutionary conservation of network features. As a novel and remarkable property, we predict that networks are more robust against mutations in monomer than in dimer transcription factors, a prediction for which analysis of conservation of DNA binding residues in monomeric vs. dimeric transcription factors provides indirect evidence.
Alexey Anatolievich Morozov
Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary), sets of trees, and distance matrices, but there are no methods to build them using the gene order data as an input. Here we describe several methods to build split networks from the gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from genome structures under study and used as an input for network construction algorithms. Three intermediates are used: set of jackknife trees, distance matrix, and binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and distance matrix (when used with the Neighbor-Net algorithm). Binary encoding can also be useful, but only when the methods mentioned above cannot be used.
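As a rough illustration of the distance-matrix intermediate, the sketch below derives pairwise distances from gene orders. The abstract does not say which distance the authors use; breakpoint distance on linear, unsigned gene orders over a shared gene set is assumed here purely for illustration, and the genomes g1 to g3 are made up. The resulting matrix is the kind of input a Neighbor-Net implementation would take; that step is not shown.

```python
def adjacencies(order):
    """Unordered pairs of neighbouring genes in a linear, unsigned gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def breakpoint_distance(a, b):
    """Count adjacencies of order a that are absent from order b.

    Assumes both orders contain the same genes, each exactly once, which
    makes the count symmetric.
    """
    return len(adjacencies(a) - adjacencies(b))

def distance_matrix(genomes):
    """Return sorted genome names and the matrix of pairwise distances."""
    names = sorted(genomes)
    matrix = [[breakpoint_distance(genomes[x], genomes[y]) for y in names]
              for x in names]
    return names, matrix

# Hypothetical gene orders: g2 swaps genes 3 and 4, g3 is g1 reversed.
toy_genomes = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [1, 2, 4, 3, 5],
    "g3": [5, 4, 3, 2, 1],
}
names, matrix = distance_matrix(toy_genomes)
for name, row in zip(names, matrix):
    print(name, row)
```

Because adjacencies are unordered, a fully reversed genome (g3) is at distance zero from the original, which is the expected behaviour for unsigned gene orders.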
Kadelka, Claus T.
Cells generally manage to maintain stable phenotypes in the face of widely varying environmental conditions. This fact is particularly surprising since the key step of gene expression is fundamentally a stochastic process. Many hypotheses have been suggested to explain this robustness. First, the special topology of gene regulatory networks (GRNs) seems to be an important factor as they possess feedforward loops and certain other topological features much more frequently than expected. Second, genes often regulate each other in a canalizing fashion: there exists a dominance order amidst the regulators of a gene, which in silico leads to very robust phenotypes. Lastly, an entirely novel gene regulatory mechanism, discovered and studied during the last two decades, which is believed to play an important role in cancer, is shedding some light on how canalization may in fact take place as part of a cell's gene regulatory program. Short segments of single-stranded RNA, so-called microRNAs, which are embedded in several different types of feedforward loops, help smooth out noise and generate canalizing effects in gene regulation by overriding the effect of certain genes on others. Boolean networks and their multi-state extensions have been successfully used to model GRNs for many years. In this dissertation, GRNs are represented in the time- and state-discrete framework of Stochastic Discrete Dynamical Systems (SDDS), which captures the cell-inherent stochasticity. Each gene has finitely many different concentration levels and its concentration at the next time step is determined by a gene-specific update rule that depends on the current concentration of the gene's regulators. The update rules in published gene regulatory networks are often nested canalizing functions. In Chapter 2, this class of functions is introduced, generalized and analyzed with respect to its potential to confer robustness. Chapter 3 describes a simulation study, which supports the hypothesis that
Gupta, Chinmaya; López, José Manuel; Ott, William; Josić, Krešimir; Bennett, Matthew R
Transcriptional delay can significantly impact the dynamics of gene networks. Here we examine how such delay affects bistable systems. We investigate several stochastic models of bistable gene networks and find that increasing delay dramatically increases the mean residence times near stable states. To explain this, we introduce a non-Markovian, analytically tractable reduced model. The model shows that stabilization is the consequence of an increased number of failed transitions between stable states. Each of the bistable systems that we simulate behaves in this manner.
Mondragón-Palomino, Mariana; Hiese, Luisa; Härter, Andrea; Koch, Marcus A; Theißen, Günter
Background: Positive selection is recognized as the prevalence of nonsynonymous over synonymous substitutions in a gene. Models of the functional evolution of duplicated genes consider neofunctionalization as key to the retention of paralogues. For instance, duplicate transcription factors are specifically retained in plant and animal genomes and both positive selection and transcriptional divergence appear to have played a role in their diversification. However, the relative impact of these two factors has not been systematically evaluated. Class B MADS-box genes, comprising DEF-like and GLO-like genes, encode developmental transcription factors essential for establishment of perianth and male organ identity in the flowers of angiosperms. Here, we contrast the role of positive selection and the known divergence in expression patterns of genes encoding class B-like MADS-box transcription factors from monocots, with emphasis on the family Orchidaceae and the order Poales. Although in the monocots these two groups are highly diverse and have a strongly canalized floral morphology, there is no information on the role of positive selection in the evolution of their distinctive flower morphologies. Published research shows that in Poales, class B-like genes are expressed in stamens and in lodicules, the perianth organs whose identity might also be specified by class B-like genes, like the identity of the inner tepals of their lily-like relatives. In orchids, however, the number and pattern of expression of class B-like genes have greatly diverged. Results: The DEF-like genes from Orchidaceae form four well-supported, ancient clades of orthologues. In contrast, orchid GLO-like genes form a single clade of ancient orthologues and recent paralogues. DEF-like genes from orchid clade 2 (OMADS3-like genes) are under less stringent purifying selection than the other orchid DEF-like and GLO-like genes. In comparison with orchids, purifying selection was less stringent in DEF
Mann, Paul J.; Eglinton, Timothy I.; McIntyre, Cameron P.; Zimov, Nikita; Davydova, Anna; Vonk, Jorien E.; Holmes, Robert M.; Spencer, Robert G M
Northern high-latitude rivers are major conduits of carbon from land to coastal seas and the Arctic Ocean. Arctic warming is promoting terrestrial permafrost thaw and shifting hydrologic flowpaths, leading to fluvial mobilization of ancient carbon stores. Here we describe 14C and 13C
Background: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in the form of gene interaction networks is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis. Results: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study. Conclusions: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity than by simple overlaps.
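The idea of replacing a gene overlap count with a count of network links, and of judging that count against randomized networks, can be illustrated with a minimal sketch. It is not the authors' algorithm: the degree-preserving double edge swap used for randomization, the z-score summary, the function names and the toy network are all assumptions made for illustration.

```python
import random
from collections import Counter

def count_links(edges, set_a, set_b):
    """Number of network edges with one end in set_a and the other in set_b."""
    return sum(1 for u, v in edges
               if (u in set_a and v in set_b) or (u in set_b and v in set_a))

def degrees(edges):
    """Node degree table, used to check that randomization preserves degrees."""
    return Counter(node for edge in edges for node in edge)

def rewire(edges, n_swaps, rng):
    """Randomize by double edge swaps (a-b, c-d -> a-d, c-b); assumed null model."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Reject swaps that would create self-loops or duplicate edges.
        if (len(new1) < 2 or len(new2) < 2 or new1 == new2
                or new1 in present or new2 in present):
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def nea_z_score(edges, gene_set, category, n_random, rng):
    """Observed link count, null mean, and z-score against rewired networks."""
    observed = count_links(edges, gene_set, category)
    null = [count_links(rewire(edges, 10 * len(edges), rng), gene_set, category)
            for _ in range(n_random)]
    mean = sum(null) / len(null)
    var = sum((x - mean) ** 2 for x in null) / (len(null) - 1)
    z = (observed - mean) / var ** 0.5 if var > 0 else float("nan")
    return observed, mean, z

# Hypothetical toy network: g* = experimental genes, p* = functional category.
edges = [("g1", "p1"), ("g1", "p2"), ("g2", "p2"), ("g2", "p3"), ("g1", "g2"),
         ("p1", "p2"), ("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x1"),
         ("x1", "p3"), ("x2", "g1"), ("x3", "y1"), ("y1", "y2")]
gene_set = {"g1", "g2"}
category = {"p1", "p2", "p3"}

observed, null_mean, z = nea_z_score(edges, gene_set, category, 200, random.Random(0))
print("observed links:", observed, "null mean:", round(null_mean, 2), "z:", round(z, 2))
```

Here the gene set and the category share no genes, so an overlap statistic would be zero, while the link count can still register their connectivity in the network.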
Alié, Alexandre; Leclère, Lucas; Jager, Muriel; Dayraud, Cyrielle; Chang, Patrick; Le Guyader, Hervé; Quéinnec, Eric; Manuel, Michaël
Stem cells are essential for animal development and adult tissue homeostasis, and the quest for an ancestral gene fingerprint of stemness is a major challenge for evolutionary developmental biology. Recent studies have indicated that a series of genes, including the transposon silencer Piwi and the translational activator Vasa, specifically involved in germline determination and maintenance in classical bilaterian models (e.g., vertebrates, fly, nematode), are more generally expressed in adult multipotent stem cells in other animals like flatworms and hydras. Since the progeny of these multipotent stem cells includes both somatic and germinal derivatives, it remains unclear whether Vasa, Piwi, and associated genes like Bruno and PL10 were ancestrally linked to stemness, or to germinal potential. We have investigated the expression of Vasa, two Piwi paralogues, Bruno and PL10 in Pleurobrachia pileus, a member of the early-diverging phylum Ctenophora, the probable sister group of cnidarians. These genes were all expressed in the male and female germlines, and with the exception of one of the Piwi paralogues, they showed similar expression patterns within somatic territories (tentacle root, comb rows, aboral sensory complex). Cytological observations and EdU DNA-labelling and long-term retention experiments revealed concentrations of stem cells closely matching these gene expression areas. These stem cell pools are spatially restricted, and each specialised in the production of particular types of somatic cells. These data unveil important aspects of cell renewal within the ctenophore body and suggest that Piwi, Vasa, Bruno, and PL10 belong to a gene network ancestrally acting in two distinct contexts: (i) the germline and (ii) stem cells, whatever the nature of their progeny. Copyright © 2010 Elsevier Inc. All rights reserved.
Westenberg, Michel A.; Hijum, Sacha A.F.T. van; Lulko, Andrzej T.; Kuipers, Oscar P.; Roerdink, Jos B.T.M.; Linsen, L; Hagen, H; Hamann, B
We present GENeVis, an application to visualize gene expression time series data in a gene regulatory network context. This is a network of regulator proteins that regulate the expression of their respective target genes. The networks are represented as graphs, in which the nodes represent genes,
Willerslev, Eske; Cooper, Alan
ancient DNA, palaeontology, palaeoecology, archaeology, population genetics, DNA damage and repair
Background: Stochastic simulation of gene networks by Markov processes has important applications in molecular biology. The complexity of exact simulation algorithms scales with the number of discrete jumps to be performed. Approximate schemes reduce the computational time by reducing the number of simulated discrete events. Also, answering important questions about the relation between network topology and intrinsic noise generation and propagation should be based on general mathematical results. These general results are difficult to obtain for exact models. Results: We propose a unified framework for hybrid simplifications of Markov models of multiscale stochastic gene network dynamics. We discuss several possible hybrid simplifications, and provide algorithms to obtain them from pure jump processes. In hybrid simplifications, some components are discrete and evolve by jumps, while other components are continuous. Hybrid simplifications are obtained by partial Kramers-Moyal expansion [1-3], which is equivalent to the application of the central limit theorem to a sub-model. By averaging and variable aggregation we drastically reduce simulation time and eliminate non-critical reactions. Hybrid and averaged simplifications can be used for more effective simulation algorithms and for obtaining general design principles relating noise to topology and time scales. The simplified models reproduce with good accuracy the stochastic properties of the gene networks, including waiting times in intermittence phenomena, fluctuation amplitudes and stationary distributions. The methods are illustrated on several gene network examples. Conclusion: Hybrid simplifications can be used for onion-like (multi-layered) approaches to multi-scale biochemical systems, in which various descriptions are used at various scales. Sets of discrete and continuous variables are treated with different methods and are coupled together in a physically justified approach.
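The remark that the cost of exact simulation scales with the number of discrete jumps can be made concrete with a minimal sketch of the standard Gillespie direct method for a pure jump process. This is the exact baseline, not the hybrid simplification proposed in the abstract; the single-species birth-death model, the rate values and the function name are assumptions chosen for illustration.

```python
import random

def gillespie_birth_death(k_prod, k_deg, n0, t_end, rng):
    """Exact simulation of 0 -> X (rate k_prod) and X -> 0 (rate k_deg * n).

    Every loop iteration simulates exactly one jump, so run time grows with
    the number of reaction events that occur before t_end.
    """
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of production
        a_deg = k_deg * n        # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                # no reaction can fire any more
        t += rng.expovariate(a_total)   # waiting time to the next jump
        if t > t_end:
            break
        if rng.random() * a_total < a_prod:
            n += 1               # production event
        else:
            n -= 1               # degradation event
        times.append(t)
        counts.append(n)
    return times, counts

# Hypothetical parameters: mean copy number k_prod / k_deg = 10.
times, counts = gillespie_birth_death(10.0, 1.0, 0, 50.0, random.Random(0))
print(len(times) - 1, "jumps simulated; final copy number:", counts[-1])
```

Raising both rates by the same factor leaves the stationary distribution unchanged but multiplies the number of simulated jumps, which is the cost that approximate and hybrid schemes aim to avoid.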
Supplementary figure 1. (A): Visualization of one of the network modules by GeneMania for dataset 4 (B): Visualization of one of the network modules by GeneMania for dataset 1 (C): Visualization of one of the network modules by GeneMania for dataset 3.
Challacombe, Jean F [Los Alamos National Laboratory; Eichorst, Stephanie A [Los Alamos National Laboratory; Xie, Gary [Los Alamos National Laboratory; Kuske, Cheryl R [Los Alamos National Laboratory; Hauser, Loren [ORNL; Land, Miriam [ORNL
Bacterial genome sizes range from ca. 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Sequenced genomes of strains in the phylum Acidobacteria revealed that 'Solibacter usitatus' strain Ellin6076 harbors a 9.9 Mb genome. This large genome appears to have arisen by horizontal gene transfer via ancient bacteriophage and plasmid-mediated transduction, as well as widespread small-scale gene duplications. This has resulted in an increased number of paralogs that are potentially ecologically important (ecoparalogs). Low amino acid sequence identities among functional group members and lack of conserved gene order and orientation in the regions containing similar groups of paralogs suggest that most of the paralogs were not the result of recent duplication events. The genome sizes of cultured subdivision 1 and 3 strains in the phylum Acidobacteria were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 1 were estimated to have smaller genome sizes ranging from ca. 2.0 to 4.8 Mb, whereas members of subdivision 3 had slightly larger genomes, from ca. 5.8 to 9.9 Mb. It is hypothesized that the large genome of strain Ellin6076 encodes traits that provide a selective metabolic, defensive and regulatory advantage in the variable soil environment.
Pickrell, Joseph K; Reich, David
Genetic information contains a record of the history of our species, and technological advances have transformed our ability to access this record. Many studies have used genome-wide data from populations today to learn about the peopling of the globe and subsequent adaptation to local conditions. Implicit in this research is the assumption that the geographic locations of people today are informative about the geographic locations of their ancestors in the distant past. However, it is now clear that long-range migration, admixture, and population replacement subsequent to the initial out-of-Africa expansion have altered the genetic structure of most of the world's human populations. In light of this we argue that it is time to critically reevaluate current models of the peopling of the globe, as well as the importance of natural selection in determining the geographic distribution of phenotypes. We specifically highlight the transformative potential of ancient DNA. By accessing the genetic make-up of populations living at archaeologically known times and places, ancient DNA makes it possible to directly track migrations and responses to natural selection. Copyright © 2014 Elsevier Ltd. All rights reserved.
Pardee, Keith; Green, Alexander A.; Ferrante, Tom; Cameron, D. Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J.
Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides a new venue for synthetic biologists to operate, and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze-dried onto paper, enabling the inexpensive, sterile and abiotic distribution of synthetic biology-based technologies for the clinic, global health, industry, research and education. For field use, we create circuits with colorimetric outputs for detection by eye, and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors. PMID:25417167
Ljubomirović Irena V.
Archaeological research revealed that ancient Naissus was located on the right bank of the Nišava, on the territory partially covered by the Niš Fortress. The ancient town developed on a wide and flat terrain, which offered good conditions for settlement, but also for raising fortifications. According to modern scholars, the urban settlement on the right bank of the Nišava was preceded by a small native village (vicus), which was important for the erection of the town fortifications. Numerous roads linked Naissus with the surrounding regions and villas in the countryside. We learn about them on the basis of epigraphic and archaeological material (milestones and remains of roads), but also on the basis of the location of the necropolises, which in the classical period often sprang up near the suburban roads. The road leading to the east crossed the Nišava by a stone bridge, whose remains were visible not far from today's 'Benetton' factory. The road led further over 'Gabrovac land', intersected by streams over which the remains of three stone bridges from the Roman period were found. This route led to Mediana, a suburb with villas three miles distant from the city. Another important road connecting the region Pomoravlje with the southern parts of the province of Dalmatia was the road Naissus-Lissus. A section leading to Macedonia and the harbour of Thessalonica via Scupi branched from it south of Naissus: Ad Herculem, Hammeum, Ad Fines, Vindenae and Vicianum. From Vicianum station (Vučitrn) one section ran towards Lissus and another towards Scupi. Throughout the Timok valley stretched one of the most important roads (Naissus-Ratiaria) that linked Naissus and the central Balkan areas with the region of Podunavlje (the Danube basin). The road led from Naissus to the East, along the right bank of the Nišava (across the areas of Jagodin mala and Vrežina) and, at the modern village of Malča, it turned towards the North, following the route of
Warr, Oliver; Sherwood Lollar, Barbara; Fellowes, Jonathan; Sutcliffe, Chelsea N.; McDermott, Jill M.; Holland, Greg; Mabry, Jennifer C.; Ballentine, Christopher J.
We show that fluid volumes residing within the Precambrian crystalline basement account for ca. 30% of the total groundwater inventory of the Earth (> 30 million km³). The residence times and scientific importance of this groundwater are only now receiving attention, with ancient fracture fluids identified in Canada and South Africa showing: (1) microbial life which has existed in isolation for millions of years; (2) significant hydrogen and hydrocarbon production via water-rock reactions; and (3) preserved noble gas components from the early atmosphere. Noble gas (He, Ne, Ar, Kr, Xe) abundance and isotopic compositions provide the primary evidence for fluid mean residence time (MRT). Here we extend the noble gas data from the Kidd Creek Mine in Timmins, Ontario, Canada, a volcanogenic massive sulfide (VMS) deposit formed at 2.7 Ga, in which fracture fluids with MRTs of 1.1-1.7 Ga were identified at 2.4 km depth (Holland et al., 2013), to fracture fluids at 2.9 km depth. We compare here the Kidd Creek Mine study with noble gas compositions determined in fracture fluids taken from two mines (Mine 1 & Mine 2) at 1.7 and 1.4 km depth below surface in the Sudbury Basin formed by a meteorite impact at 1.849 Ga. The 2.9 km samples at Kidd Creek Mine show the highest radiogenic isotopic ratios observed to date in free fluids (e.g. 21Ne/22Ne = 0.6 and 40Ar/36Ar = 102,000) and have MRTs of 1.0-2.2 Ga. In contrast, resampled 2.4 km fluids indicated a less ancient MRT (0.2-0.6 Ga) compared with the previous study (1.1-1.7 Ga). This is consistent with a change in the age distribution of fluids feeding the fractures as they drain, with a decreasing proportion of the most ancient end-member fluids. 129Xe/136Xe ratios for these fluids confirm that boreholes at 2.4 km versus 2.9 km are sourced from hydrogeologically distinct systems. In contrast, results for the Sudbury mines have MRTs of 0.2-0.6 and 0.2-0.9 Ga for Mines 1 and 2 respectively. While still old compared to almost all
Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinical diagnosis is the typically low reproducibility of these signatures combined with the difficulty of achieving a clear biological interpretation. For that reason, in recent years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview of the approaches proposed so far.
In this paper, we apply the entitymetrics model to our constructed Gene-Citation-Gene (GCG) network. Based on the premise that there is a hidden, but plausible, relationship between an entity in one article and an entity in its citing article, we constructed a GCG network of gene pairs implicitly connected through citation. We compare the performance of this GCG network to a gene-gene (GG) network constructed over the same corpus but which uses gene pairs explicitly connected through traditional co-occurrence. Using 331,411 MEDLINE abstracts collected from 18,323 seed articles and their references, we identify 25 gene pairs. A comparison of these pairs with interactions found in BioGRID reveals that 96% of the gene pairs in the GCG network have known interactions. We measure network performance using degree, weighted degree, closeness, betweenness centrality and PageRank. Combining all measures, we find the GCG network has more gene pairs, but a lower matching rate than the GG network. However, combining top ranked genes in both networks produces a matching rate of 35.53%. By visualizing both the GG and GCG networks, we find that cancer is the most dominant disease associated with the genes in both networks. Overall, the study indicates that the GCG network can be useful for detecting gene interaction in an implicit manner.
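Two of the ranking measures named above, weighted degree and PageRank, can be sketched for a small undirected weighted gene network as follows. This is a generic illustration under stated assumptions, not the study's pipeline: the gene labels, the edge weights (standing in for co-occurrence or co-citation counts), the damping factor 0.85 and the fixed iteration count are all hypothetical.

```python
def build_neighbors(edges):
    """Adjacency map {node: {neighbor: weight}} for an undirected weighted graph."""
    nbrs = {}
    for u, v, w in edges:
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    return nbrs

def weighted_degree(edges):
    """Sum of incident edge weights for every node."""
    return {n: sum(ws.values()) for n, ws in build_neighbors(edges).items()}

def pagerank(edges, damping=0.85, n_iter=100):
    """PageRank by power iteration; each node passes rank along its edges
    in proportion to edge weight. Assumes positive weights."""
    nbrs = build_neighbors(edges)
    nodes = sorted(nbrs)
    strength = {n: sum(nbrs[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(n_iter):
        rank = {
            n: (1.0 - damping) / len(nodes)
               + damping * sum(rank[m] * w / strength[m] for m, w in nbrs[n].items())
            for n in nodes
        }
    return rank

# Hypothetical gene pairs with pair counts as weights.
edges = [("geneA", "geneB", 3), ("geneA", "geneC", 2),
         ("geneA", "geneD", 1), ("geneD", "geneE", 1)]

ranks = pagerank(edges)
for gene in sorted(ranks, key=ranks.get, reverse=True):
    print(gene, "weighted degree:", weighted_degree(edges)[gene],
          "PageRank:", round(ranks[gene], 3))
```

Closeness and betweenness centrality require shortest-path computations and are omitted from this sketch.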
Kippes, Nestor; Debernardi, Juan M; Vasquez-Gross, Hans A; Akpinar, Bala A; Budak, Hikment; Kato, Kenji; Chao, Shiaoman; Akhunov, Eduard; Dubcovsky, Jorge
Wheat varieties with a winter growth habit require long exposures to low temperatures (vernalization) to accelerate flowering. Natural variation in four vernalization genes regulating this requirement has favored wheat adaptation to different environments. The first three genes (VRN1-VRN3) have been cloned and characterized before. Here we show that the fourth gene, VRN-D4, originated by the insertion of a ∼290-kb region from chromosome arm 5AL into the proximal region of chromosome arm 5DS. The inserted 5AL region includes a copy of VRN-A1 that carries distinctive mutations in its coding and regulatory regions. Three lines of evidence confirmed that this gene is VRN-D4: it cosegregated with VRN-D4 in a high-density mapping population; it was expressed earlier than other VRN1 genes in the absence of vernalization; and induced mutations in this gene resulted in delayed flowering. VRN-D4 was found in most accessions of the ancient subspecies Triticum aestivum ssp. sphaerococcum from South Asia. This subspecies showed a significant reduction of genetic diversity and increased genetic differentiation in the centromeric region of chromosome 5D, suggesting that VRN-D4 likely contributed to local adaptation and was favored by positive selection. Three adjacent SNPs in a regulatory region of the VRN-D4 first intron disrupt the binding of GLYCINE-RICH RNA-BINDING PROTEIN 2 (TaGRP2), a known repressor of VRN1 expression. The same SNPs were identified in VRN-A1 alleles previously associated with reduced vernalization requirement. These alleles can be used to modulate vernalization requirements and to develop wheat varieties better adapted to different or changing environments.
Francisco J. Romero-Campero
Phototrophic eukaryotes are among the most successful organisms on Earth due to their unparalleled efficiency at capturing light energy and fixing carbon dioxide to produce organic molecules. A conserved and efficient network of light-dependent regulatory modules could be at the basis of this success. This regulatory system conferred early advantages to phototrophic eukaryotes that allowed for specialization, complex developmental processes and modern plant characteristics. We have studied light-dependent gene regulatory modules from algae to plants employing integrative-omics approaches based on gene co-expression networks. Our study reveals some remarkably conserved ways in which eukaryotic phototrophs deal with day length and light signaling. Here we describe how a family of Arabidopsis transcription factors involved in photoperiod response has evolved from a single algal gene according to the innovation, amplification and divergence theory of gene evolution by duplication. These modifications of the gene co-expression networks from the ancient unicellular green alga Chlamydomonas reinhardtii to the modern brassica Arabidopsis thaliana may hint at the evolution and specialization of plants and other organisms.
Petrovskaya, Olga V; Petrovskiy, Evgeny D; Lavrik, Inna N; Ivanisenko, Vladimir A
Gene network modeling is one of the widely used approaches in systems biology. It allows for the study of complex genetic systems function, including so-called mosaic gene networks, which consist of functionally interacting subnetworks. We conducted a study of a mosaic gene networks modeling method based on integration of models of gene subnetworks by linear control functionals. An automatic modeling of 10,000 synthetic mosaic gene regulatory networks was carried out using computer experiments on gene knockdowns/knockouts. Structural analysis of graphs of generated mosaic gene regulatory networks has revealed that the most important factor for building accurate integrated mathematical models, among those analyzed in the study, is data on expression of genes corresponding to the vertices with high properties of centrality.
Kevin L Childs
With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
Hu, H P; Niu, Z J; Bai, Y P; Tan, X H
Based on gene expression, we have classified 53 colon cancer patients with UICC II into two groups: relapse and no relapse. Samples were taken from each patient, and gene information was extracted. Of the 53 samples examined, 500 genes were considered suitable through analyses by S-Kohonen, BP, and SVM neural networks. Classification accuracy obtained by the S-Kohonen neural network reached 91%, which was higher than that of the BP and SVM neural networks. The results show that the S-Kohonen neural network is better suited to this classification task and is feasible and valid compared with the BP and SVM neural networks.
Torres-Cortés, Gloria; Ghignone, Stefano; Bonfante, Paola; Schüßler, Arthur
For more than 450 million years, arbuscular mycorrhizal fungi (AMF) have formed intimate, mutualistic symbioses with the vast majority of land plants and are major drivers in almost all terrestrial ecosystems. The obligate plant-symbiotic AMF host additional symbionts, so-called Mollicutes-related endobacteria (MRE). To uncover putative functional roles of these widespread but yet enigmatic MRE, we sequenced the genome of DhMRE living in the AMF Dentiscutata heterogama. Multilocus phylogenetic analyses showed that MRE form a previously unidentified lineage sister to the hominis group of Mycoplasma species. DhMRE possesses a strongly reduced metabolic capacity with 55% of the proteins having unknown function, which reflects unique adaptations to an intracellular lifestyle. We found evidence for transkingdom gene transfer between MRE and their AMF host. At least 27 annotated DhMRE proteins show similarities to nuclear-encoded proteins of the AMF Rhizophagus irregularis, which itself lacks MRE. Nuclear-encoded homologs could moreover be identified for another AMF, Gigaspora margarita, and surprisingly, also the non-AMF Mortierella verticillata. Our data indicate a possible origin of the MRE-fungus association in ancestors of the Glomeromycota and Mucoromycotina. The DhMRE genome encodes an arsenal of putative regulatory proteins with eukaryotic-like domains, some of them encoded in putative genomic islands. MRE are highly interesting candidates to study the evolution and interactions between an ancient, obligate endosymbiotic prokaryote with its obligate plant-symbiotic fungal host. Our data moreover may be used for further targeted searches for ancient effector-like proteins that may be key components in the regulation of the arbuscular mycorrhiza symbiosis.
Bazil, Jason N; Stamm, Karl D; Li, Xing; Thiagarajan, Raghuram; Nelson, Timothy J; Tomita-Mitchell, Aoy; Beard, Daniel A
Cardiac development is a complex, multiscale process encompassing cell fate adoption, differentiation and morphogenesis. To elucidate pathways underlying this process, a recently developed algorithm to reverse engineer gene regulatory networks was applied to time-course microarray data obtained from the developing mouse heart. Approximately 200 genes of interest were input into the algorithm to generate putative network topologies that are capable of explaining the experimental data via model simulation. To cull specious network interactions, thousands of putative networks are merged and filtered to generate scale-free, hierarchical networks that are statistically significant and biologically relevant. The networks are validated with known gene interactions and used to predict regulatory pathways important for the developing mammalian heart. Area under the precision-recall curve and receiver operator characteristic curve are 9% and 58%, respectively. Of the top 10 ranked predicted interactions, 4 have already been validated. The algorithm is further tested using a network enriched with known interactions and another depleted of them. The inferred networks contained more interactions for the enriched network versus the depleted network. In all test cases, maximum performance of the algorithm was achieved when the purely data-driven method of network inference was combined with a data-independent, functional-based association method. Lastly, the network generated from the list of approximately 200 genes of interest was expanded using gene-profile uniqueness metrics to include approximately 900 additional known mouse genes and to form the most likely cardiogenic gene regulatory network. The resultant network supports known regulatory interactions and contains several novel cardiogenic regulatory interactions. The method outlined herein provides an informative approach to network inference and leads to clear testable hypotheses related to gene regulation.
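The two summary scores quoted above (area under the precision-recall curve and under the receiver operator characteristic curve) are computed by comparing a ranked list of predicted interactions with a set of known ones. A minimal sketch follows; the confidence scores and labels are made-up toy values, and average precision is used as a common step-wise estimate of the precision-recall area, which may differ from the estimator used in the study.

```python
def roc_auc(scores, labels):
    """Probability that a known interaction outranks an unknown one
    (ties count one half); equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each known interaction.
    Assumes at least one positive label."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical predicted interactions: confidence score and whether the
# interaction is already known (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]

print("ROC AUC:", roc_auc(scores, labels))
print("average precision:", average_precision(scores, labels))
```

Because known interactions are usually a small fraction of all candidate gene pairs, the precision-recall area can be low even when the ROC area is well above 0.5.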
Shao, Zhu-Qing; Xue, Jia-Yu; Wu, Ping; Zhang, Yan-Mei; Wu, Yue; Hang, Yue-Yu; Wang, Bin; Chen, Jian-Qun
Nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes make up the largest plant disease resistance gene family (R genes), with hundreds of copies occurring in individual angiosperm genomes. However, the expansion history of NBS-LRR genes during angiosperm evolution is largely unknown. By identifying more than 6,000 NBS-LRR genes in 22 representative angiosperms and reconstructing their phylogenies, we present a potential framework of NBS-LRR gene evolution in the angiosperm. Three anciently diverged NBS-LRR classes (TNLs, CNLs, and RNLs) were distinguished with unique exon-intron structures and DNA motif sequences. A total of seven ancient TNL, 14 CNL, and two RNL lineages were discovered in the ancestral angiosperm, from which all current NBS-LRR gene repertoires were evolved. A pattern of gradual expansion during the first 100 million years of evolution of the angiosperm clade was observed for CNLs. TNL numbers remained stable during this period but were eventually deleted in three divergent angiosperm lineages. We inferred that an intense expansion of both TNL and CNL genes started from the Cretaceous-Paleogene boundary. Because dramatic environmental changes and an explosion in fungal diversity occurred during this period, the observed expansions of R genes probably reflect convergent adaptive responses of various angiosperm families. An ancient whole-genome duplication event that occurred in an angiosperm ancestor resulted in two RNL lineages, which were conservatively evolved and acted as scaffold proteins for defense signal transduction. Overall, the reconstructed framework of angiosperm NBS-LRR gene evolution in this study may serve as a fundamental reference for better understanding angiosperm NBS-LRR genes. PMID:26839128
Rankin, Scott A; Zorn, Aaron M
The epithelial lining of the respiratory system originates from a small group of progenitor cells in the ventral foregut endoderm of the early embryo. Research in the last decade has revealed a number of paracrine signaling pathways that are critical for the development of these respiratory progenitors. In the post-genomic era the challenge now is to figure out at the genome wide level how these different signaling pathways and their downstream transcription factors interact in a complex "gene regulatory network" (GRN) to orchestrate early lung development. In this prospective, we review our growing understanding of the GRN governing lung specification. We discuss key gaps in our knowledge and describe emerging opportunities that will soon provide an unprecedented understanding of lung development and accelerate our ability to apply this knowledge to regenerative medicine. © 2014 Wiley Periodicals, Inc.
Coronnello, C; Tumminello, M; Micciche, S; Mantegna, R.N.
Many biological systems can be described as networks where different elements interact, in order to perform biological processes. We introduce a network associated with the Gene Ontology. Specifically, we construct a correlation-based network where the vertices are the terms of the Gene Ontology and the link between each two terms is weighted on the basis of the number of genes that they have in common. We analyze a filtered network obtained from the correlation-based network and we characterize its evolution over different releases of the Gene Ontology.
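The construction described above (terms as vertices, links weighted by shared genes) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the weight is the raw count of shared genes rather than the correlation measure and filtering the authors use, and the term and gene names are hypothetical.

```python
from itertools import combinations


def shared_gene_network(term_to_genes):
    """Return {(term_a, term_b): weight}, where weight = number of shared genes."""
    edges = {}
    for a, b in combinations(sorted(term_to_genes), 2):
        shared = len(term_to_genes[a] & term_to_genes[b])
        if shared > 0:  # only link terms that share at least one gene
            edges[(a, b)] = shared
    return edges


# Toy annotation table (hypothetical term and gene identifiers).
annotations = {
    "GO:A": {"g1", "g2", "g3"},
    "GO:B": {"g2", "g3", "g4"},
    "GO:C": {"g5"},
}
network = shared_gene_network(annotations)
print(network)
```

Running the same function on annotation tables from successive Gene Ontology releases would give one network per release, which is the kind of comparison the abstract refers to.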
Ryan J Mailloux
The tricarboxylic acid (TCA) cycle is an essential metabolic network in all oxidative organisms and provides precursors for anabolic processes and reducing factors (NADH and FADH2) that drive the generation of energy. Here, we show that this metabolic network is also an integral part of the oxidative defence machinery in living organisms and alpha-ketoglutarate (KG) is a key participant in the detoxification of reactive oxygen species (ROS). Its utilization as an anti-oxidant can effectively diminish ROS and curtail the formation of NADH, a situation that further impedes the release of ROS via oxidative phosphorylation. Thus, the increased production of KG mediated by NADP-dependent isocitrate dehydrogenase (NADP-ICDH) and its decreased utilization via the TCA cycle confer a unique strategy to modulate the cellular redox environment. Activities of alpha-ketoglutarate dehydrogenase (KGDH), NAD-dependent isocitrate dehydrogenase (NAD-ICDH), and succinate dehydrogenase (SDH) were sharply diminished in the cellular systems exposed to conditions conducive to oxidative stress. These findings uncover an intricate link between TCA cycle and ROS homeostasis and may help explain the ineffective TCA cycle that characterizes various pathological conditions and ageing.
Ó'Maoiléidigh, Diarmuid Seosamh; Graciet, Emmanuelle; Wellmer, Frank
The formation of flowers is one of the main models for studying the regulatory mechanisms that underlie plant development and evolution. Over the past three decades, extensive genetic and molecular analyses have led to the identification of a large number of key floral regulators and to detailed insights into how they control flower morphogenesis. In recent years, genome-wide approaches have been applied to obtaining a global view of the gene regulatory networks underlying flower formation. Furthermore, mathematical models have been developed that can simulate certain aspects of this process and drive further experimentation. Here, we review some of the main findings made in the field of Arabidopsis thaliana flower development, with an emphasis on recent advances. In particular, we discuss the activities of the floral organ identity factors, which are pivotal for the specification of the different types of floral organs, and explore the experimental avenues that may elucidate the molecular mechanisms and gene expression programs through which these master regulators of flower development act. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
M Cristina Gutierrez
The highly successful human pathogen Mycobacterium tuberculosis has an extremely low level of genetic variation, which suggests that the entire population resulted from clonal expansion following an evolutionary bottleneck around 35,000 y ago. Here, we show that this population constitutes just the visible tip of a much broader progenitor species, whose extant representatives are human isolates of tubercle bacilli from East Africa. In these isolates, we detected incongruence among gene phylogenies as well as mosaic gene sequences, whose individual elements are retrieved in classical M. tuberculosis. Therefore, despite its apparent homogeneity, the M. tuberculosis genome appears to be a composite assembly resulting from horizontal gene transfer events predating clonal expansion. The amount of synonymous nucleotide variation in housekeeping genes suggests that tubercle bacilli were contemporaneous with early hominids in East Africa, and have thus been coevolving with their human host much longer than previously thought. These results open novel perspectives for unraveling the molecular bases of M. tuberculosis evolutionary success.
Feldman, Ruth; Monakhov, Mikhail; Pratt, Maayan; Ebstein, Richard P
Oxytocin (OT), a nonapeptide signaling molecule originating from an ancestral peptide, appears in different variants across all vertebrate and several invertebrate species. Throughout animal evolution, neuropeptidergic signaling has been adapted by organisms for regulating response to rapidly changing environments. The family of OT-like molecules affects both peripheral tissues implicated in reproduction, homeostasis, and energy balance, as well as neuromodulation of social behavior, stress regulation, and associative learning in species ranging from nematodes to humans. After describing the OT-signaling pathway, we review research on the three genes most extensively studied in humans: the OT receptor (OXTR), the structural gene for OT (OXT/neurophysin-I), and CD38. Consistent with the notion that sociality should be studied from the perspective of social life at the species level, we address human social functions in relation to OT-pathway genes, including parenting, empathy, and using social relationships to manage stress. We then describe associations between OT-pathway genes with psychopathologies involving social dysfunctions such as autism, depression, or schizophrenia. Human research particularly underscored the involvement of two OXTR single nucleotide polymorphisms (rs53576, rs2254298) with fewer studies focusing on other OXTR (rs7632287, rs1042778, rs2268494, rs2268490), OXT (rs2740210, rs4813627, rs4813625), and CD38 (rs3796863, rs6449197) single nucleotide polymorphisms. Overall, studies provide evidence for the involvement of OT-pathway genes in human social functions but also suggest that factors such as gender, culture, and early environment often confound attempts to replicate first findings. We conclude by discussing epigenetics, conceptual implications within an evolutionary perspective, and future directions, especially the need to refine phenotypes, carefully characterize early environments, and integrate observations of social behavior across
For pituitary adenoma-specific coexpressed genes, we integrated transcription factor (TF) and microRNA (miRNA) regulation to construct a complex regulatory network from the transcriptional and posttranscriptional perspectives. Network module analysis identified the synergistic regulation of genes by miRNAs and TFs in ...
Rasch, Liam J; Martin, Kyle J; Cooper, Rory L; Metscher, Brian D; Underwood, Charlie J; Fraser, Gareth J
The evolution of oral teeth is considered a major contributor to the overall success of jawed vertebrates. This is especially apparent in cartilaginous fishes including sharks and rays, which develop elaborate arrays of highly specialized teeth, organized in rows and retain the capacity for life-long regeneration. Perpetual regeneration of oral teeth has been either lost or highly reduced in many other lineages including important developmental model species, so cartilaginous fishes are uniquely suited for deep comparative analyses of tooth development and regeneration. Additionally, sharks and rays can offer crucial insights into the characters of the dentition in the ancestor of all jawed vertebrates. Despite this, tooth development and regeneration in chondrichthyans is poorly understood and remains virtually uncharacterized from a developmental genetic standpoint. Using the emerging chondrichthyan model, the catshark (Scyliorhinus spp.), we characterized the expression of genes homologous to those known to be expressed during stages of early dental competence, tooth initiation, morphogenesis, and regeneration in bony vertebrates. We have found that expression patterns of several genes from Hh, Wnt/β-catenin, Bmp and Fgf signalling pathways indicate deep conservation over ~450 million years of tooth development and regeneration. We describe how these genes participate in the initial emergence of the shark dentition and how they are redeployed during regeneration of successive tooth generations. We suggest that at the dawn of the vertebrate lineage, teeth (i) were most likely continuously regenerative structures, and (ii) utilised a core set of genes from members of key developmental signalling pathways that were instrumental in creating a dental legacy redeployed throughout vertebrate evolution. These data lay the foundation for further experimental investigations utilizing the unique regenerative capacity of chondrichthyan models to answer evolutionary
Bao, Yanli; Hua, Hefeng
Network capability is an enterprise's capability to set up, manage, maintain and use a variety of relations between enterprises, and to obtain resources for improving competitiveness. Tourism in China is in a transformation period from sightseeing to leisure and vacation. Scenic spots as well as tourist enterprises can learn from other enterprises in the process of resource development, and build up their own network relations in order to get resources for their survival and development. Through the effective management of network relations, the performance of resource development will be improved. By analyzing the literature on network capability and a case study of Wuxi Huishan Ancient Town, the role of network capability in tourism resource development is explored and a resource development path is built from the perspective of network capability. Finally, a tourism resource development process model based on network capability is proposed. This model mainly includes setting up network vision, resource identification, resource acquisition, resource utilization and tourism project development. In these steps, network construction, network management and improving network center status are key points.
Momper, L. M.; Magnabosco, C.; Amend, J.; Osburn, M. R.; Fournier, G. P.
The marine and terrestrial subsurface biospheres represent quite likely the largest reservoirs for life on Earth, directly impacting surface processes and global cycles throughout Earth's history. In the deep subsurface biosphere (DSB) organic carbon and energy are often extremely scarce. However, archaea and bacteria are able to persist in the DSB to at least 3.5 km below surface. Understanding how they persist, and by what metabolisms they subsist, are key questions in this biosphere. To address these questions we investigated 5 global DSB environments: one legacy mine in South Dakota, USA, 3 mines in South Africa and marine fluids circulating beneath the Juan de Fuca Ridge. Boreholes within these mines provided access to fluids buried beneath the earth's surface and sampled depths down to 3.1 km. Geochemical data were collected concomitantly with DNA for metagenomic sequencing. We examined genomes of the ancient and deeply branching Chloroflexi for metabolic capabilities and interrogated the geochemical drivers behind those metabolisms with in situ thermodynamic modeling of reaction energetics. In total, 23 Chloroflexi genomes were identified and analyzed from the 5 subsurface sites. Genes for nitrate reduction (nar) and sulfite reduction (dsr) were found in many of the South Africa Chloroflexi but were absent from genomes collected in South Dakota. Indeed, nitrate reduction was among the most energetically favorable reactions in South African fluids (10-14 kJ cell-1 sec-1 per mol of reactant) and sulfur reduction with Fe2+ or H2 was also exergonic. Conversely, genes for nitrite and nitrous oxide reduction (nrf, nir and nos) were found in genomes collected in South Dakota and Juan de Fuca, but not South Africa. We examined the origin of genes conferring these metabolisms in the Chloroflexi genomes. We discovered evidence for horizontal gene transfer (HGT) for all of these putative metabolisms. Retention of these genes in Chloroflexi lineages indicates
Peng, Jiajie; Bai, Kun; Shang, Xuequn; Wang, Guohua; Xue, Hansheng; Jin, Shuilin; Cheng, Liang; Wang, Yadong; Chen, Jin
Background: Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. Howeve...
Hünemeier, Tábita; Amorim, Carlos Eduardo Guerra; Azevedo, Soledad; Contini, Veronica; Acuña-Alonzo, Víctor; Rothhammer, Francisco; Dugoujon, Jean-Michel; Mazières, Stephane; Barrantes, Ramiro; Villarreal-Molina, María Teresa; Paixão-Côrtes, Vanessa Rodrigues; Salzano, Francisco M.; Canizales-Quinteros, Samuel; Ruiz-Linares, Andres; Bortolini, Maria Cátira
Culture and genetics rely on two distinct but not isolated transmission systems. Cultural processes may change the human selective environment and thereby affect which individuals survive and reproduce. Here, we evaluated whether the modes of subsistence in Native American populations and the frequencies of the ABCA1*Arg230Cys polymorphism were correlated. Further, we examined whether the evolutionary consequences of the agriculturally constructed niche in Mesoamerica could be considered as a gene-culture coevolution model. For this purpose, we genotyped 229 individuals affiliated with 19 Native American populations and added data for 41 other Native American groups (n = 1905) to the analysis. In combination with the SNP cluster of a neutral region, this dataset was then used to unravel the scenario involved in 230Cys evolutionary history. The estimated age of 230Cys is compatible with its origin occurring in the American continent. The correlation of its frequencies with the archeological data on Zea pollen in Mesoamerica/Central America, the neutral coalescent simulations, and the FST-based natural selection analysis suggest that maize domestication was the driving force in the increase in the frequencies of 230Cys in this region. These results may represent the first example of a gene-culture coevolution involving an autochthonous American allele. PMID:22768049
This four-week fourth grade social studies unit dealing with religious dimensions in ancient Egyptian culture was developed by the Public Education Religion Studies Center at Wright State University. It seeks to help students understand ancient Egypt by looking at the people, the culture, and the people's world view. The unit begins with outlines…
Ho, Simon Y. W.; Gilbert, Tom
The mitochondrial genome has been the traditional focus of most research into ancient DNA, owing to its high copy number and population-level variability. Despite this long-standing interest in mitochondrial DNA, it was only in 2001 that the first complete ancient mitogenomic sequences were obtai...
Singh, Pramesh; Chen, Tianlong; Arendsee, Zebulun; Wurtele, Eve S.; Bassler, Kevin E.
Orphan genes, which are genes unique to each particular species, have recently drawn significant attention for their potential usefulness for organismal robustness. Their origin and regulatory interaction patterns remain largely undiscovered. Recently, methods that use the context likelihood of relatedness to infer a network followed by modularity maximizing community detection algorithms on the inferred network to find the functional structure of regulatory networks were shown to be effective. We apply improved versions of these methods to gene expression data from Arabidopsis thaliana, identify groups (clusters) of interacting genes with related patterns of expression and analyze the structure within those groups. Focusing on clusters that contain orphan genes, we compare the identified clusters to gene ontology (GO) terms, regulons, and pathway designations and analyze their hierarchical structure. We predict new regulatory interactions and unravel the structure of the regulatory interaction patterns of orphan genes. Work supported by the NSF through Grants DMR-1507371 and IOS-1546858.
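The first stage mentioned above, context likelihood of relatedness (CLR), can be sketched as follows. The sketch assumes the commonly described form of CLR, in which each mutual-information value is converted to a z-score against the background of each of the two genes and the two z-scores are combined; the exact variant used by the authors, and the subsequent community detection step, are not reproduced here. The toy matrix is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(mi):
    """Rescale a symmetric mutual-information matrix with CLR-style z-scores."""
    n = len(mi)
    # Background statistics for each gene, excluding the diagonal (an assumption).
    stats = []
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]
        stats.append((mean(background), pstdev(background)))

    def z(i, j):
        mu, sigma = stats[i]
        # Negative z-scores are clipped to zero; a flat background gives zero.
        return max(0.0, (mi[i][j] - mu) / sigma) if sigma > 0 else 0.0

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            scores[i][j] = scores[j][i] = sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores


# Toy mutual-information matrix for four genes (hypothetical values).
mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
scores = clr_scores(mi)
```

In this toy matrix the pair (0, 1) stands out against both genes' backgrounds and receives the highest score; thresholding such scores would give the network passed on to community detection.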
Sep 28, 2015 ... [Patel N and Wang JTL 2015 Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J. Biosci. 40 731–740]. DOI 10.1007/s12038-015-9558-9. 1. Introduction. 1.1 Background. Using gene expression data to infer gene regulatory networks (GRNs) is a key approach to ...
Kitsukawa, Takashi; Yagi, Takeshi
Networks, such as the human society network, social and professional networks, and biological system networks, contain vast amounts of information. Information signals in networks are distributed over nodes and transmitted through intricately wired links, making the transfer and transformation of such information difficult to follow. Here we introduce a novel method for describing network information and its transfer using a model network, the Gene-matched network (GMN), in which nodes (neurons) possess attributes (genes). In the GMN, nodes are connected according to their expression of common genes. Because neurons have multiple genes, the GMN is cluster-rich. We show that, in the GMN, information transfer and transformation were controlled systematically, according to the activity level of the network. Furthermore, information transfer and transformation could be traced numerically with a vector using genes expressed in the activated neurons, the active-gene array, which was used to assess the relative activity among overlapping neuronal groups. Interestingly, this coding style closely resembles the cell-assembly neural coding theory. The method introduced here could be applied to many real-world networks, since many systems, including human society and various biological systems, can be represented as a network of this type.
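A minimal sketch of the two ideas named above may help: nodes linked by common genes, and an active-gene array computed from the activated nodes. It assumes that a single shared gene is enough to connect two neurons and that the active-gene array is a per-gene count over active neurons; both are illustrative assumptions, not the authors' definitions, and all names are hypothetical.

```python
from collections import Counter


def build_gmn(neuron_genes):
    """Connect two neurons whenever they express at least one common gene."""
    neurons = sorted(neuron_genes)
    links = {n: set() for n in neurons}
    for i, a in enumerate(neurons):
        for b in neurons[i + 1:]:
            if neuron_genes[a] & neuron_genes[b]:
                links[a].add(b)
                links[b].add(a)
    return links


def active_gene_array(neuron_genes, active, gene_order):
    """Count, for each gene, how many active neurons express it."""
    counts = Counter(g for n in active for g in neuron_genes[n])
    return [counts[g] for g in gene_order]


# Toy network: four neurons expressing genes A-D (hypothetical).
neuron_genes = {"n1": {"A", "B"}, "n2": {"B", "C"}, "n3": {"C"}, "n4": {"D"}}
links = build_gmn(neuron_genes)
vector = active_gene_array(neuron_genes, ["n1", "n2"], ["A", "B", "C", "D"])
print(links, vector)
```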
Qi Liu; Changjun Ding; Yanguang Chu; Jiafei Chen; Weixi Zhang; Bingyu Zhang; Qinjun Huang; Xiaohua Su
Poplar is not only an important resource for the production of paper, timber and other wood-based products, but it has also emerged as an ideal model system for studying woody plants. To better understand the biological processes underlying various traits in poplar, e.g., wood development, a comprehensive functional gene interaction network is highly needed. Here, we constructed a genome-wide functional gene network for poplar (covering ~70% of the 41,335 poplar genes) and created the network...
Pavlogiannis, Andreas; Mozhayskiy, Vadim; Tagkopoulos, Ilias
Biological networks tend to have high interconnectivity, complex topologies and multiple types of interactions. This makes it difficult to identify the sub-networks that are involved in condition-specific responses. In addition, we generally lack scalable methods that can reveal the information flow in gene regulatory and biochemical pathways. Doing so will help us to identify key participants and paths under specific environmental and cellular context. This paper introduces the theory of network flooding, which aims to address the problem of network minimization and regulatory information flow in gene regulatory networks. Given a regulatory biological network, a set of source (input) nodes and optionally a set of sink (output) nodes, our task is to find (a) the minimal sub-network that encodes the regulatory program involving all input and output nodes and (b) the information flow from the source to the sink nodes of the network. Here, we describe a novel, scalable, network traversal algorithm and we assess its potential to achieve significant network size reduction in both synthetic and E. coli networks. Scalability and sensitivity analysis show that the proposed method scales well with the size of the network, and is robust to noise and missing data. The method of network flooding proves to be a useful, practical approach towards information flow analysis in gene regulatory networks. Further extension of the proposed theory has the potential to lead to a unifying framework for the simultaneous network minimization and information flow analysis across various "omics" levels.
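To make the source-to-sink reduction problem concrete, the sketch below keeps only the edges that lie on some directed walk from a source to a sink, using two breadth-first searches. This is a plain reachability baseline chosen for illustration, not the network flooding algorithm itself, and the node names are hypothetical.

```python
from collections import deque


def reachable(adj, starts):
    """Breadth-first search: all nodes reachable from `starts` along `adj`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def source_sink_subnetwork(edges, sources, sinks):
    """Keep edges whose endpoints are reachable from a source and reach a sink."""
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
    keep = reachable(forward, sources) & reachable(backward, sinks)
    return [(u, v) for u, v in edges if u in keep and v in keep]


# Toy regulatory network: "x" is a dead end and "y" is not driven by the source.
edges = [("s", "a"), ("a", "t"), ("a", "x"), ("y", "t"), ("s", "b"), ("b", "t")]
sub = source_sink_subnetwork(edges, ["s"], ["t"])
print(sub)
```

The result still contains every alternative route between source and sink, so a true minimization step would have to prune further.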
Reverse engineering of gene regulatory networks has been an intensively studied topic in bioinformatics since it constitutes an intermediate step from explorative to causative gene expression analysis. Many methods have been proposed in recent years, leading to a wide range of mathematical approaches. In practice, different mathematical approaches will generate different network structures; thus, it is very important for users to assess the performance of these algorithms. We have conducted a comparative study with six different reverse engineering methods, including relevance networks, neural networks, and Bayesian networks. Our approach consists of the generation of defined benchmark data, the analysis of these data with the different methods, and the assessment of algorithmic performances by statistical analyses. Performance was judged by network size and noise levels. The results of the comparative study highlight the neural network approach as the best-performing method among those under study.
Hayasaka, Satoru; Hugenschmidt, Christina E; Laurienti, Paul J
The network-based approach has been used to describe the relationship among genes and various phenotypes, producing a network describing complex biological relationships. Such networks can be constructed by aggregating previously reported associations in the literature from various databases. In this work, we applied the network-based approach to investigate how different brain areas are associated to genetic disorders and genes. In particular, a tripartite network with genes, genetic diseases, and brain areas was constructed based on the associations among them reported in the literature through text mining. In the resulting network, a disproportionately large number of gene-disease and disease-brain associations were attributed to a small subset of genes, diseases, and brain areas. Furthermore, a small number of brain areas were found to be associated with a large number of the same genes and diseases. These core brain regions encompassed the areas identified by the previous genome-wide association studies, and suggest potential areas of focus in the future imaging genetics research. The approach outlined in this work demonstrates the utility of the network-based approach in studying genetic effects on the brain.
Motifs are patterns of recurring connections among the genes of genetic networks that occur more frequently than would be expected from randomized networks with the same degree sequence. Although the abundance of certain three-node motifs, such as the feed-forward loop, is positively correlated with a network's ability to tolerate moderate disruptions to gene expression, little is known regarding the connectivity of individual genes participating in multiple motifs. Using the transcriptional network of the bacterium Escherichia coli, we investigate this feature by reconstructing the distribution of genes participating in feed-forward loop motifs from its largest connected network component. We contrast these motif participation distributions with those obtained from model networks built using the preferential attachment mechanism employed by many biological and man-made networks. We report that, although some of these model networks support a motif participation distribution that appears qualitatively similar to that obtained from the bacterium Escherichia coli, the probability for a node to support a feed-forward loop motif may instead be strongly influenced by only a few master transcriptional regulators within the network. From these analyses we conclude that such master regulators may be a crucial ingredient to describe coupling among feed-forward loop motifs in transcriptional regulatory networks.
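Motif participation can be illustrated with a short sketch that counts, for each gene, the number of feed-forward loops (X regulates Y, and both X and Y regulate Z) it takes part in. The edge list and gene names are hypothetical, self-regulation is ignored by assumption, and no comparison against randomized networks is attempted.

```python
from collections import Counter


def ffl_participation(edges):
    """Count feed-forward loops (x->y, x->z, y->z) that each gene belongs to."""
    targets = {}
    for u, v in edges:
        if u != v:  # ignore self-regulation in this sketch
            targets.setdefault(u, set()).add(v)
    counts = Counter()
    for x, x_targets in targets.items():
        for y in x_targets:
            # z must be regulated by both x and y.
            for z in targets.get(y, set()) & x_targets:
                counts.update((x, y, z))
    return counts


# Toy network: master regulator R heads two feed-forward loops.
edges = [("R", "A"), ("R", "B"), ("A", "B"),
         ("R", "C"), ("R", "D"), ("C", "D")]
participation = ffl_participation(edges)
print(participation)
```

Even in this toy case the regulator R accounts for every loop, which mirrors the abstract's point about a few master regulators dominating the participation distribution.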
Aronow Bruce J
Background: Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses. Results: For the first time, extended versions of the PageRank and HITS algorithms, and the K-Step Markov method are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds"), and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and also evaluate and compare the performance of our approach. Under appropriate settings – for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and step size 6 for K-Step Markov method – the three methods achieved a comparable AUC value, suggesting a similar performance. Conclusion: Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.
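The PageRank with Priors idea can be sketched as a random walk on the interaction network that jumps back to the seed genes with a fixed back probability (0.3, as quoted above). This is a generic power-iteration sketch under the assumptions of an undirected network with no isolated nodes and a uniform prior over seeds; the toy network and names are hypothetical and the authors' implementation may differ.

```python
def pagerank_with_priors(adjacency, seeds, back_prob=0.3, iterations=100):
    """Power iteration for a random walk that restarts at the seed genes."""
    nodes = sorted(adjacency)
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        # With probability back_prob the walker jumps back to a seed.
        new_rank = {n: back_prob * prior[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue
            # Otherwise it moves to a uniformly chosen interaction partner.
            share = (1.0 - back_prob) * rank[n] / len(neighbours)
            for m in neighbours:
                new_rank[m] += share
        rank = new_rank
    return rank


# Toy interaction chain: seed - a - b - c (hypothetical proteins).
ppi = {"seed": ["a"], "a": ["seed", "b"], "b": ["a", "c"], "c": ["b"]}
rank = pagerank_with_priors(ppi, {"seed"})
candidates = sorted((g for g in ppi if g != "seed"), key=rank.get, reverse=True)
print(candidates)
```

Candidates closer to the seed receive higher scores, which is the property a training-test evaluation of this kind relies on.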
Buckley, Katherine M; Rast, Jonathan P
The gut epithelium is an ancient site of complex communication between the animal immune system and the microbial world. While elements of self-non-self receptors and effector mechanisms differ greatly among animal phyla, some aspects of recognition, regulation, and response are broadly conserved. A gene regulatory network (GRN) approach provides a means to investigate the nature of this conservation and divergence even as more peripheral functional details remain incompletely understood. The sea urchin embryo is an unparalleled experimental model for detangling the GRNs that govern embryonic development. By applying this theoretical framework to the free swimming, feeding larval stage of the purple sea urchin, it is possible to delineate the conserved regulatory circuitry that regulates the gut-associated immune response. This model provides a morphologically simple system in which to efficiently unravel regulatory connections that are phylogenetically relevant to immunity in vertebrates. Here, we review the organism-wide cellular and transcriptional immune response of the sea urchin larva. A large set of transcription factors and signal systems, including epithelial expression of interleukin 17 (IL17), are important mediators in the activation of the early gut-associated response. Many of these have homologs that are active in vertebrate immunity, while others are ancient in animals but absent in vertebrates or specific to echinoderms. This larval model provides a means to experimentally characterize immune function encoded in the sea urchin genome and the regulatory interconnections that control immune response and resolution across the tissues of the organism.
Katherine M. Buckley
Full Text Available The gut epithelium is an ancient site of complex communication between the animal immune system and the microbial world. While elements of self-non-self receptors and effector mechanisms differ greatly among animal phyla, some aspects of recognition, regulation, and response are broadly conserved. A gene regulatory network (GRN) approach provides a means to investigate the nature of this conservation and divergence even as more peripheral functional details remain incompletely understood. The sea urchin embryo is an unparalleled experimental model for detangling the GRNs that govern embryonic development. By applying this theoretical framework to the free swimming, feeding larval stage of the purple sea urchin, it is possible to delineate the conserved regulatory circuitry that regulates the gut-associated immune response. This model provides a morphologically simple system in which to efficiently unravel regulatory connections that are phylogenetically relevant to immunity in vertebrates. Here, we review the organism-wide cellular and transcriptional immune response of the sea urchin larva. A large set of transcription factors and signal systems, including epithelial expression of interleukin 17 (IL17), are important mediators in the activation of the early gut-associated response. Many of these have homologs that are active in vertebrate immunity, while others are ancient in animals but absent in vertebrates or specific to echinoderms. This larval model provides a means to experimentally characterize immune function encoded in the sea urchin genome and the regulatory interconnections that control immune response and resolution across the tissues of the organism.
Full Text Available The inference of gene regulatory networks from expression data is an important area of research that provides insight into the inner workings of a biological system. Relevance-network-based approaches provide a simple and easily scalable way to understand the interactions between genes. Up until now, most work based on relevance networks has focused on the discovery of direct regulation using the correlation coefficient or mutual information. However, some of the more complicated interactions, such as interactive regulation and coregulation, are not easily detected. In this work, we propose a relevance network model for gene regulatory network inference which employs both mutual information and conditional mutual information to determine the interactions between genes. For this purpose, we propose a conditional mutual information estimator based on adaptive partitioning which allows us to condition on both discrete and continuous random variables. We provide experimental results that demonstrate that the proposed regulatory network inference algorithm can provide better performance when the target network contains coregulated and interactively regulated genes.
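To see why conditional mutual information can reveal interactions that pairwise mutual information misses, consider the following minimal sketch. It uses a simple plug-in (count-based) estimator on discrete data, not the adaptive-partitioning estimator proposed in the abstract, and the XOR-style toy data and function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z) in bits from discrete samples."""
    n = len(x)
    pxyz = Counter(zip(x, y, z))
    pxz, pyz, pz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    return sum(c / n * log2(c * pz[k] / (pxz[(a, k)] * pyz[(b, k)]))
               for (a, b, k), c in pxyz.items())


# Hypothetical interactive regulation: the target is ON only when exactly
# one of the two regulators is ON (an XOR rule), over discretized states.
reg_a = [0, 0, 1, 1]
reg_b = [0, 1, 0, 1]
target = [0, 1, 1, 0]

mi = mutual_information(reg_a, target)
cmi = conditional_mutual_information(reg_a, target, reg_b)
print(mi, cmi)
```

Here the pairwise mutual information between one regulator and the target is zero, while conditioning on the second regulator exposes one full bit of dependence, which is the kind of interactive regulation a correlation-only relevance network would miss.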
Kari Y Lam
Full Text Available Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms and single cell types). We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method's utility in learning from data collected on different experimental platforms.
May 1, 2014 ... tion. Therefore, in this study, the structural properties of the co-expression network inferred from gene expression microarray data were compared with the topological properties of the known, well-established network data of the same organism. We use a Web application called topoGSA (Glaab et al.
Stamatiou, Georgios A; Stankovic, Konstantina M
To perform comprehensive network and pathway analyses of the genes known to cause genetic hearing loss. In silico analysis of deafness genes using ingenuity pathway analysis (IPA). Genes relevant for hearing and deafness were identified through PubMed literature searches and the Hereditary Hearing Loss Homepage. The genes were assembled into 3 groups: 63 genes that cause nonsyndromic deafness, 107 genes that cause nonsyndromic or syndromic sensorineural deafness, and 112 genes associated with otic capsule development and malformations. Each group of genes was analyzed using IPA to discover the most interconnected, that is, "nodal" molecules, within the most statistically significant networks (p deafness (GPCR), or with predisposition to otosclerosis (TGFB1), but also novel genes that have not been described in the cochlea (HNF4A) and signaling kinases (ERK 1/2). A number of molecules that are likely to be key mediators of genetic hearing loss were identified through three different network and pathway analyses. The molecules included new candidate genes for deafness. Therapies targeting these molecules may be useful to treat deafness.
Xiang, Wenliang; Zhang, Jie; Li, Lin; Liang, Huazhong; Luo, Hai; Zhao, Jian; Yang, Zhirong; Sun, Qun
Metagenomic DNA libraries constructed from the Dagong Ancient Brine Well were screened for genes with Na(+)/H(+) antiporter activity on the antiporter-deficient Escherichia coli KNabc strain. One clone with a stable Na(+)-resistant phenotype was obtained and its Na(+)/H(+) antiporter gene was sequenced and designated as m-nha. The deduced amino acid sequence of M-Nha protein consists of 523 residues with a calculated molecular weight of 58 147 Da and a pI of 5.50, which is homologous with NhaH from Halobacillus dabanensis D-8(T) (92%) and Halobacillus aidingensis AD-6(T) (86%), and with Nhe2 from Bacillus sp. NRRL B-14911 (64%). It had a hydropathy profile with 10 putative transmembrane domains and a long carboxyl terminal hydrophilic tail of 140 amino acid residues, similar to Nhap from Synechocystis sp. and Aphanothece halophytica, as well as NhaG from Bacillus subtilis. The m-nha gene in the antiporter-negative mutant E. coli KNabc conferred resistance to Na(+) and the ability to grow under alkaline conditions. The difference in amino acid sequence and the putative secondary structure suggested that the m-nha isolated from the Dagong Ancient Brine Well in this study was a novel Na(+)/H(+) antiporter gene.
Full Text Available Autism spectrum disorder (ASD) is marked by a strong genetic heterogeneity, which is underlined by the low overlap between ASD risk gene lists proposed in different studies. In this context, molecular networks can be used to analyze the results of several genome-wide studies in order to underline those network regions harboring genetic variations associated with ASD, the so-called “disease modules.” In this work, we used a recent network diffusion-based approach to jointly analyze multiple ASD risk gene lists. We defined genome-scale prioritizations of human genes in relation to ASD genes from multiple studies, found significantly connected gene modules associated with ASD and predicted genes functionally related to ASD risk genes. Most of them play a role in synapsis and neuronal development and function; many are related to syndromes that can be in comorbidity with ASD and the remaining are involved in epigenetics, cell cycle, cell adhesion and cancer.
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we ...
Sep 28, 2015 ... Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data.
Full Text Available Gene Regulatory Networks (GRNs) have become a major focus of interest in recent years. A number of reverse engineering approaches have been developed to help uncover the regulatory networks giving rise to the observed gene expression profiles. However, this is an overspecified problem due to the fact that more than one genotype (network wiring) can give rise to the same phenotype. We refer to this phenomenon as "gene elasticity." In this work, we study the effect of this particular problem on the pure, data-driven inference of gene regulatory networks. We simulated a four-gene network in order to produce "data" (protein levels) that we use in lieu of real experimental data. We then optimized the network connections between the four genes with a view to obtain the original network that gave rise to the data. We did this for two different cases: one in which only the network connections were optimized and the other in which both the network connections as well as the kinetic parameters (given as reaction probabilities in our case) were estimated. We observed that multiple genotypes gave rise to very similar protein levels. Statistical experimentation indicates that it is impossible to differentiate between the different networks on the basis of both equilibrium as well as dynamic data. We show explicitly that reverse engineering of GRNs from pure expression data is an indeterminate problem. Our results suggest the unsuitability of an inferential, purely data-driven approach for the reverse engineering of transcriptional networks in the case of gene regulatory networks displaying a certain level of complexity.
Wang, Huan; Xu, Chuan-Yun; Hu, Jing-Bo; Cao, Ke-Fei
In this paper, a network of hypertension-related genes is constructed by analyzing the correlations of gene expression data among the Dahl salt-sensitive rat and two consomic rat strains. The numerical calculations show that this sparse and assortative network has small-world and scale-free properties. Further, 16 key hub genes (Col4a1, Lcn2, Cdk4, etc.) are determined by introducing an integrated centrality and have been confirmed by biological/medical research to play important roles in hypertension.
Pájaro, Manuel; Otero-Muras, Irene; Vázquez, Carlos; Alonso, Antonio A
Gene regulation is inherently stochastic. In many applications concerning Systems and Synthetic Biology such as the reverse engineering and the de novo design of genetic circuits, stochastic effects (yet potentially crucial) are often neglected due to the high computational cost of stochastic simulations. With advances in these fields there is an increasing need for tools providing accurate approximations of the stochastic dynamics of gene regulatory networks (GRNs) with reduced computational effort. This work presents SELANSI (SEmi-LAgrangian SImulation of GRNs), a software toolbox for the simulation of stochastic multidimensional gene regulatory networks. SELANSI exploits intrinsic structural properties of gene regulatory networks to accurately approximate the corresponding Chemical Master Equation with a partial integral differential equation that is solved by a semi-Lagrangian method with high efficiency. Networks under consideration might involve multiple genes with self and cross regulations, in which genes can be regulated by different transcription factors. Moreover, the validity of the method is not restricted to a particular type of kinetics. The tool offers total flexibility regarding network topology, kinetics and parameterization, as well as simulation options. SELANSI runs under the MATLAB environment, and is available under GPLv3 license at https://sites.google.com/view/selansi. © The Author(s) 2017. Published by Oxford University Press.
Full Text Available Abstract Background Gene expression and transcription factor (TF) binding data have been used to reveal gene transcriptional regulatory networks. Existing knowledge of gene regulation can be presented using gene connectivity networks. However, these composite connectivity networks do not specify the range of biological conditions of the activity of each link in the network. Results We present a novel method that utilizes the expression and binding patterns of the neighboring nodes of each link in existing experimentally-based, literature-derived gene transcriptional regulatory networks and extends them in silico using TF-gene binding motifs and a compendium of large expression data from Saccharomyces cerevisiae. Using this method, we predict several hundred new transcriptional regulatory TF-gene links, along with experimental conditions in which known and predicted links become active. This approach unravels new links in the yeast gene transcriptional regulatory network by utilizing the known transcriptional regulatory interactions, and is particularly useful for breaking down the composite transcriptional regulatory network to condition specific networks. Conclusion Our methods can facilitate future binding experiments, as they can considerably help focus on the TFs that must be surveyed to understand gene regulation. (Supplemental material and the latest version of the MATLAB implementation of the United Signature Algorithm is available online at 1 or [see Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10].)
Pavy, Nathalie; Pelgas, Betty; Laroche, Jérôme; Rigault, Philippe; Isabel, Nathalie; Bousquet, Jean
Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling. To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants.
Full Text Available Abstract Background Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling. Results To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. Conclusions Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed
Full Text Available The architecture of tomato inflorescence strongly affects flower production and subsequent crop yield. To understand the genetic activities involved, insight into the underlying network of genes that initiate and control the sympodial growth in the tomato is essential. In this paper, we show how the structure of this network can be derived from available data of the expressions of the involved genes. Our approach starts from employing biological expert knowledge to select the most probable gene candidates behind branching behavior. To find how these genes interact, we develop a stepwise procedure for computational inference of the network structure. Our data consists of expression levels from primary shoot meristems, measured at different developmental stages on three different genotypes of tomato. With the network inferred by our algorithm, we can explain the dynamics corresponding to all three genotypes simultaneously, despite their apparent dissimilarities. We also correctly predict the chronological order of expression peaks for the main hubs in the network. Based on the inferred network, using optimal experimental design criteria, we are able to suggest an informative set of experiments for further investigation of the mechanisms underlying branching behavior.
Patil, Ashwini; Nakai, Kenta
Time-course gene expression profiles are frequently used to provide insight into the changes in cellular state over time and to infer the molecular pathways involved. When combined with large-scale molecular interaction networks, such data can provide information about the dynamics of cellular response to stimulus. However, few tools are currently available to predict a single active gene sub-network from time-course gene expression profiles. We introduce a tool, TimeXNet, which identifies active gene sub-networks with temporal paths using time-course gene expression profiles in the context of a weighted gene regulatory and protein-protein interaction network. TimeXNet uses a specialized form of the network flow optimization approach to identify the most probable paths connecting the genes with significant changes in expression at consecutive time intervals. TimeXNet has been extensively evaluated for its ability to predict novel regulators and their associated pathways within active gene sub-networks in the mouse innate immune response and the yeast osmotic stress response. Compared to other similar methods, TimeXNet identified up to 50% more novel regulators from independent experimental datasets. It predicted paths within a greater number of known pathways with longer overlaps (up to 7 consecutive edges) within these pathways. TimeXNet was also shown to be robust in the presence of varying amounts of noise in the molecular interaction network. TimeXNet is a reliable tool that can be used to study cellular response to stimuli through the identification of time-dependent active gene sub-networks in diverse biological systems. It is significantly better than other similar tools. TimeXNet is implemented in Java as a stand-alone application and supported on Linux, MS Windows and Macintosh. The output of TimeXNet can be directly viewed in Cytoscape. TimeXNet is freely available for non-commercial users.
Full Text Available Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|^3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.
Song, Won-Min; Zhang, Bin
Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|^3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.
Mostafavi, Sara; Morris, Quaid
In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Malliya Gounder Palanichamy
Full Text Available Recent analyses of ancient Mesopotamian mitochondrial genomes have suggested a genetic link between the Indian subcontinent and Mesopotamian civilization. There is no consensus on the origin of the ancient Mesopotamians. They may be descendants of migrants who founded regional Mesopotamian groups like that of Terqa, or they may be merchants who were involved in trans-Mesopotamia trade. To identify the Indian source population showing linkage to the ancient Mesopotamians, we screened a total of 15,751 mitochondrial DNAs (11,432 from the literature and 4,319 from this study) representing all major populations of India. Although our results suggest that south India (Tamil Nadu) and northeast India served as the source of the ancient Mesopotamian mtDNA gene pool, the mtDNA of these ancient Mesopotamians was probably contributed by Tamil merchants who were involved in the Indo-Roman trade.
Palanichamy, Malliya Gounder; Mitra, Bikash; Debnath, Monojit; Agrawal, Suraksha; Chaudhuri, Tapas Kumar; Zhang, Ya-Ping
Recent analyses of ancient Mesopotamian mitochondrial genomes have suggested a genetic link between the Indian subcontinent and Mesopotamian civilization. There is no consensus on the origin of the ancient Mesopotamians. They may be descendants of migrants who founded regional Mesopotamian groups like that of Terqa, or they may be merchants who were involved in trans-Mesopotamia trade. To identify the Indian source population showing linkage to the ancient Mesopotamians, we screened a total of 15,751 mitochondrial DNAs (11,432 from the literature and 4,319 from this study) representing all major populations of India. Although our results suggest that south India (Tamil Nadu) and northeast India served as the source of the ancient Mesopotamian mtDNA gene pool, the mtDNA of these ancient Mesopotamians was probably contributed by Tamil merchants who were involved in the Indo-Roman trade. PMID:25299580
Shlykova, Irina; Ponosov, Arcady
There are different ways to model gene regulatory networks. Differential equations allow for a detailed description of the network's dynamics and provide an explicit model of the gene concentration changes over time. Production and relative degradation rate functions used in such models depend on the vector of steeply sloped threshold functions which characterize the activity of genes. The most popular example of the threshold functions comes from the Boolean network approach, where the threshold functions are given by step functions. The system of differential equations then becomes piecewise linear. The dynamics of this system can be described very easily between the thresholds, but not in the switching domains. For instance, this approach fails to analyze stationary points of the system and to define continuous solutions in the switching domains. These problems have been studied previously, but the proposed model did not take into account a time delay in cellular systems. However, analysis of real gene expression data shows a considerable number of time-delayed interactions, suggesting that time delay is essential in gene regulation. Therefore, delays may have a great effect on the dynamics of the system and are one of the critical factors that should be considered in the reconstruction of gene regulatory networks. The goal of this work is to apply singular perturbation analysis to certain systems with delay and to obtain an analog of Tikhonov's theorem, which provides sufficient conditions for constructing the limit system in the delay case.
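The piecewise linear structure mentioned in this abstract can be written out as a short worked equation. The notation below ($x_i$, $Z_k$, $F_i$, $G_i$, $\theta_k$) is assumed for illustration and is not taken from the paper; it shows the delay-free Boolean-like case with one threshold per gene.

```latex
% Assumed notation: x_i is the concentration of gene product i,
% Z_k is the step-function response of gene k at threshold \theta_k,
% F_i is the production rate and G_i > 0 the relative degradation rate.
\[
  \dot{x}_i(t) = F_i(Z_1,\dots,Z_n) - G_i(Z_1,\dots,Z_n)\,x_i(t),
  \qquad
  Z_k =
  \begin{cases}
    0, & x_k < \theta_k,\\
    1, & x_k > \theta_k.
  \end{cases}
\]
% Between thresholds every Z_k is constant, so F_i and G_i are constants
% and each equation is linear with the explicit solution
\[
  x_i(t) = \frac{F_i}{G_i} + \left(x_i(0) - \frac{F_i}{G_i}\right) e^{-G_i t}.
\]
```

The step function is left undefined at $x_k = \theta_k$, which is exactly where the switching-domain difficulties described in the abstract arise.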
Der Sarkissian, Clio; Allentoft, Morten Erik; Avila Arcos, Maria del Carmen
The past decade has witnessed a revolution in ancient DNA (aDNA) research. Although the field's focus was previously limited to mitochondrial DNA and a few nuclear markers, whole genome sequences from the deep past can now be retrieved. This breakthrough is tightly connected to the massive sequen...
Full Text Available Abstract Background Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge is typically not available that a given pair of genes do not interact. Results A recent advance in research on data mining is a method capable of learning a classifier from only positive and unlabeled examples, that does not need labeled negative examples. Applied to the reconstruction of gene regulatory networks, we show that this method significantly outperforms the current state of the art of machine learning methods. We assess the new method using both simulated and experimental data, and obtain major performance improvement. Conclusions Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.
Background: Understanding gene expression and regulation is essential for understanding biological mechanisms. Because gene expression profiling has been widely used in basic biological research, especially in transcription regulation studies, we have developed GeneReg, an easy-to-use R package, to construct gene regulatory networks from time course gene expression profiling data. More importantly, this package can provide information about time delays between expression change in a regulator and that of its target genes. Findings: The R package GeneReg is based on time delay linear regression, which can generate a model of the expression levels of regulators at a given time point against the expression levels of their target genes at a later time point. There are two parameters in the model, time delay and regulation coefficient. Time delay is the time lag during which expression change of the regulator is transmitted to change in target gene expression. Regulation coefficient expresses the regulation effect: a positive regulation coefficient indicates activation and a negative one indicates repression. GeneReg was applied to a real Saccharomyces cerevisiae cell cycle dataset; more than thirty percent of the modeled regulations, based entirely on gene expression files, were found to be consistent with previous discoveries from known databases. Conclusions: GeneReg is an easy-to-use, simple, fast R package for gene regulatory network construction from short time course gene expression data. It may be applied to study time-related biological processes such as cell cycle, cell differentiation, or causal inference.
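The two parameters described above, time delay and regulation coefficient, can be illustrated with a minimal sketch. This is not the GeneReg implementation or its API: the function name `fit_delay`, the use of R² to choose the delay, and the toy expression values are all assumptions made for illustration, and the sketch is written in Python rather than R.

```python
from statistics import mean


def fit_delay(regulator, target, max_delay):
    """Fit target[t + delay] = coef * regulator[t] + intercept for each
    candidate delay and return (delay, coef, intercept, r2) of the best fit.

    Hypothetical helper: the selection criterion (highest R^2) is an
    assumption, not necessarily the criterion used by GeneReg.
    """
    best = None
    for delay in range(max_delay + 1):
        x = regulator[:len(regulator) - delay]  # regulator at time t
        y = target[delay:]                      # target at time t + delay
        if len(x) < 3:                          # too few points to fit
            break
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        syy = sum((yi - my) ** 2 for yi in y)
        if sxx == 0 or syy == 0:                # constant series, skip
            continue
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        coef = sxy / sxx                        # least-squares slope
        r2 = sxy * sxy / (sxx * syy)            # squared correlation
        if best is None or r2 > best[3]:
            best = (delay, coef, my - coef * mx, r2)
    return best


# Toy data (invented): the target follows 20 - 2 * regulator two steps later.
regulator = [0, 1, 3, 2, 5, 4, 6, 8, 7, 9]
target = [5, 5] + [20 - 2 * r for r in regulator[:-2]]
delay, coef, intercept, r2 = fit_delay(regulator, target, max_delay=3)
print(delay, coef, intercept, r2)
```

In this toy series the negative coefficient would be read as repression and the selected delay as the lag between regulator and target, matching the interpretation given in the abstract.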
Saik, Olga V; Demenkov, Pavel S; Ivanisenko, Timofey V; Bragina, Elena Yu; Freidin, Maxim B; Goncharova, Irina A; Dosenko, Victor E; Zolotareva, Olga I; Hofestaedt, Ralf; Lavrik, Inna N; Rogaev, Evgeny I; Ivanisenko, Vladimir A
Hypertension and bronchial asthma are major public health problems. As of 2014, approximately one billion adults, or ~ 22% of the world population, have had hypertension. As of 2011, 235-330 million people globally have been affected by asthma and approximately 250,000-345,000 people have died each year from the disease. The development of effective therapies against these diseases is complicated by their comorbidity, which is often a major problem in their diagnosis and treatment. Hence, in this study a bioinformatics methodology for the analysis of the comorbidity of these two diseases has been developed. The search for candidate genes related to the comorbid conditions of asthma and hypertension can help in elucidating the molecular mechanisms underlying the comorbid condition of these two diseases, and can also be useful for genotyping and identifying new drug targets. Using ANDSystem, the reconstruction and analysis of gene networks associated with asthma and hypertension was carried out. The gene network of asthma included 755 genes/proteins and 62,603 interactions, while the gene network of hypertension included 713 genes/proteins and 45,479 interactions. Two hundred and five genes/proteins and 9638 interactions were shared between asthma and hypertension. An approach for ranking genes implicated in the comorbid condition of two diseases was proposed. The approach is based on nine criteria for ranking genes by their importance, including standard methods of gene prioritization (Endeavor, ToppGene) as well as original criteria that take into account the characteristics of an associative gene network and the presence of known polymorphisms in the analysed genes. According to the proposed approach, the genes IL10, TLR4, and CAT had the highest priority in the development of comorbidity of these two diseases. Additionally, it was revealed that the list of top genes is enriched with apoptotic genes and genes involved in
Jia, Chen; Qian, Hong; Chen, Min; Zhang, Michael Q.
The transient response to a stimulus and subsequent recovery to a steady state are the fundamental characteristics of a living organism. Here we study the relaxation kinetics of autoregulatory gene networks based on the chemical master equation model of single-cell stochastic gene expression with nonlinear feedback regulation. We report a novel relation between the rate of relaxation, characterized by the spectral gap of the Markov model, and the feedback sign of the underlying gene circuit. When a network has no feedback, the relaxation rate is exactly the decaying rate of the protein. We further show that positive feedback always slows down the relaxation kinetics while negative feedback always speeds it up. Numerical simulations demonstrate that this relation provides a possible method to infer the feedback topology of autoregulatory gene networks by using time-series data of gene expression.
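The feedback-free statement above can be checked by hand on the simplest case. The following is a sketch under stated assumptions, not the paper's derivation: a gene with constant synthesis rate \rho and per-molecule degradation rate d (both symbols are introduced here for illustration), modeled as a plain birth-death process.

```latex
% Mean protein number <n> for constant synthesis rate \rho and decay rate d
\frac{d\langle n\rangle}{dt} = \rho - d\,\langle n\rangle
\quad\Longrightarrow\quad
\langle n\rangle(t) = \frac{\rho}{d}
  + \left(\langle n\rangle(0) - \frac{\rho}{d}\right) e^{-d t}
```

The deviation from the steady state \rho/d decays as e^{-dt}, so the relaxation rate equals the protein decay rate d. For this birth-death process the nonzero eigenvalues of the master-equation generator are -d, -2d, ..., so the spectral gap is also d, in line with the abstract's claim for networks without feedback.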
Schroeder, Mark D; Pearce, Michael; Fak, John; Fan, HongQing; Unnerstall, Ulrich; Emberly, Eldon; Rajewsky, Nikolaus; Siggia, Eric D; Gaul, Ulrike
The segmentation gene network of Drosophila consists of maternal and zygotic factors that generate, by transcriptional (cross-) regulation, expression patterns of increasing complexity along the anterior-posterior axis of the embryo. Using known binding site information for maternal and zygotic gap transcription factors, the computer algorithm Ahab recovers known segmentation control elements (modules) with excellent success and predicts many novel modules within the network and genome-wide. We show that novel module predictions are highly enriched in the network and typically clustered proximal to the promoter, not only upstream, but also in intronic space and downstream. When placed upstream of a reporter gene, they consistently drive patterned blastoderm expression, in most cases faithfully producing one or more pattern elements of the endogenous gene. Moreover, we demonstrate for the entire set of known and newly validated modules that Ahab's prediction of binding sites correlates well with the expression patterns produced by the modules, revealing basic rules governing their composition. Specifically, we show that maternal factors consistently act as activators and that gap factors act as repressors, except for the bimodal factor Hunchback. Our data suggest a simple context-dependent rule for its switch from repressive to activating function. Overall, the composition of modules appears well fitted to the spatiotemporal distribution of their positive and negative input factors. Finally, by comparing Ahab predictions with different categories of transcription factor input, we confirm the global regulatory structure of the segmentation gene network, but find odd skipped behaving like a primary pair-rule gene. The study expands our knowledge of the segmentation gene network by increasing the number of experimentally tested modules by 50%. For the first time, the entire set of validated modules is analyzed for binding site composition under a uniform set of
Thomas R Geiger
Identification of conserved co-expression networks is a useful tool for clustering groups of genes enriched for common molecular or cellular functions. The relative importance of genes within networks can frequently be inferred by the degree of connectivity, with those displaying high connectivity being significantly more likely to be associated with specific molecular functions. Previously we utilized cross-species network analysis to identify two network modules that were significantly associated with distant metastasis free survival in breast cancer. Here, we validate one of the highly connected genes as a metastasis associated gene. Tpx2, the most highly connected gene within a proliferation network specifically prognostic for estrogen receptor positive (ER+) breast cancers, enhances metastatic disease, but in a tumor autonomous, proliferation-independent manner. Histologic analysis suggests instead that variation of TPX2 levels within disseminated tumor cells may influence the transition between dormant to actively proliferating cells in the secondary site. These results support the co-expression network approach for identification of new metastasis-associated genes to provide new information regarding the etiology of breast cancer progression and metastatic disease.
Gene regulatory networks analyze the relationships between genes, allowing us to understand the gene regulatory interactions in systems biology. Gene expression data from the microarray experiments is used to obtain the gene regulatory networks. However, the microarray data is discrete, noisy and non-linear, which makes learning the networks a challenging problem, and existing gene network inference methods do not give consistent results. Current state-of-the-art study uses the average-ranking-based consensus method to combine and average the ranked predictions from individual methods. However, each individual method has an equal contribution to the consensus prediction. We have developed a linear programming-based consensus approach which uses learned weights from linear programming among individual methods such that the methods have different weights depending on their performance. Our result reveals that assigning different weights to individual methods rather than giving them equal weights improves the performance of the consensus. The linear programming-based consensus method is evaluated and it had the best performance on in silico and Saccharomyces cerevisiae networks, and the second best on the Escherichia coli network, outperformed by the Inferelator Pipeline method, which gives inconsistent results across a wide range of microarray data sets.
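The difference between equal-weight and weighted rank consensus can be sketched as follows. This is an illustration only: the function names, the method names A, B and C, the edge scores and the weights are all hypothetical, and the linear programming step that learns the weights is not shown.

```python
def ranks(scores):
    """Map each edge to its rank under one method (1 = most confident)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: position for position, edge in enumerate(ordered, start=1)}


def consensus(method_scores, weights=None):
    """Order edges by weighted mean rank across methods.

    With weights=None every method counts equally, which corresponds to
    the average-ranking consensus. Assumes all methods score the same edges.
    """
    names = list(method_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    per_method = {name: ranks(method_scores[name]) for name in names}
    edges = list(per_method[names[0]])
    mean_rank = {
        edge: sum(weights[n] * per_method[n][edge] for n in names) / total
        for edge in edges
    }
    return sorted(edges, key=lambda edge: mean_rank[edge])


# Invented confidence scores from three hypothetical inference methods.
method_scores = {
    "A": {"g1->g2": 0.9, "g1->g3": 0.5, "g2->g3": 0.1},
    "B": {"g1->g2": 0.2, "g1->g3": 0.3, "g2->g3": 0.8},
    "C": {"g1->g2": 0.6, "g1->g3": 0.7, "g2->g3": 0.1},
}
equal = consensus(method_scores)
weighted = consensus(method_scores, weights={"A": 3.0, "B": 1.0, "C": 1.0})
print(equal)
print(weighted)
```

Giving method A a larger weight moves its top-ranked edge to the front of the consensus list, which is the kind of effect performance-dependent weights are meant to produce.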
Ojeda, Sergio R; Dubay, Christopher; Lomniczi, Alejandro; Kaidar, Gabi; Matagne, Valerie; Sandau, Ursula S; Dissen, Gregory A
A sustained increase in pulsatile release of gonadotrophin releasing hormone (GnRH) from the hypothalamus is an essential, final event that defines the initiation of mammalian puberty. This increase depends on coordinated changes in transsynaptic and glial-neuronal communication, consisting of activating neuronal and glial excitatory inputs to the GnRH neuronal network and the loss of transsynaptic inhibitory tone. It is now clear that the prevalent excitatory systems stimulating GnRH secretion involve a neuronal component consisting of excitatory amino acids (glutamate) and at least one peptide (kisspeptin), and a glial component that uses growth factors and small molecules for cell-cell signaling. GABAergic and opiatergic neurons provide transsynaptic inhibitory control to the system, but GABA neurons also exert direct excitatory effects on GnRH neurons. The molecular mechanisms that provide encompassing coordination to this cellular network are not known, but they appear to involve a host of functionally related genes hierarchically arranged. We envision that, as observed in other gene networks, the highest level of control in this network is provided by transcriptional regulators that, by directing expression of key subordinate genes, impose an integrative level of coordination to the neuronal and glial subsets involved in initiating the pubertal process. The use of high-throughput and gene manipulation approaches coupled to systems biology strategies should provide not only the experimental bases supporting this concept, but also unveil the existence of crucial components of network control not yet identified. Copyright (c) 2009 Elsevier Ireland Ltd. All rights reserved.
Bourdakou, Marilena M; Spyrou, George M
Systemic approaches offer a different point of view on the analysis of several types of molecular associations as well as on the identification of specific gene communities in several cancer types. However, because the data needed to construct networks from experimental evidence are insufficient, statistical gene co-expression networks are widely used instead. Many efforts have been made to exploit the information hidden in these networks. However, these approaches still need to make comprehensive use of the prior knowledge encoded in molecular pathway associations and to improve their efficiency in discovering both exclusive subnetworks as candidate biomarkers and conserved subnetworks that may uncover common origins of several cancer types. In this study we present the development of the Informed Walks model, based on random walks that incorporate information from molecular pathways to mine candidate genes and gene-gene links. The proposed model has been applied to TCGA (The Cancer Genome Atlas) datasets from seven different cancer types, exploring the reconstructed co-expression networks of the whole set of genes and leading to highlighted subnetworks for each cancer type. Subsequently, we elucidated the impact of each subnetwork on the indication of underlying exclusive and common molecular mechanisms as well as on the short-listing of drugs that have the potential to suppress the corresponding cancer type through a drug-repurposing pipeline. We have developed a method of gene subnetwork highlighting based on prior knowledge, capable of giving fruitful insights into the underlying molecular mechanisms and valuable input to drug-repurposing pipelines for a variety of cancer types.
Munsky, Brian [Los Alamos National Laboratory]; Khammash, Mustafa [UCSB]
The cellular environment is abuzz with noise. The origin of this noise is attributed to the inherent random motion of reacting molecules that take part in gene expression and post expression interactions. In this noisy environment, clonal populations of cells exhibit cell-to-cell variability that frequently manifests as significant phenotypic differences within the cellular population. The stochastic fluctuations in cellular constituents induced by noise can be measured and their statistics quantified. We show that these random fluctuations carry within them valuable information about the underlying genetic network. Far from being a nuisance, the ever-present cellular noise acts as a rich source of excitation that, when processed through a gene network, carries its distinctive fingerprint that encodes a wealth of information about that network. We demonstrate that in some cases the analysis of these random fluctuations enables the full identification of network parameters, including those that may otherwise be difficult to measure. This establishes a potentially powerful approach for the identification of gene networks and offers a new window into the workings of these networks.
Lipner, Ettie M.; Garcia, Benjamin J.; Strong, Michael
Tuberculosis and nontuberculous mycobacterial infections constitute a high burden of pulmonary disease in humans, resulting in over 1.5 million deaths per year. Building on the premise that genetic factors influence the instance, progression, and defense of infectious disease, we undertook a systems biology approach to investigate relationships among genetic factors that may play a role in increased susceptibility or control of mycobacterial infections. We combined literature and database mining with network analysis and pathway enrichment analysis to examine genes, pathways, and networks, involved in the human response to Mycobacterium tuberculosis and nontuberculous mycobacterial infections. This approach allowed us to examine functional relationships among reported genes, and to identify novel genes and enriched pathways that may play a role in mycobacterial susceptibility or control. Our findings suggest that the primary pathways and genes influencing mycobacterial infection control involve an interplay between innate and adaptive immune proteins and pathways. Signaling pathways involved in autoimmune disease were significantly enriched as revealed in our networks. Mycobacterial disease susceptibility networks were also examined within the context of gene-chemical relationships, in order to identify putative drugs and nutrients with potential beneficial immunomodulatory or anti-mycobacterial effects. PMID:26751573
Kordmahalleh, Mina Moradi; Sefidmazgi, Mohammad Gorji; Harrison, Scott H; Homaifar, Abdollah
The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions. Powerful biotechnologies have been rapidly and successfully measuring levels of genetic expression to illuminate different states of biological systems. This has led to an ensuing challenge to improve the identification of specific regulatory mechanisms through regulatory network reconstructions. Solutions to this challenge will ultimately help to spur forward efforts based on the usage of regulatory network reconstructions in systems biology applications. We have developed a hierarchical recurrent neural network (HRNN) that identifies time-delayed gene interactions using time-course data. A customized genetic algorithm (GA) was used to optimize hierarchical connectivity of regulatory genes and a target gene. The proposed design provides a non-fully connected network with the flexibility of using recurrent connections inside the network. These features and the non-linearity of the HRNN facilitate the process of identifying temporal patterns of a GRN. Our HRNN method was implemented with the Python language. It was first evaluated on simulated data representing linear and nonlinear time-delayed gene-gene interaction models across a range of network sizes and variances of noise. We then further demonstrated the capability of our method in reconstructing GRNs of the Saccharomyces cerevisiae synthetic network for in vivo benchmarking of reverse-engineering and modeling approaches (IRMA). We compared the performance of our method to TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet across different network
Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin
Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.
Sauro Herbert M
Background: In synthetic biology, gene regulatory circuits are often constructed by combining smaller circuit components. Connections between components are achieved by transcription factors acting on promoters. If the individual components behave as true modules and certain module interface conditions are satisfied, the function of the composite circuits can in principle be predicted. Results: In this paper, we investigate one of the interface conditions: fan-out. We quantify the fan-out, a concept widely used in electrical engineering, to indicate the maximum number of the downstream inputs that an upstream output transcription factor can regulate. The fan-out is shown to be closely related to retroactivity studied by Del Vecchio, et al. An efficient operational method for measuring the fan-out is proposed and shown to be applicable to various types of module interfaces. The fan-out is also shown to be enhanced by self-inhibitory regulation on the output. The potential role of an inhibitory regulation is discussed. Conclusions: The proposed estimation method for fan-out not only provides an experimentally efficient way for quantifying the level of modularity in gene regulatory circuits but also helps characterize and design module interfaces, enabling the modular construction of gene circuits.
Pluripotency is a state that exists transiently in the early embryo and, remarkably, can be recapitulated in vitro by deriving embryonic stem cells or by reprogramming somatic cells to become induced pluripotent stem cells. The state of pluripotency, which is stabilized by an interconnected network of pluripotency-associated genes, integrates external signals and exerts control over the decision between self-renewal and differentiation at the transcriptional, post-transcriptional and epigenetic levels. Recent evidence of alternative pluripotency states indicates the regulatory flexibility of this network. Insights into the underlying principles of the pluripotency network may provide unprecedented opportunities for studying development and for regenerative medicine.
Combining path consistency (PC) algorithms with conditional mutual information (CMI) is widely used in the reconstruction of gene regulatory networks. CMI has many advantages over the Pearson correlation coefficient in measuring non-linear dependence to infer gene regulatory networks. It can also discriminate direct regulations from indirect ones. However, it is still a challenge to select the conditional genes in an optimal way, which affects the performance and computational complexity of the PC algorithm. In this study, we develop a novel conditional mutual information-based algorithm, namely RPNI (Regulation Pattern based Network Inference), to infer gene regulatory networks. For conditional gene selection, we define the co-regulation pattern, indirect-regulation pattern and mixture-regulation pattern as three candidate patterns to guide the selection of candidate genes. To demonstrate the potential of our algorithm, we apply it to gene expression data from the DREAM challenge. Experimental results show that RPNI outperforms existing conditional mutual information-based methods in both accuracy and time complexity for different sizes of gene samples. Furthermore, the robustness of our algorithm is demonstrated by noisy interference analysis using different types of noise.
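How conditioning separates direct from indirect regulation can be shown with a small sketch. It is not the RPNI algorithm: it assumes jointly Gaussian expression values and a single conditioning gene, in which case the CMI reduces to -0.5 * ln(1 - r^2) with r the partial correlation. The function names and the toy profiles are invented for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def gaussian_mi(x, y):
    """Mutual information I(X;Y) in nats, assuming joint Gaussianity.
    Undefined (log of zero) if x and y are perfectly correlated."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)


def gaussian_cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z) in nats for one conditioning
    variable, via the partial correlation of x and y given z.
    Assumes z is not perfectly correlated with x or y."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    partial = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


# Toy profiles (invented): z drives both x and y; x and y share no direct link.
z = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
x = [2 * zi + ei for zi, ei in zip(z, e1)]
y = [2 * zi + ei for zi, ei in zip(z, e2)]

mi_xy = gaussian_mi(x, y)
cmi_xy_given_z = gaussian_cmi(x, y, z)
print(mi_xy, cmi_xy_given_z)
```

Here x and y look strongly dependent on their own, but the dependence vanishes once the common regulator z is conditioned on, so a PC-style procedure would drop the x-y edge as indirect.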
Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard
The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the
Bert van der Zwaag
The recent identification of copy-number variation in the human genome has opened up new avenues for the discovery of positional candidate genes underlying complex genetic disorders, especially in the field of psychiatric disease. One major challenge that remains is pinpointing the susceptibility genes in the multitude of disease-associated loci. This challenge may be tackled by reconstruction of functional gene-networks from the genes residing in these loci. We applied this approach to autism spectrum disorder (ASD), and identified the copy-number changes in the DNA of 105 ASD patients and 267 healthy individuals with Illumina Humanhap300 Beadchips. Subsequently, we used a human reconstructed gene-network, Prioritizer, to rank candidate genes in the segmental gains and losses in our autism cohort. This analysis highlighted several candidate genes already known to be mutated in cognitive and neuropsychiatric disorders, including RAI1, BRD1, and LARGE. In addition, the LARGE gene was part of a sub-network of seven genes functioning in glycobiology, present in seven copy-number changes specifically identified in autism patients with limited co-morbidity. Three of these seven copy-number changes were de novo in the patients. In autism patients with a complex phenotype and healthy controls no such sub-network was identified. An independent systematic analysis of 13 published autism susceptibility loci supports the involvement of genes related to glycobiology as we also identified the same or similar genes from those loci. Our findings suggest that the occurrence of genomic gains and losses of genes associated with glycobiology are important contributors to the development of ASD.
Background: Recent years have seen a dramatic increase in the use of mathematical modeling to gain insight into gene regulatory network behavior across many different organisms. In particular, there has been considerable interest in using mathematical tools to understand how multistable regulatory networks may contribute to developmental processes such as cell fate determination. Indeed, such a network may subserve the formation of unicellular leaf hairs (trichomes) in the model plant Arabidopsis thaliana. Results: In order to investigate the capacity of small gene regulatory networks to generate multiple equilibria, we present a chemical reaction network (CRN)-based modeling formalism and describe a number of methods for CRN analysis in a parameter-free context. These methods are compared and applied to a full set of one-component subnetworks, as well as a large random sample from 40,680 similarly constructed two-component subnetworks. We find that positive feedback and cooperativity mediated by transcription factor (TF) dimerization is a requirement for one-component subnetwork bistability. For subnetworks with two components, the presence of these processes increases the probability that a randomly sampled subnetwork will exhibit multiple equilibria, although we find several examples of bistable two-component subnetworks that do not involve cooperative TF-promoter binding. In the specific case of epidermal differentiation in Arabidopsis, dimerization of the GL3-GL1 complex and cooperative sequential binding of GL3-GL1 to the CPC promoter are each independently sufficient for bistability. Conclusion: Computational methods utilizing CRN-specific theorems to rule out bistability in small gene regulatory networks are far superior to techniques generally applicable to deterministic ODE systems. Using these methods to conduct an unbiased survey of parameter-free deterministic models of small networks, and the Arabidopsis epidermal cell
The formation of the nervous system is a multistep process that yields a mature brain. Failure in any of the steps of this process may cause brain malfunction. In the early stages of embryonic development, neural progenitors quickly proliferate and then, at a specific moment, differentiate into neurons or glia. Once they become postmitotic neurons, they migrate to their final destinations and begin to extend their axons to connect with other neurons, sometimes located in quite distant regions, to establish different neural circuits. During the last decade, it has become evident that Zic genes, in addition to playing important roles in early development (e.g., gastrulation and neural tube closure), are involved in different processes of late brain development, such as neuronal migration, axon guidance, and refinement of axon terminals. ZIC proteins are therefore essential for the proper wiring and connectivity of the brain. In this chapter, we review our current knowledge of the role of Zic genes in the late stages of neural circuit formation.
Schizophrenia (SZ) is a heritable, complex mental disorder. Numerous conventional studies have had limited success in finding causal genes for schizophrenia. Protein interaction network and pathway-based analysis may provide an alternative and effective approach to investigating the molecular mechanisms of schizophrenia. We selected a list of schizophrenia candidate genes (SZGenes) using a multi-dimensional evidence-based approach. The global network properties of proteins encoded by these SZGenes were explored in the context of the human protein interactome, while local network properties were investigated by comparing SZ-specific and cancer-specific networks that were extracted from the human interactome. Relative to cancer genes, we observed that SZGenes tend to have an intermediate degree and an intermediate efficiency on a perturbation spreading throughout the human interactome. This suggested that schizophrenia might have different pathological mechanisms from cancer even though both are complex diseases. We conducted pathway analysis using Ingenuity System and constructed the first schizophrenia molecular network (SMN) based on protein interaction networks, pathways and literature survey. We identified 24 pathways overrepresented in SZGenes and examined their interactions and crosstalk. We observed that these pathways were related to neurodevelopment, immune system, and retinoic X receptor (RXR). Our examination of SMN revealed that schizophrenia is a dynamic process caused by dysregulation of the multiple pathways. Finally, we applied the network/pathway approach to identify novel candidate genes, some of which could be verified by experiments. This study provides the first comprehensive review of the network and pathway characteristics of schizophrenia candidate genes. Our preliminary results suggest that this systems biology approach might prove promising for selection of candidate genes for complex diseases. Our findings have
Sloan, Zachary; Arends, Danny; Broman, Karl W.; Centeno, Arthur; Furlotte, Nicholas; Nijveen, H.; Yan, Lei; Zhou, Xiang; Williams, Robert W.; Prins, Pjotr
GeneNetwork (GN) is a free and open source (FOSS) framework for web-based genetics that can be deployed anywhere. GN allows biologists to upload high-throughput experimental data, such as expression data from microarrays and RNA-seq, and also `classic' phenotypes, such as disease phenotypes. These
FUMET: A fuzzy network module extraction technique for gene expression data. PRIYAKSHI MAHANTA, HASIN AFZAL AHMED, DHRUBA KUMAR BHATTACHARYYA and ASHISH GHOSH http://www.ias.ac.in/jbiosci. J. Biosci. 39(3), June 2014, 351–364, © Indian Academy of Sciences. Supplementary material ...
An accurate determination of the network structure of gene regulatory systems from high-throughput gene expression data is an essential yet challenging step in studying how the expression of endogenous genes is controlled through a complex interaction of gene products and DNA. While numerous methods have been proposed to infer the structure of gene regulatory networks, none of them seem to work consistently over different data sets with high accuracy. A recent study to compare gene network inference methods showed that an average-ranking-based consensus method consistently performs well under various settings. Here, we propose a linear programming-based consensus method for the inference of gene regulatory networks. Unlike the average-ranking-based one, which treats the contribution of each individual method equally, our new consensus method assigns a weight to each method based on its credibility. As a case study, we applied the proposed consensus method on synthetic and real microarray data sets, and compared its performance to that of the average-ranking-based consensus and individual inference methods. Our results show that our weighted consensus method achieves superior performance over the unweighted one, suggesting that assigning weights to different individual methods rather than giving them equal weights improves the accuracy. © 2016 Elsevier B.V.
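The difference between the two consensus schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the method names, edge scores, and weights below are hypothetical, and the weights are chosen by hand, whereas the abstract derives them by linear programming (a step not reproduced here).

```python
def rank_edges(scores):
    """Map each edge to its rank (1 = most confident) under one method's scores."""
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: position + 1 for position, edge in enumerate(ordered)}


def consensus(method_scores, weights=None):
    """Weighted average rank per edge; omitting weights gives the plain average-rank consensus."""
    names = sorted(method_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    ranks = {name: rank_edges(method_scores[name]) for name in names}
    edges = list(ranks[names[0]])
    return {
        edge: sum(weights[name] * ranks[name][edge] for name in names) / total
        for edge in edges
    }


# Hypothetical confidence scores from two inference methods over the same three edges.
methods = {
    "method_a": {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5},
    "method_b": {("g1", "g2"): 0.1, ("g1", "g3"): 0.8, ("g2", "g3"): 0.4},
}

unweighted = consensus(methods)
# Hand-picked weights standing in for credibility scores (assumption).
weighted = consensus(methods, {"method_a": 3.0, "method_b": 1.0})
```

With equal weights the two methods cancel out and every edge receives the same average rank; once one method is trusted more, its ordering dominates the consensus.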
Allot, Alexis; Chennen, Kirsley; Nevers, Yannis; Poidevin, Laetitia; Kress, Arnaud; Ripp, Raymond; Thompson, Julie Dawn; Poch, Olivier; Lecompte, Odile
The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user's specific interests and provides an efficient way to share information with collaborators. Furthermore, the user's behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. MyGeneFriends is available at lbgi.fr/mygenefriends. ©Alexis Allot, Kirsley Chennen, Yannis
Ben-Tabou de-Leon, Smadar
Developmental gene regulatory networks robustly control the timely activation of regulatory and differentiation genes. The structure of these networks underlies their capacity to buffer intrinsic and extrinsic noise and maintain embryonic morphology. Here I illustrate how the use of specific architectures by the sea urchin developmental regulatory networks enables the robust control of cell fate decisions. The Wnt-βcatenin signaling pathway patterns the primary embryonic axis while the BMP signaling pathway patterns the secondary embryonic axis in the sea urchin embryo and across bilateria. Interestingly, in the sea urchin in both cases, the signaling pathway that defines the axis controls directly the expression of a set of downstream regulatory genes. I propose that this direct activation of a set of regulatory genes enables a uniform regulatory response and a clear cut cell fate decision in the endoderm and in the dorsal ectoderm. The specification of the mesodermal pigment cell lineage is activated by Delta signaling that initiates a triple positive feedback loop that locks down the pigment specification state. I propose that the use of compound positive feedback circuitry provides the endodermal cells enough time to turn off mesodermal genes and ensures correct mesoderm vs. endoderm fate decision. Thus, I argue that understanding the control properties of repeatedly used regulatory architectures illuminates their role in embryogenesis and provides possible explanations to their resistance to evolutionary change.
Castillo Luis F.
Gene annotation is a process that encompasses multiple approaches to the analysis of nucleic acid or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are being described in an organism genome, construction and visualization of gene networks impose novel challenges in the understanding of complex expression patterns and the generation of new knowledge in genomics research. In order to take advantage of accumulated text data after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by Basic Local Alignment Search Tool (BLAST) and InterProScan, were filtered out by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained along the Wordnet dictionary in order to construct a Resource Description Framework (RDF) using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created where detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information in the generation of gene connections and for data mining purposes, this tool development opens the possibility to visualize complex and abundant transcriptome data, and triggers the formulation of new hypotheses in metabolic pathways analysis.
Shimamura, Teppei; Imoto, Seiya; Yamaguchi, Rui; Nagasaki, Masao; Miyano, Satoru
Elucidating the differences between cellular responses to various biological conditions or external stimuli is an important challenge in systems biology. Many approaches have been developed to reverse engineer a cellular system, called a gene network, from time series microarray data in order to understand a transcriptomic response under a condition of interest. Comparative topological analysis has also been applied based on the gene networks inferred independently from each of the multiple time series datasets under varying conditions to find critical differences between these networks. However, these comparisons often lead to misleading results, because each network contains considerable noise due to the limited length of the time series. We propose an integrated approach for inferring multiple gene networks from time series expression data under varying conditions. To the best of our knowledge, our approach is the first reverse-engineering method that is intended for transcriptomic network comparison between varying conditions. Furthermore, we propose a state-of-the-art parameter estimation method, relevance-weighted recursive elastic net, for providing higher precision and recall than existing reverse-engineering methods. We analyze experimental data of MCF-7 human breast cancer cells stimulated by epidermal growth factor or heregulin with several doses and provide novel biological hypotheses through network comparison. The software NETCOMP is available at http://bonsai.ims.u-tokyo.ac.jp/~shima/NETCOMP/.
Marbling is an important trait in characterizing beef quality and a major factor in determining the price of beef in the Korean beef market. Marbling is a complex trait, and a system-level approach is needed to identify candidate genes related to it. To find candidate genes associated with marbling, we used a weighted gene coexpression network analysis of the expression values of bovine genes. Hub genes were identified; they were topologically centered with large degree and BC values in the global network. We performed gene expression analysis to detect candidate genes in M. longissimus with divergent marbling phenotype (marbling scores 2 to 7) using qRT-PCR. The results demonstrate that transmembrane protein 60 (TMEM60) and dihydropyrimidine dehydrogenase (DPYD) are associated with increasing marbling fat. We suggest that the network-based approach in livestock may be an important method for analyzing the complex effects of candidate genes associated with complex traits like marbling or tenderness.
Human gene regulatory networks (GRNs) can be difficult to interpret due to a tangle of edges interconnecting thousands of genes. We constructed a general human GRN from extensive transcription factor and microRNA target data obtained from public databases. In a subnetwork of this GRN that is active during estrogen stimulation of MCF-7 breast cancer cells, we benchmarked automated algorithms for identifying core regulatory genes (transcription factors and microRNAs). We identified K-core decomposition, PageRank and betweenness centrality as the most effective algorithms for discovering core regulatory genes in the network, evaluated on the previously known roles of these genes in MCF-7 biology as well as on their ability to explain the up or down expression status of up to 70% of the remaining genes. Finally, we validated the use of the K-core algorithm for organizing the GRN in an easier to interpret layered hierarchy where more influential regulatory genes percolate towards the inner layers. The integrated human gene and miRNA network and software used in this study are provided as supplementary materials (S1 Data) accompanying this manuscript.
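The layered hierarchy produced by K-core decomposition can be illustrated with a minimal Python sketch. It assumes an undirected, unweighted graph stored as a dictionary of neighbour sets and uses the standard peeling procedure; the node names are hypothetical and this is not the software distributed with the study.

```python
def core_numbers(adjacency):
    """Return each node's core number by repeatedly peeling the lowest-degree node."""
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Pick the remaining node with the smallest current degree (ties broken by name).
        node = min(remaining, key=lambda n: (degree[n], n))
        # The core level never decreases as peeling moves inward.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Hypothetical toy network: a triangle (A, B, C) with a two-node tail (D, E).
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

layers = core_numbers(toy_network)
```

Nodes in the densely connected triangle end up in the inner layer (core number 2), while the tail nodes stay in the outer layer (core number 1), which mirrors the idea of influential regulators percolating towards the inner layers.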
Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher
The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
Larremore, Daniel B.; Clauset, Aaron; Buckee, Caroline O.
The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences. PMID:24130474
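The network construction step described above (each sequence is a node, and two nodes are linked if they share an exact match of sufficient length) can be sketched as follows. This is a minimal illustration with hypothetical sequences and a hand-picked length threshold; the identification of HVRs, the choice of a statistically significant match length, and the stochastic block model fit are not reproduced.

```python
from itertools import combinations


def substrings(sequence, length):
    """All contiguous substrings of the given length (empty if the sequence is shorter)."""
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}


def shared_block_network(sequences, min_length):
    """Link two sequences if they share an exact match of at least min_length characters.

    Sharing a match of at least min_length is equivalent to sharing
    a substring of exactly min_length, so only those are compared.
    """
    blocks = {name: substrings(seq, min_length) for name, seq in sequences.items()}
    edges = set()
    for first, second in combinations(sorted(sequences), 2):
        if blocks[first] & blocks[second]:
            edges.add((first, second))
    return edges


# Hypothetical amino-acid fragments; s1 and s2 share the block "DEFGH".
fragments = {
    "s1": "ACDEFGHIK",
    "s2": "WWDEFGHYY",
    "s3": "MNPQRSTVW",
}

edges_len5 = shared_block_network(fragments, 5)
edges_len6 = shared_block_network(fragments, 6)
```

Raising the threshold from 5 to 6 removes the only edge, which shows how strongly the resulting network depends on the match length chosen.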
Fung, Elizabeth-sharon [Los Alamos National Laboratory
Choice of a T-lymphoid fate by hematopoietic progenitor cells depends on sustained Notch-Delta signaling combined with tightly-regulated activities of multiple transcription factors. To dissect the regulatory network connections that mediate this process, we have used high-resolution analysis of regulatory gene expression trajectories from the beginning to the end of specification; tests of the short-term Notch-dependence of these gene expression changes; and perturbation analyses of the effects of overexpression of two essential transcription factors, namely PU.1 and GATA-3. Quantitative expression measurements of >50 transcription factor and marker genes have been used to derive the principal components of regulatory change through which T-cell precursors progress from primitive multipotency to T-lineage commitment. Distinct parts of the path reveal separate contributions of Notch signaling, GATA-3 activity, and downregulation of PU.1. Using BioTapestry, the results have been assembled into a draft gene regulatory network for the specification of T-cell precursors and the choice of T as opposed to myeloid dendritic or mast-cell fates. This network also accommodates effects of E proteins and mutual repression circuits of Gfi1 against Egr-2 and of TCF-1 against PU.1 as proposed elsewhere, but requires additional functions that remain unidentified. Distinctive features of this network structure include the intense dose-dependence of GATA-3 effects; the gene-specific modulation of PU.1 activity based on Notch activity; the lack of direct opposition between PU.1 and GATA-3; and the need for a distinct, late-acting repressive function or functions to extinguish stem and progenitor-derived regulatory gene expression.
Guzeldemir-Akcakanat, Esra; Sunnetci-Akkoyunlu, Deniz; Orucguney, Begum; Cine, Naci; Kan, Bahadır; Yılmaz, Elif Büsra; Gümüşlü, Esen; Savli, Hakan
In this study, molecular biomarkers that play a role in the development of generalized aggressive periodontitis (GAgP) are investigated using gingival tissue samples through omics-based whole-genome transcriptomics while using healthy individuals as background controls. Gingival tissue biopsies from 23 patients with GAgP and 25 healthy individuals were analyzed using gene-expression microarrays with network and pathway analyses to identify gene-expression patterns. To substantiate the results of the microarray studies, real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) was performed to assess the messenger RNA (mRNA) expression of MZB1 and DSC1. The microarrays and qRT-PCR showed similar gene-expression changes, confirming the reliability of the microarray results at the mRNA level. The gene-expression microarray studies identified four significant gene networks. The most upregulated genes were MZB1, TNFRSF17, PNOC, FCRL5, LAX1, BMS1P20, IGLL5, MMP7, SPAG4, and MEI1; the most downregulated genes were LOR, LAMB4, AADACL2, MAPT, ARG1, NPR3, AADAC, DSC1, LRRC4, and CHP2. Functions of the identified genes that were involved in gene networks were cellular development, cell growth and proliferation, cellular movement, cell-cell signaling and interaction, humoral immune response, protein synthesis, cell death and survival, cell population and organization, organismal injury and abnormalities, molecular transport, and small-molecule biochemistry. The data suggest new networks that have important functions such as humoral immune response and organismal injury/abnormalities. Future analyses may facilitate proteomic profiling analyses to identify gene-expression patterns related to clinical outcome.
Liu, Ji-Long; Zhao, Miao
Ectopic pregnancy is a very dangerous complication of pregnancy, affecting 1%–2% of all reported pregnancies. Due to ethical constraints on human biopsies and the lack of suitable animal models, there has been little success in identifying functionally important genes in the pathogenesis of ectopic pregnancy. In the present study, we developed a random walk–based computational method named TM-rank to prioritize ectopic pregnancy–related genes based on text mining data and gene network information. Using a defined threshold value, we identified five top-ranked genes: VEGFA (vascular endothelial growth factor A), IL8 (interleukin 8), IL6 (interleukin 6), ESR1 (estrogen receptor 1) and EGFR (epidermal growth factor receptor). These genes are promising candidate genes that can serve as useful diagnostic biomarkers and therapeutic targets. Our approach represents a novel strategy for prioritizing disease susceptibility genes. PMID:26840308
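A generic random walk with restart, the family of methods to which the abstract assigns TM-rank, can be sketched as below. This is not the TM-rank algorithm itself, whose details are not given here: the graph, the seed weights standing in for text-mining evidence, and the restart probability are all assumptions made for illustration, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score nodes by the stationary probability of a walker that restarts at seed nodes."""
    nodes = sorted(adjacency)
    seed_total = sum(seeds.values())
    # Restart distribution: normalised seed weights, zero for non-seed nodes.
    start = {node: seeds.get(node, 0.0) / seed_total for node in nodes}
    current = dict(start)
    for _ in range(max_iter):
        updated = {node: restart * start[node] for node in nodes}
        for node in nodes:
            # Spread the non-restarting probability mass evenly over the neighbours.
            share = (1.0 - restart) * current[node] / len(adjacency[node])
            for neighbour in adjacency[node]:
                updated[neighbour] += share
        change = sum(abs(updated[node] - current[node]) for node in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical path-shaped gene network A - B - C - D with a single seed gene A.
gene_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

scores = random_walk_with_restart(gene_network, {"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes closer to the seed in the network receive higher scores, so a threshold on the score (as in the abstract) selects a short list of top-ranked candidates.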
Varrella, Stefano; Romano, Giovanna; Costantini, Susan; Ruocco, Nadia; Ianora, Adrianna; Bentley, Matt G; Costantini, Maria
Marine organisms possess a series of cellular strategies to counteract the negative effects of toxic compounds, including the massive reorganization of gene expression networks. Here we report the modulated dose-dependent response of activated genes by diatom polyunsaturated aldehydes (PUAs) in the sea urchin Paracentrotus lividus. PUAs are secondary metabolites deriving from the oxidation of fatty acids, inducing deleterious effects on the reproduction and development of planktonic and benthic organisms that feed on these unicellular algae and with anti-cancer activity. Our previous results showed that PUAs target several genes, implicated in different functional processes in this sea urchin. Using interactomic Ingenuity Pathway Analysis we now show that the genes targeted by PUAs are correlated with four HUB genes, NF-κB, p53, δ-2-catenin and HIF1A, which have not been previously reported for P. lividus. We propose a working model describing hypothetical pathways potentially involved in toxic aldehyde stress response in sea urchins. This represents the first report on gene networks affected by PUAs, opening new perspectives in understanding the cellular mechanisms underlying the response of benthic organisms to diatom exposure.
Albert, Nick W; Davies, Kevin M; Schwinn, Kathy E
The diversity of pigmentation patterns observed in plants occurs due to the spatial distribution and accumulation of colored compounds, which may also be associated with structural changes to the tissue. Anthocyanins are flavonoids that provide red/purple/blue coloration to plants, often forming complex patterns such as spots, stripes, and vein-associated pigmentation, particularly in flowers. These patterns are determined by the activity of MYB-bHLH-WDR (MBW) transcription factor complexes, which activate the anthocyanin biosynthesis genes, resulting in anthocyanin pigment accumulation. Recently, we established that the MBW complex controlling anthocyanin synthesis acts within a gene regulation network that is conserved within at least the Eudicots. This network involves hierarchy, reinforcement, and feedback mechanisms that allow for stringent and responsive regulation of the anthocyanin biosynthesis genes. The gene network and mobile nature of the WDR and R3-MYB proteins provide exciting new opportunities to explore the basis of pigmentation patterning, and to investigate the evolutionary history of the MBW components in land plants.
Zhu, Z-Q; Tang, J-S; Cao, X-J
Ankylosing spondylitis (AS) is a chronic, inflammatory arthritis and autoimmune disease. The main symptom of AS is inflammatory spinal pain; with time, some patients develop ankylosis and spinal immobility. We aimed to identify potential therapeutic targets for ankylosing spondylitis. We used the GSE11886 series to identify potential genes related to AS and to construct a regulation network. In the network, some of the transcription factors (TFs) and target genes have been shown to be related to AS in previous studies, such as NFKB1, STAT1, STAT4, TNFSF10, IL2RA, and IL2RB. We also found some new TFs and target genes that respond to AS, such as BXDC5 and EGFR. Further analysis indicated that some significant pathways are associated with AS, including antigen processing and presentation and cytokine-cytokine receptor interaction; for other pathways, such as apoptosis and systemic lupus erythematosus, the association was not significant, but there was evidence that they play an important role in AS progression. Therefore, transcriptome network analysis is useful for identifying candidate genes in AS.
BACKGROUND: The atmospheric CO2 concentration increases every year. While the effects of elevated CO2 on plant growth, physiology and metabolism have been studied, there is now a pressing need to understand the molecular mechanisms of how plants will respond to future increases in CO2 concentration using genomic techniques. PRINCIPAL FINDINGS: Gene expression in triploid white poplar ((Populus tomentosa × P. bolleana) × P. tomentosa) leaves was investigated using the Affymetrix poplar genome gene chip, after three months of growth in controlled environment chambers under three CO2 concentrations. Our physiological findings showed the growth, assessed as stem diameter, was significantly increased, and the net photosynthetic rate was decreased in elevated CO2 concentrations. The concentrations of four major endogenous hormones appeared to actively promote plant development. Leaf tissues under elevated CO2 concentrations had 5,127 genes with different expression patterns in comparison to leaves under the ambient CO2 concentration. Among these, 8 genes were finally selected for further investigation by using randomized variance model corrective ANOVA analysis, dynamic gene expression profiling, gene network construction, and quantitative real-time PCR validation. Among the 8 genes in the network, aldehyde dehydrogenase and pyruvate kinase were situated in the core and had interconnections with other genes. CONCLUSIONS: Under elevated CO2 concentrations, 8 significantly changed key genes involved in metabolism and responding to stimulus of external environment were identified. These genes play crucial roles in the signal transduction network and show strong correlations with elevated CO2 exposure. This study provides several target genes, further investigation of which could provide an initial step for better understanding the molecular mechanisms of plant acclimation and evolution in future rising CO2 concentrations.
Munsky, Brian; Trinh, Brooke; Khammash, Mustafa
The cellular environment is abuzz with noise originating from the inherent random motion of reacting molecules in the living cell. In this noisy environment, clonal cell populations exhibit cell-to-cell variability that can manifest as significant phenotypic differences. Noise-induced stochastic fluctuations in cellular constituents can be measured and their statistics quantified using flow cytometry, single-molecule fluorescence in situ hybridization, time-lapse fluorescence microscopy and other single-cell and single-molecule measurement techniques. We show that these random fluctuations carry within them valuable information about the underlying genetic network. Far from being a nuisance, the ever-present cellular noise acts as a rich source of excitation that, when processed through a gene network, carries a distinctive fingerprint that encodes a wealth of information about that network. We demonstrate that in some cases the analysis of these random fluctuations enables the full identification of network parameters, including those that may otherwise be difficult to measure. We use theoretical investigations to establish experimental guidelines for the identification of gene regulatory networks, and we apply these guidelines to experimentally identify predictive models for different regulatory mechanisms in bacteria and yeast.
Background: Various computational models have been of interest due to their use in the modelling of gene regulatory networks (GRNs). As a logical model, probabilistic Boolean networks (PBNs) consider molecular and genetic noise, so the study of PBNs provides significant insights into the understanding of the dynamics of GRNs. This will ultimately lead to advances in developing therapeutic methods that intervene in the process of disease development and progression. The applications of PBNs, however, are hindered by the complexities involved in the computation of the state transition matrix and the steady-state distribution of a PBN. For a PBN with n genes and N Boolean networks, the complexity to compute the state transition matrix is $O(nN2^{2n})$, or $O(nN2^{n})$ for a sparse matrix. Results: This paper presents a novel implementation of PBNs based on the notions of stochastic logic and stochastic computation. This stochastic implementation of a PBN is referred to as a stochastic Boolean network (SBN). An SBN provides an accurate and efficient simulation of a PBN with or without random gene perturbation. The state transition matrix is computed in an SBN with a complexity of $O(nL2^{n})$, where L is a factor related to the stochastic sequence length. Since the minimum sequence length required for obtaining a given evaluation accuracy increases approximately in a polynomial order with the number of genes, n, and the number of Boolean networks, N, usually increases exponentially with n, L is typically smaller than N, especially in a network with a large number of genes. Hence, the computational efficiency of an SBN is primarily limited by the number of genes, but not directly by the total possible number of Boolean networks. Furthermore, a time-frame expanded SBN enables an efficient analysis of the steady-state distribution of a PBN. These findings are supported by the simulation results of a simplified p53 network, several randomly generated networks and a
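To make the cost of the state transition matrix concrete, the following is a minimal Python sketch of the direct (non-stochastic) computation for a small PBN. It is not the SBN method of the abstract; the function name `pbn_transition_matrix`, the representation of each Boolean network as a list of per-gene functions, and the two-gene example networks are all assumptions made for illustration. The triple loop over 2^n states, N networks and n gene functions mirrors the sparse-matrix complexity quoted above.

```python
from itertools import product


def pbn_transition_matrix(n, networks, probs):
    """Return the 2^n x 2^n state transition matrix of a small PBN.

    networks: list of N Boolean networks; each network is a list of n
              functions mapping a state tuple to 0 or 1 (assumed layout).
    probs:    selection probability of each network (should sum to 1).
    """
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    size = len(states)
    matrix = [[0.0] * size for _ in range(size)]
    for s in states:                      # 2^n states
        for net, p in zip(networks, probs):  # N networks
            nxt = tuple(f(s) for f in net)   # n gene update functions
            matrix[index[s]][index[nxt]] += p
    return matrix


# Hypothetical two-gene example with two candidate networks.
net_a = [lambda s: s[1], lambda s: s[0]]                 # genes swap values
net_b = [lambda s: s[0] & s[1], lambda s: s[0] | s[1]]   # AND / OR updates
T = pbn_transition_matrix(2, [net_a, net_b], [0.7, 0.3])
```

Each row of `T` is a probability distribution over next states, so every row should sum to 1; states are ordered as (0,0), (0,1), (1,0), (1,1).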
Integration of multi-omics cancer data can support a comprehensive exploration of cancers. However, with a large volume of different omics and functional data being generated, a major challenge is to distinguish functional driver genes from a sea of inconsequential passenger genes that accrue stochastically but do not contribute to cancer development. In this paper, we present a gene length-based network method, named DriverFinder, to identify driver genes by integrating somatic mutations, copy number variations, a gene-gene interaction network, tumor expression, and normal expression data. To illustrate the performance of DriverFinder, it is applied to four cancer types from The Cancer Genome Atlas: breast cancer, head and neck squamous cell carcinoma, thyroid carcinoma, and kidney renal clear cell carcinoma. Compared with some conventional methods, the results demonstrate that the proposed method is effective. Moreover, it can decrease the influence of gene length in identifying driver genes and can identify some rarely mutated driver genes.
Chiu, Yu-Chiao; Wang, Li-Ju; Hsiao, Tzu-Hung; Chuang, Eric Y; Chen, Yidong
With the advances in high-throughput gene profiling technologies, a large volume of gene interaction maps has been constructed. A higher-level layer of gene-gene interaction, namely modulated gene interaction, is composed of gene pairs whose interaction strengths are modulated by (i.e., dependent on) the expression level of a key modulator gene. Systematic investigations into the modulation by estrogen receptor (ER), the best-known modulator gene, have revealed its functional and prognostic significance in breast cancer. However, a genome-wide identification of key modulator genes that may further unveil the landscape of modulated gene interaction is still lacking. We proposed a systematic workflow to screen for key modulators based on genome-wide gene expression profiles. We designed four modularity parameters to measure the ability of a putative modulator to perturb gene interaction networks. Applying the method to a dataset of 286 breast tumors, we comprehensively characterized the modularity parameters and identified a total of 973 key modulator genes. The modularity of these modulators was verified in three independent breast cancer datasets. ESR1, the gene encoding ER, appeared in the list, and abundant novel modulators were revealed. For instance, a prognostic predictor of breast cancer, SFRP1, was ranked as the second modulator. Functional annotation analysis of the 973 modulators revealed involvement in ER-related cellular processes as well as immune- and tumor-associated functions. Here we present, as far as we know, the first comprehensive analysis of key modulator genes on a genome-wide scale. The validity of the filtering parameters as well as the conservation of modulators among cohorts were corroborated. Our data bring new insights into the modulated layer of gene-gene interaction and provide candidates for further biological investigations.
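The idea of an interaction strength that depends on a modulator's expression level can be illustrated with a short Python sketch. This is not the four-parameter workflow described in the abstract: the median split, the use of Pearson correlation as "interaction strength", the function names `pearson` and `modulation_score`, and the toy expression values are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def modulation_score(modulator, gene_a, gene_b):
    """Correlation of gene_a and gene_b in samples with high modulator
    expression minus the same correlation in samples with low expression
    (samples are split at the median of the modulator)."""
    order = sorted(range(len(modulator)), key=lambda i: modulator[i])
    half = len(order) // 2
    low, high = order[:half], order[-half:]
    r_low = pearson([gene_a[i] for i in low], [gene_b[i] for i in low])
    r_high = pearson([gene_a[i] for i in high], [gene_b[i] for i in high])
    return r_high - r_low


# Toy values: the pair is anti-correlated when the modulator is low
# and positively correlated when it is high.
modulator = [1, 2, 3, 4, 5, 6, 7, 8]
gene_a = [1, 2, 3, 4, 1, 2, 3, 4]
gene_b = [4, 3, 2, 1, 1, 2, 3, 4]
score = modulation_score(modulator, gene_a, gene_b)
```

A score far from zero suggests that the pair's co-expression changes with the modulator; in this toy case the score is close to its maximum of 2.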
Swamy, Ashwin Balegar
This thesis involves the development of an interactive GIS (Geographic Information System) based application that gives information about the ancient history of Egypt. The astonishing architecture, the strange burial rituals and the civilization itself raised intriguing questions that motivated me to develop this application. The application is a historical timeline starting from 3100 BC and leading up to 664 BC, focusing on the evolution of the Egyptian dynasties. The tool holds information regarding some of the famous monuments that were constructed during that era and about the civilizations that co-existed. It also provides details about the religions followed by the kings and the languages spoken during those periods. The tool is developed using Java, a programming language, and MOJO (Map Objects Java Objects), a product of ESRI (Environmental Systems Research Institute), to create map objects that provide geographic information. Java Swing is used for designing the user interface. HTML (Hyper Text Markup Language) pages are created to provide the user with more information related to the historic period. CSS (Cascading Style Sheets) and JavaScript are used with HTML5 to achieve a creative display of content. The tool is kept simple and easy for the user to interact with. The tool also includes pictures and videos to give the user a feel for the historic period. The application is built to motivate people to learn more about one of the prominent ancient civilizations of the Mediterranean world.
Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then applied this strategy to a melanoma-associated region on chromosome 1 and identified MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting "disease map" network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.
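The prioritization step can be sketched in Python under a simple assumption: each candidate is scored by its mean ERC value against the known disease genes and candidates are ranked by that score. The abstract does not state the exact scoring rule, so the function `prioritize`, the pairwise lookup keyed by `frozenset`, and the placeholder gene labels and values below are hypothetical.

```python
def prioritize(candidates, disease_genes, erc):
    """Rank candidates by their mean ERC value with known disease genes.

    erc maps frozenset({gene_1, gene_2}) to a precomputed ERC value
    (assumed layout; higher means stronger rate covariation).
    """
    def mean_erc(candidate):
        values = [erc[frozenset((candidate, g))] for g in disease_genes]
        return sum(values) / len(values)

    return sorted(candidates, key=mean_erc, reverse=True)


# Placeholder labels and made-up ERC values, for illustration only.
erc = {
    frozenset(("cand1", "known1")): 0.1,
    frozenset(("cand1", "known2")): 0.2,
    frozenset(("cand2", "known1")): 0.6,
    frozenset(("cand2", "known2")): 0.5,
    frozenset(("cand3", "known1")): -0.1,
    frozenset(("cand3", "known2")): 0.0,
}
ranking = prioritize(["cand1", "cand2", "cand3"], ["known1", "known2"], erc)
```

Here `cand2` has the highest mean ERC with the known genes and is ranked first, followed by `cand1` and `cand3`.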
Araújo, Daniela; Henriques, Mariana; Silva, Sónia
Most cases of candidiasis have been attributed to Candida albicans, but Candida glabrata, Candida parapsilosis and Candida tropicalis, designated as non-C. albicans Candida (NCAC), have been identified as frequent human pathogens. Moreover, Candida biofilms are an escalating clinical problem associated with significant rates of mortality. Biofilms have distinct developmental phases, including adhesion/colonisation, maturation and dispersal, controlled by complex regulatory networks. This review discusses recent advances regarding Candida species biofilm regulatory network genes, which are key components for candidiasis. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lee, Sungyoung; Kwon, Min-Seok; Park, Taesung
Most common complex traits, such as obesity, hypertension, diabetes, and cancers, are known to be associated with multiple genes, environmental factors, and their epistasis. Recently, the development of advanced genotyping technologies has allowed us to perform genome-wide association studies (GWASs). For detecting the effects of multiple genes on complex traits, many approaches have been proposed for GWASs. Multifactor dimensionality reduction (MDR) is one of the powerful and efficient methods for detecting high-order gene-gene (GxG) interactions. However, the biological interpretation of GxG interactions identified by MDR analysis is not easy. In order to aid the interpretation of MDR results, we propose a network graph analysis to elucidate the meaning of identified GxG interactions. The proposed network graph analysis consists of three steps. The first step is for performing GxG interaction analysis using MDR analysis. The second step is to draw the network graph using the MDR result. The third step is to provide biological evidence of the identified GxG interaction using external biological databases. The proposed method was applied to Korean Association Resource (KARE) data, containing 8838 individuals with 327,632 single-nucleotide polymorphisms, in order to perform GxG interaction analysis of body mass index (BMI). Our network graph analysis successfully showed that many identified GxG interactions have known biological evidence related to BMI. We expect that our network graph analysis will be helpful to interpret the biological meaning of GxG interactions.
Yang, Yunfeng [ORNL; Harris, Daniel P [ORNL; Luo, Feng [Clemson University; Joachimiak, Marcin [Clemson University; Wu, Liyou [University of Oklahoma; Dehal, Paramvir [Lawrence Berkeley National Laboratory (LBNL); Jacobsen, Janet [Lawrence Berkeley National Laboratory (LBNL); Yang, Zamin Koo [ORNL; Gao, Haichun [University of Oklahoma; Arkin, Adam [Lawrence Berkeley National Laboratory (LBNL); Palumbo, Anthony Vito [ORNL; Zhou, Jizhong [University of Oklahoma
It is of great interest to study the iron response of the γ-proteobacterium Shewanella oneidensis since it possesses a high content of iron and is capable of utilizing iron for anaerobic respiration. We report here that the iron response in S. oneidensis is a rapid process. To gain more insights into the bacterial response to iron, temporal gene expression profiles were examined for iron depletion and repletion, resulting in identification of iron-responsive biological pathways in a gene co-expression network. Iron acquisition systems, including genes unique to S. oneidensis, were rapidly and strongly induced by iron depletion, and repressed by iron repletion. Some were required for iron depletion, as exemplified by the mutational analysis of the putative siderophore biosynthesis protein SO3032. Unexpectedly, a number of genes related to anaerobic energy metabolism were repressed by iron depletion and induced by repletion, which might be due to the iron storage potential of their protein products. Other iron-responsive biological pathways include protein degradation, aerobic energy metabolism and protein synthesis. Furthermore, sequence motifs enriched in gene clusters as well as their corresponding DNA-binding proteins (Fur, CRP and RpoH) were identified, resulting in a regulatory network of iron response in S. oneidensis. Together, this work provides an overview of iron response and reveals novel features in S. oneidensis, including Shewanella-specific iron acquisition systems, and suggests the intimate relationship between anaerobic energy metabolism and iron response.
ABSTRACT: BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.
Raza, Khalid; Alam, Mansaf
One of the exciting problems in systems biology research is to decipher how the genome controls the development of complex biological systems. Gene regulatory networks (GRNs) help in the identification of regulatory interactions between genes and offer useful information about the functional role of individual genes in a cellular system. Discovering GRNs leads to a wide range of applications: it identifies disease-related pathways that provide novel tentative drug targets, helps to predict disease response, and assists in diagnosing various diseases including cancer. Reconstruction of GRNs from available biological data is still an open problem. This paper proposes a recurrent neural network (RNN) based model of GRNs, hybridized with a generalized extended Kalman filter for weight update in the backpropagation through time training algorithm. The RNN is a complex neural network that offers a good compromise between biological closeness and mathematical flexibility for modelling GRNs, and it is also able to capture complex, non-linear and dynamic relationships among variables. Gene expression data are inherently noisy, and the Kalman filter performs well on estimation problems even with noisy data. Hence, we applied a non-linear version of the Kalman filter, known as the generalized extended Kalman filter, for weight update during RNN training. The developed model has been tested on four benchmark networks: the DNA SOS repair network, the IRMA network, and two synthetic networks from the DREAM Challenge. We compared our results with other state-of-the-art techniques, and the comparison shows the superiority of our proposed model. Further, 5% Gaussian noise was added to the dataset, and the results of the proposed model show a negligible effect of noise, demonstrating the noise tolerance of the model. Copyright © 2016 Elsevier Ltd. All rights reserved.
Recent Genome-Wide Association Studies (GWAS) have revealed numerous Crohn's disease susceptibility genes, and a key challenge now is understanding how risk polymorphisms in associated genes might contribute to development of this disease. For a gene to contribute to disease phenotype, its risk variant will likely adversely communicate with a variety of other gene products to result in dysregulation of common signaling pathways. A vital challenge is to elucidate the pathways of potentially greatest influence on pathological behaviour, in a manner that recognizes how multiple relevant genes may yield an integrative effect. In this work we apply mathematical analysis of networks involving the list of recently described Crohn's susceptibility genes, to prioritise pathways in relation to their potential role in the development of this disease. Prioritisation was performed by applying a text mining method and a diffusion-based method (GRAIL, GPEC). The prospective biological significance of the resulting prioritised list of proteins is highlighted by changes in their gene expression levels in the intestinal tissue of Crohn's patients in comparison with healthy donors.
Wellmer, Frank; Riechmann, José L
The onset of flower formation is a key regulatory event during the life cycle of angiosperm plants, which marks the beginning of the reproductive phase of development. It has been shown that floral initiation is under tight genetic control, and deciphering the underlying molecular mechanisms has been a main area of interest in plant biology for the past two decades. Here, we provide an overview of the developmental and genetic processes that occur during floral initiation. We further review recent studies that have led to the genome-wide identification of target genes of key floral regulators and discuss how they have contributed to an in-depth understanding of the gene regulatory networks controlling early flower development. We focus especially on APETALA1 (AP1), a master regulator of floral initiation in Arabidopsis thaliana, but also outline what is known about the AP1 network in other plant species and the evolutionary implications. Copyright © 2010 Elsevier Ltd. All rights reserved.
James N. Warnock
The study aimed to identify mechanosensitive pathways and gene networks that are stimulated by elevated cyclic pressure in aortic valve interstitial cells (VICs) and lead to detrimental tissue remodeling and/or pathogenesis. Porcine aortic valve leaflets were exposed to cyclic pressures of 80 or 120 mmHg, corresponding to diastolic transvalvular pressure in normal and hypertensive conditions, respectively. Linear, two-cycle amplification of total RNA, followed by microarray analysis, was performed for transcriptome analysis (with qRT-PCR validation). A combination of systems biology modeling and pathway analysis identified novel genes and molecular mechanisms underlying the biological response of VICs to elevated pressure. 56 gene transcripts related to inflammatory response mechanisms were differentially expressed. TNF-α, IL-1α, and IL-1β were key cytokines identified from the gene network model. Also of interest was the discovery that pentraxin 3 (PTX3) was significantly upregulated under elevated pressure conditions (41-fold change). In conclusion, a gene network model showing differentially expressed inflammatory genes and their interactions in VICs exposed to elevated pressure has been developed. This system overview has detected key molecules that could be targeted for pharmacotherapy of aortic stenosis in hypertensive patients.
18 August 2004 This Mars Global Surveyor (MGS) Mars Orbiter Camera (MOC) image shows groupings of large ripple-like windblown bedforms on the floor of a large crater (larger than the image shown here) in Sinus Sabaeus, south of Schiaparelli Basin. These ripple-like features are much larger than typical wind ripples on Earth, but smaller than typical sand dunes on either planet. Like most of the other ripple-like bedforms in Sinus Sabaeus, they are probably ancient and no longer mobile. Dark streaks on the substrate between the bedforms were formed by passing dust devils. This image is located near 13.0°S, 343.7°W. The image covers an area about 3 km (1.9 mi) across and sunlight illuminates the scene from the upper left.
Olszewski Kellen L
Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the
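The mutual nearest neighbor idea at the core of the NNN description can be sketched in a few lines of Python. This covers only the construction of the interaction graph, not the overlapping-clique step, and it is not the published implementation: the function name `mutual_knn_graph`, the Euclidean distance, the neighborhood size `k`, and the toy two-condition profiles are assumptions for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mutual_knn_graph(profiles, k, distance=dist):
    """Return edges between genes that are in each other's k nearest
    neighbors. profiles maps a gene label to its expression vector."""
    names = list(profiles)
    neighbors = {}
    for g in names:
        others = sorted(
            (h for h in names if h != g),
            key=lambda h: distance(profiles[g], profiles[h]),
        )
        neighbors[g] = set(others[:k])
    return {
        frozenset((g, h))
        for g in names
        for h in neighbors[g]
        if g in neighbors[h]  # keep the edge only if it is mutual
    }


# Toy profiles: two tight pairs and one outlier with no mutual partner.
profiles = {
    "g1": (0, 0), "g2": (0, 1),
    "g3": (10, 10), "g4": (10, 11),
    "g5": (50, 50),
}
edges = mutual_knn_graph(profiles, k=1)
```

With `k=1`, the outlier `g5` names `g4` as its nearest neighbor, but `g4` does not return the choice, so `g5` stays unconnected. This is the behavior the abstract describes for genes with no sufficiently similar partners.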
Terenina, Elena; Fabre, Stephane; Bonnet, Agnès; Monniaux, Danielle; Robert-Granié, Christèle; SanCristobal, Magali; Sarry, Julien; Vignoles, Florence; Gondret, Florence; Monget, Philippe; Tosser-Klopp, Gwenola
Ovarian folliculogenesis corresponds to the development of follicles leading to either ovulation or degeneration, this latter process being called atresia. Even if atresia involves apoptosis, its mechanism is not well understood. The objective of this study was to analyze global gene expression in pig granulosa cells of ovarian follicles during atresia. The transcriptome analysis was performed on a 9,216 cDNA microarray to identify gene networks and candidate genes involved in pig ovarian follicular atresia. We found 1,684 genes to be significantly differentially regulated between small healthy follicles and small atretic follicles. Among them, 287 genes had a fold-change higher than two between the two follicle groups. Eleven genes (DKK3, GADD45A, CAMTA2, CCDC80, DAPK2, ECSIT, MSMB, NUPR1, RUNX2, SAMD4A, and ZNF628) having a fold-change higher than five between groups could likely serve as markers of follicular atresia. Moreover, automatic comparison of deregulated genes with literature data highlighted 93 genes as regulatory candidates of pig granulosa cell atresia. Among these genes, which are known to be inhibitors of apoptosis, stimulators of apoptosis, or tumor suppressors, INHBB, HNF4, CLU, different interleukins (IL5, IL24), TNF-associated receptor (TNFR1), and cytochrome-c oxidase (COX) were suggested as playing an important role in porcine atresia. The present study also lists key upstream regulators in follicle atresia based on our results and on a literature review. The novel gene candidates and gene networks identified in the current study lead to a better understanding of the molecular regulation of ovarian follicular atresia. Copyright © 2017 the American Physiological Society.
Chen, Xi; Wang, Qiao-Ling; Zhang, Meng-Hui
The current study aimed to identify key genes in glaucoma based on a benchmarked dataset and a gene regulatory network (GRN). Local and global noise was added to the gene expression dataset to produce a benchmarked dataset. Differentially expressed genes (DEGs) between patients with glaucoma and normal controls were identified utilizing the Linear Models for Microarray Data (Limma) package based on the benchmarked dataset. A total of 5 GRN inference methods, including Zscore, GeneNet, the context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT) and GEne Network Inference with Ensemble of Trees (Genie3), were evaluated using receiver operating characteristic (ROC) and precision and recall (PR) curves. The inference method with the best performance was selected to construct the GRN. Subsequently, topological centrality analysis (degree, closeness and betweenness) was conducted to identify key genes in the GRN of glaucoma. Finally, the key genes were validated by performing reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 176 DEGs were detected from the benchmarked dataset. The ROC and PR curves of the 5 methods were analyzed and it was determined that Genie3 had a clear advantage over the other methods; thus, Genie3 was used to construct the GRN. Following topological centrality analysis, 14 key genes for glaucoma were identified, including IL6, EPHA2 and GSTT1, and 5 of these 14 key genes were validated by RT-qPCR. Therefore, the current study identified 14 key genes in glaucoma, which may be potential biomarkers for use in the diagnosis of glaucoma and may aid in identifying the molecular mechanism of this disease.
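The three centrality measures named in the abstract can be sketched in plain Python for an unweighted, undirected, connected network. This is a generic illustration and not the pipeline used in the study: the adjacency-list representation, the function names, and the star-shaped toy network with placeholder node labels are assumptions; betweenness uses Brandes' algorithm and is reported unnormalized.

```python
from collections import deque


def degree(graph):
    """Number of neighbors of each node."""
    return {v: len(nb) for v, nb in graph.items()}


def closeness(graph):
    """(reachable nodes) / (sum of shortest-path lengths), via BFS."""
    result = {}
    for s in graph:
        d = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in d:
                    d[w] = d[v] + 1
                    queue.append(w)
        total = sum(d.values())
        result[s] = (len(d) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness (Brandes' algorithm, undirected graph)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        d = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in d:
                    d[w] = d[v] + 1
                    queue.append(w)
                if d[w] == d[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair counted twice


# Toy star network with placeholder labels: one hub and three leaves.
network = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
deg, clo, btw = degree(network), closeness(network), betweenness(network)
```

In the star network the hub scores highest on all three measures, which is the kind of node such an analysis would flag as a candidate key gene.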
Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.
Background: Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori
Ament, Seth A; Pearl, Jocelynn R; Cantle, Jeffrey P; Bragg, Robert M; Skene, Peter J; Coffey, Sydney R; Bergey, Dani E; Wheeler, Vanessa C; MacDonald, Marcy E; Baliga, Nitin S; Rosinski, Jim; Hood, Leroy E; Carroll, Jeffrey B; Price, Nathan D
Transcriptional changes occur presymptomatically and throughout Huntington's disease (HD), motivating the study of transcriptional regulatory networks (TRNs) in HD. We reconstructed a genome-scale model for the target genes of 718 transcription factors (TFs) in the mouse striatum by integrating a model of genomic binding sites with transcriptome profiling of striatal tissue from HD mouse models. We identified 48 differentially expressed TF-target gene modules associated with age- and CAG repeat length-dependent gene expression changes in Htt CAG knock-in mouse striatum and replicated many of these associations in independent transcriptomic and proteomic datasets. Thirteen of these 48 predicted TF-target gene modules were also differentially expressed in striatal tissue from human disease. We experimentally validated a specific model prediction that SMAD3 regulates HD-related gene expression changes using chromatin immunoprecipitation and deep sequencing (ChIP-seq) of mouse striatum. We found CAG repeat length-dependent changes in the genomic occupancy of SMAD3 and confirmed our model's prediction that many SMAD3 target genes are downregulated early in HD. © 2018 The Authors. Published under the terms of the CC BY 4.0 license.
Shaw, Harry C.
Molecular biology provides the ability to implement forms of information and network security completely outside the bounds of legacy security protocols and algorithms. This paper addresses an approach which instantiates the power of gene expression for security. Molecular biology provides a rich source of gene expression and regulation mechanisms, which can be adopted to use in the information and electronic communication domains. Conventional security protocols are becoming increasingly vulnerable due to more intensive, highly capable attacks on the underlying mathematics of cryptography. Security protocols are being undermined by social engineering and substandard implementations by IT (Information Technology) organizations. Molecular biology can provide countermeasures to these weak points with the current security approaches. Future advances in instruments for analyzing assays will also enable this protocol to advance from one of cryptographic algorithms to an integrated system of cryptographic algorithms and real-time assays of gene expression products.
Shaw, Harry C.
Molecular biology provides the ability to implement forms of information and network security completely outside the bounds of legacy security protocols and algorithms. This paper addresses an approach which instantiates the power of gene expression for security. Molecular biology provides a rich source of gene expression and regulation mechanisms, which can be adopted to use in the information and electronic communication domains. Conventional security protocols are becoming increasingly vulnerable due to more intensive, highly capable attacks on the underlying mathematics of cryptography. Security protocols are being undermined by social engineering and substandard implementations by IT organizations. Molecular biology can provide countermeasures to these weak points with the current security approaches. Future advances in instruments for analyzing assays will also enable this protocol to advance from one of cryptographic algorithms to an integrated system of cryptographic algorithms and real-time expression and assay of gene expression products.
Dugas Sandra L
Background: The ars gene system provides arsenic resistance for a variety of microorganisms and can be chromosomal or plasmid-borne. The arsC gene, which codes for an arsenate reductase, is essential for arsenate resistance and transforms arsenate into arsenite, which is extruded from the cell. A survey of GenBank shows that arsC appears to be phylogenetically widespread both in organisms with known arsenic resistance and in organisms that have been sequenced as part of whole genome projects. Results: Phylogenetic analysis of aligned arsC sequences shows broad similarities to the established 16S rRNA phylogeny, with separation of bacterial, archaeal, and subsequently eukaryotic arsC genes. However, inconsistencies between arsC and 16S rRNA are apparent for some taxa. Cyanobacteria and some of the γ-Proteobacteria appear to possess arsC genes that are similar to those of Low GC Gram-positive Bacteria, and other isolated taxa possess arsC genes that would not be expected based on known evolutionary relationships. There is no clear separation of plasmid-borne and chromosomal arsC genes, although a number of the Enterobacteriales (γ-Proteobacteria) possess similar plasmid-encoded arsC sequences. Conclusion: The overall phylogeny of the arsenate reductases suggests a single, early origin of the arsC gene and subsequent sequence divergence to give the distinct arsC classes that exist today. Discrepancies between 16S rRNA and arsC phylogenies support the role of horizontal gene transfer (HGT) in the evolution of arsenate reductases, with a number of instances of HGT early in bacterial arsC evolution. Plasmid-borne arsC genes are not monophyletic, suggesting multiple cases of chromosomal-plasmid exchange and subsequent HGT. Overall, arsC phylogeny is complex and is likely the result of a number of evolutionary mechanisms.
Miller, Hilary C; O'Meally, Denis; Ezaz, Tariq; Amemiya, Chris; Marshall-Graves, Jennifer A; Edwards, Scott
Major histocompatibility complex (MHC) genes are a central component of the vertebrate immune system and usually exist in a single genomic region. However, considerable differences in MHC organization and size exist between different vertebrate lineages. Reptiles occupy a key evolutionary position for understanding how variation in MHC structure evolved in vertebrates, but information on the structure of the MHC region in reptiles is limited. In this study, we investigate the organization and cytogenetic location of MHC genes in the tuatara (Sphenodon punctatus), the sole extant representative of the early-diverging reptilian order Rhynchocephalia. Sequencing and mapping of 12 clones containing class I and II MHC genes from a bacterial artificial chromosome library indicated that the core MHC region is located on chromosome 13q. However, duplication and translocation of MHC genes outside of the core region was evident, because additional class I MHC genes were located on chromosome 4p. We found a total of seven class I sequences and 11 class II β sequences, with evidence for duplication and pseudogenization of genes within the tuatara lineage. The tuatara MHC is characterized by high repeat content and low gene density compared with other species and we found no antigen processing or MHC framework genes on the MHC gene-containing clones. Our findings indicate substantial differences in MHC organization in tuatara compared with mammalian and avian MHCs and highlight the dynamic nature of the MHC. Further sequencing and annotation of tuatara and other reptile MHCs will determine if the tuatara MHC is representative of nonavian reptiles in general. Copyright © 2015 Miller et al.
Chen, Shun-Fu; Juang, Yue-Li; Chou, Wei-Kang; Lai, Jin-Mei; Huang, Chi-Ying F; Kao, Cheng-Yan; Wang, Feng-Sheng
Network Component Analysis (NCA) is a network structure-driven framework for deducing regulatory signal dynamics. In contrast to principal component analysis, which can be employed to select the high-variance genes, NCA makes use of the connectivity structure from transcriptional regulatory networks to infer the dynamics of transcription factor activities. Using the budding yeast Saccharomyces cerevisiae as a model system, we aim to deduce regulatory actions of cytokinesis-related genes, using precise spatial proximity (midbody) and/or temporal synchronicity (cytokinesis) to avoid full-scale computation from genome-wide databases. NCA was applied to infer regulatory actions of transcription factor activity from microarray data and partial transcription factor-gene connectivity information for cytokinesis-related genes, which were a subset of genome-wide datasets. No literature has so far discussed whether the results inferred through NCA are independent of the scale of the gene expression dataset. To avoid full-scale computation from genome-wide databases, four cytokinesis-related gene cases were selected for NCA by running computational analysis over the transcription factor database to confirm that the approach is scale-free. The dynamics of transcription factor activity inferred through NCA were independent of the scale of the data matrix selected from the four cytokinesis-related gene sets. Moreover, the inferred regulatory actions were nearly identical to published observations for the selected cytokinesis-related genes in the budding yeast; namely, Mcm1, Ndd1, and Fkh2, which form a transcription factor complex to control expression of the CLB2 cluster (i.e. BUD4, CHS2, IQG1, and CDC5). In this study, using S. cerevisiae as a model system, NCA was successfully applied to infer similar regulatory actions of transcription factor activities from two different microarray databases and several partial transcription factor-gene connectivity datasets for selected cytokinesis
Background: Identifying DNA sequences (enhancers) that direct the precise spatial and temporal expression of developmental control genes remains a significant challenge in the annotation of vertebrate genomes. Locating these sequences, which in many cases lie at a great distance from the transcription start site, has been a major obstacle in deciphering gene regulation. Coupling comparative genomics with functional validation has been a successful method for locating many such regulatory elements. But most of these studies looked either at a single gene only or at the whole genome without focusing on any particular process. The pressing need is to integrate the tools of comparative genomics with knowledge of developmental biology to validate enhancers for developmental transcription factors in greater detail. Results: Our results show that near four different genes (nkx3.2, pax9, otx1b and foxa2) in zebrafish, only 20-30% of highly conserved DNA sequences can act as developmental enhancers, irrespective of the tissue in which the gene is expressed. We find that some genes also have multiple conserved enhancers expressing in the same tissue at the same or different time points in development. We also located non-conserved enhancers for two of the genes (pax9 and otx1b). Our modified bacterial artificial chromosome (BAC) studies for these 4 genes revealed that many of these enhancers work in a synergistic fashion, which cannot be captured by individual DNA constructs, and are not conserved at the sequence level. Our detailed biochemical and transgenic analysis revealed that Foxa1 binds to the otx1b non-conserved enhancer to direct its activity in the forebrain and otic vesicle of zebrafish at 24 hpf. Conclusion: Our results clearly indicate that a high level of functional conservation of genes is not necessarily associated with sequence conservation of their regulatory elements. Moreover certain non conserved DNA elements might have
Richards, Thomas A.; Soanes, Darren M.; Foster, Peter G.; Leonard, Guy; Thornton, Christopher R.; Talbot, Nicholas J.
Horizontal gene transfer (HGT) describes the transmission of genetic material across species boundaries and is an important evolutionary phenomenon in the ancestry of many microbes. The role of HGT in plant evolutionary history is, however, largely unexplored. Here, we compare the genomes of six plant species with those of 159 prokaryotic and eukaryotic species and identify 1689 genes that show the highest similarity to corresponding genes from fungi. We constructed a phylogeny for all 1689 genes identified and all homolog groups available from the rice (Oryza sativa) genome (3177 gene families) and used these to define 14 candidate plant-fungi HGT events. Comprehensive phylogenetic analyses of these 14 data sets, using methods that account for site rate heterogeneity, demonstrated support for nine HGT events, demonstrating an infrequent pattern of HGT between plants and fungi. Five HGTs were fungi-to-plant transfers and four were plant-to-fungi HGTs. None of the fungal-to-plant HGTs involved angiosperm recipients. These results alter the current view of organismal barriers to HGT, suggesting that phagotrophy, the consumption of a whole cell by another, is not necessarily a prerequisite for HGT between eukaryotes. Putative functional annotation of the HGT candidate genes suggests that two fungi-to-plant transfers have added phenotypes important for life in a soil environment. Our study suggests that genetic exchange between plants and fungi is exceedingly rare, particularly among the angiosperms, but has occurred during their evolutionary history and added important metabolic traits to plant lineages. PMID:19584142
Mandal, Sudip; Saha, Goutam; Pal, Rajat Kumar
Correct inference of genetic regulation inside a cell from biological data such as time series microarray data is one of the greatest challenges of the post-genomic era for biologists and researchers. The Recurrent Neural Network (RNN) is one of the most popular and simplest approaches to modelling the dynamics and inferring the correct dependencies among genes. Inspired by the behavior of social elephants, we propose a new metaheuristic, the Elephant Swarm Water Search Algorithm (ESWSA), to infer Gene Regulatory Networks (GRNs). This algorithm is mainly based on the water search strategy of intelligent and social elephants during drought, which utilizes different types of communication techniques. Initially, the algorithm is tested against benchmark small and medium scale artificial genetic networks with and without different noise levels, and its efficiency is observed in terms of parametric error, minimum fitness value, execution time, accuracy of prediction of true regulation, etc. Next, the proposed algorithm is tested against real time gene expression data of the Escherichia coli SOS network, and the results are compared with other state-of-the-art optimization methods. The experimental results suggest that ESWSA is very efficient for the GRN inference problem and performs better than other methods in many ways.
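As an illustration of the RNN formalism mentioned in this abstract, the sketch below simulates a small gene network and scores a candidate parameter set against observed data. The abstract does not state the model equations, so the discrete update rule used here (a sigmoid of the weighted inputs, blended with the previous expression level through a time constant) is an assumption based on the form commonly used for RNN models of gene regulation. All names (`rnn_step`, `simulate`, `fitness`) and the toy parameter values are hypothetical, and the swarm search itself is not shown.

```python
import math


def sigmoid(z):
    """Logistic squashing function, mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def rnn_step(x, weights, biases, taus, dt):
    """Advance all expression levels by one time step.

    Assumed update rule (not taken from the abstract):
    x_i(t + dt) = (dt / tau_i) * sigmoid(sum_j w_ij * x_j(t) + b_i)
                  + (1 - dt / tau_i) * x_i(t)
    """
    n = len(x)
    nxt = []
    for i in range(n):
        drive = sum(weights[i][j] * x[j] for j in range(n)) + biases[i]
        nxt.append((dt / taus[i]) * sigmoid(drive) + (1.0 - dt / taus[i]) * x[i])
    return nxt


def simulate(x0, weights, biases, taus, dt, steps):
    """Return the time series [x(0), x(dt), ..., x(steps * dt)]."""
    series = [list(x0)]
    for _ in range(steps):
        series.append(rnn_step(series[-1], weights, biases, taus, dt))
    return series


def fitness(observed, weights, biases, taus, dt):
    """Mean squared error between observed and simulated series.

    A metaheuristic would search for the parameters that minimise this value.
    """
    simulated = simulate(observed[0], weights, biases, taus, dt, len(observed) - 1)
    total = 0.0
    for obs_row, sim_row in zip(observed, simulated):
        total += sum((o - s) ** 2 for o, s in zip(obs_row, sim_row))
    return total / (len(observed) * len(observed[0]))


# Toy two-gene network: gene 1 represses gene 0, gene 0 activates gene 1.
true_weights = [[0.0, -5.0], [5.0, 0.0]]
biases = [0.0, 0.0]
taus = [1.0, 1.0]
data = simulate([0.2, 0.8], true_weights, biases, taus, 0.1, 5)

wrong_weights = [[0.0, 5.0], [5.0, 0.0]]
error_true = fitness(data, true_weights, biases, taus, 0.1)
error_wrong = fitness(data, wrong_weights, biases, taus, 0.1)
```

In this toy setting the parameters that generated the data reproduce it exactly, while a network with the wrong sign on one regulation scores worse; a nonzero entry `weights[i][j]` is read as gene `j` regulating gene `i`.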
Cui, Ying; Cai, Meng; Dai, Yang; Stanley, H. Eugene
Detecting disease-related genes is crucial in disease diagnosis and drug design. The accepted view is that neighbors of a disease-causing gene in a molecular network tend to cause the same or similar diseases, and network-based methods have been recently developed to identify novel hereditary disease-genes in available biomedical networks. Despite the steady increase in the discovery of disease-associated genes, there is still a large fraction of disease genes that remains under the tip of the iceberg. In this paper we exploit the topological properties of the protein-protein interaction (PPI) network to detect disease-related genes. We compute, analyze, and compare the topological properties of disease genes with non-disease genes in PPI networks. We also design an improved random forest classifier based on these network topological features, and a cross-validation test confirms that our method performs better than previous similar studies.
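To make the idea of network topological features concrete, the sketch below computes two standard per-node measures on a toy undirected interaction network. The abstract does not list which features were used, so degree and local clustering coefficient are illustrative choices only. The node labels and edges are placeholders rather than real proteins, and the classifier step is not shown.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Number of interaction partners of a node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs that are themselves connected.

    Defined as 0.0 for nodes with fewer than two neighbours.
    Assumes the graph has no self-loops.
    """
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def topological_features(graph):
    """Feature vector (degree, clustering coefficient) for every node."""
    return {n: (degree(graph, n), clustering_coefficient(graph, n)) for n in graph}


# Placeholder network: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
features = topological_features(build_graph(toy_edges))
```

Each node's tuple could then serve as one row of a feature matrix, with a disease or non-disease label attached, for whichever classifier is being trained.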
Background: The APOBEC3 (A3) genes play a key role in innate antiviral defense in mammals by introducing directed mutations in the DNA. The human genome encodes seven A3 genes, with multiple splice alternatives. Different A3 proteins display different substrate specificity, but the very basic question of how they discern self from non-self remains unresolved. Further, the expression of A3 activities shapes the way both viral and host genomes evolve. Results: We present here a detailed temporal analysis of the origin and expansion of the A3 repertoire in mammals. Our data support an evolutionary scenario where the genome of the mammalian ancestor encoded at least one ancestral A3 gene, and where the genome of the ancestor of placental mammals (and possibly of the ancestor of all mammals) already encoded an A3Z1-A3Z2-A3Z3 arrangement. Duplication events of the A3 genes have occurred independently in different lineages: humans, cats and horses. In all of them, gene duplication has resulted in changes in enzyme activity and/or substrate specificity, in a paradigmatic example of convergent adaptive evolution at the genomic level. Finally, our results show that evolutionary rates for the three A3Z1, A3Z2 and A3Z3 motifs have significantly decreased over the last 100 million years. The analysis constitutes a textbook example of the evolution of a gene locus by duplication and sub/neofunctionalization in the context of a virus-host arms race. Conclusions: Our results provide a time framework for identifying ancestral and derived genomic arrangements in the APOBEC loci, and for dating the expansion of this gene family in different lineages through time, as a response to changes in viral/retroviral/retrotransposon pressure.
Khan, Abhinandan; Mandal, Sudip; Pal, Rajat Kumar; Saha, Goutam
We have proposed a methodology for the reverse engineering of biologically plausible gene regulatory networks from temporal genetic expression data. We have used established information and the fundamental mathematical theory for this purpose. We have employed the Recurrent Neural Network formalism to extract the underlying dynamics present in the time series expression data accurately. We have introduced a new hybrid swarm intelligence framework for the accurate training of the model parameters. The proposed methodology has been first applied to a small artificial network, and the results obtained suggest that it can produce the best results available in the contemporary literature, to the best of our knowledge. Subsequently, we have implemented our proposed framework on experimental (in vivo) datasets. Finally, we have investigated two medium sized genetic networks (in silico) extracted from GeneNetWeaver, to understand how the proposed algorithm scales up with network size. Additionally, we have implemented our proposed algorithm with half the number of time points. The results indicate that a 50% reduction in the number of time points does not significantly affect the accuracy of the proposed methodology, with a maximum of just over 15% deterioration in the worst case.
RegnANN is a novel method for reverse engineering gene networks based on an ensemble of multilayer perceptrons. The algorithm builds a regressor for each gene in the network, estimating its neighborhood independently. The overall network is obtained by joining all the neighborhoods. RegnANN makes no assumptions about the nature of the relationships between the variables, potentially capturing high-order and non-linear dependencies between expression patterns. The evaluation focuses on synthetic data mimicking plausible submodules of larger networks and on biological data consisting of submodules of Escherichia coli. We consider Barabasi and Erdös-Rényi topologies together with two methods for data generation. We verify the effect of factors such as network size and amount of data on the accuracy of the inference algorithm. The accuracy scores obtained with RegnANN are methodically compared with the performance of three reference algorithms: ARACNE, CLR and KELLER. Our evaluation indicates that RegnANN compares favorably with the inference methods tested. The robustness of RegnANN, its ability to discover second order correlations, and the agreement between results obtained with this new method on both synthetic and biological data are promising and encourage its application to a wider range of problems.
Constantinescu, Bogdan; Cojocaru, Viorel; Bugoi, Roxana
Analyses of source materials, combined with analyses of archaeological objects, can distinguish between pieces produced in different regions and periods. For coins, chemical differences that occur during the preparation of alloys affect the elemental composition and can be used to identify technologies and workshops and also to distinguish between originals and counterfeits. We illustrate this with the case of Geto-Dacian coins (Thassos and Macedonian - Philip II, Alexander the Great and Philip III 'barbarized' tetradrachms) and with Greek Apollonia and Dyrrhachium silver drachms issued by these ancient cities for Pompejus during the First Roman Civil War between Julius Caesar and Pompejus, coins found on the present-day territory of Romania (ancient Dacia) and probably used as payment for the Dacian mercenaries allied with Pompejus. To analyze the chemical composition of these coins, we used two methods: X-ray fluorescence (XRF) based on Am-241 and Pu-238 gamma sources, and in-vacuum particle-induced X-ray emission (PIXE) with 3 MeV protons. Some special measurements on the edge of some coins (to identify plated specimens) were done using the ATOMKI Debrecen Van de Graaff 2 MeV proton microprobe, in the frame of European Action COST G1. Concerning the Geto-Dacian coins, we observed: - There is a reduction of the fineness over time that is specific to almost every coin issue. - Tin concentration in coins increased over time; at the beginning of the coinage (250 - 150 B.C.) it was more or less proportional to copper. This could suggest that bronze, instead of copper, was used in alloying silver coins. A very high correlation is not expected because the Sn/Cu ratio in ancient bronzes is far from constant. A value of the Cu/Sn ratio close to 1 is not surprising because such objects were common in antiquity. In the last issues (150-50 B.C.) it seems that Sn partially replaced Cu. - It seems that tin alloying appeared for the first time in Transylvania around 150 BC and then
Hill, Jonathon T; Demarest, Bradley; Gorsi, Bushra; Smith, Megan; Yost, H Joseph
During embryogenesis the heart forms as a linear tube that then undergoes multiple simultaneous morphogenetic events to obtain its mature shape. To understand the gene regulatory networks (GRNs) driving this phase of heart development, during which many congenital heart disease malformations likely arise, we conducted an RNA-seq timecourse in zebrafish from 30 hpf to 72 hpf and identified 5861 genes with altered expression. We clustered the genes by temporal expression pattern, identified transcription factor binding motifs enriched in each cluster, and generated a model GRN for the major gene batteries in heart morphogenesis. This approach predicted hundreds of regulatory interactions and found batteries enriched in specific cell and tissue types, indicating that the approach can be used to narrow the search for novel genetic markers and regulatory interactions. Subsequent analyses confirmed the GRN using two mutants, Tbx5 and nkx2-5, and identified sets of duplicated zebrafish genes that do not show temporal subfunctionalization. This dataset provides an essential resource for future studies on the genetic/epigenetic pathways implicated in congenital heart defects and the mechanisms of cardiac transcriptional regulation. © 2017. Published by The Company of Biologists Ltd.
Kim, Dong Sub; Kim, Jinbaek; Kim, Sang Hoon
In this project, we irradiated Arabidopsis plants with various doses of gamma-rays at the vegetative and reproductive stages to assess their radiation sensitivity. After analyzing the gene expression profiles and the antioxidant response, we selected several Arabidopsis genes for use as 'Radio marker genes (RMG)' and conducted over-expression and knock-down experiments to confirm their radiosensitivity. Based on these results, we applied for two patents for the detection of two RMG (At3g28210 and At4g37990) and the development of transgenic plants. We also developed a Genechip for high-throughput screening of Arabidopsis genes responding only to ionizing radiation and identified RMG to detect radiation leaks. Based on these results, we applied for two patents associated with the use of the Genechip for different types of radiation and different growth stages. We also conducted a co-expression network study of specifically expressed probes under gamma-ray stress and identified expression patterns of duplicated genes formed by whole/500kb segmental genome duplication
Kojima, Kenji K; Jurka, Jerzy
Most non-long terminal repeat (non-LTR) retrotransposons encoding a restriction-like endonuclease show target-specific integration into repetitive sequences such as ribosomal RNA genes and microsatellites. However, only a few target-specific lineages of non-LTR retrotransposons are distributed widely and no lineage is found across the eukaryotic kingdoms. Here we report the most widely distributed lineage of target sequence-specific non-LTR retrotransposons, designated Utopia. Utopia is found in three supergroups of eukaryotes: Amoebozoa, SAR, and Opisthokonta. Utopia is inserted into a specific site of U2 small nuclear RNA genes with different strength of specificity for each family. Utopia families from oomycetes and wasps show strong target specificity while only a small number of Utopia copies from reptiles are flanked with U2 snRNA genes. Oomycete Utopia families contain an "archaeal" RNase H domain upstream of reverse transcriptase (RT), which likely originated from a plant RNase H gene. Analysis of Utopia from oomycetes indicates that multiple lineages of Utopia have been maintained inside of U2 genes with few copy numbers. Phylogenetic analysis of RT suggests the monophyly of Utopia, and it likely dates back to the early evolution of eukaryotes.
Valentini, Giorgio; Paccanaro, Alberto; Caniza, Horacio; Romero, Alfonso E; Re, Matteo
In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. Network integration is necessary to boost the performances of gene prioritization methods. Moreover the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both
Ruyssinck, Joeri; Demeester, Piet; Dhaene, Tom; Saeys, Yvan
Many algorithms have been developed to infer the topology of gene regulatory networks from gene expression data. These methods typically produce a ranking of links between genes with associated confidence scores, after which a certain threshold is chosen to produce the inferred topology. However, the structural properties of the predicted network do not resemble those typical for a gene regulatory network, as most algorithms only take into account connections found in the data and do not include known graph properties in their inference process. This lowers the prediction accuracy of these methods, limiting their usability in practice. We propose a post-processing algorithm which is applicable to any confidence ranking of regulatory interactions obtained from a network inference method and which can use, inter alia, graphlets and several graph-invariant properties to re-rank the links into a more accurate prediction. To demonstrate the potential of our approach, we re-rank predictions of six different state-of-the-art algorithms using three simple network properties as optimization criteria and show that Netter can improve the predictions made on both artificially generated data and the DREAM4 and DREAM5 benchmarks. Additionally, the DREAM5 E. coli community prediction inferred from real expression data is further improved. Furthermore, Netter compares favorably to other post-processing algorithms and is not restricted to correlation-like predictions. Lastly, we demonstrate that the performance increase is robust for a wide range of parameter settings. Netter is available at http://bioinformatics.intec.ugent.be. Network inference from high-throughput data is a long-standing challenge. In this work, we present Netter, which can further refine network predictions based on a set of user-defined graph properties. Netter is a flexible system which can be applied in unison with any method producing a ranking from omics data. It can be tailored to specific prior
Microarray technologies have been the basis of numerous important findings regarding gene expression in the last few decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, as well as indicating which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.
Santillán Zerón, Moisés
Knowing the complete genome of a given species is just a piece of the puzzle. To fully unveil the systems behavior of an organism, an organ, or even a single cell, we need to understand the underlying gene regulatory dynamics. Given the complexity of the whole system, the ultimate goal is unattainable for the moment. But perhaps, by analyzing the simplest genetic systems, we may be able to develop the mathematical techniques and procedures required to tackle more complex genetic networks in the near future. In the present work, the techniques for developing mathematical models of simple bacterial gene networks, like the tryptophan and lactose operons, are introduced. Despite all of the underlying assumptions, such models can provide valuable information regarding gene regulation dynamics. Here, we pay special attention to robustness as an emergent property. These notes are organized as follows. In the first section, the long historical relation between mathematics, physics, and biology is briefly reviewed. Recently, multidisciplinary work in biology has received great attention in the form of systems biology. The main concepts of this novel science are discussed in the second section. A very brief introduction to the essential concepts of molecular biology is given in the third section. In the fourth section, a brief introduction to chemical kinetics is presented. Finally, in the fifth section, a mathematical model for the lactose operon is developed and analyzed.
Alfred J. Robison
Over the past three decades, it has become clear that aberrant function of the network of interconnected brain regions responsible for reward processing and motivated behavior underlies a variety of mood disorders, including depression and anxiety. It is also clear that stress-induced changes in reward network activity underlying both normal and pathological behavior also cause changes in gene expression. Here, we attempt to define the reward circuitry and explore the known and potential contributions of activity-dependent changes in gene expression within this circuitry to stress-induced changes in behavior related to mood disorders, and contrast some of these effects with those induced by exposure to drugs of abuse. We focus on a series of immediate early genes regulated by stress within this circuitry and their connections, both well-explored and relatively novel, to circuit function and subsequent reward-related behaviors. We conclude that IEGs play a crucial role in stress-dependent remodeling of reward circuitry, and that they may serve as inroads to the molecular, cellular, and circuit-level mechanisms of mood disorder etiology and treatment.
Akberdin, Ilya R; Omelyanchuk, Nadezda A; Fadeev, Stanislav I; Leskova, Natalya E; Oschepkova, Evgeniya A; Kazantsev, Fedor V; Matushkin, Yury G; Afonnikov, Dmitry A; Kolchanov, Nikolay A
Multiple experimental data have demonstrated that the core gene network orchestrating self-renewal and differentiation of mouse embryonic stem cells (ESCs) involves the activity of the Oct4, Sox2 and Nanog genes by means of a number of positive feedback loops among them. However, recent studies indicated that the architecture of the core gene network should also incorporate negative Nanog autoregulation and might not include positive feedbacks from Nanog to Oct4 and Sox2. Thorough parametric analysis of the mathematical model based on this revisited core regulatory circuit identified substantial changes in model dynamics depending on the strength of Oct4 and Sox2 activation and the molecular complexity of Nanog autorepression. The analysis showed the existence of four dynamical domains with different numbers of stable and unstable steady states. We hypothesize that these domains can constitute the checkpoints in a developmental progression from naïve to primed pluripotency and vice versa. During this transition, parametric conditions exist that generate an oscillatory behavior of the system, explaining heterogeneity in the expression of pluripotency and differentiation factors in serum ESC cultures. Finally, simulations showed that the addition of positive feedbacks from Nanog to Oct4 and Sox2 leads mainly to an increase of the parametric space for the naïve ESC state, in which pluripotency factors are strongly expressed while differentiation factors are repressed.
Materna, Stefan C
The control processes that underlie the progression of development can be summarized in maps of gene regulatory networks (GRNs). A critical step in their assembly is the systematic perturbation of network candidates. In sea urchins the most important method for interfering with expression in a gene-specific way is application of morpholino antisense oligonucleotides (MOs). MOs act by binding to their sequence complement in transcripts resulting in a block in translation or a change in splicing and thus result in a loss of function. Despite the tremendous success of this technology, recent comparisons to mutants generated by genome editing have led to renewed criticism and challenged its reliability. As with all methods based on sequence recognition, MOs are prone to off-target binding that may result in phenotypes that are erroneously ascribed to the loss of the intended target. However, the slow progression of development in sea urchins has enabled extremely detailed studies of gene activity in the embryo. This wealth of knowledge paired with the simplicity of the sea urchin embryo enables careful analysis of MO phenotypes through a variety of methods that do not rely on terminal phenotypes. This article summarizes the use of MOs in probing GRNs and the steps that should be taken to assure their specificity.
Gene expression is controlled by the combinatorial effects of regulatory factors from different biological subsystems such as general transcription factors (TFs), cellular growth factors and microRNAs. A subsystem’s gene expression may be controlled by its internal regulatory factors, exclusively, or by external subsystems, or by both. It is thus useful to distinguish the degree to which a subsystem is regulated internally or externally–e.g., how non-conserved, species-specific TFs affect the expression of conserved, cross-species genes during evolution. We developed a computational method (DREISS, dreiss.gerteinlab.org) for analyzing the Dynamics of gene expression driven by Regulatory networks, both External and Internal based on State Space models. Given a subsystem, the “state” and “control” in the model refer to its own (internal) and another subsystem’s (external) gene expression levels. The state at a given time is determined by the state and control at a previous time. Because typical time-series data do not have enough samples to fully estimate the model’s parameters, DREISS uses dimensionality reduction, and identifies canonical temporal expression trajectories (e.g., degradation, growth and oscillation) representing the regulatory effects emanating from various subsystems. To demonstrate capabilities of DREISS, we study the regulatory effects of evolutionarily conserved vs. divergent TFs across distant species. In particular, we applied DREISS to the time-series gene expression datasets of C. elegans and D. melanogaster during their embryonic development. We analyzed the expression dynamics of the conserved, orthologous genes (orthologs), seeing the degree to which these can be accounted for by orthologous (internal) versus species-specific (external) TFs. We found that between two species, the orthologs have matched, internally driven expression patterns but very different externally driven ones. This is particularly true for genes with
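The state-space idea in the abstract above (the state at one time point is determined by the state and control at the previous time point) can be illustrated with a minimal sketch. This is not the DREISS implementation: the linear form x[t+1] = A x[t] + B u[t], the function names `simulate` and `fit`, and the toy matrices are all assumptions made for illustration, and the example uses far more time points than typical expression data provide, which is exactly the situation that motivates the dimensionality reduction mentioned in the abstract.

```python
import numpy as np

def simulate(A, B, x0, U):
    """Roll x[t+1] = A @ x[t] + B @ u[t] forward, one state row per row of U."""
    X = [np.asarray(x0, dtype=float)]
    for u in U[:-1]:
        X.append(A @ X[-1] + B @ u)
    return np.array(X)

def fit(X, U):
    """Least-squares estimate of A (internal) and B (external) from time series."""
    Z = np.hstack([X[:-1], U[:-1]])                  # state and control at time t
    W, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)    # solves Z @ W ~= X[t+1]
    n = X.shape[1]
    return W[:n].T, W[n:].T                          # A is n x n, B is n x m

# Hypothetical toy system: two "internal" genes driven by one "external" regulator.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.5], [0.2]])
U = rng.normal(size=(50, 1))                         # external expression levels
X = simulate(A_true, B_true, [1.0, 0.0], U)          # internal expression levels
A_hat, B_hat = fit(X, U)
```

In this noise-free toy setting the fitted `A_hat` and `B_hat` match the matrices used to generate the data; with real, short time series the regression is underdetermined unless the number of variables is reduced first.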
Most common complex traits, such as obesity, hypertension, diabetes, and cancers, are known to be associated with multiple genes, environmental factors, and their epistasis. Recently, the development of advanced genotyping technologies has allowed us to perform genome-wide association studies (GWASs). For detecting the effects of multiple genes on complex traits, many approaches have been proposed for GWASs. Multifactor dimensionality reduction (MDR) is one of the powerful and efficient methods for detecting high-order gene-gene (GxG) interactions. However, the biological interpretation of GxG interactions identified by MDR analysis is not easy. In order to aid the interpretation of MDR results, we propose a network graph analysis to elucidate the meaning of identified GxG interactions. The proposed network graph analysis consists of three steps. The first step is to perform GxG interaction analysis using MDR. The second step is to draw the network graph using the MDR result. The third step is to provide biological evidence of the identified GxG interactions using external biological databases. The proposed method was applied to Korean Association Resource (KARE) data, containing 8838 individuals with 327,632 single-nucleotide polymorphisms, in order to perform GxG interaction analysis of body mass index (BMI). Our network graph analysis successfully showed that many identified GxG interactions have known biological evidence related to BMI. We expect that our network graph analysis will be helpful in interpreting the biological meaning of GxG interactions.
Eidsaa, Marius; Stubbs, Lisa; Almaas, Eivind
The application of complex network modeling to analyze large co-expression data sets has gained traction during the last decade. In particular, the use of the weighted gene co-expression network analysis framework has allowed an unbiased and systems-level investigation of genotype-phenotype relationships in a wide range of systems. Since mouse is an important model organism for biomedical research on human disease, it is of great interest to identify similarities and differences in the functional roles of human and mouse orthologous genes. Here, we develop a novel network comparison approach which we demonstrate by comparing two gene-expression data sets from a large number of human and mouse tissues. The method uses weighted topological overlap alongside the recently developed network-decomposition method of s-core analysis, which is suitable for making gene-centrality rankings for weighted networks. The aim is to identify globally central genes separately in the human and mouse networks. By comparing the ranked gene lists, we identify genes that display conserved or diverged centrality-characteristics across the networks. This framework only assumes a single threshold value that is chosen from a statistical analysis, and it may be applied to arbitrary network structures and edge-weight distributions, also outside the context of biology. When conducting the comparative network analysis, both within and across the two species, we find a clear pattern of enrichment of transcription factors, for the homeobox domain in particular, among the globally central genes. We also perform gene-ontology term enrichment analysis and look at disease-related genes for the separate networks as well as the network comparisons. We find that gene ontology terms related to regulation and development are generally enriched across the networks. In particular, the genes FOXE3, RHO, RUNX2, ALX3 and RARA, which are disease genes in either human or mouse, are on the top-10 list of globally
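For readers unfamiliar with topological overlap, the following is a minimal sketch of one common unsigned formulation for weighted networks, in which two genes score highly when they are strongly connected to each other and share strongly connected neighbours. Whether the study above uses exactly this variant is an assumption; the function name `topological_overlap` and the toy adjacency matrix are hypothetical, and the s-core decomposition step is not shown.

```python
import numpy as np

def topological_overlap(adj):
    """Unsigned weighted topological overlap for a symmetric adjacency in [0, 1]."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)                  # ignore self-connections
    k = a.sum(axis=1)                         # weighted degree (strength) of each gene
    shared = a @ a                            # sum over u of a[i, u] * a[u, j]
    denom = np.minimum.outer(k, k) + 1.0 - a  # always >= 1 for weights in [0, 1]
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)                # by convention a gene fully overlaps itself
    return tom

# Hypothetical three-gene co-expression network.
adj = [[0.0, 0.8, 0.6],
       [0.8, 0.0, 0.4],
       [0.6, 0.4, 0.0]]
tom = topological_overlap(adj)
```

The resulting matrix is symmetric with entries between 0 and 1, so it can be thresholded or ranked in the same way as the original edge weights.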
Le Novère, Nicolas
Behaviours of complex biomolecular systems are often irreducible to the elementary properties of their individual components. Explanatory and predictive mathematical models are therefore useful for fully understanding and precisely engineering cellular functions. The development and analyses of these models require their adaptation to the problems that need to be solved and the type and amount of available genetic or molecular data. Quantitative and logic modelling are among the main methods currently used to model molecular and gene networks. Each approach comes with inherent advantages and weaknesses. Recent developments show that hybrid approaches will become essential for further progress in synthetic biology and in the development of virtual organisms. PMID:25645874
The gene regulatory network (GRN) is critical for understanding the regulatory interaction between genes. Time-course microarray experiments provide ample information for constructing GRN. The designs for microarray experiments serve different purposes. However, the experiment design specifically for GRN identification is still sparse. In this article, we use a simulation-based approach to deal with design problems in the framework of nonparametric differential equations. We investigate a number of feasible designs. In particular, we evaluate whether earlier samplings can result in more useful information for GRN identification. We also evaluate the effectiveness of two strategies: more frequent samplings per replicate with fewer replicates versus fewer samplings per replicate with more replicates while keeping the total number of samplings constant. The results of our investigation provide quantitative guidance for designing and selecting microarray experiments for the purpose of GRN identification.
Mulligan, Megan K; Mozhui, Khyobeni; Prins, Pjotr; Williams, Robert W
The goal of systems genetics is to understand the impact of genetic variation across all levels of biological organization, from mRNAs, proteins, and metabolites, to higher-order physiological and behavioral traits. This approach requires the accumulation and integration of many types of data, and also requires the use of many types of statistical tools to extract relevant patterns of covariation and causal relations as a function of genetics, environment, stage, and treatment. In this protocol we explain how to use the GeneNetwork web service, a powerful and free online resource for systems genetics. We provide workflows and methods to navigate massive multiscalar data sets and we explain how to use an extensive systems genetics toolkit for analysis and synthesis. Finally, we provide two detailed case studies that take advantage of human and mouse cohorts to evaluate linkage between gene variants, addiction, and aging.
Hur, Junguk; Ozgür, Arzucan; Xiang, Zuoshuang; He, Yongqun
Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since
Background Gene networks at the nanoscale are nonlinear stochastic processes. Time delays are common and substantial in these biochemical processes due to gene transcription, translation, post-translational protein modification and diffusion. Molecular noises in gene networks come from intrinsic fluctuations, transmitted noise from upstream genes, and the global noise affecting all genes. Knowledge of molecular noise filtering and biochemical process delay compensation in gene networks is crucial to understanding signal processing in gene networks and to designing noise-tolerant and delay-robust gene circuits for synthetic biology. Results A nonlinear stochastic dynamic model with multiple time delays is proposed for describing a gene network under process delays, intrinsic molecular fluctuations, and extrinsic molecular noises. Then, the stochastic biochemical processing scheme of gene regulatory networks for attenuating these molecular noises and compensating process delays is investigated from the nonlinear signal processing perspective. In order to improve the robust stability for delay toleration and noise filtering, a robust gene circuit for nonlinear stochastic time-delay gene networks is engineered based on the nonlinear robust H∞ stochastic filtering scheme. Further, in order to avoid solving these complicated noise-tolerant and delay-robust design problems directly, a systematic gene circuit design method based on the Takagi-Sugeno (T-S) fuzzy time-delay model and the linear matrix inequalities (LMIs) technique is proposed to simplify the design procedure. Conclusion The proposed gene circuit design method has much potential for application to systems biology, synthetic biology and drug design when a gene regulatory network has to be designed to improve the robust stability and filtering ability of a disease-perturbed gene network, or when a synthetic gene network needs to perform robustly under process delays and molecular noises.
Background CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci provide prokaryotes with an adaptive immunity against viruses and other mobile genetic elements. CRISPR arrays can be transcribed and processed into small crRNA molecules, which are then used by the cell to target the foreign nucleic acid. Since spacers are accumulated by active CRISPR/Cas systems, the sequences of these spacers provide a record of the past "infection history" of the organism. Results Here we analyzed all currently known spacers present in archaeal genomes and identified their source by DNA similarity. While nearly 50% of archaeal spacers matched mobile genetic elements, such as plasmids or viruses, several others matched chromosomal genes of other organisms, primarily other archaea. Thus, networks of gene exchange between archaeal species were revealed by the spacer analysis, including many cases of inter-genus and inter-species gene transfer events. Spacers that recognize viral sequences tend to be located further away from the leader sequence, implying that there exists a selective pressure for their retention. Conclusions CRISPR spacers provide direct evidence for extensive gene exchange in archaea, especially within genera, and support the current dogma where the primary role of the CRISPR/Cas system is anti-viral and anti-plasmid defense. Open peer review This article was reviewed by: Profs. W. Ford Doolittle, John van der Oost, Christa Schleper (nominated by board member Prof. J Peter Gogarten).
Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun
The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining
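The three-matrix construction described above (gene-article matrix, normalized gene-MeSH term matrix, gene-gene matrix of pairwise dissimilarities) can be sketched on toy data as follows. This is not the GenoMesh code: the row normalization, the cosine-based dissimilarity, the function names, and the tiny matrices are assumptions chosen for illustration, whereas the abstract states only that the dissimilarity score was selected from six candidate functions by ROC analysis.

```python
import numpy as np

# Hypothetical toy inputs: 3 genes x 4 articles, and 4 articles x 3 MeSH terms.
gene_article = np.array([[1, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 0, 0, 1]], dtype=float)
article_mesh = np.array([[1, 0, 1],
                         [1, 1, 0],
                         [0, 1, 1],
                         [0, 0, 1]], dtype=float)

def gene_mesh_matrix(ga, am):
    """Count MeSH terms over each gene's articles, then scale rows to unit length."""
    counts = ga @ am
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    return counts / np.where(norms == 0, 1.0, norms)   # leave all-zero rows as zeros

def gene_gene_dissimilarity(gm):
    """Pairwise cosine dissimilarity between unit-length gene-MeSH profiles."""
    return 1.0 - gm @ gm.T

gene_mesh = gene_mesh_matrix(gene_article, article_mesh)
dissim = gene_gene_dissimilarity(gene_mesh)
```

Genes whose articles are indexed with similar MeSH terms receive a small dissimilarity even if they never appear in the same article, which is the sense in which such a matrix can suggest implicit gene-to-gene relationships.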
Ó'Maoiléidigh, Diarmuid S; Thomson, Bennett; Raganelli, Andrea; Wuest, Samuel E; Ryan, Patrick T; Kwaśniewska, Kamila; Carles, Cristel C; Graciet, Emmanuelle; Wellmer, Frank
Understanding how flowers develop from undifferentiated stem cells has occupied developmental biologists for decades. Key to unraveling this process is a detailed knowledge of the global regulatory hierarchies that control developmental transitions, cell differentiation and organ growth. These hierarchies may be deduced from gene perturbation experiments, which determine the effects on gene expression after specific disruption of a regulatory gene. Here, we tested experimental strategies for gene perturbation experiments during Arabidopsis thaliana flower development. We used artificial miRNAs (amiRNAs) to disrupt the functions of key floral regulators, and expressed them under the control of various inducible promoter systems that are widely used in the plant research community. To be able to perform genome-wide experiments with stage-specific resolution using the various inducible promoter systems for gene perturbation experiments, we also generated a series of floral induction systems that allow collection of hundreds of synchronized floral buds from a single plant. Based on our results, we propose strategies for performing dynamic gene perturbation experiments in flowers, and outline how they may be combined with versions of the floral induction system to dissect the gene regulatory network underlying flower development. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
The potent proinflammatory cytokine interleukin (IL)-1 triggers gene expression through the NF-κB signaling pathway. Here, we investigated the cofactor requirements of strongly regulated IL-1 target genes whose expression is impaired in p65 NF-κB-deficient murine embryonic fibroblasts. By two independent small-hairpin (sh)RNA screens, we examined 170 genes annotated to encode nuclear cofactors for their role in Cxcl2 mRNA expression and identified 22 factors that modulated basal or IL-1-inducible Cxcl2 levels. The functions of 16 of these factors were validated for Cxcl2 and further analyzed for their role in regulation of 10 additional IL-1 target genes by RT-qPCR. These data reveal that each inducible gene has its own (quantitative) requirement of cofactors to maintain basal levels and to respond to IL-1. Twelve factors (Epc1, H2afz, Kdm2b, Kdm6a, Mbd3, Mta2, Phf21a, Ruvbl1, Sin3b, Suv420h1, Taf1, and Ube3a) have not been previously implicated in inflammatory cytokine functions. Bioinformatics analysis indicates that they are components of complex nuclear protein networks that regulate chromatin functions and gene transcription. Collectively, these data suggest that downstream from the essential NF-κB signal each cytokine-inducible target gene has further subtle requirements for individual sets of nuclear cofactors that shape its transcriptional activation profile.
Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid lifecycle and small adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherently complex organization, since it has undergone ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the genome complexity, limiting its role as a reference. The identification of paralogs and single copy genes within a highly duplicated genome is a prerequisite to understand its organization and evolution and to improve its exploitation in comparative genomics. This is still controversial, even in the widely studied Arabidopsis genome, in part because of the lack of a reference bioinformatics pipeline that could exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, discussing all the methodological issues that may strongly affect the results, their quality and their reliability. This approach was used to analyze the organization of Arabidopsis nuclear protein coding genes, and besides classifying computationally defined paralogs into networks and single copy genes into different classes, it unraveled further intriguing aspects concerning the genome annotation and the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community.
Thickening of tree stems is the result of secondary growth, accomplished by the meristematic activity of the vascular cambium. Secondary growth of the stem entails developmental cascades resulting in the formation of secondary phloem outwards and secondary xylem (i.e., wood) inwards of the stem. Signaling and transcriptional reprogramming by the phytohormone ethylene modifies cambial growth and cell differentiation, but the molecular link between ethylene and secondary growth remains unknown. We addressed this shortcoming by analyzing expression profiles and co-expression networks of ethylene pathway genes using the AspWood transcriptome database, which covers all stages of secondary growth in aspen (Populus tremula) stems. ACC synthase expression suggests that the ethylene precursor 1-aminocyclopropane-1-carboxylic acid (ACC) is synthesized during xylem expansion and xylem cell maturation. Ethylene-mediated transcriptional reprogramming occurs during all stages of secondary growth, as deduced from AspWood expression profiles of ethylene-responsive genes. A network centrality analysis of the AspWood dataset identified EIN3D and 11 ERFs as hubs. No overlap was found between the co-expressed genes of the EIN3 and ERF hubs, suggesting target diversification and hence independent roles for these transcription factor families during normal wood formation. The EIN3D hub was part of a large co-expression gene module, which contained 16 transcription factors, among them several new candidates that have not previously been connected to wood formation and a VND-INTERACTING 2 (VNI2) homolog. We experimentally demonstrated Populus EIN3D function in ethylene signaling in Arabidopsis thaliana. The ERF hubs ERF118 and ERF119 were connected on the basis of their expression pattern and gene co-expression module composition to xylem cell expansion and secondary cell wall formation, respectively. We hereby establish data resources for ethylene-responsive genes and
Mier-y-Terán-Romero, Luis; Silber, Mary; Hatzimanikatis, Vassily
Designing genetic networks with desired functionalities requires an accurate mathematical framework that accounts for the essential mechanistic details of the system. Here, we formulate a time-delay model of protein translation and mRNA degradation by systematically reducing a detailed mechanistic model that explicitly accounts for the ribosomal dynamics and the cleaving of mRNA by endonucleases. We exploit various technical and conceptual advantages that our time-delay model offers over the mechanistic model to probe the behavior of a self-repressing gene over wide regions of parameter space. We show that a heuristic time-delay model of protein synthesis of a commonly used form yields a notably different prediction for the parameter region where sustained oscillations occur. This suggests that such heuristics can lead to erroneous results. The functional forms that arise from our systematic reduction can be used for every system that involves transcription and translation and they could replace the commonly used heuristic time-delay models for these processes. The results from our analysis have important implications for the design of synthetic gene networks and stress that such design must be guided by a combination of heuristic models and mechanistic models that include all relevant details of the process. PMID:23663853
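To make the notion of a heuristic time-delay model concrete, the sketch below integrates a commonly used textbook form for a self-repressing gene, dp/dt = beta / (1 + (p(t - tau)/K)^n) - gamma p(t), with a fixed-step Euler scheme. This is the kind of heuristic model the abstract contrasts with the authors' systematically reduced model, not the reduced model itself; the equation form, the function name, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_delayed_repressor(beta=10.0, K=1.0, n=4, gamma=1.0,
                               tau=2.0, dt=0.01, t_end=50.0, p0=0.5):
    """Euler integration of a heuristic delayed self-repression model."""
    steps = int(round(t_end / dt))
    lag = int(round(tau / dt))            # delay expressed in time steps
    p = np.empty(steps + 1)
    p[0] = p0
    for i in range(steps):
        # Protein level tau time units ago; constant history p0 before t = 0.
        p_delayed = p[i - lag] if i >= lag else p0
        production = beta / (1.0 + (p_delayed / K) ** n)   # Hill-type repression
        p[i + 1] = p[i] + dt * (production - gamma * p[i])
    return p

with_delay = simulate_delayed_repressor(tau=2.0)
no_delay = simulate_delayed_repressor(tau=0.0)
```

Without a delay this one-variable model settles to a steady state, whereas delayed negative feedback of this form may produce sustained oscillations for sufficiently long delays; the abstract's point is that the parameter region where this happens can differ notably between heuristic and mechanistically derived models.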
Wellmer, Frank; Riechmann, José Luis
The analysis of the gene regulatory networks underlying development is of central importance for a better understanding of the mechanisms that control the formation of the different cell-types, tissues or organs of an organism. The recent invention of genomic technologies has opened the possibility of studying these networks at a global level. In this paper, we summarize some of the recent advances that have been made in the understanding of plant development by the application of genomic technologies. We focus on a few specific processes, namely flower and root development and the control of the cell cycle, but we also highlight landmark studies in other areas that opened new avenues of experimentation or analysis. We describe the methods and the strategies that are currently used for the analysis of plant development by genomic technologies, as well as some of the problems and limitations that hamper their application. Since many genomic technologies and concepts were first developed and tested in organisms other than plants, we make reference to work in non-plant species and compare the current state of network analysis in plants to that in other multicellular organisms.
Yankura, Kristen A; Koechlein, Claire S; Cryan, Abigail F; Cheatle, Alys; Hinman, Veronica F
A great challenge in developmental biology is to understand how interacting networks of regulatory genes can direct the often highly complex patterning of cells in a 3D embryo. Here, we detail the gene regulatory network that describes the distribution of ciliary band-associated neurons in the bipinnaria larva of the sea star. This larva, typically for the ancestral deuterostome dipleurula larval type that it represents, forms two loops of ciliary bands that extend across much of the anterior-posterior and dorsal-ventral ectoderm. We show that the sea star first likely uses maternally inherited factors and the Wnt and Delta pathways to distinguish neurogenic ectoderm from endomesoderm. The broad neurogenic potential of the ectoderm persists throughout much of gastrulation. Nodal, bone morphogenetic protein 2/4 (Bmp2/4), and Six3-dependent pathways then sculpt a complex ciliary band territory that is defined by the expression of the forkhead transcription factor, foxg. Foxg is needed to define two molecularly distinct ectodermal domains, and for the formation of differentiated neurons along the edge of these two territories. Thus, significantly, Bmp2/4 signaling in sea stars does not distinguish differentiated neurons from nonneuronal ectoderm as it does in many other animals, but instead contributes to the patterning of an ectodermal territory, which then, in turn, provides cues to permit the final steps of neuronal differentiation. The modularity between specification and patterning likely reflects the evolutionary history of this gene regulatory network, in which an ancient module for specification of a broad neurogenic potential ectoderm was subsequently overlaid with a module for patterning.
Amrine, Katherine C H; Blanco-Ulate, Barbara; Cantu, Dario
Intricate signal networks and transcriptional regulators translate the recognition of pathogens into defense responses. In this study, we carried out a gene co-expression analysis of all currently publicly available microarray data, which were generated in experiments that studied the interaction of the model plant Arabidopsis thaliana with microbial pathogens. This work was conducted to identify (i) modules of functionally related co-expressed genes that are differentially expressed in response to multiple biotic stresses, and (ii) hub genes that may function as core regulators of disease responses. Using Weighted Gene Co-expression Network Analysis (WGCNA) we constructed an undirected network leveraging a rich curated expression dataset comprising 272 microarrays that involved microbial infections of Arabidopsis plants with a wide array of fungal and bacterial pathogens with biotrophic, hemibiotrophic, and necrotrophic lifestyles. WGCNA produced a network with scale-free and small-world properties composed of 205 distinct clusters of co-expressed genes. Modules of functionally related co-expressed genes that are differentially regulated in response to multiple pathogens were identified by integrating differential gene expression testing with functional enrichment analyses of gene ontology terms, known disease associated genes, transcriptional regulators, and cis-regulatory elements. The significance of functional enrichments was validated by comparisons with randomly generated networks. Network topology was then analyzed to identify intra- and inter-modular gene hubs. Based on high connectivity, and centrality in meta-modules that are clearly enriched in defense responses, we propose a list of 66 target genes for reverse genetic experiments to further dissect the Arabidopsis immune system. Our results show that statistical-based data trimming prior to network analysis allows the integration of expression datasets generated by different groups, under different
Rice is one of the most important model crop plants, and its heterosis has been well exploited in commercial hybrid seed production via a variety of male sterile lines. The hybrid rice cultivation area is steadily expanding around the world, especially in Southern Asia. Characterization of genes and proteins related to male sterility aims to explain how and why male sterility occurs, and which proteins are the key players in microspore abortion. Recently, a series of genes and proteins related to cytoplasmic male sterility, photoperiod-sensitive male sterility, self-incompatibility and other types of microspore deterioration have been characterized through genetics or proteomics. Proteomics in particular offers a powerful, high-throughput approach to identify novel proteins involved in male-sterile pathways, which may help in breeding artificial male-sterile systems. This represents an alternative tool to meet the critical challenge of further development of hybrid rice. In this paper, we review recent developments in our understanding of male sterility in rice hybrid production across gene, protein and integrated network levels, and present a perspective on the engineering of male sterile lines for hybrid rice production.
Zhang, Yuji; Tao, Cui; Jiang, Guoqian; Nair, Asha A; Su, Jian; Chute, Christopher G; Liu, Hongfang
A huge amount of associations among different biological entities (e.g., disease, drug, and gene) are scattered in millions of biomedical articles. Systematic analysis of such heterogeneous data can infer novel associations among different biological entities in the context of personalized medicine and translational research. Recently, network-based computational approaches have gained popularity in investigating such heterogeneous data, proposing novel therapeutic targets and deciphering disease mechanisms. However, little effort has been devoted to investigating associations among drugs, diseases, and genes in an integrative manner. We propose a novel network-based computational framework to identify statistically over-expressed subnetwork patterns, called network motifs, in an integrated disease-drug-gene network extracted from Semantic MEDLINE. The framework consists of two steps. The first step is to construct an association network by extracting pair-wise associations between diseases, drugs and genes in Semantic MEDLINE using a domain pattern driven strategy. A Resource Description Framework (RDF)-linked data approach is used to re-organize the data to increase the flexibility of data integration, the interoperability within domain ontologies, and the efficiency of data storage. Unique associations among drugs, diseases, and genes are extracted for downstream network-based analysis. The second step is to apply a network-based approach to mine the local network structure of this heterogeneous network. Significant network motifs are then identified as the backbone of the network. A simplified network based on those significant motifs is then constructed to facilitate discovery. We implemented our computational framework and identified five network motifs, each of which corresponds to specific biological meanings. Three case studies demonstrate that novel associations are derived from the network topology analysis of reconstructed networks of significant
David A Garfield
Regulatory interactions buffer development against genetic and environmental perturbations, but adaptation requires phenotypes to change. We investigated the relationship between robustness and evolvability within the gene regulatory network underlying development of the larval skeleton in the sea urchin Strongylocentrotus purpuratus. We find extensive variation in gene expression in this network throughout development in a natural population, some of which has a heritable genetic basis. Switch-like regulatory interactions predominate during early development, buffer expression variation, and may promote the accumulation of cryptic genetic variation affecting early stages. Regulatory interactions during later development are typically more sensitive (linear), allowing variation in expression to affect downstream target genes. Variation in skeletal morphology is associated primarily with expression variation of a few, primarily structural, genes at terminal positions within the network. These results indicate that the position and properties of gene interactions within a network can have important evolutionary consequences independent of their immediate regulatory role.
Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.
Background: Inverse modelling of gene regulatory networks (GRNs) capable of simulating continuous spatio-temporal biological processes requires accurate data and a good description of the system. If quantitative relations between genes cannot be extracted from direct measurements, an efficient
Koda, Satoru; Onda, Yoshihiko; Matsui, Hidetoshi; Takahagi, Kotaro; Yamaguchi-Uehara, Yukiko; Shimizu, Minami; Inoue, Komaki; Yoshida, Takuhiro; Sakurai, Tetsuya; Honda, Hiroshi; Eguchi, Shinto; Nishii, Ryuei; Mochida, Keiichi
We report the comprehensive identification of periodic genes and their network inference, based on a gene co-expression analysis and an Auto-Regressive eXogenous (ARX) model with a group smoothly clipped absolute deviation (SCAD) method using a time-series transcriptome dataset in a model grass, Brachypodium distachyon. To reveal the diurnal changes in the transcriptome in B. distachyon, we performed RNA-seq analysis of its leaves sampled through a diurnal cycle of over 48 h at 4 h intervals using three biological replications, and identified 3,621 periodic genes through our wavelet analysis. The expression data are feasible to infer network sparsity based on ARX models. We found that genes involved in biological processes such as transcriptional regulation, protein degradation, and post-transcriptional modification and photosynthesis are significantly enriched in the periodic genes, suggesting that these processes might be regulated by circadian rhythm in B. distachyon. On the basis of the time-series expression patterns of the periodic genes, we constructed a chronological gene co-expression network and identified putative transcription factors encoding genes that might be involved in the time-specific regulatory transcriptional network. Moreover, we inferred a transcriptional network composed of the periodic genes in B. distachyon, aiming to identify genes associated with other genes through variable selection by grouping time points for each gene. Based on the ARX model with the group SCAD regularization using our time-series expression datasets of the periodic genes, we constructed gene networks and found that the networks represent typical scale-free structure. Our findings demonstrate that the diurnal changes in the transcriptome in B. distachyon leaves have a sparse network structure, demonstrating the spatiotemporal gene regulatory network over the cyclic phase transitions in B. distachyon diurnal growth.
Interferon-gamma (IFN-γ) regulates various immune responses that are often critical for vaccine-induced protection. In order to annotate the IFN-γ-related gene interaction network from a large amount of IFN-γ research reported in the literature, a literature-based discovery approach was applied with a combination of natural language processing (NLP) and network centrality analysis. The interaction network of human IFN-γ (Gene symbol: IFNG) and its vaccine-specific subnetwork were automatically extracted using abstracts from all articles in PubMed. Four network centrality metrics were further calculated to rank the genes in the constructed networks. The resulting generic IFNG network contains 1060 genes and 26313 interactions among these genes. The vaccine-specific subnetwork contains 102 genes and 154 interactions. Fifty-six genes such as TNF, NFKB1, IL2, IL6, and MAPK8 were ranked among the top 25 by at least one of the centrality methods in one or both networks. Gene enrichment analysis indicated that these genes were classified in various immune mechanisms such as response to extracellular stimulus, lymphocyte activation, and regulation of apoptosis. Literature evidence was manually curated for the IFN-γ relatedness of 56 genes and vaccine development relatedness for 52 genes. This study also generated many new hypotheses worth further experimental studies.
Nonlinear dependence is common in the regulatory mechanisms of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measure called the distance correlation (DC) has been shown to be powerful and computationally effective at detecting nonlinear dependence in many situations. In this work, we incorporate the DC into inferring GRNs from gene expression data without any underlying distribution assumptions. We propose three DC-based GRN inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data sets, benchmark GRNs from the DREAM challenge and GRNs generated by the SynTReN network generator, and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRN inference.
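To make the distance correlation used in the abstract above concrete, here is a minimal sketch of the standard sample statistic for two one-dimensional expression profiles, in plain Python. It covers only the dependence measure itself; the CLR-DC, MRNET-DC and REL-DC inference steps are not reproduced, and the function names are illustrative.

```python
import math


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so row means equal column means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # at least one profile is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# A symmetric quadratic relation has zero Pearson correlation,
# yet the distance correlation is clearly positive.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_quadratic = [v * v for v in x]
y_linear = [2.0 * v + 1.0 for v in x]
print(distance_correlation(x, y_quadratic), distance_correlation(x, y_linear))
```

This direct version needs O(n²) time and memory per gene pair, which is acceptable for a sketch but would likely need a faster formulation for genome-scale inference.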
Jeanne M Serb
BACKGROUND: Large-scale gene expression studies have not yielded the expected insight into genetic networks that control complex processes. These anticipated discoveries have been limited not by technology, but by a lack of effective strategies to investigate the data in a manageable and meaningful way. Previous work suggests that using a pre-determined seed-network of gene relationships to query large-scale expression datasets is an effective way to generate candidate genes for further study and network expansion or enrichment. Based on the evolutionary conservation of gene relationships, we test the hypothesis that a seed network derived from studies of retinal cell determination in the fly, Drosophila melanogaster, will be an effective way to identify novel candidate genes for their role in mouse retinal development. METHODOLOGY/PRINCIPAL FINDINGS: Our results demonstrate that a number of gene relationships regulating retinal cell differentiation in the fly are identifiable as pairwise correlations between genes from developing mouse retina. In addition, we demonstrate that our extracted seed-network of correlated mouse genes is an effective tool for querying datasets and provides a context to generate hypotheses. Our query identified 46 genes correlated with our extracted seed-network members. Approximately 54% of these candidates had been previously linked to the developing brain and 33% had been previously linked to the developing retina. Five of six candidate genes investigated further were validated by experiments examining spatial and temporal protein expression in the developing retina. CONCLUSIONS/SIGNIFICANCE: We present an effective strategy for pursuing a systems biology approach that utilizes an evolutionary comparative framework between two model organisms, fly and mouse. Future implementation of this strategy will be useful to determine the extent of network conservation, not just gene conservation, between species and will
Background: Cellular processes are controlled by gene-regulatory networks. Several computational methods are currently used to learn the structure of gene-regulatory networks from data. This study focusses on time series gene expression and gene knock-out data in order to identify the underlying network structure. We compare the performance of different network reconstruction methods using synthetic data generated from an ensemble of reference networks. Data requirements as well as optimal experiments for the reconstruction of gene-regulatory networks are investigated. Additionally, the impact of prior knowledge on network reconstruction as well as the effect of unobserved cellular processes is studied. Results: We identify linear Gaussian dynamic Bayesian networks and variable selection based on F-statistics as suitable methods for the reconstruction of gene-regulatory networks from time series data. Commonly used discrete dynamic Bayesian networks perform worse, and this result can be attributed to the inevitable information loss caused by discretization of expression data. It is shown that short time series generated under transcription factor knock-out are optimal experiments in order to reveal the structure of gene regulatory networks. Relative to the level of observational noise, we give estimates for the required amount of gene expression data in order to accurately reconstruct gene-regulatory networks. The benefit of using prior knowledge within a Bayesian learning framework is found to be limited to conditions of small gene expression data size. Unobserved processes, like protein-protein interactions, induce dependencies between gene expression levels similar to direct transcriptional regulation. We show that these dependencies cannot be distinguished from transcription factor mediated gene regulation on the basis of gene expression data alone. Conclusion: Currently available data size and data quality make the reconstruction of
Cui, Ying; Cai, Meng; Stanley, H. Eugene
Although there have been many network-based attempts to discover disease-associated genes, most of them have not taken edge weight - which quantifies their relative strength - into consideration. We use connection weights in a protein-protein interaction (PPI) network to locate disease-related genes. We analyze the topological properties of both weighted and unweighted PPI networks and design an improved random forest classifier to distinguish disease genes from non-disease genes. We use a cross-validation test to confirm that weighted networks are better able to discover disease-associated genes than unweighted networks, which indicates that including link weight in the analysis of network properties provides a better model of complex genotype-phenotype associations.
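The difference between unweighted and weighted topological features mentioned above can be sketched with the simplest pair of such features: a node's degree (number of partners) and its strength (sum of incident edge weights). The edge list and protein labels below are hypothetical, and the abstract does not state which topological properties the authors actually fed to their classifier.

```python
from collections import defaultdict


def degree_and_strength(weighted_edges):
    """Return (degree, strength) dictionaries for an undirected weighted graph.

    weighted_edges: iterable of (node_u, node_v, weight) tuples.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in weighted_edges:
        for node in (u, v):
            degree[node] += 1      # unweighted feature: number of partners
            strength[node] += w    # weighted feature: total interaction weight
    return dict(degree), dict(strength)


# Hypothetical PPI edges with confidence-like weights in [0, 1].
edges = [("A", "B", 0.9), ("A", "C", 0.2), ("B", "C", 0.4), ("C", "D", 0.1)]
degree, strength = degree_and_strength(edges)
print(degree)
print(strength)
```

In this toy graph node C has the highest degree but a lower strength than A or B, which is the kind of distinction an unweighted analysis cannot see.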
Ahsen, Mehmet Eren; Niculescu, Silviu-Iulian
This brief examines a deterministic, ODE-based model for gene regulatory networks (GRN) that incorporates nonlinearities and time-delayed feedback. An introductory chapter provides some insights into molecular biology and GRNs. The mathematical tools necessary for studying the GRN model are then reviewed, in particular Hill functions and Schwarzian derivatives. One chapter is devoted to the analysis of GRNs under negative feedback with time delays, and a special case of a homogeneous GRN is considered. Asymptotic stability analysis of GRNs under positive feedback is then considered in a separate chapter, in which conditions leading to bi-stability are derived. Graduate and advanced undergraduate students and researchers in control engineering, applied mathematics, systems biology and synthetic biology will find this brief to be a clear and concise introduction to the modeling and analysis of GRNs.
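For readers unfamiliar with the Hill functions mentioned above, the following sketch gives the two standard forms used to model activation and repression. The parameter names K (half-saturation level) and n (Hill coefficient) are conventional labels chosen here and are not taken from the brief.

```python
def hill_activation(x, K, n):
    """Increasing Hill function: x**n / (K**n + x**n)."""
    return x ** n / (K ** n + x ** n)


def hill_repression(x, K, n):
    """Decreasing Hill function: K**n / (K**n + x**n)."""
    return K ** n / (K ** n + x ** n)


# Both forms equal one half at x = K, and they sum to one for any x >= 0.
for x in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(x, hill_activation(x, K=1.0, n=2), hill_repression(x, K=1.0, n=2))
```

A larger n makes the response more switch-like around x = K, which is why the Hill coefficient matters in the stability and bistability analyses the brief describes.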
This project incorporates technology and a historical emphasis on science drawn from ancient civilizations to promote a greater understanding of conceptual science. In the Apps for Ancient Civilizations project, students investigate an ancient culture to discover how people might have used science and math smartphone apps to make their lives…
Wang, Yi Kan; Hurley, Daniel G; Schnell, Santiago; Print, Cristin G; Crampin, Edmund J
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.
The inference of gene regulatory networks has gained considerable interest in the biology and biomedical community in recent years. The purpose of this paper is to investigate the influence that environmental conditions can have on the performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I) observational gene expression data: normal environmental condition, (II) interventional gene expression data: growth in rich media, (III) interventional gene expression data: normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable, but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.
Shimoni, Yishai; Fink, Marc Y; Choi, Soon-gang; Sealfon, Stuart C
Improving the ability to reverse engineer biochemical networks is a major goal of systems biology. Lesions in signaling networks lead to alterations in gene expression, which in principle should allow network reconstruction. However, the information about the activity levels of signaling proteins conveyed in overall gene expression is limited by the complexity of gene expression dynamics and of regulatory network topology. Two observations provide the basis for overcoming this limitation: a. genes induced without de-novo protein synthesis (early genes) show a linear accumulation of product in the first hour after the change in the cell's state; b. The signaling components in the network largely function in the linear range of their stimulus-response curves. Therefore, unlike most genes or most time points, expression profiles of early genes at an early time point provide direct biochemical assays that represent the activity levels of upstream signaling components. Such expression data provide the basis for an efficient algorithm (Plato's Cave algorithm; PLACA) to reverse engineer functional signaling networks. Unlike conventional reverse engineering algorithms that use steady state values, PLACA uses stimulated early gene expression measurements associated with systematic perturbations of signaling components, without measuring the signaling components themselves. Besides the reverse engineered network, PLACA also identifies the genes detecting the functional interaction, thereby facilitating validation of the predicted functional network. Using simulated datasets, the algorithm is shown to be robust to experimental noise. Using experimental data obtained from gonadotropes, PLACA reverse engineered the interaction network of six perturbed signaling components. The network recapitulated many known interactions and identified novel functional interactions that were validated by further experiment. PLACA uses the results of experiments that are feasible for any
Spicker, J.S.; Wikman, F.; Lu, M.L.
We have trained an artificial neural network to predict the sequence of the human TP53 tumor suppressor gene based on a p53 GeneChip. The trained neural network uses as input the fluorescence intensities of DNA hybridized to oligonucleotides on the surface of the chip and makes between zero...
Deciphering the onychophoran 'segmentation gene cascade': Gene expression reveals limited involvement of pair rule gene orthologs in segmentation, but a highly conserved segment polarity gene network.
Janssen, Ralf; Budd, Graham E
The hallmark of the arthropods is their segmented body, although the origin of segmentation is unresolved. In order to shed light on the origin of segmentation we investigated orthologs of pair rule genes (PRGs) and segment polarity genes (SPGs) in a member of the closest related sister-group to the arthropods, the onychophorans. Our gene expression data analysis suggests that most of the onychophoran PRGs do not play a role in segmentation. One possible exception is the even-skipped (eve) gene, which is expressed in the posterior end of the onychophoran where new segments are likely patterned, and is also expressed in segmentation-gene typical transverse stripes in at least a number of newly formed segments. Other onychophoran PRGs such as runt (run), hairy/Hes (h/Hes) and odd-skipped (odd) do not appear to have a function in segmentation at all. Onychophoran PRGs that act low in the segmentation gene cascade in insects, however, are potentially involved in segment-patterning. The clearest case is the pairberry (pby) gene ortholog, which is expressed in a typical SPG-pattern. Since this result suggested possible conservation of the SPG-network we further investigated SPGs (and associated factors) such as Notum in the onychophoran. We find that the expression patterns of SPGs in arthropods and the onychophoran are highly conserved, suggesting a conserved SPG-network in these two clades, and indeed also in an annelid. This may suggest that the common ancestor of lophotrochozoans and ecdysozoans was already segmented utilising the same SPG-network, or that the SPG-network was recruited independently in annelids and onychophorans/arthropods. © 2013 Elsevier Inc. All rights reserved.
Berto, Stefano; Perdomo-Sabogal, Alvaro; Gerighausen, Daniel; Qin, Jing; Nowick, Katja
Cognitive abilities, such as memory, learning, language, problem solving, and planning, involve the frontal lobe and other brain areas. Not much is known yet about the molecular basis of cognitive abilities, but it seems clear that cognitive abilities are determined by the interplay of many genes. One approach for analyzing the genetic networks involved in cognitive functions is to study the coexpression networks of genes with known importance for proper cognitive functions, such as genes that have been associated with cognitive disorders like intellectual disability (ID) or autism spectrum disorders (ASD). Because many of these genes are gene regulatory factors (GRFs) we aimed to provide insights into the gene regulatory networks active in the human frontal lobe. Using genome wide human frontal lobe expression data from 10 independent data sets, we first derived 10 individual coexpression networks for all GRFs including their potential target genes. We observed a high level of variability among these 10 independently derived networks, pointing out that relying on results from a single study can only provide limited biological insights. To instead focus on the most confident information from these 10 networks we developed a method for integrating such independently derived networks into a consensus network. This consensus network revealed robust GRF interactions that are conserved across the frontal lobes of different healthy human individuals. Within this network, we detected a strong central module that is enriched for 166 GRFs known to be involved in brain development and/or cognitive disorders. Interestingly, several hubs of the consensus network encode for GRFs that have not yet been associated with brain functions. Their central role in the network suggests them as excellent new candidates for playing an essential role in the regulatory network of the human frontal lobe, which should be investigated in future studies. PMID:27014338
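The abstract above does not describe how the ten coexpression networks were integrated, so the following is only one plausible way to form a consensus network: keep an undirected edge if it appears in at least a chosen number of the individual networks. The support threshold, the edge representation and the gene labels are all assumptions for illustration, not the authors' method.

```python
from collections import Counter


def consensus_network(networks, min_support):
    """Keep undirected edges present in at least `min_support` networks.

    networks: list of edge lists, each edge a (gene_a, gene_b) pair with
    gene_a != gene_b. Edge direction and duplicates within one network
    are ignored.
    """
    counts = Counter()
    for edges in networks:
        # frozenset treats (a, b) and (b, a) as the same edge; the set
        # comprehension counts each edge at most once per network.
        counts.update({frozenset(edge) for edge in edges})
    return {tuple(sorted(edge)) for edge, c in counts.items() if c >= min_support}


# Three hypothetical coexpression networks over placeholder gene labels.
networks = [
    [("g1", "g2"), ("g2", "g3")],
    [("g2", "g1"), ("g3", "g4")],
    [("g1", "g2"), ("g2", "g3")],
]
consensus = consensus_network(networks, min_support=2)
print(sorted(consensus))
```

A stricter threshold trades coverage for confidence, which matches the abstract's stated motivation that results from any single data set are highly variable.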
Reyes-Palomares, Armando; Rodríguez-López, Rocío; Ranea, Juan A G; Sánchez-Jiménez, Francisca; Medina, Miguel Angel
The molecular complexity of genetic diseases requires novel approaches to break it down into coherent biological modules. For this purpose, many disease network models have been created and analyzed. We highlight two of them, "the human diseases networks" (HDN) and "the orphan disease networks" (ODN). However, in these models, each single node represents one disease or an ambiguous group of diseases. In these cases, the notion of diseases as unique entities reduces the usefulness of network-based methods. We hypothesize that using the clinical features (pathophenotypes) to define pathophenotypic connections between disease-causing genes improves our understanding of the molecular events caused by genetic disturbances. For this, we have built a pathophenotypic similarity gene network (PSGN) and compared it with the unipartite projections (based on gene-to-gene edges) similar to those used in previous network models (HDN and ODN). Unlike these disease network models, the PSGN uses semantic similarities. This pathophenotypic similarity has been calculated by comparing pathophenotypic annotations of genes (human abnormalities of HPO terms) in the "Human Phenotype Ontology". The resulting network contains 1075 genes (nodes) and 26197 significant pathophenotypic similarities (edges). A global analysis of this network reveals: unnoticed pairs of genes showing significant pathophenotypic similarity, a biologically meaningful re-arrangement of the pathological relationships between genes, correlations of biochemical interactions with higher similarity scores and functional biases in metabolic and essential genes toward the pathophenotypic specificity and the pleiotropy, respectively. Additionally, pathophenotypic similarities and metabolic interactions of genes associated with maple syrup urine disease (MSUD) have been used to merge these genes into a coherent pathological module. Our results indicate that pathophenotypes help to identify underlying co-dependencies among disease
Richard A Notebaart
Full Text Available To what extent can modes of gene regulation be explained by systems-level properties of metabolic networks? Prior studies on co-regulation of metabolic genes have mainly focused on graph-theoretical features of metabolic networks and demonstrated a decreasing level of co-expression with increasing network distance, a naïve, but widely used, topological index. Others have suggested that static graph representations can poorly capture dynamic functional associations, e.g., in the form of dependence of metabolic fluxes across genes in the network. Here, we systematically tested the relative importance of metabolic flux coupling and network position on gene co-regulation, using a genome-scale metabolic model of Escherichia coli. After validating the computational method with empirical data on flux correlations, we confirm that genes coupled by their enzymatic fluxes not only show similar expression patterns, but also share transcriptional regulators and frequently reside in the same operon. In contrast, we demonstrate that network distance per se has relatively minor influence on gene co-regulation. Moreover, the type of flux coupling can explain refined properties of the regulatory network that are ignored by simple graph-theoretical indices. Our results underline the importance of studying functional states of cellular networks to define physiologically relevant associations between genes and should stimulate future developments of novel functional genomic tools.
Fang, Minghong; Hu, Xiaohua; Wang, Yan; Zhao, Junmin; Shen, Xianjun; He, Tingting
Prioritization of disease-causing genes is important for understanding disease mechanisms and for biomedical applications, such as drug design. Previous studies have shown that promising candidate genes are mostly ranked according to their relatedness to known disease genes or closely related disease genes. Therefore, a dangling gene (isolated gene) with no edges in the network cannot be effectively prioritized. These approaches tend to prioritize genes that are highly connected in the PPI network but perform poorly when they are applied to loosely connected disease genes. To address these problems, we propose a new method for prioritizing disease-causing genes that is based on network diffusion and rank concordance (NDRC). The method is evaluated by leave-one-out cross validation on 1931 diseases in which at least one gene is known to be involved, and it is able to rank the true causal gene first in 849 of all 2542 cases. The experimental results suggest that NDRC significantly outperforms other existing methods such as RWR, VAVIEN, DADA and PRINCE in identifying loosely connected disease genes and successfully ranks dangling genes as potential candidate disease genes. Furthermore, we apply the NDRC method to study three representative diseases, Meckel syndrome 1, Protein C deficiency and Peroxisome biogenesis disorder 1A (Zellweger). Our study has also found that certain complex disease-causing genes can be divided into several modules that are closely associated with different disease phenotypes.
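To make the dangling-gene problem concrete, here is a minimal sketch of random walk with restart (RWR), one of the comparison methods named in the abstract. It is not NDRC itself, whose details are not given here. The function name, the restart probability of 0.3, and the four-gene toy network (`A` to `D`) are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    `adjacency` maps each node to the set of its neighbours (undirected);
    `seeds` are the known disease genes where the walk restarts.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # A node without edges cannot pass mass on; return it to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * p[n] / len(seeds)
                continue
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical network: a path A - B - C plus an isolated ("dangling") gene D.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
scores = random_walk_with_restart(adjacency, seeds={"A"})
```

In this toy case `D` receives a score of exactly zero because no probability can reach it through the network, which is the limitation of purely connectivity-based ranking that the abstract describes.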
Full Text Available Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Compared with state-of-the-art biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios. Further, we develop a principled method to recover context-specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data, and we find tissue-specific gene networks. We validate these findings by using our tissue-specific networks to identify trans-eQTLs specific to one of four primary tissues.
Lodowski Kerrie H
Full Text Available Gene expression time course data can be used not only to detect differentially expressed genes but also to find temporal associations among genes. The problem of reconstructing generalized logical networks to account for temporal dependencies among genes and environmental stimuli from transcriptomic data is addressed. A network reconstruction algorithm was developed that uses statistical significance as a criterion for network selection to avoid false-positive interactions arising from pure chance. The multinomial hypothesis testing-based network reconstruction allows for explicit specification of the false-positive rate, unique from all extant network inference algorithms. The method is superior to dynamic Bayesian network modeling in a simulation study. Temporal gene expression data from the brains of alcohol-treated mice in an analysis of the molecular response to alcohol are used for modeling. Genes from major neuronal pathways are identified as putative components of the alcohol response mechanism. Nine of these genes have associations with alcohol reported in literature. Several other potentially relevant genes, compatible with independent results from literature mining, may play a role in the response to alcohol. Additional, previously unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.
Wei, Jiangyong; Hu, Xiaohua; Zou, Xiufen; Tian, Tianhai
Recent advances in omics technologies have created great opportunities to study large-scale regulatory networks inside the cell. In addition, single-cell experiments have measured the gene and protein activities in a large number of cells under the same experimental conditions. However, a significant challenge in computational biology and bioinformatics is how to derive quantitative information from the single-cell observations and how to develop sophisticated mathematical models to describe the dynamic properties of regulatory networks using the derived quantitative information. This work designs an integrated approach to reverse-engineer gene networks for regulating early blood development based on single-cell experimental observations. The wanderlust algorithm is initially used to develop the pseudo-trajectory for the activities of a number of genes. Since the gene expression data in the developed pseudo-trajectory show large fluctuations, we then use Gaussian process regression methods to smooth the gene expression data in order to obtain pseudo-trajectories with much smaller fluctuations. The proposed integrated framework consists of both bioinformatics algorithms to reconstruct the regulatory network and mathematical models using differential equations to describe the dynamics of gene expression. The developed approach is applied to study the network regulating early blood cell development. A graphic model is constructed for a regulatory network with forty genes and a dynamic model using differential equations is developed for a network of nine genes. Numerical results suggest that the proposed model is able to match experimental data very well. We also examine the networks with more regulatory relations and numerical results show that more regulations may exist. We test the possibility of auto-regulation, but numerical simulations do not support positive auto-regulation. In addition, robustness is used as an important additional criterion to select candidate
Aaron R Wolen
Full Text Available Individual differences in initial sensitivity to ethanol are strongly related to the heritable risk of alcoholism in humans. To elucidate key molecular networks that modulate ethanol sensitivity we performed the first systems genetics analysis of ethanol-responsive gene expression in brain regions of the mesocorticolimbic reward circuit (prefrontal cortex, nucleus accumbens, and ventral midbrain) across a highly diverse family of 27 isogenic mouse strains (BXD panel) before and after treatment with ethanol. Acute ethanol altered the expression of ~2,750 genes in one or more regions and 400 transcripts were jointly modulated in all three. Ethanol-responsive gene networks were extracted with a powerful graph theoretical method that efficiently summarized ethanol's effects. These networks correlated with acute behavioral responses to ethanol and other drugs of abuse. As predicted, networks were heavily populated by genes controlling synaptic transmission and neuroplasticity. Several of the most densely interconnected network hubs, including Kcnma1 and Gsk3β, are known to influence behavioral or physiological responses to ethanol, validating our overall approach. Other major hub genes like Grm3, Pten and Nrg3 represent novel targets of ethanol effects. Networks were under strong genetic control by variants that we mapped to a small number of chromosomal loci. Using a novel combination of genetic, bioinformatic and network-based approaches, we identified high priority cis-regulatory candidate genes, including Scn1b, Gria1, Sncb and Nell2. The ethanol-responsive gene networks identified here represent a previously uncharacterized intermediate phenotype between DNA variation and ethanol sensitivity in mice. Networks involved in synaptic transmission were strongly regulated by ethanol and could contribute to behavioral plasticity seen with chronic ethanol. Our novel finding that hub genes and a small number of loci exert major influence over the ethanol
Full Text Available BACKGROUND: Being sessile organisms, plants must adjust their metabolism to dynamic changes in their environment. Such adjustments need particular coordination in branched metabolic networks in which a given metabolite can be converted into multiple other metabolites via different enzymatic chains. In the present report, we developed a novel "Gene Coordination" bioinformatics approach and used it to elucidate adjustable transcriptional interactions of two branched amino acid metabolic networks in plants in response to environmental stresses, using publicly available microarray results. RESULTS: Using our "Gene Coordination" approach, we have identified in Arabidopsis plants two oppositely regulated groups of "highly coordinated" genes within the branched Asp-family network, which metabolizes the amino acids Lys, Met, Thr, Ile and Gly, as well as a single group of "highly coordinated" genes within the branched aromatic amino acid metabolic network, which metabolizes the amino acids Trp, Phe and Tyr. These genes possess highly coordinated adjustable negative and positive expression responses to various stress cues, which apparently regulate adjustable metabolic shifts between competing branches of these networks. We also provide evidence implying that these highly coordinated genes are central to imposing intra- and inter-network interactions between the Asp-family and aromatic amino acid metabolic networks as well as differential system interactions with other growth promoting and stress-associated genome-wide genes. CONCLUSION: Our novel Gene Coordination approach shows that branched amino acid metabolic networks in plants are regulated by specific groups of highly coordinated genes that possess adjustable intra-network, inter-network and genome-wide transcriptional interactions. We also hypothesize that such transcriptional interactions enable regulatory metabolic adjustments needed for adaptation to the stresses.
Full Text Available Despite recent improvements in molecular techniques, biological knowledge remains incomplete. Any theorizing about living systems is therefore necessarily based on the use of heterogeneous and partial information. Much current research has focused successfully on the qualitative behaviors of macromolecular networks. Nonetheless, it is not capable of taking into account available quantitative information such as time-series protein concentration variations. The present work proposes a probabilistic modeling framework that integrates both kinds of information. Average case analysis methods are used in combination with Markov chains to link qualitative information about transcriptional regulations to quantitative information about protein concentrations. The approach is illustrated by modeling the carbon starvation response in Escherichia coli. It accurately predicts the quantitative time-series evolution of several protein concentrations using only knowledge of discrete gene interactions and a small number of quantitative observations on a single protein concentration. From this, the modeling technique also derives a ranking of interactions with respect to their importance during the experiment considered. Such a classification is confirmed by the literature. Therefore, our method is principally novel in that it allows (i) a hybrid model that integrates both a qualitative discrete model and quantities to be built, even using a small amount of quantitative information, (ii) new quantitative predictions to be derived, (iii) the robustness and relevance of interactions with respect to phenotypic criteria to be precisely quantified, and (iv) the key features of the model to be extracted that can be used as guidance for designing future experiments.
Christoph Tim Krannich
Full Text Available Climate change leading to increased periods of low water availability as well as increasing demands for food in the coming years makes breeding for drought tolerant crops a high priority. Plants have developed diverse strategies and mechanisms to survive drought stress. However, most of these represent drought escape or avoidance strategies like early flowering or low stomatal conductance that are not applicable in breeding for crops with high yields under drought conditions. Even though a great deal of research is ongoing, especially in cereals, in this regard, not all mechanisms involved in drought tolerance are yet understood. The identification of candidate genes for drought tolerance that have a high potential to be used for breeding drought tolerant crops represents a challenge. Breeding for drought tolerant crops has to focus on acceptable yields under water-limited conditions and not on survival. However, as more and more knowledge about the complex networks and the cross talk during drought is available, more options are revealed. In addition, it has to be considered that conditioning a crop for drought tolerance might require the production of metabolites and might cost the plants energy and resources that cannot be used in terms of yield. Recent research indicates that yield penalty exists and efficient breeding for drought tolerant crops with acceptable yields under well-watered and drought conditions might require uncoupling yield penalty from drought tolerance.
Full Text Available BACKGROUND: The underlying change of gene network expression of Guillain-Barré syndrome (GBS) remains elusive. We sought to identify GBS-associated gene networks and signaling pathways by analyzing the transcriptional profile of leukocytes in the patients with GBS. METHODS AND FINDINGS: Quantitative global gene expression microarray analysis of peripheral blood leukocytes was performed on 7 patients with GBS and 7 healthy controls. Gene expression profiles were compared between patients and controls after standardization. The set of genes that significantly correlated with GBS was further analyzed by Ingenuity Pathways Analyses. 256 genes and 18 gene networks were significantly associated with GBS (fold change ≥2, P<0.05). FOS, PTGS2, HMGB2 and MMP9 are the top four of 246 significantly up-regulated genes. The most significant disease and altered biological function genes associated with GBS were those involved in inflammatory response, infectious disease, and respiratory disease. Cell death, cellular development and cellular movement were the top significant molecular and cellular functions involved in GBS. Hematological system development and function, immune cell trafficking and organismal survival were the most significant GBS-associated function in physiological development and system category. Several hub genes, such as MMP9, PTGS2 and CREB1 were identified in the associated gene networks. Canonical pathway analysis showed that GnRH, corticotrophin-releasing hormone and ERK/MAPK signaling were the most significant pathways in the up-regulated gene set in GBS. CONCLUSIONS: This study reveals the gene networks and canonical pathways associated with GBS. These data provide not only networks between the genes for understanding the pathogenic properties of GBS but also map significant pathways for the future development of novel therapeutic strategies.
Hur, Junguk; Özgür, Arzucan; He, Yongqun
Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of
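Two of the four centrality metrics named in the abstract, degree and closeness, can be sketched in a few lines. This is a generic illustration on a hypothetical five-node network with placeholder names (`g1` to `g5`); the edges are invented and do not represent any interaction reported in the study, and the normalizations used are the common textbook ones rather than anything confirmed by the source.

```python
from collections import deque


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    result = {}
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical undirected network; g3 is connected to most other nodes.
adjacency = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4", "g5"},
    "g4": {"g3"},
    "g5": {"g3"},
}
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
top_gene = max(closeness, key=closeness.get)
```

Ranking nodes by several such metrics and comparing the rankings is one common way to pick out candidate hubs; eigenvector and betweenness centrality need more machinery and are left out of this sketch.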
Yang, Lun; Wei, Dong-Qing; Qi, Ying-Xin; Jiang, Zong-Lai
Identifying genes related to human diseases, such as cancer and cardiovascular disease, is an important task in biomedical research because of its applications in disease diagnosis and treatment. Interactome networks, especially protein-protein interaction networks, have been used for disease gene identification based on the hypothesis that strong candidate genes tend to be closely related to each other under some measure on the network. We proposed a new measure to analyze the relationship between network nodes, called graphlet interaction. The graphlet interaction contained 28 different isomers. The results showed that the numbers of graphlet interaction isomers between disease genes in interactome networks were significantly larger than those between randomly picked genes, while graphlet signatures were not. Then, we designed a new type of score, based on the network properties, to identify disease genes using graphlet interaction. The genes with higher scores were more likely to be disease genes, and all candidate genes were ranked according to their scores. Then the approach was evaluated by leave-one-out cross-validation. The precision of the current approach reached 90% at about 10% recall, which was clearly higher than that of the three previously predominant algorithms: random walk, Endeavour and the neighborhood-based method. Finally, the approach was applied to predict new disease genes related to 4 common diseases, most of which were identified by other independent experimental studies. In conclusion, we demonstrate that the graphlet interaction is an effective tool to analyze the network properties of disease genes, and the scores calculated by graphlet interaction are more precise in identifying disease genes. PMID:24465923
Davila-Velderrain, Jose; Servin-Marquez, Andres; Alvarez-Buylla, Elena R
The gene regulatory network of floral organ cell fate specification of Arabidopsis thaliana is a robust developmental regulatory module. Although this finding was proposed to explain the overall conservation of floral organ types and organization among angiosperms, it has not been confirmed that the network components are conserved at the molecular level among flowering plants. Using the genomic data that have accumulated, we address the conservation of the genes involved in this network and the forces that have shaped its evolution during the divergence of angiosperms. We recovered the network gene homologs for 18 species of flowering plants spanning nine families. We found that all the genes are highly conserved with no evidence of positive selection. We studied the sequence conservation features of the genes in the context of their known biological function and the strength of the purifying selection acting upon them in relation to their placement within the network. Our results suggest an association between protein length and sequence conservation, evolutionary rates, and functional category. On the other hand, we found no significant correlation between the strength of purifying selection and gene placement. Our results confirm that the studied robust developmental regulatory module has been subjected to strong functional constraints. However, unlike previous studies, our results do not support the notion that network topology plays a major role in constraining evolutionary rates. We speculate that the dynamical functional role of genes within the network, and not just their connectivity, could play an important role in constraining evolution.
Masalia, Rishi R; Bewick, Adam J; Burke, John M
Gene coexpression networks are a useful tool for summarizing transcriptomic data and providing insight into patterns of gene regulation in a variety of species. Though there has been considerable interest in studying the evolution of network topology across species, less attention has been paid to the relationship between network position and patterns of molecular evolution. Here, we generated coexpression networks from publicly available expression data for seven flowering plant taxa (Arabidopsis thaliana, Glycine max, Oryza sativa, Populus spp., Solanum lycopersicum, Vitis spp., and Zea mays) to investigate the relationship between network position and rates of molecular evolution. We found a significant negative correlation between network connectivity and rates of molecular evolution, with more highly connected (i.e., "hub") genes having significantly lower nonsynonymous substitution rates and dN/dS ratios compared to less highly connected (i.e., "peripheral") genes across the taxa surveyed. These findings suggest that more centrally located hub genes are, on average, subject to higher levels of evolutionary constraint than are genes located on the periphery of gene coexpression networks. The consistency of this result across disparate taxa suggests that it holds for flowering plants in general, as opposed to being a species-specific phenomenon.
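The kind of test behind a "negative correlation between network connectivity and rates of molecular evolution" can be sketched with a rank correlation. The abstract does not state which statistic was used, so Spearman's rho is an assumption here, and the five connectivity and dN/dS values below are invented toy numbers, not data from the study.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values: number of coexpression partners and dN/dS for five genes.
connectivity = [50, 30, 12, 5, 2]
dn_ds = [0.05, 0.10, 0.20, 0.35, 0.60]
rho = spearman(connectivity, dn_ds)
```

In this contrived example the most connected gene has the lowest dN/dS and the ordering is perfectly monotone, so `rho` comes out at the extreme negative end; real data would give a weaker value and would also need a significance test.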
Kentzoglanakis, Kyriakos; Poole, Matthew
In this paper, we investigate the problem of reverse engineering the topology of gene regulatory networks from temporal gene expression data. We adopt a computational intelligence approach comprising swarm intelligence techniques, namely particle swarm optimization (PSO) and ant colony optimization (ACO). In addition, the recurrent neural network (RNN) formalism is employed for modeling the dynamical behavior of gene regulatory systems. More specifically, ACO is used for searching the discrete space of network architectures and PSO for searching the corresponding continuous space of RNN model parameters. We propose a novel solution construction process in the context of ACO for generating biologically plausible candidate architectures. The objective is to concentrate the search effort into areas of the structure space that contain architectures which are feasible in terms of their topological resemblance to real-world networks. The proposed framework is initially applied to the reconstruction of a small artificial network that has previously been studied in the context of gene network reverse engineering. Subsequently, we consider an artificial data set with added noise for reconstructing a subnetwork of the genetic interaction network of S. cerevisiae (yeast). Finally, the framework is applied to a real-world data set for reverse engineering the SOS response system of the bacterium Escherichia coli. Results demonstrate the relative advantage of utilizing problem-specific knowledge regarding biologically plausible structural properties of gene networks over conducting a problem-agnostic search in the vast space of network architectures.
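The continuous half of the search described above, PSO over model parameters, can be illustrated with a minimal particle swarm optimizer. This sketch does not reproduce the paper's framework: the ACO structure search and the RNN model are omitted, the objective is a toy squared error against a hypothetical target weight vector, and the inertia and acceleration coefficients are common textbook defaults, all of which are assumptions.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iter=200, bounds=(-5.0, 5.0),
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over a `dim`-dimensional real vector."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]   # best position of the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to own best, and pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


# Hypothetical target: two regulatory weights to be recovered.
TARGET = (0.5, -1.0)


def squared_error(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, TARGET))


weights, error = pso_minimize(squared_error, dim=2)
```

In a network-inference setting the objective would instead measure how well a candidate model reproduces the observed expression time series, with one such continuous search run for each architecture proposed by the discrete search.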
Coelho Goncalves de Abreu, Gabriel; Labouriau, Rodrigo S.
We present a technique to characterize differentially expressed genes in terms of their position in a high-dimensional co-expression network. The set-up of Gaussian graphical models is used to construct representations of the co-expression network in such a way that redundancy and the propagation...... that allow to make effective inference in problems with high degree of complexity (e.g. several thousands of genes) and small number of observations (e.g. 10-100) as typically occurs in high throughput gene expression studies. Taking advantage of the internal structure of decomposable graphical models, we...... construct a compact representation of the co-expression network that allows to identify the regions with high concentration of differentially expressed genes. It is argued that differentially expressed genes located in highly interconnected regions of the co-expression network are less informative than...
Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However, it cannot be used with large networks with thousands of nodes, as is the case in biological networks, due to its high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks, which is crucial for a wide range of biomedical applications.
Wang, Yi; Thilmony, Roger; Gu, Yong Q
Many lists containing biological identifiers, such as gene lists, have been generated in various genomics projects. Identifying the overlap among gene lists can enable us to understand the similarities and differences between the data sets. Here, we present an interactome network-based web application platform named NetVenn for comparing and mining the relationships among gene lists. NetVenn contains publicly available interactome network data for several species and supports user upload of customized interactome network data. It has an efficient and interactive graphic tool that provides a Venn diagram view for comparing two to four lists in the context of an interactome network. NetVenn also provides a comprehensive annotation of genes in the gene lists by using enriched terms from multiple functional databases. In addition, it allows gene expression data to be mapped onto the network, providing information on the transcription status of genes. The power graph analysis tool is integrated in NetVenn for simplified visualization of gene relationships in the network. NetVenn is freely available at http://probes.pw.usda.gov/NetVenn or http://wheat.pw.usda.gov/NetVenn. Published by Oxford University Press Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
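The list comparison that underlies a Venn diagram view can be sketched with plain set operations. This is only an illustration under assumed inputs, not NetVenn's implementation; the function name `venn_regions` and the gene identifiers are hypothetical.

```python
from itertools import combinations

def venn_regions(gene_lists):
    """Map each combination of list names to the genes found in exactly those lists."""
    names = sorted(gene_lists)
    sets = {name: set(gene_lists[name]) for name in names}
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Genes shared by every list in the combination ...
            inside = set.intersection(*(sets[n] for n in combo))
            # ... minus genes that also occur in any other list.
            outside = set().union(*(sets[n] for n in names if n not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions

# Hypothetical gene lists, for illustration only.
lists = {
    "A": ["g1", "g2", "g3"],
    "B": ["g2", "g3", "g4"],
    "C": ["g3", "g5"],
}
for combo, genes in venn_regions(lists).items():
    print(" & ".join(combo), sorted(genes))
```

Each gene lands in exactly one region, so the region sizes are the numbers a Venn diagram would display.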
Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar
A gene regulatory network discloses the regulatory interactions amongst genes under a particular condition of the human body. The accurate reconstruction of such networks from time-series gene expression data using computational tools poses a stiff challenge for contemporary computer scientists. It is crucial for understanding the proper functioning of a living organism. Unfortunately, computational methods produce many false predictions along with the correct ones. Investigations in the domain focus on identifying as many correct regulations as possible in the reverse engineering of gene regulatory networks, to make the networks more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we have proposed a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have also implemented it using a dataset ensemble approach (i.e. combining multiple datasets). We have applied the proposed methodology to real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IRMA network of Saccharomyces cerevisiae. Subsequently, we have experimented on somewhat larger in silico networks, namely the DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we have used four datasets in each experiment. The results are encouraging: the proposed methodology can significantly reduce the number of false predictions for larger gene regulatory networks without using any supplementary prior biological information. It is also observed that if a small amount of prior biological information is incorporated here, the results improve further w.r.t. the prediction of true positives
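To make "false predictions" concrete, an inferred network can be scored edge by edge against a reference network. The sketch below also shows one simple way to combine networks inferred from several datasets, by majority vote; this voting rule is an assumption made for illustration and is not the authors' metaheuristic scheme. All gene names and edges are made up.

```python
from collections import Counter

def edge_scores(predicted, reference):
    """Count true/false positives and false negatives among directed edges."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}

def consensus(networks, min_votes):
    """Keep edges predicted in at least `min_votes` of the networks (assumed rule)."""
    votes = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, count in votes.items() if count >= min_votes}

# Toy reference network and three inferred networks (hypothetical).
reference = {("g1", "g2"), ("g1", "g3"), ("g2", "g4")}
run1 = {("g1", "g2"), ("g1", "g3"), ("g3", "g4")}
run2 = {("g1", "g2"), ("g2", "g4"), ("g4", "g1")}
run3 = {("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g3", "g1")}

print(edge_scores(run1, reference))
print(edge_scores(consensus([run1, run2, run3], min_votes=2), reference))
```

In this toy case each individual run contains at least one false edge, while the edges supported by two or more runs contain none.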
Bruggeman, F.J.; Oancea, I.; van Driel, R.
Analysis of the genome organization of higher eukaryotes indicates that it contains many clusters of functionally related genes. In these clusters, the activity of a single gene is regulated hierarchically at a local gene-level and a global cluster-level. Whether a single gene can be activated by a
Background: Photosynthetic light acclimation is an important process that allows plants to optimize the efficiency of photosynthesis, which is the core technology for green energy. However, currently little is known about the molecular mechanisms behind the regulation of the photosynthetic light acclimation response. In this study, a systematic method is proposed to investigate this mechanism by constructing gene regulatory networks from microarray data of Arabidopsis thaliana. Methods: The potential TF-gene regulatory pairs of photosynthetic light acclimation have been obtained by data mining of literature and databases. Following the identification of these potential TF-gene pairs, they have been refined using Pearson's correlation, allowing the construction of a rough gene regulatory network. This rough gene regulatory network is then pruned using time series microarray data of Arabidopsis thaliana via the maximum likelihood system identification method and Akaike's system order detection method to approach the real gene regulatory network of photosynthetic light acclimation. Results: By comparing the gene regulatory networks under the PSI-to-PSII light shift and the PSII-to-PSI light shift, it is possible to identify important transcription factors for the different light shift conditions. Furthermore, the robustness of the gene network, in particular the hubs and weak linkage points, is also discussed under the different light conditions to gain further insight into the mechanisms of photosynthesis. Conclusions: This study investigates the molecular mechanisms of photosynthetic light acclimation for Arabidopsis thaliana from the physiological level. This has been achieved through the construction of gene regulatory networks from the limited data sources and literature via an efficient computation method. If more experimental data for whole-genome ChIP-chip data and microarray data with multiple sampling points becomes available in the
Thanks to the digital revolution, digital signal processing and control are now widely used in many areas of science and engineering. They provide practical and powerful tools to model, simulate, analyze, design, measure, and control complex and dynamic systems such as robots and aircraft. Gene networks are also complex dynamic systems that can be studied via digital signal processing and control. Unlike conventional computational methods, this approach is capable of not only modeling but also controlling gene networks, since the experimental environment is mostly digital today. The overall aim of this article is to introduce digital signal processing and control as a useful tool for the study of gene networks.
The interplay between the entropy and robustness of gene networks is a core mechanism of systems biology. Entropy is a measure of the randomness or disorder of a physical system due to random parameter fluctuations and environmental noise in gene regulatory networks. The robustness of a gene regulatory network, which can be measured as the ability to tolerate random parameter fluctuations and to attenuate the effect of environmental noise, will be discussed from the robust H∞ stabilization and filtering perspective. In this review, we will also discuss their balancing roles in evolution and potential applications in systems and synthetic biology.
Inoue, Masayo; Kaneko, Kunihiko
Cells generally adapt to environmental changes by first exhibiting an immediate response and then gradually returning to their original state to achieve homeostasis. Although simple network motifs consisting of a few genes have been shown to exhibit such adaptive dynamics, they do not reflect the complexity of real cells, where the expression of a large number of genes activates or represses other genes, permitting adaptive behaviors. Here, we investigated the responses of gene regulatory networks containing many genes that have undergone numerical evolution to achieve high fitness due to the adaptive response of only a single target gene; this single target gene responds to changes in external inputs and later returns to basal levels. Despite setting a single target, most genes showed adaptive responses after evolution. Such adaptive dynamics were not due to common motifs within a few genes; even without such motifs, almost all genes showed adaptation, albeit sometimes partial adaptation, in the sense that expression levels did not always return to original levels. The genes split into two groups: genes in the first group exhibited an initial increase in expression and then returned to basal levels, while genes in the second group exhibited the opposite changes in expression. From this model, genes in the first group received positive input from other genes within the first group, but negative input from genes in the second group, and vice versa. Thus, the adaptation dynamics of genes from both groups were consolidated. This cooperative adaptive behavior was commonly observed if the number of genes involved was larger than the order of ten. These results have implications in the collective responses of gene expression networks in microarray measurements of yeast Saccharomyces cerevisiae and the significance to the biological homeostasis of systems with many components.
Song, Mingzhou (Joe) [New Mexico State University, Las Cruces]; Lewis, Chris K. [New Mexico State University, Las Cruces]; Lance, Eric [New Mexico State University, Las Cruces]; Chesler, Elissa J [ORNL]; Kirova, Roumyana [Bristol-Myers Squibb Pharmaceutical Research & Development, NJ]; Langston, Michael A [University of Tennessee, Knoxville (UTK)]; Bergeson, Susan [Texas Tech University, Lubbock]
The problem of reconstructing generalized logical networks to account for temporal dependencies among genes and environmental stimuli from high-throughput transcriptomic data is addressed. A network reconstruction algorithm was developed that uses the statistical significance as a criterion for network selection to avoid false-positive interactions arising from pure chance. Using temporal gene expression data collected from the brains of alcohol-treated mice in an analysis of the molecular response to alcohol, this algorithm identified genes from a major neuronal pathway as putative components of the alcohol response mechanism. Three of these genes have known associations with alcohol in the literature. Several other potentially relevant genes, highlighted and agreeing with independent results from literature mining, may play a role in the response to alcohol. Additional, previously-unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.
Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey
Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
Warmflash, Aryeh; Siggia, Eric D; Francois, Paul
The computational evolution of gene networks functions like a forward genetic screen to generate, without preconceptions, all networks that can be assembled from a defined list of parts to implement a given function. Frequently networks are subject to multiple design criteria that cannot all be optimized simultaneously. To explore how these tradeoffs interact with evolution, we implement Pareto optimization in the context of gene network evolution. In response to a temporal pulse of a signal, we evolve networks whose output turns on slowly after the pulse begins, and shuts down rapidly when the pulse terminates. The best performing networks under our conditions do not fall into categories such as feed forward and negative feedback that also encode the input-output relation we used for selection. Pareto evolution can more efficiently search the space of networks than optimization based on a single ad hoc combination of the design criteria.
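The selection step of Pareto optimization can be sketched in a few lines: a candidate is kept if no other candidate is at least as good on every criterion and strictly better on one. The sketch below assumes two error scores per network (lower is better), loosely standing in for the "slow turn-on" and "rapid shut-off" criteria; the network names and scores are invented, and this is not the authors' evolutionary code.

```python
def dominates(a, b):
    """True if score tuple a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    }

# Hypothetical (turn-on error, shut-off error) scores for five networks.
candidates = {
    "net_a": (0.2, 0.9),
    "net_b": (0.5, 0.5),
    "net_c": (0.9, 0.1),
    "net_d": (0.6, 0.6),  # worse than net_b on both criteria
    "net_e": (0.2, 1.0),  # worse than net_a on the second criterion
}
print(sorted(pareto_front(candidates)))
```

Unlike a single weighted sum of the two scores, the front keeps every tradeoff that is not strictly beaten, which is the contrast the abstract draws with an ad hoc combination of criteria.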
BACKGROUND: Analysis of gene expression data using genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at global "omic" scale are not focused on human samples, and when they do use human samples they very often include heterogeneous datasets, mixing normal with disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and no estimate of the errors in the data is provided. METHODOLOGY/PRINCIPAL FINDINGS: Human genome-wide expression data from a controlled set of normal-healthy tissues is used to build a confident human gene coexpression network avoiding both pathological and technical noise. To achieve this we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. All these methods provide a series of coexpression datasets where the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The results provide a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links, and a comparative analysis shows good improvement over previously published datasets. Further functional analysis of a subset core network, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well defined functional constellations. Two major regions in this network correspond to genes involved in nuclear and mitochondrial
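A minimal sketch of combining a parametric and a non-parametric correlation when calling coexpression links is given below. It assumes that a link is kept only when both the Pearson and the Spearman coefficient pass the same cutoff; that rule, the cutoff value, the function names, and the expression values are illustrative assumptions, not the published method.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def coexpression_links(expression, cutoff):
    """Keep gene pairs whose Pearson AND Spearman |r| both reach the cutoff."""
    links = {}
    for a, b in combinations(sorted(expression), 2):
        r, rho = pearson(expression[a], expression[b]), spearman(expression[a], expression[b])
        if min(abs(r), abs(rho)) >= cutoff:
            links[(a, b)] = (r, rho)
    return links

# Hypothetical expression profiles across five samples.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 11],
    "geneC": [5, 1, 4, 2, 3],
}
print(coexpression_links(expression, cutoff=0.8))
```

Raising or lowering `cutoff` trades coverage against error, which corresponds to the tunable error level mentioned in the abstract.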
Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi
DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high
We present the current form of a provisional DNA sequence-based regulatory gene network that explains in outline how endomesodermal specification in the sea urchin embryo is controlled. The model of the network is in a continuous process of revision and growth as new genes are added and new experimental results become available; see http://www.its.caltech.edu/mirsky/endomeso.htm (End-mes Gene Network Update) for the latest version. The network contains over 40 genes at present, many newly uncovered in the course of this work, and most encoding DNA-binding transcriptional regulatory factors. The architecture of the network was approached initially by construction of a logic model that integrated the extensive experimental evidence now available on endomesoderm specification. The internal linkages between genes in the network have been determined functionally, by measurement of the effects of regulatory perturbations on the expression of all relevant genes in the network. Five kinds of perturbation have been applied: (1) use of morpholino antisense oligonucleotides targeted to many of the key regulatory genes in the network; (2) transformation of other regulatory factors into dominant repressors by construction of Engrailed repressor domain fusions; (3) ectopic expression of given regulatory factors, from genetic expression constructs and from injected mRNAs; (4) blockade of the beta-catenin/Tcf pathway by introduction of mRNA encoding the intracellular domain of cadherin; and (5) blockade of the Notch signaling pathway by introduction of mRNA encoding the extracellular domain of the Notch receptor. The network model predicts the cis-regulatory inputs that link each gene into the network. Therefore, its architecture is testable by cis-regulatory analysis. Strongylocentrotus purpuratus and Lytechinus variegatus genomic BAC recombinants that include a large number of the genes in the network have been sequenced and annotated. Tests of the cis-regulatory predictions of
In biology, networks are used in different contexts as ways to represent relationships between entities, such as interactions between genes, proteins or metabolites. Despite progress in the analysis of such networks and their potential to better understand the collective impact of genes on complex traits, one remaining challenge is to establish the biologic validity of gene co-expression networks and to determine what governs their organization. We used WGCNA to construct and analyze networks from seven gene expression datasets from several tissues of mouse recombinant inbred strains (RIS). For six out of the seven networks, we found that linkage to module QTLs (mQTLs) could be established for 29.3% of the gene co-expression modules detected in the several mouse RIS. For about 74.6% of such genetically-linked modules, the mQTL was on the same chromosome as the one contributing most genes to the module, with genes originating from that chromosome showing higher connectivity than other genes in the modules. Such modules (which we considered genetically-driven) had network statistic properties (density, centralization and heterogeneity) that set them apart from other modules in the network. Altogether, a sizeable portion of the gene co-expression modules detected in mouse RIS panels had genetic determinants as their main organizing principle. In addition to providing a biologic interpretation and validation for these modules, these genetic determinants imparted on them particular properties that set them apart from other modules in the network, to the point that they can be predicted to a large extent on the basis of their network statistics.
The impact of gene silencing on cellular phenotypes is difficult to establish due to the complexity of interactions in the associated biological processes and pathways. A recent genome-wide RNA knock-down study both identified and phenotypically characterized a set of important genes for the cell cycle in HeLa cells. Here, we combine a molecular interaction network analysis, based on physical and functional protein interactions, with evolutionary information to elucidate the common biological and topological properties of these key genes. Our results show that these genes tend to be conserved with their corresponding protein interactions across several species and are key constituents of the evolutionarily conserved molecular interaction network. Moreover, a group of bistable network motifs is found to be conserved within this network, which are likely to influence the network stability and therefore the robustness of cellular functioning. They form a cluster, which displays functional homogeneity and is significantly enriched in genes phenotypically relevant for mitosis. Additional results reveal a relationship between specific cellular processes and the phenotypic outcomes induced by gene silencing. This study introduces new ideas regarding the relationship between genotype and phenotype in the context of the cell cycle. We show that the analysis of molecular interaction networks can result in the identification of genes relevant to cellular processes, which is a promising avenue for future research.
Defends the value and relevance of the study of ancient history and classics in history curricula. The unique homogeneity of the classical period contributes to its instructional manageability. A year-long, secondary-level course on fifth-century Greece and Rome is described to illustrate effective approaches to teaching ancient history. (AM)
Parsamian, Elma S.
The most important discovery to enrich our knowledge of ancient astronomy in Armenia was the complex of platforms for astronomical observations on the Small Hill of Metzamor, which may be called an ancient “observatory”. Investigations on that hill show that the ancient inhabitants of the Armenian Highlands left us not only pictures of celestial bodies but also a very ancient complex of platforms for observing the sky. Among the ancient monuments in Armenia there is a megalithic monument that is probably connected with astronomy. 250 km southeast of Yerevan there is a structure, Zorats Kar (Karahunge), dating back to the II millennium B.C. Vertical megaliths, many of which are more than two meters high, form stone rings resembling the ancient stone monuments (henges) of Great Britain and Brittany. Records of medieval observations of comets and novae are found in ancient Armenian manuscripts. The collection of ancient Armenian manuscripts (Matenadaran) in Yerevan contains many manuscripts with information about observations of astronomical events in medieval Armenia, such as solar and lunar eclipses, comets and novae, bolides and meteorites.
Arbøll, Troels Pank
This dissertation is a microhistorical study of a single individual named Kiṣir-Aššur who practiced medicine in the ancient city of Assur (modern northern Iraq) in the 7th century BCE. The study provides the first detailed analysis of one healer’s education and practice in ancient Mesopotamia...
Jiang, Lu; Ball, Graham; Hodgman, Charlie; Coules, Anne; Zhao, Han; Lu, Chungui
Nitrogen (N) fertilizer has a major influence on the yield and quality. Understanding and optimising the response of crop plants to nitrogen fertilizer usage is of central importance in enhancing food security and agricultural sustainability. In this study, the analysis of gene regulatory networks reveals multiple genes and biological processes in response to N. Two microarray studies have been used to infer components of the nitrogen-response network. Since they used different array technologies, a map linking the two probe sets to the maize B73 reference genome has been generated to allow comparison. Putative Arabidopsis homologues of maize genes were used to query the Biological General Repository for Interaction Datasets (BioGRID) network, which yielded the potential involvement of three transcription factors (TFs) (GLK5, MADS64 and bZIP108) and a Calcium-dependent protein kinase. An Artificial Neural Network was used to identify influential genes and retrieved bZIP108 and WRKY36 as significant TFs in both microarray studies, along with genes for Asparagine Synthetase, a dual-specific protein kinase and a protein phosphatase. The output from one study also suggested roles for microRNA (miRNA) 399b and Nin-like Protein 15 (NLP15). Co-expression-network analysis of TFs with closely related profiles to known Nitrate-responsive genes identified GLK5, GLK8 and NLP15 as candidate regulators of genes repressed under low Nitrogen conditions, while bZIP108 might play a role in gene activation.
Modern technologies, and especially next generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth "Dialogue for Reverse Engineering Assessments and Methods" (DREAM5) challenges are aimed at assessing methods and associated algorithms devoted to the inference of biological networks. Challenge 3 on "Systems Genetics" proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods, ranging from Bayesian networks to penalised linear regressions, to analyse such data, and proposed a simple yet very powerful meta-analysis that combines these inference methods. We present results of the Challenge as well as a more in-depth analysis of the predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the 16 teams participating in Challenge 3A. It paves the way for future extensions of our inference method and more accurate gene network estimates in the context of genetical genomics.
Herranz, Héctor; Cohen, Stephen M
Biological systems are continuously challenged by an environment that is variable. Yet, a key feature of developmental and physiological processes is their remarkable stability. This review considers how microRNAs contribute to gene regulatory networks that confer robustness.
van Leeuwen, D. M.; Pedersen, Marie; Knudsen, Lisbeth E.
Mechanistically relevant information on responses of humans to xenobiotic exposure in relation to chemically induced biological effects, such as micronuclei (MN) formation can be obtained through large-scale transcriptomics studies. Network analysis may enhance the analysis and visualisation...... of such data. Therefore, this study aimed to develop a 'MN formation' network based on a priori knowledge, by using the pathway tool MetaCore. The gene network contained 27 genes and three gene complexes that are related to processes involved in MN formation, e.g. spindle assembly checkpoint, cell cycle...... checkpoint and aneuploidy. The MN-related gene network was tested against a transcriptomics case study associated with MN measurements. In this case study, transcriptomic data from children and adults differentially exposed to ambient air pollution in the Czech Republic were analysed and visualised...
Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi
Detecting biclusters from expression data is useful, since biclusters are sets of genes coexpressed under only a subset of the given experimental conditions. We present software called SiBIC, which, given an expression dataset, first exhaustively enumerates biclusters, then merges them into largely independent biclusters, and finally uses these to generate gene set networks, in which each node is assigned a set of coexpressed genes. We evaluated each step of this procedure: 1) the biological and statistical significance of the generated biclusters, 2) the biological quality of the merged biclusters, and 3) the biological significance of the gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp.
Chronic obstructive pulmonary disease (COPD) is a multifactorial disease in which metabolic disturbances play important roles. In this paper, functional information was integrated into a COPD-related metabolic network to assess similarity between genes. A gene prioritization method was then applied to the COPD-related metabolic network to prioritize COPD candidate genes. The gene prioritization method was superior to ToppGene and ToppNet in both literature validation and functional enrichment analysis. Top-ranked genes prioritized from the metabolic perspective with functional information could promote a better understanding of the molecular mechanism of this disease. The top 100 genes might be potential markers for diagnosis and effective therapies.
He, Feng Q; Ollert, Markus
Identification of key genes for a given physiological or pathological process is an essential but still very challenging task for the entire biomedical research community. Statistics-based approaches, such as genome-wide association study (GWAS)- or quantitative trait locus (QTL)-related analysis ... have already made enormous contributions to identifying key genes associated with a given disease or phenotype, the success of which is however very much dependent on a huge number of samples. Recent advances in network biology, especially network inference directly from genome-scale data ... and the following-up network analysis, opens up new avenues to predict key genes driving a given biological process or cellular function. Here we review and compare the current approaches in predicting key genes, which have no chances to stand out by classic differential expression analysis, from gene...
Arsovski, Andrej A; Pradinuk, Julian; Guo, Xu Qiu; Wang, Sishuo; Adams, Keith L
Plant genomes contain large numbers of duplicated genes that contribute to the evolution of new functions. Following duplication, genes can exhibit divergence in their coding sequence and their expression patterns. Changes in the cis-regulatory element landscape can result in changes in gene expression patterns. High-throughput methods developed recently can identify potential cis-regulatory elements on a genome-wide scale. Here, we use a recent comprehensive data set of DNase I sequencing-identified cis-regulatory binding sites (footprints) at single-base-pair resolution to compare binding sites and network connectivity in duplicated gene pairs in Arabidopsis (Arabidopsis thaliana). We found that duplicated gene pairs vary greatly in their cis-regulatory element architecture, resulting in changes in regulatory network connectivity. Whole-genome duplicates (WGDs) have approximately twice as many footprints in their promoters left by potential regulatory proteins than do tandem duplicates (TDs). The WGDs have a greater average number of footprint differences between paralogs than TDs. The footprints, in turn, result in more regulatory network connections between WGDs and other genes, forming denser, more complex regulatory networks than shown by TDs. When comparing regulatory connections between duplicates, WGDs had more pairs in which the two genes are either partially or fully diverged in their network connections, but fewer genes with no network connections than the TDs. There is evidence of younger TDs and WGDs having fewer unique connections compared with older duplicates. This study provides insights into cis-regulatory element evolution and network divergence in duplicated genes. © 2015 American Society of Plant Biologists. All Rights Reserved.
Ben-Tabou de-Leon, Smadar; Davidson, Eric H
Gene regulatory networks for development underlie cell fate specification and differentiation. Network topology, logic and dynamics can be obtained by thorough experimental analysis. Our understanding of the gene regulatory network controlling endomesoderm specification in the sea urchin embryo has attained an advanced level such that it explains developmental phenomenology. Here we review how the network explains the mechanisms utilized in development to control the formation of dynamic expression patterns of transcription factors and signaling molecules. The network represents the genomic program controlling timely activation of specification and differentiation genes in the correct embryonic lineages. It can also be used to study evolution of body plans. We demonstrate how comparing the sea urchin gene regulatory network to that of the sea star and to that of later developmental stages in the sea urchin, reveals mechanisms underlying the origin of evolutionary novelty. The experimentally based gene regulatory network for endomesoderm specification in the sea urchin embryo provides unique insights into the system level properties of cell fate specification and its evolution.
In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained from the application of the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. In order to elucidate the functional relevance of the inferred network, we perform a Gene Ontology (GO) analysis for its structural components. Our analysis reveals that the most significant GO terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information of interacting genes in the breast cancer network. We find that chromosome 21 is most coactive with other chromosomes. To our knowledge this is the first study investigating the genome-scale breast cancer network.
With knowledge of microbial composition and diversity, investigation of within-community interactions is a further step to elucidate microbial ecological functions, such as the biodegradation of hazardous contaminants. In this work, microbial functional molecular ecological networks were studied in both contaminated and uncontaminated soils to determine the possible influences of oil contamination on microbial interactions and potential functions. Soil samples were obtained from an oil-exploring site located in South China, and the microbial functional genes were analyzed with GeoChip, a high-throughput functional microarray. By building random networks based on a null model, we demonstrated that overall network structures and properties were significantly different between contaminated and uncontaminated soils (P < 0.001). Network connectivity, module numbers, and modularity were all reduced with contamination. Moreover, the topological roles of the genes (module hubs and connectors) were altered with oil contamination. Subnetworks of genes involved in alkane and polycyclic aromatic hydrocarbon degradation were also constructed. Negative co-occurrence patterns prevailed among functional genes, thereby indicating probable competition relationships. The potential keystone genes, defined as either hubs or genes with the highest connectivities in the network, were further identified. The network constructed in this study predicted the potential effects of anthropogenic contamination on microbial community co-occurrence interactions.
Guo, Yuchun; Feng, Ying; Trivedi, Niraj S; Huang, Sui
Gene expression profiles consisting of tens of thousands of transcripts are used for clustering of tissue, such as tumors, into subtypes, often without considering the underlying reason that the distinct patterns of expression arise because of constraints in the realization of gene expression profiles imposed by the gene regulatory network. The topology of this network has been suggested to consist of a regulatory core of genes, represented most prominently by transcription factors (TFs) and microRNAs, that influence the expression of other genes, and of a periphery of 'enslaved' effector genes that are regulated but not regulating. This 'medusa' architecture implies that the core genes are much stronger determinants of the realized gene expression profiles. To test this hypothesis, we examined the clustering of gene expression profiles into known tumor types to quantitatively demonstrate that TFs, and even more pronouncedly microRNAs, are much stronger discriminators of tumor type specific gene expression patterns than the same number of randomly selected or metabolic genes. These findings lend support to the hypothesis of a medusa architecture and of the canalizing nature of regulation by microRNAs. They also reveal the degree of freedom for the expression of peripheral genes that are less stringently associated with a tissue type specific global gene expression profile.
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activated p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly, we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
Ricardo M Ferreira
Whole genome protein-protein association networks are not random and their topological properties stem from genome evolution mechanisms. In fact, more connected but less clustered proteins are related to genes that, in general, present more paralogs as compared to other genes, indicating frequent previous gene duplication episodes. On the other hand, genes related to conserved biological functions present few or no paralogs and yield proteins that are highly connected and clustered. These general network characteristics must have an evolutionary explanation. Considering data from the STRING database, we present here experimental evidence that, more than not being scale free, protein degree distributions of organisms present an increased probability for high degree nodes. Furthermore, based on this experimental evidence, we propose a simulation model for genome evolution, where genes in a network are either acquired de novo using a preferential attachment rule, or duplicated with a probability that grows linearly with gene degree and decreases with its clustering coefficient. For the first time, a model yields results that simultaneously describe different topological distributions. This model also correctly predicts that, to produce protein-protein association networks with a number of links and a number of nodes in the observed range for Eukaryotes, 90% gene duplication and 10% de novo gene acquisition are required. This scenario implies a universal mechanism for genome evolution.
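The growth rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function name `grow_network`, the two-node seed network, and the choice to pick the duplicated gene purely in proportion to its degree are assumptions made here, and the clustering-coefficient dependence mentioned in the abstract is deliberately left out. Only the 90%/10% split is taken from the text.

```python
import random

def grow_network(n_nodes, p_dup=0.9, seed=0):
    """Toy growth model (hypothetical simplification, not the published model).

    Each step adds one gene. With probability p_dup it is a duplicate of an
    existing gene and inherits that gene's links; otherwise it is acquired
    de novo and attaches to one existing gene. In both cases the existing
    gene is chosen with probability proportional to its degree.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}          # assumed seed network: two linked genes
    while len(adj) < n_nodes:
        new = len(adj)              # next unused integer label
        nodes = list(adj)
        weights = [len(adj[v]) for v in nodes]   # degree-proportional choice
        target = rng.choices(nodes, weights=weights, k=1)[0]
        if rng.random() < p_dup:
            neighbours = set(adj[target])        # duplication: copy the links
        else:
            neighbours = {target}                # de novo: preferential attachment
        adj[new] = set()
        for v in neighbours:
            adj[new].add(v)
            adj[v].add(new)
    return adj

network = grow_network(50)
degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
print(degrees[:5])   # the most connected genes in this toy run
```

Because duplicates copy every link of their parent, highly connected genes tend to gain links faster than under preferential attachment alone, which is the qualitative effect the abstract attributes to duplication.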
Background: Reverse engineering of gene regulatory networks presents one of the big challenges in systems biology. Gene regulatory networks are usually inferred from a set of single-gene over-expression and/or knockout experiments. Functional relationships between genes are retrieved either from the steady-state gene expression levels or from the respective time series. Results: We present a novel algorithm for gene network reconstruction on the basis of steady-state gene-chip data from over-expression experiments. The algorithm is based on a straightforward solution of a linear gene-dynamics equation, where experimental data is fed in as a first predictor for the solution. We compare the algorithm's performance with the NIR algorithm, both on the well-known E. coli experimental data and on in-silico experiments. Conclusion: We show superiority of the proposed algorithm in the number of correctly reconstructed links and discuss computational time and robustness. The proposed algorithm is not limited by combinatorial explosion problems and can be used in principle for large networks.
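As a hedged illustration of what solving a linear gene-dynamics equation at steady state can look like, the generic linear model often assumed in this setting is written out below. The symbols $A$ (interaction matrix), $\mathbf{x}$ (expression changes), $\mathbf{u}$ (applied over-expression), and the stacked matrices $X$ and $U$ are notation introduced here, not the authors'; the abstract does not give their exact equation or how the data enter as a first predictor.

```latex
\begin{align}
  \frac{d\mathbf{x}}{dt} &= A\,\mathbf{x} + \mathbf{u}
    && \text{(linear dynamics around a reference state)} \\
  A\,\mathbf{x}^{*} &= -\mathbf{u}
    && \text{(steady state, } d\mathbf{x}/dt = 0\text{)} \\
  A\,X &= -U \;\Longrightarrow\; A = -U\,X^{-1}
    && \text{($N$ experiments on $N$ genes, } X \text{ invertible)}
\end{align}
```

With one over-expression experiment per gene, the columns of $X$ and $U$ hold the measured responses and the applied perturbations, so the network estimate follows from a single matrix inversion rather than a combinatorial search over candidate regulators.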
Zhou, Xuezhong; Liu, Baoyan; Wu, Zhaohui; Feng, Yi
The amount of biomedical data in different disciplines is growing at an exponential rate. Integrating these significant knowledge sources to generate novel hypotheses for systems biology research is difficult. Traditional Chinese medicine (TCM) is a completely different discipline, and is a complementary knowledge system to modern biomedical science. This paper uses a significant TCM bibliographic literature database in China, together with MEDLINE, to help discover novel gene functional knowledge. We present an integrative mining approach to uncover the functional gene relationships from MEDLINE and TCM bibliographic literature. This paper introduces TCM literature (about 50,000 records) as one knowledge source for constructing literature-based gene networks. We use the TCM diagnosis, TCM syndrome, to automatically congregate the related genes. The syndrome-gene relationships are discovered based on the syndrome-disease relationships extracted from TCM literature and the disease-gene relationships in MEDLINE. Based on the bubble-bootstrapping and relation weight computing methods, we have developed a prototype system called MeDisco/3S, which has name entity and relation extraction, and online analytical processing (OLAP) capabilities, to perform the integrative mining process. We have got about 200,000 syndrome-gene relations, which could help generate syndrome-based gene networks, and help analyze the functional knowledge of genes from syndrome perspective. We take the gene network of Kidney-Yang Deficiency syndrome (KYD syndrome) and the functional analysis of some genes, such as CRH (corticotropin releasing hormone), PTH (parathyroid hormone), PRL (prolactin), BRCA1 (breast cancer 1, early onset) and BRCA2 (breast cancer 2, early onset), to demonstrate the preliminary results. The underlying hypothesis is that the related genes of the same syndrome will have some biological functional relationships, and will constitute a functional network. This paper presents
Background: Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms, deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results: Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. Like the previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto those networks. To this end, we added stimulon data directly into the database, and also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP-based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known
Background: Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision-making processes. Advancement in high-throughput experimental techniques has initiated innovative data-driven analysis of gene regulatory networks. However, inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, robust algorithms that directly exploit basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction. Results: We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formalism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subjected to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the trait of sparsity of biological interactions. To that end, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function is targeted to minimize the network connections subject to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters
BACKGROUND: Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical. METHODS AND RESULTS: In this study, we compared eight gene association methods - Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson - and focused on their true knowledge discovery rates in associating pathway genes and constructing coordination networks of regulatory genes. We also examined how the different methods behave on microarray data with different properties, and whether the biological processes affect the efficiency of different methods. CONCLUSIONS: We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-Sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method, which can find gene pairs with multiple types of relationship. Some of our analyses clearly show that the Pearson and Distance Covariance methods behave distinctly from the other six methods. The efficiencies of different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates a pre-analysis to identify the best performing method for gene association and coexpression network construction.
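To make the difference between rank-based and linear association measures concrete, here is a minimal sketch comparing Pearson and Spearman correlation on a monotone but non-linear pair of expression profiles. The profiles `gene_a` and `gene_b` are made-up illustrative values, not data from the study, and the helper names are chosen here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical profiles: gene_b rises monotonically but non-linearly with gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [a ** 3 for a in gene_a]

print(round(pearson(gene_a, gene_b), 3))
print(round(spearman(gene_a, gene_b), 3))
```

Spearman sees a perfect monotone relationship, while Pearson is penalised by the curvature. This is one plausible reason, among others, why rank-based measures can recover coexpressed genes that a linear measure scores lower.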
Li, Peng; Gong, Ping; Li, Haoni; Perkins, Edward J; Wang, Nan; Zhang, Chaoyang
The Dialogue for Reverse Engineering Assessments and Methods (DREAM) project was initiated in 2006 as a community-wide effort for the development of network inference challenges for rigorous assessment of reverse engineering methods for biological networks. We participated in the in silico network inference challenge of DREAM3 in 2008. Here we report the details of our approach and its performance on the synthetic challenge datasets. In our methodology, we first developed a model called relative change ratio (RCR), which took advantage of the heterozygous knockdown data and null-mutant knockout data provided by the challenge, in order to identify the potential regulators for the genes. With this information, a time-delayed dynamic Bayesian network (TDBN) approach was then used to infer gene regulatory networks from time series trajectory datasets. Our approach considerably reduced the search space of TDBN; hence, it gained much higher efficiency and accuracy. The networks predicted using our approach were evaluated comparatively along with 29 other submissions by two metrics (area under the ROC curve and area under the precision-recall curve). The overall performance of our approach ranked second among all participating teams.
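The two evaluation metrics named above can be sketched for a ranked list of predicted edges as follows. This is an illustrative sketch, not the DREAM3 scoring code: average precision is used here as a common estimate of the area under the precision-recall curve, and the scores and labels are invented toy values.

```python
def auroc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen true edge is scored above a randomly chosen non-edge (ties count 0.5).
    Assumes at least one true edge and one non-edge."""
    pos = [s for s, is_edge in zip(scores, labels) if is_edge]
    neg = [s for s, is_edge in zip(scores, labels) if not is_edge]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true edge.
    Assumes at least one true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, is_edge) in enumerate(ranked, start=1):
        if is_edge:
            hits += 1
            total += hits / rank
    return total / hits

# Toy prediction: confidence for five candidate edges and whether each is real.
edge_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
edge_labels = [1, 0, 1, 0, 0]

print(round(auroc(edge_scores, edge_labels), 4))
print(round(average_precision(edge_scores, edge_labels), 4))
```

Because true regulatory edges are rare compared with all possible gene pairs, the precision-recall view is usually the stricter of the two, which is one reason challenges of this kind report both.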
Understanding the direction of information flow is essential for characterizing how genetic networks affect phenotypes. However, methods to find genetic interactions largely fail to reveal directional dependencies. We combine two orthogonal Cas9 proteins from Streptococcus pyogenes and Staphylococcus aureus to carry out a dual screen in which one gene is activated while a second gene is deleted in the same cell. We analyze the quantitative effects of activation and knockout to calculate genetic interaction and directionality scores for each gene pair.
Cardona, Gabriel; Pons, Joan Carles; Rosselló, Francesc
Lateral, or Horizontal, Gene Transfers are a type of asymmetric evolutionary events where genetic material is transferred from one species to another. In this paper we consider LGT networks, a general model of phylogenetic networks with lateral gene transfers which consist, roughly, of a principal rooted tree with its leaves labelled on a set of taxa, and a set of extra secondary arcs between nodes in this tree representing lateral gene transfers. An LGT network gives rise in a natural way to a principal phylogenetic subtree and a set of secondary phylogenetic subtrees, which, roughly, represent, respectively, the main line of evolution of most genes and the secondary lines of evolution through lateral gene transfers. We introduce a set of simple conditions on an LGT network that guarantee that its principal and secondary phylogenetic subtrees are pairwise different and that these subtrees determine, up to isomorphism, the LGT network. We then give an algorithm that, given a set of pairwise different phylogenetic trees [Formula: see text] on the same set of taxa, outputs, when it exists, the LGT network that satisfies these conditions and such that its principal phylogenetic tree is [Formula: see text] and its secondary phylogenetic trees are [Formula: see text].
Guthke, Reinhard; Möller, Ulrich; Hoffmann, Martin; Thies, Frank; Töpfer, Susanne
The immune response to bacterial infection represents a complex network of dynamic gene and protein interactions. We present an optimized reverse engineering strategy aimed at a reconstruction of this kind of interaction network. The proposed approach is based on both microarray data and available biological knowledge. The main kinetics of the immune response were identified by fuzzy clustering of gene expression profiles (time series). The number of clusters was optimized using various evaluation criteria. For each cluster, a representative gene with a high fuzzy membership was chosen in accordance with available physiological knowledge. Then hypothetical network structures were identified by seeking systems of ordinary differential equations whose simulated kinetics could fit the gene expression profiles of the cluster-representative genes. For the construction of hypothetical network structures, singular value decomposition (SVD)-based methods and a newly introduced heuristic Network Generation Method were compared. It turned out that the proposed novel method could find sparser networks and gave better fits to the experimental data.
With advances in next-generation sequencing (NGS) technologies, a large number of multiple types of high-throughput genomics data are available. A great challenge in exploring cancer progression is to identify the driver genes from the variant genes by analyzing and integrating multiple types of genomics data. Breast cancer is known as a heterogeneous disease. The identification of subtype-specific driver genes is critical to guide the diagnosis, assessment of prognosis and treatment of breast cancer. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and utilized a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were acquired by pairwise comparison between subtypes. To validate the specificity of the driver genes, the gene expression data of these genes were applied to classify the patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify the potential driver genes and that the classifier with these genes achieved better performance than with genes identified by other methods.
Genome analysis of a simultaneously predatory and prey-independent, novel Bdellovibrio bacteriovorus from the River Tiber, supports in silico predictions of both ancient and recent lateral gene transfer from diverse bacteria
Background: Evolution equipped Bdellovibrio bacteriovorus predatory bacteria to invade other bacteria, digesting and replicating sealed within them, thus preventing nutrient-sharing with organisms in the surrounding environment. Bdellovibrio were previously described as "obligate predators" because only by mutations, often in gene bd0108, are 1 in ~1×10^7 of predatory lab strains of Bdellovibrio converted to prey-independent growth. A previous genomic analysis of B. bacteriovorus strain HD100 suggested that predatory consumption of prey DNA by lytic enzymes made Bdellovibrio less likely than other bacteria to acquire DNA by lateral gene transfer (LGT). However, the Doolittle and Pan groups predicted, in silico, both ancient and recent lateral gene transfer into the B. bacteriovorus HD100 genome. Results: To test these predictions, we isolated a predatory bacterium from the River Tiber (a good potential source of LGT, as it is rich in diverse bacteria and organic pollutants) by enrichment culturing with E. coli prey cells. The isolate was identified as B. bacteriovorus and named strain Tiberius. Unusually, this Tiberius strain showed simultaneous prey-independent growth on organic nutrients and predatory growth on live prey. Despite the prey-independent growth, the homolog of bd0108 did not have typical prey-independent-type mutations. The dual growth mode may reflect the high carbon content of the river, and gives B. bacteriovorus Tiberius extended non-predatory contact with the other bacteria present. The HD100 and Tiberius genomes were extensively syntenic despite their different cultured-terrestrial/freshly-isolated aquatic histories, but there were significant differences in gene content indicative of genomic flux and LGT. Gene content comparisons support previously published in silico predictions for LGT in strain HD100 with substantial conservation of genes predicted to have ancient LGT origins but little conservation of AT
Wang, Jiguang; Zhang, Shihua; Wang, Yong; Chen, Luonan; Zhang, Xiang-Sun
One of the challenging problems in biology and medicine is exploring the underlying mechanisms of genetic diseases. Recent studies suggest that the relationship between genetic diseases and the aging process is important in understanding the molecular mechanisms of complex diseases. Although some intricate associations have been investigated for a long time, the studies are still in their early stages. In this paper, we construct a human disease-aging network to study the relationship among aging genes and genetic disease genes. Specifically, we integrate human protein-protein interactions (PPIs), disease-gene associations, aging-gene associations, and physiological system-based genetic disease classification information in a single graph-theoretic framework and find that (1) human disease genes are much closer to aging genes than expected by chance; and (2) diseases can be categorized into two types according to their relationships with aging. Type I diseases have their genes significantly close to aging genes, while type II diseases do not. Furthermore, we examine the topological characters of the disease-aging network from a systems perspective. Theoretical results reveal that the genes of type I diseases are in a central position of a PPI network while type II are not; (3) more importantly, we define an asymmetric closeness based on the PPI network to describe relationships between diseases, and find that aging genes make a significant contribution to associations among diseases, especially among type I diseases. In conclusion, the network-based study provides not only evidence for the intricate relationship between the aging process and genetic diseases, but also biological implications for prying into the nature of human diseases.
Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.
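The searching procedure described above can be illustrated with a minimal sketch. It assumes an unweighted protein-protein interaction graph stored as an adjacency dictionary; the gene names, the toy network, and the function names are hypothetical, and the abstract does not say whether the original method weights edges. The permutation test and the screening rules are not shown.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path in an unweighted graph via breadth-first search."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # target is unreachable from source


def candidate_genes(graph, genes_a, genes_b):
    """Collect interior nodes of shortest paths linking two known gene sets."""
    known = set(genes_a) | set(genes_b)
    candidates = set()
    for a in genes_a:
        for b in genes_b:
            path = shortest_path(graph, a, b)
            if path:
                candidates.update(n for n in path[1:-1] if n not in known)
    return candidates


# Hypothetical toy network: X bridges a bone gene and a nerve gene.
toy_ppi = {
    "BONE1": ["X"],
    "X": ["BONE1", "NERVE1", "Y"],
    "Y": ["X"],
    "NERVE1": ["X"],
}
result = candidate_genes(toy_ppi, ["BONE1"], ["NERVE1"])
```

Here `result` holds the genes lying strictly between the two known sets; in a full pipeline these would still have to pass the testing and screening procedures.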
Chen, Tianlong; Opitz, Madeleine; Bassler, Kevin E.
The rapidly growing amount of available gene expression data for many organisms makes the development of robust systematic methods for determining the structure and function of regulatory networks from that data an important goal. Recently, methods that use the context likelihood of relatedness to infer a network and then use modularity maximizing community detection algorithms on the inferred network to find the functional structure were shown to be effective. Improvements of these methods will be presented and applied to systematically study Escherichia coli expression data. First, robust functionally related communities of genes are identified, and then the structure of the more closely related genes within those communities is determined. Results will be compared with gene ontology terms and the RegulonDB database. Predictions of a number of significant new regulatory relations are found. Work supported by the NSF through Grants DMR-1507371 and IOS-1546858.
Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui
The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways. PMID:29049295
Ernst, Mathias; Du, Yang; Warsow, Gregor; Hamed, Mohamed; Endlich, Nicole; Endlich, Karlhans; Murua Escobar, Hugo; Sklarz, Lisa-Madeleine; Sender, Sina; Junghanß, Christian; Möller, Steffen; Fuellen, Georg; Struckmann, Stephan
Identifying genes that contribute to disease phenotypes remains a challenge for bioinformatics. Static knowledge on biological networks is often combined with the dynamics observed in gene expression levels over disease development, to find markers for diagnostics and therapy, and also putative disease-modulatory drug targets and drugs. The basis of current methods ranges from a focus on expression levels (Limma) to concentrating on network characteristics (PageRank, HITS/Authority Score), and both (DeMAND, Local Radiality). We present an integrative approach (the FocusHeuristics) that is thoroughly evaluated based on public expression data and molecular disease characteristics provided by DisGeNet. The FocusHeuristics combines three scores, i.e. the log fold change and another two, based on the sum and difference of log fold changes of genes/proteins linked in a network. A gene is kept when one of the scores to which it contributes is above a threshold. Our FocusHeuristics is both a predictor for gene-disease association and a bioinformatics method to reduce biological networks to their disease-relevant parts, by highlighting the dynamics observed in expression data. The FocusHeuristics is slightly, but significantly, better than other methods in its identification of disease-associated genes as measured by AUC, and it delivers mechanistic explanations for its choice of genes.
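The three-score filter described above might look roughly like the following sketch. It is an illustration only: the use of absolute values, a single shared threshold, and the names `focus_filter`, `log_fc`, and `edges` are assumptions, not details taken from the FocusHeuristics itself.

```python
def focus_filter(log_fc, edges, threshold):
    """Keep genes for which at least one of three scores exceeds the threshold.

    Scores (assumed form): |logFC| of the gene itself, and for every network
    edge the absolute sum and absolute difference of the two endpoint logFCs.
    """
    kept = set()
    # Score 1: the gene's own log fold change.
    for gene, value in log_fc.items():
        if abs(value) > threshold:
            kept.add(gene)
    # Scores 2 and 3: sum and difference across each linked pair.
    for a, b in edges:
        pair_sum = abs(log_fc[a] + log_fc[b])
        pair_diff = abs(log_fc[a] - log_fc[b])
        if pair_sum > threshold or pair_diff > threshold:
            kept.update((a, b))
    return kept


# Hypothetical values: A and B are individually weak but jointly above threshold.
log_fc = {"A": 0.6, "B": 0.7, "C": -0.1, "D": 0.1}
edges = [("A", "B"), ("C", "D")]
kept_genes = focus_filter(log_fc, edges, threshold=1.0)
```

In this toy case neither A nor B passes on its own log fold change, yet both are retained because their summed change across the shared edge exceeds the threshold, which is the kind of network-aware behavior the abstract describes.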
Almathen, Faisal; Charruau, Pauline; Mohandesan, Elmira; Mwacharo, Joram M.; Orozco-terWengel, Pablo; Pitt, Daniel; Abdussamad, Abdussamad M.; Uerpmann, Margarethe; Uerpmann, Hans-Peter; De Cupere, Bea; Magee, Peter; Alnaqeeb, Majed A.; Salim, Bashir; Raziq, Abdul; Dessie, Tadelle; Abdelhadi, Omer M.; Banabazi, Mohammad H.; Al-Eknah, Marzook; Walzer, Chris; Faye, Bernard; Hofreiter, Michael; Peters, Joris; Hanotte, Olivier
Dromedaries have been fundamental to the development of human societies in arid landscapes and for long-distance trade across hostile hot terrains for 3,000 y. Today they continue to be an important livestock resource in marginal agro-ecological zones. However, the history of dromedary domestication and the influence of ancient trading networks on their genetic structure have remained elusive. We combined ancient DNA sequences of wild and early-domesticated dromedary samples from arid regions with nuclear microsatellite and mitochondrial genotype information from 1,083 extant animals collected across the species’ range. We observe little phylogeographic signal in the modern population, indicative of extensive gene flow and virtually affecting all regions except East Africa, where dromedary populations have remained relatively isolated. In agreement with archaeological findings, we identify wild dromedaries from the southeast Arabian Peninsula among the founders of the domestic dromedary gene pool. Approximate Bayesian computations further support the “restocking from the wild” hypothesis, with an initial domestication followed by introgression from individuals from wild, now-extinct populations. Compared with other livestock, which show a long history of gene flow with their wild ancestors, we find a high initial diversity relative to the native distribution of the wild ancestor on the Arabian Peninsula and to the brief coexistence of early-domesticated and wild individuals. This study also demonstrates the potential to retrieve ancient DNA sequences from osseous remains excavated in hot and dry desert environments. PMID:27162355
Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun
Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. The analysis of these
Rabyk, M V; Ostash, B O; Fedorenko, V O
Current advances in the research and practical applications of pleiotropic regulatory genes for antibiotic production in actinomycetes are reviewed. The basic regulatory mechanisms found in these bacteria are outlined. Examples described in the review show the importance of the manipulation of regulatory systems that affect the synthesis of antibiotics for the metabolic engineering of the actinomycetes. Also, the study of these genes is the basis for the development of genetic engineering approaches towards the induction of the "cryptic" part of the actinomycetes secondary metabolome, whose capacity for production of biologically active compounds is much bigger than the diversity of antibiotics underpinned by traditional microbiological screening. Besides the practical problems, the study of regulatory genes for antibiotic biosynthesis will provide insights into the process of evolution of complex regulatory systems that coordinate the expression of gene operons, clusters and regulons involved in the control of secondary metabolism and morphogenesis of actinomycetes.
The output of state-of-the-art reverse-engineering methods for biological networks is often based on the fitting of a mathematical model to the data. Typically, different datasets do not give single consistent network predictions but rather an ensemble of inconsistent networks inferred under the same reverse-engineering method that are only consistent with the specific experimentally measured data. Here, we focus on an alternative approach for combining the information contained within such an ensemble of inconsistent gene networks, called meta-analysis, to make more accurate predictions and to estimate the reliability of these predictions. We review two existing meta-analysis approaches, the Fisher transformation combined coefficient test (FTCCT) and Fisher's inverse combined probability test (FICPT), and compare their performance with five well-known methods: ARACNe, Context Likelihood of Relatedness network (CLR), Maximum Relevance Minimum Redundancy (MRNET), Relevance Network (RN) and Bayesian Network (BN). We conducted in-depth numerical ensemble simulations and demonstrated for biological expression data that the meta-analysis approaches consistently outperformed the best gene regulatory network inference (GRNI) methods in the literature. Furthermore, the meta-analysis approaches have a low computational complexity. We conclude that the meta-analysis approaches are a powerful tool for integrating different datasets to give more accurate and reliable predictions for biological networks.
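As a small illustration of the second approach, Fisher's inverse combined probability test pools k independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below uses the closed-form tail probability available for even degrees of freedom; the function name and the example p-values are hypothetical, and how the reviewed paper applies the test to network edges is not shown.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the test statistic X = -2 * sum(ln p_i) and its upper-tail
    probability under a chi-square distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    tail = math.exp(-half) * sum(half**j / math.factorial(j) for j in range(k))
    return statistic, tail


# Hypothetical example: the same edge scored p = 0.05 in two datasets.
statistic, combined = fisher_combined_pvalue([0.05, 0.05])
```

Two moderately significant results for the same edge yield a combined p-value smaller than either one alone, which is how the meta-analysis gains power from several datasets.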
Takaku, Tomoiku; Ohyashiki, Junko H.; Zhang, Yu; Ohyashiki, Kazuma
The immune response to viral infection involves a complex network of dynamic gene and protein interactions. We present here the dynamic gene network of the host immune response during human herpesvirus type 6 (HHV-6) infection in an adult T-cell leukemia cell line. Using a pathway-focused oligonucleotide DNA microarray, we found a possible association between chemokine genes regulating Th1/Th2 balance and genes regulating T-cell proliferation during HHV-6B infection. Gene network analysis using an integrated comprehensive workbench, VoyaGene, revealed that a gene encoding a TEC-family kinase, ITK, might be a putative modulator in the host immune response against HHV-6B infection. We conclude that a Th2-dominated inflammatory reaction in host cells may play an important role in HHV-6B-infected T cells, thereby suggesting the possibility that ITK might be a therapeutic target in diseases related to dysregulation of Th1/Th2 balance. This study describes a novel approach to finding genes related to the complex host-virus interaction using microarray data and the Bayesian statistical framework.
Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power since the number of variables far exceeds the number of samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC) method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence, and the concept of Granger causality to assess topology change with induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the disruption of the network by the induction signifies a gene's role in transcriptional regulation. The benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/).
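The partial-correlation ingredient can be illustrated with the textbook first-order formula r_xy·z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below is not the DPC package: it omits the Granger-causality step entirely, conditions on a single variable only, and uses made-up expression profiles.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical profiles: x and y both follow a shared regulator z plus noise.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [zi + e for zi, e in zip(z, [1, -1, 1, -1, 1, -1, 1, -1])]
y = [zi + e for zi, e in zip(z, [1, 1, -1, -1, 1, 1, -1, -1])]
raw = pearson(x, y)
partial = partial_correlation(x, y, z)
```

In this toy case x and y look strongly correlated, but the association largely vanishes once z is conditioned on, which is the conditional-independence test used to prune indirect edges.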
Park, Sungjoon; Kim, Jung Min; Shin, Wonho; Han, Sung Won; Jeon, Minji; Jang, Hyun Jin; Jang, Ik-Soon; Kang, Jaewoo
Identifying gene regulatory networks is an important task for understanding biological systems. Time-course measurement data have become a valuable resource for inferring gene regulatory networks. Various methods have been presented for reconstructing the networks from time-course measurement data. However, existing methods have been validated on only a limited number of benchmark datasets, and rarely verified on real biological systems. We first integrated benchmark time-course gene expression datasets from previous studies and reassessed the baseline methods. We observed that GENIE3-time, a tree-based ensemble method, achieved the best performance among the baselines. In this study, we introduce BTNET, a boosted tree based gene regulatory network inference algorithm which improves the state-of-the-art. We quantitatively validated BTNET on the integrated benchmark dataset. The AUROC and AUPR scores of BTNET were higher than those of the baselines. We also qualitatively validated the results of BTNET through an experiment on neuroblastoma cells treated with an antidepressant. The inferred regulatory network from BTNET showed that brachyury, a transcription factor, was regulated by fluoxetine, an antidepressant, which was verified by the expression of its downstream genes. We present BTNET, which infers a GRN from time-course measurement data using boosting algorithms. Our model achieved the highest AUROC and AUPR scores on the integrated benchmark dataset. We further validated BTNET qualitatively through a wet-lab experiment and showed that BTNET can produce biologically meaningful results.
van Oudenaarden, Alexander
Cells are intrinsically noisy biochemical reactors. This leads to random cell-to-cell variation (noise) in gene expression levels. First, I will address the source of this noise at the level of transcription and translation of a single gene. Our experimental results demonstrate that the intrinsic noise of a single gene is predominantly controlled at the translational level, and that increased translational efficiency leads to increased noise strength. This observation is consistent with a theoretical model in which proteins are randomly produced in sharp bursts followed by periods of slow decay. Second, I will explore the importance of genetic noise for a naturally occurring network: the lac operon. The classic lactose utilization network of E. coli has been under investigation for several decades and, in its simplest form, the network may be modeled as a single positive feedback module. However, this simplicity is deceptive, as even this basic network is capable of complex metabolic behavior, including adaptation, amplification, and graded-to-binary response conversion. I will present single-cell measurements on the expression of key genes in the lactose uptake network and explore the importance of genetic noise on the regulation of these genes.
Jiang, Peng; Scarpa, Joseph R; Fitzpatrick, Karrie; Losic, Bojan; Gao, Vance D; Hao, Ke; Summa, Keith C; Yang, He S; Zhang, Bin; Allada, Ravi; Vitaterna, Martha H; Turek, Fred W; Kasarskis, Andrew
Sleep dysfunction and stress susceptibility are comorbid complex traits that often precede and predispose patients to a variety of neuropsychiatric diseases. Here, we demonstrate multilevel organizations of genetic landscape, candidate genes, and molecular networks associated with 328 stress and sleep traits in a chronically stressed population of 338 (C57BL/6J × A/J) F2 mice. We constructed striatal gene co-expression networks, revealing functionally and cell-type-specific gene co-regulations important for stress and sleep. Using a composite ranking system, we identified network modules most relevant for 15 independent phenotypic categories, highlighting a mitochondria/synaptic module that links sleep and stress. The key network regulators of this module are overrepresented with genes implicated in neuropsychiatric diseases. Our work suggests that the interplay among sleep, stress, and neuropathology emerges from genetic influences on gene expression and their collective organization through complex molecular networks, providing a framework for interrogating the mechanisms underlying sleep, stress susceptibility, and related neuropsychiatric disorders. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Vahedi, Golnaz; Faryabi, Babak; Chamberland, Jean-Francois; Datta, Aniruddha; Dougherty, Edward R
A prime objective of modeling genetic regulatory networks is the identification of potential targets for therapeutic intervention. To date, optimal stochastic intervention has been studied in the context of probabilistic Boolean networks, with the control policy based on the transition probability matrix of the associated Markov chain and dynamic programming used to find optimal control policies. Dynamic programming algorithms are problematic owing to their high computational complexity. Two additional computationally burdensome issues that arise are the potential for controlling the network and identifying the best gene for intervention. This paper proposes an algorithm based on mean first-passage time that assigns a stationary control policy for each gene candidate. It serves as an approximation to an optimal control policy and, owing to its reduced computational complexity, can be used to predict the best control gene. Once the best control gene is identified, one can derive an optimal policy or simply utilize the approximate policy for this gene when the network size precludes a direct application of dynamic programming algorithms. A salient point is that the proposed algorithm can be model-free. It can be directly designed from time-course data without having to infer the transition probability matrix of the network.
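The quantity at the heart of this approach, the mean first-passage time (MFPT) to a set of target states of a Markov chain, satisfies k_i = 1 + Σ_j P_ij k_j for non-target states and k_i = 0 for targets. The sketch below solves that system by simple fixed-point iteration on a hypothetical three-state chain; it shows only how MFPTs are computed, not the paper's control policy or its model-free estimation from time-course data.

```python
def mean_first_passage_times(transition, targets, sweeps=5000):
    """Expected number of steps to first reach any target state.

    transition: row-stochastic matrix as a list of lists.
    targets: set of state indices treated as absorbing goals.
    The fixed number of sweeps is an arbitrary choice for this toy example.
    """
    n = len(transition)
    times = [0.0] * n
    for _ in range(sweeps):
        updated = []
        for i in range(n):
            if i in targets:
                updated.append(0.0)
            else:
                # One step is always taken, then continue from wherever we land.
                step = sum(transition[i][j] * times[j] for j in range(n))
                updated.append(1.0 + step)
        times = updated
    return times


# Hypothetical chain in which state 2 is the desirable state.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.00, 1.00],
]
mfpt = mean_first_passage_times(P, targets={2})
```

Comparing such passage times with and without flipping a candidate gene gives a cheap indication of which intervention reaches desirable states fastest, which is the intuition behind ranking control genes this way.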
Assessing the contribution of promoters and coding sequences to gene evolution is an important step toward discovering the major genetic determinants of human evolution. Many specific examples have revealed the evolutionary importance of cis-regulatory regions. However, the relative contribution of regulatory and coding regions to the evolutionary process and whether systemic factors differentially influence their evolution remains unclear. To address these questions, we carried out an analysis at the genome scale to identify signatures of positive selection in human proximal promoters. Next, we examined whether genes with positively selected promoters (Prom+ genes) show systemic differences with respect to a set of genes with positively selected protein-coding regions (Cod+ genes). We found that the number of genes in each set was not significantly different (8.1% and 8.5%, respectively). Furthermore, a functional analysis showed that, in both cases, positive selection affects almost all biological processes and only a few genes of each group are located in enriched categories, indicating that promoters and coding regions are not evolutionarily specialized with respect to gene function. On the other hand, we show that the topology of the human protein network has a different influence on the molecular evolution of proximal promoters and coding regions. Notably, Prom+ genes have an unexpectedly high centrality when compared with a reference distribution (P=0.008, for Eigenvalue centrality). Moreover, the frequency of Prom+ genes increases from the periphery to the center of the protein network (P=0.02, for the logistic regression coefficient). This means that gene centrality does not constrain the evolution of proximal promoters, unlike the case with coding regions, and further indicates that the evolution of proximal promoters is more efficient in the center of the protein network than in the periphery. These results show that proximal promoters
Chen, Haifen; Mundra, Piyushkumar A; Zhao, Li Na; Lin, Feng; Zheng, Jie
Gene regulatory network (GRN) is a fundamental topic in systems biology. The dynamics of GRN can shed light on the cellular processes, which facilitates the understanding of the mechanisms of diseases when the processes are dysregulated. Accurate reconstruction of GRN could also provide guidelines for experimental biologists. Therefore, inferring gene regulatory networks from high-throughput gene expression data is a central problem in systems biology. However, due to the inherent complexity of gene regulation, noise in measuring the data and the short length of time-series data, it is very challenging to reconstruct accurate GRNs. On the other hand, a better understanding into gene regulation could help to improve the performance of GRN inference. Time delay is one of the most important characteristics of gene regulation. By incorporating the information of time delays, we can achieve more accurate inference of GRN. In this paper, we propose a method to infer time-delayed gene regulation based on cross-correlation and network deconvolution (ND). First, we employ cross-correlation to obtain the probable time delays for the interactions between each target gene and its potential regulators. Then based on the inferred delays, the technique of ND is applied to identify direct interactions between the target gene and its regulators. Experiments on real-life gene expression datasets show that our method achieves overall better performance than existing methods for inferring time-delayed GRNs. By taking into account the time delays among gene interactions, our method is able to infer GRN more accurately. The effectiveness of our method has been shown by the experiments on three real-life gene expression datasets of yeast. Compared with other existing methods which were designed for learning time-delayed GRN, our method has significantly higher sensitivity without much reduction of specificity.
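The first step, estimating a probable time delay by cross-correlation, can be sketched as follows. The series, the `max_lag` bound, and the choice of the lag with the largest absolute Pearson correlation are illustrative assumptions; the network deconvolution step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_delay(regulator, target, max_lag):
    """Correlate regulator[t] with target[t + lag] for each candidate lag."""
    scores = {}
    for lag in range(max_lag + 1):
        x = regulator[: len(regulator) - lag]
        y = target[lag:]
        scores[lag] = pearson(x, y)
    # Assumed selection rule: the lag with the strongest absolute correlation.
    best = max(scores, key=lambda lag: abs(scores[lag]))
    return best, scores


# Hypothetical series: the target repeats the regulator two time points later.
regulator = [1, 3, 2, 5, 4, 6, 2, 1, 3, 7]
target = [0, 0] + regulator[:-2]
best_lag, lag_scores = best_delay(regulator, target, max_lag=3)
```

With short time series each extra lag discards data points, so `max_lag` has to stay small relative to the series length.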
The GeneMANIA Cytoscape app enables users to construct a composite gene-gene functional interaction network from a gene list. The resulting network includes the genes most related to the original list, and functional annotations from Gene Ontology. The edges are annotated with details about the publication or data source the interactions were derived from. The app leverages GeneMANIA’s database of 1800+ networks, containing over 500 million interactions spanning 8 organisms: A. thaliana, C. elegans, D. melanogaster, D. rerio, H. sapiens, M. musculus, R. norvegicus, and S. cerevisiae. Users may also import their own organisms, networks, and expression profiles. The app is compatible with Cytoscape versions 2 and 3.
Gui, Shupeng; Rice, Andrew P; Chen, Rui; Wu, Liang; Liu, Ji; Miao, Hongyu
Gene regulatory interactions are of fundamental importance to various biological functions and processes. However, only a few previous computational studies have claimed success in revealing genome-wide regulatory landscapes from temporal gene expression data, especially for complex eukaryotes like human. Moreover, recent work suggests that these methods still suffer from the curse of dimensionality if a network size increases to 100 or higher. Here we present a novel scalable algorithm for identifying genome-wide gene regulatory network (GRN) structures, and we have verified the algorithm performances by extensive simulation studies based on the DREAM challenge benchmark data. The highlight of our method is that its superior performance does not degenerate even for a network size on the order of 10^4, and is thus readily applicable to large-scale complex networks. Such a breakthrough is achieved by considering both prior biological knowledge and multiple topological properties (i.e., sparsity and hub gene structure) of complex networks in the regularized formulation. We also validate and illustrate the application of our algorithm in practice using the time-course gene expression data from a study on human respiratory epithelial cells in response to influenza A virus (IAV) infection, as well as the ChIP-seq data from ENCODE on transcription factor (TF) and target gene interactions. An interesting finding, owing to the proposed algorithm, is that the biggest hub structures (e.g., top ten) in the GRN all center at some transcription factors in the context of epithelial cell infection by IAV. The proposed algorithm is the first scalable method for large complex network structure identification. The GRN structure identified by our algorithm could reveal possible biological links and help researchers to choose which gene functions to investigate in a biological event. The algorithm described in this article is implemented in MATLAB®, and the source code is freely
Background: Bayesian Network (BN) is a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data by itself suffers from high noise and lack of power. Incorporating prior biological knowledge can improve the performance. As each type of prior knowledge on its own may be incomplete or limited by quality issues, integrating multiple sources of prior knowledge to utilize their consensus is desirable. Results: We introduce a new method to incorporate the quantitative information from multiple sources of prior knowledge. It first uses the Naïve Bayesian classifier to assess the likelihood of functional linkage between gene pairs based on prior knowledge. In this study we included cocitation in PubMed and schematic similarity in Gene Ontology annotation. A candidate network edge reservoir is then created in which the copy number of each edge is proportional to the estimated likelihood of linkage between the two corresponding genes. In network simulation, the Markov Chain Monte Carlo sampling algorithm is adopted and samples from this reservoir at each iteration to generate new candidate networks. We evaluated the new algorithm using both simulated and real gene expression data, including that from a yeast cell cycle and a mouse pancreas development/growth study. Incorporating prior knowledge led to a ~2-fold increase in the number of known transcription regulations recovered, without significant change in false positive rate. In contrast, without the prior knowledge BN modeling is not always better than a random selection, demonstrating the necessity in network modeling to supplement the gene expression data with additional information. Conclusion: Our new development provides a statistical means to utilize the quantitative information in prior biological knowledge in the BN modeling of gene expression data, which significantly improves the performance.
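The edge reservoir idea can be pictured with a short sketch: each candidate edge is copied into a pool a number of times proportional to its estimated linkage likelihood, so that uniform draws from the pool favor well-supported edges. The scaling factor, the rounding rule, and all names below are assumptions made for illustration; the Naïve Bayesian scoring and the MCMC acceptance step are not shown.

```python
import random


def build_reservoir(edge_likelihoods, scale=10):
    """Create a pool where each edge appears roughly likelihood * scale times."""
    reservoir = []
    for edge, likelihood in edge_likelihoods.items():
        copies = max(0, round(likelihood * scale))
        reservoir.extend([edge] * copies)
    return reservoir


def propose_edge(reservoir, rng):
    """Draw one candidate edge uniformly from the pool."""
    return rng.choice(reservoir)


# Hypothetical prior: A-B is well supported by prior knowledge, A-C is not.
likelihoods = {("A", "B"): 0.9, ("A", "C"): 0.1}
reservoir = build_reservoir(likelihoods)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
proposal = propose_edge(reservoir, rng)
```

Because draws are uniform over copies, the A-B edge is proposed about nine times as often as A-C, which is how the prior biases the search without forbidding any edge outright.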
Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g., wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network. PMID:24586224
Background: A salient purpose for studying gene regulatory networks is to derive intervention strategies, the goals being to identify potential drug targets and design gene-based therapeutic intervention. Optimal stochastic control based on the transition probability matrix of the underlying Markov chain has been studied extensively for probabilistic Boolean networks. Optimization is based on minimization of a cost function and a key goal of control is to reduce the steady-state probability mass of undesirable network states. Owing to computational complexity, it is difficult to apply optimal control for large networks. Results: In this paper, we propose three new greedy stationary control policies by directly investigating the effects on the network long-run behavior. Similar to the recently proposed mean-first-passage-time (MFPT) control policy, these policies do not depend on minimization of a cost function and avoid the computational burden of dynamic programming. They can be used to design stationary control policies that avoid the need for a user-defined cost function because they are based directly on long-run network behavior; they can be used as an alternative to dynamic programming algorithms when the latter are computationally prohibitive; and they can be used to predict the best control gene with reduced computational complexity, even when one is employing dynamic programming to derive the final control policy. We compare the performance of these three greedy control policies and the MFPT policy using randomly generated probabilistic Boolean networks and give a preliminary example for intervening in a mammalian cell cycle network. Conclusion: The newly proposed control policies have better performance in general than the MFPT policy and, as indicated by the results on the mammalian cell cycle network, they can potentially serve as future gene therapeutic intervention strategies.
Duplications of genes encoding highly connected and essential proteins are selected against in several species but not in human, where duplicated genes encode highly connected proteins. To understand when and how gene duplicability changed in evolution, we compare gene and network properties in four species (Escherichia coli, yeast, fly, and human) that are representative of the increase in evolutionary complexity, defined as progressive growth in the number of genes, cells, and cell types. We find that the origin and conservation of a gene significantly correlates with the properties of the encoded protein in the protein-protein interaction network. All four species preserve a core of singleton and central hubs that originated early in evolution, are highly conserved, and accomplish basic biological functions. Another group of hubs appeared in metazoans and duplicated in vertebrates, mostly through vertebrate-specific whole genome duplication. Such recent and duplicated hubs are frequently targets of microRNAs and show tissue-selective expression, suggesting that these are alternative mechanisms to control their dosage. Our study shows how networks were modified during evolution and contributes to explaining the occurrence of somatic genetic diseases, such as cancer, in terms of network perturbations.
Sewak, Mihir S; Reddy, Narender P; Duan, Zhong-Hui
Analysis of gene expression data provides an objective and efficient technique for sub-classification of leukemia. The purpose of the present study was to design committee neural network-based classification systems to subcategorize leukemia gene expression data. In the study, a binary classification system was considered to differentiate acute lymphoblastic leukemia from acute myeloid leukemia. A ternary classification system, which classifies leukemia expression data into three subclasses (B-cell acute lymphoblastic leukemia, T-cell acute lymphoblastic leukemia, and acute myeloid leukemia), was also developed. In each classification system, gene expression profiles of leukemia patients were first subjected to a sequence of simple preprocessing steps. This resulted in filtering out approximately 95 percent of the genes as non-informative. The remaining 5 percent, the informative genes, were used to train a set of artificial neural networks with different parameters and architectures. The networks that gave the best results during initial testing were recruited into a committee. The committee decision was made by majority voting. The committee neural network system was later evaluated using data not used in training. The binary classification system classified microarray gene expression profiles into two categories with 100 percent accuracy, and the ternary system correctly predicted the three subclasses of leukemia in over 97 percent of the cases.
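The majority-voting step of a committee can be sketched in a few lines. This is only an illustration of the voting rule, assuming each committee member has already produced a class label for a sample; the function name `committee_vote` and the label strings are hypothetical, and the networks themselves are not modeled.

```python
from collections import Counter


def committee_vote(predictions):
    """Return the label predicted by the largest number of committee members.

    predictions: list of class labels, one per member, for a single sample.
    Ties are not handled specially here; an odd committee size avoids them
    in the binary case.
    """
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Binary case: three members vote on one expression profile.
binary_decision = committee_vote(["ALL", "AML", "ALL"])

# Ternary case: five members vote among three subclasses.
ternary_decision = committee_vote(["B-ALL", "T-ALL", "T-ALL", "AML", "T-ALL"])
```

Here the binary committee returns "ALL" and the ternary committee returns "T-ALL", each being the label with the most votes.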
Hepatocellular carcinoma (HCC) in a liver with advanced-stage chronic hepatitis C (CHC) is induced by hepatitis C virus, which chronically infects about 170 million people worldwide. To elucidate the associations between gene groups in hepatocellular carcinogenesis, we analyzed the profiles of the genes characteristically expressed in the CHC and HCC cell stages by a statistical method for inferring the network between gene systems based on the graphical Gaussian model. A systematic evaluation of the inferred network in terms of the biological knowledge revealed that the inferred network was strongly involved in the known gene-gene interactions with high significance, and that the clusters characterized by different cancer-related responses were associated with those of the gene groups related to metabolic pathways and morphological events. Although some relationships in the network remain to be interpreted, the analyses revealed a snapshot of the orchestrated expression of cancer-related groups and some pathways related to metabolism and morphological events in hepatocellular carcinogenesis, and thus provide possible clues on the disease mechanism and insights that address the gap between molecular and clinical assessments.
Cline, M.S.; Smoot, M.; Cerami, E.
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an ...
May 1, 2014 ... organization: 1.25e−9. Regulation of neurological system process: 1.76e−9. Protein tyrosine kinase activity: 8.22e−9. Protein autophosphorylation: 2.78e−8. Table 9. Q-value of one of the network modules of Dataset 1 (columns: Module, GO annotation, Q-value). Module 1: Anatomical structure formation involved.
Liang, Meimei; Zhang, Futao; Jin, Gulei; Zhu, Jun
Gene co-expression networks are a valuable type of biological network. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphic Processing Unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the False Discovery Rate to control for multiple testing. Finally, module identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of multi-core CPU implementations with 16 CPU threads, a single-thread C/C++ implementation and a single-thread R implementation. Our results show that the GPU implementation largely outperforms the single-thread C/C++ and R implementations, and outperforms the multi-core CPU implementation when the number of genes increases. With the test dataset containing 16,000 genes and 590 individuals, the GPU implementation was more than 63 times faster than the single-thread R implementation when 50 percent of genes were filtered out, and about 80 times faster when no genes were filtered out.
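Two of the steps named above, Pearson correlation and False Discovery Rate control, can be sketched on the CPU in plain Python. This is not the GPU tool described in the abstract; it assumes the Benjamini-Hochberg step-up procedure as the FDR method (the abstract does not name one) and takes p-values as given rather than deriving them from the coefficients. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    (that is, the gene pair would be kept as a co-expression edge).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank whose p-value falls under its rank-scaled threshold.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected


r_pos = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_neg = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
kept = benjamini_hochberg([0.001, 0.03, 0.035, 0.5], alpha=0.05)
```

In the toy p-value list, the second value misses its own threshold but is still rejected because a larger p-value further down the ranking passes; that is the step-up behavior of the procedure.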
Nitrogen (N) fertilizer has a major influence on crop yield and quality. Understanding and optimising the response of crop plants to nitrogen fertilizer usage is of central importance in enhancing food security and agricultural sustainability. In this study, the analysis of gene regulatory networks reveals multiple genes and biological processes in response to N. Two microarray studies have been used to infer components of the nitrogen-response network. Since they used different array technologies, a map linking the two probe sets to the maize B73 reference genome has been generated to allow comparison. Putative Arabidopsis homologues of maize genes were used to query the Biological General Repository for Interaction Datasets (BioGRID) network, which yielded the potential involvement of three transcription factors (TFs) (GLK5, MADS64 and bZIP108) and a Calcium-dependent protein kinase. An Artificial Neural Network was used to identify influential genes and retrieved bZIP108 and WRKY36 as significant TFs in both microarray studies, along with genes for Asparagine Synthetase, a dual-specific protein kinase and a protein phosphatase. The output from one study also suggested roles for microRNA (miRNA) 399b and Nin-like Protein 15 (NLP15). Co-expression-network analysis of TFs with closely related profiles to known Nitrate-responsive genes identified GLK5, GLK8 and NLP15 as candidate regulators of genes repressed under low Nitrogen conditions, while bZIP108 might play a role in gene activation.
Ghaffari, Noushin; Ivanov, Ivan; Qian, Xiaoning; Dougherty, Edward R
One of the most important goals of the mathematical modeling of gene regulatory networks is to alter their behavior toward desirable phenotypes. Therapeutic techniques are derived for intervention in terms of stationary control policies. In large networks, it becomes computationally burdensome to derive an optimal control policy. To overcome this problem, greedy intervention approaches based on the concept of the Mean First Passage Time or the steady-state probability mass of the network states were previously proposed. Another possible approach is to use reduction mappings to compress the network and develop control policies on its reduced version. However, such mappings lead to loss of information and require an induction step when designing the control policy for the original network. In this paper, we propose a novel solution, CoD-CP, for designing intervention policies for large Boolean networks. The new method utilizes the Coefficient of Determination (CoD) and the Steady-State Distribution (SSD) of the model. The main advantage of CoD-CP in comparison with the previously proposed methods is that it does not require any compression of the original model, and thus can be directly designed on large networks. The simulation studies on small synthetic networks show that CoD-CP performs comparably to previously proposed greedy policies that were induced from the compressed versions of the networks. Furthermore, on a large 17-gene gastrointestinal cancer network, CoD-CP outperforms the other two available greedy techniques, which is precisely the kind of case for which CoD-CP has been developed. Finally, our experiments show that CoD-CP is robust with respect to the attractor structure of the model. The newly proposed CoD-CP provides an attractive alternative for intervening in large networks where other available greedy methods require size reduction on the network and an extra induction step before designing a control policy.
Gravity plays a fundamental role in plant growth and development. A significant body of research has helped define the events of gravity perception, the role of the plant growth regulator auxin, and the mechanisms resulting in the gravity response. However, the events of signal transduction, which link the biophysical action of perception to a biochemical signal that results in auxin redistribution and which regulate the gravitropic effects on plant growth, remain, for the most part, a “black box.” Using a cold effect, dubbed the gravity persistent signal (GPS) response, we developed a mutant screen to specifically identify components of the signal transduction pathway. Cloning of the GPS genes has identified new proteins involved in gravitropic signaling. We have further exploited the GPS response using a multi-faceted approach, including gene expression microarrays, proteomics analysis, bioinformatics analysis and continued mutant analysis, to identify additional genes and physiological and biochemical processes. Gene expression data provided the foundation of a regulatory network for gravitropic signaling. Based on these gene expression data and related data sets/information from the literature/repositories, we constructed a gravitropic signaling network for Arabidopsis inflorescence stems. To generate the network, both a dynamic Bayesian network approach and a time-lagged correlation coefficient approach were used. The dynamic Bayesian network added existing information on protein-protein interactions, while the time-lagged correlation coefficient allowed incorporation of temporal regulation and thus could incorporate the time-course metric from the data set. Thus the methods complemented each other and provided us with a more comprehensive evaluation of connections. Each method generated a list of possible interactions associated with a statistical significance value. The two networks were then overlaid to generate a more rigorous, intersected
Background: Reverse engineering of a gene regulatory network with a large number of genes and a limited number of experimental data points is a computationally challenging task. In particular, reverse engineering using linear systems is an underdetermined and ill-conditioned problem, i.e. the amount of microarray data is limited and the solution is very sensitive to noise in the data. Therefore, the reverse engineering of gene regulatory networks with a large number of genes and a limited number of data points requires a rigorous optimization algorithm. Results: This study presents a novel algorithm for reverse engineering with linear systems. The proposed algorithm is a combination of the orthogonal least squares, second order derivative for network pruning, and Bayesian model comparison. In this study, the entire network is decomposed into a set of small networks that are defined as unit networks. The algorithm provides each unit network with P(D|Hi), which is used as a confidence level. A unit network with a higher P(D|Hi) has a higher confidence that it is correctly elucidated. Thus, the proposed algorithm is able to locate true positive interactions using P(D|Hi), which is a unique property of the proposed algorithm. The algorithm is evaluated with synthetic and Saccharomyces cerevisiae expression data using the dynamic Bayesian network. With synthetic data, it is shown that the performance of the algorithm depends on the number of genes, noise level, and the number of data points. With yeast expression data, it is shown that there is a remarkable number of known physical or genetic events among all interactions elucidated by the proposed algorithm. The performance of the algorithm is compared with the Sparse Bayesian Learning algorithm using both synthetic and Saccharomyces cerevisiae expression data sets. The comparison experiments show that the algorithm produces sparser solutions with fewer false positives than Sparse Bayesian
Okada, Yukinori; Muramatsu, Tomoki; Suita, Naomasa; Kanai, Masahiro; Kawakami, Eiryo; Iotchkova, Valentina; Soranzo, Nicole; Inazawa, Johji; Tanaka, Toshihiro
The impact of microRNA (miRNA) on the genetics of human complex traits, especially in the context of miRNA-target gene networks, has not been fully assessed. Here, we developed a novel analytical method, MIGWAS, to comprehensively evaluate enrichment of genome-wide association study (GWAS) signals in miRNA-target gene networks. We applied the method to the GWAS results of the 18 human complex traits from >1.75 million subjects, and identified significant enrichment in rheumatoid arthritis (RA), kidney function, and adult height (P impact of miRNA-target gene networks on the genetics of human complex traits, and provided resources which should contribute to drug discovery and nucleic acid medicine.
Gao, Long; Uzun, Yasin; Gao, Peng; He, Bing; Ma, Xiaoke; Wang, Jiahui; Han, Shizhong; Tan, Kai
Identifying noncoding risk variants remains a challenging task. Because noncoding variants exert their effects in the context of a gene regulatory network (GRN), we hypothesize that explicit use of disease-relevant GRNs can significantly improve the inference accuracy of noncoding risk variants. We describe Annotation of Regulatory Variants using Integrated Networks (ARVIN), a general computational framework for predicting causal noncoding variants. It employs a set of novel regulatory network-based features, combined with sequence-based features to infer noncoding risk variants. Using known causal variants in gene promoters and enhancers in a number of diseases, we show ARVIN outperforms state-of-the-art methods that use sequence-based features alone. Additional experimental validation using reporter assay further demonstrates the accuracy of ARVIN. Application of ARVIN to seven autoimmune diseases provides a holistic view of the gene subnetwork perturbed by the combinatorial action of the entire set of risk noncoding mutations.
Coney Pei-Chen Lin
The spore wall of Saccharomyces cerevisiae is a multilaminar extracellular structure that is formed de novo in the course of sporulation. The outer layers of the spore wall provide spores with resistance to a wide variety of environmental stresses. The major components of the outer spore wall are the polysaccharide chitosan and a polymer formed from the di-amino acid dityrosine. Though the synthesis and export pathways for dityrosine have been described, genes directly involved in dityrosine polymerization and incorporation into the spore wall have not been identified. A synthetic gene array approach to identify new genes involved in outer spore wall synthesis revealed an interconnected network influencing dityrosine assembly. This network is highly redundant both for genes of different activities that compensate for the loss of each other and for related genes of overlapping activity. Several of the genes in this network have paralogs in the yeast genome and deletion of entire paralog sets is sufficient to severely reduce dityrosine fluorescence. Solid-state NMR analysis of partially purified outer spore walls identifies a novel component in spore walls from wild type that is absent in some of the paralog set mutants. Localization of gene products identified in the screen reveals an unexpected role for lipid droplets in outer spore wall formation.
Background. Gene expression levels change to adapt to stress, such as starvation, toxins, and radiation. The changes are signals transmitted through molecular interactions, eventually leading to two cellular fates, apoptosis and autophagy. Due to genetic variations, the signals may not be effectively transmitted to modulate apoptotic and autophagic responses. Such aberrant modulation may lead to carcinogenesis and drug resistance. The balance between apoptosis and autophagy is therefore crucial in coping with stress. Although there is evidence illustrating the apoptosis-autophagy interplay, the underlying mechanism and the participation of regulators, including transcription factors (TFs) and microRNAs (miRNAs), remain unclear. Results. A gene network is a graphical illustration for exploring the functional linkages and the potential coordinate regulations of genes. A microarray dataset for the study of chronic myeloid leukemia was obtained from Gene Expression Omnibus. The expression profiles of genes related to apoptosis and autophagy, including MCL1, BCL2, ATG, beclin-1, BAX, BAK, E2F, cMYC, PI3K, AKT, BAD, and LC3, were extracted from the dataset to construct the gene networks. Conclusion. The network analysis of these genes explored the underlying mechanisms and the roles of TFs and miRNAs in the crosstalk between apoptosis and autophagy.
Retinitis pigmentosa (RP) is a highly heterogeneous genetic visual disorder with more than 70 known causative genes, some of them shared with other non-syndromic retinal dystrophies (e.g. Leber congenital amaurosis, LCA). The identification of RP genes has increased steadily during the last decade, and the 30% of the cases that still remain unassigned will soon decrease after the advent of exome/genome sequencing. A considerable amount of genetic and functional data on single RD genes and mutations has been gathered, but a comprehensive view of the RP genes and their interacting partners is still very fragmentary. This is the main gap that needs to be filled in order to understand how mutations relate to progressive blinding disorders and devise effective therapies. We have built an RP-specific network (RPGeNet) by merging data from different sources: high-throughput data from BioGRID and STRING databases, manually curated data for interactions retrieved from iHOP, as well as interactions filtered out by syntactical parsing from up-to-date abstracts and full-text papers related to the RP research field. The paths emerging when known RP genes were used as baits over the whole interactome have been analysed, and the minimal number of connections among the RP genes and their close neighbors were distilled in order to simplify the search space. In contrast to the analysis of single isolated genes, finding the networks linking disease genes renders powerful etiopathological insights. We here provide an interactive interface, RPGeNet, for the molecular biologist to explore the network centered on the non-syndromic and syndromic RP and LCA causative genes. By integrating tissue-specific expression levels and phenotypic data on top of that network, a more comprehensive biological view will highlight key molecular players of retinal degeneration and unveil new RP disease candidates.
Krienen, Fenna M; Yeo, B T Thomas; Ge, Tian; Buckner, Randy L; Sherwood, Chet C
The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute's human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections.
U Martin Singh-Blom
Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets in biology, we can leverage statistical and machine learning methods to help us achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated by its success in social network link prediction, and is very closely related to some of the recent methods proposed for gene-disease association inference. The second method, called Catapult (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine where the features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies, on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, by measuring the performance of the methods using two different evaluation strategies, we show that even though both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas Catapult is better suited to correctly identifying gene-trait associations overall.
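The Katz measure scores a pair of nodes by summing the walks between them, with longer walks damped by a factor beta raised to the walk length. The sketch below is a generic truncated Katz computation on a small undirected graph, not the heterogeneous gene-trait network or the exact variant used in the paper; the names `matmul` and `katz_scores` and the parameters `beta` and `max_len` are assumptions for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def katz_scores(A, beta=0.1, max_len=3):
    """Truncated Katz measure: sum over l = 1..max_len of beta**l * A**l.

    A is an adjacency matrix; entry (i, j) of A**l counts walks of length l
    from node i to node j, so longer walks contribute less to the score.
    """
    n = len(A)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in A]  # holds A**l, starting with l = 1
    weight = beta                  # holds beta**l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, A)
        weight *= beta
    return scores


# Path graph 0 - 1 - 2: nodes 0 and 2 are linked only through node 1.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
S = katz_scores(A, beta=0.1, max_len=3)
```

Nodes 0 and 2 share no edge, yet they receive a nonzero score from the length-2 walk through node 1, which is how the measure can suggest links for sparsely connected nodes.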
Elizabeth A Osterndorff-Kahanek
Repeated ethanol exposure and withdrawal in mice increases voluntary drinking and represents an animal model of physical dependence. We examined time- and brain region-dependent changes in gene coexpression networks in amygdala (AMY), nucleus accumbens (NAC), prefrontal cortex (PFC), and liver after four weekly cycles of chronic intermittent ethanol (CIE) vapor exposure in C57BL/6J mice. Microarrays were used to compare gene expression profiles at 0, 8, and 120 hours following the last ethanol exposure. Each brain region exhibited a large number of differentially expressed genes (2,000-3,000) at the 0- and 8-hour time points, but fewer changes were detected at the 120-hour time point (400-600). Within each region, there was little gene overlap across time (~20%). All brain regions were significantly enriched with differentially expressed immune-related genes at the 8-hour time point. Weighted gene correlation network analysis identified modules that were highly enriched with differentially expressed genes at the 0- and 8-hour time points with virtually no enrichment at 120 hours. Modules enriched for both ethanol-responsive and cell-specific genes were identified in each brain region. These results indicate that chronic alcohol exposure causes global 'rewiring' of coexpression systems involving glial and immune signaling as well as neuronal genes.
To improve the prediction of cancer prognosis in clinical research, various algorithms have been developed to construct predictive models with the gene signatures detected by DNA microarrays. Due to the heterogeneity of the clinical samples, the list of differentially expressed genes (DEGs) generated by statistical methods or machine learning algorithms often includes a number of false positive genes, which are not associated with the phenotypic differences between the compared clinical conditions, and this reduces the reliability of the predictive models. In this study, we proposed a strategy that combines a statistical algorithm with gene-pathway bipartite networks to generate reliable lists of cancer-related DEGs, and constructed models using a support vector machine for predicting the prognosis of three types of cancers, namely, breast cancer, acute myeloma leukemia, and glioblastoma. Our results demonstrated that, combined with the gene-pathway bipartite networks, our proposed strategy can efficiently generate reliable cancer-related DEG lists for constructing the predictive models. In addition, the model performance in the swap analysis was similar to that in the original analysis, indicating the robustness of the models in predicting the cancer outcomes.
Linksvayer, Timothy A; Fewell, Jennifer H; Gadau, Jürgen; Laubichler, Manfred D
The evolution and development of complex phenotypes in social insect colonies, such as queen-worker dimorphism or division of labor, can, in our opinion, only be fully understood within an expanded mechanistic framework of Developmental Evolution. Conversely, social insects offer a fertile research area in which fundamental questions of Developmental Evolution can be addressed empirically. We review the concept of gene regulatory networks (GRNs) that aims to fully describe the battery of interacting genomic modules that are differentially expressed during the development of individual organisms. We discuss how distinct types of network models have been used to study different levels of biological organization in social insects, from GRNs to social networks. We propose that these hierarchical networks spanning different organizational levels from genes to societies should be integrated and incorporated into full GRN models to elucidate the evolutionary and developmental mechanisms underlying social insect phenotypes. Finally, we discuss prospects and approaches to achieve such an integration. © 2012 WILEY PERIODICALS, INC.
Soo-Jin Lee, Sandra; Borgelt, Emily
The combination of decreased genotyping costs and prolific social media use is fueling a personal genetic testing industry in which consumers purchase and interact with genetic risk information online. Consumers and their genetic risk profiles are protected in some respects by the 2008 federal Genetic Information Nondiscrimination Act (GINA), which forbids the discriminatory use of genetic information by employers and health insurers; however, practical and technical limitations undermine its enforceability, given the everyday practices of online social networking and its impact on the workplace. In the Web 2.0 era, employers in most states can legally search about job candidates and employees online, probing social networking sites for personal information that might bear on hiring and employment decisions. We examine GINA's protections for online sharing of genetic information as well as its limitations, and propose policy recommendations to address current gaps that leave employees' genetic information vulnerable in a Web-based world.
Ojeda, Sergio R.; Dubay, Christopher; Lomniczi, Alejandro; Kaidar, Gabi; Matagne, Valerie; Sandau, Ursula S.; Dissen, Gregory A.
A sustained increase in pulsatile release of gonadotrophin releasing hormone (GnRH) from the hypothalamus is an essential, final event that defines the initiation of mammalian puberty. This increase depends on coordinated changes in transsynaptic and glial-neuronal communication, consisting of activating neuronal and glial excitatory inputs to the GnRH neuronal network and the loss of transsynaptic inhibitory tone. It is now clear that the prevalent excitatory systems stimulating GnRH secreti...
Sean P Farris
Cocaine and alcohol are two substances of abuse that prominently affect the central nervous system (CNS). Repeated exposure to cocaine and alcohol leads to longstanding changes in gene expression, and subsequent functional CNS plasticity, throughout multiple brain regions. Epigenetic modifications of histones are one proposed mechanism guiding these enduring changes to the transcriptome. Characterizing the large number of available biological relationships as network models can reveal unexpected biochemical relationships. Clustering analysis of variation from whole-genome sequencing of gene expression (RNA-Seq) and histone H3 lysine 4 trimethylation (H3K4me3) events (ChIP-Seq) revealed the underlying structure of the transcriptional and epigenomic landscape within hippocampal postmortem brain tissue of drug abusers and control cases. Distinct sets of interrelated networks for cocaine and alcohol abuse were determined for each abusive substance. The network approach identified subsets of functionally related genes that are regulated in agreement with H3K4me3 changes, suggesting cause and effect relationships between this epigenetic mark and gene expression. Gene expression networks consisted of recognized substrates for addiction, such as the dopamine- and cAMP-regulated neuronal phosphoprotein PPP1R1B / DARPP-32 and the vesicular glutamate transporter SLC17A7 / VGLUT1 as well as potentially novel molecular targets for substance abuse. Through a systems biology based approach our results illustrate the utility of integrating epigenetic and transcript expression to establish relevant biological networks in the human brain for addiction. Future work with laboratory models may clarify the functional relevance of these gene networks for cocaine and alcohol, and provide a framework for the development of medications for the treatment of addiction.
Protein-Protein Interaction (PPI) networks have been widely used for the task of predicting proteins involved in cancer. Previous research has shown that functional information about the protein for which a prediction is made, proximity to specific other proteins in the PPI network, as well as local network structure are informative features in this respect. In this work, we introduce two new types of input features, reflecting additional information: (1) Functional Context: the functions of proteins interacting with the target protein (rather than the protein itself); and (2) Structural Context: the relative position of the target protein with respect to specific other proteins selected according to a novel ANOVA (analysis of variance) based measure. We also introduce a selection strategy to pinpoint the most informative features. Results show that the proposed feature types and feature selection strategy yield informative features. A standard machine learning method (Naive Bayes) that uses the features proposed here outperforms the current state-of-the-art methods by more than 5% with respect to F-measure. In addition, manual inspection confirms the biological relevance of the top-ranked features.
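The "Functional Context" idea above (describing a target protein by the functions of its interaction partners rather than its own) can be illustrated with a minimal Python sketch. The function name, the protein identifiers, and the annotation labels are all hypothetical and chosen for illustration only; the abstract does not specify how the authors encode these features.

```python
from collections import Counter


def functional_context(protein, interactions, annotations):
    """Count functional annotations among the direct partners of `protein`.

    `interactions` maps a protein to the set of proteins it interacts with;
    `annotations` maps a protein to its set of function labels.
    Both structures are assumptions made for this sketch.
    """
    counts = Counter()
    for partner in interactions.get(protein, ()):
        # Only the partners' functions are counted, not the target's own.
        counts.update(annotations.get(partner, ()))
    return counts


# Toy PPI network and annotations (made-up identifiers).
interactions = {"P1": {"P2", "P3"}, "P2": {"P1"}, "P3": {"P1"}}
annotations = {
    "P1": {"kinase"},
    "P2": {"DNA repair", "apoptosis"},
    "P3": {"apoptosis"},
}

features = functional_context("P1", interactions, annotations)
print(dict(features))
```

The resulting counts could serve as one feature vector per protein for a classifier such as Naive Bayes; how the counts are normalized or selected is left open here.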
Background: The structure of molecular networks derives from dynamical processes on evolutionary time scales. For protein interaction networks, global statistical features of their structure can now be inferred consistently from several large-throughput datasets. Understanding the underlying evolutionary dynamics is crucial for discerning random parts of the network from biologically important properties shaped by natural selection. Results: We present a detailed statistical analysis of the protein interactions in Saccharomyces cerevisiae based on several large-throughput datasets. Protein pairs resulting from gene duplications are used as tracers into the evolutionary past of the network. From this analysis, we infer rate estimates for two key evolutionary processes shaping the network: (i) gene duplications and (ii) gain and loss of interactions through mutations in existing proteins, which are referred to as link dynamics. Importantly, the link dynamics is asymmetric, i.e., the evolutionary steps are mutations in just one of the binding partners. The link turnover is shown to be much faster than gene duplications. Both processes are assembled into an empirically grounded, quantitative model for the evolution of protein interaction networks. Conclusions: According to this model, the link dynamics is the dominant evolutionary force shaping the statistical structure of the network, while the slower gene duplication dynamics mainly affects its size. Specifically, the model predicts (i) a broad distribution of the connectivities (i.e., the number of binding partners of a protein) and (ii) correlations between the connectivities of interacting proteins, a specific consequence of the asymmetry of the link dynamics. Both features have been observed in the protein interaction network of S. cerevisiae.
Different computational approaches have been examined and compared for inferring network relationships from time-series genomic data on human disease mechanisms under the recent Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenge. Many of these approaches infer all possible relationships among all candidate genes, often resulting in extremely crowded candidate network relationships with many more False Positives than True Positives. To overcome this limitation, we introduce a novel approach, Module Anchored Network Inference (MANI), that constructs networks by sequentially analyzing small adjacent building blocks (modules). Using MANI, we inferred a 7-gene adipogenesis network based on time-series gene expression data during adipocyte differentiation. MANI was also applied to infer two 10-gene networks based on time-course perturbation datasets from the DREAM3 and DREAM4 challenges. In these applications, MANI accurately inferred and distinguished serial, parallel, and time-dependent gene interactions and network cascades, showing superior performance to other in silico network inference techniques for discovering and reconstructing gene network relationships.
There is an urgent need to detect cancer at early stages in order to treat it, to improve patients' lifespans, and even to cure it. In this work, we determined the entropic contributions of genes in cancer networks. We detected sudden changes in entropy values in melanoma, hepatocellular carcinoma, pancreatic cancer, and squamous lung cell carcinoma associated with transitions from healthy controls to cancer. We also identified the most relevant genes involved in the carcinogenic process of the four types of cancer with the help of entropic changes in local networks. Their corresponding proteins could be used as potential targets for treatments and as biomarkers of cancer.
Zahadat, Payam; Christensen, David Johan; Schultz, Ulrik Pagh
Designing controllers for modular robots is difficult due to the distributed and dynamic nature of the robots. In this paper fractal gene regulatory networks are evolved to control modular robots in a distributed way. Experiments with different morphologies of modular robot are performed and the ...
Zahadat, Payam; Christensen, David Johan; Katebi, Serajeddin
In this paper we study fractal gene regulatory network (FGRN) controllers based on sensory information. The FGRN controllers are evolved to control a snake robot consisting of seven simulated ATRON modules. Each module contains three tilt sensors which represent the direction of gravity in the co...
Neiburger, E J
Sumer, an empire in ancient Mesopotamia (southern Iraq), is well known as the cradle of our modern civilization and the home of biblical Abraham. An analysis of skeletal remains from cemeteries at the ancient cities of Ur and Kish (circa 2000 B.C.) shows a genetically homogeneous, diseased, and short-lived population. These ancient Mesopotamians suffered severe dental attrition (95 percent), periodontal disease (42 percent), and caries (2 percent). Many oral congenital and neoplastic lesions were noted. During this period, the "local dentists" knew only a few modern dental techniques. Skeletal (dental) evidence indicates that the population suffered from chronic malnutrition. Malnutrition was probably caused by famine, which is substantiated in historic cuneiform and biblical writings, geologic strata samples, and analysis of skeletal and forensic dental pathology. These people had modern dentition but relatively poor dental health. The population's lack of malocclusions, caries, and TMJ problems appears to be due to flat plane occlusion.
Nguyen, Don Duy; Wu, Cheng-Hsuan; Moree, Wilna J.; Lamsa, Anne; Medema, Marnix H.; Zhao, Xiling; Gavilan, Ronnie G.; Aparicio, Marystella; Atencio, Librada; Jackson, Chanaye; Ballesteros, Javier; Sanchez, Joel; Watrous, Jeramie D.; Phelan, Vanessa V.; van de Wiel, Corine; Kersten, Roland D.; Mehnaz, Samina; De Mot, René; Shank, Elizabeth A.; Charusanti, Pep; Nagarajan, Harish; Duggan, Brendan M.; Moore, Bradley S.; Bandeira, Nuno; Palsson, Bernhard Ø.; Pogliano, Kit; Gutiérrez, Marcelino; Dorrestein, Pieter C.
The ability to correlate the production of specialized metabolites to the genetic capacity of the organism that produces such molecules has become an invaluable tool in aiding the discovery of biotechnologically applicable molecules. Here, we accomplish this task by matching molecular families with gene cluster families, making these correlations to 60 microbes at one time instead of connecting one molecule to one organism at a time, such as how it is traditionally done. We can correlate these families through the use of nanospray desorption electrospray ionization MS/MS, an ambient pressure MS technique, in conjunction with MS/MS networking and peptidogenomics. We matched the molecular families of peptide natural products produced by 42 bacilli and 18 pseudomonads through the generation of amino acid sequence tags from MS/MS data of specific clusters found in the MS/MS network. These sequence tags were then linked to biosynthetic gene clusters in publicly accessible genomes, providing us with the ability to link particular molecules with the genes that produced them. As an example of its use, this approach was applied to two unsequenced Pseudoalteromonas species, leading to the discovery of the gene cluster for a molecular family, the bromoalterochromides, in the previously sequenced strain P. piscicida JCM 20779T. The approach itself is not limited to 60 related strains, because spectral networking can be readily adopted to look at molecular family–gene cluster families of hundreds or more diverse organisms in one single MS/MS network. PMID:23798442
Several studies have reported gene expression signatures that predict recurrence risk in stage II and III colorectal cancer (CRC) patients with minimal gene membership overlap and undefined biological relevance. The goal of this study was to investigate biological themes underlying these signatures, to infer genes of potential mechanistic importance to the CRC recurrence phenotype and to test whether accurate prognostic models can be developed using mechanistically important genes. We investigated eight published CRC gene expression signatures and found no functional convergence in Gene Ontology enrichment analysis. Using a random walk-based approach, we integrated these signatures and publicly available somatic mutation data on a protein-protein interaction network and inferred 487 genes that were plausible candidate molecular underpinnings for the CRC recurrence phenotype. We named the list of 487 genes a NEM signature because it integrated information from Network, Expression, and Mutation. The signature showed significant enrichment in four biological processes closely related to cancer pathophysiology and provided good coverage of known oncogenes, tumor suppressors, and CRC-related signaling pathways. A NEM signature-based Survival Support Vector Machine prognostic model was trained using a microarray gene expression dataset and tested on an independent dataset. The model-based scores showed a 75.7% concordance with the real survival data and separated patients into two groups with significantly different relapse-free survival (p = 0.002). Similar results were obtained with reversed training and testing datasets (p = 0.007). Furthermore, adjuvant chemotherapy was significantly associated with prolonged survival of the high-risk patients (p = 0.006), but not beneficial to the low-risk patients (p = 0.491). The NEM signature not only reflects CRC biology but also informs patient prognosis and treatment response. Thus, the network
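The abstract above mentions a "random walk-based approach" on a protein-protein interaction network without giving details. One common technique of this kind is a random walk with restart, sketched below in Python as an illustration only; it is an assumption that the study used anything similar. The function name, the restart probability, and the four-node toy graph are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    `adjacency` maps each node to a list of neighbours (undirected graph,
    every node assumed to have at least one neighbour); `seeds` is the set
    of start nodes, e.g. signature or mutated genes. At each step the walker
    jumps back to a seed with probability `restart`.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1.0 - restart) * prob[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Toy path graph A - B - C - D with a single seed gene "A".
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_network, {"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Nodes closer to the seeds receive higher scores, so ranking by score and applying a cutoff is one plausible way to arrive at a candidate gene list.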
Mukundi, Eric; Gomez-Cano, Fabio; Ouma, Wilberforce Zachary; Grotewold, Erich
Developing a knowledge base that contains all the information necessary for the researcher studying gene regulation in a particular organism can be accomplished in four stages. This begins with defining the data scope. We describe here the necessary information and resources, and outline the methods for obtaining data. The second stage consists of designing the schema, which involves defining the entire arrangement of the database in a systematic plan. The third stage is the implementation, defined by actualization of the database by using software according to a predefined schema. The final stage is development, where the database is made available to users in a web-accessible system. The result is a knowledgebase that integrates all the information pertaining to gene regulation, and which is easily expandable and transferable.
Formation of a dorsoventral axis is a key event in the early development of most animal embryos. It is well established that bone morphogenetic proteins (Bmps) and Wnts are key mediators of dorsoventral patterning in vertebrates. In the cephalochordate amphioxus, genes encoding Bmps and transcription factors downstream of Bmp signaling such as Vent are expressed in patterns reminiscent of those of their vertebrate orthologues. However, the key question is whether the conservation of expression patterns of network constituents implies conservation of functional network interactions, and if so, how an increased functional complexity can evolve. Using heterologous systems, namely by reporter gene assays in mammalian cell lines and by transgenesis in medaka fish, we have compared the gene regulatory network implicated in dorsoventral patterning of the basal chordate amphioxus and vertebrates. We found that Bmp but not canonical Wnt signaling regulates promoters of genes encoding homeodomain proteins AmphiVent1 and AmphiVent2. Furthermore, AmphiVent1 and AmphiVent2 promoters appear to be correctly regulated in the context of a vertebrate embryo. Finally, we show that AmphiVent1 is able to directly repress promoters of AmphiGoosecoid and AmphiChordin genes. Repression of genes encoding dorsal-specific signaling molecule Chordin and transcription factor Goosecoid by Xenopus and zebrafish Vent genes represents a key regulatory interaction during vertebrate axis formation. Our data indicate high evolutionary conservation of a core Bmp-triggered gene regulatory network for dorsoventral patterning in chordates and suggest that co-option of the canonical Wnt signaling pathway for dorsoventral patterning in vertebrates represents one of the innovations through which an increased morphological complexity of the vertebrate embryo is achieved.
Maldonado, Elaina M.; Leoncikas, Vytautas; Fisher, Ciarán P.; Moore, J. Bernadette; Plant, Nick J.
The scope of physiologically based pharmacokinetic (PBPK) modeling can be expanded by assimilation of the mechanistic models of intracellular processes from the systems biology field. The genome scale metabolic networks (GSMNs) represent a whole set of metabolic enzymes expressed in human tissues. Dynamic models of the gene regulation of key drug metabolism enzymes are available. Here, we introduce GSMNs and review ongoing work on integration of PBPK, GSMNs, and metabolic gene regulation. We demonstrate example models. PMID:28782239
McDougall, Carmel; Degnan, Bernard M
Evidence that conserved developmental gene-regulatory networks can change as a unit during deuterostome evolution emerges from a study published in BMC Biology. This shows that genes consistently expressed in anterior brain patterning in hemichordates and chordates are expressed in a similar spatial pattern in another deuterostome, an asteroid echinoderm (sea star), but in a completely different developmental context (the animal-vegetal axis). This observation has implications for hypotheses on the type of development present in the deuterostome common ancestor.
The analysis of gene expression data has shown that transcriptionally coordinated (co-expressed) genes are often functionally related, enabling scientists to use expression data in gene function prediction. This Focused Review discusses our original paper (Large-scale co-expression approach to dissect secondary cell wall formation across plant species, Frontiers in Plant Science 2:23). In this paper we applied cross-species analysis to co-expression networks of genes involved in cellulose biosynthesis. We show that the co-expression networks from different species are highly similar, indicating that whole biological pathways are conserved across species. This finding has two important implications. First, the analysis can transfer gene function annotation from well-studied plants, such as Arabidopsis, to other, uncharacterized plant species. As the analysis finds genes that have similar sequence and similar expression pattern across different organisms, functionally equivalent genes can be identified. Second, since co-expression analyses are often noisy, a comparative analysis should have higher performance, as parts of co-expression networks that are conserved are more likely to be functionally relevant. In this Focused Review, we outline the comparative analysis done in the original paper and comment on the recent advances and approaches that allow comparative analyses of co-function networks. We hypothesize that, in comparison to simple co-expression analysis, comparative analysis would yield more accurate gene function predictions. Finally, by combining comparative analysis with genomic information of green plants, we propose a possible composition of cellulose biosynthesis machinery during earlier stages of plant evolution.
Yang, Yunfeng; Harris, Daniel P.; Luo, Feng; Xiong, Wenlu; Joachimiak, Marcin; Wu, Liyou; Dehal, Paramvir; Jacobsen, Janet; Yang, Zamin; Palumbo, Anthony V.; Arkin, Adam P.; Zhou, Jizhong
Background: Iron homeostasis of Shewanella oneidensis, a gamma-proteobacterium possessing high iron content, is regulated by a global transcription factor Fur. However, knowledge is incomplete about other biological pathways that respond to changes in iron concentration, as well as details of the responses. In this work, we integrate physiological, transcriptomics and genetic approaches to delineate the iron response of S. oneidensis. Results: We show that the iron response in S. oneidensis is a rapid process. Temporal gene expression profiles were examined for iron depletion and repletion, and a gene co-expression network was reconstructed. Modules of iron acquisition systems, anaerobic energy metabolism and protein degradation were the most noteworthy in the gene network. Bioinformatics analyses suggested that genes in each of the modules might be regulated by DNA-binding proteins Fur, CRP and RpoH, respectively. Closer inspection of these modules revealed a transcriptional regulator (SO2426) involved in iron acquisition and ten transcriptional factors involved in anaerobic energy metabolism. Selected genes in the network were analyzed by genetic studies. Disruption of genes encoding a putative alcaligin biosynthesis protein (SO3032) and a gene previously implicated in protein degradation (SO2017) led to severe growth deficiency under iron depletion conditions. Disruption of a novel transcriptional factor (SO1415) caused deficiency in both anaerobic iron reduction and growth with thiosulfate or TMAO as an electronic acceptor, suggesting that SO1415 is required for specific branches of anaerobic energy metabolism pathways. Conclusions: Using a reconstructed gene network, we identified major biological pathways that were differentially expressed during iron depletion and repletion. Genetic studies not only demonstrated the importance of iron acquisition and protein degradation for iron depletion, but also characterized a novel transcriptional factor (SO1415) with a
Sewak, Mihir S.; Reddy, Narender P.; Duan, Zhong-Hui
Analysis of gene expression data provides an objective and efficient technique for sub-classification of leukemia. The purpose of the present study was to design committee neural network-based classification systems to subcategorize leukemia gene expression data. In the study, a binary classification system was considered to differentiate acute lymphoblastic leukemia from acute myeloid leukemia. A ternary classification system which classifies leukemia expression data into three subclasses...
Fan, Yue; Wang, Xiao; Peng, Qinke
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab p...
Ludovini, Vienna; Bianconi, Fortunato; Siggillino, Annamaria; Piobbico, Danilo; Vannucci, Jacopo; Metro, Giulio; Chiari, Rita; Bellezza, Guido; Puma, Francesco; Della Fazia, Maria Agnese; Servillo, Giuseppe; Crinò, Lucio
Risk assessment and treatment choice remain a challenge in early non-small-cell lung cancer (NSCLC). The aim of this study was to identify novel genes involved in the risk of early relapse (ER) compared to no relapse (NR) in resected lung adenocarcinoma (AD) patients using a combination of high throughput technology and computational analysis. We identified 18 patients (13 NR and 5 ER) with stage I AD. Frozen samples of patients in ER, NR and corresponding normal lung (NL) were subjected to microarray technology and quantitative PCR (Q-PCR). A gene network computational analysis was performed to select predictive genes. An independent set of 79 stage I AD samples was used to validate selected genes by Q-PCR. From microarray analysis we selected 50 genes, using the fold change ratio of ER versus NR. They were validated both in pool and individually in patient samples (ER and NR) by Q-PCR. Fourteen increased and 25 decreased genes showed a concordance between the two methods. They were used to perform a computational gene network analysis that identified 4 increased (HOXA10, CLCA2, AKR1B10, FABP3) and 6 decreased (SCGB1A1, PGC, TFF1, PSCA, SPRR1B and PRSS1) genes. Moreover, in an independent dataset of AD samples, we showed that both high FABP3 expression and low SCGB1A1 expression were associated with a worse disease-free survival (DFS). Our results indicate that it is possible to define, through gene expression and computational analysis, a characteristic gene profile of patients with an increased risk of relapse that may become a tool for patient selection for adjuvant therapy.
Smita, Shuchi; Katiyar, Amit; Chinnusamy, Viswanathan; Pandey, Dev M; Bansal, Kailash C
The MYB transcription factor (TF) family is one of the largest TF families and regulates defense responses to various stresses, hormone signaling as well as many metabolic and developmental processes in plants. Understanding these regulatory hierarchies of gene expression networks in response to developmental and environmental cues is a major challenge due to the complex interactions between the genetic elements. Correlation analyses are useful to unravel co-regulated gene pairs governing biological processes as well as to identify new candidate hub genes in response to these complex processes. High throughput expression profiling data are highly useful for construction of co-expression networks. In the present study, we utilized transcriptome data for comprehensive regulatory network studies of MYB TFs by "top-down" and "guide-gene" approaches. More than 50% of OsMYBs were strongly correlated under 50 experimental conditions with 51 hub genes via the "top-down" approach. Further, clusters were identified using Markov Clustering (MCL). To maximize the clustering performance, parameter evaluation of the MCL inflation score (I) was performed in terms of enriched GO categories by measuring F-score. Comparison of co-expressed clusters and clades from phylogenetic analysis signifies their evolutionarily conserved co-regulatory role. We utilized a compendium of known interactions and biological roles with Gene Ontology enrichment analysis to hypothesize the function of co-expressed OsMYBs. In the other part, the transcriptional regulatory network analysis by the "guide-gene" approach revealed 40 putative targets of 26 OsMYB TF hubs with high correlation values utilizing 815 microarray data. The putative targets' enrichment of MYB-binding cis-elements in their promoter regions, functional co-occurrence as well as nuclear localization support our finding. In particular, enrichment of MYB binding regions involved in drought-inducibility implies their regulatory role in drought response in rice
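The correlation-based construction of a co-expression network described above can be sketched in a few lines of Python. This is a simplified illustration, assuming Pearson correlation and a fixed cutoff; the abstract does not state which correlation measure or threshold was used, and the gene names and expression values below are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy expression profiles across four conditions (hypothetical values).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(expression)
print(edges)
```

In such a network, a gene connected to many others would be a candidate hub; a "guide-gene" style analysis would instead restrict the pairs to those involving a chosen set of transcription factors.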
Yang, M-R; Zhang, Y; Wu, X-X; Chen, W
RNA-seq data of hepatocellular carcinoma (HCC) were analyzed to identify critical genes related to pathogenesis and prognosis. Three RNA-seq datasets of HCC (GSE69164, GSE63863 and GSE55758) were downloaded from Gene Expression Omnibus (GEO), while another dataset including 54 HCC cases with survival time was obtained from The Cancer Genome Atlas (TCGA). Differentially expressed genes (DEGs) were identified by the significance analysis of microarrays (SAM) method using the R package samr. We then constructed a protein-protein interaction (PPI) network based on the information in the Human Protein Reference Database (HPRD). Modules in the PPI network were identified with the MCODE method using the clusterViz plugin of Cytoscape. Gene Ontology (GO) enrichment analysis and pathway enrichment analysis were performed with DAVID. The difference in survival curves was analyzed with the Kaplan-Meier (K-M) method using the package survival. A total of 2572 DEGs were identified in the 3 datasets from GEO (GSE69164, GSE63863 and GSE55758). The PPI network was constructed including 660 nodes and 1008 edges, and 4 modules were disclosed in the network. Module A (containing 244 DEGs) was found to be closely related to HCC; its genes were involved in transcription factor binding, protein metabolism as well as regulation of apoptosis. Nine hub genes were identified in module A, including PRKCA, YWHAZ, KRT18, NDRG1, HSPA1A, HSP90AA1, HSF1, IKGKB and UBE21. The network provides the protein-protein interactions of these critical genes, which were implicated in the pathogenesis of HCC. Survival analysis showed that there is a significant difference between two groups classified by the genes in module A. Further univariate Cox regression analysis showed that 72 genes were significantly associated with survival time, such as NPM1, PRKDC, SPARC, HMGA1, COL1A1 and COL1A2. Nine critical genes related to the pathogenesis and 72 potential prognostic markers were revealed in HCC by the network and module
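For readers unfamiliar with the Kaplan-Meier method named above, the following Python sketch computes the product-limit survival estimate from scratch. It is a teaching illustration with invented follow-up times, not a reproduction of the study's analysis, which the abstract says was run with the R package survival.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where an event was observed.

    `times` are follow-up times; `events` are 1 if the event (e.g. death)
    was observed at that time and 0 if the case was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all cases that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for five cases (time in months).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Comparing two groups, as in the module A analysis, would mean computing one such curve per group and testing the difference, for example with a log-rank test, which is not shown here.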
Penfold, Christopher A; Shifaz, Ahmed; Brown, Paul E; Nicholson, Ann; Wild, David L
Here we introduce the causal structure identification (CSI) package, a Gaussian process based approach to inferring gene regulatory networks (GRNs) from multiple time series data. The standard CSI approach infers a single GRN via joint learning from multiple time series datasets; the hierarchical approach (HCSI) infers a separate GRN for each dataset, albeit with the networks constrained to favor similar structures, allowing for the identification of context specific networks. The software is implemented in MATLAB and includes a graphical user interface (GUI) for user friendly inference. Finally the GUI can be connected to high performance computer clusters to facilitate analysis of large genomic datasets.
Davin, Nicolas; Edger, Patrick P; Hefer, Charles A; Mizrachi, Eshchar; Schuetz, Mathias; Smets, Erik; Myburg, Alexander A; Douglas, Carl J; Schranz, Michael E; Lens, Frederic
Many plant genes are known to be involved in the development of cambium and wood, but how the expression and functional interaction of these genes determine the unique biology of wood remains largely unknown. We used the soc1ful loss of function mutant - the woodiest genotype known in the otherwise herbaceous model plant Arabidopsis - to investigate the expression and interactions of genes involved in secondary growth (wood formation). Detailed anatomical observations of the stem in combination with mRNA sequencing were used to assess transcriptome remodeling during xylogenesis in wild-type and woody soc1ful plants. To interpret the transcriptome changes, we constructed functional gene association networks of differentially expressed genes using the STRING database. This analysis revealed functionally enriched gene association hubs that are differentially expressed in herbaceous and woody tissues. In particular, we observed the differential expression of genes related to mechanical stress and jasmonate biosynthesis/signaling during wood formation in soc1ful plants that may be an effect of greater tension within woody tissues. Our results suggest that habit shifts from herbaceous to woody life forms observed in many angiosperm lineages could have evolved convergently by genetic changes that modulate the gene expression and interaction network, and thereby redeploy the conserved wood developmental program. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
This paper proposes a novel algorithm for inferring gene regulatory networks which makes use of cubature Kalman filter (CKF) and Kalman filter (KF) techniques in conjunction with compressed sensing methods. The gene network is described using a state-space model. A nonlinear model for the evolution of gene expression is considered, while the gene expression data is assumed to follow a linear Gaussian model. The hidden states are estimated using CKF. The system parameters are modeled as a Gauss-Markov process and are estimated using compressed sensing-based KF. These parameters provide insight into the regulatory relations among the genes. The Cramér-Rao lower bound of the parameter estimates is calculated for the system model and used as a benchmark to assess the estimation accuracy. The proposed algorithm is evaluated rigorously using synthetic data in different scenarios which include different number of genes and varying number of sample points. In addition, the algorithm is tested on the DREAM4 in silico data sets as well as the in vivo data sets from IRMA network. The proposed algorithm shows superior performance in terms of accuracy, robustness, and scalability.
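For readers unfamiliar with the filtering step, the predict/update cycle of an ordinary linear Kalman filter can be sketched in one dimension. This is only a scalar illustration, not the cubature or compressed-sensing variant proposed in the paper; the function name and the parameters `a`, `h`, `q`, `r`, `x0`, `p0` are assumed notation for the state transition, observation gain, process noise, observation noise and initial state.

```python
def kalman_filter(observations, a=1.0, h=1.0, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, z_t = h*x_t + v_t."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: propagate the state estimate and its variance.
        x = a * x
        p = a * p * a + q
        # Update: blend the prediction with the new observation.
        k = p * h / (h * p * h + r)   # Kalman gain
        x = x + k * (z - h * x)
        p = (1.0 - k * h) * p
        estimates.append(x)
    return estimates


# Hypothetical constant signal observed without noise; the estimate
# moves from the initial guess 0.0 towards the observed level 2.0.
estimates = kalman_filter([2.0] * 50)
```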
Wang, Huan; Hu, Jing-Bo; Xu, Chuan-Yun; Zhang, De-Hai; Yan, Qian; Xu, Ming; Cao, Ke-Fei; Zhang, Xu-Sheng
The complex network approach has become an effective way to describe interrelationships among large amounts of biological data, and is especially useful in finding core functions and global behavior of biological systems. Hypertension is a complex disease with many causes, including genetic, physiological, psychological and even social factors. In this paper, based on the information of biological pathways, we construct a network model of hypertension-related genes of the salt-sensitive rat to explore the interrelationship between genes. Statistical and topological characteristics show that the network has the small-world but not scale-free property, and exhibits a modular structure, revealing compact and complex connections among these genes. By the threshold of integrated centrality larger than 0.71, seven key hub genes are found: Jun, Rps6kb1, Cycs, Creb312, Cdk4, Actg1 and RT1-Da. These genes should play an important role in hypertension, suggesting that the treatment of hypertension should focus on the combination of drugs on multiple genes.
Khurana, Vikram; Peng, Jian; Chung, Chee Yeun; Auluck, Pavan K; Fanning, Saranna; Tardiff, Daniel F; Bartels, Theresa; Koeva, Martina; Eichhorn, Stephen W; Benyamini, Hadar; Lou, Yali; Nutter-Upham, Andy; Baru, Valeriya; Freyzon, Yelena; Tuncbag, Nurcan; Costanzo, Michael; San Luis, Bryan-Joseph; Schöndorf, David C; Barrasa, M Inmaculada; Ehsani, Sepehr; Sanjana, Neville; Zhong, Quan; Gasser, Thomas; Bartel, David P; Vidal, Marc; Deleidi, Michela; Boone, Charles; Fraenkel, Ernest; Berger, Bonnie; Lindquist, Susan
Numerous genes and molecular pathways are implicated in neurodegenerative proteinopathies, but their inter-relationships are poorly understood. We systematically mapped molecular pathways underlying the toxicity of alpha-synuclein (α-syn), a protein central to Parkinson's disease. Genome-wide screens in yeast identified 332 genes that impact α-syn toxicity. To "humanize" this molecular network, we developed a computational method, TransposeNet. This integrates a Steiner prize-collecting approach with homology assignment through sequence, structure, and interaction topology. TransposeNet linked α-syn to multiple parkinsonism genes and druggable targets through perturbed protein trafficking and ER quality control as well as mRNA metabolism and translation. A calcium signaling hub linked these processes to perturbed mitochondrial quality control and function, metal ion transport, transcriptional regulation, and signal transduction. Parkinsonism gene interaction profiles spatially opposed in the network (ATP13A2/PARK9 and VPS35/PARK17) were highly distinct, and network relationships for specific genes (LRRK2/PARK8, ATXN2, and EIF4G1/PARK18) were confirmed in patient induced pluripotent stem cell (iPSC)-derived neurons. This cross-species platform connected diverse neurodegenerative genes to proteinopathy through specific mechanisms and may facilitate patient stratification for targeted therapy. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Werhli, Adriano V; Husmeier, Dirk
There have been various attempts to improve the reconstruction of gene regulatory networks from microarray data by the systematic integration of biological prior knowledge. Our approach is based on pioneering work by Imoto et al. where the prior knowledge is expressed in terms of energy functions, from which a prior distribution over network structures is obtained in the form of a Gibbs distribution. The hyperparameters of this distribution represent the weights associated with the prior knowledge relative to the data. We have derived and tested a Markov chain Monte Carlo (MCMC) scheme for sampling networks and hyperparameters simultaneously from the posterior distribution, thereby automatically learning how to trade off information from the prior knowledge and the data. We have extended this approach to a Bayesian coupling scheme for learning gene regulatory networks from a combination of related data sets, which were obtained under different experimental conditions and are therefore potentially associated with different active subpathways. The proposed coupling scheme is a compromise between (1) learning networks from the different subsets separately, whereby no information between the different experiments is shared; and (2) learning networks from a monolithic fusion of the individual data sets, which does not provide any mechanism for uncovering differences between the network structures associated with the different experimental conditions. We have assessed the viability of all proposed methods on data related to the Raf signaling pathway, generated both synthetically and in cytometry experiments.
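The Gibbs prior described in this abstract can be written out as follows. The symbols are assumed notation, not taken from the abstract: $G$ is a network structure, $E(G)$ the energy measuring disagreement with the prior knowledge, $\beta$ the hyperparameter weighting the prior, and $D$ the data.

```latex
P(G \mid \beta) = \frac{e^{-\beta E(G)}}{Z(\beta)},
\qquad
Z(\beta) = \sum_{G'} e^{-\beta E(G')},
\qquad
P(G, \beta \mid D) \propto P(D \mid G)\, P(G \mid \beta)\, P(\beta).
```

With $\beta = 0$ the prior is flat over structures, so the data alone drive the inference; larger $\beta$ pulls sampled networks towards the prior knowledge, which is why sampling $\beta$ jointly with $G$ lets the scheme learn the trade-off.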
Background: The availability of large-scale high-throughput data poses considerable challenges for their functional analysis. For this reason, gene network inference methods have gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. Results: In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET, a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead to a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide a user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. Conclusions: The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy to use software package c3net may contribute to the popularization of such methods. Reviewers: This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev.
Xenitidis, P; Seimenis, I; Kakolyris, S; Adamopoulos, A
High-throughput technology like microarrays is widely used in the inference of gene regulatory networks (GRNs). We focused on time series data since we are interested in the dynamics of GRNs and the identification of dynamic networks. We evaluated the amount of information that exists in artificial time series microarray data and the ability of an inference process to produce accurate models based on them. We used dynamic artificial gene regulatory networks in order to create artificial microarray data. Key features that characterize microarray data such as the time separation of directly triggered genes, the percentage of directly triggered genes and the triggering function type were altered in order to reveal the limits that are imposed by the nature of microarray data on the inference process. We examined the effect of various factors on the inference performance such as the network size, the presence of noise in microarray data, and the network sparseness. We used a system theory approach and examined the relationship between the pole placement of the inferred system and the inference performance. We examined the relationship between the inference performance in the time domain and the true system parameter identification. Simulation results indicated that time separation and the percentage of directly triggered genes are crucial factors. Also, network sparseness, the triggering function type and noise in input data affect the inference performance. When two factors were simultaneously varied, it was found that variation of one parameter significantly affects the dynamic response of the other. Crucial factors were also examined using a real GRN and acquired results confirmed simulation findings with artificial data. Different initial conditions were also used as an alternative triggering approach. Relevant results confirmed that the number of datasets constitutes the most significant parameter with regard to the inference performance. Copyright © 2017 Elsevier
Yang, Lei; Wang, Jizhe; Lv, Yingli; Hao, Dapeng; Zuo, Yongchun; Li, Xiang; Jiang, Wei
The TATA box is the core sequence of the promoter and the binding site of many transcription factors. Based on the presence or absence of TATA box, genes can be defined as TATA-containing or TATA-less genes. Many important stress-response functions and highly variable expression patterns are found to be correlated with the TATA box. However, until now, the relationships and differences between TATA-containing and TATA-less genes remain unclear. In this study, based on the transcriptional profiling of the Saccharomyces cerevisiae genome, the perturbation sensitivity (PS) network is constructed. The topological and biological properties are used to investigate differences between TATA-containing and TATA-less genes. Significant differences are found in all topological properties and most of the biological properties. Notably, the TF number, determined mathematically by the number of transcription factors regulating a gene, demonstrates the highest discrimination between TATA-containing and TATA-less genes when all properties are estimated by the F-score. Copyright © 2014 Elsevier Inc. All rights reserved.
The identification of disease-causing genes is a fundamental challenge in human health and of great importance in improving medical care, and provides a better understanding of gene functions. Recent computational approaches based on the interactions among human proteins and disease similarities have shown their power in tackling the issue. In this paper, a novel systematic and global method that integrates two heterogeneous networks for prioritizing candidate disease-causing genes is provided, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein interactions. In this method, the association score function between a query disease and a candidate gene is defined as the weighted sum of all the association scores between similar diseases and neighbouring genes. Moreover, the topological correlation of these two heterogeneous networks can be incorporated into the definition of the score function, and finally an iterative algorithm is designed for this issue. This method was tested with 10-fold cross-validation on all 1,126 diseases that have at least one known causal gene, and it ranked the correct gene as one of the top ten in 622 of all the 1,428 cases, significantly outperforming a state-of-the-art method called PRINCE. The results brought about by this method were applied to study three multi-factorial disorders: breast cancer, Alzheimer disease and diabetes mellitus type 2, and some suggestions of novel causal genes and candidate disease-causing subnetworks were provided for further investigation.
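The weighted-sum score described in this abstract can be sketched in a single, non-iterative step. The exact weighting and the iterative algorithm are not given in the abstract, so the product of disease similarity and edge weight used here, the function name `association_score`, and the toy identifiers (D1, G1, and so on) are all assumptions for illustration.

```python
def association_score(disease, gene, disease_sim, gene_adj, known):
    """One-step weighted sum over similar diseases and neighbouring genes.

    disease_sim -- {disease: {similar_disease: similarity}}
    gene_adj    -- {gene: {neighbour_gene: interaction_weight}}
    known       -- {(disease, gene): prior association score}
    """
    total = 0.0
    for d2, sim in disease_sim.get(disease, {}).items():
        for g2, weight in gene_adj.get(gene, {}).items():
            total += sim * weight * known.get((d2, g2), 0.0)
    return total


# Hypothetical toy networks.
disease_sim = {"D1": {"D2": 0.9, "D3": 0.2}}
gene_adj = {"G1": {"G2": 1.0, "G3": 0.5}, "G4": {"G3": 1.0}}
known = {("D2", "G2"): 1.0, ("D3", "G3"): 1.0}

scores = {g: association_score("D1", g, disease_sim, gene_adj, known)
          for g in ("G1", "G4")}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case G1 outranks G4 because it neighbours a gene already linked to the disease most similar to the query.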
... classics from ancient China. The assumption is that since China's political and military leaders state openly that their strategy is based on traditional Chinese strategic concepts, a study of ancient classics on strategy...
Echinoderms, which are phylogenetically related to vertebrates and produce large numbers of transparent embryos that can be experimentally manipulated, offer many advantages for the analysis of the gene regulatory networks (GRN) regulating germ layer formation. During development of the sea urchin embryo, the ectoderm is the source of signals that pattern all three germ layers along the dorsal-ventral axis. How this signaling center controls patterning and morphogenesis of the embryo is not understood. Here, we report a large-scale analysis of the GRN deployed in response to the activity of this signaling center in the embryos of the Mediterranean sea urchin Paracentrotus lividus, in which studies with high spatial resolution are possible. By using a combination of in situ hybridization screening, overexpression of mRNA, recombinant ligand treatments, and morpholino-based loss-of-function studies, we identified a cohort of transcription factors and signaling molecules expressed in the ventral ectoderm, dorsal ectoderm, and interposed neurogenic ("ciliary band") region in response to the known key signaling molecules Nodal and BMP2/4 and defined the epistatic relationships between the most important genes. The resultant GRN showed a number of striking features. First, Nodal was found to be essential for the expression of all ventral and dorsal marker genes, and BMP2/4 for all dorsal genes. Second, goosecoid was identified as a central player in a regulatory sub-circuit controlling mouth formation, while tbx2/3 emerged as a critical factor for differentiation of the dorsal ectoderm. Finally, and unexpectedly, a neurogenic ectoderm regulatory circuit characterized by expression of "ciliary band" genes was triggered in the absence of TGF beta signaling. We propose a novel model for ectoderm regionalization, in which neural ectoderm is the default fate in the absence of TGF beta signaling, and suggest that the stomodeal and neural subcircuits that we
Elucidating gene regulatory networks (GRNs) from large-scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus-driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression data associated with a known regulatory network is similar to that with an unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks.
In this series of articles, we intend to have a glimpse of some of the landmarks in ancient Indian mathematics with special emphasis on number theory. This issue features a brief overview of some of the high peaks of mathematics in ancient India. In the next part we shall describe Aryabhata's general solution in integers ...
Gadecki, Victoria L.
Standing in awe in Xian, China, at the Terra Cotta warrior archaeological site, the author thought of sharing this experience and excitement with her sixth-grade students. She decided to let her students carve patterns of the ancient soldiers to understand their place in Chinese history. They would make block prints and print multiple soldiers on…
Hobert, Leah; Binello, Emanuela
Trepanation, the process of making a burr hole in the skull to access the brain, is an ancient form of a primitive craniotomy. There is widespread evidence of contributions made to this practice by ancient civilizations in Europe, Africa, and South America, where archaeologists have unearthed thousands of trepanned skulls dating back to the Neolithic period. Little is known about trepanation in China, and it is commonly believed that the Chinese used only traditional Chinese medicine and nonsurgical methods for treating brain injuries. However, a thorough analysis of the available archeological and literary evidence reveals that trepanation was widely practiced throughout China thousands of years ago. A significant number of trepanned Chinese skulls have been unearthed showing signs of healing and suggesting that patients survived after surgery. Trepanation was likely performed for therapeutic and spiritual reasons. Medical and historical works from Chinese literature contain descriptions of primitive neurosurgical procedures, including stories of surgeons, such as the legendary Hua Tuo, and surgical techniques used for the treatment of brain pathologies. The lack of translation of Chinese reports into the English language and the lack of publications on this topic in the English language may have contributed to the misconception that ancient China was devoid of trepanation. This article summarizes the available evidence attesting to the performance of successful primitive cranial surgery in ancient China. Copyright © 2016 Elsevier Inc. All rights reserved.
Turk, Laraine D.
"Ancient Egypt," an upper-division, non-required history course covering Egypt from pre-dynastic time through the Roman domination is described. General descriptive information is presented first, including the method of grading, expectation of student success rate, long-range course objectives, procedures for revising the course, major…
This teacher resource book provides information on ancient Egypt via short essays, photographs, maps, charts, and drawings. Egyptian social and religious life, including writing, art, architecture, and even the practice of mummification, is conveniently summarized for the teacher or other practitioner in a series of one to three page articles with…
Number Theory for its own sake, as a great 'intellectual challenge', has a long history, particularly here in India. Already in the 7th century, Brahmagupta made important contributions to what is now known (incorrectly) as Pell's equation.: Michael Atiyah (, p.913). In number theory, the grandest achievements of ancient.
Hughes, J Donald
The image of the classical Mediterranean environment of the Greeks and Romans had a formative influence on the art, literature, and historical perception of modern Europe and America. How closely is this image congruent with the ancient environment as it in reality existed? In particular, how forested was the ancient Mediterranean world, was there deforestation, and if so, what were its effects? The consensus of historians, geographers, and other scholars from the mid-nineteenth century through the first three quarters of the twentieth century was that human activities had depleted the forests to a major extent and caused severe erosion. My research confirmed this general picture. Since then, revisionist historians have questioned these conclusions, maintaining instead that little environmental damage was done to forests and soils in ancient Greco-Roman times. In a reconsideration of the question, this paper looks at recent scientific work providing proxy evidence for the condition of forests at various times in ancient history. I look at three scientific methodologies, namely anthracology, palynology, and computer modeling. Each of these avenues of research offers support for the concept of forest change, both in abundance and species composition, and episodes of deforestation and erosion, and confirms my earlier work.
The open-ended activities in this book are designed to extend the imagination and creativity of students and encourage students to examine their feelings and values about historic eras. Civilizations addressed include ancient Egypt, Greece, Rome, Mayan, Stonehenge, and Mesopotamia. The activities focus upon the cognitive and affective pupil…
SERIES | ARTICLE. Mathematics in Ancient India. 3. Brahmagupta's Lemma: The Samasabhavana. Amartya Kumar Dutta is an Associate Professor of Mathematics at the Indian Statistical Institute, Kolkata. His research interest is in commutative algebra. Part 1, An overview, Resonance, Vol.7, No.4, pp.4-19, 2002. Part 2.
There are many controversies that surround the problem of incest in Ancient Egypt. One of them is the belief that incest was practiced exclusively by the royal families, which is incorrect. I will try to show that at this time we don't have a satisfactory explanation of this kind of behavior, but that there are interesting suggestions for further research.
which plied between Kalinga and south east Asian countries. Nanda Raja, is said to have attacked Kalinga with the intention of getting access to the sea for the landlocked Kingdom of Magadha (Bihar). The ancient text Artha Sastra (3rd-4th century B...
Sport and physical education in Ancient Rome looked back to the physical ideals of the Greeks. In contrast, there was also a specific encouragement of spectacles and performance or general entertainment during the Imperial Era. In order to cater for the diverse shows, sophisticated buildings were constructed in Rome, and reproduced in all the built-up areas throughout the Empire. In fact, besides the important circus network, the most emblematic of these being the Circus Maximus, amphitheatre...
Rosenkrantz, Jesper T.; Aarts, Henk; Abee, Tjakko
Background: Salmonella Typhimurium is an important pathogen of humans and animals. It shows a broad growth range and survives in harsh conditions. The aim of this study was to analyze transcriptional responses to a number of growth and stress conditions as well as the relationship of metabolic pathways and/or cell functions at the genome-scale level by network analysis, and further to explore whether highly connected genes (hubs) in these networks were essential for growth, stress adaptation and virulence. Results: De novo generated as well as published transcriptional data for 425 selected genes under a number of growth and stress conditions were used to construct a bipartite network connecting culture conditions and significantly regulated genes (transcriptional network). Also, a genome scale network was constructed for strain LT2. The latter connected genes with metabolic pathways...
Zelezniak, Aleksej; Sheridan, Steven; Patil, Kiran Raosaheb
biological perturbations, namely gene knockout, nutrient shock and nutrient change. While the kinetic constraints applied at the level of individual reactions were found to be poor descriptors of the mRNA-metabolite relationship, their use in the context of the network enabled us to correlate changes...
Zahadat, Payam; Christensen, David Johan; Schultz, Ulrik Pagh
Designing controllers for modular robots is difficult due to the distributed and dynamic nature of the robots. In this paper fractal gene regulatory networks are evolved to control modular robots in a distributed way. Experiments with different morphologies of modular robot are performed and the ...
Yu, Yang; Liu, Jie; Feng, Nuan; Song, Bo; Zheng, Zeyu
Studies of protein modules in a Protein-Protein Interaction (PPI) network contribute greatly to the understanding of biological mechanisms. With the development of computing science, computational approaches have played an important role in locating protein modules. In this paper, a new approach combining Gene Ontology and amino acid background frequency is introduced to detect the protein modules in weighted PPI networks. The proposed approach mainly consists of three parts: the feature extraction, the weighted graph construction and the protein complex detection. Firstly, the topology-sequence information is utilized to represent the features of a protein complex. Secondly, six types of weighted graph are constructed by combining the PPI network and Gene Ontology information. Lastly, a protein complex algorithm is applied to the weighted graph, which locates the clusters based on three conditions, including density, network diameter and the included angle cosine. Experiments have been conducted on two protein complex benchmark sets for yeast and the results show that the approach is more effective than five typical algorithms in terms of f-measure and precision. The combination of the protein interaction network with sequence and gene ontology data is helpful to improve the performance and provide an optional method for protein module detection. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary 
Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.
Background: Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods: We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results: Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion: We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact: Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
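The mutual information measure used to link genes to each TF can be illustrated on discretized expression values. The abstract does not say how expression was binned or estimated, so the "hi"/"lo" labels, the function name `mutual_information` and the toy vectors below are assumptions; real expression data are continuous and would need a binning or density-estimation step first.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information in bits between two equally long label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


# Hypothetical discretized expression across four tumours.
tf = ["hi", "hi", "lo", "lo"]
gene_a = ["hi", "hi", "lo", "lo"]   # tracks the TF exactly
gene_b = ["hi", "lo", "hi", "lo"]   # unrelated to the TF

mi_a = mutual_information(tf, gene_a)
mi_b = mutual_information(tf, gene_b)
```

A co-expression network around a TF would then keep only genes whose mutual information with it exceeds some chosen cut-off.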
Varrault, Annie; Dantec, Christelle; Le Digarcher, Anne; Chotard, Laëtitia; Bilanges, Benoit; Parrinello, Hugues; Dubois, Emeric; Rialle, Stéphanie; Severac, Dany; Bouschet, Tristan; Journot, Laurent
PLAGL1/ZAC1 undergoes parental genomic imprinting, is paternally expressed, and is a member of the imprinted gene network (IGN). It encodes a zinc finger transcription factor with anti-proliferative activity and is a candidate tumor suppressor gene on 6q24 whose expression is frequently lost in various neoplasms. Conversely, gain of PLAGL1 function is responsible for transient neonatal diabetes mellitus, a rare genetic disease that results from defective pancreas development. In the present work, we showed that Plagl1 up-regulation was not associated with DNA damage-induced cell cycle arrest. Rather, it was associated with the physiological cell cycle exit that occurred with contact inhibition, growth factor withdrawal, or cell differentiation. To gain insight into the mechanism of action of Plagl1, we identified Plagl1 target genes by combining chromatin immunoprecipitation and genome-wide transcriptomics in transfected cell lines. Plagl1-elicited gene regulation correlated with multiple binding to the proximal promoter region through a GC-rich motif. Plagl1 target genes included numerous genes involved in signaling, cell adhesion, and extracellular matrix composition, including collagens. Plagl1 targets also included 22% of the 409 genes that make up the IGN. Altogether, this work identified Plagl1 as a transcription factor that coordinated the regulation of a subset of IGN genes and controlled extracellular matrix composition. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun
The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning and found that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was implemented in standard C++ using the STL and is available for Linux and MS Windows. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
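The graph regularization step described above can be illustrated with a generic Laplacian-regularized label propagation. This is a minimal sketch, not the authors' C++ implementation: the function name `graph_regularized_labels`, the weight matrix `W`, the trade-off parameter `mu`, and the toy data are all assumptions made for illustration. Labeled samples carry +1 (recurrence) or -1 (no recurrence), unlabeled samples carry 0, and the scores minimize a squared fitting error on the labeled nodes plus a smoothness penalty over the graph.

```python
import numpy as np

def graph_regularized_labels(W, y, labeled, mu=1.0):
    """Solve min_f sum_{i labeled} (f_i - y_i)^2 + mu * f^T L f in closed form.

    W       : symmetric (n x n) sample-similarity matrix (hypothetical input)
    y       : length-n label vector, +1 / -1 for labeled samples, 0 otherwise
    labeled : length-n boolean mask marking the labeled samples
    mu      : weight of the graph smoothness penalty (assumed parameter)
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    laplacian = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    S = np.diag(labeled.astype(float))          # selects the labeled samples
    # Setting the gradient to zero gives (S + mu * L) f = S y.
    # The system is solvable when the graph is connected and has a label.
    return np.linalg.solve(S + mu * laplacian, S @ y)

# Toy graph: two tight groups of three samples joined by one weak edge.
W = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
y = np.array([1, 0, 0, 0, 0, -1])               # only samples 0 and 5 are labeled
labeled = np.array([True, False, False, False, False, True])

scores = graph_regularized_labels(W, y, labeled, mu=0.5)
predicted = np.sign(scores)
print(predicted)
```

In this toy case each unlabeled sample inherits the label of the group it is most strongly connected to, which is the behavior the smoothness penalty is meant to encourage.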
Zulfiqar, Asma; Paulose, Bibin; Chhikara, Sudesh; Dhankher, Om Parkash (Department of Plant, Soil, and Insect Sciences, 270 Stockbridge Road, University of Massachusetts Amherst, MA 01003, United States)
Chromium pollution is a serious environmental problem with few cost-effective remediation strategies available. Crambe abyssinica (a member of the Brassicaceae), a non-food, fast-growing, high-biomass crop, is an ideal candidate for phytoremediation of heavy metal-contaminated soils. The present study used a PCR-Select Suppression Subtraction Hybridization approach in C. abyssinica to isolate differentially expressed genes in response to Cr exposure. A total of 72 differentially expressed subtracted cDNAs were sequenced and found to represent 43 genes. The subtracted cDNAs suggest that Cr stress significantly affects pathways related to stress/defense, ion transporters, sulfur assimilation, cell signaling, protein degradation, photosynthesis and cell metabolism. The regulation of these genes in response to Cr exposure was further confirmed by semi-quantitative RT-PCR. Characterization of these differentially expressed genes may enable the engineering of non-food, high-biomass plants, including C. abyssinica, for phytoremediation of Cr-contaminated soils and sediments. - Highlights: > Molecular mechanism of Cr uptake and detoxification in plants is not well known. > We identified differentially regulated genes upon Cr exposure in Crambe abyssinica. > 72 Cr-induced subtracted cDNAs were sequenced and found to represent 43 genes. > Pathways linked to stress, ion transport, and sulfur assimilation were affected. > This is the first Cr transcriptome study in a crop with phytoremediation potential. - This study describes the identification and isolation of differentially expressed genes involved in chromium metabolism and detoxification in a non-food industrial oil crop, Crambe abyssinica.
Background: Keloids are protrusive claw-like scars that have a propensity to recur even after surgery, and their molecular etiology remains elusive. The goal of reverse engineering is to infer gene networks from observational data, thus providing insight into the inner workings of a cell. However, most attempts at modeling biological networks have been done using simulated data. This study aims to highlight some of the issues involved in working with experimental data, and at the same time gain some insights into the transcriptional regulatory mechanism present in keloid fibroblasts. Methods: Microarray data from our previous study were combined with microarray data obtained from the literature as well as new microarray data generated by our group. For the physical approach, we used the fREDUCE algorithm for correlating expression values to binding motifs. For the influence approach, we compared the Bayesian algorithm BANJO with the information theoretic method ARACNE in terms of performance in recovering known influence networks obtained from the KEGG database. In addition, we also compared the performance of different normalization methods as well as different types of gene networks. Results: Using the physical approach, we found consensus sequences that were active in the keloid condition, as well as some sequences that were responsive to steroids, a commonly used treatment for keloids. From the influence approach, we found that BANJO was better at recovering the gene networks compared to ARACNE and that transcriptional networks were better suited for network recovery compared to cytokine-receptor interaction networks and intracellular signaling networks. We also found that the NFKB transcriptional network that was inferred from normal fibroblast data was more accurate compared to that inferred from keloid data, suggesting a more robust network in the keloid condition. Conclusions: Consensus sequences that were found from this study are
Gurtan, Allan M; Sharp, Phillip A
MicroRNAs (miRNAs) are key regulators of gene expression. They are conserved across species, expressed across cell types, and active against a large proportion of the transcriptome. The sequence-complementary mechanism of miRNA activity exploits combinatorial diversity, a property conducive to network-wide regulation of gene expression, and functional evidence supporting this hypothesized systems-level role has steadily begun to accumulate. The emerging models are exciting and will yield deep insight into the regulatory architecture of biology. However, because of the technical challenges facing the network-based study of miRNAs, many gaps remain. Here, we review mammalian miRNAs by describing recent advances in understanding their molecular activity and network-wide function. Copyright © 2013 Elsevier Ltd. All rights reserved.
Recent advances in proteomic and transcriptomic technologies have resulted in the accumulation of vast amounts of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional module discovery, where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most of the existing module discovery methods use only the interaction data, in this work we propose CLARM, which discovers biological modules by combining gene profile data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on yeast and human interaction datasets, and on gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive with well-established functional module discovery methods.
Mousavian, Zaynab; Kavousi, Kaveh; Masoudi-Nejad, Ali
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to establish a framework that is now known as information theory. In recent decades, information theory has gained much attention in the area of systems biology. The aim of this paper is to provide a systematic review of those contributions that have applied information theory to inferring or understanding biological systems. Based on the type of system components and the interactions between them, we classify the biological systems into 4 main classes: gene regulatory, metabolic, protein-protein interaction and signaling networks. In the first part of this review, we attempt to introduce most of the existing studies on two types of biological networks, gene regulatory and metabolic networks, that are founded on the concepts of information theory. Copyright © 2015 Elsevier Ltd. All rights reserved.
Inferring a gene regulatory network from time-series gene expression data in systems biology is a challenging problem. Many methods have been suggested, most of which have a scalability limitation due to the combinatorial cost of searching a regulatory set of genes. In addition, they have focused on the accurate inference of a network structure only. Therefore, there is a pressing need to develop a network inference method to search regulatory genes efficiently and to predict the network dynamics accurately. In this study, we employed a Boolean network model with a restricted update rule scheme to capture coarse-grained dynamics, and propose a novel mutual information-based Boolean network inference (MIBNI) method. Given time-series gene expression data as an input, the method first identifies a set of initial regulatory genes using mutual information-based feature selection, and then improves the dynamics prediction accuracy by iteratively swapping a pair of genes between sets of the selected regulatory genes and the other genes. Through extensive simulations with artificial datasets, MIBNI showed consistently better performance than six well-known existing methods (REVEAL, Best-Fit, RelNet, CST, CLR, and BIBN) in terms of both structural and dynamics prediction accuracy. We further tested the proposed method with two real gene expression datasets, for an Escherichia coli gene regulatory network and a fission yeast cell cycle network, and also observed better results using MIBNI compared to the six other methods. Taken together, MIBNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network.
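The first stage described above, ranking candidate regulators of a target gene by mutual information, can be sketched as follows. This is a simplified illustration and not the MIBNI implementation: it scores each gene individually against the target's next-step state and omits the iterative swapping stage. The function names `mutual_information` and `rank_regulators` and the toy time series are assumptions made for the example.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )

def rank_regulators(series, target):
    """Rank genes by MI between their state at time t and the target at t + 1.

    series : list of Boolean state vectors, one per time point (hypothetical input)
    target : index of the gene whose regulators are sought
    """
    past = series[:-1]
    future = [row[target] for row in series[1:]]
    scores = {
        gene: mutual_information([row[gene] for row in past], future)
        for gene in range(len(series[0]))
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Toy time series for three genes; gene 2 at time t + 1 is NOT gene 0 at time t.
series = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
]

ranking, scores = rank_regulators(series, target=2)
print(ranking[0], scores)
```

Because gene 0 fully determines the next state of gene 2 in the toy data, it receives the maximum score of one bit and is ranked first; a full method would then search for the Boolean update rule over the selected regulators.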
Zhang, Xue; Acencio, Marcio Luis; Lemke, Ney
Essential proteins/genes are indispensable to the survival or reproduction of an organism, and the deletion of such essential proteins will result in lethality or infertility. The identification of essential genes is very important not only for understanding the minimal requirements for survival of an organism, but also for finding human disease genes and new drug targets. Experimental methods for identifying essential genes are costly, time-consuming, and laborious. With the accumulation of sequenced genome data and high-throughput experimental data, many computational methods for identifying essential proteins have been proposed, which are useful complements to experimental methods. In this review, we present the state-of-the-art methods for identifying essential genes and proteins based on machine learning and network topological features, point out the progress and limitations of current methods, and discuss the challenges and directions for further research. PMID:27014079
Cancer transcriptome analysis is one of the leading areas of Big Data science, biomarker, and pharmaceutical discovery, not to forget personalized medicine. Yet, cancer transcriptomics and postgenomic medicine require innovation in bioinformatics as well as comparison of the performance of available algorithms. In this data analytics context, the value of network generation and algorithms has been widely underscored for addressing the salient questions in cancer pathogenesis. Analysis of the cancer transcriptome often results in complicated networks where identification of network modularity remains critical, for example, in delineating the "druggable" molecular targets. Network clustering is useful, but depends on the network topology in and of itself. Notably, the performance of different network-generating tools for network cluster (NC) identification has been little investigated to date. Hence, using gastric cancer (GC) transcriptomic datasets, we compared two algorithms for generating pathway versus gene regulatory network-based NCs, showing that the pathway-based approach better agrees with a reference set of cancer-functional contexts. Finally, by applying pathway-based NC identification to GC transcriptome datasets, we describe cancer NCs that associate with candidate therapeutic targets and biomarkers in GC. These observations collectively inform future research on cancer transcriptomics, drug discovery, and rational development of new analysis tools for optimal harnessing of omics data.
Verardo, L L; Silva, F F; Varona, L; Resende, M D V; Bastiaansen, J W M; Lopes, P S; Guimarães, S E F
The genetic improvement of reproductive traits such as the number of teats is essential to the success of the pig industry. In contrast to most SNP association studies, which consider continuous phenotypes under Gaussian assumptions, this trait is a discrete variable that could potentially follow other distributions, such as the Poisson. Therefore, to handle the complexity of a count regression that considers all SNPs simultaneously as covariates in a GWAS model, Bayesian inference tools become necessary. Another point that deserves to be highlighted in GWAS is the genetic dissection of complex phenotypes through candidate gene networks derived from significant SNPs. We present a full Bayesian treatment of SNP association analysis for number of teats, assuming alternatively Gaussian and Poisson distributions for this trait. Under this framework, significant SNP effects were identified by hypothesis tests using 95% highest posterior density intervals. These SNPs were used to construct associated candidate gene networks aiming to explain the genetic mechanism behind this reproductive trait. The Bayesian model comparisons based on the deviance posterior distribution indicated the superiority of the Gaussian model. In general, our results suggest the presence of 19 significant SNPs, which mapped to 13 genes. In addition, we predicted gene interactions through networks that are consistent with known mammalian breast biology (e.g., development of prolactin receptor signaling, and cell proliferation), captured known regulation binding sites, and provided candidate genes for that trait (e.g., TINAGL1 and ICK).
Wang, Rui-Sheng; Oldham, William M; Loscalzo, Joseph
Molecular oxygen is indispensable for cellular viability and function. Hypoxia is a stress condition in which oxygen demand exceeds supply. Low cellular oxygen content induces a number of molecular changes to activate regulatory pathways responsible for increasing the oxygen supply and optimizing cellular metabolism under limited oxygen conditions. Hypoxia plays critical roles in the pathobiology of many diseases, such as cancer, heart failure, myocardial ischemia, stroke, and chronic lung diseases. Although the complicated associations between hypoxia and cardiovascular (and cerebrovascular) diseases (CVD) have been recognized for some time, there are few studies that investigate their biological link from a systems biology perspective. In this study, we integrate hypoxia genes, CVD genes, and the human protein interactome in order to explore the relationship between hypoxia and cardiovascular diseases at a systems level. We show that hypoxia genes are much closer to CVD genes in the human protein interactome than expected by chance. We also find that hypoxia genes play significant bridging roles in connecting different cardiovascular diseases. We construct a hypoxia-CVD bipartite network and find several interesting hypoxia-CVD modules with significant gene ontology similarity. Finally, we show that hypoxia genes tend to have more CVD interactors in the human interactome than in random networks of matching topology. Based on these observations, we can predict novel genes that may be associated with CVD. This network-based association study gives us a broad view of the relationships between hypoxia and cardiovascular diseases and provides new insights into the role of hypoxia in cardiovascular biology.
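The claim that one gene set lies closer to another in an interactome than expected by chance can be checked with a simple permutation test. The sketch below is an assumption-laden illustration, not the procedure used in the study: it measures the mean shortest-path distance from each source gene to its nearest target gene and compares it with uniformly sampled gene sets of the same size, without the degree matching a real analysis would likely need. The function names, the toy path network, and the gene labels are hypothetical, and the network is assumed to be connected.

```python
import random
from collections import deque

def distance_to_set(adjacency, targets):
    """Multi-source BFS: distance from every node to its nearest target node."""
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def mean_proximity(dist, genes):
    """Average distance from the given genes to the nearest target gene."""
    return sum(dist[g] for g in genes) / len(genes)

def proximity_test(adjacency, source_genes, target_genes, n_random=1000, seed=0):
    """Return the observed proximity and an empirical one-sided p-value."""
    dist = distance_to_set(adjacency, target_genes)
    observed = mean_proximity(dist, source_genes)
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(nodes, len(source_genes))
        if mean_proximity(dist, sample) <= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (hits + 1) / (n_random + 1)
    return observed, p_value

# Toy interactome: a path of ten hypothetical genes labelled 0..9.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
hypoxia_genes = [0, 1]   # hypothetical "hypoxia" set
cvd_genes = [2, 3]       # hypothetical "CVD" set

observed, p_value = proximity_test(adjacency, hypoxia_genes, cvd_genes)
print(observed, p_value)
```

A small p-value would indicate that the source set sits unusually close to the target set; on a real interactome the random sets are usually drawn to match the degree distribution of the source genes.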
Background/Aims: Pediatric sepsis is a disease that threatens the lives of children. The incidence of pediatric sepsis is higher in developing countries for various reasons, such as insufficient immunization and nutrition, and water and air pollution. Exploring the potential genes via different methods is of significance for the prevention and treatment of pediatric sepsis. This study aimed to identify potential genes associated with pediatric sepsis using gene network and entropy analysis. Methods: The mRNA expression in blood samples collected from 20 septic children and 30 healthy controls was quantified using the Affymetrix HG-U133A microarray. Two condition-specific protein-protein interaction networks (PINs), one for the healthy controls and the other for the children with sepsis, were deduced by combining the fundamental human PINs with gene expression profiles in the two phenotypes. Subsequently, distinct modules from the two conditional networks were extracted by adopting a maximal clique-merging approach. Delta entropy (ΔS) was calculated between sepsis and control modules. Results: Key genes displaying changes in gene composition were identified by matching the control and sepsis modules. Two objective modules were obtained, in which the ribosomal proteins RPL4 and RPL9 as well as TOP2A were considered the probable key genes differentiating sepsis from healthy controls. Conclusion: According to previous reports and this work, TOP2A is a potential gene therapy target for pediatric sepsis. The relationship between pediatric sepsis and RPL4 and RPL9 needs further investigation.
Background: We have recently identified a number of Quantitative Trait Loci (QTL) contributing to the 2-fold muscle weight difference between the LG/J and SM/J mouse strains and refined their confidence intervals. To facilitate nomination of the candidate genes responsible for these differences, we examined the transcriptome of the tibialis anterior (TA) muscle of each strain by RNA-Seq. Results: 13,726 genes were expressed in mouse skeletal muscle. Intersection of a set of 1061 differentially expressed transcripts with a mouse muscle Bayesian Network identified a coherent set of differentially expressed genes that we term the LG/J and SM/J Regulatory Network (LSRN). The integration of the QTL, transcriptome and network analyses identified eight key drivers of the LSRN (Kdr, Plbd1, Mgp, Fah, Prss23, 2310014F06Rik, Grtp1, Stk10) residing within five QTL regions, which were either polymorphic or differentially expressed between the two strains and are strong candidates for quantitative trait genes (QTGs) underlying muscle mass. The insight gained from network analysis, including the ability to make testable predictions, is illustrated by annotating the LSRN with knowledge-based signatures and showing that the SM/J state of the network corresponds to a more oxidative state. We validated this prediction by NADH tetrazolium reductase staining in the TA muscle, revealing higher oxidative potential of the SM/J compared to the LG/J strain (p …). Conclusion: Thus, integration of fine resolution QTL mapping, RNA-Seq transcriptome information and mouse muscle Bayesian Network analysis provides a novel and unbiased strategy for nomination of muscle QTGs.
Raluca G. Mateescu
Improvements in eating satisfaction will benefit consumers and should increase beef demand, which is of interest to the beef industry. Tenderness, juiciness, and flavor are major determinants of the palatability of beef and are often used to reflect eating satisfaction. Carcass qualities are used as indicator traits for meat quality, with higher quality grade carcasses expected to relate to more tender and palatable meat. However, meat quality is a complex concept determined by many component traits, making genome-wide association studies (GWAS) on any one component challenging to interpret. Recent approaches combining traditional GWAS with gene network interactions theory could be more efficient in dissecting the genetic architecture of complex traits. Phenotypic measures of 23 traits reflecting carcass characteristics and components of meat quality, along with mineral and peptide concentrations, were used together with Illumina 54k bovine SNP genotypes to derive an annotated gene network associated with meat quality in 2,110 Angus beef cattle. The efficient mixed model association (EMMAX) approach in combination with a genomic relationship matrix was used to directly estimate the associations between 54k SNP genotypes and each of the 23 component traits. Genomic correlated regions were identified by partial correlations, which were further used along with an information theory algorithm to derive gene network clusters. Correlated SNP across 23 component traits were subjected to network scoring and visualization software to identify significant SNP. Significant pathways implicated in the meat quality complex through GO term enrichment analysis included angiogenesis, inflammation, transmembrane transporter activity, and receptor activity. These results suggest that network analysis using partial correlations and annotation of significant SNP can reveal the genetic architecture of complex traits and provide novel information regarding
Lee, Hangnoh; Cho, Dong-Yeon; Whitworth, Cale; Eisman, Robert; Phelps, Melissa; Roote, John; Kaufman, Thomas; Cook, Kevin; Russell, Steven; Przytycka, Teresa; Oliver, Brian
Deletions, commonly referred to as deficiencies by Drosophila geneticists, are valuable tools for mapping genes and for genetic pathway discovery via dose-dependent suppressor and enhancer screens. More recently, it has become clear that deviations from normal gene dosage are associated with multiple disorders in a range of species including humans. While we are beginning to understand some of the transcriptional effects brought about by gene dosage changes and the chromosome rearrangement breakpoints associated with them, much of this work relies on isolated examples. We have systematically examined deficiencies of the left arm of chromosome 2 and characterized gene-by-gene dosage responses that vary from collapsed expression through modest partial dosage compensation to full or even overcompensation. We found negligible long-range effects of creating novel chromosome domains at deletion breakpoints, suggesting that cases of gene regulation due to altered nuclear architecture are rare. These rare cases include trans de-repression when deficiencies delete chromatin characterized as repressive in other studies. Generally, effects of breakpoints on expression are promoter proximal (~100 bp) or in the gene body. Effects of deficiencies genome-wide are in genes with regulatory relationships to genes within the deleted segments, highlighting the subtle expression network defects in these sensitized genetic backgrounds.
Davis Anna C
Background: With the advent of increasingly efficient means to obtain genetic information, a great insurgence of data has resulted, leading to the need for methods for analyzing this data beyond that of tra