WorldWideScience

Sample records for gene flow inferred

  1. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    Science.gov (United States)

    Colosimo, Giuliano; Knapp, Charles R; Wallace, Lisa E; Welch, Mark E

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst =  0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  2. Inferred vs Realized Patterns of Gene Flow: An Analysis of Population Structure in the Andros Island Rock Iguana

    Science.gov (United States)

    Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344

  3. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    Directory of Open Access Journals (Sweden)

    Giuliano Colosimo

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  4. Bears in a forest of gene trees: phylogenetic inference is complicated by incomplete lineage sorting and gene flow.

    Science.gov (United States)

    Kutschera, Verena E; Bidon, Tobias; Hailer, Frank; Rodi, Julia L; Fain, Steven R; Janke, Axel

    2014-08-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  5. Fused Regression for Multi-source Gene Regulatory Network Inference.

    Directory of Open Access Journals (Sweden)

    Kari Y Lam

    2016-12-01

    Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms and single cell types). We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method's utility in learning from data collected on different experimental platforms.

  6. Inferring Gene Regulatory Networks Using Conditional Regulation Pattern to Guide Candidate Genes.

    Directory of Open Access Journals (Sweden)

    Fei Xiao

    Combining path consistency (PC) algorithms with conditional mutual information (CMI) is widely used in the reconstruction of gene regulatory networks. CMI has many advantages over the Pearson correlation coefficient in measuring non-linear dependence to infer gene regulatory networks. It can also discriminate direct regulations from indirect ones. However, it is still a challenge to select the conditional genes in an optimal way, which affects the performance and computation complexity of the PC algorithm. In this study, we develop a novel conditional mutual information-based algorithm, namely RPNI (Regulation Pattern based Network Inference), to infer gene regulatory networks. For conditional gene selection, we define the co-regulation pattern, indirect-regulation pattern and mixture-regulation pattern as three candidate patterns to guide the selection of candidate genes. To demonstrate the potential of our algorithm, we apply it to gene expression data from the DREAM challenge. Experimental results show that RPNI outperforms existing conditional mutual information-based methods in both accuracy and time complexity for different sizes of gene samples. Furthermore, the robustness of our algorithm is demonstrated by noisy interference analysis using different types of noise.

  7. Inference of gene-phenotype associations via protein-protein interaction and orthology.

    Directory of Open Access Journals (Sweden)

    Panwen Wang

    One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.

  8. Influence of the experimental design of gene expression studies on the inference of gene regulatory networks: environmental factors

    Directory of Open Access Journals (Sweden)

    Frank Emmert-Streib

    2013-02-01

    The inference of gene regulatory networks has gained considerable interest in the biology and biomedical community in recent years. The purpose of this paper is to investigate the influence that environmental conditions can exert on the performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I) observational gene expression data: normal environmental condition, (II) interventional gene expression data: growth in rich media, (III) interventional gene expression data: normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable, but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.

  9. Inference of cancer-specific gene regulatory networks using soft computing rules.

    Science.gov (United States)

    Wang, Xiaosheng; Gotoh, Osamu

    2010-03-24

    Perturbations of gene regulatory networks are essentially responsible for oncogenesis. Therefore, inferring the gene regulatory networks is a key step to overcoming cancer. In this work, we propose a method for inferring directed gene regulatory networks based on soft computing rules, which can identify important cause-effect regulatory relations of gene expression. First, we identify important genes associated with a specific cancer (colon cancer) using a supervised learning approach. Next, we reconstruct the gene regulatory networks by inferring the regulatory relations among the identified genes, and their regulated relations by other genes within the genome. We obtain two meaningful findings. One is that upregulated genes are regulated by more genes than downregulated ones, while downregulated genes regulate more genes than upregulated ones. The other one is that tumor suppressors suppress tumor activators and activate other tumor suppressors strongly, while tumor activators activate other tumor activators and suppress tumor suppressors weakly, indicating the robustness of biological systems. These findings provide valuable insights into the pathogenesis of cancer.

  10. Gene expression inference with deep learning.

    Science.gov (United States)

    Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui

    2016-06-15

    Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationship between expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX (contact: xhx@ics.uci.edu). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
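The "relative improvement" in mean absolute error quoted in the abstract above can be illustrated with a small calculation. This is a minimal sketch that assumes the term means the fractional reduction in error relative to the linear-regression baseline; the abstract does not spell out the formula, and the function names and toy values below are hypothetical, not data from the paper.

```python
def mean_absolute_error(predicted, observed):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def relative_improvement(baseline_error, model_error):
    """Fractional reduction in error relative to the baseline (assumed definition)."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
observed = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.5, 2.5, 2.5, 4.5]    # stand-in for a linear-regression prediction
model_pred = [1.25, 2.25, 2.75, 4.25]   # stand-in for a nonlinear model prediction

mae_baseline = mean_absolute_error(baseline_pred, observed)  # 0.5
mae_model = mean_absolute_error(model_pred, observed)        # 0.25
print(relative_improvement(mae_baseline, mae_model))         # 0.5, i.e. a 50% improvement
```

Under this reading, a 15.33% relative improvement would mean the deep model's error is about 0.8467 times the baseline error.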

  11. Genetic structure and gene flow among Komodo dragon populations inferred by microsatellite loci analysis.

    Science.gov (United States)

    Ciofi, C; Bruford, M W

    1999-12-01

    A general concern for the conservation of endangered species is the maintenance of genetic variation within populations, particularly when they become isolated and reduced in size. Estimates of gene flow and effective population size are therefore important for any conservation initiative directed to the long-term persistence of a species in its natural habitat. In the present study, 10 microsatellite loci were used to assess the level of genetic variability among populations of the Komodo dragon Varanus komodoensis. Effective population size was calculated and gene flow estimates were compared with palaeogeographic data in order to assess the degree of vulnerability of four island populations. Rinca and Flores, currently separated by an isthmus of about 200 m, retained a high level of genetic diversity and showed a high degree of genetic similarity, with gene flow values close to one migrant per generation. The island of Komodo showed by far the highest levels of genetic divergence, and its allelic distinctiveness was considered of great importance in the maintenance of genetic variability within the species. A lack of distinct alleles and low levels of gene flow and genetic variability were found for the small population of Gili Motang island, which was identified as vulnerable to stochastic threats. Our results are potentially important for both the short- and long-term management of the Komodo dragon, and are critical in view of future re-introduction or augmentation in areas where the species is now extinct or depleted.

  12. Inferring time-varying network topologies from gene expression data.

    Science.gov (United States)

    Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas

    2007-01-01

    Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster--to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.

  13. Inference of Cancer-specific Gene Regulatory Networks Using Soft Computing Rules

    Directory of Open Access Journals (Sweden)

    Xiaosheng Wang

    2010-03-01

    Perturbations of gene regulatory networks are essentially responsible for oncogenesis. Therefore, inferring the gene regulatory networks is a key step to overcoming cancer. In this work, we propose a method for inferring directed gene regulatory networks based on soft computing rules, which can identify important cause-effect regulatory relations of gene expression. First, we identify important genes associated with a specific cancer (colon cancer) using a supervised learning approach. Next, we reconstruct the gene regulatory networks by inferring the regulatory relations among the identified genes, and their regulated relations by other genes within the genome. We obtain two meaningful findings. One is that upregulated genes are regulated by more genes than downregulated ones, while downregulated genes regulate more genes than upregulated ones. The other one is that tumor suppressors suppress tumor activators and activate other tumor suppressors strongly, while tumor activators activate other tumor activators and suppress tumor suppressors weakly, indicating the robustness of biological systems. These findings provide valuable insights into the pathogenesis of cancer.

  14. Integration of steady-state and temporal gene expression data for the inference of gene regulatory networks.

    Science.gov (United States)

    Wang, Yi Kan; Hurley, Daniel G; Schnell, Santiago; Print, Cristin G; Crampin, Edmund J

    2013-01-01

    We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.

  15. Inferring the conservative causal core of gene regulatory networks

    Directory of Open Access Journals (Sweden)

    Frank Emmert-Streib

    2010-09-01

    Background: Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. Results: In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. Conclusions: For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.

  16. Inferring the conservative causal core of gene regulatory networks.

    Science.gov (United States)

    Altay, Gökmen; Emmert-Streib, Frank

    2010-09-28

    Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.
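The abstract above describes C3NET only as mutual information estimates combined with a maximization step. One way to picture such a step is sketched below: for each gene, keep only its strongest mutual-information partner. This is an illustration under assumptions, not the authors' implementation; the function name `max_mi_network`, the fixed `threshold` (a stand-in for whatever significance filtering a real method applies), and the toy matrix are all hypothetical.

```python
def max_mi_network(mi, threshold=0.0):
    """Keep, for each gene, only the edge to its highest-MI partner.

    mi is a symmetric square matrix (list of lists) of mutual information
    estimates. An edge is kept only if its value exceeds `threshold`,
    which here stands in for a significance test (an assumption).
    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(mi)
    edges = set()
    for i in range(n):
        best_j, best_val = None, threshold
        for j in range(n):
            if j != i and mi[i][j] > best_val:
                best_j, best_val = j, mi[i][j]
        if best_j is not None:
            edges.add((min(i, best_j), max(i, best_j)))
    return edges


# Hypothetical mutual information estimates for four genes.
mi = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.6, 0.05],
    [0.2, 0.6, 0.0, 0.3],
    [0.1, 0.05, 0.3, 0.0],
]
print(sorted(max_mi_network(mi, threshold=0.15)))  # [(0, 1), (1, 2), (2, 3)]
```

Because each gene contributes at most one edge, a network built this way has at most as many edges as genes, which is consistent with the abstract's emphasis on a conservative core.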

  17. An algebra-based method for inferring gene regulatory networks.

    Science.gov (United States)

    Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard

    2014-03-26

    The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. 
    Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the

  18. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    Directory of Open Access Journals (Sweden)

    Joeri Ruyssinck

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made
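The rankwise averaging of predictions from several methods, mentioned in the abstract above, can be sketched as follows. This is a minimal illustration that assumes each method outputs one score per candidate edge and that ties are broken by edge name; the function names and the toy scores are hypothetical and do not come from the NIMEFI implementation.

```python
def ranks(scores):
    """Map each edge to its rank, where rank 1 is the highest score.

    Ties are broken by edge name so the result is deterministic
    (an assumption made for this sketch).
    """
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: position for position, edge in enumerate(ordered, start=1)}


def rankwise_average(score_tables):
    """Average the per-method ranks of every edge; lower means stronger consensus."""
    rank_tables = [ranks(table) for table in score_tables]
    edges = set(rank_tables[0])
    for table in rank_tables[1:]:
        if set(table) != edges:
            raise ValueError("all methods must score the same set of edges")
    return {
        edge: sum(table[edge] for table in rank_tables) / len(rank_tables)
        for edge in edges
    }


# Hypothetical importance scores from two feature-importance methods.
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.4, ("g2", "g3"): 0.1}
method_b = {("g1", "g2"): 0.3, ("g1", "g3"): 0.8, ("g2", "g3"): 0.2}

consensus = rankwise_average([method_a, method_b])
for edge in sorted(consensus, key=lambda e: (consensus[e], e)):
    print(edge, consensus[edge])
```

Averaging ranks rather than raw scores avoids having to put importance values from different methods on a common scale.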

  19. Genealogy-based methods for inference of historical recombination and gene flow and their application in Saccharomyces cerevisiae.

    Science.gov (United States)

    Jenkins, Paul A; Song, Yun S; Brem, Rachel B

    2012-01-01

    Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance.

  20. A Test for Gene Flow among Sympatric and Allopatric Hawaiian Picture-Winged Drosophila.

    Science.gov (United States)

    Kang, Lin; Garner, Harold R; Price, Donald K; Michalak, Pawel

    2017-06-01

    The Hawaiian Drosophila are one of the most species-rich endemic groups in Hawaii and a spectacular example of adaptive radiation. Drosophila silvestris and D. heteroneura are two closely related picture-winged Drosophila species that occur sympatrically on Hawaii Island and are known to hybridize in nature, yet exhibit highly divergent behavioral and morphological traits driven largely through sexual selection. Their closest-related allopatric species, D. planitibia from Maui, exhibits hybrid male sterility and reduced behavioral reproductive isolation when crossed experimentally with D. silvestris or D. heteroneura. A modified four-taxon test for gene flow was applied to recently obtained genomes of the three Hawaiian Drosophila species. The analysis indicates recent gene flow in sympatry, but also, although less extensive, between allopatric species. This study underscores the prevalence of gene flow, even in taxonomic groups considered classic examples of allopatric speciation on islands. The potential confounding effects of gene flow in phylogenetic and population genetics inference are discussed, as well as the implications for conservation.
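
    For readers unfamiliar with four-taxon tests, the standard ABBA-BABA (Patterson's D) statistic can be sketched as follows. This is the textbook form, not the modified test used in the study above; the function name `patterson_d` and the toy sequences are assumptions, and the outgroup allele is taken to be ancestral.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for four aligned sequences of equal length.

    Tree assumed: (((P1, P2), P3), Outgroup).
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep biallelic sites only
        if c == o:
            continue  # P3 must carry the derived allele
        if b == c and a == o:
            abba += 1  # ABBA: P2 shares the derived allele with P3
        elif a == c and b == o:
            baba += 1  # BABA: P1 shares the derived allele with P3
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return d, abba, baba


# Toy alignment (hypothetical): three ABBA sites, one BABA site, two uninformative sites.
p1 = "AAATAT"
p2 = "TTTAAT"
p3 = "TTTTAT"
og = "AAAAAA"
print(patterson_d(p1, p2, p3, og))
```

    Without gene flow, ABBA and BABA patterns are expected in equal numbers, so D near zero is the null expectation; an excess of one pattern suggests exchange between P3 and one of the sister taxa. Judging significance requires a resampling scheme such as a block jackknife, which this sketch omits.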

  1. STRATEGIES IN SEISMIC INFERENCE OF SUPERGRANULAR FLOWS ON THE SUN

    Energy Technology Data Exchange (ETDEWEB)

    Bhattacharya, Jishnu; Hanasoge, Shravan M. [Department of Astronomy and Astrophysics, Tata Institute of Fundamental Research, Mumbai-400005 (India)]

    2016-08-01

    Observations of the solar surface reveal the presence of flows with length scales of around 35 Mm, commonly referred to as supergranules. Inferring the subsurface flow profile of supergranules from measurements of the surface and photospheric wavefield is an important challenge faced by helioseismology. Traditionally, the inverse problem has been approached by studying the linear response of seismic waves in a horizontally translationally invariant background to the presence of the supergranule; following an iterative approach that does not depend on horizontal translational invariance might perform better, since the misfit can be analyzed post iterations. In this work, we construct synthetic observations using a reference supergranule and invert for the flow profile using surface measurements of travel times of waves belonging to modal ridges f (surface gravity) and p_1 through p_7 (acoustic). We study the extent to which individual modes and their combinations contribute to infer the flow. We show that this method of nonlinear iterative inversion tends to underestimate the flow velocities, as well as inferring a shallower flow profile, with significant deviations from the reference supergranule near the surface. We carry out a similar analysis for a sound-speed perturbation and find that analogous near-surface deviations persist, although the iterations converge faster and more accurately. We conclude that a better approach to inversion would be to expand the supergranule profile in an appropriate basis, thereby reducing the number of parameters being inverted for and appropriately regularizing them.

  2. Methodology for the inference of gene function from phenotype data.

    Science.gov (United States)

    Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

    2014-12-12

    Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and

  3. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    Energy Technology Data Exchange (ETDEWEB)

    Santra, Tapesh, E-mail: tapesh.santra@ucd.ie [Systems Biology Ireland, University College Dublin, Dublin (Ireland)

    2014-05-20

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based methods in some circumstances.

  4. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    International Nuclear Information System (INIS)

    Santra, Tapesh

    2014-01-01

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based methods in some circumstances.

  5. Evaluation of artificial time series microarray data for dynamic gene regulatory network inference.

    Science.gov (United States)

    Xenitidis, P; Seimenis, I; Kakolyris, S; Adamopoulos, A

    2017-08-07

    High-throughput technology like microarrays is widely used in the inference of gene regulatory networks (GRNs). We focused on time series data since we are interested in the dynamics of GRNs and the identification of dynamic networks. We evaluated the amount of information that exists in artificial time series microarray data and the ability of an inference process to produce accurate models based on them. We used dynamic artificial gene regulatory networks in order to create artificial microarray data. Key features that characterize microarray data such as the time separation of directly triggered genes, the percentage of directly triggered genes and the triggering function type were altered in order to reveal the limits that are imposed by the nature of microarray data on the inference process. We examined the effect of various factors on the inference performance such as the network size, the presence of noise in microarray data, and the network sparseness. We used a system theory approach and examined the relationship between the pole placement of the inferred system and the inference performance. We examined the relationship between the inference performance in the time domain and the true system parameter identification. Simulation results indicated that time separation and the percentage of directly triggered genes are crucial factors. Also, network sparseness, the triggering function type and noise in input data affect the inference performance. When two factors were simultaneously varied, it was found that variation of one parameter significantly affects the dynamic response of the other. Crucial factors were also examined using a real GRN and acquired results confirmed simulation findings with artificial data. Different initial conditions were also used as an alternative triggering approach. Relevant results confirmed that the number of datasets constitutes the most significant parameter with regard to the inference performance. 
Copyright © 2017 Elsevier

  6. Inferring nonlinear gene regulatory networks from gene expression data based on distance correlation.

    Directory of Open Access Journals (Sweden)

    Xiaobo Guo

    Full Text Available Nonlinear dependence is common in the regulatory mechanisms of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measure called the distance correlation (DC) has been shown to be powerful and computationally effective at detecting nonlinear dependence in many situations. In this work, we incorporate the DC into inferring GRNs from gene expression data without any underlying distribution assumptions. We propose three DC-based GRN inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data sets (benchmark GRNs from the DREAM challenge and GRNs generated by the SynTReN network generator) and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRN inference.
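
    The sample distance correlation that these algorithms build on can be computed directly from its definition (double-centered pairwise distance matrices). The sketch below is a plain-Python illustration for one-dimensional expression vectors; the function names and the toy data are assumptions, and it does not reproduce CLR-DC, MRNET-DC or REL-DC themselves.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column and grand means."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in dist]  # equals column means (symmetric matrix)
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # one of the inputs is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy example: y depends on x quadratically, so the Pearson correlation is zero
# while the distance correlation is clearly positive.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(distance_correlation(x, y))
```

    This direct form costs O(n^2) time and memory per gene pair, which is acceptable for a sketch but may matter for genome-scale data.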

  7. STRIDE: Species Tree Root Inference from Gene Duplication Events.

    Science.gov (United States)

    Emms, David M; Kelly, Steven

    2017-12-01

    The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  8. Inferring the gene network underlying the branching of tomato inflorescence.

    Directory of Open Access Journals (Sweden)

    Laura Astola

    Full Text Available The architecture of tomato inflorescence strongly affects flower production and subsequent crop yield. To understand the genetic activities involved, insight into the underlying network of genes that initiate and control the sympodial growth in the tomato is essential. In this paper, we show how the structure of this network can be derived from available data of the expressions of the involved genes. Our approach starts from employing biological expert knowledge to select the most probable gene candidates behind branching behavior. To find how these genes interact, we develop a stepwise procedure for computational inference of the network structure. Our data consists of expression levels from primary shoot meristems, measured at different developmental stages on three different genotypes of tomato. With the network inferred by our algorithm, we can explain the dynamics corresponding to all three genotypes simultaneously, despite their apparent dissimilarities. We also correctly predict the chronological order of expression peaks for the main hubs in the network. Based on the inferred network, using optimal experimental design criteria, we are able to suggest an informative set of experiments for further investigation of the mechanisms underlying branching behavior.

  9. Gene regulatory network inference by point-based Gaussian approximation filters incorporating the prior information.

    Science.gov (United States)

    Jia, Bin; Wang, Xiaodong

    2013-12-17

    The extended Kalman filter (EKF) has been applied to inferring gene regulatory networks. However, it is well known that the EKF becomes less accurate when the system exhibits high nonlinearity. In addition, certain prior information about the gene regulatory network exists in practice, and no systematic approach has been developed to incorporate such prior information into the Kalman-type filter for inferring the structure of the gene regulatory network. In this paper, an inference framework based on point-based Gaussian approximation filters that can exploit the prior information is developed to solve the gene regulatory network inference problem. Different point-based Gaussian approximation filters, including the unscented Kalman filter (UKF), the third-degree cubature Kalman filter (CKF3), and the fifth-degree cubature Kalman filter (CKF5) are employed. Several types of network prior information, including the existing network structure information, sparsity assumption, and the range constraint of parameters, are considered, and the corresponding filters incorporating the prior information are developed. Experiments on a synthetic network of eight genes and the yeast protein synthesis network of five genes are carried out to demonstrate the performance of the proposed framework. The results show that the proposed methods provide more accurate inference results than existing methods, such as the EKF and the traditional UKF.

  10. Does gene flow constrain adaptive divergence or vice versa? A test using ecomorphology and sexual isolation in Timema cristinae walking-sticks.

    Science.gov (United States)

    Nosil, P; Crespi, B J

    2004-01-01

    Population differentiation often reflects a balance between divergent natural selection and the opportunity for homogenizing gene flow to erode the effects of selection. However, during ecological speciation, trait divergence results in reproductive isolation and becomes a cause, rather than a consequence, of reductions in gene flow. To assess both the causes and the reproductive consequences of morphological differentiation, we examined morphological divergence and sexual isolation among 17 populations of Timema cristinae walking-sticks. Individuals from populations adapted to using Adenostoma as a host plant tended to exhibit smaller overall body size, wide heads, and short legs relative to individuals using Ceanothus as a host. However, there was also significant variation in morphology among populations within host-plant species. Mean trait values for each single population could be reliably predicted based upon host-plant used and the potential for homogenizing gene flow, inferred from the size of the neighboring population using the alternate host and mitochondrial DNA estimates of gene flow. Morphology did not influence the probability of copulation in between-population mating trials. Thus, morphological divergence is facilitated by reductions in gene flow, but does not cause reductions in gene flow via the evolution of sexual isolation. Combined with rearing data indicating that size and shape have a partial genetic basis, evidence for parallel origins of the host-associated forms, and inferences from functional morphology, these results indicate that morphological divergence in T. cristinae reflects a balance between the effects of host-specific natural selection and gene flow. Our findings illustrate how data on mating preferences can help determine the causal associations between trait divergence and levels of gene flow.

  11. Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention.

    Science.gov (United States)

    Johnston, Iain G; Williams, Ben P

    2016-02-24

    Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations.

    Directory of Open Access Journals (Sweden)

    Xiaodong Cai

    Full Text Available Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the need for efficient and powerful algorithms still remains. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL)-based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining edges may just indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.

  13. Structural influence of gene networks on their inference: analysis of C3NET

    Directory of Open Access Journals (Sweden)

    Emmert-Streib Frank

    2011-06-01

    Full Text Available Abstract: Background: The availability of large-scale high-throughput data poses considerable challenges for their functional analysis. For this reason gene network inference methods have gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. Results: In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET - a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead to a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide a user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. Conclusions: The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy-to-use software package c3net may contribute to the popularization of such methods. Reviewers: This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev.

  14. Inferring gene networks from discrete expression data

    KAUST Repository

    Zhang, L.

    2013-07-18

    The modeling of gene networks from transcriptional expression data is an important tool in biomedical research to reveal signaling pathways and to identify treatment targets. Current gene network modeling is primarily based on the use of Gaussian graphical models applied to continuous data, which give a closed-form marginal likelihood. In this paper, we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which generate counts of mRNA transcripts in cell samples. We propose a generalized linear model to fit the discrete gene expression data and assume that the log ratios of the mean expression levels follow a Gaussian distribution. We restrict the gene network structures to decomposable graphs and derive the graphs by selecting the covariance matrix of the Gaussian distribution with the hyper-inverse Wishart priors. Furthermore, we incorporate prior network models based on gene ontology information, which avails existing biological information on the genes of interest. We conduct simulation studies to examine the performance of our discrete graphical model and apply the method to two real datasets for gene network inference. © The Author 2013. Published by Oxford University Press. All rights reserved.

  15. Comparison of evolutionary algorithms in gene regulatory network model inference.

    LENUS (Irish Health Repository)

    2010-01-01

    ABSTRACT: BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.

  16. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    Science.gov (United States)

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
By coupling the parallel model population-based optimization method and the parallel computational framework, high

  17. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

    Science.gov (United States)

    Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.

  18. A novel mutual information-based Boolean network inference method from time-series gene expression data.

    Directory of Open Access Journals (Sweden)

    Shohag Barman

    Full Text Available Inferring a gene regulatory network from time-series gene expression data in systems biology is a challenging problem. Many methods have been suggested, most of which have a scalability limitation due to the combinatorial cost of searching a regulatory set of genes. In addition, they have focused on the accurate inference of a network structure only. Therefore, there is a pressing need to develop a network inference method to search regulatory genes efficiently and to predict the network dynamics accurately. In this study, we employed a Boolean network model with a restricted update rule scheme to capture coarse-grained dynamics, and propose a novel mutual information-based Boolean network inference (MIBNI) method. Given time-series gene expression data as an input, the method first identifies a set of initial regulatory genes using mutual information-based feature selection, and then improves the dynamics prediction accuracy by iteratively swapping a pair of genes between sets of the selected regulatory genes and the other genes. Through extensive simulations with artificial datasets, MIBNI showed consistently better performance than six well-known existing methods, REVEAL, Best-Fit, RelNet, CST, CLR, and BIBN, in terms of both structural and dynamics prediction accuracy. We further tested the proposed method with two real gene expression datasets for an Escherichia coli gene regulatory network and a fission yeast cell cycle network, and also observed better results using MIBNI compared to the six other methods. Taken together, MIBNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network.
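
    The first stage described above, scoring candidate regulators by mutual information with the target gene, can be sketched for Boolean time series as follows. This is a hypothetical illustration only: the names `mutual_information` and `rank_regulators`, the one-step time lag, and the toy series are assumptions, and the iterative swapping stage of MIBNI is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in pxy.items()
    )


def rank_regulators(series, target):
    """Score every other gene by MI between its state at t and the target's state at t+1."""
    future = series[target][1:]
    scores = {
        gene: mutual_information(states[:-1], future)
        for gene, states in series.items()
        if gene != target
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy Boolean time series (hypothetical): C copies A with a one-step delay.
series = {
    "A": [0, 1, 1, 0, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 1, 1, 0, 0],
    "C": [0, 0, 1, 1, 0, 1, 0, 0],
}
ranked = rank_regulators(series, "C")
print(ranked[0][0])  # gene with the highest lagged MI for target C
```

    With short series the empirical estimate is biased upward, so in practice scores of this kind are usually compared against each other rather than read as absolute values.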

  19. Human synthetic lethal inference as potential anti-cancer target gene detection

    Directory of Open Access Journals (Sweden)

    Solé Ricard V

    2009-12-01

    Full Text Available Abstract Background Two genes are called synthetic lethal (SL) if mutation of either alone is not lethal, but mutation of both leads to death or a significant decrease in the organism's fitness. The detection of SL gene pairs constitutes a promising alternative for anti-cancer therapy. As cancer cells exhibit a large number of mutations, the identification of these mutated genes' SL partners may provide specific anti-cancer drug candidates, with minor perturbations to the healthy cells. Since existing SL data is mainly restricted to yeast screenings, the road towards human SL candidates is limited to inference methods. Results In the present work, we use phylogenetic analysis and database manipulation (BioGRID for interactions, Ensembl and NCBI for homology, Gene Ontology for GO attributes) in order to reconstruct the phylogenetically-inferred SL gene network for human. In addition, available data on cancer mutated genes (COSMIC and Cancer Gene Census databases) as well as on existing approved drugs (DrugBank database) supports our selection of cancer-therapy candidates. Conclusions Our work provides a complementary alternative to the current methods for drug discovery and gene target identification in anti-cancer research. Novel SL screening analysis and the use of highly curated databases would help improve the results of this methodology.

  20. Predictive minimum description length principle approach to inferring gene regulatory networks.

    Science.gov (United States)

    Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping

    2011-01-01

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as a control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and the predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need for a user-specified fine tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared to the existing MDL algorithm.
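
    To make the role of conditional mutual information concrete, the sketch below computes an empirical CMI for discretized expression values. It illustrates why CMI can remove indirect edges: if X and Y are related only through Z, I(X;Y|Z) drops to zero. The function name and toy sequences are illustrative assumptions; the PMDL threshold selection itself is not shown.

```python
from collections import Counter
from math import log2

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) in bits for discrete sequences of equal length."""
    n = len(xs)
    pz = Counter(zs)
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pxyz = Counter(zip(xs, ys, zs))
    return sum((c / n) * log2(pz[z] * c / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())

# X and Y both simply copy Z: all of their dependence is explained by Z.
indirect = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1])
# Z is constant, so conditioning on it removes nothing.
direct = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0])
print(indirect, direct)
```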

  1. Sex-biased gene flow in spectacled eiders (Anatidae): Inferences from molecular markers with contrasting modes of inheritance

    Science.gov (United States)

    Scribner, Kim T.; Petersen, Margaret R.; Fields, Raymond L.; Talbot, Sandra L.; Pearce, John M.; Chesser, Ronald K.

    2001-01-01

    Genetic markers that differ in mode of inheritance and rate of evolution (a sex-linked Z-specific microsatellite locus, five biparentally inherited microsatellite loci, and maternally inherited mitochondrial [mtDNA] sequences) were used to evaluate the degree of spatial genetic structuring at macro- and microgeographic scales, among breeding regions and local nesting populations within each region, respectively, for a migratory sea duck species, the spectacled eider (Somateria fisheri). Disjunct and declining breeding populations coupled with sex-specific differences in seasonal migratory patterns and life history provide a series of hypotheses regarding rates and directionality of gene flow among breeding populations from the Indigirka River Delta, Russia, and the North Slope and Yukon-Kuskokwim Delta, Alaska. The degree of differentiation in mtDNA haplotype frequency among breeding regions and populations within regions was high (ϕCT = 0.189, P 0.05; biparentally inherited microsatellites: mean θ = 0.001, P > 0.05) than was observed for mtDNA. Using models explicitly designed for uniparental and biparentally inherited genes, estimates of spatial divergence based on nuclear and mtDNA data together with elements of the species' breeding ecology were used to estimate effective population size and degree of male and female gene flow. Differences in the magnitude and spatial patterns of gene correlations for maternally inherited and nuclear genes revealed that females exhibit greater natal philopatry than do males. Estimates of generational female and male rates of gene flow among breeding regions differed markedly (3.67 × 10−4 and 1.28 × 10−2, respectively). Effective population size for mtDNA was estimated to be at least three times lower than that for biparental genes (30,671 and 101,528, respectively). Large disparities in population sizes among breeding areas greatly reduces the proportion of total genetic variance captured by dispersal, which may

  2. A novel gene network inference algorithm using predictive minimum description length approach.

    Science.gov (United States)

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as a control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and the predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need for a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced fewer false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need for a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the

  3. Inference of gene regulatory networks from time series by Tsallis entropy

    Directory of Open Access Journals (Sweden)

    de Oliveira Evaldo A

    2011-05-01

    Full Text Available Abstract Background The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5
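
    For reference, the Tsallis entropy with free parameter q is commonly written S_q = (1 - Σ p_i^q) / (q - 1), which recovers the Shannon entropy (in nats) as q approaches 1. The sketch below evaluates that standard definition only; how the paper plugs it into its conditional-entropy criterion is not reproduced here, and the function name is an illustrative choice.

```python
from math import log

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q of a discrete distribution; Shannon limit at q = 1."""
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit: ordinary Shannon entropy in nats.
        return -sum(p * log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)

uniform = [0.5, 0.5]
print(tsallis_entropy(uniform, 2.0))   # (1 - 0.5) / 1
print(tsallis_entropy(uniform, 1.0))   # ln 2
```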

  4. A comparative study of covariance selection models for the inference of gene regulatory networks.

    Science.gov (United States)

    Stifanelli, Patrizia F; Creanza, Teresa M; Anglani, Roberto; Liuzzi, Vania C; Mukherjee, Sayan; Schena, Francesco P; Ancona, Nicola

    2013-10-01

    The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology. In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems arising when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method performs correlation between regression residuals and (c) the 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods, having the most predictive partial correlation estimates and the highest values of sensitivity to infer conditional dependencies between genes even when only a few observations were available. The application of this method for inferring gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana revealed a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data relative to a signature of HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) directly interacting with HRAS, sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling. Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip. Copyright © 2013 The Authors. Published by
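
    Partial correlation is the quantity these covariance-selection methods estimate for every gene pair, conditioning on all other genes. The sketch below shows only the simplest case, two genes conditioned on a single third variable, using the textbook first-order formula; it is not the PINV, RCM, or ℓ(2C) estimator, and the toy data are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical data: x and y share the driver z plus independent noise.
z = [0, 0, 0, 0, 1, 1, 1, 1]
x = [zi + e for zi, e in zip(z, [0, 0, 1, 1, 0, 0, 1, 1])]
y = [zi + e for zi, e in zip(z, [0, 1, 0, 1, 0, 1, 0, 1])]
print(pearson(x, y), partial_correlation(x, y, z))
```

    Here the marginal correlation is clearly non-zero while the partial correlation vanishes, which is the pattern that distinguishes an indirect association from a direct one.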

  5. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    Science.gov (United States)

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
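
    The precision-recall evaluation mentioned above can be sketched for a ranked list of predicted edges against a gold-standard network. This is a generic illustration of the metric, not GNW's own code; the function names and edge lists are hypothetical.

```python
def precision_recall(predicted, truth):
    """Precision and recall of a set of predicted edges against the true edges."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

def pr_curve(ranked_edges, truth):
    """(precision, recall) after each prefix of a confidence-ranked edge list."""
    truth = set(truth)
    points, tp = [], 0
    for k, edge in enumerate(ranked_edges, start=1):
        tp += edge in truth
        points.append((tp / k, tp / len(truth) if truth else 0.0))
    return points

gold = {("a", "b"), ("c", "d")}
ranked = [("a", "b"), ("x", "y"), ("c", "d")]  # most confident first
print(pr_curve(ranked, gold))
```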

  6. Inferring Phylogenetic Networks from Gene Order Data

    Directory of Open Access Journals (Sweden)

    Alexey Anatolievich Morozov

    2013-01-01

    Full Text Available Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary), sets of trees, and distance matrices, but there are no methods to build them using the gene order data as an input. Here we describe several methods to build split networks from the gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from genome structures under study and used as an input for network construction algorithms. Three intermediates are used: set of jackknife trees, distance matrix, and binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and distance matrix (when used with Neighbor-Net algorithm). Binary encoding can also be useful, but only when the methods mentioned above cannot be used.

  7. Doing ecohydrology backward: Inferring wetland flow and hydroperiod from landscape patterns

    Science.gov (United States)

    Acharya, Subodh; Kaplan, David A.; Jawitz, James W.; Cohen, Matthew J.

    2017-07-01

    Human alterations to hydrology have globally impacted wetland ecosystems. Preventing or reversing these impacts is a principal focus of restoration efforts. However, restoration effectiveness is often hampered by limited information on historical landscape properties and hydrologic regime. To help address this gap, we developed a novel statistical approach for inferring flows and inundation frequency (i.e., hydroperiod, HP) in wetlands where changes in spatial vegetation and geomorphic patterns have occurred due to hydrologic alteration. We developed an analytical expression for HP as a transformation of the landscape-scale stage-discharge relationship. We applied this model to the Everglades "ridge-slough" (RS) landscape, a patterned, lotic peatland in southern Florida that has been drastically degraded by compartmentalization, drainage, and flow diversions. The new method reliably estimated flow and HP for a range of RS landscape patterns. Crucially, ridge-patch anisotropy and elevation above sloughs were strong drivers of flow-HP relationships. Increasing ridge heights markedly increased flow required to achieve sufficient HP to support peat accretion. Indeed, ridge heights inferred from historical accounts would require boundary flows 3-4 times greater than today, which agrees with restoration flow estimates from more complex, spatially distributed models. While observed loss of patch anisotropy allows HP targets to be met with lower flows, such landscapes likely fail to support other ecological functions. This work helps inform restoration flows required to restore stable ridge-slough patterning and positive peat accretion in this degraded ecosystem, and, more broadly, provides tools for exploring interactions between landscape and hydrology in lotic wetlands and floodplains.

  8. Inferring causal genomic alterations in breast cancer using gene expression data

    Science.gov (United States)

    2011-01-01

    Background One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in many important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions To our knowledge, this is the first effort to systematically identify and validate drivers for expression based CNV regions in breast cancer. The framework where the wavelet analysis of copy number alteration based on expression coupled with the gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811

  9. Inferring the functional effect of gene expression changes in signaling pathways

    Science.gov (United States)

    Sebastián-León, Patricia; Carbonell, José; Salavert, Francisco; Sanchez, Rubén; Medina, Ignacio; Dopazo, Joaquín

    2013-01-01

    Signaling pathways constitute a valuable source of information that allows interpreting the way in which alterations in gene activities affect particular cell functionalities. There are web tools available that allow viewing and editing pathways, as well as representing experimental data on them. However, few methods exist that aim to identify the signaling circuits within a pathway that are associated with the biological problem studied, and none of them provides a convenient graphical web interface. We present PATHiWAYS, a web-based signaling pathway visualization system that infers changes in signaling that affect cell functionality from the measurements of gene expression values in typical expression microarray case–control experiments. A simple probabilistic model of the pathway is used to estimate the probabilities for signal transmission from any receptor to any final effector molecule (taking into account the pathway topology) using the individual probabilities of gene product presence/absence inferred from gene expression values. Significant changes in these probabilities allow linking different cell functionalities triggered by the pathway to the biological problem studied. PATHiWAYS is available at: http://pathiways.babelomics.org/. PMID:23748960
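
    As a rough illustration of how node presence probabilities might combine into a receptor-to-effector transmission probability, the sketch below multiplies probabilities along a serial path and combines alternative paths as independent events. Both rules, the independence assumption, and the requirement that alternative paths share no nodes are simplifications assumed here; the abstract does not spell out the PATHiWAYS model, and the function names are hypothetical.

```python
from math import prod

def path_probability(presence):
    """Signal passes a serial path only if every gene product on it is present."""
    return prod(presence)

def any_path_probability(paths):
    """Probability that at least one of several node-disjoint paths transmits."""
    return 1.0 - prod(1.0 - path_probability(p) for p in paths)

# Hypothetical presence probabilities inferred from expression values.
serial = path_probability([0.9, 0.5])
parallel = any_path_probability([[0.5], [0.5]])
print(serial, parallel)
```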

  10. Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

    Science.gov (United States)

    2013-01-01

    Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as
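
    For readers unfamiliar with the Robinson-Foulds distance, the sketch below computes it for two ordinary singly-labeled rooted trees as the number of clades found in one tree but not the other. It does not handle multi-labeled trees, which is the generalization the paper addresses, and the nested-tuple tree encoding is an assumption made for brevity.

```python
def clusters(tree):
    """Non-trivial clades (frozensets of leaf labels) of a rooted tree
    written as nested tuples, e.g. ((("A", "B"), "C"), "D")."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    root = leaves(tree)
    found.discard(root)  # the full leaf set is shared by every tree
    return found

def rf_distance(t1, t2):
    """Size of the symmetric difference between the two clade sets."""
    return len(clusters(t1) ^ clusters(t2))

t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(rf_distance(t1, t2))
```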

  11. Speciation with gene flow in whiptail lizards from a Neotropical xeric biome.

    Science.gov (United States)

    Oliveira, Eliana F; Gehara, Marcelo; São-Pedro, Vinícius A; Chen, Xin; Myers, Edward A; Burbrink, Frank T; Mesquita, Daniel O; Garda, Adrian A; Colli, Guarino R; Rodrigues, Miguel T; Arias, Federico J; Zaher, Hussam; Santos, Rodrigo M L; Costa, Gabriel C

    2015-12-01

    Two main hypotheses have been proposed to explain the diversification of the Caatinga biota. The riverine barrier hypothesis (RBH) claims that the São Francisco River (SFR) is a major biogeographic barrier to gene flow. The Pleistocene climatic fluctuation hypothesis (PCH) states that gene flow, geographic genetic structure and demographic signatures on endemic Caatinga taxa were influenced by Quaternary climate fluctuation cycles. Herein, we analyse genetic diversity and structure, phylogeographic history, and diversification of a widespread Caatinga lizard (Cnemidophorus ocellifer) based on large geographical sampling for multiple loci to test the predictions derived from the RBH and PCH. We inferred two well-delimited lineages (Northeast and Southwest) that have diverged along the Cerrado-Caatinga border during the Mid-Late Miocene (6-14 Ma) despite the presence of gene flow. We reject both major hypotheses proposed to explain diversification in the Caatinga. Surprisingly, our results revealed a striking complex diversification pattern where the Northeast lineage originated as a founder effect from a few individuals located along the edge of the Southwest lineage that eventually expanded throughout the Caatinga. The Southwest lineage is more diverse, older and associated with the Cerrado-Caatinga boundaries. Finally, we suggest that C. ocellifer from the Caatinga is composed of two distinct species. Our data support speciation in the presence of gene flow and highlight the role of environmental gradients in the diversification process. © 2015 John Wiley & Sons Ltd.

  12. State of the Art of Fuzzy Methods for Gene Regulatory Networks Inference

    Directory of Open Access Journals (Sweden)

    Tuqyah Abdullah Al Qazlan

    2015-01-01

    Full Text Available To address one of the most challenging issues at the cellular level, this paper surveys the fuzzy methods used in gene regulatory networks (GRNs) inference. GRNs represent causal relationships between genes that have a direct influence, through protein production, on the life and the development of living organisms and provide a useful contribution to the understanding of the cellular functions as well as the mechanisms of diseases. Fuzzy systems are based on handling imprecise knowledge, such as biological information. They provide viable computational tools for inferring GRNs from gene expression data, thus contributing to the discovery of gene interactions responsible for specific diseases and/or ad hoc correcting therapies. Increasing computational power and high throughput technologies have provided powerful means to manage these challenging digital ecosystems at different levels from cell to society globally. The main aim of this paper is to report, present, and discuss the main contributions of this multidisciplinary field in a coherent and structured framework.

  13. Spurious correlations and inference in landscape genetics

    Science.gov (United States)

    Samuel A. Cushman; Erin L. Landguth

    2010-01-01

    Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causal modelling with partial...

  14. Directed partial correlation: inferring large-scale gene regulatory network through induced topology disruptions.

    Directory of Open Access Journals (Sweden)

    Yinyin Yuan

    Full Text Available Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power since the number of variables far exceeds the number of the samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC) method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence, and the concept of Granger causality to assess topology change with induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the disruption of the network by the induction signifies a gene's role in transcriptional regulation. The benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/).

  15. Artificial neural network inference (ANNI): a study on gene-gene interaction for biomarkers in childhood sarcomas.

    Directory of Open Access Journals (Sweden)

    Dong Ling Tong

    Full Text Available OBJECTIVE: To model the potential interaction between previously identified biomarkers in childhood sarcomas using artificial neural network inference (ANNI). METHOD: To concisely demonstrate the biological interactions between correlated genes in an interaction network map, only 2 types of sarcomas in the childhood small round blue cell tumors (SRBCTs) dataset are discussed in this paper. A backpropagation neural network was used to model the potential interaction between genes. The prediction weights and signal directions were used to model the strengths of the interaction signals and the direction of the interaction link between genes. The ANN model was validated using Monte Carlo cross-validation to minimize the risk of over-fitting and to optimize generalization ability of the model. RESULTS: Strong connection links on certain genes (TNNT1 and FNDC5 in rhabdomyosarcoma (RMS); FCGRT and OLFM1 in Ewing's sarcoma (EWS)) suggested their potency as central hubs in the interconnection of genes with different functionalities. The results showed that the RMS patients in this dataset are likely to be congenital and at low risk of cardiomyopathy development. The EWS patients are likely to be complicated by EWS-FLI fusion and deficiency in various signaling pathways, including Wnt, Fas/Rho and intracellular oxygen. CONCLUSIONS: The ANN network inference approach and the examination of identified genes in the published literature within the context of the disease highlight the substantial influence of certain genes in sarcomas.

  16. Learning a Markov Logic network for supervised gene regulatory network inference.

    Science.gov (United States)

    Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence

    2013-09-12

    Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, e.g. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a

  17. Large-scale modeling of condition-specific gene regulatory networks by information integration and inference.

    Science.gov (United States)

    Ellwanger, Daniel Christian; Leonhardt, Jörn Florian; Mewes, Hans-Werner

    2014-12-01

    Understanding how regulatory networks globally coordinate the response of a cell to changing conditions, such as perturbations by shifting environments, is an elementary challenge in systems biology which has yet to be met. Genome-wide gene expression measurements are high dimensional as these are reflecting the condition-specific interplay of thousands of cellular components. The integration of prior biological knowledge into the modeling process of systems-wide gene regulation enables the large-scale interpretation of gene expression signals in the context of known regulatory relations. We developed COGERE (http://mips.helmholtz-muenchen.de/cogere), a method for the inference of condition-specific gene regulatory networks in human and mouse. We integrated existing knowledge of regulatory interactions from multiple sources to a comprehensive model of prior information. COGERE infers condition-specific regulation by evaluating the mutual dependency between regulator (transcription factor or miRNA) and target gene expression using prior information. This dependency is scored by the non-parametric, nonlinear correlation coefficient η² (eta squared) that is derived by a two-way analysis of variance. We show that COGERE significantly outperforms alternative methods in predicting condition-specific gene regulatory networks on simulated data sets. Furthermore, by inferring the cancer-specific gene regulatory network from the NCI-60 expression study, we demonstrate the utility of COGERE to promote hypothesis-driven clinical research. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
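
    As a rough illustration of an eta-squared score, the sketch below computes the correlation ratio for a single grouping factor: the between-group sum of squares divided by the total sum of squares. This is a simplified one-factor version and not the two-way analysis of variance used by COGERE; grouping target expression values by a discretized regulator level is an assumption made here for illustration, and the function name is hypothetical.

```python
def eta_squared(groups):
    """Correlation ratio for one grouping factor: SS_between / SS_total.

    `groups` is a list of non-empty lists, e.g. target-gene expression values
    binned by the (discretized) expression level of a candidate regulator.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # With no variance at all there is no dependency to measure.
    return ss_between / ss_total if ss_total > 0 else 0.0
```

    The score lies between 0 (group means are identical) and 1 (all variance is explained by the grouping), and it does not assume a linear relation between regulator and target.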

  18. Inferring Drosophila gap gene regulatory network: Pattern analysis of simulated gene expression profiles and stability analysis

    OpenAIRE

    Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.

    2009-01-01

    Abstract Background Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori assumptions about the interactions, which all simulate the observed patterns. It is important to analyze the properties of the circuits. Findings We have analyzed the simulated gene expression ...

  19. Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

    Directory of Open Access Journals (Sweden)

    Mingxin Gan

    2014-01-01

    Full Text Available Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
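
    The vector representation and the three measures named in this abstract can be sketched in a few lines of Python. The term identifiers and information-content values used with it are hypothetical, and the Jaccard index is implemented in its weighted form (sum of element-wise minima over sum of maxima), which is an assumption; the paper may define it on term sets instead.

```python
import math


def ic_vector(annotated_terms, term_ic, all_terms):
    """One entry per ontology term: its information content if annotated, else 0."""
    return [term_ic[t] if t in annotated_terms else 0.0 for t in all_terms]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def weighted_jaccard(u, v):
    """Weighted Jaccard for non-negative vectors (assumed variant)."""
    den = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / den if den else 0.0
```

    Two gene products annotated with the same terms score 1 under cosine similarity and the weighted Jaccard index, while products with no shared terms score 0 under both.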

  20. Forest corridors maintain historical gene flow in a tiger metapopulation in the highlands of central India.

    Science.gov (United States)

    Sharma, Sandeep; Dutta, Trishna; Maldonado, Jesús E; Wood, Thomas C; Panwar, Hemendra Singh; Seidensticker, John

    2013-09-22

    Understanding the patterns of gene flow of an endangered species metapopulation occupying a fragmented habitat is crucial for landscape-level conservation planning and devising effective conservation strategies. Tigers (Panthera tigris) are globally endangered and their populations are highly fragmented and exist in a few isolated metapopulations across their range. We used multi-locus genotypic data from 273 individual tigers (Panthera tigris tigris) from four tiger populations of the Satpura-Maikal landscape of central India to determine whether the corridors in this landscape are functional. This 45 000 km² landscape contains 17% of India's tiger population and 12% of its tiger habitat. We applied Bayesian and coalescent-based analyses to estimate contemporary and historical gene flow among these populations and to infer their evolutionary history. We found that the tiger metapopulation in central India has high rates of historical and contemporary gene flow. The tests for population history reveal that tigers populated central India about 10 000 years ago. Their population subdivision began about 1000 years ago and accelerated about 200 years ago owing to habitat fragmentation, leading to four spatially separated populations. These four populations have been in migration-drift equilibrium maintained by high gene flow. We found the highest rates of contemporary gene flow in populations that are connected by forest corridors. This information is highly relevant to conservation practitioners and policy makers, because deforestation, road widening and mining are imminent threats to these corridors.

  1. Inferring Drosophila gap gene regulatory network: Pattern analysis of simulated gene expression profiles and stability analysis

    NARCIS (Netherlands)

    Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.

    2009-01-01

    Background: Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori

  2. Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

    Science.gov (United States)

    Wu, Yufeng

    2012-03-01

    Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.

  3. A Meta-Analysis of Multiple Matched Copy Number and Transcriptomics Data Sets for Inferring Gene Regulatory Relationships

    Science.gov (United States)

    Newton, Richard; Wernisch, Lorenz

    2014-01-01

    Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247

  4. Pathogenomic inference of virulence-associated genes in Leptospira interrogans.

    Science.gov (United States)

    Lehmann, Jason S; Fouts, Derrick E; Haft, Daniel H; Cannella, Anthony P; Ricaldi, Jessica N; Brinkac, Lauren; Harkins, Derek; Durkin, Scott; Sanka, Ravi; Sutton, Granger; Moreno, Angelo; Vinetz, Joseph M; Matthias, Michael A

    2013-01-01

    Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.

  5. An integrative approach to inferring biologically meaningful gene modules

    Directory of Open Access Journals (Sweden)

    Wang Kai

    2011-07-01

    Full Text Available Abstract Background The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM), that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.

  6. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

    Science.gov (United States)

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.

  7. Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles.

    Science.gov (United States)

    Yu, Yun; Warnow, Tandy; Nakhleh, Luay

    2011-11-01

    One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is Minimize Deep Coalescence (MDC). Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene trees may differ from true gene trees, be incompletely resolved, and not necessarily rooted. In this article, we propose new MDC formulations for the cases where the gene trees are unrooted/binary, rooted/non-binary, and unrooted/non-binary. Further, we prove structural theorems that allow us to extend the algorithms for the rooted/binary gene tree case to these cases in a straightforward manner. In addition, we devise MDC-based algorithms for cases when multiple alleles per species may be sampled. We study the performance of these methods in coalescent-based computer simulations.

  8. Pathogenomic inference of virulence-associated genes in Leptospira interrogans.

    Directory of Open Access Journals (Sweden)

    Jason S Lehmann

    Full Text Available Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.

  9. GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML.

    Science.gov (United States)

    Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas

    2013-09-01

    GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in systems biology markup language. Providing a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and minimum information required in the annotation of models registry. Additionally, we provide an R-package, which processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. Therefore, GRN2SBML closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.

  10. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    Science.gov (United States)

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Prediction of gene regulatory networks (GRNs) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised methods. The most promising methods are based on the support vector machine (SVM). There is a need for a comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels on different biological experimental conditions and network sizes. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRNs. Using CompareSVM, we investigated and evaluated different SVM kernel methods in detail on simulated microarray datasets of different sizes. The results obtained from CompareSVM showed that the accuracy of the inference method depends upon the nature of the experimental condition and the size of the network. For networks with fewer nodes, the SVM Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends upon the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .

  11. A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.

    Directory of Open Access Journals (Sweden)

    Xiangyun Xiao

    Full Text Available The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.

  12. A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.

    Science.gov (United States)

    Xiao, Xiangyun; Zhang, Wei; Zou, Xiufen

    2015-01-01

    The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.

  13. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    Science.gov (United States)

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal

  14. Inferences of the deep solar meridional flow

    Science.gov (United States)

    Böning, Vincent G. A.

    2017-10-01

    Understanding the solar meridional flow is important for uncovering the origin of the solar activity cycle. Yet, recent helioseismic estimates of this flow have come to conflicting conclusions in deeper layers of the solar interior, i.e., at depths below about 0.9 solar radii. The aim of this thesis is to contribute to a better understanding of the deep solar meridional flow. Time-distance helioseismology is the major method for investigating this flow. In this method, travel times of waves propagating between pairs of locations on the solar surface are measured. Until now, the travel-time measurements have been modeled using the ray approximation, which assumes that waves travel along infinitely thin ray paths between these locations. In contrast, the scattering of the full wave field in the solar interior due to the flow is modeled in first order by the Born approximation. It is in general a more accurate model of the physics in the solar interior. In a first step, an existing model for calculating the sensitivity of travel-time measurements to solar interior flows using the Born approximation is extended from Cartesian to spherical geometry. The results are successfully compared to the Cartesian ones and are tested for self-consistency. In a second step, the newly developed model is validated using an existing numerical simulation of linear wave propagation in the Sun. An inversion of artificial travel times for meridional flow shows excellent agreement for noiseless data and reproduces many features in the input flow profile in the case of noisy data. Finally, the new method is used to infer the deep meridional flow. I used Global Oscillation Network Group (GONG) data that were earlier analyzed using the ray approximation and I employed the same Subtractive Optimized Local Averaging (SOLA) inversion technique as in the earlier study. Using an existing formula for the covariance of travel-time measurements, it is shown that the assumption of uncorrelated errors

  15. Evolutionary history of the third chromosome gene arrangements of Drosophila pseudoobscura inferred from inversion breakpoints.

    Science.gov (United States)

    Wallace, Andre G; Detweiler, Don; Schaeffer, Stephen W

    2011-08-01

    The third chromosome of Drosophila pseudoobscura is polymorphic for numerous gene arrangements that form classical clines in North America. The polytene salivary chromosomes isolated from natural populations revealed changes in gene order that allowed the different gene arrangements to be linked together by paracentric inversions representing one of the first cases where genetic data were used to construct a phylogeny. Although the inversion phylogeny can be used to determine the relationships among the gene arrangements, the cytogenetic data are unable to infer the ancestral arrangement or the age of the different chromosome types. These are both important properties if one is to infer the evolutionary forces responsible for the spread and maintenance of the chromosomes. Here, we employ the nucleotide sequences of 18 regions distributed across the third chromosome in 80-100 D. pseudoobscura strains to test whether five gene arrangements are of unique or multiple origin, what the ancestral arrangement was, and what are the ages of the different arrangements. Each strain carried one of six commonly found gene arrangements and the sequences were used to infer their evolutionary relationships. Breakpoint regions in the center of the chromosome supported monophyly of the gene arrangements, whereas regions at the ends of the chromosome gave phylogenies that provided less support for monophyly of the chromosomes either because the individual markers did not have enough phylogenetically informative sites or genetic exchange scrambled information among the gene arrangements. A data set where the genetic markers were concatenated strongly supported a unique origin of the different gene arrangements. The inversion polymorphism of D. pseudoobscura is estimated to be about a million years old. We have also shown that the generated phylogeny is consistent with the cytological phylogeny of this species. 
In addition, the data presented here support hypothetical as the ancestral

  16. cDREM: inferring dynamic combinatorial gene regulation.

    Science.gov (United States)

    Wise, Aaron; Bar-Joseph, Ziv

    2015-04-01

    Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinational regulation, these have often ignored the combinational logic and the dynamics associated with such regulation. Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast we show that the predicted combinatorial sets agree with other high throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. Applying cDREM to study human response to flu, we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.

  17. Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data.

    Science.gov (United States)

    Gong, Wuming; Koyano-Nakagawa, Naoko; Li, Tongbin; Garry, Daniel J

    2015-03-07

    Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion. We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP

  18. Detecting dynamic causal inference in nonlinear two-phase fracture flow

    Science.gov (United States)

    Faybishenko, Boris

    2017-08-01

    Identifying dynamic causal inference involved in flow and transport processes in complex fractured-porous media is generally a challenging task, because nonlinear and chaotic variables may be positively coupled or correlated for some periods of time, but can then become spontaneously decoupled or non-correlated. In his 2002 paper (Faybishenko, 2002), the author performed a nonlinear dynamical and chaotic analysis of time-series data obtained from the fracture flow experiment conducted by Persoff and Pruess (1995), and, based on the visual examination of time series data, hypothesized that the observed pressure oscillations at both inlet and outlet edges of the fracture result from a superposition of both forward and return waves of pressure propagation through the fracture. In the current paper, the author explores an application of a combination of methods for detecting nonlinear chaotic dynamics behavior along with the multivariate Granger Causality (G-causality) time series test. Based on the G-causality test, the author infers that his hypothesis is correct, and presents a causation loop diagram of the spatial-temporal distribution of gas, liquid, and capillary pressures measured at the inlet and outlet of the fracture. The causal modeling approach can be used for the analysis of other hydrological processes, for example, infiltration and pumping tests in heterogeneous subsurface media, and climatic processes, for example, to find correlations between various meteorological parameters, such as temperature, solar radiation, barometric pressure, etc.

  19. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    Science.gov (United States)

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  20. A canonical correlation analysis-based dynamic bayesian network prior to infer gene regulatory networks from multiple types of biological data.

    Science.gov (United States)

    Baur, Brittany; Bozdag, Serdar

    2015-04-01

    One of the challenging and important computational problems in systems biology is to infer gene regulatory networks (GRNs) of biological systems. Several methods that exploit gene expression data have been developed to tackle this problem. In this study, we propose the use of copy number and DNA methylation data to infer GRNs. We developed an algorithm that scores regulatory interactions between genes based on canonical correlation analysis. In this algorithm, copy number or DNA methylation variables are treated as potential regulator variables, and expression variables are treated as potential target variables. We first validated that the canonical correlation analysis method is able to infer true interactions with high accuracy. We showed that the use of DNA methylation or copy number datasets leads to improved inference over steady-state expression. Our results also showed that epigenetic and structural information could be used to infer directionality of regulatory interactions. Additional improvements in GRN inference can be gleaned from incorporating the result in an informative prior in a dynamic Bayesian algorithm. This is the first study that incorporates copy number and DNA methylation into an informative prior in a dynamic Bayesian framework. By closely examining top-scoring interactions with different sources of epigenetic or structural information, we also identified potential novel regulatory interactions.

  1. Evolutionary inference across eukaryotes identifies specific pressures favoring mitochondrial gene retention

    OpenAIRE

    Williams, Ben; Johnston, Iain

    2016-01-01

    Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modelling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondri...

  2. Simultaneous inference of phenotype-associated genes and relevant tissues from GWAS data via Bayesian integration of multiple tissue-specific gene networks.

    Science.gov (United States)

    Wu, Mengmeng; Lin, Zhixiang; Ma, Shining; Chen, Ting; Jiang, Rui; Wong, Wing Hung

    2017-12-01

    Although genome-wide association studies (GWAS) have successfully identified thousands of genomic loci associated with hundreds of complex traits in the past decade, the debate about such problems as missing heritability and weak interpretability has been appealing for effective computational methods to facilitate the advanced analysis of the vast volume of existing and anticipated genetic data. Towards this goal, gene-level integrative GWAS analysis with the assumption that genes associated with a phenotype tend to be enriched in biological gene sets or gene networks has recently attracted much attention, due to such advantages as straightforward interpretation, less multiple testing burdens, and robustness across studies. However, existing methods in this category usually exploit non-tissue-specific gene networks and thus lack the ability to utilize informative tissue-specific characteristics. To overcome this limitation, we proposed a Bayesian approach called SIGNET (Simultaneously Inference of GeNEs and Tissues) to integrate GWAS data and multiple tissue-specific gene networks for the simultaneous inference of phenotype-associated genes and relevant tissues. Through extensive simulation studies, we showed the effectiveness of our method in finding both associated genes and relevant tissues for a phenotype. In applications to real GWAS data of 14 complex phenotypes, we demonstrated the power of our method in both deciphering genetic basis and discovering biological insights of a phenotype. With this understanding, we expect to see SIGNET as a valuable tool for integrative GWAS analysis, thereby boosting the prevention, diagnosis, and treatment of human inherited diseases and eventually facilitating precision medicine.

  3. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

    Directory of Open Access Journals (Sweden)

    Hero Alfred

    2010-11-01

    Background: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  4. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies.

    Science.gov (United States)

    Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence

    2010-11-09

    Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  5. ERC analysis: web-based inference of gene function via evolutionary rate covariation.

    Science.gov (United States)

    Wolfe, Nicholas W; Clark, Nathan L

    2015-12-01

    The recent explosion of comparative genomics data presents an unprecedented opportunity to construct gene networks via the evolutionary rate covariation (ERC) signature. ERC is used to identify genes that experienced similar evolutionary histories, and thereby draws functional associations between them. The ERC Analysis website allows researchers to exploit genome-wide datasets to infer novel genes in any biological function and to explore deep evolutionary connections between distinct pathways and complexes. The website provides five analytical methods, graphical output, statistical support and access to an increasing number of taxonomic groups. Analyses and data are available at http://csb.pitt.edu/erc_analysis/. Contact: nclark@pitt.edu.

  6. Phylogenetic relationships and timing of diversification in gonorynchiform fishes inferred using nuclear gene DNA sequences (Teleostei: Ostariophysi).

    Science.gov (United States)

    Near, Thomas J; Dornburg, Alex; Friedman, Matt

    2014-11-01

    The Gonorynchiformes are the sister lineage of the species-rich Otophysi and provide important insights into the diversification of ostariophysan fishes. Phylogenies of gonorynchiforms inferred using morphological characters and mtDNA gene sequences provide differing resolutions with regard to the sister lineage of all other gonorynchiforms (Chanos vs. Gonorynchus) and support for monophyly of the two miniaturized lineages Cromeria and Grasseichthys. In this study the phylogeny and divergence times of gonorynchiforms are investigated with DNA sequences sampled from nine nuclear genes and a published morphological character matrix. Bayesian phylogenetic analyses reveal substantial congruence among individual gene trees with inferences from eight genes placing Gonorynchus as the sister lineage to all other gonorynchiforms. Seven gene trees resolve Cromeria and Grasseichthys as a clade, supporting previous inferences using morphological characters. Phylogenies resulting from either concatenating the nuclear genes, performing a multispecies coalescent species tree analysis, or combining the morphological and nuclear gene DNA sequences resolve Gonorynchus as the living sister lineage of all other gonorynchiforms, strongly support the monophyly of Cromeria and Grasseichthys, and resolve a clade containing Parakneria, Cromeria, and Grasseichthys. The morphological dataset, which includes 13 gonorynchiform fossil taxa that range in age from Early Cretaceous to Eocene, was analyzed in combination with DNA sequences from the nine nuclear genes and a relaxed molecular clock to estimate times of evolutionary divergence. This "tip dating" strategy accommodates uncertainty in the phylogenetic resolution of fossil taxa that provide calibration information in the relaxed molecular clock analysis. The estimated age of the most recent common ancestor (MRCA) of living gonorynchiforms is slightly older than estimates from previous node dating efforts, but the molecular tip dating

  7. Platform dependence of inference on gene-wise and gene-set involvement in human lung development

    Directory of Open Access Journals (Sweden)

    Kho Alvin T

    2009-06-01

    Background: With the recent development of microarray technologies, the comparability of gene expression data obtained from different platforms poses an important problem. We evaluated two widely used platforms, Affymetrix U133 Plus 2.0 and the Illumina HumanRef-8 v2 Expression Bead Chips, for comparability in a biological system in which changes may be subtle, namely fetal lung tissue as a function of gestational age. Results: We performed the comparison via sequence-based probe matching between the two platforms. "Significance grouping" was defined as a measure of comparability. Using both expression correlation and significance grouping as measures of comparability, we demonstrated that despite overall cross-platform differences at the single gene level, increased correlation between the two platforms was found in genes with higher expression level, higher probe overlap, and lower p-value. We also demonstrated that biological function as determined via KEGG pathways or GO categories is more consistent across platforms than single gene analysis. Conclusion: We conclude that while the comparability of the platforms at the single gene level may be increased by increasing sample size, they are highly comparable ontologically even for subtle differences in a relatively small sample size. Biologically relevant inference should therefore be reproducible across laboratories using different platforms.

  8. Optimal structural inference of signaling pathways from unordered and overlapping gene sets.

    Science.gov (United States)

    Acharya, Lipi R; Judeh, Thair; Wang, Guangdi; Zhu, Dongxiao

    2012-02-15

    A plethora of bioinformatics analysis has led to the discovery of numerous gene sets, which can be interpreted as discrete measurements emitted from latent signaling pathways. Their potential to infer signaling pathway structures, however, has not been sufficiently exploited. Existing methods accommodating discrete data do not explicitly consider signal cascading mechanisms that characterize a signaling pathway. Novel computational methods are thus needed to fully utilize gene sets and broaden the scope from focusing only on pairwise interactions to the more general cascading events in the inference of signaling pathway structures. We propose a gene set based simulated annealing (SA) algorithm for the reconstruction of signaling pathway structures. A signaling pathway structure is a directed graph containing up to a few hundred nodes and many overlapping signal cascades, where each cascade represents a chain of molecular interactions from the cell surface to the nucleus. Gene sets in our context refer to discrete sets of genes participating in signal cascades, the basic building blocks of a signaling pathway, with no prior information about gene orderings in the cascades. From a compendium of gene sets related to a pathway, SA aims to search for signal cascades that characterize the optimal signaling pathway structure. In the search process, the extent of overlap among signal cascades is used to measure the optimality of a structure. Throughout, we treat gene sets as random samples from a first-order Markov chain model. We evaluated the performance of SA in three case studies. In the first study conducted on 83 KEGG pathways, SA demonstrated a significantly better performance than Bayesian network methods. Since both SA and Bayesian network methods accommodate discrete data, use a 'search and score' network learning strategy and output a directed network, they can be compared in terms of performance and computational time. In the second study, we compared SA and

  9. On the relation between gene flow theory and genetic gain

    Directory of Open Access Journals (Sweden)

    Woolliams John A

    2000-01-01

    In conventional gene flow theory the rate of genetic gain is calculated as the summed products of genetic selection differential and asymptotic proportion of genes deriving from sex-age groups. Recent studies have shown that asymptotic proportions of genes predicted from conventional gene flow theory may deviate considerably from true proportions. However, the rate of genetic gain predicted from conventional gene flow theory was accurate. The current note shows that the connection between asymptotic proportions of genes and rate of genetic gain that is embodied in conventional gene flow theory is invalid, even though genetic gain may be predicted correctly from it.
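    The summed-products calculation described in this abstract can be sketched as follows. This is a minimal illustration only: the sex-age group names, the proportions, and the selection differentials are hypothetical values chosen for the example, not data from the paper, and the units of the result depend on how the proportions are normalized.

```python
# Hypothetical sex-age groups mapped to
# (asymptotic proportion of genes, genetic selection differential).
# All names and numbers are illustrative assumptions, not values from the paper.
groups = {
    "sires_age2": (0.25, 1.2),
    "sires_age3": (0.25, 0.8),
    "dams_age2": (0.30, 0.4),
    "dams_age3": (0.20, 0.2),
}


def genetic_gain(groups):
    """Return the sum over groups of proportion * selection differential."""
    return sum(prop * sel_diff for prop, sel_diff in groups.values())


if __name__ == "__main__":
    print(round(genetic_gain(groups), 3))
```

    The abstract's point is that this sum can predict genetic gain correctly even when the individual proportions deviate from the true ones, so the sketch should be read as the conventional calculation, not as evidence that its inputs are accurate.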

  10. Extensive error in the number of genes inferred from draft genome assemblies.

    Directory of Open Access Journals (Sweden)

    James F Denton

    2014-12-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

  11. Inferring transcriptional gene regulation network of starch metabolism in Arabidopsis thaliana leaves using graphical Gaussian model

    Directory of Open Access Journals (Sweden)

    Ingkasuwan Papapit

    2012-08-01

    Background: Starch serves as a temporal storage of carbohydrates in plant leaves during day/night cycles. To study transcriptional regulatory modules of this dynamic metabolic process, we conducted gene regulation network analysis based on small-sample inference of graphical Gaussian model (GGM). Results: Time-series significant analysis was applied for Arabidopsis leaf transcriptome data to obtain a set of genes that are highly regulated under a diurnal cycle. A total of 1,480 diurnally regulated genes included 21 starch metabolic enzymes, 6 clock-associated genes, and 106 transcription factors (TF). A starch-clock-TF gene regulation network comprising 117 nodes and 266 edges was constructed by GGM from these 133 significant genes that are potentially related to the diurnal control of starch metabolism. From this network, we found that β-amylase 3 (b-amy3: At4g17090), which participates in starch degradation in chloroplast, is the most frequently connected gene (a hub gene). The robustness of gene-to-gene regulatory network was further analyzed by TF binding site prediction and by evaluating global co-expression of TFs and target starch metabolic enzymes. As a result, two TFs, indeterminate domain 5 (AtIDD5: At2g02070) and constans-like (COL: At2g21320), were identified as positive regulators of starch synthase 4 (SS4: At4g18240). The inference model of AtIDD5-dependent positive regulation of SS4 gene expression was experimentally supported by decreased SS4 mRNA accumulation in Atidd5 mutant plants during the light period of both short and long day conditions. COL was also shown to positively control SS4 mRNA accumulation. Furthermore, the knockout of AtIDD5 and COL led to deformation of chloroplast and its contained starch granules. This deformity also affected the number of starch granules per chloroplast, which increased significantly in both knockout mutant lines. Conclusions: In this study, we utilized a systematic approach of microarray

  12. Highly restricted gene flow and deep evolutionary lineages in the giant clam Tridacna maxima

    Science.gov (United States)

    Nuryanto, A.; Kochzius, M.

    2009-09-01

    The tropical Indo-West Pacific is the biogeographic region with the highest diversity of marine shallow water species, with its centre in the Indo-Malay Archipelago. However, due to its high endemism, the Red Sea is also considered as an important centre of evolution. Currently, not much is known about exchange among the Red Sea, Indian Ocean and West Pacific, as well as connectivity within the Indo-Malay Archipelago, even though such information is important to illuminate ecological and evolutionary processes that shape marine biodiversity in these regions. In addition, the inference of connectivity among populations is important for conservation. This study aims to test the hypothesis that the Indo-Malay Archipelago and the Red Sea are important centres of evolution by studying the genetic population structure of the giant clam Tridacna maxima. This study is based on a 484-bp fragment of the cytochrome c oxidase I gene from 211 individuals collected at 14 localities in the Indo-West Pacific to infer lineage diversification and gene flow as a measure for connectivity. The analysis showed a significant genetic differentiation among sample sites in the Indo-West Pacific (Φst = 0.74, P < 0.001) and across the Indo-Malay Archipelago (Φst = 0.72, P < 0.001), indicating restricted gene flow. Hierarchical AMOVA revealed the highest fixation index (Φct = 0.8, P < 0.001) when sample sites were assigned to the following regions: (1) Red Sea, (2) Indian Ocean and Java Sea, (3) Indonesian throughflow and seas in the East of Sulawesi, and (4) Western Pacific. Geological history as well as oceanography are important factors that shape the genetic structure of T. maxima in the Indo-Malay Archipelago and Red Sea. The observed deep evolutionary lineages might include cryptic species and this result supports the notion that the Indo-Malay Archipelago and the Red Sea are important centres of evolution.
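    To give a feel for what a fixation index near 0.7 to 0.8 means, the sketch below computes Wright's F_ST for a single biallelic locus as (H_T - H_S) / H_T. This is a deliberately simpler statistic than the sequence-based Φst from AMOVA reported in the abstract, and it is swapped in here only for illustration. The function name, the equal-population-size assumption, and the allele frequencies are all hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic locus.

    freqs: frequency of one allele in each subpopulation.
    Assumes equally sized subpopulations (an assumption of this sketch).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    if h_t == 0:
        raise ValueError("locus is monomorphic; F_ST is undefined")
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Illustrative frequencies only, not data from the study.
    print(fst_biallelic([0.2, 0.8]))
    print(fst_biallelic([0.5, 0.5]))
```

    Identical allele frequencies give a value of 0 and populations fixed for different alleles give 1, so values as high as those reported in the abstract correspond to populations that share little genetic variation.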

  13. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

    Science.gov (United States)

    Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

    2015-09-29

    In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that have broad overall performance.

  14. Paternity analysis of pollen-mediated gene flow for Fraxinus excelsior L. in a chronically fragmented landscape.

    Science.gov (United States)

    Bacles, C F E; Ennos, R A

    2008-10-01

    Paternity analysis based on microsatellite marker genotyping was used to infer contemporary genetic connectivity by pollen of three population remnants of the wind-pollinated, wind-dispersed tree Fraxinus excelsior, in a deforested Scottish landscape. By deterministically accounting for genotyping error and comparing a range of assignment methods, individual-based paternity assignments were used to derive population-level estimates of gene flow. Pollen immigration into a 300 ha landscape represents between 43 and 68% of effective pollination, mostly depending on assignment method. Individual male reproductive success is unequal, with 31 of 48 trees fertilizing one seed or more, but only three trees fertilizing more than ten seeds. Spatial analysis suggests a fat-tailed pollen dispersal curve with 85% of detected pollination occurring within 100 m, and 15% spreading between 300 and 1900 m from the source. Identification of immigrating pollen sourced from two neighbouring remnants indicates further effective dispersal at 2900 m. Pollen exchange among remnants is driven by population size rather than geographic distance, with larger remnants acting predominantly as pollen donors, and smaller remnants as pollen recipients. Enhanced wind dispersal of pollen in a barren landscape ensures that the seed produced within the catchment includes genetic material from a wide geographic area. However, gene flow estimates based on analysis of non-dispersed seeds were shown to underestimate realized gene immigration into the remnants by a factor of two suggesting that predictive landscape conservation requires integrated estimates of post-recruitment gene flow occurring via both pollen and seed.

  15. Data Integration for Microarrays: Enhanced Inference for Gene Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Alina Sîrbu

    2015-05-01

    Microarray technologies have been the basis of numerous important findings regarding gene expression in the last few decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, as well as indicating which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.

  16. Data Integration for Microarrays: Enhanced Inference for Gene Regulatory Networks.

    Science.gov (United States)

    Sîrbu, Alina; Crane, Martin; Ruskin, Heather J

    2015-05-14

    Microarray technologies have been the basis of numerous important findings regarding gene expression in the last few decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, as well as indicating which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.

  17. Phylogenetic relationships of Hemiptera inferred from mitochondrial and nuclear genes.

    Science.gov (United States)

    Song, Nan; Li, Hu; Cai, Wanzhi; Yan, Fengming; Wang, Jianyun; Song, Fan

    2016-11-01

    Here, we reconstructed the Hemiptera phylogeny based on the expanded mitochondrial protein-coding genes and the nuclear 18S rRNA gene, separately. Differential rates of change across lineages may be associated with the long-branch attraction (LBA) effect and result in conflicting estimates of phylogeny from different types of data. To reduce the potential effects of systematic biases on inferences of topology, various data coding schemes, a site removal method, and different algorithms were used in phylogenetic reconstruction. We show that the outgroups Phthiraptera and Thysanoptera and the ingroup Sternorrhyncha share similar base composition, and exhibit "long branches" relative to other hemipterans. Thus, long-branch attraction between these groups is suspected to cause the failure to recover Hemiptera under the homogeneous model. In contrast, a monophyletic Hemiptera is supported when a heterogeneous model is used in the analysis. Although higher-level phylogenetic relationships within Hemiptera remain to be resolved, analyses are beginning to converge on a stable phylogeny.

  18. Allopatric speciation despite historical gene flow: Divergence and hybridization in Carex furva and C. lucennoiberica (Cyperaceae) inferred from plastid and nuclear RAD-seq data.

    Science.gov (United States)

    Maguilla, Enrique; Escudero, Marcial; Hipp, Andrew L; Luceño, Modesto

    2017-10-01

    Gene flow among incipient species can act as a creative or destructive force in the speciation process, generating variation on which natural selection can act while, potentially, undermining population divergence. The flowering plant genus Carex exhibits a rapid and relatively recent radiation with many species limits still unclear. This is the case with the Iberian Peninsula (Spain and Portugal)-endemic C. lucennoiberica, which lay unrecognized within Carex furva until its recent description as a new species. In this study, we test how these species were impacted by interspecific gene flow during speciation. We sampled the full range of distribution of C. furva (15 individuals sampled) and C. lucennoiberica (88 individuals), sequenced two cpDNA regions (atpI-atpH, psbA-trnH) and performed genomic sequencing of 45,100 SNPs using restriction site-associated DNA sequencing (RAD-seq). We utilized a set of partitioned D-statistic tests and demographic analyses to study the degree and direction of introgression. Additionally, we modelled species distributions to reconstruct changes in range distribution during glacial and interglacial periods. Plastid, nuclear and morphological data strongly support divergence between species with subsequent gene flow. Combined with species distribution modelling, these data support a scenario of allopatry leading to species divergence, followed by secondary contact and gene flow due to long-distance dispersal and/or range expansions and contractions in response to Quaternary glacial cycles. We conclude that this is a case of allopatric speciation despite historical secondary contacts, which could have temporally influenced the speciation process, contributing to the knowledge of forces that are driving or counteracting speciation. © 2017 John Wiley & Sons Ltd.

  19. BPhyOG: An interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes

    Directory of Open Access Journals (Sweden)

    Lin Kui

    2007-07-01

    Full Text Available Abstract Background Overlapping genes (OGs) in bacterial genomes are pairs of adjacent genes of which the coding sequences overlap partly or entirely. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified. Indeed, these might prove a consistent feature across all microbial genomes. Our previous work suggests that OGs can be considered as robust markers at the whole genome level for the construction of phylogenies. An online, interactive web server for inferring phylogenies is needed for biologists to analyze phylogenetic relationships among a set of bacterial genomes of interest. Description BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ) and Unweighted Pair-Group Method using Arithmetic averages (UPGMA). Users can apply the desired method to generate phylogenetic trees, which are based on an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs that were used to infer the phylogenetic relationships. It provides detailed annotation for each OG pair and the features of the component genes through hyperlinks. Users can also retrieve each of the homologous OG pairs that have been determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint. Conclusion BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, the annotation and homologous OG pairs of which are integrated comprehensively. The reliability of phylogenies complemented by

  20. A new isolation with migration model along complete genomes infers very different divergence processes among closely related great ape species.

    Directory of Open Access Journals (Sweden)

    Thomas Mailund

    Full Text Available We present a hidden Markov model (HMM) for inferring gradual isolation between two populations during speciation, modelled as a time interval with restricted gene flow. The HMM describes the history of adjacent nucleotides in two genomic sequences, such that the nucleotides can be separated by recombination, can migrate between populations, or can coalesce at variable time points, all dependent on the parameters of the model, which are the effective population sizes, splitting times, recombination rate, and migration rate. We show by extensive simulations that the HMM can accurately infer all parameters except the recombination rate, which is biased downwards. Inference is robust to variation in the mutation rate and the recombination rate over the sequence and also robust to unknown phase of genomes unless they are very closely related. We provide a test for whether divergence is gradual or instantaneous, and we apply the model to three key divergence processes in great apes: (a) the bonobo and common chimpanzee, (b) the eastern and western gorilla, and (c) the Sumatran and Bornean orang-utan. We find that the bonobo and chimpanzee appear to have undergone a clear split, whereas the divergence processes of the gorilla and orang-utan species occurred over several hundred thousand years with gene flow stopping quite recently. We also apply the model to the Homo/Pan speciation event and find that the most likely scenario involves an extended period of gene flow during speciation.

  1. River Stream-Flow and Zayanderoud Reservoir Operation Modeling Using the Fuzzy Inference System

    Directory of Open Access Journals (Sweden)

    Saeed Jamali

    2007-12-01

    Full Text Available The Zayanderoud basin is located in the central plateau of Iran. As a result of population increase and agricultural and industrial developments, water demand on this basin has increased extensively. Given the importance of reservoir operation in water resource and management studies, the performance of a fuzzy inference system (FIS) for Zayanderoud reservoir operation is investigated in this paper. The model of operation consists of two parts. In the first part, the seasonal river stream-flow is forecasted using the fuzzy rule-based system. The Southern Oscillation Index, rain, snow, and discharge are the inputs of the model and the seasonal river stream-flow is its output. In the second part, the operation model is constructed. The amount of releases is first optimized by a nonlinear optimization model and then the rule curves are extracted using the fuzzy inference system. This model operates on an "if-then" principle, where the "if" is a vector of fuzzy premises and the "then" is the fuzzy result. The reservoir storage capacity, inflow, demand, and year condition factor are used as premises. Monthly release is taken as the consequence. The Zayanderoud basin is investigated as a case study. Different performance indices such as reliability, resiliency, and vulnerability are calculated. According to the results, FIS works more effectively than traditional reservoir operation methods such as standard operation policy (SOP) or linear regression.
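
    The "if-then" rule evaluation described in the abstract above can be illustrated with a minimal sketch. It assumes a zero-order Sugeno-style scheme, which is a stand-in and not necessarily the scheme used in the paper. Only two premises are used (storage and inflow, both scaled to the range 0 to 1), whereas the paper also uses demand and a year condition factor. The linear membership functions and the rule consequents are invented for illustration.

```python
def low(x):
    """Membership in the hypothetical fuzzy set 'low' for x scaled to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x):
    """Membership in the hypothetical fuzzy set 'high' for x scaled to [0, 1]."""
    return max(0.0, min(1.0, x))


# Hypothetical rule base: (storage set, inflow set, release as a fraction of demand).
# IF storage is <set> AND inflow is <set> THEN release is <value>.
RULES = [
    (low, low, 0.2),
    (low, high, 0.6),
    (high, low, 0.7),
    (high, high, 1.0),
]


def monthly_release(storage, inflow):
    """Fire every rule with min() as fuzzy AND, then take the weighted average."""
    weights = [min(s_set(storage), i_set(inflow)) for s_set, i_set, _ in RULES]
    outputs = [z for _, _, z in RULES]
    return sum(w * z for w, z in zip(weights, outputs)) / sum(weights)


if __name__ == "__main__":
    for s, i in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
        print(s, i, monthly_release(s, i))
```

    Each rule contributes in proportion to how strongly its premises hold, so the release varies smoothly between the rule consequents instead of jumping at fixed thresholds.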

  2. Intraspecific relationship within the genus Convolvulus L. inferred by rbcL gene using different phylogenetic approaches

    International Nuclear Information System (INIS)

    Kausar, S.; Qamarunnisa, S.

    2016-01-01

    A molecular systematics analysis was conducted using sequence data of the chloroplast rbcL gene for the genus Convolvulus L., by distance- and character-based phylogenetic methods. Fifteen representative members of the genus Convolvulus L. were included as the ingroup, whereas two members of a sister family, Solanaceae, were taken as the outgroup to root the tree. Intraspecific relationships within Convolvulus were inferred by distance matrix, maximum parsimony and Bayesian analysis. The transition/transversion ratio was also calculated, and it revealed that in the investigated Convolvulus species, transitional changes were more prevalent in the rbcL gene. The rbcL gene was observed to be conserved in the present study, as it does not show major variations between the examined species. The distance matrix showed minimal genetic variation between some species (C. glomeratus and C. pyrrhotrichus), thus identifying them as close relatives. The parsimony and Bayesian analyses revealed almost similar clades; however, the maximum parsimony based tree was unable to establish relationships between some Convolvulus species. The Bayesian inference method was found to be the method of choice for establishing intraspecific associations between Convolvulus species using rbcL data, as it clearly defined the connections supported by posterior probability values. (author)

  3. Sex-biased gene flow among elk in the greater Yellowstone ecosystem

    Science.gov (United States)

    Hand, Brian K.; Chen, Shanyuan; Anderson, Neil; Beja-Pereira, Albano; Cross, Paul C.; Ebinger, Michael R.; Edwards, Hank; Garrott, Robert A.; Kardos, Marty D.; Kauffman, Matthew J.; Landguth, Erin L.; Middleton, Arthur; Scurlock, Brandon M.; White, P.J.; Zager, Pete; Schwartz, Michael K.; Luikart, Gordon

    2014-01-01

    We quantified patterns of population genetic structure to help understand gene flow among elk populations across the Greater Yellowstone Ecosystem. We sequenced 596 base pairs of the mitochondrial control region of 380 elk from eight populations. Analysis revealed high mitochondrial DNA variation within populations, averaging 13.0 haplotypes with high mean gene diversity (0.85). The genetic differentiation among populations for mitochondrial DNA was relatively high (FST  =  0.161; P  =  0.001) compared to genetic differentiation for nuclear microsatellite data (FST  =  0.002; P  =  0.332), which suggested relatively low female gene flow among populations. The estimated ratio of male to female gene flow (mm/mf  =  46) was among the highest we have seen reported for large mammals. Genetic distance (for mitochondrial DNA pairwise FST) was not significantly correlated with geographic (Euclidean) distance between populations (Mantel's r  =  0.274, P  =  0.168). Large mitochondrial DNA genetic distances (e.g., FST > 0.2) between some of the geographically closest populations (<65 km) suggested behavioral factors and/or landscape features might shape female gene flow patterns. Given the strong sex-biased gene flow, future research and conservation efforts should consider the sexes separately when modeling corridors of gene flow or predicting spread of maternally transmitted diseases. The growing availability of genetic data to compare male vs. female gene flow provides many exciting opportunities to explore the magnitude, causes, and implications of sex-biased gene flow likely to occur in many species.
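
    As a rough illustration of how a male-to-female gene flow ratio can follow from the two FST values quoted above, the sketch below uses textbook island-model approximations with an equal sex ratio. These formulas are an assumption made here for illustration; the abstract does not state which estimator the authors used, and the function name is hypothetical.

```python
def sex_specific_migrants(fst_mt, fst_nuc):
    """Return (Nf*mf, Nf*mm) under an island model with an equal sex ratio.

    Assumed approximations (not taken from the abstract):
      mtDNA (maternal, haploid): FST = 1 / (1 + 2 * Nf * mf)
      nuclear (diploid):         FST = 1 / (1 + 4 * Nf * (mm + mf))
    """
    nf_mf = (1.0 / fst_mt - 1.0) / 2.0      # female migrants per generation
    nf_total = (1.0 / fst_nuc - 1.0) / 4.0  # Nf * (mm + mf)
    return nf_mf, nf_total - nf_mf


# FST values as quoted in the abstract.
female, male = sex_specific_migrants(fst_mt=0.161, fst_nuc=0.002)
ratio = male / female
print(round(ratio, 1))
```

    With these inputs the approximation gives a ratio of about 47, close to the reported value of 46. The small difference may reflect rounding of the published FST values or a different estimator.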

  4. Patterns of gene flow define species of thermophilic Archaea.

    Directory of Open Access Journals (Sweden)

    Hinsby Cadillo-Quiroz

    2012-02-01

    Full Text Available Despite a growing appreciation of their vast diversity in nature, mechanisms of speciation are poorly understood in Bacteria and Archaea. Here we use high-throughput genome sequencing to identify ongoing speciation in the thermoacidophilic Archaeon Sulfolobus islandicus. Patterns of homologous gene flow among genomes of 12 strains from a single hot spring in Kamchatka, Russia, demonstrate higher levels of gene flow within than between two persistent, coexisting groups, demonstrating that these microorganisms fit the biological species concept. Furthermore, rates of gene flow between two species are decreasing over time in a manner consistent with incipient speciation. Unlike other microorganisms investigated, we do not observe a relationship between genetic divergence and frequency of recombination along a chromosome, or other physical mechanisms that would reduce gene flow between lineages. Each species has its own genetic island encoding unique physiological functions and a unique growth phenotype that may be indicative of ecological specialization. Genetic differentiation between these coexisting groups occurs in large genomic "continents," indicating the topology of genomic divergence during speciation is not uniform and is not associated with a single locus under strong diversifying selection. These data support a model where species do not require physical barriers to gene flow but are maintained by ecological differentiation.

  5. Patterns of gene flow define species of thermophilic Archaea.

    Science.gov (United States)

    Cadillo-Quiroz, Hinsby; Didelot, Xavier; Held, Nicole L; Herrera, Alfa; Darling, Aaron; Reno, Michael L; Krause, David J; Whitaker, Rachel J

    2012-02-01

    Despite a growing appreciation of their vast diversity in nature, mechanisms of speciation are poorly understood in Bacteria and Archaea. Here we use high-throughput genome sequencing to identify ongoing speciation in the thermoacidophilic Archaeon Sulfolobus islandicus. Patterns of homologous gene flow among genomes of 12 strains from a single hot spring in Kamchatka, Russia, demonstrate higher levels of gene flow within than between two persistent, coexisting groups, demonstrating that these microorganisms fit the biological species concept. Furthermore, rates of gene flow between two species are decreasing over time in a manner consistent with incipient speciation. Unlike other microorganisms investigated, we do not observe a relationship between genetic divergence and frequency of recombination along a chromosome, or other physical mechanisms that would reduce gene flow between lineages. Each species has its own genetic island encoding unique physiological functions and a unique growth phenotype that may be indicative of ecological specialization. Genetic differentiation between these coexisting groups occurs in large genomic "continents," indicating the topology of genomic divergence during speciation is not uniform and is not associated with a single locus under strong diversifying selection. These data support a model where species do not require physical barriers to gene flow but are maintained by ecological differentiation.

  6. [Phylogeny of protostome moulting animals (Ecdysozoa) inferred from 18 and 28S rRNA gene sequences].

    Science.gov (United States)

    Petrov, N B; Vladychenskaia, N S

    2005-01-01

    The reliability of reconstructing phylogenetic relationships within a group of protostome moulting animals was evaluated by comparing sets of 18S and 28S rRNA gene sequences, both taken separately and combined. Reliability was assessed by the bootstrap support of major phylogenetic tree nodes and by the degree of congruence of phylogenetic trees inferred by various methods. By both criteria, phylogenetic trees reconstructed from the combined 18S and 28S rRNA gene sequences were better than those inferred from 18S and 28S sequences taken separately. The results are consistent with the phylogenetic hypothesis separating protostome animals into two major clades, moulting Ecdysozoa (Priapulida + Kinorhyncha, Nematoda + Nematomorpha, Onychophora + Tardigrada, Myriapoda + Chelicerata, Crustacea + Hexapoda) and unmoulting Lophotrochozoa (Plathelminthes, Nemertini, Annelida, Mollusca, Echiura, Sipuncula). The clade Cephalorhyncha does not include nematomorphs (Nematomorpha). It is concluded that combined 18S and 28S data are necessary in phylogenetic studies.

  7. Dinoflagellate phylogeny as inferred from heat shock protein 90 and ribosomal gene sequences.

    Directory of Open Access Journals (Sweden)

    Mona Hoppenrath

    2010-10-01

    Full Text Available Interrelationships among dinoflagellates in molecular phylogenies are largely unresolved, especially in the deepest branches. Ribosomal DNA (rDNA) sequences provide phylogenetic signals only at the tips of the dinoflagellate tree. Two reasons for the poor resolution of deep dinoflagellate relationships using rDNA sequences are (1) most sites are relatively conserved and (2) there are different evolutionary rates among sites in different lineages. Therefore, alternative molecular markers are required to address the deeper phylogenetic relationships among dinoflagellates. Preliminary evidence indicates that the heat shock protein 90 gene (Hsp90) will provide an informative marker, mainly because this gene is relatively long and appears to have relatively uniform rates of evolution in different lineages. We more than doubled the previous dataset of Hsp90 sequences from dinoflagellates by generating additional sequences from 17 different species, representing seven different orders. In order to concatenate the Hsp90 data with rDNA sequences, we supplemented the Hsp90 sequences with three new SSU rDNA sequences and five new LSU rDNA sequences. The new Hsp90 sequences were generated, in part, from four additional heterotrophic dinoflagellates and the type species for six different genera. Molecular phylogenetic analyses resulted in a paraphyletic assemblage near the base of the dinoflagellate tree consisting of only athecate species. However, Noctiluca was never part of this assemblage and branched in a position that was nested within other lineages of dinokaryotes. The phylogenetic trees inferred from Hsp90 sequences were consistent with trees inferred from rDNA sequences in that the backbone of the dinoflagellate clade was largely unresolved. The sequence conservation in both Hsp90 and rDNA sequences and the poor resolution of the deepest nodes suggests that dinoflagellates reflect an explosive radiation in morphological diversity in their recent

  8. Classification of natural circulation two-phase flow patterns using fuzzy inference on image analysis

    International Nuclear Information System (INIS)

    Mesquita, R.N. de; Masotti, P.H.F.; Penha, R.M.L.; Andrade, D.A.; Sabundjian, G.; Torres, W.M.

    2012-01-01

    Highlights: ► A fuzzy classification system for two-phase flow instability patterns is developed. ► Flow patterns are classified based on images of natural circulation experiments. ► Fuzzy inference is optimized to use single grayscale profiles as input. - Abstract: Two-phase flow in natural circulation has been an important theme of recent studies related to nuclear reactor designs. The accuracy of heat transfer estimation has been improved with new models that require precise prediction of flow pattern transitions. In this work, visualization of natural circulation cycles is used to study two-phase flow patterns associated with phase transients and static instabilities of flow. A Fuzzy Flow-type Classification System (FFCS) was developed to classify these patterns based only on image-extracted features. Image acquisition and temperature measurements were performed simultaneously. Experiments in a natural circulation facility were adjusted to generate a series of characteristic two-phase flow instability periodic cycles. The facility is composed of a loop of glass tubes, a heat source using electrical heaters, a cold source using a helicoidal heat exchanger, a visualization section and thermocouples positioned over different loop sections. The instability cyclic period is estimated based on temperature measurements associated with the detection of a flow transition image pattern. FFCS shows good results provided that adequate image acquisition parameters and pre-processing adjustments are used.

  9. Speciation and gene flow between snails of opposite chirality.

    Directory of Open Access Journals (Sweden)

    Angus Davison

    2005-09-01

    Full Text Available Left-right asymmetry in snails is intriguing because individuals of opposite chirality are either unable to mate or can only mate with difficulty, so could be reproductively isolated from each other. We have therefore investigated chiral evolution in the Japanese land snail genus Euhadra to understand whether changes in chirality have promoted speciation. In particular, we aimed to understand the effect of the maternal inheritance of chirality on reproductive isolation and gene flow. We found that the mitochondrial DNA phylogeny of Euhadra is consistent with a single, relatively ancient evolution of sinistral species and suggests either recent "single-gene speciation" or gene flow between chiral morphs that are unable to mate. To clarify the conditions under which new chiral morphs might evolve and whether single-gene speciation can occur, we developed a mathematical model that is relevant to any maternal-effect gene. The model shows that reproductive character displacement can promote the evolution of new chiral morphs, tending to counteract the positive frequency-dependent selection that would otherwise drive the more common chiral morph to fixation. This therefore suggests a general mechanism as to how chiral variation arises in snails. In populations that contain both chiral morphs, two different situations are then possible. In the first, gene flow is substantial between morphs even without interchiral mating, because of the maternal inheritance of chirality. In the second, reproductive isolation is possible but unstable, and will also lead to gene flow if intrachiral matings occasionally produce offspring with the opposite chirality. Together, the results imply that speciation by chiral reversal is only meaningful in the context of a complex biogeographical process, and so must usually involve other factors. 
In order to understand the roles of reproductive character displacement and gene flow in the chiral evolution of Euhadra, it will be

  10. The evolutionary history of bears is characterized by gene flow across species

    Science.gov (United States)

    Kumar, Vikas; Lammers, Fritjof; Bidon, Tobias; Pfenninger, Markus; Kolter, Lydia; Nilsson, Maria A.; Janke, Axel

    2017-01-01

    Bears are iconic mammals with a complex evolutionary history. Natural bear hybrids and studies of few nuclear genes indicate that gene flow among bears may be more common than expected and not limited to polar and brown bears. Here we present a genome analysis of the bear family with representatives of all living species. Phylogenomic analyses of 869 mega base pairs divided into 18,621 genome fragments yielded a well-resolved coalescent species tree despite signals for extensive gene flow across species. However, genome analyses using different statistical methods show that gene flow is not limited to closely related species pairs. Strong ancestral gene flow between the Asiatic black bear and the ancestor to polar, brown and American black bear explains uncertainties in reconstructing the bear phylogeny. Gene flow across the bear clade may be mediated by intermediate species such as the geographically wide-spread brown bears leading to large amounts of phylogenetic conflict. Genome-scale analyses lead to a more complete understanding of complex evolutionary processes. Evidence for extensive inter-specific gene flow, found also in other animal species, necessitates shifting the attention from speciation processes achieving genome-wide reproductive isolation to the selective processes that maintain species divergence in the face of gene flow. PMID:28422140

  11. The evolutionary history of bears is characterized by gene flow across species.

    Science.gov (United States)

    Kumar, Vikas; Lammers, Fritjof; Bidon, Tobias; Pfenninger, Markus; Kolter, Lydia; Nilsson, Maria A; Janke, Axel

    2017-04-19

    Bears are iconic mammals with a complex evolutionary history. Natural bear hybrids and studies of few nuclear genes indicate that gene flow among bears may be more common than expected and not limited to polar and brown bears. Here we present a genome analysis of the bear family with representatives of all living species. Phylogenomic analyses of 869 mega base pairs divided into 18,621 genome fragments yielded a well-resolved coalescent species tree despite signals for extensive gene flow across species. However, genome analyses using different statistical methods show that gene flow is not limited to closely related species pairs. Strong ancestral gene flow between the Asiatic black bear and the ancestor to polar, brown and American black bear explains uncertainties in reconstructing the bear phylogeny. Gene flow across the bear clade may be mediated by intermediate species such as the geographically wide-spread brown bears leading to large amounts of phylogenetic conflict. Genome-scale analyses lead to a more complete understanding of complex evolutionary processes. Evidence for extensive inter-specific gene flow, found also in other animal species, necessitates shifting the attention from speciation processes achieving genome-wide reproductive isolation to the selective processes that maintain species divergence in the face of gene flow.

  12. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    Full Text Available Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and

  13. Inferring common cognitive mechanisms from brain blood-flow lateralisation data obtained with functional transcranial Doppler ultrasound.

    Directory of Open Access Journals (Sweden)

    Georg Meyer

    2014-06-01

    Full Text Available Current neuroimaging techniques with high spatial resolution constrain participant motion so that many natural tasks cannot be carried out. The aim of this paper is to show how a time-locked correlation-analysis of cerebral blood flow velocity (CBFV) lateralisation data, obtained with functional TransCranial Doppler (fTCD) ultrasound, can be used to infer cerebral activation patterns across tasks. In a first experiment we demonstrate that the proposed analysis method results in data that are comparable with the standard Lateralisation Index (LI) for within-task comparisons of CBFV patterns, recorded during cued word generation (CWG) at two difficulty levels. In the main experiment we demonstrate that the proposed analysis method shows correlated blood-flow patterns for two different cognitive tasks that are known to draw on common brain areas, CWG and Music Synthesis. We show that CBFV patterns for Music and CWG are correlated only for participants with prior musical training. CBFV patterns for tasks that draw on distinct brain areas, the Tower of London and CWG, are not correlated. The proposed methodology extends conventional fTCD analysis by including temporal information in the analysis of cerebral blood-flow patterns to provide a robust, non-invasive method to infer whether common brain areas are used in different cognitive tasks. It complements conventional high resolution imaging techniques.
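
    The idea of correlating time-locked lateralisation traces across tasks can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes each task has already been reduced to epoch-averaged left and right CBFV traces sampled at the same time points, takes the left-minus-right difference as the lateralisation signal, and uses a plain Pearson correlation. All names and the toy numbers are hypothetical.

```python
from statistics import mean


def lateralisation(left, right):
    """Point-by-point left-minus-right difference of two time-locked CBFV traces."""
    return [l - r for l, r in zip(left, right)]


def pearson(x, y):
    """Pearson correlation of two equal-length traces (undefined for constant traces)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy epoch-averaged traces for two tasks (made-up values, arbitrary units).
task_a = lateralisation(left=[1.0, 2.0, 3.0, 4.0], right=[1.0, 1.0, 1.0, 1.0])
task_b = lateralisation(left=[2.0, 4.0, 6.0, 8.0], right=[2.0, 2.0, 2.0, 2.0])
r = pearson(task_a, task_b)
print(r)
```

    A high correlation between the two traces would be read as a similar time course of lateralisation in both tasks. A single lateralisation index per task would discard that temporal information.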

  14. Polyphyly and gene flow between non-sibling Heliconius species

    Directory of Open Access Journals (Sweden)

    Jiggins Chris D

    2006-04-01

    Full Text Available Background: The view that gene flow between related animal species is rare and evolutionarily unimportant largely antedates sensitive molecular techniques. Here we use DNA sequencing to investigate a pair of morphologically and ecologically divergent, non-sibling butterfly species, Heliconius cydno and H. melpomene (Lepidoptera: Nymphalidae), whose distributions overlap in Central and Northwestern South America. Results: In these taxa, we sequenced 30–45 haplotypes per locus of a mitochondrial region containing the genes for cytochrome oxidase subunits I and II (CoI/CoII), and intron-spanning fragments of three unlinked nuclear loci: triose-phosphate isomerase (Tpi), mannose-6-phosphate isomerase (Mpi) and cubitus interruptus (Ci) genes. A fifth gene, dopa decarboxylase (Ddc), produced sequence data likely to be from different duplicate loci in some of the taxa, and so was excluded. Mitochondrial and Tpi genealogies are consistent with reciprocal monophyly, whereas sympatric populations of the species in Panama share identical or similar Mpi and Ci haplotypes, giving rise to genealogical polyphyly at the species level despite evidence for rapid sequence divergence at these genes between geographic races of H. melpomene. Conclusion: Recent transfer of Mpi haplotypes between species is strongly supported, but there is no evidence for introgression at the other three loci. Our results demonstrate that the boundaries between animal species can remain selectively porous to gene flow long after speciation, and that introgression, even between non-sibling species, can be an important factor in animal evolution. Interspecific gene flow is demonstrated here for the first time in Heliconius and may provide a route for the transfer of switch-gene adaptations for Müllerian mimicry. The results also forcefully demonstrate how reliance on a single locus may give an erroneous picture of the overall genealogical history of speciation and gene flow.

  15. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    Directory of Open Access Journals (Sweden)

    van Hijum Sacha AFT

    2008-10-01

    Full Text Available Background: Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results: We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion: The Prosecutor software and supplementary datasets available at http://www.prosecutor.nl allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.

  16. Cortical information flow during inferences of agency

    Directory of Open Access Journals (Sweden)

    Myrthel Dogge

    2014-08-01

    Full Text Available Building on the recent finding that agency experiences do not merely rely on sensorimotor information but also on cognitive cues, this exploratory study uses electroencephalographic recordings to examine functional connectivity during agency inference processing in a setting where action and outcome are independent. Participants completed a computerized task in which they pressed a button followed by one of two color words (red or blue) and rated their experienced agency over producing the color. Before executing the action, a matching or mismatching color word was pre-activated by explicitly instructing participants to produce the color (goal condition) or by briefly presenting the color word (prime condition). In both conditions, experienced agency was higher in matching versus mismatching trials. Furthermore, increased electroencephalography (EEG)-based connectivity strength was observed between parietal and frontal nodes and within the (pre)frontal cortex when color-outcomes matched with goals and participants reported high agency. This pattern of increased connectivity was not identified in trials where outcomes were pre-activated through primes. These results suggest that different connections are involved in the experience and in the loss of agency, as well as in inferences of agency resulting from different types of pre-activation. Moreover, the findings provide novel support for the involvement of a fronto-parietal network in agency inferences.

  17. Ecological Divergence and the Origins of Intrinsic Postmating Isolation with Gene Flow

    Directory of Open Access Journals (Sweden)

    Aneil F. Agrawal

    2011-01-01

    Full Text Available The evolution of intrinsic postmating isolation has received much attention, both historically and in recent studies of speciation genes. Intrinsic isolation often stems from between-locus genetic incompatibilities, where alleles that function well within species are incompatible with one another when brought together in the genome of a hybrid. It can be difficult for such incompatibilities to originate when populations diverge with gene flow, because deleterious genotypic combinations will be created and then purged by selection. However, it has been argued that if genes underlying incompatibilities are themselves subject to divergent selection, then they might overcome gene flow to diverge between populations, resulting in the origin of incompatibilities. Nonetheless, there has been little explicit mathematical exploration of such scenarios for the origin of intrinsic incompatibilities during ecological speciation with gene flow. Here we explore theoretical models for the origin of intrinsic isolation where genes subject to divergent natural selection also affect intrinsic isolation, either directly or via linkage disequilibrium with other loci. Such genes indeed overcome gene flow, diverge between populations, and thus result in the evolution of intrinsic isolation. We also examine barriers to neutral gene flow. Surprisingly, we find that intrinsic isolation sometimes weakens this barrier, by impeding differentiation via ecologically based divergent selection.

  18. Urban landscape genomics identifies fine-scale gene flow patterns in an avian invasive.

    Science.gov (United States)

    Low, G W; Chattopadhyay, B; Garg, K M; Irestedt, M; Ericson, Pgp; Yap, G; Tang, Q; Wu, S; Rheindt, F E

    2018-01-01

    Invasive species exert a serious impact on native fauna and flora and have been the target of many eradication and management efforts worldwide. However, a lack of data on population structure and history, exacerbated by the recency of many species introductions, limits the efficiency with which such species can be kept at bay. In this study we generated a novel genome of high assembly quality and genotyped 4735 genome-wide single nucleotide polymorphic (SNP) markers from 78 individuals of an invasive population of the Javan Myna Acridotheres javanicus across the island of Singapore. We inferred limited population subdivision at a micro-geographic level, a genetic patch size (~13-14 km) indicative of a pronounced dispersal ability, and barely an increase in effective population size since introduction despite an increase of four to five orders of magnitude in actual population size, suggesting that low population-genetic diversity following a bottleneck has not impeded establishment success. Landscape genomic analyses identified urban features, such as low-rise neighborhoods, that constitute pronounced barriers to gene flow. Based on our data, we consider an approach targeting the complete eradication of Javan Mynas across Singapore to be unfeasible. Instead, a mixed approach of localized mitigation measures taking into account urban geographic features and planning policy may be the most promising avenue to reducing the adverse impacts of this urban pest. Our study demonstrates how genomic methods can directly inform the management and control of invasive species, even in geographically limited datasets with high gene flow rates.

  19. Gene network inference and biochemical assessment delineates GPCR pathways and CREB targets in small intestinal neuroendocrine neoplasia.

    Directory of Open Access Journals (Sweden)

    Ignat Drozdov

    Full Text Available Small intestinal (SI) neuroendocrine tumors (NETs) are increasing in incidence; however, little is known about their biology. High throughput techniques such as inference of gene regulatory networks from microarray experiments can objectively define signaling machinery in this disease. Genome-wide co-expression analysis was used to infer a gene relevance network in SI-NETs. The network was confirmed to be non-random, scale-free, and highly modular. Functional analysis of gene co-expression modules revealed processes including 'Nervous system development', 'Immune response', and 'Cell-cycle'. Importantly, gene network topology and differential expression analysis identified over-expression of the GPCR signaling regulators, the cAMP synthetase, ADCY2, and the protein kinase A, PRKAR1A. Seven CREB response element (CRE) transcripts associated with proliferation and secretion (BEX1, BICD1, CHGB, CPE, GABRB3, SCG2 and SCG3) as well as ADCY2 and PRKAR1A were measured in an independent SI dataset (n = 10 NETs; n = 8 normal preparations). All were up-regulated (p<0.035) with the exception of SCG3, which was not differently expressed. Forskolin (a direct cAMP activator, 10^-5 M) significantly stimulated transcription of pCREB and 3/7 CREB targets, isoproterenol (a selective β-adrenergic receptor agonist and cAMP activator, 10^-5 M) stimulated pCREB and 4/7 targets, while BIM-53061 (a dopamine D2 and serotonin [5-HT2] receptor agonist, 10^-6 M) stimulated 100% of targets as well as pCREB; CRE transcription correlated with the levels of cAMP accumulation and PKA activity; BIM-53061 stimulated the highest levels of cAMP and PKA (2.8-fold and 2.5-fold vs. 1.8-2-fold for isoproterenol and forskolin). Gene network inference and graph topology analysis in SI NETs suggests that SI NETs express neural GPCRs that activate different CRE targets associated with proliferation and secretion. In vitro studies, in a model NET cell system, confirmed that transcriptional

  20. Inferring common cognitive mechanisms from brain blood-flow lateralization data: a new methodology for fTCD analysis.

    Science.gov (United States)

    Meyer, Georg F; Spray, Amy; Fairlie, Jo E; Uomini, Natalie T

    2014-01-01

    Current neuroimaging techniques with high spatial resolution constrain participant motion so that many natural tasks cannot be carried out. The aim of this paper is to show how a time-locked correlation-analysis of cerebral blood flow velocity (CBFV) lateralization data, obtained with functional TransCranial Doppler (fTCD) ultrasound, can be used to infer cerebral activation patterns across tasks. In a first experiment we demonstrate that the proposed analysis method results in data that are comparable with the standard Lateralization Index (LI) for within-task comparisons of CBFV patterns, recorded during cued word generation (CWG) at two difficulty levels. In the main experiment we demonstrate that the proposed analysis method shows correlated blood-flow patterns for two different cognitive tasks that are known to draw on common brain areas, CWG, and Music Synthesis. We show that CBFV patterns for Music and CWG are correlated only for participants with prior musical training. CBFV patterns for tasks that draw on distinct brain areas, the Tower of London and CWG, are not correlated. The proposed methodology extends conventional fTCD analysis by including temporal information in the analysis of cerebral blood-flow patterns to provide a robust, non-invasive method to infer whether common brain areas are used in different cognitive tasks. It complements conventional high resolution imaging techniques.
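
    The time-locked correlation idea in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that epoch-averaged left and right CBFV time courses are already available on a common time base, takes the left-minus-right difference as the lateralisation time course, and correlates the time courses of two tasks with a plain Pearson coefficient. All function names and the synthetic signals are hypothetical.

```python
import math

def lateralisation(left, right):
    """Left-minus-right difference of two equally sampled CBFV time courses."""
    return [l - r for l, r in zip(left, right)]

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Synthetic, made-up signals: 20 s sampled every 0.1 s (assumption).
t = [0.1 * i for i in range(201)]

# Tasks A and B share the same temporal profile with different amplitudes.
left_a = [100 + 3.0 * math.sin(0.3 * s) for s in t]
right_a = [100 + 1.0 * math.sin(0.3 * s) for s in t]
left_b = [100 + 2.0 * math.sin(0.3 * s) for s in t]
right_b = [100 + 0.5 * math.sin(0.3 * s) for s in t]

# Task C lateralises with a different temporal profile.
left_c = [100 + 2.0 * math.cos(0.9 * s) for s in t]
right_c = [100 + 0.5 * math.cos(0.9 * s) for s in t]

lat_a = lateralisation(left_a, right_a)
lat_b = lateralisation(left_b, right_b)
lat_c = lateralisation(left_c, right_c)

r_ab = pearson(lat_a, lat_b)  # same profile, so close to 1
r_ac = pearson(lat_a, lat_c)  # different profile, so clearly lower
```

    Unlike a single summary index per task, the correlation uses the whole time course, which is the temporal information the abstract refers to.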

  1. Inferring metabolic states in uncharacterized environments using gene-expression measurements.

    Directory of Open Access Journals (Sweden)

    Sergio Rossell

    Full Text Available The large size of metabolic networks entails an overwhelming multiplicity in the possible steady-state flux distributions that are compatible with stoichiometric constraints. This space of possibilities is largest in the frequent situation where the nutrients available to the cells are unknown. These two factors: network size and lack of knowledge of nutrient availability, challenge the identification of the actual metabolic state of living cells among the myriad possibilities. Here we address this challenge by developing a method that integrates gene-expression measurements with genome-scale models of metabolism as a means of inferring metabolic states. Our method explores the space of alternative flux distributions that maximize the agreement between gene expression and metabolic fluxes, and thereby identifies reactions that are likely to be active in the culture from which the gene-expression measurements were taken. These active reactions are used to build environment-specific metabolic models and to predict actual metabolic states. We applied our method to model the metabolic states of Saccharomyces cerevisiae growing in rich media supplemented with either glucose or ethanol as the main energy source. The resulting models comprise about 50% of the reactions in the original model, and predict environment-specific essential genes with high sensitivity. By minimizing the sum of fluxes while forcing our predicted active reactions to carry flux, we predicted the metabolic states of these yeast cultures that are in large agreement with what is known about yeast physiology. Most notably, our method predicts the Crabtree effect in yeast cells growing in excess glucose, a long-known phenomenon that could not have been predicted by traditional constraint-based modeling approaches. Our method is of immediate practical relevance for medical and industrial applications, such as the identification of novel drug targets, and the development of

  2. Phylogenetic Relationships of Pseudorasbora, Pseudopungtungia, and Pungtungia (Teleostei; Cypriniformes; Gobioninae) Inferred from Multiple Nuclear Gene Sequences

    Directory of Open Access Journals (Sweden)

    Keun-Yong Kim

    2013-01-01

    Full Text Available Gobionine species belonging to the genera Pseudorasbora, Pseudopungtungia, and Pungtungia (Teleostei; Cypriniformes; Cyprinidae) have been heavily studied because of problems of taxonomy, threats of extinction, invasion, and human health. Nucleotide sequences of three nuclear genes, that is, recombination activating protein gene 1 (rag1), recombination activating gene 2 (rag2), and early growth response 1 gene (egr1), from Pseudorasbora, Pseudopungtungia, and Pungtungia species residing in China, Japan, and Korea, were analyzed to elucidate their intergeneric and interspecific phylogenetic relationships. In the phylogenetic tree inferred from their multiple gene sequences, Pseudorasbora, Pseudopungtungia and Pungtungia species ramified into three phylogenetically distinct clades: the “tenuicorpa” clade composed of Pseudopungtungia tenuicorpa, the “parva” clade composed of all Pseudorasbora species/subspecies, and the “herzi” clade composed of Pseudopungtungia nigra and Pungtungia herzi. The genus Pseudorasbora was recovered as monophyletic, while the genus Pseudopungtungia was recovered as polyphyletic. Our phylogenetic result implies the unstable taxonomic status of the genus Pseudopungtungia.

  3. Assessment of network inference methods: how to cope with an underdetermined problem.

    Directory of Open Access Journals (Sweden)

    Caroline Siegenthaler

    Full Text Available The inference of biological networks is an active research area in the field of systems biology. The number of network inference algorithms has grown tremendously in the last decade, underlining the importance of a fair assessment and comparison among these methods. Current assessments of the performance of an inference method typically involve the application of the algorithm to benchmark datasets and the comparison of the network predictions against the gold standard or reference networks. While the network inference problem is often deemed underdetermined, implying that the inference problem does not have a (unique) solution, the consequences of such an attribute have not been rigorously taken into consideration. Here, we propose a new procedure for assessing the performance of gene regulatory network (GRN) inference methods. The procedure takes into account the underdetermined nature of the inference problem, in which gene regulatory interactions that are inferable or non-inferable are determined based on causal inference. The assessment relies on a new definition of the confusion matrix, which excludes errors associated with non-inferable gene regulations. For demonstration purposes, the proposed assessment procedure is applied to the DREAM 4 In Silico Network Challenge. The results show a marked change in the ranking of participating methods when taking network inferability into account.
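
    A confusion matrix that ignores non-inferable edges can be sketched as follows. This is only an illustration under stated assumptions: the set of inferable edges is taken as given (the abstract derives it by causal inference, which is not reproduced here), edges are directed (regulator, target) pairs, and the gene names, edge sets and the `confusion` helper are all hypothetical.

```python
def confusion(predicted, gold, universe):
    """Count TP, FP, FN and TN over the edges contained in `universe` only."""
    pred = predicted & universe
    true = gold & universe
    tp = len(pred & true)
    fp = len(pred - true)
    fn = len(true - pred)
    tn = len(universe) - tp - fp - fn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Toy three-gene network: every ordered pair is a candidate edge (6 in total).
genes = ["g1", "g2", "g3"]
all_edges = {(a, b) for a in genes for b in genes if a != b}

gold = {("g1", "g2"), ("g2", "g3")}                    # reference network
predicted = {("g1", "g2"), ("g3", "g1"), ("g1", "g3")}  # output of some method

# Hypothetical outcome of an inferability analysis: two edges cannot be
# resolved from the available experiments and are excluded from scoring.
inferable = all_edges - {("g2", "g3"), ("g1", "g3")}

standard = confusion(predicted, gold, all_edges)
restricted = confusion(predicted, gold, inferable)
# standard   -> {'TP': 1, 'FP': 2, 'FN': 1, 'TN': 2}
# restricted -> {'TP': 1, 'FP': 1, 'FN': 0, 'TN': 2}
```

    In this toy case the method is no longer penalised for a missed edge and a false positive that no method could have resolved, which is the kind of shift that can change a ranking.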

  4. A conceptual framework that links pollinator foraging behavior to gene flow

    Science.gov (United States)

    In insect-pollinated crops such as alfalfa, a better understanding of how pollinator foraging behavior affects gene flow could lead to the development of management strategies to reduce gene flow and facilitate the coexistence of distinct seed-production markets. Here, we introduce a conceptual fram...

  5. Cortical information flow during inferences of agency

    NARCIS (Netherlands)

    Dogge, Myrthel; Hofman, Dennis; Boersma, Maria; Dijkerman, H Chris; Aarts, Henk

    2014-01-01

    Building on the recent finding that agency experiences do not merely rely on sensorimotor information but also on cognitive cues, this exploratory study uses electroencephalographic recordings to examine functional connectivity during agency inference processing in a setting where action and outcome

  6. Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.

    Directory of Open Access Journals (Sweden)

    Richard R Stein

    2015-07-01

    Full Text Available Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
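
    For continuous variables, the maximum-entropy distribution with fixed means and covariances is a multivariate Gaussian, and its pairwise couplings are read off the inverse covariance (precision) matrix. The sketch below illustrates this on a hypothetical three-variable chain x -> y -> z with unit-variance noise; the covariance values follow from that assumed toy model, not from the reviewed paper, and `invert` is a small helper written for this example.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Exact covariance of the toy chain: x ~ N(0, 1), y = x + noise, z = y + noise.
covariance = [
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 2.0, 3.0],
]

precision = invert(covariance)
# precision is [[2, -1, 0], [-1, 2, -1], [0, -1, 1]] up to rounding.

cov_xz = covariance[0][2]      # 1.0: x and z are correlated ...
coupling_xz = precision[0][2]  # 0.0: ... but only indirectly, through y
coupling_xy = precision[0][1]  # -1.0: a direct interaction
```

    The zero precision entry for the pair (x, z) shows how the pairwise model separates direct couplings from correlations that arise only through an intermediate variable.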

  7. Long-distance gene flow and adaptation of forest trees to rapid climate change

    Science.gov (United States)

    Kremer, Antoine; Ronce, Ophélie; Robledo-Arnuncio, Juan J; Guillaume, Frédéric; Bohrer, Gil; Nathan, Ran; Bridle, Jon R; Gomulkiewicz, Richard; Klein, Etienne K; Ritland, Kermit; Kuparinen, Anna; Gerber, Sophie; Schueler, Silvio

    2012-01-01

    Forest trees are the dominant species in many parts of the world and predicting how they might respond to climate change is a vital global concern. Trees are capable of long-distance gene flow, which can promote adaptive evolution in novel environments by increasing genetic variation for fitness. It is unclear, however, if this can compensate for maladaptive effects of gene flow and for the long-generation times of trees. We critically review data on the extent of long-distance gene flow and summarise theory that allows us to predict evolutionary responses of trees to climate change. Estimates of long-distance gene flow based both on direct observations and on genetic methods provide evidence that genes can move over spatial scales larger than habitat shifts predicted under climate change within one generation. Both theoretical and empirical data suggest that the positive effects of gene flow on adaptation may dominate in many instances. The balance of positive to negative consequences of gene flow may, however, differ for leading edge, core and rear sections of forest distributions. We propose future experimental and theoretical research that would better integrate dispersal biology with evolutionary quantitative genetics and improve predictions of tree responses to climate change. PMID:22372546

  8. Phylogeny of the Celastraceae inferred from phytochrome B gene sequence and morphology.

    Science.gov (United States)

    Simmons, M P; Clevinger, C C; Savolainen, V; Archer, R H; Mathews, S; Doyle, J J

    2001-02-01

    Phylogenetic relationships within Celastraceae were inferred using a simultaneous analysis of 61 morphological characters and 1123 base pairs of phytochrome B exon 1 from the nuclear genome. No gaps were inferred, and the gene tree topology suggests that the primers were specific to a single locus that did not duplicate among the lineages sampled. This region of phytochrome B was most useful for examining relationships among closely related genera. Fifty-one species from 38 genera of Celastraceae were sampled. The Celastraceae sensu lato (including Hippocrateaceae) were resolved as a monophyletic group. Loesener's subfamilies and tribes of Celastraceae were not supported. The Hippocrateaceae were resolved as a monophyletic group nested within a paraphyletic Celastraceae sensu stricto. Goupia was resolved as more closely related to Euphorbiaceae, Corynocarpaceae, and Linaceae than to Celastraceae. Plagiopteron (Flacourtiaceae) was resolved as the sister group of Hippocrateoideae. Brexia (Brexiaceae) was resolved as closely related to Elaeodendron and Pleurostylia. Canotia was resolved as the sister group of Acanthothamnus within Celastraceae. Perrottetia and Mortonia were resolved as the sister group of the rest of the Celastraceae. Siphonodon was resolved as a derived member of Celastraceae. Maytenus was resolved as three disparate groups, suggesting that this large genus needs to be recircumscribed.

  9. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
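
    An alignment-free comparison of tetranucleotide frequencies can be sketched as below. The abstract does not specify how SWPhylo normalises or compares its oligonucleotide usage patterns, so this example makes its own assumptions: plain relative frequencies of the 256 possible 4-mers on a single strand and a Euclidean distance between frequency vectors. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_freqs(seq):
    """Relative frequency of each tetranucleotide in overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotide")
    return [counts[k] / total for k in KMERS]

def distance(f, g):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))

# Toy sequences (real use would take whole genomes).
seq_a = "ACGTACGTACGTACGT"
seq_b = "AAAACCCCGGGGTTTT"
seq_c = "ACGTACGTACGAACGT"  # seq_a with one substitution

freq_a, freq_b, freq_c = (tetra_freqs(s) for s in (seq_a, seq_b, seq_c))

d_ab = distance(freq_a, freq_b)
d_ac = distance(freq_a, freq_c)  # smaller than d_ab: seq_c resembles seq_a
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method; which method the tool itself uses is not stated in the abstract.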

  10. Integrative inference of population history in the Ibero-Maghrebian endemic Pleurodeles waltl (Salamandridae).

    Science.gov (United States)

    Gutiérrez-Rodríguez, Jorge; Barbosa, A Márcia; Martínez-Solano, Íñigo

    2017-07-01

    Inference of population histories from the molecular signatures of past demographic processes is challenging, but recent methodological advances in species distribution models and their integration in time-calibrated phylogeographic studies allow detailed reconstruction of complex biogeographic scenarios. We apply an integrative approach to infer the evolutionary history of the Iberian ribbed newt (Pleurodeles waltl), an Ibero-Maghrebian endemic with populations north and south of the Strait of Gibraltar. We analyzed an extensive multilocus dataset (mitochondrial and nuclear DNA sequences and ten polymorphic microsatellite loci) and found a deep east-west phylogeographic break in Iberian populations dating back to the Plio-Pleistocene. This break is inferred to result from vicariance associated with the formation of the Guadalquivir river basin. In contrast with previous studies, North African populations showed exclusive mtDNA haplotypes, and formed a monophyletic clade within the Eastern Iberian lineage in the mtDNA genealogy. On the other hand, microsatellites failed to recover Moroccan populations as a differentiated genetic cluster. This is interpreted to result from post-divergence gene flow based on the results of IMA2 and Migrate analyses. Thus, Moroccan populations would have originated after overseas dispersal from the Iberian Peninsula in the Pleistocene, with subsequent gene flow in more recent times, implying at least two trans-marine dispersal events. We modeled the distribution of the species and of each lineage, and projected these models back in time to infer climatically favourable areas during the mid-Holocene, the last glacial maximum (LGM) and the last interglacial (LIG), to reconstruct more recent population dynamics. We found minor differences in climatic favourability across lineages, suggesting intraspecific niche conservatism. Genetic diversity was significantly correlated with the intersection of environmental favourability in the LIG and

  11. Inferring Phylogenetic Networks Using PhyloNet.

    Science.gov (United States)

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  12. Functional networks inference from rule-based machine learning models.

    Science.gov (United States)

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables, that often are different/complementary to the similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes, that genes used together within a rule-based machine learning model to classify the samples, might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in a context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The

  13. Geography, assortative mating, and the effects of sexual selection on speciation with gene flow.

    Science.gov (United States)

    Servedio, Maria R

    2016-01-01

    Theoretical and empirical research on the evolution of reproductive isolation have both indicated that the effects of sexual selection on speciation with gene flow are quite complex. As part of this special issue on the contributions of women to basic and applied evolutionary biology, I discuss my work on this question in the context of a broader assessment of the patterns of sexual selection that lead to, versus inhibit, the speciation process, as derived from theoretical research. In particular, I focus on how two factors, the geographic context of speciation and the mechanism leading to assortative mating, interact to alter the effect that sexual selection through mate choice has on speciation. I concentrate on two geographic contexts: sympatry and secondary contact between two geographically separated populations that are exchanging migrants and two mechanisms of assortative mating: phenotype matching and separate preferences and traits. I show that both of these factors must be considered for the effects of sexual selection on speciation to be inferred.

  14. Study of gene flow from GM cotton (Gossypium hirsutum) varieties in El Espinal (Tolima, Colombia)

    International Nuclear Information System (INIS)

    Rache Cardenal, Leidy Yanira; Mora Oberlaender, Julian; Chaparro Giraldo, Alejandro

    2013-01-01

    In 2009, 4088 hectares of genetically modified (GM) cotton were planted in Tolima (Colombia); however, there is some uncertainty about the containment measures needed to prevent the flow of pollen and seed from regulated GM fields into adjacent fields. In this study, the gene flow from GM cotton varieties to conventional or feral cotton plants via seed and pollen was evaluated. Immunostrip™, PCR and ELISA assays were used to detect gene flow. Fifty-six refuges, 27 fields with conventional cotton and four feral individuals of the enterprise Remolinos Inc. located in El Espinal (Tolima) were analyzed in the first half of 2010. The results indicated seed-mediated gene flow in 45 refuges (80.4 %) and 26 fields with conventional cotton (96 %), as well as pollen-mediated gene flow in one field with conventional cotton and nine refuges. All fields cultivated with conventional cotton showed gene flow from GM cotton. Two refuges and two feral individuals did not reveal gene flow from GM cotton.

  15. A new fast method for inferring multiple consensus trees using k-medoids.

    Science.gov (United States)

    Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir

    2018-04-05

    Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. The main advantage of our method is that it is much faster than the existing tree clustering approaches, while

  16. Algorithms for MDC-Based Multi-locus Phylogeny Inference

    Science.gov (United States)

    Yu, Yun; Warnow, Tandy; Nakhleh, Luay

    One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is minimize deep coalescence, or MDC. Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene trees may differ from true gene trees, be incompletely resolved, and not necessarily rooted. In this paper, we propose new MDC formulations for the cases where the gene trees are unrooted/binary, rooted/non-binary, and unrooted/non-binary. Further, we prove structural theorems that allow us to extend the algorithms for the rooted/binary gene tree case to these cases in a straightforward manner. Finally, we study the performance of these methods in coalescent-based computer simulations.

  17. Bayesian Inference of High-Dimensional Dynamical Ocean Models

    Science.gov (United States)

    Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.

    2015-12-01

    This presentation addresses a holistic set of challenges in high-dimension ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and (iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.

  18. Interspecific and interploidal gene flow in Central European Arabidopsis (Brassicaceae)

    Directory of Open Access Journals (Sweden)

    Jørgensen Marte H

    2011-11-01

    Background: Effects of polyploidisation on gene flow between natural populations are little known. Central European diploid and tetraploid populations of Arabidopsis arenosa and A. lyrata are here used to study interspecific and interploidal gene flow, using a combination of nuclear and plastid markers. Results: Ploidal levels were confirmed by flow cytometry. Network analyses clearly separated diploids according to species. Tetraploids and diploids were highly intermingled within species, and some tetraploids intermingled with the other species as well. Isolation with migration analyses suggested interspecific introgression from tetraploid A. arenosa to tetraploid A. lyrata and vice versa, and some interploidal gene flow, which was unidirectional from diploid to tetraploid in A. arenosa and bidirectional in A. lyrata. Conclusions: Interspecific genetic isolation at diploid level combined with introgression at tetraploid level indicates that polyploidy may buffer against negative consequences of interspecific hybridisation. The role of introgression in polyploid systems may, however, differ between plant species, and even within the small genus Arabidopsis, we find very different evolutionary fates when it comes to introgression.

  19. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA. PMID:29511354

  20. Impact of Bee Species and Plant Density on Alfalfa Pollination and Potential for Gene Flow

    Directory of Open Access Journals (Sweden)

    Johanne Brunet

    2010-01-01

    In outcrossing crops like alfalfa, various bee species can contribute to pollination and gene flow in seed production fields. With the increasing use of transgenic crops, it becomes important to determine the role of these distinct pollinators on alfalfa pollination and gene flow. The current study examines the relative contribution of honeybees, three bumble bee species, and three solitary bee species to pollination and gene flow in alfalfa. Two wild solitary bee species and one wild bumble bee species were best at tripping flowers, while the two managed pollinators commonly used in alfalfa seed production, honeybees and leaf cutting bees, had the lowest tripping rate. Honeybees had the greatest potential for gene flow and risk of transgene escape relative to the other pollinators. For honeybees, gene flow and risk of transgene escape were not affected by plant density although for the three bumble bee species gene flow and risk of transgene escape were the greatest in high-density fields.

  1. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Meyer Patrick

    2007-01-01

    The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.

  2. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Patrick E. Meyer

    2007-06-01

    The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
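
    The MRMR selection principle described in the two records above can be illustrated with a minimal sketch. It assumes a precomputed, symmetric mutual-information matrix and uses the common "relevance minus mean redundancy" score; the function name `mrmr_rank`, the toy matrix and that exact scoring rule are assumptions made for illustration and are not taken from the MRNET paper or its software.

```python
def mrmr_rank(mi, target, n_select):
    """Greedy MRMR ranking of variables for one target.

    mi       -- symmetric matrix (list of lists) of pairwise mutual information
    target   -- index of the target variable
    n_select -- number of variables to select
    """
    candidates = [i for i in range(len(mi)) if i != target]
    selected = []
    while candidates and len(selected) < n_select:

        def score(i):
            # Relevance: mutual information with the target.
            relevance = mi[i][target]
            # Redundancy: mean mutual information with already selected variables.
            redundancy = (
                sum(mi[i][k] for k in selected) / len(selected) if selected else 0.0
            )
            return relevance - redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy matrix: variable 0 is the target; variables 1 and 2 are
# both informative about it but highly redundant with each other.
toy_mi = [
    [0.0, 0.9, 0.8, 0.5],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.1],
    [0.5, 0.1, 0.1, 0.0],
]
ranking = mrmr_rank(toy_mi, target=0, n_select=3)
```

    In this toy case variable 3 is ranked ahead of variable 2 even though variable 2 has the higher mutual information with the target, because variable 2 is largely redundant with the already selected variable 1.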

  3. Inferring contemporary levels of gene flow and demographic history in a local population of the leaf beetle Gonioctena olivacea from mitochondrial DNA sequence variation.

    Science.gov (United States)

    Mardulyn, Patrick; Milinkovitch, Michel C

    2005-05-01

    We have studied mitochondrial DNA variation in a local population of the leaf beetle species Gonioctena olivacea, to check whether its apparent low dispersal behaviour affects its pattern of genetic variation at a small geographical scale. We have sampled 10 populations of G. olivacea within a rectangle of 5 x 2 km in the Belgian Ardennes, as well as five populations located approximately along a straight line of 30 km and separated by distances of 3-12 km. For each sampled individual (8-19 per population), a fragment of the mtDNA control region was polymerase chain reaction-amplified and sequenced. Sequence data were analysed to test whether significant genetic differentiation could be detected among populations separated by such relatively short distances. The reconstructed genealogy of the mitochondrial haplotypes was also used to investigate the demographic history of these populations. Computer simulations of the evolution of populations were conducted to assess the minimum amount of gene flow that is necessary to explain the observed pattern of variation in the samples. Results show that migration among populations included in the rectangle of 5 x 2 km is substantial, and probably involves the occurrence of dispersal flights. This appears difficult to reconcile with the results of a previous ecological field study that concluded that most of this species' dispersal occurs by walking. While sufficient migration to homogenize genetic diversity occurs among populations separated by distances of a few hundred metres to a few kilometres, distances greater than 5 km result, in contrast, in strong differentiation among populations, suggesting that migration is drastically reduced over such distances. Finally, the results of coalescent simulations suggest that the star-like genealogy inferred from the mtDNA sequence data is fully compatible with a past demographic expansion. However, a metapopulation structure alone (without the need to invoke a population expansion

  4. Divergence and gene flow in the globally distributed blue-winged ducks

    Science.gov (United States)

    Nelson, Joel; Wilson, Robert E.; McCracken, Kevin G.; Cumming, Graeme; Joseph, Leo; Guay, Patrick-Jean; Peters, Jeffrey

    2017-01-01

    The ability to disperse over long distances can result in a high propensity for colonizing new geographic regions, including uninhabited continents, and lead to lineage diversification via allopatric speciation. However, high vagility can also result in gene flow between otherwise allopatric populations, and in some cases, parapatric or divergence-with-gene-flow models might be more applicable to widely distributed lineages. Here, we use five nuclear introns and the mitochondrial control region along with Bayesian models of isolation with migration to examine divergence, gene flow, and phylogenetic relationships within a cosmopolitan lineage comprising six species, the blue-winged ducks (genus Anas), which inhabit all continents except Antarctica. We found two primary sub-lineages, the globally-distributed shoveler group and the New World blue-winged/cinnamon teal group. The blue-winged/cinnamon sub-lineage is composed of sister taxa from North America and South America, and taxa with parapatric distributions are characterized by low to moderate levels of gene flow. In contrast, our data support strict allopatry for most comparisons within the shovelers. However, we found evidence of gene flow from the migratory, Holarctic northern shoveler (A. clypeata) and the more sedentary, African Cape shoveler (A. smithii) into the Australasian shoveler (A. rhynchotis), although we could not reject strict allopatry. Given the diverse mechanisms of speciation within this complex, the shovelers and blue-winged/cinnamon teals can serve as an effective model system for examining how the genome diverges under different evolutionary processes and how genetic variation is partitioned among highly dispersive taxa.

  5. Phylogeny of Celastraceae tribe Euonymeae inferred from morphological characters and nuclear and plastid genes.

    Science.gov (United States)

    Simmons, Mark P; McKenna, Miles J; Bacon, Christine D; Yakobson, Kendra; Cappa, Jennifer J; Archer, Robert H; Ford, Andrew J

    2012-01-01

    The phylogeny of Celastraceae tribe Euonymeae (≈ 230 species in eight genera in both the Old and New Worlds) was inferred using morphological characters together with plastid (matK, trnL-F) and nuclear (ITS and 26S rDNA) genes. Tribe Euonymeae has been defined as those genera of Celastraceae with generally opposite leaves, isomerous carpels, loculicidally dehiscent capsules, and arillate seeds (except Microtropis). Euonymus is the most diverse (129 species) and widely cultivated genus in the tribe. We infer that tribe Euonymeae consists of at least six separate lineages within Celastraceae and that a revised natural classification of the family is needed. Microtropis and Quetzalia are inferred to be distinct sister groups that together are sister to Zinowiewia. The endangered Monimopetalum chinense is an isolated and early derived lineage of Celastraceae that represents an important component of phylogenetic diversity within the family. Hedraianthera is sister to Brassiantha, and we describe a second species (Brassiantha hedraiantheroides A.J. Ford) that represents the first reported occurrence of this genus in Australia. Euonymus globularis, from eastern Australia, is sister to Menepetalum, which is endemic to New Caledonia, and we erect a new genus (Dinghoua R.H. Archer) for it. The Madagascan species of Euonymus are sister to Pleurostylia and recognized as a distinct genus (Astrocassine ined.). Glyptopetalum, Torralbasia, and Xylonymus are all closely related to Euonymus sensu stricto and are questionably distinct from it. Current intrageneric classifications of Euonymus are not completely natural and require revision. Copyright © 2011 Elsevier Inc. All rights reserved.

  6. Bayesian assignment of gene ontology terms to gene expression experiments

    Science.gov (United States)

    Sykacek, P.

    2012-01-01

    Motivation: Gene expression assays allow for genome scale analyses of molecular biological mechanisms. State-of-the-art data analysis provides lists of involved genes, either by calculating significance levels of mRNA abundance or by Bayesian assessments of gene activity. A common problem of such approaches is the difficulty of interpreting the biological implication of the resulting gene lists. This led to an increased interest in methods for inferring high-level biological information. A common approach for representing high level information is by inferring gene ontology (GO) terms which may be attributed to the expression data experiment. Results: This article proposes a probabilistic model for GO term inference. Modelling assumes that gene annotations to GO terms are available and gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches for expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and provides a probabilistic GO term assignment as a result. Experiments on synthetic and microarray data suggest that advantages of the proposed probabilistic GO term inference over statistical test-based approaches are in particular evident for sparsely annotated GO terms and in situations of large uncertainty about gene activity. Provided that appropriate annotations exist, the proposed approach is easily applied to inferring other high level assignments like pathways. Availability: Source code under GPL license is available from the author. Contact: peter.sykacek@boku.ac.at PMID:22962488

  7. Bayesian assignment of gene ontology terms to gene expression experiments.

    Science.gov (United States)

    Sykacek, P

    2012-09-15

    Gene expression assays allow for genome scale analyses of molecular biological mechanisms. State-of-the-art data analysis provides lists of involved genes, either by calculating significance levels of mRNA abundance or by Bayesian assessments of gene activity. A common problem of such approaches is the difficulty of interpreting the biological implication of the resulting gene lists. This led to an increased interest in methods for inferring high-level biological information. A common approach for representing high level information is by inferring gene ontology (GO) terms which may be attributed to the expression data experiment. This article proposes a probabilistic model for GO term inference. Modelling assumes that gene annotations to GO terms are available and gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches for expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and provides a probabilistic GO term assignment as a result. Experiments on synthetic and microarray data suggest that advantages of the proposed probabilistic GO term inference over statistical test-based approaches are in particular evident for sparsely annotated GO terms and in situations of large uncertainty about gene activity. Provided that appropriate annotations exist, the proposed approach is easily applied to inferring other high level assignments like pathways. Source code under GPL license is available from the author. peter.sykacek@boku.ac.at.

  8. Gene flow among wild and domesticated almond species: insights from chloroplast and nuclear markers

    Science.gov (United States)

    Delplancke, Malou; Alvarez, Nadir; Espíndola, Anahí; Joly, Hélène; Benoit, Laure; Brouck, Elise; Arrigo, Nils

    2012-01-01

    Hybridization has played a central role in the evolutionary history of domesticated plants. Notably, several breeding programs relying on gene introgression from the wild compartment have been performed in fruit tree species within the genus Prunus, but few studies have investigated spontaneous gene flow among wild and domesticated Prunus species. Consequently, a comprehensive understanding of genetic relationships and levels of gene flow between domesticated and wild Prunus species is needed. Combining nuclear and chloroplastic microsatellites, we investigated the gene flow and hybridization among two key almond tree species, the cultivated Prunus dulcis and one of its most widespread wild relatives, Prunus orientalis, in the Fertile Crescent. We detected high genetic diversity levels in both species along with substantial and symmetric gene flow between the domesticated P. dulcis and the wild P. orientalis. These results were discussed in light of the cultivated species diversity, by outlining the frequent spontaneous genetic contributions of wild species to the domesticated compartment. In addition, crop-to-wild gene flow suggests that ad hoc transgene containment strategies would be required if genetically modified cultivars were introduced in the northwestern Mediterranean. PMID:25568053

  9. Ancestry dynamics in a South American population: The impact of gene flow and preferential mating.

    Science.gov (United States)

    Hedrick, Philip W

    2017-07-01

    European ancestry in many populations in Latin America at autosomal loci is often higher than that from X-linked loci indicating more European male ancestry and more Amerindian female ancestry. Generally, this has been attributed to more European male gene flow but could also result from an advantage to European mating or reproductive success. Population genetic models were developed to investigate the dynamics of gene flow and mating or reproductive success. Using estimates of autosomal and X-chromosome European ancestry, the amount of male gene flow or mating or reproductive advantage for Europeans, or those with European ancestry, was estimated. In a population from Antioquia, Colombia with an estimated 79% European autosomal ancestry and an estimated 69% European X-chromosome ancestry, about 15% male gene flow from Europe or about 20% mating or reproductive advantage of Europeans over Amerindians resulted in these levels of European ancestry in the contemporary population. Combinations of gene flow and mating advantage were nearly additive in their impact. Gene flow, mating advantage, or a combination of both factors, are consistent with observed levels of European ancestry in a Latin American population. This approach provides a general methodology to determine the levels of gene flow and mating differences that can explain the observed contemporary differences in ancestry from autosomes and X-chromosomes. © 2017 Wiley Periodicals, Inc.

  10. Gene flow in genetically modified wheat.

    Directory of Open Access Journals (Sweden)

    Silvan Rieben

    Understanding gene flow in genetically modified (GM) crops is critical to answering questions regarding risk-assessment and the coexistence of GM and non-GM crops. In two field experiments, we tested whether rates of cross-pollination differed between GM and non-GM lines of the predominantly self-pollinating wheat Triticum aestivum. In the first experiment, outcrossing was studied within the field by planting "phytometers" of one line into stands of another line. In the second experiment, outcrossing was studied over distances of 0.5-2.5 m from a central patch of pollen donors to adjacent patches of pollen recipients. Cross-pollination and outcrossing were detected when offspring of a pollen recipient without a particular transgene contained this transgene in heterozygous condition. The GM lines had been produced from the varieties Bobwhite or Frisal and contained Pm3b or chitinase/glucanase transgenes, respectively, in homozygous condition. These transgenes increase plant resistance against pathogenic fungi. Although the overall outcrossing rate in the first experiment was only 3.4%, Bobwhite GM lines containing the Pm3b transgene were six times more likely than non-GM control lines to produce outcrossed offspring. There was additional variation in outcrossing rate among the four GM-lines, presumably due to the different transgene insertion events. Among the pollen donors, the Frisal GM line expressing a chitinase transgene caused more outcrossing than the GM line expressing both a chitinase and a glucanase transgene. In the second experiment, outcrossing after cross-pollination declined from 0.7 to 0.03% over the test distances of 0.5-2.5 m. Our results suggest that pollen-mediated gene flow between GM and non-GM wheat might only be a concern if it occurs within fields, e.g. due to seed contamination. Methodologically, our study demonstrates that outcrossing rates between transgenic and other lines within crops can be assessed using a phytometer

  11. Gene flow in genetically modified wheat.

    Science.gov (United States)

    Rieben, Silvan; Kalinina, Olena; Schmid, Bernhard; Zeller, Simon L

    2011-01-01

    Understanding gene flow in genetically modified (GM) crops is critical to answering questions regarding risk-assessment and the coexistence of GM and non-GM crops. In two field experiments, we tested whether rates of cross-pollination differed between GM and non-GM lines of the predominantly self-pollinating wheat Triticum aestivum. In the first experiment, outcrossing was studied within the field by planting "phytometers" of one line into stands of another line. In the second experiment, outcrossing was studied over distances of 0.5-2.5 m from a central patch of pollen donors to adjacent patches of pollen recipients. Cross-pollination and outcrossing were detected when offspring of a pollen recipient without a particular transgene contained this transgene in heterozygous condition. The GM lines had been produced from the varieties Bobwhite or Frisal and contained Pm3b or chitinase/glucanase transgenes, respectively, in homozygous condition. These transgenes increase plant resistance against pathogenic fungi. Although the overall outcrossing rate in the first experiment was only 3.4%, Bobwhite GM lines containing the Pm3b transgene were six times more likely than non-GM control lines to produce outcrossed offspring. There was additional variation in outcrossing rate among the four GM-lines, presumably due to the different transgene insertion events. Among the pollen donors, the Frisal GM line expressing a chitinase transgene caused more outcrossing than the GM line expressing both a chitinase and a glucanase transgene. In the second experiment, outcrossing after cross-pollination declined from 0.7 to 0.03% over the test distances of 0.5-2.5 m. Our results suggest that pollen-mediated gene flow between GM and non-GM wheat might only be a concern if it occurs within fields, e.g. due to seed contamination. Methodologically, our study demonstrates that outcrossing rates between transgenic and other lines within crops can be assessed using a phytometer approach and that gene-flow

  12. STUDY OF GENE FLOW FROM GM COTTON (Gossypium hirsutum) VARIETIES IN “EL ESPINAL” (TOLIMA, COLOMBIA).

    Directory of Open Access Journals (Sweden)

    Alejandro Chaparro Giraldo

    2013-09-01

    In 2009, 4088 hectares of genetically modified (GM) cotton were planted in Tolima (Colombia); however, there is some uncertainty about the containment measures needed to prevent the flow of pollen and seed from regulated GM fields into adjacent fields. In this study, the gene flow from GM cotton varieties to conventional or feral cotton plants via seed and pollen was evaluated. Immunostrip™, PCR and ELISA assays were used to detect gene flow. Fifty-six refuges, 27 fields with conventional cotton and four feral individuals of the enterprise “Remolinos Inc.” located in El Espinal (Tolima) were analyzed in the first half of 2010. The results indicated seed-mediated gene flow in 45 refuges (80.4 %) and 26 fields with conventional cotton (96 %), as well as pollen-mediated gene flow in one field with conventional cotton and nine refuges. All fields cultivated with conventional cotton showed gene flow from GM cotton. Two refuges and two feral individuals did not reveal gene flow from GM cotton.

  13. Gene flow analysis method, the D-statistic, is robust in a wide parameter space.

    Science.gov (United States)

    Zheng, Yichen; Janke, Axel

    2018-01-08

    We evaluated the sensitivity of the D-statistic, a parsimony-like method widely used to detect gene flow between closely related species. This method has been applied to a variety of taxa with a wide range of divergence times. However, its parameter space and thus its applicability to a wide taxonomic range have not been systematically studied. Divergence time, population size, time of gene flow, distance of outgroup and number of loci were examined in a sensitivity analysis. The sensitivity study shows that the primary determinant of the D-statistic is the relative population size, i.e. the population size scaled by the number of generations since divergence. This is consistent with the fact that the main confounding factor in gene flow detection is incomplete lineage sorting, which dilutes the signal. The sensitivity of the D-statistic is also affected by the direction of gene flow, size and number of loci. In addition, we examined the ability of the f-statistics, [Formula: see text] and [Formula: see text], to estimate the fraction of a genome affected by gene flow; while these statistics are difficult to apply to practical questions in biology due to lack of knowledge of when the gene flow happened, they can be used to compare datasets with identical or similar demographic background. The D-statistic, as a method to detect gene flow, is robust against a wide range of genetic distances (divergence times) but it is sensitive to population size. The D-statistic should only be applied with critical reservation to taxa where population sizes are large relative to branch lengths in generations.
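
    For orientation, the D-statistic (the ABBA-BABA test) is commonly computed from site-pattern counts in a four-taxon alignment (P1, P2, P3, outgroup) as D = (ABBA - BABA) / (ABBA + BABA). The following is a minimal sketch of that standard definition only; the function names and the toy sequences are hypothetical, and it does not reproduce the simulation pipeline or significance testing of the study above.

```python
def count_patterns(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites in four aligned sequences of equal length.

    The outgroup allele is treated as ancestral ("A"); only biallelic sites
    are considered.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # skip invariant and multi-allelic sites
        if a == o and b == c:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / total


# Hypothetical toy alignment, one string per taxon.
toy_abba, toy_baba = count_patterns("ATAAAA", "TATATA", "TTTATT", "AAAAAA")
toy_d = d_statistic(toy_abba, toy_baba)
```

    Under incomplete lineage sorting alone the two patterns are expected to be equally frequent, so D is expected to be close to zero; an excess of one pattern is the signal usually interpreted as gene flow between P3 and either P1 or P2.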

  14. Pleistocene land bridges act as semipermeable agents of avian gene flow in Wallacea.

    Science.gov (United States)

    Garg, Kritika M; Chattopadhyay, Balaji; Wilton, Peter R; Malia Prawiradilaga, Dewi; Rheindt, Frank E

    2018-08-01

    Cyclical periods of global cooling have been important drivers of biotic differentiation throughout the Quaternary. Ice age-induced sea level fluctuations can lead to changing patterns of land connections, both facilitating and disrupting gene flow. In this study, we test if species with differing life histories are differentially affected by Quaternary land connections. We used genome-wide SNPs in combination with mitochondrial gene sequences to analyse levels of divergence and gene flow between two songbird complexes across two Wallacean islands that have been repeatedly connected during glaciations. Although the two bird complexes are similar in ecological attributes, the forest and edge-inhabiting golden whistler Pachycephala pectoralis is comparatively flexible in its diet and niche requirements as compared to the henna-tailed jungle-flycatcher Cyornis colonus, which is largely restricted to the forest interior. Using population-genomic and coalescent approaches, we estimated levels of gene flow, population differentiation and divergence time between the two island populations. We observed higher levels of differentiation, an approximately two to four times deeper divergence time and near-zero levels of gene flow between the two island populations of the more forest-dependent henna-tailed jungle-flycatcher as compared to the more generalist golden whistler. Our results suggest that Quaternary land bridges act as semipermeable agents of gene flow in Wallacea, allowing only certain taxa to connect between islands while others remain isolated. Quaternary land bridges do not accommodate all terrestrial species equally, differing in suitability according to life history and species biology. More generalist species are likely to use Quaternary land connections as a conduit for gene flow between islands whereas island populations of more specialist species may continue to be reproductively isolated even during periods of Quaternary land bridges. Copyright © 2018

  15. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing

    DEFF Research Database (Denmark)

    Bataillon, Thomas; Duan, Jinjie; Hvilsom, Christina

    2015-01-01

    of recent gene flow from Western into Eastern chimpanzees. The striking contrast in X-linked vs. autosomal polymorphism and divergence previously reported in Central chimpanzees is also found in Eastern and Western chimpanzees. We show that the direction of selection (DoS) statistic exhibits a strong non......-monotonic relationship with the strength of purifying selection S, making it inappropriate for estimating S. We instead use counts in synonymous vs. non-synonymous frequency classes to infer the distribution of S coefficients acting on non-synonymous mutations in each subspecies. The strength of purifying selection we...... infer is congruent with the differences in effective sizes of each subspecies: Central chimpanzees are undergoing the strongest purifying selection followed by Eastern and Western chimpanzees. Coding indels show stronger selection against indels changing the reading frame than observed in human...

  16. Cryptic species? Patterns of maternal and paternal gene flow in eight neotropical bats.

    Directory of Open Access Journals (Sweden)

    Elizabeth L Clare

    Levels of sequence divergence at mitochondrial loci are frequently used in phylogeographic analysis and species delimitation though single marker systems cannot assess bi-parental gene flow. In this investigation I compare the phylogeographic patterns revealed through the maternally inherited mitochondrial COI region and the paternally inherited 7th intron region of the Dby gene on the Y-chromosome in eight common Neotropical bat species. These species are diverse and include members of two families from the feeding guilds of sanguivores, nectarivores, frugivores, carnivores and insectivores. In each case, the currently recognized taxon is comprised of distinct, substantially divergent intraspecific mitochondrial lineages suggesting cryptic species complexes. In Chrotopterus auritus and Saccopteryx bilineata I observed congruent patterns of divergence in both genetic regions suggesting a cessation of gene flow between intraspecific groups. This evidence supports the existence of cryptic species complexes which meet the criteria of the genetic species concept. In Glossophaga soricina two intraspecific groups with largely sympatric South American ranges show evidence for incomplete lineage sorting or frequent hybridization while a third group with a Central American distribution appears to diverge congruently at both loci suggesting speciation. Within Desmodus rotundus and Trachops cirrhosus the paternally inherited region was monomorphic and thus does not support or refute the potential for cryptic speciation. In Uroderma bilobatum, Micronycteris megalotis and Platyrrhinus helleri the gene regions show conflicting patterns of divergence and I cannot exclude ongoing gene flow between intraspecific groups. This analysis provides a comprehensive comparison across taxa and employs both maternally and paternally inherited gene regions to validate patterns of gene flow. I present evidence for previously unrecognized species meeting the criteria of

  17. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

    Science.gov (United States)

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

    2012-01-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606

  18. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks.

    Science.gov (United States)

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A; Kellis, Manolis

    2012-07-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein-protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.

  19. Introgression in the Drosophila subobscura--D. madeirensis sister species: evidence of gene flow in nuclear genes despite mitochondrial differentiation.

    Science.gov (United States)

    Herrig, Danielle K; Modrick, Alec J; Brud, Evgeny; Llopart, Ana

    2014-03-01

    Species hybridization, and thus the potential for gene flow, was once viewed as a reproductive mistake. However, recent analyses based on large datasets and newly developed models suggest that gene exchange is not as rare as originally suspected. To investigate the history and speciation of the closely related species Drosophila subobscura, D. madeirensis, and D. guanche, we obtained polymorphism and divergence data for 26 regions throughout the genome, including the Y chromosome and mitochondrial DNA. We found that the D. subobscura X/autosome ratio of silent nucleotide diversity is significantly smaller than the 0.75 expected under neutrality. This pattern, if held genomewide, may reflect a faster accumulation of beneficial mutations on the X chromosome than on autosomes. We also detected evidence of gene flow in autosomal regions, while sex chromosomes remain distinct. This is consistent with the large X effect on hybrid male sterility seen in this system and the presence of two X chromosome inversions fixed between species. Overall, our data conform to chromosomal speciation models in which rearrangements are proposed to serve as gene flow barriers. Contrary to other observations in Drosophila, the mitochondrial genome appears resilient to gene flow in the presence of nuclear exchange. © 2013 The Authors. Evolution published by Wiley Periodicals, Inc. on behalf of The Society for the Study of Evolution.
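    The 0.75 benchmark for the X/autosome diversity ratio follows from a standard neutral-theory argument. The sketch below rests on assumptions not stated in the abstract: a constant population of N diploid individuals, equal numbers of breeding males and females, and the same neutral mutation rate μ on the X chromosome and the autosomes. Each mating pair then carries three X copies for every four autosomal copies.

```latex
\begin{align*}
N_{e,A} &= N, & N_{e,X} &= \tfrac{3}{4}\,N \\
\pi_A &\approx 4 N_{e,A}\,\mu = 4N\mu, & \pi_X &\approx 4 N_{e,X}\,\mu = 3N\mu \\
\frac{\pi_X}{\pi_A} &\approx \frac{3N\mu}{4N\mu} = \frac{3}{4} = 0.75
\end{align*}
```

    A ratio significantly below 0.75 therefore indicates that at least one of these assumptions fails, for instance through selection acting differently on the X chromosome, as the authors suggest.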

  20. Data identification for improving gene network inference using computational algebra.

    Science.gov (United States)

    Dimitrova, Elena; Stigler, Brandilyn

    2014-11-01

    Identification of models of gene regulatory networks is sensitive to the amount of data used as input. Considering the substantial costs in conducting experiments, it is of value to have an estimate of the amount of data required to infer the network structure. To minimize wasted resources, it is also beneficial to know which data are necessary to identify the network. Knowledge of the data and knowledge of the terms in polynomial models are often required a priori in model identification. In applications, it is unlikely that the structure of a polynomial model will be known, which may force data sets to be unnecessarily large in order to identify a model. Furthermore, none of the known results provides any strategy for constructing data sets to uniquely identify a model. We provide a specialization of an existing criterion for deciding when a set of data points identifies a minimal polynomial model when its monomial terms have been specified. Then, we relax the requirement of the knowledge of the monomials and present results for model identification given only the data. Finally, we present a method for constructing data sets that identify minimal polynomial models.

  1. Pollen-mediated gene flow in flax (Linum usitatissimum L.): can genetically engineered and organic flax coexist?

    Science.gov (United States)

    Jhala, A J; Bhatt, H; Topinka, K; Hall, L M

    2011-04-01

    Coexistence allows growers and consumers the choice of producing or purchasing conventional or organic crops with known standards for adventitious presence of genetically engineered (GE) seed. Flax (Linum usitatissimum L.) is a multipurpose oilseed crop in which product diversity and utility could be enhanced for industrial, nutraceutical and pharmaceutical markets through genetic engineering. If GE flax were released commercially, pollen-mediated gene flow would determine in part whether GE flax could coexist without compromising other markets. As a part of pre-commercialization risk assessment, we quantified pollen-mediated gene flow between two cultivars of flax. Field experiments were conducted at four locations during 2006 and 2007 in western Canada using a concentric donor (20 × 20 m) receptor (120 × 120 m) design. Gene flow was detected through the xenia effect of dominant alleles of high α-linolenic acid (ALA; 18:3(cisΔ9,12,15)) to the low ALA trait. Seeds were harvested from the pollen recipient plots up to a distance of 50 m in eight directions from the pollen donor. High ALA seeds were identified using a thiobarbituric acid test and served as a marker for gene flow. Binomial distribution and power analysis were used to predict the minimum number of seeds statistically required to detect the frequency of gene flow at specific α (confidence interval) and power (1-β) values. As a result of the low frequency of gene flow, approximately 4 million seeds were screened to derive accurate quantification. Frequency of gene flow was highest near the source, averaging 0.0185 at 0.1 m, but declined rapidly with distance to 0.0013 and 0.00003 at 3 and 35 m, respectively. Gene flow was reduced to 50% (O₅₀) and 90% (O₉₀) between 0.85 to 2.64 m, and 5.68 to 17.56 m, respectively. No gene flow was detected at any site or year > 35 m distance from the pollen source, suggesting that frequency of gene flow was ≤ 0.00003 (P = 0.95). Although it is not possible to
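    The binomial reasoning behind such sample-size statements can be illustrated with a simplified sketch. It assumes independent seeds and considers only the probability of observing at least one gene-flow event; it is not the authors' full power analysis, and the function names `min_seeds` and `upper_bound_zero_events` are hypothetical.

```python
import math


def min_seeds(freq: float, power: float = 0.95) -> int:
    """Smallest n such that P(at least one event in n seeds) >= power.

    Solves 1 - (1 - freq)**n >= power for n under a binomial model.
    """
    if not (0.0 < freq < 1.0 and 0.0 < power < 1.0):
        raise ValueError("freq and power must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - freq))


def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the event frequency when 0 events are seen in n seeds.

    Solves (1 - p)**n = 1 - confidence for p.
    """
    if n <= 0 or not (0.0 < confidence < 1.0):
        raise ValueError("n must be positive and confidence between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# Seeds needed to have a 95% chance of seeing at least one event
# when the true frequency is 0.00003 (the value quoted at 35 m)
n_needed = min_seeds(0.00003, power=0.95)

# Frequency that can be excluded at 95% confidence after screening
# 100,000 seeds (a hypothetical sample size) without a single event
bound = upper_bound_zero_events(100_000, confidence=0.95)
```

    Under these assumptions, roughly one hundred thousand seeds per sampling point are needed to bound the frequency near 0.00003. That is the same order of magnitude per distance class as the multi-million seed total reported across sites, distances and years.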

  2. Impact of flows on ion temperatures inferred from neutron spectra in asymmetrically driven OMEGA DT implosions

    Science.gov (United States)

    Gatu Johnson, M.; Frenje, J.; Lahmann, B.; Seguin, F.; Petrasso, R.; Appelbe, B.; Chittenden, J.; Walsh, C.; Delettrez, J.; Igumenshchev, I.; Knauer, J. P.; Glebov, V. Yu.; Forrest, C.; Grimble, W.; Marshall, F.; Michel, T.; Stoeckl, C.; Haines, B. M.; Zylstra, A. B.

    2017-10-01

    Ion temperatures (Tion) in Inertial Confinement Fusion (ICF) experiments have traditionally been inferred from the broadening of primary neutron spectra. Directional motion (flow) of the fuel at burn, expected to arise due to asymmetries imposed by e.g. engineering features or drive non-uniformity, also impacts broadening and may lead to artificially inflated "Tion" values. Flow due to low-mode asymmetries is expected to give rise to line-of-sight variations in measured Tion, as observed in OMEGA cryogenic DT implosions but not in similar experiments at the NIF. In this presentation, we report on OMEGA experiments with intentional drive asymmetry designed for testing the ability to accurately predict and measure line-of-sight differences in apparent Tion due to low-mode asymmetry-seeded flows. The measurements are contrasted to CHIMERA, RAGE and ASTER simulations, providing insight into implosion dynamics and the relative importance of laser drive non-uniformity, stalk and offset as sources of asymmetry. The results highlight the complexity of hot-spot dynamics, which is a problem that must be mastered to achieve ICF ignition. This work was supported in part by the U.S. DOE, NLUF and LLE.

  3. Targeted Enrichment of Large Gene Families for Phylogenetic Inference: Phylogeny and Molecular Evolution of Photosynthesis Genes in the Portullugo Clade (Caryophyllales).

    Science.gov (United States)

    Moore, Abigail J; Vos, Jurriaan M De; Hancock, Lillian P; Goolsby, Eric; Edwards, Erika J

    2018-05-01

    Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here, we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the "portullugo" (Caryophyllales), a moderately sized lineage of flowering plants (~ 2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75-218 loci across 74 taxa, with ~ 50% matrix completeness across data sets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.

  4. Intercontinental gene flow among western arctic populations of Lesser Snow Geese

    Science.gov (United States)

    Shorey, Rainy I.; Scribner, Kim T.; Kanefsky, Jeannette; Samuel, Michael D.; Libants, Scot V.

    2011-01-01

    Quantifying the spatial genetic structure of highly vagile species of birds is important in predicting their degree of population demographic and genetic independence during changing environmental conditions, and in assessing their abundance and distribution. In the western Arctic, Lesser Snow Geese (Chen caerulescens caerulescens) provide an example useful for evaluating spatial population genetic structure and the relative contribution of male and female philopatry to breeding and wintering locales. We analyzed biparentally inherited microsatellite loci and maternally inherited mtDNA sequences from geese breeding at Wrangel Island (Russia) and Banks Island (Canada) to estimate gene flow among populations whose geographic overlap during breeding and winter differ. Significant differences in the frequencies of mtDNA haplotypes contrast with the homogeneity of allele frequencies for microsatellite loci. Coalescence simulations revealed high variability and asymmetry between males and females in rates and direction of gene flow between populations. Our results highlight the importance of wintering areas to demographic independence and spatial genetic structure of these populations. Male-mediated gene flow among the populations on northern Wrangel Island, southern Wrangel Island, and Banks Island has been substantial. A high rate of female-mediated gene flow from southern Wrangel Island to Banks Island suggests that population exchange can be achieved when populations winter in a common area. Conversely, when birds from different breeding populations do not share a common wintering area, the probability of population exchange is likely to be dramatically reduced.

  5. Gene flow from transgenic common beans expressing the bar gene.

    Science.gov (United States)

    Faria, Josias C; Carneiro, Geraldo E S; Aragão, Francisco J L

    2010-01-01

    Gene flow is a common phenomenon even in self-pollinated plant species. With the advent of genetically modified plants this subject has become of the utmost importance due to the need for controlling the spread of transgenes. This study was conducted to determine the occurrence and intensity of outcrossing in transgenic common beans. In order to evaluate the outcross rates, four experiments were conducted in Santo Antonio de Goiás (GO, Brazil) and one in Londrina (PR, Brazil), using transgenic cultivars resistant to the herbicide glufosinate ammonium and their conventional counterparts as recipients of the transgene. Experiments with cv. Olathe Pinto and the transgenic line Olathe M1/4 were conducted in a completely randomized design with ten replications for three years in one location, whereas the experiments with cv. Pérola and the transgenic line Pérola M1/4 were conducted at two locations for one year, with the transgenic cultivar surrounded on all sides by the conventional counterpart. The outcross occurred at a negligible rate of 0.00741% in cv. Pérola, while none was observed (0.0%) in cv. Olathe Pinto. The frequency of gene flow was cultivar dependent and most of the observed outcross was within 2.5 m from the edge of the pollen source. Index terms: Phaseolus vulgaris, outcross, glufosinate ammonium.

  6. Phylogenetic relationships within Echinococcus and Taenia tapeworms (Cestoda: Taeniidae): an inference from nuclear protein-coding genes.

    Science.gov (United States)

    Knapp, Jenny; Nakao, Minoru; Yanagida, Tetsuya; Okamoto, Munehiro; Saarma, Urmas; Lavikainen, Antti; Ito, Akira

    2011-12-01

    The family Taeniidae of tapeworms is composed of two genera, Echinococcus and Taenia, which obligately parasitize mammals including humans. Inferring phylogeny via molecular markers is the only way to trace back their evolutionary histories. However, molecular dating approaches are lacking so far. Here we established new markers from nuclear protein-coding genes for RNA polymerase II second largest subunit (rpb2), phosphoenolpyruvate carboxykinase (pepck) and DNA polymerase delta (pold). Bayesian inference and maximum likelihood analyses of the concatenated gene sequences allowed us to reconstruct phylogenetic trees for taeniid parasites. The tree topologies clearly demonstrated that Taenia is paraphyletic and that the clade of Echinococcus oligarthrus and Echinococcus vogeli is sister to all other members of Echinococcus. Both species are endemic in Central and South America, and their definitive hosts originated from carnivores that immigrated from North America after the formation of the Panamanian land bridge about 3 million years ago (Ma). A time-calibrated phylogeny was estimated by a Bayesian relaxed-clock method based on the assumption that the most recent common ancestor of E. oligarthrus and E. vogeli existed during the late Pliocene (3.0 Ma). The results suggest that a clade of Taenia including human-pathogenic species diversified primarily in the late Miocene (11.2 Ma), whereas Echinococcus started to diversify later, in the end of the Miocene (5.8 Ma). Close genetic relationships among the members of Echinococcus imply that the genus is a young group in which speciation and global radiation occurred rapidly. Copyright © 2011 Elsevier Inc. All rights reserved.

  7. Effect of selective logging on genetic diversity and gene flow in Cariniana legalis sampled from a cacao agroforestry system.

    Science.gov (United States)

    Leal, J B; Santos, R P; Gaiotto, F A

    2014-01-28

    The fragments of the Atlantic Forest of southern Bahia have a long history of intense logging and selective cutting. Some tree species, such as jequitibá rosa (Cariniana legalis), have experienced a reduction in their populations with respect to both area and density. To evaluate the possible effects of selective logging on genetic diversity, gene flow, and spatial genetic structure, 51 C. legalis individuals were sampled, representing the total remaining population from the cacao agroforestry system. A total of 120 alleles were observed from the 11 microsatellite loci analyzed. The average observed heterozygosity (0.486) was less than the expected heterozygosity (0.721), indicating a loss of genetic diversity in this population. A high fixation index (FIS = 0.325) was found, which is possibly due to a reduction in population size, resulting in increased mating among relatives. The maximum (1055 m) and minimum (0.095 m) distances traveled by pollen or seeds were inferred based on paternity tests. We found 36.84% of unique parents among all sampled seedlings. The progenitors of the remaining seedlings (63.16%) were most likely out of the sampled area. Positive and significant spatial genetic structure was identified in this population among classes 10 to 30 m away with an average coancestry coefficient between pairs of individuals of 0.12. These results suggest that the agroforestry system of cacao cultivation is contributing to maintaining levels of diversity and gene flow in the studied population, thus minimizing the effects of selective logging.
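    The relationship between the heterozygosity values and the fixation index quoted above can be checked with a short sketch. It uses the textbook definition F = 1 - Ho/He applied to the reported averages. The function name `fixation_index` is hypothetical, and the authors may have computed FIS differently (for example, averaged per locus).

```python
def fixation_index(h_obs: float, h_exp: float) -> float:
    """Inbreeding coefficient F = 1 - Ho / He.

    h_obs: observed heterozygosity (Ho)
    h_exp: heterozygosity expected under Hardy-Weinberg proportions (He)
    """
    if h_exp <= 0.0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# Averages reported in the abstract: Ho = 0.486, He = 0.721
f_is = fixation_index(0.486, 0.721)
```

    The result is approximately 0.326, in line with the reported FIS of 0.325. A positive value indicates a heterozygote deficit relative to random mating, which the authors attribute to increased mating among relatives.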

  8. A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

    Science.gov (United States)

    Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence

    2013-01-01

    Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011
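    The overdispersion property mentioned above (variance larger than the mean) can be made concrete with the marginal moments of a univariate Poisson log-normal variable. This is a minimal sketch of the general distribution only, not of the authors' hierarchical network model; the function name `pln_moments` and the parameter values are hypothetical.

```python
import math
from typing import Tuple


def pln_moments(mu: float, sigma: float) -> Tuple[float, float]:
    """Mean and variance of X, where X | lam ~ Poisson(lam)
    and log(lam) ~ Normal(mu, sigma**2).

    mean = exp(mu + sigma^2 / 2)
    var  = mean + mean^2 * (exp(sigma^2) - 1)
    """
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    mean = math.exp(mu + sigma ** 2 / 2.0)
    # expm1(x) = exp(x) - 1, accurate for small x
    var = mean + mean ** 2 * math.expm1(sigma ** 2)
    return mean, var


# With sigma = 0 the model reduces to a plain Poisson (variance == mean)
poisson_mean, poisson_var = pln_moments(mu=2.0, sigma=0.0)

# With sigma > 0 the variance exceeds the mean (overdispersion)
pln_mean, pln_var = pln_moments(mu=2.0, sigma=0.8)
```

    The extra term mean² · (exp(σ²) − 1) grows with the square of the mean, which is why highly expressed genes in count data show much more variability than a plain Poisson model allows.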

  9. High rates of gene flow by pollen and seed in oak populations across Europe

    NARCIS (Netherlands)

    Gerber, S.; Chadoeuf, J.; Gugerli, F.; Lascoux, M.; Buiteveld, J.; Cottrell, J.; Dounavi, A.; Fineschi, S.; Forrest, L.; Fogelqvist, J.; Goicoechea, P.G.; Jensen, J.S.; Salvini, D.; Vendramin, G.G.; Kremer, A.

    2014-01-01

    Gene flow is a key factor in the evolution of species, influencing effective population size, hybridisation and local adaptation. We analysed local gene flow in eight stands of white oak (mostly Quercus petraea and Q. robur, but also Q. pubescens and Q. faginea) distributed across Europe. Adult

  10. Adaptive divergence with gene flow in incipient speciation of Miscanthus floridulus / sinensis complex (Poaceae)

    KAUST Repository

    Huang, Chao-Li; Ho, Chuan-Wen; Chiang, Yu-Chung; Shigemoto, Yasumasa; Hsu, Tsai-Wen; Hwang, Chi-Chuan; Ge, Xue-Jun; Chen, Charles; Wu, Tai-Han; Chou, Chang-Hung; Huang, Hao-Jen; Gojobori, Takashi; Osada, Naoki; Chiang, Tzen-Yuh

    2014-01-01

    Young incipient species provide ideal materials for untangling the process of ecological speciation in the presence of gene flow. The Miscanthus floridulus/sinensis complex exhibits diverse phenotypic and ecological differences despite recent divergence (approximately 1.59 million years ago). To elucidate the process of genetic differentiation during early stages of ecological speciation, we analyzed genomic divergence in the Miscanthus complex using 72 randomly selected genes from a newly assembled transcriptome. In this study, rampant gene flow was detected between species, estimated as M = 3.36 × 10⁻⁹ to 1.20 × 10⁻⁶, resulting in contradicting phylogenies across loci. Nevertheless, BEAST analyses revealed the species identity and the effects of extrinsic cohesive forces that counteracted the non-stop introgression. As expected, early in speciation with gene flow, only 3-13 loci were highly diverged; two to five outliers (approximately 2.78-6.94% of the genome) were characterized by strong linkage disequilibrium, and asymmetrically distributed among ecotypes, indicating footprints of diversifying selection. In conclusion, ecological speciation of incipient species of Miscanthus probably followed the parapatric model, whereas allopatric speciation cannot be completely ruled out, especially between the geographically isolated northern and southern M. sinensis, for which no significant gene flow across oceanic barriers was detected. Divergence between local ecotypes in early-stage speciation began at a few genomic regions under the influence of natural selection and divergence hitchhiking that overcame gene flow.

  11. Adaptive divergence with gene flow in incipient speciation of Miscanthus floridulus / sinensis complex (Poaceae)

    KAUST Repository

    Huang, Chao-Li

    2014-11-11

    Young incipient species provide ideal materials for untangling the process of ecological speciation in the presence of gene flow. The Miscanthus floridulus/sinensis complex exhibits diverse phenotypic and ecological differences despite recent divergence (approximately 1.59 million years ago). To elucidate the process of genetic differentiation during early stages of ecological speciation, we analyzed genomic divergence in the Miscanthus complex using 72 randomly selected genes from a newly assembled transcriptome. In this study, rampant gene flow was detected between species, estimated as M = 3.36 × 10⁻⁹ to 1.20 × 10⁻⁶, resulting in contradicting phylogenies across loci. Nevertheless, BEAST analyses revealed the species identity and the effects of extrinsic cohesive forces that counteracted the non-stop introgression. As expected, early in speciation with gene flow, only 3-13 loci were highly diverged; two to five outliers (approximately 2.78-6.94% of the genome) were characterized by strong linkage disequilibrium, and asymmetrically distributed among ecotypes, indicating footprints of diversifying selection. In conclusion, ecological speciation of incipient species of Miscanthus probably followed the parapatric model, whereas allopatric speciation cannot be completely ruled out, especially between the geographically isolated northern and southern M. sinensis, for which no significant gene flow across oceanic barriers was detected. Divergence between local ecotypes in early-stage speciation began at a few genomic regions under the influence of natural selection and divergence hitchhiking that overcame gene flow.

  12. Illegal gene flow from transgenic creeping bentgrass: the saga continues.

    Science.gov (United States)

    Snow, Allison A

    2012-10-01

    Ecologists have paid close attention to environmental effects that fitness-enhancing transgenes might have following crop-to-wild gene flow (e.g. Snow et al. 2003). For some crops, gene flow also can lead to legal problems, especially when government agencies have not approved transgenic events for unrestricted environmental release. Creeping bentgrass (Agrostis stolonifera), a common turf grass used in golf courses, is the focus of both areas of concern. In 2002, prior to expected deregulation (still pending), The Scotts Company planted creeping bentgrass with transgenic resistance to the herbicide glyphosate, also known as RoundUp, on 162 ha in a designated control area in central Oregon (Fig. 1). Despite efforts to restrict gene flow, wind-dispersed pollen carried transgenes to florets of local A. stolonifera and A. gigantea as far as 14 km away, and to sentinel plants placed as far as 21 km away (Watrud et al. 2004). Then, in August 2003, a strong wind event moved transgenic seeds from wind rows of cut bentgrass into nearby areas. The company’s efforts to kill all transgenic survivors in the area failed: feral glyphosate-resistant populations of A. stolonifera were found by Reichman et al. (2006), and 62% of 585 bentgrass plants had the telltale CP4 EPSPS transgene in 2006 (Zapiola et al. 2008; Fig. 2). Now, in this issue, the story gets even more interesting as Zapiola & Mallory-Smith (2012) describe a transgenic, intergeneric hybrid produced on a feral, transgenic creeping bentgrass plant that received pollen from Polypogon monspeliensis (rabbitfoot grass). Their finding raises a host of new questions about the prevalence and fitness of intergeneric hybrids, as well as how to evaluate the full extent of gene flow from transgenic crops.

  13. Learning gene regulatory networks from gene expression data using weighted consensus

    KAUST Repository

    Fujii, Chisato; Kuwahara, Hiroyuki; Yu, Ge; Guo, Lili; Gao, Xin

    2016-01-01

    An accurate determination of the network structure of gene regulatory systems from high-throughput gene expression data is an essential yet challenging step in studying how the expression of endogenous genes is controlled through a complex interaction of gene products and DNA. While numerous methods have been proposed to infer the structure of gene regulatory networks, none of them seem to work consistently over different data sets with high accuracy. A recent study to compare gene network inference methods showed that an average-ranking-based consensus method consistently performs well under various settings. Here, we propose a linear programming-based consensus method for the inference of gene regulatory networks. Unlike the average-ranking-based one, which treats the contribution of each individual method equally, our new consensus method assigns a weight to each method based on its credibility. As a case study, we applied the proposed consensus method on synthetic and real microarray data sets, and compared its performance to that of the average-ranking-based consensus and individual inference methods. Our results show that our weighted consensus method achieves superior performance over the unweighted one, suggesting that assigning weights to different individual methods rather than giving them equal weights improves the accuracy. © 2016 Elsevier B.V.
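    The contrast between the two consensus schemes can be illustrated with a minimal sketch. It assumes each inference method returns a confidence score per candidate edge and that consensus is formed over per-method ranks. The abstract states that the weights come from a linear program; that step is not reproduced here, so the weights below are simply supplied by hand. The function names, edge labels and scores are all hypothetical.

```python
def ranks(scores):
    """Map each edge to its rank (1 = highest score) within one method."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: position for position, edge in enumerate(ordered, start=1)}


def consensus(method_scores, weights=None):
    """Order edges by weighted mean rank.

    With weights=None every method counts equally, which corresponds to
    an average-ranking consensus. The weights are an input here; deriving
    them (e.g. by linear programming) is outside this sketch.
    """
    if weights is None:
        weights = [1.0] * len(method_scores)
    total = sum(weights)
    per_method = [ranks(s) for s in method_scores]
    edges = per_method[0].keys()
    combined = {
        e: sum(w * r[e] for w, r in zip(weights, per_method)) / total
        for e in edges
    }
    return sorted(edges, key=combined.get)


# Three hypothetical methods scoring the same three candidate edges.
m1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.5, ("g2", "g3"): 0.1}
m2 = {("g1", "g2"): 0.8, ("g1", "g3"): 0.6, ("g2", "g3"): 0.2}
m3 = {("g1", "g2"): 0.1, ("g1", "g3"): 0.2, ("g2", "g3"): 0.9}

unweighted = consensus([m1, m2, m3])
weighted = consensus([m1, m2, m3], weights=[0.1, 0.1, 0.8])
```

    In this toy case the equal-weight consensus follows the two agreeing methods, while a weighting that trusts the third method most reverses the order of the top edge.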

  14. Learning gene regulatory networks from gene expression data using weighted consensus

    KAUST Repository

    Fujii, Chisato

    2016-08-25

    An accurate determination of the network structure of gene regulatory systems from high-throughput gene expression data is an essential yet challenging step in studying how the expression of endogenous genes is controlled through a complex interaction of gene products and DNA. While numerous methods have been proposed to infer the structure of gene regulatory networks, none of them seem to work consistently over different data sets with high accuracy. A recent study to compare gene network inference methods showed that an average-ranking-based consensus method consistently performs well under various settings. Here, we propose a linear programming-based consensus method for the inference of gene regulatory networks. Unlike the average-ranking-based one, which treats the contribution of each individual method equally, our new consensus method assigns a weight to each method based on its credibility. As a case study, we applied the proposed consensus method on synthetic and real microarray data sets, and compared its performance to that of the average-ranking-based consensus and individual inference methods. Our results show that our weighted consensus method achieves superior performance over the unweighted one, suggesting that assigning weights to different individual methods rather than giving them equal weights improves the accuracy. © 2016 Elsevier B.V.

  15. Spatial and temporal assessment of pollen- and seed-mediated gene flow from genetically engineered plum Prunus domestica.

    Directory of Open Access Journals (Sweden)

    Ralph Scorza

    Pollen flow from a 0.46 ha plot of genetically engineered (GE) Prunus domestica located in West Virginia, USA was evaluated from 2000-2010. Sentinel plum trees were planted at distances ranging from 132 to 854 m from the center of the GE orchard. Plots of mixed plum varieties and seedlings were located at 384, 484 and 998 m from the GE plot. Bee hives (Apis mellifera) were dispersed between the GE plum plot and the pollen flow monitoring sites. Pollen-mediated gene flow out of the GE plum plot to non-GE plums under the study conditions was low, occurring in only 4 of 11 years and then in only 0.31% of the 12,116 seeds analyzed. When it occurred, gene flow, calculated as the number of GUS positive embryos/total embryos sampled, ranged from 0.215% at 132 m from the center of the GE plum plot (28 m from the nearest GE plum tree) to 0.033-0.017% at longer distances (384-998 m). Based on the percentage of GUS positive seeds per individual sampled tree, the range was 0.4% to 12%. Within the GE field plot, gene flow ranged from 4.9 to 39%. Gene flow was related to distance and environmental conditions. A single year sample from a sentinel plot 132 m from the center of the GE plot accounted for 65% of the total 11-year gene flow. Spatial modeling indicated that gene flow dramatically decreased at distances over 400 m from the GE plot. Air temperature and rainfall were, respectively, positively and negatively correlated with gene flow, reflecting the effects of weather conditions on insect pollinator activity. Seed-mediated gene flow was not detected. These results support the feasibility of coexistence of GE and non-GE plum orchards.
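    The gene flow measure used in this record, GUS positive embryos divided by total embryos sampled, is a simple proportion. A minimal sketch follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def gene_flow_percent(gus_positive, total_embryos):
    """Gene flow as the percentage of sampled embryos that are GUS positive."""
    if total_embryos <= 0:
        raise ValueError("total_embryos must be positive")
    if not 0 <= gus_positive <= total_embryos:
        raise ValueError("gus_positive must lie between 0 and total_embryos")
    return 100.0 * gus_positive / total_embryos


# Hypothetical counts for one sentinel plot in one year.
rate = gene_flow_percent(3, 1200)
```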

  16. Reconstruction of gene regulatory modules from RNA silencing of IFN-α modulators: experimental set-up and inference method.

    Science.gov (United States)

    Grassi, Angela; Di Camillo, Barbara; Ciccarese, Francesco; Agnusdei, Valentina; Zanovello, Paola; Amadori, Alberto; Finesso, Lorenzo; Indraccolo, Stefano; Toffolo, Gianna Maria

    2016-03-12

    Inference of gene regulation from expression data may help to unravel regulatory mechanisms involved in complex diseases or in the action of specific drugs. A challenging task for many researchers working in the field of systems biology is to build up an experiment with a limited budget and produce a dataset suitable to reconstruct putative regulatory modules worth of biological validation. Here, we focus on small-scale gene expression screens and we introduce a novel experimental set-up and a customized method of analysis to make inference on regulatory modules starting from genetic perturbation data, e.g. knockdown and overexpression data. To illustrate the utility of our strategy, it was applied to produce and analyze a dataset of quantitative real-time RT-PCR data, in which interferon-α (IFN-α) transcriptional response in endothelial cells is investigated by RNA silencing of two candidate IFN-α modulators, STAT1 and IFIH1. A putative regulatory module was reconstructed by our method, revealing an intriguing feed-forward loop, in which STAT1 regulates IFIH1 and they both negatively regulate IFNAR1. STAT1 regulation on IFNAR1 was object of experimental validation at the protein level. Detailed description of the experimental set-up and of the analysis procedure is reported, with the intent to be of inspiration for other scientists who want to realize similar experiments to reconstruct gene regulatory modules starting from perturbations of possible regulators. Application of our approach to the study of IFN-α transcriptional response modulators in endothelial cells has led to many interesting novel findings and new biological hypotheses worth of validation.

  17. Interspecific gene flow and maintenance of species integrity in oaks

    Directory of Open Access Journals (Sweden)

    Oliver Gailing

    2014-07-01

    Oak species show a wide variation in morphological and physiological characters, and species boundaries between closely related species are often not clear-cut. Still, despite frequent interspecific gene flow, oaks maintain distinct morphological and physiological adaptations. In sympatric stands, spatial distribution of species with different ecological requirements is not random but constrained by soil and other microenvironmental factors. Pre-zygotic isolation (e.g. cross incompatibilities, asynchrony in flowering, pollen competition) and post-zygotic isolation (divergent selection) contribute to the maintenance of species integrity in sympatric oak stands. The antagonistic effects of interspecific gene flow and divergent selection are reflected in the low genetic differentiation between hybridizing oak species at most genomic regions interspersed by regions with signatures of divergent selection (outlier regions). In the near future, the availability of high-density genetic linkage maps anchored to scaffolds of a sequenced Q. robur genome will allow characterization of the underlying genes in these outlier regions and their putative role in reproductive isolation between species. Reciprocal transplant experiments of seedlings between parental environments can be used to characterize selection on outlier genes. High transferability of gene-based markers will enable comparative outlier screens in different oak species.

  18. Inference of Transcriptional Network for Pluripotency in Mouse Embryonic Stem Cells

    International Nuclear Information System (INIS)

    Aburatani, S

    2015-01-01

    In embryonic stem cells, various transcription factors (TFs) maintain pluripotency. To gain insights into the regulatory system controlling pluripotency, I inferred the regulatory relationships between the TFs expressed in ES cells. In this study, I applied a method based on structural equation modeling (SEM), combined with factor analysis, to 649 expression profiles of 19 TF genes measured in mouse Embryonic Stem Cells (ESCs). The factor analysis identified 19 TF genes that were regulated by several unmeasured factors. Since the known cell reprogramming TF genes (Pou5f1, Sox2 and Nanog) are regulated by different factors, each estimated factor is considered to be an input for signal transduction to control pluripotency in mouse ESCs. In the inferred network model, TF proteins were also arranged as unmeasured factors that control other TFs. The interpretation of the inferred network model revealed the regulatory mechanism for controlling pluripotency in ES cells

  19. A Network Inference Workflow Applied to Virulence-Related Processes in Salmonella typhimurium

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Ronald C.; Singhal, Mudita; Weller, Jennifer B.; Khoshnevis, Saeed; Shi, Liang; McDermott, Jason E.

    2009-04-20

    Inference of the structure of mRNA transcriptional regulatory networks, protein regulatory or interaction networks, and protein activation/inactivation-based signal transduction networks are critical tasks in systems biology. In this article we discuss a workflow for the reconstruction of parts of the transcriptional regulatory network of the pathogenic bacterium Salmonella typhimurium based on the information contained in sets of microarray gene expression data now available for that organism, and describe our results obtained by following this workflow. The primary tool is one of the network inference algorithms deployed in the Software Environment for BIological Network Inference (SEBINI). Specifically, we selected the algorithm called Context Likelihood of Relatedness (CLR), which uses the mutual information contained in the gene expression data to infer regulatory connections. The associated analysis pipeline automatically stores the inferred edges from the CLR runs within SEBINI and, upon request, transfers the inferred edges into either Cytoscape or the plug-in Collective Analysis of Biological Interaction Networks (CABIN) tool for further post-analysis of the inferred regulatory edges. The following article presents the outcome of this workflow, as well as the protocols followed for microarray data collection, data cleansing, and network inference. Our analysis revealed several interesting interactions, functional groups, metabolic pathways, and regulons in S. typhimurium.
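    The abstract says only that CLR uses mutual information to infer regulatory connections. As commonly described in the literature, CLR compares each pairwise mutual information value with the background distribution of values for each of the two genes, clips negative z-scores to zero, and combines the two z-scores as the square root of the sum of their squares. The sketch below follows that description, assuming the mutual information matrix is already computed. Using the population standard deviation and excluding the diagonal from the background are assumptions of this sketch, as are the function name and toy matrix; this is not the SEBINI implementation.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(mi):
    """CLR-style scores from a symmetric mutual information matrix.

    mi is a list of lists; mi[i][j] is the mutual information between
    genes i and j. Returns a matrix of combined z-scores.
    """
    n = len(mi)
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Background for gene i: its MI values with every other gene.
        background = [mi[i][j] for j in range(n) if j != i]
        mu, sigma = mean(background), pstdev(background)
        for j in range(n):
            if j != i and sigma > 0:
                z[i][j] = max(0.0, (mi[i][j] - mu) / sigma)
    # Combine the z-score seen from gene i with the one seen from gene j.
    return [
        [sqrt(z[i][j] ** 2 + z[j][i] ** 2) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical mutual information values for four genes.
mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
scores = clr_scores(mi)
```

    Edges would then be ranked by score, with a threshold chosen by the analyst.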

  20. Time and flow-dependent changes in the p27(kip1) gene network drive maladaptive vascular remodeling.

    Science.gov (United States)

    DeSart, Kenneth M; Butler, Khayree; O'Malley, Kerri A; Jiang, Zhihua; Berceli, Scott A

    2015-11-01

    Although clinical studies have identified that a single nucleotide polymorphism in the p27(kip1) gene is associated with success or failure after vein bypass grafting, the underlying mechanisms for this difference are not well defined. Using a high-throughput approach in a flow-dependent vein graft model, we explored the differences in p27(kip1)-related genes that drive the enhanced hyperplastic response under low-flow conditions. Bilateral rabbit carotid artery interposition grafts with jugular vein were placed with a unilateral distal outflow branch ligation to create differential flow states. Grafts were harvested at 2 hours and at 1, 3, 7, 14, and 28 days after implantation, measured for neointimal area, and assayed for cell proliferation. Whole-vessel messenger RNA was isolated and analyzed using an Affymetrix (Santa Clara, Calif) gene array platform. Ingenuity Pathway Analysis (Ingenuity, Redwood City, Calif) was used to identify the gene networks surrounding p27(kip1). This gene set was then analyzed for temporal expression changes after graft placement and for differential expression in the alternate flow conditions. Outflow branch ligation resulted in an eightfold difference in mean flow rates throughout the 28-day perfusion period (P Flow reduction led to a robust hyperplastic response, resulting in a significant increase in intimal area by 7 days (0.13 ± 0.04 mm(2) vs 0.014 ± 0.006 mm(2); P flow grafts demonstrated a burst of actively dividing intimal cells (36.4 ± 9.4 cells/mm(2) vs 11.5 ± 1.9 cells/mm(2); P = .04). Sixty-five unique genes within the microarray were identified as components of the p27(kip1) network. At a false discovery rate of 0.05, 26 genes demonstrated significant temporal changes, and two dominant patterns of expression were identified. Class comparison analysis identified differential expression of 11 genes at 2 hours and seven genes at 14 days between the high-flow and low-flow grafts (P flow and shear stress result in

  1. dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data.

    Science.gov (United States)

    Huynh-Thu, Vân Anh; Geurts, Pierre

    2018-02-21

    The elucidation of gene regulatory networks is one of the major challenges of systems biology. Measurements about genes that are exploited by network inference methods are typically available either in the form of steady-state expression vectors or time series expression data. In our previous work, we proposed the GENIE3 method that exploits variable importance scores derived from Random forests to identify the regulators of each target gene. This method provided state-of-the-art performance on several benchmark datasets, but it could however not specifically be applied to time series expression data. We propose here an adaptation of the GENIE3 method, called dynamical GENIE3 (dynGENIE3), for handling both time series and steady-state expression data. The proposed method is evaluated extensively on the artificial DREAM4 benchmarks and on three real time series expression datasets. Although dynGENIE3 does not systematically yield the best performance on each and every network, it is competitive with diverse methods from the literature, while preserving the main advantages of GENIE3 in terms of scalability.

  2. Inferring regulatory networks from expression data using tree-based methods.

    Directory of Open Access Journals (Sweden)

    Vân Anh Huynh-Thu

    2010-09-01

    One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
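    The decomposition described in this abstract (one regression problem per target gene, an importance score per input gene, then a global ranking of links) can be sketched roughly as follows. One substitution is made and should be read as an assumption: instead of Random Forests or Extra-Trees importances, the sketch uses a far cruder stand-in, the best variance reduction achievable by a single threshold split on the input gene (a depth-1 regression tree). The function names and toy data are hypothetical; this is not the GENIE3 implementation.

```python
from statistics import pvariance


def stump_importance(x, y):
    """Largest reduction in the variance of y from one threshold split on x.

    Stand-in for a tree-ensemble variable importance (an assumption of
    this sketch, not the score GENIE3 itself computes).
    """
    n = len(y)
    total = pvariance(y) * n
    pairs = sorted(zip(x, y))
    best = 0.0
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal input values
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        within = pvariance(left) * len(left) + pvariance(right) * len(right)
        best = max(best, total - within)
    return best / n


def rank_links(expr):
    """Rank directed links regulator -> target over all target genes.

    expr maps a gene name to its expression values across samples.
    """
    links = []
    for target, y in expr.items():          # one regression problem per gene
        for regulator, x in expr.items():   # every other gene is an input
            if regulator != target:
                links.append((stump_importance(x, y), regulator, target))
    links.sort(reverse=True)
    return links


# Toy data: B switches on once A passes a threshold; C is unrelated.
expr = {
    "A": [0, 1, 2, 3, 4, 5],
    "B": [0, 0, 0, 10, 10, 10],
    "C": [3, 1, 4, 1, 5, 9],
}
links = rank_links(expr)
```

    Because the score for A -> B differs from the score for B -> A, the ranking is directed, which mirrors the property the abstract claims for GENIE3. Raw scores are not normalized across targets here, which a real implementation would need to consider.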

  3. Applying gene flow science to environmental policy needs: a boundary work perspective.

    Science.gov (United States)

    Ridley, Caroline E; Alexander, Laurie C

    2016-08-01

    One application of gene flow science is the policy arena. In this article, we describe two examples in which the topic of gene flow has entered into the U.S. national environmental policymaking process: regulation of genetically engineered crops and clarification of the jurisdictional scope of the Clean Water Act. We summarize both current scientific understanding and the legal context within which gene flow science has relevance. We also discuss the process by which scientific knowledge has been synthesized and communicated to decision-makers in these two contexts utilizing the concept of 'boundary work'. Boundary organizations, the work they engage in to bridge the worlds of science, policy, and practice, and the boundary objects they produce to translate scientific knowledge existed in both examples. However, the specific activities and attributes of the objects produced varied based on the needs of the decision-makers. We close with suggestions for how scientists can contribute to or engage in boundary work with policymakers.

  4. Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks

    Directory of Open Access Journals (Sweden)

    Ji Wei

    2010-10-01

    Background: Microarray data discretization is a basic preprocessing step for many algorithms of gene regulatory network inference. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary, and no systematic comparison of different discretization methods has been conducted in the context of gene regulatory network inference from time series gene expression data. Results: In this study, we propose a new discretization method, "bikmeans", and compare its performance with four other widely-used discretization methods using different datasets, modeling algorithms and numbers of intervals. Sensitivities, specificities and total accuracies were calculated and statistical analysis was carried out. The bikmeans method always gave high total accuracies. Conclusions: Our results indicate that proper discretization methods can consistently improve gene regulatory network inference independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significantly better total accuracies than other methods.

  5. Speciation with gene flow in equids despite extensive chromosomal plasticity.

    Science.gov (United States)

    Jónsson, Hákon; Schubert, Mikkel; Seguin-Orlando, Andaine; Ginolhac, Aurélien; Petersen, Lillian; Fumagalli, Matteo; Albrechtsen, Anders; Petersen, Bent; Korneliussen, Thorfinn S; Vilstrup, Julia T; Lear, Teri; Myka, Jennifer Leigh; Lundquist, Judith; Miller, Donald C; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Stagegaard, Julia; Strauss, Günter; Bertelsen, Mads Frost; Sicheritz-Ponten, Thomas; Antczak, Douglas F; Bailey, Ernest; Nielsen, Rasmus; Willerslev, Eske; Orlando, Ludovic

    2014-12-30

    Horses, asses, and zebras belong to a single genus, Equus, which emerged 4.0-4.5 Mya. Although the equine fossil record represents a textbook example of evolution, the succession of events that gave rise to the diversity of species existing today remains unclear. Here we present six genomes from each living species of asses and zebras. This completes the set of genomes available for all extant species in the genus, which was hitherto represented only by the horse and the domestic donkey. In addition, we used a museum specimen to characterize the genome of the quagga zebra, which was driven to extinction in the early 1900s. We scan the genomes for lineage-specific adaptations and identify 48 genes that have evolved under positive selection and are involved in olfaction, immune response, development, locomotion, and behavior. Our extensive genome dataset reveals a highly dynamic demographic history with synchronous expansions and collapses on different continents during the last 400 ky after major climatic events. We show that the earliest speciation occurred with gene flow in Northern America, and that the ancestor of present-day asses and zebras dispersed into the Old World 2.1-3.4 Mya. Strikingly, we also find evidence for gene flow involving three contemporary equine species despite chromosomal numbers varying from 16 pairs to 31 pairs. These findings challenge the claim that the accumulation of chromosomal rearrangements drive complete reproductive isolation, and promote equids as a fundamental model for understanding the interplay between chromosomal structure, gene flow, and, ultimately, speciation.

  6. High gene flow in epiphytic ferns despite habitat loss and fragmentation.

    Science.gov (United States)

    Winkler, Manuela; Koch, Marcus; Hietz, Peter

    2011-01-01

    Tropical montane forests suffer from increasing fragmentation and replacement by other types of land-use such as coffee plantations. These processes are known to affect gene flow and genetic structure of plant populations. Epiphytes are particularly vulnerable because they depend on their supporting trees for their entire life-cycle. We compared population genetic structure and genetic diversity derived from AFLP markers of two epiphytic fern species differing in their ability to colonize secondary habitats. One species, Pleopeltis crassinervata , is a successful colonizer of shade trees and isolated trees whereas the other species, Polypodium rhodopleuron , is restricted to forests with anthropogenic separation leading to significant isolation between populations. By far most genetic variation was distributed within rather than among populations in both species, and a genetic admixture analysis did not reveal any clustering. Gene flow exceeded by far the benchmark of one migrant per generation to prevent genetic divergence between populations in both species. Though populations are threatened by habitat loss, long-distance dispersal is likely to support gene flow even between distant populations, which efficiently delays genetic isolation. Consequently, populations may rather be threatened by ecological consequences of habitat loss and fragmentation.
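    The "one migrant per generation" benchmark is usually read through Wright's island model, in which Fst is approximately 1 / (4Nm + 1), so Nm = 1 corresponds to Fst = 0.2. The abstract does not say which estimator the authors applied to their AFLP data, so the sketch below is only an illustration of that textbook relationship; the function name and input values are hypothetical.

```python
def migrants_per_generation(fst):
    """Estimate Nm from Fst under Wright's island model: Fst ~ 1 / (4Nm + 1)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# Fst = 0.2 sits exactly at the one-migrant benchmark; lower Fst implies more gene flow.
nm_benchmark = migrants_per_generation(0.2)
nm_low_fst = migrants_per_generation(0.05)
```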

  7. Performance of single and concatenated sets of mitochondrial genes at inferring metazoan relationships relative to full mitogenome data.

    Directory of Open Access Journals (Sweden)

    Justin C Havird

    Mitochondrial (mt) genes are some of the most popular and widely-utilized genetic loci in phylogenetic studies of metazoan taxa. However, their linked nature has raised questions on whether using the entire mitogenome for phylogenetics is overkill (at best) or pseudoreplication (at worst). Moreover, no studies have addressed the comparative phylogenetic utility of mitochondrial genes across individual lineages within the entire Metazoa. To comment on the phylogenetic utility of individual mt genes as well as concatenated subsets of genes, we analyzed mitogenomic data from 1865 metazoan taxa in 372 separate lineages spanning genera to subphyla. Specifically, phylogenies inferred from these datasets were statistically compared to ones generated from all 13 mt protein-coding (PC) genes (i.e., the "supergene" set) to determine which single genes performed "best" at, and the minimum number of genes required to, recover the "supergene" topology. Surprisingly, the popular marker COX1 performed poorest, while ND5, ND4, and ND2 were most likely to reproduce the "supergene" topology. Averaged across all lineages, the longest ∼2 mt PC genes were sufficient to recreate the "supergene" topology, although this average increased to ∼5 genes for datasets with 40 or more taxa. Furthermore, concatenation of the three "best" performing mt PC genes outperformed that of the three longest mt PC genes (i.e., ND5, COX1, and ND4). Taken together, while not all mt PC genes are equally interchangeable in phylogenetic studies of the metazoans, some subset can serve as a proxy for the 13 mt PC genes. However, the exact number and identity of these genes is specific to the lineage in question and cannot be applied indiscriminately across the Metazoa.

  8. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks.

    Science.gov (United States)

    Ghanat Bari, Mehrab; Ung, Choong Yong; Zhang, Cheng; Zhu, Shizhen; Li, Hu

    2017-08-01

    Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10^8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes in coordinating functionality in cancer networks, and it will illuminate our understanding of how genes that are modulated in a tissue-specific network contribute to tumorigenesis and therapy development.

  9. Gene flow of common ash (Fraxinus excelsior L.) in a fragmented landscape.

    Directory of Open Access Journals (Sweden)

    Devrim Semizer-Cuming

    Gene flow dynamics of common ash (Fraxinus excelsior L.) is affected by several human activities in Central Europe, including habitat fragmentation, agroforestry expansion, controlled and uncontrolled transfer of reproductive material, and a recently introduced emerging infectious disease, ash dieback, caused by Hymenoscyphus fraxineus. Habitat fragmentation may alter genetic connectivity and effective population size, leading to loss of genetic diversity and increased inbreeding in ash populations. Gene flow from cultivated trees in landscapes close to their native counterparts may also influence the adaptability of future generations. The devastating effects of ash dieback have already been observed in both natural and managed populations in continental Europe. However, potential long-term effects of genetic bottlenecks depend on gene flow across fragmented landscapes. For this reason, we studied the genetic connectivity of ash trees in an isolated forest patch of a fragmented landscape in Rösenbeck, Germany. We applied two approaches to parentage analysis to estimate gene flow patterns at the study site. We specifically investigated the presence of background pollination at the landscape level and the degree of genetic isolation between native and cultivated trees. Local meteorological data was utilized to understand the effect of wind on the pollen and seed dispersal patterns. Gender information of the adult trees was considered for calculating the dispersal distances. We found that the majority of the studied seeds (55-64%) and seedlings (75-98%) in the forest patch were fathered and mothered by the trees within the same patch. However, we determined a considerable amount of pollen flow (26-45%) from outside of the study site, representing background pollination at the landscape level. Limited pollen flow was observed from neighbouring cultivated trees (2%). Both pollen and seeds were dispersed in all directions in accordance with the local

  10. Gene flow of common ash (Fraxinus excelsior L.) in a fragmented landscape.

    Science.gov (United States)

    Semizer-Cuming, Devrim; Kjær, Erik Dahl; Finkeldey, Reiner

    2017-01-01

    Gene flow dynamics of common ash (Fraxinus excelsior L.) is affected by several human activities in Central Europe, including habitat fragmentation, agroforestry expansion, controlled and uncontrolled transfer of reproductive material, and a recently introduced emerging infectious disease, ash dieback, caused by Hymenoscyphus fraxineus. Habitat fragmentation may alter genetic connectivity and effective population size, leading to loss of genetic diversity and increased inbreeding in ash populations. Gene flow from cultivated trees in landscapes close to their native counterparts may also influence the adaptability of future generations. The devastating effects of ash dieback have already been observed in both natural and managed populations in continental Europe. However, potential long-term effects of genetic bottlenecks depend on gene flow across fragmented landscapes. For this reason, we studied the genetic connectivity of ash trees in an isolated forest patch of a fragmented landscape in Rösenbeck, Germany. We applied two approaches to parentage analysis to estimate gene flow patterns at the study site. We specifically investigated the presence of background pollination at the landscape level and the degree of genetic isolation between native and cultivated trees. Local meteorological data was utilized to understand the effect of wind on the pollen and seed dispersal patterns. Gender information of the adult trees was considered for calculating the dispersal distances. We found that the majority of the studied seeds (55-64%) and seedlings (75-98%) in the forest patch were fathered and mothered by the trees within the same patch. However, we determined a considerable amount of pollen flow (26-45%) from outside of the study site, representing background pollination at the landscape level. Limited pollen flow was observed from neighbouring cultivated trees (2%). Both pollen and seeds were dispersed in all directions in accordance with the local wind directions

  11. Illustration interface of accident progression in PWR by quick inference based on multilevel flow models

    International Nuclear Information System (INIS)

    Yoshikawa, H.; Ouyang, J.; Niwa, Y.

    2006-01-01

    In this paper, a new accident inference method is proposed that uses a goal- and function-oriented modeling method, the Multilevel Flow Model, to explain cause-consequence relations and the objectives of automatic actions during a nuclear power plant accident. Users can easily grasp how the various plant parameters will behave and how the various safety facilities will be activated in sequence to cope with the accident until the plant settles into a safe state, i.e., the shutdown state. The applicability of the developed method was validated through an internet-based 'view' experiment with voluntary respondents. In the future, the interface design will be elaborated further and additional instructional content will be introduced to turn it into a usable CAI system. (authors)

  12. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees.

    Science.gov (United States)

    Stolzer, Maureen; Lai, Han; Xu, Minli; Sathaye, Deepa; Vernot, Benjamin; Durand, Dannie

    2012-09-15

    Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes. We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data shows that use of an incomplete event model has substantial impact on the events inferred and resulting biological conclusions. Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at http://www.cs.cmu.edu/~durand/Notung. Contact: mstolzer@andrew.cmu.edu.

  13. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

    Science.gov (United States)

    Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

    2012-10-05

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  14. Genetics, Gene Flow, and Glaciation: The Case of the South American Limpet Nacella mytilina.

    Directory of Open Access Journals (Sweden)

    Claudio A González-Wevar

    Glacial episodes of the Quaternary, and particularly the Last Glacial Maximum (LGM), drastically altered the distribution of the Southern-Hemisphere biota, principally at higher latitudes. The irregular coastline of Patagonia expanding for more than 84.000 km constitutes a remarkable area to evaluate the effect of Quaternary landscape and seascape shifts over the demography of near-shore marine benthic organisms. Few studies describing the biogeographic responses of marine species to the LGM have been conducted in Patagonia, but existing data from coastal marine species have demonstrated marked genetic signatures of post-LGM recolonization and expansion. The kelp-dweller limpet Nacella mytilina is broadly distributed along the southern tip of South America and at the Falkland/Malvinas Islands. Considering its distribution, abundance, and narrow bathymetry, N. mytilina represents an appropriate model to infer how historical and contemporary processes affected the distribution of intraspecific genetic diversity and structure along the southern tip of South America. At the same time, it will be possible to determine how life history traits and the ecology of the species are responsible for the current pattern of gene flow and connectivity across the study area. We conducted phylogeographic and demographic inference analyses in N. mytilina from 12 localities along Pacific Patagonia (PP) and one population from the Falkland/Malvinas Islands (FI). Analyses of the mitochondrial gene COI in 300 individuals of N. mytilina revealed low levels of genetic polymorphism and the absence of genetic differentiation along PP. In contrast, FI showed a strong and significant differentiation from Pacific Patagonian populations. Higher levels of genetic diversity were also recorded in the FI population, together with a more expanded genealogy supporting the hypothesis of glacial persistence of the species in these islands. Haplotype genealogy, and mismatch analyses in

  15. Genetics, Gene Flow, and Glaciation: The Case of the South American Limpet Nacella mytilina

    Science.gov (United States)

    González-Wevar, Claudio A.; Rosenfeld, Sebastián; Segovia, Nicolás I.; Hüne, Mathias; Gérard, Karin; Ojeda, Jaime; Mansilla, Andrés; Brickle, Paul; Díaz, Angie; Poulin, Elie

    2016-01-01

    Glacial episodes of the Quaternary, and particularly the Last Glacial Maximum (LGM), drastically altered the distribution of the Southern-Hemisphere biota, principally at higher latitudes. The irregular coastline of Patagonia expanding for more than 84.000 km constitutes a remarkable area to evaluate the effect of Quaternary landscape and seascape shifts over the demography of near-shore marine benthic organisms. Few studies describing the biogeographic responses of marine species to the LGM have been conducted in Patagonia, but existing data from coastal marine species have demonstrated marked genetic signatures of post-LGM recolonization and expansion. The kelp-dweller limpet Nacella mytilina is broadly distributed along the southern tip of South America and at the Falkland/Malvinas Islands. Considering its distribution, abundance, and narrow bathymetry, N. mytilina represents an appropriate model to infer how historical and contemporary processes affected the distribution of intraspecific genetic diversity and structure along the southern tip of South America. At the same time, it will be possible to determine how life history traits and the ecology of the species are responsible for the current pattern of gene flow and connectivity across the study area. We conducted phylogeographic and demographic inference analyses in N. mytilina from 12 localities along Pacific Patagonia (PP) and one population from the Falkland/Malvinas Islands (FI). Analyses of the mitochondrial gene COI in 300 individuals of N. mytilina revealed low levels of genetic polymorphism and the absence of genetic differentiation along PP. In contrast, FI showed a strong and significant differentiation from Pacific Patagonian populations. Higher levels of genetic diversity were also recorded in the FI population, together with a more expanded genealogy supporting the hypothesis of glacial persistence of the species in these islands. Haplotype genealogy, and mismatch analyses in the FI population

  16. ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information.

    Science.gov (United States)

    Lachmann, Alexander; Giorgi, Federico M; Lopez, Gonzalo; Califano, Andrea

    2016-07-15

    The accurate reconstruction of gene regulatory networks from large scale molecular profile datasets represents one of the grand challenges of Systems Biology. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) represents one of the most effective tools to accomplish this goal. However, the initial Fixed Bandwidth (FB) implementation is both inefficient and unable to deal with sample sets providing largely uneven coverage of the probability density space. Here, we present a completely new implementation of the algorithm, based on an Adaptive Partitioning strategy (AP) for estimating the Mutual Information. The new AP implementation (ARACNe-AP) achieves a dramatic improvement in computational performance (200× on average) over the previous methodology, while preserving the Mutual Information estimator and the Network inference accuracy of the original algorithm. Given that the previous version of ARACNe is extremely demanding, the new version of the algorithm will allow even researchers with modest computational resources to build complex regulatory networks from hundreds of gene expression profiles. A JAVA cross-platform command line executable of ARACNe, together with all source code and a detailed usage guide, is freely available on Sourceforge (http://sourceforge.net/projects/aracne-ap). JAVA version 8 or higher is required. Contact: califano@c2b2.columbia.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  17. Genetic diversity and gene flow revealed by microsatellite DNA ...

    African Journals Online (AJOL)

    Dacryodes edulis is a multipurpose tree integrated in the cropping system of Central African region still dominated by subsistence agriculture. Some populations grown are wild which can provide information on the domestication process, and could also represent a potential source of gene flow. Leaves samples for DNA ...

  18. Gene flow among established Puerto Rican populations of the exotic tree species, Albizia lebbeck.

    Science.gov (United States)

    Dunphy, B K; Hamrick, J L

    2005-04-01

    We estimate gene flow and patterns of genetic diversity in Albizia lebbeck, an invasive leguminous tree in the dry forest of southwestern Puerto Rico. Genetic diversity estimates calculated for 10 populations of 24 trees each indicated that these populations may have been formed from multiple introductions. The presence of unique genotypes in the northernmost populations suggests that novel genotypes are still immigrating into the area. This combination of individuals from disparate locations led to high estimates of genetic diversity (He = 0.266, P = 0.67). Indirect estimates of gene flow indicate that only 0.69 migrants per generation move between populations, suggesting that genetic diversity within populations should decrease due to genetic drift. Since migration-drift equilibrium was not found, however, this estimate needs to be viewed with caution. The regular production of pods in this outcrossing species (tm = 0.979) indicates that sufficient outcross pollen is received to ensure successful reproduction. Direct estimates of gene flow indicate that between 44 and 100% of pollen received by trees in four small stands of trees (n < 11) was foreign. The role of gene flow in facilitating the spread of this invasive plant species is discussed.
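
    The abstract does not say which indirect estimator produced the figure of 0.69 migrants per generation. As an illustration only, the sketch below uses Wright's island-model approximation, Nm = (1 - Fst) / (4 Fst), a common way to convert a differentiation statistic into migrants per generation. The function names are hypothetical, and the formula assumes migration-drift equilibrium, which is exactly the condition the abstract reports was not met.

```python
def migrants_per_generation(fst: float) -> float:
    """Wright's island-model approximation: Nm = (1 - Fst) / (4 * Fst).

    Assumes migration-drift equilibrium; an illustrative estimator,
    not necessarily the one used in the study above.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("fst must lie in the interval (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


def fst_from_migrants(nm: float) -> float:
    """Inverse relation: Fst = 1 / (4 * Nm + 1)."""
    if nm < 0.0:
        raise ValueError("nm must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)


if __name__ == "__main__":
    # Hypothetical differentiation value, chosen only to show the arithmetic.
    print(migrants_per_generation(0.2))
```

    Under this approximation, values of Nm below one correspond to Fst above 0.2, the regime in which drift is usually expected to outweigh migration.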

  19. Improving catchment discharge predictions by inferring flow route contributions from a nested-scale monitoring and model setup

    Science.gov (United States)

    van der Velde, Y.; Rozemeijer, J. C.; de Rooij, G. H.; van Geer, F. C.; Torfs, P. J. J. F.; de Louw, P. G. B.

    2011-03-01

    Identifying effective measures to reduce nutrient loads of headwaters in lowland catchments requires a thorough understanding of flow routes of water and nutrients. In this paper we assess the value of nested-scale discharge and groundwater level measurements for the estimation of flow route volumes and for predictions of catchment discharge. In order to relate field-site measurements to the catchment-scale an upscaling approach is introduced that assumes that scale differences in flow route fluxes originate from differences in the relationship between groundwater storage and the spatial structure of the groundwater table. This relationship is characterized by the Groundwater Depth Distribution (GDD) curve that relates spatial variation in groundwater depths to the average groundwater depth. The GDD-curve was measured for a single field site (0.009 km²) and simple process descriptions were applied to relate groundwater levels to flow route discharges. This parsimonious model could accurately describe observed storage, tube drain discharge, overland flow and groundwater flow simultaneously with Nash-Sutcliffe coefficients exceeding 0.8. A probabilistic Monte Carlo approach was applied to upscale field-site measurements to catchment scales by inferring scale-specific GDD-curves from the hydrographs of two nested catchments (0.4 and 6.5 km²). The estimated contribution of tube drain effluent (a dominant source for nitrates) decreased with increasing scale from 76-79% at the field-site to 34-61% and 25-50% for both catchment scales. These results were validated by demonstrating that a model conditioned on nested-scale measurements improves simulations of nitrate loads and predictions of extreme discharges during validation periods compared to a model that was conditioned on catchment discharge only.
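
    The Nash-Sutcliffe coefficient cited above as the goodness-of-fit measure can be computed as one minus the ratio of the squared model error to the variance of the observations around their mean. The following is a minimal sketch of that standard definition; the function name and the sample discharge values are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2).

    1.0 is a perfect fit; 0.0 means the model is no better than
    predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    if total_ss == 0.0:
        raise ValueError("observations are constant; efficiency is undefined")
    return 1.0 - error_ss / total_ss


if __name__ == "__main__":
    # Hypothetical discharge series, for illustration only.
    obs = [1.0, 2.0, 3.0, 4.0]
    sim = [1.1, 1.9, 3.2, 3.8]
    print(nash_sutcliffe(obs, sim))
```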

  20. DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

    Science.gov (United States)

    Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...

  1. Beekeeping practices and geographic distance, not land use, drive gene flow across tropical bees.

    Science.gov (United States)

    Jaffé, Rodolfo; Pope, Nathaniel; Acosta, André L; Alves, Denise A; Arias, Maria C; De la Rúa, Pilar; Francisco, Flávio O; Giannini, Tereza C; González-Chaves, Adrian; Imperatriz-Fonseca, Vera L; Tavares, Mara G; Jha, Shalene; Carvalheiro, Luísa G

    2016-11-01

    Across the globe, wild bees are threatened by ongoing natural habitat loss, risking the maintenance of plant biodiversity and agricultural production. Despite the ecological and economic importance of wild bees and the fact that several species are now managed for pollination services worldwide, little is known about how land use and beekeeping practices jointly influence gene flow. Using stingless bees as a model system, containing wild and managed species that are presumed to be particularly susceptible to habitat degradation, here we examine the main drivers of tropical bee gene flow. We employ a novel landscape genetic approach to analyse data from 135 populations of 17 stingless bee species distributed across diverse tropical biomes within the Americas. Our work has important methodological implications, as we illustrate how a maximum-likelihood approach can be applied in a meta-analysis framework to account for multiple factors, and weight estimates by sample size. In contrast to previously held beliefs, gene flow was not related to body size or deforestation, and isolation by geographic distance (IBD) was significantly affected by management, with managed species exhibiting a weaker IBD than wild ones. Our study thus reveals the critical importance of beekeeping practices in shaping the patterns of genetic differentiation across bee species. Additionally, our results show that many stingless bee species maintain high gene flow across heterogeneous landscapes. We suggest that future efforts to preserve wild tropical bees should focus on regulating beekeeping practices to maintain natural gene flow and enhancing pollinator-friendly habitats, prioritizing species showing a limited dispersal ability. © 2016 John Wiley & Sons Ltd.

  2. Inferring gene expression dynamics via functional regression analysis

    Directory of Open Access Journals (Sweden)

    Leng Xiaoyan

    2008-01-01

    Background: Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene expression associated with different developmental stages to each other to study patterns of long-term developmental gene regulation. We use tools from functional data analysis to study dynamic changes by relating temporal gene expression profiles of different developmental stages to each other. Results: We demonstrate that functional regression methodology can pinpoint relationships that exist between temporal gene expression profiles for different life cycle phases and incorporates dimension reduction as needed for these high-dimensional data. By applying these tools, gene expression profiles for pupa and adult phases are found to be strongly related to the profiles of the same genes obtained during the embryo phase. Moreover, one can distinguish between gene groups that exhibit relationships with positive and others with negative associations between later life and embryonal expression profiles. Specifically, we find a positive relationship in expression for muscle development related genes, and a negative relationship for strictly maternal genes for Drosophila, using temporal gene expression profiles. Conclusion: Our findings point to specific reactivation patterns of gene expression during the Drosophila life cycle which differ in characteristic ways between various gene groups. Functional regression emerges as a useful tool for relating gene expression patterns from different developmental stages, and avoids the problems with large numbers of parameters and multiple testing that affect alternative approaches.

  3. Spread of a new parasitic B chromosome variant is facilitated by high gene flow.

    Directory of Open Access Journals (Sweden)

    María Inmaculada Manrique-Poyato

    The B24 chromosome variant emerged several decades ago in a Spanish population of the grasshopper Eyprepocnemis plorans and is currently reaching adjacent populations. Here we report, for the first time, how a parasitic B chromosome (a strictly vertically transmitted parasite) expands its geographical range aided by high gene flow in the host species. For six years we analyzed B frequency in several populations to the east and west of the original population and found extensive spatial variation, but only a slight temporal trend. The highest B24 frequency was found in its original population (Torrox) and it decreased closer to both the eastern and the western populations. The analysis of Inter Simple Sequence Repeat (ISSR) markers showed the existence of a low but significant degree of population subdivision, as well as significant isolation by distance (IBD). Pairwise Nem estimates suggested the existence of high gene flow between the four populations located in the Torrox area, with higher values towards the east. No significant barriers to gene flow were found among these four populations, and we conclude that high gene flow is facilitating B24 diffusion both eastward and westward, with a minor role for B24 drive due to the arrival of drive suppressor genes which are also frequent in the donor population.

  4. Stratigraphy, sedimentology and inferred flow dynamics from the July 2015 block-and-ash flow deposits at Volcán de Colima, Mexico

    Science.gov (United States)

    Macorps, Elodie; Charbonnier, Sylvain J.; Varley, Nick R.; Capra, Lucia; Atlas, Zachary; Cabré, Josep

    2018-01-01

    The July 2015 block-and-ash flow (BAF) events represent the first documented series of large-volume and long-runout BAFs generated from sustained dome collapses at Volcán de Colima. This eruption is particularly exceptional at this volcano due to (1) the large volume of BAF material emplaced (0.0077 ± 0.001 km³), (2) the long runout reached by the associated BAFs (max. 10 km), and (3) the short period ( 18 h) over which two main long-sustained dome collapse events occurred (on 10 and 11 July, respectively). Stratigraphy and sedimentology of the 2015 BAF deposits exposed in the southern flank of the volcano based on lithofacies description, grain size measurements and clast componentry allowed the recognition of three main deposit facies (i.e., valley-confined, overbank and ash-cloud surge deposits). Correlations and lithofacies variations inside three main flow units from both the valley-confined and overbank deposits left from the emplacement of the second series of BAFs on 11 July provide detailed information about: (1) the distribution, volumes and sedimentological characteristics of the different units; (2) flow parameters (i.e., velocity and dynamic pressure) and mobility metrics as inferred from associated deposits; and (3) changes in the dynamics of the different flows and their material during emplacement. These data were coupled with geomorphic analyses to assess the role of the topography in controlling the behaviour and impacts of the successive BAF pulses on the volcano flanks. Finally, these findings are used to propose a conceptual model for transport and deposition mechanisms of the July 2015 BAFs at Volcán de Colima. In this model, deposition occurs by rapid stepwise aggradation of successive BAF pulses. Flow confinement in a narrow and sinuous channel enhances the mobility and runout of individual channelized BAF pulses. When these conditions occur, the progressive valley infilling from successive sustained dome-collapse events promote the

  5. AN UNUSUAL PATTERN OF GENE FLOW BETWEEN THE TWO SOCIAL FORMS OF THE FIRE ANT SOLENOPSIS INVICTA.

    Science.gov (United States)

    Ross, Kenneth G; Shoemaker, D DeWayne

    1993-10-01

    Uncertainty over the role of shifts in social behavior in the process of speciation in social insects has stimulated interest in determining the extent of gene flow between conspecific populations differing in colony social organization. Allele and genotype frequencies at 12 neutral polymorphic protein markers, as well as the numbers of alleles at the sex-determining locus (loci), are shown here to be consistent with significant ongoing gene flow between two geographically adjacent populations of Solenopsis invicta that differ in colony queen number. Data from a thirteenth protein marker that is under strong differential selection in the two social forms confirm that such gene flow occurs. Data from this selected locus, combined with knowledge of the reproductive biology of the two social forms, further suggest that interform gene flow is largely unidirectional and mediated through males only. This unusual pattern of gene flow results from the influence of the unique social environments of the two forms on the behavior of workers and on the reproductive physiology of sexuals. © 1993 The Society for the Study of Evolution.

  6. Inferring transcriptional compensation interactions in yeast via stepwise structure equation modeling

    Directory of Open Access Journals (Sweden)

    Wang Woei-Fuh

    2008-03-01

    Background: With the abundant information produced by microarray technology, various approaches have been proposed to infer transcriptional regulatory networks. However, few approaches have studied subtle and indirect interaction such as genetic compensation, the existence of which is widely recognized although its mechanism has yet to be clarified. Furthermore, when inferring gene networks most models include only observed variables whereas latent factors, such as proteins and mRNA degradation that are not measured by microarrays, do participate in networks in reality. Results: Motivated by inferring transcriptional compensation (TC) interactions in yeast, a stepwise structural equation modeling algorithm (SSEM) is developed. In addition to observed variables, SSEM also incorporates hidden variables to capture interactions (or regulations) from latent factors. Simulated gene networks are used to determine with which of six possible model selection criteria (MSC) SSEM works best. SSEM with Bayesian information criterion (BIC) results in the highest true positive rates, the largest percentage of correctly predicted interactions from all existing interactions, and the highest true negative (non-existing interactions) rates. Next, we apply SSEM using real microarray data to infer TC interactions among (1) small groups of genes that are synthetic sick or lethal (SSL) to SGS1, and (2) a group of SSL pairs of 51 yeast genes involved in DNA synthesis and repair that are of interest. For (1), SSEM with BIC is shown to outperform three Bayesian network algorithms and a multivariate autoregressive model, checked against the results of qRT-PCR experiments. The predictions for (2) are shown to coincide with several known pathways of Sgs1 and its partners that are involved in DNA replication, recombination and repair. In addition, experimentally testable interactions of Rad27 are predicted. Conclusion: SSEM is a useful tool for inferring genetic networks, and the
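
    The Bayesian information criterion named above as the best-performing model selection criterion has the standard form BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations and L the maximized likelihood; the candidate with the lowest value is preferred. The sketch below shows only this selection step. The function names, model labels and log-likelihood values are hypothetical, and nothing here reproduces the SSEM algorithm itself.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    if n_obs <= 0 or n_params < 0:
        raise ValueError("n_obs must be positive and n_params non-negative")
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_by_bic(candidates: Dict[str, Tuple[float, int]], n_obs: int) -> str:
    """Return the name of the candidate model with the lowest BIC.

    `candidates` maps a model name to (maximized log-likelihood, parameter count).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


if __name__ == "__main__":
    # Hypothetical candidate network structures fitted to 100 observations.
    models = {"sparse": (-52.0, 3), "dense": (-50.0, 8)}
    print(select_by_bic(models, n_obs=100))
```

    In this made-up comparison the denser model fits slightly better, but its extra parameters cost more than the gain in likelihood, so the sparser structure is selected.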

  7. System-level insights into the cellular interactome of a non-model organism: inferring, modelling and analysing functional gene network of soybean (Glycine max).

    Directory of Open Access Journals (Sweden)

    Yungang Xu

    The cellular interactome, in which genes and/or their products interact on several levels, forming transcriptional regulatory, protein interaction, metabolic and signal transduction networks, etc., has been a research focus for decades. However, any one specific type of network alone can hardly explain the various interactive activities among genes. These networks characterize different interaction relationships, implying their unique intrinsic properties and defects, and covering different slices of biological information. The functional gene network (FGN), a consolidated interaction network that models a fuzzy and more generalized notion of gene-gene relations, has been proposed to combine heterogeneous networks with the goal of identifying functional modules supported by multiple interaction types. There are as yet no successful precedents of FGNs on sparsely studied non-model organisms, such as soybean (Glycine max), due to the absence of sufficient heterogeneous interaction data. We present an alternative solution for inferring the FGNs of soybean (SoyFGNs), in a pioneering study on the soybean interactome, which is also applicable to other organisms. SoyFGNs exhibit the typical characteristics of biological networks: scale-free, small-world architecture and modularization. Verified by co-expression and KEGG pathways, SoyFGNs are more extensive and accurate than an orthology network derived from Arabidopsis. As a case study, network-guided disease-resistance gene discovery indicates that SoyFGNs can provide system-level studies on gene functions and interactions. This work suggests that inferring and modelling the interactome of a non-model plant are feasible. It will speed up the discovery and definition of the functions and interactions of other genes that control important functions, such as nitrogen fixation and protein or lipid synthesis. The efforts of the study are the basis of our further comprehensive studies on the soybean functional

  8. Inference of Transcription Regulatory Network in Low Phytic Acid Soybean Seeds

    Directory of Open Access Journals (Sweden)

    Neelam Redekar

    2017-11-01

    A dominant loss of function mutation in myo-inositol phosphate synthase (MIPS) gene and recessive loss of function mutations in two multidrug resistant protein type-ABC transporter genes not only reduce the seed phytic acid levels in soybean, but also affect the pathways associated with seed development, ultimately resulting in low emergence. To understand the regulatory mechanisms and identify key genes that intervene in the seed development process in low phytic acid crops, we performed computational inference of gene regulatory networks in low and normal phytic acid soybeans using time course transcriptomic data and multiple network inference algorithms. We identified a set of putative candidate transcription factors and their regulatory interactions with genes that have functions in myo-inositol biosynthesis, auxin-ABA signaling, and seed dormancy. We evaluated the performance of our unsupervised network inference method by comparing the predicted regulatory network with published regulatory interactions in Arabidopsis. Some contrasting regulatory interactions were observed in low phytic acid mutants compared to non-mutant lines. These findings provide important hypotheses on expression regulation of myo-inositol metabolism and phytohormone signaling in developing low phytic acid soybeans. The computational pipeline used for unsupervised network learning in this study is provided as open source software and is freely available at https://lilabatvt.github.io/LPANetwork/.

  9. Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression.

    Science.gov (United States)

    Walker, Jeffrey A

    2016-01-01

    Self-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori. Many self-contained gene set analysis methods have been developed but the performance of these methods for phenotypes that are continuous rather than discrete and with multiple nuisance covariates has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R) methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset which was used to show opposing effects of hedonic and eudaimonic well-being (or happiness) on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set). The original analysis of these data used a linear model (GLS) of fixed effects with correlated error to infer effects of Hedonia and Eudaimonia on mean CTRA expression. The standardized effects of Hedonia and Eudaimonia on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using O'Brien's OLS test, Anderson's permutation $r_F^2$-test, two permutation F-tests (including GlobalAncova), and a rotation z-test (Roast). The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors) of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset. GLS estimates are inconsistent between data sets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS distributions suggest that the GLS results in

  10. Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jeffrey A. Walker

    2016-10-01

    Full Text Available Background: Self-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori. Many self-contained gene set analysis methods have been developed but the performance of these methods for phenotypes that are continuous rather than discrete and with multiple nuisance covariates has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R) methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset which was used to show opposing effects of hedonic and eudaimonic well-being (or happiness) on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set). The original analysis of these data used a linear model (GLS) of fixed effects with correlated error to infer effects of Hedonia and Eudaimonia on mean CTRA expression. Methods: The standardized effects of Hedonia and Eudaimonia on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using O’Brien’s OLS test, Anderson’s permutation $r_F^2$-test, two permutation F-tests (including GlobalAncova), and a rotation z-test (Roast). The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors) of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset. Results: GLS estimates are inconsistent between data sets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS
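
    The Type S (sign) and Type M (magnitude) errors named in this abstract can be illustrated with a minimal Monte Carlo sketch. It is not the paper's simulation: it assumes a single normally distributed estimate with a known standard error, and the function name `sign_and_magnitude_errors` and all of its parameters are hypothetical.

```python
import random


def sign_and_magnitude_errors(true_effect, std_error, n_sims=20000,
                              z_crit=1.96, seed=0):
    """Estimate power, Type S rate and Type M ratio by simulation.

    Assumption: each estimate is drawn from Normal(true_effect, std_error)
    and is called significant when |estimate| > z_crit * std_error.
    """
    if true_effect == 0:
        raise ValueError("true_effect must be nonzero")
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(estimate)
    if not significant:
        return 0.0, float("nan"), float("nan")
    power = len(significant) / n_sims
    # Type S: share of significant estimates whose sign is wrong.
    wrong_sign = sum(1 for e in significant if e * true_effect < 0)
    type_s = wrong_sign / len(significant)
    # Type M: mean absolute significant estimate relative to the true effect.
    mean_abs = sum(abs(e) for e in significant) / len(significant)
    type_m = mean_abs / abs(true_effect)
    return power, type_s, type_m
```

    Under these assumptions, a true effect that is small relative to its standard error gives low power, a sizeable Type S rate, and a Type M ratio well above 1; a large effect gives a Type M ratio close to 1.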

  11. Inferring genetic interactions from comparative fitness data.

    Science.gov (United States)

    Crona, Kristina; Gavryushkin, Alex; Greene, Devin; Beerenwinkel, Niko

    2017-12-20

    Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.
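
    As a rough illustration of inferring an interaction from a rank order alone, the sketch below covers only the simplest case of two biallelic loci, with epistasis u = w00 + w11 - w01 - w10. It uses a prefix-counting argument and is not taken from the paper; the function name and the genotype labels are assumptions made for the example.

```python
def epistasis_sign_from_rank_order(ranking):
    """Return +1, -1 or None for a two-locus, biallelic rank order.

    `ranking` lists the genotypes "00", "01", "10", "11" from fittest to
    least fit (strict order assumed). With u = w00 + w11 - w01 - w10:
      * if, reading from the top, "00"/"11" never fall behind "01"/"10" in
        count, each negative term is dominated by a distinct positive term,
        so u > 0 (returns +1);
      * the mirror-image condition forces u < 0 (returns -1);
      * otherwise fitness values with u = 0 are compatible with the order
        (returns None).
    """
    if sorted(ranking) != ["00", "01", "10", "11"]:
        raise ValueError("ranking must be a permutation of 00, 01, 10, 11")
    running = 0
    lowest = 0
    highest = 0
    for genotype in ranking:
        running += 1 if genotype in ("00", "11") else -1
        lowest = min(lowest, running)
        highest = max(highest, running)
    if lowest >= 0:
        return 1
    if highest <= 0:
        return -1
    return None
```

    For example, the order 11 > 01 > 00 > 10 forces u > 0 because w11 > w01 and w00 > w10, whereas 11 > 01 > 10 > 00 is compatible with a purely additive landscape.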

  12. Structure, evolution and functional inference on the Mildew Locus O (MLO) gene family in three cultivated Cucurbitaceae spp.

    Science.gov (United States)

    Iovieno, Paolo; Andolfo, Giuseppe; Schiavulli, Adalgisa; Catalano, Domenico; Ricciardi, Luigi; Frusciante, Luigi; Ercolano, Maria Raffaella; Pavan, Stefano

    2015-12-29

    The powdery mildew disease affects thousands of plant species and arguably represents the major fungal threat for many Cucurbitaceae crops, including melon (Cucumis melo L.), watermelon (Citrullus lanatus L.) and zucchini (Cucurbita pepo L.). Several studies revealed that specific members of the Mildew Locus O (MLO) gene family act as powdery mildew susceptibility factors. Indeed, their inactivation, as the result of gene knock-out or knock-down, is associated with a peculiar form of resistance, referred to as mlo resistance. We exploited recently available genomic information to provide a comprehensive overview of the MLO gene family in Cucurbitaceae. We report the identification of 16 MLO homologs in C. melo, 14 in C. lanatus and 18 in C. pepo genomes. Bioinformatic treatment of data allowed phylogenetic inference and the prediction of several ortholog pairs and groups. Comparison with functionally characterized MLO genes and, in C. lanatus, gene expression analysis, resulted in the detection of candidate powdery mildew susceptibility factors. We identified a series of conserved amino acid residues and motifs that are likely to play a major role for the function of MLO proteins. Finally, we performed a codon-based evolutionary analysis indicating a general high level of purifying selection in the three Cucurbitaceae MLO gene families, and the occurrence of regions under diversifying selection in candidate susceptibility factors. Results of this study may help to address further biological questions concerning the evolution and function of MLO genes. Moreover, data reported here could be conveniently used by breeding research, aiming to select powdery mildew resistant cultivars in Cucurbitaceae.

  13. The evolutionary history of ferns inferred from 25 low-copy nuclear genes.

    Science.gov (United States)

    Rothfels, Carl J; Li, Fay-Wei; Sigel, Erin M; Huiet, Layne; Larsson, Anders; Burge, Dylan O; Ruhsam, Markus; Deyholos, Michael; Soltis, Douglas E; Stewart, C Neal; Shaw, Shane W; Pokorny, Lisa; Chen, Tao; dePamphilis, Claude; DeGironimo, Lisa; Chen, Li; Wei, Xiaofeng; Sun, Xiao; Korall, Petra; Stevenson, Dennis W; Graham, Sean W; Wong, Gane K-S; Pryer, Kathleen M

    2015-07-01

    • Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny, however, to date, these studies have relied almost exclusively on plastid data.• Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35877 bp), along with rigorous alignment, orthology inference and model selection.• Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data.• Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies. © 2015 Botanical Society of America, Inc.

  14. The genetic assimilation in language borrowing inferred from Jing People.

    Science.gov (United States)

    Huang, Xiufeng; Zhou, Qinghui; Bin, Xiaoyun; Lai, Shu; Lin, Chaowen; Hu, Rong; Xiao, Jiashun; Luo, Dajun; Li, Yingxiang; Wei, Lan-Hai; Yeh, Hui-Yuan; Chen, Gang; Wang, Chuan-Chao

    2018-02-28

    The Jing people are a recognized ethnic group in Guangxi, southwest China, who immigrated from Vietnam during the 16th century. They speak Vietnamese but with many borrowings from Cantonese, Zhuang, and Mandarin. However, it is unclear whether there was large-scale gene flow from surrounding populations into the Jing people during their language change, because genetic information on this population is very limited. We collected blood samples from 37 Jing and 3 Han Chinese individuals from Wanwei, Shanxin, and Wutou islands in Guangxi and genotyped about 600,000 genome-wide single nucleotide polymorphisms (SNPs). We used Principal Component Analysis (PCA), ADMIXTURE analysis, f statistics, qpWave and qpAdm to infer the population genetic structure and admixture. Our data revealed that the Jing people are genetically similar to the populations in southwest China and mainland Southeast Asia. But compared with Vietnamese, they show significant evidence of gene flow from surrounding East Asians. The admixture proportion is estimated to be around 35-42% in different Jing groups using southern Han Chinese as a proxy. The majority of the paternal lineages of Jing people are most likely from surrounding East Asians. We conclude that the formation and language change of present-day Jing people have involved genetic assimilation of surrounding East Asian populations. The language borrowing, in this case, is not only a cultural phenomenon but has involved demic diffusion. © 2018 Wiley Periodicals, Inc.

  15. Improving catchment discharge predictions by inferring flow route contributions from a nested-scale monitoring and model setup

    Directory of Open Access Journals (Sweden)

    Y. van der Velde

    2011-03-01

    Full Text Available Identifying effective measures to reduce nutrient loads of headwaters in lowland catchments requires a thorough understanding of flow routes of water and nutrients. In this paper we assess the value of nested-scale discharge and groundwater level measurements for the estimation of flow route volumes and for predictions of catchment discharge. In order to relate field-site measurements to the catchment scale, an upscaling approach is introduced that assumes that scale differences in flow route fluxes originate from differences in the relationship between groundwater storage and the spatial structure of the groundwater table. This relationship is characterized by the Groundwater Depth Distribution (GDD) curve that relates spatial variation in groundwater depths to the average groundwater depth. The GDD-curve was measured for a single field site (0.009 km²) and simple process descriptions were applied to relate groundwater levels to flow route discharges. This parsimonious model could accurately describe observed storage, tube drain discharge, overland flow and groundwater flow simultaneously with Nash-Sutcliffe coefficients exceeding 0.8. A probabilistic Monte Carlo approach was applied to upscale field-site measurements to catchment scales by inferring scale-specific GDD-curves from the hydrographs of two nested catchments (0.4 and 6.5 km²). The estimated contribution of tube drain effluent (a dominant source for nitrates) decreased with increasing scale from 76–79% at the field-site to 34–61% and 25–50% for both catchment scales. These results were validated by demonstrating that a model conditioned on nested-scale measurements improves simulations of nitrate loads and predictions of extreme discharges during validation periods compared to a model that was conditioned on catchment discharge only.
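
    The Nash-Sutcliffe coefficient used above as the goodness-of-fit measure is one minus the ratio of the squared model error to the variance of the observations about their mean. A minimal sketch follows; the function name and the sample values are illustrative only and are not data from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Return 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    A value of 1 means a perfect fit; 0 means the model is no better than
    always predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / variance


# Illustrative values only: squared error 0.10 against variance 5.0.
example = nash_sutcliffe_efficiency([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

    With these illustrative numbers the coefficient is 1 - 0.10/5.0 = 0.98, which would exceed the 0.8 threshold quoted in the abstract.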

  16. Consequences of gene flow between oilseed rape (Brassica napus) and its relatives.

    Science.gov (United States)

    Liu, Yongbo; Wei, Wei; Ma, Keping; Li, Junsheng; Liang, Yuyong; Darmency, Henri

    2013-10-01

    Numerous studies have focused on the probability of occurrence of gene flow between transgenic crops and their wild relatives and the likelihood of transgene escape, which should be assessed before the commercial release of transgenic crops. This review paper focuses on this issue for oilseed rape, Brassica napus L., a species that produces huge numbers of pollen grains and seeds. We analyze separately the distinct steps of gene flow: (1) pollen and seeds as vectors of gene flow; (2) spontaneous hybridization; (3) hybrid behavior, fitness cost due to hybridization and mechanisms of introgression; and (4) fitness benefit due to transgenes (e.g. herbicide resistance and Bt toxin). Some physical, biological and molecular means of transgene containment are also described. Although hybrids and first generation progeny are difficult to identify in fields and non-crop habitats, the literature shows that transgenes could readily introgress into Brassica rapa, Brassica juncea and Brassica oleracea, while introgression is expected to be rare with Brassica nigra, Hirschfeldia incana and Raphanus raphanistrum. The hybrids grow well but produce less seed than their wild parent. The difference declines with increasing generations. However, there is large uncertainty about the evolution of chromosome numbers and recombination, and many parameters of life history traits of hybrids and progeny are not determined with satisfactory confidence to build generic models capable of covering the wide diversity of situations. We show that more studies are needed to strengthen and organize biological knowledge, which is a necessary prerequisite for model simulations to assess the practical and evolutionary outputs of introgression, and to provide guidelines for gene flow management. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  17. Landscape genetics as a tool for conservation planning: predicting the effects of landscape change on gene flow.

    Science.gov (United States)

    van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine

    2014-03-01

    For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (landscape composition as well as some measures of habitat configuration. Additionally, a complete sampling of all populations in our study area allowed incorporating measures of population topology. These measures together with the landscape metrics formed the predictor variables in linear models with gene flow as response variable (F(ST) and mean pairwise assignment probability). With a modified leave-one-out cross-validation approach, we selected the model with the highest predictive accuracy. With this model, we predicted gene flow under several landscape-change scenarios, which simulated construction, rezoning or restoration projects, and the establishment of a new population. For some landscape-change scenarios, significant increase or decrease in gene flow was predicted, while for others little change was forecast. Furthermore, we found that the measures of population topology strongly increase model fit in landscape genetic analysis. This study demonstrates the use of predictive landscape-genetic models in conservation and landscape planning.
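
    The model-selection step can be pictured with an ordinary leave-one-out cross-validation over candidate single-predictor linear models. This is a sketch only: the study used a modified leave-one-out procedure whose details are not given here, and the helper names and predictor labels below are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_rmse(xs, ys):
    """Root-mean-square prediction error under leave-one-out."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


def select_predictor(predictors, response):
    """Return the predictor name with the lowest leave-one-out RMSE."""
    scores = {name: loo_rmse(values, response)
              for name, values in predictors.items()}
    return min(scores, key=scores.get), scores


# Hypothetical toy data: the response is an exact linear function of one metric.
candidates = {"habitat_cover": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
              "unrelated_metric": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]}
gene_flow = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
best, scores = select_predictor(candidates, gene_flow)
```

    Scoring each model on observations it was not fitted to favours predictive accuracy over in-sample fit, which is the property the abstract relies on when forecasting gene flow under new landscape scenarios.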

  18. Genetic interaction motif finding by expectation maximization – a novel statistical model for inferring gene modules from synthetic lethality

    Directory of Open Access Journals (Sweden)

    Ye Ping

    2005-12-01

    Full Text Available Abstract Background: Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example greater probability of membership in a single protein complex) may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results: We have developed Genetic Interaction Motif Finding (GIMF), an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast). Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion: GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes the generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic

  19. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method.

    Science.gov (United States)

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-10-06

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm for learning network structure profiles, namely the construction of the search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of the DBNCS algorithm is evaluated by the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.

  20. Recent long-distance transgene flow into wild populations conforms to historical patterns of gene flow in cotton (Gossypium hirsutum) at its centre of origin.

    Science.gov (United States)

    Wegier, A; Piñeyro-Nelson, A; Alarcón, J; Gálvez-Mariscal, A; Alvarez-Buylla, E R; Piñero, D

    2011-10-01

    Over 95% of the currently cultivated cotton was domesticated from Gossypium hirsutum, which originated and diversified in Mexico. Demographic and genetic studies of this species at its centre of origin and diversification are lacking, although they are critical for cotton conservation and breeding. We investigated the actual and potential distribution of wild cotton populations, as well as the contribution of historical and recent gene flow in shaping cotton genetic diversity and structure. We evaluated historical gene flow using chloroplast microsatellites and recent gene flow through the assessment of transgene presence in wild cotton populations, exploiting the fact that genetically modified cotton has been planted in the North of Mexico since 1996. Assessment of geographic structure through Bayesian spatial analysis, BAPS and Genetic Algorithm for Rule-set Production (GARP), suggests that G. hirsutum seems to conform to a metapopulation scheme, with eight distinct metapopulations. Despite evidence for long-distance gene flow, genetic variation among the metapopulations of G. hirsutum is high (He = 0.894 ± 0.01). We identified 46 different haplotypes, 78% of which are unique to a particular metapopulation, in contrast to a single haplotype detected in cotton cultivars. Recent gene flow was also detected (m = 66/270 = 0.24), with four out of eight metapopulations having transgenes. We discuss the implications of the data presented here with respect to the conservation and future breeding of cotton populations and genetic diversity at its centre of crop origin. © 2011 Blackwell Publishing Ltd.

  1. Causal inference in biology networks with integrated belief propagation.

    Science.gov (United States)

    Chang, Rui; Karr, Jonathan R; Schadt, Eric E

    2015-01-01

    Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the functional interactions that are possible are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.

  2. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe

    Science.gov (United States)

    Botigué, Laura R.; Henn, Brenna M.; Gravel, Simon; Maples, Brian K.; Gignoux, Christopher R.; Corona, Erik; Atzmon, Gil; Burns, Edward; Ostrer, Harry; Flores, Carlos; Bertranpetit, Jaume; Comas, David; Bustamante, Carlos D.

    2013-01-01

    Human genetic diversity in southern Europe is higher than in other regions of the continent. This difference has been attributed to postglacial expansions, the demic diffusion of agriculture from the Near East, and gene flow from Africa. Using SNP data from 2,099 individuals in 43 populations, we show that estimates of recent shared ancestry between Europe and Africa are substantially increased when gene flow from North Africans, rather than Sub-Saharan Africans, is considered. The gradient of North African ancestry accounts for previous observations of low levels of sharing with Sub-Saharan Africa and is independent of recent gene flow from the Near East. The source of genetic diversity in southern Europe has important biomedical implications; we find that most disease risk alleles from genome-wide association studies follow expected patterns of divergence between Europe and North Africa, with the principal exception of multiple sclerosis. PMID:23733930

  3. Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus).

    Science.gov (United States)

    Moody, Michael L; Rieseberg, Loren H

    2012-07-01

    The annual sunflowers (Helianthus sect. Helianthus) present a formidable challenge for phylogenetic inference because of ancient hybrid speciation, recent introgression, and suspected issues with deep coalescence. Here we analyze sequence data from 11 nuclear DNA (nDNA) genes for multiple genotypes of species within the section to (1) reconstruct the phylogeny of this group, (2) explore the utility of nDNA gene trees for detecting hybrid speciation and introgression; and (3) test an empirical method of hybrid identification based on the phylogenetic congruence of nDNA gene trees from tightly linked genes. We uncovered considerable topological heterogeneity among gene trees with or without three previously identified hybrid species included in the analyses, as well as a general lack of reciprocal monophyly of species. Nonetheless, partitioned Bayesian analyses provided strong support for the reciprocal monophyly of all species except H. annuus (0.89 PP), the most widespread and abundant annual sunflower. Previous hypotheses of relationships among taxa were generally strongly supported (1.0 PP), except among taxa typically associated with H. annuus, apparently due to the paraphyly of the latter in all gene trees. While the individual nDNA gene trees provided a useful means for detecting recent hybridization, identification of ancient hybridization was problematic for all ancient hybrid species, even when linkage was considered. We discuss biological factors that affect the efficacy of phylogenetic methods for hybrid identification.

  4. Phylogenetic inference of Coxiella burnetii by 16S rRNA gene sequencing.

    Directory of Open Access Journals (Sweden)

    Heather P McLaughlin

    Full Text Available Coxiella burnetii is a human pathogen that causes the serious zoonotic disease Q fever. It is ubiquitous in the environment and due to its wide host range, long-range dispersal potential and classification as a bioterrorism agent, this microorganism is considered an HHS Select Agent. In the event of an outbreak or intentional release, laboratory strain typing methods can contribute to epidemiological investigations, law enforcement investigation and the public health response by providing critical information about the relatedness between C. burnetii isolates collected from different sources. Laboratory cultivation of C. burnetii is both time-consuming and challenging. Availability of strain collections is often limited and while several strain typing methods have been described over the years, a true gold-standard method is still elusive. Building upon epidemiological knowledge from limited, historical strain collections and typing data is essential to more accurately infer C. burnetii phylogeny. Harmonization of auspicious high-resolution laboratory typing techniques is critical to support epidemiological and law enforcement investigation. The single nucleotide polymorphism (SNP)-based genotyping approach offers simplicity, rapidity and robustness. Herein, we demonstrate SNPs identified within 16S rRNA gene sequences can differentiate C. burnetii strains. Using this method, 55 isolates were assigned to six groups based on six polymorphisms. These 16S rRNA SNP-based genotyping results were largely congruent with those obtained by analyzing restriction-endonuclease (RE)-digested DNA separated by SDS-PAGE and by the high-resolution approach based on SNPs within multispacer sequence typing (MST) loci. The SNPs identified within the 16S rRNA gene can be used as targets for the development of additional SNP-based genotyping assays for C. burnetii.

  5. Lineage divergence and historical gene flow in the Chinese horseshoe bat (Rhinolophus sinicus).

    Directory of Open Access Journals (Sweden)

    Xiuguang Mao

    Full Text Available Closely related taxa living in sympatry provide good opportunities to investigate the origin of barriers to gene flow as well as the extent of reproductive isolation. The only two recognized subspecies of the Chinese rufous horseshoe bat Rhinolophus sinicus are characterized by unusual relative distributions in which R. s. septentrionalis is restricted to a small area within the much wider range of its sister taxon R. s. sinicus. To determine the history of lineage divergence and gene flow between these taxa, we applied phylogenetic, demographic and coalescent analyses to multi-locus datasets. MtDNA gene genealogies and microsatellite-based clustering together revealed three divergent lineages of sinicus, corresponding to Central China, East China and the offshore Hainan Island. However, the central lineage of sinicus showed a closer relationship with septentrionalis than with other lineages of R. s. sinicus, contrary to morphological data. Paraphyly of sinicus could result from either past asymmetric mtDNA introgression between these two taxa, or could suggest septentrionalis evolved in situ from its more widespread sister subspecies. To test between these hypotheses, we applied coalescent-based phylogenetic reconstruction and Approximate Bayesian Computation (ABC). We found that septentrionalis is likely to be the ancestral taxon and therefore a recent origin of this subspecies can be ruled out. On the other hand, we found a clear signature of asymmetric mtDNA gene flow from septentrionalis into central populations of sinicus yet no nuclear gene flow, thus strongly pointing to historical mtDNA introgression. We suggest that the observed deeply divergent lineages within R. sinicus probably evolved in isolation in separate Pleistocene refugia, although their close phylogeographic correspondence with distinct eco-environmental zones suggests that divergent selection might also have promoted broad patterns of population genetic structure.

  6. Use of joint-growth directions and rock textures to infer thermal regimes during solidification of basaltic lava flows

    Science.gov (United States)

    Degraff, James M.; Long, Philip E.; Aydin, Atilla

    1989-09-01

    Thermal contraction joints form in the upper and lower solidifying crusts of basaltic lava flows and grow toward the interior as the crusts thicken. Lava flows are thus divided by vertical joints that, by changes in joint spacing and form, define horizontal intraflow layers known as tiers. Entablatures are tiers with joint spacings less than about 40 cm, whereas colonnades have larger joint spacings. We use structural and petrographic methods to infer heat-transfer processes and to constrain environmental conditions that produce these contrasting tiers. Joint-surface morphology indicates overall joint-growth direction and thus identifies the level in a flow where the upper and lower crusts met. Rock texture provides information on relative cooling rates in the tiers of a flow. Lava flows without entablature have textures that develop by relatively slow cooling, and two joint sets that usually meet near their middles, which indicate mostly conductive cooling. Entablature-bearing flows have two main joint sets that meet well below their middles, and textures that indicate fast cooling of entablatures and slow cooling of colonnades. Entablatures always occur in the upper joint sets and sometimes alternate several times with colonnades. Solidification times of entablature-bearing flows, constrained by lower joint-set thicknesses, are much less than those predicted by a purely conductive cooling model. These results are best explained by a cooling model based on conductive heat transfer near a flow base and water-steam convection in the upper part of an entablature-bearing flow. Calculated solidification rates in the upper parts of such flows exceed that of the upper crust of Kilauea Iki lava lake, where water-steam convection is documented. Use of the solidification rates in an available model of water-steam convection yields permeability values that agree with measured values for fractured crystalline rock. We conclude, therefore, that an entablature forms when part

  7. Evidence for gene flow between two sympatric mealybug species (Insecta; Coccoidea; Pseudococcidae).

    Directory of Open Access Journals (Sweden)

    Hofit Kol-Maimon

    Occurrence of inter-species hybrids in natural populations might be evidence of gene flow between species. In the present study we found evidence of gene flow between two sympatric, genetically related scale insect species: the citrus mealybug Planococcus citri (Risso) and the vine mealybug Planococcus ficus (Signoret). These species can be distinguished by morphological, behavioral, and molecular traits. We employed the sex pheromones of the two respective species to study their different patterns of male attraction. We also used nuclear ITS2 (internal transcribed spacer 2) and mitochondrial COI (cytochrome c oxidase subunit 1) DNA sequences to characterize populations of the two species, in order to demonstrate the outcome of possible gene flow between feral populations of the two species. Our results showed attraction to P. ficus pheromones of all tested populations of P. citri males but not vice versa. Furthermore, ITS2 sequences revealed the presence of 'hybrid females' among P. citri populations but not among those of P. ficus. 'Hybrid females' from P. citri populations were identified as P. citri females according to COI sequences. We offer two hypotheses for these results: (1) the occurrence of phenotypic and genotypic traits of P. ficus in P. citri populations may be attributed to both ancient and contemporary gene flow between their populations; and (2) we cannot rule out that an ancient sympatric speciation by which P. ficus emerged from P. citri might have led to the present situation of shared traits between these species. In light of these findings we also discuss the origin of the studied species and the importance of the pherotype phenomenon as a tool with which to study genetic relationships between congener scale insects.

  8. Genetic clusters and sex-biased gene flow in a unicolonial Formica ant

    Directory of Open Access Journals (Sweden)

    Chapuisat Michel

    2009-03-01

    Background: Animal societies are diverse, ranging from small family-based groups to extraordinarily large social networks in which many unrelated individuals interact. At the extreme of this continuum, some ant species form unicolonial populations in which workers and queens can move among multiple interconnected nests without eliciting aggression. Although unicoloniality has been mostly studied in invasive ants, it also occurs in some native non-invasive species. Unicoloniality is commonly associated with very high queen number, which may result in levels of relatedness among nestmates being so low as to raise the question of the maintenance of altruism by kin selection in such systems. However, the actual relatedness among cooperating individuals critically depends on effective dispersal and the ensuing pattern of genetic structuring. In order to better understand the evolution of unicoloniality in native non-invasive ants, we investigated the fine-scale population genetic structure and gene flow in three unicolonial populations of the wood ant F. paralugubris. Results: The analysis of geo-referenced microsatellite genotypes and mitochondrial haplotypes revealed the presence of cryptic clusters of genetically differentiated nests in the three populations of F. paralugubris. Because of this spatial genetic heterogeneity, members of the same clusters were moderately but significantly related. The comparison of nuclear (microsatellite) and mitochondrial differentiation indicated that effective gene flow was male-biased in all populations. Conclusion: The three unicolonial populations exhibited male-biased and mostly local gene flow. The high number of queens per nest, exchanges among neighbouring nests and restricted long-distance gene flow resulted in large clusters of genetically similar nests. The positive relatedness among clustermates suggests that kin selection may still contribute to the maintenance of altruism in unicolonial

  9. The population genomics of begomoviruses: global scale population structure and gene flow

    Directory of Open Access Journals (Sweden)

    Prasanna HC

    2010-09-01

    Background: The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underlie observable patterns of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets available is that for the dicotyledonous plant-infecting genus Begomovirus, in the family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions, where they seriously threaten the food security of the world's poorest people. Results: We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty-four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions: Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could

  10. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets.

    Science.gov (United States)

    Springer, Mark S; Gatesy, John

    2018-02-26

    Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset (the 'recombination ratchet') is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the assumption of no recombination within loci is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d'etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. 
If coalescent methods are powerful

  11. Statistical inference of the generation probability of T-cell receptors from sequence repertoires.

    Science.gov (United States)

    Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G

    2012-10-02

    Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
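    The central point of this abstract, that one observed sequence can arise from many hidden recombination scenarios and that its generation probability is the sum over all of them, can be illustrated with a deliberately small sketch. Everything in it is a hypothetical toy and not the authors' model: the gene segments, the probability tables, the absence of a D gene, and the assumption that gene choice, deletions and insertions are independent.

```python
from itertools import product

# Hypothetical toy germline segments and probability tables (assumptions,
# not values from the paper). Real models include D genes and correlations.
V_GENES = {"V1": "ACGT", "V2": "ACGA"}
J_GENES = {"J1": "TTGC", "J2": "GGC"}
P_V = {"V1": 0.6, "V2": 0.4}               # V gene choice
P_J = {"J1": 0.5, "J2": 0.5}               # J gene choice
P_DEL = {0: 0.5, 1: 0.3, 2: 0.2}           # bases deleted from a segment end
P_INS = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # number of inserted bases
P_NT = 0.25                                # each inserted base is uniform


def generation_probability(seq):
    """Sum the probabilities of every hidden scenario that yields `seq`."""
    total = 0.0
    for v, j, dv, dj in product(V_GENES, J_GENES, P_DEL, P_DEL):
        v_part = V_GENES[v][:len(V_GENES[v]) - dv]  # trim the 3' end of V
        j_part = J_GENES[j][dj:]                    # trim the 5' end of J
        n_ins = len(seq) - len(v_part) - len(j_part)
        if n_ins not in P_INS:
            continue  # this scenario cannot produce a sequence of this length
        if not (seq.startswith(v_part) and seq.endswith(j_part)):
            continue  # germline parts do not match the observed sequence
        # The inserted bases are fixed by `seq`, so they contribute P_NT**n_ins.
        total += (P_V[v] * P_J[j] * P_DEL[dv] * P_DEL[dj]
                  * P_INS[n_ins] * P_NT ** n_ins)
    return total


if __name__ == "__main__":
    print(generation_probability("ACGTTTGC"))
```

    In this toy, "ACGTTTGC" is reachable both by joining untrimmed V1 and J1 with no insertion and by trimming a base and re-inserting the same base, which is why the hidden scenarios cannot be read directly from the sequence. The maximum likelihood step described in the abstract would estimate the probability tables from data; the sketch takes them as given.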

  12. LASSIM-A network inference toolbox for genome-wide mechanistic modeling.

    Directory of Open Access Journals (Sweden)

    Rasmus Magnusson

    2017-06-01

    Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODEs) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps. First, it infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM optimizes in parallel the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2-specific bindings and time-series data together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. 
In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models

  13. Study of male-mediated gene flow across a hybrid zone in the common shrew (Sorex araneus) using Y chromosome

    Directory of Open Access Journals (Sweden)

    Andrei V. Polyakov

    2017-06-01

    Despite many studies, the impact of chromosome rearrangements on gene flow between chromosome races of the common shrew (Sorex araneus Linnaeus, 1758) remains unclear. Interracial hybrids form meiotic chromosome complexes that are associated with reduced fertility. Nevertheless, comprehensive investigations of autosomal and mitochondrial markers revealed weak or no barrier to gene flow between chromosomally divergent populations. In a narrow zone of contact between the Novosibirsk and Tomsk races, hybrids are produced with extraordinarily complex configurations at meiosis I. Microsatellite markers have not revealed any barrier to gene flow, but the phenotypic differentiation between races is greater than would be expected if gene flow were unrestricted. To explore this contradiction we analyzed the distribution of the Y chromosome SNP markers within this hybrid zone. The Y chromosome variants, in combination with race-specific autosome complements, allow backcrosses to be distinguished and their proportion among individuals within the hybrid zone to be evaluated. The balanced ratio of the Y variants observed among the pure race individuals as well as backcrosses reveals no male-mediated barrier to gene flow. The impact of reproductive unfitness of backcrosses on gene flow is discussed as a possible mechanism of the preservation of race-specific morphology within the hybrid zone.

  14. An analysis pipeline for the inference of protein-protein interaction networks

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Ronald C.; Singhal, Mudita; Daly, Don S.; Gilmore, Jason M.; Cannon, William R.; Domico, Kelly O.; White, Amanda M.; Auberry, Deanna L.; Auberry, Kenneth J.; Hooker, Brian S.; Hurst, G. B.; McDermott, Jason E.; McDonald, W. H.; Pelletier, Dale A.; Schmoyer, Denise A.; Wiley, H. S.

    2009-12-01

    An analysis pipeline has been created for deployment of a novel algorithm, the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro), for use in the reconstruction of protein-protein interaction networks. We have combined the Software Environment for BIological Network Inference (SEBINI), an interactive environment for the deployment and testing of network inference algorithms that use high-throughput data, and the Collective Analysis of Biological Interaction Networks (CABIN), software that allows integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources, to allow interactions computed by BEPro to be stored, visualized, and further analyzed. Incorporating BEPro into SEBINI and automatically feeding the resulting inferred network into CABIN, we have created a structured workflow for protein-protein network inference and supplemental analysis from sets of mass spectrometry bait-prey experiment data. SEBINI demo site: https://www.emsl.pnl.gov/SEBINI/ Contact: ronald.taylor@pnl.gov. BEPro is available at http://www.pnl.gov/statistics/BEPro3/index.htm. Contact: ds.daly@pnl.gov. CABIN is available at http://www.sysbio.org/dataresources/cabin.stm. Contact: mudita.singhal@pnl.gov.

  15. Integrating gene flow, crop biology, and farm management in on-farm conservation of avocado (Persea americana, Lauraceae).

    Science.gov (United States)

    Birnbaum, Kenneth; Desalle, Rob; Peters, Charles M; Benfey, Philip N

    2003-11-01

    Maintaining crop diversity on farms where cultivars can evolve is a conservation goal, but few tools are available to assess the long-term maintenance of genetic diversity on farms. One important issue for on-farm conservation is gene flow from crops with a narrow genetic base into related populations that are genetically diverse. In a case study of avocado (Persea americana var. americana) in one of its centers of diversity (San Jerónimo, Costa Rica), we used 10 DNA microsatellite markers in a parentage analysis to estimate gene flow from commercialized varieties into a traditional crop population. Five commercialized genotypes comprised nearly 40% of orchard trees, but they contributed only about 14.5% of the gametes to the youngest cohort of trees. Although commercialized varieties and the diverse population were often planted on the same farm, planting patterns appeared to keep the two types of trees separated on small scales, possibly explaining the limited gene flow. In a simulation that combined gene flow estimates, crop biology, and graft tree management, loss of allelic diversity was less than 10% over 150 yr, and selection was effective in retaining desirable alleles in the diverse subpopulation. Simulations also showed that, in addition to gene flow, managing the genetic makeup and life history traits of the invasive commercialized varieties could have a significant impact on genetic diversity in the target population. The results support the feasibility of on-farm crop conservation, but simulations also showed that higher levels of gene flow could lead to severe losses of genetic diversity even if farmers continue to plant diverse varieties.

  16. TIMP2 gene polymorphism as a potential tool to infer Brazilian population origin

    Directory of Open Access Journals (Sweden)

    da Silva RA

    2013-12-01

    Rodrigo Augusto da Silva (1), André Luis Shinohara (2), Denise Carleto Andia (1), Ariadne Letra (3), Regina Célia Peres (1), Ana Paula de Souza (1). (1) Department of Morphology, Piracicaba Dental School, State University of Campinas; (2) Oral Biology Program, Bauru Dental School, State University of São Paulo, São Paulo, Brazil; (3) Department of Endodontics and Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center, Houston, TX, USA.

    Abstract: Single nucleotide polymorphisms are genome variations that can be used as population-specific markers to infer genetic background and population origin. The Brazilian population is highly admixed due to immigration from several other populations. In particular, the state of São Paulo is recognized for the presence of Japanese individuals who seem likely to have contributed to a substantial proportion of ancestry in the modern Brazilian population. In the present study, we analyzed allele and genotype frequencies and associations of the −418G>C (rs8179090) single nucleotide polymorphism in the TIMP2 gene promoter in Brazilian and Japanese subjects, as well as in Japanese descendants from southeastern Brazil. The allele and genotype frequency analyses among groups demonstrated statistical significance (PC single nucleotide polymorphism of the TIMP2 gene, have a high probability of being Japanese or Japanese descendants. In addition to other genetic polymorphisms, the −418G>C TIMP2 polymorphism could be a population marker to assist in predicting Japanese ancestry, both in Japanese individuals and in admixed populations.

    Keywords: Brazilian, Japanese, polymorphism, allele, TIMP2

  17. High rates of gene flow by pollen and seed in oak populations across Europe.

    Directory of Open Access Journals (Sweden)

    Sophie Gerber

    Gene flow is a key factor in the evolution of species, influencing effective population size, hybridisation and local adaptation. We analysed local gene flow in eight stands of white oak (mostly Quercus petraea and Q. robur, but also Q. pubescens and Q. faginea) distributed across Europe. Adult trees within a given area in each stand were exhaustively sampled (range [239, 754], mean 423) and mapped, and acorns were collected ([17, 147], 51) from several mother trees ([3, 47], 23). Seedlings ([65, 387], 178) were harvested and geo-referenced in six of the eight stands. Genetic information was obtained from screening distinct molecular markers spread across the genome, genotyping each tree, acorn or seedling. All samples were thus genotyped at 5-8 nuclear microsatellite loci. Fathers/parents were assigned to acorns and seedlings using likelihood methods. Mating success of male and female parents, pollen and seed dispersal curves, and also hybridisation rates were estimated in each stand and compared on a continental scale. On average, the percentage of the wind-borne pollen from outside the stand was 60%, with large variation among stands (21-88%). Mean seed immigration into the stand was 40%, a high value for oaks, which are generally considered to have limited seed dispersal. However, this estimate varied greatly among stands (20-66%). Gene flow was mostly intraspecific, with large variation, as some trees and stands showed particularly high rates of hybridisation. Our results show that mating success was unevenly distributed among trees. The high levels of gene flow suggest that geographically remote oak stands are unlikely to be genetically isolated, questioning the static definition of gene reserves and seed stands.

  18. Ploidy levels among species in the 'Oxalis tuberosa alliance' as inferred by flow cytometry.

    Science.gov (United States)

    Emshwiller, Eve

    2002-06-01

    The 'Oxalis tuberosa alliance' is a group of Andean Oxalis species allied to the Andean tuber crop O. tuberosa Molina (Oxalidaceae), commonly known as 'oca'. As part of a larger project studying the origins of polyploidy and domestication of cultivated oca, flow cytometry was used to survey DNA ploidy levels among Bolivian and Peruvian accessions of alliance members. In addition, this study provided a first assessment of C-values in the alliance by estimating nuclear DNA contents of these accessions using chicken erythrocytes as internal standard. Ten Bolivian accessions of cultivated O. tuberosa were confirmed to be octoploid, with a mean nuclear DNA content of approx. 3.6 pg/2C. Two Peruvian wild Oxalis species, O. phaeotricha and O. picchensis, were inferred to be tetraploid (both with approx. 1.67 pg/2C), the latter being one of the putative progenitors of O. tuberosa identified by chloroplast-expressed glutamine synthetase data in prior work. The remaining accessions (from 78 populations provisionally identified as 35 species) were DNA diploid, with nuclear DNA contents varying from 0.79 to 1.34 pg/2C.
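    Estimating nuclear DNA content against an internal standard comes down to a ratio of fluorescence peak positions. The sketch below assumes a linear relationship between fluorescence and DNA content and uses 2.33 pg/2C for chicken erythrocytes, a commonly cited reference value that is an assumption here because the abstract does not state the value used. The function names and the diploid reference value passed to `ploidy_level` are hypothetical.

```python
# Assumed reference value for the chicken erythrocyte internal standard.
CHICKEN_2C_PG = 2.33


def dna_content_2c(sample_peak, standard_peak, standard_2c_pg=CHICKEN_2C_PG):
    """Estimate sample 2C DNA content (pg) from fluorescence peak means."""
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak means must be positive")
    return sample_peak / standard_peak * standard_2c_pg


def ploidy_level(content_pg, diploid_reference_pg):
    """Rough DNA ploidy, assuming genome size scales with chromosome sets.

    `diploid_reference_pg` is a user-chosen 2C value for a related diploid;
    it is a hypothetical input, not a value fixed by the study.
    """
    if content_pg <= 0 or diploid_reference_pg <= 0:
        raise ValueError("DNA contents must be positive")
    return 2 * round(content_pg / diploid_reference_pg)


if __name__ == "__main__":
    content = dna_content_2c(sample_peak=154.5, standard_peak=100.0)
    print(f"{content:.2f} pg/2C, inferred ploidy {ploidy_level(content, 0.9)}x")
```

    Because the diploid accessions in the abstract span 0.79 to 1.34 pg/2C, the choice of diploid reference strongly affects the inferred ploidy, so the second function should be read as an illustration of the reasoning and not as a classifier.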

  19. Morphological differentiation despite gene flow in an endangered grasshopper.

    Science.gov (United States)

    Dowle, Eddy J; Morgan-Richards, Mary; Trewick, Steven A

    2014-10-16

    Gene flow is traditionally considered a limitation to speciation because selection is required to counter the homogenising effect of allele exchange. Here we report on two sympatric short-horned grasshopper species in the South Island of New Zealand; one (Sigaus australis) widespread and the other (Sigaus childi) a narrow endemic. Of the 79 putatively neutral markers we examined (mtDNA, microsatellite loci, ITS sequences and RAD-seq SNPs), all but one showed extensive allele sharing, and similar or identical allele frequencies in the two species where they co-occur. We found no genetic evidence of deviation from random mating in the region of sympatry. However, analysis of morphological and geometric traits revealed no evidence of morphological introgression. Based on phenotype the two species are clearly distinct, but their genotypes thus far reveal no divergence. The best explanation for this is that some loci associated with the distinguishing morphological characters are under strong selection, but exchange of neutral loci is occurring freely between the two species. Although it is easier to define species as requiring a barrier between them, a dynamic model that accommodates gene flow is a biologically more reasonable explanation for these grasshoppers.

  20. Asian wild rice is a hybrid swarm with extensive gene flow and feralization from domesticated rice.

    Science.gov (United States)

    Wang, Hongru; Vieira, Filipe G; Crawford, Jacob E; Chu, Chengcai; Nielsen, Rasmus

    2017-06-01

    The domestication history of rice remains controversial, with multiple studies reaching different conclusions regarding its origin(s). These studies have generally assumed that populations of living wild rice, O. rufipogon, are descendants of the ancestral population that gave rise to domesticated rice, but relatively little attention has been paid to the origins and history of wild rice itself. Here, we investigate the genetic ancestry of wild rice by analyzing a diverse panel of rice genomes consisting of 203 domesticated and 435 wild rice accessions. We show that most modern wild rice is heavily admixed with domesticated rice through both pollen- and seed-mediated gene flow. In fact, much presumed wild rice may simply represent different stages of feralized domesticated rice. In line with this hypothesis, many presumed wild rice varieties show remnants of the effects of selective sweeps in previously identified domestication genes, as well as evidence of recent selection in flowering genes possibly associated with the feralization process. Furthermore, there is a distinct geographical pattern of gene flow from aus, indica, and japonica varieties into colocated wild rice. We also show that admixture from aus and indica is more recent than gene flow from japonica, possibly consistent with an earlier spread of japonica varieties. We argue that wild rice populations should be considered a hybrid swarm, connected to domesticated rice by continuous and extensive gene flow. © 2017 Wang et al.; Published by Cold Spring Harbor Laboratory Press.

  1. Sex- and species-biased gene flow in a spotted eagle hybrid zone

    Directory of Open Access Journals (Sweden)

    Väli Ülo

    2011-04-01

    Background: Recent theoretical and empirical work points toward a significant role for sex-chromosome linked genes in the evolution of traits that induce reproductive isolation and for traits that evolve under the influence of sexual selection. Empirical studies including recently diverged (Pleistocene), short-lived avian species pairs with short generation times have found that introgression occurs on the autosomes but not on the Z-chromosome. Here we study genetic differentiation and gene flow in the long-lived greater spotted eagle (Aquila clanga) and lesser spotted eagle (A. pomarina), two species with comparatively long generation times. Results: Our data suggest that there is a directional bias in migration rates between hybridizing spotted eagles in eastern Europe. We find that a model including post-divergence gene flow fits our data best for both autosomal and Z-chromosome linked loci but, for the Z-chromosome, the rate is reduced in the direction from A. pomarina to A. clanga. Conclusions: The fact that some introgression still occurs on the Z-chromosome between these species suggests that the differentiation process is at an earlier stage in our study system than in previously studied avian species pairs, which could be explained by a shorter divergence time and/or a longer average generation time in the spotted eagles. The results are in agreement with field observations and provide further insight into the role of sex-linked loci in the build-up of barriers to gene flow among diverging populations and species.

  2. Gene flow for Echinococcus granulosus metapopulations determined by mitochondrial sequences: A reliable approach for reflecting epidemiological drift of parasite among neighboring countries.

    Science.gov (United States)

    Mahami-Oskouei, Mahmoud; Kaseb-Yazdanparast, Azam; Spotin, Adel; Shahbazi, Abbas; Adibpour, Mohammad; Ahmadpour, Ehsan; Ghabouli-Mehrabani, Nader

    2016-12-01

    In studies of the genetic diversity and population structure of Echinococcus granulosus, gene flow can illustrate how Echinococcus isolates have drifted epidemiologically among neighboring endemic countries. Fifty-one isolates of hydatid cysts were collected from humans, dogs, cattle and sheep in northwest Iran, which borders Turkey. DNA samples were extracted, amplified and subjected to sequence analysis of the NADH dehydrogenase subunit 1 (nad1) and cytochrome oxidase subunit 1 (cox1) genes. In addition, Echinococcus sequences from the eastern to southeastern regions of Turkey were retrieved from the GenBank database for the cox1 gene. The confirmed isolates were grouped as G1 (n = 74) and G3 (n = 6) genotypes. Thirty-one unique haplotypes were identified from the analyzed cox1 sequences across the two distinct populations. A parsimonious network of the sequence haplotypes displayed star-like features in the overall population, with TUR1, IR15 and IR22 as the most common haplotypes. According to the AMOVA test, the high haplotype diversity (0.94758-0.98901) of E. granulosus reflected the total genetic variability within populations, while nucleotide diversity was low (0.00727-0.01046) in the Iranian and Turkish metapopulations. Neutrality indices of cox1 showed negative values (-15.078 to -10.057) in Echinococcus populations, indicating a significant divergence from neutrality. The pairwise fixation index (Fst), used as a measure of gene flow, was moderately high across all populations (0.151). This Fst value indicates that E. granulosus sensu stricto (G1-G3) is moderately genetically differentiated between Iranian and Turkish isolates. The occurrence of TUR1 and IR15 suggests a possible dawn of domestication due to the transfer of alleles between populations through the diffusion of stock raising or anthropogenic movements. To evaluate the hypothetical evolutionary scenario, further exploration is necessitated to analyze isolates from
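    The contrast reported above between high haplotype diversity and low nucleotide diversity is easier to interpret with the definitions in hand. The sketch below uses the standard unbiased estimator of haplotype diversity, h = n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean proportion of differing sites over all sequence pairs. The function names and the toy alignment are hypothetical, and the study's own software and settings are not given in the abstract.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(sequences):
    """Mean per-site differences over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and non-empty")
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Toy alignment: many distinct haplotypes that differ at very few sites.
    toy = ["AAAAAAAAAA", "AAAAAAAAAT", "TAAAAAAAAA", "AAAAATAAAA"]
    print(haplotype_diversity(toy), nucleotide_diversity(toy))
```

    In the toy alignment every sequence is a distinct haplotype, so haplotype diversity is maximal, yet each pair differs at only one or two sites, so nucleotide diversity stays low. This is the same qualitative pattern as the star-like haplotype network described in the abstract.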

  3. Intrinsic incompatibilities evolving as a by-product of divergent ecological selection: Considering them in empirical studies on divergence with gene flow.

    Science.gov (United States)

    Kulmuni, J; Westram, A M

    2017-06-01

    The possibility of intrinsic barriers to gene flow is often neglected in empirical research on local adaptation and speciation with gene flow, for example when interpreting patterns observed in genome scans. However, we draw attention to the fact that, even with gene flow, divergent ecological selection may generate intrinsic barriers involving both ecologically selected and other interacting loci. Mechanistically, the link between the two types of barriers may be generated by genes that have multiple functions (i.e., pleiotropy), and/or by gene interaction networks. Because most genes function in complex networks, and their evolution is not independent of other genes, changes evolving in response to ecological selection can generate intrinsic barriers as a by-product. A crucial question is to what extent such by-product barriers contribute to divergence and speciation, that is, whether they stably reduce gene flow. We discuss under which conditions by-product barriers may increase isolation. However, we also highlight that, depending on the conditions (e.g., the amount of gene flow and the strength of selection acting on the intrinsic vs. the ecological barrier component), the intrinsic incompatibility may actually destabilize barriers to gene flow. In practice, intrinsic barriers generated as a by-product of divergent ecological selection may generate peaks in genome scans that cannot easily be interpreted. We argue that empirical studies on divergence with gene flow should consider the possibility of both ecological and intrinsic barriers. Future progress will likely come from work combining population genomic studies, experiments quantifying fitness and molecular studies on protein function and interactions. © 2017 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  4. An insight into subterranean flow proposition around Alleppey mudbank coastal sector, Kerala, India: Inferences from the subsurface profiles of ground penetrating radar

    Digital Repository Service at National Institute of Oceanography (India)

    Loveson, V.J.; Dubey, R.; DineshKumar, P.K.; Nigam, R.; Naqvi, S.W.A.

    -1 Author Version: Environ. Earth Sci., vol.75(20); 2016; no.1361 doi:10.1007/s12665-016-6172-6 An insight into subterranean flow proposition around Alleppey mudbank coastal sector, Kerala, India: inferences from the subsurface profiles of Ground... and productivity, physical and chemical aspects of the sea, annual drift etc. (Bristow et al., 1938; Varma and Kurup 1969; Gopinath and Qasim 1974; Jacob and Qasim (1974), Ramachandran and Mallik, 1985). Similar occurrences of mud banks in few other countries...

  5. PhySIC_IST: cleaning source trees to infer more informative supertrees.

    Science.gov (United States)

    Scornavacca, Celine; Berry, Vincent; Lefort, Vincent; Douzery, Emmanuel J P; Ranwez, Vincent

    2008-10-04

    Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees. To overcome this problem, we propose to infer non-plenary supertrees, i.e. supertrees that do not necessarily contain all the taxa present in the source trees, discarding those whose position greatly differs among source trees or for which insufficient information is provided. We detail a variant of the PhySIC veto method called PhySIC_IST that can infer non-plenary supertrees. PhySIC_IST aims at inferring supertrees that satisfy the same appealing theoretical properties as with PhySIC, while being as informative as possible under this constraint. The informativeness of a supertree is estimated using a variation of the CIC (Cladistic Information Content) criterion, that takes into account both the presence of multifurcations and the absence of some taxa. Additionally, we propose a statistical preprocessing step called STC (Source Trees Correction) to correct the source trees prior to the supertree inference. STC is a liberal step that removes the parts of each source tree that significantly conflict with other source trees. Combining STC with a veto method allows an explicit trade-off between veto and liberal approaches, tuned by a single parameter. Performing large-scale simulations, we observe that STC+PhySIC_IST infers much more informative

  6. Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering

    Science.gov (United States)

    D'haeseleer, Patrik; Liang, Shoudan; Somogyi, Roland

    2000-01-01

    Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review datamining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.

  7. Inference of RNA polymerase II transcription dynamics from chromatin immunoprecipitation time course data.

    Directory of Open Access Journals (Sweden)

    Ciira wa Maina

    2014-05-01

    Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression. In this work, we present a probabilistic model of transcription dynamics which is fitted to pol-II occupancy time course data measured using ChIP-Seq. The model can be used to estimate transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated using either maximum likelihood estimation or via Bayesian inference using Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the use of priors that capture domain knowledge, e.g. the expected range of transcription speeds, based on previous experiments. The model describes the movement of pol-II down the gene body and can be used to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter activity time profiles, we are able to determine which genes respond quickly to stimuli and group genes that share activity profiles and may therefore be co-regulated. We apply our methodology to biological data obtained using ChIP-seq to measure pol-II occupancy genome-wide when MCF-7 human breast cancer cells are treated with estradiol (E2). The transcription speeds we obtain agree with those obtained previously for smaller numbers of genes with the advantage that our approach can be applied genome-wide. We validate the biological significance of the pol-II promoter activity clusters by investigating cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced genes are enriched for both estrogen receptor alpha (ERα) and FOXA1 binding in their proximal promoter regions.

  8. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    Science.gov (United States)

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  9. A flood-based information flow analysis and network minimization method for gene regulatory networks.

    Science.gov (United States)

    Pavlogiannis, Andreas; Mozhayskiy, Vadim; Tagkopoulos, Ilias

    2013-04-24

    Biological networks tend to have high interconnectivity, complex topologies and multiple types of interactions. This makes it difficult to identify sub-networks that are involved in condition-specific responses. In addition, we generally lack scalable methods that can reveal the information flow in gene regulatory and biochemical pathways. Such methods would help us to identify key participants and paths under specific environmental and cellular contexts. This paper introduces the theory of network flooding, which aims to address the problem of network minimization and regulatory information flow in gene regulatory networks. Given a regulatory biological network, a set of source (input) nodes and optionally a set of sink (output) nodes, our task is to find (a) the minimal sub-network that encodes the regulatory program involving all input and output nodes and (b) the information flow from the source to the sink nodes of the network. Here, we describe a novel, scalable, network traversal algorithm and we assess its potential to achieve significant network size reduction in both synthetic and E. coli networks. Scalability and sensitivity analysis show that the proposed method scales well with the size of the network, and is robust to noise and missing data. The method of network flooding proves to be a useful, practical approach towards information flow analysis in gene regulatory networks. Further extension of the proposed theory has the potential to lead to a unifying framework for the simultaneous network minimization and information flow analysis across various "omics" levels.

  10. Consequences of population topology for studying gene flow using link-based landscape genetic methods.

    Science.gov (United States)

    van Strien, Maarten J

    2017-07-01

    Many landscape genetic studies aim to determine the effect of landscape on gene flow between populations. These studies frequently employ link-based methods that relate pairwise measures of historical gene flow to measures of the landscape and the geographical distance between populations. However, apart from landscape and distance, there is a third important factor that can influence historical gene flow, that is, population topology (i.e., the arrangement of populations throughout a landscape). As the population topology is determined in part by the landscape configuration, I argue that it should play a more prominent role in landscape genetics. Making use of existing literature and theoretical examples, I discuss how population topology can influence results in landscape genetic studies and how it can be taken into account to improve the accuracy of these results. In support of my arguments, I have performed a literature review of landscape genetic studies published during the first half of 2015 as well as several computer simulations of gene flow between populations. First, I argue why one should carefully consider which population pairs should be included in link-based analyses. Second, I discuss several ways in which the population topology can be incorporated in response and explanatory variables. Third, I outline why it is important to sample populations in such a way that a good representation of the population topology is obtained. Fourth, I discuss how statistical testing for link-based approaches could be influenced by the population topology. I conclude the article with six recommendations geared toward better incorporating population topology in link-based landscape genetic studies.

  11. Gene flow and pathogen transmission among bobcats (Lynx rufus) in a fragmented urban landscape

    Science.gov (United States)

    Lee, Justin S.; Ruell, Emily W.; Boydston, Erin E.; Lyren, Lisa M.; Alonso, Robert S.; Troyer, Jennifer L.; Crooks, Kevin R.; VandeWoude, Sue

    2012-01-01

    Urbanization can result in the fragmentation of once contiguous natural landscapes into a patchy habitat interspersed within a growing urban matrix. Animals living in fragmented landscapes often have reduced movement among habitat patches because of avoidance of intervening human development, which potentially leads to both reduced gene flow and pathogen transmission between patches. Mammalian carnivores with large home ranges, such as bobcats (Lynx rufus), may be particularly sensitive to habitat fragmentation. We performed genetic analyses on bobcats and their directly transmitted viral pathogen, feline immunodeficiency virus (FIV), to investigate the effects of urbanization on bobcat movement. We predicted that urban development, including major freeways, would limit bobcat movement and result in genetically structured host and pathogen populations. We analysed molecular markers from 106 bobcats and 19 FIV isolates from seropositive animals in urban southern California. Our findings indicate that reduced gene flow between two primary habitat patches has resulted in genetically distinct bobcat subpopulations separated by urban development including a major highway. However, the distribution of genetic diversity among FIV isolates determined through phylogenetic analyses indicates that pathogen genotypes are less spatially structured, exhibiting a more even distribution between habitat fragments. We conclude that the types of movement and contact sufficient for disease transmission occur with enough frequency to preclude structuring among the viral population, but that the bobcat population is structured owing to low levels of effective bobcat migration resulting in gene flow. We illustrate the utility in using multiple molecular markers that differentially detect movement and gene flow between subpopulations when assessing connectivity.

  12. MAINTENANCE OF ECOLOGICALLY SIGNIFICANT GENETIC VARIATION IN THE TIGER SWALLOWTAIL BUTTERFLY THROUGH DIFFERENTIAL SELECTION AND GENE FLOW.

    Science.gov (United States)

    Bossart, J L; Scriber, J M

    1995-12-01

    Differential selection in a heterogeneous environment is thought to promote the maintenance of ecologically significant genetic variation. Variation is maintained when selection is counterbalanced by the homogenizing effects of gene flow and random mating. In this study, we examine the relative importance of differential selection and gene flow in maintaining genetic variation in Papilio glaucus. Differential selection on traits contributing to successful use of host plants (oviposition preference and larval performance) was assessed by comparing the responses of southern Ohio, north central Georgia, and southern Florida populations of P. glaucus to three hosts: Liriodendron tulipifera, Magnolia virginiana, and Prunus serotina. Gene flow among populations was estimated using allozyme frequencies from nine polymorphic loci. Significant genetic differentiation was observed among populations for both oviposition preference and larval performance. This differentiation was interpreted to be the result of selection acting on Florida P. glaucus for enhanced use of Magnolia, the prevalent host in Florida. In contrast, no evidence of population differentiation was revealed by allozyme frequencies. FST values were very small and Nm, an estimate of the relative strengths of gene flow and genetic drift, was large, indicating that genetic exchange among P. glaucus populations is relatively unrestricted. The contrasting patterns of spatial differentiation for host-use traits and lack of differentiation for electrophoretically detectable variation imply that differential selection among populations will be counterbalanced by gene flow, thereby maintaining genetic variation for host-use traits. © 1995 The Society for the Study of Evolution.
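
    The abstract above reports small FST and large Nm but does not say how Nm was derived. As an illustration only, the sketch below assumes the widely used island-model approximation Nm ≈ (1 − FST) / (4 FST); the function name and the example FST values are hypothetical and are not taken from the study.

```python
def nm_from_fst(fst: float) -> float:
    """Approximate number of migrants per generation (Nm) from FST.

    Assumes the island-model relation Nm ~= (1 - FST) / (4 * FST).
    This is an assumption for illustration, not a method stated in the abstract.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


if __name__ == "__main__":
    # Hypothetical FST values: smaller FST implies larger Nm under this model.
    for fst in (0.2, 0.05, 0.01):
        print(f"FST={fst:.2f} -> Nm~{nm_from_fst(fst):.2f}")
```

    Under this approximation, Nm grows quickly as FST approaches zero, which is consistent with the abstract's reading of very small FST values as relatively unrestricted genetic exchange.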

  13. Local evolution of pyrethroid resistance offsets gene flow among Aedes aegypti collections in Yucatan State, Mexico.

    Science.gov (United States)

    Saavedra-Rodriguez, Karla; Beaty, Meaghan; Lozano-Fuentes, Saul; Denham, Steven; Garcia-Rejon, Julian; Reyes-Solis, Guadalupe; Machain-Williams, Carlos; Loroño-Pino, Maria Alba; Flores-Suarez, Adriana; Ponce-Garcia, Gustavo; Beaty, Barry; Eisen, Lars; Black, William C

    2015-01-01

    The mosquito Aedes aegypti is the major vector of the four serotypes of dengue virus (DENV1-4). Previous studies have shown that Ae. aegypti in Mexico have a high effective migration rate and that gene flow occurs among populations that are up to 150 km apart. Since 2000, pyrethroids have been widely used for suppression of Ae. aegypti in cities in Mexico. In Yucatan State in particular, pyrethroids have been applied in and around dengue case households creating an opportunity for local selection and evolution of resistance. Herein, we test for evidence of local adaptation by comparing patterns of variation among 27 Ae. aegypti collections at 13 single nucleotide polymorphisms (SNPs): two in the voltage-gated sodium channel gene para known to confer knockdown resistance, three in detoxification genes previously associated with pyrethroid resistance, and eight in putatively neutral loci. The SNPs in para varied greatly in frequency among collections, whereas SNPs at the remaining 11 loci showed little variation supporting previous evidence for extensive local gene flow. Among Ae. aegypti in Yucatan State, Mexico, local adaptation to pyrethroids appears to offset the homogenizing effects of gene flow. © The American Society of Tropical Medicine and Hygiene.

  14. A Local Poisson Graphical Model for inferring networks from sequencing data.

    Science.gov (United States)

    Allen, Genevera I; Liu, Zhandong

    2013-09-01

    Gaussian graphical models, a class of undirected graphs or Markov Networks, are often used to infer gene networks based on microarray expression data. Many scientists, however, have begun using high-throughput sequencing technologies such as RNA-sequencing or next generation sequencing to measure gene expression. As the resulting data consists of counts of sequencing reads for each gene, Gaussian graphical models are not optimal for this discrete data. In this paper, we propose a novel method for inferring gene networks from sequencing data: the Local Poisson Graphical Model. Our model assumes a Local Markov property where each variable conditional on all other variables is Poisson distributed. We develop a neighborhood selection algorithm to fit our model locally by performing a series of l1 penalized Poisson, or log-linear, regressions. This yields a fast parallel algorithm for estimating networks from next generation sequencing data. In simulations, we illustrate the effectiveness of our methods for recovering network structure from count data. A case study on breast cancer microRNAs (miRNAs), a novel application of graphical models, finds known regulators of breast cancer genes and discovers novel miRNA clusters and hubs that are targets for future research.

  15. Constructing an integrated gene similarity network for the identification of disease genes.

    Science.gov (United States)

    Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin

    2017-09-20

    Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation methods in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB stems from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
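
    The random walk with restart step named in the abstract above can be illustrated on a toy graph. The sketch below is not the RWRB implementation: it runs a plain random walk with restart on a single small network rather than on the phenotype-gene bilayer network, and the function name, restart probability and adjacency matrix are all assumptions made for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-12, max_iter=10_000):
    """Return steady-state visiting probabilities for a walker that, at each
    step, jumps back to the seed nodes with probability `restart` and
    otherwise moves to a neighbour chosen in proportion to edge weight.

    adj   : square list-of-lists adjacency matrix (hypothetical toy input)
    seeds : indices of the seed nodes (e.g. known disease genes)
    """
    n = len(adj)
    # Column sums are used to turn the adjacency matrix into transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            walk = sum(adj[i][j] / col_sums[j] * p[j]
                       for j in range(n) if col_sums[j] > 0)
            nxt.append((1.0 - restart) * walk + restart * p0[i])
        # Stop once the probability vector has effectively stopped changing.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
    path = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
    scores = random_walk_with_restart(path, seeds=[0])
    ranking = sorted(range(len(scores)), key=lambda k: -scores[k])
    print(ranking)
```

    Candidates are ranked by their steady-state score, so nodes that are well connected to the seeds rise to the top; in RWRB the same idea is applied to the combined phenotype and gene networks.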

  16. Inferring biological functions of guanylyl cyclases with computational methods

    KAUST Repository

    Alquraishi, May Majed; Meier, Stuart Kurt

    2013-01-01

    A number of studies have shown that functionally related genes are often co-expressed and that computational based co-expression analysis can be used to accurately identify functional relationships between genes and by inference, their encoded proteins. Here we describe how a computational based co-expression analysis can be used to link the function of a specific gene of interest to a defined cellular response. Using a worked example we demonstrate how this methodology is used to link the function of the Arabidopsis Wall-Associated Kinase-Like 10 gene, which encodes a functional guanylyl cyclase, to host responses to pathogens. © Springer Science+Business Media New York 2013.

  17. Inferring biological functions of guanylyl cyclases with computational methods

    KAUST Repository

    Alquraishi, May Majed

    2013-09-03

    A number of studies have shown that functionally related genes are often co-expressed and that computational based co-expression analysis can be used to accurately identify functional relationships between genes and by inference, their encoded proteins. Here we describe how a computational based co-expression analysis can be used to link the function of a specific gene of interest to a defined cellular response. Using a worked example we demonstrate how this methodology is used to link the function of the Arabidopsis Wall-Associated Kinase-Like 10 gene, which encodes a functional guanylyl cyclase, to host responses to pathogens. © Springer Science+Business Media New York 2013.

  18. The scale of hydrothermal circulation of the Iheya-North field inferred from intensive heat flow measurements and ocean drilling

    Science.gov (United States)

    Masaki, Y.; Kinoshita, M.; Yamamoto, H.; Nakajima, R.; Kumagai, H.; Takai, K.

    2014-12-01

    The Iheya-North hydrothermal field, situated in the middle Okinawa Trough backarc basin, is one of the largest ongoing Kuroko deposits in the world. Active chimneys as well as diffuse ventings (maximum fluid temperature 311 °C) have been located and studied in detail through various geological and geophysical surveys. To clarify the spatial scale of the hydrothermal circulation system, intensive heat flow measurements were carried out from 2002 to 2014, yielding ~100 heat flow data in and around the field. In 2010, Integrated Ocean Drilling Program (IODP) Expedition 331 was carried out, and subbottom temperature data were obtained around the hydrothermal sites. During the JAMSTEC R/V Kaiyo cruise KY14-01 in 2014, the Iheya-North "Natsu" and "Aki" hydrothermal fields were newly found. The Iheya-North "Natsu" and "Aki" sites are located 1.2 km and 2.6 km south of the Iheya-North original site, respectively, and the maximum venting fluid temperature was 317 °C. We obtained one heat flow measurement at the "Aki" site. The value was 17 W/m². Currently, the relationship between these hydrothermal sites is not well known. Three distinct zones are identified by heat flow values within 3 km of the active hydrothermal field: a high-heat-flow zone (>1 W/m²; HHZ), a moderate-heat-flow zone (1-0.1 W/m²; MHZ), and a low-heat-flow zone (<0.1 W/m²; LHZ). With increasing distance east of the HHZ, heat flow gradually decreases towards the MHZ and LHZ. In the LHZ, the temperature at 37 m below the seafloor (mbsf) was 6 °C, which is consistent with the low surface heat flow and suggests recharge of seawater. However, between 70 and 90 mbsf, coarser sediments were cored, and temperature increased from 25 °C to 40 °C. The temperature was 905 °C at 151 mbsf, which was measured with thermoseal strips. The low thermal gradient in the upper 40 m suggests downward fluid flow. We infer a hydrothermal circulation on the scale of ~1.5 km horizontally and a few hundred meters vertically.
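
    The three heat flow zones defined in the abstract above amount to a simple threshold rule. The sketch below encodes those thresholds; the function name is hypothetical, and assigning the boundary values 1 and 0.1 W/m² to the moderate zone is an assumption, since the abstract does not say which zone the boundaries belong to.

```python
def classify_heat_flow(q_w_per_m2: float) -> str:
    """Assign a heat flow value (W/m^2) to one of the abstract's three zones.

    HHZ: > 1 W/m^2, MHZ: 0.1 to 1 W/m^2, LHZ: < 0.1 W/m^2.
    Boundary values are placed in the MHZ by assumption.
    """
    if q_w_per_m2 > 1.0:
        return "HHZ"
    if q_w_per_m2 >= 0.1:
        return "MHZ"
    return "LHZ"


if __name__ == "__main__":
    # 17 W/m^2 is the single value reported for the "Aki" site;
    # the other two inputs are made-up illustrative values.
    for q in (17.0, 0.5, 0.05):
        print(q, classify_heat_flow(q))
```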

  19. The role of gene flow in shaping genetic structures of the subtropical conifer species Araucaria angustifolia.

    Science.gov (United States)

    Stefenon, V M; Gailing, O; Finkeldey, R

    2008-05-01

    The morphological features of pollen and seed of Araucaria angustifolia have led to the proposal of limited gene dispersal for this species. We used nuclear microsatellite and AFLP markers to assess patterns of genetic variation in six natural populations at the intra- and inter-population level, and related our findings to gene dispersal in this species. Estimates of both fine-scale spatial genetic structure (SGS) and migration rate suggest relatively short-distance gene dispersal. However, gene dispersal differed among populations, and effects of more efficient dispersal within population were observed in at least one stand. In addition, even though some seed dispersal may be aggregated in this principally barochorous species, reasonable secondary seed dispersal, presumably facilitated by animals, and overlap of seed shadows within populations is suggested. Overall, no correlation was observed between levels of SGS and inbreeding, density or age structure, except that a higher level of SGS was revealed for the population with a higher number of juvenile individuals. A low estimate for the number of migrants per generation between two neighbouring populations implies limited gene flow. We expect that stepping-stone pollen flow may have contributed to low genetic differentiation among populations observed in a previous survey. Thus, strategies for maintenance of gene flow among remnant populations should be considered in order to avoid degrading effects of population fragmentation on the evolution of A. angustifolia.

  20. Inferring clocks when lacking rocks: the variable rates of molecular evolution in bacteria

    Directory of Open Access Journals (Sweden)

    Ochman Howard

    2009-09-01

    Background: Because bacteria do not have a robust fossil record, attempts to infer the timing of events in their evolutionary history require comparisons of molecular sequences. This use of molecular clocks is based on the assumptions that substitution rates for homologous genes or sites are fairly constant through time and across taxa. Violation of these conditions can lead to erroneous inferences and result in estimates that are off by orders of magnitude. In this study, we examine the consistency of substitution rates among a set of conserved genes in diverse bacterial lineages, and address the questions regarding the validity of molecular dating. Results: By examining the evolution of the 16S rRNA gene in obligate endosymbionts, which can be calibrated by the fossil record of their hosts, we found that the rates are consistent within a clade but varied widely across different bacterial lineages. Genome-wide estimates of nonsynonymous and synonymous substitutions suggest that these two measures are highly variable in their rates across bacterial taxa. Genetic drift plays a fundamental role in determining the accumulation of substitutions in 16S rRNA genes and at nonsynonymous sites. Moreover, divergence estimates based on a set of universally conserved protein-coding genes also exhibit low correspondence to those based on 16S rRNA genes. Conclusion: Our results document a wide range of substitution rates across genes and bacterial taxa. This high level of variation cautions against the assumption of a universal molecular clock for inferring divergence times in bacteria. However, by applying relative-rate tests to homologous genes, it is possible to derive reliable local clocks that can be used to calibrate bacterial evolution. Reviewers: This article was reviewed by Adam Eyre-Walker, Simonetta Gribaldo and Tal Pupko (nominated by Dan Graur).

  1. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    Science.gov (United States)

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.

  2. Barriers to gene flow in the marine environment: insights from two common intertidal limpet species of the Atlantic and Mediterranean.

    Directory of Open Access Journals (Sweden)

    Alexandra Sá-Pinto

    Knowledge of the scale of dispersal and the mechanisms governing gene flow in marine environments remains fragmentary despite being essential for understanding the evolution of marine biota and for designing management plans. We use the limpets Patella ulyssiponensis and Patella rustica as models for identifying factors affecting gene flow in marine organisms across the North-East Atlantic and the Mediterranean Sea. A set of allozyme loci and a fragment of the mitochondrial gene cytochrome C oxidase subunit I were screened for genetic variation through starch gel electrophoresis and DNA sequencing, respectively. An approach combining clustering algorithms with clinal analyses was used to test for the existence of barriers to gene flow and estimate their geographic location and abruptness. Sharp breaks in the genetic composition of individuals were observed in the transitions between the Atlantic and the Mediterranean and across southern Italian shores. An additional break within the Atlantic cluster separates samples from the Alboran Sea and Atlantic African shores from those of the Iberian Atlantic shores. The geographic congruence of the genetic breaks detected in these two limpet species strongly supports the existence of transpecific barriers to gene flow in the Mediterranean Sea and Northeastern Atlantic. This leads to testable hypotheses regarding factors restricting gene flow across the study area.

  3. Progression inference for somatic mutations in cancer

    Directory of Open Access Journals (Sweden)

    Leif E. Peterson

    2017-04-01

    Full Text Available Computational methods were employed to infer the progression of genomic alterations in commonly occurring cancers. Using cross-sectional TCGA data, we computed evolutionary trajectories involving selectivity relationships among pairs of gene-specific genomic alterations such as somatic mutations, deletions, amplifications, downregulation, and upregulation among the top 20 driver genes associated with each cancer. Results indicate that the majority of hierarchies involved TP53, PIK3CA, ERBB2, APC, KRAS, EGFR, IDH1, VHL, etc. Research into the order and accumulation of genomic alterations among cancer driver genes will continue to grow as the costs of next-generation sequencing subside and personalized/precision medicine incorporates whole-genome scans into the diagnosis and treatment of cancer. Keywords: Oncology, Cancer research, Genetics, Computational biology

  4. Phylogeny of the Celastreae (Celastraceae) and the relationships of Catha edulis (qat) inferred from morphological characters and nuclear and plastid genes.

    Science.gov (United States)

    Simmons, Mark P; Cappa, Jennifer J; Archer, Robert H; Ford, Andrew J; Eichstedt, Dedra; Clevinger, Curtis C

    2008-08-01

    The phylogeny of Celastraceae tribe Celastreae, which includes about 350 species of trees and shrubs in 15 genera, was inferred in a simultaneous analysis of morphological characters together with nuclear (ITS and 26S rDNA) and plastid (matK, trnL-F) genes. A strong correlation was found between the geography of the species sampled and their inferred relationships. Species of Maytenus and Gymnosporia from different regions were resolved as polyphyletic groups. Maytenus was resolved in three lineages (New World, African, and Austral-Pacific), while Gymnosporia was resolved in two lineages (New World and Old World). Putterlickia was resolved as nested within the Old World Gymnosporia. Catha edulis (qat, khat) was resolved as sister to the clade of Allocassine, Cassine, Lauridia, and Maurocenia. Gymnosporia cassinoides, which is reportedly chewed as a stimulant in the Canary Islands, was resolved as a derived member of Gymnosporia and is more closely related to Lydenburgia and Putterlickia than it is to Catha. Therefore, all eight of these genera are candidates for containing cathinone- and/or cathine-related alkaloids.

  5. Genomic evidence of geographically widespread effect of gene flow from polar bears into brown bears.

    Science.gov (United States)

    Cahill, James A; Stirling, Ian; Kistler, Logan; Salamzade, Rauf; Ersmark, Erik; Fulton, Tara L; Stiller, Mathias; Green, Richard E; Shapiro, Beth

    2015-03-01

    Polar bears are an arctic, marine adapted species that is closely related to brown bears. Genome analyses have shown that polar bears are distinct and genetically homogeneous in comparison to brown bears. However, these analyses have also revealed a remarkable episode of polar bear gene flow into the population of brown bears that colonized the Admiralty, Baranof and Chichagof islands (ABC islands) of Alaska. Here, we present an analysis of data from a large panel of polar bear and brown bear genomes that includes brown bears from the ABC islands, the Alaskan mainland and Europe. Our results provide clear evidence that gene flow between the two species had a geographically wide impact, with polar bear DNA found within the genomes of brown bears living both on the ABC islands and in the Alaskan mainland. Intriguingly, while brown bear genomes contain up to 8.8% polar bear ancestry, polar bear genomes appear to be devoid of brown bear ancestry, suggesting the presence of a barrier to gene flow in that direction. © 2014 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  6. Evolutionary relationships in Aspergillus section Fumigati inferred from partial beta-tubulin and hydrophobin sequences

    DEFF Research Database (Denmark)

    Geiser, D.M.; Frisvad, Jens Christian; Taylor, J.W.

    1998-01-01

    are heterothallic. Phylogenetic relationships were inferred among members of Aspergillus section Fumigati based on partial DNA sequences from the benA beta-tubulin and rodA hydrophobin genes. Aspergillus clavatus was chosen as an outgroup. The two gene regions provided nearly equal numbers of phylogenetically...... informative nucleotide characters. The rodA region possessed a considerably higher level of inferred amino acid variation than did the benA region. The results of a partition homogeneity test showed that the benA and rodA data sets were not in significant conflict, and the topologies of the most parsimonious...

  7. Inferring biological tasks using Pareto analysis of high-dimensional data.

    Science.gov (United States)

    Hart, Yuval; Sheftel, Hila; Hausser, Jean; Szekely, Pablo; Ben-Moshe, Noa Bossel; Korem, Yael; Tendler, Avichai; Mayo, Avraham E; Alon, Uri

    2015-03-01

    We present the Pareto task inference method (ParTI; http://www.weizmann.ac.il/mcb/UriAlon/download/ParTI) for inferring biological tasks from high-dimensional biological data. Data are described as a polytope, and features maximally enriched closest to the vertices (or archetypes) allow identification of the tasks the vertices represent. We demonstrate that human breast tumors and mouse tissues are well described by tetrahedrons in gene expression space, with specific tumor types and biological functions enriched at each of the vertices, suggesting four key tasks.

  8. Inference of the Genetic Network Regulating Lateral Root Initiation in Arabidopsis thaliana

    KAUST Repository

    Muraro, D.

    2013-01-01

    Regulation of gene expression is crucial for organism growth, and it is one of the challenges in systems biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyze two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, and assess causality of their regulatory interactions by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation. © 2004-2012 IEEE.

  9. Genome-Scale Analysis of Translation Elongation with a Ribosome Flow Model

    Science.gov (United States)

    Meilijson, Isaac; Kupiec, Martin; Ruppin, Eytan

    2011-01-01

    We describe the first large scale analysis of gene translation that is based on a model that takes into account the physical and dynamical nature of this process. The Ribosomal Flow Model (RFM) predicts fundamental features of the translation process, including translation rates, protein abundance levels, ribosomal densities and the relation between all these variables, better than alternative (‘non-physical’) approaches. In addition, we show that the RFM can be used for accurate inference of various other quantities including genes' initiation rates and translation costs. These quantities could not be inferred by previous predictors. We find that increasing the number of available ribosomes (or equivalently the initiation rate) increases the genomic translation rate and the mean ribosome density only up to a certain point, beyond which both saturate. Strikingly, assuming that the translation system is tuned to work at the pre-saturation point maximizes the predictive power of the model with respect to experimental data. This result suggests that in all organisms that were analyzed (from bacteria to Human), the global initiation rate is optimized to attain the pre-saturation point. The fact that similar results were not observed for heterologous genes indicates that this feature is under selection. Remarkably, the gap between the performance of the RFM and alternative predictors is strikingly large in the case of heterologous genes, testifying to the model's promising biotechnological value in predicting the abundance of heterologous proteins before expressing them in the desired host. PMID:21909250
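The saturation behaviour described in this abstract can be illustrated with a small simulation. The sketch below assumes the commonly cited mean-field form of the Ribosome Flow Model, in which site occupancies x_i in [0, 1] evolve under an initiation rate and per-site elongation rates; the abstract itself does not give the equations, so the update rule, the function name `rfm_translation_rate`, the uniform elongation rates, and the Euler step size are all assumptions made for illustration.

```python
def rfm_translation_rate(init_rate, elong_rates, dt=0.05, steps=10000):
    """Approximate steady-state translation rate of a mean-field ribosome flow chain.

    Assumed dynamics (not taken from the abstract):
      flux into site 0      = init_rate * (1 - x[0])
      flux from i to i + 1  = elong_rates[i] * x[i] * (1 - x[i + 1])
      exit flux             = elong_rates[-1] * x[-1]
    Integrated with a plain forward-Euler scheme from an empty mRNA.
    """
    n = len(elong_rates)
    x = [0.0] * n
    for _ in range(steps):
        # flow[i] is the flux entering site i; flow[n] is the flux leaving the chain
        flow = [init_rate * (1.0 - x[0])]
        for i in range(n - 1):
            flow.append(elong_rates[i] * x[i] * (1.0 - x[i + 1]))
        flow.append(elong_rates[-1] * x[-1])
        x = [x[i] + dt * (flow[i] - flow[i + 1]) for i in range(n)]
    return elong_rates[-1] * x[-1]


# Hypothetical 10-site chain with uniform elongation rates.
rates = {a: rfm_translation_rate(a, [1.0] * 10) for a in (0.1, 0.5, 2.0, 4.0, 8.0)}
for a, r in rates.items():
    print(f"initiation rate {a:>4}: translation rate {r:.4f}")
```

Under these assumptions the output rate rises with the initiation rate at first and then levels off, which is the qualitative pre-saturation versus saturation distinction the abstract refers to.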

  10. Gene flow rise with habitat fragmentation in the bog fritillary butterfly (Lepidoptera: Nymphalidae).

    Science.gov (United States)

    Nève, Gabriel; Barascud, Bernard; Descimon, Henri; Baguette, Michel

    2008-03-17

    The main components of the spatial genetic structure of the populations are neighbourhood size and isolation by distance. These may be inferred from the allele frequencies across a series of populations within a region. Here, the spatial population structure of Proclossiana eunomia was investigated in two mountainous areas of southern Europe (Asturias, Spain and Pyrenees, France) and in two areas of intermediate elevation (Morvan, France and Ardennes, Belgium). A total of eight polymorphic loci were scored by allozyme electrophoresis, revealing a higher polymorphism in the populations of southern Europe than in those of central Europe. The isolation-by-distance effect was much stronger in the two mountain ranges (Pyrenees and Asturias) than in the two areas of lower elevation (Ardennes and Morvan). By contrast, the neighbourhood size estimates were smaller in the Ardennes and in the Morvan than in the two high mountain areas, indicating more common movements between neighbouring patches in the mountains than in plains. Short and long dispersal events are two phenomena with distinct consequences in the population genetics of natural populations. The differences in level of population differentiation within each of the four regions may be explained by a change in dispersal in recently fragmented lowland landscapes: on average, butterflies disperse over a shorter distance, but the few that disperse over long distances do so more efficiently. Habitat fragmentation has evolutionary consequences exceeding by far the selection of dispersal-related traits: the balance between local specialisation and gene flow would be perturbed, which would modify the extent to which populations are adapted to heterogeneous environments.

  11. Gene flow rise with habitat fragmentation in the bog fritillary butterfly (Lepidoptera: Nymphalidae)

    Directory of Open Access Journals (Sweden)

    Descimon Henri

    2008-03-01

    Full Text Available Background: The main components of the spatial genetic structure of the populations are neighbourhood size and isolation by distance. These may be inferred from the allele frequencies across a series of populations within a region. Here, the spatial population structure of Proclossiana eunomia was investigated in two mountainous areas of southern Europe (Asturias, Spain and Pyrenees, France) and in two areas of intermediate elevation (Morvan, France and Ardennes, Belgium). Results: A total of eight polymorphic loci were scored by allozyme electrophoresis, revealing a higher polymorphism in the populations of southern Europe than in those of central Europe. The isolation-by-distance effect was much stronger in the two mountain ranges (Pyrenees and Asturias) than in the two areas of lower elevation (Ardennes and Morvan). By contrast, the neighbourhood size estimates were smaller in the Ardennes and in the Morvan than in the two high mountain areas, indicating more common movements between neighbouring patches in the mountains than in plains. Conclusion: Short and long dispersal events are two phenomena with distinct consequences in the population genetics of natural populations. The differences in level of population differentiation within each of the four regions may be explained by a change in dispersal in recently fragmented lowland landscapes: on average, butterflies disperse over a shorter distance, but the few that disperse over long distances do so more efficiently. Habitat fragmentation has evolutionary consequences exceeding by far the selection of dispersal-related traits: the balance between local specialisation and gene flow would be perturbed, which would modify the extent to which populations are adapted to heterogeneous environments.

  12. Contrasting Patterns of Gene Flow for Amazonian Snakes That Actively Forage and Those That Wait in Ambush.

    Science.gov (United States)

    de Fraga, Rafael; Lima, Albertina P; Magnusson, William E; Ferrão, Miquéias; Stow, Adam J

    2017-07-01

    Knowledge of genetic structure, geographic distance and environmental heterogeneity can be used to identify environmental features and natural history traits that influence dispersal and gene flow. Foraging mode is a trait that might predict dispersal capacity in snakes, because active foragers typically have greater movement rates than ambush predators. Here, we test the hypothesis that 2 actively foraging snakes have higher levels of gene flow than 2 ambush predators. We evaluated these 4 co-distributed species of snakes in the Brazilian Amazon. Snakes were sampled along an 880 km transect from the central to the southwest of the Amazon basin, which covered a mosaic of vegetation types and seasonal differences in climate. We analyzed thousands of single nucleotide polymorphisms to compare patterns of neutral gene flow based on isolation by geographic distance (IBD) and environmental resistance (IBR). We show that IBD and IBR were only evident in ambush predators, implying lower levels of dispersal than the active foragers. Therefore, gene flow was high enough in the active foragers analyzed here to prevent any build-up of spatial genotypic structure with respect to geographic distance and environmental heterogeneity. © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
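Isolation by distance is often assessed by asking whether pairwise genetic distances correlate with pairwise geographic distances. The sketch below uses a simple Mantel permutation test for that purpose; this is a swapped-in, generic technique, not necessarily the analysis used in the study above, and the six sampling positions and the genetic distances are hypothetical toy values.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def upper_triangle(matrix, order):
    """Upper-triangle entries of a distance matrix after relabelling populations by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(geo, gen, n_perm=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    identity = list(range(len(geo)))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(geo_vec, upper_triangle(gen, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # relabel populations in the genetic matrix only
        if pearson(geo_vec, upper_triangle(gen, perm)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical transect: six sites at these positions (km), with genetic
# distance exactly proportional to geographic distance (perfect IBD).
positions = [0, 1, 2, 4, 7, 11]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
r_obs, p_value = mantel(geo, gen)
print(f"Mantel r = {r_obs:.3f}, p = {p_value:.3f}")
```

A strong positive correlation with a small p-value would be read as evidence of IBD; a flat relationship, as reported for the active foragers, would not.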

  13. Circumpolar Genetic Structure and Recent Gene Flow of Polar Bears: A Reanalysis.

    Science.gov (United States)

    Malenfant, René M; Davis, Corey S; Cullingham, Catherine I; Coltman, David W

    2016-01-01

    Recently, an extensive study of 2,748 polar bears (Ursus maritimus) from across their circumpolar range was published in PLOS ONE, which used microsatellites and mitochondrial haplotypes to apparently show altered population structure and a dramatic change in directional gene flow towards the Canadian Archipelago, an area believed to be a future refugium for polar bears as their southernmost habitats decline under climate change. Although this study represents a major international collaborative effort and promised to be a baseline for future genetics work, methodological shortcomings and errors of interpretation undermine some of the study's main conclusions. Here, we present a reanalysis of this data in which we address some of these issues, including: (1) highly unbalanced sample sizes and large amounts of systematically missing data; (2) incorrect calculation of FST and of significance levels; (3) misleading estimates of recent gene flow resulting from non-convergence of the program BayesAss. In contrast to the original findings, in our reanalysis we find six genetic clusters of polar bears worldwide: the Hudson Bay Complex, the Western and Eastern Canadian Arctic Archipelago, the Western and Eastern Polar Basin, and, importantly, we reconfirm the presence of a unique and possibly endangered cluster of bears in Norwegian Bay near Canada's expected last sea-ice refugium. Although polar bears' abundance, distribution, and population structure will certainly be negatively affected by ongoing (and increasingly rapid) loss of Arctic sea ice, these genetic data provide no evidence of strong directional gene flow in response to recent climate change.

  14. Circumpolar Genetic Structure and Recent Gene Flow of Polar Bears: A Reanalysis.

    Directory of Open Access Journals (Sweden)

    René M Malenfant

    Full Text Available Recently, an extensive study of 2,748 polar bears (Ursus maritimus) from across their circumpolar range was published in PLOS ONE, which used microsatellites and mitochondrial haplotypes to apparently show altered population structure and a dramatic change in directional gene flow towards the Canadian Archipelago, an area believed to be a future refugium for polar bears as their southernmost habitats decline under climate change. Although this study represents a major international collaborative effort and promised to be a baseline for future genetics work, methodological shortcomings and errors of interpretation undermine some of the study's main conclusions. Here, we present a reanalysis of this data in which we address some of these issues, including: (1) highly unbalanced sample sizes and large amounts of systematically missing data; (2) incorrect calculation of FST and of significance levels; (3) misleading estimates of recent gene flow resulting from non-convergence of the program BayesAss. In contrast to the original findings, in our reanalysis we find six genetic clusters of polar bears worldwide: the Hudson Bay Complex, the Western and Eastern Canadian Arctic Archipelago, the Western and Eastern Polar Basin, and, importantly, we reconfirm the presence of a unique and possibly endangered cluster of bears in Norwegian Bay near Canada's expected last sea-ice refugium. Although polar bears' abundance, distribution, and population structure will certainly be negatively affected by ongoing (and increasingly rapid) loss of Arctic sea ice, these genetic data provide no evidence of strong directional gene flow in response to recent climate change.

  15. Passive larval transport explains recent gene flow in a Mediterranean gorgonian

    Science.gov (United States)

    Padrón, Mariana; Costantini, Federica; Baksay, Sandra; Bramanti, Lorenzo; Guizien, Katell

    2018-06-01

    Understanding the patterns of connectivity is required by the Strategic Plan for Biodiversity 2011-2020 and will be used to guide the extension of marine protection measures. Despite the increasing accuracy of ocean circulation modelling, the capacity to model the population connectivity of sessile benthic species with dispersal larval stages can be limited due to the potential effect of filters acting before or after dispersal, which modulates offspring release or settlement, respectively. We applied an interdisciplinary approach that combined demographic surveys, genetic methods (assignment tests and coalescent-based analyses) and larval transport simulations to test the relative importance of demographics and ocean currents in shaping the recent patterns of gene flow among populations of a Mediterranean gorgonian (Eunicella singularis) in a fragmented rocky habitat (Gulf of Lion, NW Mediterranean Sea). We show that larval transport is a dominant driver of recent gene flow among the populations, and significant correlations were found between recent gene flow and larval transport during an average single dispersal event when the pelagic larval durations (PLDs) ranged from 7 to 14 d. Our results suggest that PLDs that efficiently connect populations distributed over a fragmented habitat are filtered by the habitat layout within the species competency period. Moreover, a PLD ranging from 7 to 14 d is sufficient to connect the fragmented rocky substrate of the Gulf of Lion. The rocky areas located in the centre of the Gulf of Lion, which are currently not protected, were identified as essential hubs for the distribution of migrants in the region. We encourage the use of a range of PLDs instead of a single value when estimating larval transport with biophysical models to identify potential connectivity patterns among a network of Marine Protected Areas or even solely a seascape.

  16. Investigating Pollen and Gene Flow of WYMV-Resistant Transgenic Wheat N12-1 Using a Dwarf Male-Sterile Line as the Pollen Receptor.

    Science.gov (United States)

    Dong, Shanshan; Liu, Yan; Yu, Cigang; Zhang, Zhenhua; Chen, Ming; Wang, Changyong

    2016-01-01

    Pollen-mediated gene flow (PMGF) is the main mode of transgene flow in flowering plants. The study of pollen and gene flow of transgenic wheat can help to establish the corresponding strategy for preventing transgene escape and contamination between compatible genotypes in wheat. To investigate the pollen dispersal and gene flow frequency in various directions and distances around the pollen source and detect the association between frequency of transgene flow and pollen density from transgenic wheat, a concentric circle design was adopted to conduct a field experiment using transgenic wheat with resistance to wheat yellow mosaic virus (WYMV) as the pollen donor and dwarf male-sterile wheat as the pollen receptor. The results showed that the pollen and gene flow of transgenic wheat varied significantly among the different compass sectors. A higher pollen density and gene flow frequency was observed in the downwind SW and W sectors, with average frequencies of transgene flow of 26.37 and 23.69% respectively. The pollen and gene flow of transgenic wheat declined dramatically with increasing distance from its source. Most of the pollen grains concentrated within 5 m and only a few pollen grains were detected beyond 30 m. The percentage of transgene flow was the highest where adjacent to the pollen source, with an average of 48.24% for all eight compass directions at 0 m distance. Transgene flow was reduced to 50% and 95% between 1.61 to 3.15 m, and 10.71 to 20.93 m, respectively. Our results suggest that climate conditions, especially wind direction, may significantly affect pollen dispersal and gene flow of wheat. The isolation-by-distance model is one of the most effective methods for achieving stringent transgene confinement in wheat. The frequency of transgene flow is directly correlated with the relative density of GM pollen grains in air currents, and pollen competition may be a major factor influencing transgene flow.

  17. Investigating Pollen and Gene Flow of WYMV-Resistant Transgenic Wheat N12-1 Using a Dwarf Male-Sterile Line as the Pollen Receptor.

    Directory of Open Access Journals (Sweden)

    Shanshan Dong

    Full Text Available Pollen-mediated gene flow (PMGF) is the main mode of transgene flow in flowering plants. The study of pollen and gene flow of transgenic wheat can help to establish the corresponding strategy for preventing transgene escape and contamination between compatible genotypes in wheat. To investigate the pollen dispersal and gene flow frequency in various directions and distances around the pollen source and detect the association between frequency of transgene flow and pollen density from transgenic wheat, a concentric circle design was adopted to conduct a field experiment using transgenic wheat with resistance to wheat yellow mosaic virus (WYMV) as the pollen donor and dwarf male-sterile wheat as the pollen receptor. The results showed that the pollen and gene flow of transgenic wheat varied significantly among the different compass sectors. A higher pollen density and gene flow frequency was observed in the downwind SW and W sectors, with average frequencies of transgene flow of 26.37 and 23.69% respectively. The pollen and gene flow of transgenic wheat declined dramatically with increasing distance from its source. Most of the pollen grains concentrated within 5 m and only a few pollen grains were detected beyond 30 m. The percentage of transgene flow was the highest where adjacent to the pollen source, with an average of 48.24% for all eight compass directions at 0 m distance. Transgene flow was reduced to 50% and 95% between 1.61 to 3.15 m, and 10.71 to 20.93 m, respectively. Our results suggest that climate conditions, especially wind direction, may significantly affect pollen dispersal and gene flow of wheat. The isolation-by-distance model is one of the most effective methods for achieving stringent transgene confinement in wheat. The frequency of transgene flow is directly correlated with the relative density of GM pollen grains in air currents, and pollen competition may be a major factor influencing transgene flow.

  18. Extensive gene flow over Europe and possible speciation

    Energy Technology Data Exchange (ETDEWEB)

    VINCENOT, Dr. LUCIE [Centre d’Ecologie Fonctionnelle et Evolutive Montpellier, France; NARA, Dr. KAZUHIDE [Department of Natural Environmental Studies, The University of Tokyo, Japan; STHULTZ, CHRISTOPHER [Centre d’Ecologie Fonctionnelle et Evolutive Montpellier, France; Labbe, Jessy L [ORNL; DUBOIS, MARIE-PIERRE [Centre d’Ecologie Fonctionnelle et Evolutive Montpellier, France; TEDERSOO, LEHO [University of Tartu, Estonia; Martin, Francis [INRA, Nancy, France; SELOSSE, Dr. MARC-ANDRE [Centre d’Ecologie Fonctionnelle et Evolutive Montpellier, France

    2012-01-01

    Biogeographical patterns and large-scale genetic structure have been little studied in ectomycorrhizal (EM) fungi, despite the ecological and economic importance of EM symbioses. We coupled population genetics and phylogenetic approaches to understand spatial structure in fungal populations on a continental scale. Using nine microsatellite markers, we characterized gene flow among 16 populations of the widespread EM basidiomycete Laccaria amethystina over Europe (i.e. over 2900 km). We also widened our scope to two additional populations from Japan (10^4 km away) and compared them with European populations through microsatellite markers and multilocus phylogenies, using three nuclear genes (NAR, G6PD and ribosomal DNA) and two mitochondrial ribosomal genes. European L. amethystina populations displayed limited differentiation (average FST = 0.041) and very weak isolation by distance (IBD). This panmictic European pattern may result from effective aerial dispersal of spores, high genetic diversity in populations and mutualistic interactions with multiple hosts that all facilitate migration. The multilocus phylogeny based on nuclear genes confirmed that Japanese and European specimens were closely related but clustered on a geographical basis. By using microsatellite markers, we found that Japanese populations were strongly differentiated from the European populations (FST = 0.416), more than expected by extrapolating the European pattern of IBD. Population structure analyses clearly separated the populations into two clusters, i.e. European and Japanese clusters. We discuss the possibility of IBD in a continuous population (considering some evidence for a ring species over the Northern Hemisphere) vs. an allopatric speciation over Eurasia, making L. amethystina a promising model of intercontinental species for future studies.
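For readers unfamiliar with the FST values quoted in records like the one above, the following minimal sketch shows one textbook way to compute the statistic for a single biallelic locus as (H_T - H_S) / H_T, the proportional loss of expected heterozygosity due to subdivision. It assumes equally weighted subpopulations and known allele frequencies; it is not the multi-allelic microsatellite estimator a study of this kind would use, and the function name and input frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """FST for one biallelic locus from per-subpopulation allele frequencies.

    freqs: frequency of the same allele in each subpopulation (equal weights assumed).
    """
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity if all subpopulations were pooled into one.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within subpopulations.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic everywhere; no differentiation to measure
    return (h_total - h_within) / h_total


print(fst_biallelic([0.5, 0.5]))  # identical frequencies: no differentiation
print(fst_biallelic([0.6, 0.4]))  # mild differentiation
print(fst_biallelic([1.0, 0.0]))  # fixed for alternative alleles: complete differentiation
```

Values near 0 indicate populations that share allele frequencies, and values approaching 1 indicate populations fixed for different alleles.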

  19. Adaptive genetic markers discriminate migratory runs of Chinook salmon (Oncorhynchus tshawytscha) amid continued gene flow.

    Science.gov (United States)

    O'Malley, Kathleen G; Jacobson, Dave P; Kurth, Ryon; Dill, Allen J; Banks, Michael A

    2013-12-01

    Neutral genetic markers are routinely used to define distinct units within species that warrant discrete management. Human-induced changes to gene flow however may reduce the power of such an approach. We tested the efficiency of adaptive versus neutral genetic markers in differentiating temporally divergent migratory runs of Chinook salmon (Oncorhynchus tshawytscha) amid high gene flow owing to artificial propagation and habitat alteration. We compared seven putative migration timing genes to ten microsatellite loci in delineating three migratory groups of Chinook in the Feather River, CA: offspring of fall-run hatchery broodstock that returned as adults to freshwater in fall (fall run), spring-run offspring that returned in spring (spring run), and fall-run offspring that returned in spring (FRS). We found evidence for significant differentiation between the fall and federally listed threatened spring groups based on divergence at three circadian clock genes (OtsClock1b, OmyFbxw11, and Omy1009UW), but not neutral markers. We thus demonstrate the importance of genetic marker choice in resolving complex life history types. These findings directly impact conservation management strategies and add to previous evidence from Pacific and Atlantic salmon indicating that circadian clock genes influence migration timing.

  20. Pollen-mediated gene flow and seed exchange in small-scale Zambian maize farming, implications for biosafety assessment.

    Science.gov (United States)

    Bøhn, Thomas; Aheto, Denis W; Mwangala, Felix S; Fischer, Klara; Bones, Inger Louise; Simoloka, Christopher; Mbeule, Ireen; Schmidt, Gunther; Breckling, Broder

    2016-10-03

    Gene flow in agricultural crops is important for risk assessment of genetically modified (GM) crops, particularly in countries with a large informal agricultural sector of subsistence cultivation. We present a pollen flow model for maize (Zea mays), a major staple crop in Africa. We use spatial properties of fields (size, position) in three small-scale maize farming communities in Zambia and estimate rates of cross-fertilisation between fields sown with different maize varieties (e.g. conventional and transgene). As an additional factor contributing to gene flow, we present data on seed saving and sharing among farmers that live in the same communities. Our results show that: i) maize fields were small and located in immediate vicinity of neighboring fields; ii) a majority of farmers saved and shared seed; iii) modeled rates of pollen-mediated gene flow showed extensive mixing of germplasm between fields and farms and iv) as a result, segregation of GM and non-GM varieties is not likely to be an option in these systems. We conclude that the overall genetic composition of maize, in this and similar agricultural contexts, will be strongly influenced both by self-organised ecological factors (pollen flow), and by socially mediated intervention (seed recycling and sharing).

  1. OKVAR-Boost: a novel boosting algorithm to infer nonlinear dynamics and interactions in gene regulatory networks.

    Science.gov (United States)

    Lim, Néhémy; Senbabaoglu, Yasin; Michailidis, George; d'Alché-Buc, Florence

    2013-06-01

    Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in silico challenges that have aided in calibrating their performance. A number of approaches using either perturbation (knock-out) or wild-type time-series data have appeared in the literature addressing this problem, with the latter using linear temporal models. Nonlinear dynamical models are particularly appropriate for this inference task, given the generation mechanism of the time-series data. In this study, we introduce a novel nonlinear autoregressive model based on operator-valued kernels that simultaneously learns the model parameters, as well as the network structure. A flexible boosting algorithm (OKVAR-Boost) that shares features from L2-boosting and randomization-based algorithms is developed to perform the tasks of parameter learning and network inference for the proposed model. Specifically, at each boosting iteration, a regularized Operator-valued Kernel-based Vector AutoRegressive model (OKVAR) is trained on a random subnetwork. The final model consists of an ensemble of such models. The empirical estimation of the ensemble model's Jacobian matrix provides an estimation of the network structure. The performance of the proposed algorithm is first evaluated on a number of benchmark datasets from the DREAM3 challenge and then on real datasets related to the In vivo Reverse-Engineering and Modeling Assessment (IRMA) and T-cell networks. The high-quality results obtained strongly indicate that it outperforms existing approaches. The OKVAR-Boost Matlab code is available as the archive: http://amis-group.fr/sourcecode-okvar-boost/OKVARBoost-v1.0.zip. Supplementary data are available at Bioinformatics online.

  2. Sexual selection and magic traits in speciation with gene flow

    Directory of Open Access Journals (Sweden)

    Maria R. SERVEDIO, Michael KOPP

    2012-06-01

    Full Text Available The extent to which sexual selection is involved in speciation with gene flow remains an open question and the subject of much research. Here, we propose that some insight can be gained from considering the concept of magic traits (i.e., traits involved in both reproductive isolation and ecological divergence). Both magic traits and other, “non-magic”, traits can contribute to speciation via a number of specific mechanisms. We argue that many of these mechanisms are likely to differ widely in the extent to which they involve sexual selection. Furthermore, in some cases where sexual selection is present, it may be prone to inhibit rather than drive speciation. Finally, there are a priori reasons to believe that certain categories of traits are much more effective than others in driving speciation. The combination of these points suggests a classification of traits that may shed light on the broader role of sexual selection in speciation with gene flow. In particular, we suggest that sexual selection can act as a driver of speciation in some scenarios, but may play a negligible role in potentially common categories of magic traits, and may be likely to inhibit speciation in common categories of non-magic traits [Current Zoology 58 (3): 507–513, 2012].

  3. Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures

    Science.gov (United States)

    Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland

    1998-01-01

    Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10^15) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
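    The mutual-information test described in this abstract can be illustrated with a small sketch. It is not the authors' C program: the three-gene network, the function names (`entropy`, `infer_inputs`, `step`) and the search limit `max_k` are assumptions made for illustration. A candidate input set X is accepted for a gene when the mutual information between X and the gene's next state equals the entropy of that next state, which is the same as requiring H(X, output) = H(X).

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a list of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * log2(c / total) for c in counts.values())


def infer_inputs(transitions, target, max_k=3, tol=1e-9):
    """Return the smallest set of genes whose state determines `target`.

    `transitions` is a list of (state_t, state_t_plus_1) pairs of 0/1 tuples.
    Candidate sets are tried in order of increasing size, so the first
    set that fully explains the output is also a minimal one.
    """
    n = len(transitions[0][0])
    outputs = [nxt[target] for _, nxt in transitions]
    if entropy(outputs) < tol:
        return ()  # constant output: no inputs are needed
    for k in range(1, max_k + 1):
        for candidate in combinations(range(n), k):
            inputs = [tuple(cur[i] for i in candidate) for cur, _ in transitions]
            joint = list(zip(inputs, outputs))
            # H(X, out) == H(X)  <=>  M(out, X) == H(out)
            if abs(entropy(joint) - entropy(inputs)) < tol:
                return candidate
    return None


def step(state):
    """Hypothetical toy rules: a' = b, b' = a AND c, c' = NOT a."""
    a, b, c = state
    return (b, a & c, 1 - a)


# Complete state transition table for the toy network (2**3 rows).
table = [(s, step(s)) for s in product((0, 1), repeat=3)]
wiring = {gene: infer_inputs(table, gene) for gene in range(3)}
print(wiring)
```

    With a complete table the recovered wiring matches the rules used to generate it. With an incomplete table, as in the n = 50 experiment reported above, the same test can accept spurious input sets, so the result should be read as a hypothesis rather than a proof.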

  4. The Himalayas: barrier and conduit for gene flow.

    Science.gov (United States)

    Gayden, Tenzin; Perez, Annabel; Persad, Patrice J; Bukhari, Areej; Chennakrishnaiah, Shilpa; Simms, Tanya; Maloney, Trisha; Rodriguez, Kristina; Herrera, Rene J

    2013-06-01

    The Himalayan mountain range is strategically located at the crossroads of the major cultural centers in Asia, the Middle East and Europe. Although previous Y-chromosome studies indicate that the Himalayas served as a natural barrier for gene flow from the south to the Tibetan plateau, this region is believed to have played an important role as a corridor for human migrations between East and West Eurasia along the ancient Silk Road. To evaluate the effects of the Himalayan mountain range in shaping the maternal lineages of populations residing on either side of the cordillera, we analyzed mitochondrial DNA variation in 344 samples from three Nepalese collections (Newar, Kathmandu and Tamang) and a general population of Tibet. Our results revealed a predominantly East Asian-specific component in Tibet and Tamang, whereas Newar and Kathmandu are both characterized by a combination of East and South Central Asian lineages. Interestingly, Newar and Kathmandu harbor several deep-rooted Indian lineages, including M2, R5, and U2, whose coalescent times from this study (U2, >40 kya) and previous reports (M2 and R5, >50 kya) suggest that Nepal was inhabited during the initial peopling of South Central Asia. Comparisons with our previous Y-chromosome data indicate sex-biased migrations in Tamang and a founder effect and/or genetic drift in Tamang and Newar. Altogether, our results confirm that while the Himalayas acted as a geographic barrier for human movement from the Indian subcontinent to the Tibetan highland, it also served as a conduit for gene flow between Central and East Asia. Copyright © 2013 Wiley Periodicals, Inc.

  5. Improved functional overview of protein complexes using inferred epistatic relationships

    LENUS (Irish Health Repository)

    Ryan, Colm

    2011-05-23

    Abstract. Background: Epistatic Miniarray Profiling (E-MAP) quantifies the net effect on growth rate of disrupting pairs of genes, often producing phenotypes that may be more (negative epistasis) or less (positive epistasis) severe than the phenotype predicted based on single gene disruptions. Epistatic interactions are important for understanding cell biology because they define relationships between individual genes, and between sets of genes involved in biochemical pathways and protein complexes. Each E-MAP screen quantifies the interactions between a logically selected subset of genes (e.g. genes whose products share a common function). Interactions that occur between genes involved in different cellular processes are not as frequently measured, yet these interactions are important for providing an overview of cellular organization. Results: We introduce a method for combining overlapping E-MAP screens and inferring new interactions between them. We use this method to infer with high confidence 2,240 new strongly epistatic interactions and 34,469 weakly epistatic or neutral interactions. We show that accuracy of the predicted interactions approaches that of replicate experiments and that, like measured interactions, they are enriched for features such as shared biochemical pathways and knockout phenotypes. We constructed an expanded epistasis map for yeast cell protein complexes and show that our new interactions increase the evidence for previously proposed inter-complex connections, and predict many new links. We validated a number of these in the laboratory, including new interactions linking the SWR-C chromatin modifying complex and the nuclear transport apparatus. Conclusion: Overall, our data support a modular model of yeast cell protein network organization and show how prediction methods can considerably extend the information that can be extracted from overlapping E-MAP screens.

  6. Ploidy Levels among Species in the ‘Oxalis tuberosa Alliance’ as Inferred by Flow Cytometry

    Science.gov (United States)

    EMSHWILLER, EVE

    2002-01-01

    The ‘Oxalis tuberosa alliance’ is a group of Andean Oxalis species allied to the Andean tuber crop O. tuberosa Molina (Oxalidaceae), commonly known as ‘oca’. As part of a larger project studying the origins of polyploidy and domestication of cultivated oca, flow cytometry was used to survey DNA ploidy levels among Bolivian and Peruvian accessions of alliance members. In addition, this study provided a first assessment of C‐values in the alliance by estimating nuclear DNA contents of these accessions using chicken erythrocytes as internal standard. Ten Bolivian accessions of cultivated O. tuberosa were confirmed to be octoploid, with a mean nuclear DNA content of approx. 3·6 pg/2C. Two Peruvian wild Oxalis species, O. phaeotricha and O. picchensis, were inferred to be tetraploid (both with approx. 1·67 pg/2C), the latter being one of the putative progenitors of O. tuberosa identified by chloroplast‐expressed glutamine synthetase data in prior work. The remaining accessions (from 78 populations provisionally identified as 35 species) were DNA diploid, with nuclear DNA contents varying from 0·79 to 1·34 pg/2C. PMID:12102530
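    The DNA-content estimates in this abstract rest on a simple ratio against the internal standard. The sketch below shows that arithmetic only. The function name, the peak positions and the chicken erythrocyte reference value of 2.33 pg/2C are assumptions for illustration; the abstract does not state which reference value or instrument readings the study used.

```python
# Assumed 2C DNA content of the chicken erythrocyte standard, in pg.
# This is a commonly quoted figure, not one taken from the abstract.
CHICKEN_2C_PG = 2.33


def nuclear_dna_content(sample_peak, standard_peak, standard_pg=CHICKEN_2C_PG):
    """Estimate sample 2C DNA content (pg) from fluorescence peak means.

    Both peaks must come from the same run, because the method relies on
    sample and standard nuclei being stained and measured together.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak means must be positive")
    return standard_pg * sample_peak / standard_peak


# Hypothetical peak means (arbitrary fluorescence channel units).
estimate = nuclear_dna_content(sample_peak=310.0, standard_peak=200.0)
print(f"{estimate:.2f} pg/2C")
```

    Ploidy is then inferred by comparing such estimates across accessions, which is why the abstract speaks of DNA ploidy rather than chromosome counts.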

  7. Amplified fragment length polymorphism fingerprints support limited gene flow among social spider populations

    NARCIS (Netherlands)

    Smith, Deborah; van Rijn, Sander; Henschel, Joh; Bilde, Trine; Lubin, Yael

    We used DNA fingerprints to determine whether the population structure and colony composition of the cooperative social spider Stegodyphus dumicola are compatible with requirements of interdemic ('group') selection: differential proliferation of demes or groups and limited gene flow among groups. To

  8. Inferring Allele Frequency Trajectories from Ancient DNA Indicates That Selection on a Chicken Gene Coincided with Changes in Medieval Husbandry Practices.

    Science.gov (United States)

    Loog, Liisa; Thomas, Mark G; Barnett, Ross; Allen, Richard; Sykes, Naomi; Paxinos, Ptolemaios D; Lebrasseur, Ophélie; Dobney, Keith; Peters, Joris; Manica, Andrea; Larson, Greger; Eriksson, Anders

    2017-08-01

    Ancient DNA provides an opportunity to infer the drivers of natural selection by linking allele frequency changes to temporal shifts in environment or cultural practices. However, analyses have often been hampered by uneven sampling and uncertainties in sample dating, as well as being confounded by demographic processes. Here, we present a Bayesian statistical framework for quantifying the timing and strength of selection using ancient DNA that explicitly addresses these challenges. We applied this method to time series data for two loci: TSHR and BCDO2, both hypothesised to have undergone strong and recent selection in domestic chickens. The derived variant in TSHR, associated with reduced aggression to conspecifics and faster onset of egg laying, shows strong selection beginning around 1,100 years ago, coincident with archaeological evidence for intensified chicken production and documented changes in egg and chicken consumption. To our knowledge, this is the first example of preindustrial domesticate trait selection in response to a historically attested cultural shift in food preference. For BCDO2, we find support for selection, but demonstrate that the recent rise in allele frequency could also have been driven by gene flow from imported Asian chickens during more recent breed formations. Our findings highlight that traits found ubiquitously in modern domestic species may not necessarily have originated during the early stages of domestication. In addition, our results demonstrate the importance of precise estimation of allele frequency trajectories through time for understanding the drivers of selection. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  9. Chromosomal rearrangements and gene flow over time in an inter-specific hybrid zone of the Sorex araneus group.

    Science.gov (United States)

    Yannic, G; Basset, P; Hausser, J

    2009-06-01

    Most hybrid zones have existed for hundreds or thousands of years but have generally been observed for only a short time period. Studies extending over periods long enough to track evolutionary changes in the zones or assess the ultimate outcome of hybridization are scarce. Here, we describe the evolution over time of the level of genetic isolation between two karyotypically different species of shrews (Sorex araneus and Sorex antinorii) at a hybrid zone located in the Swiss Alps. We first evaluated hybrid zone movement by contrasting patterns of gene flow and changes in cline parameters (centre and width) using 24 microsatellite loci, between two periods separated by 10 years. Additionally, we tested the role of chromosomal rearrangements on gene flow by analysing microsatellite loci located on both rearranged chromosomes and chromosomes common to both species. We did not detect any movement of the hybrid zone during the period analysed, suggesting that the zone is a typical tension zone. However, the gene flow was significantly lower among the rearranged than the common chromosomes for the second period, whereas the difference was only marginally significant for the first period. This further supports the role of chromosomal rearrangements on gene flow between these taxa.

  10. Inferring species trees from gene trees in a radiation of California trapdoor spiders (Araneae, Antrodiaetidae, Aliatypus).

    Directory of Open Access Journals (Sweden)

    Jordan D Satler

    Full Text Available The California Floristic Province is a biodiversity hotspot, reflecting a complex geologic history, strong selective gradients, and a heterogeneous landscape. These factors have led to high endemic diversity across many lifeforms within this region, including the richest diversity of mygalomorph spiders (tarantulas, trapdoor spiders, and kin) in North America. The trapdoor spider genus Aliatypus encompasses twelve described species, eleven of which are endemic to California. Several Aliatypus species show disjunct distributional patterns in California (some are found on both sides of the vast Central Valley), and the genus as a whole occupies an impressive variety of habitats. We collected specimens from 89 populations representing all described species. DNA sequence data were collected from seven gene regions, including two newly developed for spider systematics. Bayesian inference (in individual gene tree and species tree approaches) recovered a general "3 clade" structure for the genus (A. gulosus, californicus group, erebus group), with three other phylogenetically isolated species differing slightly in position across different phylogenetic analyses. Because of extremely high intraspecific divergences in mitochondrial COI sequences, the relatively slowly evolving 28S rRNA gene was found to be more useful than mitochondrial data for identification of morphologically indistinguishable immatures. For multiple species spanning the Central Valley, explicit hypothesis testing suggests a lack of monophyly for regional populations (e.g., western Coast Range populations). Phylogenetic evidence clearly shows that syntopy is restricted to distant phylogenetic relatives, consistent with ecological niche conservatism. This study provides fundamental insight into a radiation of trapdoor spiders found in the biodiversity hotspot of California. Species relationships are clarified and undescribed lineages are discovered, with more geographic sampling likely to

  11. Simulating pattern-process relationships to validate landscape genetic models

    Science.gov (United States)

    A. J. Shirk; S. A. Cushman; E. L. Landguth

    2012-01-01

    Landscapes may resist gene flow and thereby give rise to a pattern of genetic isolation within a population. The mechanism by which a landscape resists gene flow can be inferred by evaluating the relationship between landscape models and an observed pattern of genetic isolation. This approach risks false inferences because researchers can never feasibly test all...

  12. Revisiting the phylogeny of Zoanthidea (Cnidaria: Anthozoa): Staggered alignment of hypervariable sequences improves species tree inference.

    Science.gov (United States)

    Swain, Timothy D

    2018-01-01

    The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. FARNA: knowledgebase of inferred functions of non-coding RNA transcripts

    KAUST Repository

    Alam, Tanvir

    2016-10-12

    Non-coding RNA (ncRNA) genes play a major role in control of heterogeneous cellular behavior. Yet, their functions are largely uncharacterized. Current available databases lack in-depth information of ncRNA functions across spectrum of various cells/tissues. Here, we present FARNA, a knowledgebase of inferred functions of 10,289 human ncRNA transcripts (2,734 microRNA and 7,555 long ncRNA) in 119 tissues and 177 primary cells of human. Since transcription factors (TFs) and TF co-factors (TcoFs) are crucial components of regulatory machinery for activation of gene transcription, cellular processes and diseases in which TFs and TcoFs are involved suggest functions of the transcripts they regulate. In FARNA, functions of a transcript are inferred from TFs and TcoFs whose genes co-express with the transcript controlled by these TFs and TcoFs in a considered cell/tissue. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues based on guilt-by-association principle. Expression profiles across cells/tissues based on Cap Analysis of Gene Expression (CAGE) are provided. FARNA, having the most comprehensive function annotation of considered ncRNAs across widest spectrum of human cells/tissues, has a potential to greatly contribute to our understanding of ncRNA roles and their regulatory mechanisms in human. FARNA can be accessed at: http://cbrc.kaust.edu.sa/farna

  14. FARNA: knowledgebase of inferred functions of non-coding RNA transcripts

    KAUST Repository

    Alam, Tanvir; Uludag, Mahmut; Essack, Magbubah; Salhi, Adil; Ashoor, Haitham; Hanks, John B.; Kapfer, Craig Eric; Mineta, Katsuhiko; Gojobori, Takashi; Bajic, Vladimir B.

    2016-01-01

    Non-coding RNA (ncRNA) genes play a major role in control of heterogeneous cellular behavior. Yet, their functions are largely uncharacterized. Current available databases lack in-depth information of ncRNA functions across spectrum of various cells/tissues. Here, we present FARNA, a knowledgebase of inferred functions of 10,289 human ncRNA transcripts (2,734 microRNA and 7,555 long ncRNA) in 119 tissues and 177 primary cells of human. Since transcription factors (TFs) and TF co-factors (TcoFs) are crucial components of regulatory machinery for activation of gene transcription, cellular processes and diseases in which TFs and TcoFs are involved suggest functions of the transcripts they regulate. In FARNA, functions of a transcript are inferred from TFs and TcoFs whose genes co-express with the transcript controlled by these TFs and TcoFs in a considered cell/tissue. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues based on guilt-by-association principle. Expression profiles across cells/tissues based on Cap Analysis of Gene Expression (CAGE) are provided. FARNA, having the most comprehensive function annotation of considered ncRNAs across widest spectrum of human cells/tissues, has a potential to greatly contribute to our understanding of ncRNA roles and their regulatory mechanisms in human. FARNA can be accessed at: http://cbrc.kaust.edu.sa/farna

  15. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    Science.gov (United States)

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of

  16. Quantitative inference of dynamic regulatory pathways via microarray data

    Directory of Open Access Journals (Sweden)

    Chen Bor-Sen

    2005-03-01

    Full Text Available Abstract. Background: The cellular signaling pathway (network) is one of the main topics of organismic investigations. The intracellular interactions between genes in a signaling pathway are considered as the foundation of functional genomics. Thus, which genes influence each other, and how strongly, through transcriptional binding or physical interactions are essential problems. Under the synchronous measures of gene expression via a microarray chip, an amount of dynamic information is embedded and remains to be discovered. Using a systematic dynamic modeling approach, we explore the causal relationship among genes in cellular signaling pathways from a systems biology perspective. Results: In this study, a second-order dynamic model is developed to describe the regulatory mechanism of a target gene from the upstream causality point of view. From the expression profile and dynamic model of a target gene, we can estimate its upstream regulatory function. According to this upstream regulatory function, we can deduce the upstream regulatory genes with their regulatory abilities and activation delays, and then link up a regulatory pathway. Iteratively, these regulatory genes are considered as target genes to trace back their upstream regulatory genes. Then we can construct the regulatory pathway (or network) genome-wide. In short, we can infer the genetic regulatory pathways from gene-expression profiles quantitatively, which can confirm some doubted paths or seek some unknown paths in a regulatory pathway (network). Finally, the proposed approach is validated by randomly reshuffling the time order of microarray data. Conclusion: We focus our algorithm on the inference of regulatory abilities of the identified causal genes, and how much delay there is before they regulate the downstream genes. With this information, a regulatory pathway would be built up using microarray data. In the present study, two signaling pathways, i.e. circadian regulatory

  17. Fuzzy boundaries: color and gene flow patterns among parapatric lineages of the western shovel-nosed snake and taxonomic implication.

    Science.gov (United States)

    Wood, Dustin A; Fisher, Robert N; Vandergast, Amy G

    2014-01-01

    Accurate delineation of lineage diversity is increasingly important, as species distributions are becoming more reduced and threatened. During the last century, the subspecies category was often used to denote phenotypic variation within a species range and to provide a framework for understanding lineage differentiation, often considered incipient speciation. While this category has largely fallen into disuse, previously recognized subspecies often serve as important units for conservation policy and management when other information is lacking. In this study, we evaluated phenotypic subspecies hypotheses within shovel-nosed snakes on the basis of genetic data and considered how evolutionary processes such as gene flow influenced possible incongruence between phenotypic and genetic patterns. We used both traditional phylogenetic and Bayesian clustering analyses to infer range-wide genetic structure and spatially explicit analyses to detect possible boundary locations of lineage contact. Multilocus analyses supported three historically isolated groups with low to moderate levels of contemporary gene exchange. Genetic data did not support phenotypic subspecies as exclusive groups, and we detected patterns of discordance in areas where three subspecies are presumed to be in contact. Based on genetic and phenotypic evidence, we suggested that species-level diversity is underestimated in this group and we proposed that two species be recognized, Chionactis occipitalis and C. annulata. In addition, we recommend retention of two subspecific designations within C. annulata (C. a. annulata and C. a. klauberi) that reflect regional shifts in both genetic and phenotypic variation within the species. Our results highlight the difficulty in validating taxonomic boundaries within lineages that are evolving under a time-dependent, continuous process.

  18. Inferred performance of surface hydraulic barriers from landfill operational data

    International Nuclear Information System (INIS)

    Gross, B.A.; Bonaparte, R.; Othman, M.A.

    1997-01-01

    There are few published data on the field performance of surface hydraulic barriers (SHBs) used in waste containment or remediation applications. In contrast, operational data for liner systems used beneath landfills are widely available. These data are frequently collected and reported as a facility permit condition. This paper uses leachate collection system (LCS) and leak detection system (LDS) liquid flow rate and chemical quality data collected from modern landfill double-liner systems to infer the likely hydraulic performance of SHBs. Operational data for over 200 waste management unit liner systems are currently being collected and evaluated by the authors as part of an ongoing research investigation for the United States Environmental Protection Agency (USEPA). The top liner of the double-liner system for the units is either a geomembrane (GMB) alone, geomembrane overlying a geosynthetic clay liner (GMB/GCL), or geomembrane overlying a compacted clay liner (GMB/CCL). In this paper, select data from the USEPA study are used to: (i) infer the likely efficiencies of SHBs incorporating GMBs and overlain by drainage layers; and (ii) evaluate the effectiveness of SHBs in reducing water infiltration into, and drainage from, the underlying waste (i.e., source control). SHB efficiencies are inferred from calculated landfill liner efficiencies and then used to estimate average water percolation rates through SHBs as a function of site average annual rainfall. The effectiveness of SHBs for source control is investigated by comparing LCS liquid flow rates for open and closed landfill cells. The LCS flow rates for closed cells are also compared to the estimated average water percolation rates through SHBs presented in the paper.

  19. The gene flow and mode of reproduction of Dothistroma septosporum in the Czech Republic

    Czech Academy of Sciences Publication Activity Database

    Tomšovský, M.; Tomešová, V.; Palovčíková, D.; Kostovčík, Martin; Rohrer, M.; Hanáček, P.; Jankovský, L.

    2014-01-01

    Vol. 62, No. 1 (2014), pp. 59-68. ISSN 0032-0862. Institutional support: RVO:61388971. Keywords: Dothistroma; dothistroma needle blight; gene flow. Subject RIV: EE - Microbiology, Virology. Impact factor: 2.121 (year: 2014).

  20. Scaffold filling, contig fusion and comparative gene order inference

    Directory of Open Access Journals (Sweden)

    Rounsley Steve

    2010-06-01

    Full Text Available Abstract Background There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes? Results Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera. Conclusions The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.

  1. Scaffold filling, contig fusion and comparative gene order inference.

    Science.gov (United States)

    Muñoz, Adriana; Zheng, Chunfang; Zhu, Qian; Albert, Victor A; Rounsley, Steve; Sankoff, David

    2010-06-04

    There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes? Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera. The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.

  2. Inferring gene networks from discrete expression data

    KAUST Repository

    Zhang, L.; Mallick, B. K.

    2013-01-01

    graphical models applied to continuous data, which give a closed-form marginal likelihood. In this paper, we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which

  3. Urban landscape genetics: canopy cover predicts gene flow between white-footed mouse (Peromyscus leucopus) populations in New York City.

    Science.gov (United States)

    Munshi-South, Jason

    2012-03-01

    In this study, I examine the influence of urban canopy cover on gene flow between 15 white-footed mouse (Peromyscus leucopus) populations in New York City parklands. Parks in the urban core are often highly fragmented, leading to rapid genetic differentiation of relatively nonvagile species. However, a diverse array of 'green' spaces may provide dispersal corridors through 'grey' urban infrastructure. I identify urban landscape features that promote genetic connectivity in an urban environment and compare the success of two different landscape connectivity approaches at explaining gene flow. Gene flow was associated with 'effective distances' between populations that were calculated based on per cent tree canopy cover using two different approaches: (i) isolation by effective distance (IED) that calculates the single best pathway to minimize passage through high-resistance (i.e. low canopy cover) areas, and (ii) isolation by resistance (IBR), an implementation of circuit theory that identifies all low-resistance paths through the landscape. IBR, but not IED, models were significantly associated with three measures of gene flow (Nm from FST, BayesAss+ and Migrate-n) after factoring out the influence of isolation by distance using partial Mantel tests. Predicted corridors for gene flow between city parks were largely narrow, linear parklands or vegetated spaces that are not managed for wildlife, such as cemeteries and roadway medians. These results have implications for understanding the impacts of urbanization trends on native wildlife, as well as for urban reforestation efforts that aim to improve urban ecosystem processes. © 2012 Blackwell Publishing Ltd.
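
    The partial Mantel test mentioned in this abstract can be illustrated with a minimal sketch. It is not the author's analysis pipeline: the function names, the square-matrix input format, and the one-sided permutation scheme are assumptions chosen for illustration. The sketch correlates genetic distance with a resistance distance while controlling for geographic distance, then permutes population labels to obtain a p-value.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def permute(matrix, order):
    """Reorder rows and columns of a matrix with the same permutation."""
    return [[matrix[i][j] for j in order] for i in order]


def partial_mantel(genetic, resistance, geographic, permutations=999, seed=0):
    """Return (partial correlation, one-sided permutation p-value)."""
    y = upper_triangle(resistance)
    z = upper_triangle(geographic)
    observed = partial_r(upper_triangle(genetic), y, z)
    rng = random.Random(seed)
    order = list(range(len(genetic)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        shuffled = upper_triangle(permute(genetic, order))
        if partial_r(shuffled, y, z) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)
```

    Permuting rows and columns together, rather than shuffling individual matrix entries, keeps the dependence among pairwise distances that share a population, which is the reason a Mantel-type test is used instead of an ordinary correlation test.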

  4. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling.

    Directory of Open Access Journals (Sweden)

    Junha Shin

    Full Text Available Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways are inherited primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co

  5. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    Science.gov (United States)

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures the copy numbers of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees from FISH data. The topology analysis of the tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. The classification experiment shows that tree-based features distinguish tumors better than data-based features. The constructed phylogenetic trees characterize the tumor development process well, and the algorithm outperforms other similar algorithms.

  6. Genomic evidence for divergence with gene flow in host races of the larch budmoth

    Czech Academy of Sciences Publication Activity Database

    Emelianov, I.; Marec, František; Mallet, J.

    2003-01-01

    Vol. 271 (2003), pp. 97-105. ISSN 0962-8452. Institutional research plan: CEZ:AV0Z5007907. Keywords: speciation * gene flow * selection. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 3.544, year: 2003

  7. Inference of the Genetic Network Regulating Lateral Root Initiation in Arabidopsis thaliana

    KAUST Repository

    Muraro, D.; Voss, U.; Wilson, M.; Bennett, M.; Byrne, H.; De Smet, I.; Hodgman, C.; King, J.

    2013-01-01

    thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based

  8. Forecasting Water Level Fluctuations of Urmieh Lake Using Gene Expression Programming and Adaptive Neuro-Fuzzy Inference System

    Directory of Open Access Journals (Sweden)

    Sepideh Karimi

    2012-06-01

    Full Text Available Forecasting lake level at various prediction intervals is an essential issue in such industrial applications as navigation, water resource planning and catchment management. In the present study, two data driven techniques, namely Gene Expression Programming and Adaptive Neuro-Fuzzy Inference System, were applied for predicting daily lake levels for three prediction intervals. Daily water-level data from Urmieh Lake in Northwestern Iran were used to train, test and validate the used techniques. Three statistical indexes, coefficient of determination, root mean square error and variance accounted for were used to assess the performance of the used techniques. Technique inter-comparisons demonstrated that the GEP surpassed the ANFIS model at each of the prediction intervals. A traditional auto regressive moving average model was also applied to the same data sets; the obtained results were compared with those of the data driven approaches demonstrating superiority of the data driven models to ARMA.
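
    The three statistical indexes named in this abstract can be written out as a short sketch. The abstract does not give the exact formulas used, so the definitions below are assumptions: the coefficient of determination is taken as 1 - SS_res/SS_tot (some studies use the squared Pearson correlation instead), and variance accounted for (VAF) is expressed in percent. The function names are illustrative.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def _variance(values):
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vaf(observed, predicted):
    """Variance accounted for, in percent: 100 * (1 - var(residuals) / var(observed))."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return 100.0 * (1.0 - _variance(residuals) / _variance(observed))
```

    Under these definitions the indexes are not interchangeable. A forecast that is offset from the observations by a constant has an RMSE equal to that offset and a reduced coefficient of determination, yet a VAF of 100%, because VAF depends only on the variance of the residuals.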

  9. The antibiotic resistome: gene flow in environments, animals and human beings.

    Science.gov (United States)

    Hu, Yongfei; Gao, George F; Zhu, Baoli

    2017-06-01

    Antibiotic resistance is natural in bacteria and predates the human use of antibiotics. Numerous antibiotic resistance genes (ARGs) have been discovered to confer resistance to a wide range of antibiotics. The ARGs in natural environments are highly integrated and tightly regulated in specific bacterial metabolic networks. However, the antibiotic selection pressure conferred by the use of antibiotics in both human medicine and agricultural practice leads to a significant increase of antibiotic resistance and a steady accumulation of ARGs in bacteria. In this review, we summarize, with an emphasis on an ecological point of view, the important research progress regarding the collective ARGs (antibiotic resistome) in bacterial communities of natural environments, humans and animals, i.e., in the One Health settings. We propose that the resistance gene flow in nature is "from the natural environments" and "to the natural environments"; humans and animals, as intermediate recipients and disseminators, contribute greatly to such a resistance gene "circulation."

  10. Models of gene gain and gene loss for probabilistic reconstruction of gene content in the last universal common ancestor of life.

    Science.gov (United States)

    Kannan, Lavanya; Li, Hua; Rubinstein, Boris; Mushegian, Arcady

    2013-12-19

    The problem of probabilistic inference of gene content in the last common ancestor of several extant species with completely sequenced genomes is: for each gene that is conserved in all or some of the genomes, assign the probability that its ancestral gene was present in the genome of their last common ancestor. We have developed a family of models of gene gain and gene loss in evolution, and applied the maximum-likelihood approach that uses phylogenetic tree of prokaryotes and the record of orthologous relationships between their genes to infer the gene content of LUCA, the Last Universal Common Ancestor of all currently living cellular organisms. The crucial parameter, the ratio of gene losses and gene gains, was estimated from the data and was higher in models that take account of the number of in-paralogs in genomes than in models that treat gene presences and absences as a binary trait. While the numbers of genes that are placed confidently into LUCA are similar in the ML methods and in previously published methods that use various parsimony-based approaches, the identities of genes themselves are different. Most of the models of either kind treat the genes found in many existing genomes in a similar way, assigning to them high probabilities of being ancestral ("high ancestrality"). The ML models are more likely than others to assign high ancestrality to the genes that are relatively rare in the present-day genomes.
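
    The inference problem stated in this abstract can be made concrete with a small sketch. It is not one of the authors' models: it assumes the simplest binary-trait case (a gene is present or absent), a two-state continuous-time Markov process with fixed gain and loss rates, and a stationary prior at the root. The authors instead estimate the loss/gain ratio from the data. The tree encoding and all function names are illustrative.

```python
import math


def transition_probs(gain, loss, t):
    """2x2 matrix P[a][b] = P(state b after time t | state a); 0 = absent, 1 = present."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)
    p10 = loss / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def conditional_likelihoods(node, presence, gain, loss):
    """Felsenstein pruning: likelihood of the leaves below `node` for each node state.

    A leaf is a genome name (str); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return (0.0, 1.0) if presence[node] else (1.0, 0.0)
    like_absent, like_present = 1.0, 1.0
    for child, branch_length in node:
        c0, c1 = conditional_likelihoods(child, presence, gain, loss)
        p = transition_probs(gain, loss, branch_length)
        like_absent *= p[0][0] * c0 + p[0][1] * c1
        like_present *= p[1][0] * c0 + p[1][1] * c1
    return like_absent, like_present


def root_presence_probability(tree, presence, gain, loss):
    """Posterior probability that the gene was present at the root of `tree`."""
    prior_present = gain / (gain + loss)  # assumption: stationary prior at the root
    like_absent, like_present = conditional_likelihoods(tree, presence, gain, loss)
    numerator = prior_present * like_present
    return numerator / (numerator + (1.0 - prior_present) * like_absent)
```

    For example, `root_presence_probability([("A", 0.01), ("B", 0.01)], {"A": True, "B": True}, 1.0, 1.0)` is close to 1, since a gene found in both descendants after short branches was very likely present in their ancestor. Raising the loss rate relative to the gain rate makes presence in only a few genomes more compatible with an ancestral gene, which is why the abstract calls that ratio the crucial parameter.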

  11. Empirical phylogenies and species abundance distributions are consistent with pre-equilibrium dynamics of neutral community models with gene flow

    KAUST Repository

    Bonnet-Lebrun, Anne-Sophie

    2017-03-17

    Community characteristics reflect past ecological and evolutionary dynamics. Here, we investigate whether it is possible to obtain realistically shaped modelled communities - i.e., with phylogenetic trees and species abundance distributions shaped similarly to typical empirical bird and mammal communities - from neutral community models. To test the effect of gene flow, we contrasted two spatially explicit individual-based neutral models: one with protracted speciation, delayed by gene flow, and one with point mutation speciation, unaffected by gene flow. The former produced more realistic communities (shape of phylogenetic tree and species-abundance distribution), consistent with gene flow being a key process in macro-evolutionary dynamics. Earlier models struggled to capture the empirically observed branching tempo in phylogenetic trees, as measured by the gamma statistic. We show that the low gamma values typical of empirical trees can be obtained in models with protracted speciation, in pre-equilibrium communities developing from an initially abundant and widespread species. This was even more so in communities sampled incompletely, particularly if the unknown species are the youngest. Overall, our results demonstrate that the characteristics of empirical communities that we have studied can, to a large extent, be explained through a purely neutral model under pre-equilibrium conditions.

  12. Empirical phylogenies and species abundance distributions are consistent with pre-equilibrium dynamics of neutral community models with gene flow

    KAUST Repository

    Bonnet-Lebrun, Anne-Sophie; Manica, Andrea; Eriksson, Anders; Rodrigues, Ana S.L.

    2017-01-01

    Community characteristics reflect past ecological and evolutionary dynamics. Here, we investigate whether it is possible to obtain realistically shaped modelled communities - i.e., with phylogenetic trees and species abundance distributions shaped similarly to typical empirical bird and mammal communities - from neutral community models. To test the effect of gene flow, we contrasted two spatially explicit individual-based neutral models: one with protracted speciation, delayed by gene flow, and one with point mutation speciation, unaffected by gene flow. The former produced more realistic communities (shape of phylogenetic tree and species-abundance distribution), consistent with gene flow being a key process in macro-evolutionary dynamics. Earlier models struggled to capture the empirically observed branching tempo in phylogenetic trees, as measured by the gamma statistic. We show that the low gamma values typical of empirical trees can be obtained in models with protracted speciation, in pre-equilibrium communities developing from an initially abundant and widespread species. This was even more so in communities sampled incompletely, particularly if the unknown species are the youngest. Overall, our results demonstrate that the characteristics of empirical communities that we have studied can, to a large extent, be explained through a purely neutral model under pre-equilibrium conditions.

  13. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    Science.gov (United States)

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  14. Identification of landscape features influencing gene flow: How useful are habitat selection models?

    Science.gov (United States)

    Gretchen H. Roffler; Michael K. Schwartz; Kristine Pilgrim; Sandra L. Talbot; George K. Sage; Layne G. Adams; Gordon Luikart

    2016-01-01

    Understanding how dispersal patterns are influenced by landscape heterogeneity is critical for modeling species connectivity. Resource selection function (RSF) models are increasingly used in landscape genetics approaches. However, because the ecological factors that drive habitat selection may be different from those influencing dispersal and gene flow, it is...

  15. Geographical patterns of adaptation within a species' range : Interactions between drift and gene flow

    NARCIS (Netherlands)

    Alleaume-Benharira, M; Pen, IR; Ronce, O

    We use individual-based stochastic simulations and analytical deterministic predictions to investigate the interaction between drift, natural selection and gene flow on the patterns of local adaptation across a fragmented species' range under clinally varying selection. Migration between populations

  16. Intersectional gene flow between insular endemics of Ilex (Aquifoliaceae) on the Bonin Islands and the Ryukyu Islands.

    Science.gov (United States)

    Setoguchi, H; Watanabe, I

    2000-06-01

    Hybridization and introgression play important roles in plant evolution, and their occurrence on the oceanic islands provides good examples of plant speciation and diversification. Restriction fragment length polymorphisms (RFLPs) and trnL (UAA) 3'exon-trnF (GAA) intergenic spacer (IGS) sequences of chloroplast DNA (cpDNA), and the sequences of internal transcribed spacer (ITS) of nuclear ribosomal DNA were examined to investigate the occurrence of gene transfer in Ilex species on the Bonin Islands and the Ryukyu Islands in Japan. A gene phylogeny for the plastid genome is in agreement with the morphologically based taxonomy, whereas the nuclear genome phylogeny clusters putatively unrelated endemics both on the Bonin and the Ryukyu Islands. Intersectional hybridization and nuclear gene flow were independently observed in insular endemics of Ilex on both sets of islands without evidence of plastid introgression. Gene flow observed in these island systems can be explained by ecological features of insular endemics, i.e., limits of distribution range or sympatric distribution in a small land area.

  17. Inference of Causal Relationships between Biomarkers and Outcomes in High Dimensions

    Directory of Open Access Journals (Sweden)

    Felix Agakov

    2011-12-01

    Full Text Available We describe a unified computational framework for learning causal dependencies between genotypes, biomarkers, and phenotypic outcomes from large-scale data. In contrast to previous studies, our framework allows for noisy measurements, hidden confounders, missing data, and pleiotropic effects of genotypes on outcomes. The method exploits the use of genotypes as “instrumental variables” to infer causal associations between phenotypic biomarkers and outcomes, without requiring the assumption that genotypic effects are mediated only through the observed biomarkers. The framework builds on sparse linear methods developed in statistics and machine learning and modified here for inferring structures of richer networks with latent variables. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. To demonstrate our method, we examined effects of gene transcript levels in the liver on plasma HDL cholesterol levels in a sample of 260 mice from a heterogeneous stock.
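
    For readers unfamiliar with instrumental variables, the classical single-instrument estimator (the Wald ratio) is sketched below. It is a baseline for comparison, not the framework described in this abstract: the Wald ratio relies on the assumption that the genotype affects the outcome only through the biomarker, which is the assumption the authors' method is designed to relax. Function names are illustrative.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def wald_ratio(genotype, biomarker, outcome):
    """Instrumental-variable estimate of the biomarker's causal effect on the outcome.

    Valid only if the genotype is independent of confounders and influences
    the outcome solely through the biomarker (no pleiotropy).
    """
    return covariance(genotype, outcome) / covariance(genotype, biomarker)


def naive_slope(biomarker, outcome):
    """Ordinary regression slope of outcome on biomarker; biased under confounding."""
    return covariance(biomarker, outcome) / covariance(biomarker, biomarker)
```

    In a toy data set where a hidden confounder raises both the biomarker and the outcome, `naive_slope` overstates the effect, while `wald_ratio` recovers it as long as the genotype is uncorrelated with the confounder.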

  18. Models of gene gain and gene loss for probabilistic reconstruction of gene content in the last universal common ancestor of life

    OpenAIRE

    Kannan, Lavanya; Li, Hua; Rubinstein, Boris; Mushegian, Arcady

    2013-01-01

    Background: The problem of probabilistic inference of gene content in the last common ancestor of several extant species with completely sequenced genomes is: for each gene that is conserved in all or some of the genomes, assign the probability that its ancestral gene was present in the genome of their last common ancestor. Results: We have developed a family of models of gene gain and gene loss in evolution, and applied the maximum-likelihood approach that uses phylogenetic tree of prokaryotes...

  19. Admixture and gene flow from Russia in the recovering Northern European brown bear (Ursus arctos).

    Science.gov (United States)

    Kopatz, Alexander; Eiken, Hans Geir; Aspi, Jouni; Kojola, Ilpo; Tobiassen, Camilla; Tirronen, Konstantin F; Danilov, Pjotr I; Hagen, Snorre B

    2014-01-01

    Large carnivores were persecuted to near extinction during the last centuries, but have now recovered in some countries. It has been proposed earlier that the recovery of the Northern European brown bear is supported by migration from Russia. We tested this hypothesis by obtaining for the first time continuous sampling of the whole Finnish bear population, which is located centrally between the Russian and Scandinavian bear populations. The Finnish population is assumed to experience high gene flow from Russian Karelia. If so, no or a low degree of genetic differentiation between Finnish and Russian bears could be expected. We have genotyped bears extensively from all over Finland using 12 validated microsatellite markers and compared their genetic composition to bears from Russian Karelia, Sweden, and Norway. Our fine masked investigation identified two overlapping genetic clusters structured by isolation-by-distance in Finland (pairwise FST = 0.025). One cluster included Russian bears, and migration analyses showed a high number of migrants from Russia into Finland, providing evidence of eastern gene flow as an important driver during recovery. In comparison, both clusters excluded bears from Sweden and Norway, and we found no migrants from Finland in either country, indicating that eastern gene flow was probably not important for the population recovery in Scandinavia. Our analyses on different spatial scales suggest a continuous bear population in Finland and Russian Karelia, separated from Scandinavia.

  20. High gene flow in epiphytic ferns despite habitat loss and fragmentation

    OpenAIRE

    Winkler, Manuela; Koch, Marcus; Hietz, Peter

    2011-01-01

    Tropical montane forests suffer from increasing fragmentation and replacement by other types of land-use such as coffee plantations. These processes are known to affect gene flow and genetic structure of plant populations. Epiphytes are particularly vulnerable because they depend on their supporting trees for their entire life-cycle. We compared population genetic structure and genetic diversity derived from AFLP markers of two epiphytic fern species differing in their ability to colonize sec...

  1. Inference algorithms and learning theory for Bayesian sparse factor analysis

    International Nuclear Information System (INIS)

    Rattray, Magnus; Sharp, Kevin; Stegle, Oliver; Winn, John

    2009-01-01

    Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.
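
    The slab and spike mixture prior named in this abstract can be illustrated by drawing from it. The sketch below assumes the common form in which each weight is exactly zero with probability 1 - pi (the spike) and Gaussian otherwise (the slab). The abstract does not state the authors' exact parameterisation, so the parameter names and the zero-mean Gaussian slab are assumptions.

```python
import random


def sample_spike_and_slab(n, inclusion_prob, slab_sd, rng):
    """Draw n weights: 0.0 with probability 1 - inclusion_prob, else N(0, slab_sd^2)."""
    weights = []
    for _ in range(n):
        if rng.random() < inclusion_prob:
            weights.append(rng.gauss(0.0, slab_sd))  # slab: non-zero loading
        else:
            weights.append(0.0)  # spike: loading switched off
    return weights


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

    With `inclusion_prob=0.1`, roughly 90% of the sampled loadings are exactly zero, which is the kind of sparsity expected when each gene is regulated by only a few factors.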

  2. Inference algorithms and learning theory for Bayesian sparse factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Rattray, Magnus; Sharp, Kevin [School of Computer Science, University of Manchester, Manchester M13 9PL (United Kingdom); Stegle, Oliver [Max-Planck-Institute for Biological Cybernetics, Tuebingen (Germany); Winn, John, E-mail: magnus.rattray@manchester.ac.u [Microsoft Research Cambridge, Roger Needham Building, Cambridge, CB3 0FB (United Kingdom)

    2009-12-01

    Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

  3. Fuzzy boundaries: color and gene flow patterns among parapatric lineages of the western shovel-nosed snake and taxonomic implication.

    Directory of Open Access Journals (Sweden)

    Dustin A Wood

    Full Text Available Accurate delineation of lineage diversity is increasingly important, as species distributions are becoming more reduced and threatened. During the last century, the subspecies category was often used to denote phenotypic variation within a species range and to provide a framework for understanding lineage differentiation, often considered incipient speciation. While this category has largely fallen into disuse, previously recognized subspecies often serve as important units for conservation policy and management when other information is lacking. In this study, we evaluated phenotypic subspecies hypotheses within shovel-nosed snakes on the basis of genetic data and considered how evolutionary processes such as gene flow influenced possible incongruence between phenotypic and genetic patterns. We used both traditional phylogenetic and Bayesian clustering analyses to infer range-wide genetic structure and spatially explicit analyses to detect possible boundary locations of lineage contact. Multilocus analyses supported three historically isolated groups with low to moderate levels of contemporary gene exchange. Genetic data did not support phenotypic subspecies as exclusive groups, and we detected patterns of discordance in areas where three subspecies are presumed to be in contact. Based on genetic and phenotypic evidence, we suggested that species-level diversity is underestimated in this group and we proposed that two species be recognized, Chionactis occipitalis and C. annulata. In addition, we recommend retention of two subspecific designations within C. annulata (C. a. annulata and C. a. klauberi) that reflect regional shifts in both genetic and phenotypic variation within the species. Our results highlight the difficulty in validating taxonomic boundaries within lineages that are evolving under a time-dependent, continuous process.

  4. Models of gene gain and gene loss for probabilistic reconstruction of gene content in the last universal common ancestor of life

    Science.gov (United States)

    2013-01-01

    Background: The problem of probabilistic inference of gene content in the last common ancestor of several extant species with completely sequenced genomes is: for each gene that is conserved in all or some of the genomes, assign the probability that its ancestral gene was present in the genome of their last common ancestor. Results: We have developed a family of models of gene gain and gene loss in evolution, and applied the maximum-likelihood approach that uses phylogenetic tree of prokaryotes and the record of orthologous relationships between their genes to infer the gene content of LUCA, the Last Universal Common Ancestor of all currently living cellular organisms. The crucial parameter, the ratio of gene losses and gene gains, was estimated from the data and was higher in models that take account of the number of in-paralogs in genomes than in models that treat gene presences and absences as a binary trait. Conclusion: While the numbers of genes that are placed confidently into LUCA are similar in the ML methods and in previously published methods that use various parsimony-based approaches, the identities of genes themselves are different. Most of the models of either kind treat the genes found in many existing genomes in a similar way, assigning to them high probabilities of being ancestral (“high ancestrality”). The ML models are more likely than others to assign high ancestrality to the genes that are relatively rare in the present-day genomes. Reviewers: This article was reviewed by Martijn A Huynen, Toni Gabaldón and Fyodor Kondrashov. PMID: 24354654

  5. Genetic structure and gene flows within horses: a genealogical study at the French population scale.

    Science.gov (United States)

    Pirault, Pauline; Danvy, Sophy; Verrier, Etienne; Leroy, Grégoire

    2013-01-01

    Since horse breeds constitute populations submitted to variable and multiple outcrossing events, we analyzed the genetic structure and gene flows considering horses raised in France. We used genealogical data, with a reference population of 547,620 horses born in France between 2002 and 2011, grouped according to 55 breed origins. On average, individuals had 6.3 equivalent generations known. Considering different population levels, fixation index decreased from an overall species FIT of 1.37%, to an average [Formula: see text] of -0.07% when considering the 55 origins, showing that most horse breeds constitute populations without genetic structure. We illustrate the complexity of gene flows existing among horse breeds, a few populations being closed to foreign influence, most, however, being submitted to various levels of introgression. In particular, Thoroughbred and Arab breeds are largely used as introgression sources, since those two populations explain together 26% of founder origins within the overall horse population. When compared with molecular data, breeds with a small level of coancestry also showed low genetic distance; the gene pool of the breeds was probably impacted by their reproducer exchanges.

  6. Isolation with asymmetric gene flow during the nonsynchronous divergence of dry forest birds.

    Science.gov (United States)

    Oswald, Jessica A; Overcast, Isaac; Mauck, William M; Andersen, Michael J; Smith, Brian Tilston

    2017-03-01

    Dry forest bird communities in South America are often fragmented by intervening mountains and rainforests, generating high local endemism. The historical assembly of dry forest communities likely results from dynamic processes linked to numerous population histories among codistributed species. Nevertheless, species may diversify in the same way through time if landscape and environmental features, or species ecologies, similarly structure populations. Here we tested whether six co-distributed taxon pairs that occur in the dry forests of the Tumbes and Marañón Valley of northwestern South America show concordant patterns and modes of diversification. We employed a genome reduction technique, double-digest restriction site-associated DNA sequencing, and obtained 4407-7186 genomewide SNPs. We estimated demographic history in each taxon pair and inferred that all pairs had the same best-fit demographic model: isolation with asymmetric gene flow from the Tumbes into the Marañón Valley, suggesting a common diversification mode. Overall, we also observed congruence in effective population size (Ne) patterns where ancestral Ne were 2.9-11.0× larger than present-day Marañón Valley populations and 0.3-2.0× larger than Tumbesian populations. Present-day Marañón Valley Ne was smaller than Tumbes. In contrast, we found simultaneous population isolation due to a single event to be unlikely as taxon pairs diverged over an extended period of time (0.1-2.9 Ma) with multiple nonoverlapping divergence periods. Our results show that even when populations of codistributed species asynchronously diverge, the mode of their differentiation can remain conserved over millions of years. Divergence by allopatric isolation due to barrier formation does not explain the mode of differentiation between these two bird assemblages; rather, migration of individuals occurred before and after geographic isolation. © 2017 John Wiley & Sons Ltd.

  7. Using Semantic Association to Extend and Infer Literature-Oriented Relativity Between Terms.

    Science.gov (United States)

    Cheng, Liang; Li, Jie; Hu, Yang; Jiang, Yue; Liu, Yongzhuang; Chu, Yanshuo; Wang, Zhenxing; Wang, Yadong

    2015-01-01

    Related terms often appear together in the literature. Methods have been presented for weighting the relativity of pairwise terms by their co-occurring literature and for inferring new relationships. Terms in the literature are also found in the directed acyclic graphs of ontologies, such as Gene Ontology and Disease Ontology. Therefore, semantic associations between terms may help to establish relativities between terms in the literature. However, current methods do not use these associations. In this paper, an adjusted R-scaled score (ARSS) based on information content (ARSSIC) method is introduced to infer new relationships between terms. First, the set inclusion relationship between terms of an ontology was exploited to extend relationships between these terms and the literature. Next, the ARSS method was presented to measure relativity between terms across ontologies according to these extended relationships. Then, the ARSSIC method, using ratios of shared information of a term's ancestors, was designed to infer new relationships between terms across ontologies. The results of the experiment show that ARSS identified more pairs of statistically significant terms based on corresponding gene sets than other methods. The high average area under the receiver operating characteristic curve (0.9293) shows that ARSSIC achieved a high true positive rate and a low false positive rate. Data are available at http://mlg.hit.edu.cn/ARSSIC/.

  8. Exploiting the Vulnerability of Flow Table Overflow in Software-Defined Network: Attack Model, Evaluation, and Defense

    Directory of Open Access Journals (Sweden)

    Yadong Zhou

    2018-01-01

    As the most competitive solution for next-generation networks, SDN and its dominant implementation OpenFlow are attracting more and more interest. But besides convenience and flexibility, SDN/OpenFlow also introduces new kinds of limitations and security issues. Of these limitations, the most obvious and perhaps the most neglected is the flow table capacity of SDN/OpenFlow switches. In this paper, we propose a novel inference attack targeting SDN/OpenFlow networks, which is motivated by the limited flow table capacities of SDN/OpenFlow switches and the measurable decrease in network performance that results from frequent interactions between the data and control planes when the flow table is full. To the best of our knowledge, this is the first proposed inference attack model of this kind for SDN/OpenFlow. We implemented an inference attack framework according to our model and examined its efficiency and accuracy. The evaluation results demonstrate that our framework can infer the network parameters (flow table capacity and usage) with an accuracy of 80% or higher. We also propose two possible defense strategies for the discovered vulnerability: a routing aggregation algorithm and a multilevel flow table architecture. These findings give us a deeper understanding of SDN/OpenFlow limitations and serve as guidelines for future improvements of SDN/OpenFlow.
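
    The idea of inferring a hidden flow table capacity from observable hits and misses can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are not taken from the paper: the table starts empty, eviction is first-in-first-out, and the timing side channel is replaced by a boolean hit/miss signal. The names ToySwitch and infer_capacity are hypothetical and do not correspond to any OpenFlow API or to the authors' framework.

```python
from collections import OrderedDict


class ToySwitch:
    """Hypothetical switch model: a FIFO flow table with a hidden capacity."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._table = OrderedDict()

    def send(self, flow_id):
        """Return True on a table hit (fast path), False on a miss.

        A miss stands in for the slower controller round-trip that an
        attacker would observe as added latency.
        """
        if flow_id in self._table:
            return True
        if len(self._table) >= self._capacity:
            self._table.popitem(last=False)  # evict the oldest rule (FIFO)
        self._table[flow_id] = None
        return False


def infer_capacity(switch, max_probe=10_000):
    """Estimate capacity by counting new flows until flow 0 is evicted."""
    switch.send(0)  # install a marker rule for flow 0
    for n in range(1, max_probe + 1):
        switch.send(n)          # install one more distinct flow
        if not switch.send(0):  # marker missed: the table overflowed
            return n
    return None  # no overflow observed within the probe budget


if __name__ == "__main__":
    print(infer_capacity(ToySwitch(capacity=64)))
```

    With a least-recently-used policy or a partially filled table the probe sequence would need to change, which is one reason a real attack has to estimate table usage as well as capacity.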

  9. Genetic structure, mating system, and long-distance gene flow in heart of palm (Euterpe edulis Mart.).

    Science.gov (United States)

    Gaiotto, F A; Grattapaglia, D; Vencovsky, R

    2003-01-01

    We report a detailed analysis of the population genetic structure, mating system, and gene flow of heart of palm (Euterpe edulis Mart.-Arecaceae) in central Brazil. This palm is considered a keystone species because it supplies fruits for birds and rodents all year and is intensively harvested for culinary purposes. Two populations of this palm tree were examined, using 18 microsatellite loci. The species displays a predominantly outcrossed mating system (tm = 0.94), with a probability of full sibship greater than 70% within open-pollinated families. The following estimates of interpopulation genetic variation were calculated and found significant: FIT = 0.17, FIS = 0.12, FST = 0.06, and RST = 0.07. This low but significant level of interpopulation genetic variation indicates high levels of gene flow. Two adult trees were identified as likely seed parents (P > 99.9%) of juveniles located at a distance of 22 km. Gene flow over such distances has not been reported before for tropical tree species. The establishment and management of in situ genetic reserves or ex situ conservation and breeding populations for E. edulis should contemplate the collection of several hundred open-pollinated maternal families from relatively few distant populations to maximize the genetic sampling of a larger number of pollen parents.
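
    The link between a low FST and high gene flow can be made concrete with Wright's island-model approximation, FST ≈ 1 / (4Nm + 1), where Nm is the number of migrants per generation. The abstract does not say how the authors quantified gene flow, so the sketch below is only an illustration under that textbook model (many equal-sized populations at migration-drift equilibrium); the function name nm_from_fst is hypothetical.

```python
def nm_from_fst(fst):
    """Migrants per generation under Wright's island model.

    Rearranges FST ~= 1 / (4*Nm + 1) into Nm ~= (1/FST - 1) / 4.
    This is an assumed textbook approximation, not the paper's method.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # FST = 0.06 is the interpopulation estimate reported in the abstract.
    print(round(nm_from_fst(0.06), 2))
```

    Under this model, values of Nm above roughly one migrant per generation are conventionally read as enough gene flow to counteract drift, but the estimate inherits all of the island model's assumptions.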

  10. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    Directory of Open Access Journals (Sweden)

    Łucja Przysiecka

    2015-04-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. Bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots of the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and cross-genera synteny survey, identified representatives of all but one (CHI1) main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1.
Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis

  11. Highly Pathogenic H5N1 Avian Influenza Viruses Exhibit Few Barriers to Gene Flow in Vietnam

    Science.gov (United States)

    Carrel, Margaret; Wan, Xiu-Feng; Nguyen, Tung; Emch, Michael

    2013-01-01

    Locating areas where genetic change is inhibited can illuminate underlying processes that drive evolution of pathogens. The persistence of highly pathogenic H5N1 avian influenza in Vietnam since 2003, and the continuous molecular evolution of Vietnamese avian influenza viruses, indicates that local environmental factors are supportive not only of incidence but also of viral adaptation. This article explores whether gene flow is constant across Vietnam, or whether there exist boundary areas where gene flow exhibits discontinuity. Using a dataset of 125 highly pathogenic H5N1 avian influenza viruses, principal components analysis and wombling analysis are used to indicate the location, magnitude, and statistical significance of genetic boundaries. Results show that a small number of geographically minor boundaries to gene flow in highly pathogenic H5N1 avian influenza viruses exist in Vietnam, but that overall there is little division in genetic exchange. This suggests that differences in genetic characteristics of viruses from one region to another are not the result of barriers to H5N1 viral exchange in Vietnam, and that H5N1 avian influenza is able to spread relatively unimpeded across the country. PMID:22350419

  12. Limited contemporary gene flow and high self-replenishment drives peripheral isolation in an endemic coral reef fish.

    Science.gov (United States)

    van der Meer, Martin H; Horne, John B; Gardner, Michael G; Hobbs, Jean-Paul A; Pratchett, Morgan; van Herwerden, Lynne

    2013-06-01

    Extensive ongoing degradation of coral reef habitats worldwide has led to declines in abundance of coral reef fishes and local extinction of some species. Those most vulnerable are ecological specialists and endemic species. Determining connectivity between locations is vital to understanding recovery and long-term persistence of these species following local extinction. This study explored population connectivity in the ecologically-specialized endemic three-striped butterflyfish (Chaetodon tricinctus) using mt and msatDNA (nuclear microsatellites) to distinguish evolutionary versus contemporary gene flow, estimate self-replenishment and measure genetic diversity among locations at the remote Australian offshore coral reefs of Middleton Reef (MR), Elizabeth Reef (ER), Lord Howe Island (LHI), and Norfolk Island (NI). Mt and msatDNA suggested genetic differentiation of the most peripheral location (NI) from the remaining three locations (MR, ER, LHI). Despite high levels of mtDNA gene flow, there is limited msatDNA gene flow with evidence of high levels of self-replenishment (≥76%) at all four locations. Taken together, this suggests prolonged population recovery times following population declines. The peripheral population (NI) is most vulnerable to local extinction due to its relative isolation, extreme levels of self-replenishment (95%), and low contemporary abundance.

  13. Entropic Inference

    Science.gov (United States)

    Caticha, Ariel

    2011-03-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME), includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops—the Maximum Entropy and the Bayesian methods—into a single general inference scheme.
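
    A small discrete case may help make the updating rule concrete. Maximizing the relative entropy S[p, q] = -sum_i p_i log(p_i / q_i) subject to a single expectation constraint yields a posterior of the form p_i ∝ q_i exp(lam * x_i). The sketch below is an illustration under assumptions of my own, not code from the tutorial: the function names, the die example, and the bisection bounds of ±50 on the multiplier are all hypothetical choices.

```python
import math


def relative_entropy(p, q):
    """S[p, q] = -sum_i p_i * log(p_i / q_i); zero only when p equals q."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def update_with_mean(prior, values, target_mean, iters=200):
    """Posterior maximizing S[p, prior] subject to sum_i p_i * x_i = target_mean.

    The maximizer is the exponentially tilted prior; the multiplier lam is
    found by bisection, using the fact that the mean increases with lam.
    """
    def tilted(lam):
        weights = [q * math.exp(lam * x) for q, x in zip(prior, values)]
        total = sum(weights)
        return [w / total for w in weights]

    lo, hi = -50.0, 50.0  # assumed search bounds for the multiplier
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(p * x for p, x in zip(tilted(mid), values))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))


if __name__ == "__main__":
    faces = [1, 2, 3, 4, 5, 6]
    uniform = [1.0 / 6.0] * 6
    posterior = update_with_mean(uniform, faces, target_mean=4.5)
    print([round(p, 3) for p in posterior])
    print(relative_entropy(posterior, uniform))
```

    With a uniform prior this reduces to the familiar MaxEnt solution for a die with a constrained mean; a non-uniform prior is handled by the same function.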

  14. Tocantins river as an effective barrier to gene flow in Saguinus niger populations

    Directory of Open Access Journals (Sweden)

    Marcelo Vallinoto

    2006-01-01

    The Saguinus represent the basal genus of the Callitrichinae subfamily. Traditionally this genus is divided into three groups: Hairy, Mottled and Bare-face; however, molecular data failed to validate these groups as monophyletic units, and also raised some subspecies to species status. This is the case for the former subspecies Saguinus midas midas and S. midas niger, which are now considered different species. In the present study, we sequenced a portion of the D-loop mtDNA region in populations from the East bank of the Xingu and from both banks of the Tocantins river, in order to test the effectiveness of large rivers as barriers to gene flow in Saguinus. According to our results, the populations from the East and West banks of the Tocantins river are more divergent than true species like S. mystax and S. imperator. The Tocantins river may be acting as a barrier to gene flow, and consequently these very divergent populations may represent distinct taxonomic entities (species?).

  15. Adaptive population divergence and directional gene flow across steep elevational gradients in a climate-sensitive mammal.

    Science.gov (United States)

    Waterhouse, Matthew D; P Erb, Liesl; Beever, Erik A; Russello, Michael A

    2018-04-25

    The ecological effects of climate change have been shown in most major taxonomic groups; however, the evolutionary consequences are less well-documented. Adaptation to new climatic conditions offers a potential long-term mechanism for species to maintain viability in rapidly changing environments, but mammalian examples remain scarce. The American pika (Ochotona princeps) has been impacted by recent climate-associated extirpations and range-wide reductions in population sizes, establishing it as a sentinel mammalian species for climate change. To investigate evidence for local adaptation and reconstruct patterns of genomic diversity and gene flow across rapidly changing environments, we used a space-for-time design and restriction site-associated DNA sequencing to genotype American pikas along two steep elevational gradients at 30,966 SNPs and employed independent outlier detection methods that scanned for genotype-environment associations. We identified 338 outlier SNPs detected by two separate analyses and/or replicated in both transects, several of which were annotated to genes involved in metabolic function and oxygen transport. Additionally, we found evidence of directional gene flow primarily downslope from high-elevation populations, along with reduced gene flow at outlier loci. If this trend continues, elevational range contractions in American pikas will likely be from local extirpation rather than upward movement of low-elevation individuals; this, in turn, could limit the potential for adaptation within this landscape. These findings are of particular relevance for future conservation and management of American pikas and other elevationally-restricted, thermally-sensitive species. This article is protected by copyright. All rights reserved.

  16. Flow Cytometry-Assisted Cloning of Specific Sequence Motifs from Complex 16S rRNA Gene Libraries

    DEFF Research Database (Denmark)

    Nielsen, Jeppe Lund; Schramm, Andreas; Bernhard, Anne E.

    2004-01-01

    Jeppe L. Nielsen, Andreas Schramm, Anne E. Bernhard, Gerrit J. van den Engh, and David A. Stahl (Department of Civil and Environmental Engineering, University of Washington, and Institute for Systems Biology, Seattle, Washington; Department of Ecological Microbiology, University of Bayreuth, Bayreuth, Germany). A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes.

  17. Mathematical inference and control of molecular networks from perturbation experiments

    Science.gov (United States)

    Mohammed-Rasheed, Mohammed

    One of the main challenges facing biologists and mathematicians in the post-genomic era is to understand the behavior of molecular networks and harness this understanding into an educated intervention of the cell. The cell maintains its function via an elaborate network of interconnecting positive and negative feedback loops of genes, RNA and proteins that send different signals to a large number of pathways and molecules. These structures are referred to as genetic regulatory networks (GRNs) or molecular networks. GRNs can be viewed as dynamical systems with inherent properties and mechanisms, such as steady-state equilibriums and stability, that determine the behavior of the cell. The biological relevance of these mathematical concepts is important, as they may predict the differentiation of a stem cell, the maintenance of a normal cell, the development of cancer and its aberrant behavior, and the design of drugs and response to therapy. Uncovering the underlying GRN structure from gene/protein expression data, e.g., microarrays or perturbation experiments, is called inference or reverse engineering of the molecular network. Because of the high cost and time-consuming nature of biological experiments, the number of available measurements or experiments is very small compared to the number of molecules (genes, RNA and proteins). In addition, the observations are noisy, where the noise is due to measurement imperfections as well as the inherent stochasticity of genetic expression levels. Intra-cellular activities and extra-cellular environmental attributes are another source of variability. Thus, the inference of GRNs is, in general, an under-determined problem with a highly noisy set of observations. The ultimate goal of GRN inference and analysis is to be able to intervene within the network, in order to force it away from undesirable cellular states and into desirable ones.
However, it remains a major challenge to design optimal intervention strategies

  18. Inferring ontology graph structures using OWL reasoning

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  19. Inferring ontology graph structures using OWL reasoning.

    Science.gov (United States)

    Rodríguez-García, Miguel Ángel; Hoehndorf, Robert

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  20. Invasion of Ancestral Mammals into Dim-light Environments Inferred from Adaptive Evolution of the Phototransduction Genes.

    Science.gov (United States)

    Wu, Yonghua; Wang, Haifeng; Hadly, Elizabeth A

    2017-04-20

    Nocturnality is a key evolutionary innovation of mammals that enables mammals to occupy relatively empty nocturnal niches. Invasion of ancestral mammals into nocturnality has long been inferred from the phylogenetic relationships of crown Mammalia, which is primarily nocturnal, and crown Reptilia, which is primarily diurnal, although molecular evidence for this is lacking. Here we used phylogenetic analyses of the vision genes involved in the phototransduction pathway to predict the diel activity patterns of ancestral mammals and reptiles. Our results demonstrated that the common ancestor of the extant Mammalia was dominated by positive selection for dim-light vision, supporting the predominant nocturnality of the ancestral mammals. Further analyses showed that the nocturnality of the ancestral mammals was probably derived from the predominant diurnality of the ancestral amniotes, which featured strong positive selection for bright-light vision. Like the ancestral amniotes, the common ancestor of the extant reptiles and various taxa in Squamata, one of the main competitors of the temporal niches of the ancestral mammals, were found to be predominantly diurnal as well. Despite this relatively apparent temporal niche partitioning between ancestral mammals and the relevant reptiles, our results suggested partial overlap of their temporal niches during crepuscular periods.

  1. Molecular inference of sources and spreading patterns of Plasmodium falciparum malaria parasites in internally displaced persons settlements in Myanmar-China border area.

    Science.gov (United States)

    Lo, Eugenia; Zhou, Guofa; Oo, Winny; Lee, Ming-Chieh; Baum, Elisabeth; Felgner, Philip L; Yang, Zhaoqing; Cui, Liwang; Yan, Guiyun

    2015-07-01

    In Myanmar, civil unrest and establishment of internally displaced persons (IDP) settlement along the Myanmar-China border have impacted malaria transmission. The growing IDP populations raise deep concerns about health impact on local communities. Microsatellite markers were used to examine the source and spreading patterns of Plasmodium falciparum between IDP settlement and surrounding villages in Myanmar along the China border. Genotypic structure of P. falciparum was compared over the past three years from the same area and the demographic history was inferred to determine the source of recent infections. In addition, we examined if border migration is a factor of P. falciparum infections in China by determining gene flow patterns across borders. Compared to local community, the IDP samples showed a reduced and consistently lower genetic diversity over the past three years. A strong signature of genetic bottleneck was detected in the IDP samples. P. falciparum infections from the border regions in China were genetically similar to Myanmar and parasite gene flow was not constrained by geographical distance. Reduced genetic diversity of P. falciparum suggested intense malaria control within the IDP settlement. Human movement was a key factor to the spread of malaria both locally in Myanmar and across the international border. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. More than one kind of inference: re-examining what's learned in feature inference and classification.

    Science.gov (United States)

    Sweller, Naomi; Hayes, Brett K

    2010-08-01

    Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.

  3. Perceptual inference.

    Science.gov (United States)

    Aggelopoulos, Nikolaos C

    2015-08-01

    Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. The impact of selection, gene flow and demographic history on heterogeneous genomic divergence: three-spine sticklebacks in divergent environments.

    Science.gov (United States)

    Ferchaud, Anne-Laure; Hansen, Michael M

    2016-01-01

    Heterogeneous genomic divergence between populations may reflect selection, but should also be seen in conjunction with gene flow and drift, particularly population bottlenecks. Marine and freshwater three-spine stickleback (Gasterosteus aculeatus) populations often exhibit different lateral armour plate morphs. Moreover, strikingly parallel genomic footprints across different marine-freshwater population pairs are interpreted as parallel evolution and gene reuse. Nevertheless, in some geographic regions like the North Sea and Baltic Sea, different patterns are observed. Freshwater populations in coastal regions are often dominated by marine morphs, suggesting that gene flow overwhelms selection, and genomic parallelism may also be less pronounced. We used RAD sequencing for analysing 28 888 SNPs in two marine and seven freshwater populations in Denmark, Europe. Freshwater populations represented a variety of environments: river populations accessible to gene flow from marine sticklebacks and large and small isolated lakes with and without fish predators. Sticklebacks in an accessible river environment showed minimal morphological and genomewide divergence from marine populations, supporting the hypothesis of gene flow overriding selection. Allele frequency spectra suggested bottlenecks in all freshwater populations, and particularly two small lake populations. However, genomic footprints ascribed to selection could nevertheless be identified. No genomic regions were consistent freshwater-marine outliers, and parallelism was much lower than in other comparable studies. Two genomic regions previously described to be under divergent selection in freshwater and marine populations were outliers between different freshwater populations. We ascribe these patterns to stronger environmental heterogeneity among freshwater populations in our study as compared to most other studies, although the demographic history involving bottlenecks should also be considered in the

  5. High seed dispersal ability of Pinus canariensis in stands of contrasting density inferred from genotypic data

    Directory of Open Access Journals (Sweden)

    Unai López de Heredia

    2015-04-01

    Aim of the study: Models that combine parentage analysis from molecular data with spatial information of seeds and seedlings provide a framework to describe and identify the factors involved in seed dispersal and recruitment of forest species. In the present study we used a spatially explicit method (the gene shadow model) in order to assess primary and effective dispersal in Pinus canariensis. Area of study: Pinus canariensis is endemic to the Canary Islands (Spain). Sampling sites were a high density forest in southern slopes of Tenerife and a low density stand in South Gran Canaria. Materials and methods: We fitted models based on parentage analysis from seeds and seedlings collected in two sites with contrasting stand density, and then compared the resulting dispersal distributions. Main results: The results showed that: 1) P. canariensis has a remarkable dispersal ability compared to other pine species; 2) there is no discordance between primary and effective dispersals, suggesting limited secondary dispersal by animals and lack of Janzen-Connell effect; and 3) low stand densities enhance the extent of seed dispersal, which was higher in the low density stand. Research highlights: The efficient dispersal mechanism of P. canariensis by wind inferred by the gene shadow model is congruent with indirect measures of gene flow, and has utility in reconstructing past demographic events and in predicting future distribution ranges for the species.

  6. Mantle Circulation Models with variational data assimilation: Inferring past mantle flow and structure from plate motion histories and seismic tomography

    Science.gov (United States)

    Bunge, H.; Hagelberg, C.; Travis, B.

    2002-12-01

    -Cretaceous mantle structure can be inferred accurately from our inverse approach assuming present-day mantle structure is well-known, even if an initial first guess assumption about the mid-Cretaceous mantle involved only a simple 1-D radial temperature profile. We suggest that geodynamic inverse modeling should make it possible to infer a number of flow parameters from observational constraints of the mantle.

  7. SEMANTIC PATCH INFERENCE

    DEFF Research Database (Denmark)

    Andersen, Jesper

    2009-01-01

    Collateral evolution is the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set ...... specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples....

  8. Adaptive traits are maintained on steep selective gradients despite gene flow and hybridization in the intertidal zone.

    Directory of Open Access Journals (Sweden)

    Gerardo I Zardi

    Full Text Available Gene flow among hybridizing species with incomplete reproductive barriers blurs species boundaries, while selection under heterogeneous local ecological conditions or along strong gradients may counteract this tendency. Congeneric, externally-fertilizing fucoid brown algae occur as distinct morphotypes along intertidal exposure gradients despite gene flow. Combining analyses of genetic and phenotypic traits, we investigate the potential for physiological resilience to emersion stressors to act as an isolating mechanism in the face of gene flow. Along vertical exposure gradients in the intertidal zone of Northern Portugal and Northwest France, the mid-low shore species Fucus vesiculosus, the upper shore species Fucus spiralis, and an intermediate distinctive morphotype of F. spiralis var. platycarpus were morphologically characterized. Two diagnostic microsatellite loci recovered 3 genetic clusters consistent with prior morphological assignment. Phylogenetic analysis based on single nucleotide polymorphisms in 14 protein coding regions unambiguously resolved 3 clades: sympatric F. vesiculosus, F. spiralis, and the allopatric (in southern Iberia) population of F. spiralis var. platycarpus. In contrast, the sympatric F. spiralis var. platycarpus (from Northern Portugal) was distributed across the 3 clades, strongly suggesting hybridization/introgression with both other entities. Common garden experiments showed that physiological resilience following exposure to desiccation/heat stress differed significantly between the 3 sympatric genetic taxa, consistent with their respective vertical distribution on steep environmental clines in exposure time. Phylogenetic analyses indicate that F. spiralis var. platycarpus is a distinct entity in allopatry, but that extensive gene flow occurs with both higher and lower shore species in sympatry. Experimental results suggest that strong selection on physiological traits across steep intertidal exposure gradients

  9. Gene delivery by microfluidic flow-through electroporation based on constant DC and AC field.

    Science.gov (United States)

    Geng, Tao; Zhan, Yihong; Lu, Chang

    2012-01-01

    Electroporation is one of the most widely used physical methods to deliver exogenous nucleic acids into cells with high efficiency and low toxicity. Conventional electroporation systems typically require expensive pulse generators to provide short electrical pulses at high voltage. In this work, we demonstrate a flow-through electroporation method for continuous transfection of cells based on disposable chips, a syringe pump, and a low-cost power supply that provides a constant voltage. We successfully transfect cells using either DC or AC voltage with high flow rates (ranging from 40 µl/min to 20 ml/min) and high efficiency (up to 75%). We also enable the entire cell membrane to be uniformly permeabilized and dramatically improve gene delivery by inducing complex migrations of cells during the flow.

  10. Probabilistic Inference of Biological Networks via Data Integration

    Directory of Open Access Journals (Sweden)

    Mark F. Rogers

    2015-01-01

    Full Text Available There is significant interest in inferring the structure of subcellular networks of interaction. Here we consider supervised interactive network inference in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, motivating the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning that can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%.
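    The tensor product pairwise kernel named in this abstract can be sketched as follows. This is a minimal illustration using the commonly cited definition K((a,b),(c,d)) = K(a,c)K(b,d) + K(a,d)K(b,c); the abstract does not give the formula, so the definition, the linear base kernel, the function names, and the feature vectors below are all assumptions made for illustration.

```python
def linear_kernel(x, y):
    """Base kernel between two genes' feature vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(x, y))


def tensor_product_pairwise_kernel(pair1, pair2, base=linear_kernel):
    """Similarity between two unordered gene pairs (a, b) and (c, d).

    Both possible matchings are summed, so the value does not depend on
    the order in which the genes of a pair are written.
    """
    a, b = pair1
    c, d = pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Hypothetical feature vectors for four genes (illustrative values only).
g1, g2, g3, g4 = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]

k = tensor_product_pairwise_kernel((g1, g2), (g3, g4))
k_swapped = tensor_product_pairwise_kernel((g2, g1), (g3, g4))
print(k, k_swapped)
```

    Summing over both matchings is what makes the kernel suitable for link prediction, where a candidate link between two genes has no inherent direction.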

  11. Genetic structure and gene flows within horses: a genealogical study at the French population scale.

    Directory of Open Access Journals (Sweden)

    Pauline Pirault

    Full Text Available Since horse breeds constitute populations submitted to variable and multiple outcrossing events, we analyzed the genetic structure and gene flows considering horses raised in France. We used genealogical data, with a reference population of 547,620 horses born in France between 2002 and 2011, grouped according to 55 breed origins. On average, individuals had 6.3 equivalent generations known. Considering different population levels, fixation index decreased from an overall species FIT of 1.37%, to an average [Formula: see text] of -0.07% when considering the 55 origins, showing that most horse breeds constitute populations without genetic structure. We illustrate the complexity of gene flows existing among horse breeds, a few populations being closed to foreign influence, most, however, being submitted to various levels of introgression. In particular, Thoroughbred and Arab breeds are largely used as introgression sources, since those two populations explain together 26% of founder origins within the overall horse population. When compared with molecular data, breeds with a small level of coancestry also showed low genetic distance; the gene pool of the breeds was probably impacted by their reproducer exchanges.

  12. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    Directory of Open Access Journals (Sweden)

    Adina J Renz

    Full Text Available Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon.

  13. Multimodel inference and adaptive management

    Science.gov (United States)

    Rehme, S.E.; Powell, L.A.; Allen, Craig R.

    2011-01-01

    Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
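    One common way to express the model selection uncertainty discussed in this abstract is through Akaike weights. The sketch below assumes an AIC-based workflow, which the abstract does not specify; the function name and the AIC values are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into normalized relative model weights.

    Each model's weight is exp(-delta / 2), where delta is its AIC
    difference from the best (lowest-AIC) model, divided by the sum
    of that quantity over all candidate models.
    """
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical AIC scores for three competing models.
aic = [100.0, 101.0, 110.0]
weights = akaike_weights(aic)
print([round(w, 3) for w in weights])
```

    When no single model carries most of the weight, as with the first two models here, inference about which hypothesis is best supported is equivocal, which is the situation the authors describe as weak inference.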

  14. A comparative study of artificial neural network, adaptive neuro fuzzy inference system and support vector machine for forecasting river flow in the semiarid mountain region

    Science.gov (United States)

    He, Zhibin; Wen, Xiaohu; Liu, Hu; Du, Jun

    2014-02-01

    Data driven models are very useful for river flow forecasting when the underlying physical relationships are not fully understood, but it is not clear whether these models still perform well in small river basins of semiarid mountain regions, which have complicated topography. In this study, three different data driven methods, artificial neural network (ANN), adaptive neuro fuzzy inference system (ANFIS) and support vector machine (SVM), were used for forecasting river flow in the semiarid mountain region of northwestern China. The models analyzed different combinations of antecedent river flow values, and the appropriate input vector was selected based on the analysis of residuals. The performance of the ANN, ANFIS and SVM models in the training and validation sets was compared with the observed data. The model consisting of three antecedent values of flow was selected as the best fit model for river flow forecasting. To evaluate the results of the ANN, ANFIS and SVM models more accurately, four standard quantitative statistical performance measures, the coefficient of correlation (R), root mean squared error (RMSE), Nash-Sutcliffe efficiency coefficient (NS) and mean absolute relative error (MARE), were employed. The results indicate that the performance obtained by ANN, ANFIS and SVM in terms of different evaluation criteria during the training and validation period does not vary substantially; the performance of all three models in river flow forecasting was satisfactory. A detailed comparison of the overall performance indicated that the SVM model performed better than ANN and ANFIS in river flow forecasting for the validation data sets. The results also suggest that the ANN, ANFIS and SVM methods can be successfully applied to establish river flow forecasting models in semiarid mountain regions with complicated topography.
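    The four evaluation measures named in this abstract can be computed as in the sketch below. It assumes the standard textbook definitions (the abstract does not spell out the formulas, and MARE is sometimes reported as a percentage); the function name and the flow values are hypothetical.

```python
import math


def evaluate_forecast(observed, predicted):
    """Return R, RMSE, NS and MARE for paired observed/predicted flows."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(observed, predicted))
    sq_err = sum((o - p) ** 2 for o, p in pairs)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "R": cov / math.sqrt(var_o * var_p),         # Pearson correlation
        "RMSE": math.sqrt(sq_err / n),               # root mean squared error
        "NS": 1.0 - sq_err / var_o,                  # Nash-Sutcliffe efficiency
        "MARE": sum(abs(o - p) / o for o, p in pairs) / n,  # as a fraction
    }


# Hypothetical observed and forecast river flows (same units).
observed = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 12.0, 14.0, 10.0]
scores = evaluate_forecast(observed, predicted)
print({name: round(value, 3) for name, value in scores.items()})
```

    NS equals 1 for a perfect forecast and drops to 0 when the model is no better than predicting the observed mean, which is why it is often reported alongside R and RMSE. MARE as written requires nonzero observed flows.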

  15. Connectivity in the yeast cell cycle transcription network: inferences from neural networks.

    Directory of Open Access Journals (Sweden)

    Christopher E Hart

    2006-12-01

    Full Text Available A current challenge is to develop computational approaches to infer gene network regulatory relationships based on multiple types of large-scale functional genomic data. We find that single-layer feed-forward artificial neural network (ANN) models can effectively discover gene network structure by integrating global in vivo protein:DNA interaction data (ChIP/Array) with genome-wide microarray RNA data. We test this on the yeast cell cycle transcription network, which is composed of several hundred genes with phase-specific RNA outputs. These ANNs were robust to noise in data and to a variety of perturbations. They reliably identified and ranked 10 of 12 known major cell cycle factors at the top of a set of 204, based on a sum-of-squared weights metric. Comparative analysis of motif occurrences among multiple yeast species independently confirmed relationships inferred from ANN weights analysis. ANN models can capitalize on properties of biological gene networks that other kinds of models do not. ANNs naturally take advantage of patterns of absence, as well as presence, of factor binding associated with specific expression output; they are easily subjected to in silico "mutation" to uncover biological redundancies; and they can use the full range of factor binding values. A prominent feature of cell cycle ANNs suggested an analogous property might exist in the biological network. This postulated that "network-local discrimination" occurs when regulatory connections (here between MBF and target genes) are explicitly disfavored in one network module (G2), relative to others and to the class of genes outside the mitotic network. If correct, this predicts that MBF motifs will be significantly depleted from the discriminated class and that the discrimination will persist through evolution. Analysis of distantly related Schizosaccharomyces pombe confirmed this, suggesting that network-local discrimination is real and complements well-known enrichment of

  16. Lineage diversification of fringe-toed lizards (Phrynosomatidae: Uma notata complex) in the Colorado Desert: Delimiting species in the presence of gene flow

    Science.gov (United States)

    Gottscho, Andrew D.; Wood, Dustin A.; Vandergast, Amy; Lemos Espinal, Julio A.; Gatesy, John; Reeder, Tod

    2017-01-01

    Multi-locus nuclear DNA data were used to delimit species of fringe-toed lizards of the Uma notata complex, which are specialized for living in wind-blown sand habitats in the deserts of southwestern North America, and to infer whether Quaternary glacial cycles or Tertiary geological events were important in shaping the historical biogeography of this group. We analyzed ten nuclear loci collected using Sanger sequencing and genome-wide sequence and single-nucleotide polymorphism (SNP) data collected using restriction-associated DNA (RAD) sequencing. A combination of species discovery methods (concatenated phylogenies, parametric and non-parametric clustering algorithms) and species validation approaches (coalescent-based species tree/isolation-with-migration models) were used to delimit species, infer phylogenetic relationships, and to estimate effective population sizes, migration rates, and speciation times. Uma notata, U. inornata, U. cowlesi, and an undescribed species from Mohawk Dunes, Arizona (U. sp.) were supported as distinct in the concatenated analyses and by clustering algorithms, and all operational taxonomic units were decisively supported as distinct species by ranking hierarchical nested speciation models with Bayes factors based on coalescent-based species tree methods. However, significant unidirectional gene flow (2NM > 1) from U. cowlesi and U. notata into U. rufopunctata was detected under the isolation-with-migration model. Therefore, we conservatively delimit four species-level lineages within this complex (U. inornata, U. notata, U. cowlesi, and U. sp.), treating U. rufopunctata as a hybrid population (U. notata x cowlesi). Both concatenated and coalescent-based estimates of speciation times support the hypotheses that speciation within the complex occurred during the late Pleistocene, and that the geological evolution of the Colorado River delta during this period was an important process shaping the observed phylogeographic patterns.

  17. Prokaryotic Phylogenies Inferred from Whole-Genome Sequence and Annotation Data

    Directory of Open Access Journals (Sweden)

    Wei Du

    2013-01-01

    Full Text Available Phylogenetic trees are used to represent the evolutionary relationship among various groups of species. In this paper, a novel method for inferring prokaryotic phylogenies using multiple genomic information is proposed. The method is called CGCPhy and based on the distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined by sequence similarity, genomic function, and genomic structure information. Second, genes involving potential HGT events are eliminated, since such genes are considered to be the highly conserved genes across different species and the genes located on fragments with abnormal genome barcode. Third, we calculate the distance of the orthologous gene clusters between each genome pair in terms of the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is employed to construct phylogenetic trees across different species. CGCPhy has been examined on different datasets from 617 complete single-chromosome prokaryotic genomes and achieved applicative accuracies on different species sets in agreement with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy and has a low standard deviation on different datasets, so it has an applicative potential for phylogenetic analysis.
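    The final step described above, neighbor joining on a pairwise distance matrix, begins by choosing which two taxa to merge. The sketch below shows only that first selection step using the standard Q-criterion; it is not CGCPhy's implementation, and the function name and the distance matrix are hypothetical.

```python
def first_neighbor_joining_pair(dist):
    """Return the pair (i, j) that neighbor joining would merge first.

    Uses the standard criterion
        Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    and picks the pair with the smallest Q.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical symmetric distance matrix for five genomes.
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = first_neighbor_joining_pair(distances)
print(pair, q_value)  # (0, 1) -50
```

    Note that the pair with the smallest raw distance (genomes 3 and 4, distance 3) is not the one selected: the Q-criterion also accounts for how far each candidate is from all other taxa. A full tree would repeat this step after replacing the merged pair with a new internal node.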

  18. Measurement and inference of profile soil-water dynamics at different hillslope positions in a semiarid agricultural watershed

    Science.gov (United States)

    Green, Timothy R.; Erskine, Robert H.

    2011-12-01

    Dynamics of profile soil water vary with terrain, soil, and plant characteristics. The objectives addressed here are to quantify dynamic soil water content over a range of slope positions, infer soil profile water fluxes, and identify locations most likely influenced by multidimensional flow. The instrumented 56 ha watershed lies mostly within a dryland (rainfed) wheat field in semiarid eastern Colorado. Dielectric capacitance sensors were used to infer hourly soil water content for approximately 8 years (minus missing data) at 18 hillslope positions and four or more depths. Based on previous research and a new algorithm, sensor measurements (resonant frequency) were rescaled to estimate soil permittivity, then corrected for temperature effects on bulk electrical conductivity before inferring soil water content. Using a mass-conservation method, we analyzed multitemporal changes in soil water content at each sensor to infer the dynamics of water flux at different depths and landscape positions. At summit positions vertical processes appear to control profile soil water dynamics. At downslope positions infrequent overland flow and unsaturated subsurface lateral flow appear to influence soil water dynamics. Crop water use accounts for much of the variability in soil water between transects that are either cropped or fallow in alternating years, while soil hydraulic properties and near-surface hydrology affect soil water variability across landscape positions within each management zone. The observed spatiotemporal patterns exhibit the joint effects of short-term hydrology and long-term soil development. Quantitative methods of analyzing soil water patterns in space and time improve our understanding of dominant soil hydrological processes and provide alternative measures of model performance.

  19. Intergenic DNA sequences from the human X chromosome reveal high rates of global gene flow

    Directory of Open Access Journals (Sweden)

    Wall Jeffrey D

    2008-11-01

    Full Text Available Background: Despite intensive efforts devoted to collecting human polymorphism data, little is known about the role of gene flow in the ancestry of human populations. This is partly because most analyses have applied one of two simple models of population structure, the island model or the splitting model, which make unrealistic biological assumptions. Results: Here, we analyze 98-kb of DNA sequence from 20 independently evolving intergenic regions on the X chromosome in a sample of 90 humans from six globally diverse populations. We employ an isolation-with-migration (IM) model, which assumes that populations split and subsequently exchange migrants, to independently estimate effective population sizes and migration rates. While the maximum effective size of modern humans is estimated at ~10,000, individual populations vary substantially in size, with African populations tending to be larger (2,300–9,000) than non-African populations (300–3,300). We estimate mean rates of bidirectional gene flow at 4.8 × 10⁻⁴/generation. Bidirectional migration rates are ~5-fold higher among non-African populations (1.5 × 10⁻³) than among African populations (2.7 × 10⁻⁴). Interestingly, because effective sizes and migration rates are inversely related in African and non-African populations, population migration rates are similar within Africa and Eurasia (e.g., global mean Nm = 2.4). Conclusion: We conclude that gene flow has played an important role in structuring global human populations and that migration rates should be incorporated as critical parameters in models of human demography.
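    The inverse relationship between effective size and migration rate reported above can be checked with simple arithmetic. The sketch below assumes the population migration rate is the plain product Nm = N × m (the paper may apply additional scaling for X-linked loci); the variable names are hypothetical, and the numerical inputs are the rates and ranges quoted in the abstract.

```python
# Per-generation migration rates quoted in the abstract.
m_african = 2.7e-4
m_non_african = 1.5e-3

# Reported global mean population migration rate.
global_mean_nm = 2.4

# Fold difference in migration rate between the two groups.
fold_difference = m_non_african / m_african

# Effective size each group would need for Nm to equal the global mean,
# assuming Nm is the simple product of effective size and migration rate.
n_african = global_mean_nm / m_african
n_non_african = global_mean_nm / m_non_african

print(round(fold_difference, 1), round(n_african), round(n_non_african))
```

    Both implied sizes fall inside the ranges the abstract reports (2,300–9,000 for African and 300–3,300 for non-African populations), which is consistent with the statement that larger populations exchanging migrants at lower rates can have similar values of Nm to smaller populations exchanging migrants at higher rates.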

  20. Discovering time-lagged rules from microarray data using gene profile classifiers

    Directory of Open Access Journals (Sweden)

    Ponzoni Ignacio

    2011-04-01

    Full Text Available Background: Gene regulatory networks have an essential role in every process of life. In this regard, the amount of genome-wide time series data is becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results: This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations. Conclusions: A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation.

  1. Occurrence of alfalfa (Medicago sativa L.) populations along roadsides in southern Manitoba, Canada and their potential role in intraspecific gene flow.

    Science.gov (United States)

    Bagavathiannan, Muthukumar V; Gulden, Robert H; Van Acker, Rene C

    2011-04-01

    Alfalfa is a highly outcrossing perennial species that can be noticed in roadsides as feral populations. There remains little information available on the extent of feral alfalfa populations in western Canadian prairies and their role in gene flow. The main objectives of this study were (a) to document the occurrence of feral alfalfa populations, and (b) to estimate the levels of outcrossing facilitated by feral populations. A roadside survey confirmed widespread occurrence of feral alfalfa populations, particularly in alfalfa growing regions. The feral populations were dynamic and their frequency ranged from 0.2 to 1.7 populations km⁻¹. In many cases, the nearest feral alfalfa population from alfalfa production field was located within a distance sufficient for outcrossing in alfalfa. The gene flow study confirmed that genes can move back and forth between feral and cultivated alfalfa populations. In this study, the estimated outcrossing levels were 62% (seed fields to feral), 78% (feral to seed fields), 82% (hay fields to feral) and 85% (feral to feral). Overall, the results show that feral alfalfa plants are prevalent in alfalfa producing regions in western Canada and they can serve as bridges for gene flow at landscape level. Management of feral populations should be considered, if gene flow is a concern. Emphasis on preventing seed spill/escapes and intentional roadside planting of alfalfa cultivars will be particularly helpful. Further, realistic and pragmatic threshold levels should be established for markets sensitive to the presence of GE traits.

  2. Inferring kangaroo phylogeny from incongruent nuclear and mitochondrial genes.

    Directory of Open Access Journals (Sweden)

    Matthew J Phillips

    Full Text Available The marsupial genus Macropus includes three subgenera, the familiar large grazing kangaroos and wallaroos of M. (Macropus) and M. (Osphranter), as well as the smaller mixed grazing/browsing wallabies of M. (Notamacropus). A recent study of five concatenated nuclear genes recommended subsuming the predominantly browsing Wallabia bicolor (swamp wallaby) into Macropus. To further examine this proposal we sequenced partial mitochondrial genomes for kangaroos and wallabies. These sequences strongly favour the morphological placement of W. bicolor as sister to Macropus, although place M. irma (black-gloved wallaby) within M. (Osphranter) rather than as expected, with M. (Notamacropus). Species tree estimation from separately analysed mitochondrial and nuclear genes favours retaining Macropus and Wallabia as separate genera. A simulation study finds that incomplete lineage sorting among nuclear genes is a plausible explanation for incongruence with the mitochondrial placement of W. bicolor, while mitochondrial introgression from a wallaroo into M. irma is the deepest such event identified in marsupials. Similar such coalescent simulations for interpreting gene tree conflicts will increase in both relevance and statistical power as species-level phylogenetics enters the genomic age. Ecological considerations in turn, hint at a role for selection in accelerating the fixation of introgressed or incompletely sorted loci. More generally the inclusion of the mitochondrial sequences substantially enhanced phylogenetic resolution. However, we caution that the evolutionary dynamics that enhance mitochondria as speciation indicators in the presence of incomplete lineage sorting may also render them especially susceptible to introgression.

  3. Testing the effect of the Himalayan mountains as a physical barrier to gene flow in Hippophae tibetana Schlect. (Elaeagnaceae).

    Directory of Open Access Journals (Sweden)

    La Qiong

    Full Text Available Hippophae tibetana is a small, dioecious wind-pollinated shrub endemic to the Tibetan-Qinghai Plateau. It is one of the shrubs that occur at very high elevations (5250 m a.s.l.). The Himalayan mountains provide a significant geographical barrier to the Qinghai-Tibetan Plateau, dividing the Himalayan area into two regions with Nepal to the south and Tibet to the north. There is no information on how the Himalayan mountains influence gene flow and population differentiation of alpine plants. In this study, we analyzed eight nuclear microsatellite markers and cpDNA trnT-trnF regions to test the role of the Himalayan mountains as a barrier to gene flow between populations of H. tibetana. We also examined the fine-scale genetic structure within a population of H. tibetana on the north slope of Mount (Mt.) Everest. For microsatellite analyses, a total of 241 individuals were sampled from seven populations in our study area (4 from Nepal, 3 from Tibet), including 121 individuals that were spatially mapped within a 100 m × 100 m plot. To test for seed flow, the cpDNA trnT-trnF regions of 100 individuals from 6 populations (4 from Nepal, 2 from Tibet) were also sequenced. Significant genetic differentiation was detected between the two regions by both microsatellite and cpDNA data analyses. These two datasets agree about southern and northern population differentiation, indicating that the Himalayan mountains represent a barrier to H. tibetana limiting gene flow between these two areas. At a fine scale, spatial autocorrelation analysis suggests significant genetic structure within a distance of less than 45 m, which may be attributed mainly to vegetative reproduction and habitat fragmentation, as well as limited gene flow.

  4. Xp21 contiguous gene syndromes: Deletion quantitation with bivariate flow karyotyping allows mapping of patient breakpoints

    Energy Technology Data Exchange (ETDEWEB)

    McCabe, E.R.B.; Towbin, J.A. (Baylor College of Medicine, Houston, TX (United States)); Engh, G. van den; Trask, B.J. (Lawrence Livermore National Lab., CA (United States))

    1992-12-01

    Bivariate flow karyotyping was used to estimate the deletion sizes for a series of patients with Xp21 contiguous gene syndromes. The deletion estimates were used to develop an approximate scale for the genomic map in Xp21. The bivariate flow karyotype results were compared with clinical and molecular genetic information on the extent of the patients' deletions, and these various types of data were consistent. The resulting map spans >15 Mb, from the telomeric interval between DXS41 (99-6) and DXS68 (1-4) to a position centromeric to the ornithine transcarbamylase locus. The deletion sizing was considered to be accurate to ±1 Mb. The map provides information on the relative localization of genes and markers within this region. For example, the map suggests that the adrenal hypoplasia congenita and glycerol kinase genes are physically close to each other, are within 1-2 Mb of the telomeric end of the Duchenne muscular dystrophy (DMD) gene, and are nearer to the DMD locus than to the more distal marker DXS28 (C7). Information of this type is useful in developing genomic strategies for positional cloning in Xp21. These investigations demonstrate that the DNA from patients with Xp21 contiguous gene syndromes can be valuable reagents, not only for ordering loci and markers but also for providing an approximate scale to the map of the Xp21 region surrounding DMD. 44 refs., 3 figs.

  5. Assessment of genetic diversity, population structure, and gene flow of tigers (Panthera tigris tigris) across Nepal's Terai Arc Landscape.

    Science.gov (United States)

    Thapa, Kanchan; Manandhar, Sulochana; Bista, Manisha; Shakya, Jivan; Sah, Govind; Dhakal, Maheshwar; Sharma, Netra; Llewellyn, Bronwyn; Wultsch, Claudia; Waits, Lisette P; Kelly, Marcella J; Hero, Jean-Marc; Hughes, Jane; Karmacharya, Dibesh

    2018-01-01

    With fewer than 200 tigers (Panthera tigris tigris) left in Nepal, generally confined to five protected areas across the Terai Arc Landscape, genetic studies are needed to provide crucial information on diversity and connectivity for devising an effective country-wide tiger conservation strategy. As part of the Nepal Tiger Genome Project, we studied landscape change, genetic variation, population structure, and gene flow of tigers across the Terai Arc Landscape by conducting Nepal's first comprehensive and systematic scat-based, non-invasive genetic survey. Of the 770 scat samples collected opportunistically from five protected areas and six presumed corridors, 412 were tiger (57%). Out of ten microsatellite loci, we retained eight markers, which were used to identify 78 individual tigers. We used this dataset to examine population structure, genetic variation, contemporary gene flow, and potential population bottlenecks of tigers in Nepal. We detected three genetic clusters consistent with three demographic sub-populations and found moderate levels of genetic variation (He = 0.61, AR = 3.51) and genetic differentiation (FST = 0.14) across the landscape. We detected 3-7 migrants, confirming the potential for dispersal-mediated gene flow across the landscape. We found evidence of a bottleneck signature likely caused by large-scale land-use change documented in the last two centuries in the Terai forest. Securing tiger habitat including functional forest corridors is essential to enhance gene flow across the landscape and ensure long-term tiger survival. This requires cooperation among multiple stakeholders and careful conservation planning to prevent detrimental effects of anthropogenic activities on tigers.

  6. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk.

    Science.gov (United States)

    Cheng, Liang; Jiang, Yue; Ju, Hong; Sun, Jie; Peng, Jiajie; Zhou, Meng; Hu, Yang

    2018-01-19

    Since the establishment of the first biomedical ontology, Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. Nowadays over 300 ontologies have been built, including the extensively used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because of the advantage of identifying novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Though similarities between terms within each ontology have been studied with in silico methods, term similarities across different ontologies were not investigated as deeply. The latest method took advantage of the gene functional interaction network (GFIN) to explore such inter-ontology similarities of terms. However, it only used gene interactions and failed to make full use of the connectivity among gene nodes of the network. In addition, all existing methods are designed specifically for GO, and their performance on the extended ontology community remains unknown. We propose a method, InfAcrOnt, to infer similarities between terms across ontologies utilizing the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and acquires similarities between terms across ontologies by modeling the information flow within the network by random walk. In our benchmark experiments on sub-ontologies of GO, InfAcrOnt achieves a high average area under the receiver operating characteristic curve (AUC) (0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) in both human and yeast benchmark datasets, exhibiting superior performance. Meanwhile, comparisons of InfAcrOnt results and prior knowledge on pair-wise DO-HPO terms and pair-wise DO-GO terms show high correlations. The experimental results show that InfAcrOnt significantly improves the performance of inferring similarities between terms across ontologies in the benchmark set.
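
    The abstract above describes scoring term similarity by modeling information flow through a term-gene-gene network with a random walk. The following is a minimal sketch of that general idea using a random walk with restart, which is an assumed variant; the abstract does not state InfAcrOnt's exact formulation. The node names, edges, and restart probability are hypothetical.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-12, max_iter=1000):
    """Stationary visit probabilities of a walker that restarts at `seed`.

    `adjacency` maps each node to a list of its neighbours (undirected graph;
    every node is assumed to have at least one neighbour).
    """
    prob = {node: 1.0 if node == seed else 0.0 for node in adjacency}
    for _ in range(max_iter):
        # Restart mass goes back to the seed; the rest is spread over neighbours.
        nxt = {node: (restart if node == seed else 0.0) for node in adjacency}
        for node, neighbours in adjacency.items():
            share = (1.0 - restart) * prob[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        change = sum(abs(nxt[n] - prob[n]) for n in adjacency)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical toy network: three ontology terms annotated to four genes,
# plus gene-gene interaction edges.
edges = [
    ("DO:term_a", "g1"), ("DO:term_a", "g2"),
    ("HPO:term_b", "g2"), ("HPO:term_b", "g3"),
    ("HPO:term_c", "g4"),
    ("g1", "g2"), ("g2", "g3"), ("g3", "g4"),
]
network = {}
for u, v in edges:
    network.setdefault(u, []).append(v)
    network.setdefault(v, []).append(u)

scores = random_walk_with_restart(network, seed="DO:term_a")
```

    In this toy network, `scores["HPO:term_b"]` exceeds `scores["HPO:term_c"]` because term_b shares a gene with the seed term, while term_c is reachable only through a longer chain of gene-gene edges.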

  7. Optimal inference with suboptimal models: Addiction and active Bayesian inference

    Science.gov (United States)

    Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl

    2015-01-01

    When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321

  8. Loss of the flagellum happened only once in the fungal lineage: phylogenetic structure of Kingdom Fungi inferred from RNA polymerase II subunit genes

    Directory of Open Access Journals (Sweden)

    Hodson Matthew C

    2006-09-01

    Background: At present, there is not a widely accepted consensus view regarding the phylogenetic structure of kingdom Fungi although two major phyla, Ascomycota and Basidiomycota, are clearly delineated. Regarding the lower fungi, Zygomycota and Chytridiomycota, a variety of proposals have been advanced. Microsporidia may or may not be fungi; the Glomales (vesicular-arbuscular mycorrhizal fungi) may or may not constitute a fifth fungal phylum, and the loss of the flagellum may have occurred either once or multiple times during fungal evolution. All of these issues are capable of being resolved by a molecular phylogenetic analysis which achieves strong statistical support for major branches. To date, no fungal phylogeny based upon molecular characters has satisfied this criterion. Results: Using the translated amino acid sequences of the RPB1 and RPB2 genes, we have inferred a fungal phylogeny that consists largely of well-supported monophyletic phyla. Our major results, each with significant statistical support, are: (1) Microsporidia are sister to kingdom Fungi and are not members of Zygomycota; that is, Microsporidia and fungi originated from a common ancestor. (2) Chytridiomycota, the only fungal phylum having a developmental stage with a flagellum, is paraphyletic and is the basal lineage. (3) Zygomycota is monophyletic based upon sampling of Trichomycetes, Zygomycetes, and Glomales. (4) Zygomycota, Basidiomycota, and Ascomycota form a monophyletic group separate from Chytridiomycota. (5) Basidiomycota and Ascomycota are monophyletic sister groups. Conclusion: In general, this paper highlights the evolutionary position and significance of the lower fungi (Zygomycota and Chytridiomycota). Our results suggest that loss of the flagellum happened only once during early stages of fungal evolution; consequently, the majority of fungi, unlike plants and animals, are nonflagellated. The phylogeny we infer from gene sequences is the first one that is

  9. Loss of the flagellum happened only once in the fungal lineage: phylogenetic structure of kingdom Fungi inferred from RNA polymerase II subunit genes.

    Science.gov (United States)

    Liu, Yajuan J; Hodson, Matthew C; Hall, Benjamin D

    2006-09-29

    At present, there is not a widely accepted consensus view regarding the phylogenetic structure of kingdom Fungi although two major phyla, Ascomycota and Basidiomycota, are clearly delineated. Regarding the lower fungi, Zygomycota and Chytridiomycota, a variety of proposals have been advanced. Microsporidia may or may not be fungi; the Glomales (vesicular-arbuscular mycorrhizal fungi) may or may not constitute a fifth fungal phylum, and the loss of the flagellum may have occurred either once or multiple times during fungal evolution. All of these issues are capable of being resolved by a molecular phylogenetic analysis which achieves strong statistical support for major branches. To date, no fungal phylogeny based upon molecular characters has satisfied this criterion. Using the translated amino acid sequences of the RPB1 and RPB2 genes, we have inferred a fungal phylogeny that consists largely of well-supported monophyletic phyla. Our major results, each with significant statistical support, are: (1) Microsporidia are sister to kingdom Fungi and are not members of Zygomycota; that is, Microsporidia and fungi originated from a common ancestor. (2) Chytridiomycota, the only fungal phylum having a developmental stage with a flagellum, is paraphyletic and is the basal lineage. (3) Zygomycota is monophyletic based upon sampling of Trichomycetes, Zygomycetes, and Glomales. (4) Zygomycota, Basidiomycota, and Ascomycota form a monophyletic group separate from Chytridiomycota. (5) Basidiomycota and Ascomycota are monophyletic sister groups. In general, this paper highlights the evolutionary position and significance of the lower fungi (Zygomycota and Chytridiomycota). Our results suggest that loss of the flagellum happened only once during early stages of fungal evolution; consequently, the majority of fungi, unlike plants and animals, are nonflagellated. The phylogeny we infer from gene sequences is the first one that is congruent with the widely accepted morphology

  10. Bayesian inference in mass flow simulations - from back calculation to prediction

    Science.gov (United States)

    Kofler, Andreas; Fischer, Jan-Thomas; Hellweger, Valentin; Huber, Andreas; Mergili, Martin; Pudasaini, Shiva; Fellin, Wolfgang; Oberguggenberger, Michael

    2017-04-01

    Mass flow simulations are an integral part of hazard assessment. Determining the hazard potential requires a multidisciplinary approach, including different scientific fields such as geomorphology, meteorology, physics, civil engineering and mathematics. An important task in snow avalanche simulation is to predict process intensities (runout, flow velocity and depth, ...). The application of probabilistic methods allows one to develop a comprehensive simulation concept, ranging from back to forward calculation and finally to prediction of mass flow events. In this context optimized parameter sets for the used simulation model or intensities of the modeled mass flow process (e.g. runout distances) are represented by probability distributions. Existing deterministic flow models, in particular with respect to snow avalanche dynamics, contain several parameters (e.g. friction). Some of these parameters are more conceptual than physical and their direct measurement in the field is hardly possible. Hence, parameters have to be optimized by matching simulation results to field observations. This inverse problem can be solved by a Bayesian approach (Markov chain Monte Carlo). The optimization process yields parameter distributions, that can be utilized for probabilistic reconstruction and prediction of avalanche events. Arising challenges include the limited amount of observations, correlations appearing in model parameters or observed avalanche characteristics (e.g. velocity and runout) and the accurate handling of ensemble simulations, always taking into account the related uncertainties. Here we present an operational Bayesian simulation framework with r.avaflow, the open source GIS simulation model for granular avalanches and debris flows.
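
    The back-calculation step described above (optimizing a friction-like parameter by matching simulation output to field observations with Markov chain Monte Carlo) can be sketched as follows. This is a toy illustration only: the forward model `runout = drop_height / mu` is an assumed sliding-block stand-in, not r.avaflow, and the observations, noise level, and prior bounds are hypothetical.

```python
import math
import random


def log_likelihood(mu, drop_height, observed_runouts, sigma):
    """Gaussian log-likelihood of observed runouts under the toy model L = h / mu."""
    predicted = drop_height / mu
    return -sum((obs - predicted) ** 2 for obs in observed_runouts) / (2.0 * sigma ** 2)


def metropolis(observed_runouts, drop_height, sigma, n_steps=20000, step=0.01,
               bounds=(0.05, 1.0), start=0.5, seed=0):
    """Random-walk Metropolis sampler with a uniform prior on `bounds`."""
    rng = random.Random(seed)
    mu = start
    current = log_likelihood(mu, drop_height, observed_runouts, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)
        # Proposals outside the prior support are rejected outright.
        if bounds[0] < proposal < bounds[1]:
            candidate = log_likelihood(proposal, drop_height, observed_runouts, sigma)
            # Accept with probability min(1, likelihood ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                mu, current = proposal, candidate
        samples.append(mu)
    return samples


observations = [390.0, 405.0, 410.0, 395.0, 400.0]  # hypothetical runouts in metres
samples = metropolis(observations, drop_height=100.0, sigma=20.0)
posterior = samples[2000:]                          # discard burn-in
posterior_mean = sum(posterior) / len(posterior)
```

    The retained samples approximate the posterior distribution of the friction parameter; feeding them back through the forward model would give a predictive distribution of runout rather than a single value.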

  11. Large scale statistical inference of signaling pathways from RNAi and microarray data

    Directory of Open Access Journals (Sweden)

    Poustka Annemarie

    2007-10-01

    Background: The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway. Results: In this paper we address this challenging problem by extending previous work by Markowetz et al., who proposed a statistical framework to score network hypotheses in a Bayesian manner. Our extensions go in three directions: First, we introduce a way to omit the data discretization step needed in the original framework via a calculation based on p-values instead. Second, we show how prior assumptions on the network structure can be incorporated into the scoring scheme using regularization techniques. Third and most important, we propose methods to scale up the original approach, which is limited to around 5 genes, to large scale networks. Conclusion: Comparisons of these methods on artificial data are conducted. Our proposed module network is employed to infer the signaling network between 13 genes in the ER-α pathway in human MCF-7 breast cancer cells. Using a bootstrapping approach this reconstruction can be found with good statistical stability. The code for the module network inference method is available in the latest version of the R-package nem, which can be obtained from the Bioconductor homepage.

  12. Dispersal and gene flow in the rare, parasitic Large Blue butterfly Maculinea arion

    DEFF Research Database (Denmark)

    Ugelvig, Line Vej; Andersen, Anne; Boomsma, Jacobus Jan

    2012-01-01

    decline throughout Europe and extinction in Britain followed by reintroduction of a seed population from the Swedish island of Öland. We find that populations are highly structured genetically, but that gene flow occurs over distances 15 times longer than the maximum distance recorded from mark...

  13. Inference rule and problem solving

    Energy Technology Data Exchange (ETDEWEB)

    Goto, S

    1982-04-01

    Intelligent information processing offers an opportunity to have human intellectual activity executed on a computer, with inference, rather than ordinary calculation, as the basic operational mechanism. Many inference rules are derived from syllogisms in formal logic. The problem of programming this inference function is referred to as problem solving. Although inference and problem solving are logically closely related, the computational ability of current computers remains at a low level for inference. To clarify the relation between inference and computers, nonmonotonic logic has been considered. The paper deals with the above topics. 16 references.

  14. Generating Models of Infinite-State Communication Protocols Using Regular Inference with Abstraction

    Science.gov (United States)

    Aarts, Fides; Jonsson, Bengt; Uijen, Johan

    In order to facilitate model-based verification and validation, effort is underway to develop techniques for generating models of communication system components from observations of their external behavior. Most previous such work has employed regular inference techniques which generate modest-size finite-state models. They typically suppress parameters of messages, although these have a significant impact on control flow in many communication protocols. We present a framework, which adapts regular inference to include data parameters in messages and states for generating components with large or infinite message alphabets. A main idea is to adapt the framework of predicate abstraction, successfully used in formal verification. Since we are in a black-box setting, the abstraction must be supplied externally, using information about how the component manages data parameters. We have implemented our techniques by connecting the LearnLib tool for regular inference with the protocol simulator ns-2, and generated a model of the SIP component as implemented in ns-2.

  15. Knowledge and inference

    CERN Document Server

    Nagao, Makoto

    1990-01-01

    Knowledge and Inference discusses an important problem for software systems: How do we treat knowledge and ideas on a computer and how do we use inference to solve problems on a computer? The book talks about the problems of knowledge and inference for the purpose of merging artificial intelligence and library science. The book begins by clarifying the concept of "knowledge" from many points of view, followed by a chapter on the current state of library science and the place of artificial intelligence in library science. Subsequent chapters cover central topics in the artificial intellig

  16. Use of Gene Expression Programming in regionalization of flow duration curve

    Science.gov (United States)

    Hashmi, Muhammad Z.; Shamseldin, Asaad Y.

    2014-06-01

    In this paper, a recently introduced artificial intelligence technique known as Gene Expression Programming (GEP) has been employed to perform symbolic regression for developing a parametric scheme of flow duration curve (FDC) regionalization, to relate selected FDC characteristics to catchment characteristics. Stream flow records of selected catchments located in the Auckland Region of New Zealand were used. FDCs of the selected catchments were normalised by dividing the ordinates by their median value. Input for the symbolic regression analysis using GEP was (a) selected characteristics of normalised FDCs; and (b) 26 catchment characteristics related to climate, morphology, soil properties and land cover properties obtained using the observed data and GIS analysis. Our study showed that application of this artificial intelligence technique expedites the selection of a set of the most relevant independent variables out of a large set, because these are automatically selected through the GEP process. Values of the FDC characteristics obtained from the developed relationships have high correlations with the observed values.
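
    The normalisation step mentioned above (dividing the ordinates of a flow duration curve by the median flow) can be illustrated with a short sketch. The Weibull plotting position used for the exceedance probability and the sample flows are assumptions; the abstract does not state which plotting position the authors used.

```python
from statistics import median


def normalised_fdc(flows):
    """Return (exceedance probability, flow / median flow) pairs, highest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    mid = median(ranked)
    # Weibull plotting position: rank / (n + 1), with rank 1 for the largest flow.
    return [(rank / (n + 1), q / mid) for rank, q in enumerate(ranked, start=1)]


daily_flows = [10.0, 2.0, 4.0, 8.0, 6.0]  # hypothetical flows in m^3/s
curve = normalised_fdc(daily_flows)
```

    After normalisation the curve passes through 1.0 at the median flow, which makes curves from catchments of different sizes directly comparable before any regression on catchment characteristics.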

  17. Extensive dispersal of Roanoke logperch (Percina rex) inferred from genetic marker data

    Science.gov (United States)

    Roberts, James H.; Angermeier, Paul; Hallerman, Eric M.

    2016-01-01

    The dispersal ecology of most stream fishes is poorly characterised, complicating conservation efforts for these species. We used microsatellite DNA marker data to characterise dispersal patterns and effective population size (Ne) for a population of Roanoke logperch Percina rex, an endangered darter (Percidae). Juveniles and candidate parents were sampled for 2 years at sites throughout the Roanoke River watershed. Dispersal was inferred via genetic assignment tests (ATs), pedigree reconstruction (PR) and estimation of lifetime dispersal distance under a genetic isolation-by-distance model. Estimates of Ne varied from 105 to 1218 individuals, depending on the estimation method. Based on PR, polygamy was frequent in parents of both sexes, with individuals spawning with an average of 2.4 mates. The sample contained 61 half-sibling pairs, but only one parent–offspring pair and no full-sib pairs, which limited our ability to discriminate natal dispersal of juveniles from breeding dispersal of their parents between spawning events. Nonetheless, all methods indicated extensive dispersal. The AT indicated unrestricted dispersal among sites ≤15 km apart, while siblings inferred by the PR were captured an average of 14 km and up to 55 km apart. Model-based estimates of median lifetime dispersal distance (6–24 km, depending on assumptions) bracketed AT and PR estimates, indicating that widely dispersed individuals do, on average, contribute to gene flow. Extensive dispersal of P. rex suggests that darters and other small benthic stream fishes may be unexpectedly mobile. Monitoring and management activities for such populations should encompass entire watersheds to fully capture population dynamics.

  18. Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data

    Directory of Open Access Journals (Sweden)

    de los Reyes Benildo G

    2008-04-01

    Background: Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network (TRN) with strong similarity to the structure of the underlying genetic regulatory modules. Decomposing the TRN into a small set of recurring regulatory patterns, called network motifs (NM), facilitates the inference. Identifying NMs defined by specific transcription factors (TF) establishes the framework structure of a TRN and allows the inference of TF-target gene relationship. This paper introduces a computational framework for utilizing data from multiple sources to infer TF-target gene relationships on the basis of NMs. The data include time course gene expression profiles, genome-wide location analysis data, binding sequence data, and gene ontology (GO) information. Results: The proposed computational framework was tested using gene expression data associated with cell cycle progression in yeast. Among 800 cell cycle related genes, 85 were identified as candidate TFs and classified into four previously defined NMs. The NMs for a subset of TFs are obtained from literature. Support vector machine (SVM) classifiers were used to estimate NMs for the remaining TFs. The potential downstream target genes for the TFs were clustered into 34 biologically significant groups. The relationships between TFs and potential target gene clusters were examined by training recurrent neural networks whose topologies mimic the NMs to which the TFs are classified. The identified relationships between TFs and gene clusters were evaluated using the following biological validation and statistical analyses: (1) Gene set enrichment

  19. Geometric statistical inference

    International Nuclear Information System (INIS)

    Periwal, Vipul

    1999-01-01

    A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined

  20. Phylogenetic position of Loricifera inferred from nearly complete 18S and 28S rRNA gene sequences.

    Science.gov (United States)

    Yamasaki, Hiroshi; Fujimoto, Shinta; Miyazaki, Katsumi

    2015-01-01

    Loricifera is an enigmatic metazoan phylum; its morphology appeared to place it with Priapulida and Kinorhyncha in the group Scalidophora which, along with Nematoida (Nematoda and Nematomorpha), comprised the group Cycloneuralia. Scarce molecular data have suggested an alternative phylogenetic hypothesis, that the phylum Loricifera is a sister taxon to Nematomorpha, although the actual phylogenetic position of the phylum remains unclear. Ecdysozoan phylogeny was reconstructed through maximum-likelihood (ML) and Bayesian inference (BI) analyses of nuclear 18S and 28S rRNA gene sequences from 60 species representing all eight ecdysozoan phyla, and including a newly collected loriciferan species. Ecdysozoa comprised two clades with high support values in both the ML and BI trees. One consisted of Priapulida and Kinorhyncha, and the other of Loricifera, Nematoida, and Panarthropoda (Tardigrada, Onychophora, and Arthropoda). The relationships between Loricifera, Nematoida, and Panarthropoda were not well resolved. Loricifera appears to be closely related to Nematoida and Panarthropoda, rather than grouping with Priapulida and Kinorhyncha, as had been suggested by previous studies. Thus, both Scalidophora and Cycloneuralia are polyphyletic or paraphyletic groups. In addition, Loricifera and Nematomorpha did not emerge as sister groups.

  1. Bayesian Inference of Forces Causing Cytoplasmic Streaming in Caenorhabditis elegans Embryos and Mouse Oocytes.

    Science.gov (United States)

    Niwayama, Ritsuya; Nagao, Hiromichi; Kitajima, Tomoya S; Hufnagel, Lars; Shinohara, Kyosuke; Higuchi, Tomoyuki; Ishikawa, Takuji; Kimura, Akatsuki

    2016-01-01

    Cellular structures are hydrodynamically interconnected, such that force generation in one location can move distal structures. One example of this phenomenon is cytoplasmic streaming, whereby active forces at the cell cortex induce streaming of the entire cytoplasm. However, it is not known how the spatial distribution and magnitude of these forces move distant objects within the cell. To address this issue, we developed a computational method that used cytoplasm hydrodynamics to infer the spatial distribution of shear stress at the cell cortex induced by active force generators from the experimentally obtained flow field of cytoplasmic streaming. By applying this method, we determined the shear-stress distribution that quantitatively reproduces in vivo flow fields in Caenorhabditis elegans embryos and mouse oocytes during meiosis II. Shear stress in mouse oocytes was predicted to localize to a narrower cortical region than that with a high cortical flow velocity and corresponded with the localization of the cortical actin cap. The predicted patterns of pressure gradient in both species were consistent with species-specific cytoplasmic streaming functions. The shear-stress distribution inferred by our method can contribute to the characterization of active force generation driving biological streaming.

  2. Alignment-free genome tree inference by learning group-specific distance metrics.

    Science.gov (United States)

    Patil, Kaustubh R; McHardy, Alice C

    2013-01-01

    Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, which are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods--9 alignment-free and 1 alignment-based.
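
    The distance at the core of the method described above is a Mahalanobis metric over genome signatures. A minimal sketch of evaluating such a distance for a given metric matrix follows; the learning of the matrix from a reference taxonomy is not shown, and the two-dimensional signatures and the `reweighted` matrix are hypothetical.

```python
import math


def mahalanobis(x, y, metric):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M (x - y), one row at a time.
    projected = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in metric]
    return math.sqrt(sum(d_i * p_i for d_i, p_i in zip(diff, projected)))


identity = [[1.0, 0.0], [0.0, 1.0]]      # reduces to the Euclidean distance
reweighted = [[4.0, 0.0], [0.0, 1.0]]    # hypothetical learned metric
sig_a, sig_b = [3.0, 4.0], [0.0, 0.0]    # hypothetical 2-D genome signatures

euclidean = mahalanobis(sig_a, sig_b, identity)
learned = mahalanobis(sig_a, sig_b, reweighted)
```

    With the identity matrix the distance is the ordinary Euclidean one; a learned matrix stretches the signature dimensions that best separate taxa within a group and shrinks the others.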

  3. PlantTribes: a gene and gene family resource for comparative genomics in plants

    OpenAIRE

    Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.

    2007-01-01

    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, ca...

  4. Convergent evolution of gene networks by single-gene duplications in higher eukaryotes

    OpenAIRE

    Amoutzias, Gregory D; Robertson, David L; Oliver, Stephen G; Bornberg-Bauer, Erich

    2004-01-01

    By combining phylogenetic, proteomic and structural information, we have elucidated the evolutionary driving forces for the gene-regulatory interaction networks of basic helix–loop–helix transcription factors. We infer that recurrent events of single-gene duplication and domain rearrangement repeatedly gave rise to distinct networks with almost identical hub-based topologies, and multiple activators and repressors. We thus provide the first empirical evidence for scale-free protein networks e...

  5. Genetic population structure of the desert shrub species Lycium ruthenicum inferred from chloroplast DNA

    International Nuclear Information System (INIS)

    Chen, H.; Yonezawa, T.

    2014-01-01

    Lycium ruthenicum (Solanaceae), a spiny shrub mostly distributed in the desert regions of north and northwest China, has been shown to exhibit high tolerance to the extreme environment. In this study, the phylogeography and evolutionary history of L. ruthenicum were examined, on the basis of 80 individuals from eight populations. Using the sequence variations of two spacer regions of chloroplast DNA (trnH-psbA and rps16-trnK), the absence of a geographic component in the chloroplast DNA genetic structure was identified (GST = 0.351, NST = 0.304, NST < GST), which was consistent with the result of SAMOVA, suggesting weak phylogeographic structure of this species. Phylogenetic and network analyses showed that a total of 10 haplotypes identified in the present study clustered into two clades, in which clade I harbored the ancestral haplotypes that inferred two independent glacial refugia in the middle of Qaidam Basin and the western Inner Mongolia. The existence of regional evolutionary differences was supported by GENETREE, which revealed that one of the populations in Qaidam Basin and the two populations in Tarim Basin had experienced rapid expansion, and the other populations retained relatively stable population size during the Pleistocene. Given the results of long-term gene flow and pairwise differences, strong gene flow was insufficient to reduce the genetic differentiation among populations or within populations, probably due to the genetic composition containing a common haplotype and the high number of private haplotypes fixed for most of the population. The divergence times of different lineages were consistent with the rapid uplift phases of the Qinghai-Tibetan Plateau and the initiation and expansion of deserts in northern China, suggesting that the origin and evolution of L. ruthenicum were strongly influenced by Quaternary environment changes. (author)

  6. Improving the extraction of complex regulatory events from scientific text by using ontology-based inference.

    Science.gov (United States)

    Kim, Jung-Jae; Rebholz-Schuhmann, Dietrich

    2011-10-06

    The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.
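
    The idea of deducing implicit events from explicitly stated ones can be illustrated with a minimal forward-chaining sketch. The rule used here (chaining two regulation events into an indirect regulation event), the `Event` type, and the entity names `TF_A`, `gene_B`, and `gene_C` are all hypothetical placeholders; the abstract does not describe the system's actual rules or data model.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """A regulation event: `agent` acts on `target` (hypothetical schema)."""
    relation: str
    agent: str
    target: str


def infer_implicit_events(explicit):
    """Apply one illustrative inference rule until no new events appear.

    Rule (an assumption for this sketch): if X regulates Y and Y regulates Z,
    then X indirectly regulates Z.
    Returns only the newly inferred events.
    """
    known = set(explicit)
    changed = True
    while changed:
        changed = False
        for first in list(known):
            for second in list(known):
                # Chain events whose target/agent match, skipping self-loops.
                if first.target == second.agent and first.agent != second.target:
                    inferred = Event("indirectly_regulates", first.agent, second.target)
                    if inferred not in known:
                        known.add(inferred)
                        changed = True
    return known - set(explicit)


# Made-up explicit events, standing in for events extracted from text.
explicit_events = [
    Event("regulates", "TF_A", "gene_B"),
    Event("regulates", "gene_B", "gene_C"),
]
implicit_events = infer_implicit_events(explicit_events)
print(sorted(implicit_events))
```

    Running the rule to a fixpoint means chains of any length are covered without the caller having to decide how many passes are needed.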

  7. Improving the extraction of complex regulatory events from scientific text by using ontology-based inference

    Directory of Open Access Journals (Sweden)

    Kim Jung-jae

    2011-10-01

    Background: The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. Results: We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Conclusions: Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.

  8. Goal inferences about robot behavior : goal inferences and human response behaviors

    NARCIS (Netherlands)

    Broers, H.A.T.; Ham, J.R.C.; Broeders, R.; De Silva, P.; Okada, M.

    2014-01-01

    This explorative research focused on the goal inferences human observers draw based on a robot's behavior, and the extent to which those inferences predict people's behavior in response to that robot. Results show that different robot behaviors cause different response behavior from people.

  9. The gene regulatory network for breast cancer: Integrated regulatory landscape of cancer hallmarks

    Directory of Open Access Journals (Sweden)

    Frank Emmert-Streib

    2014-02-01

    In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained from the application of the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. In order to elucidate the functional relevance of the inferred network, we perform a Gene Ontology (GO) analysis for its structural components. Our analysis reveals that the most significant GO terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information of interacting genes in the breast cancer network. We find that chromosome 21 is most coactive with other chromosomes. To our knowledge this is the first study investigating the genome-scale breast cancer network.

  10. Streamflow Forecasting Using Neuro-Fuzzy Inference System

    Science.gov (United States)

    Nanduri, U. V.; Swain, P. C.

    2005-12-01

    The prediction of flow into a reservoir is fundamental in water resources planning and management. The need for timely and accurate streamflow forecasting is widely recognized and emphasized by many in the water resources fraternity. Real-time forecasts of natural inflows to reservoirs are of particular interest for operation and scheduling. The physical system of the river basin that takes the rainfall as an input and produces the runoff is highly nonlinear, complicated and very difficult to fully comprehend. The system is influenced by a large number of factors and variables. The large spatial extent of the systems introduces uncertainty into the hydrologic information. A variety of methods have been proposed for forecasting reservoir inflows, including conceptual (physical) and empirical (statistical) models (WMO 1994), but none of them can be considered a uniquely superior model (Shamseldin 1997). Owing to the difficulties of formulating reasonable non-linear watershed models, recent attempts have resorted to the Neural Network (NN) approach for complex hydrologic modeling. In recent years the use of soft computing in the field of hydrological forecasting has been gaining ground. The relatively new soft computing technique of the Adaptive Neuro-Fuzzy Inference System (ANFIS), developed by Jang (1993), is able to take care of the non-linearity, uncertainty, and vagueness embedded in the system. It is a judicious combination of neural networks and fuzzy systems. It can learn and generalize highly nonlinear and uncertain phenomena due to the embedded neural network (NN). The NN is efficient in learning and generalization, and the fuzzy system mimics the cognitive capability of the human brain. Hence, ANFIS can learn the complicated processes involved in the basin and correlate the precipitation to the corresponding discharge. In the present study, one step ahead forecasts are made for ten-daily flows, which are mostly required for short term operational planning of multipurpose reservoirs. A

  11. Convergent evolution of gene networks by single-gene duplications in higher eukaryotes.

    Science.gov (United States)

    Amoutzias, Gregory D; Robertson, David L; Oliver, Stephen G; Bornberg-Bauer, Erich

    2004-03-01

    By combining phylogenetic, proteomic and structural information, we have elucidated the evolutionary driving forces for the gene-regulatory interaction networks of basic helix-loop-helix transcription factors. We infer that recurrent events of single-gene duplication and domain rearrangement repeatedly gave rise to distinct networks with almost identical hub-based topologies, and multiple activators and repressors. We thus provide the first empirical evidence for scale-free protein networks emerging through single-gene duplications, the dominant importance of molecular modularity in the bottom-up construction of complex biological entities, and the convergent evolution of networks.

  12. Gene Flow Results in High Genetic Similarity Between Sibiraea (Rosaceae) species in the Qinghai-Tibetan Plateau

    Directory of Open Access Journals (Sweden)

    Peng-Cheng Fu

    2016-10-01

    Studying closely related species and divergent populations provides insight into the process of speciation. Previous studies showed that the evolutionary history of the Sibiraea complex on the Qinghai-Tibetan Plateau (QTP) was confusing and that its species could not be distinguished at the molecular level. In this study, the genetic structure and gene flow of S. laevigata and S. angustata on the QTP were examined across 45 populations using 8 microsatellite loci. Microsatellites revealed high genetic diversity in Sibiraea populations. Most of the variance was detected within populations (87.45%) rather than between species (4.39%). We found no significant correlations between genetic and geographical distances among populations. Bayesian cluster analysis grouped all individuals in the sympatric area of Sibiraea into one cluster and other individuals of S. angustata into another. Divergence history analysis based on the approximate Bayesian computation method indicated that the populations of S. angustata in the sympatric area derived from the admixture of the 2 species. The assignment test assigned all individuals to populations of their own species rather than to the congeneric species. Consistently, intraspecific rather than interspecific first-generation migrants were detected. The long-term bidirectional gene flow between the 2 species was asymmetric, with more from S. angustata to S. laevigata. In conclusion, the Sibiraea complex was distinguishable at the molecular level using microsatellite loci. We found that the high genetic similarity of these 2 species resulted from extensive bidirectional gene flow, especially in the sympatric area where population admixture between the species occurred.

  13. A spatial assessment of Brassica napus gene flow potential to wild and weedy relatives in the Fynbos Biome

    Directory of Open Access Journals (Sweden)

    J. M. Kalwij

    2010-01-01

    Gene flow between related plant species, and between transgenic and non-transgenic crop varieties, may be considered a form of biological invasion. Brassica napus (oilseed rape or canola) and its relatives are well known for intra- and inter-specific gene flow, hybridisation and weediness. Gene flow associated with B. napus poses a potential ecological risk in the Fynbos Biome of South Africa, because of the existence of both naturalised (alien, weedy) and native relatives in this region. This risk is particularly pertinent given the proposed use of B. napus for biofuel and the potential future introduction of herbicide-tolerant transgenic B. napus. Here we quantify the presence and co-occurrence of B. napus and its wild and weedy relatives in the Fynbos Biome, as a first step in the ecological risk assessment for this crop. Several alien and at least one native relative of B. napus were found to be prevalent in the region, and to be spatially congruent with B. napus fields. The first requirement for potential gene flow to occur has thus been met. In addition, a number of these species have elsewhere been found to be reproductively compatible with B. napus. Further assessment of the potential ecological risks associated with B. napus in South Africa is constrained by uncertainties in the phylogeny of the Brassicaceae, difficulties with morphology-based identification, and poor knowledge of the biology of several of the species involved, particularly under South African conditions.

  14. AD-LIBS: inferring ancestry across hybrid genomes using low-coverage sequence data.

    Science.gov (United States)

    Schaefer, Nathan K; Shapiro, Beth; Green, Richard E

    2017-04-04

    Inferring the ancestry of each region of admixed individuals' genomes is useful in studies ranging from disease gene mapping to speciation genetics. Current methods require high-coverage genotype data and phased reference panels, and are therefore inappropriate for many data sets. We present a software application, AD-LIBS, that uses a hidden Markov model to infer ancestry across hybrid genomes without requiring variant calling or phasing. This approach is useful for non-model organisms and in cases of low-coverage data, such as ancient DNA. We demonstrate the utility of AD-LIBS with synthetic data. We then use AD-LIBS to infer ancestry in two published data sets: European human genomes with Neanderthal ancestry and brown bear genomes with polar bear ancestry. AD-LIBS correctly infers 87-91% of ancestry in simulations and produces ancestry maps that agree with published results and global ancestry estimates in humans. In brown bears, we find more polar bear ancestry than has been published previously, using both AD-LIBS and an existing software application for local ancestry inference, HAPMIX. We validate AD-LIBS polar bear ancestry maps by recovering a geographic signal within bears that mirrors what is seen in SNP data. Finally, we demonstrate that AD-LIBS is more effective than HAPMIX at inferring ancestry when preexisting phased reference data are unavailable and genomes are sequenced to low coverage. AD-LIBS is an effective tool for ancestry inference that can be used even when few individuals are available for comparison or when genomes are sequenced to low coverage. AD-LIBS is therefore likely to be useful in studies of non-model or ancient organisms that lack large amounts of genomic DNA. AD-LIBS can therefore expand the range of studies in which admixture mapping is a viable tool.
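
    The hidden Markov model idea mentioned above can be sketched with a generic Viterbi decoder over genomic windows. This is not AD-LIBS itself: the two ancestry states, the discretized per-window observations (`like1`, `like2`), and all probabilities below are illustrative assumptions chosen only to show how an HMM smooths noisy per-window evidence into contiguous ancestry tracts.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a discrete HMM."""
    # Work in log space to avoid numerical underflow on long sequences.
    scores = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        prev_scores = scores
        scores = {}
        pointers = {}
        for s in states:
            # Best predecessor state for ending in state s at this window.
            best_prev = max(
                states, key=lambda p: prev_scores[p] + math.log(trans_p[p][s])
            )
            scores[s] = (
                prev_scores[best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical parameters: ancestry rarely switches between adjacent windows,
# and each window's observation usually (not always) matches its ancestry.
STATES = ("pop1", "pop2")
START = {"pop1": 0.5, "pop2": 0.5}
TRANS = {"pop1": {"pop1": 0.9, "pop2": 0.1}, "pop2": {"pop1": 0.1, "pop2": 0.9}}
EMIT = {
    "pop1": {"like1": 0.8, "like2": 0.2},
    "pop2": {"like1": 0.2, "like2": 0.8},
}

windows = ["like1", "like1", "like1", "like2", "like2", "like2"]
print(viterbi(windows, STATES, START, TRANS, EMIT))
```

    With these made-up parameters a single discordant window inside a long run is absorbed into the surrounding tract, whereas a sustained change in the observations produces a switch in inferred ancestry.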

  15. Phylodynamic Inference with Kernel ABC and Its Application to HIV Epidemiology.

    Science.gov (United States)

    Poon, Art F Y

    2015-09-01

    The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this "kernel-ABC" method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
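
    The statement that ABC only needs the ability to simulate from the model can be made concrete with a minimal rejection-sampling sketch. Everything specific here is an assumption for illustration: the toy model is a binomial count rather than an epidemic, and the distance is a plain absolute difference standing in for the tree-kernel distance described in the abstract.

```python
import random


def simulate_successes(p, n_trials, rng):
    """Toy simulator: number of successes in n_trials Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n_trials))


def abc_rejection(observed, n_trials, n_draws, epsilon, seed=0):
    """Approximate the posterior of p without evaluating a likelihood.

    Draw p from a Uniform(0, 1) prior, simulate data, and keep p whenever
    the simulated summary lies within `epsilon` of the observed one.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # sample from the prior
        simulated = simulate_successes(p, n_trials, rng)
        if abs(simulated - observed) <= epsilon:  # stand-in distance measure
            accepted.append(p)
    return accepted


# Hypothetical data: 30 successes observed in 100 trials.
posterior = abc_rejection(observed=30, n_trials=100, n_draws=5000, epsilon=3)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

    Swapping the absolute difference for a richer distance between simulated and observed data is the only change needed to move from this toy case toward the kind of comparison the abstract describes; a smaller `epsilon` gives a tighter approximation at the cost of fewer accepted draws.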

  16. Entropic Inference

    OpenAIRE

    Caticha, Ariel

    2010-01-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME), includes as special cases both MaxEn...

  17. FUZZY INFERENCE SYSTEM MODELING FOR BED ACTIVE CARBON RE-GENERATION PROCESS (CO2 GAS FACTORY CASE)

    Directory of Open Access Journals (Sweden)

    S. Febriana

    2005-01-01

    Bed active carbon is one of the most important materials, with a great impact on the level of impurities in the production of CO2 gas. In this particular factory case, no standard heating duration, cooling duration, or steam flow rate is available for the re-generation process of bed active carbon. The paper discusses a fuzzy inference system for modeling the re-generation process of bed active carbon to find the optimum parameter settings. The fuzzy inference system was built using real historical daily processing data. After the validation process, surface plot analysis was performed to find the optimum setting. The resulting re-generation parameter setting is 9-10 hours of heating, 4.66-5.32 hours of cooling, and a steam flow rate of 1500-2500 kg/hr.

  18. Gene expression responses of HeLa cells to chemical species generated by an atmospheric plasma flow

    International Nuclear Information System (INIS)

    Yokoyama, Mayo; Johkura, Kohei; Sato, Takehiko

    2014-01-01

    Highlights: • Response of HeLa cells to a plasma-irradiated medium was revealed by DNA microarray. • Gene expression pattern was basically different from that in a H2O2-added medium. • Prominently up-/down-regulated genes were partly shared by the two media. • Gene ontology analysis showed both similar and different responses in the two media. • Candidate genes involved in response to ROS were detected in each medium. - Abstract: Plasma irradiation generates many factors able to affect the cellular condition, and this feature has been studied for its application in the field of medicine. We previously reported that hydrogen peroxide (H2O2) was the major cause of HeLa cell death among the chemical species generated by high level irradiation of a culture medium by atmospheric plasma. To assess the effect of plasma-induced factors on the response of live cells, HeLa cells were exposed to a medium irradiated by a non-lethal plasma flow level, and their gene expression was broadly analyzed by DNA microarray in comparison with that in a corresponding concentration of 51 μM H2O2. As a result, though the cell viability was sufficiently maintained at more than 90% in both cases, the plasma-medium had a greater impact on it than the H2O2-medium. Hierarchical clustering analysis revealed fundamentally different cellular responses between these two media. A larger population of genes was upregulated in the plasma-medium, whereas genes were downregulated in the H2O2-medium. However, a part of the genes that showed prominent differential expression was shared by them, including an immediate early gene ID2. In gene ontology analysis of upregulated genes, the plasma-medium showed more diverse ontologies than the H2O2-medium, whereas ontologies such as “response to stimulus” were common, and several genes corresponded to “response to reactive oxygen species.” Genes of AP-1 proteins, e.g., JUN and FOS, were detected and notably elevated in

  19. Gene expression responses of HeLa cells to chemical species generated by an atmospheric plasma flow

    Energy Technology Data Exchange (ETDEWEB)

    Yokoyama, Mayo, E-mail: yokoyama@plasma.ifs.tohoku.ac.jp [Institute of Fluid Science, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577 (Japan); Johkura, Kohei, E-mail: kohei@shinshu-u.ac.jp [Department of Histology and Embryology, Shinshu University School of Medicine, 3-1-1 Asahi, Matsumoto 390-8621 (Japan); Sato, Takehiko, E-mail: sato@ifs.tohoku.ac.jp [Institute of Fluid Science, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577 (Japan)

    2014-08-08

    Highlights: • Response of HeLa cells to a plasma-irradiated medium was revealed by DNA microarray. • Gene expression pattern was basically different from that in a H2O2-added medium. • Prominently up-/down-regulated genes were partly shared by the two media. • Gene ontology analysis showed both similar and different responses in the two media. • Candidate genes involved in response to ROS were detected in each medium. - Abstract: Plasma irradiation generates many factors able to affect the cellular condition, and this feature has been studied for its application in the field of medicine. We previously reported that hydrogen peroxide (H2O2) was the major cause of HeLa cell death among the chemical species generated by high level irradiation of a culture medium by atmospheric plasma. To assess the effect of plasma-induced factors on the response of live cells, HeLa cells were exposed to a medium irradiated by a non-lethal plasma flow level, and their gene expression was broadly analyzed by DNA microarray in comparison with that in a corresponding concentration of 51 μM H2O2. As a result, though the cell viability was sufficiently maintained at more than 90% in both cases, the plasma-medium had a greater impact on it than the H2O2-medium. Hierarchical clustering analysis revealed fundamentally different cellular responses between these two media. A larger population of genes was upregulated in the plasma-medium, whereas genes were downregulated in the H2O2-medium. However, a part of the genes that showed prominent differential expression was shared by them, including an immediate early gene ID2. In gene ontology analysis of upregulated genes, the plasma-medium showed more diverse ontologies than the H2O2-medium, whereas ontologies such as “response to stimulus” were common, and several genes corresponded to “response to reactive oxygen species.” Genes of AP-1 proteins, e.g., JUN

  20. Population genetics of Southern Hemisphere tope shark (Galeorhinus galeus): Intercontinental divergence and constrained gene flow at different geographical scales.

    Directory of Open Access Journals (Sweden)

    Aletta E Bester-van der Merwe

    The tope shark (Galeorhinus galeus Linnaeus, 1758) is a temperate, coastal hound shark found in the Atlantic and Indo-Pacific oceans. In this study, the population structure of Galeorhinus galeus was determined across the entire Southern Hemisphere, where the species is heavily targeted by commercial fisheries, as well as locally, along the South African coastline. Analysis was conducted on a total of 185 samples using 19 microsatellite markers and a 671 bp fragment of the NADH dehydrogenase subunit 2 (ND2) gene. Across the Southern Hemisphere, three geographically distinct clades were recovered, including one from South America (Argentina, Chile), one from Africa (all the South African collections) and an Australia-New Zealand clade. Nuclear data revealed significant population subdivisions (FST = 0.192 to 0.376, p<0.05), indicating limited gene flow for tope sharks across ocean basins. Marked population connectivity was however evident across the Indian Ocean based on Bayesian clustering analysis. More locally in South Africa, F-statistics and multivariate analysis supported moderate to high gene flow across the Atlantic/Indian Ocean boundary (FST = 0.035 to 0.044, p<0.05), with the exception of samples from Struisbaai and Port Elizabeth, which differed significantly from the rest. Discriminant and Bayesian clustering analysis indicated admixture in all sampling populations, decreasing from west to east, corroborating possible restriction to gene flow across regional oceanographic barriers. Mitochondrial sequence data recovered seven haplotypes (h = 0.216, π = 0.001) for South Africa, with one major haplotype shared by 87% of the individuals and at least one private haplotype for each sampling location except Port Elizabeth. As with many other coastal shark species with cosmopolitan distribution, this study confirms the lack of both historical dispersal and inter-oceanic gene flow while also implicating contemporary factors such as oceanic currents and

  1. Bayesian quantification of thermodynamic uncertainties in dense gas flows

    International Nuclear Information System (INIS)

    Merle, X.; Cinnella, P.

    2015-01-01

    A Bayesian inference methodology is developed for calibrating complex equations of state used in numerical fluid flow solvers. Specifically, the input parameters of three equations of state commonly used for modeling the thermodynamic behavior of so-called dense gas flows (i.e. flows of gases characterized by high molecular weights and complex molecules, working in thermodynamic conditions close to the liquid–vapor saturation curve) are calibrated by means of Bayesian inference from reference aerodynamic data for a dense gas flow over a wing section. Flow thermodynamic conditions are such that the gas thermodynamic behavior strongly deviates from that of a perfect gas. To assess the proposed methodology, synthetic calibration data (specifically, wall pressure data) are generated by running the numerical solver with a more complex and accurate thermodynamic model. The statistical model used to build the likelihood function includes a model-form inadequacy term, accounting for the gap between the model output associated with the best-fit parameters and the true phenomenon. Results show that, for all of the relatively simple models under investigation, calibrations lead to informative posterior probability density distributions of the input parameters and improve the predictive distribution significantly. Nevertheless, calibrated parameters strongly differ from their expected physical values. The relationship between this behavior and model-form inadequacy is discussed. - Highlights: • Development of a Bayesian inference procedure for calibrating dense-gas flow solvers. • Complex thermodynamic models calibrated by using aerodynamic data for the flow. • Preliminary Sobol analysis used to reduce parameter space. • Piecewise polynomial surrogate model constructed to reduce computational cost. • Calibration results show the crucial role played by model-form inadequacies

  2. DNA context represents transcription regulation of the gene in mouse embryonic stem cells

    Science.gov (United States)

    Ha, Misook; Hong, Soondo

    2016-04-01

    Understanding gene regulatory information in DNA remains a significant challenge in biomedical research. This study presents a computational approach to infer gene regulatory programs from primary DNA sequences. Using DNA around transcription start sites (TSS) as attributes, our model predicts the regulation of the corresponding gene. We find that H3K27ac around the TSS is an informative descriptor of the transcription program in mouse embryonic stem cells (mESC). We build a computational model inferring the cell-type-specific H3K27ac signatures in the DNA around the TSS. A comparison of embryonic stem cell and liver cell-specific H3K27ac signatures in DNA shows that the H3K27ac signatures in DNA around the TSS efficiently distinguish the cell-type-specific H3K27ac peaks and the gene regulation. The arrangement of the H3K27ac signatures inferred from the DNA represents the transcription regulation of the gene in mESC. We show that the DNA around transcription start sites is associated with the gene regulatory program by specific interaction with H3K27ac.

  3. Comparative inference of duplicated genes produced by polyploidization in soybean genome.

    Science.gov (United States)

    Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

    2013-01-01

    Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. It is important to investigate the soybean genome for its economic and scientific value. Polyploidy is a widespread and recurrent phenomenon during plant evolution, and it can generate large numbers of duplicated genes, which are an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, duplicated genes by whole genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that the whole genome duplication event occurred more than once in the genome evolution of soybean, which is often distributed near the ends of chromosomes.

  4. Learning Convex Inference of Marginals

    OpenAIRE

    Domke, Justin

    2012-01-01

    Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main ...

  5. A Regulatory Network Analysis of Orphan Genes in Arabidopsis Thaliana

    Science.gov (United States)

    Singh, Pramesh; Chen, Tianlong; Arendsee, Zebulun; Wurtele, Eve S.; Bassler, Kevin E.

    Orphan genes, which are genes unique to each particular species, have recently drawn significant attention for their potential usefulness for organismal robustness. Their origin and regulatory interaction patterns remain largely undiscovered. Recently, methods that use the context likelihood of relatedness to infer a network followed by modularity maximizing community detection algorithms on the inferred network to find the functional structure of regulatory networks were shown to be effective. We apply improved versions of these methods to gene expression data from Arabidopsis thaliana, identify groups (clusters) of interacting genes with related patterns of expression and analyze the structure within those groups. Focusing on clusters that contain orphan genes, we compare the identified clusters to gene ontology (GO) terms, regulons, and pathway designations and analyze their hierarchical structure. We predict new regulatory interactions and unravel the structure of the regulatory interaction patterns of orphan genes. Work supported by the NSF through Grants DMR-1507371 and IOS-1546858.

  6. Phase Inversion: Inferring Solar Subphotospheric Flow and Other Asphericity from the Distortion of Acoustic Waves

    Science.gov (United States)

    Gough, Douglas; Merryfield, William J.; Toomre, Juri

    1998-01-01

    A method is proposed for analyzing an almost monochromatic train of waves propagating in a single direction in an inhomogeneous medium that is not otherwise changing in time. An effective phase is defined in terms of the Hilbert transform of the wave function, which is related, via the JWKB approximation, to the spatial variation of the background state against which the wave is propagating. The contaminating effect of interference between the truly monochromatic components of the train is eliminated using its propagation properties. Measurement errors, provided they are uncorrelated, are manifest as rapidly varying noise; although that noise can dominate the raw phase-processed signal, it can largely be removed by low-pass filtering. The intended purpose of the analysis is to determine the distortion of solar oscillations induced by horizontal structural variation and material flow. It should be possible to apply the method directly to sectoral modes. The horizontal phase distortion provides a measure of longitudinally averaged properties of the Sun in the vicinity of the equator, averaged also in radius down to the depth to which the modes penetrate. By combining such averages from different modes, the two-dimensional variation can be inferred by standard inversion techniques. After taking due account of horizontal refraction, it should be possible to apply the technique also to locally sectoral modes that propagate obliquely to the equator and thereby build a network of lateral averages at each radius, from which the full three-dimensional structure of the Sun can, in principle, be determined as an inverse Radon transform.
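
    One common way to obtain a phase from the Hilbert transform is through the analytic signal, whose argument gives a local phase at each sample. The sketch below takes that route with a small direct DFT in pure Python; whether this matches the paper's exact definition of the effective phase is an assumption, and the function names and test signal are hypothetical.

```python
import cmath
import math


def analytic_signal(samples):
    """Return samples + i * Hilbert(samples), computed with a direct DFT.

    The direct DFT is O(n^2), which is fine for a short illustrative signal.
    """
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep the zero-frequency term, double the positive frequencies, and
    # zero out the negative frequencies (upper half of the spectrum).
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0  # Nyquist term is kept once for even n
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    filtered = [w * c for w, c in zip(weights, spectrum)]
    return [
        sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def effective_phase(samples):
    """Phase (in radians, wrapped to (-pi, pi]) of the analytic signal."""
    return [cmath.phase(z) for z in analytic_signal(samples)]


# Hypothetical test signal: a pure wave with 3 cycles and a 0.5 rad offset.
N_SAMPLES, CYCLES, OFFSET = 32, 3, 0.5
wave = [math.cos(2 * math.pi * CYCLES * t / N_SAMPLES + OFFSET) for t in range(N_SAMPLES)]
phases = effective_phase(wave)
print(round(phases[0], 6))
```

    For a perfectly monochromatic wave the recovered phase advances linearly; departures from that linear trend are the kind of distortion the abstract relates to variations in the background state.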

  7. Hypersonic rarefied-flow aerodynamics inferred from Shuttle Orbiter acceleration measurements

    Science.gov (United States)

    Blanchard, R. C.; Hinson, E. W.

    1989-01-01

    Data obtained from multiple flights of sensitive accelerometers on the Space Shuttle Orbiter during reentry have been used to develop an improved aerodynamic model for the Orbiter normal- and axial-force coefficients in hypersonic rarefied flow. The lack of simultaneous atmospheric density measurements was overcome in part by using the ratio of normal-to-axial acceleration, in which density cancels, as a constraint. Differences between the preflight model and the flight-acceleration-derived model in the continuum regime are attributed primarily to real gas effects. New insights are gained into the variation of the force coefficients in the transition between the continuum regime and free molecule flow.
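    The reason density cancels in the normal-to-axial ratio follows from the standard definition of a force coefficient. The sketch below assumes each acceleration is a = 0.5 * rho * V^2 * S * C / m; the function names and all numerical values are hypothetical and chosen only to show the cancellation.

```python
def aero_acceleration(density, speed, ref_area, mass, coefficient):
    """Acceleration produced by one aerodynamic force component,
    using the usual definition a = 0.5 * rho * V^2 * S * C / m."""
    return 0.5 * density * speed ** 2 * ref_area * coefficient / mass


def normal_to_axial_ratio(accel_normal, accel_axial):
    """Ratio of measured accelerations; equals C_N / C_A because
    density, speed, area and mass are common to both components."""
    return accel_normal / accel_axial


# Hypothetical values for illustration only.
C_N, C_A = 1.2, 0.8
for rho in (1e-7, 5e-9):  # two very different densities, kg/m^3
    a_n = aero_acceleration(rho, 7500.0, 250.0, 90000.0, C_N)
    a_a = aero_acceleration(rho, 7500.0, 250.0, 90000.0, C_A)
    print(rho, normal_to_axial_ratio(a_n, a_a))
```

    Both printed ratios agree with C_N / C_A to rounding, which is why the ratio can constrain the coefficient model even without a simultaneous density measurement.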

  8. A Bayesian analysis of gene flow from crops to their wild relatives: cultivated (Lactuca sativa L.) and prickly lettuce (L. serriola L.) and the recent expansion of L. serriola in Europe.

    Science.gov (United States)

    Uwimana, Brigitte; D'Andrea, Luigi; Felber, François; Hooftman, Danny A P; Den Nijs, Hans C M; Smulders, Marinus J M; Visser, Richard G F; Van De Wiel, Clemens C M

    2012-06-01

    Interspecific gene flow can lead to the formation of hybrid populations that have a competitive advantage over the parental populations, even for hybrids from a cross between crops and wild relatives. Wild prickly lettuce (Lactuca serriola) has recently expanded in Europe and hybridization with the related crop species (cultivated lettuce, L. sativa) has been hypothesized as one of the mechanisms behind this expansion. In a basically selfing species, such as lettuce, assessing hybridization in natural populations may not be straightforward. Therefore, we analysed a uniquely large data set of plants genotyped with SSR (simple sequence repeat) markers with two programs for Bayesian population genetic analysis, STRUCTURE and NewHybrids. The data set comprised 7738 plants, including a complete genebank collection, which provided a wide coverage of cultivated germplasm and a fair coverage of wild accessions, and a set of wild populations recently sampled across Europe. STRUCTURE analysis inferred the occurrence of hybrids at a level of 7% across Europe. NewHybrids indicated these hybrids to be advanced selfed generations of a hybridization event or of one backcross after such an event, which is according to expectations for a basically selfing species. These advanced selfed generations could not be detected effectively with crop-specific alleles. In the northern part of Europe, where the expansion of L. serriola took place, the fewest putative hybrids were found. Therefore, we conclude that other mechanisms than crop/wild gene flow, such as an increase in disturbed habitats and/or climate warming, are more likely explanations for this expansion. © 2012 Blackwell Publishing Ltd.

  9. Inferring gene dependency network specific to phenotypic alteration based on gene expression data and clinical information of breast cancer.

    Science.gov (United States)

    Zhou, Xionghui; Liu, Juan

    2014-01-01

    Although many methods have been proposed to reconstruct gene regulatory networks, most of them, when applied to sample-based data, cannot reveal the gene regulatory relations underlying a phenotypic change (e.g. normal versus cancer). In this paper, we adopt phenotype as a variable when constructing the gene regulatory network, whereas previous studies either neglected it or used it only to select the differentially expressed genes as inputs for network construction. Specifically, we integrate phenotype information with gene expression data to identify gene dependency pairs using conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely the gene dependency network. In this way, we have constructed a gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network to be scale-free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specifically related to breast cancer, which suggests that our gene dependency network is meaningful. Its validity is also supported by a literature investigation. From the network, we have selected 43 discriminative hubs as a signature to build a classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms classification models built on published signatures. In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for
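    Conditional mutual information, the quantity this record relies on, can be estimated from discretized data with a simple plug-in formula. The sketch below is illustrative only: how the authors discretize expression values and which variables they condition on is not stated here, so the function name and the toy data are assumptions.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from discrete samples.

    Uses I(X;Y|Z) = sum p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) ),
    with every probability replaced by an observed frequency.
    """
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    total = 0.0
    for (x, y, z), count in n_xyz.items():
        ratio = count * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)])
        total += (count / n) * log2(ratio)
    return total


# Toy case: the phenotype is the XOR of two binary "genes", so gene A
# is informative about the phenotype only once gene B is known.
gene_a = [0, 0, 1, 1]
gene_b = [0, 1, 0, 1]
phenotype = [a ^ b for a, b in zip(gene_a, gene_b)]
print(conditional_mutual_information(gene_a, phenotype, gene_b))
```

    The XOR case is the extreme form of a dependency pair: gene A alone tells nothing about the phenotype, yet conditioning on gene B makes it fully informative.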

  10. Regional heterogeneity and gene flow maintain variance in a quantitative trait within populations of lodgepole pine

    Science.gov (United States)

    Yeaman, Sam; Jarvis, Andy

    2006-01-01

    Genetic variation is of fundamental importance to biological evolution, yet we still know very little about how it is maintained in nature. Because many species inhabit heterogeneous environments and have pronounced local adaptations, gene flow between differently adapted populations may be a persistent source of genetic variation within populations. If this migration–selection balance is biologically important then there should be strong correlations between genetic variance within populations and the amount of heterogeneity in the environment surrounding them. Here, we use data from a long-term study of 142 populations of lodgepole pine (Pinus contorta) to compare levels of genetic variation in growth response with measures of climatic heterogeneity in the surrounding region. We find that regional heterogeneity explains at least 20% of the variation in genetic variance, suggesting that gene flow and heterogeneous selection may play an important role in maintaining the high levels of genetic variation found within natural populations. PMID:16769628

  11. β-empirical Bayes inference and model diagnosis of microarray data

    Directory of Open Access Journals (Sweden)

    Hossain Mollah Mohammad

    2012-06-01

    Full Text Available Abstract Background Microarray data enables the high-throughput survey of mRNA expression profiles at the genomic level; however, the data presents a challenging statistical problem because of the large number of transcripts with small sample sizes that are obtained. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of the microarray data, no model can explain the data fully. It is generally difficult to scrutinize the irregular patterns of expression that are not expected by the usual statistical gene by gene models. Results As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure which can be regarded as an 'evidence-based' weighted (quasi-)likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f^β(y_t|θ). Genes with low likelihoods have unexpected expression patterns and low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between the robustness and efficiency, is selected by maximizing the predictive β0-likelihood by cross-validation. The proposed β-EB approach identified six significant (p < 10^−5) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach. Conclusions The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model
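    The weighting idea in this record, where each observation is weighted by a power β of its likelihood, can be shown on a much simpler problem than the hierarchical model of the paper. The sketch below estimates a single Gaussian mean with known spread by iterative reweighting; the function names, the fixed `sigma`, and the toy data are assumptions for illustration.

```python
from math import exp, pi, sqrt


def gaussian_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and spread sigma."""
    return exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def beta_weights(ys, mu, sigma, beta):
    """Weight of each observation: its likelihood raised to the power beta."""
    return [gaussian_pdf(y, mu, sigma) ** beta for y in ys]


def robust_mean(ys, sigma=1.0, beta=0.5, iterations=50):
    """Fixed-point iteration for a likelihood-weighted mean.

    beta = 0 gives every point weight 1 (the ordinary mean); larger beta
    downweights points that are unlikely under the current fit.
    """
    mu = sorted(ys)[len(ys) // 2]  # start from a median-like value
    for _ in range(iterations):
        weights = beta_weights(ys, mu, sigma, beta)
        mu = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return mu


data = [-0.2, -0.1, 0.0, 0.1, 0.2, 10.0]  # last value is an outlier
print(robust_mean(data, beta=0.0))  # ordinary mean, pulled toward 10
print(robust_mean(data, beta=0.5))  # stays close to the bulk near 0
```

    The outlier receives a weight that is many orders of magnitude smaller than the weights of the other points, which is the mechanism behind the robustness described above. Choosing β by cross-validation, as the record does, is not attempted here.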

  12. Bayesian Inversion for Large Scale Antarctic Ice Sheet Flow

    KAUST Repository

    Ghattas, Omar

    2015-01-07

    The flow of ice from the interior of polar ice sheets is the primary contributor to projected sea level rise. One of the main difficulties faced in modeling ice sheet flow is the uncertain spatially-varying Robin boundary condition that describes the resistance to sliding at the base of the ice. Satellite observations of the surface ice flow velocity, along with a model of ice as a creeping incompressible shear-thinning fluid, can be used to infer this uncertain basal boundary condition. We cast this ill-posed inverse problem in the framework of Bayesian inference, which allows us to infer not only the basal sliding parameters, but also the associated uncertainty. To overcome the prohibitive nature of Bayesian methods for large-scale inverse problems, we exploit the fact that, despite the large size of observational data, they typically provide only sparse information on model parameters. We show results for Bayesian inversion of the basal sliding parameter field for the full Antarctic continent, and demonstrate that the work required to solve the inverse problem, measured in number of forward (and adjoint) ice sheet model solves, is independent of the parameter and data dimensions

  13. Bayesian Inversion for Large Scale Antarctic Ice Sheet Flow

    KAUST Repository

    Ghattas, Omar

    2015-01-01

    The flow of ice from the interior of polar ice sheets is the primary contributor to projected sea level rise. One of the main difficulties faced in modeling ice sheet flow is the uncertain spatially-varying Robin boundary condition that describes the resistance to sliding at the base of the ice. Satellite observations of the surface ice flow velocity, along with a model of ice as a creeping incompressible shear-thinning fluid, can be used to infer this uncertain basal boundary condition. We cast this ill-posed inverse problem in the framework of Bayesian inference, which allows us to infer not only the basal sliding parameters, but also the associated uncertainty. To overcome the prohibitive nature of Bayesian methods for large-scale inverse problems, we exploit the fact that, despite the large size of observational data, they typically provide only sparse information on model parameters. We show results for Bayesian inversion of the basal sliding parameter field for the full Antarctic continent, and demonstrate that the work required to solve the inverse problem, measured in number of forward (and adjoint) ice sheet model solves, is independent of the parameter and data dimensions

  14. Spatial genetic structure and asymmetrical gene flow within the Pacific walrus

    Science.gov (United States)

    Sonsthagen, Sarah A.; Jay, Chadwick V.; Fischbach, Anthony S.; Sage, George K.; Talbot, Sandra L.

    2012-01-01

    Pacific walruses (Odobenus rosmarus divergens) occupying shelf waters of Pacific Arctic seas migrate during spring and summer from 3 breeding areas in the Bering Sea to form sexually segregated nonbreeding aggregations. We assessed genetic relationships among 2 putative breeding populations and 6 nonbreeding aggregations. Analyses of mitochondrial DNA (mtDNA) control region sequence data suggest that males are distinct among breeding populations (ΦST=0.051), and between the eastern Chukchi and other nonbreeding aggregations (ΦST=0.336–0.449). Nonbreeding female aggregations were genetically distinct across marker types (microsatellite FST=0.019; mtDNA ΦST=0.313), as was eastern Chukchi and all other nonbreeding aggregations (microsatellite FST=0.019–0.035; mtDNA ΦST=0.386–0.389). Gene flow estimates are asymmetrical from St. Lawrence Island into the southeastern Bering breeding population for both sexes. Partitioning of haplotype frequencies among breeding populations suggests that individuals exhibit some degree of philopatry, although weak. High levels of genetic differentiation among eastern Chukchi and all other nonbreeding aggregations, but considerably lower genetic differentiation between breeding populations, suggest that at least 1 genetically distinct breeding population remained unsampled. Limited genetic structure at microsatellite loci between assayed breeding areas can emerge from several processes, including male-mediated gene flow, or population admixture following a decrease in census size (i.e., due to commercial harvest during 1880–1950s) and subsequent recovery. Nevertheless, high levels of genetic diversity in the Pacific walrus, which withstood prolonged decreases in census numbers with little impact on neutral genetic diversity, may reflect resiliency in the face of past environmental challenges.

  15. Fine scale gene flow and individual movements among subpopulations of Centrolene prosoblepon (Anura: Centrolenidae)

    Directory of Open Access Journals (Sweden)

    Jeanne M Robertson

    2008-03-01

    Full Text Available Dispersal capabilities determine and maintain local gene flow, and this has implications for population persistence and/or recolonization following environmental perturbations (natural or anthropogenic), disease outbreaks, or other demographic collapses. To predict recolonization and understand dispersal capacity in a stream-breeding frog, we examined individual movement patterns and gene flow among four subpopulations of the Neotropical glassfrog, Centrolene prosoblepon, at a mid-elevation cloud forest site at El Copé, Panama. We measured male movement directly during a two year mark-recapture study, and indirectly with gene flow estimates from mitochondrial DNA sequences (mtDNA). Individuals of this species showed strong site fidelity: over two years, male frogs in all four headwater streams moved very little (mean = 2.33 m; mode = 0 m). Nine individuals changed streams within one or two years, moving 675-1,108 m. For those males moving more than 10 m, movement was biased upstream (p […] ST = 0.007, p = 0.325) but gene flow was more limited across greater distances (CT = 0.322, p = 0.065), even within the same drainage network. Lowland populations of C. prosoblepon potentially act as an important source of colonists for upland populations in this watershed. Rev. Biol. Trop. 56 (1): 13-26. Epub 2008 March 31. Dispersal capacity determines and maintains local gene flow, and this has implications for population persistence and/or recolonization following environmental perturbations. We examined individual patterns of movement and gene flow among subpopulations of Centrolene prosoblepon (Anura: Centrolenidae) at a mid-elevation site in El Copé, Panama. We measured male movement directly during a mark-recapture study, and indirectly with gene flow estimates from mitochondrial DNA (mtDNA) sequences. Individuals showed strong site fidelity: over more than two

  16. Sibling competition arena: selfing and a competition arena can combine to constitute a barrier to gene flow in sympatry.

    Science.gov (United States)

    Gibson, A K; Hood, M E; Giraud, T

    2012-06-01

    Closely related species coexisting in sympatry provide critical insight into the mechanisms underlying speciation and the maintenance of genetic divergence. Selfing may promote reproductive isolation by facilitating local adaptation, causing reduced hybrid fitness in parental environments. Here, we propose a novel mechanism by which selfing can further impair interspecific gene flow: selfing may act to ensure that nonhybrid progeny systematically co-occur whenever hybrid genotypes are produced. Under a competition arena, the fitness differentials between nonhybrid and hybrid progeny are then magnified, preventing development of interspecific hybrids. We investigate whether this "sibling competition arena" can explain the coexistence in sympatry of closely related species of the plant fungal pathogens (Microbotryum) causing anther-smut disease. The probabilities of intrapromycelial mating (automixis), outcrossing, and sibling competition were manipulated in artificial inoculations to evaluate their contribution to reproductive isolation. We report that both intrapromycelial selfing and sibling competition significantly reduced rates of hybrid infection beyond that expected based solely upon selfing rates and noncompetitive fitness differentials between hybrid and nonhybrid progeny. Our results thus suggest that selfing and a sibling competition arena can combine to constitute a barrier to gene flow and diminish selection for additional barriers to gene flow in sympatry. © 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution.

  17. Human disease MiRNA inference by combining target information based on heterogeneous manifolds.

    Science.gov (United States)

    Ding, Pingjian; Luo, Jiawei; Liang, Cheng; Xiao, Qiu; Cao, Buwen

    2018-04-01

    The emergence of network medicine has provided great insight into the identification of disease-related molecules, which could help with the development of personalized medicine. However, the state-of-the-art methods could neither simultaneously consider target information and the known miRNA-disease associations nor effectively explore novel gene-disease associations as a by-product during the process of inferring disease-related miRNAs. Computational methods incorporating multiple sources of information offer more opportunities to infer disease-related molecules, including miRNAs and genes in heterogeneous networks at a system level. In this study, we developed a novel algorithm, named inference of Disease-related MiRNAs based on Heterogeneous Manifold (DMHM), to accurately and efficiently identify miRNA-disease associations by integrating multi-omics data. Graph-based regularization was utilized to obtain a smooth function on the data manifold, which constitutes the main principle of DMHM. The novelty of this framework lies in the relatedness between diseases and miRNAs, which are measured via heterogeneous manifolds on heterogeneous networks integrating target information. To demonstrate the effectiveness of DMHM, we conducted comprehensive experiments based on HMDD datasets and compared DMHM with six state-of-the-art methods. Experimental results indicated that DMHM significantly outperformed the other six methods under fivefold cross validation and de novo prediction tests. Case studies have further confirmed the practical usefulness of DMHM. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. Probabilistic inductive inference: a survey

    OpenAIRE

    Ambainis, Andris

    2001-01-01

    Inductive inference is a recursion-theoretic theory of learning, first developed by E. M. Gold (1967). This paper surveys developments in probabilistic inductive inference. We mainly focus on finite inference of recursive functions, since this simple paradigm has produced the most interesting (and most complex) results.

  19. LAIT: a local ancestry inference toolkit.

    Science.gov (United States)

    Hui, Daniel; Fang, Zhou; Lin, Jerome; Duan, Qing; Li, Yun; Hu, Ming; Chen, Wei

    2017-09-06

    Inferring local ancestry in individuals of mixed ancestry has many applications, most notably in identifying disease-susceptible loci that vary among different ethnic groups. Many software packages are available for inferring local ancestry in admixed individuals. However, most of these existing software packages require specific formatted input files and generate output files in various types, yielding practical inconvenience. We developed a tool set, Local Ancestry Inference Toolkit (LAIT), which can convert standardized files into software-specific input file formats as well as standardize and summarize inference results for four popular local ancestry inference software: HAPMIX, LAMP, LAMP-LD, and ELAI. We tested LAIT using both simulated and real data sets and demonstrated that LAIT provides convenience to run multiple local ancestry inference software. In addition, we evaluated the performance of local ancestry software among different supported software packages, mainly focusing on inference accuracy and computational resources used. We provided a toolkit to facilitate the use of local ancestry inference software, especially for users with limited bioinformatics background.

  20. Bayesian statistical inference

    Directory of Open Access Journals (Sweden)

    Bruno De Finetti

    2017-04-01

    Full Text Available This work was translated into English and published in the volume: Bruno De Finetti, Induction and Probability, Biblioteca di Statistica, eds. P. Monari, D. Cocchi, Clueb, Bologna, 1993. Bayesian Statistical Inference is one of the last fundamental philosophical papers in which we can find the essentials of De Finetti's approach to statistical inference.

  1. A human genome-wide library of local phylogeny predictions for whole-genome inference problems

    Directory of Open Access Journals (Sweden)

    Schwartz Russell

    2008-08-01

    Full Text Available Abstract Background Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.

  2. Is there a hierarchy of social inferences? The likelihood and speed of inferring intentionality, mind, and personality.

    Science.gov (United States)

    Malle, Bertram F; Holbrook, Jess

    2012-04-01

    People interpret behavior by making inferences about agents' intentionality, mind, and personality. Past research studied such inferences 1 at a time; in real life, people make these inferences simultaneously. The present studies therefore examined whether 4 major inferences (intentionality, desire, belief, and personality), elicited simultaneously in response to an observed behavior, might be ordered in a hierarchy of likelihood and speed. To achieve generalizability, the studies included a wide range of stimulus behaviors, presented them verbally and as dynamic videos, and assessed inferences both in a retrieval paradigm (measuring the likelihood and speed of accessing inferences immediately after they were made) and in an online processing paradigm (measuring the speed of forming inferences during behavior observation). Five studies provide evidence for a hierarchy of social inferences-from intentionality and desire to belief to personality-that is stable across verbal and visual presentations and that parallels the order found in developmental and primate research. (c) 2012 APA, all rights reserved.

  3. Multispecies coalescent analysis of the early diversification of neotropical primates: phylogenetic inference under strong gene trees/species tree conflict.

    Science.gov (United States)

    Schrago, Carlos G; Menezes, Albert N; Furtado, Carolina; Bonvicino, Cibele R; Seuanez, Hector N

    2014-11-05

    Neotropical primates (NP) are presently distributed in the New World from Mexico to northern Argentina, comprising three large families, Cebidae, Atelidae, and Pitheciidae, as a consequence of their diversification following their separation from Old World anthropoids near the Eocene/Oligocene boundary, some 40 Ma. The evolution of NP has been intensively investigated in the last decade by studies focusing on their phylogeny and timescale. However, despite major efforts, the phylogenetic relationship between these three major clades and the age of their last common ancestor are still controversial because these inferences were based on limited numbers of loci and dating analyses that did not consider the evolutionary variation associated with the distribution of gene trees within the proposed phylogenies. We show, by multispecies coalescent analyses of selected genome segments spanning 92,496,904 bp, that the early diversification of extant NP was marked by a 2-fold increase of their effective population size and that Atelids and Cebids are more closely related to each other than to Pitheciids. The molecular phylogeny of NP has been difficult to resolve because of population-level phenomena in the early evolution of the lineage. The association of evolutionary variation with the distribution of gene trees within proposed phylogenies is crucial for distinguishing the mean genetic divergence between species (the mean coalescent time between loci) from speciation time. This approach, based on extensive genomic data provided by new generation DNA sequencing, provides more accurate reconstructions of phylogenies and timescales for all organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. INFERENCE BUILDING BLOCKS

    Science.gov (United States)

    2018-02-15

    expressed a variety of inference techniques on discrete and continuous distributions: exact inference, importance sampling, Metropolis-Hastings (MH...without redoing any math or rewriting any code. And although our main goal is composable reuse, our performance is also good because we can use...control paths. • The Hakaru language can express mixtures of discrete and continuous distributions, but the current disintegration transformation

  5. Practical Bayesian Inference

    Science.gov (United States)

    Bailer-Jones, Coryn A. L.

    2017-04-01

    Preface; 1. Probability basics; 2. Estimation and uncertainty; 3. Statistical models and inference; 4. Linear models, least squares, and maximum likelihood; 5. Parameter estimation: single parameter; 6. Parameter estimation: multiple parameters; 7. Approximating distributions; 8. Monte Carlo methods for inference; 9. Parameter estimation: Markov chain Monte Carlo; 10. Frequentist hypothesis testing; 11. Model comparison; 12. Dealing with more complicated problems; References; Index.

  6. Annotating gene sets by mining large literature collections with protein networks.

    Science.gov (United States)

    Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

    2018-01-01

    Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

  7. The Stream Flow Prediction Model Using Fuzzy Inference System and Particle Swarm Optimization

    Directory of Open Access Journals (Sweden)

    Mahmoud Mohammad RezapourTabari

    2013-03-01

    Full Text Available The aim of this study is the spatial prediction of runoff using data from hydrometric and meteorological stations. Research shows that there is usually a definite relationship between the meteorological and hydrometric data of an upstream basin and the runoff rate at the basin outlet. Hence, if the rules underlying the historical data recorded at the stations can be extracted, runoff can readily be predicted from measured data. Among the available tools, fuzzy theory, with its flexibility in developing fuzzy rules, can capture the knowledge contained in the observed data for real-time parameter prediction. In this research, a fuzzy inference system was therefore used to estimate runoff rates at stations located downstream on the Taleghan river from data recorded at upstream rain gauge and hydrometric stations. When the values associated with the membership functions are inappropriate, the fuzzy system model cannot provide correct predictions. In this study, an intelligence-based optimization algorithm was therefore combined with fuzzy theory to accelerate and improve the modeling. With the proposed model, optimum values were extracted for each membership function of the dependent and independent variables, and the downstream runoff rates were predicted from them. The results show that the proposed model is more accurate than the fuzzy inference system alone. The proposed model can also estimate the runoff rate more accurately for future conditions.

  8. Complex networks from experimental horizontal oil–water flows: Community structure detection versus flow pattern discrimination

    International Nuclear Information System (INIS)

    Gao, Zhong-Ke; Fang, Peng-Cheng; Ding, Mei-Shuang; Yang, Dan; Jin, Ning-De

    2015-01-01

    We propose a complex network-based method to distinguish complex patterns arising from experimental horizontal oil–water two-phase flow. We first use the adaptive optimal kernel time–frequency representation (AOK TFR) to characterize flow pattern behaviors from the energy and frequency point of view. Then, we infer two-phase flow complex networks from experimental measurements and detect the community structures associated with flow patterns. The results suggest that community detection in the two-phase flow complex network allows objective discrimination of complex horizontal oil–water flow patterns, especially the segregated and dispersed flow patterns, a task at which the existing method based on AOK TFR fails. - Highlights: • We combine time–frequency analysis and complex network to identify flow patterns. • We explore the transitional flow behaviors in terms of betweenness centrality. • Our analysis provides a novel way for recognizing complex flow patterns. • Broader applicability of our method is demonstrated and articulated

  9. Genealogy and gene trees.

    Science.gov (United States)

    Rasmuson, Marianne

    2008-02-01

    Heredity can be followed in persons or in genes. Persons can be identified only a few generations back, but simplified models indicate that universal ancestors of all now-living persons existed in the past. Genetic variability can be characterized as variants of DNA sequences. Data are available only from living persons, but from the pattern of variation gene trees can be inferred by means of coalescence models. The merging of lines backwards in time leads to an MRCA (most recent common ancestor). The time and place of living for this inferred person can give insights into human evolutionary history. Demographic processes are incorporated in the model, but since culture and customs are known to influence demography, the models used ought to be tested against available genealogy. The Icelandic database offers a possibility to do so and points to some discrepancies. Mitochondrial DNA and Y chromosome patterns give a rather consistent view of human evolutionary history during the last 100 000 years, but the earlier epochs of human evolution demand gene trees with longer branches. The results of such studies reveal as yet unsolved problems about the sources of our genome.

  10. Inferring network topology from complex dynamics

    International Nuclear Information System (INIS)

    Shandilya, Srinivas Gorur; Timme, Marc

    2011-01-01

    Inferring the network topology from dynamical observations is a fundamental problem pervading research on complex systems. Here, we present a simple, direct method for inferring the structural connection topology of a network, given an observation of one collective dynamical trajectory. The general theoretical framework is applicable to arbitrary network dynamical systems described by ordinary differential equations. No interference (external driving) is required and the type of dynamics is hardly restricted in any way. In particular, the observed dynamics may be arbitrarily complex; stationary, invariant or transient; synchronous or asynchronous and chaotic or periodic. Presupposing a knowledge of the functional form of the dynamical units and of the coupling functions between them, we present an analytical solution to the inverse problem of finding the network topology from observing a time series of state variables only. Robust reconstruction is achieved in any sufficiently long generic observation of the system. We extend our method to simultaneously reconstructing both the entire network topology and all parameters appearing linearly in the system's equations of motion. Reconstruction of network topology and system parameters is viable even in the presence of external noise that distorts the original dynamics substantially. The method provides a conceptually new step towards reconstructing a variety of real-world networks, including gene and protein interaction networks and neuronal circuits.

  11. Logical inference and evaluation

    International Nuclear Information System (INIS)

    Perey, F.G.

    1981-01-01

    Most methodologies of evaluation currently used are based upon the theory of statistical inference. It is generally perceived that this theory is not capable of dealing satisfactorily with what are called systematic errors. Theories of logical inference should be capable of treating all of the information available, including that not involving frequency data. A theory of logical inference is presented as an extension of deductive logic via the concept of plausibility and the application of group theory. Some conclusions, based upon the application of this theory to evaluation of data, are also given.

  12. Approximation and inference methods for stochastic biochemical kinetics—a tutorial review

    International Nuclear Information System (INIS)

    Schnoerr, David; Grima, Ramon; Sanguinetti, Guido

    2017-01-01

    Stochastic fluctuations of molecule numbers are ubiquitous in biological systems. Important examples include gene expression and enzymatic processes in living cells. Such systems are typically modelled as chemical reaction networks whose dynamics are governed by the chemical master equation. Despite its simple structure, no analytic solutions to the chemical master equation are known for most systems. Moreover, stochastic simulations are computationally expensive, making systematic analysis and statistical inference a challenging task. Consequently, significant effort has been spent in recent decades on the development of efficient approximation and inference methods. This article gives an introduction to basic modelling concepts as well as an overview of state-of-the-art methods. First, we motivate and introduce deterministic and stochastic methods for modelling chemical networks, and give an overview of simulation and exact solution methods. Next, we discuss several approximation methods, including the chemical Langevin equation, the system size expansion, moment closure approximations, time-scale separation approximations and hybrid methods. We discuss their various properties and review recent advances and remaining challenges for these methods. We present a comparison of several of these methods by means of a numerical case study and highlight some of their respective advantages and disadvantages. Finally, we discuss the problem of inference from experimental data in the Bayesian framework and review recent methods developed in the literature. In summary, this review gives a self-contained introduction to modelling, approximations and inference methods for stochastic chemical kinetics. (topical review)
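
    To make the idea of stochastic simulation concrete, here is a minimal sketch of the Gillespie stochastic simulation algorithm, a standard way to draw exact sample paths from the chemical master equation. The birth-death network (constant production, first-order degradation), the rate constants and the function name are illustrative assumptions, not taken from the review.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Simulate 0 -> X (propensity k_prod) and X -> 0 (propensity k_deg * x).

    Returns the jump times and the molecule count after each jump.
    """
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod
        a_deg = k_deg * x
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(10.0, 0.5, 0, 50.0, random.Random(1))
    print(len(times), "events; final count", counts[-1])
```

    For this network the stationary distribution is Poisson with mean k_prod / k_deg, which gives a simple sanity check on long runs. Every reaction event is simulated individually, which is why such simulations become expensive for large or stiff systems.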

  13. Gene Flow of a Forest-Dependent Bird across a Fragmented Landscape.

    Directory of Open Access Journals (Sweden)

    Rachael V Adams

    Habitat loss and fragmentation can affect the persistence of populations by reducing connectivity and restricting the ability of individuals to disperse across landscapes. Dispersal corridors promote population connectivity and therefore play important roles in maintaining gene flow in natural populations inhabiting fragmented landscapes. In the prairies, forests are restricted to riparian areas along river systems, which act as important dispersal corridors for forest-dependent species across large expanses of unsuitable grassland habitat. However, natural and anthropogenic barriers within riparian systems have fragmented these forested habitats. In this study, we used microsatellite markers to assess the fine-scale genetic structure of a forest-dependent species, the black-capped chickadee (Poecile atricapillus), along 10 different river systems in Southern Alberta. Using a landscape genetic approach, landscape features (e.g., land cover) were found to have a significant effect on patterns of genetic differentiation. Populations are genetically structured as a result of natural breaks in continuous habitat at small spatial scales, but the artificial barriers we tested do not appear to restrict gene flow. Dispersal between rivers is impeded by grasslands, evident from isolation of nearby populations (~50 km apart), but also within river systems by large treeless canyons (>100 km). Significant population genetic differentiation within some rivers corresponded with zones of different cottonwood (riparian poplar) tree species and their hybrids. This study illustrates the importance of considering the impacts of habitat fragmentation at small spatial scales as well as other ecological processes to gain a better understanding of how organisms respond to their environmental connectivity. Here, even in a common and widespread songbird with high dispersal potential, small breaks in continuous habitats strongly influenced the spatial patterns of genetic

  14. Limits to gene flow in a cosmopolitan marine planktonic diatom.

    Science.gov (United States)

    Casteleyn, Griet; Leliaert, Frederik; Backeljau, Thierry; Debeer, Ann-Eline; Kotaki, Yuichi; Rhodes, Lesley; Lundholm, Nina; Sabbe, Koen; Vyverman, Wim

    2010-07-20

    The role of geographic isolation in marine microbial speciation is hotly debated because of the high dispersal potential and large population sizes of planktonic microorganisms and the apparent lack of strong dispersal barriers in the open sea. Here, we show that gene flow between distant populations of the globally distributed, bloom-forming diatom species Pseudo-nitzschia pungens (clade I) is limited and follows a strong isolation by distance pattern. Furthermore, phylogenetic analysis implies that under appropriate geographic and environmental circumstances, like the pronounced climatic changes in the Pleistocene, population structuring may lead to speciation and hence may play an important role in diversification of marine planktonic microorganisms. A better understanding of the factors that control population structuring is thus essential to reveal the role of allopatric speciation in marine microorganisms.

  15. Low level of gene flow from cultivated beets (Beta vulgaris L. ssp. vulgaris) into Danish populations of sea beet (Beta vulgaris L. ssp. maritima (L.) Arcangeli)

    DEFF Research Database (Denmark)

    Andersen, N.S.; Siegismund, H.R.; Meyer, V.

    2005-01-01

    Gene flow from sugar beets to sea beets occurs in the seed propagation areas in southern Europe. Some seed propagation also takes place in Denmark, but here the crop-wild gene flow has not been investigated. Hence, we studied gene flow to sea beet populations from sugar beet lines used in Danish ...

  16. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    (This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives’, edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by Clarendon Press, Oxford, and planned to appear as Section 4.1 with the title ‘Inference’.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...

  17. Lower complexity bounds for lifted inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred

    2015-01-01

    instances of the model. Numerous approaches for such “lifted inference” techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show...... the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established earlier in Jaeger (2000; Jaeger, M. 2000. On the complexity of inference about...... that under the assumption that NETIME≠ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier-, and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference and for knowledge bases not containing...

  18. Diurnal Transcriptome and Gene Network Represented through Sparse Modeling in Brachypodium distachyon

    Directory of Open Access Journals (Sweden)

    Satoru Koda

    2017-11-01

    We report the comprehensive identification of periodic genes and their network inference, based on a gene co-expression analysis and an Auto-Regressive eXogenous (ARX) model with a group smoothly clipped absolute deviation (SCAD) method using a time-series transcriptome dataset in a model grass, Brachypodium distachyon. To reveal the diurnal changes in the transcriptome in B. distachyon, we performed RNA-seq analysis of its leaves sampled through a diurnal cycle of over 48 h at 4 h intervals using three biological replications, and identified 3,621 periodic genes through our wavelet analysis. The expression data are suitable for inferring network sparsity based on ARX models. We found that genes involved in biological processes such as transcriptional regulation, protein degradation, post-transcriptional modification and photosynthesis are significantly enriched in the periodic genes, suggesting that these processes might be regulated by circadian rhythm in B. distachyon. On the basis of the time-series expression patterns of the periodic genes, we constructed a chronological gene co-expression network and identified putative transcription factor-encoding genes that might be involved in the time-specific regulatory transcriptional network. Moreover, we inferred a transcriptional network composed of the periodic genes in B. distachyon, aiming to identify genes associated with other genes through variable selection by grouping time points for each gene. Based on the ARX model with the group SCAD regularization using our time-series expression datasets of the periodic genes, we constructed gene networks and found that the networks exhibit a typical scale-free structure. Our findings demonstrate that the diurnal changes in the transcriptome in B. distachyon leaves have a sparse network structure, demonstrating the spatiotemporal gene regulatory network over the cyclic phase transitions in B. distachyon diurnal growth.

  19. Does fragmentation of wetlands affect gene flow in sympatric Acrocephalus warblers with different migration strategies?

    OpenAIRE

    Ceresa, Francesco; Belda, E.J.; Kvist, Laura; Rguibi-Idrissi, Hamid; Monrós González, Juan Salvador

    2015-01-01

    Wetlands are naturally patchy habitats, but patchiness has been accentuated by the extensive wetland loss due to human activities. In such a fragmented habitat, dispersal ability is especially important to maintain gene flow between populations. Here we studied population structure, genetic diversity and demographic history of Iberian and North African populations of two wetland passerines, the Eurasian reed warbler Acrocephalus scirpaceus and the moustached warbler Acrocephalus melanopogon....

  20. Assessment of the potential for gene flow from transgenic maize (Zea mays L.) to eastern gamagrass (Tripsacum dactyloides L.).

    Science.gov (United States)

    Lee, Moon-Sub; Anderson, Eric K; Stojšin, Duška; McPherson, Marc A; Baltazar, Baltazar; Horak, Michael J; de la Fuente, Juan Manuel; Wu, Kunsheng; Crowley, James H; Rayburn, A Lane; Lee, D K

    2017-08-01

    Eastern gamagrass (Tripsacum dactyloides L.) belongs to the same tribe of the Poaceae family as maize (Zea mays L.) and grows naturally in the same region where maize is commercially produced in the USA. Although no evidence exists of gene flow from maize to eastern gamagrass in nature, experimental crosses between the two species were produced using specific techniques. As part of environmental risk assessment, the possibility of transgene flow from maize to eastern gamagrass populations in nature was evaluated with the objectives: (1) to assess the seeds of eastern gamagrass populations naturally growing near commercial maize fields for the presence of a transgenic glyphosate-tolerance gene (cp4 epsps) that would indicate cross-pollination between the two species, and (2) to evaluate the possibility of interspecific hybridization between transgenic maize used as male parent and eastern gamagrass used as female parent. A total of 46,643 seeds from 54 eastern gamagrass populations collected in proximity of maize fields in Illinois, USA were planted in a field in 2014 and 2015. Emerged seedlings were treated with glyphosate herbicide and assessed for survival. An additional 48,000 seeds from the same 54 eastern gamagrass populations were tested for the presence of the cp4 epsps transgene markers using the TaqMan® PCR method. The results from these trials showed that no seedlings survived the herbicide treatment and no seed indicated presence of the herbicide-tolerant cp4 epsps transgene, even though these eastern gamagrass populations were exposed to glyphosate-tolerant maize pollen for years. Furthermore, no interspecific hybrid seeds were produced from 135 hand-pollination attempts involving 1529 eastern gamagrass spikelets exposed to maize pollen. Together, these results indicate that there is no evidence of gene flow from maize to eastern gamagrass in natural habitats. The outcome of this study should be taken into consideration when assessing for environmental

  1. Variations on Bayesian Prediction and Inference

    Science.gov (United States)

    2016-05-09

    There are a number of statistical inference problems that are not generally formulated via a full probability model ... the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle

  2. Efficient Reverse-Engineering of a Developmental Gene Regulatory Network

    Science.gov (United States)

    Cicin-Sain, Damjan; Ashyraliyev, Maksat; Jaeger, Johannes

    2012-01-01

    Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to

  3. Gene prioritization for livestock diseases by data integration

    DEFF Research Database (Denmark)

    Jiang, Li; Sørensen, Peter; Thomsen, Bo Stjerne

    2012-01-01

    in bovine mastitis. Gene-associated phenome profile and transcriptome profile in response to Escherichia coli infection in the mammary gland were integrated to make a global inference of bovine genes involved in mastitis. The top ranked genes were highly enriched for pathways and biological processes...... underlying inflammation and immune responses, which supports the validity of our approach for identifying genes that are relevant to animal health and disease. These gene-associated phenotypes were used for a local prioritization of candidate genes located in a QTL affecting the susceptibility to mastitis...

  4. Scale dependent inference in landscape genetics

    Science.gov (United States)

    Samuel A. Cushman; Erin L. Landguth

    2010-01-01

    Ecological relationships between patterns and processes are highly scale dependent. This paper reports the first formal exploration of how changing scale of research away from the scale of the processes governing gene flow affects the results of landscape genetic analysis. We used an individual-based, spatially explicit simulation model to generate patterns of genetic...

  5. Pollen-mediated gene flow and fine-scale spatial genetic structure in Olea europaea subsp. europaea var. sylvestris.

    Science.gov (United States)

    Beghè, D; Piotti, A; Satovic, Z; de la Rosa, R; Belaj, A

    2017-03-01

    Wild olive (Olea europaea subsp. europaea var. sylvestris) is important from an economic and ecological point of view. The effects of anthropogenic activities may lead to the genetic erosion of its genetic patrimony, which has high value for breeding programmes. In particular, the consequences of the introgression from cultivated stands are strongly dependent on the extent of gene flow and therefore this work aims at quantitatively describing contemporary gene flow patterns in wild olive natural populations. The studied wild population is located in an undisturbed forest, in southern Spain, considered one of the few extant hotspots of true oleaster diversity. A total of 225 potential father trees and seeds issued from five mother trees were genotyped by eight microsatellite markers. Levels of contemporary pollen flow, in terms of both pollen immigration rates and within-population dynamics, were measured through paternity analyses. Moreover, the extent of fine-scale spatial genetic structure (SGS) was studied to assess the relative importance of seed and pollen dispersal in shaping the spatial distribution of genetic variation. The results showed that the population under study is characterized by a high genetic diversity, a relatively high pollen immigration rate (0.57), an average within-population pollen dispersal of about 107 m and weak but significant SGS up to 40 m. The population is a mosaic of several intermingled genetic clusters that is likely to be generated by spatially restricted seed dispersal. Moreover, wild oleasters were found to be self-incompatible and preferential mating between some genotypes was revealed. Knowledge of the within-population genetic structure and gene flow dynamics will lead to identifying possible strategies aimed at limiting the effect of anthropogenic activities and improving breeding programmes for the conservation of olive tree forest genetic resources.

  6. Gene flow and population subdivision in a pantropical plant with sea-drifted seeds Hibiscus tiliaceus and its allied species: evidence from microsatellite analyses.

    Science.gov (United States)

    Takayama, Koji; Tateishi, Yoichi; Murata, Jin; Kajita, Tadashi

    2008-06-01

    The genetic differentiation and structure of Hibiscus tiliaceus, a pantropical plant with sea-drifted seeds, and four allied species were studied using six microsatellite markers. A low level of genetic differentiation was observed among H. tiliaceus populations in the Pacific and Indian Ocean regions, similar to the results of a previous chloroplast DNA (cpDNA) study. Frequent gene flow by long-distance seed dispersal is responsible for species integration of H. tiliaceus in the wide distribution range. On the other hand, highly differentiated populations of H. tiliaceus were detected in West Africa, as well as of Hibiscus pernambucensis in southern Brazil. In the former populations, the African continent may be a geographical barrier that prevents gene flow by sea-drifted seeds. In the latter populations, although there are no known land barriers, the bifurcating South Equatorial Current at the north-eastern horn of Brazil can be a potential barrier to gene flow and may promote the genetic differentiation of these populations. Our results also suggest clear species segregation between H. tiliaceus and H. pernambucensis, which confirms the introgression scenario between these two species that was suggested by a previous cpDNA study. Our results also provide good evidence for recent transatlantic long-distance seed dispersal by sea current. Despite the distinct geographical structure observed in the cpDNA haplotypes, a low level of genetic differentiation was found between Pacific and Atlantic populations of H. pernambucensis, which could be caused by transisthmian gene flow.

  7. Isolated in an ocean of grass: low levels of gene flow between termite subpopulations.

    Science.gov (United States)

    Schmidt, Anna M; Jacklyn, Peter; Korb, Judith

    2013-04-01

    Habitat fragmentation is one of the most important causes of biodiversity loss, but many species are distributed in naturally patchy habitats. Such species are often organized in highly dynamic metapopulations or in patchy populations with high gene flow between subpopulations. Yet, there are also species that exist in stable patchy habitats with small subpopulations and presumably low dispersal rates. Here, we present population genetic data for the 'magnetic' termite Amitermes meridionalis, which show that short distances between subpopulations do not hinder exceptionally strong genetic differentiation (FST: 0.339; RST: 0.636). Despite the strong genetic differentiation between subpopulations, we did not find evidence for genetic impoverishment. We propose that loss of genetic diversity might be counteracted by a long colony life with low colony turnover. Indeed, we found evidence for the inheritance of colonies by so-called 'replacement reproductives'. Inhabiting a mound for several generations might result in loss of gene diversity within a colony but maintenance of gene diversity at the subpopulation level.

  8. Crop-to-wild gene flow and its fitness consequences for a wild fruit tree: Towards a comprehensive conservation strategy of the wild apple in Europe.

    Science.gov (United States)

    Feurtey, Alice; Cornille, Amandine; Shykoff, Jacqui A; Snirc, Alodie; Giraud, Tatiana

    2017-02-01

    Crop-to-wild gene flow can reduce the fitness and genetic integrity of wild species. Malus sylvestris, the European crab-apple fruit tree in particular, is threatened by the disappearance of its habitat and by gene flow from its domesticated relative, Malus domestica. With the aims of evaluating threats for M. sylvestris and of formulating recommendations for its conservation, we studied here, using microsatellite markers and growth experiments: (i) hybridization rates in seeds and trees from a French forest and in seeds used for replanting crab apples in agrosystems and in forests, (ii) the impact of the level of M. domestica ancestry on individual tree fitness and (iii) pollen dispersal abilities in relation to crop-to-wild gene flow. We found substantial contemporary crop-to-wild gene flow in crab-apple tree populations and superior fitness of hybrids compared to wild seeds and seedlings. Using paternity analyses, we showed that pollen dispersal could occur up to 4 km and decreased with tree density. The seed network furnishing the wild apple reintroduction agroforestry programmes was found to suffer from poor genetic diversity, introgressions and species misidentification. Overall, our findings indicate supported threats for the European wild apple, steering us to provide precise recommendations for its conservation.

  9. The standard lateral gene transfer model is statistically consistent for pectinate four-taxon trees

    DEFF Research Database (Denmark)

    Sand, Andreas; Steel, Mike

    2013-01-01

    Evolutionary events such as incomplete lineage sorting and lateral gene transfers constitute major problems for inferring species trees from gene trees, as they can sometimes lead to gene trees which conflict with the underlying species tree. One particularly simple and efficient way to infer...... species trees from gene trees under such conditions is to combine three-taxon analyses for several genes using a majority vote approach. For incomplete lineage sorting this method is known to be statistically consistent; however, for lateral gene transfers it was recently shown that a zone...... of inconsistency exists for a specific four-taxon tree topology, and it was posed as an open question whether inconsistencies could exist for other four-taxon tree topologies? In this letter we analyze all remaining four-taxon topologies and show that no other inconsistencies exist....
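
    The majority-vote idea mentioned in this abstract can be sketched as follows: for a chosen set of three taxa, each gene tree votes for the pair it places closest together, and the most frequent pair is taken as the species-tree estimate for that triple. The nested-tuple tree encoding, the function names and the toy gene trees are assumptions made for illustration; the sketch assumes rooted binary trees that contain all three taxa, and it is not the authors' implementation.

```python
from collections import Counter


def leaves(tree):
    """Set of leaf names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def closest_pair(tree, taxa):
    """Pair of `taxa` that are sister to each other in a rooted tree, or None."""
    taxa = frozenset(taxa)
    node = tree
    while not isinstance(node, str):
        hits = [taxa & leaves(child) for child in node]
        # Descend while a single child still contains all three taxa.
        full = [child for child, hit in zip(node, hits) if hit == taxa]
        if full:
            node = full[0]
            continue
        # The taxa split here: the child holding two of them defines the pair.
        pairs = [hit for hit in hits if len(hit) == 2]
        return pairs[0] if pairs else None
    return None


def majority_triple(gene_trees, taxa):
    """Most frequent sister pair for `taxa` across gene trees (ties: first seen)."""
    votes = Counter(closest_pair(tree, taxa) for tree in gene_trees)
    votes.pop(None, None)  # ignore unresolved gene trees
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Toy gene trees: two support (a, b) as sisters, one supports (a, c).
    trees = [
        ((("a", "b"), "c"), "d"),
        ((("a", "b"), "d"), "c"),
        ((("a", "c"), "b"), "d"),
    ]
    print(sorted(majority_triple(trees, ("a", "b", "c"))))
```

    Repeating the vote over every three-taxon subset yields a set of rooted triples from which a species tree can then be assembled.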

  10. Phylogenetic position of the giant anuran trypanosomes Trypanosoma chattoni, Trypanosoma fallisi, Trypanosoma mega, Trypanosoma neveulemairei, and Trypanosoma ranarum inferred from 18S rRNA gene sequences.

    Science.gov (United States)

    Martin, Donald S; Wright, André-Denis G; Barta, John R; Desser, Sherwin S

    2002-06-01

    Phylogenetic relationships within the kinetoplastid flagellates were inferred from comparisons of small-subunit ribosomal RNA gene sequences. These included 5 new gene sequences, Trypanosoma fallisi (2,239 bp), Trypanosoma chattoni (2,180 bp), Trypanosoma mega (2,211 bp), Trypanosoma neveulemairei (2,197 bp), and Trypanosoma ranarum (2,203 bp). Trees produced using maximum-parsimony and distance-matrix methods (least-squares, neighbor-joining, and maximum-likelihood), supported by strong bootstrap and quartet-puzzle analyses, indicated that the trypanosomes are a monophyletic group that divides into 2 major lineages, the salivarian trypanosomes and the nonsalivarian trypanosomes. The nonsalivarian trypanosomes further divide into 2 lineages, 1 containing trypanosomes of birds, mammals, and reptiles and the other containing trypanosomes of fish, reptiles, and anurans. Among the giant trypanosomes, T. chattoni is clearly shown to be distantly related to all the other anuran trypanosome species. Trypanosoma mega is closely associated with T. fallisi and T. ranarum, whereas T. neveulemairei and Trypanosoma rotatorium are sister taxa. The branching order of the anuran trypanosomes suggests that some toad trypanosomes may have evolved by host switching from frogs to toads.

  11. Population structure of the malaria vector Anopheles sinensis (Diptera: Culicidae) in China: two gene pools inferred by microsatellites.

    Directory of Open Access Journals (Sweden)

    Yajun Ma

    BACKGROUND: Anopheles sinensis is a competent malaria vector in China. An understanding of vector population structure is important to vector-based malaria control programs. However, no adequate data on A. sinensis population genetics are available yet. METHODOLOGY/PRINCIPAL FINDINGS: This study used 5 microsatellite loci to estimate population genetic diversity, genetic differentiation and demographic history of A. sinensis from 14 representative localities in China. All 5 microsatellite loci were highly polymorphic across populations, with high allelic richness and heterozygosity. Hardy-Weinberg disequilibrium was found in 12 populations associated with heterozygote deficits, which was likely caused by the presence of null alleles and the Wahlund effect. Bayesian clustering analysis revealed two gene pools, grouping samples into two population clusters; one includes six and the other includes eight populations. Out of 14 samples, six samples were mixed with individuals from both gene pools, indicating the coexistence of two genetic units in the areas sampled. The overall differentiation between the two genetic pools was moderate (FST = 0.156). Pairwise differentiation between populations was lower within clusters (FST = 0.008-0.028 in cluster I and FST = 0.004-0.048 in cluster II) than between clusters (FST = 0.120-0.201). A reduced gene flow (Nm = 1-1.7) was detected between clusters. No evidence of isolation by distance was detected among populations either within or between the two clusters. There are differences in effective population size (Ne = 14.3 to infinite) across sampled populations. CONCLUSIONS/SIGNIFICANCE: Two genetic pools with moderate genetic differentiation were identified in the A. sinensis populations in China. The population divergence was not correlated with geographic distance or barrier in the range. Variable effective population size and other demographic effects of historical population
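
    For readers unfamiliar with the link between FST and Nm, the sketch below applies Wright's island-model approximation, Nm = (1 - FST) / (4 FST). Whether the authors used this estimator is an assumption; the abstract does not say how Nm was obtained, and the function name is ours. Applied to the between-cluster FST range quoted above, the formula gives values of roughly 1.0 to 1.8, broadly in line with the reported Nm = 1-1.7.

```python
def migrants_per_generation(fst):
    """Wright's island-model estimate of Nm from FST: (1 - FST) / (4 * FST)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


if __name__ == "__main__":
    # FST values quoted in the abstract (between-cluster range and overall).
    for fst in (0.120, 0.156, 0.201):
        print(f"FST = {fst:.3f} -> Nm = {migrants_per_generation(fst):.2f}")
```

    The approximation rests on strong assumptions (many equal-sized demes, symmetric migration, equilibrium), so the resulting Nm is better read as an order-of-magnitude indicator than as a measured migration rate.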

  12. Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins

    Science.gov (United States)

    Gaucher, Eric A.; Thomson, J. Michael; Burgan, Michelle F.; Benner, Steven A.

    2003-01-01

    Features of the physical environment surrounding an ancestral organism can be inferred by reconstructing sequences of ancient proteins made by those organisms, resurrecting these proteins in the laboratory, and measuring their properties. Here, we resurrect candidate sequences for elongation factors of the Tu family (EF-Tu) found at ancient nodes in the bacterial evolutionary tree, and measure their activities as a function of temperature. The ancient EF-Tu proteins have temperature optima of 55-65 degrees C. This value seems to be robust with respect to uncertainties in the ancestral reconstruction. This suggests that the ancient bacteria that hosted these particular genes were thermophiles, and neither hyperthermophiles nor mesophiles. This conclusion can be compared and contrasted with inferences drawn from an analysis of the lengths of branches in trees joining proteins from contemporary bacteria, the distribution of thermophily in derived bacterial lineages, the inferred G + C content of ancient ribosomal RNA, and the geological record combined with assumptions concerning molecular clocks. The study illustrates the use of experimental palaeobiochemistry and assumptions about deep phylogenetic relationships between bacteria to explore the character of ancient life.

  13. Adaptive Inference on General Graphical Models

    OpenAIRE

    Acar, Umut A.; Ihler, Alexander T.; Mettu, Ramgopal; Sumer, Ozgur

    2012-01-01

    Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional ...

  14. Inference of miRNA targets using evolutionary conservation and pathway analysis

    Directory of Open Access Journals (Sweden)

    van Nimwegen Erik

    2007-03-01

    Full Text Available Abstract Background MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially. Results We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3' UTRs, miRNA target sites occur preferentially near the start and near the end of the 3' UTR. To characterize miRNA function beyond the predicted lists of targets we further present a method to infer significant associations between the sets of targets predicted for individual miRNAs and specific biochemical pathways, in particular those of the KEGG pathway database. We show that this approach retrieves several known functional miRNA-mRNA associations, and predicts novel functions for known miRNAs in cell growth and in development. Conclusion We have presented a Bayesian target prediction algorithm without any tunable parameters, that can be applied to sequences from any clade of species. 
The algorithm automatically infers the phylogenetic distribution of functional sites for each miRNA, and

  15. Conservation, spillover and gene flow within a network of Northern European marine protected areas.

    Directory of Open Access Journals (Sweden)

    Mats Brockstedt Olsen Huserbråten

    Full Text Available To ensure that marine protected areas (MPAs) benefit conservation and fisheries, the effectiveness of MPA designs has to be evaluated in field studies. Using an interdisciplinary approach, we empirically assessed the design of a network of northern MPAs where fishing for European lobster (Homarus gammarus) is prohibited. First, we demonstrate a high level of residency and survival (50%) for almost a year (363 days) within MPAs, despite small MPA sizes (0.5-1 km²). Second, we demonstrate limited export (4.7%) of lobsters tagged within MPAs (N = 1810) to neighbouring fished areas, over a median distance of 1.6 km out to maximum 21 km away from MPA centres. In comparison, median movement distance of lobsters recaptured within MPAs was 164 m, and recapture rate was high (40%). Third, we demonstrate a high level of gene flow within the study region, with an estimated F_ST of less than 0.0001 over a ≈ 400 km coastline. Thus, the restricted movement of older life stages, combined with a high level of gene flow, suggests that connectivity is primarily driven by larval drift. Larval export from the MPAs can most likely affect areas far beyond their borders. Our findings are of high importance for the design of MPA networks for sedentary species with pelagic early life stages.
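
    As a rough illustration of why a very small F_ST is read as high gene flow, the sketch below applies Wright's island-model approximation, F_ST ≈ 1/(4Nm + 1). This is an assumption made here for illustration only: the abstract does not state how gene flow was assessed, and the function name `nm_from_fst` and the example values other than 0.0001 are hypothetical.

```python
def nm_from_fst(fst: float) -> float:
    """Effective migrants per generation (Nm) under Wright's island model.

    Assumes F_ST ~ 1 / (4*Nm + 1), i.e. an idealised equilibrium model;
    real populations rarely satisfy its assumptions exactly.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("F_ST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# 0.0001 is the upper bound quoted in the abstract; the others are illustrative.
for fst in (0.0001, 0.05, 0.2):
    print(f"F_ST = {fst}: Nm ~ {nm_from_fst(fst):.2f}")
```

    Under this approximation, Nm grows without bound as F_ST approaches zero, so estimates derived from very small F_ST values are highly sensitive to sampling error.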

  16. Approximate Bayesian computation for modular inference problems with many parameters: the example of migration rates.

    Science.gov (United States)

    Aeschbacher, S; Futschik, A; Beaumont, M A

    2013-02-01

    We propose a two-step procedure for estimating multiple migration rates in an approximate Bayesian computation (ABC) framework, accounting for global nuisance parameters. The approach is not limited to migration, but generally of interest for inference problems with multiple parameters and a modular structure (e.g. independent sets of demes or loci). We condition on a known, but complex demographic model of a spatially subdivided population, motivated by the reintroduction of Alpine ibex (Capra ibex) into Switzerland. In the first step, the global parameters ancestral mutation rate and male mating skew have been estimated for the whole population in Aeschbacher et al. (Genetics 2012; 192: 1027). In the second step, we estimate in this study the migration rates independently for clusters of demes putatively connected by migration. For large clusters (many migration rates), ABC faces the problem of too many summary statistics. We therefore assess by simulation if estimation per pair of demes is a valid alternative. We find that the trade-off between reduced dimensionality for the pairwise estimation on the one hand and lower accuracy due to the assumption of pairwise independence on the other depends on the number of migration rates to be inferred: the accuracy of the pairwise approach increases with the number of parameters, relative to the joint estimation approach. To distinguish between low and zero migration, we perform ABC-type model comparison between a model with migration and one without. Applying the approach to microsatellite data from Alpine ibex, we find no evidence for substantial gene flow via migration, except for one pair of demes in one direction. © 2013 Blackwell Publishing Ltd.
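
    The general idea of ABC can be sketched with a plain rejection sampler for a single migration parameter. This is not the authors' two-step procedure: the simulator, the uniform prior, the tolerance and all names below are assumptions chosen to keep the example self-contained.

```python
import random
import statistics


def simulate_fst(nm, rng, noise_sd=0.01):
    """Toy simulator (assumption): island-model F_ST plus Gaussian noise."""
    return 1.0 / (1.0 + 4.0 * nm) + rng.gauss(0.0, noise_sd)


def abc_rejection(observed, n_draws=20000, tol=0.01, seed=1):
    """Keep prior draws of Nm whose simulated summary lies within tol of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        nm = rng.uniform(0.1, 10.0)  # hypothetical uniform prior on Nm
        if abs(simulate_fst(nm, rng) - observed) <= tol:
            accepted.append(nm)
    return accepted


posterior = abc_rejection(observed=0.1)
print(len(posterior), statistics.mean(posterior))
```

    With many migration rates the summary-statistic vector grows and the acceptance rate collapses, which is the dimensionality problem that motivates estimating rates per pair of demes.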

  17. Inferring the demographic history of African farmers and pygmy hunter-gatherers using a multilocus resequencing data set.

    Directory of Open Access Journals (Sweden)

    Etienne Patin

    2009-04-01

    Full Text Available The transition from hunting and gathering to farming involved a major cultural innovation that has spread rapidly over most of the globe in the last ten millennia. In sub-Saharan Africa, hunter-gatherers have begun to shift toward an agriculture-based lifestyle over the last 5,000 years. Only a few populations still base their mode of subsistence on hunting and gathering. The Pygmies are considered to be the largest group of mobile hunter-gatherers of Africa. They dwell in equatorial rainforests and are characterized by their short mean stature. However, little is known about the chronology of the demographic events (size changes, population splits, and gene flow) ultimately giving rise to contemporary Pygmy (Western and Eastern) groups and neighboring agricultural populations. We studied the branching history of Pygmy hunter-gatherers and agricultural populations from Africa and estimated separation times and gene flow between these populations. We resequenced 24 independent noncoding regions across the genome, corresponding to a total of approximately 33 kb per individual, in 236 samples from seven Pygmy and five agricultural populations dispersed over the African continent. We used simulation-based inference to identify the historical model best fitting our data. The model identified included the early divergence of the ancestors of Pygmy hunter-gatherers and farming populations approximately 60,000 years ago, followed by a split of the Pygmies' ancestors into the Western and Eastern Pygmy groups approximately 20,000 years ago. Our findings increase knowledge of the history of the peopling of the African continent in a region lacking archaeological data. An appreciation of the demographic and adaptive history of African populations with different modes of subsistence should improve our understanding of the influence of human lifestyles on genome diversity.

  18. Population genetic analysis reveals barriers and corridors for gene flow within and among riparian populations of a rare plant.

    Science.gov (United States)

    Hevroy, Tanya H; Moody, Michael L; Krauss, Siegfried L

    2018-02-01

    Landscape features and life-history traits affect gene flow, migration and drift, and thereby the spatial genetic structure of species. Understanding this is important for managing genetic diversity of threatened species. This study assessed the spatial genetic structure of the rare riparian Grevillea sp. Cooljarloo (Proteaceae), which is restricted to a 20 km² region impacted by mining in the northern sandplains of the Southwest Australian Floristic Region, an international biodiversity hotspot. Within creek lines and floodplains, the distribution is largely continuous. Models of dispersal within riparian systems were assessed by spatial genetic analyses including population level partitioning of genetic variation and individual Bayesian clustering. High levels of genetic variation and weak isolation by distance within creek line and floodplain populations suggest large effective population sizes and strong connectivity, with little evidence for unidirectional gene flow as might be expected from hydrochory. Regional clustering of creek line populations and strong divergence among creek line populations suggest substantially lower levels of gene flow among creek lines than within creek lines. There was however a surprising amount of genetic admixture in floodplain populations, which could be explained by irregular flooding and/or movements by highly mobile nectar-feeding bird pollinators. Our results highlight that for conservation of rare riparian species, avoiding an impact to hydrodynamic processes, such as water tables and flooding dynamics, may be just as critical as avoiding direct impacts on the number of plants.

  19. Adaptive population divergence and directional gene flow across steep elevational gradients in a climate‐sensitive mammal

    Science.gov (United States)

    Waterhouse, Matthew D.; Erb, Liesl P.; Beever, Erik; Russello, Michael A.

    2018-01-01

    The American pika is a thermally sensitive, alpine lagomorph species. Recent climate-associated population extirpations and genetic signatures of reduced population sizes range-wide indicate the viability of this species is sensitive to climate change. To test for potential adaptive responses to climate stress, we sampled pikas along two elevational gradients (each ~470 to 1640 m) and employed three outlier detection methods, BAYESCAN, LFMM, and BAYPASS, to scan for genotype-environment associations in samples genotyped at 30,763 SNP loci. We resolved 173 loci with robust evidence of natural selection detected by either two independent analyses or replicated in both transects. A BLASTN search of these outlier loci revealed several genes associated with metabolic function and oxygen transport, indicating natural selection from thermal stress and hypoxia. We also found evidence of directional gene flow primarily downslope from large high-elevation populations and reduced gene flow at outlier loci, a pattern suggesting potential impediments to the upward elevational movement of adaptive alleles in response to contemporary climate change. Finally, we documented evidence of reduced genetic diversity associated with the south-facing transect and an increase in corticosterone stress levels associated with inbreeding. This study suggests the American pika is already undergoing climate-associated natural selection at multiple genomic regions. Further analysis is needed to determine if the rate of climate adaptation in the American pika and other thermally sensitive species will be able to keep pace with rapidly changing climate conditions.

  20. The inference from a single case: moral versus scientific inferences in implementing new biotechnologies.

    Science.gov (United States)

    Hofmann, B

    2008-06-01

    Are there similarities between scientific and moral inference? This is the key question in this article. It takes as its point of departure an instance of one person's story in the media changing both Norwegian public opinion and a brand-new Norwegian law prohibiting the use of saviour siblings. The case appears to falsify existing norms and to establish new ones. The analysis of this case reveals similarities in the modes of inference in science and morals, inasmuch as (a) a single case functions as a counter-example to an existing rule; (b) there is a common presupposition of stability, similarity and order, which makes it possible to reason from a few cases to a general rule; and (c) this makes it possible to hold things together and retain order. In science, these modes of inference are referred to as falsification, induction and consistency. In morals, they have a variety of other names. Hence, even without abandoning the fact-value divide, there appear to be similarities between inference in science and inference in morals, which may encourage communication across the boundaries between "the two cultures" and which are relevant to medical humanities.

  1. Quantitative Information Flow as Safety and Liveness Hyperproperties

    Directory of Open Access Journals (Sweden)

    Hirotoshi Yasuoka

    2012-07-01

    Full Text Available We employ Clarkson and Schneider's "hyperproperties" to classify various verification problems of quantitative information flow. The results of this paper unify and extend the previous results on the hardness of checking and inferring quantitative information flow. In particular, we identify a subclass of liveness hyperproperties, which we call "k-observable hyperproperties", that can be checked relative to a reachability oracle via self composition.

  2. Introductory statistical inference

    CERN Document Server

    Mukhopadhyay, Nitis

    2014-01-01

    This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques. Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist

  3. Population structure of barley landrace populations and gene-flow with modern varieties.

    Directory of Open Access Journals (Sweden)

    Elisa Bellucci

    Full Text Available Landraces are heterogeneous plant varieties that are reproduced by farmers as populations that are subject to both artificial and natural selection. Landraces are distinguished by farmers due to their specific traits, and different farmers often grow different populations of the same landrace. We used simple sequence repeats (SSRs) to analyse 12 barley landrace populations from Sardinia from two collections spanning 10 years. We analysed the population structure, and compared the population diversity of the landraces that were collected at field level (population). We used a representative pool of barley varieties for diversity comparisons and to analyse the effects of gene flow from modern varieties. We found that the Sardinian landraces are a distinct gene pool from those of both two-row and six-row barley varieties. There is also a low, but significant, mean level and population-dependent level of introgression from the modern varieties into the Sardinian landraces. Moreover, we show that the Sardinian landraces have the same level of gene diversity as the representative sample of modern commercial varieties grown in Italy in the last decades, even within population level. Thus, these populations represent crucial sources of germplasm that will be useful for crop improvement and for population genomics studies and association mapping, to identify genes, loci and genome regions responsible for adaptive variations. Our data also suggest that landraces are a source of valuable germplasm for sustainable agriculture in the context of future climate change, and that in-situ conservation strategies based on farmer use can preserve the genetic identity of landraces while allowing adaptation to local environments.

  4. Active inference, communication and hermeneutics.

    Science.gov (United States)

    Friston, Karl J; Frith, Christopher D

    2015-07-01

    Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others, during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions, both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then, in principle, they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  5. Evolutionary rates at codon sites may be used to align sequences and infer protein domain function

    Directory of Open Access Journals (Sweden)

    Hazelhurst Scott

    2010-03-01

    Full Text Available Abstract Background Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionarily distant taxa, genomes with nucleotide biases, and cases of convergent evolution. Results A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test the hypotheses that evolutionary rates can be used to align sequences and that the alignments may be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignment has the potential to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or when sequences have very low similarity. 
However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms indicating that the conventional approach of aligning residues as opposed to evolutionary rates remains the

  6. Genetic diversity and population structure of Lantana camara in India indicates multiple introductions and gene flow.

    Science.gov (United States)

    Ray, A; Quader, S

    2014-05-01

    Lantana camara is a highly invasive plant, which has spread over 60 countries and island groups of Asia, Africa and Australia. In India, it was introduced in the early nineteenth century, since when it has expanded and gradually established itself in almost every available ecosystem. We investigated the genetic diversity and population structure of this plant in India in order to understand its introduction, subsequent range expansion and gene flow. A total of 179 individuals were sequenced at three chloroplast loci and 218 individuals were genotyped for six nuclear microsatellites. Both chloroplasts (nine haplotypes) and microsatellites (83 alleles) showed high genetic diversity. Besides, each type of marker confirmed the presence of private polymorphism. We uncovered low to medium population structure in both markers, and found a faint signal of isolation by distance with microsatellites. Bayesian clustering analyses revealed multiple divergent genetic clusters. Taken together, these findings (i.e. high genetic diversity with private alleles and multiple genetic clusters) suggest that Lantana was introduced multiple times and gradually underwent spatial expansion with recurrent gene flow. © 2013 German Botanical Society and The Royal Botanical Society of the Netherlands.

  7. Genetic Structure and Gene Flows within Horses: A Genealogical Study at the French Population Scale

    OpenAIRE

    Pirault, Pauline; Danvy, Sophy; Verrier, Etienne; Leroy, Grégoire

    2013-01-01

    Since horse breeds constitute populations submitted to variable and multiple outcrossing events, we analyzed the genetic structure and gene flows considering horses raised in France. We used genealogical data, with a reference population of 547,620 horses born in France between 2002 and 2011, grouped according to 55 breed origins. On average, individuals had 6.3 equivalent generations known. Considering different population levels, fixation index decreased from an overall species FIT of 1.37%...

  8. Phylogenetic inference in Rafflesiales: the influence of rate heterogeneity and horizontal gene transfer

    Directory of Open Access Journals (Sweden)

    Vidal-Russell Romina

    2004-10-01

    Full Text Available Abstract Background The phylogenetic relationships among the holoparasites of Rafflesiales have remained enigmatic for over a century. Recent molecular phylogenetic studies using the mitochondrial matR gene placed Rafflesia, Rhizanthes and Sapria (Rafflesiaceae s. str.) in the angiosperm order Malpighiales and Mitrastema (Mitrastemonaceae) in Ericales. These phylogenetic studies did not, however, sample two additional groups traditionally classified within Rafflesiales (Apodanthaceae and Cytinaceae). Here we provide molecular phylogenetic evidence using DNA sequence data from mitochondrial and nuclear genes for representatives of all genera in Rafflesiales. Results Our analyses indicate that the phylogenetic affinities of the large-flowered clade and Mitrastema, ascertained using mitochondrial matR, are congruent with results from nuclear SSU rDNA when these data are analyzed using maximum likelihood and Bayesian methods. The relationship of Cytinaceae to Malvales was recovered in all analyses. Relationships between Apodanthaceae and photosynthetic angiosperms varied depending upon the data partition: Malvales (3-gene), Cucurbitales (matR) or Fabales (atp1). The latter incongruencies suggest that horizontal gene transfer (HGT) may be affecting the mitochondrial gene topologies. The lack of association between Mitrastema and Ericales using atp1 is suggestive of HGT, but greater sampling within eudicots is needed to test this hypothesis further. Conclusions Rafflesiales are not monophyletic but composed of three or four independent lineages (families): Rafflesiaceae, Mitrastemonaceae, Apodanthaceae and Cytinaceae. Long-branch attraction appears to be misleading parsimony analyses of nuclear small-subunit rDNA data, but model-based methods (maximum likelihood and Bayesian analyses) recover a topology that is congruent with the mitochondrial matR gene tree, thus providing compelling evidence for organismal relationships. 
Horizontal gene transfer appears to

  9. Numerical optimization using flow equations

    Science.gov (United States)

    Punk, Matthias

    2014-12-01

    We develop a method for multidimensional optimization using flow equations. This method is based on homotopy continuation in combination with a maximum entropy approach. Extrema of the optimizing functional correspond to fixed points of the flow equation. While ideas based on Bayesian inference such as the maximum entropy method always depend on a prior probability, the additional step in our approach is to perform a continuous update of the prior during the homotopy flow. The prior probability thus enters the flow equation only as an initial condition. We demonstrate the applicability of this optimization method for two paradigmatic problems in theoretical condensed matter physics: numerical analytic continuation from imaginary to real frequencies and finding (variational) ground states of frustrated (quantum) Ising models with random or long-range antiferromagnetic interactions.
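
    The statement that extrema correspond to fixed points of a flow can be illustrated with the simplest possible case, a gradient flow dx/dt = -f'(x) integrated by explicit Euler steps. This is only a generic sketch: it does not implement the homotopy continuation or the maximum entropy prior update described in the abstract, and the function names and test objective are hypothetical.

```python
def integrate_flow(grad, x0, step=0.1, n_steps=200):
    """Explicit Euler integration of dx/dt = -grad(x).

    A fixed point of the flow (grad(x) == 0) is a stationary point of the
    objective whose derivative is `grad`.
    """
    x = x0
    for _ in range(n_steps):
        x = x - step * grad(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2, with derivative 2 * (x - 3).
x_star = integrate_flow(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)
```

    The step size here is fixed for simplicity; too large a step makes the explicit scheme unstable, which is one reason practical flow-based methods use adaptive integrators.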

  10. Intercoalescence time distribution of incomplete gene genealogies in temporally varying populations, and applications in population genetic inference.

    Science.gov (United States)

    Chen, Hua

    2013-03-01

    Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated to the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm validity of the derivation and statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.
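
    For orientation, the sketch below computes the textbook expected intercoalescence times for the standard coalescent in a population of constant size, E[T_k] = 2/(k(k-1)) in units of 2N generations. It is a baseline only: the paper's results concern genealogies truncated at time T in populations of varying size, which this code does not cover, and the function names are hypothetical.

```python
def expected_intercoalescence_times(n):
    """E[T_k], the expected time during which k lineages exist, for k = n..2.

    Standard constant-size coalescent, time measured in units of 2N
    generations (an assumption of this sketch, not the paper's model).
    """
    if n < 2:
        raise ValueError("need at least two sampled lineages")
    return {k: 2.0 / (k * (k - 1)) for k in range(n, 1, -1)}


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return sum(expected_intercoalescence_times(n).values())


print(expected_intercoalescence_times(4))
print(expected_tmrca(10))
```

    The last interval (k = 2) alone accounts for more than half of the expected tree height, which is why truncating a genealogy at a finite time T often leaves several lineages extant.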

  11. Optimization methods for logical inference

    CERN Document Server

    Chandru, Vijay

    2011-01-01

    Merging logic and mathematics in deductive inference: an innovative, cutting-edge approach. Optimization methods for logical inference? Absolutely, say Vijay Chandru and John Hooker, two major contributors to this rapidly expanding field. And even though "solving logical inference problems with optimization methods may seem a bit like eating sauerkraut with chopsticks. . . it is the mathematical structure of a problem that determines whether an optimization model can help solve it, not the context in which the problem occurs." Presenting powerful, proven optimization techniques for logic in

  12. Inference of topology and the nature of synapses, and the flow of information in neuronal networks

    Science.gov (United States)

    Borges, F. S.; Lameu, E. L.; Iarosz, K. C.; Protachevicz, P. R.; Caldas, I. L.; Viana, R. L.; Macau, E. E. N.; Batista, A. M.; Baptista, M. S.

    2018-02-01

    The characterization of neuronal connectivity is one of the most important matters in neuroscience. In this work, we show that a recently proposed informational quantity, the causal mutual information, employed with an appropriate methodology, can be used not only to correctly infer the direction of the underlying physical synapses, but also to identify their excitatory or inhibitory nature, considering easy to handle and measure bivariate time series. The success of our approach relies on a surprising property found in neuronal networks by which nonadjacent neurons do "understand" each other (positive mutual information), however, this exchange of information is not capable of causing effect (zero transfer entropy). Remarkably, inhibitory connections, responsible for enhancing synchronization, transfer more information than excitatory connections, known to enhance entropy in the network. We also demonstrate that our methodology can be used to correctly infer directionality of synapses even in the presence of dynamic and observational Gaussian noise, and is also successful in providing the effective directionality of intermodular connectivity, when only mean fields can be measured.
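
    As a point of reference for the informational quantities mentioned above, the following sketch estimates ordinary mutual information between two discretised time series with a plug-in (frequency count) estimator. It does not implement the causal mutual information or transfer entropy used in the paper; the function name and the toy series are assumptions for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("series must be non-empty and of equal length")
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


spikes_a = [0, 1, 0, 1]            # hypothetical binarised spike train
spikes_b = [0, 0, 1, 1]            # statistically independent of spikes_a
print(mutual_information(spikes_a, spikes_a))  # identical series
print(mutual_information(spikes_a, spikes_b))  # independent series
```

    Mutual information is symmetric, so on its own it cannot indicate the direction of a connection; directional measures such as transfer entropy condition on the past of the target series.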

  13. Inference in `poor` languages

    Energy Technology Data Exchange (ETDEWEB)

    Petrov, S.

    1996-10-01

    Languages with a solvable implication problem but without complete and consistent systems of inference rules (`poor` languages) are considered. The problem of the existence of a finite complete and consistent inference rule system for a `poor` language is stated independently of the language or rule syntax. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.

  14. Constraints on genome dynamics revealed from gene distribution among the Ralstonia solanacearum species.

    Directory of Open Access Journals (Sweden)

    Pierre Lefeuvre

    Full Text Available Because it is suspected that gene content may partly explain host adaptation and ecology of pathogenic bacteria, it is important to study factors affecting genome composition and its evolution. While recent genomic advances have revealed extremely large pan-genomes for some bacterial species, it remains difficult to predict to what extent gene pool is accessible within or transferable between populations. As genomes bear imprints of the history of the organisms, gene distribution pattern analyses should provide insights into the forces and factors at play in the shaping and maintaining of bacterial genomes. In this study, we revisited the data obtained from a previous CGH microarrays analysis in order to assess the genomic plasticity of the R. solanacearum species complex. Gene distribution analyses demonstrated the remarkably dispersed genome of R. solanacearum with more than half of the genes being accessory. From the reconstruction of the ancestral genomes compositions, we were able to infer the number of gene gain and loss events along the phylogeny. Analyses of gene movement patterns reveal that factors associated with gene function, genomic localization and ecology delineate gene flow patterns. While the chromosome displayed lower rates of movement, the megaplasmid was clearly associated with hot-spots of gene gain and loss. Gene function was also confirmed to be an essential factor in gene gain and loss dynamics with significant differences in movement patterns between different COG categories. Finally, analyses of gene distribution highlighted possible highways of horizontal gene transfer. Due to sampling and design bias, we can only speculate on factors at play in this gene movement dynamic. Further studies examining precise conditions that favor gene transfer would provide invaluable insights in the fate of bacteria, species delineation and the emergence of successful pathogens.

  15. EI: A Program for Ecological Inference

    Directory of Open Access Journals (Sweden)

    Gary King

    2004-09-01

    Full Text Available The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997). Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available. Ecological inferences are required in political science research when individual-level surveys are unavailable (e.g., local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). They are also required in numerous areas of major significance in public policy (e.g., for applying the Voting Rights Act) and other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history.
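
    To make the idea of inferring individual behavior from aggregate data concrete, the sketch below computes the deterministic bounds that follow from the accounting identity t = beta_b * x + beta_w * (1 - x) for a single areal unit. This is only the bounds step, written under assumed notation (x is the fraction of the unit belonging to group b, t the observed aggregate rate); it is not the EI program's interface or its statistical model.

```python
def ecological_bounds(x, t):
    """Bounds on the group-level rates beta_b and beta_w for one unit.

    x: fraction of the unit's population in group b (0 < x < 1)
    t: observed aggregate rate in the unit (0 <= t <= 1)
    Both rates must lie in [0, 1] and satisfy t = beta_b * x + beta_w * (1 - x).
    """
    beta_b = (max(0.0, (t - (1 - x)) / x), min(1.0, t / x))
    beta_w = (max(0.0, (t - x) / (1 - x)), min(1.0, t / (1 - x)))
    return beta_b, beta_w

# Hypothetical unit: 80% of residents are in group b and the aggregate rate is 90%.
(b_low, b_high), (w_low, w_high) = ecological_bounds(0.8, 0.9)
```

    For this hypothetical unit the aggregate figures alone force the group b rate to be at least 0.875 and the other group's rate to be at least 0.5, which shows how much information the margins can carry before any statistical model is applied.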

  16. Reconstructing Dynamic Promoter Activity Profiles from Reporter Gene Data.

    Science.gov (United States)

    Kannan, Soumya; Sams, Thomas; Maury, Jérôme; Workman, Christopher T

    2018-03-16

    Accurate characterization of promoter activity is important when designing expression systems for systems biology and metabolic engineering applications. Promoters that respond to changes in the environment enable the dynamic control of gene expression without the necessity of inducer compounds, for example. However, the dynamic nature of these processes poses challenges for estimating promoter activity. Most experimental approaches utilize reporter gene expression to estimate promoter activity. Typically the reporter gene encodes a fluorescent protein that is used to infer a constant promoter activity despite the fact that the observed output may be dynamic and is a number of steps away from the transcription process. In fact, some promoters that are often thought of as constitutive can show changes in activity when growth conditions change. For these reasons, we have developed a system of ordinary differential equations for estimating dynamic promoter activity for promoters that change their activity in response to the environment that is robust to noise and changes in growth rate. Our approach, inference of dynamic promoter activity (PromAct), improves on existing methods by more accurately inferring known promoter activity profiles. This method is also capable of estimating the correct scale of promoter activity and can be applied to quantitative data sets to estimate quantitative rates.

  17. Role of zebrafish cytochrome P450 CYP1C genes in the reduced mesencephalic vein blood flow caused by activation of AHR2

    International Nuclear Information System (INIS)

    Kubota, Akira; Stegeman, John J.; Woodin, Bruce R.; Iwanaga, Toshihiko; Harano, Ryo; Peterson, Richard E.; Hiraga, Takeo; Teraoka, Hiroki

    2011-01-01

    2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) causes various signs of toxicity in early life stages of vertebrates through activation of the aryl hydrocarbon receptor (AHR). We previously reported a sensitive and useful endpoint of TCDD developmental toxicity in zebrafish, namely a decrease in blood flow in the dorsal midbrain, but downstream genes involved in the effect are not known. The present study addressed the role of zebrafish cytochrome P450 1C (CYP1C) genes in association with a decrease in mesencephalic vein (MsV) blood flow. The CYP1C subfamily was recently discovered in fish and includes the paralogues CYP1C1 and CYP1C2, both of which are induced via AHR2 in zebrafish embryos. We used morpholino antisense oligonucleotides (MO or morpholino) to block initiation of translation of the target genes. TCDD-induced mRNA expression of CYP1Cs and a decrease in MsV blood flow were both blocked by gene knockdown of AHR2. Gene knockdown of CYP1C1 by two different morpholinos and CYP1C2 by two different morpholinos, but not by their 5 nucleotide-mismatch controls, was effective in blocking reduced MsV blood flow caused by TCDD. The same CYP1C-MOs prevented reduction of blood flow in the MsV caused by β-naphthoflavone (BNF), representing another class of AHR agonists. Whole-mount in situ hybridization revealed that mRNA expression of CYP1C1 and CYP1C2 was induced by TCDD most strongly in branchiogenic primordia and pectoral fin buds. In situ hybridization using head transverse sections showed that TCDD increased the expression of both CYP1Cs in endothelial cells of blood vessels, including the MsV. These results indicate a potential role of CYP1C1 and CYP1C2 in the local circulation failure induced by AHR2 activation in the dorsal midbrain of the zebrafish embryo. - Research Highlights: → We examine the roles of zebrafish CYP1C1 and CYP1C2 in TCDD developmental toxicity. → TCDD induces mRNA expression of both CYP1Cs in the mesencephalic vein. → Knockdown of each

  18. Genomic evidence of geographically widespread effect of gene flow from polar bears into brown bears

    OpenAIRE

    Cahill, James A; Stirling, Ian; Kistler, Logan; Salamzade, Rauf; Ersmark, Erik; Fulton, Tara L; Stiller, Mathias; Green, Richard E; Shapiro, Beth

    2015-01-01

    © 2014 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd. Polar bears are an arctic, marine adapted species that is closely related to brown bears. Genome analyses have shown that polar bears are distinct and genetically homogeneous in comparison to brown bears. However, these analyses have also revealed a remarkable episode of polar bear gene flow into the population of brown bears that colonized the Admiralty, Baranof and Chichagof islands (ABC islands) of Alaska. Here, we...

  19. Contrasting Effects of Historical Sea Level Rise and Contemporary Ocean Currents on Regional Gene Flow of Rhizophora racemosa in Eastern Atlantic Mangroves.

    Directory of Open Access Journals (Sweden)

    Magdalene N Ngeve

    Full Text Available Mangroves are seafaring taxa through their hydrochorous propagules that have the potential to disperse over long distances. Therefore, investigating their patterns of gene flow provides insights on the processes involved in the spatial genetic structuring of populations. The coastline of Cameroon has a particular geomorphological history and coastal hydrology with complex contemporary patterns of ocean currents, which we hypothesize to have effects on the spatial configuration and composition of present-day mangroves within its spans. A total of 982 trees were sampled from 33 transects (11 sites) in 4 estuaries. Using 11 polymorphic SSR markers, we investigated genetic diversity and structure of Rhizophora racemosa, a widespread species in the region. Genetic diversity was low to moderate and genetic differentiation between nearly all population pairs was significant. Bayesian clustering analysis, PCoA, estimates of contemporary migration rates and identification of barriers to gene flow were used and complemented with estimated dispersal trajectories of hourly released virtual propagules, using high-resolution surface current from a mesoscale and tide-resolving ocean simulation. These indicate that the Cameroon Volcanic Line (CVL) is not a present-day barrier to gene flow. Rather, the Inter-Bioko-Cameroon (IBC) corridor, formed due to sea level rise, allows for connectivity between two mangrove areas that were isolated during glacial times by the CVL. Genetic data and numerical ocean simulations indicated that an oceanic convergence zone near the Cameroon Estuary complex (CEC) presents a strong barrier to gene flow, resulting in genetic discontinuities between the mangrove areas on either side. This convergence did not result in higher genetic diversity at the CEC as we had hypothesized. In conclusion, the genetic structure of Rhizophora racemosa is maintained by the contrasting effects of the contemporary oceanic convergence and historical climate

  20. Air-mediated pollen flow from genetically modified to conventional crops.

    Science.gov (United States)

    Kuparinen, Anna; Schurr, Frank; Tackenberg, Oliver; O'Hara, Robert B

    2007-03-01

    Tools for estimating pollen dispersal and the resulting gene flow are necessary to assess the risk of gene flow from genetically modified (GM) to conventional fields, and to quantify the effectiveness of measures that may prevent such gene flow. A mechanistic simulation model is presented and used to simulate pollen dispersal by wind in different agricultural scenarios over realistic pollination periods. The relative importance of landscape-related variables such as isolation distance, topography, spatial configuration of the fields, GM field size and barrier, and environmental variation are examined in order to find ways to minimize gene flow and to detect possible risk factors. The simulations demonstrated a large variation in pollen dispersal and in the predicted amount of contamination between different pollination periods. This was largely due to variation in vertical wind. As this variation in wind conditions is difficult to control through management measures, it should be carefully considered when estimating the risk of gene flow from GM crops. On average, the predicted level of gene flow decreased with increasing isolation distance and with increasing depth of the conventional field, and increased with increasing GM field size. Therefore, at a national scale and over the long term these landscape properties should be accounted for when setting regulations for controlling gene flow. However, at the level of an individual field the level of gene flow may be dominated by uncontrollable variation. Due to the sensitivity of pollen dispersal to the wind, we conclude that gene flow cannot be summarized only by the mean contamination; information about the frequency of extreme events should also be considered. The modeling approach described in this paper offers a way to predict and compare pollen dispersal and gene flow in varying environmental conditions, and to assess the effectiveness of different management measures.

  1. Recent and projected increases in atmospheric CO2 concentration can enhance gene flow between wild and genetically altered rice (Oryza sativa).

    Directory of Open Access Journals (Sweden)

    Lewis H Ziska

    Full Text Available Although recent and projected increases in atmospheric carbon dioxide can alter plant phenological development, these changes have not been quantified in terms of floral outcrossing rates or gene transfer. Could differential phenological development in response to rising CO(2) between genetically modified crops and wild, weedy relatives increase the spread of novel genes, potentially altering evolutionary fitness? Here we show that increasing CO(2) from an early 20(th) century concentration (300 µmol mol(-1)) to current (400 µmol mol(-1)) and projected, mid-21(st) century (600 µmol mol(-1)) values, enhanced the flow of genes from wild, weedy rice to the genetically altered, herbicide resistant, cultivated population, with outcrossing increasing from 0.22% to 0.71% from 300 to 600 µmol mol(-1). The increase in outcrossing and gene transfer was associated with differential increases in plant height, as well as greater tiller and panicle production in the wild, relative to the cultivated population. In addition, increasing CO(2) also resulted in a greater synchronicity in flowering times between the two populations. The observed changes reported here resulted in a subsequent increase in rice dedomestication and a greater number of weedy, herbicide-resistant hybrid progeny. Overall, these data suggest that differential phenological responses to rising atmospheric CO(2) could result in enhanced flow of novel genes and greater success of feral plant species in agroecosystems.

  2. Recent and projected increases in atmospheric CO2 concentration can enhance gene flow between wild and genetically altered rice (Oryza sativa).

    Science.gov (United States)

    Ziska, Lewis H; Gealy, David R; Tomecek, Martha B; Jackson, Aaron K; Black, Howard L

    2012-01-01

    Although recent and projected increases in atmospheric carbon dioxide can alter plant phenological development, these changes have not been quantified in terms of floral outcrossing rates or gene transfer. Could differential phenological development in response to rising CO(2) between genetically modified crops and wild, weedy relatives increase the spread of novel genes, potentially altering evolutionary fitness? Here we show that increasing CO(2) from an early 20(th) century concentration (300 µmol mol(-1)) to current (400 µmol mol(-1)) and projected, mid-21(st) century (600 µmol mol(-1)) values, enhanced the flow of genes from wild, weedy rice to the genetically altered, herbicide resistant, cultivated population, with outcrossing increasing from 0.22% to 0.71% from 300 to 600 µmol mol(-1). The increase in outcrossing and gene transfer was associated with differential increases in plant height, as well as greater tiller and panicle production in the wild, relative to the cultivated population. In addition, increasing CO(2) also resulted in a greater synchronicity in flowering times between the two populations. The observed changes reported here resulted in a subsequent increase in rice dedomestication and a greater number of weedy, herbicide-resistant hybrid progeny. Overall, these data suggest that differential phenological responses to rising atmospheric CO(2) could result in enhanced flow of novel genes and greater success of feral plant species in agroecosystems.

  3. Transcriptome inference and systems approaches to polypharmacology and drug discovery in herbal medicine.

    Science.gov (United States)

    Li, Peng; Chen, Jianxin; Zhang, Wuxia; Fu, Bangze; Wang, Wei

    2017-01-04

    Herbal medicine is a concoction of numerous chemical ingredients, and it exhibits polypharmacological effects to act on multiple pharmacological targets, regulating different biological mechanisms and treating a variety of diseases. Thus, this complexity is impossible to deconvolute by the reductionist method of extracting one active ingredient acting on one biological target. To dissect the polypharmacological effects of herbal medicines and their underlying pharmacological targets as well as their corresponding active ingredients, we propose a systems-biology strategy that combines omics and bioinformatical methodologies for exploring the polypharmacology of herbal mixtures. The myocardial ischemia model was induced by Ameroid constriction of the left anterior descending coronary in Ba-Ma miniature pigs. RNA-seq analysis was utilized to find the differential genes induced by myocardial ischemia in pigs treated with formula QSKL. A transcriptome-based inference method was used to find the landmark drugs with similar mechanisms to QSKL. Gene-level analysis of RNA-seq data in QSKL-treated cases versus control animals yields 279 differential genes. Transcriptome-based inference methods identified 80 landmark drugs that covered nearly all drug classes. Then, based on the landmark drugs, 155 potential pharmacological targets and 57 indications were identified for QSKL. Our results demonstrate the power of a combined approach for exploring the pharmacological target and chemical space of herbal medicines. We hope that our method could enhance our understanding of the molecular mechanisms of herbal systems and further accelerate the exploration of the value of traditional herbal medicine systems. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. On the criticality of inferred models

    Science.gov (United States)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-10-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.
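
    The link between the Fisher information and the susceptibility stated in this abstract can be written out in one line for an exponential-family model. The model form and the symbols below (couplings g_i, observables \phi_i, partition function Z, susceptibility \chi_{ij}) are assumptions chosen here for illustration and are not taken from the record itself.

```latex
p(s \mid g) = \frac{1}{Z(g)} \exp\Big( \sum_i g_i \, \phi_i(s) \Big),
\qquad
J_{ij}(g)
  = \frac{\partial^2 \log Z(g)}{\partial g_i \, \partial g_j}
  = \langle \phi_i \phi_j \rangle - \langle \phi_i \rangle \langle \phi_j \rangle
  = \frac{\partial \langle \phi_i \rangle}{\partial g_j}
  = \chi_{ij}.
```

    Under this assumption the Fisher information J is both the covariance of the observables and the response of their averages to a change in the couplings, so a diverging susceptibility implies a diverging density of distinguishable models.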

  5. On the criticality of inferred models

    International Nuclear Information System (INIS)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-01-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality

  6. An Inference Language for Imaging

    DEFF Research Database (Denmark)

    Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen

    2014-01-01

    We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framewor...

  7. Patterns of gene flow and selection across multiple species of Acrocephalus warblers: footprints of parallel selection on the Z chromosome

    Czech Academy of Sciences Publication Activity Database

    Reifová, R.; Majerová, V.; Reif, J.; Ahola, M.; Lindholm, A.; Procházka, Petr

    2016-01-01

    Vol. 16, No. 130 (2016), p. 130 ISSN 1471-2148 Institutional support: RVO:68081766 Keywords: Adaptive radiation * Speciation * Gene flow * Parallel adaptive evolution * Z chromosome * Acrocephalus warblers Subject RIV: EG - Zoology Impact factor: 3.221, year: 2016

  8. Bottlenecks and Hubs in Inferred Networks Are Important for Virulence in Salmonella typhimurium

    Energy Technology Data Exchange (ETDEWEB)

    McDermott, Jason E.; Taylor, Ronald C.; Yoon, Hyunjin; Heffron, Fred

    2009-02-01

    Recent advances in experimental methods have provided sufficient data to consider systems as large networks of interconnected components. High-throughput determination of protein-protein interaction networks has led to the observation that topological bottlenecks, that is proteins defined by high centrality in the network, are enriched in proteins with systems-level phenotypes such as essentiality. Global transcriptional profiling by microarray analysis has been used extensively to characterize systems, for example, cellular response to environmental conditions and genetic mutations. These transcriptomic datasets have been used to infer regulatory and functional relationship networks based on co-regulation. We use the context likelihood of relatedness (CLR) method to infer networks from two datasets gathered from the pathogen Salmonella typhimurium; one under a range of environmental culture conditions and the other from deletions of 15 regulators found to be essential in virulence. Bottleneck nodes were identified from these inferred networks and we show that these nodes are significantly more likely to be essential for virulence than their non-bottleneck counterparts. A network generated using Pearson correlation did not display this behavior. Overall this study demonstrates that topology of networks inferred from global transcriptional profiles provides information about the systems-level roles of bottleneck genes. Analysis of the differences between the two CLR-derived networks suggests that the bottleneck nodes are either mediators of transitions between system states or sentinels that reflect the dynamics of these transitions.
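
    The context likelihood of relatedness (CLR) step named in this abstract can be sketched as follows: each mutual-information value is turned into a z-score against the background of all values in its row, negative scores are clipped to zero, and the two scores for a gene pair are combined. This is a minimal sketch of the commonly described scoring rule; the toy matrix is hypothetical, and details such as whether the diagonal enters the background statistics are assumptions, not the authors' implementation.

```python
from math import sqrt

def clr_scores(mi):
    """CLR scores from a symmetric mutual-information matrix (list of lists).

    Assumes every row has non-zero spread; a constant row would divide by zero.
    """
    n = len(mi)
    z = []
    for row in mi:
        mean = sum(row) / n
        std = sqrt(sum((v - mean) ** 2 for v in row) / n)
        # How unusual each value is relative to this gene's own background.
        z.append([max(0.0, (v - mean) / std) for v in row])
    # Combine the two one-sided scores of every gene pair.
    return [[sqrt(z[i][j] ** 2 + z[j][i] ** 2) for j in range(n)] for i in range(n)]

# Hypothetical mutual-information values for four genes.
mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
scores = clr_scores(mi)
best = max(max(row) for row in scores)
```

    In this toy matrix the pair of genes 0 and 1 receives the highest score, because their mutual information stands out against the background of both genes.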

  9. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    2010-01-01

    Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation-free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with Markov chain Monte Carlo (MCMC) techniques. Due to space limitations the focus is on spatial point processes.

  10. Feature Inference Learning and Eyetracking

    Science.gov (United States)

    Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.

    2009-01-01

    Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…

  11. Final Report, DOE Early Career Award: Predictive modeling of complex physical systems: new tools for statistical inference, uncertainty quantification, and experimental design

    Energy Technology Data Exchange (ETDEWEB)

    Marzouk, Youssef [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)

    2016-08-31

    Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.

  12. Relaxation rates of gene expression kinetics reveal the feedback signs of autoregulatory gene networks

    Science.gov (United States)

    Jia, Chen; Qian, Hong; Chen, Min; Zhang, Michael Q.

    2018-03-01

    The transient response to a stimulus and subsequent recovery to a steady state are the fundamental characteristics of a living organism. Here we study the relaxation kinetics of autoregulatory gene networks based on the chemical master equation model of single-cell stochastic gene expression with nonlinear feedback regulation. We report a novel relation between the rate of relaxation, characterized by the spectral gap of the Markov model, and the feedback sign of the underlying gene circuit. When a network has no feedback, the relaxation rate is exactly the decaying rate of the protein. We further show that positive feedback always slows down the relaxation kinetics while negative feedback always speeds it up. Numerical simulations demonstrate that this relation provides a possible method to infer the feedback topology of autoregulatory gene networks by using time-series data of gene expression.
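
    The no-feedback statement in this abstract (the relaxation rate equals the protein decay rate) can be checked numerically on a deliberately simplified model. The sketch below integrates the master equation of a plain birth-death process with constant synthesis rate k and per-molecule decay rate d, and estimates how fast the mean approaches its steady state k/d. The model omits gene-state switching, and the truncation level, step size, and parameter values are all assumptions made here for illustration.

```python
from math import log

def mean_trajectory(k, d, n_max=60, dt=1e-3, steps=3000):
    """Euler-integrate the birth-death master equation; return the mean copy number per step."""
    p = [0.0] * (n_max + 1)
    p[0] = 1.0                      # start with zero proteins
    means = []
    for _ in range(steps):
        means.append(sum(n * pn for n, pn in enumerate(p)))
        new = p[:]
        for n in range(n_max + 1):
            birth = k * p[n] if n < n_max else 0.0   # no synthesis out of the top state
            death = d * n * p[n]
            new[n] -= dt * (birth + death)
            if n < n_max:
                new[n + 1] += dt * birth
            if n > 0:
                new[n - 1] += dt * death
        p = new
    return means

k, d, dt = 5.0, 1.0, 1e-3
means = mean_trajectory(k, d, dt=dt)
steady = k / d
# Exponential relaxation: the gap to the steady state shrinks like exp(-rate * t).
gap_early = steady - means[1000]
gap_late = steady - means[2000]
rate = -log(gap_late / gap_early) / (1000 * dt)
```

    The estimated rate agrees with d up to the discretization error of the Euler scheme, which is consistent with the no-feedback case described above; feedback would require a state-dependent synthesis rate and is not covered by this sketch.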

  13. Adaptive surrogate modeling for response surface approximations with application to bayesian inference

    KAUST Repository

    Prudhomme, Serge; Bryant, Corey M.

    2015-01-01

    Parameter estimation for complex models using Bayesian inference is usually a very costly process as it requires a large number of solves of the forward problem. We show here how the construction of adaptive surrogate models using a posteriori error estimates for quantities of interest can significantly reduce the computational cost in problems of statistical inference. As surrogate models provide only approximations of the true solutions of the forward problem, it is nevertheless necessary to control these errors in order to construct an accurate reduced model with respect to the observables utilized in the identification of the model parameters. Effectiveness of the proposed approach is demonstrated on a numerical example dealing with the Spalart–Allmaras model for the simulation of turbulent channel flows. In particular, we illustrate how Bayesian model selection using the adapted surrogate model in place of solving the coupled nonlinear equations leads to the same quality of results while requiring fewer nonlinear PDE solves.

  14. Adaptive surrogate modeling for response surface approximations with application to bayesian inference

    KAUST Repository

    Prudhomme, Serge

    2015-09-17

    Parameter estimation for complex models using Bayesian inference is usually a very costly process as it requires a large number of solves of the forward problem. We show here how the construction of adaptive surrogate models using a posteriori error estimates for quantities of interest can significantly reduce the computational cost in problems of statistical inference. As surrogate models provide only approximations of the true solutions of the forward problem, it is nevertheless necessary to control these errors in order to construct an accurate reduced model with respect to the observables utilized in the identification of the model parameters. Effectiveness of the proposed approach is demonstrated on a numerical example dealing with the Spalart–Allmaras model for the simulation of turbulent channel flows. In particular, we illustrate how Bayesian model selection using the adapted surrogate model in place of solving the coupled nonlinear equations leads to the same quality of results while requiring fewer nonlinear PDE solves.

  15. Gene flow and genetic diversity in cultivated and wild cacao (Theobroma cacao) in Bolivia.

    Science.gov (United States)

    Chumacero de Schawe, Claudia; Durka, Walter; Tscharntke, Teja; Hensen, Isabell; Kessler, Michael

    2013-11-01

    The role of pollen flow within and between cultivated and wild tropical crop species is little known. To study the pollen flow of cacao, we estimated the degree of self-pollination and pollen dispersal distances as well as gene flow between wild and cultivated cacao (Theobroma cacao L.). We studied pollen flow and genetic diversity of cultivated and wild cacao populations by genotyping 143 wild and 86 cultivated mature plants and 374 seedlings raised from 19 wild and 25 cultivated trees at nine microsatellite loci. A principal component analysis distinguished wild and cultivated cacao trees, supporting the notion that Bolivia harbors truly wild cacao populations. Cultivated cacao had a higher level of genetic diversity than wild cacao, presumably reflecting the varied origin of cultivated plants. Both cacao types had high outcrossing rates, but the paternity analysis revealed 7-14% self-pollination in wild and cultivated cacao. Despite the tiny size of the pollinators, pollen was transported distances up to 3 km; wild cacao showed longer distances (mean = 922 m) than cultivated cacao (826 m). Our data revealed that 16-20% of pollination events occurred between cultivated and wild populations. We found evidence of self-pollination in both wild and cultivated cacao. Pollination distances are larger than those typically reported in tropical understory tree species. The relatively high pollen exchange from cultivated to wild cacao compromises genetic identity of wild populations, calling for the protection of extensive natural forest tracts to protect wild cacao in Bolivia.

  16. Forward and backward inference in spatial cognition.

    Directory of Open Access Journals (Sweden)

    Will D Penny

    This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, we propose that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
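    The localisation example in this abstract (combining a path-integration estimate with sensory input) can be illustrated with a single step of forward filtering over a discrete set of locations. The sketch below is only an illustration under stated assumptions, not the paper's model: the three-location ring, the transition matrix, the likelihood values, and the function name forward_step are all hypothetical.

```python
def forward_step(belief, transition, likelihood):
    """One step of forward inference over discrete locations.

    belief[i]        -- prior probability of being at location i
    transition[i][j] -- probability of moving from i to j (motion model)
    likelihood[j]    -- probability of the current observation at location j
    Returns the posterior belief and the evidence for this observation.
    """
    n = len(belief)
    # Prediction (path integration): push the belief through the motion model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction (sensory input): weight each location by the observation likelihood.
    unnormalised = [predicted[j] * likelihood[j] for j in range(n)]
    # The normaliser is the probability of the observation under this model.
    evidence = sum(unnormalised)
    posterior = [u / evidence for u in unnormalised]
    return posterior, evidence


# Hypothetical example: three locations on a ring; the agent moves one step
# clockwise with probability 0.8 and stays put with probability 0.2.
belief = [1 / 3, 1 / 3, 1 / 3]
transition = [
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.8, 0.0, 0.2],
]
likelihood = [0.9, 0.1, 0.1]  # the observation strongly suggests location 0

posterior, evidence = forward_step(belief, transition, likelihood)
print(posterior, evidence)
```

    With these assumed numbers the posterior mass on location 0 is 9/11, roughly 0.82. Multiplying the per-step evidence values over time gives the likelihood of the whole observation sequence under one environment model, which is the quantity the abstract describes comparing across hypotheses to work out which environment you are in.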

  17. Gene tree rooting methods give distributions that mimic the coalescent process.

    Science.gov (United States)

    Tian, Yuan; Kubatko, Laura S

    2014-01-01

    Multi-locus phylogenetic inference is commonly carried out via models that incorporate the coalescent process to model the possibility that incomplete lineage sorting leads to incongruence between gene trees and the species tree. An interesting question that arises in this context is whether data "fit" the coalescent model. Previous work (Rosenfeld et al., 2012) has suggested that rooting of gene trees may account for variation in empirical data that has been previously attributed to the coalescent process. We examine this possibility using simulated data. We show that, in the case of four taxa, the distribution of gene trees observed from rooting estimated gene trees with either the molecular clock or with outgroup rooting can be closely matched by the distribution predicted by the coalescent model with specific choices of species tree branch lengths. We apply commonly-used coalescent-based methods of species tree inference to assess their performance in these situations. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. Y-chromosomal diversity in Haiti and Jamaica: contrasting levels of sex-biased gene flow.

    Science.gov (United States)

    Simms, Tanya M; Wright, Marisil R; Hernandez, Michelle; Perez, Omar A; Ramirez, Evelyn C; Martinez, Emanuel; Herrera, Rene J

    2012-08-01

    Although previous studies have characterized the genetic structure of populations from Haiti and Jamaica using classical and autosomal STR polymorphisms, the patrilineal influences that are present in these countries have yet to be explored. To address this lacuna, the current study aims to investigate, for the first time, the potential impact of different ancestral sources, unique colonial histories, and distinct family structures on the paternal profile of both groups. According to previous reports examining populations from the Americas, island-specific demographic histories can greatly impact population structure, including various patterns of sex-biased gene flow. Also, given the contrasting autosomal profiles provided in our earlier study (Simms et al.: Am J Phys Anthropol 142 (2010) 49-66), we hypothesize that the degree and directionality of gene flow from Europeans, Africans, Amerindians, and East Asians are dissimilar in the two countries. To test this premise, 177 high-resolution Y-chromosome binary markers and 17 Y-STR loci were typed in Haiti (n = 123) and Jamaica (n = 159) and subsequently utilized for phylogenetic comparisons to available reference collections encompassing Africa, Europe, Asia (East and South), and the New World. Our results reveal that both studied populations exhibit a predominantly South-Saharan paternal component, with haplogroups A1b-V152, A3-M32, B2-M182, E1a-M33, E1b1a-M2, E2b-M98, and R1b2-V88 comprising 77.2% and 66.7% of the Haitian and Jamaican paternal gene pools, respectively. Yet, European derived chromosomes (i.e., haplogroups G2a*-P15, I-M258, R1b1b-M269, and T-M184) were detected at commensurate levels in Haiti (20.3%) and Jamaica (18.9%), whereas Y-haplogroups indicative of Chinese [O-M175 (3.8%)] and Indian [H-M69 (0.6%) and L-M20 (0.6%)] ancestry were restricted to Jamaica. Copyright © 2012 Wiley Periodicals, Inc.

  19. In search of functional association from time-series microarray data based on the change trend and level of gene expression

    Directory of Open Access Journals (Sweden)

    Zeng An-Ping

    2006-02-01

    Background: The increasing availability of time-series expression data opens up new possibilities to study functional linkages of genes. Present methods used to infer functional linkages between genes from expression data are mainly based on a point-to-point comparison. Change trends between consecutive time points in time-series data have so far not been well explored. Results: In this work we present a new method based on extracting the main features of the change trend and level of gene expression between consecutive time points. The method, termed trend correlation (TC), includes two major steps: (1) calculating a maximal local alignment of change trend score by dynamic programming and a change trend correlation coefficient between the maximal matched change levels of each gene pair; (2) inferring relationships of gene pairs based on two statistical extraction procedures. The new method considers time shifts and inverted relationships in a similar way to the local clustering (LC) method, but the latter is based merely on a point-to-point comparison. The TC method is demonstrated with data from the yeast cell cycle and compared with the LC method and the widely used Pearson correlation coefficient (PCC) based clustering method. The biological significance of the gene pairs is examined with several large-scale yeast databases. Although the TC method predicts an overall lower number of gene pairs than the other two methods at the same p-value threshold, the additional number of gene pairs inferred by the TC method is considerable: e.g. 20.5% compared with the LC method and 49.6% with the PCC method for a p-value threshold of 2.7E-3. Moreover, the percentage of gene pairs inferred by our method that are consistent with the databases is generally higher than for the LC method and similar to that for the PCC method. A significant number of the gene pairs inferred only by the TC method are process-identity or function-similarity pairs or have well-documented biological
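
    The idea of comparing change trends between consecutive time points, rather than expression values point by point, can be sketched as follows. This is a deliberately simplified illustration and not the authors' TC scoring: it reduces each series to up/flat/down trends and uses a gap-free dynamic-programming alignment (longest common run) to allow for time shifts and inverted relationships. The function names, the tolerance parameter, and the toy expression profiles are all hypothetical.

```python
def change_trend(series, tol=0.0):
    """Map a time series to +1 / 0 / -1 trends between consecutive time points."""
    trends = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if abs(diff) <= tol:
            trends.append(0)
        else:
            trends.append(1 if diff > 0 else -1)
    return trends


def longest_trend_match(a, b, inverted=False):
    """Length of the longest contiguous run of matching trends between a and b.

    The run may start at different time points in the two series, so a
    time-shifted relationship still scores highly. With inverted=True the
    trends of b are negated first, which detects opposite behaviour.
    """
    ta = change_trend(a)
    tb = change_trend(b)
    if inverted:
        tb = [-t for t in tb]
    best = 0
    prev_row = [0] * (len(tb) + 1)
    for i in range(1, len(ta) + 1):
        curr_row = [0] * (len(tb) + 1)
        for j in range(1, len(tb) + 1):
            if ta[i - 1] == tb[j - 1]:
                # Extend the run that ended at the previous pair of time steps.
                curr_row[j] = prev_row[j - 1] + 1
                best = max(best, curr_row[j])
        prev_row = curr_row
    return best


# Hypothetical expression profiles over six time points.
gene_a = [1, 2, 3, 2, 1, 2]
gene_b = [5, 1, 2, 3, 2, 1]  # same pattern as gene_a, delayed by one time point
gene_c = [3, 2, 1, 2, 3, 2]  # mirror image of gene_a

shifted = longest_trend_match(gene_a, gene_b)
mirrored = longest_trend_match(gene_a, gene_c, inverted=True)
print(shifted, mirrored)
```

    In this toy case the delayed profile shares a run of four matching trends with gene_a and the mirrored profile matches on all five once inverted, even though a point-to-point comparison of the raw values would not line the delayed pair up.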

  20. A formal model of interpersonal inference

    Directory of Open Access Journals (Sweden)

    Michael Moutoussis

    2014-03-01

    Introduction: We propose that active Bayesian inference, a general framework for decision-making, can equally be applied to interpersonal exchanges. Social cognition, however, entails special challenges. We address these challenges through a novel formulation of a formal model and demonstrate its psychological significance. Method: We review relevant literature, especially with regard to interpersonal representations, formulate a mathematical model and present a simulation study. The model accommodates normative models from utility theory and places them within the broader setting of Bayesian inference. Crucially, we endow people's prior beliefs, into which utilities are absorbed, with preferences of self and others. The simulation illustrates the model's dynamics and furnishes elementary predictions of the theory. Results: 1. Because beliefs about self and others inform both the desirability and plausibility of outcomes, in this framework interpersonal representations become beliefs that have to be actively inferred. This inference, akin to 'mentalising' in the psychological literature, is based upon the outcomes of interpersonal exchanges. 2. We show how some well-known social-psychological phenomena (e.g. self-serving biases) can be explained in terms of active interpersonal inference. 3. Mentalising naturally entails Bayesian updating of how people value social outcomes. Crucially, this includes inference about one's own qualities and preferences. Conclusion: We inaugurate a Bayes optimal framework for modelling intersubject variability in mentalising during interpersonal exchanges. Here, interpersonal representations are endowed with explicit functional and affective properties. We suggest the active inference framework lends itself to the study of psychiatric conditions where mentalising is distorted.