WorldWideScience

Sample records for methods infer supertrees

  1. PhySIC_IST: cleaning source trees to infer more informative supertrees.

    Science.gov (United States)

    Scornavacca, Celine; Berry, Vincent; Lefort, Vincent; Douzery, Emmanuel J P; Ranwez, Vincent

    2008-10-04

    Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees. To overcome this problem, we propose to infer non-plenary supertrees, i.e. supertrees that do not necessarily contain all the taxa present in the source trees, discarding those whose position greatly differs among source trees or for which insufficient information is provided. We detail a variant of the PhySIC veto method called PhySIC_IST that can infer non-plenary supertrees. PhySIC_IST aims at inferring supertrees that satisfy the same appealing theoretical properties as with PhySIC, while being as informative as possible under this constraint. The informativeness of a supertree is estimated using a variation of the CIC (Cladistic Information Content) criterion that takes into account both the presence of multifurcations and the absence of some taxa. Additionally, we propose a statistical preprocessing step called STC (Source Trees Correction) to correct the source trees prior to the supertree inference. STC is a liberal step that removes the parts of each source tree that significantly conflict with other source trees. Combining STC with a veto method allows an explicit trade-off between veto and liberal approaches, tuned by a single parameter. Performing large-scale simulations, we observe that STC+PhySIC_IST infers much more informative

  2. PhySIC: a veto supertree method with desirable properties.

    Science.gov (United States)

    Ranwez, Vincent; Berry, Vincent; Criscuolo, Alexis; Fabre, Pierre-Henri; Guillemot, Sylvain; Scornavacca, Celine; Douzery, Emmanuel J P

    2007-10-01

    This paper focuses on veto supertree methods; i.e., methods that aim at producing a conservative synthesis of the relationships agreed upon by all source trees. We propose desirable properties that a supertree should satisfy in this framework, namely the non-contradiction property (PC) and the induction property (PI). The former requires that the supertree does not contain relationships that contradict one or a combination of the source topologies, whereas the latter requires that all topological information contained in the supertree is present in a source tree or collectively induced by several source trees. We provide simple examples to illustrate their relevance and that allow a comparison with previously advocated properties. We show that these properties can be checked in polynomial time for any given rooted supertree. Moreover, we introduce the PhySIC method (PHYlogenetic Signal with Induction and non-Contradiction). For k input trees spanning a set of n taxa, this method produces a supertree that satisfies the above-mentioned properties in O(kn^3 + n^4) computing time. The polytomies of the produced supertree are also tagged by labels indicating areas of conflict as well as those with insufficient overlap. As a whole, PhySIC enables the user to quickly summarize consensual information of a set of trees and localize groups of taxa for which the data require consolidation. Lastly, we illustrate the behaviour of PhySIC on primate data sets of various sizes, and propose a supertree covering 95% of all primate extant genera. The PhySIC algorithm is available at http://atgc.lirmm.fr/cgi-bin/PhySIC.

  3. Robinson-Foulds Supertrees

    Directory of Open Access Journals (Sweden)

    Eulenstein Oliver

    2010-02-01

    Background: Supertree methods synthesize collections of small phylogenetic trees with incomplete taxon overlap into comprehensive trees, or supertrees, that include all taxa found in the input trees. Supertree methods based on the well established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input trees. Specifically, the RF supertree problem seeks a binary supertree that minimizes the sum of the RF distances from the supertree to the input trees. Thus, an RF supertree is a supertree that is consistent with the largest number of clusters (or clades) from the input trees. Results: We introduce efficient, local search based, hill-climbing heuristics for the intrinsically hard RF supertree problem on rooted trees. These heuristics use novel non-trivial algorithms for the SPR and TBR local search problems which improve on the time complexity of the best known (naïve) solutions by a factor of Θ(n) and Θ(n^2), respectively (where n is the number of taxa, or leaves, in the supertree). We use an implementation of our new algorithms to examine the performance of the RF supertree method and compare it to matrix representation with parsimony (MRP) and the triplet supertree method using four supertree data sets. Not only did our RF heuristic provide fast estimates of RF supertrees in all data sets, but the RF supertrees also retained more of the information from the input trees (based on the RF distance) than the other supertree methods. Conclusions: Our heuristics for the RF supertree problem, based on our new local search algorithms, make it possible for the first time to estimate large supertrees by directly optimizing the RF distance from rooted input trees to the supertrees. This provides a new and fast method to build accurate supertrees. RF supertrees may also be useful for estimating majority-rule (-) supertrees, which are a generalization of majority-rule consensus trees.
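
    The objective described in this abstract (the sum of RF distances from a candidate supertree to the input trees) can be sketched in a few lines. This is only an illustration of the score, not the authors' SPR/TBR search heuristics. It assumes rooted trees written as nested tuples of taxon names, and the function names are made up for this sketch. Because input trees cover only some taxa, the supertree's clusters are restricted to each input tree's taxa before comparison.

```python
def leaves(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree, keep):
    """Non-trivial clusters of `tree` after restricting it to the taxa in `keep`.

    Assumes every taxon in `keep` occurs in `tree`. Clusters with fewer than
    two taxa, and the cluster containing all of `keep`, are skipped.
    """
    keep = frozenset(keep)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        restricted = below & keep
        if 2 <= len(restricted) < len(keep):
            found.add(restricted)
        return below

    visit(tree)
    return found


def rf_distance(supertree, source):
    """RF distance between a source tree and the supertree restricted to its taxa."""
    taxa = leaves(source)
    return len(clusters(supertree, taxa) ^ clusters(source, taxa))


def rf_supertree_score(supertree, sources):
    """Sum of RF distances from the supertree to every source tree."""
    return sum(rf_distance(supertree, source) for source in sources)


supertree = ((("a", "b"), "c"), ("d", "e"))
sources = [(("a", "b"), "d"), (("a", "c"), "b")]
print(rf_supertree_score(supertree, sources))
```

    The first source tree agrees with the supertree once the supertree is pruned to {a, b, d}; the second groups a with c, which the supertree does not, so each tree holds one cluster the other lacks.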

  4. MRL and SuperFine+MRL: new supertree methods

    Science.gov (United States)

    2012-01-01

    Background Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0,1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood. Results We compared MRP and SuperFine+MRP with MRL and SuperFine+MRL on simulated and biological datasets. We examined the MRP and MRL scores of each method on a wide range of datasets, as well as the resulting topological accuracy of the trees. Our experimental results show that MRL, coupled with a very good ML heuristic such as RAxML, produced more accurate trees than MRP, and MRL scores were more strongly correlated with topological accuracy than MRP scores. Conclusions SuperFine+MRP, when based upon a good MP heuristic, such as TNT, produces among the best scores for both MRP and MRL, and is generally faster and more topologically accurate than other supertree methods we tested. PMID:22280525
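
    The matrix over {0, 1, ?} mentioned in this abstract can be illustrated with a small sketch. It assumes rooted source trees written as nested tuples and the usual clade coding: one column per non-trivial clade of each source tree, with 1 for taxa inside the clade, 0 for taxa in that tree but outside the clade, and ? for taxa absent from that tree. The function names are hypothetical. Rooted analyses commonly add an all-zero outgroup row, which is omitted here.

```python
def leaves(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def internal_clusters(tree):
    """Leaf sets of all internal nodes except the root."""
    found = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child, False) for child in node))
        if not is_root and len(below) >= 2:
            found.add(below)
        return below

    visit(tree, True)
    return found


def mrp_matrix(source_trees):
    """Map each taxon to its row of '0', '1' and '?' characters."""
    taxa = sorted(set().union(*(leaves(tree) for tree in source_trees)))
    rows = {taxon: [] for taxon in taxa}
    for tree in source_trees:
        present = leaves(tree)
        # Sort clusters so that the column order is reproducible.
        for cluster in sorted(internal_clusters(tree), key=sorted):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")
                elif taxon in cluster:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


matrix = mrp_matrix([(("a", "b"), "c"), (("b", "c"), "d")])
for taxon, row in matrix.items():
    print(taxon, row)
```

    A matrix of this form is what MRP hands to a parsimony heuristic and what MRL hands to a 2-state likelihood heuristic; the search itself is outside the scope of this sketch.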

  5. L.U.St: a tool for approximated maximum likelihood supertree reconstruction.

    Science.gov (United States)

    Akanni, Wasiu A; Creevey, Christopher J; Wilkinson, Mark; Pisani, Davide

    2014-06-12

    Supertrees combine disparate, partially overlapping trees to generate a synthesis that provides a high level perspective that cannot be attained from the inspection of individual phylogenies. Supertrees can be seen as meta-analytical tools that can be used to make inferences based on results of previous scientific studies. Their meta-analytical application has increased in popularity since it was realised that the power of statistical tests for the study of evolutionary trends critically depends on the use of taxon-dense phylogenies. Further to that, supertrees have found applications in phylogenomics where they are used to combine gene trees and recover species phylogenies based on genome-scale data sets. Here, we present the L.U.St package, a python tool for approximate maximum likelihood supertree inference and illustrate its application using a genomic data set for the placental mammals. L.U.St allows the calculation of the approximate likelihood of a supertree, given a set of input trees, performs heuristic searches to look for the supertree of highest likelihood, and performs statistical tests of two or more supertrees. To this end, L.U.St implements a winning sites test allowing ranking of a collection of a-priori selected hypotheses, given as a collection of input supertree topologies. It also outputs a file of input-tree-wise likelihood scores that can be used as input to CONSEL for calculation of standard tests of two trees (e.g. Kishino-Hasegawa, Shimodaira-Hasegawa and Approximately Unbiased tests). This is the first fully parametric implementation of a supertree method; it has clearly understood properties and provides several advantages over currently available supertree approaches. It is easy to implement and works on any platform that has python installed. bitBucket page - https://afro-juju@bitbucket.org/afro-juju/l.u.st.git. Davide.Pisani@bristol.ac.uk.

  6. Bad Clade Deletion Supertrees: A Fast and Accurate Supertree Algorithm.

    Science.gov (United States)

    Fleischauer, Markus; Böcker, Sebastian

    2017-09-01

    Supertree methods merge a set of overlapping phylogenetic trees into a supertree containing all taxa of the input trees. The challenge in supertree reconstruction is the way of dealing with conflicting information in the input trees. Many different algorithms for different objective functions have been suggested to resolve these conflicts. In particular, there exist methods based on encoding the source trees in a matrix, where the supertree is constructed applying a local search heuristic to optimize the respective objective function. We present a novel heuristic supertree algorithm called Bad Clade Deletion (BCD) supertrees. It uses minimum cuts to delete a locally minimal number of columns from such a matrix representation so that it is compatible. This is the complement problem to Matrix Representation with Compatibility (Maximum Split Fit). Our algorithm has guaranteed polynomial worst-case running time and performs swiftly in practice. Different from local search heuristics, it guarantees to return the directed perfect phylogeny for the input matrix, corresponding to the parent tree of the input trees, if one exists. Comparing supertrees to model trees for simulated data, BCD shows a better accuracy (F1 score) than the state-of-the-art algorithms SuperFine (up to 3%) and Matrix Representation with Parsimony (up to 7%); at the same time, BCD is up to 7 times faster than SuperFine, and up to 600 times faster than Matrix Representation with Parsimony. Finally, using the BCD supertree as a starting tree for a combined Maximum Likelihood analysis using RAxML, we reach significantly improved accuracy (1% higher F1 score) and running time (1.7-fold speedup). © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
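
    The notion of a "compatible" matrix in this abstract can be illustrated for the simplest case. The sketch below is not the BCD algorithm and does not use minimum cuts. It assumes every column is a complete clade (no missing entries) given as a set of taxon names, in which case a collection of clades fits on one rooted tree exactly when every pair is either disjoint or nested. The function names are made up for this sketch.

```python
from itertools import combinations


def compatible(clade_a, clade_b):
    """Two clades fit on one rooted tree if they are disjoint or nested."""
    return (
        clade_a.isdisjoint(clade_b)
        or clade_a <= clade_b
        or clade_b <= clade_a
    )


def conflicting_pairs(clades):
    """List every pair of clades that cannot both appear in one rooted tree."""
    return [
        (first, second)
        for first, second in combinations(clades, 2)
        if not compatible(first, second)
    ]


def is_tree_like(clades):
    """True if all clades can be displayed by a single rooted tree."""
    return not conflicting_pairs(clades)


clades = [frozenset("ab"), frozenset("abc"), frozenset("de")]
print(is_tree_like(clades))
print(conflicting_pairs(clades + [frozenset("bc")]))
```

    Adding the clade {b, c} creates one conflict, with {a, b}: the two overlap in b but neither contains the other. Deleting either of them restores compatibility, which is the kind of column deletion the abstract refers to.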

  7. COSPEDTree: COuplet Supertree by Equivalence Partitioning of Taxa Set and DAG Formation.

    Science.gov (United States)

    Bhattacharyya, Sourya; Mukherjee, Jayanta

    2015-01-01

    From a set of phylogenetic trees with overlapping taxa sets, a supertree exhibits evolutionary relationships among all input taxa. The key is to resolve the contradictory relationships with respect to input trees, between individual taxa subsets. Formulation of this NP hard problem employs either local search heuristics to reduce tree search space, or resolves the conflicts with respect to fixed or varying size subtree level decompositions. Different approximation techniques produce supertrees with considerable performance variations. Moreover, the majority of the algorithms involve high computational complexity and are thus not suitable for use on large biological data sets. The current study presents COSPEDTree, a novel method for supertree construction. The technique resolves source tree conflicts by analyzing couplet (taxa pair) relationships for each source tree. Subsequently, individual taxa pairs are resolved with a single relation. To prioritize the consensus relations among individual taxa pairs for resolving them, greedy scoring is employed to assign higher score values for the consensus relations among a taxa pair. The selected set of relations resolving individual taxa pairs is subsequently used to construct a directed acyclic graph (DAG). Each vertex of the DAG represents a taxa subset inferred from the same speciation event. Thus, COSPEDTree can generate non-binary supertrees as well. A depth-first traversal of this DAG yields the final supertree. According to the performance metrics on branch dissimilarities (such as FP, FN and RF), COSPEDTree produces mostly conservative, well resolved supertrees. Specifically, RF metrics are mostly lower compared to the reference approaches, and FP values are lower apart from only strictly conservative (or veto) approaches. COSPEDTree has worst case time and space complexities of cubic and quadratic order, respectively, better or comparable to the reference approaches. Such high performance and low computational costs enable COSPEDTree to be

  8. A supertree approach to shorebird phylogeny

    Directory of Open Access Journals (Sweden)

    Thomas Gavin H

    2004-08-01

    Background: Order Charadriiformes (shorebirds) is an ideal model group in which to study a wide range of behavioural, ecological and macroevolutionary processes across species. However, comparative studies depend on phylogeny to control for the effects of shared evolutionary history. Although numerous hypotheses have been presented for subsets of the Charadriiformes, none to date include all recognised species. Here we use the matrix representation with parsimony method to produce the first fully inclusive supertree of Charadriiformes. We also provide preliminary estimates of ages for all nodes in the tree. Results: Three main lineages are revealed: (i) the plovers and allies; (ii) the gulls and allies; and (iii) the sandpipers and allies. The relative position of these clades is unresolved in the strict consensus tree but a 50% majority-rule consensus tree indicates that the sandpiper clade is sister group to the gulls and allies whilst the plover group is placed at the base of the tree. The overall topology is highly consistent with recent molecular hypotheses of shorebird phylogeny. Conclusion: The supertree hypothesis presented herein is (to our knowledge) the only complete phylogenetic hypothesis of all extant shorebirds. Despite concerns over the robustness of supertrees (see Discussion), we believe that it provides a valuable framework for testing numerous evolutionary hypotheses relating to the diversity of behaviour, ecology and life-history of the Charadriiformes.

  9. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

    Directory of Open Access Journals (Sweden)

    Stajich Jason E

    2006-11-01

    Background: To date, most fungal phylogenies have been derived from single gene comparisons, or from concatenated alignments of a small number of genes. The increase in fungal genome sequencing presents an opportunity to reconstruct evolutionary events using entire genomes. As a tool for future comparative, phylogenomic and phylogenetic studies, we used both supertrees and concatenated alignments to infer relationships between 42 species of fungi for which complete genome sequences are available. Results: A dataset of 345,829 genes was extracted from 42 publicly available fungal genomes. Supertree methods were employed to derive phylogenies from 4,805 single gene families. We found that the average consensus supertree method may suffer from long-branch attraction artifacts, while matrix representation with parsimony (MRP) appears to be immune from these. A genome phylogeny was also reconstructed from a concatenated alignment of 153 universally distributed orthologs. Our MRP supertree and concatenated phylogeny are highly congruent. Within the Ascomycota, the sub-phyla Pezizomycotina and Saccharomycotina were resolved. Both phylogenies infer that the Leotiomycetes are the closest sister group to the Sordariomycetes. There is some ambiguity regarding the placement of Stagonospora nodorum, the sole member of the class Dothideomycetes present in the dataset. Within the Saccharomycotina, a monophyletic clade containing organisms that translate CTG as serine instead of leucine is evident. There is also strong support for two groups within the CTG clade, one containing the fully sexual species Candida lusitaniae, Candida guilliermondii and Debaryomyces hansenii, and the second group containing Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis and Lodderomyces elongisporus. The second major clade within the Saccharomycotina contains species whose genomes have undergone a whole genome duplication (WGD), and their close

  10. Building supertrees: an empirical assessment using the grass family (Poaceae).

    Science.gov (United States)

    Salamin, Nicolas; Hodkinson, Trevor R; Savolainen, Vincent

    2002-02-01

    Large and comprehensive phylogenetic trees are desirable for studying macroevolutionary processes and for classification purposes. Such trees can be obtained in two different ways. Either the widest possible range of taxa can be sampled and used in a phylogenetic analysis to produce a "big tree," or preexisting topologies can be used to create a supertree. Although large multigene analyses are often favored, combinable data are not always available, and supertrees offer a suitable solution. The most commonly used method of supertree reconstruction, matrix representation with parsimony (MRP), is presented here. We used a combined data set for the Poaceae to (1) assess the differences between an approach that uses combined data and one that uses different MRP modifications based on the character partitions and (2) investigate the advantages and disadvantages of these modifications. The Baum and Ragan modification and the Purvis modification gave similar results. Incorporating bootstrap support associated with pre-existing topologies improved the Baum and Ragan modification and its similarity with a combined analysis. Finally, we used the supertree reconstruction approach on 55 published phylogenies to build one of the most comprehensive phylogenetic trees published for the grass family, including 403 taxa, and discuss its strengths and weaknesses in relation to other published hypotheses.

  11. Optimization methods for logical inference

    CERN Document Server

    Chandru, Vijay

    2011-01-01

    Merging logic and mathematics in deductive inference: an innovative, cutting-edge approach. Optimization methods for logical inference? Absolutely, say Vijay Chandru and John Hooker, two major contributors to this rapidly expanding field. And even though "solving logical inference problems with optimization methods may seem a bit like eating sauerkraut with chopsticks. . . it is the mathematical structure of a problem that determines whether an optimization model can help solve it, not the context in which the problem occurs." Presenting powerful, proven optimization techniques for logic in

  12. The Supertree Toolkit 2: a new and improved software package with a Graphical User Interface for supertree construction

    Directory of Open Access Journals (Sweden)

    Jon Hill

    2014-03-01

    Building large supertrees involves the collection, storage, and processing of thousands of individual phylogenies to create large phylogenies with thousands to tens of thousands of taxa. Such large phylogenies are useful for macroevolutionary studies, comparative biology and in conservation and biodiversity. No easy to use and fully integrated software package currently exists to carry out this task. Here, we present a new Python-based software package that uses well defined XML schema to manage both data and metadata. It builds on previous versions by 1) including new processing steps, such as Safe Taxonomic Reduction, 2) using a user-friendly GUI that guides the user to complete at least the minimum information required and includes context-sensitive documentation, and 3) a revised storage format that integrates both tree- and meta-data into a single file. These data can then be manipulated according to a well-defined, but flexible, processing pipeline using either the GUI or a command-line based tool. Processing steps include standardising names, deleting or replacing taxa, ensuring adequate taxonomic overlap, ensuring data independence, and safe taxonomic reduction. This software has been successfully used to store and process data consisting of over 1000 trees ready for analyses using standard supertree methods. This software makes large supertree creation a much easier task and provides far greater flexibility for further work.

  13. The Supertree Toolkit 2: a new and improved software package with a Graphical User Interface for supertree construction.

    Science.gov (United States)

    Hill, Jon; Davis, Katie E

    2014-01-01

    Building large supertrees involves the collection, storage, and processing of thousands of individual phylogenies to create large phylogenies with thousands to tens of thousands of taxa. Such large phylogenies are useful for macroevolutionary studies, comparative biology and in conservation and biodiversity. No easy to use and fully integrated software package currently exists to carry out this task. Here, we present a new Python-based software package that uses well defined XML schema to manage both data and metadata. It builds on previous versions by 1) including new processing steps, such as Safe Taxonomic Reduction, 2) using a user-friendly GUI that guides the user to complete at least the minimum information required and includes context-sensitive documentation, and 3) a revised storage format that integrates both tree- and meta-data into a single file. These data can then be manipulated according to a well-defined, but flexible, processing pipeline using either the GUI or a command-line based tool. Processing steps include standardising names, deleting or replacing taxa, ensuring adequate taxonomic overlap, ensuring data independence, and safe taxonomic reduction. This software has been successfully used to store and process data consisting of over 1000 trees ready for analyses using standard supertree methods. This software makes large supertree creation a much easier task and provides far greater flexibility for further work.

  14. A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction

    Science.gov (United States)

    De Oliveira Martins, Leonardo; Mallo, Diego; Posada, David

    2016-01-01

    Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make most use of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models. PMID:25281847

  15. A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction.

    Science.gov (United States)

    De Oliveira Martins, Leonardo; Mallo, Diego; Posada, David

    2016-05-01

    Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make most use of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

  16. The origins of species richness in the Hymenoptera: insights from a family-level supertree

    Directory of Open Access Journals (Sweden)

    Davis Robert B

    2010-04-01

    Background: The order Hymenoptera (bees, ants, wasps, sawflies) contains about eight percent of all described species, but no analytical studies have addressed the origins of this richness at family-level or above. To investigate which major subtaxa experienced significant shifts in diversification, we assembled a family-level phylogeny of the Hymenoptera using supertree methods. We used sister-group species-richness comparisons to infer the phylogenetic position of shifts in diversification. Results: The supertrees most supported by the underlying input trees are produced using matrix representation with compatibility (MRC) (from an all-in and a compartmentalised analysis). Whilst relationships at the tips of the tree tend to be well supported, those along the backbone of the tree (e.g. between Parasitica superfamilies) are generally not. Ten significant shifts in diversification (six positive and four negative) are found common to both MRC supertrees. The Apocrita (wasps, ants, bees) experienced a positive shift at their origin accounting for approximately 4,000 species. Within Apocrita other positive shifts include the Vespoidea (vespoid wasps/ants containing 24,000 spp.), Anthophila + Sphecidae (bees/thread-waisted wasps; 22,000 spp.), Bethylidae + Chrysididae (bethylid/cuckoo wasps; 5,200 spp.), Dryinidae (dryinid wasps; 1,100 spp.), and Proctotrupidae (proctotrupid wasps; 310 spp.). Four relatively species-poor families (Stenotritidae, Anaxyelidae, Blasticotomidae, Xyelidae) have undergone negative shifts. There are some two-way shifts in diversification where sister taxa have undergone shifts in opposite directions. Conclusions: Our results suggest that numerous phylogenetically distinctive radiations contribute to the richness of large clades. They also suggest that evolutionary events restricting the subsequent richness of large clades are common. Problematic phylogenetic issues in the Hymenoptera are identified, relating especially to

  17. Weed management at ArborGen, South Carolina SuperTree Nursery

    Science.gov (United States)

    Mike Arnette

    2009-01-01

    Weed management is vital to producing healthy hardwood seedlings. Several methods are available to each nursery, and it is common knowledge that what works for one situation may not work for another. The weed control methods used in nursery beds of hardwood species at the South Carolina SuperTree Nursery (Blenheim) are listed below.

  18. Future trypanosomatid phylogenies: refined homologies, supertrees and networks

    Directory of Open Access Journals (Sweden)

    Stothard JR

    2000-01-01

    There has been good progress in inferring the evolutionary relationships within trypanosomes from DNA data; until relatively recently, many relationships remained rather speculative. Ongoing molecular studies have provided data that have adequately shown Trypanosoma to be monophyletic and, rather surprisingly, that there are sharply contrasting levels of genetic variation within and between the major trypanosomatid groups. There are still, however, areas of research that could benefit from further development and resolution, which broadly fall under three questions. Are the current statements of evolutionary homology within ribosomal small sub-unit genes in need of refinement? Can the published phylograms be expanded upon to form 'supertrees' depicting further relationships? Does a bifurcating tree structure impose an untenable dogma upon trypanosomatid phylogeny where hybridisation or reticulate evolutionary steps have played a part? This article briefly addresses these three questions and, in so doing, hopes to stimulate further interest in the molecular evolution of the group.

  19. Statistical inference via fiducial methods

    OpenAIRE

    Salomé, Diemer

    1998-01-01

    In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ... See: Summary

  20. Towards a Supertree of Arthropoda: A Species-Level Supertree of the Spiny, Slipper and Coral Lobsters (Decapoda: Achelata).

    Science.gov (United States)

    Davis, Katie E; Hesketh, Thomas W; Delmer, Cyrille; Wills, Matthew A

    2015-01-01

    While supertrees have been built for many vertebrate groups (notably birds, mammals and dinosaurs), invertebrates have attracted relatively little attention. The paucity of supertrees of arthropods is particularly surprising given their economic and ecological importance, as well as their overwhelming contribution to biodiversity. The absence of comprehensive archives of machine-readable source trees, coupled with the need for software implementing repeatable protocols for managing them, has undoubtedly impeded progress. Here we present a supertree of Achelata (spiny, slipper and coral lobsters) as a proof of concept, constructed using new supertree specific software (the Supertree Toolkit; STK) and following a published protocol. We also introduce a new resource for archiving and managing published source trees. Our supertree of Achelata is synthesised from morphological and molecular source trees, and represents the most complete species-level tree of the group to date. Our findings are consistent with recent taxonomic treatments, confirming the validity of just two families: Palinuridae and Scyllaridae; Synaxidae were resolved within Palinuridae. Monophyletic Silentes and Stridentes lineages are recovered within Palinuridae, and all sub-families within Scyllaridae are found to be monophyletic with the exception of Ibacinae. We demonstrate the feasibility of building larger supertrees of arthropods, with the ultimate objective of building a complete species-level phylogeny for the entire phylum using a divide and conquer strategy.

  1. Towards a Supertree of Arthropoda: A Species-Level Supertree of the Spiny, Slipper and Coral Lobsters (Decapoda: Achelata).

    Directory of Open Access Journals (Sweden)

    Katie E Davis

    Full Text Available While supertrees have been built for many vertebrate groups (notably birds, mammals and dinosaurs), invertebrates have attracted relatively little attention. The paucity of supertrees of arthropods is particularly surprising given their economic and ecological importance, as well as their overwhelming contribution to biodiversity. The absence of comprehensive archives of machine-readable source trees, coupled with the need for software implementing repeatable protocols for managing them, has undoubtedly impeded progress. Here we present a supertree of Achelata (spiny, slipper and coral lobsters) as a proof of concept, constructed using new supertree specific software (the Supertree Toolkit; STK) and following a published protocol. We also introduce a new resource for archiving and managing published source trees. Our supertree of Achelata is synthesised from morphological and molecular source trees, and represents the most complete species-level tree of the group to date. Our findings are consistent with recent taxonomic treatments, confirming the validity of just two families: Palinuridae and Scyllaridae; Synaxidae were resolved within Palinuridae. Monophyletic Silentes and Stridentes lineages are recovered within Palinuridae, and all sub-families within Scyllaridae are found to be monophyletic with the exception of Ibacinae. We demonstrate the feasibility of building larger supertrees of arthropods, with the ultimate objective of building a complete species-level phylogeny for the entire phylum using a divide and conquer strategy.

  2. Bayesian Inference Methods for Sparse Channel Estimation

    DEFF Research Database (Denmark)

    Pedersen, Niels Lovmand

    2013-01-01

    This thesis deals with sparse Bayesian learning (SBL) with application to radio channel estimation. As opposed to the classical approach for sparse signal representation, we focus on the problem of inferring complex signals. Our investigations within SBL constitute the basis for the development...... of Bayesian inference algorithms for sparse channel estimation. Sparse inference methods aim at finding the sparse representation of a signal given in some overcomplete dictionary of basis vectors. Within this context, one of our main contributions to the field of SBL is a hierarchical representation...... analysis of the complex prior representation, where we show that the ability to induce sparse estimates of a given prior heavily depends on the inference method used and, interestingly, whether real or complex variables are inferred. We also show that the Bayesian estimators derived from the proposed...

  3. A higher-level MRP supertree of placental mammals

    Directory of Open Access Journals (Sweden)

    Bininda-Emonds Olaf RP

    2006-11-01

    Full Text Available Abstract Background The higher-level phylogeny of placental mammals has long been a phylogenetic Gordian knot, with disagreement about both the precise contents of, and relationships between, the extant orders. A recent MRP supertree that favoured 'outdated' hypotheses (notably, monophyly of both Artiodactyla and Lipotyphla) has been heavily criticised for including low-quality and redundant data. We apply a stringent data selection protocol designed to minimise these problems to a much-expanded data set of morphological, molecular and combined source trees, to produce a supertree that includes every family of extant placental mammals. Results The supertree is well-resolved and supports both polyphyly of Lipotyphla and paraphyly of Artiodactyla with respect to Cetacea. The existence of four 'superorders' – Afrotheria, Xenarthra, Laurasiatheria and Euarchontoglires – is also supported. The topology is highly congruent with recent (molecular) phylogenetic analyses of placental mammals, but is considerably more comprehensive, being the first phylogeny to include all 113 extant families without making a priori assumptions of suprafamilial monophyly. Subsidiary analyses reveal that the data selection protocol played a key role in the major changes relative to a previously published higher-level supertree of placentals. Conclusion The supertree should provide a useful framework for hypothesis testing in phylogenetic comparative biology, and supports the idea that biogeography has played a crucial role in the evolution of placental mammals. Our results demonstrate the importance of minimising poor and redundant data when constructing supertrees.
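
    The matrix representation step behind an MRP supertree can be sketched as follows. This is a minimal illustration of standard Baum-Ragan coding, not the protocol used in the record above: each clade of each source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for taxa in that tree but outside the clade, and ? for taxa absent from the tree. The function name and the two toy source trees are hypothetical, and the parsimony search on the resulting matrix is not shown.

```python
def mrp_matrix(source_trees):
    """Baum-Ragan coding: one binary character per clade of each source tree.

    Each source tree is a pair (taxa, clades): the set of taxa it contains
    and a list of its non-trivial clades, each given as a set of taxa.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    matrix = {taxon: [] for taxon in all_taxa}
    for taxa, clades in source_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in taxa:
                    matrix[taxon].append("?")  # taxon missing from this tree
                elif taxon in clade:
                    matrix[taxon].append("1")  # inside the clade
                else:
                    matrix[taxon].append("0")  # in the tree, outside the clade
    return {taxon: "".join(chars) for taxon, chars in matrix.items()}


# Two hypothetical source trees with partially overlapping taxon sets.
tree_1 = ({"A", "B", "C", "D"}, [{"A", "B"}, {"A", "B", "C"}])
tree_2 = ({"B", "C", "E"}, [{"B", "C"}])

matrix = mrp_matrix([tree_1, tree_2])
for taxon, row in matrix.items():
    print(taxon, row)
```

    The resulting matrix would then be analysed with a parsimony program; stricter data selection changes which source trees enter this step, not the coding itself.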

  4. Order statistics & inference estimation methods

    CERN Document Server

    Balakrishnan, N

    1991-01-01

    The literature on order statistics and inference is quite extensive and covers a large number of fields, but most of it is dispersed throughout numerous publications. This volume is the consolidation of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well-illustrated with numerous practical examples from both the physical and life sciences, including sociology, psychology, and electrical and chemical engineering. A co

  5. Fuzzy logic controller using different inference methods

    International Nuclear Information System (INIS)

    Liu, Z.; De Keyser, R.

    1994-01-01

    In this paper the design of fuzzy controllers by using different inference methods is introduced. Configuration of the fuzzy controllers includes a general rule-base which is a collection of fuzzy PI or PD rules, the triangular fuzzy data model and a centre of gravity defuzzification algorithm. The generalized modus ponens (GMP) is used with the minimum operator of the triangular norm. Under the sup-min inference rule, six fuzzy implication operators are employed to calculate the fuzzy look-up tables for each rule base. The performance is tested in simulated systems with MATLAB/SIMULINK. Results show the effects of using the fuzzy controllers with different inference methods when applied to different test processes.
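
    The controller structure described above (triangular fuzzy sets, minimum t-norm, centre-of-gravity defuzzification) can be sketched in a few lines. This is only an illustration under stated assumptions: the three fuzzy sets, the PD-style rule base and the function names are hypothetical, and only the min (Mamdani) implication is shown, not the six implication operators compared in the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets (negative, zero, positive) on a normalised universe.
SETS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}

# Hypothetical PD-style rule base: (error, change of error) -> control output.
RULES = {
    ("N", "N"): "N", ("N", "Z"): "N", ("N", "P"): "Z",
    ("Z", "N"): "N", ("Z", "Z"): "Z", ("Z", "P"): "P",
    ("P", "N"): "Z", ("P", "Z"): "P", ("P", "P"): "P",
}


def control(error, d_error, steps=201):
    """Crisp control output in [-1, 1] for crisp inputs error and d_error."""
    us = [-1.0 + 2.0 * i / (steps - 1) for i in range(steps)]
    aggregated = [0.0] * steps
    for (e_label, de_label), u_label in RULES.items():
        # Firing strength: minimum t-norm over the two antecedents.
        strength = min(tri(error, *SETS[e_label]), tri(d_error, *SETS[de_label]))
        for i, u in enumerate(us):
            # Min implication clips the consequent; rules are aggregated with max.
            clipped = min(strength, tri(u, *SETS[u_label]))
            aggregated[i] = max(aggregated[i], clipped)
    total = sum(aggregated)
    if total == 0.0:
        return 0.0  # no rule fired
    # Centre-of-gravity defuzzification over the sampled output universe.
    return sum(u * m for u, m in zip(us, aggregated)) / total


print(control(0.5, 0.5), control(0.0, 0.0), control(-0.5, -0.5))
```

    Evaluating control over a grid of (error, d_error) pairs would give a look-up table of the kind the abstract mentions; swapping the min inside the inner loop for another implication operator is where the compared inference methods would differ.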

  6. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    2010-01-01

    Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with Markov chain Monte Carlo...... (MCMC) techniques. Due to space limitations the focus is on spatial point processes....

  7. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    (This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives', edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by Clarendon Press, Oxford, and planned to appear as Section 4.1 with the title ‘Inference'.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...

  8. Bayesian methods for hackers probabilistic programming and Bayesian inference

    CERN Document Server

    Davidson-Pilon, Cameron

    2016-01-01

    Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power. Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples a...

  9. Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation

    DEFF Research Database (Denmark)

    Brouwer, Thomas; Frellsen, Jes; Liò, Pietro

    2017-01-01

    In this paper, we study the trade-offs of different inference approaches for Bayesian matrix factorisation methods, which are commonly used for predicting missing values, and for finding patterns in the data. In particular, we consider Bayesian nonnegative variants of matrix factorisation and tri......-factorisation, and compare non-probabilistic inference, Gibbs sampling, variational Bayesian inference, and a maximum-a-posteriori approach. The variational approach is new for the Bayesian nonnegative models. We compare their convergence, and robustness to noise and sparsity of the data, on both synthetic and real...

  10. An algebra-based method for inferring gene regulatory networks.

    Science.gov (United States)

    Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard

    2014-03-26

    The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. 
Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the

  11. Explanation in causal inference methods for mediation and interaction

    CERN Document Server

    VanderWeele, Tyler

    2015-01-01

    A comprehensive examination of methods for mediation and interaction, VanderWeele's book is the first to approach this topic from the perspective of causal inference. Numerous software tools are provided, and the text is both accessible and easy to read, with examples drawn from diverse fields. The result is an essential reference for anyone conducting empirical research in the biomedical or social sciences.

  12. SuperTRI: A new approach based on branch support analyses of multiple independent data sets for assessing reliability of phylogenetic inferences.

    Science.gov (United States)

    Ropiquet, Anne; Li, Blaise; Hassanin, Alexandre

    2009-09-01

    Supermatrix and supertree are two methods for constructing a phylogenetic tree by using multiple data sets. However, these methods are not a panacea, as conflicting signals between data sets can lead to misinterpretation of the evolutionary history of taxa. In particular, the supermatrix approach is expected to be misleading if the species-tree signal is not dominant after the combination of the data sets. Moreover, most current supertree methods suffer from two limitations: (i) they ignore or misinterpret secondary (non-dominant) phylogenetic signals of the different data sets; and (ii) the logical basis of node robustness measures is unclear. To overcome these limitations, we propose a new approach, called SuperTRI, which is based on the branch support analyses of the independent data sets, and where the reliability of the nodes is assessed using three measures: the supertree Bootstrap percentage and two other values calculated from the separate analyses: the mean branch support (mean Bootstrap percentage or mean posterior probability) and the reproducibility index. The SuperTRI approach is tested on a data matrix including seven genes for 82 taxa of the family Bovidae (Mammalia, Ruminantia), and the results are compared to those found with the supermatrix approach. The phylogenetic analyses of the supermatrix and independent data sets were done using four methods of tree reconstruction: Bayesian inference, maximum likelihood, and unweighted and weighted maximum parsimony. The results indicate, firstly, that the SuperTRI approach shows less sensitivity to the four phylogenetic methods, secondly, that it is more accurate to interpret the relationships among taxa, and thirdly, that interesting conclusions on introgression and radiation can be drawn from the comparisons between SuperTRI and supermatrix analyses.

  13. Detecting Violations of Unidimensionality by Order-Restricted Inference Methods

    Directory of Open Access Journals (Sweden)

    Moritz Heene

    2016-03-01

    Full Text Available The assumptions of unidimensionality and quantitative measurement represent key concepts underlying most commonly applied item response models. The assumption of unidimensionality is frequently tested, although the most commonly applied methods have been shown to have low power against violations of unidimensionality, whereas the assumption of quantitative measurement remains in most cases only an (implicit) assumption. On the basis of a simulation study it is shown that order-restricted inference methods within a Markov chain Monte Carlo framework can successfully be used to test both assumptions.

  14. Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches

    Directory of Open Access Journals (Sweden)

    Beaulieu Jeremy M

    2009-02-01

    Full Text Available Abstract Background Biology has increasingly recognized the necessity to build and utilize larger phylogenies to address broad evolutionary questions. Large phylogenies have facilitated the discovery of differential rates of molecular evolution between trees and herbs. They have helped us understand the diversification patterns of mammals as well as the patterns of seed evolution. In addition to these broad evolutionary questions there is increasing awareness of the importance of large phylogenies for addressing conservation issues such as biodiversity hotspots and response to global change. Two major classes of methods have been employed to accomplish the large tree-building task: supertrees and supermatrices. Although these methods are continually being developed, they have yet to be made fully accessible to comparative biologists making extremely large trees rare. Results Here we describe and demonstrate a modified supermatrix method termed mega-phylogeny that uses databased sequences as well as taxonomic hierarchies to make extremely large trees with denser matrices than supermatrices. The two major challenges facing large-scale supermatrix phylogenetics are assembling large data matrices from databases and reconstructing trees from those datasets. The mega-phylogeny approach addresses the former as the latter is accomplished by employing recently developed methods that have greatly reduced the run time of large phylogeny construction. We present an algorithm that requires relatively little human intervention. The implemented algorithm is demonstrated with a dataset and phylogeny for Asterales (within Campanulidae) containing 4954 species and 12,033 sites and an rbcL matrix for green plants (Viridiplantae) with 13,533 species and 1,401 sites. Conclusion By examining much larger phylogenies, patterns emerge that were otherwise unseen. 
The phylogeny of Viridiplantae successfully reconstructs major relationships of vascular plants that previously

  15. Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches.

    Science.gov (United States)

    Smith, Stephen A; Beaulieu, Jeremy M; Donoghue, Michael J

    2009-02-11

    Biology has increasingly recognized the necessity to build and utilize larger phylogenies to address broad evolutionary questions. Large phylogenies have facilitated the discovery of differential rates of molecular evolution between trees and herbs. They have helped us understand the diversification patterns of mammals as well as the patterns of seed evolution. In addition to these broad evolutionary questions there is increasing awareness of the importance of large phylogenies for addressing conservation issues such as biodiversity hotspots and response to global change. Two major classes of methods have been employed to accomplish the large tree-building task: supertrees and supermatrices. Although these methods are continually being developed, they have yet to be made fully accessible to comparative biologists making extremely large trees rare. Here we describe and demonstrate a modified supermatrix method termed mega-phylogeny that uses databased sequences as well as taxonomic hierarchies to make extremely large trees with denser matrices than supermatrices. The two major challenges facing large-scale supermatrix phylogenetics are assembling large data matrices from databases and reconstructing trees from those datasets. The mega-phylogeny approach addresses the former as the latter is accomplished by employing recently developed methods that have greatly reduced the run time of large phylogeny construction. We present an algorithm that requires relatively little human intervention. The implemented algorithm is demonstrated with a dataset and phylogeny for Asterales (within Campanulidae) containing 4954 species and 12,033 sites and an rbcL matrix for green plants (Viridiplantae) with 13,533 species and 1,401 sites. By examining much larger phylogenies, patterns emerge that were otherwise unseen. The phylogeny of Viridiplantae successfully reconstructs major relationships of vascular plants that previously required many more genes. These demonstrations

  16. Assessment of network inference methods: how to cope with an underdetermined problem.

    Directory of Open Access Journals (Sweden)

    Caroline Siegenthaler

    Full Text Available The inference of biological networks is an active research area in the field of systems biology. The number of network inference algorithms has grown tremendously in the last decade, underlining the importance of a fair assessment and comparison among these methods. Current assessments of the performance of an inference method typically involve the application of the algorithm to benchmark datasets and the comparison of the network predictions against the gold standard or reference networks. While the network inference problem is often deemed underdetermined, implying that the inference problem does not have a (unique) solution, the consequences of such an attribute have not been rigorously taken into consideration. Here, we propose a new procedure for assessing the performance of gene regulatory network (GRN) inference methods. The procedure takes into account the underdetermined nature of the inference problem, in which gene regulatory interactions that are inferable or non-inferable are determined based on causal inference. The assessment relies on a new definition of the confusion matrix, which excludes errors associated with non-inferable gene regulations. For demonstration purposes, the proposed assessment procedure is applied to the DREAM 4 In Silico Network Challenge. The results show a marked change in the ranking of participating methods when taking network inferability into account.
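
    The idea of a confusion matrix that ignores non-inferable regulations can be sketched as below. This is a minimal illustration, assuming the set of inferable edges is already known; how inferability is derived from causal inference is the paper's contribution and is not reproduced here. The function name, the three-gene example and the choice of which edge is non-inferable are all hypothetical.

```python
def confusion_matrix(predicted, gold, candidates, inferable=None):
    """Count TP, FP, FN, TN over candidate directed edges.

    If `inferable` is given, edges outside it are excluded from every count.
    """
    if inferable is not None:
        candidates = candidates & inferable
    tp = len(candidates & predicted & gold)
    fp = len((candidates & predicted) - gold)
    fn = len((candidates & gold) - predicted)
    tn = len(candidates - predicted - gold)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


genes = ["g1", "g2", "g3"]
candidates = {(a, b) for a in genes for b in genes if a != b}
gold = {("g1", "g2"), ("g2", "g3")}       # reference network
predicted = {("g1", "g2"), ("g1", "g3")}  # output of some inference method

# Assumption for illustration: the edge g1 -> g3 is taken to be non-inferable.
inferable = candidates - {("g1", "g3")}

standard = confusion_matrix(predicted, gold, candidates)
restricted = confusion_matrix(predicted, gold, candidates, inferable)
print(standard)
print(restricted)
```

    In this toy case the false positive on the non-inferable edge disappears from the restricted counts, which is the kind of change that can reorder methods in a ranking.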

  17. Metainference: A Bayesian inference method for heterogeneous systems.

    Science.gov (United States)

    Bonomi, Massimiliano; Camilloni, Carlo; Cavalli, Andrea; Vendruscolo, Michele

    2016-01-01

    Modeling a complex system is almost invariably a challenging task. The incorporation of experimental observations can be used to improve the quality of a model and thus to obtain better predictions about the behavior of the corresponding system. This approach, however, is affected by a variety of different errors, especially when a system simultaneously populates an ensemble of different states and experimental data are measured as averages over such states. To address this problem, we present a Bayesian inference method, called "metainference," that is able to deal with errors in experimental measurements and with experimental measurements averaged over multiple states. To achieve this goal, metainference models a finite sample of the distribution of models using a replica approach, in the spirit of the replica-averaging modeling based on the maximum entropy principle. To illustrate the method, we present its application to a heterogeneous model system and to the determination of an ensemble of structures corresponding to the thermal fluctuations of a protein molecule. Metainference thus provides an approach to modeling complex systems with heterogeneous components and interconverting between different states by taking into account all possible sources of errors.

  18. Inference method using bayesian network for diagnosis of pulmonary nodules

    International Nuclear Information System (INIS)

    Kawagishi, Masami; Iizuka, Yoshio; Yamamoto, Hiroyuki; Yakami, Masahiro; Kubo, Takeshi; Fujimoto, Koji; Togashi, Kaori

    2010-01-01

    This report describes the improvements of a naive Bayes model that infers the diagnosis of pulmonary nodules in chest CT images based on the findings obtained when a radiologist interprets the CT images. We have previously introduced an inference model using a naive Bayes classifier and have reported its clinical value based on evaluation using clinical data. In the present report, we introduce the following improvements to the original inference model: the selection of findings based on correlations and the generation of a model using only these findings, and the introduction of classifiers that integrate several simple classifiers each of which is specialized for specific diagnosis. These improvements were found to increase the inference accuracy by 10.4% (p<.01) as compared to the original model in 100 cases (222 nodules) based on leave-one-out evaluation. (author)

  19. Approximation Methods for Inference and Learning in Belief Networks: Progress and Future Directions

    National Research Council Canada - National Science Library

    Pazzan, Michael

    1997-01-01

    .... In this research project, we have investigated methods and implemented algorithms for efficiently making certain classes of inference in belief networks, and for automatically learning certain...

  20. Classical methods for interpreting objective function minimization as intelligent inference

    Energy Technology Data Exchange (ETDEWEB)

    Golden, R.M. [Univ. of Texas, Dallas, TX (United States)

    1996-12-31

    Most recognition algorithms and neural networks can be formally viewed as seeking a minimum value of an appropriate objective function during either classification or learning phases. The goal of this paper is to argue that in order to show a recognition algorithm is making intelligent inferences, it is not sufficient to show that the recognition algorithm is computing (or trying to compute) the global minimum of some objective function. One must explicitly define a "relational system" for the recognition algorithm or neural network which identifies: (i) the sample space, (ii) the relevant sigma-field of events generated by the sample space, and (iii) the "relation" for that relational system. Only when such a "relational system" is properly defined is it possible to formally establish the sense in which computing the global minimum of an objective function is an intelligent inference.

  1. Maximum Likelihood Method for Predicting Environmental Conditions from Assemblage Composition: The R Package bio.infer

    Directory of Open Access Journals (Sweden)

    Lester L. Yuan

    2007-06-01

    Full Text Available This paper provides a brief introduction to the R package bio.infer, a set of scripts that facilitates the use of maximum likelihood (ML) methods for predicting environmental conditions from assemblage composition. Environmental conditions can often be inferred from only biological data, and these inferences are useful when other sources of data are unavailable. ML prediction methods are statistically rigorous and applicable to a broader set of problems than more commonly used weighted averaging techniques. However, ML methods require a substantially greater investment of time to program algorithms and to perform computations. This package is designed to reduce the effort required to apply ML prediction methods.

  2. Inferring biological functions of guanylyl cyclases with computational methods

    KAUST Repository

    Alquraishi, May Majed; Meier, Stuart Kurt

    2013-01-01

    A number of studies have shown that functionally related genes are often co-expressed and that computational based co-expression analysis can be used to accurately identify functional relationships between genes and by inference, their encoded proteins. Here we describe how a computational based co-expression analysis can be used to link the function of a specific gene of interest to a defined cellular response. Using a worked example we demonstrate how this methodology is used to link the function of the Arabidopsis Wall-Associated Kinase-Like 10 gene, which encodes a functional guanylyl cyclase, to host responses to pathogens. © Springer Science+Business Media New York 2013.

  3. Inferring biological functions of guanylyl cyclases with computational methods

    KAUST Repository

    Alquraishi, May Majed

    2013-09-03

    A number of studies have shown that functionally related genes are often co-expressed and that computational based co-expression analysis can be used to accurately identify functional relationships between genes and by inference, their encoded proteins. Here we describe how a computational based co-expression analysis can be used to link the function of a specific gene of interest to a defined cellular response. Using a worked example we demonstrate how this methodology is used to link the function of the Arabidopsis Wall-Associated Kinase-Like 10 gene, which encodes a functional guanylyl cyclase, to host responses to pathogens. © Springer Science+Business Media New York 2013.

  4. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    Science.gov (United States)

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
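
    The precision-recall and receiver operating characteristic (ROC) metrics mentioned above can be computed from a ranked list of predicted edges, as in the following sketch. It is not GeneNetWeaver's implementation; the function names and the four-edge toy ranking are hypothetical, and the ranking is assumed to cover every candidate edge so that the ROC curve ends at (1, 1).

```python
def pr_and_roc_points(ranked_edges, gold, n_candidates):
    """Walk down a ranked edge list, recording (recall, precision) and (FPR, TPR)."""
    n_pos = len(gold)
    n_neg = n_candidates - n_pos
    tp = fp = 0
    pr, roc = [], [(0.0, 0.0)]
    for edge in ranked_edges:
        if edge in gold:
            tp += 1
        else:
            fp += 1
        pr.append((tp / n_pos, tp / (tp + fp)))
        roc.append((fp / n_neg, tp / n_pos))
    return pr, roc


def area_under(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ranking, most confident prediction first.
ranked = [("g1", "g2"), ("g3", "g1"), ("g2", "g3"), ("g1", "g3")]
gold = {("g1", "g2"), ("g2", "g3")}

pr, roc = pr_and_roc_points(ranked, gold, n_candidates=len(ranked))
auroc = area_under(roc)
print(pr)
print(auroc)
```

    The area under the ROC curve equals the fraction of (true edge, false edge) pairs that the ranking orders correctly, which is three out of four pairs in this toy example.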

  5. Statistical Sensitive Data Protection and Inference Prevention with Decision Tree Methods

    National Research Council Canada - National Science Library

    Chang, LiWu

    2003-01-01

    .... We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as classes of those test data and non-sensitive data are the rest attribute values...

  6. Kernel methods and flexible inference for complex stochastic dynamics

    Science.gov (United States)

    Capobianco, Enrico

    2008-07-01

    Approximation theory suggests that series expansions and projections represent standard tools for random process applications from both numerical and statistical standpoints. Such instruments emphasize the role of both sparsity and smoothness for compression purposes, the decorrelation power achieved in the expansion coefficients space compared to the signal space, and the reproducing kernel property when some special conditions are met. We consider these three aspects central to the discussion in this paper, and attempt to analyze the characteristics of some known approximation instruments employed in a complex application domain such as financial market time series. Volatility models are often built ad hoc, parametrically and through very sophisticated methodologies. But they can hardly deal with stochastic processes with regard to non-Gaussianity, covariance non-stationarity or complex dependence without paying a big price in terms of either model mis-specification or computational efficiency. It is thus a good idea to look at other more flexible inference tools; hence the strategy of combining greedy approximation and space dimensionality reduction techniques, which are less dependent on distributional assumptions and more targeted to achieve computationally efficient performances. Advantages and limitations of their use will be evaluated by looking at algorithmic and model building strategies, and by reporting statistical diagnostics.

  7. Bayesian inference method for stochastic damage accumulation modeling

    International Nuclear Information System (INIS)

    Jiang, Xiaomo; Yuan, Yong; Liu, Xian

    2013-01-01

    Damage accumulation based reliability model plays an increasingly important role in successful realization of condition based maintenance for complicated engineering systems. This paper developed a Bayesian framework to establish stochastic damage accumulation model from historical inspection data, considering data uncertainty. Proportional hazards modeling technique is developed to model the nonlinear effect of multiple influencing factors on system reliability. Different from other hazard modeling techniques such as normal linear regression model, the approach does not require any distribution assumption for the hazard model, and can be applied for a wide variety of distribution models. A Bayesian network is created to represent the nonlinear proportional hazards models and to estimate model parameters by Bayesian inference with Markov Chain Monte Carlo simulation. Both qualitative and quantitative approaches are developed to assess the validity of the established damage accumulation model. Anderson–Darling goodness-of-fit test is employed to perform the normality test, and Box–Cox transformation approach is utilized to convert the non-normality data into normal distribution for hypothesis testing in quantitative model validation. The methodology is illustrated with the seepage data collected from real-world subway tunnels.

  8. Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks

    Directory of Open Access Journals (Sweden)

    Ji Wei

    2010-10-01

    Background: Microarray data discretization is a basic preprocessing step for many gene regulatory network inference algorithms. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary, and no systematic comparison of different discretization methods has been conducted in the context of gene regulatory network inference from time series gene expression data. Results: In this study, we propose a new discretization method, "bikmeans", and compare its performance with four other widely used discretization methods using different datasets, modeling algorithms and numbers of intervals. Sensitivities, specificities and total accuracies were calculated and statistical analysis was carried out. The bikmeans method always gave high total accuracies. Conclusions: Our results indicate that proper discretization methods can consistently improve gene regulatory network inference independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significantly better total accuracies than other methods.

  9. Numerical methods for Bayesian inference in the face of aging

    International Nuclear Information System (INIS)

    Clarotti, C.A.; Villain, B.; Procaccia, H.

    1996-01-01

    In recent years, much attention has been paid to Bayesian methods for Risk Assessment. Until now, these methods have been studied from a theoretical point of view. Researchers have been mainly interested in: studying the effectiveness of Bayesian methods in handling rare events; debating about the problem of priors and other philosophical issues. An aspect central to the Bayesian approach is numerical computation because any safety/reliability problem, in a Bayesian frame, ends with a problem of numerical integration. This aspect has been neglected until now because most Risk studies assumed the Exponential model as the basic probabilistic model. The existence of conjugate priors makes numerical integration unnecessary in this case. If aging is to be taken into account, no conjugate family is available and the use of numerical integration becomes compulsory. EDF (National Board of Electricity, of France) and ENEA (National Committee for Energy, New Technologies and Environment, of Italy) jointly carried out a research program aimed at developing quadrature methods suitable for Bayesian inference with underlying Weibull or gamma distributions. The paper will illustrate the main results achieved during the above research program and will discuss, via some sample cases, the performances of the numerical algorithms which on the appearance of stress corrosion cracking in the tubes of Steam Generators of PWR French power plants. (authors)
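
    The contrast drawn here between the conjugate Exponential case and numerical integration can be sketched as follows. This is an illustration only, not the EDF/ENEA quadrature scheme: the prior parameters, the failure times and the function name are hypothetical, and complete (uncensored) failure times are assumed. With a Gamma(a, b) prior on the failure rate, the posterior mean has the closed form (a + n) / (b + T), which a plain trapezoidal rule should reproduce; for a Weibull model no such closed form is available and only the numerical route remains.

```python
import math


def posterior_mean_by_quadrature(unnorm_post, lo, hi, n_steps=20000):
    """Posterior mean of a scalar parameter using the trapezoidal rule."""
    h = (hi - lo) / n_steps
    norm = 0.0
    first_moment = 0.0
    for i in range(n_steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end weights
        density = weight * unnorm_post(x)
        norm += density
        first_moment += x * density
    return first_moment / norm


# Hypothetical data: Gamma(a, b) prior (shape a, rate b) and four failure times.
a, b = 2.0, 1.0
times = [0.5, 1.2, 0.8, 2.0]
n, total = len(times), sum(times)


def unnorm_post(lam):
    # Gamma prior times exponential likelihood, up to a constant factor.
    return lam ** (a - 1 + n) * math.exp(-(b + total) * lam)


closed_form = (a + n) / (b + total)  # conjugate result
numeric = posterior_mean_by_quadrature(unnorm_post, 0.0, 20.0)
print(closed_form, numeric)
```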

  10. Generalized Bootstrap Method for Assessment of Uncertainty in Semivariogram Inference

    Science.gov (United States)

    Olea, R.A.; Pardo-Iguzquiza, E.

    2011-01-01

    The semivariogram and its related function, the covariance, play a central role in classical geostatistics for modeling the average continuity of spatially correlated attributes. Whereas all methods are formulated in terms of the true semivariogram, in practice what can be used are estimated semivariograms and models based on samples. A generalized form of the bootstrap method to properly model spatially correlated data is used to advance knowledge about the reliability of empirical semivariograms and semivariogram models based on a single sample. Among several methods available to generate spatially correlated resamples, we selected a method based on the LU decomposition and used several examples to illustrate the approach. The first example is a synthetic, isotropic, exhaustive sample following a normal distribution, the second is also synthetic but follows a non-Gaussian random field, and the third is an empirical sample consisting of actual raingauge measurements. Results show wider confidence intervals than those found previously by others with inadequate application of the bootstrap. Also, even for the Gaussian example, distributions for estimated semivariogram values and model parameters are positively skewed. In this sense, bootstrap percentile confidence intervals, which are not centered around the empirical semivariogram and do not require distributional assumptions for their construction, provide an achieved coverage similar to the nominal coverage. The latter cannot be achieved by symmetrical confidence intervals based on the standard error, regardless of whether the standard error is estimated from a parametric equation or from the bootstrap. © 2010 International Association for Mathematical Geosciences.
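
    The step of generating spatially correlated resamples by matrix decomposition can be sketched as below. This covers only that one step, not the full generalized bootstrap of the paper. It assumes the Cholesky form of the LU decomposition (C = L L^T for a symmetric positive-definite covariance matrix), and the exponential covariance model, its parameters and the 1-D coordinates are hypothetical choices for illustration.

```python
import math
import random


def cholesky(c):
    """Lower-triangular L with L L^T = c, for symmetric positive-definite c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def exponential_cov(coords, sill=1.0, corr_range=3.0):
    """Hypothetical covariance model: sill * exp(-distance / corr_range)."""
    return [[sill * math.exp(-abs(xi - xj) / corr_range) for xj in coords]
            for xi in coords]


def correlated_resample(low, rng):
    """Map independent standard normals w to correlated values z = L w."""
    w = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * w[k] for k in range(i + 1)) for i in range(len(low))]


coords = [0.0, 1.0, 2.0, 3.0, 4.0]   # sample locations along a transect
cov = exponential_cov(coords)
low = cholesky(cov)
rng = random.Random(0)
resamples = [correlated_resample(low, rng) for _ in range(200)]
```

    Each resample has, in expectation, the covariance structure encoded in `cov`, so a semivariogram can be re-estimated on every resample to build a bootstrap distribution.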

  11. Approximation and inference methods for stochastic biochemical kinetics—a tutorial review

    International Nuclear Information System (INIS)

    Schnoerr, David; Grima, Ramon; Sanguinetti, Guido

    2017-01-01

    Stochastic fluctuations of molecule numbers are ubiquitous in biological systems. Important examples include gene expression and enzymatic processes in living cells. Such systems are typically modelled as chemical reaction networks whose dynamics are governed by the chemical master equation. Despite its simple structure, no analytic solutions to the chemical master equation are known for most systems. Moreover, stochastic simulations are computationally expensive, making systematic analysis and statistical inference a challenging task. Consequently, significant effort has been spent in recent decades on the development of efficient approximation and inference methods. This article gives an introduction to basic modelling concepts as well as an overview of state of the art methods. First, we motivate and introduce deterministic and stochastic methods for modelling chemical networks, and give an overview of simulation and exact solution methods. Next, we discuss several approximation methods, including the chemical Langevin equation, the system size expansion, moment closure approximations, time-scale separation approximations and hybrid methods. We discuss their various properties and review recent advances and remaining challenges for these methods. We present a comparison of several of these methods by means of a numerical case study and highlight some of their respective advantages and disadvantages. Finally, we discuss the problem of inference from experimental data in the Bayesian framework and review recent methods developed in the literature. In summary, this review gives a self-contained introduction to modelling, approximations and inference methods for stochastic chemical kinetics. (topical review)
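
    Exact stochastic simulation of a reaction network governed by the chemical master equation can be illustrated with Gillespie's stochastic simulation algorithm. The sketch below is not taken from the review: it uses a hypothetical birth-death system (production at constant rate `k_prod`, degradation at rate `k_deg` per molecule) with illustrative rate values.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * x) exactly."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod            # propensity of the production reaction
        a_deg = k_deg * x          # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                  # no reaction can fire any more
        t += rng.expovariate(a_total)   # exponential waiting time
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


times, counts = gillespie_birth_death(10.0, 0.5, 0, 50.0, random.Random(1))
print(len(times), counts[-1])
```

    For this system the stationary distribution is Poisson with mean k_prod / k_deg (20 with the values above), which gives a simple sanity check for long trajectories.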

  14. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    Science.gov (United States)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few × 10^6 targets; photometric surveys, on the other hand, have detected > 10^9 stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' ≤ 18 mag), with 4500 K ≤ Teff ≤ 7000 K, corresponding to those with the most reliable SSPP estimates, I find that the model predicts [Fe/H] values with a root-mean-squared-error (RMSE) of approx. 0.27 dex. The RMSE from this machine-learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra.

  15. Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives.

    Science.gov (United States)

    Ramstetter, Monica D; Dyer, Thomas D; Lehman, Donna M; Curran, Joanne E; Duggirala, Ravindranath; Blangero, John; Mezey, Jason G; Williams, Amy L

    2017-09-01

    Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to 76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance. Copyright © 2017 Ramstetter et al.

  16. A method of inferring k-infinity from reaction rate measurements in thermal reactor systems

    International Nuclear Information System (INIS)

    Newmarch, D.A.

    1967-05-01

    A scheme is described for inferring a value of k-infinity from reaction rate measurements. The method is devised with the METHUSELAH group structure in mind and was developed for the analysis of S.G.H.W. reactor experiments; the underlying principles, however, are general. (author)

  17. Designs and Methods for Association Studies and Population Size Inference in Statistical Genetics

    DEFF Research Database (Denmark)

    Waltoft, Berit Lindum

    method provides a simple goodness of fit test by comparing the observed SFS with the expected SFS under a given model of population size changes. By the use of Monte Carlo estimation the expected time between coalescent events can be estimated and the expected SFS can thereby be evaluated. Using......). The OR is interpreted as the effect of an exposure on the probability of being diseased at the end of follow-up, while the interpretation of the IRR is the effect of an exposure on the probability of becoming diseased. Through a simulation study, the OR from a classical case-control study is shown to be an inconsistent...... the classical chi-square statistics we are able to infer single parameter models. Multiple parameter models, e.g. multiple epochs, are harder to identify. By introducing the inference of population size back in time as an inverse problem, the second procedure applies the theory of smoothing splines to infer...

  18. Improved inference in the evaluation of mutual fund performance using panel bootstrap methods

    OpenAIRE

    Blake, David; Caulfield, Tristan; Ioannidis, Christos; Tonks, I P

    2014-01-01

    Two new methodologies are introduced to improve inference in the evaluation of mutual fund performance against benchmarks. First, the benchmark models are estimated using panel methods with both fund and time effects. Second, the non-normality of individual mutual fund returns is accounted for by using panel bootstrap methods. We also augment the standard benchmark factors with fund-specific characteristics, such as fund size. Using a dataset of UK equity mutual fund returns, we find that fun...

  19. Development of the Bayesian method for unavailability inference. The new inferential theory and the examples of inference using BWR outage data in Japan

    International Nuclear Information System (INIS)

    Nakamura, Makoto

    2009-01-01

    It is important for Level 1 PSA to quantify input reliability parameters and their uncertainty. Bayesian methods for inference of system/component unavailability, however, are not well studied. At present practitioners allocate the uncertainty (i.e. error factor) of the unavailability based on engineering judgment. Systematic methods based on Bayesian statistics are needed for quantification of such uncertainty. In this study we have developed a new method for Bayesian inference of unavailability, where the posterior of system/component unavailability is described by the inverted gamma distribution. We show that the average of the posterior comes close to the point estimate of the unavailability as the number of outages goes to infinity. That indicates the validity of the new method. Using plant data recorded in NUCIA, we have applied the new method to inference of system unavailability under unplanned outages due to violations of LCO at BWRs in Japan. According to the inference results, the unavailability is on the order of 10^-5 to 10^-4 and the error factor is within 1-2. Thus, the new Bayesian method allows one to quantify magnitudes and widths (i.e. error factor) of uncertainty distributions of unavailability. (author)

  20. A method for crack sizing using Bayesian inference arising in eddy current testing

    International Nuclear Information System (INIS)

    Kojima, Fumio; Kikuchi, Mitsuhiro

    2008-01-01

    This paper is concerned with a crack sizing methodology based on Bayesian inference in eddy current testing. Quantitative nondestructive testing measurements are often uncertain, and this can yield misleading inferences about crack size during on-site monitoring. In this paper, we propose optimal measurement strategies for eddy current testing using Bayesian prior-to-posteriori analysis. First, our likelihood functional is given by a Gaussian distribution with the measurement model based on the hybrid use of finite and boundary element methods. Second, given a priori distributions of crack size, we propose a method for estimating the region of interest for sizing cracks. Finally, an optimal sensing method is demonstrated using our idea. (author)

  1. A new fast method for inferring multiple consensus trees using k-medoids.

    Science.gov (United States)

    Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir

    2018-04-05

    Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. The main advantage of our method is that it is much faster than the existing tree clustering approaches, while
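
    The k-medoids partitioning step named in this abstract can be sketched on a precomputed distance matrix. This is a generic alternating k-medoids, not the authors' implementation or their adapted validity indices. For trees the matrix would hold pairwise tree distances (for instance Robinson-Foulds, which is an assumption here); the toy matrix below uses points on a line instead, and all items are assumed to be distinct.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a symmetric distance matrix.

    Naive initialisation: the first k items serve as starting medoids.
    """
    n = len(dist)
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: in each cluster, pick the member that minimises
        # the total distance to the other members.
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_medoids == sorted(medoids):
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


# Toy stand-in for tree distances: two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
medoids, labels = k_medoids(dist, 2)
print(medoids, labels)
```

    A consensus tree would then be computed separately within each cluster, with the number of clusters chosen by a validity index.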

  2. A graphical user interface for a method to infer kinetics and network architecture (MIKANA).

    Science.gov (United States)

    Mourão, Márcio A; Srividhya, Jeyaraman; McSharry, Patrick E; Crampin, Edmund J; Schnell, Santiago

    2011-01-01

    One of the main challenges in the biomedical sciences is the determination of reaction mechanisms that constitute a biochemical pathway. During the last decades, advances have been made in building complex diagrams showing the static interactions of proteins. The challenge for systems biologists is to build realistic models of the dynamical behavior of reactants, intermediates and products. For this purpose, several methods have been recently proposed to deduce the reaction mechanisms or to estimate the kinetic parameters of the elementary reactions that constitute the pathway. One such method is MIKANA: Method to Infer Kinetics And Network Architecture. MIKANA is a computational method to infer both reaction mechanisms and estimate the kinetic parameters of biochemical pathways from time course data. To make it available to the scientific community, we developed a Graphical User Interface (GUI) for MIKANA. Among other features, the GUI validates and processes input time course data, displays the inferred reactions, generates the differential equations for the chemical species in the pathway and plots the prediction curves on top of the input time course data. We also added a new feature to MIKANA that allows the user to exclude a priori known reactions from the inferred mechanism. This addition improves the performance of the method. In this article, we illustrate the GUI for MIKANA with three examples: an irreversible Michaelis-Menten reaction mechanism; the interaction map of chemical species of the muscle glycolytic pathway; and the glycolytic pathway of Lactococcus lactis. We also describe the code and methods in sufficient detail to allow researchers to further develop the code or reproduce the experiments described. The code for MIKANA is open source, free for academic and non-academic use and is available for download (Information S1).

  3. Predictive Distribution of the Dirichlet Mixture Model by the Local Variational Inference Method

    DEFF Research Database (Denmark)

    Ma, Zhanyu; Leijon, Arne; Tan, Zheng-Hua

    2014-01-01

    the predictive likelihood of the new upcoming data, especially when the amount of training data is small. The Bayesian estimation of a Dirichlet mixture model (DMM) is, in general, not analytically tractable. In our previous work, we have proposed a global variational inference-based method for approximately...... calculating the posterior distributions of the parameters in the DMM analytically. In this paper, we extend our previous study for the DMM and propose an algorithm to calculate the predictive distribution of the DMM with the local variational inference (LVI) method. The true predictive distribution of the DMM...... is analytically intractable. By considering the concave property of the multivariate inverse beta function, we introduce an upper-bound to the true predictive distribution. As the global minimum of this upper-bound exists, the problem is reduced to seek an approximation to the true predictive distribution...

  4. Causal inference with missing exposure information: Methods and applications to an obstetric study.

    Science.gov (United States)

    Zhang, Zhiwei; Liu, Wei; Zhang, Bo; Tang, Li; Zhang, Jun

    2016-10-01

    Causal inference in observational studies is frequently challenged by the occurrence of missing data, in addition to confounding. Motivated by the Consortium on Safe Labor, a large observational study of obstetric labor practice and birth outcomes, this article focuses on the problem of missing exposure information in a causal analysis of observational data. This problem can be approached from different angles (i.e. missing covariates and causal inference), and useful methods can be obtained by drawing upon the available techniques and insights in both areas. In this article, we describe and compare a collection of methods based on different modeling assumptions, under standard assumptions for missing data (i.e. missing-at-random and positivity) and for causal inference with complete data (i.e. no unmeasured confounding and another positivity assumption). These methods involve three models: one for treatment assignment, one for the dependence of outcome on treatment and covariates, and one for the missing data mechanism. In general, consistent estimation of causal quantities requires correct specification of at least two of the three models, although there may be some flexibility as to which two models need to be correct. Such flexibility is afforded by doubly robust estimators adapted from the missing covariates literature and the literature on causal inference with complete data, and by a newly developed triply robust estimator that is consistent if any two of the three models are correct. The methods are applied to the Consortium on Safe Labor data and compared in a simulation study mimicking the Consortium on Safe Labor. © The Author(s) 2013.
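
    The idea of double robustness referred to here can be illustrated with the standard augmented inverse-probability-weighted (AIPW) estimator of an average treatment effect for complete data. This is a sketch only, not the triply robust estimator of the paper: the function name and toy numbers are hypothetical, and the propensity scores `e` and outcome-model predictions `m1`, `m0` are assumed to have been fitted elsewhere.

```python
def aipw_ate(y, a, e, m1, m0):
    """AIPW estimate of the average treatment effect.

    y: outcomes, a: 0/1 treatment, e: propensity scores P(A=1 | X),
    m1, m0: outcome-model predictions under treatment and under control.
    """
    total = 0.0
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        total += (m1i - m0i
                  + ai * (yi - m1i) / ei                 # correction for treated
                  - (1 - ai) * (yi - m0i) / (1 - ei))    # correction for controls
    return total / len(y)


# Toy case: the outcome model is exactly right, the propensities are arbitrary.
a = [1, 0, 1, 0]
m1 = [3.0, 4.0, 5.0, 6.0]
m0 = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 2.0, 5.0, 4.0]          # equals m1 where a == 1, m0 where a == 0
e_wrong = [0.9, 0.2, 0.7, 0.4]    # deliberately misspecified propensities
print(aipw_ate(y, a, e_wrong, m1, m0))
```

    Because the residuals vanish when the outcome model is correct, the misspecified propensities have no influence in this toy case; conversely, correct propensities would compensate for a wrong outcome model on average.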

  5. State of the Art of Fuzzy Methods for Gene Regulatory Networks Inference

    Directory of Open Access Journals (Sweden)

    Tuqyah Abdullah Al Qazlan

    2015-01-01

    To address one of the most challenging issues at the cellular level, this paper surveys the fuzzy methods used in gene regulatory network (GRN) inference. GRNs represent causal relationships between genes that have a direct influence, through protein production, on the life and the development of living organisms and provide a useful contribution to the understanding of the cellular functions as well as the mechanisms of diseases. Fuzzy systems are based on handling imprecise knowledge, such as biological information. They provide viable computational tools for inferring GRNs from gene expression data, thus contributing to the discovery of gene interactions responsible for specific diseases and/or ad hoc correcting therapies. Increasing computational power and high throughput technologies have provided powerful means to manage these challenging digital ecosystems at different levels from cell to society globally. The main aim of this paper is to report, present, and discuss the main contributions of this multidisciplinary field in a coherent and structured framework.

  6. Limitations of a metabolic network-based reverse ecology method for inferring host-pathogen interactions.

    Science.gov (United States)

    Takemoto, Kazuhiro; Aie, Kazuki

    2017-05-25

    Host-pathogen interactions are important in a wide range of research fields. Given the importance of metabolic crosstalk between hosts and pathogens, a metabolic network-based reverse ecology method was proposed to infer these interactions. However, the validity of this method remains unclear because of the various explanations presented and the influence of potentially confounding factors that have thus far been neglected. We re-evaluated the importance of the reverse ecology method for evaluating host-pathogen interactions while statistically controlling for confounding effects using oxygen requirement, genome, metabolic network, and phylogeny data. Our data analyses showed that host-pathogen interactions were more strongly influenced by genome size, primary network parameters (e.g., number of edges), oxygen requirement, and phylogeny than the reverse ecology-based measures. These results indicate the limitations of the reverse ecology method; however, they do not discount the importance of adopting reverse ecology approaches altogether. Rather, we highlight the need for developing more suitable methods for inferring host-pathogen interactions and conducting more careful examinations of the relationships between metabolic networks and host-pathogen interactions.

  7. Inferring the photometric and size evolution of galaxies from image simulations. I. Method

    Science.gov (United States)

    Carassou, Sébastien; de Lapparent, Valérie; Bertin, Emmanuel; Le Borgne, Damien

    2017-09-01

    Context. Current constraints on models of galaxy evolution rely on morphometric catalogs extracted from multi-band photometric surveys. However, these catalogs are altered by selection effects that are difficult to model, that correlate in non trivial ways, and that can lead to contradictory predictions if not taken into account carefully. Aims: To address this issue, we have developed a new approach combining parametric Bayesian indirect likelihood (pBIL) techniques and empirical modeling with realistic image simulations that reproduce a large fraction of these selection effects. This allows us to perform a direct comparison between observed and simulated images and to infer robust constraints on model parameters. Methods: We use a semi-empirical forward model to generate a distribution of mock galaxies from a set of physical parameters. These galaxies are passed through an image simulator reproducing the instrumental characteristics of any survey and are then extracted in the same way as the observed data. The discrepancy between the simulated and observed data is quantified, and minimized with a custom sampling process based on adaptive Markov chain Monte Carlo methods. Results: Using synthetic data matching most of the properties of a Canada-France-Hawaii Telescope Legacy Survey Deep field, we demonstrate the robustness and internal consistency of our approach by inferring the parameters governing the size and luminosity functions and their evolutions for different realistic populations of galaxies. We also compare the results of our approach with those obtained from the classical spectral energy distribution fitting and photometric redshift approach. Conclusions: Our pipeline infers efficiently the luminosity and size distribution and evolution parameters with a very limited number of observables (three photometric bands). When compared to SED fitting based on the same set of observables, our method yields results that are more accurate and free from

  8. Inferring regulatory networks from expression data using tree-based methods.

    Directory of Open Access Journals (Sweden)

    Vân Anh Huynh-Thu

    2010-09-01

    Full Text Available One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.

  9. Entropic Inference

    Science.gov (United States)

    Caticha, Ariel

    2011-03-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops—the Maximum Entropy and the Bayesian methods—into a single general inference scheme.
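
    The logarithmic relative entropy mentioned above can be illustrated for discrete distributions. The following is a minimal sketch, assuming the sign convention S[p, q] = -sum p log(p/q), so that the entropy is never positive and is maximized (at zero) when the posterior p equals the prior q. The function name and the handling of zero probabilities are choices made for this illustration, not taken from the tutorial.

```python
import math

def relative_entropy(p, q):
    """S[p, q] = -sum_i p_i * log(p_i / q_i); at most 0, and 0 when p == q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log 0 is taken as 0
        if qi == 0.0:
            return -math.inf  # p puts mass where the prior q has none
        total -= pi * math.log(pi / qi)
    return total

prior = [0.9, 0.1]
unchanged = relative_entropy(prior, prior)     # no update: entropy is 0
shifted = relative_entropy([0.5, 0.5], prior)  # moving away from the prior lowers it
```

    Absent new constraints, the maximum is attained by leaving the prior unchanged, which is the behavior one would expect of an updating rule.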

  10. Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.

    Science.gov (United States)

    Zhang, Tingting; Kou, S C

    2010-01-01

    Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.
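
    As a rough illustration of kernel-based intensity estimation from arrival times, the sketch below smooths a list of event times with a Gaussian kernel. It assumes a fixed, hand-picked bandwidth and hypothetical arrival times; it is not the authors' estimator and does not include their regression-based bandwidth selection.

```python
import math

def kernel_intensity(arrival_times, t, bandwidth):
    """Gaussian-kernel estimate of the arrival rate at time t.

    Each arrival contributes a Gaussian bump of unit area, so the
    estimate integrates (over all t) to the number of arrivals.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(
        norm * math.exp(-0.5 * ((t - s) / bandwidth) ** 2)
        for s in arrival_times
    )

# Hypothetical photon arrival times (arbitrary units): dense early, sparse late.
arrivals = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 2.0, 3.5]
early = kernel_intensity(arrivals, 0.3, bandwidth=0.2)
late = kernel_intensity(arrivals, 3.0, bandwidth=0.2)
```

    The estimate is higher where arrivals cluster. The bandwidth controls the trade-off between noise and temporal resolution, which is why its selection matters in practice.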

  11. A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.

    Science.gov (United States)

    Zheng, Chaojie; Wang, Xiuying; Feng, Dagan

    2015-01-01

    PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundaries from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
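
    The Dice similarity coefficient used for validation is twice the overlap of two segmentations divided by the sum of their sizes. A minimal sketch follows, assuming each segmentation is given as a set of voxel coordinates; the masks are hypothetical, and the convention for two empty masks is a choice made here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

# Hypothetical segmentations given as (x, y, z) voxel coordinates.
automatic = {(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)}
manual = {(0, 1, 0), (1, 1, 0), (1, 0, 0), (2, 0, 0)}
score = dice_coefficient(automatic, manual)  # 2*3 / (4+4) = 0.75
```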

  12. Cycle-Based Cluster Variational Method for Direct and Inverse Inference

    Science.gov (United States)

    Furtlehner, Cyril; Decelle, Aurélien

    2016-08-01

    Large scale inference problems of practical interest can often be addressed with the help of Markov random fields. This requires solving, in principle, two related problems: the first one is to find offline the parameters of the MRF from empirical data (inverse problem); the second one (direct problem) is to set up the inference algorithm to make it as precise, robust and efficient as possible. In this work we address both the direct and inverse problem with mean-field methods of statistical physics, going beyond the Bethe approximation and associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be dealt with in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified in such a way as to avoid feed-back loops as much as possible by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, where a belief propagation algorithm is run alternately at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem per region to be solved. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular in the random Ising context we propose two complementary methods based respectively on fixed point equations and on a one-parameter log likelihood function minimization. Numerical experiments confirm the effectiveness of this approach both for the direct and inverse MRF inference. Heterogeneous problems of size up to 10^5 are addressed in a reasonable computational time, notably with better convergence properties than ordinary belief propagation.

  13. Inferring regulatory networks from experimental morphological phenotypes: a computational method reverse-engineers planarian regeneration.

    Directory of Open Access Journals (Sweden)

    Daniel Lobo

    2015-06-01

    Full Text Available Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model has yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature. By analyzing all the datasets together, our system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration. This method

  14. Entropic Inference

    OpenAIRE

    Caticha, Ariel

    2010-01-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEn...

  15. FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

    Directory of Open Access Journals (Sweden)

    Bakos Jason D

    2010-04-01

    Full Text Available Abstract Background Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. Results We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Conclusions Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).

  16. General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models.

    Science.gov (United States)

    de Villemereuil, Pierre; Schielzeth, Holger; Nakagawa, Shinichi; Morrissey, Michael

    2016-11-01

    Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of such quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function, over distributions of latent values. In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMM can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population. Copyright © 2016 de Villemereuil et al.

  17. New Bayesian inference method using two steps of Markov chain Monte Carlo and its application to shock tube experiment data of Furan oxidation

    KAUST Repository

    Kim, Daesang; El Gharamti, Iman; Bisetti, Fabrizio; Farooq, Aamir; Knio, Omar

    2016-01-01

    A new Bayesian inference method has been developed and applied to Furan shock tube experimental data for efficient statistical inferences of the Arrhenius parameters of two OH radical consumption reactions. The collected experimental data, which

  18. A dynamic discretization method for reliability inference in Dynamic Bayesian Networks

    International Nuclear Information System (INIS)

    Zhu, Jiandao; Collette, Matthew

    2015-01-01

    The material and modeling parameters that drive structural reliability analysis for marine structures are subject to a significant uncertainty. This is especially true when time-dependent degradation mechanisms such as structural fatigue cracking are considered. Through inspection and monitoring, information such as crack location and size can be obtained to improve these parameters and the corresponding reliability estimates. Dynamic Bayesian Networks (DBNs) are a powerful and flexible tool to model dynamic system behavior and update reliability and uncertainty analysis with life cycle data for problems such as fatigue cracking. However, a central challenge in using DBNs is the need to discretize certain types of continuous random variables to perform network inference while still accurately tracking low-probability failure events. Most existing discretization methods focus on getting the overall shape of the distribution correct, with less emphasis on the tail region. Therefore, a novel scheme is presented specifically to estimate the likelihood of low-probability failure events. The scheme is an iterative algorithm which dynamically partitions the discretization intervals at each iteration. Through applications to two stochastic crack-growth example problems, the algorithm is shown to be robust and accurate. Comparisons are presented between the proposed approach and existing methods for the discretization problem. - Highlights: • A dynamic discretization method is developed for low-probability events in DBNs. • The method is compared to existing approaches on two crack growth problems. • The method is shown to improve on existing methods for low-probability events

  19. A time series approach to inferring groundwater recharge using the water table fluctuation method

    Science.gov (United States)

    Crosbie, Russell S.; Binning, Philip; Kalma, Jetse D.

    2005-01-01

    The water table fluctuation method for determining recharge from precipitation and water table measurements was originally developed on an event basis. Here a new multievent time series approach is presented for inferring groundwater recharge from long-term water table and precipitation records. Additional new features are the incorporation of a variable specific yield based upon the soil moisture retention curve, proper accounting for the Lisse effect on the water table, and the incorporation of aquifer drainage so that recharge can be detected even if the water table does not rise. A methodology for filtering noise and non-rainfall-related water table fluctuations is also presented. The model has been applied to 2 years of field data collected in the Tomago sand beds near Newcastle, Australia. It is shown that gross recharge estimates are very sensitive to time step size and specific yield. Properly accounting for the Lisse effect is also important to determining recharge.
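
    The basic relation behind the water table fluctuation method is that recharge equals specific yield times the water table rise. The sketch below applies it to a series of water levels, assuming a constant specific yield and hypothetical readings. It deliberately omits the variable specific yield, the Lisse effect correction, and the aquifer drainage term described in the abstract.

```python
def wtf_recharge(water_levels, specific_yield):
    """Sum recharge over time steps in which the water table rises.

    water_levels: water table elevations (m) at successive time steps.
    specific_yield: dimensionless, assumed constant in this sketch.
    Returns gross recharge in metres of water.
    """
    if not 0.0 < specific_yield <= 1.0:
        raise ValueError("specific yield must lie in (0, 1]")
    rises = (
        later - earlier
        for earlier, later in zip(water_levels, water_levels[1:])
        if later > earlier
    )
    return specific_yield * sum(rises)

levels_m = [2.00, 2.10, 2.05, 2.25, 2.20]  # hypothetical readings
recharge_m = wtf_recharge(levels_m, specific_yield=0.2)
coarse_m = wtf_recharge(levels_m[::2], specific_yield=0.2)  # every second reading
```

    Sampling the same record at a coarser time step hides the intermediate decline and yields a smaller estimate, which is consistent with the sensitivity to time step size that the abstract reports.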

  20. An alternative empirical likelihood method in missing response problems and causal inference.

    Science.gov (United States)

    Ren, Kaili; Drummond, Christopher A; Brewster, Pamela S; Haller, Steven T; Tian, Jiang; Cooper, Christopher J; Zhang, Biao

    2016-11-30

    Missing responses are common problems in medical, social, and economic studies. When responses are missing at random, a complete case data analysis may result in biases. A popular debias method is inverse probability weighting proposed by Horvitz and Thompson. To improve efficiency, Robins et al. proposed an augmented inverse probability weighting method. The augmented inverse probability weighting estimator has a double-robustness property and achieves the semiparametric efficiency lower bound when the regression model and propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood-based estimator as an alternative to Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied in the estimation of average treatment effect in observational causal inferences. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
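
    The two baseline estimators named above, inverse probability weighting and its augmented form, can be sketched for estimating a mean with missing responses. This is an illustration only, assuming the propensities and outcome-model predictions are supplied by the caller. It does not implement the empirical likelihood estimator proposed in the paper, and all data are hypothetical.

```python
def ipw_mean(y, observed, propensity):
    """Horvitz-Thompson style estimate of the mean response.

    y[i] is used only where observed[i] is True; propensity[i] is the
    (assumed known or fitted) probability that unit i is observed.
    """
    n = len(y)
    return sum(yi / pi for yi, r, pi in zip(y, observed, propensity) if r) / n

def aipw_mean(y, observed, propensity, predicted):
    """Augmented IPW: outcome-model prediction plus a weighted residual."""
    n = len(y)
    total = 0.0
    for yi, r, pi, mi in zip(y, observed, propensity, predicted):
        total += mi
        if r:
            total += (yi - mi) / pi
    return total / n

# Hypothetical data: None marks a missing response.
y = [1.0, None, 3.0, None]
observed = [True, False, True, False]
propensity = [0.5, 0.5, 0.5, 0.5]
predicted = [1.0, 2.0, 3.0, 4.0]  # output of some outcome regression

ipw = ipw_mean(y, observed, propensity)
aipw = aipw_mean(y, observed, propensity, predicted)
```

    When the outcome predictions match the observed responses exactly, the residual term vanishes and the augmented estimate no longer depends on the propensities. This is one side of the double-robustness property.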

  1. A Fast Numerical Method for Max-Convolution and the Application to Efficient Max-Product Inference in Bayesian Networks.

    Science.gov (United States)

    Serang, Oliver

    2015-08-01

    Observations depending on sums of random variables are common throughout many fields; however, no efficient solution is currently known for performing max-product inference on these sums of general discrete distributions (max-product inference can be used to obtain maximum a posteriori estimates). The limiting step to max-product inference is the max-convolution problem (sometimes presented in log-transformed form and denoted as "infimal convolution," "min-convolution," or "convolution on the tropical semiring"), for which no O(k log(k)) method is currently known. Presented here is an O(k log(k)) numerical method for estimating the max-convolution of two nonnegative vectors (e.g., two probability mass functions), where k is the length of the larger vector. This numerical max-convolution method is then demonstrated by performing fast max-product inference on a convolution tree, a data structure for performing fast inference given information on the sum of n discrete random variables in O(nk log(nk)log(n)) steps (where each random variable has an arbitrary prior distribution on k contiguous possible states). The numerical max-convolution method can be applied to specialized classes of hidden Markov models to reduce the runtime of computing the Viterbi path from nk^2 to nk log(k), and has potential application to the all-pairs shortest paths problem.
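
    To make the max-convolution problem concrete, the sketch below gives the exact quadratic-time definition alongside one possible numerical approximation. The approximation assumes a p-norm is used in place of the maximum, which turns the problem into an ordinary convolution. The abstract does not state the numerical scheme, so this choice, the value of p, and the function names are assumptions made for illustration.

```python
def max_convolve_naive(x, y):
    """Exact max-convolution: out[m] = max over i+j == m of x[i]*y[j]."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] = max(out[i + j], xi * yj)
    return out

def max_convolve_pnorm(x, y, p=64.0):
    """Approximate max-convolution of nonnegative vectors via a p-norm.

    (sum_i u_i**p) ** (1/p) approaches max_i u_i as p grows, so an
    ordinary convolution of x**p and y**p, followed by a 1/p power,
    approximates the max-convolution. The ordinary convolution is done
    directly here; an FFT would be needed to reach O(k log k).
    """
    xp = [v ** p for v in x]
    yp = [v ** p for v in y]
    out = [0.0] * (len(x) + len(y) - 1)
    for i, a in enumerate(xp):
        for j, b in enumerate(yp):
            out[i + j] += a * b
    return [v ** (1.0 / p) for v in out]

x = [0.1, 0.6, 0.3]  # hypothetical probability mass functions
y = [0.5, 0.2, 0.3]
exact = max_convolve_naive(x, y)
approx = max_convolve_pnorm(x, y)
```

    The p-norm never underestimates the maximum, and the overestimate shrinks as p grows. Very large p risks floating-point underflow, so the choice of p is a trade-off in this kind of scheme.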

  2. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA.
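
    An alignment-free comparison of tetranucleotide frequencies can be sketched as follows. The code counts overlapping 4-mers and compares two frequency profiles with a mean absolute difference. The distance measure and function names are assumptions for illustration; the abstract does not specify how SWPhylo compares oligonucleotide usage patterns.

```python
from collections import Counter
from itertools import product

TETRAMERS = ["".join(t) for t in product("ACGT", repeat=4)]  # 256 words

def tetranucleotide_frequencies(sequence):
    """Relative frequency of every overlapping 4-mer over A, C, G, T."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[w] for w in TETRAMERS)  # skips words with N etc.
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [counts[w] / total for w in TETRAMERS]

def pattern_distance(seq_a, seq_b):
    """Mean absolute difference between two frequency profiles."""
    fa = tetranucleotide_frequencies(seq_a)
    fb = tetranucleotide_frequencies(seq_b)
    return sum(abs(a - b) for a, b in zip(fa, fb)) / len(TETRAMERS)

freqs = tetranucleotide_frequencies("ACGTACGT")
same = pattern_distance("ACGTACGT", "ACGTACGT")
different = pattern_distance("AAAAAAAA", "CCCCCCCC")
```

    A matrix of such pairwise distances between genomes could then be passed to a standard distance-based tree-building method.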

  3. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA. PMID:29511354

  4. Statistical inference methods for two crossing survival curves: a comparison of methods.

    Science.gov (United States)

    Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng

    2015-01-01

    A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.

  5. Pairing field methods to improve inference in wildlife surveys while accommodating detection covariance.

    Science.gov (United States)

    Clare, John; McKinney, Shawn T; DePue, John E; Loftin, Cynthia S

    2017-10-01

    individuals more readily than passive hair catches. Inability to photographically distinguish individual sex did not appear to induce negative bias in camera density estimates; instead, hair catches appeared to produce detection competition between individuals that may have been a source of negative bias. Our model reformulations broaden the range of circumstances in which analyses incorporating multiple sources of information can be robustly used, and our empirical results demonstrate that using multiple field-methods can enhance inferences regarding ecological parameters of interest and improve understanding of how reliably survey methods sample these parameters. © 2017 by the Ecological Society of America.

  6. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method.

    Science.gov (United States)

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-10-06

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of the search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.

  7. An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method

    International Nuclear Information System (INIS)

    Ma Xiang; Zabaras, Nicholas

    2009-01-01

    A new approach to modeling inverse problems using a Bayesian inference method is introduced. The Bayesian approach considers the unknown parameters as random variables and seeks the probabilistic distribution of the unknowns. By introducing the concept of the stochastic prior state space to the Bayesian formulation, we reformulate the deterministic forward problem as a stochastic one. The adaptive hierarchical sparse grid collocation (ASGC) method is used for constructing an interpolant to the solution of the forward model in this prior space which is large enough to capture all the variability/uncertainty in the posterior distribution of the unknown parameters. This solution can be considered as a function of the random unknowns and serves as a stochastic surrogate model for the likelihood calculation. Hierarchical Bayesian formulation is used to derive the posterior probability density function (PPDF). The spatial model is represented as a convolution of a smooth kernel and a Markov random field. The state space of the PPDF is explored using Markov chain Monte Carlo algorithms to obtain statistics of the unknowns. The likelihood calculation is performed by directly sampling the approximate stochastic solution obtained through the ASGC method. The technique is assessed on two nonlinear inverse problems: source inversion and permeability estimation in flow through porous media

  8. Inferring Lévy walks from curved trajectories: A rescaling method

    Science.gov (United States)

    Tromer, R. M.; Barbosa, M. B.; Bartumeus, F.; Catalan, J.; da Luz, M. G. E.; Raposo, E. P.; Viswanathan, G. M.

    2015-08-01

    An important problem in the study of anomalous diffusion and transport concerns the proper analysis of trajectory data. The analysis and inference of Lévy walk patterns from empirical or simulated trajectories of particles in two and three-dimensional spaces (2D and 3D) is much more difficult than in 1D because path curvature is nonexistent in 1D but quite common in higher dimensions. Recently, a new method for detecting Lévy walks, which considers 1D projections of 2D or 3D trajectory data, has been proposed by Humphries et al. The key new idea is to exploit the fact that the 1D projection of a high-dimensional Lévy walk is itself a Lévy walk. Here, we ask whether or not this projection method is powerful enough to cleanly distinguish 2D Lévy walk with added curvature from a simple Markovian correlated random walk. We study the especially challenging case in which both 2D walks have exactly identical probability density functions (pdf) of step sizes as well as of turning angles between successive steps. Our approach extends the original projection method by introducing a rescaling of the projected data. Upon projection and coarse-graining, the renormalized pdf for the travel distances between successive turnings is seen to possess a fat tail when there is an underlying Lévy process. We exploit this effect to infer a Lévy walk process in the original high-dimensional curved trajectory. In contrast, no fat tail appears when a (Markovian) correlated random walk is analyzed in this way. We show that this procedure works extremely well in clearly identifying a Lévy walk even when there is noise from curvature. The present protocol may be useful in realistic contexts involving ongoing debates on the presence (or not) of Lévy walks related to animal movement on land (2D) and in air and oceans (3D).

  9. Perceptual inference.

    Science.gov (United States)

    Aggelopoulos, Nikolaos C

    2015-08-01

    Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the invariance of some neurons' responses to variations in the stimulus, and situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience.

  10. Method of fuzzy inference for one class of MISO-structure systems with non-singleton inputs

    Science.gov (United States)

    Sinuk, V. G.; Panchenko, M. V.

    2018-03-01

    In fuzzy modeling, the inputs of the simulated systems can be either crisp values or non-singleton fuzzy values. The computational complexity of fuzzy inference with non-singleton fuzzy inputs is exponential. This paper describes a new method of inference, based on the theorem of decomposition of a multidimensional fuzzy implication and on a fuzzy truth value. The method handles fuzzy inputs with polynomial complexity, which makes it possible to use it for modeling large-dimensional MISO-structure systems.

  11. When is an image a health claim? A false-recollection method to detect implicit inferences about products' health benefits.

    Science.gov (United States)

    Klepacz, Naomi A; Nash, Robert A; Egan, M Bernadette; Hodgkins, Charo E; Raats, Monique M

    2016-08-01

    Images on food and dietary supplement packaging might lead people to infer (appropriately or inappropriately) certain health benefits of those products. Research on this issue largely involves direct questions, which could (a) elicit inferences that would not be made unprompted, and (b) fail to capture inferences made implicitly. Using a novel memory-based method, in the present research, we explored whether packaging imagery elicits health inferences without prompting, and the extent to which these inferences are made implicitly. In 3 experiments, participants saw fictional product packages accompanied by written claims. Some packages contained an image that implied a health-related function (e.g., a brain), and some contained no image. Participants studied these packages and claims, and subsequently their memory for seen and unseen claims was tested. When a health image was featured on a package, participants often subsequently recognized health claims that, despite being implied by the image, were not truly presented. In Experiment 2, these recognition errors persisted despite an explicit warning against treating the images as informative. In Experiment 3, these findings were replicated in a large consumer sample from 5 European countries, and with a cued-recall test. These findings confirm that images can act as health claims, by leading people to infer health benefits without prompting. These inferences appear often to be implicit, and could therefore be highly pervasive. The data underscore the importance of regulating imagery on product packaging; memory-based methods represent innovative ways to measure how leading (or misleading) specific images can be.

  12. Excavation of attractor modules for nasopharyngeal carcinoma via integrating systemic module inference with attract method.

    Science.gov (United States)

    Jiang, T; Jiang, C-Y; Shu, J-H; Xu, Y-J

    2017-07-10

    The molecular mechanism of nasopharyngeal carcinoma (NPC) is poorly understood and effective therapeutic approaches are needed. This research aimed to excavate the attractor modules involved in the progression of NPC and provide further understanding of the underlying mechanism of NPC. Based on the gene expression data of NPC, two specific protein-protein interaction networks for NPC and control conditions were re-weighted using Pearson correlation coefficient. Then, a systematic tracking of candidate modules was conducted on the re-weighted networks via cliques algorithm, and a total of 19 and 38 modules were separately identified from NPC and control networks, respectively. Among them, 8 pairs of modules with similar gene composition were selected, and 2 attractor modules were identified via the attract method. Functional analysis indicated that these two attractor modules participate in one common bioprocess of cell division. Based on the strategy of integrating systemic module inference with the attract method, we successfully identified 2 attractor modules. These attractor modules might play important roles in the molecular pathogenesis of NPC via affecting the bioprocess of cell division in a conjunct way. Further research is needed to explore the correlations between cell division and NPC.
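
    The re-weighting step mentioned in this abstract can be sketched as follows, assuming that each edge of a protein-protein interaction network is assigned the Pearson correlation between the expression profiles of its two genes. Whether the authors used the signed or the absolute correlation is not stated in the abstract; the sketch keeps the sign. Gene names, expression values, and function names are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def reweight_edges(edges, expression):
    """Assign each edge the correlation of its endpoints' expression."""
    return {(a, b): pearson(expression[a], expression[b]) for a, b in edges}


# Hypothetical expression profiles under one condition.
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
}
edges = [("G1", "G2"), ("G1", "G3")]
weights = reweight_edges(edges, expression)
```

    Running the same function once on the disease data and once on the control data would give the two condition-specific weighted networks that the abstract refers to.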

  13. Philosophy and phylogenetic inference: a comparison of likelihood and parsimony methods in the context of Karl Popper's writings on corroboration.

    Science.gov (United States)

    de Queiroz, K; Poe, S

    2001-06-01

    Advocates of cladistic parsimony methods have invoked the philosophy of Karl Popper in an attempt to argue for the superiority of those methods over phylogenetic methods based on Ronald Fisher's statistical principle of likelihood. We argue that the concept of likelihood in general, and its application to problems of phylogenetic inference in particular, are highly compatible with Popper's philosophy. Examination of Popper's writings reveals that his concept of corroboration is, in fact, based on likelihood. Moreover, because probabilistic assumptions are necessary for calculating the probabilities that define Popper's corroboration, likelihood methods of phylogenetic inference--with their explicit probabilistic basis--are easily reconciled with his concept. In contrast, cladistic parsimony methods, at least as described by certain advocates of those methods, are less easily reconciled with Popper's concept of corroboration. If those methods are interpreted as lacking probabilistic assumptions, then they are incompatible with corroboration. Conversely, if parsimony methods are to be considered compatible with corroboration, then they must be interpreted as carrying implicit probabilistic assumptions. Thus, the non-probabilistic interpretation of cladistic parsimony favored by some advocates of those methods is contradicted by an attempt by the same authors to justify parsimony methods in terms of Popper's concept of corroboration. In addition to being compatible with Popperian corroboration, the likelihood approach to phylogenetic inference permits researchers to test the assumptions of their analytical methods (models) in a way that is consistent with Popper's ideas about the provisional nature of background knowledge.

  14. Computational methods for analysis and inference of kinase/inhibitor relationships

    Directory of Open Access Journals (Sweden)

    Fabrizio Ferrè

    2014-06-01

    The central role of kinases in virtually all signal transduction networks is the driving motivation for the development of compounds modulating their activity. ATP-mimetic inhibitors are essential tools for elucidating signaling pathways and are emerging as promising therapeutic agents. However, off-target ligand binding and complex and sometimes unexpected kinase/inhibitor relationships can occur for seemingly unrelated kinases, stressing that computational approaches are needed for learning the interaction determinants and for the inference of the effect of small compounds on a given kinase. Recently published high-throughput profiling studies assessed the effects of thousands of small compound inhibitors, covering a substantial portion of the kinome. This wealth of data paved the road for computational resources and methods that can offer a major contribution in understanding the reasons for the inhibition, helping in the rational design of more specific molecules, in the in silico prediction of inhibition for those neglected kinases for which no systematic analysis has been carried out yet, in the selection of novel inhibitors with desired selectivity, and offering novel avenues of personalized therapies.

  15. Methods for Inferring Health-Related Social Networks among Coworkers from Online Communication Patterns

    Science.gov (United States)

    Matthews, Luke J.; DeWan, Peter; Rula, Elizabeth Y.

    2013-01-01

    Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1) the absolute number of emails exchanged, (2) logistic regression probability of an offline relationship, and (3) the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI) across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. Ranked email partners may be particularly useful for purposes related to health traits in a social network. 

  16. Methods for inferring health-related social networks among coworkers from online communication patterns.

    Science.gov (United States)

    Matthews, Luke J; DeWan, Peter; Rula, Elizabeth Y

    2013-01-01

    Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1) the absolute number of emails exchanged, (2) logistic regression probability of an offline relationship, and (3) the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI) across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. Ranked email partners may be particularly useful for purposes related to health traits in a social network.
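
    Two of the three thresholding techniques named in this abstract, (1) the absolute number of emails exchanged and (3) the highest ranked email exchange partners, can be sketched as below. Technique (2) needs a fitted logistic regression model and is left out. The data layout, function names, and cutoff values are assumptions made for illustration and are not taken from the study.

```python
from collections import defaultdict


def ties_by_count(email_counts, min_emails):
    """Technique (1): keep pairs that exchanged at least min_emails emails."""
    return {pair for pair, n in email_counts.items() if n >= min_emails}


def ties_by_rank(email_counts, top_k):
    """Technique (3): keep each person's top_k most frequent partners."""
    partners = defaultdict(list)
    for pair, n in email_counts.items():
        a, b = tuple(pair)
        partners[a].append((n, b))
        partners[b].append((n, a))
    ties = set()
    for person, ranked in partners.items():
        # Sort by email count, highest first, and keep the first top_k.
        for _, other in sorted(ranked, reverse=True)[:top_k]:
            ties.add(frozenset((person, other)))
    return ties


# Hypothetical email counts between pairs of coworkers.
email_counts = {
    frozenset(("ann", "bob")): 40,
    frozenset(("ann", "cy")): 5,
    frozenset(("bob", "cy")): 12,
    frozenset(("cy", "dee")): 3,
}
count_ties = ties_by_count(email_counts, min_emails=10)
rank_ties = ties_by_rank(email_counts, top_k=1)
```

    The two rules disagree on this toy data: the count threshold drops the weak cy-dee tie, while the rank rule keeps it because cy is dee's only partner. This mirrors the abstract's point that each method captures different aspects of the offline network.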

  17. Methods for inferring health-related social networks among coworkers from online communication patterns.

    Directory of Open Access Journals (Sweden)

    Luke J Matthews

    Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1) the absolute number of emails exchanged, (2) logistic regression probability of an offline relationship, and (3) the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI) across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. Ranked email partners may be particularly useful for purposes related to health traits in a

  18. Estimation of parameter uncertainty for an activated sludge model using Bayesian inference: a comparison with the frequentist method.

    Science.gov (United States)

    Zonta, Zivko J; Flotats, Xavier; Magrí, Albert

    2014-08-01

    The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference), may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model, which considers intracellular storage and biomass growth, simultaneously. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to evidence the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference could be reduced to a frequentist approach under particular hypotheses, the former can be considered as a more generalist methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.

  19. A MACHINE-LEARNING METHOD TO INFER FUNDAMENTAL STELLAR PARAMETERS FROM PHOTOMETRIC LIGHT CURVES

    International Nuclear Information System (INIS)

    Miller, A. A.; Bloom, J. S.; Richards, J. W.; Starr, D. L.; Lee, Y. S.; Butler, N. R.; Tokarz, S.; Smith, N.; Eisner, J. A.

    2015-01-01

    A fundamental challenge for wide-field imaging surveys is obtaining follow-up spectroscopic observations: there are >10^9 photometrically cataloged sources, yet modern spectroscopic surveys are limited to ∼few × 10^6 targets. As we approach the Large Synoptic Survey Telescope era, new algorithmic solutions are required to cope with the data deluge. Here we report the development of a machine-learning framework capable of inferring fundamental stellar parameters (T_eff, log g, and [Fe/H]) using photometric-brightness variations and color alone. A training set is constructed from a systematic spectroscopic survey of variables with Hectospec/Multi-Mirror Telescope. In sum, the training set includes ∼9000 spectra, for which stellar parameters are measured using the SEGUE Stellar Parameters Pipeline (SSPP). We employed the random forest algorithm to perform a non-parametric regression that predicts T_eff, log g, and [Fe/H] from photometric time-domain observations. Our final optimized model produces a cross-validated rms error (RMSE) of 165 K, 0.39 dex, and 0.33 dex for T_eff, log g, and [Fe/H], respectively. Examining the subset of sources for which the SSPP measurements are most reliable, the RMSE reduces to 125 K, 0.37 dex, and 0.27 dex, respectively, comparable to what is achievable via low-resolution spectroscopy. For variable stars this represents a ≈12%-20% improvement in RMSE relative to models trained with single-epoch photometric colors. As an application of our method, we estimate stellar parameters for ∼54,000 known variables. We argue that this method may convert photometric time-domain surveys into pseudo-spectrographic engines, enabling the construction of extremely detailed maps of the Milky Way, its structure, and history.

  20. Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA

    DEFF Research Database (Denmark)

    Korneliussen, Thorfinn Sand

    Due to the recent advances in DNA sequencing technology, genomic data are being generated at an unprecedented rate and we are gaining access to entire genomes at population level. The technology does not, however, give direct access to the genetic variation and the many levels of preprocessing...... that is required before being able to make inferences from the data introduces multiple levels of uncertainty, especially for low-depth data. Therefore methods that take into account the inherent uncertainty are needed for being able to make robust inferences in the downstream analysis of such data. This poses...... a problem for a range of key summary statistics within population genetics where existing methods are based on the assumption that the true genotypes are known. Motivated by this I present: 1) a new method for the estimation of relatedness between pairs of individuals, 2) a new method for estimating...

  1. A novel mutual information-based Boolean network inference method from time-series gene expression data.

    Directory of Open Access Journals (Sweden)

    Shohag Barman

    Inferring a gene regulatory network from time-series gene expression data in systems biology is a challenging problem. Many methods have been suggested, most of which have a scalability limitation due to the combinatorial cost of searching a regulatory set of genes. In addition, they have focused on the accurate inference of a network structure only. Therefore, there is a pressing need to develop a network inference method to search regulatory genes efficiently and to predict the network dynamics accurately. In this study, we employed a Boolean network model with a restricted update rule scheme to capture coarse-grained dynamics, and propose a novel mutual information-based Boolean network inference (MIBNI) method. Given time-series gene expression data as an input, the method first identifies a set of initial regulatory genes using mutual information-based feature selection, and then improves the dynamics prediction accuracy by iteratively swapping a pair of genes between sets of the selected regulatory genes and the other genes. Through extensive simulations with artificial datasets, MIBNI showed consistently better performance than six well-known existing methods, REVEAL, Best-Fit, RelNet, CST, CLR, and BIBN in terms of both structural and dynamics prediction accuracy. We further tested the proposed method with two real gene expression datasets for an Escherichia coli gene regulatory network and a fission yeast cell cycle network, and also observed better results using MIBNI compared to the six other methods. Taken together, MIBNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network.
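
    The mutual information that drives the first stage of the method can be illustrated with a simplified sketch. It scores each candidate gene by the mutual information between its Boolean state at time t and the target gene's state at time t+1, then ranks the candidates. This single-gene ranking is an assumption made for illustration; it is not the MIBNI feature-selection or swapping procedure itself, and the gene names and time series are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def rank_regulators(series, target):
    """Rank candidate regulators of `target` by lag-1 mutual information."""
    future = series[target][1:]
    scores = {gene: mutual_information(states[:-1], future)
              for gene, states in series.items() if gene != target}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical Boolean time series; B at time t+1 is the negation of A at t.
series = {
    "A": [0, 1, 1, 0, 1, 0, 0, 1],
    "B": [0, 1, 0, 0, 1, 0, 1, 1],
    "C": [1, 1, 0, 0, 1, 1, 0, 0],
}
ranking = rank_regulators(series, "B")  # ["A", "C"]
```

    Gene A ranks first because it determines the next state of B, which is the kind of signal a mutual information-based selection step is meant to pick up.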

  2. Inference of directed climate networks: role of instability of causality estimation methods

    Science.gov (United States)

    Hlinka, Jaroslav; Hartman, David; Vejmelka, Martin; Paluš, Milan

    2013-04-01

    Climate data are increasingly analyzed by complex network analysis methods, including graph-theoretical approaches [1]. For such analysis, links between localized nodes of climate network are typically quantified by some statistical measures of dependence (connectivity) between measured variables of interest. To obtain information on the directionality of the interactions in the networks, a wide range of methods exists. These can be broadly divided into linear and nonlinear methods, with some of the latter having the theoretical advantage of being model-free, and principally a generalization of the former [2]. However, as a trade-off, this generality comes together with lower accuracy - in particular if the system was close to linear. In an overall stationary system, this may potentially lead to higher variability in the nonlinear network estimates. Therefore, with the same control of false alarms, this may lead to lower sensitivity for detection of real changes in the network structure. These problems are discussed on the example of daily SAT and SLP data from the NCEP/NCAR reanalysis dataset. We first reduce the dimensionality of data using PCA with VARIMAX rotation to detect several dozens of components that together explain most of the data variability. We further construct directed climate networks applying a selection of most widely used methods - variants of linear Granger causality and conditional mutual information. Finally, we assess the stability of the detected directed climate networks by computing them in sliding time windows. To understand the origin of the observed instabilities and their range, we also apply the same procedure to two types of surrogate data: either with non-stationarity in network structure removed, or imposed in a controlled way. In general, the linear methods show stable results in terms of overall similarity of directed climate networks inferred. 
For instance, for different decades of SAT data, the Spearman correlation of edge

  3. Strong Inference in Mathematical Modeling: A Method for Robust Science in the Twenty-First Century.

    Science.gov (United States)

    Ganusov, Vitaly V

    2016-01-01

    While there are many opinions on what mathematical modeling in biology is, in essence, modeling is a mathematical tool, like a microscope, which allows consequences to logically follow from a set of assumptions. Only when this tool is applied appropriately, as microscope is used to look at small items, it may allow to understand importance of specific mechanisms/assumptions in biological processes. Mathematical modeling can be less useful or even misleading if used inappropriately, for example, when a microscope is used to study stars. According to some philosophers (Oreskes et al., 1994), the best use of mathematical models is not when a model is used to confirm a hypothesis but rather when a model shows inconsistency of the model (defined by a specific set of assumptions) and data. Following the principle of strong inference for experimental sciences proposed by Platt (1964), I suggest "strong inference in mathematical modeling" as an effective and robust way of using mathematical modeling to understand mechanisms driving dynamics of biological systems. The major steps of strong inference in mathematical modeling are (1) to develop multiple alternative models for the phenomenon in question; (2) to compare the models with available experimental data and to determine which of the models are not consistent with the data; (3) to determine reasons why rejected models failed to explain the data, and (4) to suggest experiments which would allow to discriminate between remaining alternative models. The use of strong inference is likely to provide better robustness of predictions of mathematical models and it should be strongly encouraged in mathematical modeling-based publications in the Twenty-First century.

  4. Strong Inference in Mathematical Modeling: A Method for Robust Science in the Twenty-First Century

    Science.gov (United States)

    Ganusov, Vitaly V.

    2016-01-01

    While there are many opinions on what mathematical modeling in biology is, in essence, modeling is a mathematical tool, like a microscope, which allows consequences to logically follow from a set of assumptions. Only when this tool is applied appropriately, as microscope is used to look at small items, it may allow to understand importance of specific mechanisms/assumptions in biological processes. Mathematical modeling can be less useful or even misleading if used inappropriately, for example, when a microscope is used to study stars. According to some philosophers (Oreskes et al., 1994), the best use of mathematical models is not when a model is used to confirm a hypothesis but rather when a model shows inconsistency of the model (defined by a specific set of assumptions) and data. Following the principle of strong inference for experimental sciences proposed by Platt (1964), I suggest “strong inference in mathematical modeling” as an effective and robust way of using mathematical modeling to understand mechanisms driving dynamics of biological systems. The major steps of strong inference in mathematical modeling are (1) to develop multiple alternative models for the phenomenon in question; (2) to compare the models with available experimental data and to determine which of the models are not consistent with the data; (3) to determine reasons why rejected models failed to explain the data, and (4) to suggest experiments which would allow to discriminate between remaining alternative models. The use of strong inference is likely to provide better robustness of predictions of mathematical models and it should be strongly encouraged in mathematical modeling-based publications in the Twenty-First century. PMID:27499750

  5. Strong inference in mathematical modeling: a method for robust science in the 21st century

    Directory of Open Access Journals (Sweden)

    Vitaly V. Ganusov

    2016-07-01

    While there are many opinions on what mathematical modeling in biology is, in essence, modeling is a mathematical tool, like a microscope, which allows consequences to logically follow from a set of assumptions. Only when this tool is applied appropriately, as microscope is used to look at small items, it may allow to understand importance of specific mechanisms/assumptions in biological processes. Mathematical modeling can be less useful or even misleading if used inappropriately, for example, when a microscope is used to study stars. According to some philosophers [1], the best use of mathematical models is not when a model is used to confirm a hypothesis but rather when a model shows inconsistency of the model (defined by a specific set of assumptions) and data. Following the principle of strong inference for experimental sciences proposed by Platt [2], I suggest "strong inference in mathematical modeling" as an effective and robust way of using mathematical modeling to understand mechanisms driving dynamics of biological systems. The major steps of strong inference in mathematical modeling are (1) to develop multiple alternative models for the phenomenon in question; (2) to compare the models with available experimental data and to determine which of the models are not consistent with the data; (3) to determine reasons why rejected models failed to explain the data, and (4) to suggest experiments which would allow to discriminate between remaining alternative models. The use of strong inference is likely to provide better robustness of predictions of mathematical models and it should be strongly encouraged in mathematical modeling-based publications in the 21st century.

  6. A hierarchical method for Bayesian inference of rate parameters from shock tube data: Application to the study of the reaction of hydroxyl with 2-methylfuran

    KAUST Repository

    Kim, Daesang; El Gharamti, Iman; Hantouche, Mireille; Elwardani, Ahmed Elsaid; Farooq, Aamir; Bisetti, Fabrizio; Knio, Omar

    2017-01-01

    We developed a novel two-step hierarchical method for the Bayesian inference of the rate parameters of a target reaction from time-resolved concentration measurements in shock tubes. The method was applied to the calibration of the parameters

  7. Inferring Weighted Directed Association Networks from Multivariate Time Series with the Small-Shuffle Symbolic Transfer Entropy Spectrum Method

    Directory of Open Access Journals (Sweden)

    Yanzhu Hu

    2016-09-01

    Complex network methodology is very useful for complex system exploration. However, the relationships among variables in complex systems are usually not clear. Therefore, inferring association networks among variables from their observed data has been a popular research topic. We propose a method, named small-shuffle symbolic transfer entropy spectrum (SSSTES), for inferring association networks from multivariate time series. The method can solve four problems for inferring association networks, i.e., strong correlation identification, correlation quantification, direction identification and temporal relation identification. The method can be divided into four layers. The first layer is the so-called data layer, in which data input and processing take place. In the second layer, we symbolize the model data, original data and shuffled data from the previous layer and calculate, in a loop, the transfer entropy with different time lags for each pair of time series variables. In the third layer, we compose transfer entropy spectrums for pairwise time series from the previous layer's output, a list of transfer entropy matrices. We also identify the correlation level between variables in this layer. In the last layer, we build a weighted adjacency matrix, the value of each entry representing the correlation level between pairwise variables, and then get the weighted directed association network. Three sets of numerical simulated data from a linear system, a nonlinear system and a coupled Rossler system are used to show how the proposed approach works. Finally, we apply SSSTES to a real industrial system and get a better result than with two other methods.
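
    The second layer described above, transfer entropy computed over several time lags for a pair of symbolized series, can be sketched as follows. The sketch assumes the series are already symbolized and uses a plain plug-in estimator with a history length of one; the small-shuffle surrogate step and the authors' symbolization scheme are not reproduced. All names and the sample data are hypothetical.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target, lag=1):
    """Plug-in estimate of TE(source -> target), in bits, at a given lag."""
    triples = [(target[t], target[t - 1], source[t - lag])
               for t in range(max(1, lag), len(target))]
    n = len(triples)
    c_all = Counter(triples)
    c_past_src = Counter((past, src) for _, past, src in triples)
    c_next_past = Counter((nxt, past) for nxt, past, _ in triples)
    c_past = Counter(past for _, past, _ in triples)
    te = 0.0
    for (nxt, past, src), c in c_all.items():
        # Ratio p(next | past, source) / p(next | past) from raw counts.
        ratio = (c / c_past_src[(past, src)]) / (c_next_past[(nxt, past)] / c_past[past])
        te += (c / n) * log2(ratio)
    return te


# Hypothetical symbol sequences: `driven` copies `driver` with a delay of 1.
rng = random.Random(0)
driver = [1 if rng.random() < 0.5 else 0 for _ in range(2000)]
driven = [0] + driver[:-1]

# A transfer entropy "spectrum": one value per time lag.
spectrum = [transfer_entropy(driver, driven, lag) for lag in (1, 2, 3)]
reverse = transfer_entropy(driven, driver, lag=1)
```

    On this toy pair the spectrum peaks at lag 1 and the reverse direction stays near zero, which is how a direction and a temporal relation could be read off such a spectrum.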

  8. A Bayesian method for inferring transmission chains in a partially observed epidemic.

    Energy Technology Data Exchange (ETDEWEB)

    Marzouk, Youssef M.; Ray, Jaideep

    2008-10-01

    We present a Bayesian approach for estimating transmission chains and rates in the Abakaliki smallpox epidemic of 1967. The epidemic affected 30 individuals in a community of 74; only the dates of appearance of symptoms were recorded. Our model assumes stochastic transmission of the infections over a social network. Distinct binomial random graphs model intra- and inter-compound social connections, while disease transmission over each link is treated as a Poisson process. Link probabilities and rate parameters are objects of inference. Dates of infection and recovery comprise the remaining unknowns. Distributions for smallpox incubation and recovery periods are obtained from historical data. Using Markov chain Monte Carlo, we explore the joint posterior distribution of the scalar parameters and provide an expected connectivity pattern for the social graph and infection pathway.

  9. A novel method for inferring RFID tag reader recordings into clinical events.

    Science.gov (United States)

    Chang, Yung-Ting; Syed-Abdul, Shabbir; Tsai, Chung-You; Li, Yu-Chuan

    2011-12-01

    Nosocomial infections (NIs) are among the important indicators used for evaluating patients' safety and hospital performance during accreditation of hospitals. The NI rate is higher in Intensive Care Units (ICUs) than in the general wards because patients require intense care involving both invasive and non-invasive clinical procedures. The emergence of superbugs is motivating health providers to enhance infection control measures. Contact behavior between health caregivers and patients is one of the main causes of cross infections. In this technology-driven era, remote monitoring of patients and caregivers in the hospital setting can be performed reliably, and thus is in demand. Proximity sensing using radio frequency identification (RFID) technology can be helpful, for example, in capturing and keeping track of all contact history between health caregivers and patients. This study intended to extend the use of RFID proximity sensing by proposing a model for inferring clinical events from RFID tag reader recordings. The aims of the study are twofold. The first aim is to set up a Contact History Inferential Model (CHIM) between health caregivers and patients. The second is to verify CHIM with real-time observation done at the ICU ward. A pre-study was conducted followed by two study phases. During the pre-study, proximity sensing of RFID was tested, and the RFID system was deployed in the Clinical Skill Center in one of the medical centers in Taiwan. We simulated clinical events and developed CHIM using variables such as duration of time, frequency, and identity (tag) numbers assigned to caregivers. All clinical proximity events are classified into close-in events, contact events and invasive events. During the first phase, three observers were recruited to do real-time recordings of all clinical events in the Clinical Skill Center with the deployed automated RFID interaction recording system. The observations were used to verify

  10. ITrace: An implicit trust inference method for trust-aware collaborative filtering

    Science.gov (United States)

    He, Xu; Liu, Bin; Chen, Kejia

    2018-04-01

    The growth of Internet commerce has stimulated the use of collaborative filtering (CF) algorithms as recommender systems. A CF algorithm recommends items of interest to the target user by leveraging the votes given by other similar users. In a standard CF framework, it is assumed that the credibility of every voting user is exactly the same with respect to the target user. This assumption is not satisfied and thus may lead to misleading recommendations in many practical applications. A natural countermeasure is to design a trust-aware CF (TaCF) algorithm, which can take account of the difference in the credibilities of the voting users when performing CF. To this end, this paper presents a trust inference approach, which can predict the implicit trust of the target user on every voting user from a sparse explicit trust matrix. Then an improved CF algorithm termed iTrace is proposed, which takes advantage of both the explicit and the predicted implicit trust to provide recommendations with the CF framework. An empirical evaluation on a public dataset demonstrates that the proposed algorithm provides a significant improvement in recommendation quality in terms of mean absolute error.
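
    The idea of weighting each voting user by trust rather than treating all voters as equally credible can be sketched as follows. This is a generic trust-weighted, mean-centered prediction rule, not the iTrace algorithm itself; the function `predict`, the data layout and all numbers are hypothetical.

```python
def predict(target, item, ratings, trust):
    """Predict target's rating of item from trusted voters' mean-centered votes.

    ratings: dict user -> dict item -> rating
    trust:   dict user -> dict voter -> trust weight (explicit or inferred)
    """
    def mean(user):
        votes = ratings[user].values()
        return sum(votes) / len(votes)

    num = den = 0.0
    for voter, weight in trust.get(target, {}).items():
        if item in ratings.get(voter, {}):
            # Each vote counts in proportion to the trust placed in the voter.
            num += weight * (ratings[voter][item] - mean(voter))
            den += abs(weight)
    base = mean(target)
    return base if den == 0 else base + num / den


# Hypothetical toy data: u trusts v strongly and w weakly.
ratings = {
    "u": {"a": 4, "b": 2},
    "v": {"a": 5, "b": 3, "c": 5},
    "w": {"a": 2, "c": 1},
}
trust = {"u": {"v": 0.9, "w": 0.1}}
pred = predict("u", "c", ratings, trust)
print(round(pred, 2))
```

    With equal weights for v and w, the two opposing votes would largely cancel; the trust weights pull the prediction toward the more trusted voter.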

  11. An Application of Fuzzy Inference System by Clustering Subtractive Fuzzy Method for Estimating of Product Requirement

    Directory of Open Access Journals (Sweden)

    Fajar Ibnu Tufeil

    2009-06-01

    Full Text Available Fuzzy models are able to describe linguistically a system that is too complex. The rules in a fuzzy model are generally built from human expertise and heuristic knowledge of the system being modeled. This technique has since been developed into one that can identify rules from a database that has been grouped according to structural similarity. In this case, the fuzzy clustering method serves to find groups of data. The information produced by this clustering method, namely the cluster centers, is used to form the rules of the fuzzy inference system. This undergraduate thesis discusses the application of a fuzzy inference system with the fuzzy subtractive clustering method, that is, building a fuzzy inference system using a first-order Takagi-Sugeno fuzzy model. The fuzzy subtractive clustering method is then applied to model a marketing problem, namely predicting market demand for a dairy product. The application was built using Borland Delphi 6.0. Testing gave a smallest prediction error of 0.08% (Error Average).

  12. New Bayesian inference method using two steps of Markov chain Monte Carlo and its application to shock tube experiment data of Furan oxidation

    KAUST Repository

    Kim, Daesang

    2016-01-06

    A new Bayesian inference method has been developed and applied to furan shock tube experimental data for efficient statistical inference of the Arrhenius parameters of two OH radical consumption reactions. The collected experimental data, which consist of time series signals of OH radical concentrations from 14 shock tube experiments, may require several days of MCMC computation even with the support of a fast surrogate of the combustion simulation model, while the new method reduces this to several hours by splitting the process into two MCMC steps: the first infers the rate constants and the second infers the Arrhenius parameters. Each step has a low-dimensional parameter space, and the second step does not require running the combustion simulation. Furthermore, the new approach offers more flexibility in choosing the ranges of the inference parameters, and the higher speed and flexibility enable more accurate inferences and analyses of how errors in the measured temperatures and in the alignment of the experimental time propagate to the inference results.
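
    The reason the second step needs no combustion simulation can be seen from the Arrhenius relation itself: once rate constants k are available at several temperatures T, ln k is linear in 1/T. The sketch below assumes the unmodified form k = A exp(-Ea/(RT)) and substitutes ordinary least squares for the MCMC used in the record; `fit_arrhenius` and the synthetic data are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def fit_arrhenius(temps, ks):
    """Least-squares fit of ln k = ln A - Ea / (R T); returns (A, Ea)."""
    xs = [1.0 / t for t in temps]
    ys = [math.log(k) for k in ks]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    return math.exp(intercept), -slope * R


# Synthetic rate constants generated from assumed parameters (not measured data).
A_true, Ea_true = 1.0e13, 150.0e3
temps = [1200.0, 1400.0, 1600.0, 1800.0]
ks = [A_true * math.exp(-Ea_true / (R * t)) for t in temps]
A_fit, Ea_fit = fit_arrhenius(temps, ks)
print(f"A = {A_fit:.3e}, Ea = {Ea_fit / 1e3:.1f} kJ/mol")
```

    In a Bayesian version, the per-temperature rate constants from the first step would carry posterior uncertainties, and the second step would sample A and Ea against them instead of computing a point estimate.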

  13. A combined evidence Bayesian method for human ancestry inference applied to Afro-Colombians.

    Science.gov (United States)

    Rishishwar, Lavanya; Conley, Andrew B; Vidakovic, Brani; Jordan, I King

    2015-12-15

    Uniparental genetic markers, mitochondrial DNA (mtDNA) and Y chromosomal DNA, are widely used for the inference of human ancestry. However, the resolution of ancestral origins based on mtDNA haplotypes is limited by the fact that such haplotypes are often found to be distributed across wide geographical regions. We have addressed this issue here by combining two sources of ancestry information that have typically been considered separately: historical records regarding population origins and genetic information on mtDNA haplotypes. To combine these distinct data sources, we applied a Bayesian approach that considers historical records, in the form of prior probabilities, together with data on the geographical distribution of mtDNA haplotypes, formulated as likelihoods, to yield ancestry assignments from posterior probabilities. This combined evidence Bayesian approach to ancestry assignment was evaluated for its ability to accurately assign sub-continental African ancestral origins to Afro-Colombians based on their mtDNA haplotypes. We demonstrate that the incorporation of historical prior probabilities via this analytical framework can provide for substantially increased resolution in sub-continental African ancestry assignment for members of this population. In addition, a personalized approach to ancestry assignment that involves the tuning of priors to individual mtDNA haplotypes yields even greater resolution for individual ancestry assignment. Despite the fact that Colombia has a large population of Afro-descendants, the ancestry of this community has been understudied relative to populations with primarily European and Native American ancestry. Thus, the application of the kind of combined evidence approach developed here to the study of ancestry in the Afro-Colombian population has the potential to be impactful. The formal Bayesian analytical framework we propose for combining historical and genetic information also has the potential to be widely applied
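
    The combination of historical priors with haplotype likelihoods described in this abstract is an application of Bayes' rule over candidate source regions. A minimal sketch follows; the region labels and all probabilities are hypothetical placeholders, not values from the study.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over discrete hypotheses: posterior is prior times likelihood, normalized."""
    unnorm = {region: priors[region] * likelihoods[region] for region in priors}
    total = sum(unnorm.values())
    return {region: value / total for region, value in unnorm.items()}


# Hypothetical inputs: priors from historical records, likelihoods from the
# frequency of one mtDNA haplotype in each candidate source region.
priors = {"region_A": 0.6, "region_B": 0.3, "region_C": 0.1}
likelihoods = {"region_A": 0.02, "region_B": 0.08, "region_C": 0.01}
post = posterior(priors, likelihoods)
best = max(post, key=post.get)
print(best, {region: round(p, 3) for region, p in post.items()})
```

    In this toy case the haplotype is common enough in region_B to overturn the historical prior favoring region_A, which is the kind of interaction between the two evidence sources that the combined approach relies on.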

  14. Further Insight and Additional Inference Methods for Polynomial Regression Applied to the Analysis of Congruence

    Science.gov (United States)

    Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti

    2010-01-01

    In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…

  15. Practical Bayesian Inference

    Science.gov (United States)

    Bailer-Jones, Coryn A. L.

    2017-04-01

    Preface; 1. Probability basics; 2. Estimation and uncertainty; 3. Statistical models and inference; 4. Linear models, least squares, and maximum likelihood; 5. Parameter estimation: single parameter; 6. Parameter estimation: multiple parameters; 7. Approximating distributions; 8. Monte Carlo methods for inference; 9. Parameter estimation: Markov chain Monte Carlo; 10. Frequentist hypothesis testing; 11. Model comparison; 12. Dealing with more complicated problems; References; Index.

  16. An operant-based detection method for inferring tinnitus in mice.

    Science.gov (United States)

    Zuo, Hongyan; Lei, Debin; Sivaramakrishnan, Shobhana; Howie, Benjamin; Mulvany, Jessica; Bao, Jianxin

    2017-11-01

    Subjective tinnitus is a hearing disorder in which a person perceives sound when no external sound is present. It can be acute or chronic. Because our current understanding of its pathology is incomplete, no effective cures have yet been established. Mouse models are useful for studying the pathophysiology of tinnitus as well as for developing therapeutic treatments. We have developed a new method for determining acute and chronic tinnitus in mice, called sound-based avoidance detection (SBAD). The SBAD method utilizes one paradigm to detect tinnitus and another paradigm to monitor possible confounding factors, such as motor impairment, loss of motivation, and deficits in learning and memory. The SBAD method has succeeded in monitoring both acute and chronic tinnitus in mice. Its detection ability is further validated by functional studies demonstrating an abnormal increase in neuronal activity in the inferior colliculus of mice that had previously been identified as having tinnitus by the SBAD method. The SBAD method provides a new means by which investigators can detect tinnitus in a single mouse accurately and with more control over potential confounding factors than existing methods. This work establishes a new behavioral method for detecting tinnitus in mice. The detection outcome is consistent with functional validation. One key advantage of mouse models is they provide researchers the opportunity to utilize an extensive array of genetic tools. This new method could lead to a deeper understanding of the molecular pathways underlying tinnitus pathology. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Managing Operational Risk Related to Microfinance Lending Process using Fuzzy Inference System based on the FMEA Method: Moroccan Case Study

    Directory of Open Access Journals (Sweden)

    Alaoui Youssef Lamrani

    2017-12-01

    Full Text Available Managing operational risk efficiently is a critical factor for microfinance institutions (MFIs) to achieve a financial and social return. The purpose of this paper is to identify, assess and prioritize the root causes of failure within the microfinance lending process (MLP), especially in Moroccan microfinance institutions. Considering the limitations of the traditional failure mode and effect analysis (FMEA) method in assessing and classifying risks, the methodology adopted in this study focuses on developing a fuzzy logic inference system (FLIS) based on FMEA. This approach can take into account the subjectivity of risk indicators and the insufficiency of statistical data. The results show that Moroccan MFIs need to focus more on customer relationship management and give more importance to staff training, client screening and business analysis.

  18. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    Science.gov (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  19. Reconstruction of gene regulatory modules from RNA silencing of IFN-α modulators: experimental set-up and inference method.

    Science.gov (United States)

    Grassi, Angela; Di Camillo, Barbara; Ciccarese, Francesco; Agnusdei, Valentina; Zanovello, Paola; Amadori, Alberto; Finesso, Lorenzo; Indraccolo, Stefano; Toffolo, Gianna Maria

    2016-03-12

    Inference of gene regulation from expression data may help to unravel regulatory mechanisms involved in complex diseases or in the action of specific drugs. A challenging task for many researchers working in the field of systems biology is to set up an experiment with a limited budget and produce a dataset suitable for reconstructing putative regulatory modules worthy of biological validation. Here, we focus on small-scale gene expression screens and we introduce a novel experimental set-up and a customized method of analysis to make inference on regulatory modules starting from genetic perturbation data, e.g. knockdown and overexpression data. To illustrate the utility of our strategy, it was applied to produce and analyze a dataset of quantitative real-time RT-PCR data, in which interferon-α (IFN-α) transcriptional response in endothelial cells is investigated by RNA silencing of two candidate IFN-α modulators, STAT1 and IFIH1. A putative regulatory module was reconstructed by our method, revealing an intriguing feed-forward loop, in which STAT1 regulates IFIH1 and they both negatively regulate IFNAR1. STAT1 regulation of IFNAR1 was the object of experimental validation at the protein level. A detailed description of the experimental set-up and of the analysis procedure is reported, with the intent of inspiring other scientists who want to carry out similar experiments to reconstruct gene regulatory modules starting from perturbations of possible regulators. Application of our approach to the study of IFN-α transcriptional response modulators in endothelial cells has led to many interesting novel findings and new biological hypotheses worthy of validation.

  20. A new method to infer vegetation boundary movement from 'snapshot' data

    NARCIS (Netherlands)

    Eppinga, M.B.; Pucko, C.A.; Baudena, M.; Beckage, B.; Molofsky, J.

    2012-01-01

    Global change may induce shifts in plant community distributions at multiple spatial scales. At the ecosystem scale, such shifts may result in movement of ecotones or vegetation boundaries. Most indicators for ecosystem change require timeseries data, but here a new method is proposed enabling

  1. Bayesian inference for data assimilation using Least-Squares Finite Element methods

    International Nuclear Information System (INIS)

    Dwight, Richard P

    2010-01-01

    It has recently been observed that Least-Squares Finite Element methods (LS-FEMs) can be used to assimilate experimental data into approximations of PDEs in a natural way, as shown by Heyes et al. in the case of incompressible Navier-Stokes flow. The approach was shown to be effective without regularization terms, and can handle substantial noise in the experimental data without filtering. Of great practical importance is that - unlike other data assimilation techniques - it is not significantly more expensive than a single physical simulation. However the method as presented so far in the literature is not set in the context of an inverse problem framework, so that for example the meaning of the final result is unclear. In this paper it is shown that the method can be interpreted as finding a maximum a posteriori (MAP) estimator in a Bayesian approach to data assimilation, with normally distributed observational noise, and a Bayesian prior based on an appropriate norm of the governing equations. In this setting the method may be seen to have several desirable properties: most importantly discretization and modelling error in the simulation code does not affect the solution in limit of complete experimental information, so these errors do not have to be modelled statistically. Also the Bayesian interpretation better justifies the choice of the method, and some useful generalizations become apparent. The technique is applied to incompressible Navier-Stokes flow in a pipe with added velocity data, where its effectiveness, robustness to noise, and application to inverse problems is demonstrated.

  2. Analysis on the reconstruction accuracy of the Fitch method for inferring ancestral states

    Directory of Open Access Journals (Sweden)

    Grünewald Stefan

    2011-01-01

    Full Text Available Abstract Background: As one of the most widely used parsimony methods for ancestral reconstruction, the Fitch method minimizes the total number of hypothetical substitutions along all branches of a tree to explain the evolution of a character. Because the method is so widely used, its reconstruction accuracy has become a subject of study in recent years. However, most studies are restricted to 2-state evolutionary models, and a study of higher-state models is needed, since DNA sequences are 4-state series and protein sequences have 20 states. Results: In this paper, the ambiguous and unambiguous reconstruction accuracies of the Fitch method are studied for N-state evolutionary models. Given an arbitrary phylogenetic tree, a recurrence system is first presented to calculate the two accuracies iteratively. As the complete binary tree and the comb-shaped tree are the two extremal evolutionary tree topologies with respect to balance, we focus on the reconstruction accuracies on these two topologies and analyze their asymptotic properties. Then, 1000 Yule trees with 1024 leaves are generated and analyzed to simulate real evolutionary scenarios. It is known that more taxa do not necessarily increase the reconstruction accuracies under 2-state models. The result under N-state models is also tested. Conclusions: In a large tree with many leaves, the reconstruction accuracies of using all taxa are sometimes less than those of using a leaf subset under N-state models. For complete binary trees, there always exists an equilibrium interval [a, b] of conservation probability, in which the limiting ambiguous reconstruction accuracy equals the probability of randomly picking a state. The value b decreases as the number of states increases, and it seems to converge. When the conservation probability is greater than b, the reconstruction accuracies of the Fitch method increase rapidly. The reconstruction
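
    For readers unfamiliar with the Fitch method discussed in this record, its bottom-up pass can be sketched in a few lines. The nested-tuple tree encoding, the function name `fitch` and the example states are hypothetical; the sketch covers only the standard state-set computation at the root, not the accuracy analysis of the paper.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree: a taxon name (leaf) or a 2-tuple of subtrees (hypothetical encoding).
    leaf_states: dict mapping taxon name -> observed character state.
    Returns (state set at this node, minimum number of substitutions below it).
    """
    if not isinstance(tree, tuple):              # leaf: its own observed state
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:                                   # children agree: no new substitution
        return common, left_cost + right_cost
    # Children disagree: keep the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical five-taxon example with a 4-state (DNA-like) character.
tree = ((("a", "b"), "c"), ("d", "e"))
states = {"a": "A", "b": "C", "c": "A", "d": "G", "e": "G"}
root_set, cost = fitch(tree, states)
print(sorted(root_set), cost)
```

    Here the root set contains two states, so the root reconstruction is ambiguous; a root set with a single state would be an unambiguous reconstruction, which is the distinction behind the two accuracies studied in the abstract.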

  3. PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

    Directory of Open Access Journals (Sweden)

    Francisco Alexandre P

    2012-05-01

    Full Text Available Abstract Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.

  4. A Hamiltonian Monte–Carlo method for Bayesian inference of supermassive black hole binaries

    International Nuclear Information System (INIS)

    Porter, Edward K; Carré, Jérôme

    2014-01-01

    We investigate the use of a Hamiltonian Monte–Carlo to map out the posterior density function for supermassive black hole binaries. While previous Markov Chain Monte–Carlo (MCMC) methods, such as Metropolis–Hastings MCMC, have been successfully employed for a number of different gravitational wave sources, these methods are essentially random walk algorithms. The Hamiltonian Monte–Carlo treats the inverse likelihood surface as a ‘gravitational potential’ and by introducing canonical positions and momenta, dynamically evolves the Markov chain by solving Hamilton's equations of motion. This method is not as widely used as other MCMC algorithms due to the necessity of calculating gradients of the log-likelihood, which for most applications results in a bottleneck that makes the algorithm computationally prohibitive. We circumvent this problem by using accepted initial phase-space trajectory points to analytically fit for each of the individual gradients. Eliminating the waveform generation needed for the numerical derivatives reduces the total number of required templates for a 10^6 iteration chain from ∼10^9 to ∼10^6. The result is an implementation of the Hamiltonian Monte–Carlo that is faster, and more efficient by a factor of approximately the dimension of the parameter space, than a Hessian MCMC. (paper)
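
    The mechanics described above (momenta drawn at random, Hamilton's equations integrated along a trajectory, then an accept or reject decision) can be sketched for a toy problem. The target below is a one-dimensional standard normal with a known gradient, an assumption made purely for illustration; the gradient-fitting scheme of the paper is not reproduced, and `hmc_step`, the step size and the trajectory length are hypothetical choices.

```python
import math
import random


def potential(q):
    """Negative log-density of the toy target (standard normal, up to a constant)."""
    return 0.5 * q * q


def grad_potential(q):
    return q


def hmc_step(q, rng, step=0.1, n_leapfrog=20):
    """One Hamiltonian Monte Carlo update using leapfrog integration."""
    p = rng.gauss(0.0, 1.0)                        # fresh momentum each iteration
    h_old = potential(q) + 0.5 * p * p
    q_new = q
    p_new = p - 0.5 * step * grad_potential(q_new)     # initial half step in momentum
    for i in range(n_leapfrog):
        q_new += step * p_new                          # full step in position
        if i < n_leapfrog - 1:
            p_new -= step * grad_potential(q_new)      # full step in momentum
    p_new -= 0.5 * step * grad_potential(q_new)        # final half step in momentum
    h_new = potential(q_new) + 0.5 * p_new * p_new
    # Metropolis test on the change in total energy.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False


rng = random.Random(1)
q, samples, accepted = 0.0, [], 0
for _ in range(5000):
    q, ok = hmc_step(q, rng)
    samples.append(q)
    accepted += ok
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
rate = accepted / len(samples)
print(f"mean={mean:.3f} var={var:.3f} acceptance={rate:.3f}")
```

    Every leapfrog step calls `grad_potential`, which is why the cost of gradient evaluation dominates in realistic applications and why replacing numerical derivatives with a cheap fitted gradient matters.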

  5. Smoothed Particle Inference: A Kilo-Parametric Method for X-ray Galaxy Cluster Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, John R.; Marshall, P.J.; /KIPAC, Menlo Park; Andersson, K.; /Stockholm U. /SLAC

    2005-08-05

    We propose an ambitious new method that models the intracluster medium in clusters of galaxies as a set of X-ray emitting smoothed particles of plasma. Each smoothed particle is described by a handful of parameters including temperature, location, size, and elemental abundances. Hundreds to thousands of these particles are used to construct a model cluster of galaxies, with the appropriate complexity estimated from the data quality. This model is then compared iteratively with X-ray data in the form of adaptively binned photon lists via a two-sample likelihood statistic and iterated via Markov Chain Monte Carlo. The complex cluster model is propagated through the X-ray instrument response using direct sampling Monte Carlo methods. Using this approach the method can reproduce many of the features observed in the X-ray emission in a less assumption-dependent way than traditional analyses, and it allows for a more detailed characterization of the density, temperature, and metal abundance structure of clusters. Multi-instrument X-ray analyses and simultaneous X-ray, Sunyaev-Zeldovich (SZ), and lensing analyses are a straightforward extension of this methodology. Significant challenges still exist in understanding the degeneracy in these models and the statistical noise induced by the complexity of the models.

  6. Near-field hazard assessment of March 11, 2011 Japan Tsunami sources inferred from different methods

    Science.gov (United States)

    Wei, Y.; Titov, V.V.; Newman, A.; Hayes, G.; Tang, L.; Chamberlin, C.

    2011-01-01

    Tsunami source is the origin of the subsequent transoceanic water waves, and thus the most critical component in modern tsunami forecast methodology. Although impractical to be quantified directly, a tsunami source can be estimated by different methods based on a variety of measurements provided by deep-ocean tsunameters, seismometers, GPS, and other advanced instruments, some in real time, some in post real-time. Here we assess these different sources of the devastating March 11, 2011 Japan tsunami by model-data comparison for generation, propagation and inundation in the near field of Japan. This study provides a comparative study to further understand the advantages and shortcomings of different methods that may be potentially used in real-time warning and forecast of tsunami hazards, especially in the near field. The model study also highlights the critical role of deep-ocean tsunami measurements for high-quality tsunami forecast, and its combination with land GPS measurements may lead to better understanding of both the earthquake mechanisms and tsunami generation process. © 2011 MTS.

  7. Application of the EXtrapolated Efficiency Method (EXEM) to infer the gamma-cascade detection efficiency in the actinide region

    International Nuclear Information System (INIS)

    Ducasse, Q.; Jurado, B.; Mathieu, L.; Marini, P.; Morillon, B.; Aiche, M.; Tsekhanovich, I.

    2016-01-01

    The study of transfer-induced gamma-decay probabilities is very useful for understanding the surrogate-reaction method and, more generally, for constraining statistical-model calculations. One of the main difficulties in the measurement of gamma-decay probabilities is the determination of the gamma-cascade detection efficiency. In Boutoux et al. (2013) [10] we developed the EXtrapolated Efficiency Method (EXEM), a new method to measure this quantity. In this work, we have applied, for the first time, the EXEM to infer the gamma-cascade detection efficiency in the actinide region. In particular, we have considered the ^{238}U(d,p)^{239}U and ^{238}U(^{3}He,d)^{239}Np reactions. We have performed Hauser–Feshbach calculations to interpret our results and to verify the hypothesis on which the EXEM is based. The determination of fission and gamma-decay probabilities of ^{239}Np below the neutron separation energy allowed us to validate the EXEM.

  8. Integration of Adaptive Neuro-Fuzzy Inference System, Neural Networks and Geostatistical Methods for Fracture Density Modeling

    Directory of Open Access Journals (Sweden)

    Ja’fari A.

    2014-01-01

    Full Text Available Image logs provide useful information for fracture study in naturally fractured reservoirs. Fracture dip, azimuth, aperture and fracture density can be obtained from image logs and have great importance in naturally fractured reservoir characterization. Imaging all fractured parts of hydrocarbon reservoirs and interpreting the results is expensive and time consuming. In this study, we propose an improved method that establishes a quantitative correlation between fracture densities obtained from image logs and conventional well log data by integrating different artificial intelligence systems. The proposed method combines the results of Adaptive Neuro-Fuzzy Inference System (ANFIS) and Neural Networks (NN) algorithms for overall estimation of fracture density from conventional well log data. A simple averaging method was used to obtain a better result by combining the results of ANFIS and NN. The algorithm was applied to other wells of the field to obtain fracture density. In order to model the fracture density in the reservoir, we used variography and sequential simulation algorithms such as Sequential Indicator Simulation (SIS) and Truncated Gaussian Simulation (TGS). The overall algorithm was applied to the Asmari reservoir in one of the SW Iranian oil fields. Histogram analysis was applied to control the quality of the obtained models. Results of this study show that for a higher number of fracture facies the TGS algorithm works better than SIS, but for a small number of fracture facies both algorithms provide approximately the same results.

  9. Application of the EXtrapolated Efficiency Method (EXEM) to infer the gamma-cascade detection efficiency in the actinide region

    Energy Technology Data Exchange (ETDEWEB)

    Ducasse, Q. [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France); CEA-Cadarache, DEN/DER/SPRC/LEPh, 13108 Saint Paul lez Durance (France); Jurado, B., E-mail: jurado@cenbg.in2p3.fr [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France); Mathieu, L.; Marini, P. [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France); Morillon, B. [CEA DAM DIF, 91297 Arpajon (France); Aiche, M.; Tsekhanovich, I. [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France)

    2016-08-01

    The study of transfer-induced gamma-decay probabilities is very useful for understanding the surrogate-reaction method and, more generally, for constraining statistical-model calculations. One of the main difficulties in the measurement of gamma-decay probabilities is the determination of the gamma-cascade detection efficiency. In Boutoux et al. (2013) [10] we developed the EXtrapolated Efficiency Method (EXEM), a new method to measure this quantity. In this work, we have applied, for the first time, the EXEM to infer the gamma-cascade detection efficiency in the actinide region. In particular, we have considered the ^{238}U(d,p)^{239}U and ^{238}U(^{3}He,d)^{239}Np reactions. We have performed Hauser–Feshbach calculations to interpret our results and to verify the hypothesis on which the EXEM is based. The determination of fission and gamma-decay probabilities of ^{239}Np below the neutron separation energy allowed us to validate the EXEM.

  10. Distributional Inference

    NARCIS (Netherlands)

    Kroese, A.H.; van der Meulen, E.A.; Poortema, Klaas; Schaafsma, W.

    1995-01-01

    The making of statistical inferences in distributional form is conceptually complicated because the epistemic 'probabilities' assigned are mixtures of fact and fiction. In this respect they are essentially different from 'physical' or 'frequency-theoretic' probabilities. The distributional form is

  11. Inferring global upper-mantle shear attenuation structure by waveform tomography using the spectral element method

    Science.gov (United States)

    Karaoǧlu, Haydar; Romanowicz, Barbara

    2018-06-01

    We present a global upper-mantle shear wave attenuation model that is built through a hybrid full-waveform inversion algorithm applied to long-period waveforms, using the spectral element method for wavefield computations. Our inversion strategy is based on an iterative approach that involves the inversion for successive updates in the attenuation parameter (δ Q^{-1}_μ) and elastic parameters (isotropic velocity VS, and radial anisotropy parameter ξ) through a Gauss-Newton-type optimization scheme that employs envelope- and waveform-type misfit functionals for the two steps, respectively. We also include source and receiver terms in the inversion steps for attenuation structure. We conducted a total of eight iterations (six for attenuation and two for elastic structure), and one inversion for updates to source parameters. The starting model included the elastic part of the relatively high-resolution 3-D whole mantle seismic velocity model, SEMUCB-WM1, which served to account for elastic focusing effects. The data set is a subset of the three-component surface waveform data set, filtered between 400 and 60 s, that contributed to the construction of the whole-mantle tomographic model SEMUCB-WM1. We applied strict selection criteria to this data set for the attenuation iteration steps, and investigated the effect of attenuation crustal structure on the retrieved mantle attenuation structure. While a constant 1-D Qμ model with a constant value of 165 throughout the upper mantle was used as starting model for attenuation inversion, we were able to recover, in depth extent and strength, the high-attenuation zone present in the depth range 80-200 km. The final 3-D model, SEMUCB-UMQ, shows strong correlation with tectonic features down to 200-250 km depth, with low attenuation beneath the cratons, stable parts of continents and regions of old oceanic crust, and high attenuation along mid-ocean ridges and backarcs. Below 250 km, we observe strong attenuation in the

  12. Gradient matching methods for computational inference in mechanistic models for systems biology: a review and comparative analysis

    Directory of Open Access Journals (Sweden)

    Benn Macdonald

    2015-11-01

    Parameter inference in mathematical models of biological pathways, expressed as coupled ordinary differential equations (ODEs), is a challenging problem in contemporary systems biology. Conventional methods involve repeatedly solving the ODEs by numerical integration, which is computationally onerous and does not scale up to complex systems. Aimed at reducing the computational costs, new concepts based on gradient matching have recently been proposed in the computational statistics and machine learning literature. In a preliminary smoothing step, the time series data are interpolated; then, in a second step, the parameters of the ODEs are optimised so as to minimise some metric measuring the difference between the slopes of the tangents to the interpolants, and the time derivatives from the ODEs. In this way, the ODEs never have to be solved explicitly. This review provides a concise methodological overview of the current state-of-the-art methods for gradient matching in ODEs, followed by an empirical comparative evaluation based on a set of widely used and representative benchmark data.
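
    The two-step idea of gradient matching can be sketched on a toy problem. The example below is an illustration under stated assumptions, not one of the reviewed methods: the ODE dx/dt = -theta * x, the noise-free synthetic data, the use of central differences as a stand-in for the interpolant's slopes, and the squared-error metric are all choices made here for brevity.

```python
import math


def central_slopes(times, values):
    """Step 1 (smoothing stand-in): slopes at interior points by central differences."""
    return [
        (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        for i in range(1, len(values) - 1)
    ]


def fit_decay_rate(times, values):
    """Step 2: choose theta minimising sum_i (slope_i - (-theta * x_i))^2.

    For the assumed ODE dx/dt = -theta * x this least-squares problem has the
    closed-form solution theta = -sum(slope * x) / sum(x * x), so the ODE is
    never integrated numerically.
    """
    slopes = central_slopes(times, values)
    interior = values[1:-1]
    numerator = sum(s * x for s, x in zip(slopes, interior))
    denominator = sum(x * x for x in interior)
    return -numerator / denominator


# Synthetic, noise-free observations of x(t) = exp(-0.5 t) on a uniform grid.
true_theta = 0.5
times = [0.1 * i for i in range(21)]
values = [math.exp(-true_theta * t) for t in times]

theta_hat = fit_decay_rate(times, values)
```

    The estimate differs slightly from the true value because the finite-difference slopes are only approximate, which hints at why the quality of the smoothing step matters in practice.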

  13. Improving statistical inference on pathogen densities estimated by quantitative molecular methods: malaria gametocytaemia as a case study.

    Science.gov (United States)

    Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S

    2015-01-16

    Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.

  14. Multimodel inference and adaptive management

    Science.gov (United States)

    Rehme, S.E.; Powell, L.A.; Allen, Craig R.

    2011-01-01

    Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
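
    One common way to weigh competing models in multimodel inference is through Akaike weights. The sketch below assumes AIC-based weighting, which this abstract does not specify, and the AIC values are invented for illustration.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into normalised model weights.

    Each weight is exp(-delta / 2) divided by the sum over all models,
    where delta is the model's AIC minus the smallest AIC in the set.
    """
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative_likelihoods)
    return [value / total for value in relative_likelihoods]


# Hypothetical AIC scores for three candidate models.
weights = akaike_weights([100.0, 102.0, 110.0])
```

    When no single model carries most of the weight, the inference is equivocal in the sense described above, and recommendations based on the top model alone ignore model selection uncertainty.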

  15. Introductory statistical inference

    CERN Document Server

    Mukhopadhyay, Nitis

    2014-01-01

    This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques. Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist

  16. The Impact of Reconstruction Methods, Phylogenetic Uncertainty and Branch Lengths on Inference of Chromosome Number Evolution in American Daisies (Melampodium, Asteraceae)

    OpenAIRE

    McCann, Jamie; Schneeweiss, Gerald M.; Stuessy, Tod F.; Villaseñor, Jose L.; Weiss-Schneeweiss, Hanna

    2016-01-01

    Chromosome number change (polyploidy and dysploidy) plays an important role in plant diversification and speciation. Investigating chromosome number evolution commonly entails ancestral state reconstruction performed within a phylogenetic framework, which is, however, prone to uncertainty, whose effects on evolutionary inferences are insufficiently understood. Using the chromosomally diverse plant genus Melampodium (Asteraceae) as model group, we assess the impact of reconstruction method (ma...

  17. IntNetLncSim: an integrative network analysis method to infer human lncRNA functional similarity.

    Science.gov (United States)

    Cheng, Liang; Shi, Hongbo; Wang, Zhenzhen; Hu, Yang; Yang, Haixiu; Zhou, Chen; Sun, Jie; Zhou, Meng

    2016-07-26

    Increasing evidence indicates that long non-coding RNAs (lncRNAs) are involved in various biological processes and complex diseases through their interactions with mRNAs and miRNAs. Exploiting interactions between lncRNAs and mRNAs/miRNAs to infer lncRNA functional similarity (LFS) is an effective way to explore the function of lncRNAs and predict novel lncRNA-disease associations. In this article, we propose an integrative framework, IntNetLncSim, to infer LFS by modeling the information flow in an integrated network that comprises both lncRNA-related transcriptional and post-transcriptional information. The performance of IntNetLncSim was evaluated by investigating the relationship of LFS with the similarity of lncRNA-related mRNA sets (LmRSets) and miRNA sets (LmiRSets). LFS computed by IntNetLncSim was significantly positively correlated with the LmRSet (Pearson correlation γ2=0.8424) and LmiRSet (Pearson correlation γ2=0.2601) similarities. In particular, the performance of IntNetLncSim is superior to that of several previous methods. When applying the LFS to identify novel lncRNA-disease relationships, we achieved an area under the ROC curve of 0.7300 on experimentally verified lncRNA-disease associations based on leave-one-out cross-validation. Furthermore, highly ranked lncRNA-disease associations confirmed by literature mining demonstrated the excellent performance of IntNetLncSim. Finally, a web-accessible system was provided for querying LFS and potential lncRNA-disease relationships: http://www.bio-bigdata.com/IntNetLncSim.

  18. Experiment, monitoring, and gradient methods used to infer climate change effects on plant communities yield consistent patterns

    Science.gov (United States)

    Sarah C. Elmendorf; Gregory H.R. Henry; Robert D. Hollister; Anna Maria Fosaa; William A. Gould; Luise Hermanutz; Annika Hofgaard; Ingibjorg I. Jonsdottir; Janet C. Jorgenson; Esther Levesque; Borgbor Magnusson; Ulf Molau; Isla H. Myers-Smith; Steven F. Oberbauer; Christian Rixen; Craig E. Tweedie; Marilyn Walker

    2015-01-01

    Inference about future climate change impacts typically relies on one of three approaches: manipulative experiments, historical comparisons (broadly defined to include monitoring the response to ambient climate fluctuations using repeat sampling of plots, dendroecology, and paleoecology techniques), and space-for-time substitutions derived from sampling along...

  19. Statistical inference

    CERN Document Server

    Rohatgi, Vijay K

    2003-01-01

    Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth

  20. The Impact of Reconstruction Methods, Phylogenetic Uncertainty and Branch Lengths on Inference of Chromosome Number Evolution in American Daisies (Melampodium, Asteraceae).

    Science.gov (United States)

    McCann, Jamie; Schneeweiss, Gerald M; Stuessy, Tod F; Villaseñor, Jose L; Weiss-Schneeweiss, Hanna

    2016-01-01

    Chromosome number change (polyploidy and dysploidy) plays an important role in plant diversification and speciation. Investigating chromosome number evolution commonly entails ancestral state reconstruction performed within a phylogenetic framework, which is, however, prone to uncertainty, whose effects on evolutionary inferences are insufficiently understood. Using the chromosomally diverse plant genus Melampodium (Asteraceae) as model group, we assess the impact of reconstruction method (maximum parsimony, maximum likelihood, Bayesian methods), branch length model (phylograms versus chronograms) and phylogenetic uncertainty (topological and branch length uncertainty) on the inference of chromosome number evolution. We also address the suitability of the maximum clade credibility (MCC) tree as single representative topology for chromosome number reconstruction. Each of the listed factors causes considerable incongruence among chromosome number reconstructions. Discrepancies between inferences on the MCC tree from those made by integrating over a set of trees are moderate for ancestral chromosome numbers, but severe for the difference of chromosome gains and losses, a measure of the directionality of dysploidy. Therefore, reliance on single trees, such as the MCC tree, is strongly discouraged and model averaging, taking both phylogenetic and model uncertainty into account, is recommended. For studying chromosome number evolution, dedicated models implemented in the program ChromEvol and ordered maximum parsimony may be most appropriate. Chromosome number evolution in Melampodium follows a pattern of bidirectional dysploidy (starting from x = 11 to x = 9 and x = 14, respectively) with no prevailing direction.

  1. The Impact of Reconstruction Methods, Phylogenetic Uncertainty and Branch Lengths on Inference of Chromosome Number Evolution in American Daisies (Melampodium, Asteraceae).

    Directory of Open Access Journals (Sweden)

    Jamie McCann

    Chromosome number change (polyploidy and dysploidy) plays an important role in plant diversification and speciation. Investigating chromosome number evolution commonly entails ancestral state reconstruction performed within a phylogenetic framework, which is, however, prone to uncertainty, whose effects on evolutionary inferences are insufficiently understood. Using the chromosomally diverse plant genus Melampodium (Asteraceae) as model group, we assess the impact of reconstruction method (maximum parsimony, maximum likelihood, Bayesian methods), branch length model (phylograms versus chronograms) and phylogenetic uncertainty (topological and branch length uncertainty) on the inference of chromosome number evolution. We also address the suitability of the maximum clade credibility (MCC) tree as single representative topology for chromosome number reconstruction. Each of the listed factors causes considerable incongruence among chromosome number reconstructions. Discrepancies between inferences on the MCC tree from those made by integrating over a set of trees are moderate for ancestral chromosome numbers, but severe for the difference of chromosome gains and losses, a measure of the directionality of dysploidy. Therefore, reliance on single trees, such as the MCC tree, is strongly discouraged and model averaging, taking both phylogenetic and model uncertainty into account, is recommended. For studying chromosome number evolution, dedicated models implemented in the program ChromEvol and ordered maximum parsimony may be most appropriate. Chromosome number evolution in Melampodium follows a pattern of bidirectional dysploidy (starting from x = 11 to x = 9 and x = 14, respectively) with no prevailing direction.

  2. A Bayesian method for characterizing distributed micro-releases: II. inference under model uncertainty with short time-series data.

    Energy Technology Data Exchange (ETDEWEB)

    Marzouk, Youssef; Fast P. (Lawrence Livermore National Laboratory, Livermore, CA); Kraus, M. (Peterson AFB, CO); Ray, J. P.

    2006-01-01

    Terrorist attacks using an aerosolized pathogen preparation have gained credibility as a national security concern after the anthrax attacks of 2001. The ability to characterize such attacks, i.e., to estimate the number of people infected, the time of infection, and the average dose received, is important when planning a medical response. We address this question of characterization by formulating a Bayesian inverse problem predicated on a short time-series of diagnosed patients exhibiting symptoms. To be of relevance to response planning, we limit ourselves to 3-5 days of data. In tests performed with anthrax as the pathogen, we find that these data are usually sufficient, especially if the model of the outbreak used in the inverse problem is an accurate one. In some cases the scarcity of data may initially support outbreak characterizations at odds with the true one, but with sufficient data the correct inferences are recovered; in other words, the inverse problem posed and its solution methodology are consistent. We also explore the effect of model error, i.e., situations in which the model used in the inverse problem is only a partially accurate representation of the outbreak; here, the model predictions and the observations differ by more than random noise. We find that while there is a consistent discrepancy between the inferred and the true characterizations, they are also close enough to be of relevance when planning a response.

  3. An asymptotic inversion method of inferring the sound velocity distribution in the sun from the spectrum of p-mode oscillations

    International Nuclear Information System (INIS)

    Sekii, Takashi; Shibahashi, Hiromoto

    1989-01-01

    We present an inversion method for inferring the sound velocity distribution in the Sun from p-mode oscillation data. The equation governing the p-mode oscillations is reduced to a form similar to the Schroedinger equation in quantum mechanics. By using a quantization rule based on the WKBJ asymptotic method, we derive an integral equation whose solution provides the 'acoustic potential' of the wave equation. The acoustic potential consists of two parts: one is related to the squared sound velocity and depends on the degree of the mode l, while the other is independent of l and dominates in the outer part of the Sun. By examining the l-dependence of the acoustic potential obtained as the solution of the integral equation, we separate these two components of the potential and eventually obtain the sound velocity distribution from a set of eigenfrequencies of p-modes. In order to evaluate the prospects of this inversion method, we perform numerical simulations in which eigenfrequencies of a theoretical solar model are used to reproduce the sound velocity distribution of the model. The error of the inferred sound velocity relative to the true values is estimated to be less than a few percent. (author)

  4. Evaluating the impact of implementation factors on family-based prevention programming: methods for strengthening causal inference.

    Science.gov (United States)

    Crowley, D Max; Coffman, Donna L; Feinberg, Mark E; Greenberg, Mark T; Spoth, Richard L

    2014-04-01

    Despite growing recognition of the important role implementation plays in successful prevention efforts, relatively little work has sought to demonstrate a causal relationship between implementation factors and participant outcomes. In turn, failure to explore the implementation-to-outcome link limits our understanding of the mechanisms essential to successful programming. This gap is partially due to the inability of current methodological procedures within prevention science to account for the multitude of confounders responsible for variation in implementation factors (i.e., selection bias). The current paper illustrates how propensity and marginal structural models can be used to improve causal inferences involving implementation factors not easily randomized (e.g., participant attendance). We first present analytic steps for simultaneously evaluating the impact of multiple implementation factors on prevention program outcome. Then, we demonstrate this approach for evaluating the impact of enrollment and attendance in a family program, over and above the impact of a school-based program, within PROSPER, a large-scale real-world prevention trial. Findings illustrate the capacity of this approach to successfully account for confounders that influence enrollment and attendance, thereby more accurately representing true causal relations. For instance, after accounting for selection bias, we observed a 5% reduction in the prevalence of 11th grade underage drinking for those who chose to receive a family program and school program compared to those who received only the school program. Further, we detected a 7% reduction in underage drinking for those with high attendance in the family program.

  5. New PDE-based methods for image enhancement using SOM and Bayesian inference in various discretization schemes

    International Nuclear Information System (INIS)

    Karras, D A; Mertzios, G B

    2009-01-01

    A novel approach is presented in this paper for improving anisotropic diffusion PDE models, based on the Perona–Malik equation. A solution is proposed from an engineering perspective to adaptively estimate the parameters of the regularizing function in this equation. The goal of such a new adaptive diffusion scheme is to better preserve edges when the anisotropic diffusion PDE models are applied to image enhancement tasks. The proposed adaptive parameter estimation in the anisotropic diffusion PDE model involves self-organizing maps and Bayesian inference to define edge probabilities accurately. The proposed modifications attempt to capture not only simple edges but also difficult textural edges and incorporate their probability in the anisotropic diffusion model. In the context of the application of PDE models to image processing, such adaptive schemes are closely related to the discrete image representation problem and the investigation of more suitable discretization algorithms using constraints derived from image processing theory. The proposed adaptive anisotropic diffusion model illustrates these concepts when it is numerically approximated by various discretization schemes on a database of magnetic resonance images (MRI), where it is shown to be efficient in image filtering and restoration applications.
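
    For orientation, the baseline that this abstract builds on can be sketched in one dimension. The code below is a plain explicit discretization of Perona-Malik diffusion with a fixed contrast parameter, and it is not the adaptive SOM/Bayesian scheme of the paper; the function names, the choice g(s) = 1 / (1 + (s / kappa)^2), the parameter values and the test signal are assumptions made for illustration.

```python
def conductance(gradient, kappa):
    """Edge-stopping function g(s) = 1 / (1 + (s / kappa)^2)."""
    return 1.0 / (1.0 + (gradient / kappa) ** 2)


def perona_malik_1d(signal, kappa=1.0, step=0.2, iterations=20):
    """Explicit 1-D Perona-Malik diffusion with reflecting boundaries."""
    current = list(signal)
    n = len(current)
    for _ in range(iterations):
        # Flux across each interface between sample i and sample i + 1.
        flux = []
        for i in range(n - 1):
            diff = current[i + 1] - current[i]
            flux.append(conductance(diff, kappa) * diff)
        updated = current[:]
        for i in range(n):
            from_right = flux[i] if i < n - 1 else 0.0
            to_left = flux[i - 1] if i > 0 else 0.0
            updated[i] = current[i] + step * (from_right - to_left)
        current = updated
    return current


# A step edge of height ~10 with small ripples of height 0.1 on both sides.
noisy_step = [0.0, 0.1, 0.0, 0.1, 0.0, 10.0, 10.1, 10.0, 10.1, 10.0]
smoothed = perona_malik_1d(noisy_step)
edge_jump = smoothed[5] - smoothed[4]
```

    Because the conductance is close to 1 for the small ripples and close to 0.01 across the large jump, the ripples diffuse while the edge is largely retained. The fixed `kappa` is exactly the kind of parameter that the paper proposes to estimate adaptively.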

  6. A hierarchical method for Bayesian inference of rate parameters from shock tube data: Application to the study of the reaction of hydroxyl with 2-methylfuran

    KAUST Repository

    Kim, Daesang

    2017-06-22

    We developed a novel two-step hierarchical method for the Bayesian inference of the rate parameters of a target reaction from time-resolved concentration measurements in shock tubes. The method was applied to the calibration of the parameters of the reaction of hydroxyl with 2-methylfuran, which is studied experimentally via absorption measurements of the OH radical's concentration following shock-heating. In the first step of the approach, each shock tube experiment is treated independently to infer the posterior distribution of the rate constant and error hyper-parameter that best explains the OH signal. In the second step, these posterior distributions are sampled to calibrate the parameters appearing in the Arrhenius reaction model for the rate constant. Furthermore, the second step is modified and repeated in order to explore alternative rate constant models and to assess the effect of uncertainties in the reflected shock's temperature. Comparisons of the estimates obtained via the proposed methodology against the common least squares approach are presented. The relative merits of the novel Bayesian framework are highlighted, especially with respect to the opportunity to utilize the posterior distributions of the parameters in future uncertainty quantification studies.
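
    The least squares baseline mentioned in this abstract can be sketched as a straight-line fit of ln k against 1/T. This is an illustration only: it assumes the two-parameter Arrhenius form k = A * exp(-Ea / (R * T)), which the abstract does not spell out, and it uses synthetic, noise-free rate constants with invented parameter values rather than any data from the study. It also yields point estimates only, whereas the Bayesian approach described above yields posterior distributions.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def fit_arrhenius(temperatures, rate_constants):
    """Least-squares fit of ln k = ln A - (Ea / R) * (1 / T).

    Returns the point estimates (A, Ea), with Ea in J/mol.
    """
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in rate_constants]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope * R


# Synthetic rate constants generated from assumed "true" parameters.
true_a, true_ea = 1.0e13, 2.0e4
temperatures = [900.0, 1000.0, 1100.0, 1200.0, 1300.0]  # kelvin
rate_constants = [true_a * math.exp(-true_ea / (R * t)) for t in temperatures]

a_hat, ea_hat = fit_arrhenius(temperatures, rate_constants)
```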

  7. Fossil gaps inferred from phylogenies alter the apparent nature of diversification in dragonflies and their relatives

    Directory of Open Access Journals (Sweden)

    Nicholson David B

    2011-09-01

    Background: The fossil record has suggested that clade growth may differ in marine and terrestrial taxa, supporting equilibrial models in the former and expansionist models in the latter. However, incomplete sampling may bias findings based on fossil data alone. To attempt to correct for such bias, we assemble phylogenetic supertrees on one of the oldest clades of insects, the Odonatoidea (dragonflies, damselflies and their extinct relatives), using MRP and MRC. We use the trees to determine when, and in what clades, changes in taxonomic richness have occurred. We then test whether equilibrial or expansionist models are supported by fossil data alone, and whether findings differ when phylogenetic information is used to infer gaps in the fossil record. Results: There is broad agreement in family-level relationships between both supertrees, though with some uncertainty along the backbone of the tree regarding dragonflies (Anisoptera). "Anisozygoptera" are shown to be paraphyletic when fossil information is taken into account. In both trees, decreases in net diversification are associated with species-poor extant families (Neopetaliidae, Hemiphlebiidae), and an upshift is associated with Calopterygidae + Polythoridae. When ghost ranges are inferred from the fossil record, many families are shown to have much earlier origination dates. In a phylogenetic context, the number of family-level lineages is shown to be up to twice as high as the fossil record alone suggests through the Cretaceous and Cenozoic, and a logistic increase in richness is detected in contrast to an exponential increase indicated by fossils alone. Conclusions: Our analysis supports the notion that taxa, which appear to have diversified exponentially using fossil data, may in fact have diversified more logistically. This in turn suggests that one of the major apparent differences between the marine and terrestrial fossil record may simply be an artifact of incomplete sampling.

  8. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    KAUST Repository

    Rodriguez, Jesse M.; Batzoglou, Serafim; Bercovici, Sivan

    2013-01-01

    , accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting

  9. [Confidence interval or p-value--similarities and differences between two important methods of statistical inference of quantitative studies].

    Science.gov (United States)

    Harari, Gil

    2014-01-01

    Statistical significance, commonly expressed as a p-value, and the confidence interval (CI) are common statistical measures and are essential for the statistical analysis of studies in medicine and life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article is intended to describe the methodologies, compare the two methods, assess their suitability for the different needs of study results analysis and explain the situations in which each method should be used.
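
    The complementary nature of the two measures can be shown with a minimal sketch. It assumes the simplest possible setting, a two-sided z-test for a mean with known standard deviation, which the article does not necessarily use; the function name and the numbers are invented for illustration.

```python
import math

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def z_test_with_ci(sample_mean, null_mean, sigma, n):
    """Two-sided z-test p-value and 95% confidence interval for a mean."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    half_width = Z_95 * standard_error
    return p_value, (sample_mean - half_width, sample_mean + half_width)


# Hypothetical study: 25 measurements, mean 5.5, known sigma 1.0, null mean 5.0.
p_value, interval = z_test_with_ci(5.5, 5.0, 1.0, 25)
null_inside_interval = interval[0] <= 5.0 <= interval[1]
```

    In this setting the p-value falls below 0.05 exactly when the 95% interval excludes the null value, so the two agree on significance, while the interval additionally conveys the plausible size of the effect.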

  10. Genealogy-based methods for inference of historical recombination and gene flow and their application in Saccharomyces cerevisiae.

    Science.gov (United States)

    Jenkins, Paul A; Song, Yun S; Brem, Rachel B

    2012-01-01

    Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance.

  11. An empirical Bayes method for updating inferences in analysis of quantitative trait loci using information from related genome scans.

    Science.gov (United States)

    Zhang, Kui; Wiener, Howard; Beasley, Mark; George, Varghese; Amos, Christopher I; Allison, David B

    2006-08-01

    Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. In prioritizing areas for follow-up after an initial genome scan and in evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their scan most merit confidence and follow-up via a subjective-intuitive integration approach. A method that acknowledges the wisdom of this general paradigm but formally borrows information from other scans to increase confidence in objectivity would be a benefit. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan study is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies have an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used for the estimation of QTL location and effect. We evaluate the performance of our method by using extensive simulations based on actual marker spacing and allele frequencies from available data. Results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate the QTL location and effect more precisely, and provide narrower confidence intervals than results from any single individual study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the face of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.

  12. Effects of Three Diagram Instruction Methods on Transfer of Diagram Comprehension Skills: The Critical Role of Inference While Learning

    Science.gov (United States)

    Cromley, Jennifer G.; Bergey, Bradley W.; Fitzhugh, Shannon; Newcombe, Nora; Wills, Theodore W.; Shipley, Thomas F.; Tanaka, Jacqueline C.

    2013-01-01

    Can students be taught to better comprehend the diagrams in their textbooks? Can such teaching transfer to uninstructed diagrams in the same domain or even in a new domain? What methods work best for these goals? Building on previous research showing positive results compared to control groups in both laboratory studies and short-term…

  13. Using Self-Explanations in the Laboratory to Connect Theory and Practice: The Decision/ Explanation/Observation/Inference Writing Method

    Science.gov (United States)

    Van Duzor, Andrea Gay

    2016-01-01

    While many faculty seek to use student-centered, inquiry-based approaches in teaching laboratories, transitioning from traditional to inquiry instruction can be logistically challenging. This paper outlines use of a laboratory notebook and report writing-to-learn method that emphasizes student self-explanations of procedures and outcomes,…

  14. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    KAUST Repository

    Rodriguez, Jesse M.

    2013-01-01

    Studies that map disease genes rely on accurate annotations that indicate whether individuals in the studied cohorts are related to each other or not. For example, in genome-wide association studies, the cohort members are assumed to be unrelated to one another. Investigators can correct for individuals in a cohort with previously-unknown shared familial descent by detecting genomic segments that are shared between them, which are considered to be identical by descent (IBD). Alternatively, elevated frequencies of IBD segments near a particular locus among affected individuals can be indicative of a disease-associated gene. As genotyping studies grow to use increasingly large sample sizes and meta-analyses begin to include many data sets, accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting related pairs of individuals and shared haplotypic segments within these pairs. PARENTE is a computationally-efficient method based on an embedded likelihood ratio test. As demonstrated by the results of our simulations, our method exhibits better accuracy than the current state of the art, and can be used for the analysis of large genotyped cohorts. PARENTE's higher accuracy becomes even more significant in more challenging scenarios, such as detecting shorter IBD segments or when an extremely low false-positive rate is required. PARENTE is publicly and freely available at http://parente.stanford.edu/. © 2013 Springer-Verlag.

  15. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method

    Science.gov (United States)

    Norris, Peter M.; da Silva, Arlindo M.

    2018-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847
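
    The point that a Monte Carlo approach is not gradient-based can be illustrated with a generic random-walk Metropolis sampler. The sketch below is a textbook version on a one-dimensional target, not the authors' cloud-assimilation code; `metropolis`, `log_target`, and the step size are hypothetical names and settings chosen for illustration.

```python
import math
import random


def metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal: no gradient of the target is needed.
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples
```

    Because each proposal is a finite jump that is only accepted or rejected, the chain can move between regions that a linearized, infinitesimal perturbation would never connect. For example, `metropolis(lambda x: -0.5 * x * x, 0.0, 20000)` draws approximately from a standard normal distribution.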

  16. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability Using High-Resolution Cloud Observations. Part 1: Method

    Science.gov (United States)

    Norris, Peter M.; Da Silva, Arlindo M.

    2016-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.

  17. Inside-sediment partitioning of PAH, PCB and organochlorine compounds and inferences on sampling and normalization methods

    International Nuclear Information System (INIS)

    Opel, Oliver; Palm, Wolf-Ulrich; Steffen, Dieter; Ruck, Wolfgang K.L.

    2011-01-01

    Comparability of sediment analyses for semivolatile organic substances is still low. Neither screening of the sediments nor organic-carbon based normalization is sufficient to obtain comparable results. We are showing the interdependency of grain-size effects with inside-sediment organic-matter distribution for PAH, PCB and organochlorine compounds. Surface sediment samples collected by Van-Veen grab were sieved and analyzed for 16 PAH, 6 PCB and 18 organochlorine pesticides (OCP) as well as organic-matter content. Since bulk concentrations are influenced by grain-size effects themselves, we used a novel normalization method based on the sum of concentrations in the separate grain-size fractions of the sediments. By calculating relative normalized concentrations, it was possible to clearly show underlying mechanisms throughout a heterogeneous set of samples. Furthermore, we were able to show that, for comparability, screening at <125 μm is best suited and can be further improved by additional organic-carbon normalization. - Research highlights: → New method for the comparison of heterogeneous sets of sediment samples. → Assessment of organic pollutants partitioning mechanisms in sediments. → Proposed method for more comparable sediment sampling. - Inside-sediment partitioning mechanisms are shown using a new mathematical approach and discussed in terms of sediment sampling and normalization.

  18. Development of an Origin Trace Method based on Bayesian Inference and Artificial Neural Network for Missing or Stolen Nuclear Materials

    Energy Technology Data Exchange (ETDEWEB)

    Bin, Yim Ho; Min, Lee Seung; Min, Kim Kyung; Jeong, Hong Yoon; Kim, Jae Kwang [Nuclear Security Div., Daejeon (Korea, Republic of)

    2014-05-15

    Thus, 'to put nuclear materials under control' is an important issue for the prosperity of mankind. Unfortunately, the number of illicit trafficking incidents involving nuclear materials has increased for decades. Consequently, the security of nuclear materials has recently been in the spotlight. After the 2nd Nuclear Security Summit in Seoul in 2012, the president of Korea showed his devotion to nuclear security. One of Korea's main responses to this interest in nuclear security was to develop a national nuclear forensic support system. The International Atomic Energy Agency (IAEA) published Nuclear Security Series No. 2, 'Nuclear Forensics Support', in 2006 to encourage international cooperation among all IAEA member states in tracking nuclear attributions. The document poses two main questions related to nuclear forensics. The first is 'what type of material is it?', and the second is 'where did the material come from?' The Korea Nuclear Forensic Library (K-NFL) and mathematical methods to trace the origins of missing or stolen nuclear materials (MSNMs) are being developed by the Korea Institute of Nuclear Non-proliferation and Control (KINAC) to answer those questions. Although the K-NFL has been designed to perform many functions, it is being developed to effectively trace the origin of MSNMs and tested to validate the suitability of trace methods. New fuels and spent fuels each need their own trace method because of the different nature of data acquisition. Inductive logic was found to be appropriate for new fuels, which had values as well as a bistable property. On the other hand, machine learning was suitable for spent fuels, which could not be measured and thus needed simulation.

  19. An evaluation of the performance and suitability of R × C methods for ecological inference with known true values.

    Science.gov (United States)

    Plescia, Carolina; De Sio, Lorenzo

    2018-01-01

    Ecological inference refers to the study of individuals using aggregate data and it is used in an impressive number of studies; it is well known, however, that the study of individuals using group data suffers from an ecological fallacy problem (Robinson in Am Sociol Rev 15:351-357, 1950). This paper evaluates the accuracy of two recent methods, the Rosen et al. (Stat Neerl 55:134-156, 2001) and the Greiner and Quinn (J R Stat Soc Ser A (Statistics in Society) 172:67-81, 2009) and the long-standing Goodman's (Am Sociol Rev 18:663-664, 1953; Am J Sociol 64:610-625, 1959) method designed to estimate all cells of R × C tables simultaneously by employing exclusively aggregate data. To conduct these tests we leverage on extensive electoral data for which the true quantities of interest are known. In particular, we focus on examining the extent to which the confidence intervals provided by the three methods contain the true values. The paper also provides important guidelines regarding the appropriate contexts for employing these models.
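
    For orientation, the simplest 2 × 2 form of Goodman's ecological regression can be sketched as an ordinary least-squares fit of each unit's outcome share on its group share. This is a minimal illustration assuming constant group rates across units, not the R × C procedures evaluated in the paper; `goodman_regression` and its arguments are hypothetical names.

```python
def goodman_regression(group_share, outcome_share):
    """Estimate two group-level rates from aggregate shares (2 x 2 case).

    Model assumed for illustration: y_i = b1 * x_i + b2 * (1 - x_i),
    fitted by least squares as y_i = b2 + (b1 - b2) * x_i.
    """
    n = len(group_share)
    mean_x = sum(group_share) / n
    mean_y = sum(outcome_share) / n
    sxx = sum((x - mean_x) ** 2 for x in group_share)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(group_share, outcome_share))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # b1 is the fitted value at x = 1, b2 the fitted value at x = 0.
    return intercept + slope, intercept
```

    Nothing in the fit constrains the estimates to lie between 0 and 1, which is one well-known weakness of this approach and part of the motivation for comparing it against known true values.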

  20. Design of an expert system based on neuro-fuzzy inference analyzer for on-line microstructural characterization using magnetic NDT method

    International Nuclear Information System (INIS)

    Ghanei, S.; Vafaeenezhad, H.; Kashefi, M.; Eivani, A.R.; Mazinani, M.

    2015-01-01

    Tracing microstructural evolution has a significant importance and priority in manufacturing lines of dual-phase steels. In this paper, an artificial intelligence method is presented for on-line microstructural characterization of dual-phase steels. A new method for microstructure characterization based on the theory of magnetic Barkhausen noise nondestructive testing method is introduced using adaptive neuro-fuzzy inference system (ANFIS). In order to predict the accurate martensite volume fraction of dual-phase steels while eliminating the effect and interference of frequency on the magnetic Barkhausen noise outputs, the magnetic responses were fed into the ANFIS structure in terms of position, height and width of the Barkhausen profiles. The results showed that ANFIS approach has the potential to detect and characterize microstructural evolution while the considerable effect of the frequency on magnetic outputs is overlooked. In fact implementing multiple outputs simultaneously enables ANFIS to approach to the accurate results using only height, position and width of the magnetic Barkhausen noise peaks without knowing the value of the used frequency. - Highlights: • New NDT system for microstructural evaluation based on MBN using ANFIS modeling. • Sensitivity of magnetic Barkhausen noise to microstructure changes of the DP steels. • Accurate prediction of martensite by feeding multiple MBN outputs simultaneously. • Obtaining the modeled output without knowing the amount of the used frequency

  1. Design of an expert system based on neuro-fuzzy inference analyzer for on-line microstructural characterization using magnetic NDT method

    Energy Technology Data Exchange (ETDEWEB)

    Ghanei, S., E-mail: Sadegh.Ghanei@yahoo.com [Department of Materials Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad (Iran, Islamic Republic of); Vafaeenezhad, H. [Centre of Excellence for High Strength Alloys Technology (CEHSAT), School of Metallurgical and Materials Engineering, Iran University of Science and Technology (IUST), Narmak, Tehran (Iran, Islamic Republic of); Kashefi, M. [Department of Materials Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad (Iran, Islamic Republic of); Eivani, A.R. [Centre of Excellence for High Strength Alloys Technology (CEHSAT), School of Metallurgical and Materials Engineering, Iran University of Science and Technology (IUST), Narmak, Tehran (Iran, Islamic Republic of); Mazinani, M. [Department of Materials Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad (Iran, Islamic Republic of)

    2015-04-01

    Tracing microstructural evolution has a significant importance and priority in manufacturing lines of dual-phase steels. In this paper, an artificial intelligence method is presented for on-line microstructural characterization of dual-phase steels. A new method for microstructure characterization based on the theory of magnetic Barkhausen noise nondestructive testing method is introduced using adaptive neuro-fuzzy inference system (ANFIS). In order to predict the accurate martensite volume fraction of dual-phase steels while eliminating the effect and interference of frequency on the magnetic Barkhausen noise outputs, the magnetic responses were fed into the ANFIS structure in terms of position, height and width of the Barkhausen profiles. The results showed that ANFIS approach has the potential to detect and characterize microstructural evolution while the considerable effect of the frequency on magnetic outputs is overlooked. In fact implementing multiple outputs simultaneously enables ANFIS to approach to the accurate results using only height, position and width of the magnetic Barkhausen noise peaks without knowing the value of the used frequency. - Highlights: • New NDT system for microstructural evaluation based on MBN using ANFIS modeling. • Sensitivity of magnetic Barkhausen noise to microstructure changes of the DP steels. • Accurate prediction of martensite by feeding multiple MBN outputs simultaneously. • Obtaining the modeled output without knowing the amount of the used frequency.

  2. Estimating Unbiased Land Cover Change Areas In The Colombian Amazon Using Landsat Time Series And Statistical Inference Methods

    Science.gov (United States)

    Arevalo, P. A.; Olofsson, P.; Woodcock, C. E.

    2017-12-01

    Unbiased estimation of the areas of conversion between land categories ("activity data") and their uncertainty is crucial for providing more robust calculations of carbon emissions to the atmosphere, as well as their removals. This is particularly important for the REDD+ mechanism of UNFCCC where an economic compensation is tied to the magnitude and direction of such fluxes. Dense time series of Landsat data and statistical protocols are becoming an integral part of forest monitoring efforts, but there are relatively few studies in the tropics focused on using these methods to advance operational MRV systems (Monitoring, Reporting and Verification). We present the results of a prototype methodology for continuous monitoring and unbiased estimation of activity data that is compliant with the IPCC Approach 3 for representation of land. We used a break detection algorithm (Continuous Change Detection and Classification, CCDC) to fit pixel-level temporal segments to time series of Landsat data in the Colombian Amazon. The segments were classified using a Random Forest classifier to obtain annual maps of land categories between 2001 and 2016. Using these maps, a biannual stratified sampling approach was implemented and unbiased stratified estimators constructed to calculate area estimates with confidence intervals for each of the stable and change classes. Our results provide evidence of a decrease in primary forest as a result of conversion to pastures, as well as increase in secondary forest as pastures are abandoned and the forest allowed to regenerate. Estimating areas of other land transitions proved challenging because of their very small mapped areas compared to stable classes like forest, which corresponds to almost 90% of the study area. Implications on remote sensing data processing, sample allocation and uncertainty reduction are also discussed.
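
    The idea of a stratified area estimate with a confidence interval can be sketched as follows. The code uses the standard stratified estimator of a proportion, with map strata as weights and reference sample counts per stratum; it is a generic illustration, not the authors' processing chain, and `stratified_area` and its argument names are hypothetical.

```python
import math


def stratified_area(weights, n_sampled, n_in_class, total_area, z=1.96):
    """Stratified estimate of the area of one class with a normal-theory CI.

    weights    : share of the mapped area in each stratum (sums to 1)
    n_sampled  : number of reference sample units in each stratum
    n_in_class : sample units per stratum whose reference label is the class
    """
    p_hat = 0.0
    var = 0.0
    for w, n, k in zip(weights, n_sampled, n_in_class):
        p_h = k / n
        p_hat += w * p_h
        # Within-stratum variance of a sample proportion, weighted by w^2.
        var += w ** 2 * p_h * (1.0 - p_h) / (n - 1)
    area = total_area * p_hat
    half_width = z * total_area * math.sqrt(var)
    return area, area - half_width, area + half_width
```

    A small change class that mostly falls in a small stratum contributes little variance from the large stable strata, provided those strata contain few or no omitted change samples. This is why sample allocation matters for rare transitions.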

  3. Test of Shi et al. Method to Infer the Magnetic Reconnection Geometry from Spacecraft Data: MHD Simulation with Guide Field and Antiparallel Kinetic Simulation

    Science.gov (United States)

    Denton, R.; Sonnerup, B. U. O.; Swisdak, M.; Birn, J.; Drake, J. F.; Heese, M.

    2012-01-01

    When analyzing data from an array of spacecraft (such as Cluster or MMS) crossing a site of magnetic reconnection, it is desirable to be able to accurately determine the orientation of the reconnection site. If the reconnection is quasi-two dimensional, there are three key directions, the direction of maximum inhomogeneity (the direction across the reconnection site), the direction of the reconnecting component of the magnetic field, and the direction of rough invariance (the "out of plane" direction). Using simulated spacecraft observations of magnetic reconnection in the geomagnetic tail, we extend our previous tests of the direction-finding method developed by Shi et al. (2005) and the method to determine the structure velocity relative to the spacecraft Vstr. These methods require data from four proximate spacecraft. We add artificial noise and calibration errors to the simulation fields, and then use the perturbed gradient of the magnetic field B and perturbed time derivative dB/dt, as described by Denton et al. (2010). Three new simulations are examined: a weakly three-dimensional, i.e., quasi-two-dimensional, MHD simulation without a guide field, a quasi-two-dimensional MHD simulation with a guide field, and a two-dimensional full dynamics kinetic simulation with inherent noise so that the apparent minimum gradient was not exactly zero, even without added artificial errors. We also examined variations of the spacecraft trajectory for the kinetic simulation. The accuracy of the directions found varied depending on the simulation and spacecraft trajectory, but all the directions could be found within about 10° for all cases. Various aspects of the method were examined, including how to choose averaging intervals and the best intervals for determining the directions and velocity. For the kinetic simulation, we also investigated in detail how the errors in the inferred gradient directions from the unmodified Shi et al. method (using the unperturbed gradient

  4. Object-Oriented Type Inference

    DEFF Research Database (Denmark)

    Schwartzbach, Michael Ignatieff; Palsberg, Jens

    1991-01-01

    We present a new approach to inferring types in untyped object-oriented programs with inheritance, assignments, and late binding. It guarantees that all messages are understood, annotates the program with type information, allows polymorphic methods, and can be used as the basis of an op...

  5. On Maximum Entropy and Inference

    Directory of Open Access Journals (Sweden)

    Luigi Gresele

    2017-11-01

    Maximum entropy is a powerful concept that entails a sharp separation between relevant and irrelevant variables. It is typically invoked in inference, once an assumption is made on what the relevant variables are, in order to estimate a model from data, that affords predictions on all other (dependent) variables. Conversely, maximum entropy can be invoked to retrieve the relevant variables (sufficient statistics) directly from the data, once a model is identified by Bayesian model selection. We explore this approach in the case of spin models with interactions of arbitrary order, and we discuss how relevant interactions can be inferred. In this perspective, the dimensionality of the inference problem is not set by the number of parameters in the model, but by the frequency distribution of the data. We illustrate the method showing its ability to recover the correct model in a few prototype cases and discuss its application on a real dataset.

  6. Inference Attacks and Control on Database Structures

    Directory of Open Access Journals (Sweden)

    Muhamed Turkanovic

    2015-02-01

    Today’s databases store information with sensitivity levels that range from public to highly sensitive, hence ensuring confidentiality can be highly important, but also requires costly control. This paper focuses on the inference problem on different database structures. It presents possible threats to privacy in relation to inference, and control methods for mitigating these threats. The paper shows that using only access control, without any inference control, is inadequate, since these models are unable to protect against indirect data access. Furthermore, it covers new inference problems which arise from the dimensions of new technologies like XML, semantics, etc.

  7. A Model-Based Evaluation of the Inverse Gaussian Transit-Time Distribution Method for Inferring Anthropogenic Carbon Storage in the Ocean

    Science.gov (United States)

    He, Yan-Chun; Tjiputra, Jerry; Langehaug, Helene R.; Jeansson, Emil; Gao, Yongqi; Schwinger, Jörg; Olsen, Are

    2018-03-01

    The Inverse Gaussian approximation of transit time distribution method (IG-TTD) is widely used to infer the anthropogenic carbon (Cant) concentration in the ocean from measurements of transient tracers such as chlorofluorocarbons (CFCs) and sulfur hexafluoride (SF6). Its accuracy relies on the validity of several assumptions, notably (i) a steady state ocean circulation, (ii) a prescribed age tracer saturation history, e.g., a constant 100% saturation, (iii) a prescribed constant degree of mixing in the ocean, (iv) a constant surface ocean air-sea CO2 disequilibrium with time, and (v) that preformed alkalinity can be sufficiently estimated by salinity or salinity and temperature. Here, these assumptions are evaluated using simulated "model-truth" of Cant. The results give the IG-TTD method a range of uncertainty from 7.8% to 13.6% (11.4 Pg C to 19.8 Pg C) due to above assumptions, which is about half of the uncertainty derived in previous model studies. Assumptions (ii), (iv) and (iii) are the three largest sources of uncertainties, accounting for 5.5%, 3.8% and 3.0%, respectively, while assumptions (i) and (v) only contribute about 0.6% and 0.7%. Regionally, the Southern Ocean contributes the largest uncertainty, of 7.8%, while the North Atlantic contributes about 1.3%. Our findings demonstrate that spatial-dependency of Δ/Γ, and temporal changes in tracer saturation and air-sea CO2 disequilibrium have strong compensating effect on the estimated Cant. The values of these parameters should be quantified to reduce the uncertainty of IG-TTD; this is increasingly important under a changing ocean climate.
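
    For reference, the inverse Gaussian transit-time distribution is commonly written with a mean age Γ and a width Δ. The sketch below evaluates that density and checks numerically that it integrates to one and has mean Γ. The function names, the midpoint integrator, and the test values are illustrative assumptions; the code is not taken from the study.

```python
import math


def ig_ttd(t, mean_age, width):
    """Inverse Gaussian transit-time distribution G(t) for t > 0.

    mean_age is Gamma and width is Delta in the usual notation:
    G(t) = sqrt(Gamma^3 / (4 pi Delta^2 t^3))
           * exp(-Gamma (t - Gamma)^2 / (4 Delta^2 t))
    """
    norm = math.sqrt(mean_age ** 3 / (4.0 * math.pi * width ** 2 * t ** 3))
    return norm * math.exp(-mean_age * (t - mean_age) ** 2
                           / (4.0 * width ** 2 * t))


def integrate(f, upper, dt=0.01):
    """Midpoint-rule integral of f over (0, upper); avoids evaluating t = 0."""
    steps = int(upper / dt)
    return sum(f((i + 0.5) * dt) for i in range(steps)) * dt
```

    With Γ = Δ (a ratio Δ/Γ of 1, used here only as an example), `integrate(lambda t: ig_ttd(t, 10.0, 10.0), 2000.0)` should be close to 1, and the same integral weighted by `t` should be close to Γ. The ratio Δ/Γ sets how much mixing the distribution represents, which is why its assumed constancy appears among the assumptions tested.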

  8. SEMANTIC PATCH INFERENCE

    DEFF Research Database (Denmark)

    Andersen, Jesper

    2009-01-01

    Collateral evolution is the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set ...... specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples....

  9. Causal inference in economics and marketing.

    Science.gov (United States)

    Varian, Hal R

    2016-07-05

    This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual-a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.

  10. Nonparametric predictive inference in statistical process control

    NARCIS (Netherlands)

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2000-01-01

    New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities based on

  11. Inference in "poor" languages

    Energy Technology Data Exchange (ETDEWEB)

    Petrov, S.

    1996-10-01

    Languages with a solvable implication problem but without complete and consistent systems of inference rules ("poor" languages) are considered. The problem of the existence of a finite, complete, and consistent inference rule system for a "poor" language is stated independently of the language or rule syntax. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.

  12. Bayesian statistical inference

    Directory of Open Access Journals (Sweden)

    Bruno De Finetti

    2017-04-01

    This work was translated into English and published in the volume: Bruno De Finetti, Induction and Probability, Biblioteca di Statistica, eds. P. Monari, D. Cocchi, Clueb, Bologna, 1993. Bayesian Statistical Inference is one of the last fundamental philosophical papers in which we can find the essence of De Finetti's approach to statistical inference.

  13. Geometric statistical inference

    International Nuclear Information System (INIS)

    Periwal, Vipul

    1999-01-01

    A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined

  14. Statistical inference an integrated approach

    CERN Document Server

    Migon, Helio S; Louzada, Francisco

    2014-01-01

    Introduction: Information; The concept of probability; Assessing subjective probabilities; An example; Linear algebra and probability; Notation; Outline of the book. Elements of Inference: Common statistical models; Likelihood-based functions; Bayes theorem; Exchangeability; Sufficiency and exponential family; Parameter elimination. Prior Distribution: Entirely subjective specification; Specification through functional forms; Conjugacy with the exponential family; Non-informative priors; Hierarchical priors. Estimation: Introduction to decision theory; Bayesian point estimation; Classical point estimation; Empirical Bayes estimation; Comparison of estimators; Interval estimation; Estimation in the Normal model. Approximating Methods: The general problem of inference; Optimization techniques; Asymptotic theory; Other analytical approximations; Numerical integration methods; Simulation methods. Hypothesis Testing: Introduction; Classical hypothesis testing; Bayesian hypothesis testing; Hypothesis testing and confidence intervals; Asymptotic tests. Prediction...

  15. Knowledge and inference

    CERN Document Server

    Nagao, Makoto

    1990-01-01

    Knowledge and Inference discusses an important problem for software systems: How do we treat knowledge and ideas on a computer and how do we use inference to solve problems on a computer? The book talks about the problems of knowledge and inference for the purpose of merging artificial intelligence and library science. The book begins by clarifying the concept of "knowledge" from many points of view, followed by a chapter on the current state of library science and the place of artificial intelligence in library science. Subsequent chapters cover central topics in the artificial intellig

  16. Logical inference and evaluation

    International Nuclear Information System (INIS)

    Perey, F.G.

    1981-01-01

    Most methodologies of evaluation currently used are based upon the theory of statistical inference. It is generally perceived that this theory is not capable of dealing satisfactorily with what are called systematic errors. Theories of logical inference should be capable of treating all of the information available, including that not involving frequency data. A theory of logical inference is presented as an extension of deductive logic via the concept of plausibility and the application of group theory. Some conclusions, based upon the application of this theory to evaluation of data, are also given

  17. EI: A Program for Ecological Inference

    Directory of Open Access Journals (Sweden)

    Gary King

    2004-09-01

    The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997). Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available. Ecological inferences are required in political science research when individual-level surveys are unavailable (e.g., local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). They are also required in numerous areas of major significance in public policy (e.g., for applying the Voting Rights Act) and other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history.

  18. Statistical inference on residual life

    CERN Document Server

    Jeong, Jong-Hyeon

    2014-01-01

    This is a monograph on the concept of residual life, which is an alternative summary measure of time-to-event data, or survival data. The mean residual life has been used for many years under the name of life expectancy, so it is a natural concept for summarizing survival or reliability data. It is also more interpretable than the popular hazard function, especially for communications between patients and physicians regarding the efficacy of a new drug in the medical field. This book reviews existing statistical methods to infer the residual life distribution. The review and comparison includes existing inference methods for mean and median, or quantile, residual life analysis through medical data examples. The concept of the residual life is also extended to competing risks analysis. The targeted audience includes biostatisticians, graduate students, and PhD (bio)statisticians. Knowledge in survival analysis at an introductory graduate level is advisable prior to reading this book.
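
    The mean residual life at time t is the expected remaining lifetime given survival beyond t. A minimal empirical version for complete (uncensored) data can be sketched as below; real survival data usually involve censoring, which this illustration deliberately ignores, and the function name `mean_residual_life` is hypothetical.

```python
def mean_residual_life(times, t):
    """Empirical mean residual life at t for uncensored event times.

    Averages the remaining time (x - t) over subjects still at risk after t.
    Censoring is not handled; this is an assumption of the sketch.
    """
    survivors = [x for x in times if x > t]
    if not survivors:
        raise ValueError("no observations beyond t")
    return sum(x - t for x in survivors) / len(survivors)
```

    At t = 0 with strictly positive event times this reduces to the ordinary sample mean, which matches the interpretation of mean residual life at birth as life expectancy.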

  19. Statistical inference a short course

    CERN Document Server

    Panik, Michael J

    2012-01-01

    A concise, easily accessible introduction to descriptive and inferential techniques. Statistical Inference: A Short Course offers a concise presentation of the essentials of basic statistics for readers seeking to acquire a working knowledge of statistical concepts, measures, and procedures. The author conducts tests on the assumption of randomness and normality, and provides nonparametric methods when parametric approaches might not work. The book also explores how to determine a confidence interval for a population median while also providing coverage of ratio estimation, randomness, and causal

  20. Probability and Statistical Inference

    OpenAIRE

    Prosper, Harrison B.

    2006-01-01

    These lectures introduce key concepts in probability and statistical inference at a level suitable for graduate students in particle physics. Our goal is to paint as vivid a picture as possible of the concepts covered.

  1. On quantum statistical inference

    NARCIS (Netherlands)

    Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E.

    2003-01-01

    Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, developments in the theory of quantum measurements have

  2. INFERENCE BUILDING BLOCKS

    Science.gov (United States)

    2018-02-15

    expressed a variety of inference techniques on discrete and continuous distributions: exact inference, importance sampling, Metropolis-Hastings (MH...without redoing any math or rewriting any code. And although our main goal is composable reuse, our performance is also good because we can use...control paths. • The Hakaru language can express mixtures of discrete and continuous distributions, but the current disintegration transformation

  3. Gauging Variational Inference

    Energy Technology Data Exchange (ETDEWEB)

    Chertkov, Michael [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]; Ahn, Sungsoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of)]; Shin, Jinwoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of)]

    2017-05-25

    Computing the partition function is the most important statistical inference task arising in applications of Graphical Models (GM). Since it is computationally intractable, approximate methods have been used to resolve the issue in practice, where mean-field (MF) and belief propagation (BP) are arguably the most popular and successful approaches of a variational type. In this paper, we propose two new variational schemes, coined Gauged-MF (G-MF) and Gauged-BP (G-BP), improving MF and BP, respectively. Both provide lower bounds for the partition function by utilizing the so-called gauge transformation, which modifies factors of GM while keeping the partition function invariant. Moreover, we prove that both G-MF and G-BP are exact for GMs with a single loop of a special structure, even though the bare MF and BP perform badly in this case. Our extensive experiments, on complete GMs of relatively small size and on large GMs (up to 300 variables), confirm that the newly proposed algorithms outperform and generalize MF and BP.
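
    To make the notion of a variational lower bound on the partition function concrete, here is a minimal sketch of the plain (ungauged) mean-field bound, not the G-MF or G-BP schemes of the paper. It assumes a small pairwise binary model p(x) proportional to exp(sum_{i<j} J[i][j] x_i x_j + sum_i h[i] x_i) with x_i in {-1, +1}; all function names are hypothetical.

```python
import itertools
import math


def log_partition_exact(J, h):
    """Brute-force log Z by enumerating all 2^n spin configurations."""
    n = len(h)
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        score = sum(h[i] * x[i] for i in range(n))
        score += sum(J[i][j] * x[i] * x[j]
                     for i in range(n) for j in range(i + 1, n))
        total += math.exp(score)
    return math.log(total)


def spin_entropy(m):
    """Entropy of a +/-1 variable whose mean is m."""
    ent = 0.0
    for p in ((1.0 + m) / 2.0, (1.0 - m) / 2.0):
        if p > 0.0:
            ent -= p * math.log(p)
    return ent


def mean_field_lower_bound(J, h, sweeps=200):
    """Coordinate-ascent mean field; returns a lower bound on log Z.

    J must be symmetric with a zero diagonal. Any product distribution
    gives a valid bound, so the result never exceeds the exact log Z.
    """
    n = len(h)
    m = [0.1] * n  # initial means of the factorized distribution
    for _ in range(sweeps):
        for i in range(n):
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            m[i] = math.tanh(field)
    expected_score = sum(h[i] * m[i] for i in range(n))
    expected_score += sum(J[i][j] * m[i] * m[j]
                          for i in range(n) for j in range(i + 1, n))
    return expected_score + sum(spin_entropy(mi) for mi in m)


# A three-node loop with uniform couplings (illustrative values).
J = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
h = [0.1, 0.1, 0.1]
print(mean_field_lower_bound(J, h), "<=", log_partition_exact(J, h))
```

    Brute-force enumeration is only feasible for a handful of variables, which is the intractability the abstract refers to.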

  4. Statistical learning and selective inference.

    Science.gov (United States)

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
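
    The cherry-picking effect can be simulated directly. The sketch below is only an illustration of the problem, not one of the selective inference procedures described in the paper: it assumes 20 independent null test statistics, reports the strongest one, and compares the usual 5% cutoff with a Bonferroni cutoff, used here as a crude stand-in for "a higher bar". All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_features, cutoff, n_trials=2000, seed=0):
    """Share of null experiments whose strongest |z| exceeds the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # All features are pure noise; we "cherry-pick" the largest |z|.
        strongest = max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_features))
        if strongest > cutoff:
            hits += 1
    return hits / n_trials


ALPHA = 0.05
N_FEATURES = 20
naive_cutoff = NormalDist().inv_cdf(1 - ALPHA / 2)
bonferroni_cutoff = NormalDist().inv_cdf(1 - ALPHA / (2 * N_FEATURES))

naive_rate = false_positive_rate(N_FEATURES, naive_cutoff)
adjusted_rate = false_positive_rate(N_FEATURES, bonferroni_cutoff)
print(f"naive cutoff {naive_cutoff:.2f}: false positive rate {naive_rate:.2f}")
print(f"adjusted cutoff {bonferroni_cutoff:.2f}: false positive rate {adjusted_rate:.2f}")
```

    With independent statistics the naive rate should be near 1 - 0.95^20, far above the nominal 5%, while the adjusted cutoff brings it back to roughly the nominal level.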

  5. Bayesian inference with ecological applications

    CERN Document Server

    Link, William A

    2009-01-01

    This text is written to provide a mathematically sound but accessible and engaging introduction to Bayesian inference specifically for environmental scientists, ecologists and wildlife biologists. It emphasizes the power and usefulness of Bayesian methods in an ecological context. The advent of fast personal computers and easily available software has simplified the use of Bayesian and hierarchical models. One obstacle remains for ecologists and wildlife biologists, namely the near absence of Bayesian texts written specifically for them. The book includes many relevant examples, is supported by software and examples on a companion website and will become an essential grounding in this approach for students and research ecologists. Engagingly written text specifically designed to demystify a complex subject. Examples drawn from ecology and wildlife research. An essential grounding for graduate and research ecologists in the increasingly prevalent Bayesian approach to inference. Companion website with analyt...

  6. Bayesian inference on proportional elections.

    Directory of Open Access Journals (Sweden)

    Gabriel Hideki Vatanabe Brunello

    Full Text Available Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee that the candidate with the highest percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied to proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seat distribution. More specifically, a methodology to estimate the probability that a given party will have representation in the Chamber of Deputies was developed. Inferences were made in a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied to data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software.
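
    The general idea of combining a posterior over vote shares with a seat-allocation rule can be sketched as follows. This is a simplified illustration in Python, not the authors' R code: it assumes a uniform Dirichlet prior on vote shares, multinomial poll counts, and a plain D'Hondt highest-averages allocation in place of the actual Brazilian rules, which are not reproduced here. All function names and the poll numbers are hypothetical.

```python
import random


def dhondt(votes, seats):
    """Allocate seats with the D'Hondt highest-averages rule."""
    won = [0] * len(votes)
    for _ in range(seats):
        quotients = [v / (s + 1) for v, s in zip(votes, won)]
        winner = max(range(len(votes)), key=quotients.__getitem__)
        won[winner] += 1
    return won


def sample_dirichlet(alpha, rng):
    """Draw one vector of shares from a Dirichlet(alpha) distribution."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def prob_representation(poll_counts, seats, party, n_sims=5000, seed=1):
    """Monte Carlo estimate of P(party wins at least one seat)."""
    rng = random.Random(seed)
    # Posterior under a uniform Dirichlet prior and multinomial poll data.
    alpha = [count + 1 for count in poll_counts]
    wins = 0
    for _ in range(n_sims):
        shares = sample_dirichlet(alpha, rng)
        if dhondt(shares, seats)[party] > 0:
            wins += 1
    return wins / n_sims


poll = [500, 300, 150, 50]  # hypothetical poll of 1000 respondents
for party in range(len(poll)):
    print(party, prob_representation(poll, seats=10, party=party))
```

    The same loop works with any allocation function, so a faithful implementation of a specific electoral rule could be swapped in for `dhondt`.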

  7. Probability biases as Bayesian inference

    Directory of Open Access Journals (Sweden)

    André C. R. Martins

    2006-11-01

    Full Text Available In this article, I will show how several observed biases in human probabilistic reasoning can be partially explained as good heuristics for making inferences in an environment where probabilities have uncertainties associated with them. Previous results show that the weight functions and the observed violations of coalescing and stochastic dominance can be understood from a Bayesian point of view. We will review those results and see that Bayesian methods should also be used as part of the explanation behind other known biases. That means that, although the observed errors are still errors under the [...], they can be understood as adaptations to the solution of real-life problems. Heuristics that allow fast evaluations and mimic a Bayesian inference would be an evolutionary advantage, since they would give us an efficient way of making decisions. In that sense, it should be no surprise that humans reason with probability as it has been observed.

  8. Statistical inference an integrated Bayesian/likelihood approach

    CERN Document Server

    Aitkin, Murray

    2010-01-01

    Filling a gap in current Bayesian theory, Statistical Inference: An Integrated Bayesian/Likelihood Approach presents a unified Bayesian treatment of parameter inference and model comparisons that can be used with simple diffuse prior specifications. This novel approach provides new solutions to difficult model comparison problems and offers direct Bayesian counterparts of frequentist t-tests and other standard statistical methods for hypothesis testing. After an overview of the competing theories of statistical inference, the book introduces the Bayes/likelihood approach used throughout. It pre

  9. Type Inference with Inequalities

    DEFF Research Database (Denmark)

    Schwartzbach, Michael Ignatieff

    1991-01-01

    Type inference can be phrased as constraint-solving over types. We consider an implicitly typed language equipped with recursive types, multiple inheritance, 1st order parametric polymorphism, and assignments. Type correctness is expressed as satisfiability of a possibly infinite collection of (monotonic) inequalities on the types of variables and expressions. A general result about systems of inequalities over semilattices yields a solvable form. We distinguish between deciding typability (the existence of solutions) and type inference (the computation of a minimal solution). In our case, both...
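
    A toy version of computing a minimal solution to monotonic inequalities over a semilattice can be sketched as a fixed-point iteration. This is not the algorithm of the paper, which handles recursive types and possibly infinite constraint collections; the sketch assumes a finite set of type names, models a type as a set of names ordered by inclusion, and uses hypothetical names throughout.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

TypeSet = FrozenSet[str]
Env = Dict[str, TypeSet]
# A constraint (lhs, var) means lhs(env) must be a subset of env[var],
# where lhs is monotone in env.
Constraint = Tuple[Callable[[Env], TypeSet], str]


def least_solution(variables: List[str], constraints: List[Constraint]) -> Env:
    """Iterate from the bottom element until every inequality holds."""
    env: Env = {v: frozenset() for v in variables}
    changed = True
    while changed:
        changed = False
        for lhs, var in constraints:
            required = lhs(env)
            if not required <= env[var]:
                # Join (set union) is the least change that satisfies it.
                env[var] = env[var] | required
                changed = True
    return env


# Constraints for the toy program:  x = 1; y = "s"; x = y; z = x
constraints: List[Constraint] = [
    (lambda env: frozenset({"Int"}), "x"),
    (lambda env: frozenset({"Str"}), "y"),
    (lambda env: env["y"], "x"),
    (lambda env: env["x"], "z"),
]
solution = least_solution(["x", "y", "z"], constraints)
for name in sorted(solution):
    print(name, sorted(solution[name]))
```

    Because each step only adds type names and the universe of names is finite, the loop terminates, and monotonicity makes the result the smallest assignment satisfying all inequalities.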

  10. Inference as Prediction

    Science.gov (United States)

    Watson, Jane

    2007-01-01

    Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…

  11. Hybrid Optical Inference Machines

    Science.gov (United States)

    1991-09-27

    with labels. Now, events. a set of facts can be generated in the dyadic form "u, R 1,2" Eichmann and Caulfield [19] consider the same type of and can...these encoding schemes. These architectures are based pri- 19. G. Eichmann and H. J. Caulfield, "Optical Learning (Inference)marily on optical inner

  12. Polynomial Chaos Surrogates for Bayesian Inference

    KAUST Repository

    Le Maitre, Olivier

    2016-01-06

    Bayesian inference is a popular probabilistic method to solve inverse problems, such as the identification of a field parameter in a PDE model. The inference relies on the Bayes rule to update the prior density of the sought field, from observations, and derive its posterior distribution. In most cases the posterior distribution has no explicit form and has to be sampled, for instance using a Markov-Chain Monte Carlo method. In practice the prior field parameter is decomposed and truncated (e.g. by means of Karhunen-Loève decomposition) to recast the inference problem into the inference of a finite number of coordinates. Although proved effective in many situations, the Bayesian inference as sketched above faces several difficulties requiring improvements. First, sampling the posterior can be an extremely costly task as it requires multiple resolutions of the PDE model for different values of the field parameter. Second, when the observations are not very informative, the inferred parameter field can depend strongly on its prior, which can be somewhat arbitrary. These issues have motivated the introduction of reduced modeling or surrogates for the (approximate) determination of the parametrized PDE solution and hyperparameters in the description of the prior field. Our contribution focuses on recent developments in these two directions: the acceleration of the posterior sampling by means of Polynomial Chaos expansions and the efficient treatment of parametrized covariance functions for the prior field. We also discuss the possibility of making such an approach adaptive to further improve its efficiency.

  13. Inferring network structure from cascades

    Science.gov (United States)

    Ghonge, Sushrut; Vural, Dervis Can

    2017-07-01

    Many physical, biological, and social phenomena can be described by cascades taking place on a network. Often, the activity can be empirically observed, but not the underlying network of interactions. In this paper we offer three topological methods to infer the structure of any directed network given a set of cascade arrival times. Our formulas hold for a very general class of models where the activation probability of a node is a generic function of its degree and the number of its active neighbors. We report high success rates for synthetic and real networks, for several different cascade models.

  14. Bayesian inference for Hawkes processes

    DEFF Research Database (Denmark)

    Rasmussen, Jakob Gulddahl

    The Hawkes process is a practically and theoretically important class of point processes, but parameter-estimation for such a process can pose various problems. In this paper we explore and compare two approaches to Bayesian inference. The first approach is based on the so-called conditional intensity function, while the second approach is based on an underlying clustering and branching structure in the Hawkes process. For practical use, MCMC (Markov chain Monte Carlo) methods are employed. The two approaches are compared numerically using three examples of the Hawkes process.

  15. Bayesian inference for Hawkes processes

    DEFF Research Database (Denmark)

    Rasmussen, Jakob Gulddahl

    2013-01-01

    The Hawkes process is a practically and theoretically important class of point processes, but parameter-estimation for such a process can pose various problems. In this paper we explore and compare two approaches to Bayesian inference. The first approach is based on the so-called conditional intensity function, while the second approach is based on an underlying clustering and branching structure in the Hawkes process. For practical use, MCMC (Markov chain Monte Carlo) methods are employed. The two approaches are compared numerically using three examples of the Hawkes process.
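
    The conditional intensity function mentioned above can be written down explicitly once a kernel is chosen. The sketch below assumes an exponential kernel, a common choice that the abstract does not specify, with background rate mu, branching ratio alpha and decay rate beta; the function names are hypothetical. The log-likelihood is the quantity an MCMC sampler would combine with a prior.

```python
import math
from typing import Sequence


def conditional_intensity(t: float, history: Sequence[float],
                          mu: float, alpha: float, beta: float) -> float:
    """lambda(t) = mu + sum over earlier events of alpha*beta*exp(-beta*(t - t_i))."""
    excitation = sum(alpha * beta * math.exp(-beta * (t - ti))
                     for ti in history if ti < t)
    return mu + excitation


def log_likelihood(events: Sequence[float], horizon: float,
                   mu: float, alpha: float, beta: float) -> float:
    """Log-likelihood of event times observed on [0, horizon].

    Equals the sum of log lambda(t_i) minus the integral of lambda over
    [0, horizon]; events are assumed to lie inside the window.
    """
    log_term = sum(math.log(conditional_intensity(ti, events, mu, alpha, beta))
                   for ti in events)
    # Each event contributes alpha * (1 - exp(-beta * (horizon - t_i)))
    # to the integrated intensity.
    integral = mu * horizon + sum(
        alpha * (1.0 - math.exp(-beta * (horizon - ti))) for ti in events)
    return log_term - integral


events = [0.7, 1.1, 1.3, 3.9]  # hypothetical event times
print(conditional_intensity(1.2, events, mu=0.5, alpha=0.6, beta=2.0))
print(log_likelihood(events, horizon=5.0, mu=0.5, alpha=0.6, beta=2.0))
```

    Setting alpha to zero removes the self-excitation, so the likelihood reduces to that of a homogeneous Poisson process with rate mu.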

  16. Inference rule and problem solving

    Energy Technology Data Exchange (ETDEWEB)

    Goto, S

    1982-04-01

    Intelligent information processing means having human intellectual activity carried out on a computer, with inference, rather than ordinary calculation, as the basic operational mechanism. Many inference rules are derived from syllogisms in formal logic. The problem of programming this inference function is referred to as problem solving. Although inference and problem solving are logically closely related, the ability of current computers to perform inference is at a low level. To clarify the relation between inference and computers, nonmonotonic logic has been considered. The paper deals with the above topics. 16 references.

  17. Stochastic processes inference theory

    CERN Document Server

    Rao, Malempati M

    2014-01-01

    This is the revised and enlarged 2nd edition of the author's original text, which was intended to be a modest complement to Grenander's fundamental memoir on stochastic processes and related inference theory. The present volume gives a substantial account of regression analysis, both for stochastic processes and measures, and includes recent material on ridge regression with some unexpected applications, for example in econometrics. The first three chapters can be used for a quarter or semester graduate course on inference on stochastic processes. The remaining chapters provide more advanced material on stochastic analysis suitable for graduate seminars and discussions, leading to dissertation or research work. In general, the book will be of interest to researchers in probability theory, mathematical statistics and electrical and information theory.

  18. Making Type Inference Practical

    DEFF Research Database (Denmark)

    Schwartzbach, Michael Ignatieff; Oxhøj, Nicholas; Palsberg, Jens

    1992-01-01

    We present the implementation of a type inference algorithm for untyped object-oriented programs with inheritance, assignments, and late binding. The algorithm significantly improves our previous one, presented at OOPSLA'91, since it can handle collection classes, such as List, in a useful way. Abo......, the complexity has been dramatically improved, from exponential time to low polynomial time. The implementation uses the techniques of incremental graph construction and constraint template instantiation to avoid representing intermediate results, doing superfluous work, and recomputing type information....... Experiments indicate that the implementation type-checks as many as 100 lines per second. This results in a mature product, on which a number of tools can be based, for example a safety tool, an image compression tool, a code optimization tool, and an annotation tool. This may make type inference for object...

  19. Russell and Humean Inferences

    Directory of Open Access Journals (Sweden)

    João Paulo Monteiro

    2001-12-01

    Full Text Available Russell's The Problems of Philosophy tries to establish a new theory of induction, at the same time that Hume is there accused of an irrational "scepticism about induction". But a careful analysis of the theory of knowledge explicitly acknowledged by Hume reveals that, contrary to the standard interpretation in the XXth century, possibly influenced by Russell, Hume deals exclusively with causal inference (which he never classifies as "causal induction", although now we are entitled to do so), never with inductive inference in general, mainly generalizations about sensible qualities of objects (whether, e.g., "all crows are black" or not is not among Hume's concerns). Russell's theories are thus only false alternatives to Hume's, in (1912) or in his (1948).

  20. Causal inference in econometrics

    CERN Document Server

    Kreinovich, Vladik; Sriboonchitta, Songsak

    2016-01-01

    This book is devoted to the analysis of causal inference which is one of the most difficult tasks in data analysis: when two phenomena are observed to be related, it is often difficult to decide whether one of them causally influences the other one, or whether these two phenomena have a common cause. This analysis is the main focus of this volume. To get a good understanding of the causal inference, it is important to have models of economic phenomena which are as accurate as possible. Because of this need, this volume also contains papers that use non-traditional economic models, such as fuzzy models and models obtained by using neural networks and data mining techniques. It also contains papers that apply different econometric models to analyze real-life economic dependencies.

  1. Active inference and learning.

    Science.gov (United States)

    Friston, Karl; FitzGerald, Thomas; Rigoli, Francesco; Schwartenbeck, Philipp; O'Doherty, John; Pezzulo, Giovanni

    2016-09-01

    This paper offers an active inference account of choice behaviour and learning. It focuses on the distinction between goal-directed and habitual behaviour and how they contextualise each other. We show that habits emerge naturally (and autodidactically) from sequential policy optimisation when agents are equipped with state-action policies. In active inference, behaviour has explorative (epistemic) and exploitative (pragmatic) aspects that are sensitive to ambiguity and risk respectively, where epistemic (ambiguity-resolving) behaviour enables pragmatic (reward-seeking) behaviour and the subsequent emergence of habits. Although goal-directed and habitual policies are usually associated with model-based and model-free schemes, we find the more important distinction is between belief-free and belief-based schemes. The underlying (variational) belief updating provides a comprehensive (if metaphorical) process theory for several phenomena, including the transfer of dopamine responses, reversal learning, habit formation and devaluation. Finally, we show that active inference reduces to a classical (Bellman) scheme, in the absence of ambiguity. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  2. Statistical inference for financial engineering

    CERN Document Server

    Taniguchi, Masanobu; Ogata, Hiroaki; Taniai, Hiroyuki

    2014-01-01

    This monograph provides the fundamentals of statistical inference for financial engineering and covers some selected methods suitable for analyzing financial time series data. In order to describe the actual financial data, various stochastic processes, e.g. non-Gaussian linear processes, non-linear processes, long-memory processes, locally stationary processes etc. are introduced and their optimal estimation is considered as well. This book also includes several statistical approaches, e.g., discriminant analysis, the empirical likelihood method, control variate method, quantile regression, realized volatility etc., which have been recently developed and are considered to be powerful tools for analyzing the financial data, establishing a new bridge between time series and financial engineering. This book is well suited as a professional reference book on finance, statistics and statistical financial engineering. Readers are expected to have an undergraduate-level knowledge of statistics.

  3. Learning Convex Inference of Marginals

    OpenAIRE

    Domke, Justin

    2012-01-01

    Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main ...

  4. Probabilistic inductive inference: a survey

    OpenAIRE

    Ambainis, Andris

    2001-01-01

    Inductive inference is a recursion-theoretic theory of learning, first developed by E. M. Gold (1967). This paper surveys developments in probabilistic inductive inference. We mainly focus on finite inference of recursive functions, since this simple paradigm has produced the most interesting (and most complex) results.

  5. Estimating uncertainty of inference for validation

    Energy Technology Data Exchange (ETDEWEB)

    Booker, Jane M [Los Alamos National Laboratory; Langenbrunner, James R [Los Alamos National Laboratory; Hemez, Francois M [Los Alamos National Laboratory; Ross, Timothy J [UNM

    2010-09-30

    first in a series of inference uncertainty estimations. While the methods demonstrated are primarily statistical, these do not preclude the use of nonprobabilistic methods for uncertainty characterization. The methods presented permit accurate determinations for validation and eventual prediction. It is a goal that these methods establish a standard against which best practice may evolve for determining degree of validation.

  6. Grammatical inference algorithms, routines and applications

    CERN Document Server

    Wieczorek, Wojciech

    2017-01-01

    This book focuses on grammatical inference, presenting classic and modern methods of grammatical inference from the perspective of practitioners. To do so, it employs the Python programming language to present all of the methods discussed. Grammatical inference is a field that lies at the intersection of multiple disciplines, with contributions from computational linguistics, pattern recognition, machine learning, computational biology, formal learning theory and many others. Though the book is largely practical, it also includes elements of learning theory, combinatorics on words, the theory of automata and formal languages, plus references to real-world problems. The listings presented here can be directly copied and pasted into other programs, thus making the book a valuable source of ready recipes for students, academic researchers, and programmers alike, as well as an inspiration for their further development.

  7. Meta-learning framework applied in bioinformatics inference system design.

    Science.gov (United States)

    Arredondo, Tomás; Ormazábal, Wladimir

    2015-01-01

    This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow where the user provides feedback with final classification decisions which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various different parameters. The obtained inference systems were also contrasted with other standard classification methods with accurate prediction capabilities observed.

  8. Deep Learning for Population Genetic Inference.

    Science.gov (United States)

    Sheehan, Sara; Song, Yun S

    2016-03-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  9. Deep Learning for Population Genetic Inference.

    Directory of Open Access Journals (Sweden)

    Sara Sheehan

    2016-03-01

    Full Text Available Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  10. Deep Learning for Population Genetic Inference

    Science.gov (United States)

    Sheehan, Sara; Song, Yun S.

    2016-01-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908

  11. Inferring Phylogenetic Networks Using PhyloNet.

    Science.gov (United States)

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  12. Nonparametric statistical inference

    CERN Document Server

    Gibbons, Jean Dickinson

    2010-01-01

    Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference. -Eugenia Stoimenova, Journal of Applied Statistics, June 2012 … one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students. -Biometrics, 67, September 2011 This excellently presente

  13. Emotional inferences by pragmatics

    OpenAIRE

    Iza-Miqueleiz, Mauricio

    2017-01-01

    It has long been taken for granted that, in the course of reading a text, world knowledge is often required in order to establish coherent links between sentences (McKoon & Ratcliff 1992, Iza & Ezquerro 2000). The content grasped from a text turns out to be strongly dependent upon the reader’s additional knowledge that allows a coherent interpretation of the text as a whole. The world knowledge directing the inference may be of a distinctive nature. Gygax et al. (2007) showed that m...

  14. Generic patch inference

    DEFF Research Database (Denmark)

    Andersen, Jesper; Lawall, Julia

    2010-01-01

    A key issue in maintaining Linux device drivers is the need to keep them up to date with respect to evolutions in Linux internal libraries. Currently, there is little tool support for performing and documenting such changes. In this paper we present a tool, spdiff, that identifies common changes...... developers can use it to extract an abstract representation of the set of changes that others have made. Our experiments on recent changes in Linux show that the inferred generic patches are more concise than the corresponding patches found in commits to the Linux source tree while being safe with respect...

  15. A formal model of interpersonal inference

    Directory of Open Access Journals (Sweden)

    Michael Moutoussis

    2014-03-01

    Introduction: We propose that active Bayesian inference – a general framework for decision-making – can equally be applied to interpersonal exchanges. Social cognition, however, entails special challenges. We address these challenges through a novel formulation of a formal model and demonstrate its psychological significance. Method: We review relevant literature, especially with regard to interpersonal representations, formulate a mathematical model and present a simulation study. The model accommodates normative models from utility theory and places them within the broader setting of Bayesian inference. Crucially, we endow people's prior beliefs, into which utilities are absorbed, with preferences of self and others. The simulation illustrates the model's dynamics and furnishes elementary predictions of the theory. Results: 1. Because beliefs about self and others inform both the desirability and plausibility of outcomes, in this framework interpersonal representations become beliefs that have to be actively inferred. This inference, akin to 'mentalising' in the psychological literature, is based upon the outcomes of interpersonal exchanges. 2. We show how some well-known social-psychological phenomena (e.g. self-serving biases) can be explained in terms of active interpersonal inference. 3. Mentalising naturally entails Bayesian updating of how people value social outcomes. Crucially this includes inference about one’s own qualities and preferences. Conclusion: We inaugurate a Bayes optimal framework for modelling intersubject variability in mentalising during interpersonal exchanges. Here, interpersonal representations are endowed with explicit functional and affective properties. We suggest the active inference framework lends itself to the study of psychiatric conditions where mentalising is distorted.

  16. IMAGINE: Interstellar MAGnetic field INference Engine

    Science.gov (United States)

    Steininger, Theo

    2018-03-01

    IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.

  17. Inferring causality from noisy time series data

    DEFF Research Database (Denmark)

    Mønster, Dan; Fusaroli, Riccardo; Tylén, Kristian

    2016-01-01

    Convergent Cross-Mapping (CCM) has shown high potential to perform causal inference in the absence of models. We assess the strengths and weaknesses of the method by varying coupling strength and noise levels in coupled logistic maps. We find that CCM fails to infer accurate coupling strength...... and even causality direction in synchronized time-series and in the presence of intermediate coupling. We find that the presence of noise deterministically reduces the level of cross-mapping fidelity, while the convergence rate exhibits higher levels of robustness. Finally, we propose that controlled noise...
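    The test system named in this abstract, a pair of coupled logistic maps, can be sketched as follows. This is only an illustrative simulation, not the authors' code: the functional form and the default parameter values (r_x, r_y, beta_xy, beta_yx, x0, y0) are assumptions taken from the common textbook formulation of the model, and the noise that the study varies is not included.

```python
def coupled_logistic_maps(n_steps, r_x=3.8, r_y=3.5,
                          beta_xy=0.02, beta_yx=0.1,
                          x0=0.4, y0=0.2):
    """Iterate two logistic maps with bidirectional coupling.

    beta_xy is the strength with which y acts on x, and beta_yx the
    strength with which x acts on y (both are assumed, illustrative values).
    Returns two lists of length n_steps.
    """
    xs, ys = [x0], [y0]
    for _ in range(n_steps - 1):
        x, y = xs[-1], ys[-1]
        # Each variable follows its own logistic growth term,
        # reduced by a term proportional to the other variable.
        xs.append(x * (r_x - r_x * x - beta_xy * y))
        ys.append(y * (r_y - r_y * y - beta_yx * x))
    return xs, ys


if __name__ == "__main__":
    xs, ys = coupled_logistic_maps(10)
    for t, (x, y) in enumerate(zip(xs, ys)):
        print(t, round(x, 4), round(y, 4))
```

    Varying beta_xy and beta_yx in such a simulation is one way to produce time series with known coupling strength against which a causal-inference method can be checked.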

  18. Spurious correlations and inference in landscape genetics

    Science.gov (United States)

    Samuel A. Cushman; Erin L. Landguth

    2010-01-01

    Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causal modelling with partial...

  19. The influence of molecular markers and methods on inferring the phylogenetic relationships between the representatives of the Arini (parrots, Psittaciformes), determined on the basis of their complete mitochondrial genomes.

    Science.gov (United States)

    Urantowka, Adam Dawid; Kroczak, Aleksandra; Mackiewicz, Paweł

    2017-07-14

    Conures are a morphologically diverse group of Neotropical parrots classified as members of the tribe Arini, which has recently been subjected to a taxonomic revision. The previously broadly defined Aratinga genus of this tribe has been split into the 'true' Aratinga and three additional genera, Eupsittula, Psittacara and Thectocercus. Popular markers used in the reconstruction of the parrots' phylogenies derive from mitochondrial DNA. However, current phylogenetic analyses seem to indicate conflicting relationships between Aratinga and other conures, and also among other Arini members. Therefore, it is not clear if the mtDNA phylogenies can reliably define the species tree. The inconsistencies may result from the variable evolution rate of the markers used or their weak phylogenetic signal. To resolve these controversies and to assess to what extent the phylogenetic relationships in the tribe Arini can be inferred from mitochondrial genomes, we compared representative Arini mitogenomes as well as examined the usefulness of the individual mitochondrial markers and the efficiency of various phylogenetic methods. Single molecular markers produced inconsistent tree topologies, while different methods offered various topologies even for the same marker. A significant disagreement in these tree topologies occurred for cytb, nd2 and nd6 genes, which are commonly used in parrot phylogenies. The strongest phylogenetic signal was found in the control region and RNA genes. However, these markers cannot be used alone in inferring Arini phylogenies because they do not provide fully resolved trees. The most reliable phylogeny of the parrots under study is obtained only on the concatenated set of all mitochondrial markers. The analyses established significantly resolved relationships within the former Aratinga representatives and the main genera of the tribe Arini. Such mtDNA phylogeny can be in agreement with the species tree, owing to its match with synapomorphic features in

  20. Uncertainty in prediction and in inference

    International Nuclear Information System (INIS)

    Hilgevoord, J.; Uffink, J.

    1991-01-01

    The concepts of uncertainty in prediction and inference are introduced and illustrated using the diffraction of light as an example. The close relationship between the concepts of uncertainty in inference and resolving power is noted. A general quantitative measure of uncertainty in inference can be obtained by means of the so-called statistical distance between probability distributions. When applied to quantum mechanics, this distance leads to a measure of the distinguishability of quantum states, which essentially is the absolute value of the matrix element between the states. The importance of this result to the quantum mechanical uncertainty principle is noted. The second part of the paper provides a derivation of the statistical distance on the basis of the so-called method of support

  1. Causal inference in public health.

    Science.gov (United States)

    Glass, Thomas A; Goodman, Steven N; Hernán, Miguel A; Samet, Jonathan M

    2013-01-01

    Causal inference has a central role in public health; the determination that an association is causal indicates the possibility for intervention. We review and comment on the long-used guidelines for interpreting evidence as supporting a causal association and contrast them with the potential outcomes framework that encourages thinking in terms of causes that are interventions. We argue that in public health this framework is more suitable, providing an estimate of an action's consequences rather than the less precise notion of a risk factor's causal effect. A variety of modern statistical methods adopt this approach. When an intervention cannot be specified, causal relations can still exist, but how to intervene to change the outcome will be unclear. In application, the often-complex structure of causal processes needs to be acknowledged and appropriate data collected to study them. These newer approaches need to be brought to bear on the increasingly complex public health challenges of our globalized world.

  2. Comparing spatial grain-size trends inferred from textural parameters using percentile statistical parameters and those based on the log-hyperbolic method

    DEFF Research Database (Denmark)

    Bartholdy, Jesper; Christiansen, C.; Pedersen, Jørn Bjarke Torp

    2007-01-01

    The Folk&Ward (F&W) and the log-hyperbolic methods are applied to a small - and easy to overlook - number of typical sand sized grain-size distributions from the Danish Wadden Sea. The sand originates from the same source, and the pattern of change in the grain-size distributions is, therefore...

  3. A Phylogeny of the Monocots, as Inferred from rbcL and atpA Sequence Variation, and a Comparison of Methods for Calculating Jackknife and Bootstrap Values

    DEFF Research Database (Denmark)

    Davis, Jerrold I.; Stevenson, Dennis W.; Petersen, Gitte

    2004-01-01

    elements of Xyridaceae. A comparison was conducted of jackknife and bootstrap values, as computed using strict-consensus (SC) and frequency-within-replicates (FWR) approaches. Jackknife values tend to be higher than bootstrap values, and for each of these methods support values obtained with the FWR...

  4. Statistical Inference and Patterns of Inequality in the Global North

    Science.gov (United States)

    Moran, Timothy Patrick

    2006-01-01

    Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…

  5. Extended likelihood inference in reliability

    International Nuclear Information System (INIS)

    Martz, H.F. Jr.; Beckman, R.J.; Waller, R.A.

    1978-10-01

    Extended likelihood methods of inference are developed in which subjective information in the form of a prior distribution is combined with sampling results by means of an extended likelihood function. The extended likelihood function is standardized for use in obtaining extended likelihood intervals. Extended likelihood intervals are derived for the mean of a normal distribution with known variance, the failure-rate of an exponential distribution, and the parameter of a binomial distribution. Extended second-order likelihood methods are developed and used to solve several prediction problems associated with the exponential and binomial distributions. In particular, such quantities as the next failure-time, the number of failures in a given time period, and the time required to observe a given number of failures are predicted for the exponential model with a gamma prior distribution on the failure-rate. In addition, six types of life testing experiments are considered. For the binomial model with a beta prior distribution on the probability of nonsurvival, methods are obtained for predicting the number of nonsurvivors in a given sample size and for predicting the required sample size for observing a specified number of nonsurvivors. Examples illustrate each of the methods developed. Finally, comparisons are made with Bayesian intervals in those cases where these are known to exist
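    The exponential model with a gamma prior on the failure rate, mentioned in this abstract, has a well-known conjugate update, sketched below. This is the standard Bayesian calculation, not the authors' extended likelihood intervals, and the function names and the shape/rate parameterization are assumptions made for illustration.

```python
def gamma_posterior(prior_shape, prior_rate, n_failures, total_time):
    """Update a Gamma(shape, rate) prior on an exponential failure rate.

    Observing n_failures over total_time units of testing gives a
    Gamma(shape + n_failures, rate + total_time) posterior.
    """
    return prior_shape + n_failures, prior_rate + total_time


def posterior_mean_failure_rate(shape, rate):
    """Mean of a Gamma(shape, rate) distribution."""
    return shape / rate


def prob_next_failure_after(shape, rate, t):
    """Predictive probability that the next failure time exceeds t.

    Averaging exp(-lambda * t) over lambda ~ Gamma(shape, rate)
    gives (rate / (rate + t)) ** shape.
    """
    return (rate / (rate + t)) ** shape


if __name__ == "__main__":
    # Hypothetical numbers: prior Gamma(2, 100), then 3 failures in 400 hours.
    shape, rate = gamma_posterior(2.0, 100.0, 3, 400.0)
    print(shape, rate)
    print(posterior_mean_failure_rate(shape, rate))
    print(prob_next_failure_after(shape, rate, 100.0))
```

    The predictive function corresponds to the "next failure-time" prediction problem named in the abstract, under the stated conjugate assumptions.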

  6. Constraint Satisfaction Inference : Non-probabilistic Global Inference for Sequence Labelling

    NARCIS (Netherlands)

    Canisius, S.V.M.; van den Bosch, A.; Daelemans, W.; Basili, R.; Moschitti, A.

    2006-01-01

    We present a new method for performing sequence labelling based on the idea of using a machine-learning classifier to generate several possible output sequences, and then applying an inference procedure to select the best sequence among those. Most sequence labelling methods following a similar

  7. Inverse Ising inference with correlated samples

    International Nuclear Information System (INIS)

    Obermayer, Benedikt; Levine, Erel

    2014-01-01

    Correlations between two variables of a high-dimensional system can be indicative of an underlying interaction, but can also result from indirect effects. Inverse Ising inference is a method to distinguish one from the other. Essentially, the parameters of the least constrained statistical model are learned from the observed correlations such that direct interactions can be separated from indirect correlations. Among many other applications, this approach has been helpful for protein structure prediction, because residues which interact in the 3D structure often show correlated substitutions in a multiple sequence alignment. In this context, samples used for inference are not independent but share an evolutionary history on a phylogenetic tree. Here, we discuss the effects of correlations between samples on global inference. Such correlations could arise due to phylogeny but also via other slow dynamical processes. We present a simple analytical model to address the resulting inference biases, and develop an exact method accounting for background correlations in alignment data by combining phylogenetic modeling with an adaptive cluster expansion algorithm. We find that popular reweighting schemes are only marginally effective at removing phylogenetic bias, suggest a rescaling strategy that yields better results, and provide evidence that our conclusions carry over to the frequently used mean-field approach to the inverse Ising problem.

  8. Ensemble stacking mitigates biases in inference of synaptic connectivity.

    Science.gov (United States)

    Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N

    2018-01-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.

  9. Causal inference in biology networks with integrated belief propagation.

    Science.gov (United States)

    Chang, Rui; Karr, Jonathan R; Schadt, Eric E

    2015-01-01

    Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the possible functional interactions are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.

  10. BagReg: Protein inference through machine learning.

    Science.gov (United States)

    Zhao, Can; Liu, Dao; Teng, Ben; He, Zengyou

    2015-08-01

    Protein inference from the identified peptides is of primary importance in shotgun proteomics. The target of protein inference is to identify whether each candidate protein is truly present in the sample. To date, many computational methods have been proposed to solve this problem. However, there is still no method that can fully utilize the information hidden in the input data. In this article, we propose a learning-based method named BagReg for protein inference. The method first artificially extracts five features from the input data, and then chooses each feature in turn as the class feature to build separate models that predict the presence probabilities of proteins. Finally, the weak results from the five prediction models are aggregated to obtain the final result. We test our method on six publicly available data sets. The experimental results show that our method is superior to the state-of-the-art protein inference algorithms. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Feature Inference Learning and Eyetracking

    Science.gov (United States)

    Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.

    2009-01-01

    Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…

  12. An Inference Language for Imaging

    DEFF Research Database (Denmark)

    Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen

    2014-01-01

    We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framewor...

  13. Brain Imaging, Forward Inference, and Theories of Reasoning

    Science.gov (United States)

    Heit, Evan

    2015-01-01

    This review focuses on the issue of how neuroimaging studies address theoretical accounts of reasoning, through the lens of the method of forward inference (Henson, 2005, 2006). After theories of deductive and inductive reasoning are briefly presented, the method of forward inference for distinguishing between psychological theories based on brain imaging evidence is critically reviewed. Brain imaging studies of reasoning, comparing deductive and inductive arguments, comparing meaningful versus non-meaningful material, investigating hemispheric localization, and comparing conditional and relational arguments, are assessed in light of the method of forward inference. Finally, conclusions are drawn with regard to future research opportunities. PMID:25620926

  14. Brain imaging, forward inference, and theories of reasoning.

    Science.gov (United States)

    Heit, Evan

    2014-01-01

    This review focuses on the issue of how neuroimaging studies address theoretical accounts of reasoning, through the lens of the method of forward inference (Henson, 2005, 2006). After theories of deductive and inductive reasoning are briefly presented, the method of forward inference for distinguishing between psychological theories based on brain imaging evidence is critically reviewed. Brain imaging studies of reasoning, comparing deductive and inductive arguments, comparing meaningful versus non-meaningful material, investigating hemispheric localization, and comparing conditional and relational arguments, are assessed in light of the method of forward inference. Finally, conclusions are drawn with regard to future research opportunities.

  15. Social Inference Through Technology

    Science.gov (United States)

    Oulasvirta, Antti

    Awareness cues are computer-mediated, real-time indicators of people’s undertakings, whereabouts, and intentions. Already in the mid-1970s, UNIX users could use commands such as “finger” and “talk” to find out who was online and to chat. The small icons in instant messaging (IM) applications that indicate coconversants’ presence in the discussion space are the successors of “finger” output. Similar indicators can be found in online communities, media-sharing services, Internet relay chat (IRC), and location-based messaging applications. But presence and availability indicators are only the tip of the iceberg. Technological progress has enabled richer, more accurate, and more intimate indicators. For example, there are mobile services that allow friends to query and follow each other’s locations. Remote monitoring systems developed for health care allow relatives and doctors to assess the wellbeing of homebound patients (see, e.g., Tang and Venables 2000). But users also utilize cues that have not been deliberately designed for this purpose. For example, online gamers pay attention to other characters’ behavior to infer what the other players are like “in real life.” There is a common denominator underlying these examples: shared activities rely on the technology’s representation of the remote person. The other human being is not physically present but present only through a narrow technological channel.

  16. A New Method to Infer Advancement of Saline Front in Coastal Groundwater Systems by 3D: The Case of Bari (Southern Italy) Fractured Aquifer

    Directory of Open Access Journals (Sweden)

    Costantino Masciopinto

    2016-02-01

    A new method to study 3D saline front advancement in coastal fractured aquifers has been presented. Field groundwater salinity was measured in boreholes of the Bari (Southern Italy) coastal aquifer with depth below the water table. Then, the Ghyben-Herzberg freshwater/saltwater (50%) sharp interface and saline front position were determined by model simulations of the freshwater flow in groundwater. Afterward, the best-fit procedure between groundwater salinity measurements, at an assigned water depth of 1.0 m in boreholes, and distances of each borehole from the modelled freshwater/saltwater saline front was used to convert each position (x, y) in groundwater to the water salinity concentration at a depth of 1.0 m. Moreover, a second best-fit procedure was applied to the salinity measurements in boreholes with depth z. These results provided a grid file (x, y, z, salinity) suitable for plotting the actual Bari aquifer salinity by 3D maps. Subsequently, in order to assess effects of pumping on the saltwater-freshwater transition zone in the coastal aquifer, the Navier-Stokes (N-S) equations were applied to study transient density-driven flow and salt mass transport into freshwater of a single fracture. The rate of seawater/freshwater interface advancement given by the N-S solution was used to define the progression of the saline front in Bari groundwater, starting from the actual salinity 3D map. The impact of pumping 335 L·s⁻¹ during the transition period of 112.8 days was easily highlighted on 3D salinity maps of the Bari aquifer.
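    The Ghyben-Herzberg sharp interface mentioned in this abstract rests on a simple hydrostatic balance, sketched below. This is the classical textbook relation only, not the authors' flow model; the function name and the default densities (1000 and 1025 kg/m³) are assumptions chosen for illustration.

```python
def ghyben_herzberg_depth(head_m, rho_fresh=1000.0, rho_salt=1025.0):
    """Depth of the freshwater/saltwater sharp interface below sea level.

    head_m is the height of the water table above sea level, in metres.
    Hydrostatic balance gives depth = rho_fresh / (rho_salt - rho_fresh) * head,
    which is about 40 * head for the assumed default densities.
    """
    if rho_salt <= rho_fresh:
        raise ValueError("saltwater must be denser than freshwater")
    return rho_fresh / (rho_salt - rho_fresh) * head_m


if __name__ == "__main__":
    # Hypothetical water-table heights above sea level, in metres.
    for head in (0.25, 0.5, 1.0):
        print(head, ghyben_herzberg_depth(head))
```

    Under this relation, a small drop in the water table caused by pumping corresponds to a much larger rise of the interface, which is why pumping effects on the saline front are of interest.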

  17. Automated processing of data for supertree construction

    OpenAIRE

    Hill, Jon; Davis, Katie; Tover, Jaime; Wills, Matthew

    2015-01-01

    Talk given to Evolution2015 on the new autoprocessing functionality of the STK. This involves collecting nomenclature and taxonomic information on the OTUs to create a consistent naming scheme, and following the normal processing.

  18. Bayesian structural inference for hidden processes

    Science.gov (United States)

    Strelioff, Christopher C.; Crutchfield, James P.

    2014-04-01

    We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ɛ-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ɛ-machines, irrespective of estimated transition probabilities. Properties of ɛ-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well as to an out-of-class, infinite-state hidden process.

  19. Statistical Inference on the Canadian Middle Class

    Directory of Open Access Journals (Sweden)

    Russell Davidson

    2018-03-01

    Conventional wisdom says that the middle classes in many developed countries have recently suffered losses, in terms of both the share of the total population belonging to the middle class, and also their share in total income. Here, distribution-free methods are developed for inference on these shares, by deriving expressions for the asymptotic variances of the sample estimates, and for the covariance of the estimates. Asymptotic inference can be undertaken based on asymptotic normality. Bootstrap inference can be expected to be more reliable, and appropriate bootstrap procedures are proposed. As an illustration, samples of individual earnings drawn from Canadian census data are used to test various hypotheses about the middle-class shares, and confidence intervals for them are computed. It is found that, for the earlier censuses, sample sizes are large enough for asymptotic and bootstrap inference to be almost identical, but that, in the twenty-first century, the bootstrap fails on account of a strange phenomenon whereby many presumably different incomes in the data are rounded to one and the same value. Another difference between the centuries is the appearance of heavy right-hand tails in the income distributions of both men and women.
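    Bootstrap inference on a middle-class population share can be sketched as below. This is a plain percentile bootstrap, not the specific procedures proposed in the paper; the definition of the middle class as incomes between 0.5 and 1.5 times the median, and all function names, are assumptions made for illustration.

```python
import random
import statistics


def middle_class_share(incomes, lower=0.5, upper=1.5):
    """Share of incomes lying between lower*median and upper*median."""
    med = statistics.median(incomes)
    inside = sum(1 for v in incomes if lower * med <= v <= upper * med)
    return inside / len(incomes)


def percentile_bootstrap_ci(incomes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the middle-class share."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(incomes)
    shares = sorted(
        middle_class_share([rng.choice(incomes) for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = shares[int((alpha / 2) * n_boot)]
    hi = shares[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical incomes with one heavy right-tail value.
    incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
    print(middle_class_share(incomes))
    print(percentile_bootstrap_ci(incomes, n_boot=200))
```

    Because the median is recomputed inside every resample, heavily rounded data with many tied values can make the resampled shares jump between a few discrete levels, which is consistent with the kind of bootstrap difficulty the abstract reports for rounded incomes.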

  20. Ancient Biomolecules and Evolutionary Inference.

    Science.gov (United States)

    Cappellini, Enrico; Prohaska, Ana; Racimo, Fernando; Welker, Frido; Pedersen, Mikkel Winther; Allentoft, Morten E; de Barros Damgaard, Peter; Gutenbrunner, Petra; Dunne, Julie; Hammann, Simon; Roffet-Salque, Mélanie; Ilardo, Melissa; Moreno-Mayar, J Víctor; Wang, Yucheng; Sikora, Martin; Vinner, Lasse; Cox, Jürgen; Evershed, Richard P; Willerslev, Eske

    2018-04-25

    Over the last decade, studies of ancient biomolecules, particularly ancient DNA, proteins, and lipids, have revolutionized our understanding of evolutionary history. Though initially fraught with many challenges, the field now stands on firm foundations. Researchers now successfully retrieve nucleotide and amino acid sequences, as well as lipid signatures, from progressively older samples, originating from geographic areas and depositional environments that, until recently, were regarded as hostile to long-term preservation of biomolecules. Sampling frequencies and the spatial and temporal scope of studies have also increased markedly, and with them the size and quality of the data sets generated. This progress has been made possible by continuous technical innovations in analytical methods, enhanced criteria for the selection of ancient samples, integrated experimental methods, and advanced computational approaches. Here, we discuss the history and current state of ancient biomolecule research, its applications to evolutionary inference, and future directions for this young and exciting field. Expected final online publication date for the Annual Review of Biochemistry Volume 87 is June 20, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

  1. Bootstrap inference when using multiple imputation.

    Science.gov (United States)

    Schomaker, Michael; Heumann, Christian

    2018-04-16

    Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains however unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.

  2. Fast and scalable inference of multi-sample cancer lineages.

    KAUST Repository

    Popic, Victoria

    2015-05-06

    Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee .

  3. Fast and scalable inference of multi-sample cancer lineages.

    KAUST Repository

    Popic, Victoria; Salari, Raheleh; Hajirasouliha, Iman; Kashef-Haghighi, Dorna; West, Robert B; Batzoglou, Serafim

    2015-01-01

    Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee .

  4. Indirect Inference for Stochastic Differential Equations Based on Moment Expansions

    KAUST Repository

    Ballesio, Marco; Tempone, Raul; Vilanova, Pedro

    2016-01-01

    We provide an indirect inference method to estimate the parameters of time-homogeneous scalar diffusion and jump diffusion processes. We obtain a system of ODEs that approximate the time evolution of the first two moments of the process
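
    To make the idea of moment ODEs concrete, the following worked equations apply Itô's formula to a generic scalar diffusion and then to a mean-reverting example. The example process and the symbols $a$, $b$, $\theta$, $\mu$, $\sigma$, $m_1$, $m_2$ are illustrative assumptions, not taken from the paper, and jumps are left out.

```latex
% Generic scalar diffusion: dX_t = a(X_t)\,dt + b(X_t)\,dW_t
\frac{d}{dt}\,\mathbb{E}[X_t] = \mathbb{E}\big[a(X_t)\big],
\qquad
\frac{d}{dt}\,\mathbb{E}[X_t^2] = \mathbb{E}\big[2 X_t\, a(X_t) + b(X_t)^2\big].

% Illustrative case: a(x) = \theta(\mu - x), \; b(x) = \sigma,
% with m_1 = \mathbb{E}[X_t] and m_2 = \mathbb{E}[X_t^2]:
\frac{dm_1}{dt} = \theta(\mu - m_1),
\qquad
\frac{dm_2}{dt} = 2\theta\mu\, m_1 - 2\theta\, m_2 + \sigma^2 .
```

    With linear drift and constant diffusion the system closes exactly. For nonlinear coefficients the right-hand sides involve higher moments, so an approximation is needed to obtain a closed system of ODEs.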

  5. Bayesian Information Criterion as an Alternative way of Statistical Inference

    Directory of Open Access Journals (Sweden)

    Nadejda Yu. Gubanova

    2012-05-01

    The article treats the Bayesian information criterion (BIC) as an alternative to traditional methods of statistical inference based on null hypothesis significance testing (NHST). The comparison of ANOVA and BIC results for a psychological experiment is discussed.
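
    For reference, the standard definition of the criterion and the usual approximation that links a BIC difference to a Bayes factor are given below. The abstract does not state which formulation the article uses, so the notation ($k$ parameters, $n$ observations, maximized likelihood $\hat{L}$, models $M_0$ and $M_1$) is an assumption.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L},
\qquad
\Delta\mathrm{BIC}_{10} = \mathrm{BIC}(M_1) - \mathrm{BIC}(M_0),
\qquad
\mathrm{BF}_{01} \approx \exp\!\left(\tfrac{1}{2}\,\Delta\mathrm{BIC}_{10}\right).
```

    A positive $\Delta\mathrm{BIC}_{10}$ favours the simpler model $M_0$. Unlike a p-value, the resulting Bayes factor can express evidence for the null model as well as against it.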

  6. Dynamic spatial panels : models, methods, and inferences

    NARCIS (Netherlands)

    Elhorst, J. Paul

    This paper provides a survey of the existing literature on the specification and estimation of dynamic spatial panel data models, a collection of models for spatial panels extended to include one or more of the following variables and/or error terms: a dependent variable lagged in time, a dependent

  7. Working with sample data exploration and inference

    CERN Document Server

    Chaffe-Stengel, Priscilla

    2014-01-01

    Managers and analysts routinely collect and examine key performance measures to better understand their operations and make good decisions. Being able to render the complexity of operations data into a coherent account of significant events requires an understanding of how to work well with raw data and to make appropriate inferences. Although some statistical techniques for analyzing data and making inferences are sophisticated and require specialized expertise, there are methods that are understandable and applicable by anyone with basic algebra skills and the support of a spreadsheet package. By applying these fundamental methods themselves rather than turning over both the data and the responsibility for analysis and interpretation to an expert, managers will develop a richer understanding and potentially gain better control over their environment. This text is intended to describe these fundamental statistical techniques to managers, data analysts, and students. Statistical analysis of sample data is enh...

  8. Ensemble stacking mitigates biases in inference of synaptic connectivity

    Directory of Open Access Journals (Sweden)

    Brendan Chambers

    2018-03-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches. Mapping the routing of spikes through local circuitry is crucial for understanding neocortical computation. Under appropriate experimental conditions, these maps can be used to infer likely patterns of synaptic recruitment, linking activity to underlying anatomical connections. Such inferences help to reveal the synaptic implementation of population dynamics and computation. We compare a number of standard functional measures to infer underlying connectivity. We find that regularization impacts measures
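
    The linear-combination ensemble described above can be sketched as follows. The abstract does not say how the weights are fitted, so ordinary least squares against a simulated ground truth is an assumption here, as are the function name and the three synthetic "methods" with different noise levels.

```python
import numpy as np

def fit_ensemble_weights(scores, truth):
    """Least-squares weights for a linear combination of per-method connection scores.

    scores: array of shape (n_pairs, n_methods), one column per inference method.
    truth:  array of shape (n_pairs,), 1.0 where a connection exists, else 0.0.
    """
    weights, *_ = np.linalg.lstsq(scores, truth, rcond=None)
    return weights

# Hypothetical simulated ground truth: about 10% of neuron pairs are connected.
rng = np.random.default_rng(0)
truth = (rng.random(500) < 0.1).astype(float)

# Three made-up inference methods, each a noisy view of the ground truth.
scores = np.column_stack([truth + rng.normal(0.0, s, 500) for s in (0.5, 0.8, 1.0)])

weights = fit_ensemble_weights(scores, truth)
ensemble = scores @ weights  # ensemble score for every candidate connection
print(weights)
```

    Because each single method corresponds to a particular weight vector, the fitted combination can never have a larger squared error on the training data than any individual method. Whether the weights transfer to new data is a separate question, which the abstract addresses by testing generalization across simulated datasets.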

  9. Inferring ontology graph structures using OWL reasoning

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  10. Inferring ontology graph structures using OWL reasoning.

    Science.gov (United States)

    Rodríguez-García, Miguel Ángel; Hoehndorf, Robert

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  11. Statistical inferences for bearings life using sudden death test

    Directory of Open Access Journals (Sweden)

    Morariu Cristin-Olimpiu

    2017-01-01

    In this paper we propose a calculation method for estimating reliability indicators and for carrying out complete statistical inference on the three-parameter Weibull distribution of bearing life. Using experimental durability values for bearings tested on stands by sudden death tests introduces a number of particularities into maximum likelihood estimation and statistical inference. The paper details these features and provides an example calculation.
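
    For readers unfamiliar with the three-parameter Weibull model, the sketch below evaluates its reliability function and the life at a chosen reliability level, such as the L10 life commonly quoted for bearings. It does not reproduce the paper's sudden death estimation procedure; the function names and the parameter values (shape `beta`, scale `eta`, location `gamma`) are made up for the example.

```python
import math

def weibull_reliability(t, beta, eta, gamma):
    """R(t) = exp(-((t - gamma) / eta) ** beta) for t > gamma, and 1 before the location gamma."""
    if t <= gamma:
        return 1.0  # no failures are possible before the minimum life gamma
    return math.exp(-(((t - gamma) / eta) ** beta))

def life_at_reliability(r, beta, eta, gamma):
    """Invert R(t) = r, giving the life by which a fraction (1 - r) of units has failed."""
    return gamma + eta * (-math.log(r)) ** (1.0 / beta)

# Hypothetical parameters, in hours: shape 1.5, scale 100, minimum life 10.
beta, eta, gamma = 1.5, 100.0, 10.0
l10 = life_at_reliability(0.90, beta, eta, gamma)  # life exceeded by 90% of bearings
print(l10, weibull_reliability(l10, beta, eta, gamma))
```

    In practice the three parameters would be estimated from test data, for example by maximum likelihood as the abstract mentions, and then plugged into these functions.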

  12. On principles of inductive inference

    OpenAIRE

    Kostecki, Ryszard Paweł

    2011-01-01

    We propose an intersubjective epistemic approach to foundations of probability theory and statistical inference, based on relative entropy and category theory, and aimed to bypass the mathematical and conceptual problems of existing foundational approaches.

  13. Statistical inference for stochastic processes

    National Research Council Canada - National Science Library

    Basawa, Ishwar V; Prakasa Rao, B. L. S

    1980-01-01

    The aim of this monograph is to attempt to reduce the gap between theory and applications in the area of stochastic modelling, by directing the interest of future researchers to the inference aspects...

  14. Online Emotional Inferences in Written and Auditory Texts: A Study with Children and Adults

    Science.gov (United States)

    Diergarten, Anna Katharina; Nieding, Gerhild

    2016-01-01

    Emotional inferences are conclusions that a reader draws about the emotional state of a story's protagonist. In this study, we examined whether children and adults draw emotional inferences while reading short stories or listening to an aural presentation of short stories. We used an online method that assesses inferences during reading with a…

  15. Active inference, communication and hermeneutics.

    Science.gov (United States)

    Friston, Karl J; Frith, Christopher D

    2015-07-01

    Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  16. Optimal inference with suboptimal models: Addiction and active Bayesian inference

    Science.gov (United States)

    Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl

    2015-01-01

    When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321

  17. Fused Regression for Multi-source Gene Regulatory Network Inference.

    Directory of Open Access Journals (Sweden)

    Kari Y Lam

    2016-12-01

    Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms and single cell types). We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method's utility in learning from data collected on different experimental platforms.

  18. Bootstrapping phylogenies inferred from rearrangement data

    Directory of Open Access Journals (Sweden)

    Lin Yu

    2012-08-01

    Background: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. Results: We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Conclusions: Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its

  19. Bootstrapping phylogenies inferred from rearrangement data.

    Science.gov (United States)

    Lin, Yu; Rajan, Vaibhav; Moret, Bernard Me

    2012-08-29

    Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver

  20. The NIFTY way of Bayesian signal inference

    International Nuclear Information System (INIS)

    Selig, Marco

    2014-01-01

    We introduce NIFTY, 'Numerical Information Field Theory', a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTY can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTY as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D3PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.

  1. The NIFTy way of Bayesian signal inference

    Science.gov (United States)

    Selig, Marco

    2014-12-01

    We introduce NIFTy, "Numerical Information Field Theory", a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTy can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTy as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D3PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.

  2. Pointwise probability reinforcements for robust statistical inference.

    Science.gov (United States)

    Frénay, Benoît; Verleysen, Michel

    2014-02-01

    Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. Nonparametric statistical inference

    CERN Document Server

    Gibbons, Jean Dickinson

    2014-01-01

    Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.

  4. Bayesian inference of substrate properties from film behavior

    International Nuclear Information System (INIS)

    Aggarwal, R; Demkowicz, M J; Marzouk, Y M

    2015-01-01

    We demonstrate that by observing the behavior of a film deposited on a substrate, certain features of the substrate may be inferred with quantified uncertainty using Bayesian methods. We carry out this demonstration on an illustrative film/substrate model where the substrate is a Gaussian random field and the film is a two-component mixture that obeys the Cahn–Hilliard equation. We construct a stochastic reduced order model to describe the film/substrate interaction and use it to infer substrate properties from film behavior. This quantitative inference strategy may be adapted to other film/substrate systems. (paper)

  5. Surrogate based approaches to parameter inference in ocean models

    KAUST Repository

    Knio, Omar

    2016-01-06

    This talk discusses the inference of physical parameters using model surrogates. Attention is focused on the use of sampling schemes to build suitable representations of the dependence of the model response on uncertain input data. Non-intrusive spectral projections and regularized regressions are used for this purpose. A Bayesian inference formalism is then applied to update the uncertain inputs based on available measurements or observations. To perform the update, we consider two alternative approaches, based on the application of Markov Chain Monte Carlo methods or of adjoint-based optimization techniques. We outline the implementation of these techniques to infer dependence of wind drag, bottom drag, and internal mixing coefficients.

  6. Surrogate based approaches to parameter inference in ocean models

    KAUST Repository

    Knio, Omar

    2016-01-01

    This talk discusses the inference of physical parameters using model surrogates. Attention is focused on the use of sampling schemes to build suitable representations of the dependence of the model response on uncertain input data. Non-intrusive spectral projections and regularized regressions are used for this purpose. A Bayesian inference formalism is then applied to update the uncertain inputs based on available measurements or observations. To perform the update, we consider two alternative approaches, based on the application of Markov Chain Monte Carlo methods or of adjoint-based optimization techniques. We outline the implementation of these techniques to infer dependence of wind drag, bottom drag, and internal mixing coefficients.

  7. SDG multiple fault diagnosis by real-time inverse inference

    International Nuclear Information System (INIS)

    Zhang Zhaoqian; Wu Chongguang; Zhang Beike; Xia Tao; Li Anfeng

    2005-01-01

    In the past 20 years, the signed directed graph (SDG), one of the qualitative simulation technologies, has been widely applied in the field of chemical fault diagnosis. However, many earlier researchers assumed a single fault origin. This leads to the problem of combinatorial explosion and has limited the application of SDG to real processes. The main reason is that most earlier researchers used the forward inference engines of commercial expert system software to carry out inverse diagnostic inference on the SDG model, which violates the internal principle of the diagnosis mechanism. In this paper, we present a new SDG multiple-fault diagnosis method based on real-time inverse inference. It is a genuine multiple-fault diagnosis method, and its inference engine uses an inverse mechanism. Finally, we give an example of a 65 t/h furnace diagnosis system to demonstrate its applicability and efficiency.

  8. SDG multiple fault diagnosis by real-time inverse inference

    Energy Technology Data Exchange (ETDEWEB)

    Zhang Zhaoqian; Wu Chongguang; Zhang Beike; Xia Tao; Li Anfeng

    2005-02-01

    In the past 20 years, the signed directed graph (SDG), one of the qualitative simulation technologies, has been widely applied in the field of chemical fault diagnosis. However, many earlier researchers assumed a single fault origin. This leads to the problem of combinatorial explosion and has limited the application of SDG to real processes. The main reason is that most earlier researchers used the forward inference engines of commercial expert system software to carry out inverse diagnostic inference on the SDG model, which violates the internal principle of the diagnosis mechanism. In this paper, we present a new SDG multiple-fault diagnosis method based on real-time inverse inference. It is a genuine multiple-fault diagnosis method, and its inference engine uses an inverse mechanism. Finally, we give an example of a 65 t/h furnace diagnosis system to demonstrate its applicability and efficiency.

  9. Interactive Instruction in Bayesian Inference

    DEFF Research Database (Denmark)

    Khan, Azam; Breslav, Simon; Hornbæk, Kasper

    2018-01-01

    An instructional approach is presented to improve human performance in solving Bayesian inference problems. Starting from the original text of the classic Mammography Problem, the textual expression is modified and visualizations are added according to Mayer’s principles of instruction. These principles concern coherence, personalization, signaling, segmenting, multimedia, spatial contiguity, and pretraining. Principles of self-explanation and interactivity are also applied. Four experiments on the Mammography Problem showed that these principles help participants answer the questions ... that an instructional approach to improving human performance in Bayesian inference is a promising direction....
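
    The calculation that the Mammography Problem asks for is a single application of Bayes' theorem, sketched below. The abstract does not quote the problem's numbers, so the values used here (1% prevalence, 80% sensitivity, 9.6% false-positive rate) are assumed from a commonly cited version of the problem, and the function name is hypothetical.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Assumed figures from a commonly cited version of the problem (not from the abstract).
p = posterior_given_positive(prevalence=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"{p:.3f}")  # about 0.078, i.e. roughly 8 in 100 positive results are true positives
```

    The same result in natural frequencies: of 1000 women, 10 have the disease and 8 of them test positive, while about 95 of the 990 healthy women also test positive, so roughly 8 of 103 positives are genuine.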

  10. Geostatistical inference using crosshole ground-penetrating radar

    DEFF Research Database (Denmark)

    Looms, Majken C; Hansen, Thomas Mejer; Cordua, Knud Skou

    2010-01-01

    of the subsurface are used to evaluate the uncertainty of the inversion estimate. We have explored the full potential of the geostatistical inference method using several synthetic models of varying correlation structures and have tested the influence of different assumptions concerning the choice of covariance ... reflection profile. Furthermore, the inferred values of the subsurface global variance and the mean velocity have been corroborated with moisture-content measurements, obtained gravimetrically from samples collected at the field site....

  11. Phylogenetic Inference of HIV Transmission Clusters

    Directory of Open Access Journals (Sweden)

    Vlad Novitsky

    2017-10-01

    Better understanding the structure and dynamics of HIV transmission networks is essential for designing the most efficient interventions to prevent new HIV transmissions, and ultimately for gaining control of the HIV epidemic. The inference of phylogenetic relationships and the interpretation of results rely on the definition of the HIV transmission cluster. The definition of the HIV cluster is complex and dependent on multiple factors, including the design of sampling, accuracy of sequencing, precision of sequence alignment, evolutionary models, the phylogenetic method of inference, and specified thresholds for cluster support. While the majority of studies focus on clusters, non-clustered cases could also be highly informative. A new dimension in the analysis of the global and local HIV epidemics is the concept of phylogenetically distinct HIV sub-epidemics. The identification of active HIV sub-epidemics reveals spreading viral lineages and may help in the design of targeted interventions. HIV clustering can also be affected by sampling density. Obtaining a proper sampling density may increase statistical power and reduce sampling bias, so sampling density should be taken into account in study design and in interpretation of phylogenetic results. Finally, recent advances in long-range genotyping may enable more accurate inference of HIV transmission networks. If performed in real time, it could both inform public-health strategies and be clinically relevant (e.g., drug-resistance testing).

  12. Eight challenges in phylodynamic inference

    Directory of Open Access Journals (Sweden)

    Simon D.W. Frost

    2015-03-01

    The field of phylodynamics, which attempts to enhance our understanding of infectious disease dynamics using pathogen phylogenies, has made great strides in the past decade. Basic epidemiological and evolutionary models are now well characterized with inferential frameworks in place. However, significant challenges remain in extending phylodynamic inference to more complex systems. These challenges include accounting for evolutionary complexities such as changing mutation rates, selection, reassortment, and recombination, as well as epidemiological complexities such as stochastic population dynamics, host population structure, and different patterns at the within-host and between-host scales. An additional challenge exists in making efficient inferences from an ever increasing corpus of sequence data.

  13. Problem solving and inference mechanisms

    Energy Technology Data Exchange (ETDEWEB)

    Furukawa, K; Nakajima, R; Yonezawa, A; Goto, S; Aoyama, A

    1982-01-01

    The heart of the fifth generation computer will be powerful mechanisms for problem solving and inference. A deduction-oriented language is to be designed, which will form the core of the whole computing system. The language is based on predicate logic with the extended features of structuring facilities, meta structures and relational data base interfaces. Parallel computation mechanisms and specialized hardware architectures are being investigated to make possible efficient realization of the language features. The project includes research into an intelligent programming system, a knowledge representation language and system, and a meta inference system to be built on the core. 30 references.

  14. Statistical theory and inference

    CERN Document Server

    Olive, David J

    2014-01-01

    This text is for a one-semester graduate course in statistical theory and covers minimal and complete sufficient statistics, maximum likelihood estimators, method of moments, bias and mean square error, uniform minimum variance estimators and the Cramer-Rao lower bound, an introduction to large sample theory, likelihood ratio tests and uniformly most powerful tests and the Neyman Pearson Lemma. A major goal of this text is to make these topics much more accessible to students by using the theory of exponential families. Exponential families, indicator functions and the support of the distribution are used throughout the text to simplify the theory. More than 50 "brand name" distributions are used to illustrate the theory with many examples of exponential families, maximum likelihood estimators and uniformly minimum variance unbiased estimators. There are many homework problems with over 30 pages of solutions.

  15. Inferring Phylogenetic Networks from Gene Order Data

    Directory of Open Access Journals (Sweden)

    Alexey Anatolievich Morozov

    2013-01-01

    Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary), sets of trees, and distance matrices, but there are no methods to build them using the gene order data as an input. Here we describe several methods to build split networks from the gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from genome structures under study and used as an input for network construction algorithms. Three intermediates are used: set of jackknife trees, distance matrix, and binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and distance matrix (when used with Neighbor-Net algorithm). Binary encoding can also be useful, but only when the methods mentioned above cannot be used.

  16. Inference in hybrid Bayesian networks

    DEFF Research Database (Denmark)

    Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael

    2009-01-01

    Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....

  17. Mixed normal inference on multicointegration

    NARCIS (Netherlands)

    Boswijk, H.P.

    2009-01-01

    Asymptotic likelihood analysis of cointegration in I(2) models, see Johansen (1997, 2006), Boswijk (2000) and Paruolo (2000), has shown that inference on most parameters is mixed normal, implying hypothesis test statistics with an asymptotic χ² null distribution. The asymptotic distribution of the

  18. Statistical inference and Aristotle's Rhetoric.

    Science.gov (United States)

    Macdonald, Ranald R

    2004-11-01

    Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.

  19. Efficient Exact Inference With Loss Augmented Objective in Structured Learning.

    Science.gov (United States)

    Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert

    2016-08-19

    Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms: the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives that arise from a special type of high-order potential having a decomposable internal structure. As an important application, our method covers the loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
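
    As a rough illustration of what loss augmented inference means in the simplest case, the sketch below handles only the decomposable Hamming loss on a chain model, where the loss can be folded into the per-position scores and the usual Viterbi recursion applies. It is not the authors' algorithm for nondecomposable losses; the function name, score tables, and labels are hypothetical.

```python
def loss_augmented_viterbi(unary, pairwise, gold):
    """Return argmax_y [score(y) + Hamming(y, gold)] for a chain model.

    unary[t][k]    : score of label k at position t (hypothetical values)
    pairwise[j][k] : score of moving from label j to label k
    gold[t]        : ground-truth label at position t
    """
    n_pos, n_lab = len(unary), len(unary[0])
    # Hamming loss decomposes over positions, so add 1 wherever label != gold.
    aug = [[unary[t][k] + (0.0 if k == gold[t] else 1.0) for k in range(n_lab)]
           for t in range(n_pos)]
    best = list(aug[0])          # best[k]: best augmented score ending in label k
    back = []                    # back-pointers for positions 1..n_pos-1
    for t in range(1, n_pos):
        new_best, ptr = [], []
        for k in range(n_lab):
            j_star = max(range(n_lab), key=lambda j: best[j] + pairwise[j][k])
            new_best.append(best[j_star] + pairwise[j_star][k] + aug[t][k])
            ptr.append(j_star)
        best = new_best
        back.append(ptr)
    last = max(range(n_lab), key=lambda k: best[k])
    path = [last]
    for ptr in reversed(back):   # follow back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Toy example: two labels, three positions, no transition preference.
unary = [[1.0, 0.5], [2.0, 0.0], [0.0, 0.3]]
pairwise = [[0.0, 0.0], [0.0, 0.0]]
gold = [0, 0, 0]
violator, value = loss_augmented_viterbi(unary, pairwise, gold)
print(violator, value)
```

    In this toy case the plain highest-scoring labeling would be [0, 0, 1], while the loss augmented search prefers a labeling that disagrees with the gold sequence in more positions, which is the "most violating configuration" a structural SVM trainer looks for.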

  20. A general Bayes weibull inference model for accelerated life testing

    International Nuclear Information System (INIS)

    Dorp, J. Rene van; Mazzuchi, Thomas A.

    2005-01-01

    This article presents the development of a general Bayes inference model for accelerated life testing. The failure times at a constant stress level are assumed to belong to a Weibull distribution, but the specification of strict adherence to a parametric time-transformation function is not required. Rather, prior information is used to indirectly define a multivariate prior distribution for the scale parameters at the various stress levels and the common shape parameter. Using the approach, Bayes point estimates as well as probability statements for use-stress (and accelerated) life parameters may be inferred from a host of testing scenarios. The inference procedure accommodates both the interval data sampling strategy and type I censored sampling strategy for the collection of ALT test data. The inference procedure uses the well-known MCMC (Markov Chain Monte Carlo) methods to derive posterior approximations. The approach is illustrated with an example

  1. Nonparametric inference of network structure and dynamics

    Science.gov (United States)

    Peixoto, Tiago P.

    The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among

  2. Polynomial Chaos–Based Bayesian Inference of K-Profile Parameterization in a General Circulation Model of the Tropical Pacific

    KAUST Repository

    Sraj, Ihab; Zedler, Sarah E.; Knio, Omar; Jackson, Charles S.; Hoteit, Ibrahim

    2016-01-01

    The authors present a polynomial chaos (PC)-based Bayesian inference method for quantifying the uncertainties of the K-profile parameterization (KPP) within the MIT general circulation model (MITgcm) of the tropical Pacific. The inference

  3. Approximate Inference and Deep Generative Models

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Advances in deep generative models are at the forefront of deep learning research because of the promise they offer for allowing data-efficient learning, and for model-based reinforcement learning. In this talk I'll review a few standard methods for approximate inference and introduce modern approximations which allow for efficient large-scale training of a wide variety of generative models. Finally, I'll demonstrate several important applications of these models to density estimation, missing data imputation, data compression and planning.

  4. Inverse Ising Inference Using All the Data

    Science.gov (United States)

    Aurell, Erik; Ekeberg, Magnus

    2012-03-01

    We show that a method based on logistic regression, using all the data, solves the inverse Ising problem far better than mean-field calculations relying only on sample pairwise correlation functions, while remaining computationally feasible for hundreds of nodes. The largest improvement in reconstruction occurs for strong interactions. Using two examples, a diluted Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that interaction topologies can be recovered from few samples with good accuracy and that the use of l1 regularization is beneficial in this process, pushing inference abilities further into low-temperature regimes.
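
    The idea of treating each spin as the response of a logistic regression on all other spins can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes zero external fields, omits the l1 penalty, and, to stay deterministic, replaces sampled data with the exact Boltzmann weights of a tiny four-spin model. The names `boltzmann` and `fit_couplings` are hypothetical.

```python
import itertools
import math


def boltzmann(J):
    """Exact state probabilities for P(s) proportional to exp(sum_{i<j} J[i][j] s_i s_j)."""
    n = len(J)
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(sum(J[i][j] * s[i] * s[j]
                            for i in range(n) for j in range(i + 1, n)))
               for s in states]
    z = sum(weights)
    return states, [w / z for w in weights]


def fit_couplings(states, probs, lr=0.3, iters=2000):
    """Per-node logistic regression: P(s_i | rest) = 1 / (1 + exp(-2 s_i theta_i)).

    states/probs may be exact weights (as here) or samples with equal weights.
    """
    n = len(states[0])
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for _ in range(iters):
            grad = {j: 0.0 for j in others}
            for s, p in zip(states, probs):
                theta = sum(W[i][j] * s[j] for j in others)
                resid = s[i] - math.tanh(theta)   # gradient of the conditional log-likelihood
                for j in others:
                    grad[j] += p * resid * s[j]
            for j in others:                      # plain gradient ascent step
                W[i][j] += lr * grad[j]
    # The two regressions give two estimates of each coupling; average them.
    return [[0.5 * (W[i][j] + W[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical symmetric couplings on four spins (zero diagonal).
J_true = [[0.0, 0.5, 0.0, 0.2],
          [0.5, 0.0, -0.3, 0.0],
          [0.0, -0.3, 0.0, 0.4],
          [0.2, 0.0, 0.4, 0.0]]
states, probs = boltzmann(J_true)
J_est = fit_couplings(states, probs)
print([[round(v, 3) for v in row] for row in J_est])
```

    With real data, `states` would be the observed configurations and `probs` a list of equal weights; the estimate then carries sampling noise that the exact-weights demo above does not show.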

  5. Inference of Large Phylogenies Using Neighbour-Joining

    DEFF Research Database (Denmark)

    Simonsen, Martin; Mailund, Thomas; Pedersen, Christian Nørgaard Storm

    2011-01-01

    The neighbour-joining method is a widely used method for phylogenetic reconstruction which scales to thousands of taxa. However, advances in sequencing technology have made data sets with more than 10,000 related taxa widely available. Inference of such large phylogenies takes hours or days using... the Neighbour-Joining method on a normal desktop computer because of the O(n^3) running time. RapidNJ is a search heuristic which reduces the running time of the Neighbour-Joining method significantly but at the cost of an increased memory consumption, making inference of large phylogenies infeasible. We present... two extensions for RapidNJ which reduce the memory requirements and allow phylogenies with more than 50,000 taxa to be inferred efficiently on a desktop computer. Furthermore, an improved version of the search heuristic is presented which reduces the running time of RapidNJ on many data...
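
    For orientation, the canonical neighbour-joining procedure whose O(n^3) cost is mentioned above can be sketched in a few lines. This is the textbook algorithm, not RapidNJ or its extensions; the function name, the edge-list output format, and the five-taxon distance matrix are illustrative assumptions.

```python
def neighbor_joining(labels, matrix):
    """Canonical neighbour-joining; returns a list of (node, node, branch_length)."""
    D = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Full O(n^2) scan for the pair minimising Q; repeated n times gives O(n^3).
        best, pair = None, None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best:
                    best, pair = q, (a, b)
        a, b = pair
        len_a = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = D[a][b] - len_a
        new = "(" + a + "," + b + ")"          # label of the new internal node
        D[new] = {new: 0.0}
        for c in nodes:
            if c not in pair:
                d = 0.5 * (D[a][c] + D[b][c] - D[a][b])
                D[new][c] = d
                D[c][new] = d
        edges.append((new, a, len_a))
        edges.append((new, b, len_b))
        nodes = [c for c in nodes if c not in pair] + [new]
    a, b = nodes
    edges.append((a, b, D[a][b]))              # connect the last two nodes
    return edges


# Small additive (tree-like) distance matrix on five taxa.
taxa = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
edges = neighbor_joining(taxa, dist)
for edge in edges:
    print(edge)
```

    The exhaustive scan over all pairs in every iteration is the step that a search heuristic would try to shorten.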

  6. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Meyer Patrick

    2007-01-01

    The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
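
    The MRMR selection principle described above can be sketched as a greedy loop that scores each candidate by its mutual information with the target minus its average mutual information with the variables already chosen. The code below illustrates only this feature selection step on discrete toy data; it is not the MRNET network construction, and the function names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def mrmr_select(features, target, k):
    """Greedy maximum relevance / minimum redundancy selection of k feature indices."""
    relevance = [mutual_information(f, target) for f in features]
    selected, remaining = [], list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s])
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)   # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: x1 duplicates x0, while x2 carries complementary information.
x0 = [0, 0, 1, 1]
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 2, 3]            # y = 2 * x0 + x2
chosen = mrmr_select([x0, x1, x2], y, k=2)
print(chosen)
```

    In the toy data all three variables are equally relevant on their own, so the redundancy term is what steers the second pick away from the duplicate.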

  7. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Patrick E. Meyer

    2007-06-01

    The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.

  8. Grouping preprocess for haplotype inference from SNP and CNV data

    International Nuclear Information System (INIS)

    Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Inoue, Masato; Kamatani, Naoyuki

    2009-01-01

    The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm for it needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details of the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.

  9. Grouping preprocess for haplotype inference from SNP and CNV data

    Energy Technology Data Exchange (ETDEWEB)

    Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Inoue, Masato [Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555 (Japan); Kamatani, Naoyuki, E-mail: masato.inoue@eb.waseda.ac.j [Institute of Rheumatology, Tokyo Women's Medical University, 10-22, Kawada-cho, Shinjuku-ku, Tokyo 162-0054 (Japan)

    2009-12-01

    The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm for it needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details of the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.

  10. Inferring network topology from complex dynamics

    International Nuclear Information System (INIS)

    Shandilya, Srinivas Gorur; Timme, Marc

    2011-01-01

    Inferring the network topology from dynamical observations is a fundamental problem pervading research on complex systems. Here, we present a simple, direct method for inferring the structural connection topology of a network, given an observation of one collective dynamical trajectory. The general theoretical framework is applicable to arbitrary network dynamical systems described by ordinary differential equations. No interference (external driving) is required and the type of dynamics is hardly restricted in any way. In particular, the observed dynamics may be arbitrarily complex; stationary, invariant or transient; synchronous or asynchronous and chaotic or periodic. Presupposing a knowledge of the functional form of the dynamical units and of the coupling functions between them, we present an analytical solution to the inverse problem of finding the network topology from observing a time series of state variables only. Robust reconstruction is achieved in any sufficiently long generic observation of the system. We extend our method to simultaneously reconstructing both the entire network topology and all parameters that appear linearly in the system's equations of motion. Reconstruction of network topology and system parameters is viable even in the presence of external noise that distorts the original dynamics substantially. The method provides a conceptually new step towards reconstructing a variety of real-world networks, including gene and protein interaction networks and neuronal circuits.
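
    When the functional form of the units and of the coupling is known, the unknown coupling strengths enter the equations of motion linearly, so they can be estimated from a trajectory by least squares. The sketch below illustrates this on a hypothetical three-oscillator Kuramoto network with known natural frequencies and noise-free Euler-generated data; it is a simplified illustration under those assumptions, not the authors' general method, and all names and parameter values are invented for the example.

```python
import math


def simulate(omega, J, theta0, dt, steps):
    """Euler-integrate d(theta_i)/dt = omega_i + sum_j J[i][j] sin(theta_j - theta_i)."""
    n = len(omega)
    traj = [list(theta0)]
    for _ in range(steps):
        th = traj[-1]
        traj.append([th[i] + dt * (omega[i] + sum(J[i][j] * math.sin(th[j] - th[i])
                                                    for j in range(n) if j != i))
                     for i in range(n)])
    return traj


def infer_couplings(traj, omega, dt):
    """Least-squares estimate of J for a 3-node network with known omega."""
    n = 3
    J_est = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a, b = [j for j in range(n) if j != i]
        saa = sab = sbb = say = sby = 0.0
        for th, th_next in zip(traj, traj[1:]):
            # Finite-difference rate minus the known local term leaves the coupling part.
            y = (th_next[i] - th[i]) / dt - omega[i]
            ga = math.sin(th[a] - th[i])
            gb = math.sin(th[b] - th[i])
            saa += ga * ga
            sab += ga * gb
            sbb += gb * gb
            say += ga * y
            sby += gb * y
        # Solve the 2x2 normal equations by Cramer's rule.
        det = saa * sbb - sab * sab
        J_est[i][a] = (say * sbb - sab * sby) / det
        J_est[i][b] = (saa * sby - sab * say) / det
    return J_est


# Hypothetical directed network; zero entries mean "no link".
omega = [1.0, 2.3, -0.7]
J_true = [[0.0, 0.5, 0.0],
          [0.2, 0.0, 0.4],
          [0.0, 0.3, 0.0]]
traj = simulate(omega, J_true, theta0=[0.1, 1.0, 2.0], dt=0.01, steps=2000)
J_est = infer_couplings(traj, omega, dt=0.01)
print([[round(v, 6) for v in row] for row in J_est])
```

    Entries of the estimate that come out near zero indicate absent links, which is how the topology is read off; with noisy or coarsely sampled data the derivative estimate would need more care than the forward difference used here.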

  11. Automated adaptive inference of phenomenological dynamical models

    Science.gov (United States)

    Daniels, Bryan

    Understanding the dynamics of biochemical systems can seem impossibly complicated at the microscopic level: detailed properties of every molecular species, including those that have not yet been discovered, could be important for producing macroscopic behavior. The profusion of data in this area has raised the hope that microscopic dynamics might be recovered in an automated search over possible models, yet the combinatorial growth of this space has limited these techniques to systems that contain only a few interacting species. We take a different approach inspired by coarse-grained, phenomenological models in physics. Akin to a Taylor series producing Hooke's Law, forgoing microscopic accuracy allows us to constrain the search over dynamical models to a single dimension. This makes it feasible to infer dynamics with very limited data, including cases in which important dynamical variables are unobserved. We name our method Sir Isaac after its ability to infer the dynamical structure of the law of gravitation given simulated planetary motion data. Applying the method to output from a microscopically complicated but macroscopically simple biological signaling model, it is able to adapt the level of detail to the amount of available data. Finally, using nematode behavioral time series data, the method discovers an effective switch between behavioral attractors after the application of a painful stimulus.

  12. Causal inference based on counterfactuals

    Directory of Open Access Journals (Sweden)

    Höfler M

    2005-09-01

    Background: The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion: This paper provides an overview on the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary: Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences poses several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept.

  13. System Support for Forensic Inference

    Science.gov (United States)

    Gehani, Ashish; Kirchner, Florent; Shankar, Natarajan

    Digital evidence is playing an increasingly important role in prosecuting crimes. The reasons are manifold: financially lucrative targets are now connected online, systems are so complex that vulnerabilities abound and strong digital identities are being adopted, making audit trails more useful. If the discoveries of forensic analysts are to hold up to scrutiny in court, they must meet the standard for scientific evidence. Software systems are currently developed without consideration of this fact. This paper argues for the development of a formal framework for constructing “digital artifacts” that can serve as proxies for physical evidence; a system so imbued would facilitate sound digital forensic inference. A case study involving a filesystem augmentation that provides transparent support for forensic inference is described.

  14. Nonparametric Bayesian inference in biostatistics

    CERN Document Server

    Müller, Peter

    2015-01-01

    As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...

  15. Bayesian Inference and Online Learning in Poisson Neuronal Networks.

    Science.gov (United States)

    Huang, Yanping; Rao, Rajesh P N

    2016-08-01

    Motivated by the growing evidence for Bayesian computation in the brain, we show how a two-layer recurrent network of Poisson neurons can perform both approximate Bayesian inference and learning for any hidden Markov model. The lower-layer sensory neurons receive noisy measurements of hidden world states. The higher-layer neurons infer a posterior distribution over world states via Bayesian inference from inputs generated by sensory neurons. We demonstrate how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in a higher-layer neuron represents a sample of a particular hidden world state. The spiking activity across the neural population approximates the posterior distribution over hidden states. In this model, variability in spiking is regarded not as a nuisance but as an integral feature that provides the variability necessary for sampling during inference. We demonstrate how the network can learn the likelihood model, as well as the transition probabilities underlying the dynamics, using a Hebbian learning rule. We present results illustrating the ability of the network to perform inference and learning for arbitrary hidden Markov models.

  16. Automatic physical inference with information maximizing neural networks

    Science.gov (United States)

    Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.

    2018-04-01

    Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.

  17. On Quantum Statistical Inference, II

    OpenAIRE

    Barndorff-Nielsen, O. E.; Gill, R. D.; Jupp, P. E.

    2003-01-01

    Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, theoretical developments in the theory of quantum measurements have brought the basic mathematical framework for the probability calculations much closer to that of classical probability theory. The present paper reviews this field and proposes and inte...

  18. Nonparametric predictive inference in reliability

    International Nuclear Information System (INIS)

    Coolen, F.P.A.; Coolen-Schrijner, P.; Yan, K.J.

    2002-01-01

    We introduce a recently developed statistical approach, called nonparametric predictive inference (NPI), to reliability. Bounds for the survival function for a future observation are presented. We illustrate how NPI can deal with right-censored data, and discuss aspects of competing risks. We present possible applications of NPI for Bernoulli data, and we briefly outline applications of NPI for replacement decisions. The emphasis is on introduction and illustration of NPI in reliability contexts; detailed mathematical justifications are presented elsewhere.

  19. Variational inference & deep learning : A new synthesis

    NARCIS (Netherlands)

    Kingma, D.P.

    2017-01-01

    In this thesis, Variational Inference and Deep Learning: A New Synthesis, we propose novel solutions to the problems of variational (Bayesian) inference, generative modeling, representation learning, semi-supervised learning, and stochastic optimization.

  20. Variational inference & deep learning: A new synthesis

    OpenAIRE

    Kingma, D.P.

    2017-01-01

    In this thesis, Variational Inference and Deep Learning: A New Synthesis, we propose novel solutions to the problems of variational (Bayesian) inference, generative modeling, representation learning, semi-supervised learning, and stochastic optimization.

  1. Continuous Integrated Invariant Inference, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The proposed project will develop a new technique for invariant inference and embed this and other current invariant inference and checking techniques in an...

  2. Variations on Bayesian Prediction and Inference

    Science.gov (United States)

    2016-05-09

    inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle

  3. Adaptive Inference on General Graphical Models

    OpenAIRE

    Acar, Umut A.; Ihler, Alexander T.; Mettu, Ramgopal; Sumer, Ozgur

    2012-01-01

    Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional ...

  4. Quantum Enhanced Inference in Markov Logic Networks.

    Science.gov (United States)

    Wittek, Peter; Gogolin, Christian

    2017-04-19

    Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.

  5. Quantum Enhanced Inference in Markov Logic Networks

    Science.gov (United States)

    Wittek, Peter; Gogolin, Christian

    2017-04-01

    Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.

  6. Indirect Inference for Stochastic Differential Equations Based on Moment Expansions

    KAUST Repository

    Ballesio, Marco

    2016-01-06

    We provide an indirect inference method to estimate the parameters of time-homogeneous scalar diffusion and jump diffusion processes. We obtain a system of ODEs that approximate the time evolution of the first two moments of the process by approximating the stochastic model with a second-order Taylor expansion of the SDE's infinitesimal generator in Dynkin's formula. This method allows a simple and efficient procedure to infer the parameters of such stochastic processes given the data by maximizing the likelihood of an approximating Gaussian process described by the two moment equations. Finally, we perform numerical experiments for two datasets arising from organic and inorganic fouling deposition phenomena.
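
    The moment-equation idea can be sketched on a case where the two moment ODEs close exactly. The example below assumes an Ornstein-Uhlenbeck process dX = theta*(mu - X) dt + sigma dW, chosen here only because its mean and variance obey linear ODEs; it is not the authors' code, and the function names, the forward-Euler integrator and the grid search are illustrative assumptions.

```python
import numpy as np

def ou_moments(x0, theta, mu, sigma, dt, n_steps=200):
    """Integrate dm/dt = theta*(mu - m) and dv/dt = sigma^2 - 2*theta*v with forward Euler.

    Starts from a known state x0 (mean x0, variance 0) and returns (m, v) after time dt.
    """
    m, v = float(x0), 0.0
    h = dt / n_steps
    for _ in range(n_steps):
        m, v = m + h * theta * (mu - m), v + h * (sigma**2 - 2.0 * theta * v)
    return m, v

def gaussian_loglik(xs, dt, theta, mu, sigma):
    """Log-likelihood of equally spaced observations under the Gaussian moment approximation."""
    total = 0.0
    for x_prev, x_next in zip(xs[:-1], xs[1:]):
        m, v = ou_moments(x_prev, theta, mu, sigma, dt)
        total += -0.5 * (np.log(2.0 * np.pi * v) + (x_next - m) ** 2 / v)
    return total

def fit_theta(xs, dt, mu, sigma, grid):
    """Pick the candidate theta on a grid that maximizes the approximate likelihood."""
    scores = [gaussian_loglik(xs, dt, th, mu, sigma) for th in grid]
    return grid[int(np.argmax(scores))]
```

    For a general drift and diffusion the right-hand sides would contain the Taylor-expansion terms described in the abstract, and a gradient-based optimizer would normally replace the grid search.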

  7. Systematic parameter inference in stochastic mesoscopic modeling

    Energy Technology Data Exchange (ETDEWEB)

    Lei, Huan; Yang, Xiu [Pacific Northwest National Laboratory, Richland, WA 99352 (United States); Li, Zhen [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States); Karniadakis, George Em, E-mail: george_karniadakis@brown.edu [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)

    2017-02-01

    We propose a method to efficiently determine the optimal coarse-grained force field in mesoscopic stochastic simulations of Newtonian fluid and polymer melt systems modeled by dissipative particle dynamics (DPD) and energy conserving dissipative particle dynamics (eDPD). The response surfaces of various target properties (viscosity, diffusivity, pressure, etc.) with respect to model parameters are constructed based on the generalized polynomial chaos (gPC) expansion using simulation results on sampling points (e.g., individual parameter sets). To alleviate the computational cost to evaluate the target properties, we employ the compressive sensing method to compute the coefficients of the dominant gPC terms given the prior knowledge that the coefficients are “sparse”. The proposed method shows comparable accuracy with the standard probabilistic collocation method (PCM) while it imposes a much weaker restriction on the number of the simulation samples especially for systems with high dimensional parametric space. Full access to the response surfaces within the confidence range enables us to infer the optimal force parameters given the desirable values of target properties at the macroscopic scale. Moreover, it enables us to investigate the intrinsic relationship between the model parameters, identify possible degeneracies in the parameter space, and optimize the model by eliminating model redundancies. The proposed method provides an efficient alternative approach for constructing mesoscopic models by inferring model parameters to recover target properties of the physical systems (e.g., from experimental measurements), where those force field parameters and formulation cannot be derived from the microscopic level in a straightforward way.

  8. Inferring time-varying network topologies from gene expression data.

    Science.gov (United States)

    Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas

    2007-01-01

    Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster--to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.

  9. Inferring time derivatives including cell growth rates using Gaussian processes

    Science.gov (United States)

    Swain, Peter S.; Stevenson, Keiran; Leary, Allen; Montano-Gutierrez, Luis F.; Clark, Ivan B. N.; Vogel, Jackie; Pilizota, Teuta

    2016-12-01

    Often the time derivative of a measured variable is of as much interest as the variable itself. For a growing population of biological cells, for example, the population's growth rate is typically more important than its size. Here we introduce a non-parametric method to infer first and second time derivatives as a function of time from time-series data. Our approach is based on Gaussian processes and applies to a wide range of data. In tests, the method is at least as accurate as others, but has several advantages: it estimates errors both in the inference and in any summary statistics, such as lag times, and allows interpolation with the corresponding error estimation. As illustrations, we infer growth rates of microbial cells, the rate of assembly of an amyloid fibril and both the speed and acceleration of two separating spindle pole bodies. Our algorithm should thus be broadly applicable.
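
    As a rough sketch of how a Gaussian process yields a time derivative: with a squared-exponential kernel, the derivative of the posterior mean is obtained by differentiating the cross-covariance with respect to the prediction time. The code below is not the authors' software; the kernel choice, the fixed hyperparameters (`amp`, `length`, `noise`) and the function names are assumptions, and in practice the hyperparameters would be fitted to the data rather than fixed.

```python
import numpy as np

def rbf(a, b, amp, length):
    """Squared-exponential kernel between two 1-D arrays of times."""
    d = a[:, None] - b[None, :]
    return amp**2 * np.exp(-0.5 * d**2 / length**2)

def gp_mean_and_derivative(t, y, t_star, amp=1.0, length=1.0, noise=1e-3):
    """Posterior mean of the latent function and of its first time derivative at t_star."""
    K = rbf(t, t, amp, length) + noise**2 * np.eye(len(t))
    alpha = np.linalg.solve(K, y)
    K_star = rbf(t_star, t, amp, length)
    # d/dt* of the kernel: -(t* - t) / length^2 times the kernel itself.
    d = t_star[:, None] - t[None, :]
    dK_star = -(d / length**2) * K_star
    return K_star @ alpha, dK_star @ alpha

# Usage: recover the derivative of a sine curve from samples of the curve alone.
t = np.linspace(0.0, 2.0 * np.pi, 40)
t_star = np.linspace(1.0, 5.0, 9)
mean, slope = gp_mean_and_derivative(t, np.sin(t), t_star)
```

    Error bars on the derivative, which the abstract highlights, would additionally require the posterior covariance of the derivative process; that part is omitted here.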

  10. More than one kind of inference: re-examining what's learned in feature inference and classification.

    Science.gov (United States)

    Sweller, Naomi; Hayes, Brett K

    2010-08-01

    Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.

  11. Generative inference for cultural evolution.

    Science.gov (United States)

    Kandler, Anne; Powell, Adam

    2018-04-05

    One of the major challenges in cultural evolution is to understand why and how various forms of social learning are used in human populations, both now and in the past. To date, much of the theoretical work on social learning has been done in isolation of data, and consequently many insights focus on revealing the learning processes or the distributions of cultural variants that are expected to have evolved in human populations. In population genetics, recent methodological advances have allowed a greater understanding of the explicit demographic and/or selection mechanisms that underlie observed allele frequency distributions across the globe, and their change through time. In particular, generative frameworks, often using coalescent-based simulation coupled with approximate Bayesian computation (ABC), have provided robust inferences on the human past, with no reliance on a priori assumptions of equilibrium. Here, we demonstrate the applicability and utility of generative inference approaches to the field of cultural evolution. The framework advocated here uses observed population-level frequency data directly to establish the likely presence or absence of particular hypothesized learning strategies. In this context, we discuss the problem of equifinality and argue that, in the light of sparse cultural data and the multiplicity of possible social learning processes, the exclusion of those processes inconsistent with the observed data might be the most instructive outcome. Finally, we summarize the findings of generative inference approaches applied to a number of case studies. This article is part of the theme issue 'Bridging cultural gaps: interdisciplinary studies in human cultural evolution'. © 2018 The Author(s).
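
    The simulate-and-compare logic behind approximate Bayesian computation can be sketched with plain rejection sampling. The toy model below (100 individuals who each adopt a cultural variant with probability p, of whom 30 are observed to have adopted it) is a hypothetical stand-in and not one of the case studies; every name and number in it is an assumption made for illustration.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, distance, eps, n_draws, rng):
    """Keep the prior draws whose simulated data fall within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= eps:
            accepted.append(theta)
    return np.array(accepted)

# Hypothetical toy problem: uniform prior on the adoption probability p.
rng = np.random.default_rng(1)
posterior = abc_rejection(
    observed=30,
    simulate=lambda p, r: r.binomial(100, p),
    prior_sample=lambda r: r.uniform(0.0, 1.0),
    distance=lambda a, b: abs(a - b),
    eps=2,
    n_draws=20000,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

    Parameter values that never produce data close to the observation are simply never accepted, which mirrors the point in the abstract that excluding processes inconsistent with the data can be the most instructive outcome.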

  12. The Probabilistic Convolution Tree: Efficient Exact Bayesian Inference for Faster LC-MS/MS Protein Inference

    Science.gov (United States)

    Serang, Oliver

    2014-01-01

    Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to and the space to where is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions. PMID:24626234
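
    The forward pass of such a tree can be sketched as follows: the distribution of a sum of independent count variables is built by convolving probability mass functions pairwise, layer by layer. This is only a minimal illustration of the idea and not the published implementation; the function name is an assumption, direct convolution is used instead of any faster transform, and the backward pass that would return posteriors on the individual inputs is omitted.

```python
import numpy as np

def sum_distribution(pmfs):
    """Distribution of the sum of independent non-negative integer variables.

    Each entry of pmfs is an array p with p[k] = P(X = k); the list must be non-empty.
    Neighbouring distributions are merged pairwise, giving a balanced tree of convolutions.
    """
    layer = [np.asarray(p, dtype=float) for p in pmfs]
    while len(layer) > 1:
        merged = [np.convolve(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            # An unpaired distribution is carried up to the next layer unchanged.
            merged.append(layer[-1])
        layer = merged
    return layer[0]

# Three fair coin flips: the number of heads follows Binomial(3, 0.5).
print(sum_distribution([[0.5, 0.5]] * 3))  # -> [0.125 0.375 0.375 0.125]
```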

  13. Making inference from wildlife collision data: inferring predator absence from prey strikes

    Directory of Open Access Journals (Sweden)

    Peter Caley

    2017-02-01

    Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using biologically realistic numerical response functions, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application.

  14. Making inference from wildlife collision data: inferring predator absence from prey strikes.

    Science.gov (United States)

    Caley, Peter; Hosack, Geoffrey R; Barry, Simon C

    2017-01-01

    Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using biologically realistic numerical response functions, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application.

  15. sick: The Spectroscopic Inference Crank

    Science.gov (United States)

    Casey, Andrew R.

    2016-03-01

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  16. SICK: THE SPECTROSCOPIC INFERENCE CRANK

    Energy Technology Data Exchange (ETDEWEB)

    Casey, Andrew R., E-mail: arc@ast.cam.ac.uk [Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA (United Kingdom)]

    2016-03-15

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  17. Inference in hybrid Bayesian networks

    International Nuclear Information System (INIS)

    Langseth, Helge; Nielsen, Thomas D.; Rumi, Rafael; Salmeron, Antonio

    2009-01-01

    Since the 1980s, Bayesian networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability techniques (like fault trees and reliability block diagrams). However, limitations in the BNs' calculation engine have prevented BNs from becoming equally popular for domains containing mixtures of both discrete and continuous variables (the so-called hybrid domains). In this paper we focus on these difficulties, and summarize some of the last decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability.

  18. SICK: THE SPECTROSCOPIC INFERENCE CRANK

    International Nuclear Information System (INIS)

    Casey, Andrew R.

    2016-01-01

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  19. Facility Activity Inference Using Radiation Networks

    Energy Technology Data Exchange (ETDEWEB)

    Rao, Nageswara S. [ORNL; Ramirez Aviles, Camila A. [ORNL

    2017-11-01

    We consider the problem of inferring the operational status of a reactor facility using measurements from a radiation sensor network deployed around the facility’s ventilation off-gas stack. The intensity of stack emissions decays with distance, and the sensor counts or measurements are inherently random with parameters determined by the intensity at the sensor’s location. We utilize the measurements to estimate the intensity at the stack, and use it in a one-sided Sequential Probability Ratio Test (SPRT) to infer on/off status of the reactor. We demonstrate the superior performance of this method over conventional majority fusers and individual sensors using (i) test measurements from a network of 21 NaI detectors, and (ii) effluence measurements collected at the stack of a reactor facility. We also analytically establish the superior detection performance of the network over individual sensors with fixed and adaptive thresholds by utilizing the Poisson distribution of the counts. We quantify the performance improvements of the network detection over individual sensors using the packing number of the intensity space.
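
    A sequential probability ratio test on Poisson counts can be sketched as below. Note that this is the textbook two-threshold Wald SPRT, swapped in for illustration; the one-sided variant and the intensity estimate from the sensor network described in the abstract are not reproduced. The rates `lam_off` and `lam_on`, the error levels and the function name are assumptions.

```python
import math

def poisson_sprt(counts, lam_off, lam_on, alpha=0.01, beta=0.01):
    """Wald SPRT deciding between Poisson rates lam_off (H0) and lam_on (H1).

    Returns (decision, number_of_counts_used); the decision is "on", "off" or "undecided".
    """
    upper = math.log((1.0 - beta) / alpha)   # cross upward: accept H1 ("on")
    lower = math.log(beta / (1.0 - alpha))   # cross downward: accept H0 ("off")
    llr = 0.0
    for n, k in enumerate(counts, start=1):
        # Log-likelihood ratio of one Poisson count k; the k! terms cancel.
        llr += k * math.log(lam_on / lam_off) - (lam_on - lam_off)
        if llr >= upper:
            return "on", n
        if llr <= lower:
            return "off", n
    return "undecided", len(counts)

print(poisson_sprt([10, 9, 11], lam_off=2.0, lam_on=10.0))  # -> ('on', 1)
print(poisson_sprt([2, 1, 3], lam_off=2.0, lam_on=10.0))    # -> ('off', 1)
```

    Because the test stops as soon as either threshold is crossed, well-separated rates lead to a decision after very few counts, while ambiguous counts keep the test running.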

  20. Coalescent methods for estimating phylogenetic trees.

    Science.gov (United States)

    Liu, Liang; Yu, Lili; Kubatko, Laura; Pearl, Dennis K; Edwards, Scott V

    2009-10-01

    We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.

  1. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    Directory of Open Access Journals (Sweden)

    James B Pettengill

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.

  2. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    Science.gov (United States)

    Pettengill, James B; Pightling, Arthur W; Baugher, Joseph D; Rand, Hugh; Strain, Errol

    2016-01-01

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
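
    To make the k-mer side of the comparison concrete, the sketch below computes a Jaccard distance between the k-mer sets of two sequences. It is a toy illustration with made-up sequences and a very small k, not the pipeline used in the study, and the function names are assumptions. It also shows why such distances treat absent data as informative: a k-mer missing from one set counts against similarity whether it is truly absent or simply was not assembled.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b, k=3):
    """One minus the fraction of k-mers shared between the two k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k: no evidence of any difference.
        return 0.0
    return 1.0 - len(ka & kb) / len(union)

# One substitution in an 8-base toy sequence removes two 3-mers and creates three new ones.
print(jaccard_distance("ACGTACGT", "ACGTTCGT"))  # 1 - 2/7
```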

  3. Inferring the conservative causal core of gene regulatory networks

    Directory of Open Access Journals (Sweden)

    Emmert-Streib Frank

    2010-09-01

    Full Text Available Abstract Background: Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. Results: In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. Conclusions: For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.

  4. Inferring the conservative causal core of gene regulatory networks.

    Science.gov (United States)

    Altay, Gökmen; Emmert-Streib, Frank

    2010-09-28

    Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.
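
    The abstract describes C3NET only as mutual information estimates combined with a maximization step. One plausible reading, sketched below in Python, is that each gene keeps at most one edge, to the partner with which it has the highest mutual information above a cutoff. The function name, the fixed threshold (standing in for a significance test), and the toy matrix are assumptions for illustration, not the published implementation.

```python
def max_mi_edges(mi: list[list[float]], threshold: float = 0.0) -> set[tuple[int, int]]:
    """For each gene, keep only the edge to its highest-MI partner.

    mi is assumed to be a symmetric matrix of pairwise mutual information
    estimates; values at or below threshold are treated as non-significant.
    """
    n = len(mi)
    edges: set[tuple[int, int]] = set()
    for i in range(n):
        best_j, best_value = None, threshold
        for j in range(n):
            if j != i and mi[i][j] > best_value:
                best_j, best_value = j, mi[i][j]
        if best_j is not None:
            # Store edges as sorted pairs so the network is undirected.
            edges.add((min(i, best_j), max(i, best_j)))
    return edges


# Hypothetical mutual information matrix for three genes.
mi_matrix = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.5],
    [0.2, 0.5, 0.0],
]
network = max_mi_edges(mi_matrix)
```

    Because every gene contributes at most one edge, the result has at most as many edges as genes, which is one way to obtain a conservative network.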

  5. Inferring probabilistic stellar rotation periods using Gaussian processes

    Science.gov (United States)

    Angus, Ruth; Morton, Timothy; Aigrain, Suzanne; Foreman-Mackey, Daniel; Rajpaul, Vinesh

    2018-02-01

    Variability in the light curves of spotted, rotating stars is often non-sinusoidal and quasi-periodic - spots move on the stellar surface and have finite lifetimes, causing stellar flux variations to slowly shift in phase. A strictly periodic sinusoid therefore cannot accurately model a rotationally modulated stellar light curve. Physical models of stellar surfaces have many drawbacks preventing effective inference, such as highly degenerate or high-dimensional parameter spaces. In this work, we test an appropriate effective model: a Gaussian Process with a quasi-periodic covariance kernel function. This highly flexible model allows sampling of the posterior probability density function of the periodic parameter, marginalizing over the other kernel hyperparameters using a Markov Chain Monte Carlo approach. To test the effectiveness of this method, we infer rotation periods from 333 simulated stellar light curves, demonstrating that the Gaussian process method produces periods that are more accurate than both a sine-fitting periodogram and an autocorrelation function method. We also demonstrate that it works well on real data, by inferring rotation periods for 275 Kepler stars with previously measured periods. We provide a table of rotation periods for these and many more, altogether 1102 Kepler objects of interest, and their posterior probability density function samples. Because this method delivers posterior probability density functions, it will enable hierarchical studies involving stellar rotation, particularly those involving population modelling, such as inferring stellar ages, obliquities in exoplanet systems, or characterizing star-planet interactions. The code used to implement this method is available online.
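
    The abstract names a quasi-periodic covariance kernel without giving its form. A commonly used choice, assumed here, multiplies a squared-exponential decay by a periodic term. The Python sketch below evaluates that kernel for a pair of observation times; the parameter names are illustrative and not taken from the authors' code.

```python
import math


def quasi_periodic_kernel(t1: float, t2: float, amplitude: float,
                          length_scale: float, gamma: float, period: float) -> float:
    """Assumed kernel: A * exp(-dt^2 / (2 l^2) - gamma * sin^2(pi * dt / P))."""
    dt = t1 - t2
    # Squared-exponential term: correlations decay as spots evolve.
    decay = dt ** 2 / (2.0 * length_scale ** 2)
    # Periodic term: correlations recur at multiples of the rotation period.
    periodic = gamma * math.sin(math.pi * dt / period) ** 2
    return amplitude * math.exp(-decay - periodic)


# Covariance between two observations one assumed rotation period apart.
k_one_period = quasi_periodic_kernel(0.0, 5.0, amplitude=2.0,
                                     length_scale=10.0, gamma=1.5, period=5.0)
```

    At a lag of exactly one period the periodic term vanishes, so only the slow decay reduces the covariance. That is how such a kernel can follow a signal that repeats but gradually shifts in phase.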

  6. Subjective randomness as statistical inference.

    Science.gov (United States)

    Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

    2018-06-01

    Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.

  7. A grammar inference approach for predicting kinase specific phosphorylation sites.

    Science.gov (United States)

    Datta, Sutapa; Mukhopadhyay, Subhasis

    2015-01-01

    Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects that are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, the manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods have been proposed by various researchers for predicting phosphorylation sites, but there remains much scope for improvement. In this communication, we present a simple and novel method based on the Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm, Alergia, to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs better, which indicates that our method is robust and has potential for predicting the phosphorylation sites in a kinase specific manner.

  8. A Grammar Inference Approach for Predicting Kinase Specific Phosphorylation Sites

    Science.gov (United States)

    Datta, Sutapa; Mukhopadhyay, Subhasis

    2015-01-01

    Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects that are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, the manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods have been proposed by various researchers for predicting phosphorylation sites, but there remains much scope for improvement. In this communication, we present a simple and novel method based on the Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm, Alergia, to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs better, which indicates that our method is robust and has potential for predicting the phosphorylation sites in a kinase specific manner. PMID:25886273

  9. Gene expression inference with deep learning.

    Science.gov (United States)

    Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui

    2016-06-15

    Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationship between expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX CONTACT: xhx@ics.uci.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
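
    The headline comparison in this abstract is a relative improvement in mean absolute error averaged across genes. The Python sketch below shows how such a figure can be computed from per-gene errors. The function names and the tiny data set are assumptions for illustration; the numbers are not from the paper.

```python
def per_gene_mae(y_true: list[list[float]], y_pred: list[list[float]]) -> list[float]:
    """Mean absolute error for each gene (rows are samples, columns are genes)."""
    n_samples, n_genes = len(y_true), len(y_true[0])
    return [
        sum(abs(y_true[s][g] - y_pred[s][g]) for s in range(n_samples)) / n_samples
        for g in range(n_genes)
    ]


def relative_improvement(baseline_error: float, candidate_error: float) -> float:
    """Fraction by which the candidate error is lower than the baseline error."""
    return (baseline_error - candidate_error) / baseline_error


# Hypothetical expression values for two samples and two target genes.
truth = [[1.0, 2.0], [3.0, 4.0]]
baseline_pred = [[1.5, 2.0], [3.5, 5.0]]    # stands in for linear regression
candidate_pred = [[1.25, 2.0], [3.25, 4.5]]  # stands in for the deep model

baseline_mae = per_gene_mae(truth, baseline_pred)
candidate_mae = per_gene_mae(truth, candidate_pred)
overall_gain = relative_improvement(sum(baseline_mae) / len(baseline_mae),
                                    sum(candidate_mae) / len(candidate_mae))
# Share of genes where the candidate has strictly lower error than the baseline.
genes_improved = sum(c < b for b, c in zip(baseline_mae, candidate_mae)) / len(baseline_mae)
```

    The two summaries answer different questions: the overall gain reflects the average error, while the share of improved genes is the gene-wise comparison the abstract reports separately.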

  10. Simultaneous inference for model averaging of derived parameters

    DEFF Research Database (Denmark)

    Jensen, Signe Marie; Ritz, Christian

    2015-01-01

    Model averaging is a useful approach for capturing uncertainty due to model selection. Currently, this uncertainty is often quantified by means of approximations that do not easily extend to simultaneous inference. Moreover, in practice there is a need for both model averaging and simultaneous...... inference for derived parameters calculated in an after-fitting step. We propose a method for obtaining asymptotically correct standard errors for one or several model-averaged estimates of derived parameters and for obtaining simultaneous confidence intervals that asymptotically control the family...

  11. Reliability of dose volume constraint inference from clinical data

    DEFF Research Database (Denmark)

    Lutz, C M; Møller, D S; Hoffmann, L

    2017-01-01

    Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background...... was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an 'ideal' cohort was generated where the most predictive model was equal to the postulated model. A bootstrap...

  12. Bayesian inference with information content model check for Langevin equations

    DEFF Research Database (Denmark)

    Krog, Jens F. C.; Lomholt, Michael Andersen

    2017-01-01

    The Bayesian data analysis framework has been proven to be a systematic and effective method of parameter inference and model selection for stochastic processes. In this work we introduce an information content model check which may serve as a goodness-of-fit, like the chi-square procedure...

  13. Inferring Stop-Locations from WiFi

    DEFF Research Database (Denmark)

    Wind, David Kofoed; Sapiezynski, Piotr; Furman, Magdalena Anna

    2016-01-01

    methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select...

  14. Segmentation, Inference and Classification of Partially Overlapping Nanoparticles

    KAUST Repository

    Park, Chiwoo; Huang, J. Z.; Ji, J. X.; Ding, Yu

    2013-01-01

    an agglomerate of overlapping nano-objects; infer the particle's missing contours; and ultimately, classify the particles by shape based on their complete contours. Our specific method adopts a two-stage approach: the first stage executes the task of particle

  15. A Fast Iterative Bayesian Inference Algorithm for Sparse Channel Estimation

    DEFF Research Database (Denmark)

    Pedersen, Niels Lovmand; Manchón, Carles Navarro; Fleury, Bernard Henri

    2013-01-01

    representation of the Bessel K probability density function; a highly efficient, fast iterative Bayesian inference method is then applied to the proposed model. The resulting estimator outperforms other state-of-the-art Bayesian and non-Bayesian estimators, either by yielding lower mean squared estimation error...

  16. Fisher information and statistical inference for phase-type distributions

    DEFF Research Database (Denmark)

    Bladt, Mogens; Esparza, Luz Judith R; Nielsen, Bo Friis

    2011-01-01

    This paper is concerned with statistical inference for both continuous and discrete phase-type distributions. We consider maximum likelihood estimation, where traditionally the expectation-maximization (EM) algorithm has been employed. Certain numerical aspects of this method are revised and we...

  17. Implementing and analyzing the multi-threaded LP-inference

    Science.gov (United States)

    Bolotova, S. Yu; Trofimenko, E. V.; Leschinskaya, M. V.

    2018-03-01

    The logical production equations provide new possibilities for backward inference optimization in intelligent production-type systems. The strategy of relevant backward inference is aimed at minimizing the number of queries to an external information source (either a database or an interactive user). The idea of the method is based on computing an initial set of preimages and searching for the true preimage. The execution of each stage can be organized independently and in parallel, and the actual work at a given stage can also be distributed between parallel computers. This paper is devoted to parallel algorithms for relevant inference based on an advanced “pipeline” scheme of parallel computation, which makes it possible to increase the degree of parallelism. The author also provides some details of the LP-structures implementation.

  18. SPEEDY: An Eclipse-based IDE for invariant inference

    Directory of Open Access Journals (Sweden)

    David R. Cok

    2014-04-01

    Full Text Available SPEEDY is an Eclipse-based IDE for exploring techniques that assist users in generating correct specifications, particularly including invariant inference algorithms and tools. It integrates with several back-end tools that propose invariants and will incorporate published algorithms for inferring object and loop invariants. Though the architecture is language-neutral, current SPEEDY targets C programs. Building and using SPEEDY has confirmed earlier experience demonstrating the importance of showing and editing specifications in the IDEs that developers customarily use, automating as much of the production and checking of specifications as possible, and showing counterexample information directly in the source code editing environment. As in previous work, automation of specification checking is provided by back-end SMT solvers. However, reducing the effort demanded of software developers using formal methods also requires a GUI design that guides users in writing, reviewing, and correcting specifications and automates specification inference.

  19. Inference of population history and patterns from molecular data

    DEFF Research Database (Denmark)

    Tataru, Paula

    , the existing mathematical models and computational methods need to be reformulated. I address this from an inference perspective in two areas of bioinformatics. Population genetics studies the influence exerted by various factors on the dynamics of a population's genetic variation. These factors cover...... evolutionary forces, such as mutation and selection, but also changes in population size. The aim in population genetics is to untangle the history of a population from observed genetic variation. This subject is dominated by two dual models, the Wright-Fisher and coalescent. I first introduce a new...... approximation to the Wright-Fisher model, which I show to accurately infer split times between populations. This approximation can potentially be applied for inference of mutation rates and selection coefficients. I then illustrate how the coalescent process is the natural framework for detecting traces...

  20. Technical Note: How to use Winbugs to infer animal models

    DEFF Research Database (Denmark)

    Damgaard, Lars Holm

    2007-01-01

    This paper deals with Bayesian inferences of animal models using Gibbs sampling. First, we suggest a general and efficient method for updating additive genetic effects, in which the computational cost is independent of the pedigree depth and increases linearly only with the size of the pedigree....... Second, we show how this approach can be used to draw inferences from a wide range of animal models using the computer package Winbugs. Finally, we illustrate the approach in a simulation study, in which the data are generated and analyzed using Winbugs according to a linear model with i.i.d errors...... having Student's t distributions. In conclusion, Winbugs can be used to make inferences in small-sized, quantitative, genetic data sets applying a wide range of animal models that are not yet standard in the animal breeding literature...

  1. Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.

    Science.gov (United States)

    Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping

    2015-06-07

    Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.

  2. Bayesian inference and updating of reliability data

    International Nuclear Information System (INIS)

    Sabri, Z.A.; Cullingford, M.C.; David, H.T.; Husseiny, A.A.

    1980-01-01

    A Bayes methodology for inference of reliability values using available but scarce current data is discussed. The method can be used to update failure rates as more information becomes available from field experience, assuming that the performance of a given component (or system) exhibits a nonhomogeneous Poisson process. Bayes' theorem is used to summarize the historical evidence and current component data in the form of a posterior distribution suitable for prediction and for smoothing or interpolation. An example is given. It may be appropriate to apply the methodology developed here to human error data, in which case the exponential model might be used to describe the learning behavior of the operator or maintenance crew personnel
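
    A common textbook form of the updating described here is the conjugate gamma-Poisson model, sketched below in Python. It assumes a constant failure rate, which is a simplification of the nonhomogeneous Poisson process named in the abstract. The function names and numbers are illustrative only.

```python
def update_gamma_prior(alpha: float, beta: float,
                       failures: int, exposure_hours: float) -> tuple[float, float]:
    """Update a Gamma(alpha, beta) prior on a failure rate with new field data.

    Assumes failures follow a Poisson process with a constant rate, so the
    posterior is Gamma(alpha + failures, beta + exposure_hours).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("prior parameters must be positive")
    if failures < 0 or exposure_hours < 0:
        raise ValueError("observed data must be non-negative")
    return alpha + failures, beta + exposure_hours


def posterior_mean(alpha: float, beta: float) -> float:
    """Mean of a Gamma(alpha, beta) distribution with rate parameter beta."""
    return alpha / beta


# Hypothetical historical evidence: roughly 2 failures in 1000 operating hours.
prior = (2.0, 1000.0)
# Hypothetical current data: 3 failures observed over 4000 further hours.
posterior = update_gamma_prior(*prior, failures=3, exposure_hours=4000.0)
rate_estimate = posterior_mean(*posterior)
```

    The posterior can serve as the prior for the next batch of field data, which is how the estimate keeps updating as experience accumulates.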

  3. Automatic inference of indexing rules for MEDLINE

    Directory of Open Access Journals (Sweden)

    Shooshan Sonya E

    2008-11-01

    Full Text Available Abstract Background: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. Methods: In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. Results: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. Conclusion: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI.

  4. Progression inference for somatic mutations in cancer

    Directory of Open Access Journals (Sweden)

    Leif E. Peterson

    2017-04-01

    Full Text Available Computational methods were employed to determine progression inference of genomic alterations in commonly occurring cancers. Using cross-sectional TCGA data, we computed evolutionary trajectories involving selectivity relationships among pairs of gene-specific genomic alterations such as somatic mutations, deletions, amplifications, downregulation, and upregulation among the top 20 driver genes associated with each cancer. Results indicate that the majority of hierarchies involved TP53, PIK3CA, ERBB2, APC, KRAS, EGFR, IDH1, VHL, etc. Research into the order and accumulation of genomic alterations among cancer driver genes will continue to increase as the costs of next-generation sequencing subside, and personalized/precision medicine incorporates whole-genome scans into the diagnosis and treatment of cancer. Keywords: Oncology, Cancer research, Genetics, Computational biology

  5. Supplier Selection Using Fuzzy Inference System

    Directory of Open Access Journals (Sweden)

    Hamidreza Kadhodazadeh

    2014-01-01

    Full Text Available Suppliers are one of the most vital parts of supply chain whose operation has significant indirect effect on customer satisfaction. Since customer's expectations from organization are different, organizations should consider different standards, respectively. There are many researches in this field using different standards and methods in recent years. The purpose of this study is to propose an approach for choosing a supplier in a food manufacturing company considering cost, quality, service, type of relationship and structure standards of the supplier organization. To evaluate supplier according to the above standards, the fuzzy inference system has been used. Input data of this system includes supplier's score in any standard that is achieved by AHP approach and the output is final score of each supplier. Finally, a supplier has been selected that although is not the best in price and quality, has achieved good score in all of the standards.

  6. Lower complexity bounds for lifted inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred

    2015-01-01

    instances of the model. Numerous approaches for such “lifted inference” techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show...... the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established earlier in Jaeger (2000; Jaeger, M. 2000. On the complexity of inference about...... that under the assumption that NETIME≠ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier-, and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference and for knowledge bases not containing...

  7. Type inference for correspondence types

    DEFF Research Database (Denmark)

    Hüttel, Hans; Gordon, Andy; Hansen, Rene Rydhof

    2009-01-01

    We present a correspondence type/effect system for authenticity in a π-calculus with polarized channels, dependent pair types and effect terms and show how one may, given a process P and an a priori type environment E, generate constraints that are formulae in the Alternating Least Fixed......-Point (ALFP) logic. We then show how a reasonable model of the generated constraints yields a type/effect assignment such that P becomes well-typed with respect to E if and only if this is possible. The formulae generated satisfy a finite model property; a system of constraints is satisfiable if and only...... if it has a finite model. As a consequence, we obtain the result that type/effect inference in our system is polynomial-time decidable....

  8. Bayesian inference for Markov jump processes with informative observations.

    Science.gov (United States)

    Golightly, Andrew; Wilkinson, Darren J

    2015-04-01

    In this paper we consider the problem of parameter inference for Markov jump process (MJP) representations of stochastic kinetic models. Since transition probabilities are intractable for most processes of interest yet forward simulation is straightforward, Bayesian inference typically proceeds through computationally intensive methods such as (particle) MCMC. Such methods ostensibly require the ability to simulate trajectories from the conditioned jump process. When observations are highly informative, use of the forward simulator is likely to be inefficient and may even preclude an exact (simulation based) analysis. We therefore propose three methods for improving the efficiency of simulating conditioned jump processes. A conditioned hazard is derived based on an approximation to the jump process, and used to generate end-point conditioned trajectories for use inside an importance sampling algorithm. We also adapt a recently proposed sequential Monte Carlo scheme to our problem. Essentially, trajectories are reweighted at a set of intermediate time points, with more weight assigned to trajectories that are consistent with the next observation. We consider two implementations of this approach, based on two continuous approximations of the MJP. We compare these constructs for a simple tractable jump process before using them to perform inference for a Lotka-Volterra system. The best performing construct is used to infer the parameters governing a simple model of motility regulation in Bacillus subtilis.

  9. Inference in partially identified models with many moment inequalities using Lasso

    DEFF Research Database (Denmark)

    Bugni, Federico A.; Caner, Mehmet; Kock, Anders Bredahl

    This paper considers the problem of inference in a partially identified moment (in)equality model with possibly many moment inequalities. Our contribution is to propose a novel two-step inference method based on the combination of two ideas. On the one hand, our test statistic and critical...

  10. An efficient forward–reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro

    2016-01-01

    then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve

  11. Human brain lesion-deficit inference remapped.

    Science.gov (United States)

    Mah, Yee-Haur; Husain, Masud; Rees, Geraint; Nachev, Parashkev

    2014-09-01

    Our knowledge of the anatomical organization of the human brain in health and disease draws heavily on the study of patients with focal brain lesions. Historically the first method of mapping brain function, it is still potentially the most powerful, establishing the necessity of any putative neural substrate for a given function or deficit. Great inferential power, however, carries a crucial vulnerability: without stronger alternatives any consistent error cannot be easily detected. A hitherto unexamined source of such error is the structure of the high-dimensional distribution of patterns of focal damage, especially in ischaemic injury (the commonest aetiology in lesion-deficit studies), where the anatomy is naturally shaped by the architecture of the vascular tree. This distribution is so complex that analysis of lesion data sets of conventional size cannot illuminate its structure, leaving us in the dark about the presence or absence of such error. To examine this crucial question we assembled the largest known set of focal brain lesions (n = 581), derived from unselected patients with acute ischaemic injury (mean age = 62.3 years, standard deviation = 17.8, male:female ratio = 0.547), visualized with diffusion-weighted magnetic resonance imaging, and processed with validated automated lesion segmentation routines. High-dimensional analysis of this data revealed a hidden bias within the multivariate patterns of damage that will consistently distort lesion-deficit maps, displacing inferred critical regions from their true locations, in a manner opaque to replication. Quantifying the size of this mislocalization demonstrates that past lesion-deficit relationships estimated with conventional inferential methodology are likely to be significantly displaced, by a magnitude dependent on the unknown underlying lesion-deficit relationship itself. Past studies therefore cannot be retrospectively corrected, except by new knowledge that would render them redundant

  12. Reliability of dose volume constraint inference from clinical data

    Science.gov (United States)

    Lutz, C. M.; Møller, D. S.; Hoffmann, L.; Knap, M. M.; Alber, M.

    2017-04-01

    Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an ‘ideal’ cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a ‘non-ideal’ cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates >85% were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.

  13. Functional networks inference from rule-based machine learning models.

    Science.gov (United States)

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables that are often different from, or complementary to, those captured by similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes that genes used together within a rule-based machine learning model to classify the samples might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in the context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The

  14. Inferring Demographic History Using Two-Locus Statistics.

    Science.gov (United States)

    Ragsdale, Aaron P; Gutenkunst, Ryan N

    2017-06-01

    Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
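
    To make the notion of a two-locus statistic concrete, the sketch below computes the classical linkage disequilibrium coefficient D and its normalized form r^2 for one pair of biallelic loci. It is only an illustration of the kind of quantity involved, not the aggregated two-locus statistics or the composite likelihood of the paper; the function name and the 0/1 haplotype encoding are assumptions.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for a pair of biallelic loci.

    haplotypes : list of (a, b) tuples, one per sampled chromosome, where
                 a and b are 0/1 allele indicators at the two loci
                 (an encoding assumed for this example).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # D = p_AB - p_A * p_B
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    print(ld_stats([(1, 1), (1, 1), (0, 0), (0, 0)]))  # fully linked pair
    print(ld_stats([(1, 1), (1, 0), (0, 1), (0, 0)]))  # independent pair
```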

  15. Causal inference in survival analysis using pseudo-observations.

    Science.gov (United States)

    Andersen, Per K; Syriopoulou, Elisavet; Parner, Erik T

    2017-07-30

    Causal inference for non-censored response variables, such as binary or quantitative outcomes, is often based on either (1) direct standardization ('G-formula') or (2) inverse probability of treatment assignment weights ('propensity score'). To do causal inference in survival analysis, one needs to address right-censoring, and often, special techniques are required for that purpose. We will show how censoring can be dealt with 'once and for all' by means of so-called pseudo-observations when doing causal inference in survival analysis. The pseudo-observations can be used as a replacement of the outcomes without censoring when applying 'standard' causal inference methods, such as (1) or (2) above. We study this idea for estimating the average causal effect of a binary treatment on the survival probability, the restricted mean lifetime, and the cumulative incidence in a competing risks situation. The methods will be illustrated in a small simulation study and via a study of patients with acute myeloid leukemia who received either myeloablative or non-myeloablative conditioning before allogeneic hematopoietic cell transplantation. We will estimate the average causal effect of the conditioning regime on outcomes such as the 3-year overall survival probability and the 3-year risk of chronic graft-versus-host disease. Copyright © 2017 John Wiley & Sons, Ltd.
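
    A pseudo-observation for the survival probability at time t is usually built by a leave-one-out (jackknife) construction on the Kaplan-Meier estimate: n times the full-sample estimate minus (n - 1) times the estimate with subject i removed. The sketch below illustrates that construction under the stated assumptions; the function names are hypothetical and the code is not taken from the paper.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t).

    times  : list of follow-up times
    events : list of 1 (event observed) or 0 (right-censored)
    """
    surv = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for tj in event_times:
        at_risk = sum(1 for ti in times if ti >= tj)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == tj and ei == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations for S(t), one value per subject."""
    n = len(times)
    full = kaplan_meier(times, events, t)
    pseudo = []
    for i in range(n):
        # Re-estimate S(t) with subject i left out.
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * kaplan_meier(loo_times, loo_events, t))
    return pseudo


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0, 5.0]
    events = [1, 0, 1, 1, 0]
    print(pseudo_observations(times, events, 2.5))
```

    Without censoring the pseudo-observation reduces to the indicator that subject i survived beyond t, which is why it can stand in for the uncensored outcome in a standard method.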

  16. Genealogical and evolutionary inference with the human Y chromosome.

    Science.gov (United States)

    Stumpf, M P; Goldstein, D B

    2001-03-02

    Population genetics has emerged as a powerful tool for unraveling human history. In addition to the study of mitochondrial and autosomal DNA, attention has recently focused on Y-chromosome variation. Ambiguities and inaccuracies in data analysis, however, pose an important obstacle to further development of the field. Here we review the methods available for genealogical inference using Y-chromosome data. Approaches can be divided into those that do and those that do not use an explicit population model in genealogical inference. We describe the strengths and weaknesses of these model-based and model-free approaches, as well as difficulties associated with the mutation process that affect both methods. In the case of genealogical inference using microsatellite loci, we use coalescent simulations to show that relatively simple generalizations of the mutation process can greatly increase the accuracy of genealogical inference. Because model-free and model-based approaches have different biases and limitations, we conclude that there is considerable benefit in the continued use of both types of approaches.

  17. LAIT: a local ancestry inference toolkit.

    Science.gov (United States)

    Hui, Daniel; Fang, Zhou; Lin, Jerome; Duan, Qing; Li, Yun; Hu, Ming; Chen, Wei

    2017-09-06

    Inferring local ancestry in individuals of mixed ancestry has many applications, most notably in identifying disease-susceptible loci that vary among different ethnic groups. Many software packages are available for inferring local ancestry in admixed individuals. However, most of these existing software packages require specific formatted input files and generate output files in various types, yielding practical inconvenience. We developed a tool set, Local Ancestry Inference Toolkit (LAIT), which can convert standardized files into software-specific input file formats as well as standardize and summarize inference results for four popular local ancestry inference software: HAPMIX, LAMP, LAMP-LD, and ELAI. We tested LAIT using both simulated and real data sets and demonstrated that LAIT provides convenience to run multiple local ancestry inference software. In addition, we evaluated the performance of local ancestry software among different supported software packages, mainly focusing on inference accuracy and computational resources used. We provided a toolkit to facilitate the use of local ancestry inference software, especially for users with limited bioinformatics background.

  18. Forward and backward inference in spatial cognition.

    Directory of Open Access Journals (Sweden)

    Will D Penny

    This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, we propose that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
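
    The forward-inference step described above, combining a path-integration prediction with sensory evidence, can be sketched as one step of a discrete Bayesian filter. The discrete location grid, the function name and the toy numbers are assumptions made for illustration; the paper's own model may use a different state space.

```python
def forward_step(belief, transition, likelihood):
    """One step of forward inference over discrete locations.

    belief     : current probability of each location
    transition : transition[i][j] = P(next = j | current = i), standing in
                 for the path-integration prediction (an assumed encoding)
    likelihood : likelihood[j] = P(sensory observation | location j)

    Returns the updated belief and the evidence P(observation | past).
    """
    n = len(belief)
    # Predict: push the belief through the motion model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: weight each location by how well it explains the observation.
    unnorm = [predicted[j] * likelihood[j] for j in range(n)]
    evidence = sum(unnorm)
    posterior = [u / evidence for u in unnorm]
    return posterior, evidence


if __name__ == "__main__":
    # Three locations on a ring; the agent moves one step to the right.
    shift_right = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
    belief, evidence = forward_step([1.0, 0.0, 0.0], shift_right, [0.5, 0.5, 0.0])
    print(belief, evidence)
```

    The evidence returned at each step is the quantity that would be compared across hypotheses when working out which environment, or which option, best explains the observations.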

  19. Generative Inferences Based on Learned Relations

    Science.gov (United States)

    Chen, Dawn; Lu, Hongjing; Holyoak, Keith J.

    2017-01-01

    A key property of relational representations is their "generativity": From partial descriptions of relations between entities, additional inferences can be drawn about other entities. A major theoretical challenge is to demonstrate how the capacity to make generative inferences could arise as a result of learning relations from…

  20. Inference in models with adaptive learning

    NARCIS (Netherlands)

    Chevillon, G.; Massmann, M.; Mavroeidis, S.

    2010-01-01

    Identification of structural parameters in models with adaptive learning can be weak, causing standard inference procedures to become unreliable. Learning also induces persistent dynamics, and this makes the distribution of estimators and test statistics non-standard. Valid inference can be

  1. Fiducial inference - A Neyman-Pearson interpretation

    NARCIS (Netherlands)

    Salome, D; VonderLinden, W; Dose; Fischer, R; Preuss, R

    1999-01-01

    Fisher's fiducial argument is a tool for deriving inferences in the form of a probability distribution on the parameter space, not based on Bayes's Theorem. Lindley established that in exceptional situations fiducial inferences coincide with posterior distributions; in the other situations fiducial

  2. Uncertainty in prediction and in inference

    NARCIS (Netherlands)

    Hilgevoord, J.; Uffink, J.

    1991-01-01

    The concepts of uncertainty in prediction and inference are introduced and illustrated using the diffraction of light as an example. The close relationship between the concepts of uncertainty in inference and resolving power is noted. A general quantitative measure of uncertainty in

  3. The Impact of Disablers on Predictive Inference

    Science.gov (United States)

    Cummins, Denise Dellarosa

    2014-01-01

    People consider alternative causes when deciding whether a cause is responsible for an effect (diagnostic inference) but appear to neglect them when deciding whether an effect will occur (predictive inference). Five experiments were conducted to test a 2-part explanation of this phenomenon: namely, (a) that people interpret standard predictive…

  4. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark

    2006-01-01

    We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference...

  5. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan

    2004-01-01

    We describe a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...

  6. Connectivity inference from neural recording data: Challenges, mathematical bases and research directions.

    Science.gov (United States)

    Magrans de Abril, Ildefons; Yoshimoto, Junichiro; Doya, Kenji

    2018-06-01

    This article presents a review of computational methods for connectivity inference from neural activity data derived from multi-electrode recordings or fluorescence imaging. We first identify biophysical and technical challenges in connectivity inference along the data processing pipeline. We then review connectivity inference methods based on two major mathematical foundations, namely, descriptive model-free approaches and generative model-based approaches. We investigate representative studies in both categories and clarify which challenges have been addressed by which method. We further identify critical open issues and possible research directions. Copyright © 2018 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  7. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  8. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  9. Active inference and epistemic value.

    Science.gov (United States)

    Friston, Karl; Rigoli, Francesco; Ognibene, Dimitri; Mathys, Christoph; Fitzgerald, Thomas; Pezzulo, Giovanni

    2015-01-01

    We offer a formal treatment of choice behavior based on the premise that agents minimize the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (or intrinsic) value. Minimizing expected free energy is therefore equivalent to maximizing extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximizing information gain or intrinsic value (or reducing uncertainty about the causes of valuable outcomes). The resulting scheme resolves the exploration-exploitation dilemma: Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value. This is formally consistent with the Infomax principle, generalizing formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (Kullback-Leibler) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. This article focuses on the basic theory, illustrating the ideas with simulations. A key aspect of these simulations is the similarity between precision updates and dopaminergic discharges observed in conditioning paradigms.

  10. Bootstrap-Based Inference for Cube Root Consistent Estimators

    DEFF Research Database (Denmark)

    Cattaneo, Matias D.; Jansson, Michael; Nagasawa, Kenichi

    This note proposes a consistent bootstrap-based distributional approximation for cube root consistent estimators such as the maximum score estimator of Manski (1975) and the isotonic density estimator of Grenander (1956). In both cases, the standard nonparametric bootstrap is known to be inconsistent. Our method restores consistency of the nonparametric bootstrap by altering the shape of the criterion function defining the estimator whose distribution we seek to approximate. This modification leads to a generic and easy-to-implement resampling method for inference that is conceptually distinct from other available distributional approximations based on some form of modified bootstrap. We offer simulation evidence showcasing the performance of our inference method in finite samples. An extension of our methodology to general M-estimation problems is also discussed.

  11. Behavior Intention Derivation of Android Malware Using Ontology Inference

    Directory of Open Access Journals (Sweden)

    Jian Jiao

    2018-01-01

    Previous research on Android malware has mainly focused on malware detection, and because malware keeps evolving, detection inevitably lags behind it. The information presented by these detection results (malice judgment, family classification, and behavior characterization) is limited for analysts. Therefore, a method is needed to restore the intention of malware, which reflects the relation between multiple behaviors of complex malware and its ultimate purpose. This paper proposes a novel description and derivation model of Android malware intention based on the theory of intention and malware reverse engineering. This approach creates an ontology for malware intention to model the semantic relation between behaviors and their objects, and automates the process of intention derivation by using SWRL rules transformed from the intention model and the Jess inference engine. Experiments on 75 typical samples show that the inference system can perform derivation of malware intention effectively, and 89.3% of the inference results are consistent with manual analysis, which proves the feasibility and effectiveness of our theory and inference system.

  12. Inference of directional selection and mutation parameters assuming equilibrium.

    Science.gov (United States)

    Vogl, Claus; Bergman, Juraj

    2015-12-01

    In a classical study, Wright (1931) proposed a model for the evolution of a biallelic locus under the influence of mutation, directional selection and drift. He derived the equilibrium distribution of the allelic proportion conditional on the scaled mutation rate, the mutation bias and the scaled strength of directional selection. The equilibrium distribution can be used for inference of these parameters with genome-wide datasets of "site frequency spectra" (SFS). Assuming that the scaled mutation rate is low, Wright's model can be approximated by a boundary-mutation model, where mutations are introduced into the population exclusively from sites fixed for the preferred or unpreferred allelic states. With the boundary-mutation model, inference can be partitioned: (i) the shape of the SFS distribution within the polymorphic region is determined by random drift and directional selection, but not by the mutation parameters, such that inference of the selection parameter relies exclusively on the polymorphic sites in the SFS; (ii) the mutation parameters can be inferred from the amount of polymorphic and monomorphic preferred and unpreferred alleles, conditional on the selection parameter. Herein, we derive maximum likelihood estimators for the mutation and selection parameters in equilibrium and apply the method to simulated SFS data as well as empirical data from a Madagascar population of Drosophila simulans. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Phylogenetic inference with weighted codon evolutionary distances.

    Science.gov (United States)

    Criscuolo, Alexis; Michel, Christian J

    2009-04-01

    We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. Then these three distance matrices are weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a unique distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that have a significantly better treelikeness compared to those obtained by standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate the part of the noise often associated with some codon positions, particularly the third position, which is known to induce a fast evolutionary rate. Simulation results show that fast distance-based tree reconstruction algorithms on distance matrices based on this codon position weighting can lead to phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed and shows that our codon evolutionary distances allow building a phylogenetic tree which is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and also significantly improved compared to standard nucleotide evolutionary distance estimates.

  14. Inferring Molecular Processes Heterogeneity from Transcriptional Data.

    Science.gov (United States)

    Gogolewski, Krzysztof; Wronowska, Weronika; Lech, Agnieszka; Lesyng, Bogdan; Gambin, Anna

    2017-01-01

    RNA microarrays and RNA-seq are nowadays standard technologies to study the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information on gene up- and downregulation is evaluated by analyzing the behaviour of a relatively large population of cells and averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportion between subpopulations of cells that manifest various functional behaviour in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools to indicate the most important networks from transcriptomic data and in particular could be useful in the analysis of cancer cell lines affected by biologically active compounds or drugs.

  15. Mistaking geography for biology: inferring processes from species distributions.

    Science.gov (United States)

    Warren, Dan L; Cardillo, Marcel; Rosauer, Dan F; Bolnick, Daniel I

    2014-10-01

    Over the past few decades, there has been a rapid proliferation of statistical methods that infer evolutionary and ecological processes from data on species distributions. These methods have led to considerable new insights, but they often fail to account for the effects of historical biogeography on present-day species distributions. Because the geography of speciation can lead to patterns of spatial and temporal autocorrelation in the distributions of species within a clade, this can result in misleading inferences about the importance of deterministic processes in generating spatial patterns of biodiversity. In this opinion article, we discuss ways in which patterns of species distributions driven by historical biogeography are often interpreted as evidence of particular evolutionary or ecological processes. We focus on three areas that are especially prone to such misinterpretations: community phylogenetics, environmental niche modelling, and analyses of beta diversity (compositional turnover of biodiversity). Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.

  16. Robust Demographic Inference from Genomic and SNP Data

    Science.gov (United States)

    Excoffier, Laurent; Dupanloup, Isabelle; Huerta-Sánchez, Emilia; Sousa, Vitor C.; Foll, Matthieu

    2013-01-01

    We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with , the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets. PMID:24204310
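
    The site frequency spectrum (SFS) on which this framework operates is simply a histogram of derived-allele counts across sites. The sketch below shows how an unfolded SFS could be tabulated from a small 0/1 genotype matrix; the function name and the input encoding are assumptions, and the simulation-based composite likelihood itself is not reproduced here.

```python
def site_frequency_spectrum(sites):
    """Tabulate the unfolded SFS.

    sites : list of sites, each a list of 0/1 derived-allele indicators,
            one per sampled chromosome (an encoding assumed for this sketch).

    Returns a list sfs of length n + 1, where sfs[k] is the number of
    sites at which exactly k of the n chromosomes carry the derived allele.
    """
    n = len(sites[0])
    sfs = [0] * (n + 1)
    for site in sites:
        if len(site) != n:
            raise ValueError("all sites must have the same sample size")
        sfs[sum(site)] += 1
    return sfs


if __name__ == "__main__":
    sites = [
        [0, 0, 0, 1],  # singleton
        [0, 1, 0, 1],  # doubleton
        [1, 0, 0, 0],  # singleton
        [0, 0, 0, 0],  # monomorphic ancestral
        [1, 1, 1, 1],  # fixed derived
    ]
    print(site_frequency_spectrum(sites))
```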

  17. Inferring Domain Plans in Question-Answering

    National Research Council Canada - National Science Library

    Pollack, Martha E

    1986-01-01

    The importance of plan inference in models of conversation has been widely noted in the computational-linguistics literature, and its incorporation in question-answering systems has enabled a range...

  18. Scalable inference for stochastic block models

    KAUST Repository

    Peng, Chengbin; Zhang, Zhihua; Wong, Ka-Chun; Zhang, Xiangliang; Keyes, David E.

    2017-01-01

    Community detection in graphs is widely used in social and biological networks, and the stochastic block model is a powerful probabilistic tool for describing graphs with community structures. However, in the era of "big data," traditional inference

  19. Efficient algorithms for conditional independence inference

    Czech Academy of Sciences Publication Activity Database

    Bouckaert, R.; Hemmecke, R.; Lindner, S.; Studený, Milan

    2010-01-01

    Roč. 11, č. 1 (2010), s. 3453-3479 ISSN 1532-4435 R&D Projects: GA ČR GA201/08/0539; GA MŠk 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords : conditional independence inference * linear programming approach Subject RIV: BA - General Mathematics Impact factor: 2.949, year: 2010 http://library.utia.cas.cz/separaty/2010/MTR/studeny-efficient algorithms for conditional independence inference.pdf

  20. Bayesian inference of chemical kinetic models from proposed reactions

    KAUST Repository

    Galagali, Nikhil

    2015-02-01

    © 2014 Elsevier Ltd. Bayesian inference provides a natural framework for combining experimental data with prior knowledge to develop chemical kinetic models and quantify the associated uncertainties, not only in parameter values but also in model structure. Most existing applications of Bayesian model selection methods to chemical kinetics have been limited to comparisons among a small set of models, however. The significant computational cost of evaluating posterior model probabilities renders traditional Bayesian methods infeasible when the model space becomes large. We present a new framework for tractable Bayesian model inference and uncertainty quantification using a large number of systematically generated model hypotheses. The approach involves imposing point-mass mixture priors over rate constants and exploring the resulting posterior distribution using an adaptive Markov chain Monte Carlo method. The posterior samples are used to identify plausible models, to quantify rate constant uncertainties, and to extract key diagnostic information about model structure, such as the reactions and operating pathways most strongly supported by the data. We provide numerical demonstrations of the proposed framework by inferring kinetic models for catalytic steam and dry reforming of methane using available experimental data.

  1. Inferring gene networks from discrete expression data

    KAUST Repository

    Zhang, L.

    2013-07-18

    The modeling of gene networks from transcriptional expression data is an important tool in biomedical research to reveal signaling pathways and to identify treatment targets. Current gene network modeling is primarily based on the use of Gaussian graphical models applied to continuous data, which give a closed-form marginal likelihood. In this paper, we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which generate counts of mRNA transcripts in cell samples. We propose a generalized linear model to fit the discrete gene expression data and assume that the log ratios of the mean expression levels follow a Gaussian distribution. We restrict the gene network structures to decomposable graphs and derive the graphs by selecting the covariance matrix of the Gaussian distribution with the hyper-inverse Wishart priors. Furthermore, we incorporate prior network models based on gene ontology information, which avails existing biological information on the genes of interest. We conduct simulation studies to examine the performance of our discrete graphical model and apply the method to two real datasets for gene network inference. © The Author 2013. Published by Oxford University Press. All rights reserved.

  2. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  3. Reconciling taxonomy and phylogenetic inference: formalism and algorithms for describing discord and inferring taxonomic roots

    Directory of Open Access Journals (Sweden)

    Matsen Frederick A

    2012-05-01

    Background: Although taxonomy is often used informally to evaluate the results of phylogenetic inference and the root of phylogenetic trees, algorithmic methods to do so are lacking. Results: In this paper we formalize these procedures and develop algorithms to solve the relevant problems. In particular, we introduce a new algorithm that solves a "subcoloring" problem to express the difference between a taxonomy and a phylogeny at a given rank. This algorithm improves upon the current best algorithm in terms of asymptotic complexity for the parameter regime of interest; we also describe a branch-and-bound algorithm that saves orders of magnitude in computation on real data sets. We also develop a formalism and an algorithm for rooting phylogenetic trees according to a taxonomy. Conclusions: The algorithms in this paper, and the associated freely-available software, will help biologists better use and understand taxonomically labeled phylogenetic trees.

  4. Fossils, molecules, divergence times, and the origin of lissamphibians.

    Science.gov (United States)

    Marjanović, David; Laurin, Michel

    2007-06-01

    A review of the paleontological literature shows that the early dates of appearance of Lissamphibia recently inferred from molecular data do not favor an origin of extant amphibians from temnospondyls, contrary to recent claims. A supertree is assembled using new Mesquite modules that allow extinct taxa to be incorporated into a time-calibrated phylogeny with a user-defined geological time scale. The supertree incorporates 223 extinct species of lissamphibians and has a highly significant stratigraphic fit. Some divergences can even be dated with sufficient precision to serve as calibration points in molecular divergence date analyses. Fourteen combinations of minimal branch length settings and 10 random resolutions for each polytomy give much more recent minimal origination times of lissamphibian taxa than recent studies based on phylogenetic analyses of molecular sequences. Attempts to replicate recent molecular date estimates show that these estimates depend strongly on the choice of calibration points, on the dating method, and on the chosen model of evolution; for instance, the estimate for the date of the origin of Lissamphibia can lie between 351 and 266 Mya. This range of values is generally compatible with our time-calibrated supertree and indicates that there is no unbridgeable gap between dates obtained using the fossil record and those using molecular evidence, contrary to previous suggestions.

  5. On the criticality of inferred models

    Science.gov (United States)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-10-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.

  6. On the criticality of inferred models

    International Nuclear Information System (INIS)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-01-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.

  7. Influence of the experimental design of gene expression studies on the inference of gene regulatory networks: environmental factors

    Directory of Open Access Journals (Sweden)

    Frank Emmert-Streib

    2013-02-01

    The inference of gene regulatory networks has gained considerable interest in the biology and biomedical community in recent years. The purpose of this paper is to investigate the influence that environmental conditions can exert on the performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I) observational gene expression data: normal environmental condition, (II) interventional gene expression data: growth in rich media, (III) interventional gene expression data: normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable, but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.

  8. A Bayesian Network Schema for Lessening Database Inference

    National Research Council Canada - National Science Library

    Chang, LiWu; Moskowitz, Ira S

    2001-01-01

    .... The authors introduce a formal schema for database inference analysis, based upon a Bayesian network structure, which identifies critical parameters involved in the inference problem and represents...

  9. Applied Bayesian hierarchical methods

    National Research Council Canada - National Science Library

    Congdon, P

    2010-01-01

    ... 1.2 Posterior Inference from Bayes Formula ... 1.3 Markov Chain Monte Carlo Sampling in Relation to Monte Carlo Methods: Obtaining Posterior...

  10. Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.

    Directory of Open Access Journals (Sweden)

    Richard R Stein

    2015-07-01

    Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.

  11. Methodology for the inference of gene function from phenotype data.

    Science.gov (United States)

    Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

    2014-12-12

    Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and

  12. Bayesian Inference on Gravitational Waves

    Directory of Open Access Journals (Sweden)

    Asad Ali

    2015-12-01

    The Bayesian approach is becoming increasingly popular among the astrophysics data analysis communities. However, the Pakistan statistics communities are unaware of this fertile interaction between the two disciplines. Bayesian methods have been in use to address astronomical problems since the very birth of Bayesian probability in the eighteenth century. Today the Bayesian methods for the detection and parameter estimation of gravitational waves have solid theoretical grounds with a strong promise for realistic applications. This article aims to introduce the Pakistan statistics communities to the applications of Bayesian Monte Carlo methods in the analysis of gravitational wave data, with an overview of the Bayesian signal detection and estimation methods and a demonstration by a couple of simplified examples.

  13. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying; Hart, Jeffrey D.; Genton, Marc G.

    2012-01-01

    the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both

  14. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    Science.gov (United States)

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  15. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    Energy Technology Data Exchange (ETDEWEB)

    Santra, Tapesh, E-mail: tapesh.santra@ucd.ie [Systems Biology Ireland, University College Dublin, Dublin (Ireland)

    2014-05-20

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based methods in some circumstances.

  16. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    International Nuclear Information System (INIS)

    Santra, Tapesh

    2014-01-01

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based methods in some circumstances.

  17. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology

    Science.gov (United States)

    Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen

    2018-03-01

    Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data-space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper we use massive asymptotically-optimal data compression to reduce the dimensionality of the data-space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parameterized model for joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate Density Estimation Likelihood-Free Inference with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ∼10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological datasets.
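
    The "simulate mock data and compare to the observed data" step described in this abstract can be sketched with the traditional Approximate Bayesian Computation rejection sampler that the abstract contrasts against. This is not DELFI and not the authors' code: the function `abc_rejection`, the Gaussian toy model, the uniform prior, the sample mean as the single compressed summary, and the tolerance `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, summary, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))    # forward-simulate, then compress
        if abs(s_sim - s_obs) <= eps:            # compare in summary space
            accepted.append(theta)
    return np.array(accepted)

# Hypothetical toy problem: infer the mean of a unit-variance Gaussian.
rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=200)
posterior = abc_rejection(
    observed,
    simulate=lambda mu, r: r.normal(loc=mu, scale=1.0, size=200),
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    summary=np.mean,   # one summary number for the one parameter
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

    Most of the 5000 forward simulations are rejected in this toy run, which is the kind of inefficiency that motivates approaches needing fewer simulations.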

  18. Network inference via adaptive optimal design

    Directory of Open Access Journals (Sweden)

    Stigter Johannes D

    2012-09-01

    Background: Current research in network reverse engineering for genetic or metabolic networks very often does not include a proper experimental and/or input design. In this paper we address this issue in more detail and suggest a method that includes an iterative design of experiments based on the most recent data that become available. The presented approach allows a reliable reconstruction of the network and addresses an important issue, i.e., the analysis and the propagation of uncertainties as they exist in both the data and in our own knowledge. These two types of uncertainties have their immediate ramifications for the uncertainties in the parameter estimates and, hence, are taken into account from the very beginning of our experimental design. Findings: The method is demonstrated for two small networks that include a genetic network for mRNA synthesis and degradation and an oscillatory network describing a molecular network underlying adenosine 3’-5’ cyclic monophosphate (cAMP) as observed in populations of Dictyostelium cells. In both cases a substantial reduction in parameter uncertainty was observed. Extension to larger scale networks is possible but needs a more rigorous parameter estimation algorithm that includes sparsity as a constraint in the optimization procedure. Conclusion: We conclude that a careful experiment design very often (but not always) pays off in terms of reliability in the inferred network topology. For large scale networks a better parameter estimation algorithm is required that includes sparsity as an additional constraint. These algorithms are available in the literature and can also be used in an adaptive optimal design setting as demonstrated in this paper.

  19. Analogy in causal inference: rethinking Austin Bradford Hill's neglected consideration.

    Science.gov (United States)

    Weed, Douglas L

    2018-05-01

    The purpose of this article was to rethink and resurrect Austin Bradford Hill's "criterion" of analogy as an important consideration in causal inference. In epidemiology today, analogy is either completely ignored (e.g., in many textbooks), or equated with biologic plausibility or coherence, or aligned with the scientist's imagination. None of these examples, however, captures Hill's description of analogy. His words suggest that there may be something gained by contrasting two bodies of evidence, one from an established causal relationship, the other not. Coupled with developments in the methods of systematic assessments of evidence (including but not limited to meta-analysis), analogy can be restructured as a key component in causal inference. This new approach will require that a collection, a library, of known cases of causal inference (i.e., bodies of evidence involving established causal relationships) be developed. This library would likely include causal assessments by organizations such as the International Agency for Research on Cancer, the National Toxicology Program, and the United States Environmental Protection Agency. In addition, a process for describing key features of a causal relationship would need to be developed along with what will be considered paradigm cases of causation. Finally, it will be important to develop ways to objectively compare a "new" body of evidence with the relevant paradigm case of causation. Analogy, along with all other existing methods and causal considerations, may improve our ability to identify causal relationships. Copyright © 2018 Elsevier Inc. All rights reserved.

  20. Inference of protein diffusion probed via fluorescence correlation spectroscopy

    Science.gov (United States)

    Tsekouras, Konstantinos

    2015-03-01

    Fluctuations are an inherent part of single molecule or few particle biophysical data sets. Traditionally, "noise" fluctuations have been viewed as a nuisance, to be eliminated or minimized. Here we look at how statistical inference methods - that take explicit advantage of fluctuations - have allowed us to draw an unexpected picture of single molecule diffusional dynamics. Our focus is on the diffusion of proteins probed using fluorescence correlation spectroscopy (FCS). First, we discuss how - in collaboration with the Bustamante and Marqusee labs at UC Berkeley - we determined using FCS data that individual enzymes are perturbed by self-generated catalytic heat (Riedel et al, Nature, 2014). Using the tools of inference, we found how distributions of enzyme diffusion coefficients shift in the presence of substrate, revealing that enzymes performing highly exothermic reactions dissipate heat by transiently accelerating their center of mass following a catalytic reaction. Next, when molecules diffuse in the cell nucleus they often appear to diffuse anomalously. We analyze FCS data - in collaboration with Rich Day at the IU Med School - to propose a simple model for transcription factor binding-unbinding in the nucleus to show that it may give rise to apparent anomalous diffusion. Here inference methods extract entire binding affinity distributions for the diffusing transcription factors, allowing us to precisely characterize their interactions with different components of the nuclear environment. From this analysis, we draw key mechanistic insight that goes beyond what is possible by simply fitting data to "anomalous diffusion" models.

  1. Inference of financial networks using the normalised mutual information rate

    Science.gov (United States)

    2018-01-01

    In this paper, we study data from financial markets, using the normalised Mutual Information Rate. We show how to use it to infer the underlying network structure of interrelations in the foreign currency exchange rates and stock indices of 15 currency areas. We first present the mathematical method and discuss its computational aspects, and apply it to artificial data from chaotic dynamics and to correlated normal-variates data. We then apply the method to infer the structure of the financial system from the time-series of currency exchange rates and stock indices. In particular, we study and reveal the interrelations among the various foreign currency exchange rates and stock indices in two separate networks, of which we also study their structural properties. Our results show that both inferred networks are small-world networks, sharing similar properties and having differences in terms of assortativity. Importantly, our work shows that global economies tend to connect with other economies world-wide, rather than creating small groups of local economies. Finally, the consistent interrelations depicted among the 15 currency areas are further supported by a discussion from the viewpoint of economics. PMID:29420644

  2. Sparse linear models: Variational approximate inference and Bayesian experimental design

    International Nuclear Information System (INIS)

    Seeger, Matthias W

    2009-01-01

    A wide range of problems such as signal reconstruction, denoising, source separation, feature selection, and graphical model search are addressed today by posterior maximization for linear models with sparsity-favouring prior distributions. The Bayesian posterior contains useful information far beyond its mode, which can be used to drive methods for sampling optimization (active learning), feature relevance ranking, or hyperparameter estimation, if only this representation of uncertainty can be approximated in a tractable manner. In this paper, we review recent results for variational sparse inference, and show that they share underlying computational primitives. We discuss how sampling optimization can be implemented as sequential Bayesian experimental design. While there has been tremendous recent activity to develop sparse estimation, little attention has been given to sparse approximate inference. In this paper, we argue that many problems in practice, such as compressive sensing for real-world image reconstruction, are served much better by proper uncertainty approximations than by ever more aggressive sparse estimation algorithms. Moreover, since some variational inference methods have been given strong convex optimization characterizations recently, theoretical analysis may become possible, promising new insights into nonlinear experimental design.

  3. Sparse linear models: Variational approximate inference and Bayesian experimental design

    Energy Technology Data Exchange (ETDEWEB)

    Seeger, Matthias W [Saarland University and Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbruecken (Germany)]

    2009-12-01

    A wide range of problems such as signal reconstruction, denoising, source separation, feature selection, and graphical model search are addressed today by posterior maximization for linear models with sparsity-favouring prior distributions. The Bayesian posterior contains useful information far beyond its mode, which can be used to drive methods for sampling optimization (active learning), feature relevance ranking, or hyperparameter estimation, if only this representation of uncertainty can be approximated in a tractable manner. In this paper, we review recent results for variational sparse inference, and show that they share underlying computational primitives. We discuss how sampling optimization can be implemented as sequential Bayesian experimental design. While there has been tremendous recent activity to develop sparse estimation, little attention has been given to sparse approximate inference. In this paper, we argue that many problems in practice, such as compressive sensing for real-world image reconstruction, are served much better by proper uncertainty approximations than by ever more aggressive sparse estimation algorithms. Moreover, since some variational inference methods have been given strong convex optimization characterizations recently, theoretical analysis may become possible, promising new insights into nonlinear experimental design.

  4. Likelihood-Based Inference of B Cell Clonal Families.

    Directory of Open Access Journals (Sweden)

    Duncan K Ralph

    2016-10-01

    The human immune system depends on a highly diverse collection of antibody-making B cells. B cell receptor sequence diversity is generated by a random recombination process called "rearrangement" forming progenitor B cells, then a Darwinian process of lineage diversification and selection called "affinity maturation." The resulting receptors can be sequenced in high throughput for research and diagnostics. Such a collection of sequences contains a mixture of various lineages, each of which may be quite numerous, or may consist of only a single member. As a step to understanding the process and result of this diversification, one may wish to reconstruct lineage membership, i.e. to cluster sampled sequences according to which came from the same rearrangement events. We call this clustering problem "clonal family inference." In this paper we describe and validate a likelihood-based framework for clonal family inference based on a multi-hidden Markov Model (multi-HMM) framework for B cell receptor sequences. We describe an agglomerative algorithm to find a maximum likelihood clustering, two approximate algorithms with various trade-offs of speed versus accuracy, and a third, fast algorithm for finding specific lineages. We show that under simulation these algorithms greatly improve upon existing clonal family inference methods, and that they also give significantly different clusters than previous methods when applied to two real data sets.

  5. Inference of financial networks using the normalised mutual information rate.

    Science.gov (United States)

    Goh, Yong Kheng; Hasim, Haslifah M; Antonopoulos, Chris G

    2018-01-01

    In this paper, we study data from financial markets, using the normalised Mutual Information Rate. We show how to use it to infer the underlying network structure of interrelations in the foreign currency exchange rates and stock indices of 15 currency areas. We first present the mathematical method and discuss its computational aspects, and apply it to artificial data from chaotic dynamics and to correlated normal-variates data. We then apply the method to infer the structure of the financial system from the time-series of currency exchange rates and stock indices. In particular, we study and reveal the interrelations among the various foreign currency exchange rates and stock indices in two separate networks, of which we also study their structural properties. Our results show that both inferred networks are small-world networks, sharing similar properties and having differences in terms of assortativity. Importantly, our work shows that global economies tend to connect with other economies world-wide, rather than creating small groups of local economies. Finally, the consistent interrelations depicted among the 15 currency areas are further supported by a discussion from the viewpoint of economics.

  6. A combinatorial perspective of the protein inference problem.

    Science.gov (United States)

    Yang, Chao; He, Zengyou; Yu, Weichuan

    2013-01-01

    In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we devote ourselves to a combinatorial perspective of the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves comparable results with ProteinProphet in a more efficient manner in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. Meanwhile, we also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: http://bioinformatics.ust.hk/proteininfer.

  7. Algorithms for MDC-Based Multi-locus Phylogeny Inference

    Science.gov (United States)

    Yu, Yun; Warnow, Tandy; Nakhleh, Luay

    One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is to minimize deep coalescence (MDC). Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene trees may differ from true gene trees, be incompletely resolved, and not necessarily rooted. In this paper, we propose new MDC formulations for the cases where the gene trees are unrooted/binary, rooted/non-binary, and unrooted/non-binary. Further, we prove structural theorems that allow us to extend the algorithms for the rooted/binary gene tree case to these cases in a straightforward manner. Finally, we study the performance of these methods in coalescent-based computer simulations.

  8. Inference and learning in sparse systems with multiple states

    International Nuclear Information System (INIS)

    Braunstein, A.; Ramezanpour, A.; Zhang, P.; Zecchina, R.

    2011-01-01

    We discuss how inference can be performed when data are sampled from the nonergodic phase of systems with multiple attractors. We take as a model system the finite connectivity Hopfield model in the memory phase and suggest a cavity method approach to reconstruct the couplings when the data are separately sampled from few attractor states. We also show how the inference results can be converted into a learning protocol for neural networks in which patterns are presented through weak external fields. The protocol is simple and fully local, and is able to store patterns with a finite overlap with the input patterns without ever reaching a spin-glass phase where all memories are lost.

  9. EEG Based Inference of Spatio-Temporal Brain Dynamics

    DEFF Research Database (Denmark)

    Hansen, Sofie Therese

    Electroencephalography (EEG) provides a measure of brain activity and has improved our understanding of the brain immensely. However, there is still much to be learned and the full potential of EEG is yet to be realized. In this thesis we suggest to improve the information gain of EEG using three...... different approaches; 1) by recovery of the EEG sources, 2) by representing and inferring the propagation path of EEG sources, and 3) by combining EEG with functional magnetic resonance imaging (fMRI). The common goal of the methods, and thus of this thesis, is to improve the spatial dimension of EEG...... recovery ability. The forward problem describes the propagation of neuronal activity in the brain to the EEG electrodes on the scalp. The geometry and conductivity of the head layers are normally required to model this path. We propose a framework for inferring forward models which is based on the EEG...

  10. Research designs and making causal inferences from health care studies.

    Science.gov (United States)

    Flannelly, Kevin J; Jankowski, Katherine R B

    2014-01-01

    This article summarizes the major types of research designs used in healthcare research, including experimental, quasi-experimental, and observational studies. Observational studies are divided into survey studies (descriptive and correlational studies), case-studies and analytic studies, the last of which are commonly used in epidemiology: case-control, retrospective cohort, and prospective cohort studies. Similarities and differences among the research designs are described and the relative strength of evidence they provide is discussed. Emphasis is placed on five criteria for drawing causal inferences that are derived from the writings of the philosopher John Stuart Mill, especially his methods or canons. The application of the criteria to experimentation is explained. Particular attention is given to the degree to which different designs meet the five criteria for making causal inferences. Examples of specific studies that have used various designs in chaplaincy research are provided.

  11. Qualitative reasoning for biological network inference from systematic perturbation experiments.

    Science.gov (United States)

    Badaloni, Silvana; Di Camillo, Barbara; Sambo, Francesco

    2012-01-01

    The systematic perturbation of the components of a biological system has been proven among the most informative experimental setups for the identification of causal relations between the components. In this paper, we present Systematic Perturbation-Qualitative Reasoning (SPQR), a novel Qualitative Reasoning approach to automate the interpretation of the results of systematic perturbation experiments. Our method is based on a qualitative abstraction of the experimental data: for each perturbation experiment, measured values of the observed variables are modeled as lower, equal or higher than the measurements in the wild type condition, when no perturbation is applied. The algorithm exploits a set of IF-THEN rules to infer causal relations between the variables, analyzing the patterns of propagation of the perturbation signals through the biological network, and is specifically designed to minimize the rate of false positives among the inferred relations. Tested on both simulated and real perturbation data, SPQR indeed exhibits a significantly higher precision than the state of the art.
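    The qualitative abstraction described in this abstract can be illustrated with a short Python sketch. It only covers the step that maps each measured value to lower, equal or higher relative to the wild type; the IF-THEN rules of SPQR are not reproduced. The function names and the tolerance used to decide "equal" are assumptions made for illustration and are not taken from the paper.

```python
def qualitative_state(measured, wild_type, tolerance):
    """Return -1 (lower), 0 (equal) or +1 (higher) relative to the wild type.

    `tolerance` is a hypothetical threshold: differences no larger than it
    are treated as "equal". The abstract does not say how equality is decided.
    """
    diff = measured - wild_type
    if diff > tolerance:
        return 1
    if diff < -tolerance:
        return -1
    return 0


def abstract_experiment(measurements, wild_type, tolerance=0.1):
    """Map every observed variable of one perturbation experiment to -1/0/+1."""
    return {
        name: qualitative_state(value, wild_type[name], tolerance)
        for name, value in measurements.items()
    }


# Illustrative values only: three variables measured after one perturbation.
wild_type = {"geneA": 1.00, "geneB": 2.00, "geneC": 0.50}
perturbed = {"geneA": 0.20, "geneB": 2.05, "geneC": 1.40}
states = abstract_experiment(perturbed, wild_type)
```

    Working on the signs of the changes rather than on their magnitudes is what makes the abstraction qualitative; any rule set applied afterwards only sees the -1/0/+1 patterns.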

  12. Vertically Integrated Seismological Analysis II : Inference

    Science.gov (United States)

    Arora, N. S.; Russell, S.; Sudderth, E.

    2009-12-01

    Methods for automatically associating detected waveform features with hypothesized seismic events, and localizing those events, are a critical component of efforts to verify the Comprehensive Test Ban Treaty (CTBT). As outlined in our companion abstract, we have developed a hierarchical model which views detection, association, and localization as an integrated probabilistic inference problem. In this abstract, we provide more details on the Markov chain Monte Carlo (MCMC) methods used to solve this inference task. MCMC generates samples from a posterior distribution π(x) over possible worlds x by defining a Markov chain whose states are the worlds x, and whose stationary distribution is π(x). In the Metropolis-Hastings (M-H) method, transitions in the Markov chain are constructed in two steps. First, given the current state x, a candidate next state x′ is generated from a proposal distribution q(x′ | x), which may be (more or less) arbitrary. Second, the transition to x′ is not automatic, but occurs with an acceptance probability α(x′ | x) = min(1, π(x′)q(x | x′) / [π(x)q(x′ | x)]). The seismic event model outlined in our companion abstract is quite similar to those used in multitarget tracking, for which MCMC has proved very effective. In this model, each world x is defined by a collection of events, a list of properties characterizing those events (times, locations, magnitudes, and types), and the association of each event to a set of observed detections. The target distribution is π(x) = P(x | y), the posterior distribution over worlds x given the observed waveform data y at all stations. Proposal distributions then implement several types of moves between worlds. For example, birth moves create new events; death moves delete existing events; split moves partition the detections for an event into two new events; merge moves combine event pairs; swap moves modify the properties and associations for pairs of events. Importantly, the rules for
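    The two-step Metropolis-Hastings transition described in this abstract can be sketched in a few lines of Python. This is a minimal illustration on a one-dimensional toy target (a standard normal with a Gaussian random-walk proposal), not the seismic event model or its birth, death, split, merge and swap moves. All function names, the toy target and the step size are assumptions made for the example.

```python
import math
import random


def acceptance_probability(log_pi_cur, log_pi_new, log_q_forward, log_q_reverse):
    """alpha(x'|x) = min(1, pi(x') q(x|x') / (pi(x) q(x'|x))), computed in log space.

    log_q_forward is log q(x'|x); log_q_reverse is log q(x|x').
    """
    log_ratio = (log_pi_new + log_q_reverse) - (log_pi_cur + log_q_forward)
    return math.exp(min(0.0, log_ratio))


def metropolis_hastings(log_pi, propose, log_q, x0, n_steps, rng):
    """Run n_steps M-H transitions; log_q(a, b) must return log q(a | b)."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = propose(x, rng)  # step 1: draw a candidate state
        alpha = acceptance_probability(
            log_pi(x), log_pi(x_new), log_q(x_new, x), log_q(x, x_new)
        )
        if rng.random() < alpha:  # step 2: accept with probability alpha
            x = x_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


# Toy target (assumption): unnormalised standard normal density.
def log_pi(x):
    return -0.5 * x * x


# Toy proposal (assumption): Gaussian random walk with unit step size.
def propose(x, rng):
    return x + rng.gauss(0.0, 1.0)


def log_q(a, b):
    # Symmetric proposal; normalising constants cancel in the ratio.
    return -0.5 * (a - b) ** 2


samples = metropolis_hastings(log_pi, propose, log_q, 0.0, 20000, random.Random(0))
```

    Only ratios of π and q enter the acceptance probability, so the target needs to be known only up to a normalising constant. For a symmetric proposal such as the random walk used here, the q terms cancel and the rule reduces to the Metropolis acceptance probability.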

  13. The confounding effect of population structure on bayesian skyline plot inferences of demographic history

    DEFF Research Database (Denmark)

    Heller, Rasmus; Chikhi, Lounes; Siegismund, Hans

    2013-01-01

    Many coalescent-based methods aiming to infer the demographic history of populations assume a single, isolated and panmictic population (i.e. a Wright-Fisher model). While this assumption may be reasonable under many conditions, several recent studies have shown that the results can be misleading...... when it is violated. Among the most widely applied demographic inference methods are Bayesian skyline plots (BSPs), which are used across a range of biological fields. Violations of the panmixia assumption are to be expected in many biological systems, but the consequences for skyline plot inferences...... the best scheme for inferring demographic change over a typical time scale. Analyses of data from a structured African buffalo population demonstrate how BSP results can be strengthened by simulations. We recommend that sample selection should be carefully considered in relation to population structure...

  14. Statistical inference for template aging

    Science.gov (United States)

    Schuckers, Michael E.

    2006-04-01

    A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses likelihood ratio test methodology. The focus here is on statistical methods for estimation, not the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institute of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.

  15. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
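    The idea of cross-validating over candidate integer periods can be sketched as follows. This is a simplified reading of the "leave-out-one-cycle" scheme, not the authors' implementation: for a candidate period q, each observation is predicted by the mean of the other observations that share its position within the cycle, and the period with the smallest mean squared prediction error is returned. The function names and the example sequence are assumptions made for illustration.

```python
import numpy as np


def cv_score(y, q):
    """Mean squared leave-one-out prediction error for candidate period q.

    Each y[t] is predicted by the average of the other observations with the
    same phase t mod q, i.e. the cycle containing y[t] is left out.
    """
    y = np.asarray(y, dtype=float)
    phase = np.arange(y.size) % q
    sums = np.bincount(phase, weights=y, minlength=q)
    counts = np.bincount(phase, minlength=q)
    loo_mean = (sums[phase] - y) / (counts[phase] - 1)
    return float(np.mean((y - loo_mean) ** 2))


def estimate_period(y, max_period):
    """Return the candidate period in 1..max_period with the smallest CV score.

    Requires len(y) >= 2 * max_period so that every phase has at least two
    observations. Ties are broken in favour of the smallest period.
    """
    scores = [cv_score(y, q) for q in range(1, max_period + 1)]
    return int(np.argmin(scores)) + 1


# Illustrative data (assumption): 40 noisy cycles of a period-5 pattern.
rng = np.random.default_rng(0)
pattern = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
y = np.tile(pattern, 40) + rng.normal(scale=0.5, size=200)
period_hat = estimate_period(y, max_period=20)
```

    With noisy data, a multiple of the true period has fewer observations per phase, so its leave-one-out means are noisier and its score tends to be larger. This is one way to see the implicit penalty on multiples of the smallest period that the abstract mentions.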

  16. Statistical inference for Cox processes

    DEFF Research Database (Denmark)

    Møller, Jesper; Waagepetersen, Rasmus Plenge

    2002-01-01

    Research has generated a number of advances in methods for spatial cluster modelling in recent years, particularly in the area of Bayesian cluster modelling. Along with these advances has come an explosion of interest in the potential applications of this work, especially in epidemiology and genome...... research.   In one integrated volume, this book reviews the state-of-the-art in spatial clustering and spatial cluster modelling, bringing together research and applications previously scattered throughout the literature. It begins with an overview of the field, then presents a series of chapters...... that illuminate the nature and purpose of cluster modelling within different application areas, including astrophysics, epidemiology, ecology, and imaging. The focus then shifts to methods, with discussions on point and object process modelling, perfect sampling of cluster processes, partitioning in space...

  17. Goal inferences about robot behavior : goal inferences and human response behaviors

    NARCIS (Netherlands)

    Broers, H.A.T.; Ham, J.R.C.; Broeders, R.; De Silva, P.; Okada, M.

    2014-01-01

    This explorative research focused on the goal inferences human observers draw based on a robot's behavior, and the extent to which those inferences predict people's behavior in response to that robot. Results show that different robot behaviors cause different response behavior from people.

  18. Models and Inference for Multivariate Spatial Extremes

    KAUST Repository

    Vettori, Sabrina

    2017-12-07

    The development of flexible and interpretable statistical methods is necessary in order to provide appropriate risk assessment measures for extreme events and natural disasters. In this thesis, we address this challenge by contributing to the developing research field of Extreme-Value Theory. We initially study the performance of existing parametric and non-parametric estimators of extremal dependence for multivariate maxima. As the dimensionality increases, non-parametric estimators are more flexible than parametric methods but present some loss in efficiency that we quantify under various scenarios. We introduce a statistical tool which imposes the required shape constraints on non-parametric estimators in high dimensions, significantly improving their performance. Furthermore, by embedding the tree-based max-stable nested logistic distribution in the Bayesian framework, we develop a statistical algorithm that identifies the most likely tree structures representing the data's extremal dependence using the reversible jump Markov chain Monte Carlo method. A mixture of these trees is then used for uncertainty assessment in prediction through Bayesian model averaging. The computational complexity of full likelihood inference is significantly decreased by deriving a recursive formula for the nested logistic model likelihood. The algorithm performance is verified through simulation experiments which also compare different likelihood procedures. Finally, we extend the nested logistic representation to the spatial framework in order to jointly model multivariate variables collected across a spatial region. This situation emerges often in environmental applications but is not often considered in the current literature. Simulation experiments show that the new class of multivariate max-stable processes is able to detect both the cross and inner spatial dependence of a number of extreme variables at a relatively low computational cost, thanks to its Bayesian hierarchical

  19. Using Alien Coins to Test Whether Simple Inference Is Bayesian

    Science.gov (United States)

    Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.

    2016-01-01

    Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…

  20. A model independent safeguard against background mismodeling for statistical inference

    Energy Technology Data Exchange (ETDEWEB)

    Priel, Nadav; Landsman, Hagar; Manfredini, Alessandro; Budnik, Ranny [Department of Particle Physics and Astrophysics, Weizmann Institute of Science, Herzl St. 234, Rehovot (Israel); Rauch, Ludwig, E-mail: nadav.priel@weizmann.ac.il, E-mail: rauch@mpi-hd.mpg.de, E-mail: hagar.landsman@weizmann.ac.il, E-mail: alessandro.manfredini@weizmann.ac.il, E-mail: ran.budnik@weizmann.ac.il [Teilchen- und Astroteilchenphysik, Max-Planck-Institut für Kernphysik, Saupfercheckweg 1, 69117 Heidelberg (Germany)

    2017-05-01

    We propose a safeguard procedure for statistical inference that provides universal protection against mismodeling of the background. The method quantifies and incorporates the signal-like residuals of the background model into the likelihood function, using information available in a calibration dataset. This prevents possible false discovery claims that may arise through unknown mismodeling, and corrects the bias in limit setting created by overestimated or underestimated background. We demonstrate how the method removes the bias created by an incomplete background model using three realistic case studies.

  1. Inferring climate variability from skewed proxy records

    Science.gov (United States)

    Emile-Geay, J.; Tingley, M.

    2013-12-01

    Many paleoclimate analyses assume a linear relationship between the proxy and the target climate variable, and that both the climate quantity and the errors follow normal distributions. An ever-increasing number of proxy records, however, are better modeled using distributions that are heavy-tailed, skewed, or otherwise non-normal, on account of the proxies reflecting non-normally distributed climate variables, or having non-linear relationships with a normally distributed climate variable. The analysis of such proxies requires a different set of tools, and this work serves as a cautionary tale on the danger of making conclusions about the underlying climate from applications of classic statistical procedures to heavily skewed proxy records. Inspired by runoff proxies, we consider an idealized proxy characterized by a nonlinear, thresholded relationship with climate, and describe three approaches to using such a record to infer past climate: (i) applying standard methods commonly used in the paleoclimate literature, without considering the non-linearities inherent to the proxy record; (ii) applying a power transform prior to using these standard methods; (iii) constructing a Bayesian model to invert the mechanistic relationship between the climate and the proxy. We find that neglecting the skewness in the proxy leads to erroneous conclusions and often exaggerates changes in climate variability between different time intervals. In contrast, an explicit treatment of the skewness, using either power transforms or a Bayesian inversion of the mechanistic model for the proxy, yields significantly better estimates of past climate variations. We apply these insights in two paleoclimate settings: (1) a classical sedimentary record from Laguna Pallcacocha, Ecuador (Moy et al., 2002). Our results agree with the qualitative aspects of previous analyses of this record, but quantitative departures are evident and hold implications for how such records are interpreted, and

  2. Segmentation, Inference and Classification of Partially Overlapping Nanoparticles

    KAUST Repository

    Chiwoo Park

    2013-03-01

    This paper presents a method that enables automated morphology analysis of partially overlapping nanoparticles in electron micrographs. In the undertaking of morphology analysis, three tasks appear necessary: separate individual particles from an agglomerate of overlapping nano-objects; infer the particle's missing contours; and ultimately, classify the particles by shape based on their complete contours. Our specific method adopts a two-stage approach: the first stage executes the task of particle separation, and the second stage conducts simultaneously the tasks of contour inference and shape classification. For the first stage, a modified ultimate erosion process is developed for decomposing a mixture of particles into markers, and then, an edge-to-marker association method is proposed to identify the set of evidences that eventually delineate individual objects. We also provide theoretical justification regarding the separation capability of the first stage. In the second stage, the set of evidences become inputs to a Gaussian mixture model on B-splines, the solution of which leads to the joint learning of the missing contour and the particle shape. Using twelve real electron micrographs of overlapping nanoparticles, we compare the proposed method with seven state-of-the-art methods. The results show the superiority of the proposed method in terms of particle recognition rate.

  3. Explanatory Preferences Shape Learning and Inference.

    Science.gov (United States)

    Lombrozo, Tania

    2016-10-01

    Explanations play an important role in learning and inference. People often learn by seeking explanations, and they assess the viability of hypotheses by considering how well they explain the data. An emerging body of work reveals that both children and adults have strong and systematic intuitions about what constitutes a good explanation, and that these explanatory preferences have a systematic impact on explanation-based processes. In particular, people favor explanations that are simple and broad, with the consequence that engaging in explanation can shape learning and inference by leading people to seek patterns and favor hypotheses that support broad and simple explanations. Given the prevalence of explanation in everyday cognition, understanding explanation is therefore crucial to understanding learning and inference. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. A Learning Algorithm for Multimodal Grammar Inference.

    Science.gov (United States)

    D'Ulizia, A; Ferri, F; Grifoni, P

    2011-12-01

    The high costs of development and maintenance of multimodal grammars in integrating and understanding input in multimodal interfaces lead to the investigation of novel algorithmic solutions in automating grammar generation and in updating processes. Many algorithms for context-free grammar inference have been developed in the natural language processing literature. An extension of these algorithms toward the inference of multimodal grammars is necessary for multimodal input processing. In this paper, we propose a novel grammar inference mechanism that allows us to learn a multimodal grammar from its positive samples of multimodal sentences. The algorithm first generates the multimodal grammar that is able to parse the positive samples of sentences and, afterward, makes use of two learning operators and the minimum description length metrics in improving the grammar description and in avoiding the over-generalization problem. The experimental results highlight the acceptable performances of the algorithm proposed in this paper since it has a very high probability of parsing valid sentences.

  5. Examples in parametric inference with R

    CERN Document Server

    Dixit, Ulhas Jayram

    2016-01-01

    This book discusses examples in parametric inference with R. Combining basic theory with modern approaches, it presents the latest developments and trends in statistical inference for students who do not have an advanced mathematical and statistical background. The topics discussed in the book are fundamental and common to many fields of statistical inference and thus serve as a point of departure for in-depth study. The book is divided into eight chapters: Chapter 1 provides an overview of topics on sufficiency and completeness, while Chapter 2 briefly discusses unbiased estimation. Chapter 3 focuses on the study of moments and maximum likelihood estimators, and Chapter 4 presents bounds for the variance. In Chapter 5, topics on consistent estimator are discussed. Chapter 6 discusses Bayes, while Chapter 7 studies some more powerful tests. Lastly, Chapter 8 examines unbiased and other tests. Senior undergraduate and graduate students in statistics and mathematics, and those who have taken an introductory cou...

  6. Statistical inference based on divergence measures

    CERN Document Server

    Pardo, Leandro

    2005-01-01

    The idea of using functionals of Information Theory, such as entropies or divergences, in statistical inference is not new. However, in spite of the fact that divergence statistics have become a very good alternative to the classical likelihood ratio test and the Pearson-type statistic in discrete models, many statisticians remain unaware of this powerful approach. Statistical Inference Based on Divergence Measures explores classical problems of statistical inference, such as estimation and hypothesis testing, on the basis of measures of entropy and divergence. The first two chapters form an overview, from a statistical perspective, of the most important measures of entropy and divergence and study their properties. The author then examines the statistical analysis of discrete multivariate data with emphasis on problems in contingency tables and loglinear models using phi-divergence test statistics as well as minimum phi-divergence estimators. The final chapter looks at testing in general populations, prese...

  7. Simple simulation of diffusion bridges with application to likelihood inference for diffusions

    DEFF Research Database (Denmark)

    Bladt, Mogens; Sørensen, Michael

    2014-01-01

    the accuracy and efficiency of the approximate method and compare it to exact simulation methods. In the study, our method provides a very good approximation to the distribution of a diffusion bridge for bridges that are likely to occur in applications to statistical inference. To illustrate the usefulness......With a view to statistical inference for discretely observed diffusion models, we propose simple methods of simulating diffusion bridges, approximately and exactly. Diffusion bridge simulation plays a fundamental role in likelihood and Bayesian inference for diffusion processes. First a simple......-dimensional diffusions and is applicable to all one-dimensional diffusion processes with finite speed-measure. One advantage of the new approach is that simple simulation methods like the Milstein scheme can be applied to bridge simulation. Another advantage over previous bridge simulation methods is that the proposed...

  8. On the Hardness of Topology Inference

    Science.gov (United States)

    Acharya, H. B.; Gouda, M. G.

    Many systems require information about the topology of networks on the Internet, for purposes like management, efficiency, testing of new protocols and so on. However, ISPs usually do not share the actual topology maps with outsiders; thus, in order to obtain the topology of a network on the Internet, a system must reconstruct it from publicly observable data. The standard method employs traceroute to obtain paths between nodes; next, a topology is generated such that the observed paths occur in the graph. However, traceroute has the problem that some routers refuse to reveal their addresses, and appear as anonymous nodes in traces. Previous research on the problem of topology inference with anonymous nodes has demonstrated that it is at best NP-complete. In this paper, we improve upon this result. In our previous research, we showed that in the special case where nodes may be anonymous in some traces but not in all traces (so all node identifiers are known), there exist trace sets that are generable from multiple topologies. This paper extends our theory of network tracing to the general case (with strictly anonymous nodes), and shows that the problem of computing the network that generated a trace set, given the trace set, has no general solution. The weak version of the problem, which allows an algorithm to output a "small" set of networks- any one of which is the correct one- is also not solvable. Any algorithm guaranteed to output the correct topology outputs at least an exponential number of networks. Our results are surprisingly robust: they hold even when the network is known to have exactly two anonymous nodes, and every node as well as every edge in the network is guaranteed to occur in some trace. On the basis of this result, we suggest that exact reconstruction of network topology requires more powerful tools than traceroute.

  9. Statistical Inference for Data Adaptive Target Parameters.

    Science.gov (United States)

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
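
    The sample-splitting construction described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' estimator: the function names, the rule "pick the column with the largest absolute mean" used as the parameter-generating algorithm, and the plain sample mean used as the estimator are all hypothetical choices made for the example.

```python
import random
from statistics import mean


def split_folds(n, v, seed=0):
    """Randomly partition the indices 0..n-1 into v near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::v] for k in range(v)]


def choose_target(rows):
    """Hypothetical parameter-generating algorithm (an assumption):
    return the index of the column with the largest absolute mean."""
    n_cols = len(rows[0])
    return max(range(n_cols), key=lambda j: abs(mean(r[j] for r in rows)))


def data_adaptive_estimate(data, v=5, seed=0):
    """Average of V fold-specific estimates.

    For each split, the target is defined on the parameter-generating
    sample (all folds but one) and estimated on the held-out estimation
    sample, so the same observations never do both jobs.
    """
    folds = split_folds(len(data), v, seed)
    fold_estimates = []
    for est_idx in folds:
        held_out = set(est_idx)
        gen_rows = [row for i, row in enumerate(data) if i not in held_out]
        est_rows = [data[i] for i in est_idx]
        j = choose_target(gen_rows)                    # define the target
        fold_estimates.append(mean(r[j] for r in est_rows))  # estimate it
    return mean(fold_estimates)
```

    Calling `data_adaptive_estimate(rows, v=5)` on a list of equal-length numeric tuples returns a single number; the inference (central limit theorem and confidence intervals) discussed in the abstract is not reproduced here.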

  10. Inferring modules from human protein interactome classes

    Directory of Open Access Journals (Sweden)

    Chaurasia Gautam

    2010-07-01

    Background: The integration of protein-protein interaction networks derived from high-throughput screening approaches and complementary sources is a key topic in systems biology. Although integration of protein interaction data is conventionally performed, the effects of this procedure on the results of network analyses have not yet been examined. In particular, in order to optimize the fusion of heterogeneous interaction datasets, it is crucial to consider not only their degree of coverage and accuracy, but also their mutual dependencies and additional salient features. Results: We examined this issue based on the analysis of modules detected by network clustering methods applied to both integrated and individual (disaggregated) data sources, which we call interactome classes. Due to class diversity, we deal with variable dependencies of data features arising from structural specificities and biases, but also from possible overlaps. Since highly connected regions of the human interactome may point to potential protein complexes, we have focused on the concept of modularity, and elucidated the detection power of module extraction algorithms by independent validations based on GO, MIPS and KEGG. From the combination of protein interactions with gene expressions, a confidence scoring scheme has been proposed before proceeding via GO with further classification in permanent and transient modules. Conclusions: Disaggregated interactomes are shown to be informative for inferring modularity, thus contributing to an effective integrative analysis. Validation of the extracted modules by multiple annotation allows for the assessment of confidence measures assigned to the modules in a protein pathway context. Notably, the proposed multilayer confidence scheme can be used for network calibration by enabling a transition from unweighted to weighted interactomes based on biological evidence.

  11. Approximate Inference for Wireless Communications

    DEFF Research Database (Denmark)

    Hansen, Morten

    This thesis investigates signal processing techniques for wireless communication receivers. The aim is to improve the performance or reduce the computational complexity of these, where the primary focus area is cellular systems such as Global System for Mobile communications (GSM) (and extensions...... to the optimal one, which usually requires an unacceptably high complexity. Some of the treated approximate methods are based on QL-factorization of the channel matrix. In the work presented in this thesis it is proven how the QL-factorization of frequency-selective channels asymptotically provides the minimum...

  12. Network inference from functional experimental data (Conference Presentation)

    Science.gov (United States)

    Desrosiers, Patrick; Labrecque, Simon; Tremblay, Maxime; Bélanger, Mathieu; De Dorlodot, Bertrand; Côté, Daniel C.

    2016-03-01

    Functional connectivity maps of neuronal networks are critical tools to understand how neurons form circuits, how information is encoded and processed by neurons, how memory is shaped, and how these basic processes are altered under pathological conditions. Current light microscopy allows the calcium or electrical activity of thousands of neurons to be observed simultaneously, yet assessing comprehensive connectivity maps directly from such data remains a non-trivial analytical task. There exist simple statistical methods, such as cross-correlation and Granger causality, but they only detect linear interactions between neurons. Other more involved inference methods inspired by information theory, such as mutual information and transfer entropy, identify connections between neurons more accurately but also require more computational resources. We carried out a comparative study of common connectivity inference methods. The relative accuracy and computational cost of each method were determined via simulated fluorescence traces generated with realistic computational models of interacting neurons in networks of different topologies (clustered or non-clustered) and sizes (10-1000 neurons). To bridge the computational and experimental works, we observed the intracellular calcium activity of live hippocampal neuronal cultures infected with the fluorescent calcium marker GCaMP6f. The spontaneous activity of the networks, consisting of 50-100 neurons per field of view, was recorded from 20 to 50 Hz on a microscope controlled by custom-built software. We implemented all connectivity inference methods in the software, which rapidly loads calcium fluorescence movies, segments the images, extracts the fluorescence traces, and assesses the functional connections (with strengths and directions) between each pair of neurons. We used this software to assess, in real time, the functional connectivity from real calcium imaging data in basal conditions, under plasticity protocols, and epileptic
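
    Of the methods named in this abstract, cross-correlation is the simplest to illustrate. The sketch below is not the authors' software; it is a minimal stand-in, with hypothetical function names, that scores a candidate directed connection between two fluorescence traces by the lag at which their Pearson correlation is strongest. As the abstract notes, this kind of measure only captures linear interactions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lagged_correlation(source, target, max_lag):
    """Return (lag, r) with the largest |r|, where the target trace is
    read `lag` samples later than the source trace (lag >= 0)."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        x = source[: len(source) - lag] if lag else source
        y = target[lag:]
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best
```

    A strong correlation at a positive lag is consistent with, but does not prove, a connection from the source neuron to the target neuron; thresholds and significance testing are left out of this sketch.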

  13. A neuro-fuzzy inference system for sensor monitoring

    International Nuclear Information System (INIS)

    Na, Man Gyun

    2001-01-01

    A neuro-fuzzy inference system combined with the wavelet denoising, PCA (principal component analysis) and SPRT (sequential probability ratio test) methods has been developed to monitor the relevant sensor using the information of other sensors. The parameters of the neuro-fuzzy inference system which estimates the relevant sensor signal are optimized by a genetic algorithm and a least-squares algorithm. The wavelet denoising technique was applied to remove noise components in input signals into the neuro-fuzzy system. By reducing the dimension of an input space into the neuro-fuzzy system without losing a significant amount of information, the PCA was used to reduce the time necessary to train the neuro-fuzzy system, simplify the structure of the neuro-fuzzy inference system and also ease the selection of the input signals into the neuro-fuzzy system. By using the residual signals between the estimated signals and the measured signals, the SPRT is applied to detect whether the sensors are degraded or not. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level, the pressurizer pressure, and the hot-leg temperature sensors in pressurized water reactors.
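
    The residual-based SPRT step can be sketched with Wald's classical test. The abstract does not give the hypotheses or thresholds used, so everything specific here is an assumption for illustration: residuals are taken to be Gaussian with known standard deviation `sigma`, the healthy sensor has mean residual 0, the degraded sensor has mean residual `m1`, and `alpha` and `beta` are the tolerated false-alarm and missed-alarm probabilities.

```python
import math


def sprt_gaussian(residuals, m1, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean 0 versus H1: mean m1 (Gaussian, known sigma).

    Returns (decision, n_samples_used), where decision is one of
    "degraded", "normal" or "undecided".
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0                              # cumulative log-likelihood ratio
    for n, r in enumerate(residuals, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (m1 / sigma ** 2) * (r - m1 / 2)
        if llr >= upper:
            return "degraded", n
        if llr <= lower:
            return "normal", n
    return "undecided", len(residuals)
```

    With `m1 = sigma = 1` and the default error rates, a constant residual of 1.0 crosses the upper threshold after ten samples, since each sample adds 0.5 to the log-likelihood ratio and the threshold is log(99), about 4.6.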

  14. STRIDE: Species Tree Root Inference from Gene Duplication Events.

    Science.gov (United States)

    Emms, David M; Kelly, Steven

    2017-12-01

    The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Inference algorithms and learning theory for Bayesian sparse factor analysis

    International Nuclear Information System (INIS)

    Rattray, Magnus; Sharp, Kevin; Stegle, Oliver; Winn, John

    2009-01-01

    Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

  16. Visual recognition and inference using dynamic overcomplete sparse learning.

    Science.gov (United States)

    Murray, Joseph F; Kreutz-Delgado, Kenneth

    2007-09-01

    We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado, et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.

  17. Inference algorithms and learning theory for Bayesian sparse factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Rattray, Magnus; Sharp, Kevin [School of Computer Science, University of Manchester, Manchester M13 9PL (United Kingdom); Stegle, Oliver [Max-Planck-Institute for Biological Cybernetics, Tuebingen (Germany); Winn, John, E-mail: magnus.rattray@manchester.ac.u [Microsoft Research Cambridge, Roger Needham Building, Cambridge, CB3 0FB (United Kingdom)

    2009-12-01

    Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

  18. Bayesian inference on genetic merit under uncertain paternity

    Directory of Open Access Journals (Sweden)

    Tempelman Robert J

    2003-09-01

    A hierarchical animal model was developed for inference on genetic merit of livestock with uncertain paternity. Fully conditional posterior distributions for fixed and genetic effects, variance components, sire assignments and their probabilities are derived to facilitate a Bayesian inference strategy using MCMC methods. We compared this model to a model based on the Henderson average numerator relationship (ANRM) in a simulation study with 10 replicated datasets generated for each of two traits. Trait 1 had a medium heritability (h²) for each of direct and maternal genetic effects whereas Trait 2 had a high h² attributable only to direct effects. The average posterior probabilities inferred on the true sire were between 1 and 10% larger than the corresponding priors (the inverse of the number of candidate sires in a mating pasture) for Trait 1 and between 4 and 13% larger than the corresponding priors for Trait 2. The predicted additive and maternal genetic effects were very similar using both models; however, model choice criteria (Pseudo Bayes Factor and Deviance Information Criterion) decisively favored the proposed hierarchical model over the ANRM model.

  19. A unified framework for haplotype inference in nuclear families.

    Science.gov (United States)

    Iliadis, Alexandros; Anastassiou, Dimitris; Wang, Xiaodong

    2012-07-01

    Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can be performed either using the familial information or statistical information derived from the population samples but not both. Building on our recently proposed tree-based deterministic framework (TDS) for trio families, we augment its applicability to general nuclear families. We impose a minimum recombinant approach locally and independently on each multiple children family, while resorting to the population-derived information to solve the remaining ambiguities. Thus our framework incorporates all available information (familial and population) in a given study. We demonstrate that using all the constraints in our approach we can have gains in the accuracy as opposed to breaking the multiple children families to separate trios and resorting to a trio inference algorithm or phasing each family in isolation. We believe that our proposed framework could be the method of choice for haplotype inference in studies that include nuclear families with multiple children. Our software (tds2.0) is downloadable from www.ee.columbia.edu/∼anastas/tds. © 2012 The Authors Annals of Human Genetics © 2012 Blackwell Publishing Ltd/University College London.

  20. STRATEGIES IN SEISMIC INFERENCE OF SUPERGRANULAR FLOWS ON THE SUN

    Energy Technology Data Exchange (ETDEWEB)

    Bhattacharya, Jishnu; Hanasoge, Shravan M. [Department of Astronomy and Astrophysics, Tata Institute of Fundamental Research, Mumbai-400005 (India)

    2016-08-01

    Observations of the solar surface reveal the presence of flows with length scales of around 35 Mm, commonly referred to as supergranules. Inferring the subsurface flow profile of supergranules from measurements of the surface and photospheric wavefield is an important challenge faced by helioseismology. Traditionally, the inverse problem has been approached by studying the linear response of seismic waves in a horizontally translationally invariant background to the presence of the supergranule; following an iterative approach that does not depend on horizontal translational invariance might perform better, since the misfit can be analyzed post iterations. In this work, we construct synthetic observations using a reference supergranule and invert for the flow profile using surface measurements of travel times of waves belonging to modal ridges f (surface gravity) and p₁ through p₇ (acoustic). We study the extent to which individual modes and their combinations contribute to infer the flow. We show that this method of nonlinear iterative inversion tends to underestimate the flow velocities, as well as inferring a shallower flow profile, with significant deviations from the reference supergranule near the surface. We carry out a similar analysis for a sound-speed perturbation and find that analogous near-surface deviations persist, although the iterations converge faster and more accurately. We conclude that a better approach to inversion would be to expand the supergranule profile in an appropriate basis, thereby reducing the number of parameters being inverted for and appropriately regularizing them.

  1. Matrix dimensions bias demographic inferences: implications for comparative plant demography.

    Science.gov (United States)

    Salguero-Gómez, Roberto; Plotkin, Joshua B

    2010-12-01

    While the wealth of projection matrices in plant demography permits comparative studies, variation in matrix dimensions complicates interspecific comparisons. Collapsing matrices to a common dimension may facilitate such comparisons but may also bias the inferred demographic parameters. Here we examine how matrix dimension affects inferred demographic elasticities and how different collapsing criteria perform. We analyzed 13 x 13 matrices representing nine plant species, collapsing these matrices (i) into even 7 x 7, 5 x 5, 4 x 4, and 3 x 3 matrices and (ii) into 5 x 5 matrices using different criteria. Stasis and fecundity elasticities increased when matrix dimension was reduced, whereas those of progression and retrogression decreased. We suggest a collapsing criterion that minimizes dissimilarities between the original- and collapsed-matrix elasticities and apply it to 66 plant species to study how life span and growth form influence the relationship between matrix dimension and elasticities. Our analysis demonstrates that (i) projection matrix dimension has significant effects on inferred demographic parameters, (ii) there are better-performing methods than previously suggested for standardizing matrix dimension, and (iii) herbaceous perennial projection matrices are particularly sensitive to changes in matrix dimensionality. For comparative demographic studies, we recommend normalizing matrices to a common dimension by collapsing higher classes and leaving the first few classes unaltered.

  2. Bootstrap-based Support of HGT Inferred by Maximum Parsimony

    Directory of Open Access Journals (Sweden)

    Nakhleh Luay

    2010-05-01

    Background: Maximum parsimony is one of the most commonly used criteria for reconstructing phylogenetic trees. Recently, Nakhleh and co-workers extended this criterion to enable reconstruction of phylogenetic networks, and demonstrated its application to detecting reticulate evolutionary relationships. However, one of the major problems with this extension has been that it favors more complex evolutionary relationships over simpler ones, thus having the potential for overestimating the amount of reticulation in the data. An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold. Results: In this paper, we address this problem in a more systematic way, by proposing a nonparametric bootstrap-based measure of support of inferred reticulation events, and using it to determine the number of those events, as well as their placements. A number of samples is generated from the given sequence alignment, and reticulation events are inferred based on each sample. Finally, the support of each reticulation event is quantified based on the inferences made over all samples. Conclusions: We have implemented our method in the NEPAL software tool (available publicly at http://bioinfo.cs.rice.edu/), and studied its performance on both biological and simulated data sets. While our studies show very promising results, they also highlight issues that are inherently challenging when applying the maximum parsimony criterion to detect reticulate evolution.

  3. Bootstrap-based support of HGT inferred by maximum parsimony.

    Science.gov (United States)

    Park, Hyun Jung; Jin, Guohua; Nakhleh, Luay

    2010-05-05

    Maximum parsimony is one of the most commonly used criteria for reconstructing phylogenetic trees. Recently, Nakhleh and co-workers extended this criterion to enable reconstruction of phylogenetic networks, and demonstrated its application to detecting reticulate evolutionary relationships. However, one of the major problems with this extension has been that it favors more complex evolutionary relationships over simpler ones, thus having the potential for overestimating the amount of reticulation in the data. An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold. In this paper, we address this problem in a more systematic way, by proposing a nonparametric bootstrap-based measure of support of inferred reticulation events, and using it to determine the number of those events, as well as their placements. A number of samples is generated from the given sequence alignment, and reticulation events are inferred based on each sample. Finally, the support of each reticulation event is quantified based on the inferences made over all samples. We have implemented our method in the NEPAL software tool (available publicly at http://bioinfo.cs.rice.edu/), and studied its performance on both biological and simulated data sets. While our studies show very promising results, they also highlight issues that are inherently challenging when applying the maximum parsimony criterion to detect reticulate evolution.
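
    The resampling scheme outlined in this abstract (generate samples from the alignment, infer events on each, then quantify support over all samples) can be sketched generically. This is not the NEPAL implementation: `infer_events` is a hypothetical callback standing in for whatever network-inference procedure is used, the alignment is assumed to be a list of equal-length strings, and support is taken here to be the fraction of replicates in which an event is recovered.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement, keeping
    the number of sequences and the number of sites unchanged."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_support(alignment, infer_events, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each event is inferred.

    infer_events: hypothetical callable taking an alignment and returning
    an iterable of hashable event labels.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        replicate = bootstrap_alignment(alignment, rng)
        counts.update(set(infer_events(replicate)))  # count each event once
    return {event: c / n_replicates for event, c in counts.items()}
```

    Events whose support falls below a chosen cutoff could then be discarded, which gives one way to decide how many reticulation events to keep; the cutoff itself is a modelling choice not specified here.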

  4. Evaluation of the Theoretical Geothermal Potential of Inferred Geothermal Reservoirs within the Vicano–Cimino and the Sabatini Volcanic Districts (Central Italy) by the Application of the Volume Method

    Directory of Open Access Journals (Sweden)

    Daniele Cinti

    2018-01-01

    The evaluation of the theoretical geothermal potential of identified unexploited hydrothermal reservoirs within the Vicano–Cimino and Sabatini volcanic districts (Latium region, Italy) has been made on the basis of a revised version of the classical volume method. This method is based on the distribution of the partial pressure of CO2 (pCO2) in shallow and deep aquifers to delimit areas of geothermal interest, according to the hypothesis that zones of high CO2 flux, either from soil degassing or dissolved in aquifers, are spatially related to deep hydrothermal reservoirs. On the whole, 664 fluid discharges (cold waters, thermal waters, and bubbling pools) have been collected from shallow and deep aquifers in the Vicano–Cimino Volcanic District and the Sabatini Volcanic District for chemical and isotopic composition, in an area of approximately 2800 km². From this large hydro-geochemical dataset the pCO2 values have been computed and then processed to obtain a contour map of its spatial distribution by using geostatistical techniques (kriging). The map of pCO2 has been used to draw up the boundaries of potentially exploitable geothermal systems within the two volcanic districts, corresponding to the areas where endogenous CO2 rises to the surface from the deep hydrothermal reservoirs. The overall estimated potential productivities and theoretical minimum and maximum thermal power of the two volcanic districts are about 45 × 10³ t/h and 3681–5594 MWt, respectively. This makes the Vicano–Cimino Volcanic District and the Sabatini Volcanic District very suitable for both direct and indirect exploitation of the geothermal resources, in view of the goal of reducing electricity generation from conventional and poorly sustainable energy sources.

  5. Improved Inference of Heteroscedastic Fixed Effects Models

    Directory of Open Access Journals (Sweden)

    Afshan Saeed

    2016-12-01

    Heteroscedasticity is a serious problem that distorts estimation and testing of panel data models (PDM). Arellano (1987) proposed the White (1980) estimator for PDM with heteroscedastic errors, but it provides erroneous inference for data sets that include high leverage points. In this paper, we attempt to improve the heteroscedasticity-consistent covariance matrix estimator (HCCME) for panel data sets with high leverage points. To draw robust inference for the PDM, we focus on improving the kernel bootstrap estimators proposed by Racine and MacKinnon (2007). A Monte Carlo scheme is used to verify the results.

  6. Likelihood inference for unions of interacting discs

    DEFF Research Database (Denmark)

    Møller, Jesper; Helisova, K.

    2010-01-01

    This is probably the first paper which discusses likelihood inference for a random set using a germ-grain model, where the individual grains are unobservable, edge effects occur and other complications appear. We consider the case where the grains form a disc process modelled by a marked point...... process, where the germs are the centres and the marks are the associated radii of the discs. We propose to use a recent parametric class of interacting disc process models, where the minimal sufficient statistic depends on various geometric properties of the random set, and the density is specified......-based maximum likelihood inference and the effect of specifying different reference Poisson models....

  7. Kernel learning at the first level of inference.

    Science.gov (United States)

    Cawley, Gavin C; Talbot, Nicola L C

    2014-05-01

    Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Bayesian Inference for Functional Dynamics Exploring in fMRI Data

    Directory of Open Access Journals (Sweden)

    Xuan Guo

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. In particular, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, the Bayesian Magnitude Change Point Model (BMCPM), the Bayesian Connectivity Change Point Model (BCCPM), and the Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more refined Bayesian inference models will emerge and play increasingly important roles in modeling brain functions in the years to come.

  9. Bayesian pedigree inference with small numbers of single nucleotide polymorphisms via a factor-graph representation.

    Science.gov (United States)

    Anderson, Eric C; Ng, Thomas C

    2016-02-01

    We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed. Published by Elsevier Inc.

  10. Introductive remarks on causal inference

    Directory of Open Access Journals (Sweden)

    Silvana A. Romio

    2013-05-01

    One of the more challenging issues in epidemiological research is being able to provide an unbiased estimate of the causal exposure-disease effect, to assess the possible etiological mechanisms and the implication for public health. A major source of bias is confounding, which can spuriously create or mask the causal relationship. In the last ten years, methodological research has been developed to better define the concept of causation in epidemiology and some important achievements have resulted in new statistical models. In this review, we aim to show how a technique well known to statisticians, i.e. standardization, can be seen as a method to estimate causal effects, equivalent under certain conditions to the inverse probability treatment weight procedure.
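
    The equivalence mentioned above can be checked numerically on a small example. The following is a minimal sketch, not taken from the review: it assumes a single discrete confounder, every exposure level present in every stratum, and risks and exposure probabilities estimated as plain stratum proportions. The cohort counts and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cohort: (confounder stratum L, exposure A, outcome Y, count).
cohort = [
    ("young", 1, 1, 10), ("young", 1, 0, 40),
    ("young", 0, 1, 20), ("young", 0, 0, 130),
    ("old",   1, 1, 60), ("old",   1, 0, 40),
    ("old",   0, 1, 30), ("old",   0, 0, 70),
]


def standardized_risk(rows, a):
    """Standardization: sum over strata of P(Y=1 | A=a, L=l) * P(L=l)."""
    total = sum(n for *_, n in rows)
    n_l = defaultdict(int)    # subjects per stratum
    n_al = defaultdict(int)   # subjects with exposure a, per stratum
    y_al = defaultdict(int)   # cases with exposure a, per stratum
    for l, exposure, y, n in rows:
        n_l[l] += n
        if exposure == a:
            n_al[l] += n
            y_al[l] += y * n
    return sum((y_al[l] / n_al[l]) * (n_l[l] / total) for l in n_l)


def iptw_risk(rows, a):
    """IPTW: weight each subject with exposure a by 1 / P(A=a | L=l)."""
    total = sum(n for *_, n in rows)
    n_l = defaultdict(int)
    n_al = defaultdict(int)
    for l, exposure, _, n in rows:
        n_l[l] += n
        if exposure == a:
            n_al[l] += n
    weighted_cases = 0.0
    for l, exposure, y, n in rows:
        if exposure == a:
            propensity = n_al[l] / n_l[l]
            weighted_cases += y * n / propensity
    return weighted_cases / total


if __name__ == "__main__":
    for a in (1, 0):
        print(a, standardized_risk(cohort, a), iptw_risk(cohort, a))
```

    With proportions estimated this way the two estimators coincide for each exposure level; with parametric models for the outcome or the exposure they generally differ.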

  11. Scientific inference learning from data

    CERN Document Server

    Vaughan, Simon

    2013-01-01

    Providing the knowledge and practical experience to begin analysing scientific data, this book is ideal for physical sciences students wishing to improve their data handling skills. The book focuses on explaining and developing the practice and understanding of basic statistical analysis, concentrating on a few core ideas, such as the visual display of information, modelling using the likelihood function, and simulating random data. Key concepts are developed through a combination of graphical explanations, worked examples, example computer code and case studies using real data. Students will develop an understanding of the ideas behind statistical methods and gain experience in applying them in practice. Further resources are available at www.cambridge.org/9781107607590, including data files for the case studies so students can practise analysing data, and exercises to test students' understanding.

  12. Statistical inference for noisy nonlinear ecological dynamic systems.

    Science.gov (United States)

    Wood, Simon N

    2010-08-26

    Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
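
    The construction described in this abstract (simulate, reduce to summary statistics, fit a Gaussian to the statistics) can be sketched in a few lines. This is an illustrative sketch only: the simulator is a hypothetical noisy Ricker-type map, and the two summary statistics (mean and lag-1 autocovariance) are assumptions chosen for brevity, not the statistics used in the paper.

```python
import math
import random


def simulate(log_r, n=50, seed=None):
    """Hypothetical noisy Ricker-type map (an assumption, not the paper's model)."""
    rng = random.Random(seed)
    x, series = 1.0, []
    for _ in range(n):
        x = x * math.exp(log_r - x + rng.gauss(0.0, 0.3))
        series.append(x)
    return series


def summaries(series):
    """Two phase-insensitive statistics: mean and lag-1 autocovariance."""
    n = len(series)
    m = sum(series) / n
    acov1 = sum((series[i] - m) * (series[i + 1] - m) for i in range(n - 1)) / (n - 1)
    return (m, acov1)


def synthetic_loglik(observed, log_r, n_sim=200, seed=0):
    """Gaussian log-density of the observed statistics, with mean and
    covariance estimated from n_sim simulations at parameter log_r."""
    s_obs = summaries(observed)
    sims = [summaries(simulate(log_r, len(observed), seed=seed + k)) for k in range(n_sim)]
    mu = [sum(s[j] for s in sims) / n_sim for j in range(2)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in sims) / (n_sim - 1)
            for j in range(2)] for i in range(2)]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    d0, d1 = s_obs[0] - mu[0], s_obs[1] - mu[1]
    # Quadratic form d^T cov^{-1} d, written out for the 2x2 case.
    quad = (cov[1][1] * d0 * d0 - 2.0 * cov[0][1] * d0 * d1 + cov[0][0] * d1 * d1) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)


if __name__ == "__main__":
    data = simulate(1.0, seed=10_000)
    for candidate in (1.0, 3.0):
        print(candidate, synthetic_loglik(data, candidate))
```

    In a full analysis this function would be evaluated inside an MCMC sampler over the parameters; the sketch only compares two candidate values.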

  13. Principal stratification in causal inference.

    Science.gov (United States)

    Frangakis, Constantine E; Rubin, Donald B

    2002-03-01

    Many scientific problems require that treatment comparisons be adjusted for posttreatment variables, but the estimands underlying standard methods are not causal effects. To address this deficiency, we propose a general framework for comparing treatments adjusting for posttreatment variables that yields principal effects based on principal stratification. Principal stratification with respect to a posttreatment variable is a cross-classification of subjects defined by the joint potential values of that posttreatment variable under each of the treatments being compared. Principal effects are causal effects within a principal stratum. The key property of principal strata is that they are not affected by treatment assignment and therefore can be used just as any pretreatment covariate, such as age category. As a result, the central property of our principal effects is that they are always causal effects and do not suffer from the complications of standard posttreatment-adjusted estimands. We discuss briefly that such principal causal effects are the link between three recent applications with adjustment for posttreatment variables: (i) treatment noncompliance, (ii) missing outcomes (dropout) following treatment noncompliance, and (iii) censoring by death. We then attack the problem of surrogate or biomarker endpoints, where we show, using principal causal effects, that all current definitions of surrogacy, even when perfectly true, do not generally have the desired interpretation as causal effects of treatment on outcome. We go on to formulate estimands based on principal stratification and principal causal effects and show their superiority.
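
    The definition of principal strata as a cross-classification by joint potential values can be made concrete with a toy table. The sketch below assumes a binary posttreatment variable S and, unrealistically, that both potential values of S and of the outcome Y are known for every subject; the data and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical subjects with BOTH potential values known (never the case in real data):
# (S under control, S under treatment, Y under control, Y under treatment)
subjects = [
    (0, 0, 5.0, 5.0),
    (0, 0, 4.0, 4.5),
    (0, 1, 5.0, 8.0),
    (0, 1, 6.0, 8.0),
    (1, 1, 7.0, 9.0),
]


def principal_effects(rows):
    """Average Y(1) - Y(0) within each stratum defined by (S(0), S(1))."""
    differences = defaultdict(list)
    for s0, s1, y0, y1 in rows:
        differences[(s0, s1)].append(y1 - y0)
    return {stratum: sum(d) / len(d) for stratum, d in differences.items()}


if __name__ == "__main__":
    for stratum, effect in sorted(principal_effects(subjects).items()):
        print(stratum, effect)
```

    Because stratum membership is defined by both potential values of S, it does not change with the treatment actually assigned, which is why each within-stratum contrast is a causal effect.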

  14. Model averaging, optimal inference and habit formation

    Directory of Open Access Journals (Sweden)

    Thomas H B FitzGerald

    2014-06-01

    Postulating that the brain performs approximate Bayesian inference generates principled and empirically testable models of neuronal function – the subject of much current interest in neuroscience and related disciplines. Current formulations address inference and learning under some assumed and particular model. In reality, organisms are often faced with an additional challenge – that of determining which model or models of their environment are the best for guiding behaviour. Bayesian model averaging – which says that an agent should weight the predictions of different models according to their evidence – provides a principled way to solve this problem. Importantly, because model evidence is determined by both the accuracy and complexity of the model, optimal inference requires that these be traded off against one another. This means an agent’s behaviour should show an equivalent balance. We hypothesise that Bayesian model averaging plays an important role in cognition, given that it is both optimal and realisable within a plausible neuronal architecture. We outline model averaging and how it might be implemented, and then explore a number of implications for brain and behaviour. In particular, we propose that model averaging can explain a number of apparently suboptimal phenomena within the framework of approximate (bounded) Bayesian inference, focussing particularly upon the relationship between goal-directed and habitual behaviour.
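
    The weighting rule stated in the abstract (predictions weighted by model evidence) can be illustrated as follows. This is a generic sketch of Bayesian model averaging, not the neuronal implementation the authors discuss; the function names and the example numbers are hypothetical.

```python
import math


def model_posteriors(log_evidences, log_priors=None):
    """Posterior model probabilities from log evidences (uniform prior by default)."""
    n = len(log_evidences)
    if log_priors is None:
        log_priors = [math.log(1.0 / n)] * n
    scores = [le + lp for le, lp in zip(log_evidences, log_priors)]
    top = max(scores)                      # subtract the maximum for numerical stability
    unnormalised = [math.exp(s - top) for s in scores]
    z = sum(unnormalised)
    return [u / z for u in unnormalised]


def averaged_prediction(predictions, weights):
    """Evidence-weighted average of the models' predictions."""
    return sum(p * w for p, w in zip(predictions, weights))


if __name__ == "__main__":
    # Two hypothetical models; the first has three times the evidence of the second.
    weights = model_posteriors([math.log(3.0), math.log(1.0)])
    print(weights, averaged_prediction([1.0, 0.0], weights))
```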

  15. Efficient Bayesian inference for ARFIMA processes

    Science.gov (United States)

    Graves, T.; Gramacy, R. B.; Franzke, C. L. E.; Watkins, N. W.

    2015-03-01

    Many geophysical quantities, like atmospheric temperature, water levels in rivers, and wind speeds, have shown evidence of long-range dependence (LRD). LRD means that these quantities experience non-trivial temporal memory, which potentially enhances their predictability, but also hampers the detection of externally forced trends. Thus, it is important to reliably identify whether or not a system exhibits LRD. In this paper we present a modern and systematic approach to the inference of LRD. Rather than Mandelbrot's fractional Gaussian noise, we use the more flexible Autoregressive Fractional Integrated Moving Average (ARFIMA) model which is widely used in time series analysis, and of increasing interest in climate science. Unlike most previous work on the inference of LRD, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g. short memory effects) can be integrated over in order to focus on long memory parameters, and hypothesis testing more directly. We illustrate our new methodology on the Nile water level data, with favorable comparison to the standard estimators.
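
    The long-memory part of an ARFIMA model is the fractional differencing operator (1 - B)^d, where B is the backshift operator. A minimal sketch of its series expansion is given below; it illustrates the operator only and is not the approximate likelihood proposed in the paper. The function names are hypothetical.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k, using the recursion
    pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply the truncated operator (1 - B)^d to a finite series."""
    w = frac_diff_weights(d, len(series))
    return [sum(w[k] * series[t - k] for k in range(t + 1)) for t in range(len(series))]


if __name__ == "__main__":
    print(frac_diff_weights(0.4, 5))       # slowly decaying weights for 0 < d < 0.5
    print(frac_diff([1.0, 3.0, 6.0], 1))   # d = 1 reduces to ordinary differencing
```

    For 0 < d < 0.5 the weights decay hyperbolically rather than exponentially, which is the source of the long-range dependence.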

  16. Campbell's and Rubin's Perspectives on Causal Inference

    Science.gov (United States)

    West, Stephen G.; Thoemmes, Felix

    2010-01-01

    Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…

  17. HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR

    International Nuclear Information System (INIS)

    Schneider, Michael D.; Dawson, William A.; Hogg, David W.; Marshall, Philip J.; Bard, Deborah J.; Meyers, Joshua; Lang, Dustin

    2015-01-01

    Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probability distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.

  18. Interest, Inferences, and Learning from Texts

    Science.gov (United States)

    Clinton, Virginia; van den Broek, Paul

    2012-01-01

    Topic interest and learning from texts have been found to be positively associated with each other. However, the reason for this positive association is not well understood. The purpose of this study is to examine a cognitive process, inference generation, that could explain the positive association between interest and learning from texts. In…

  19. Ignorability in Statistical and Probabilistic Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred

    2005-01-01

    When dealing with incomplete data in statistical learning, or incomplete observations in probabilistic inference, one needs to distinguish the fact that a certain event is observed from the fact that the observed event has happened. Since the modeling and computational complexities entailed...

  20. Evolutionary inference via the Poisson Indel Process.

    Science.gov (United States)

    Bouchard-Côté, Alexandre; Jordan, Michael I

    2013-01-22

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.

  1. Culture and Pragmatic Inference in Interpersonal Communication

    African Journals Online (AJOL)

    cognitive process, and that the human capacity for inference is crucially important ... been noted that research in interpersonal communication is currently pushing the ... communicative actions, the social-cultural world of everyday life is not only ... personal experiences of the authors', as documented over time and recreated ...

  2. Inference and the Introductory Statistics Course

    Science.gov (United States)

    Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross

    2011-01-01

    This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…

  3. Cortical information flow during inferences of agency

    NARCIS (Netherlands)

    Dogge, Myrthel; Hofman, Dennis; Boersma, Maria; Dijkerman, H Chris; Aarts, Henk

    2014-01-01

    Building on the recent finding that agency experiences do not merely rely on sensorimotor information but also on cognitive cues, this exploratory study uses electroencephalographic recordings to examine functional connectivity during agency inference processing in a setting where action and outcome

  4. Quasi-Experimental Designs for Causal Inference

    Science.gov (United States)

    Kim, Yongnam; Steiner, Peter

    2016-01-01

    When randomized experiments are infeasible, quasi-experimental designs can be exploited to evaluate causal treatment effects. The strongest quasi-experimental designs for causal inference are regression discontinuity designs, instrumental variable designs, matching and propensity score designs, and comparative interrupted time series designs. This…

  5. The importance of learning when making inferences

    Directory of Open Access Journals (Sweden)

    Jorg Rieskamp

    2008-03-01

    The assumption that people possess a repertoire of strategies to solve the inference problems they face has been made repeatedly. The experimental findings of two previous studies on strategy selection are reexamined from a learning perspective, which argues that people learn to select strategies for making probabilistic inferences. This learning process is modeled with the strategy selection learning (SSL) theory, which assumes that people develop subjective expectancies for the strategies they have. They select strategies proportional to their expectancies, which are updated on the basis of experience. For the study by Newell, Weston, and Shanks (2003) it can be shown that people did not anticipate the success of a strategy from the beginning of the experiment. Instead, the behavior observed at the end of the experiment was the result of a learning process that can be described by the SSL theory. For the second study, by Bröder and Schiffer (2006), the SSL theory is able to provide an explanation for why participants only slowly adapted to new environments in a dynamic inference situation. The reanalysis of the previous studies illustrates the importance of learning for probabilistic inferences.
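
    The two mechanisms named in the abstract, selection proportional to expectancies and experience-based updating, can be sketched as below. This is a deliberately simplified illustration: the additive update rule, the strategy names, and the payoffs are assumptions, and the full SSL theory has further parameters not shown here.

```python
import random


def choice_probabilities(expectancies):
    """Probability of selecting each strategy, proportional to its expectancy."""
    total = sum(expectancies.values())
    return {name: q / total for name, q in expectancies.items()}


def update(expectancies, chosen, payoff):
    """Return new expectancies after adding the payoff to the chosen strategy
    (assumed additive reinforcement; expectancies must stay positive)."""
    updated = dict(expectancies)
    updated[chosen] += payoff
    return updated


def choose(expectancies, rng):
    """Sample one strategy according to the current choice probabilities."""
    names = list(expectancies)
    return rng.choices(names, weights=[expectancies[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    q = {"strategy_a": 1.0, "strategy_b": 3.0}     # hypothetical initial expectancies
    payoffs = {"strategy_a": 2.0, "strategy_b": 0.5}  # hypothetical environment
    for _ in range(20):
        picked = choose(q, rng)
        q = update(q, picked, payoffs[picked])
    print(choice_probabilities(q))
```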

  6. Colligation, Or the Logical Inference of Interconnection

    DEFF Research Database (Denmark)

    Falster, Peter

    1998-01-01

    laws or assumptions. Yet interconnection as an abstract concept seems to be without scientific underpinning in pure logic. Adopting a historical viewpoint, our aim is to show that the reasoning of interconnection may be identified with a neglected kind of logical inference, called "colligation...

  7. Colligation or, The Logical Inference of Interconnection

    DEFF Research Database (Denmark)

    Franksen, Ole Immanuel; Falster, Peter

    2000-01-01

    laws or assumptions. Yet interconnection as an abstract concept seems to be without scientific underpinning in pure logic. Adopting a historical viewpoint, our aim is to show that the reasoning of interconnection may be identified with a neglected kind of logical inference, called "colligation...

  8. Inferring motion and location using WLAN RSSI

    NARCIS (Netherlands)

    Kavitha Muthukrishnan, K.; van der Zwaag, B.J.; Havinga, Paul J.M.; Fuller, R.; Koutsoukos, X.

    2009-01-01

    We present novel algorithms to infer movement by making use of inherent fluctuations in the received signal strengths from existing WLAN infrastructure. We evaluate the performance of the presented algorithms based on classification metrics such as recall and precision using annotated traces

  9. Emotions as within or between people? Cultural variation in lay theories of emotion expression and inference.

    Science.gov (United States)

    Uchida, Yukiko; Townsend, Sarah S M; Rose Markus, Hazel; Bergsieker, Hilary B

    2009-11-01

    Four studies using open-ended and experimental methods test the hypothesis that in Japanese contexts, emotions are understood as between people, whereas in American contexts, emotions are understood as primarily within people. Study 1 analyzed television interviews of Olympic athletes. When asked about their relationships, Japanese athletes used significantly more emotion words than American athletes. This difference was not significant when questions asked directly about athletes' feelings. In Study 2, when describing an athlete's emotional reaction to winning, Japanese participants implicated others more often than American participants. After reading an athlete's self-description, Japanese participants inferred more emotions when the athlete mentioned relationships, whereas American participants inferred more emotions when the athlete focused only on herself (Study 3). Finally, when viewing images of athletes, Japanese participants inferred more emotions for athletes pictured with teammates, whereas American participants inferred more emotions for athletes pictured alone (Studies 4a and 4b).

  10. Terminal-Dependent Statistical Inference for the FBSDEs Models

    Directory of Open Access Journals (Sweden)

    Yunquan Song

    2014-01-01

    The original stochastic differential equations (OSDEs) and forward-backward stochastic differential equations (FBSDEs) are often used to model complex dynamic processes that arise in financial, ecological, and many other areas. The main difference between OSDEs and FBSDEs is that the latter are designed to depend on a terminal condition, which is a key factor in some financial and ecological circumstances. It is interesting but challenging to estimate FBSDE parameters from noisy data and the terminal condition. However, to the best of our knowledge, the terminal-dependent statistical inference for such a model has not been explored in the existing literature. We propose a nonparametric terminal control variables estimation method to address this problem. We use the terminal control variables because the newly proposed inference procedures then inherit the terminal-dependent characteristic. With the proposed method, the estimators of the functional coefficients of the FBSDEs model are obtained. The asymptotic properties of the estimators are also discussed. Simulation studies show that the proposed method gives satisfactory estimates for the FBSDE parameters from noisy data and the terminal condition. A simulation is performed to test the feasibility of our method.

  11. Inferring relationships between pairs of individuals from locus heterozygosities

    Directory of Open Access Journals (Sweden)

    Spinetti Isabella

    2002-11-01

    Background: The traditional exact method for inferring relationships between individuals from genetic data is not easily applicable in all situations that may be encountered in several fields of applied genetics. This study describes an approach that gives affordable results and is easily applicable; it is based on the probabilities that two individuals share 0, 1 or both alleles at a locus identical by state. Results: We show that these probabilities (zi) depend on locus heterozygosity (H), and are scarcely affected by variation of the distribution of allele frequencies. This allows us to obtain empirical curves relating zi's to H for a series of common relationships, so that the likelihood ratio of a pair of relationships between any two individuals, given their genotypes at a locus, is a function of a single parameter, H. Application to large samples of mother-child and full-sib pairs shows that the statistical power of this method to infer the correct relationship is not much lower than that of the exact method. Analysis of a large database of STR data proves that locus heterozygosity does not vary significantly among Caucasian populations, apart from special cases, so that the likelihood ratio of the more common relationships between pairs of individuals may be obtained by looking at tabulated zi values. Conclusions: A simple method is provided, which may be used by any scientist with the help of a calculator or a spreadsheet to compute the likelihood ratios of common alternative relationships between pairs of individuals.
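
    The probabilities of sharing 0, 1 or 2 alleles identical by state can be computed by direct enumeration for a given set of allele frequencies. The sketch below does this for unrelated and parent-child pairs and forms the likelihood ratio; it assumes Hardy-Weinberg proportions and no mutation or genotyping error, and it is not the empirical-curve method of the paper. All function names are hypothetical.

```python
from itertools import product


def heterozygosity(freqs):
    """Expected locus heterozygosity H = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) that two genotypes share identical by state."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)
            shared += 1
    return shared


def ibs_unrelated(freqs):
    """[z0, z1, z2] for two unrelated individuals."""
    z = [0.0, 0.0, 0.0]
    alleles = range(len(freqs))
    for a, b, c, d in product(alleles, repeat=4):
        z[ibs_count((a, b), (c, d))] += freqs[a] * freqs[b] * freqs[c] * freqs[d]
    return z


def ibs_parent_child(freqs):
    """[z0, z1, z2] for a parent-child pair: the child receives one parental
    allele (each with probability 1/2) and one allele drawn from the population."""
    z = [0.0, 0.0, 0.0]
    alleles = range(len(freqs))
    for a, b, c in product(alleles, repeat=3):
        for inherited in (a, b):
            prob = freqs[a] * freqs[b] * freqs[c] * 0.5
            z[ibs_count((a, b), (inherited, c))] += prob
    return z


def likelihood_ratio(freqs, shared):
    """Parent-child versus unrelated, given the observed number of shared alleles."""
    return ibs_parent_child(freqs)[shared] / ibs_unrelated(freqs)[shared]


if __name__ == "__main__":
    freqs = [0.5, 0.5]
    print(heterozygosity(freqs), ibs_unrelated(freqs), ibs_parent_child(freqs))
    print(likelihood_ratio(freqs, 2))
```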

  12. Mixed integer linear programming for maximum-parsimony phylogeny inference.

    Science.gov (United States)

    Sridhar, Srinath; Lam, Fumei; Blelloch, Guy E; Ravi, R; Schwartz, Russell

    2008-01-01

    Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to be able to continue to make effective use of the rapidly growing stores of variation data now being gathered. In this paper, we present two integer linear programming (ILP) formulations to find the most parsimonious phylogenetic tree from a set of binary variation data. One method uses a flow-based formulation that can produce exponential numbers of variables and constraints in the worst case. The method has, however, proven extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods, solving several large mtDNA and Y-chromosome instances within a few seconds and giving provably optimal results in times competitive with fast heuristics that cannot guarantee optimality. An alternative formulation establishes that the problem can be solved with a polynomial-sized ILP. We further present a web server developed based on the exponential-sized ILP that performs fast maximum parsimony inferences and serves as a front end to a database of precomputed phylogenies spanning the human genome.

  13. Active inference, sensory attenuation and illusions.

    Science.gov (United States)

    Brown, Harriet; Adams, Rick A; Parees, Isabel; Edwards, Mark; Friston, Karl

    2013-11-01

    Active inference provides a simple and neurobiologically plausible account of how action and perception are coupled in producing (Bayes) optimal behaviour. This can be seen most easily as minimising prediction error: we can either change our predictions to explain sensory input through perception. Alternatively, we can actively change sensory input to fulfil our predictions. In active inference, this action is mediated by classical reflex arcs that minimise proprioceptive prediction error created by descending proprioceptive predictions. However, this creates a conflict between action and perception; in that, self-generated movements require predictions to override the sensory evidence that one is not actually moving. However, ignoring sensory evidence means that externally generated sensations will not be perceived. Conversely, attending to (proprioceptive and somatosensory) sensations enables the detection of externally generated events but precludes generation of actions. This conflict can be resolved by attenuating the precision of sensory evidence during movement or, equivalently, attending away from the consequences of self-made acts. We propose that this Bayes optimal withdrawal of precise sensory evidence during movement is the cause of psychophysical sensory attenuation. Furthermore, it explains the force-matching illusion and reproduces empirical results almost exactly. Finally, if attenuation is removed, the force-matching illusion disappears and false (delusional) inferences about agency emerge. This is important, given the negative correlation between sensory attenuation and delusional beliefs in normal subjects--and the reduction in the magnitude of the illusion in schizophrenia. Active inference therefore links the neuromodulatory optimisation of precision to sensory attenuation and illusory phenomena during the attribution of agency in normal subjects. It also provides a functional account of deficits in syndromes characterised by false inference

  14. A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

    Science.gov (United States)

    Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence

    2013-01-01

    Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011
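
    The claim that a Poisson log-normal model accounts for variance larger than the mean follows from its first two moments. The sketch below covers only a single count Y with Y | lambda ~ Poisson(lambda) and log(lambda) ~ Normal(mu_log, sigma_log^2); it does not implement the hierarchical network model or the Lasso penalty. The function name and parameter values are hypothetical.

```python
import math


def pln_moments(mu_log, sigma_log):
    """Mean and variance of a Poisson log-normal count.

    E[Y]   = exp(mu_log + sigma_log^2 / 2)
    Var[Y] = E[Y] + E[Y]^2 * (exp(sigma_log^2) - 1)
    """
    mean = math.exp(mu_log + 0.5 * sigma_log ** 2)
    variance = mean + mean ** 2 * (math.exp(sigma_log ** 2) - 1.0)
    return mean, variance


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0):
        mean, variance = pln_moments(2.0, sigma)
        print(sigma, mean, variance, variance / mean)
```

    With sigma_log = 0 the model reduces to a plain Poisson (variance equal to the mean); any positive sigma_log gives overdispersion.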

  15. A feedback framework for protein inference with peptides identified from tandem mass spectra

    Directory of Open Access Journals (Sweden)

    Shi Jinhong

    2012-11-01

    Background: Protein inference is an important computational step in proteomics. There exists a natural nest relationship between protein inference and peptide identification, but these two steps are usually performed separately in existing methods. We believe that both peptide identification and protein inference can be improved by exploring such nest relationship. Results: In this study, a feedback framework is proposed to process peptide identification reports from search engines, and an iterative method is implemented to exemplify the processing of Sequest peptide identification reports according to the framework. The iterative method is verified on two datasets with known validity of proteins and peptides, and compared with ProteinProphet and PeptideProphet. The results have shown that not only can the iterative method infer more true positive and fewer false positive proteins than ProteinProphet, but it can also identify more true positive and fewer false positive peptides than PeptideProphet. Conclusions: The proposed iterative method implemented according to the feedback framework can unify and improve the results of peptide identification and protein inference.

  16. Prediksi Kelulusan Mata Kuliah Menggunakan Hybrid Fuzzy Inference System [Course Pass Prediction Using a Hybrid Fuzzy Inference System]

    Directory of Open Access Journals (Sweden)

    Abidatul Izzah

    2016-07-01

    Abstract: A university is an institution that holds data that can be very informative if processed properly. Predicting whether students will pass is a widely studied problem in higher education. Knowing the predicted pass status of students in the middle of the semester, lecturers can anticipate or give special attention to students who are predicted not to pass. The methods used vary widely and include the Fuzzy Inference System (FIS). In practice, however, fuzzy rules are often generated randomly or based on expert understanding, so they do not represent the distribution of the data. Therefore, this study uses the Decision Tree (DT) technique to generate the rules. The study aims to predict course pass status using a hybrid of FIS and DT. The data used are the Posttest, Assignment, Quiz, and midterm exam (UTS) scores of 106 students of Politeknik Kediri taking the Algorithms and Data Structures course. The study begins by generating 5 rules, which are then used in inference. The next stage is the implementation of the FIS, with fuzzification, inference, and defuzzification stages. The resulting accuracy, sensitivity, and specificity are 94.33%, 96.55%, and 84.21%, respectively. Keywords: Decision Tree, Educational Data Mining, Fuzzy Inference System, Prediction.
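    The three reported figures are standard confusion-matrix ratios, and a short sketch shows how they relate. The abstract does not give the underlying counts; the counts used below are hypothetical, chosen only because they sum to the 106 students mentioned and reproduce the reported percentages to within rounding. The function name is an assumption as well.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity (in percent) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": 100.0 * tp / (tp + fn),      # share of actual positives detected
        "specificity": 100.0 * tn / (tn + fp),      # share of actual negatives detected
    }


# Hypothetical counts (not taken from the paper): 87 positives, 19 negatives, 106 in total.
metrics = classification_metrics(tp=84, fn=3, tn=16, fp=3)
```

    With these assumed counts the sensitivity and specificity round to 96.55% and 84.21%, and the accuracy is 100/106, about 94.34%, which agrees with the reported 94.33% up to rounding or truncation.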

  17. Using the Weibull distribution reliability, modeling and inference

    CERN Document Server

    McCool, John I

    2012-01-01

    Understand and utilize the latest developments in Weibull inferential methods. While the Weibull distribution is widely used in science and engineering, most engineers do not have the necessary statistical training to implement the methodology effectively. Using the Weibull Distribution: Reliability, Modeling, and Inference fills a gap in the current literature on the topic, introducing a self-contained presentation of the probabilistic basis for the methodology while providing powerful techniques for extracting information from data. The author explains the use of the Weibull distribution

  18. Inference on inspiral signals using LISA MLDC data

    International Nuclear Information System (INIS)

    Roever, Christian; Stroeer, Alexander; Bloomer, Ed; Christensen, Nelson; Clark, James; Hendry, Martin; Messenger, Chris; Meyer, Renate; Pitkin, Matt; Toher, Jennifer; Umstaetter, Richard; Vecchio, Alberto; Veitch, John; Woan, Graham

    2007-01-01

    In this paper, we describe a Bayesian inference framework for the analysis of data obtained by LISA. We set up a model for binary inspiral signals as defined for the Mock LISA Data Challenge 1.2 (MLDC), and implemented a Markov chain Monte Carlo (MCMC) algorithm to facilitate exploration and integration of the posterior distribution over the nine-dimensional parameter space. Here, we present intermediate results showing how, using this method, information about the nine parameters can be extracted from the data

  19. INFERENCE AND SENSITIVITY IN STOCHASTIC WIND POWER FORECAST MODELS.

    KAUST Repository

    Elkantassi, Soumaya

    2017-10-03

    Reliable forecasting of wind power generation is crucial to optimal control of costs in generation of electricity with respect to the electricity demand. Here, we propose and analyze stochastic wind power forecast models described by parametrized stochastic differential equations, which introduce appropriate fluctuations in numerical forecast outputs. We use an approximate maximum likelihood method to infer the model parameters taking into account the time correlated sets of data. Furthermore, we study the validity and sensitivity of the parameters for each model. We applied our models to Uruguayan wind power production as determined by historical data and corresponding numerical forecasts for the period of March 1 to May 31, 2016.

  20. INFERENCE AND SENSITIVITY IN STOCHASTIC WIND POWER FORECAST MODELS.

    KAUST Repository

    Elkantassi, Soumaya; Kalligiannaki, Evangelia; Tempone, Raul

    2017-01-01

    Reliable forecasting of wind power generation is crucial to optimal control of costs in generation of electricity with respect to the electricity demand. Here, we propose and analyze stochastic wind power forecast models described by parametrized stochastic differential equations, which introduce appropriate fluctuations in numerical forecast outputs. We use an approximate maximum likelihood method to infer the model parameters taking into account the time correlated sets of data. Furthermore, we study the validity and sensitivity of the parameters for each model. We applied our models to Uruguayan wind power production as determined by historical data and corresponding numerical forecasts for the period of March 1 to May 31, 2016.

  1. ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM FOR END MILLING

    Directory of Open Access Journals (Sweden)

    ANGELOS P. MARKOPOULOS

    2016-09-01

    Soft computing is commonly used as a modelling method in various technological areas. Methods such as Artificial Neural Networks and Fuzzy Logic have found application in manufacturing technology as well. Neuro-fuzzy systems, which aim to combine the benefits of both of these Artificial Intelligence methods, have recently become a subject of research, as they have proven to be superior to other methods. In this paper an adaptive neuro-fuzzy inference system for the prediction of surface roughness in end milling is presented. Spindle speed, feed rate, depth of cut and vibrations were used as independent input variables, while the roughness parameter Ra was the dependent output variable. Several variations are tested and the results of the optimum system are presented. Final results indicate that the proposed model can accurately predict surface roughness, even for input that was not used in training.

  2. Phase inductance estimation for switched reluctance motor using adaptive neuro-fuzzy inference system

    International Nuclear Information System (INIS)

    Daldaban, Ferhat; Ustkoyuncu, Nurettin; Guney, Kerim

    2006-01-01

    A new method based on an adaptive neuro-fuzzy inference system (ANFIS) for estimating the phase inductance of switched reluctance motors (SRMs) is presented. The ANFIS has the advantages of expert knowledge of the fuzzy inference system and the learning capability of neural networks. A hybrid learning algorithm, which combines the least square method and the back propagation algorithm, is used to identify the parameters of the ANFIS. The rotor position and the phase current of the 6/4 pole SRM are used to predict the phase inductance. The phase inductance results predicted by the ANFIS are in excellent agreement with the results of the finite element method

  3. Nonparametric predictive inference for combining diagnostic tests with parametric copula

    Science.gov (United States)

    Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.

    2017-09-01

    Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we are interested in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results while taking their dependence structure into account using a parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. A copula is a well-known statistical concept for modelling dependence of random variables: it is a joint distribution function whose marginals are all uniformly distributed, and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method, namely maximum likelihood estimation (MLE). We investigate the performance of the proposed method on data sets from the literature and discuss the results to show how our method performs for different families of copulas. Finally, we briefly outline related challenges and opportunities for future research.
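    As a point of reference for the AUC mentioned above, the following is a minimal sketch of the ordinary empirical AUC, computed as the fraction of (diseased, healthy) pairs in which the diseased score is higher, with ties counted as one half. It is not the NPI lower and upper probabilities or the copula-based combination described in the abstract; the function name and the convention that higher scores indicate disease are assumptions.

```python
def empirical_auc(diseased_scores, healthy_scores):
    """Empirical AUC: probability that a randomly chosen diseased score exceeds
    a randomly chosen healthy score (ties count as 0.5)."""
    if not diseased_scores or not healthy_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in diseased_scores:
        for y in healthy_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased_scores) * len(healthy_scores))


# Hypothetical test scores for illustration only.
auc_separated = empirical_auc([3, 4, 5], [1, 2])   # groups do not overlap
auc_identical = empirical_auc([1, 2], [1, 2])      # groups are indistinguishable
```

    An AUC of 1 means the two groups are perfectly separated by the test score, while 0.5 means the score carries no discriminating information.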

  4. Inference of segmented color and texture description by tensor voting.

    Science.gov (United States)

    Jia, Jiaya; Tang, Chi-Keung

    2004-06-01

    A robust synthesis method is proposed to automatically infer missing color and texture information from a damaged 2D image by (N)D tensor voting (N > 3). The same approach is generalized to range and 3D data in the presence of occlusion, missing data and noise. Our method translates texture information into an adaptive (N)D tensor, followed by a voting process that infers noniteratively the optimal color values in the (N)D texture space. A two-step method is proposed. First, we perform segmentation based on insufficient geometry, color, and texture information in the input, and extrapolate partitioning boundaries by either 2D or 3D tensor voting to generate a complete segmentation for the input. Missing colors are synthesized using (N)D tensor voting in each segment. Different feature scales in the input are automatically adapted by our tensor scale analysis. Results on a variety of difficult inputs demonstrate the effectiveness of our tensor voting approach.

  5. Using inferred probabilities to measure the accuracy of imprecise forecasts

    Directory of Open Access Journals (Sweden)

    Paul Lehner

    2012-11-01

    Research on forecasting is effectively limited to forecasts that are expressed with clarity; that is, the forecasted event must be sufficiently well-defined that it can be clearly resolved whether or not the event occurred, and forecast certainties must be expressed as quantitative probabilities. When forecasts are expressed with clarity, quantitative measures (scoring rules, calibration, discrimination, etc.) can be used to measure forecast accuracy, which in turn can be used to measure the comparative accuracy of different forecasting methods. Unfortunately most real world forecasts are not expressed clearly. This lack of clarity extends to both the description of the forecast event and the use of vague language to express forecast certainty. It is thus difficult to assess the accuracy of most real world forecasts, and consequently the accuracy of the methods used to generate real world forecasts. This paper addresses this deficiency by presenting an approach to measuring the accuracy of imprecise real world forecasts using the same quantitative metrics routinely used to measure the accuracy of well-defined forecasts. To demonstrate applicability, the Inferred Probability Method is applied to measure the accuracy of forecasts in fourteen documents examining complex political domains. Key words: inferred probability, imputed probability, judgment-based forecasting, forecast accuracy, imprecise forecasts, political forecasting, verbal probability, probability calibration.
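    One commonly used scoring rule of the kind mentioned above is the Brier score, the mean squared difference between forecast probabilities and 0/1 outcomes. The sketch below only illustrates what such a quantitative metric looks like for clearly expressed forecasts; it is not the Inferred Probability Method itself, and the function name and example values are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes (0 or 1).
    Lower is better; 0.0 is a perfect score."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need equally long, non-empty sequences")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts: a confident, correct forecaster and an uninformative one.
confident = brier_score([1.0, 0.0], [1, 0])
hedging = brier_score([0.5, 0.5], [1, 0])
```

    A forecaster who always answers 0.5 scores 0.25 regardless of what happens, which gives a natural baseline against which other forecasters can be compared.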

  6. Statistical inference for imperfect maintenance models with missing data

    International Nuclear Information System (INIS)

    Dijoux, Yann; Fouladirad, Mitra; Nguyen, Dinh Tuan

    2016-01-01

    The paper considers complex industrial systems with incomplete maintenance history. A corrective maintenance is performed after the occurrence of a failure and its efficiency is assumed to be imperfect. In maintenance analysis, the databases are not necessarily complete. Specifically, the observations are assumed to be window-censored. This situation arises relatively frequently after the purchase of a second-hand unit or in the absence of maintenance record during the burn-in phase. The joint assessment of the wear-out of the system and the maintenance efficiency is investigated under missing data. A review along with extensions of statistical inference procedures from an observation window are proposed in the case of perfect and minimal repair using the renewal and Poisson theories, respectively. Virtual age models are employed to model imperfect repair. In this framework, new estimation procedures are developed. In particular, maximum likelihood estimation methods are derived for the most classical virtual age models. The benefits of the new estimation procedures are highlighted by numerical simulations and an application to a real data set. - Highlights: • New estimation procedures for window-censored observations and imperfect repair. • Extensions of inference methods for perfect and minimal repair with missing data. • Overview of maximum likelihood method with complete and incomplete observations. • Benefits of the new procedures highlighted by simulation studies and real application.

  7. Information Geometry, Inference Methods and Chaotic Energy Levels Statistics

    OpenAIRE

    Cafaro, Carlo

    2008-01-01

    In this Letter, we propose a novel information-geometric characterization of chaotic (integrable) energy level statistics of a quantum antiferromagnetic Ising spin chain in a tilted (transverse) external magnetic field. Finally, we conjecture our results might find some potential physical applications in quantum energy level statistics.

  8. Terminal location planning in intermodal transportation with Bayesian inference method.

    Science.gov (United States)

    2014-01-01

    In this project, we consider the planning of terminal locations for intermodal transportation systems. For a given number of potential terminals and coexisted multiple service pairs, we find the set of appropriate terminals and their locations that p...

  9. Inferring the distribution and demography of an invasive species from sighting data: the red fox incursion into Tasmania.

    Directory of Open Access Journals (Sweden)

    Peter Caley

    A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes--the last of which occurred in 2006, or hunter killed foxes--the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that that population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought.

  10. Inferring the distribution and demography of an invasive species from sighting data: the red fox incursion into Tasmania.

    Science.gov (United States)

    Caley, Peter; Ramsey, David S L; Barry, Simon C

    2015-01-01

    A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes--the last of which occurred in 2006, or hunter killed foxes--the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that that population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought.
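    The general idea of Approximate Bayesian Computation can be sketched with a plain rejection sampler: draw a parameter from the prior, simulate data, and keep the draw only if the simulated summary is close to the observed one. The toy model below (a yearly detection probability with a uniform prior, and zero detections observed in ten years) is entirely hypothetical and far simpler than the spatial and demographic model in the paper; all names and numbers are assumptions.

```python
import random


def abc_rejection(observed, prior_sample, simulate, tolerance, n_draws, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary statistic
    lies within `tolerance` of the observed summary statistic."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


def draw_detection_probability(rng):
    # Assumed uniform prior on the yearly detection probability.
    return rng.random()


def simulate_detections(p, rng, n_years=10):
    # Number of years (out of n_years) with at least one simulated detection.
    return sum(1 for _ in range(n_years) if rng.random() < p)


# Toy observation: no detections in ten years.
posterior = abc_rejection(
    observed=0,
    prior_sample=draw_detection_probability,
    simulate=simulate_detections,
    tolerance=0,
    n_draws=5000,
)
posterior_mean = sum(posterior) / len(posterior)
```

    The accepted draws approximate the posterior distribution of the parameter; in this toy setting they concentrate near small detection probabilities, because large values rarely produce ten years without a detection.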

  11. Inference of gene-phenotype associations via protein-protein interaction and orthology.

    Directory of Open Access Journals (Sweden)

    Panwen Wang

    One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.

  12. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    Science.gov (United States)

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Prediction of gene regularity networks (GRNs) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised methods. The most promising methods are based on the support vector machine (SVM). There is a need for a comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRNs. Using CompareSVM, we investigated and evaluated in detail different SVM kernel methods on simulated microarray datasets of different sizes. The results obtained from CompareSVM showed that the accuracy of an inference method depends upon the nature of the experimental condition and the size of the network. For networks with fewer nodes, the SVM Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends upon the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/.

  13. Likelihood inference for unions of interacting discs

    DEFF Research Database (Denmark)

    Møller, Jesper; Helisová, Katarina

    To the best of our knowledge, this is the first paper which discusses likelihood inference for a random set using a germ-grain model, where the individual grains are unobservable, edge effects occur, and other complications appear. We consider the case where the grains form a disc process modelled...... is specified with respect to a given marked Poisson model (i.e. a Boolean model). We show how edge effects and other complications can be handled by considering a certain conditional likelihood. Our methodology is illustrated by analyzing Peter Diggle's heather dataset, where we discuss the results...... of simulation-based maximum likelihood inference and the effect of specifying different reference Poisson models....

  14. An Intuitive Dashboard for Bayesian Network Inference

    International Nuclear Information System (INIS)

    Reddy, Vikas; Farr, Anna Charisse; Wu, Paul; Mengersen, Kerrie; Yarlagadda, Prasad K D V

    2014-01-01

    Current Bayesian network software packages provide a good graphical interface for users who design and develop Bayesian networks for various applications. However, the intended end-users of these networks may not necessarily find such an interface appealing, and at times it could be overwhelming, particularly when the number of nodes in the network is large. To circumvent this problem, this paper presents an intuitive dashboard, which provides an additional layer of abstraction, enabling the end-users to easily perform inferences over the Bayesian networks. Unlike most software packages, which display the nodes and arcs of the network, the developed tool organises the nodes based on the cause-and-effect relationship, making the user interaction more intuitive and friendly. In addition to performing various types of inferences, the users can conveniently use the tool to verify the behaviour of the developed Bayesian network. The tool has been developed using the Qt and SMILE libraries in C++

  15. Bayesianism and inference to the best explanation

    Directory of Open Access Journals (Sweden)

    Valeriano IRANZO

    2008-01-01

    Bayesianism and Inference to the best explanation (IBE) are two different models of inference. Recently there has been some debate about the possibility of “bayesianizing” IBE. Firstly I explore several alternatives to include explanatory considerations in Bayes’s Theorem. Then I distinguish two different interpretations of prior probabilities: “IBE-Bayesianism” (IBE-Bay) and “frequentist-Bayesianism” (Freq-Bay). After detailing the content of the latter, I propose a rule for assessing the priors. I also argue that Freq-Bay: (i) endorses a role for explanatory value in the assessment of scientific hypotheses; (ii) avoids a purely subjectivist reading of prior probabilities; and (iii) fits better than IBE-Bayesianism with two basic facts about science, i.e., the prominent role played by empirical testing and the existence of many scientific theories in the past that failed to fulfil their promises and were subsequently abandoned.

  16. Dopamine, reward learning, and active inference

    Directory of Open Access Journals (Sweden)

    Thomas FitzGerald

    2015-11-01

    Temporal difference learning models propose phasic dopamine signalling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on an hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behaviour. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings.

  17. Dopamine, reward learning, and active inference.

    Science.gov (United States)

    FitzGerald, Thomas H B; Dolan, Raymond J; Friston, Karl

    2015-01-01

    Temporal difference learning models propose phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on an hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behavior. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings.

  18. Inferring genetic interactions from comparative fitness data.

    Science.gov (United States)

    Crona, Kristina; Gavryushkin, Alex; Greene, Devin; Beerenwinkel, Niko

    2017-12-20

    Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.
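    The claim that a rank order alone can imply an interaction can be made concrete in the simplest case. The sketch below assumes a two-locus, biallelic system with genotypes 00, 01, 10, 11, strictly ordered fitness values (no ties), and epistasis defined as u = w00 + w11 - w01 - w10; these conventions and the function name are assumptions for illustration, and the paper's characterization of higher order interactions is not reproduced here. Scanning the order from fittest to least fit, the sign of u is forced to be positive exactly when the genotypes with a plus sign (00, 11) are never outnumbered by those with a minus sign (01, 10) in any prefix, and negative in the mirrored case.

```python
def implied_epistasis_sign(rank_order):
    """Given the four genotypes ordered from fittest to least fit (strict order),
    return +1 if the order forces u = w00 + w11 - w01 - w10 > 0,
    -1 if it forces u < 0, and 0 if the sign cannot be decided from ranks alone."""
    if sorted(rank_order) != ["00", "01", "10", "11"]:
        raise ValueError("rank_order must be a permutation of 00, 01, 10, 11")
    sign = {"00": 1, "11": 1, "01": -1, "10": -1}
    running = 0
    never_negative = True   # every prefix has at least as many '+' as '-' genotypes
    never_positive = True   # every prefix has at least as many '-' as '+' genotypes
    for genotype in rank_order:
        running += sign[genotype]
        if running < 0:
            never_negative = False
        if running > 0:
            never_positive = False
    if never_negative:
        return 1
    if never_positive:
        return -1
    return 0


# Example orders, fittest first.
forced_positive = implied_epistasis_sign(["11", "01", "00", "10"])
undetermined = implied_epistasis_sign(["11", "01", "10", "00"])
```

    In the first example each minus genotype can be paired with a fitter plus genotype (11 above 01, 00 above 10), so u must be positive whatever the actual fitness values are. In the second, u = (w11 - w01) - (w10 - w00) can take either sign, so the ranks alone are not informative.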

  19. An emergent approach to analogical inference

    Science.gov (United States)

    Thibodeau, Paul H.; Flusberg, Stephen J.; Glick, Jeremy J.; Sternberg, Daniel A.

    2013-03-01

    In recent years, a growing number of researchers have proposed that analogy is a core component of human cognition. According to the dominant theoretical viewpoint, analogical reasoning requires a specific suite of cognitive machinery, including explicitly coded symbolic representations and a mapping or binding mechanism that operates over these representations. Here we offer an alternative approach: we find that analogical inference can emerge naturally and spontaneously from a relatively simple, error-driven learning mechanism without the need to posit any additional analogy-specific machinery. The results also parallel findings from the developmental literature on analogy, demonstrating a shift from an initial reliance on surface feature similarity to the use of relational similarity later in training. Variants of the model allow us to consider and rule out alternative accounts of its performance. We conclude by discussing how these findings can potentially refine our understanding of the processes that are required to perform analogical inference.

  20. Statistical inference from imperfect photon detection

    International Nuclear Information System (INIS)

    Audenaert, Koenraad M R; Scheel, Stefan

    2009-01-01

    We consider the statistical properties of photon detection with imperfect detectors that exhibit dark counts and less than unit efficiency, in the context of tomographic reconstruction. In this context, the detectors are used to implement certain positive operator-valued measures (POVMs) that would allow us to reconstruct the quantum state or quantum process under consideration. Here we look at the intermediate step of inferring outcome probabilities from measured outcome frequencies, and show how this inference can be performed in a statistically sound way in the presence of detector imperfections. Merging outcome probabilities for different sets of POVMs into a consistent quantum state picture has been treated elsewhere (Audenaert and Scheel 2009 New J. Phys. 11 023028). Single-photon pulsed measurements as well as continuous wave measurements are covered.